daemon stopping without reporting errors #158

francocatalano · 2020-10-03T10:26:45Z

After updating synda to v3.11 I ran into the following problem.
I define a selection file to get some datasets from CMIP6, like this:
project=CMIP6
source_id=IPSL-CM6A-LR
experiment_id=historical
member_id=r1i1p1f1
table_id=Amon
frequency=mon
variable_id=tas psl pr hfls hfss evspsbl rlds rlus rsds rsus prw

The search command seems to work properly:
synda search -s selection/my_test_selfile.txt
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.hfss.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rlus.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.pr.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rsus.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.psl.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rlds.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.evspsbl.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rsds.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.hfls.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.prw.gr.v20180803

Then I launch install:
synda install -s selection/my_test_selfile.txt
11 file(s) will be added to the download queue.
Once downloaded, 1.3 GB of additional disk space will be used.
Do you want to continue? [Y/n] Y
11 file(s) enqueued
You can follow the download using 'synda watch' and 'synda queue' commands
The daemon is not running. To start it, use 'synda daemon start'.

But when I launch the daemon it starts and stops immediately without apparently giving any error:
synda daemon start
Handing over to daemon process, you can check the daemons logs at /sgi_specs/a/.synda/log/transfer.log

synda watch
Daemon not running

and the transfer log file reports only the following info:
2020-10-03 12:18:38,583 INFO SDDAEMON-001 Daemon starting ...
INFO: Connected to /sgi_specs/a/.synda/db/sdt.db
2020-10-03 12:18:38,584 INFO SDTSCHED-533 Connected to /sgi_specs/a/.synda/db/sdt.db
2020-10-03 12:18:38,584 INFO SDTSCHED-993 Starting watchdog..
2020-10-03 12:18:38,585 INFO SDFILDAO-200 get_files time is 0.000953, search select * from file where status=:status ORDER BY priority DESC, checksum with {'status': 'running'}
Daemon successfully started

Does anyone know what's going on?
Thanks a lot for your help.

painter1 · 2020-10-05T01:42:31Z

Are there any files of the form /tmp/sdt_stacktrace_*.log ? If there's one written at about the right time, it might be revealing.
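
A minimal sketch for finding the most recent of these stack trace files, assuming only the `/tmp/sdt_stacktrace_*.log` pattern mentioned above. The helper name `recent_stacktraces` is hypothetical and is not part of Synda.

```python
import glob
import os


def recent_stacktraces(pattern="/tmp/sdt_stacktrace_*.log"):
    """Return the files matching the pattern, newest first.

    The default pattern is the one quoted in this thread; the helper
    itself is an illustration, not a Synda function.
    """
    paths = glob.glob(pattern)
    # Sort by modification time so the trace written closest to the
    # failed daemon start comes first.
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in recent_stacktraces():
        print(path)
```

Comparing the modification time of the first file with the timestamp of the "Daemon starting" line in transfer.log shows whether the trace belongs to the failed start.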

francocatalano · 2020-10-05T07:19:46Z

Hi. Yes, this is the content of the corresponding /tmp/sdt_stacktrace_*.log

Trace function called from '/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py' file in 'start' function at line 118
Exception occured at 2020-10-03 12:18:38.672102
Traceback (most recent call last):
File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py", line 115, in start
main_loop()
File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py", line 66, in main_loop
sdtaskscheduler.event_loop()
File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdtaskscheduler.py", line 164, in event_loop
clear_failed_url()
File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdtaskscheduler.py", line 94, in clear_failed_url
sdsqlutils.truncate_table("failed_url")
File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdsqlutils.py", line 49, in truncate_table
conn.execute("delete from %s"%table)
OperationalError: no such table: failed_url

I am quite new to synda. Any suggestions?
Thanks a lot.

painter1 · 2020-10-05T14:53:20Z

failed_url is a new table that now has to be in the database.
In bash, type
sqlite3
Then in sqlite3 type everything but "sqlite3>" here:

sqlite3>  CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER );
sqlite3>  CREATE UNIQUE INDEX idx_failed_url_1 ON failed_url (url);
sqlite3> .quit

Now I realize that we need an automated way to update the database like this, or to function without the table, or at least to issue a warning when it isn't there. I will work on that.

painter1 · 2020-10-05T15:00:33Z

BTW, the table "failed_url" is needed so that if a data node fails to supply data, Synda can go try another data node.

francocatalano · 2020-10-05T15:17:51Z

> In bash, type
> sqlite3
> Then in sqlite3 type everything but "sqlite3>" here:
>
> sqlite3>  CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER );
> sqlite3>  CREATE UNIQUE INDEX idx_failed_url_1 ON failed_url (url);
> sqlite3> .quit

I've just tried that but when I launch the daemon again I still get the same error as before:
OperationalError: no such table: failed_url
I also tried deactivating and reactivating the synda environment after issuing the sqlite3 commands, but the problem is the same.

painter1 · 2020-10-05T17:59:52Z

I'm sorry, I gave the wrong sqlite command. It should be
sqlite3 [path to your database]
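
The path matters because `sqlite3` started without a file argument works on a transient in-memory database, so tables created there are lost on `.quit`. A minimal sketch of the same fix applied from Python, assuming the table and index definitions given earlier in this thread. The helper name `ensure_failed_url_table` is hypothetical, and the database path must be supplied by the caller (the transfer log above shows it as `/sgi_specs/a/.synda/db/sdt.db` on the reporter's system).

```python
import os
import sqlite3


def ensure_failed_url_table(db_path):
    """Create failed_url and its index in the database at db_path if missing.

    Hypothetical helper, not part of Synda. Returns True once the
    table is present.
    """
    # sqlite3.connect() would silently create a new, empty database for
    # a wrong path, so refuse to continue if the file does not exist.
    if not os.path.isfile(db_path):
        raise IOError("database file not found: %s" % db_path)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS failed_url ("
                " url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER )"
            )
            conn.execute(
                "CREATE UNIQUE INDEX IF NOT EXISTS idx_failed_url_1"
                " ON failed_url (url)"
            )
        row = conn.execute(
            "SELECT name FROM sqlite_master"
            " WHERE type = 'table' AND name = 'failed_url'"
        ).fetchone()
        return row is not None
    finally:
        conn.close()
```

The `IF NOT EXISTS` clauses make the call safe to repeat; stop the daemon before running it against a live database.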

francocatalano · 2020-10-05T18:15:50Z

> I'm sorry, I gave the wrong sqlite command. It should be
> sqlite3 [path to your database]

Now it worked. Thanks a lot.

painter1 · 2020-10-21T18:15:07Z

I’m looking for how this could happen. Was it a brand new database, nothing in the file table?
Jeff

Rafael Abreu wrote on Wednesday, October 21, 2020 8:55 AM:

> I am having the same problem. Used the command provided by @painter1 but after that, another problem seems to occur:
>
> Trace function called from '/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sddaemon.py' file in 'start' function at line 118
> Exception occured at 2020-10-21 09:59:03.228342
> Traceback (most recent call last):
> File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sddaemon.py", line 115, in start
> main_loop()
> File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sddaemon.py", line 66, in main_loop
> sdtaskscheduler.event_loop()
> File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sdtaskscheduler.py", line 166, in event_loop
> sdfiledao.highest_waiting_priority( True, True ) #initializes cache of max priorities
> File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sdfiledao.py", line 228, in highest_waiting_priority
> return (highest_waiting_priority.vals).get(data_nodes[0],None)
> IndexError: list index out of range

painter1 · 2020-10-21T18:38:52Z

This error involving sdfiledao.py line 228 (list index out of range) can be bypassed by editing sdconst.py to set GET_FILES_CACHING to False. However, if you have a large database and are simultaneously downloading from several data nodes, you will take a performance hit.
I will try to reproduce the problem (I think an empty database will do it), fix it (should be very simple), and submit a pull request soon.
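
For illustration only, the failing expression `(...).get(data_nodes[0], None)` raises IndexError whenever `data_nodes` is an empty list. A minimal sketch of the kind of guard that avoids it follows. This is not the actual Synda patch; the function name `priority_for_first_node` and its arguments are hypothetical stand-ins for the cached values and node list seen in the traceback.

```python
def priority_for_first_node(vals, data_nodes):
    """Return the cached priority of the first data node, or None.

    vals:       dict mapping data node name -> highest waiting priority
    data_nodes: list of data node names, possibly empty
    Both names are assumptions made for this sketch.
    """
    # An empty list (for example, no matching files in the database)
    # has no element 0, so return None instead of indexing.
    if not data_nodes:
        return None
    return vals.get(data_nodes[0], None)
```

With this guard an empty node list yields None, the same value the original expression returns for an unknown node.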

painter1 · 2020-10-21T23:32:23Z

@francocatalano and Rafael Abreu: the problem involving failed_url was supposedly fixed about a year and a half ago. There was about a one-month window in which this problem was clearly possible, although I can't completely rule out a bug in that fix. So I have some questions:

Exactly how did you get the Synda version you have? On exactly what date was it downloaded?
What is SYNDA_VERSION in your sdconst.py (very near the end of the file)? What is version in sdapp.py (around the middle of the file, the line above sdapputils.set_exception_handler())?
What is your database version? You can get the version thus:

bash> sqlite3 [path to your synda database]
sqlite> SELECT version FROM version;

Thank you!
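
The database version query above can also be run from Python. A minimal sketch, assuming only the `version` table and `version` column used in that query; the helper name `read_db_version` is hypothetical and the database path must be supplied by the caller.

```python
import sqlite3


def read_db_version(db_path):
    """Return the value of `SELECT version FROM version`, or None if empty.

    Hypothetical helper, not part of Synda. Raises
    sqlite3.OperationalError if the version table does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT version FROM version").fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

This reports the schema version stored in the Synda database, which is separate from the version of the sqlite3 library or command-line tool.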

rafaelcabreu · 2020-10-23T14:34:04Z

Thanks for the update @painter1. I deleted my original comment because I was able to get it working by running the daemon after running synda install.

As for the versions, I am using synda version 3.12 installed with conda and sqlite3 version 3.33.0.

painter1 · 2020-10-23T22:21:41Z

@rafaelcabreu, the database itself has a version number, different from the sqlite3 version number. Would you please check that?
Thanks.

painter1 self-assigned this Oct 5, 2020

pjournou-ipsl pushed a commit that referenced this issue Nov 13, 2020

issue #158 - daemon stopping without reporting errors

c5f6a3b

pjournou-ipsl closed this as completed Nov 10, 2021
