daemon stopping without reporting errors #158

Closed
francocatalano opened this issue Oct 3, 2020 · 12 comments

@francocatalano

After updating synda to v3.11 I got into the following problem.
I define a selection file to get some datasets from CMIP6, like this:
project=CMIP6
source_id=IPSL-CM6A-LR
experiment_id=historical
member_id=r1i1p1f1
table_id=Amon
frequency=mon
variable_id=tas psl pr hfls hfss evspsbl rlds rlus rsds rsus prw

The search command seems to work properly:
synda search -s selection/my_test_selfile.txt
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.hfss.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rlus.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.pr.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rsus.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.psl.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rlds.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.evspsbl.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rsds.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.hfls.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.prw.gr.v20180803

Then I launch install:
synda install -s selection/my_test_selfile.txt
11 file(s) will be added to the download queue.
Once downloaded, 1.3 GB of additional disk space will be used.
Do you want to continue? [Y/n] Y
11 file(s) enqueued
You can follow the download using 'synda watch' and 'synda queue' commands
The daemon is not running. To start it, use 'synda daemon start'.

But when I launch the daemon it starts and stops immediately without apparently giving any error:
synda daemon start
Handing over to daemon process, you can check the daemons logs at /sgi_specs/a/.synda/log/transfer.log

synda watch
Daemon not running

and the transfer log file reports only the following info:
2020-10-03 12:18:38,583 INFO SDDAEMON-001 Daemon starting ...
INFO: Connected to /sgi_specs/a/.synda/db/sdt.db
2020-10-03 12:18:38,584 INFO SDTSCHED-533 Connected to /sgi_specs/a/.synda/db/sdt.db
2020-10-03 12:18:38,584 INFO SDTSCHED-993 Starting watchdog..
2020-10-03 12:18:38,585 INFO SDFILDAO-200 get_files time is 0.000953, search select * from file where status=:status ORDER BY priority DESC, checksum with {'status': 'running'}
Daemon successfully started

Does anyone know what's going on?
Thanks a lot for your help.

@painter1
Contributor

painter1 commented Oct 5, 2020

Are there any files of the form /tmp/sdt_stacktrace_*.log ? If there's one written at about the right time, it might be revealing.

@francocatalano
Author

francocatalano commented Oct 5, 2020

Hi. Yes, this is the content of the corresponding /tmp/sdt_stacktrace_*.log

Trace function called from '/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py' file in 'start' function at line 118
Exception occured at 2020-10-03 12:18:38.672102
Traceback (most recent call last):
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py", line 115, in start
    main_loop()
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py", line 66, in main_loop
    sdtaskscheduler.event_loop()
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdtaskscheduler.py", line 164, in event_loop
    clear_failed_url()
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdtaskscheduler.py", line 94, in clear_failed_url
    sdsqlutils.truncate_table("failed_url")
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdsqlutils.py", line 49, in truncate_table
    conn.execute("delete from %s"%table)
OperationalError: no such table: failed_url

I am quite new to synda. Any suggestions?
Thanks a lot.
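
For anyone hitting the same traceback, one way to confirm the diagnosis is to ask the database file itself whether the table exists. The following is a minimal sketch in Python 3 using only the standard `sqlite3` module; it is not part of synda, and the helper name `table_exists` is hypothetical. The database path is whatever your transfer log reports in its "Connected to ..." line.

```python
import sqlite3


def table_exists(db_path, table_name):
    """Return True if the SQLite file at db_path contains a table named table_name.

    Note: sqlite3.connect() creates an empty file if db_path does not exist,
    so double-check the path before trusting a False result.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        ).fetchone()
        return row is not None
    finally:
        conn.close()


# Example call (the path is an assumption; use the one from your own log):
# table_exists("/path/to/sdt.db", "failed_url")
```

If this returns False for `failed_url`, the daemon will fail in `clear_failed_url()` exactly as the traceback shows.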

@painter1
Contributor

painter1 commented Oct 5, 2020

failed_url is a table that is newly required in the database.
In bash, type
sqlite3
Then in sqlite3 type everything but "sqlite3>" here:

sqlite3>  CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER );
sqlite3>  CREATE UNIQUE INDEX idx_failed_url_1 ON failed_url (url);
sqlite3> .quit

I now realize that we need an automated way to update the database like this, or a way to function without the table, or at least a warning when it isn't there. I will work on that.
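
The manual fix could also be scripted so that it is safe to run more than once. What follows is only a sketch of that idea, not the migration synda eventually shipped: the function name `ensure_failed_url_table` is hypothetical, the statements are the two given above with `IF NOT EXISTS` added, and the code is Python 3 rather than the Python 2.7 that synda 3.11 runs on.

```python
import sqlite3

# DDL taken from the manual fix, made idempotent with IF NOT EXISTS.
FAILED_URL_DDL = (
    "CREATE TABLE IF NOT EXISTS failed_url "
    "( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER )",
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_failed_url_1 ON failed_url (url)",
)


def ensure_failed_url_table(db_path):
    """Create the failed_url table and its index in db_path if they are missing."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            for statement in FAILED_URL_DDL:
                conn.execute(statement)
    finally:
        conn.close()


# Example call (the path is an assumption; stop the daemon and back up the file first):
# ensure_failed_url_table("/path/to/sdt.db")
```

Because both statements are guarded, running the function against a database that already has the table leaves it unchanged.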

@painter1
Contributor

painter1 commented Oct 5, 2020

BTW, the table "failed_url" is needed so that if a data node fails to supply data, Synda can go try another data node.

@painter1 painter1 self-assigned this Oct 5, 2020
@francocatalano
Author

> In bash, type
> sqlite3
> Then in sqlite3 type everything but "sqlite3>" here:
>
> sqlite3>  CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER );
> sqlite3>  CREATE UNIQUE INDEX idx_failed_url_1 ON failed_url (url);
> sqlite3> .quit

I've just tried that but when I launch the daemon again I still get the same error as before:
OperationalError: no such table: failed_url
I also tried deactivating and reactivating the synda environment after issuing the sqlite3 commands, but the problem is the same.

@painter1
Contributor

painter1 commented Oct 5, 2020

I'm sorry, I gave the wrong sqlite command. It should be
sqlite3 [path to your database]
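
A likely reason the first attempt had no effect: when `sqlite3` is started without a file name, it opens a temporary in-memory database, so the table was created there and discarded on exit, while the synda database file was never touched. The sketch below illustrates the same separation in Python 3 with the standard `sqlite3` module; `list_tables` and `demo` are hypothetical names, and the on-disk file is a throwaway stand-in, not a real synda database.

```python
import os
import sqlite3
import tempfile

DDL = "CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER )"


def list_tables(conn):
    """Return the names of all tables visible through this connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def demo():
    """Create the table in an in-memory database and show the file is unaffected."""
    with tempfile.TemporaryDirectory() as tmp:
        on_disk = sqlite3.connect(os.path.join(tmp, "example.db"))
        in_memory = sqlite3.connect(":memory:")  # analogous to running bare `sqlite3`
        try:
            in_memory.execute(DDL)
            in_memory.commit()
            return list_tables(in_memory), list_tables(on_disk)
        finally:
            in_memory.close()
            on_disk.close()


print(demo())
```

The first list contains `failed_url` and the second is empty, which matches what happened in this thread: the statements succeeded, but against the wrong database.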

@francocatalano
Author

> I'm sorry, I gave the wrong sqlite command. It should be
> sqlite3 [path to your database]

Now it worked. Thanks a lot.

@painter1
Contributor

painter1 commented Oct 21, 2020 via email

@painter1
Contributor

This error involving sdfiledao.py line 228 (list index out of range) can be bypassed by editing sdconst.py to set GET_FILES_CACHING to False. However, if you have a large database and are simultaneously downloading from several data nodes, you will take a performance hit.
I will try to reproduce the problem (I think an empty database will do it), fix it (should be very simple), and submit a pull request soon.

@painter1
Contributor

@francocatalano and Rafael Abreu: the problem involving failed_url was supposedly fixed about a year and a half ago. There was a window of about one month in which this problem was clearly possible, although I can't entirely rule out a bug in that fix. So I have some questions:

  • Exactly how did you get the Synda version you have? Exactly what date was it downloaded?
  • What is SYNDA_VERSION in your sdconst.py (very near the end of the file)? What is version in sdapp.py (around the middle of the file, the line above sdapputils.set_exception_handler())?
  • What is your database version? You can get the version thus:
bash> sqlite3 [path to your synda database]
sqlite> SELECT version FROM version;

Thank you!
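
The database version query above can also be run from Python, which may be convenient if the `sqlite3` command-line tool is not installed. This is a minimal Python 3 sketch under the assumption, taken from the query above, that the database has a `version` table with a `version` column; the helper name `read_db_version` is hypothetical.

```python
import sqlite3


def read_db_version(db_path):
    """Return the value from 'SELECT version FROM version', or None if unavailable."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT version FROM version").fetchone()
        return row[0] if row is not None else None
    except sqlite3.OperationalError:
        # Raised, for example, when the version table does not exist.
        return None
    finally:
        conn.close()


# Example call (the path is an assumption; use your own synda database path):
# print(read_db_version("/path/to/sdt.db"))
```

This reports the schema version stored inside the database, which is separate from both the synda package version and the SQLite library version.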

@rafaelcabreu

Thanks for the update @painter1. I deleted my original comment because I was able to get it working by running the daemon after running synda install.

As for the versions, I am using synda version 3.12 installed with conda and sqlite3 version 3.33.0.

@painter1
Contributor

@rafaelcabreu, the database itself has a version number, different from the sqlite3 version number. Would you please check that?
Thanks.
