sqlite3.OperationalError: database is locked #205

Closed
gemygk opened this issue Jul 31, 2019 · 13 comments

@gemygk
Collaborator

gemygk commented Jul 31, 2019

Hi @lucventurini

I am getting an error with Mikado version mikado-2.0_rc1 for one of the wheat accessions.

This accession has quite large input models compared to the rest.

Even though I got this error, Mikado is still running; however, no logs (pick.log) have been generated since the point it crashed.

It looks like the serialise stage finished without any issues.

Can you please look into this?

Error:

2019-07-30 22:52:54,509 - main - __init__.py:123 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-07-30 22:52:54,511 - main - __init__.py:124 - ERROR - main - MainProcess - database is locked
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/__init__.py", line 109, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/Mikado/subprograms/pick.py", line 194, in pick
    creator()
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1205, in __call__
    self._parse_and_submit_input(data_dict)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1176, in _parse_and_submit_input
    self.__submit_multi_threading(data_dict)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 921, in __submit_multi_threading
    self.add_to_index(conn, cursor, current_locus.transcripts.values(), counter)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 826, in add_to_index
    conn.commit()
sqlite3.OperationalError: database is locked

Working directory:

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1

Command used:

sbatch -p ei-cb -c 32 --mem 123G -o out_mikado.serialise-and-pick.%j.log -J MAT_Mikado_SP --wrap "source mikado-2.0_rc1 && /usr/bin/time -v mikado serialise --seed 10 --procs 30 --json-conf mikado.configuration.pick.yaml --external-scores annotation_run1.metrics.txt && /usr/bin/time -v mikado pick --seed 10 --only-reference-update --procs 30 --json-conf mikado.configuration.pick.yaml --subloci_out mikado.subloci.gff3"

Logfile:

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/out_mikado.serialise-and-pick.22465863.log

Thanks,
Gemy

@swarbred
Collaborator

@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado; however, after the error the SLURM job did not exit (this was with multiple threads) and I had to kill the job manually.

@lucventurini
Collaborator

@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado; however, after the error the SLURM job did not exit (this was with multiple threads) and I had to kill the job manually.

Hi @swarbred, I think the issue you mention might have been solved by fixing #196 (regarding the malformed input). As for the hanging ... I will have a look. It can happen, unfortunately, when using multiprocessing.

@lucventurini
Collaborator

Hi @lucventurini
I am getting an error with Mikado version mikado-2.0_rc1 for one of the wheat accessions.
This accession has quite large input models compared to the rest.
Even though I got this error, Mikado is still running; however, no logs (pick.log) have been generated since the point it crashed.
It looks like the serialise stage finished without any issues.
Can you please look into this?

Hi @gemygk, yes, sure thing. It is a bit of a weird case, though, as the database is read-only for all the other processes.

@gemygk
Collaborator Author

gemygk commented Jul 31, 2019

Hi @lucventurini

Yes, I had the same feeling.

Just found this on Google:
https://stackoverflow.com/a/8618328

connection = sqlite.connect('cache.db', timeout=10)

I hope the timeout is not set to 5 seconds, as mentioned in the comment above?

Just to add on:

I have just noticed that there are no temp files (other than hidden files) in the pick temp folder here:

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/mikado_pick_tmpcjd2lcgu

Is it that Mikado cleaned the temp folder when it crashed?

I did start a backup run and it looks like the pick temp folder is getting populated. The run that failed earlier took ~4hrs to crash. So we will have to wait and see.

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/backup_run

So, if you do test the current run, can you please run it from a new folder other than the backup_run folder?

@lucventurini
Collaborator

I have just noticed that there are no temp files (other than hidden files) in the pick temp folder here [...] Is it that Mikado cleaned the temp folder when it crashed?

Yes, Mikado tends to clean after itself ... which sometimes is not so good.

connection = sqlite.connect('cache.db', timeout=10)
I hope the timeout is not set to 5 seconds, as mentioned in the comment above?

It might actually have been the cause of the crash, unfortunately. I did set a timeout for the threads that read the database, but not for the one that writes to it. Moreover, I did not enable WAL mode, meaning that each read from the reading threads effectively locks the common inter-exchange database. I am fixing these hiccups now.
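
For readers hitting the same error, a minimal sketch of this general fix (illustrative only, not Mikado's actual code; the database name is made up): open the shared SQLite database with a longer busy timeout and switch it to WAL mode, so reads no longer block the writer.

import sqlite3

# Illustrative sketch, not Mikado's code ("exchange.db" is a made-up name):
# a longer busy timeout plus WAL journalling lets reader processes and the
# single writer coexist without tripping "database is locked".
conn = sqlite3.connect("exchange.db", timeout=60)  # wait up to 60 s on a locked database
conn.execute("PRAGMA journal_mode=WAL")            # readers no longer block the writer
conn.execute("PRAGMA synchronous=NORMAL")          # common companion setting for WAL
conn.commit()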

So, if you do test the current run, can you please run it from a new folder other than the backup_run folder?

Yep, will do!

@lucventurini
Collaborator

@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado; however, after the error the SLURM job did not exit (this was with multiple threads) and I had to kill the job manually.

Hi @swarbred, any chance I could see the logs/files regarding this crash, by the way?

lucventurini added a commit to lucventurini/mikado that referenced this issue Jul 31, 2019
@swarbred
Collaborator

Hi @lucventurini, the error was due to the input file: these were models based on a transfer from another assembly, and they contain non-valid structures, one of which caused the error. It would be nice if prepare just removed the problem transcript, warned, and continued, but my comment was more that after erroring the job did not exit. On a side note, prepare does a nice job of validating the GTF (gffread simply ignores many types of invalid structures).

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/checking.py", line 86, in create_transcript
    assert isinstance(feat, (list, tuple)) and 2 <= len(feat) <= 3, feat
AssertionError: 82149915
2019-07-30 21:29:40,621 - prepare - prepare.py:495 - ERROR - prepare - MainProcess - 'int' object is not subscriptable
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 493, in prepare
    perform_check(sorter(shelf_stacks), shelf_stacks, args, logger)
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 210, in perform_check
    for counter, keys in enumerate(keys):
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 92, in store_transcripts
    features["features"]["exon"]],
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 91, in
    exon_set = tuple(sorted([(exon[0], exon[1], strand) for exon in
TypeError: 'int' object is not subscriptable
2019-07-30 21:29:40,654 - prepare - prepare.py:497 - ERROR - prepare - MainProcess - Mikado has encountered an error, exiting

lucventurini added a commit to lucventurini/mikado that referenced this issue Jul 31, 2019
@lucventurini
Collaborator

Hi @swarbred, I should now have fixed the part where Mikado crashes because of an incorrect transcript during prepare. It is strange that it was not handled correctly before.

For the other part - Mikado not exiting immediately after a process goes into error - I am more puzzled. I have added a bit of code which should correctly close the queues present in the child processes - as that seems to be a common source of hangs in Python - but I am not certain it will solve the issue. If it presents itself again after merging this branch, please let me know.
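
For context, a minimal sketch of the kind of queue hygiene meant here (illustrative only, not Mikado's actual code): the child closes its output queue and joins the queue's feeder thread before exiting, and the parent drains the queue before joining the child, which avoids a classic multiprocessing hang.

import multiprocessing as mp

def worker(out_queue):
    # Illustrative worker, not Mikado's code: put the result, then close the
    # queue and join its feeder thread so the child can always exit cleanly.
    try:
        out_queue.put(("result", 42))
    finally:
        out_queue.close()        # no further puts on this queue
        out_queue.join_thread()  # flush the background feeder thread

if __name__ == "__main__":
    queue = mp.Queue()
    child = mp.Process(target=worker, args=(queue,))
    child.start()
    print(queue.get())  # drain before joining, otherwise join() can hang
    child.join()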

Otherwise, once I confirm that the other issue - Pick crashing - is solved, I would merge and close.

lucventurini added a commit to lucventurini/mikado that referenced this issue Jul 31, 2019
@lucventurini
Collaborator

Hi all,
Mikado finished successfully and without raising any errors (job ID 22513783, folder /ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/issue-205).

If you can confirm, I will merge into master and close the issue.
Thank you for reporting the bug; switching to WAL probably also made Mikado quite a bit faster.
Best

@gemygk
Collaborator Author

gemygk commented Aug 1, 2019

Hi @lucventurini

The backup run (the one that failed the first time with the error) also finished successfully.

However, there are differences in the counts in the output we get from the two runs. Both runs were executed with '--seed 10', so we should be getting identical results. We have not changed any code that would cause this difference, is that correct?

backup_run - stats
/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/backup_run
catvh mikado.loci.gff3 | cut -f3 | sort -n | uniq -c
1413184 CDS
1826939 exon
 504123 five_prime_UTR
 555815 gene
 584939 mRNA
 247911 ncRNA
 247687 ncRNA_gene
 624694 superlocus
 495865 three_prime_UTR

issue-205 - stats
/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/issue-205
catvh mikado.loci.gff3 | cut -f3 | sort -n | uniq -c
1413220 CDS
1826978 exon
 504125 five_prime_UTR
 555816 gene
 584942 mRNA
 247910 ncRNA
 247686 ncRNA_gene
 624694 superlocus
 495864 three_prime_UTR

@lucventurini
Collaborator

Hi @gemygk, OK that is puzzling. I will have another look as soon as possible.

lucventurini added this to the 2.0 milestone Aug 1, 2019
@lucventurini
Collaborator

Hi @gemygk, found the problem. Mikado uses Python's multiprocessing for parallelization, which means that each process is independent in terms of memory ... and, crucially, of its random seed. It is solvable, but I need to set the random seed in all of the subprocesses.

Thank you for pointing this out, I will correct the issue in all subprograms.
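
A minimal sketch of the approach described above (illustrative only; the function names are made up and this is not Mikado's actual API): pass the configured seed to a Pool initializer so that every worker process re-seeds its own random generators on start-up.

import multiprocessing as mp
import random

import numpy as np

def init_worker(seed):
    # Hypothetical initializer, not Mikado's code: each worker process has its
    # own memory and random state, so it must be seeded explicitly.
    random.seed(seed)
    np.random.seed(seed)

def score(value):
    # Stand-in for a per-locus computation that draws random numbers.
    return value + np.random.random()

if __name__ == "__main__":
    seed = 10
    with mp.Pool(processes=4, initializer=init_worker, initargs=(seed,)) as pool:
        results = pool.map(score, range(4))
    print(results)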

lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 1, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 1, 2019
…d also within the processes. Moreover, I have ensured that numpy.random.seed will not crash because of a seed larger than 2^32 - 1 (the maximum limit)
@lucventurini
Collaborator

Hi @gemygk, I have switched to numpy for the randomization. The same seed should now be guaranteed within all processes.

Apologies for the previous runs; they are different because of this. Unfortunately I was not really fixing the seed at all ... many thanks for pointing this out!
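
On the seed limit mentioned in the commit message above: numpy.random.seed only accepts integers in the range [0, 2**32 - 1], so a larger seed has to be folded back into that range before use. A hedged sketch of such a guard (hypothetical helper, not Mikado's actual function):

import numpy as np

def set_numpy_seed(seed):
    # Hypothetical helper, not Mikado's function: numpy.random.seed raises a
    # ValueError for seeds outside [0, 2**32 - 1], so fold larger values back
    # into that range before seeding.
    np.random.seed(seed % (2 ** 32))

set_numpy_seed(10)           # small seeds pass through unchanged
set_numpy_seed(2 ** 40 + 7)  # would raise ValueError if passed to numpy directly
print(np.random.randint(0, 100, size=3))  # deterministic for a given folded seed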

lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 2, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
Changes:
- Mikado will no longer hang if a subprocess dies; it will exit immediately.
- Ensured that Mikado runs are fully reproducible using a random seed (EI-CoreBioinformatics#183)
- Solved a bug that crashed Mikado prepare in the presence of incorrect transcripts
- Removed the cause of locked inter-process exchange databases in Mikado pick, by switching to WAL and increasing the timeout limit.