sqlite3.OperationalError: database is locked #205

Closed
gemygk opened this issue Jul 31, 2019 · 13 comments

@gemygk
Collaborator

gemygk commented Jul 31, 2019

Hi @lucventurini

I am getting an error with Mikado version mikado-2.0_rc1 for one of the wheat accessions.

This accession has quite large input models compared to the rest.

Even though I got this error, Mikado is still running; however, no logs (pick.log) have been generated since the point it crashed.

It looks like the serialise stage finished without any issues.

Can you please look into this?

Error:

2019-07-30 22:52:54,509 - main - __init__.py:123 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-07-30 22:52:54,511 - main - __init__.py:124 - ERROR - main - MainProcess - database is locked
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/__init__.py", line 109, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/Mikado/subprograms/pick.py", line 194, in pick
    creator()
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1205, in __call__
    self._parse_and_submit_input(data_dict)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1176, in _parse_and_submit_input
    self.__submit_multi_threading(data_dict)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 921, in __submit_multi_threading
    self.add_to_index(conn, cursor, current_locus.transcripts.values(), counter)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 826, in add_to_index
    conn.commit()
sqlite3.OperationalError: database is locked

Working directory:

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1

Command used:

sbatch -p ei-cb -c 32 --mem 123G -o out_mikado.serialise-and-pick.%j.log -J MAT_Mikado_SP --wrap "source mikado-2.0_rc1 && /usr/bin/time -v mikado serialise --seed 10 --procs 30 --json-conf mikado.configuration.pick.yaml --external-scores annotation_run1.metrics.txt && /usr/bin/time -v mikado pick --seed 10 --only-reference-update --procs 30 --json-conf mikado.configuration.pick.yaml --subloci_out mikado.subloci.gff3"

Logfile:

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/out_mikado.serialise-and-pick.22465863.log

Thanks,
Gemy

@swarbred
Collaborator

@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado; however, after the error the SLURM job did not exit (this was with multiple threads) and I had to kill the job manually.

@lucventurini
Collaborator

@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado; however, after the error the SLURM job did not exit (this was with multiple threads) and I had to kill the job manually.

Hi @swarbred, I think the issue you mention might have been solved by fixing #196 (regarding the malformed input). As for the hanging ... I will have a look. It can happen, unfortunately, when using multiprocessing.

@lucventurini
Collaborator

Hi @lucventurini
I am getting an error with Mikado version mikado-2.0_rc1 for one of the wheat accessions.
This accession has quite large input models compared to the rest.
Even though I got this error, Mikado is still running; however, no logs (pick.log) have been generated since the point it crashed.
It looks like the serialise stage finished without any issues.
Can you please look into this?

Hi @gemygk, yes, sure thing. It is a bit of a weird case, though, as the database is read-only for all the other processes.

@gemygk
Collaborator Author

gemygk commented Jul 31, 2019

Hi @lucventurini

Yes, I had the same feeling.

Just found this on Google:
https://stackoverflow.com/a/8618328

connection = sqlite.connect('cache.db', timeout=10)

I hope the timeout is not set to 5 seconds, as mentioned in the comment above?

Just to add on:

I have just noticed that there are no temp files (other than hidden files) in the pick temp folder here:

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/mikado_pick_tmpcjd2lcgu

Is it that Mikado cleaned the temp folder when it crashed?

I did start a backup run and it looks like the pick temp folder is getting populated. The run that failed earlier took ~4hrs to crash. So we will have to wait and see.

/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/backup_run

So, if you do test the current run, can you please run it from a new folder other than the backup_run folder?

@lucventurini
Collaborator

I have just noticed that there are no temp files (other than hidden files) in the pick temp folder here [...] Is it that Mikado cleaned the temp folder when it crashed?

Yes, Mikado tends to clean after itself ... which sometimes is not so good.

connection = sqlite.connect('cache.db', timeout=10)
I hope the timeout is not set to 5 seconds, as mentioned in the comment above?

It might actually have been the cause of the crash, unfortunately. I did set a timeout for the threads that read the database, but not for the one that writes to it. Moreover, I did not enable WAL mode, meaning that each read from the reading threads effectively locks the common inter-exchange database. I am fixing these hiccups now.
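
For readers hitting the same error, a minimal sketch of this general fix (illustrative only, not Mikado's actual code; the database name is made up): open the shared SQLite database with a longer busy timeout and switch it to WAL mode, so reads no longer block the writer.

import sqlite3

# Illustrative sketch, not Mikado's code ("exchange.db" is a made-up name):
# a longer busy timeout plus WAL journalling lets reader processes and the
# single writer coexist without tripping "database is locked".
conn = sqlite3.connect("exchange.db", timeout=60)  # wait up to 60 s on a locked database
conn.execute("PRAGMA journal_mode=WAL")            # readers no longer block the writer
conn.execute("PRAGMA synchronous=NORMAL")          # common companion setting for WAL
conn.commit()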

So, if you do test the current run, can you please run it from a new folder other than the backup_run folder?

Yep, will do!

@lucventurini
Collaborator

@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado; however, after the error the SLURM job did not exit (this was with multiple threads) and I had to kill the job manually.

Hi @swarbred, any chance I could see the logs/files regarding this crash, by the way?

lucventurini added a commit to lucventurini/mikado that referenced this issue Jul 31, 2019
@swarbred
Collaborator

Hi @lucventurini, the error was due to the input file: these were models based on a transfer from another assembly, and they contain non-valid structures, one of which caused the error. It would be nice if prepare just removed the problem transcript, warned, and continued, but my comment was more that after erroring the job did not exit. On a side note, prepare does a nice job of validating the GTF (gffread simply ignores many types of invalid structures).

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/checking.py", line 86, in create_transcript
    assert isinstance(feat, (list, tuple)) and 2 <= len(feat) <= 3, feat
AssertionError: 82149915
2019-07-30 21:29:40,621 - prepare - prepare.py:495 - ERROR - prepare - MainProcess - 'int' object is not subscriptable
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 493, in prepare
    perform_check(sorter(shelf_stacks), shelf_stacks, args, logger)
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 210, in perform_check
    for counter, keys in enumerate(keys):
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 92, in store_transcripts
    features["features"]["exon"]],
  File "/usr/local/lib/python3.7/site-packages/Mikado/preparation/prepare.py", line 91, in
    exon_set = tuple(sorted([(exon[0], exon[1], strand) for exon in
TypeError: 'int' object is not subscriptable
2019-07-30 21:29:40,654 - prepare - prepare.py:497 - ERROR - prepare - MainProcess - Mikado has encountered an error, exiting

lucventurini added a commit to lucventurini/mikado that referenced this issue Jul 31, 2019
@lucventurini
Collaborator

Hi @swarbred, I should now have fixed the part where Mikado crashes because of an incorrect transcript during prepare. It is strange that it was not handled correctly before.

For the other part - Mikado not exiting immediately after a process goes into error - I am more puzzled. I have added a bit of code which should correctly close the queues present in the child processes - as that seems to be a common source of hangs in Python - but I am not certain it will solve the issue. If it presents itself again after merging this branch, please let me know.
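
For context, a minimal sketch of the kind of queue hygiene meant here (illustrative only, not Mikado's actual code): the child closes its output queue and joins the queue's feeder thread before exiting, and the parent drains the queue before joining the child, which avoids a classic multiprocessing hang.

import multiprocessing as mp

def worker(out_queue):
    # Illustrative worker, not Mikado's code: put the result, then close the
    # queue and join its feeder thread so the child can always exit cleanly.
    try:
        out_queue.put(("result", 42))
    finally:
        out_queue.close()        # no further puts on this queue
        out_queue.join_thread()  # flush the background feeder thread

if __name__ == "__main__":
    queue = mp.Queue()
    child = mp.Process(target=worker, args=(queue,))
    child.start()
    print(queue.get())  # drain before joining, otherwise join() can hang
    child.join()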

Otherwise, once I confirm that the other issue - Pick crashing - is solved, I would merge and close.

lucventurini added a commit to lucventurini/mikado that referenced this issue Jul 31, 2019
@lucventurini
Collaborator

Hi all,
Mikado finished successfully and without raising any errors (job ID 22513783, folder /ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/issue-205).

If you can confirm, I will merge into master and close the issue.
Thank you for reporting the bug; switching to WAL probably also made Mikado quite a bit faster.
Best

@gemygk
Collaborator Author

gemygk commented Aug 1, 2019

Hi @lucventurini

The backup run (the one that failed the first time with the error) also finished successfully.

However, there are differences in the counts in the output we get from the two runs. Both runs were executed with '--seed 10', so we should be getting identical results. We have not changed any code that would cause this difference, is that correct?

backup_run - stats
/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/backup_run
catvh mikado.loci.gff3 | cut -f3 | sort -n | uniq -c
1413184 CDS
1826939 exon
 504123 five_prime_UTR
 555815 gene
 584939 mRNA
 247911 ncRNA
 247687 ncRNA_gene
 624694 superlocus
 495865 three_prime_UTR

issue-205 - stats
/ei/workarea/group-ga/Projects/CB-GENANNO-450_10plus_wheat_SYMattis_annotation/Analysis/mikado_integration/mikado-2.0_rc1/annotation_run1/issue-205
catvh mikado.loci.gff3 | cut -f3 | sort -n | uniq -c
1413220 CDS
1826978 exon
 504125 five_prime_UTR
 555816 gene
 584942 mRNA
 247910 ncRNA
 247686 ncRNA_gene
 624694 superlocus
 495864 three_prime_UTR

@lucventurini
Collaborator

Hi @gemygk, OK that is puzzling. I will have another look as soon as possible.

lucventurini added this to the 2.0 milestone Aug 1, 2019
@lucventurini
Collaborator

Hi @gemygk, found the problem. Mikado uses Python's multiprocessing for parallelization, which means that each process is independent in terms of memory ... and, crucially, of its random seed. It is solvable, but I need to set the random seed in all of the subprocesses.

Thank you for pointing this out, I will correct the issue in all subprograms.
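
A minimal sketch of the approach described above (illustrative only; the function names are made up and this is not Mikado's actual API): pass the configured seed to a Pool initializer so that every worker process re-seeds its own random generators on start-up.

import multiprocessing as mp
import random

import numpy as np

def init_worker(seed):
    # Hypothetical initializer, not Mikado's code: each worker process has its
    # own memory and random state, so it must be seeded explicitly.
    random.seed(seed)
    np.random.seed(seed)

def score(value):
    # Stand-in for a per-locus computation that draws random numbers.
    return value + np.random.random()

if __name__ == "__main__":
    seed = 10
    with mp.Pool(processes=4, initializer=init_worker, initargs=(seed,)) as pool:
        results = pool.map(score, range(4))
    print(results)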

lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 1, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 1, 2019
…d also within the processes. Moreover, I have ensured that numpy.random.seed will not crash because of a seed larger than 2^32 - 1 (the maximum limit)
@lucventurini
Collaborator

Hi @gemygk, I have switched to numpy for the randomization. The same seed should now be guaranteed within all processes.

Apologies for the previous runs; they are different because of this. Unfortunately I was not really fixing the seed at all ... many thanks for pointing this out!
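
On the seed limit mentioned in the commit message above: numpy.random.seed only accepts integers in the range [0, 2**32 - 1], so a larger seed has to be folded back into that range before use. A hedged sketch of such a guard (hypothetical helper, not Mikado's actual function):

import numpy as np

def set_numpy_seed(seed):
    # Hypothetical helper, not Mikado's function: numpy.random.seed raises a
    # ValueError for seeds outside [0, 2**32 - 1], so fold larger values back
    # into that range before seeding.
    np.random.seed(seed % (2 ** 32))

set_numpy_seed(10)           # small seeds pass through unchanged
set_numpy_seed(2 ** 40 + 7)  # would raise ValueError if passed to numpy directly
print(np.random.randint(0, 100, size=3))  # deterministic for a given folded seed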

lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 2, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
Changes:
- Mikado will no longer hang if a subprocess dies; it will exit immediately.
- Ensured that Mikado runs are fully reproducible using a random seed (EI-CoreBioinformatics#183)
- Solved a bug that crashed Mikado prepare in the presence of incorrect transcripts
- Removed the cause of locked inter-process exchange databases in Mikado pick, by switching to WAL and increasing the timeout limit.