sqlite3.OperationalError: database is locked #205
@lucventurini Not the same issue, but I had a crash with mikado-2.0_rc1 mikado prepare which was due to the input rather than Mikado. However, after the error the SLURM job did not exit (this was with multiple threads), and I had to kill the job manually.
Hi @swarbred, I think the issue you mention might have been solved by the fix for #196 (regarding the malformed input). As for the hanging ... I will have a look. It can happen, unfortunately, when using multiprocessing.
Hi @gemygk, yes, sure thing. It is a bit of a weird instance; the database is read-only for all other processes.
Yes, I had the same feeling. Just found this from google:
I hope the timeout is not set to 5 seconds, as mentioned in the comment above? Just to add on: I have noticed that there are no temp files (other than hidden files) in the pick temp folder here:
Did Mikado clean the temp folder when it crashed? I did start a backup run and it looks like the pick temp folder is getting populated. The run that failed earlier took ~4 hrs to crash, so we will have to wait and see.
So, if you do run a test alongside the current run, can you please run it from a new folder rather than the backup_run folder?
Yes, Mikado tends to clean after itself ... which sometimes is not so good.
It might actually have been the cause of the crash, unfortunately. I did set up the timeout for the threads that read the database, but not for the one that writes to it. Moreover, I had not enabled WAL mode, meaning that each read from the reading threads effectively locks the shared inter-exchange database. I am fixing these hiccups now.
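As a rough illustration of the two settings mentioned above (not Mikado's actual code), here is a minimal sketch of enabling WAL mode and a longer busy timeout on a SQLite database shared between reader and writer processes; the database path is hypothetical:

```python
import sqlite3

DB_PATH = "inter_exchange.db"  # hypothetical path, for illustration only

# A generous timeout makes connections wait for a lock instead of raising
# "database is locked" right away (the sqlite3 default is 5 seconds).
conn = sqlite3.connect(DB_PATH, timeout=60)

# WAL mode lets readers proceed concurrently with a single writer,
# so reads no longer block the writing process.
conn.execute("PRAGMA journal_mode=wal")
conn.commit()
```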
Yep, will do!
Hi @swarbred, any chance of seeing the logs/files regarding this crash, by the way?
Hi @lucventurini, the error was due to the input file: these were models based on a transfer from another assembly, and there are invalid structures, one of which caused the error. It would be nice if prepare simply removed the problem transcript, warned, and continued, but my comment was more that, after erroring, the job did not exit. On a side note, prepare does a nice job of validating the GTF (gffread simply ignores many types of invalid structures). Traceback (most recent call last):
…e doing checks in prepare
Hi @swarbred, I should now have fixed the part where Mikado crashes because of an incorrect transcript during prepare. It is strange it was not handled correctly before. For the other part - Mikado not exiting immediately after the process went into error - I am more puzzled. I have added a bit of code that should correctly close the queues present in the child processes - as that seems to be a common source of hangs in Python - but I am not certain it will solve the issue. If it presents itself again after merging this branch, please let me know. Otherwise, once I confirm that the other issue - pick crashing - is solved, I will merge and close.
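For readers unfamiliar with this class of hang, here is a hedged sketch (not the actual fix in Mikado) of a worker process that explicitly closes its multiprocessing queue before exiting, so the parent is not left waiting on the queue's feeder thread after the worker dies:

```python
import multiprocessing as mp

def worker(queue):
    try:
        queue.put("result")
    finally:
        # Flush and shut down the queue's feeder thread so the parent
        # is not left hanging once this process terminates.
        queue.close()
        queue.join_thread()

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```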
Hi all, if you can confirm, I will merge into master and close the issue.
The backup run (which failed the first time with the error) also finished successfully. But there are differences in the counts in the output we get from the two runs. Both runs were executed with 'seed 10', so we should be getting identical results. We have not changed any code that would cause this difference, is that correct?
Hi @gemygk, OK, that is puzzling. I will have another look as soon as possible.
Hi @gemygk, I found the problem. Python uses multiprocessing for parallelization, which means that each process is independent in terms of memory ... and, crucially, random seeds. It is solvable, but I need to set the random seed in all of the subprocesses. Thank you for pointing this out, I will correct the issue in all subprograms.
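As a minimal sketch of the idea described above (not Mikado's implementation), a fixed seed can be propagated to every worker through a pool initializer, so that each subprocess re-seeds its own random state:

```python
import multiprocessing as mp
import random

import numpy as np

def init_worker(seed):
    # Each worker re-seeds its own random state; without this, every
    # process would start from an independent, uncontrolled seed.
    random.seed(seed)
    np.random.seed(seed)

def task(n):
    return np.random.randint(0, 100, size=n).tolist()

if __name__ == "__main__":
    seed = 10
    with mp.Pool(4, initializer=init_worker, initargs=(seed,)) as pool:
        print(pool.map(task, [3, 3, 3]))
```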
…d also within the processes. Moreover, I have ensured that numpy.random.seed will not crash because of a seed larger than 2^32 - 1 (the maximum limit)
Hi @gemygk, I have switched to numpy for the randomization. The same seed should now be guaranteed within all processes. Apologies for the previous runs, but they differ because of this: unfortunately I was really not fixing the seed at all ... many thanks for pointing this out!
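Regarding the 2^32 - 1 limit mentioned in the commit above, here is a hedged sketch (illustrative only, with a hypothetical helper name) of keeping an arbitrary integer seed within the range that numpy.random.seed accepts:

```python
import numpy as np

def safe_numpy_seed(seed):
    # np.random.seed only accepts values in [0, 2**32 - 1]; reduce the
    # seed modulo 2**32 instead of letting it raise ValueError.
    np.random.seed(seed % (2 ** 32))

safe_numpy_seed(123456789012345)  # this value would crash np.random.seed directly
```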
Changes:
- Mikado will no longer hang if a subprocess dies; it will exit immediately.
- Ensured that Mikado runs are fully reproducible using a random seed (EI-CoreBioinformatics#183).
- Solved a bug that crashed Mikado prepare in the presence of incorrect transcripts.
- Removed the cause of locked interprocess-exchange databases in Mikado pick: switched to WAL and increased the timeout limit.
Hi @lucventurini
I am getting an error with Mikado version mikado-2.0_rc1 for one of the wheat accessions.
This accession has quite large input models compared to the rest.
Even though I got this error, Mikado is still running. No log output (pick.log) has been generated since the point it crashed.
It looks like the serialise stage finished without any issues.
Can you please look into this?
Error:
Working directory:
Command used:
Logfile:
Thanks,
Gemy