-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating the *before_notebook* 20_start-postgresql.sh
and 40_prepare_aiida.sh
#476
Conversation
for arm64 I encountered the issues in restarting the container. 1- Deleting old ${PSQL_LOGFILE}.1.gz file before trying to compressed again. Otherwise, this can give errors and so the container will exit with non-zero status. 2- Clean stop of the daemon before the "verdi storage migrate --force" in `40_prepare_aiida.sh`. This is done because sometimes the restart of the docker container will give error in trying to migrate, as the daemon is still running. I am not able to reproduce the error in a standard way, however.
for more information, see https://pre-commit.ci
stack/base-with-services/before-notebook.d/20_start-postgresql.sh
Outdated
Show resolved
Hide resolved
@@ -74,6 +74,8 @@ load_computer('${computer_name}').set_minimum_job_poll_interval(${job_poll_inter | |||
else | |||
|
|||
# Migration will run for the default profile. | |||
## We need to stop the daemon before. | |||
verdi daemon stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to understand why this is needed, since the daemon cannot be running at the container startup.
@sphuber does verdi daemon stop
do some kind of cleanup that might explain why it is needed here? It is true that we currently don't gracefully stop the daemon when we stop the container so there might perhaps be some stale pid files or something?
@mikibonacci what is the exact error that you saw coming from the verdi storage migrate
command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on the version. The behavior has changed a bit in the 2.x minor releases. But currently, yes, verdi daemon stop
should clean up stale PID-files. And it is possible that some other commands will check if the daemon "is running" based on that pid file. Would have to look into more detail if you know exactly which version you are targeting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We're currently on 2.5, and hopefully will switch to 2.6 soon.
I was trying to get rid of verdi daemon stop
because it adds 2s to container startup. Perhaps we can just remove the stale pid files manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @danielhollas and @sphuber, the exact error printed in the log is:
2024-07-03 15:41:58 Critical: Migration aborted, the daemon for the profile is still running.
The target AiiDA version is aiida-core==2.5.1
(build=pyhca7485f_0, channel=conda-forge) indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to get rid of verdi daemon stop because it adds 2s to container startup. Perhaps we can just remove the stale pid files manually?
Well if you are absolutely certain that the daemon shouldn't be running, you could just manually remove the pid file .aiida/daemon/circus-{profile-name}.pid
which should be very fast. The verdi daemon stop
command will actually try to reach the daemon, which has a 2 second timeout by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command verdi storage migrate
has access to daemon client, so probably another solution is when --force
flag is used, it does clean stale PID files before migrate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that it would be a good idea to have --force ignore a running daemon. The --force flag is there to not have to type out the confirmation message. This has been the case for a long time and users may have grown used to that. If we now also have it start ignoring a running daemon, that might be dangerous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, then manually delete it as Miki did here is fine.
Thanks for catching these issues.
How exactly are you restarting the container? |
For the restart, I used |
Co-authored-by: Daniel Hollas <danekhollas@gmail.com>
for more information, see https://pre-commit.ci
…e-aiida.sh` This is faster than `verdi daemon stop`. We suppose that the daemon is not running when we start/restart the container.
Co-authored-by: Daniel Hollas <danekhollas@gmail.com>
@mikibonacci can you test the latest version of the image ( Once you confirm that it works reliably for you I'll merge and release a new version. Thanks for catching this and the fixes! |
Hi @danielhollas , I tested the |
Thanks for testing @mikibonacci. I'll release a new version soon with your fix. I've opened #477 as a follow up to this to prevent these issues in the future. |
@mikibonacci new version is released. 🚀 |
Fixings are done as for arm64 I encountered issues in restarting the container.
1 - Deleting old ${PSQL_LOGFILE}.1.gz file before trying to compressed again in
20_start-postgresql.sh
. Otherwise, this can give errors and so the container will exit with non-zero status.2 - Clean stop of the daemon before the "verdi storage migrate --force" in
40_prepare_aiida.sh
. This is done because sometimes the restart of the docker container will give error in trying to migrate, as the daemon is still running. I am not able to reproduce the error in a standard way, however.