Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the backup mechanism of the configuration file #3581

Merged
merged 4 commits into from
Nov 27, 2019

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Nov 27, 2019

Fixes #3580

The Config class was creating a new backup each time store was
called, which also happens if nothing really changed to the configuration
content. With the recently introduced change that backup files are now
made unique through a timestamp, this led to a lot of backups being
created that were essentially clones.

To improve this, the Config.store method now only writes the file to
disk if the contents in memory have changed. This is done by comparing
the checksum of the on disk file and that memory contents as written to
a temporary file. If the check sums differ a backup is created of the
existing file and the new contents written to the temporary file are
copied to the actual configuration file location. This latter new
approach also guarantees that the backup is always created before
overwriting the original one.

@sphuber sphuber requested a review from ltalirz November 27, 2019 10:25
@sphuber
Copy link
Contributor Author

sphuber commented Nov 27, 2019

@ltalirz this should also be merged before the v1.0.1 release because it was generating loads of backup files. Better to merge PR #3579 first though

The `Config` class was creating a new backup each time `store` was
called, which also happens if nothing really changed to the configuration
content. With the recently introduced change that backup files are now
made unique through a timestamp, this led to a lot of backups being
created that were essentially clones.

To improve this, the `Config.store` method now only writes the file to
disk if the contents in memory have changed. This is done by comparing
the checksum of the on disk file and that memory contents as written to
a temporary file. If the checksums differ a backup is created of the
existing file and the new contents written to the temporary file are
copied to the actual configuration file location. This latter new
approach also guarantees that the backup is always created before
overwriting the original one.
@@ -39,14 +38,22 @@ class Config(object): # pylint: disable=too-many-public-methods
def from_file(cls, filepath):
"""Instantiate a configuration object from the contents of a given file.

.. note:: if the filepath does not exist it will be created with the default configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the function is the "default content" written to the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. The file is touched but is initially empty. The Config instance is then created with the default configuration but is not stored explicitly. This is done by the caller of from_file if needed. In our case this is done in load_config that calls Config.from_file. If I add to the docstring that the file is touched but the content is not written, is that clear enough?

Copy link
Member

@ltalirz ltalirz Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back, would it not be cleaner if .from_file simply excepts if the filepath being passed does not exist? After all, the task of a .from_file method is to instantiate a Config object from the filepath being passed.
If one wants to create a default configuration, shouldn't one simply call the constructor instead, followed by .store()?

I find this intermediate step confusing, where the configuration file is created but is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design followed more or less directly from your other request (with which I agree) to not have warning messages being printed in the Config constructor. So the detection of whether the configuration has to be migrated as well as the actual migration (including the warning messages and backup generation) has to be done elsewhere. I think putting this in the from_file class method makes a lot of sense. Then the question was just how to deal with non-existing files. I think it is fine to put this here as well. This way one can always just call from_file also in all the unittest utils that don't have to worry about creating the file first or creating an initial default profile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question was just how to deal with non-existing files. I think it is fine to put this here as well. This way one can always just call from_file also in all the unittest utils that don't have to worry about creating the file first or creating an initial default profile

I agree that it is a little bit less code to write, and so I'm not totally against it (but I would still say that this stretches the meaning of what a.from_file method is supposed to do).
Let's keep it for the moment - in that case, the question still remains whether this intermediate stage is necessary, where the configuration file is simply "touched" but without the actual content.

If the configuration file does not exist, can we not simply write the default content to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you are right, the indirect is not necessary. I will make that final simplification.

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sphuber
One question: if I understand correctly, you are saying there are many calls to store() that are, essentially, unnecessary (since the file content actually isn't changing).
Would it be possible to simply remove those?

@sphuber
Copy link
Contributor Author

sphuber commented Nov 27, 2019

Thanks @sphuber
One question: if I understand correctly, you are saying there are many calls to store() that are, essentially, unnecessary (since the file content actually isn't changing).
Would it be possible to simply remove those?

This was mostly happening during running the unittests.

@ltalirz
Copy link
Member

ltalirz commented Nov 27, 2019

This was mostly happening during running the unittests.

Do you know whether this is an unintended consequence of with_temporary_config_instance or is it tests that should be using with_temporary_config_instance but aren't ?

@sphuber
Copy link
Contributor Author

sphuber commented Nov 27, 2019

Do you know whether this is an unintended consequence of with_temporary_config_instance or is it tests that should be using with_temporary_config_instance but aren't ?

I think I found the problem and it is exactly because from_file delegates the storing of the file in case it is new to the caller. That's why load_config always called config.store even if it was just loaded from file and nothing changed. It seems like it is better to put the store-if-created logic in the from_file method and then we can remove it from load_config. This should prevent store from being called all the time when running the tests. Would you agree?

@ltalirz
Copy link
Member

ltalirz commented Nov 27, 2019

I think I found the problem and it is exactly because from_file delegates the storing of the file in case it is new to the caller. That's why load_config always called config.store even if it was just loaded from file and nothing changed. It seems like it is better to put the store-if-created logic in the from_file method and then we can remove it from load_config. This should prevent store from being called all the time when running the tests. Would you agree?

I would certainly agree that load_config should not result in storing of the config file, if possible.

I would say that .from_file should actually also not result in storing of the config file if we can avoid it (see comment above).

@sphuber
Copy link
Contributor Author

sphuber commented Nov 27, 2019

I would say that .from_file should actually also not result in storing of the config file if we can avoid it (see comment above).

That only happens though when the specified filepath did not yet exist. In that case I think it is perfectly reasonable to create the default config, store it to disk and return the constructed Config object, no?

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patience @sphuber ;-)
Looks good to me now!

@sphuber
Copy link
Contributor Author

sphuber commented Nov 27, 2019

Thanks for the patience @sphuber ;-)
Looks good to me now!

Not at all, code ended up way better as a result of your review. Seems the system works :) thanks a lot

config.store()
else:
# If the configuration file needs to be migrated first create a specific backup so it can easily be reverted
if config_needs_migrating(config):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, one last thing I noticed: if the config file needs migrating, will this now result in two backups?
Won't the .store method take care of this backup automatically?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no, because there is no storing going on here...
one could potentially replace the call to _backup with a .store though..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have, but for the reporting it is nice to write the filename of the created backup, which is only returned by _backup. So I think it is fine to keep it like this

@sphuber sphuber merged commit c88ac1e into aiidateam:develop Nov 27, 2019
@sphuber sphuber deleted the fix_3580_config_migration branch November 27, 2019 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only write configuration file backup when contents have changed
2 participants