Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: Delegate RepositoryBackend control to the Backend #5169

Merged
merged 7 commits into from
Oct 23, 2021

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Oct 8, 2021

This PR is a result of observations in #5154 and #5145:

By the current placement of get_repository on the Profile class, this caused two principal issues:

  1. It tied all possible Backend implementations to the DiskObjectStoreRepositoryBackend
  2. It made it impossible to implement/instantiate a Backend that was not directly dependent on the currently loaded Profile.

These issues, for example, preclude the possibility of implementing a "full" archive backend and instantiating it, whilst a "main" Backend is also loaded.

This PR moves the get_repository method to the Backend class, addressing both these issues.
It also removes the instatiate and delete methods from the (node) Repository class (which were just piping through to the backend), since a single node should not have "direct" access to deleting the whole profile's object store 😬.

Additionally, it improves the docstring of the Backend and fixes a minor issue, that verdi profile delete left an empty repository folder for that profile.


Note, this is likely the first in the number of PRs to address #5154:

No offence to the people involved (hindsight is always 20/20) but the whole abstraction and flow of control/information between Config, Profile, Manager, BackendManager, and Backend is IMO a bit of a mess really (plus some unnecessary use of global variables, which should always be avoided wherever possible).
This greatly hampers implementing desirable features like profile switching.

Firtly, the flow from Profile -> Backend could be improved by having the Backend instantiate with a Profile (and having this also set the Backend on the Profile, to ensure only one Backend is instantiated per profile).
With this PR, but already with Backend.get_session, we are pulling information from the profile via a global variable, rather than via a self.profile attribute.
This would then also allow for the ENGINE and SESSION_FACTORY global variables, in aiida/backends/djsite/__init__.py and aiida/backends/sqlalchemy/__init__.py, to be removed; delegating responsibility for database session management to the relevant backend.

Similarly, Config.delete_profile and Manager._load_backend (currently hard coded to sqla/django backends), should delegate some of their responsibilities to the Backend implementation.

@codecov
Copy link

codecov bot commented Oct 8, 2021

Codecov Report

Merging #5169 (cf10131) into develop (2f2bdc3) will decrease coverage by 0.02%.
The diff coverage is 82.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5169      +/-   ##
===========================================
- Coverage    81.01%   81.00%   -0.01%     
===========================================
  Files          535      535              
  Lines        37392    37401       +9     
===========================================
+ Hits         30289    30292       +3     
- Misses        7103     7109       +6     
Flag Coverage Δ
django 75.81% <70.00%> (-0.01%) ⬇️
sqlalchemy 74.90% <72.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/cmdline/commands/cmd_setup.py 50.00% <0.00%> (ø)
aiida/manage/configuration/profile.py 94.77% <ø> (-0.20%) ⬇️
aiida/repository/repository.py 96.73% <40.00%> (-1.86%) ⬇️
aiida/manage/configuration/config.py 88.54% <66.67%> (-0.35%) ⬇️
aiida/orm/implementation/backends.py 92.16% <66.67%> (-1.59%) ⬇️
aiida/manage/manager.py 81.01% <77.78%> (-0.45%) ⬇️
aiida/orm/implementation/sql/backends.py 96.43% <92.31%> (-3.57%) ⬇️
...ds/djsite/db/migrations/0047_migrate_repository.py 75.00% <100.00%> (+0.68%) ⬆️
...ations/versions/1feaea71bd5a_migrate_repository.py 83.59% <100.00%> (+0.51%) ⬆️
aiida/backends/testbase.py 93.23% <100.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f2bdc3...cf10131. Read the comment docs.

@giovannipizzi
Copy link
Member

Hi!
I agree that the current flow between the various classes you mention has space for improvement (also for me is hard to remember what is where) and the changes you suggest look good to me.
I'd just wait if possible to have feedback from @sphuber (I think he's OOO this week, so we might need to wait next week).

In the meantime, just to know where we're heading towards (so that it's clear and, if we all agree, will make it faster to review further PRs), @chrisjsewell did you already write down somewhere (e.g. in an issue) a minimal description of how you envision the classes to behave eventually? (In addition, having this written down will make it easy to move it to the design docs).

@giovannipizzi
Copy link
Member

(I guess it's this discussion thread? #5154
Maybe we can have a chat next week with interested people, so we all get a more concrete sense and agree of how this should be better organised).

@chrisjsewell
Copy link
Member Author

Bear with me @giovannipizzi, I am literally just writing some UML diagrams for this (before and after)!

@chrisjsewell
Copy link
Member Author

Bear with me @giovannipizzi, I am literally just writing some UML diagrams for this (before and after)!

Ok this is done in #5172 (comment)

(see also https://www.figma.com/file/oluW9WVn1v1yDj6pWpTiQ7/aiida-backend-design?node-id=11%3A266)

@sphuber
Copy link
Contributor

sphuber commented Oct 18, 2021

Thanks for the ping and waiting for me to get a chance to comment. I am perfectly happy to discuss restructuring the part of the code described in the OP and fully agree that there are quite some improvements to be had. But I also agree with @giovannipizzi that it would make sense to first work out in schematic's what design we want to achieve and discuss on this. The PRs should then be much easier to review and approve. The code we have now is the result of many step-wise improvements which are not ideal to get to good solutions. So I would put this PR on hold and organize the discussion and go from there.

@chrisjsewell
Copy link
Member Author

But I also agree with @giovannipizzi that it would make sense to first work out in schematic's what design we want to achieve and discuss on this.

@sphuber I have already created the design schematic in https://www.figma.com/file/oluW9WVn1v1yDj6pWpTiQ7/aiida-backend-design?node-id=11%3A266

As discussed in #5172 (comment)

@sphuber sphuber changed the title ♻️ REFATOR: Delegate RepositoryBackend control to the Backend ♻️ REFACTOR: Delegate RepositoryBackend control to the Backend Oct 22, 2021
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chrisjsewell just two questions about the method deletions on the Repository class

Comment on lines -57 to -63
def initialise(self, **kwargs: Any) -> None:
"""Initialise the repository if it hasn't already been initialised.

:param kwargs: keyword argument that will be passed to the ``initialise`` call of the backend.
"""
self.backend.initialise(**kwargs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this done now? Is this no longer needed? Or do you just think that the BackendRepository shouldn't even define an initialise method but that should be the responsibility of the implementation? It seems to me that if we remove this here, we should probably also remove it from the BackendRepository interface, shouldn't we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for this and the delete method below, my feeling is you shouldn't need direct access to them from the Repository, which just provides the interface to the repository for a single Nodes file tree (perhaps in fact the name Repository is a little misleading and should be NodeRepository or something to denote it is only scoped to a single node).
If you are going to initialise/delete the whole repository, then you should be doing it via the BackendRepository, which is what now happens.

(It's a bit like having a Node().delete() that actually deletes every Node in the database)

Copy link
Member Author

@chrisjsewell chrisjsewell Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these were only really required before, because Profile.get_repository() returned Repository, whereas now Backend.get_repository() returns BackendRepository

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with that if Repository was indeed designed to only be used with a single node, but it is not the case. It can have access to the entire repository in principle. It is in fact the aiida.orm.nodes.repository.NodeRepositoryMixin that scopes access to the repository for just the files of a single node. Note that that class then also doesn't have a path to Repository.delete. So I would keep these.

Copy link
Member Author

@chrisjsewell chrisjsewell Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that that class then also doesn't have a path to Repository.delete.

Not sure what mean by this; you can still call Repository.backend.erase().

Eurgh, I'm kind of against this, because you should never ever be calling Repository.delete to delete the Container, and nowhere in the code do you ever have to.
Again, it would be like if you asked your computer to delete a folder, and it went ahead and deleted your whole file system, i.e. since there is a many-to-one correspondence of Repository to BackendRepository

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what mean by this; you can still call Repository.backend.erase().

Sure, but what I am saying is that from the public Node interface, there is no way to delete the entire repo. Of course you can do it from Node._repository.delete() but if you are accessing protected attributes there is all sorts of things you can break.

Eurgh, I'm kind of against this, because you should never ever be calling Repository.delete to delete the Container, and nowhere in the code do you ever have to.

The point is that I wrote the Repository interface as a generic one and not one that imposed the AiiDA concept that it is node-centric and from the node you only get access to part of the repository contents. I see the Repository as a generic interface to the entire repo and in that sense I think it makes sense to have a delete method and initialise, since some backend implementations might need it. The fact that the public facing API of AiiDA currently doesn't make use of it (as it shouldn't) is not a reason to remove it. I think it is perfectly fair to have such a method that could for example be called by maintenance operations. If you then have to go directly through to the backend implementation defeats the purpose of having the Repository front end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to disagree; you also cannot delete the entire database from the front-end.
Plus, I still think it will be overly confusing to people, the distinction between RepositoryBackend -> Repository -> NodeRepositoryMixin; Repository essentially sits somewhere between the back and front-end

I think it makes sense to have a delete method and initialise, since some backend implementations might need it.

The Repository class is not part of the backend, so backend implementations cannot do anything with it.

Anyhow, I re-added these redundant methods back

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyhow, I re-added these redundant methods back

You forgot to reinstate the test though. Waiting for the approval would have probably caught that mistake.

I still think it will be overly confusing to people

If with "people" you mean users, I think that is not the problem. The Repository is not and was not intended to be part of the public API. It is a generic interface to any repository that could be used. It intentionally does not embed the concept that the contents are viewed from the perspective of a node instead of its entirety.

I think it makes sense to have a delete method and initialise, since some backend implementations might need it.

The Repository class is not part of the backend, so backend implementations cannot do anything with it.

What I meant here is implementations of the "repository backend", i.e. RepositoryBackend, some of which may require some sort of initialization, as is the case for the disk object store implementation.

Copy link
Member Author

@chrisjsewell chrisjsewell Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant here is implementations of the "repository backend", i.e. RepositoryBackend, some of which may require some sort of initialization, as is the case for the disk object store implementation.

yes this exactly where the initialise method is already; RepositoryBackend.initialise, and why it is redundant to have Repository.initialise

Comment on lines -420 to -429
def delete(self) -> None:
"""Delete the repository.

.. important:: This will not just delete the contents of the repository but also the repository itself and all
of its assets. For example, if the repository is stored inside a folder on disk, the folder may be deleted.

"""
self.backend.erase()
self.reset()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@chrisjsewell chrisjsewell merged commit a80812c into aiidateam:develop Oct 23, 2021
@chrisjsewell chrisjsewell deleted the move-repo-to-backend branch October 23, 2021 07:14
edan-bainglass pushed a commit to edan-bainglass/aiida-core that referenced this pull request Dec 23, 2023
…iidateam#5169)

This commit moves `Profile.get_repository() -> Repository`,
to `Backend.get_repository() -> BackendRepository`.

The original delegation,
(1) did not allow different backends to implement different repository classes, and
(2) made it impossible to implement/instantiate a `Backend` that was not directly dependent on the currently loaded `Profile`.

Additionally, the commit improves the docstring of the `Backend`,
and fixes a minor issue, that `verdi profile delete` left an empty repository folder for that profile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants