Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple RETURN links import now possible #3354

Merged

Conversation

CasperWA
Copy link
Contributor

Fixes #3353

@sphuber
Copy link
Contributor

sphuber commented Sep 30, 2019

So if I understand correctly, the only real change was that the if-condition was comparing to LinkType.RETURN instead of LinkType.RETURN.value and therefore was always false, causing an unnecessary exception being raised. Is that correct @CasperWA ?

Unfortunately, looking at the current code, there are a lot of problems that really should be addressed sooner rather than later. The whole point of #1762 was to get to a state where we know what we export and import, but seeing this code I am afraid we do not actually have much of an idea. But I agree that we cannot tackle that right now. So for me also fine to merge this now

@CasperWA
Copy link
Contributor Author

So if I understand correctly, the only real change was that the if-condition was comparing to LinkType.RETURN instead of LinkType.RETURN.value and therefore was always false, causing an unnecessary exception being raised. Is that correct @CasperWA ?

This is correct.

Unfortunately, looking at the current code, there are a lot of problems that really should be addressed sooner rather than later. The whole point of #1762 was to get to a state where we know what we export and import, but seeing this code I am afraid we do not actually have much of an idea. But I agree that we cannot tackle that right now. So for me also fine to merge this now

Yeah, I have been trying slowly to take out each step of the export and import functions, where possible, trying to create backend-agnostic utility functions. This should hopefully make each step of the process easier to deal with and understand, however, it is taking forever to do.

@sphuber
Copy link
Contributor

sphuber commented Sep 30, 2019

I understand and already your work has made the organization of the code and the code itself a lot better, but eventually small steps won't solve this particular problem. At some point we will have to face the music and properly design an interface for export/import that matches ORM implementation but that allows for efficient bulk inserts. This is a big challenge, but is the only way to once and for all fix it.

@CasperWA
Copy link
Contributor Author

I completely agree.

@CasperWA CasperWA force-pushed the fix_3353_import_multiple_returns branch from 154a3dc to 7e971be Compare October 1, 2019 17:18
@CasperWA
Copy link
Contributor Author

CasperWA commented Oct 1, 2019

@sphuber @giovannipizzi this is ready for review. I have re-implemented the validate_link function from aiida.orm.utils.links. Note that for "unique_triple" Links, this is essentially similar to stating "this Link already exists", so if the util function link_triple_exists returns True, the import simply skips the Link in question instead of raising.

It is not the fastest, nor is it the prettiest fix, but it should be more rigorous in testing the created Links from import.

@CasperWA CasperWA force-pushed the fix_3353_import_multiple_returns branch from 7e971be to b6ce2b0 Compare October 2, 2019 12:19
@CasperWA CasperWA force-pushed the fix_3353_import_multiple_returns branch from b6ce2b0 to a8fc655 Compare October 2, 2019 19:29
@CasperWA
Copy link
Contributor Author

CasperWA commented Oct 2, 2019

I got insecure whether the ad hoc rules would actually work, feeling there is something missed between the usage of .get_outgoing and .get_incoming in the actual validate_links() function.
It seemed to me that if the Link type and label was the same for an existing incoming Link for a Node and we are only concerned with the existing outgoing Links being similar (possible only for CALL_WORK, I think), then the ad hoc rules here would not catch the distinction and simply raise, while it didn't need to.

However, when writing the test, I came to the realization that there would never be a situation where this is a possible case, due to a combination of the fixed Link rule to follow CALL_WORK forward and the fact that only sealed Nodes may be exported.
I.e., a graph:

work1 -> work2 -> work3

cannot be exported, following an export of

work1 -> work2 -> work3
           |
           +-> work4

with the same Node UUIDS, since they must be sealed - which means the Link between work2 and work4 cannot be created after export of the first graph.
On the other hand, the sub-graph equal to the first graph cannot be exported from the second graph due to the Link follow rules (non-changeable call_work_forward=True, changeable call_work_backward=False).

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing to fix and then this is good to go

)
)

# New link
links_to_store.append(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link checking itself now seems correct, very nice. There is just one thing we realized. In this logic we should not rely on export logic, i.e. what export can legally export. So for example, we should not assume that the contents of an archive are necessarily consistent. This means we have to check that the links within an archive are also consistent. To do this, we simply have to update the existing_ sets at this point. Then in the next iterations, the added link will be included in the existence check and if the archive itself contains duplicate links, it will be noticed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I have done the same for the SQLAlchemy backend, however, the only way to do this was to not "bulk create" the Links after the for loop, but instead add the new Link in every run using session.add(Link).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, also for the SQLAlchemy import, I did a check that the new Links are correctly continuously added and can be found using the QueryBuilder in the subsequent run of the for loop.

@CasperWA CasperWA force-pushed the fix_3353_import_multiple_returns branch from a8fc655 to ab8b6d3 Compare October 3, 2019 09:27
@sphuber sphuber merged commit b2492e2 into aiidateam:develop Oct 3, 2019
@sphuber
Copy link
Contributor

sphuber commented Oct 3, 2019

Thanks a million @CasperWA great stuff ! 👍

@CasperWA CasperWA deleted the fix_3353_import_multiple_returns branch October 3, 2019 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Import multiple return links with non-unique labels
3 participants