-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DS-4226 Improvements to Entity validation first pass #2522
DS-4226 Improvements to Entity validation first pass #2522
Conversation
I would like to provide this earlier feedback. I'm getting this error:
And I've also noticed that, the related field appears with a strange UUID: I will revisit my CSV and logs to better understand what is happening. |
Much appreciated. If it's not too much trouble/ if possible to include a link or sample of the CSV used in your testing with when you run into errors like these. I notice you're having rowName associate with a UUID, which maybe redundant if the As for why you're seeing a UUID like |
I've just send you the file via Slack. |
@AndrewZWood I've tried to create a relationship when submitting a new Project, and linking it to an existing Publication. I was able to trigger the new error logging when linking to a Publication in the relation.isOrgUnitOfProject column. It worked correctly here. I also noticed this branch has some merge conflicts with the changes which were requested in #2488. I would expect them to be easily adjustable since they're primarily method name changes and JavaDocs |
94ce25b
to
21203be
Compare
I found that the bug was the row count not being reset after the first run of the script thusly adding the same values to the reference map many times making the ambiguity check go off during validation. I corrected it here: 21203be#diff-b84d4d768d7f353bd7c78f0282c215bfR211 I also tested the sample you gave me ( thank you for that ). The file is ingested properly but fails relationship validation due to max cardinality constraint.
Currently my work is not accounting for cardinality constraints. I'm hoping to cover cardinality validation alongside relational ordering validation in a separate PR. |
c116229
to
0773f79
Compare
@benbosman I've reproduced your use case and confirmed that using the incorrect typeName is NOT caught when it should be. I've adjusted the code to catch such cases by ensuring direction of typeName is valid with respect to the Entity type being processed and included an IT for this case: 0773f79#diff-5ce04a4fb4f137714321ddf616041328R454 I've also adjusted method calls and java docs to respect the changes from community master. |
Thank you @AndrewZWood it worked! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR should be merged. This PR also fixes - https://jira.duraspace.org/browse/DS-4237 regarding the lack of error info.
I would also to add that I've just created this issue - https://jira.duraspace.org/browse/DS-4361 reporting the error I've just mentioned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AndrewZWood it seems a small bug got introduced with verifying whether a relationship exists
If I try to import a Project type, it works correctly with a header relation.isPersonOfProject
defined in my relationship types as:
id | left_type | right_type | leftward_type | rightward_type | left_min_cardinality | left_max_cardinality | right_min_cardinality | right_max_cardinality
----+-----------+------------+-------------------+-------------------+----------------------+----------------------+-----------------------+-----------------------
4 | 2 | 3 | isProjectOfPerson | isPersonOfProject | 0 | | 0 |
3 is the project type
If I try to import a Project type, it doesn't work correctly with a header relation.isOrgUnitOfProject
defined in my relationship types as:
id | left_type | right_type | leftward_type | rightward_type | left_min_cardinality | left_max_cardinality | right_min_cardinality | right_max_cardinality
----+-----------+------------+--------------------+--------------------+----------------------+----------------------+-----------------------+-----------------------
6 | 3 | 4 | isOrgUnitOfProject | isProjectOfOrgUnit | 0 | | 0 |
3 is the project type
This is causing a correct CSV to be refused
@benbosman Sorry to be a pain but I am unable to reproduce the error you're describing with the following CSVs I created to match your description:
and
If you could provide me with the CSV you're using or a sample of it I can debug further with more specificity of your case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made an error in my test from #2522 (review)
I was trying trying to use relation.isOrgUnitOfProject on Publication instead of an OrgUnit.
The fact that it got refused in the preview was desired.
This looks good to me
This PR adds various improvements regarding entity type validation during the
MetadataImport
feature.Pre-validate relational changes before displaying the change set to the user during
MetadataImport
:The entire CSV is parsed and all potential relational changes are validated before the entire change set is displayed to the user.
If the given relational changes do not validate the validation errors are reported explicitly and the script exits.
Improve relational validation error reporting:
When there are validation errors during relational validation during the pre-validation phase the error(s) are reported to dspace logs more explicitly and will cause the current import process to fail. These errors are reported for the entire given CSV so the user will be able to correct against the set of errors for the entire CSV after just one run of the script. The previous behavior would not fail the
MetadataImport
process and createitems regardless if described relationships were valid which caused a great deal of confusion.
Revisit respective Integration Test:
Added additional test with respect to the new pre-validation code and ensured the new code was covered with the new test.
In addition, as to Tim Donohue's comment from DS-4316 I've ensure the Context authorization is only turned off when required.
Overall I believe the improvements to validation and reporting will save users a lot of head ache when batch importing a large set of relational changes. However, we're not quite done yet.
We still have cardinality and preservation of relational ordering to considered when validating relationships which has not been covered with this set of work. I believe that these issues
warrant their own PR.