[DS-1443] Export structure with structure-builder #1257

Closed
wants to merge 12 commits into from

mwoodiupui
Member

https://jira.duraspace.org/browse/DS-1443

Export the current Community/Collection structure in a form suitable for re-importation with the command-line structure-builder.

@mwoodiupui added the improvement and work in progress labels and removed the work in progress label Jan 29, 2016
@rradillen
Contributor

This would be very practical.

@helix84
Member

helix84 commented Jan 31, 2016

I tested this PR and it works. Just some minor notes here:

  1. Is there any need for the -e option to be required for export? It currently is required.
    /dspace/bin/dspace structure-builder -e admin@example.com -x -o dspace-structure.xml

  2. The import requires the root element to be import_structure and produces imported_structure (note the difference). The export produces imported_structure, thus leading to "-There are no top level communities in the source document" error when trying to re-import the exported xml (or re-import the import output xml).

  3. This one actually concerns import, but this would be a good opportunity to address it - import always creates new communities/collections, even if the identifier attribute is specified (which is the case in the import output xml and export xml).
    a) During import, we could attempt to find out whether all the specified handles are free and if they are, use them. If not, for those objects which already exist, offer to create new handles or skip their creation.
    b) Alternatively, if the identifier attribute is present, warn that new handles will be created.
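Point 2 above comes down to a root-element name check. As a minimal sketch only (it uses the JDK's DOM parser instead of the project's own XML handling, and the class and method names are hypothetical), an importer that tolerated both root names could look like this:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not code from this PR.
class RootElementCheck {
    // The two root names mentioned in the review: the import input root
    // and the root written by the import report and by the export.
    static final Set<String> ACCEPTED_ROOTS =
            Set.of("import_structure", "imported_structure");

    /** Returns true when the document's root element is one of the accepted names. */
    static boolean hasAcceptedRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return ACCEPTED_ROOTS.contains(doc.getDocumentElement().getNodeName());
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasAcceptedRoot("<import_structure><community/></import_structure>"));
        System.out.println(hasAcceptedRoot("<imported_structure><community/></imported_structure>"));
        System.out.println(hasAcceptedRoot("<something_else/>"));
    }
}
```

Accepting both names is only one option; making the export write import_structure would address the same re-import failure.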

@mwoodiupui
Member Author

All of these points are good ones. I'm addressing each. I took the cheap way out on (3) and just issue a warning when 'identifier' attributes are present.

I wonder why we have two different root element names at all.
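The warning approach chosen for (3) amounts to scanning the input for 'identifier' attributes before import. The following is a rough sketch under assumptions: it uses the JDK DOM API, not the PR's actual code, and the class name, method name, and sample handles are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not code from this PR.
class IdentifierWarning {
    /** Counts the elements that carry an 'identifier' attribute. */
    static int countIdentifierAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches every element in the document, including the root.
            NodeList all = doc.getElementsByTagName("*");
            int count = 0;
            for (int i = 0; i < all.getLength(); i++) {
                if (((Element) all.item(i)).hasAttribute("identifier")) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
    }

    public static void main(String[] args) {
        // Sample handles are illustrative only.
        String xml = "<import_structure>"
                + "<community identifier=\"123456789/1\">"
                + "<collection identifier=\"123456789/2\"/>"
                + "<collection/>"
                + "</community>"
                + "</import_structure>";
        int found = countIdentifierAttributes(xml);
        if (found > 0) {
            System.err.println("Warning: " + found
                    + " 'identifier' attribute(s) will be ignored; new handles will be created.");
        }
    }
}
```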

@mwoodiupui added the work in progress label Mar 21, 2016
@mwoodiupui
Member Author

Rebased on current master.

@mwoodiupui
Member Author

Rebased again on current master.

@mwoodiupui removed the work in progress label Aug 17, 2017
@mwoodiupui
Member Author

Rebased again on current master.

@mwoodiupui added this to the 7.0 milestone Mar 22, 2018
Member

@tdonohue tdonohue left a comment

👍 Code looks good, but I haven't had a chance to test it. This seems like a logical feature to add in though, as we already have a corresponding import.

@tdonohue added the quick win label Mar 22, 2018
Member

@abollini abollini left a comment

By code inspection it looks good. +1
I have made only one small inline suggestion/comment in the code, to be addressed eventually.
It would be nice to provide an integration test (IT) for this feature: importing a structure and checking that the exported output matches the import file would be great.

@@ -142,13 +171,52 @@ public static void main(String[] argv)
        // set the context
        context.setCurrentUser(ePersonService.findByEmail(context, eperson));

        // Export? Import?
        if (line.hasOption('x')) {
            exportStructure(context, output);
Member

As you don't have a logged-in user, I guess you could hit authorization issues with communities or collections that lack Anonymous READ.

Member Author

The user is logged in by line 172. But it is possible that 'String eperson' contains a name that matches no EPerson, and in that case the context will be logged out. We could find the EPerson during command analysis, and exit with an error if -e was given but its value is not a known account. Probably that should be a separate issue. The code doesn't do anything wrong, but the resulting error message is less helpful than it might be. We ought to check for this problem throughout the CLI.

Member Author

Oh, I see: you perhaps mean that export should require -e as import does. That also can be pushed up to command analysis, so that execution can assume that it has a valid identity to set into the Context.
However, it is still possible to give an identity which is not authorized to access some of the repository structure.
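The idea in the two replies above, resolving -e during command analysis so that execution can assume a valid identity, might be sketched as follows. This is only an illustration: the lookup function stands in for whatever account search the CLI would use, account records are reduced to plain strings, and every name here is hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not code from this PR.
class EpersonOptionCheck {
    /**
     * Fails fast during option parsing when -e is missing or matches no account.
     * 'lookup' is an assumed stand-in for the real account search.
     */
    static String requireKnownAccount(String email,
                                      Function<String, Optional<String>> lookup) {
        if (email == null || email.trim().isEmpty()) {
            throw new IllegalArgumentException("Option -e is required");
        }
        return lookup.apply(email).orElseThrow(
                () -> new IllegalArgumentException("No account matches -e " + email));
    }

    public static void main(String[] args) {
        // Toy account table; the address comes from the example command in this thread.
        Map<String, String> accounts = Map.of("admin@example.com", "account-1");
        Function<String, Optional<String>> lookup =
                e -> Optional.ofNullable(accounts.get(e));

        System.out.println(requireKnownAccount("admin@example.com", lookup));
        try {
            requireKnownAccount("nobody@example.com", lookup);
        } catch (IllegalArgumentException ex) {
            System.err.println(ex.getMessage());
        }
    }
}
```

A check like this only confirms that the account exists. As noted above, the identity may still lack authorization to read parts of the structure.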

Member

@pnbecker pnbecker left a comment

Looks fine by code review. Will give it a test run asap.

@pnbecker
Member

I think this can be merged as it is. I would like to see one further improvement, which could be done in a separate pull request. We can create Communities with new identifiers by using communityService.create(parent, context) or we can create Communities using specified identifiers by using communityService.create(null, context, handle). If an identifier does exist in the input document, why don't we try to reuse it?
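The suggestion above (reuse an identifier when the input provides one, otherwise mint a new one) could be sketched roughly as below. The CommunityCreator interface is a hypothetical stand-in that only mirrors the two call shapes named in the comment; it is not the DSpace service API, and the in-memory creator and handle values exist purely for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not code from this PR.
class HandleReuseSketch {
    /** Assumed stand-in for the two creation call shapes named in the comment. */
    interface CommunityCreator {
        String create(String parent);                 // mints a new identifier
        String create(String parent, String handle);  // uses the given identifier
    }

    /** Toy creator that tracks handles in memory and mints sequential ones. */
    static class InMemoryCreator implements CommunityCreator {
        final Set<String> handlesInUse = new HashSet<>();
        private int next = 1;

        public String create(String parent) {
            String handle;
            do {
                handle = "123456789/" + next++;
            } while (handlesInUse.contains(handle));
            handlesInUse.add(handle);
            return handle;
        }

        public String create(String parent, String handle) {
            if (!handlesInUse.add(handle)) {
                throw new IllegalStateException("Handle already in use: " + handle);
            }
            return handle;
        }
    }

    /** Reuses the requested handle when it is present and free; otherwise mints one. */
    static String createReusingHandle(InMemoryCreator creator, String parent, String requested) {
        boolean usable = requested != null
                && !requested.isEmpty()
                && !creator.handlesInUse.contains(requested);
        return usable ? creator.create(parent, requested) : creator.create(parent);
    }

    public static void main(String[] args) {
        InMemoryCreator creator = new InMemoryCreator();
        System.out.println(createReusingHandle(creator, null, "123456789/42"));
        System.out.println(createReusingHandle(creator, null, "123456789/42"));
        System.out.println(createReusingHandle(creator, null, null));
    }
}
```

Falling back to a fresh handle when the requested one is taken is one possible policy; offering to skip the object instead, as proposed earlier in this thread, would be another.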

BufferedWriter out = new BufferedWriter(new FileWriter(output));
out.write(new XMLOutputter().outputString(xmlOutput));
out.close();
new XMLOutputter().output(xmlOutput, output);
Member

In the export method we use Format.getPrettyFormat() to indent the XML. Why don't we use this for the xml output of the import method?

Member Author

That sounds like a good idea, but it seems out-of-scope for the present patch and issue which are focused on the new export feature. I think that a new issue and a small patch would be welcome.
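For readers unfamiliar with indented XML output, here is a self-contained illustration of the effect being discussed. It deliberately swaps in the JDK's javax.xml.transform API rather than the XMLOutputter and Format.getPrettyFormat() calls used in the PR, so it shows the idea only, not the proposed patch. The class name is hypothetical, and the indent-amount property is an implementation-specific key that the JDK's built-in transformer is assumed to honour.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper, not code from this PR.
class IndentedXmlSketch {
    /** Re-serializes an XML string with line breaks and indentation. */
    static String indent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Implementation-specific key; other transformers may ignore it.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not re-serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indent(
                "<imported_structure><community><name>A</name></community></imported_structure>"));
    }
}
```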

@tdonohue added the tools: export label Aug 22, 2018
@tdonohue
Member

@mwoodiupui : Just to clarify, are you working on Integration tests for this feature? I recall you asking about how to do so on Slack. If you are working on ITs, then I'd suggest we wait to merge for ITs, and then merge immediately after ITs are added.

@pnbecker
Member

@mwoodiupui I just sent you a PR implementing my idea of importing Communities and Collections with their old handles when the import file contains them: mwoodiupui#13. Can we include it in this PR, or shall I create a separate ticket and PR?

@pnbecker
Member

While extending the code, I tested it. It works great. :-)

@mwoodiupui
Member Author

@tdonohue I am working (slowly) on ITs for the feature.

@pnbecker
Member

pnbecker commented Sep 5, 2018

@mwoodiupui Could you please take a look at my additions (mwoodiupui#13)? Would you like to integrate them here, or shall I create a separate PR?

@mwoodiupui
Member Author

This PR was built before DSpace 7 became master, and is rather out-of-sync with the current master. I'm closing it and will submit a new one, rebuilt for v7.

@mwoodiupui
Member Author

@pnbecker I think that your patch really should be a separate Jira issue.

@tdonohue removed this from the 7.0 milestone Jan 26, 2021