[DS-1120] AIP Backup & Restore: SITE AIP has a different checksum every time when orphaned Collection/Community groups exist #4492

Closed
dspace-bot opened this issue Feb 7, 2012 · 1 comment

@dspace-bot
Imported from JIRA [DS-1120] created by tdonohue

When a DSpace instance contains one or more "orphaned" Community/Collection groups, it causes problems with SITE AIP generation in the AIP Backup & Restore (METS) tools.

By "orphaned" Community/Collection groups, I mean a group of the form "COMMUNITY_ADMIN" or "COLLECTION_SUBMIT" whose associated Community or Collection no longer exists in the system. Unfortunately, DSpace currently does a poor job of ensuring that all associated Groups are cleaned up when a Community or Collection is deleted. This sometimes leaves behind several "orphaned" groups that are likely no longer in use (unless a DSpace Admin still uses one as a sub-group of a larger group).
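A minimal Java sketch of how orphaned groups could be detected, assuming internal group names follow a hypothetical `TYPE_<internalId>_ROLE` pattern (for example `COLLECTION_5_SUBMIT`) and that the sets of existing Community and Collection IDs are already known. `OrphanedGroupDetector`, `isOrphaned`, and `findOrphans` are illustrative names, not part of DSpace's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrphanedGroupDetector {
    // Hypothetical internal naming pattern: TYPE_<internalId>_ROLE, e.g. COLLECTION_5_SUBMIT.
    // The ID is limited to 18 digits so Long.parseLong can never overflow.
    private static final Pattern GROUP_NAME =
            Pattern.compile("^(COMMUNITY|COLLECTION)_(\\d{1,18})_(.+)$");

    /** True if the name is a Community/Collection group whose parent ID no longer exists. */
    static boolean isOrphaned(String groupName, Set<Long> communityIds, Set<Long> collectionIds) {
        Matcher m = GROUP_NAME.matcher(groupName);
        if (!m.matches()) {
            return false; // not a Community/Collection group, e.g. "Anonymous"
        }
        long parentId = Long.parseLong(m.group(2));
        Set<Long> existing = m.group(1).equals("COMMUNITY") ? communityIds : collectionIds;
        return !existing.contains(parentId);
    }

    /** Returns the orphaned group names, preserving input order. */
    static List<String> findOrphans(List<String> groupNames, Set<Long> communityIds, Set<Long> collectionIds) {
        List<String> orphans = new ArrayList<>();
        for (String name : groupNames) {
            if (isOrphaned(name, communityIds, collectionIds)) {
                orphans.add(name);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        List<String> groups = List.of("Anonymous", "COMMUNITY_2_ADMIN", "COLLECTION_5_SUBMIT", "COLLECTION_9_ADMIN");
        // Only Community 2 and Collection 5 still exist in this hypothetical instance.
        System.out.println(findOrphans(groups, Set.of(2L), Set.of(5L))); // prints [COLLECTION_9_ADMIN]
    }
}
```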

When exporting a SITE AIP, the AIP Backup & Restore tool needs to translate all Community/Collection groups into a format like "COLLECTION__ADMIN" (as the internal IDs have no meaning once the AIP is outside of DSpace, and they cannot be preserved between DSpace instances).
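The translation step might look roughly like the following Java sketch. It assumes internal names of the hypothetical form `TYPE_<internalId>_ROLE`, a hypothetical lookup map from `TYPE:internalId` to Handle, and that the Handle simply takes the place of the internal ID in the exported name; none of this is taken from the actual `PackageUtils` code, and the Handle value used is made up.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNameTranslator {
    // Hypothetical internal naming pattern: TYPE_<internalId>_ROLE.
    private static final Pattern INTERNAL_NAME =
            Pattern.compile("^(COMMUNITY|COLLECTION)_(\\d+)_(.+)$");

    /**
     * Replaces the internal ID with the parent's Handle. Returns Optional.empty()
     * when no Handle is known for the parent, i.e. the group is orphaned.
     */
    static Optional<String> toExportName(String internalName, Map<String, String> handlesByParent) {
        Matcher m = INTERNAL_NAME.matcher(internalName);
        if (!m.matches()) {
            return Optional.of(internalName); // other groups keep their names unchanged
        }
        String handle = handlesByParent.get(m.group(1) + ":" + m.group(2));
        if (handle == null) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "_" + handle + "_" + m.group(3));
    }

    public static void main(String[] args) {
        // Hypothetical mapping; keys combine type and ID because the two ID spaces may overlap.
        Map<String, String> handles = Map.of("COLLECTION:5", "123456789/10");
        System.out.println(toExportName("COLLECTION_5_ADMIN", handles).orElse("(orphaned)"));
        // prints COLLECTION_123456789/10_ADMIN
        System.out.println(toExportName("COLLECTION_9_ADMIN", handles).orElse("(orphaned)"));
        // prints (orphaned)
    }
}
```

Returning an empty `Optional` keeps the orphan case explicit, so the caller has to decide on a fallback name instead of the translator silently inventing one.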

When the AIP Backup & Restore tool encounters an orphaned group, it renames it to a random name like "GROUP__COLLECTION_ADMIN" (because the group is orphaned, its parent cannot be translated into a Handle).

Unfortunately, this random naming scheme backfires: it causes the MD5 checksum of the SITE AIP to differ every time the AIP is generated. This is extremely problematic, as it means the SITE AIP always appears to differ from a remote backup copy (even if the only difference is that different random names were generated for these groups).
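To see why random names defeat checksum comparison, here is a small Java sketch that hashes two exports of the same content using `java.security.MessageDigest` with MD5. The `export` method is a hypothetical stand-in for real SITE AIP serialization, and both the `GROUP_<random>_COLLECTION_ADMIN` layout and the `ORPHANED_COLLECTION_GROUP_9_ADMIN` example name are assumptions about where the variable part of each name sits.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Random;

public class ChecksumStability {
    /** Hex-encoded MD5 digest of the UTF-8 bytes of content. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Hypothetical stand-in for serializing a SITE AIP: one group name per line. */
    static String export(List<String> groupNames) {
        return String.join("\n", groupNames);
    }

    public static void main(String[] args) {
        Random random = new Random();

        // Random naming: each export invents a new name for the same orphaned group.
        String randomA = export(List.of("Administrator", "GROUP_" + random.nextInt(1_000_000) + "_COLLECTION_ADMIN"));
        String randomB = export(List.of("Administrator", "GROUP_" + random.nextInt(1_000_000) + "_COLLECTION_ADMIN"));
        // Almost always false; true only if both random draws happen to match.
        System.out.println(md5Hex(randomA).equals(md5Hex(randomB)));

        // Deterministic naming: the same content always produces the same bytes.
        String stableA = export(List.of("Administrator", "ORPHANED_COLLECTION_GROUP_9_ADMIN"));
        String stableB = export(List.of("Administrator", "ORPHANED_COLLECTION_GROUP_9_ADMIN"));
        System.out.println(md5Hex(stableA).equals(md5Hex(stableB))); // prints true
    }
}
```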

In essence, this is a long-winded way of saying that the AIP Backup & Restore tool needs to avoid generating random Group names on export. The exported group names must be fully repeatable: exporting the same content twice should always yield the same names.

Specifically, when exporting to an AIP, I suggest renaming orphaned groups to a standard format like "ORPHANED_COLLECTION_GROUP__ADMIN". This naming format tells Admins that the group was determined to be orphaned (so it likely can or should be cleaned up if it isn't being used as a sub-group elsewhere). It also ensures the new group name is still unique (at least within the AIP) and repeatable, because it uses the old internal Object ID of its orphaned parent.
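A minimal Java sketch of the proposed deterministic renaming, assuming the hypothetical internal form `TYPE_<internalId>_ROLE` and assuming the old internal ID is placed between `GROUP_` and the role in the new name. `OrphanedGroupNamer` and `renameOrphan` are illustrative names, not the patch itself. Because the output depends only on the input name, repeated exports produce identical names; and since internal IDs are unique per object type within the source instance, the renamed groups stay unique within the AIP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrphanedGroupNamer {
    // Hypothetical internal naming pattern: TYPE_<internalId>_ROLE.
    private static final Pattern INTERNAL_NAME =
            Pattern.compile("^(COMMUNITY|COLLECTION)_(\\d+)_(.+)$");

    /** Builds a repeatable export name for an orphaned group from its own internal name. */
    static String renameOrphan(String internalName) {
        Matcher m = INTERNAL_NAME.matcher(internalName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Community/Collection group name: " + internalName);
        }
        // No randomness: the same input always maps to the same output.
        return "ORPHANED_" + m.group(1) + "_GROUP_" + m.group(2) + "_" + m.group(3);
    }

    public static void main(String[] args) {
        String first = renameOrphan("COLLECTION_9_ADMIN");
        String second = renameOrphan("COLLECTION_9_ADMIN");
        System.out.println(first);                // prints ORPHANED_COLLECTION_GROUP_9_ADMIN
        System.out.println(first.equals(second)); // prints true
    }
}
```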

I've attached a proposed patch that fixes this issue as described above (see PackageUtils.patch). Although small, this issue is extremely problematic for anyone using AIP Backup & Restore, so I suggest we consider a 1.8.2 release to push this fix out sooner rather than later.

@dspace-bot
tdonohue said:

The patch above was applied to both the 1.8.x branch and trunk, so this is fixed in both the 1.8.2 and 3.0 releases. See r6939 and r6940.

Documentation was also updated to describe this new format for renaming orphaned groups in AIPs: https://wiki.duraspace.org/display/DSDOC18/DSpace+AIP+Format
