Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DS-2593 : Ensure a "tombstone" is left for withdrawn items in OAI-PMH. Fix other minor, obvious OAI bugs. #982

Merged
merged 3 commits into from
Jul 22, 2015

Conversation

tdonohue
Copy link
Member

@tdonohue tdonohue commented Jul 6, 2015

Fix for https://jira.duraspace.org/browse/DS-2593 along with some necessary code cleanup.

This PR does the following:

  • Ensures running ./dspace oai import will update OAI index for withdrawn items. Essentially, withdrawn items are now indexed.
  • Ensures a "tombstone" is retained in OAI-PMH for any withdrawn items. Once an Item is withdrawn, and ./dspace oai import is run, accessing that item in OAI-PMH results in a "tombstone" page which reports <header status="deleted"> as recommended by OAI-PMH. NO METADATA is available in the tombstone. (NOTE: The "tombstone" is only show if you request the item directly via "GetRecord". Withdrawn items are not listed in "ListRecords" requests.)
  • Cleans up some mismanaged Context objects in several OAI classes. There were Context objects that may not have been properly closed if an error occurred, and other which were being "aborted" multiple times (which results in a logged warning).
  • Corrected some improper authorization logic (in XOAI.isPublic() and DSpaceAuthorizationFilter.isShown() which previously was just logging AuthorizationExceptions if an Item was not publicly READable. Both now properly use AuthorizeManager.authorizeActionBoolean() to simply return true/false based on whether Item has public READ permissions.

Please note, this PR simply fixes the bugs in DS-2593 (for possible inclusion in 5.3). It does NOT add a new EventConsumer to auto-update the OAI index (that would need to wait for 6.0), so you still must run ./dspace oai import via cron in order for your OAI index to be updated.

@tdonohue tdonohue added the bug label Jul 6, 2015
@tdonohue tdonohue added this to the 5.3 milestone Jul 6, 2015
@hardyoyo
Copy link
Member

hardyoyo commented Jul 7, 2015

I commented on the ticket, but the comment probably belongs here, so, copy/pasting: I can't figure out how to show a "tombstone" for a withdrawn item in OAI, but I can confirm that, after I withdraw an item and re-run bin/dspace oai import, the withdrawn item no longer displays in the list of available records. Which is the main concern. So, I can confirm that the bug is fixed. I'm willing to test the tombstone a bit more, if someone can tell me how. However, based on what I can test, this looks useful, so +1.

@hardyoyo
Copy link
Member

hardyoyo commented Jul 7, 2015

Well... I tried a bit harder and found the magic URL: http://dspace.vagrant.dev:8080/oai/request?verb=GetRecord&identifier=oai:localhost:10673/14&metadataPrefix=oai_dc (10673/14 is my test withdrawn item). There is a nice little tombstone there. So, no reservation, here's my +1, this works as described.

@kshepherd
Copy link
Member

Tombstone appears in GetRecord requests, using both import and import -c (does require import though, the eventconsumer doesn't handle on its own)

My understanding of OAI-PMH 2.0 is that these tombstones should also appear in selective harvests (set, datestamp matches deleted items) ListIdentifiers requests, too.
See:
http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords
http://www.openarchives.org/OAI/openarchivesprotocol.html#SelectiveHarvesting

By default, this doesn't happen, because of the bitstreamAccessCondition filter. If I remove that condition, the tombstone for the deleted item appears properly in ListIdentifiers requests.

So I'm generally +1, this is a good PR and tests well, but I propose we either:

  • Amend this PR to include a change to org.dspace.xoai.filter.DSpaceAuthorizationFilter where we exclude withdrawn items from the filter so we still get tombstones (there won't be any metadata access or bitstream references - it's just a stub in ListIdentifiers), or
  • Submit an additional PR to do this, or
  • Create a new XOAI Condition for inclusion in th defaultFilter (withdrawnCondition) that can be logically AND NOT'd with the bitstreamAccessCondition and update default xoai.xml appropriately, or
  • Document the behaviour so users can choose whether to continue using the defaultFilter as it stands

Thoughts?

@helix84
Copy link
Member

helix84 commented Jul 16, 2015

@kshepherd: I agree. Amending DSpaceAuthorizationFilter as part of this PR seems like the best solution to me.

@tdonohue
Copy link
Member Author

@kshepherd and @helix84 - I've been digging in further yesterday, and I should have some updates here in a bit that should do what @kshepherd has suggested. I'm actually looking more closely at the approach he suggested in his third bullet:

Create a new XOAI Condition for inclusion in th defaultFilter (withdrawnCondition) that can be logically AND NOT'd with the bitstreamAccessCondition and update default xoai.xml appropriately...

Looking again at the codebase, I now realize that was likely the "proper" way to make this happen. It's also the most flexible, as it means that institutions could simply turn "off" (comment out) the "withdrawnCondition" in their xoai.xml config file, if they don't want withdrawn items to be visible at all.

A new commit should be coming sometime today.

@helix84
Copy link
Member

helix84 commented Jul 16, 2015

@tdonohue: I should have said why I think hardcoding the bugfix was the better solution. The bug is that what we're doing (not keeping tombstone for withdrawn item) is against the spec. Both solutions (via hardcoding and via configuration) fix the problem, but the later makes it easy to revert your DSpace to be a non-compliant OAI-PMH provider. It has been Joao's philosophy to make xoai flexible (hence filters, contexts and metadata formats are available), but also very compliant and to make it hard to go against the spec. As an example of this philosophy, there have been user questions how to change the item header (to add additional identifiers, IIRC) - which is intentionally not possible without changing xoai source code (that is where the xml skeleton is generated).

@tdonohue
Copy link
Member Author

@helix84 - It's actually not against the OAI-PMH spec to not keep around tombstones. The "transient" level of support doesn't require tombstones. But, that being said, I think the "sane" default is to have a tombstone, as it is still recommended by "transient" support. Here's the definition of "transient" support from the spec (emphasis is my own):

transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently. A repository that indicates this level of support may reveal a deleted status for records.

When we switched to "transient" we allowed for more flexibility already. I'm just making sure that flexibility is also now possible in xoai.xml.

…rizationFilter. Update configs, and fix bugs.
@tdonohue
Copy link
Member Author

I've just pushed up a new commit (c343c40) which does the following (based on what @kshepherd suggested):

  • Creates a new DSpaceWithdrawnFilter to configure OAI-PMH to report withdrawn items (as tombstones) in ListRecords, ListIdentifiers and GetRecord requests
  • Updates xoai.xml to use that new DSpaceWithdrawnFilter by default in all contexts (default, DRIVER and OpenAIRE). (Sidenote: Fixes a bug where OpenAIRE was mis-configured currently to not even check for public access rights on Items).

Other minor changes/fixes:

  • Adds comments to xoai.xml to better describe various filters.
  • Adds basic styles to style.xsl to display which items are withdrawn when they appear in lists.
  • Fixes minor bugs in the OrFilter and NotFilter (corresponding to <Or> and <Not> in xoai.xml) as they were not working properly.
  • Fixes a minor bug in DSpaceDatabaseQueryResolver for Oracle.

Yea, I lumped in a few extra minor bug fixes here, just cause I stumbled on them while I was in the code.

I've tested these changes locally and it all seems to function properly. Withdrawn items now also appear in ListRecords, ListIdentifiers, and GetRecord requests via OAI-PMH.

Additional verification tests are welcome! I want to be sure I didn't overlook anything.

@kshepherd
Copy link
Member

@tdonohue All tests look good, nice job :) 👍 +1

I agree that this option is the best, because we don't want to prevent people running transient OAI if they want, and it works out clearer to the user this way - it's all spelled out in xoai.xml, a nice accessible config file.

tdonohue added a commit that referenced this pull request Jul 22, 2015
DS-2593 : Ensure a "tombstone" is left for withdrawn items in OAI-PMH. Fix other minor, obvious OAI bugs.
@tdonohue tdonohue merged commit 7787693 into DSpace:master Jul 22, 2015
@tdonohue tdonohue deleted the DS-2593 branch July 22, 2015 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants