Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oai harvesting setup #2491

Merged
merged 24 commits into from Oct 9, 2019
Merged

Oai harvesting setup #2491

merged 24 commits into from Oct 9, 2019

Conversation

@jpelgrims-atmire
Copy link
Contributor

jpelgrims-atmire commented Aug 21, 2019

These changes are based on the following contract: DSpace/Rest7Contract#66

Added an endpoint for updating/creating/retrieving collection harvest settings (GET/PUT /api/core/collections/<collection_uuid>/harvester). New settings are verified before updating the collection settings. There is also a new endpoint (/api/config/harvestermetadata) for seeing which metadata formats DSpace can harvest from. New controllers were added for both of these endpoints (CollectionHarvestSettingsController and HarvesterMetadataController). The harvesting verification logic was taken from OAIHarvester and put in a separate service (HarvestedCollectionService). The standard tests were written for both controllers, and some classes were added for reducing code reuse in these tests (MetadataConfigsMatcher, MockHarvestedCollectionServiceImpl).

Raf-atmire added 4 commits Sep 3, 2019
Conflicts:
	dspace-server-webapp/src/main/java/org/dspace/app/rest/model/hateoas/CollectionResource.java
…and refactored code
Copy link
Member

tdonohue left a comment

@jpelgrims-atmire : I gave this a code review today (sorry for the delay in feedback). Overall, it looks great, but I have a few minor suggestions/improvements inline below. I'll also note that this PR currently is failing in Travis because of minor code style issues. I haven't tested it yet, but plan to do so soon.


public HarvestedCollectionResource(HarvestedCollectionRest data) {
super(data);
embedResource("metadata_configs", data.getMetadataConfigs());

This comment has been minimized.

Copy link
@tdonohue

tdonohue Sep 4, 2019

Member

I believe this should be changed from "metadata_configs" to "harvestermetadata" per discussion in the REST contract here: https://github.com/DSpace/Rest7Contract/pull/66/files#r308300750

Raf-atmire added 2 commits Sep 17, 2019
@Raf-atmire

This comment has been minimized.

Copy link
Contributor

Raf-atmire commented Sep 24, 2019

@tdonohue : I went through your feedback and applied the requested changes and added the additional tests that were rightfully mentioned to be missing. Could you have another look when you get the chance? Thank you!

Copy link
Member

tdonohue left a comment

Thanks for the updates. On second glance I found what looks to be another inconsistency with the existing REST Contract: DSpace/Rest7Contract#66

this.lastHarvested = lastHarvested;
}

@LinkRest(linkClass = HarvesterMetadataRest.class, name = "metadata_configs", optional = true)

This comment has been minimized.

Copy link
@tdonohue

tdonohue Sep 25, 2019

Member

I think this is also wrong. I believe this should be name=harvestermetadata as well. To clarify, the REST Contract now uses the terminology "harvestermetadata" (which is the same terminology in the existing DSpace Documentation and also the REST endpoint name: https://github.com/DSpace/Rest7Contract/blob/master/harvestermetadata.md). So we should not be returning anything named metadata_configs. This looks like it might be the last instance of metadata_configs (beyond some awkwardly named variables, like the one in this method), but I haven't been able to test this PR yet to verify.

@Raf-atmire

This comment has been minimized.

Copy link
Contributor

Raf-atmire commented Sep 30, 2019

@tdonohue : I applied the requested changes, could you have a look again when you get some time for it? Thanks!

Copy link
Contributor

paulo-graca left a comment

This PR works as expected. Thank you @jpelgrims-atmire ! Meanwhile, during the tests I found a related bug - https://jira.duraspace.org/browse/DS-4353 and the solution for it seems trivial.

Copy link
Member

abollini left a comment

thanks for the PR, the IT covers the required scenarios and I agree about moving more code to a service class. Unfortunately, I'm not sure about how to proceed here because I reviewed only now the REST contract and essentially I disagree with it.

The issue that I see is that we are going to create a nested endpoint /harvester inside the collection endpoint, this is generally not what we do and the spring-data-rest community do.
Where possible we should try to map our class to a repository and expose it as /{category}/{class} endpoint, nesting should be reserved to manage relations and association between the classes

This is also close to the principle of single responsibility, if we have a /harvester/harvestedcollections endpoint it will be more natural to implement all the HTTP verbs that we need, following the general guide (repository class, rest class, resource, converter).

A new category oaiharvester can be created with a /metadata endpoint to expose what is currently under /harvestermetadata as a proper REST collection. Another /harvestedcollections (or /settings) endpoint can be used to expose the harvestedcollection with search endpoint by collection. In this way we don't need to have a custom controller and we will follow the front-controller and repository patterns of the Spring-Data-Rest. We can implement a search by collection method, to expose this information in the collection home page and use the standard findAll to build the administrative panel where we need to list all the collection with harvesting settings etc.
The current contract and implementation is an exception in our architecture and this will make more difficult the implementation and maintenance of general principles.

Please note that if we decide to move to a different REST contract as the one roughly described above we can still reuse most of the code here such as the designed IT

public void PutUnProcessableEntityIfIncorrectSettings() throws Exception {
String token = getAuthToken(admin.getEmail(), password);

JSONObject json = createHarvestSettingsJson("METADATA_ONLY", "https://dspace.mit.edu/iao/request", "col_1721.1_114174", "bc");

This comment has been minimized.

Copy link
@abollini

abollini Oct 3, 2019

Member

should we avoid to mention real installation?

This comment has been minimized.

Copy link
@tdonohue

tdonohue Oct 3, 2019

Member

@jpelgrims-atmire : I agree with @abollini on this comment. Can we remove the "real" URL https://dspace.mit.edu/iao/request from this line? Simply replace it with some sort of pretend URL like: https://mydspace.edu/oai/request The URL here should not matter for the ITs.

Copy link
Member

tdonohue left a comment

@jpelgrims-atmire : Overall this code is looking good. While I'm going to officially approve this, I won't merge it yet. I'd appreciate two very minor corrections before this can be merged (see below).

As discussed in today's DSpace 7 Meeting (https://wiki.duraspace.org/display/DSPACE/2019-10-03+DSpace+7+Working+Group+Meeting#id-2019-10-03DSpace7WorkingGroupMeeting-Notes), @abollini's additional refactoring comments will be added into a bug ticket (as it will require redesign/refactoring of the REST Contract). That will allow this initial implementation to go forward "as-is", but also ensure we schedule corrections to the REST contract & implementation (once a volunteer can be found).

public void PutUnProcessableEntityIfIncorrectSettings() throws Exception {
String token = getAuthToken(admin.getEmail(), password);

JSONObject json = createHarvestSettingsJson("METADATA_ONLY", "https://dspace.mit.edu/iao/request", "col_1721.1_114174", "bc");

This comment has been minimized.

Copy link
@tdonohue

tdonohue Oct 3, 2019

Member

@jpelgrims-atmire : I agree with @abollini on this comment. Can we remove the "real" URL https://dspace.mit.edu/iao/request from this line? Simply replace it with some sort of pretend URL like: https://mydspace.edu/oai/request The URL here should not matter for the ITs.

public void PutUnprocessableEntityIfHarvestTypeIncorrect() throws Exception {
String token = getAuthToken(admin.getEmail(), password);

JSONObject json = createHarvestSettingsJson("INCORRECT_HARVEST_TYPE", "https://dspace.mit.edu/oai/request", "col_1721.1_114174", "dc");

This comment has been minimized.

Copy link
@tdonohue

tdonohue Oct 3, 2019

Member

Again, can we remove the "real" URL https://dspace.mit.edu/oai/request from this line? Simply replace it with some sort of pretend URL like: https://mydspace.edu/oai/request The URL here should not matter for the ITs.

@Raf-atmire

This comment has been minimized.

Copy link
Contributor

Raf-atmire commented Oct 7, 2019

@tdonohue @abollini : I've updated the URL in the tests to be a pretend URL as suggested. Could you have another look when you get the chance? Thanks!

Copy link
Member

tdonohue left a comment

👍 This looks good to me know. Thanks @jpelgrims-atmire and @Raf-atmire .

As this is now at +2, I'm going to go ahead and merge it. As noted above, a followup ticket will be created by @abollini to describe how we can improve the Contract here (and eventually implementation) based on the comments here: #2491 (review)

@tdonohue tdonohue merged commit 6986b21 into DSpace:master Oct 9, 2019
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.5%) to 35.701%
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.