Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6300/6396 additional dataset metadata via search api #6441

Merged
merged 14 commits into from Dec 11, 2019

Conversation

sekmiller
Copy link
Contributor

@sekmiller sekmiller commented Dec 6, 2019

What this PR does / why we need it: Adds requested dataset information on requests via the Search API. Does not effect searches used by Search Include Fragment (loading the home page, etc.)

Which issue(s) this PR closes:

Closes #6300 and closes #6396

Special notes for your reviewer: I ended up not adding indexing for this because the fields requested (create date, update date, data source, related publications, etc. aren't really useful as indexes. I made sure that the DB hits for collection the additional info only occur for retrieval of datasets via the SEARCH API.

Suggestions on how to test this: Add additional data to a dataset and see that it is returned via the SEARCH API. Run a search with that will return a draft dataset (while including a key that will give you access to that draft) and see that the version state is DRAFT and the published version is also returned if applicable.

Does this PR introduce a user interface change?: No.

Is there a release notes update needed for this change?: Not really. Could be mentioned as an enhancement

Additional documentation:

@coveralls
Copy link

coveralls commented Dec 6, 2019

Coverage Status

Coverage decreased (-0.03%) to 19.514% when pulling 6851c28 on 6300-additional-dataset-metadata-via-search-api into 751ca19 on develop.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions for the API Guide:

  • update docs for "query_entities" (true vs. false no longer has an effect)
  • update "Basic Search Example" (or add a new example?) to show the new expected output (I suspect this will help with QA and writing the release notes)

Questions (@sekmiller and I talked about some of these already):

  • Are the following fields interesting enough to be made into facets?
    • country
    • state
    • city
    • publisher name
  • Are we concerned about performance (new database hits) when "per_page" is set to the max of 1000?
  • In the future, can we imagine refactoring the code to take the performance hit at index time instead of query time?

@@ -136,7 +136,7 @@ public Response search(
paginationStart,
dataRelatedToMe,
numResultsPerPage,
queryEntities
true //SEK get query entities always for search API additional Dataset Information 6300 12/6/2019
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "true" vs. "false" for query_entities has no affect in this pull request, the documentation should be changed to reflect this. Here's a screenshot of how it's documented as of 4.18.1:

Screen Shot 2019-12-06 at 3 05 25 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the parameter list (removed the reference to query_entities) and added an example with the new data items shown.

@pdurbin
Copy link
Member

pdurbin commented Dec 6, 2019

@sekmiller I can't find the issue I half-remember Vito opening but can you please take a look at the following issue to see if it's easy to add in? To be honest, I think it might be best to have some discussion with the user first. My first thought is that "parent id" (database id of the "owner") feels pretty safe to add. 😄

@pdurbin
Copy link
Member

pdurbin commented Dec 6, 2019

Me again. @sekmiller and I just talked about how "city" for example, is already indexed into Solr:

https://github.com/IQSS/dataverse/blob/v4.18.1/conf/solr/7.3.1/schema_dv_mdb_fields.xml#L16

So I guess we should get "city" and friends from Solr (rather than the database) since they're already in there. It should be faster.

@djbrooke djbrooke changed the title 6300 additional dataset metadata via search api 6300/6396 additional dataset metadata via search api Dec 9, 2019
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sekmiller Could you please address the suggestions from @pdurbin, regarding the documentation? (like the part about the query_entities parameter - it makes sense that it probably should say something like "currently hard-coded to always be 'true'", or something similar)
I was surprised initially that you chose not to index anything extra - I generally feel we can afford to index more stuff; but I thought about it for a while and I like the argument that this is not something that we have a use for on the dataverse page; and that in the context of the Search API, we already have the entity instantiated and can get this extra info from it... There may be a counter-argument against allowing the functionality in the Search API and the page search to diverge farther apart... But I feel like this is probably outweighed by not having to force an extra full reindex on everybody unless strictly necessary.
So I'm ok with this going into QA once the doc request is addressed.

@sekmiller sekmiller removed their assignment Dec 11, 2019
@pdurbin
Copy link
Member

pdurbin commented Dec 11, 2019

@sekmiller and I just chatted about his recent commits and they're looking to me and are in light with comments by @landreev above as well.

a33bcdd updates the example output which should help QA efforts

dba46e8 mentions the removal of a parameter from the Search API and I added a release note about this in 1278522

I'll send this to QA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
6 participants