What are the allowed search fields for the Search API q parameter? #2558

leeper · 2015-09-20T13:21:14Z

I'm looking at the Search API Docs. What are the allowed fields for the q parameter? It appears to include the list of Dataverse DB Elements mentioned in the metadata crosswalk but it also appears to include other fields not listed there. Is there a complete list? And can the documentation be updated accordingly?


pdurbin · 2015-09-21T14:03:46Z

@leeper it depends! :) @markwilkinson asked about this too, as I mentioned in #2291 .

At the very least, I could document the fact that the fields supported by an installation of Dataverse 4 depend on which domain-specific metadata schemas (metadata blocks) have been enabled. http://guides.dataverse.org/en/4.1/user/appendix.html#metadata-references contains a list as of 4.1 but there are other site-specific ("custom") metadata blocks used only by Harvard as of this writing. All metadata blocks are stored as TSV files and then loaded into the system at installation time: https://github.com/IQSS/dataverse/tree/v4.1/scripts/api/data/metadatablocks . When we update these tsv files, we add them to the list of data-driven fields we index into Solr: https://github.com/IQSS/dataverse/blob/v4.1/conf/solr/4.6.0/schema.xml#L328 . You'll see references to the "custom" Harvard-specific blocks like GSD and PSI in that Solr schema config.

Parsing those TSV files is a little rough (#2551) and I wouldn't wish it on any API user so perhaps we should allow API users to interrogate a running Dataverse installation for a list of supported metadata fields. I can imagine this being part of the Search API itself. Maybe you call into /api/search/fields or something...

I recently stumbled upon the fact that I can go to https://dataverse.harvard.edu/api/metadatablocks to find a list of metadata blocks as documented at http://guides.dataverse.org/en/4.1/api/native-api.html#metadata-blocks but I didn't quickly find how to list the fields within each metadata block. I did add an "admin-only" API endpoint which I mentioned at #2357 (comment) that lets me list all the fields from http://localhost:8080/api/admin/datasetfield but the output needs a lot of work. Also, that API endpoint only shows the data-driven fields, not the static ones in SearchFields.java I mentioned in #2291. (At some point we'll probably want to change these static fields to be fed from the database for #2039 .)

Oh, and some sensitive fields, such as email addresses, aren't indexed for privacy reasons per #759 .

Going to an Advanced Search Page such as https://dataverse.harvard.edu/dataverse/harvard/search for the root dataverse can be a help in figuring out which fields are searchable but as #2353 notes right now you can't see the domain-specific metadata blocks at the root. I mention this because different blocks can be enabled at different dataverses within the tree of dataverses in a single Dataverse installation. So maybe when you ask the Search API for a list of supported fields you could supply the dataverse of interest and it will tell you which metadata blocks are enabled. Or rather, it would tell you the search fields that are available based on the metadata blocks enabled from that dataverse (i.e. social science vs. astronomy).

@leeper I'm sure this is way more information than you wanted! Thanks for opening this issue. :)

To sum up, I can at least improve the Search API documentation a bit. I should probably add something to the Search API so that API users can simply get a list of fields they can search on, perhaps with respect to where in the tree of dataverses they are searching (the root dataverse vs. a subdataverse).

pdurbin · 2015-09-21T14:52:15Z

@leeper I looked at the code and played around with the already existing "GET http://$SERVER/api/metadatablocks/$identifier" endpoint documented at http://guides.dataverse.org/en/4.1/api/native-api.html#metadata-blocks

Perhaps you and @markwilkinson and anyone else interested in knowing which fields are supported could play around with this metadatablocks API endpoint and give us feedback on it. It looks like it was developed by @michbarsinai and it seems quite useful. Here's how I can imagine it being used:

Get a list of metadata blocks that are enabled

curl -s https://apitest.dataverse.org/api/metadatablocks | jq -r '.data[].name'

citation
geospatial
socialscience
astrophysics
biomedical
journal
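
For anyone scripting this rather than using curl and jq, the same step might look roughly like this in Python. This is only a sketch: the helper names `block_names` and `fetch_block_names` are made up for illustration, and the response shape is inferred from the jq filter above (a `data` list whose entries each have a `name`).

```python
import json
from urllib.request import urlopen


def block_names(payload):
    """Return the name of each metadata block in a /api/metadatablocks response."""
    # Mirrors the jq filter `.data[].name`: `data` is assumed to be a list of objects.
    return [block["name"] for block in payload.get("data", [])]


def fetch_block_names(base_url):
    """Hypothetical helper: download the list and extract the names (needs network access)."""
    with urlopen(f"{base_url}/api/metadatablocks") as response:
        return block_names(json.load(response))


# Offline example using a trimmed, made-up payload in the assumed shape.
sample = {"status": "OK", "data": [{"id": 1, "name": "citation"}, {"id": 2, "name": "geospatial"}]}
print(block_names(sample))
```

Keeping the parsing separate from the download makes it easy to check the logic against a saved response before pointing it at a live installation.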

For each of the metadata blocks, show the fields

curl -s https://apitest.dataverse.org/api/metadatablocks/citation | jq . | head -20

{
  "status": "OK",
  "data": {
    "id": 1,
    "name": "citation",
    "displayName": "Citation Metadata",
    "fields": {
      "title": {
        "name": "title",
        "displayName": "Title",
        "title": "Title",
        "type": "TEXT",
        "watermark": "Enter title...",
        "description": "Full title by which the Dataset is known."
      },
      "subtitle": {
        "name": "subtitle",
        "displayName": "Subtitle",
        "title": "Subtitle",
        "type": "TEXT",
...

In the output above the field to search on is listed under "name" such as "title" or "subtitle".
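
As a rough illustration of how a client could turn that output into search field names, here is a small Python sketch. The function names are hypothetical, the sample is a trimmed copy of the output above, and the sketch only reads the top-level `fields` object; it does not try to handle any nested structure a block might define.

```python
def searchable_field_names(block_payload):
    """Return the sorted field names from a /api/metadatablocks/$identifier response."""
    fields = block_payload.get("data", {}).get("fields", {})
    return sorted(field["name"] for field in fields.values())


def field_query(field_name, term):
    """Build a `field:term` string for the Search API q parameter (hypothetical helper)."""
    return f"{field_name}:{term}"


# Trimmed copy of the output shown above.
sample_block = {
    "status": "OK",
    "data": {
        "name": "citation",
        "fields": {
            "title": {"name": "title", "type": "TEXT"},
            "subtitle": {"name": "subtitle", "type": "TEXT"},
        },
    },
}

names = searchable_field_names(sample_block)
print(names)
print(field_query(names[-1], "climate"))
```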

Of course, these are only the data-driven fields at the dataset level, not the static fields in SearchFields.java I mentioned, but some of those fields aren't searchable by design (though we recently made more of them searchable as part of #2038).

markwilkinson · 2015-09-22T05:54:48Z

Thanks for the update! :-)

Mark

leeper · 2015-09-22T12:34:16Z

@pdurbin Excellent! This response is a lot to parse! I'll take a look and see what I can do. I guess the minimum solution is to provide a flexible interface and then I can build on features that help tailor use of the API when there are known metadata schemes. Being able to query what those are for any particular installation would definitely be a helpful feature of the search API.

pdurbin · 2016-10-04T19:50:35Z

#1510 is related in the sense that people don't know what subjects are allowed when creating a dataset (and it's a required field).

pdurbin · 2019-08-21T13:59:37Z

In pull request #6107 I at least linked back to this issue so API users can get a sense of how they can know what the allowed search fields are. Here's the commit: d3a5b2f

If anyone wants to help with an actual solution to this issue, I'm happy to mentor them. I'm thinking that for now we could just list the "out of the box" fields in the API Guide.

Jerry-Ma · 2022-04-12T13:55:09Z

Hi, I was trying to find a reference for the query string to use when searching for particular files. However, it seems the above discussion is more about searching dataverse/dataset metadata. Could anyone point me to the place that shows the keys we could use for searching files?

The use case for me is that I am creating a script uploading files to a dataset via the API, and I would like to check if a particular file named "foo" (with filepath foo.txt) already exists in a certain dataset of global_id="doi:10.5072/FK2/J9EK29", which is within the dataverse identified by id="bar"

So far, the furthest I've got is the following:

api/search?q=fileName:foo&type=file&subtree=bar&sort=date&order=desc.

There are a couple of issues with this:

The subtree parameter does not recognize the dataset persistent ID, so I had to use the dataverse identifier "bar". However, this returned many files with names similar to foo from multiple datasets.
The fileName:foo query does not restrict the filename to be exactly foo. Instead, files named foo-1 and foo-2 are also returned.
Any ideas?

qqmyers · 2022-04-12T14:13:16Z

The DVUploader (https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader) uses the /api/datasets/:persistentId/versions/:latest/files call to get the list of files in a dataset. You might also want to look at pyDataverse (https://github.com/gdcc/pyDataverse). Both of these tools might be things you could use to upload files, but they would also show you the API calls you might want to make.
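
A minimal Python sketch of that check, for the "does foo.txt already exist in this dataset?" case. The `?persistentId=` query parameter and the `label` / `directoryLabel` keys in each file entry are assumptions about the native API's request and response format that should be checked against your installation, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def files_url(base_url, persistent_id):
    """Build the file-listing URL for the latest version of a dataset."""
    # Assumption: the dataset DOI is passed via a `persistentId` query parameter.
    query = urlencode({"persistentId": persistent_id})
    return f"{base_url}/api/datasets/:persistentId/versions/:latest/files?{query}"


def file_exists(file_entries, name, directory=""):
    """Check a list of file entries for an exact name (and optional directory) match."""
    # Assumption: each entry has a `label` and, for files in folders, a `directoryLabel`.
    return any(
        entry.get("label") == name and entry.get("directoryLabel", "") == directory
        for entry in file_entries
    )


def fetch_file_entries(base_url, persistent_id):
    """Hypothetical helper: download the file list (needs network access)."""
    with urlopen(files_url(base_url, persistent_id)) as response:
        return json.load(response).get("data", [])


# Offline example with made-up entries in the assumed shape.
entries = [{"label": "foo-1.txt"}, {"label": "foo.txt", "directoryLabel": "raw"}]
print(file_exists(entries, "foo.txt", directory="raw"))
print(file_exists(entries, "foo"))
```

Because the comparison is an exact string match done on the client, similarly named files such as foo-1 are not mistaken for foo.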


Jerry-Ma · 2022-04-12T14:23:28Z

@qqmyers

Thank you for the links. I'll take a look at DVUploader in detail. The direct upload with storage identifier is gonna also be useful for our use case, because we also have our own storage service (not amazon S3).

I am already using pyDataverse for creating datasets and uploading datafiles. It works great so far for me, but lacks certain logic (like checking whether a datafile already exists) that I need to implement on my own. The repo for the workflow I mentioned is here: https://github.com/toltec-astro/dvpipe.

Just a bit of background: this effort is part of the software infrastructure that we are building for the Large Millimeter Telescope. We have set up a dataverse instance at https://dp.lmtgtm.org and plan to use it as the main channel to distribute the data products produced by the software pipelines that reduce the data taken by various instruments on the LMT. This dvpipe is to be the automation pipeline that packages the data reduction pipeline outputs and sends them to the dataverse server.

qqmyers · 2022-04-12T14:34:24Z

Nice! (I was involved with the Dark Energy Survey telescope data management project a few years ago.) W.r.t. pyDataverse, @skasberger is open to pull requests, so if there is logic you think should go there, please consider adding to it. (In particular, it would be great to get the direct upload capabilities in there.)

pdurbin · 2022-04-12T15:10:22Z

> I would like to check if a particular file named "foo" (with filepath foo.txt) already exists in a certain dataset of global_id="doi:10.5072/FK2/J9EK29", which is within the dataverse identified by id="bar"

The approach suggested by @qqmyers to download the list of files is probably the most reliable but I thought I'd chime in specifically about the Search API question above.

@Jerry-Ma you can search against the parentIdentifier field with the DOI of the dataset like this:

https://dataverse.harvard.edu/api/search?q=name:2019-02-25.tab&fq=parentIdentifier:doi\:10.7910/DVN/TJCLKP

Please note:

You have to escape the colon in the DOI with a backslash.
If you know the database id of the dataset, you can search against parentId.
I didn't include subtree but you could, I suppose. Subtree only operates on dataverse collections, not datasets.
From a quick look I don't believe we index the file path. That's why I say the other approach is more reliable since the list of files will include both the file path and the name.
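
For scripted use, a sketch like the following builds the same kind of URL in Python and takes care of the escaping. The function name `file_search_url` is hypothetical; the `q` and `fq` parameters and the `name` and `parentIdentifier` fields are the ones used in the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def file_search_url(base_url, file_name, dataset_doi):
    """Build a Search API URL for files named `file_name` inside one dataset."""
    # Escape every colon in the DOI with a backslash, as noted above.
    escaped_doi = dataset_doi.replace(":", "\\:")
    params = {
        "q": f"name:{file_name}",
        "fq": f"parentIdentifier:{escaped_doi}",
    }
    # urlencode percent-encodes the backslash and colons so they survive the HTTP request.
    return f"{base_url}/api/search?{urlencode(params)}"


url = file_search_url("https://dataverse.harvard.edu", "2019-02-25.tab", "doi:10.7910/DVN/TJCLKP")
print(url)
# Decoding the query string shows the values the server would receive.
print(parse_qs(urlsplit(url).query))
```

Letting `urlencode` handle the percent-encoding avoids hand-editing the backslash into the URL, which is easy to get wrong when the string also passes through a shell.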

Also, if you'd like to include your installation on our map, please feel free to open an issue at https://github.com/IQSS/dataverse-installations !

pdurbin added Feature: Search/Browse Feature: API labels Sep 21, 2015

pdurbin mentioned this issue Sep 21, 2015

Feature request: implement search API IQSS/dataverse-client-python#21

Open

leeper mentioned this issue Sep 22, 2015

Expand documentation IQSS/dataverse-client-r#1

Closed

7 tasks

mercecrosas modified the milestone: In Review Nov 30, 2015

mheppler added the Component: Documentation label Jan 26, 2016

scolapasta added Status: Triaged and removed Status: Dev labels Jan 28, 2016

scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016

pdurbin mentioned this issue Aug 15, 2016

Harvest: Provide more documentation and examples on how to define common harvest sets #3262

Closed

pdurbin mentioned this issue Jan 26, 2017

Unclear how to find more granularity of files beyond "File Type" (application, tabulardata, data, etc.) #3597

Closed

pdurbin added Hackathon: Low Hanging Fruit and removed Status: Triaged labels Jun 5, 2017

pdurbin added Feature: API Guide and removed Feature: API Component: Documentation Feature: Search/Browse labels Jun 23, 2017

pdurbin added the User Role: API User Makes use of APIs label Jul 4, 2017

pdurbin added Help Wanted: Documentation Mentor: pdurbin labels Nov 9, 2017

djbrooke added this to Inbox 🗄 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 8, 2019

pdurbin added a commit that referenced this issue Aug 21, 2019

link to "What are the allowed search fields?" issue #2558

d3a5b2f

pdurbin removed the Hackathon: Low Hanging Fruit label Aug 21, 2019

pdurbin mentioned this issue Nov 9, 2021

Query Dataverse for mandatory metadata fields via API #6978

Closed

pdurbin added the Hackathon: More APIs Add new or missing API endpoints label Oct 10, 2022

briri mentioned this issue Sep 22, 2023

Investigate APIs for various repositories CDLUC3/dmsp_aws_prototype#58

Open

pdurbin added the Type: Suggestion an idea label Oct 7, 2023

DS-INRA added this to ⚠️ Needed/Important in Recherche Data Gouv (formerly Data INRAE) Jun 18, 2024
