Create a javascript for the frontend that supports Fundref #9150

Closed
2 tasks
mreekie opened this issue Nov 8, 2022 · 29 comments · Fixed by #9402
Labels
  • D: 5 Core PIDs (Deliverable Increment defining how we support the 5 core PIDs)
  • NIH OTA: 1.2.1 (2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 prdOwnThis is an it...)
  • pm.GREI-d-1.2.1 (NIH, yr1, aim2, task1: Design and implement integration with controlled voc)
  • Size: 80 (A percentage of a sprint. 56 hours.)

Comments

@mreekie

mreekie commented Nov 8, 2022

Definition of Done:

  • front end is created
  • Code, javascript and a test metadatablock in the other repo

(i.e., not for production)

@mreekie
Author

mreekie commented Nov 8, 2022

Jim suggests the following as a rough breakdown of a backlog for the backend:

  • identify the official source of the vocabulary,
  • discover whether there are services available via API to query it,
  • discover whether there are existing vocabulary browsers that can serve as examples or provide source code,
  • identify a viable identifier for the terms in the vocabulary,
  • create a simple example browser to demonstrate edit/display in Dataverse,
  • investigate whether the vocabularies are internationalized and, if so, how the translations can be accessed,
  • decide which field(s) in Dataverse they should be applied to, etc.

Jim's notes on limitations of the current implementations
Here are some limitations in the current mechanism and/or example JavaScripts that may need to be overcome:

  • Current examples either handle all subfields (e.g., for author/creator, which has name, affiliation, identifier type, and identifier) or just one. We have not made examples where, for example, a user can pick a name from ORCID but then manually change the affiliation. Issues like this could potentially be handled in JavaScript but might also be simplified if Dataverse's subfields were managed differently. For example, the ORCID demonstration used a single field for the ORCID identifier, which was displayed with name and ID together rather than as three subfields for name, ID type, and ID. Changing the main author/creator field to work similarly could make JavaScript development easier.
  • I18n requires a service that can provide translations in a specific JSON format (e.g., as SKOSMOS does). Registries (e.g., of people by name) may not involve i18n, but for actual vocabularies, a translation service or a connection to a SKOSMOS server that acts as a proxy for the underlying service may be needed.

Working from the user's point of view

  • If at all possible, can we approach this as a functional delivery of useful functionality for a real end user?
  • I talked with Julian; he is familiar with the workflows researchers use, so he can be a person to start with.
  • We can have a technical stakeholder and a user stakeholder.

@mreekie
Author

mreekie commented Jan 10, 2023

Priority discussion with Stefano:

  • Size this.
  • If it's not overly large, get it to the top of the sprint ready backlog
  • Moved from NIH Deliverables Backlog to ordered backlog

@mreekie
Author

mreekie commented Jan 11, 2023

Top priority for upcoming sprint

@mreekie
Author

mreekie commented Jan 11, 2023

Sizing:
For this task someone will need to look at the example JavaScripts and the funding field that we need to populate (e.g., is it a single field, or does it have child fields?). From there, look at our sample scripts. Then look at the Crossref FundRef API and figure out how to use it from JavaScript.
No one has done this before.
We would probably use the external vocab repo for the sample script.

Size as a 33.
If this work were done repeatedly it would become smaller (like a 10), but here we don't have the experience.

@qqmyers qqmyers added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Jan 11, 2023
@mreekie mreekie added this to This Sprint 🏃‍♀️ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) via automation Jan 11, 2023
@pdurbin pdurbin moved this from This Sprint 🏃‍♀️ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 17, 2023
@pdurbin pdurbin self-assigned this Jan 17, 2023
@jggautier
Contributor

@pdurbin, for usability testing the workflow with the javascript, would it be possible to either create a test server or to use Demo Dataverse? If we need to add new fields or think it would be better to, would it be easier to create a test server, like spinning something up on AWS?

To figure out who best to contact for testing, I'm reviewing metadata in the Harvard repository to find users who most often add funding information.

While I'm doing that, it makes sense to look at the extent of #4859 (how many users enter funding metadata in both fields). It might be possible to learn more about that, too.

@pdurbin
Member

pdurbin commented Jan 18, 2023

@jggautier sure, a demo server sounds good. Here's a quick dump of the first entry (NSF) from the first API endpoint listed at https://www.fundref.org/documentation/funder-registry/funder-data-via-the-api/

curl -s https://api.crossref.org/funders | jq '.message.items[0]'
{
  "id": "100000001",
  "location": "United States",
  "name": "National Science Foundation",
  "alt-names": [
    "U.S. National Science Foundation",
    "USA NSF",
    "USNSF",
    "NSF",
    "US National Science Foundation",
    "US NSF"
  ],
  "uri": "http://dx.doi.org/10.13039/100000001",
  "replaces": [],
  "replaced-by": [],
  "tokens": [
    "national",
    "science",
    "foundation",
    "us",
    "national",
    "science",
    "foundation",
    "usa",
    "nsf",
    "usnsf",
    "nsf",
    "us",
    "national",
    "science",
    "foundation",
    "us",
    "nsf"
  ]
}
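A minimal sketch of how the frontend JavaScript might pull the candidate values out of that response. It assumes the shape shown above (`message.items[]` with `id`, `name`, `alt-names`, and `uri`); the output property names (`agencyName`, `funderId`, `agencyUri`, `altNames`) and both function names are hypothetical, not Dataverse field names.

```javascript
// Map one item from the Crossref /funders response (shape as in the jq output
// above) to the values we might store. Output property names are hypothetical.
function funderItemToFields(item) {
  return {
    agencyName: item.name,
    funderId: item.id, // e.g. "100000001": only meaningful inside the registry
    agencyUri: item.uri, // e.g. "http://dx.doi.org/10.13039/100000001"
    altNames: Array.isArray(item["alt-names"]) ? item["alt-names"] : [],
  };
}

// Map a whole response body, e.g. the JSON from https://api.crossref.org/funders
function funderResponseToFields(body) {
  const items = (body && body.message && body.message.items) || [];
  return items.map(funderItemToFields);
}

// Usage with a trimmed copy of the NSF entry shown above.
const sample = {
  message: {
    items: [
      {
        id: "100000001",
        location: "United States",
        name: "National Science Foundation",
        "alt-names": ["NSF", "US NSF"],
        uri: "http://dx.doi.org/10.13039/100000001",
      },
    ],
  },
};

const [nsf] = funderResponseToFields(sample);
console.log(nsf.agencyName, nsf.agencyUri);
// prints: National Science Foundation http://dx.doi.org/10.13039/100000001
```

Both `id` and `uri` are kept here so either storage choice discussed in this thread could be made from the same lookup.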

That DOI for NSF (http://dx.doi.org/10.13039/100000001) redirects to the HTTPS version (https://dx.doi.org/10.13039/100000001), which redirects to http://data.crossref.org/fundingdata/funder/10.13039/100000001, where even more info is shown:

{
    "country": {"resource": "http://sws.geonames.org/6252001/"},
    "address": {"postalAddress": {"addressCountry": "usa"}},
    "inScheme": {"resource": "http://data.crossref.org/fundingdata/vocabulary"},
    "created": "2009-07-06T18:53:11.0",
    "prefLabel": {"Label": {
        "literalForm": {
            "lang": "en",
            "content": "National Science Foundation"
        },
        "about": "http://data.crossref.org/fundingdata/vocabulary/Label-36515"
    }},
    "narrower": [
        {"resource": "http://dx.doi.org/10.13039/100000076"},
        {"resource": "http://dx.doi.org/10.13039/100000083"},
        {"resource": "http://dx.doi.org/10.13039/100000081"},
        {"resource": "http://dx.doi.org/10.13039/100000084"},
        {"resource": "http://dx.doi.org/10.13039/100000085"},
        {"resource": "http://dx.doi.org/10.13039/100000086"},
        {"resource": "http://dx.doi.org/10.13039/100000088"},
        {"resource": "http://dx.doi.org/10.13039/100005447"},
        {"resource": "http://dx.doi.org/10.13039/100005441"},
        {"resource": "http://dx.doi.org/10.13039/100000179"},
        {"resource": "http://dx.doi.org/10.13039/100005716"},
        {"resource": "http://dx.doi.org/10.13039/100010608"},
        {"resource": "http://dx.doi.org/10.13039/100014072"},
        {"resource": "http://dx.doi.org/10.13039/100014073"},
        {"resource": "http://dx.doi.org/10.13039/100014411"},
        {"resource": "http://dx.doi.org/10.13039/100014591"},
        {"resource": "http://dx.doi.org/10.13039/100015815"},
        {"resource": "http://dx.doi.org/10.13039/100020427"},
        {"resource": "http://dx.doi.org/10.13039/100020475"}
    ],
    "altLabel": [
        {"Label": {
            "literalForm": {
                "lang": "en",
                "content": "US National Science Foundation"
            },
            "about": "http://data.crossref.org/fundingdata/vocabulary/Label-63344"
        }},
        {"Label": {
            "literalForm": {
                "lang": "en",
                "content": "U.S. National Science Foundation"
            },
            "about": "http://data.crossref.org/fundingdata/vocabulary/Label-89014"
        }},
        {"Label": {
            "literalForm": {
                "lang": "en",
                "content": "NSF"
            },
            "usageFlag": {"resource": "http://data.crossref.org/fundingdata/vocabulary/acronym"},
            "about": "http://data.crossref.org/fundingdata/vocabulary/Label-24515131"
        }},
        {"Label": {
            "literalForm": {
                "lang": "en",
                "content": "US NSF"
            },
            "usageFlag": {"resource": "http://data.crossref.org/fundingdata/vocabulary/acronym"},
            "about": "http://data.crossref.org/fundingdata/vocabulary/Label-952985"
        }},
        {"Label": {
            "literalForm": {
                "lang": "en",
                "content": "USA NSF"
            },
            "usageFlag": {"resource": "http://data.crossref.org/fundingdata/vocabulary/acronym"},
            "about": "http://data.crossref.org/fundingdata/vocabulary/Label-20677577"
        }},
        {"Label": {
            "literalForm": {
                "lang": "en",
                "content": "USNSF"
            },
            "usageFlag": {"resource": "http://data.crossref.org/fundingdata/vocabulary/acronym"},
            "about": "http://data.crossref.org/fundingdata/vocabulary/Label-3698180"
        }}
    ],
    "fundingBodyType": "gov",
    "fundingBodySubType": "National government",
    "modified": "2022-11-02T19:22:47.0",
    "id": "https://doi.org/10.13039/100000001",
    "state": {"resource": "http://sws.geonames.org/6254928/"},
    "region": "Americas"
}
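Because this second document flags acronyms with `usageFlag` and tags every label with a language, it could feed both acronym matching and a basic language fallback. A minimal sketch, assuming the structure shown above holds for other funders too; `labelsFromFundingData` and its helpers are hypothetical names.

```javascript
// In the NSF example above, prefLabel is one {Label: ...} object and altLabel is
// an array; this sketch normalizes both to arrays in case either form appears.
const ACRONYM_FLAG = "http://data.crossref.org/fundingdata/vocabulary/acronym";

function asArray(x) {
  if (x === undefined || x === null) return [];
  return Array.isArray(x) ? x : [x];
}

function toLabel(entry) {
  const label = entry.Label;
  return {
    text: label.literalForm.content,
    lang: label.literalForm.lang,
    isAcronym: Boolean(label.usageFlag && label.usageFlag.resource === ACRONYM_FLAG),
  };
}

// Pick the preferred name in `lang`, falling back to the first preferred label,
// and split the alternate labels into acronyms and other names.
function labelsFromFundingData(doc, lang = "en") {
  const preferred = asArray(doc.prefLabel).map(toLabel);
  const alternates = asArray(doc.altLabel).map(toLabel);
  const match = preferred.find((l) => l.lang === lang) || preferred[0];
  return {
    name: match ? match.text : undefined,
    acronyms: alternates.filter((l) => l.isAcronym).map((l) => l.text),
    otherNames: alternates.filter((l) => !l.isAcronym).map((l) => l.text),
  };
}

// Usage with a trimmed copy of the NSF document shown above.
const nsfDoc = {
  prefLabel: { Label: { literalForm: { lang: "en", content: "National Science Foundation" } } },
  altLabel: [
    { Label: { literalForm: { lang: "en", content: "US National Science Foundation" } } },
    {
      Label: {
        literalForm: { lang: "en", content: "NSF" },
        usageFlag: { resource: ACRONYM_FLAG },
      },
    },
  ],
};
console.log(labelsFromFundingData(nsfDoc).acronyms); // [ 'NSF' ]
```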

Given the existing fields...

Screen Shot 2023-01-18 at 3 37 29 PM

... I assume we'd fill them in like this:

Screen Shot 2023-01-18 at 3 39 35 PM

That is:

"name": "National Science Foundation",
"id": "100000001",

Or is the URI (from the first JSON, which avoids making the second call) better?

Screen Shot 2023-01-18 at 3 40 59 PM

That is:

"name": "National Science Foundation",
"uri": "http://dx.doi.org/10.13039/100000001",

I realize this is http rather than https, but that's OK. As I said above, it redirects.

@qqmyers
Member

qqmyers commented Jan 18, 2023

+1 for URI: 100000001 could be anyone's numbering system. In view mode, the JavaScript can hide the URI form if desired, either just showing the number or even showing a single field with National Science Foundation as a link, so what's best to display can be a separate issue from what to store.

(Sadly, while all the variants of http(s)://(dx.)doi.org/ redirect to the right place, they give you four variants of the URI to use as the identifier. Using what they say (unless DataCite has a preference?) is probably best practice, although https://doi.org/10.13039/100000001 is the most modern form at this point.)
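If the team decided to normalize rather than store exactly what Crossref returns, the four variants could be collapsed to one form before saving. A minimal sketch, assuming `https://doi.org/` is the chosen canonical prefix; `canonicalDoiUri` and `funderIdToUri` are hypothetical helpers.

```javascript
// Matches http://doi.org/, https://doi.org/, http://dx.doi.org/, https://dx.doi.org/
const DOI_URI_PREFIX = /^https?:\/\/(dx\.)?doi\.org\//i;

// Rewrite any of the four variants to https://doi.org/...; anything else is
// returned unchanged (apart from trimming) so non-DOI identifiers are not mangled.
function canonicalDoiUri(uri) {
  const trimmed = String(uri).trim();
  if (!DOI_URI_PREFIX.test(trimmed)) return trimmed;
  return "https://doi.org/" + trimmed.replace(DOI_URI_PREFIX, "");
}

// Crossref funder IDs sit under the 10.13039 DOI prefix, so a bare registry ID
// such as "100000001" maps to the same canonical URI.
function funderIdToUri(id) {
  return "https://doi.org/10.13039/" + String(id).trim();
}

console.log(canonicalDoiUri("http://dx.doi.org/10.13039/100000001"));
// prints: https://doi.org/10.13039/100000001
```

Normalizing at save time means two datasets citing the same funder compare equal even if one value was pasted from a different resolver form.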

@pdurbin
Member

pdurbin commented Jan 19, 2023

Wait, the old name for the field (from #4859) was "Grant Number"...

[Screenshot of the old field]

... I think the popup text confused me. I need to dig more into these fields, obviously.

@jggautier
Contributor

jggautier commented Jan 19, 2023

Great catch and sorry for the confusion!

The second child field, Identifier, should be the identifier of the funding (like the grant), not of the funder. The popup used to be:

"The grant or contract number of the project that sponsored the effort."

When we tried to improve the popups last year, we changed it to:

"The grant identifier or contract identifier of the agency that provided financial support for the Dataset"

So it sounds like the rewritten popup text and maybe the new field names make others think that they should enter the ID of the funder (instead of the expected identifier of the funding). Is that right?

I think a new issue should be opened to address this, involving seeing how most people have interpreted the new field name and popup text and how they could be improved.

@pdurbin
Member

pdurbin commented Jan 19, 2023

@jggautier sure, a new issue sounds fine. Honestly, simply changing "of" to "from" would help me a lot. Like this:

-datasetfieldtype.grantNumberValue.description=The grant identifier or contract identifier of the agency that provided financial support for the Dataset
+datasetfieldtype.grantNumberValue.description=The grant identifier or contract identifier from the agency that provided financial support for the Dataset

@jggautier
Contributor

jggautier commented Jan 19, 2023

Makes sense to me. I'll open an issue about it.

About this issue: from #7285 (comment), it sounds like I should wait to hear from @mreekie about finding a time to discuss, and maybe you could join us? I think the scope might change once we're all on the same page about what we expect depositors to enter in the current fields.

In the meantime, it might be helpful to know that when depositors have filled the second child field, Identifier, it looks like it's always a funding ID, never the ID of the funder. So there's no evidence yet of anyone else misinterpreting that field, although I've only looked at datasets from research funded by the NIH (see Google Sheet with funding metadata from NIH-funded data). Still, I agree that changing "of" to "from" seems clearer.

@jggautier
Contributor

Just to keep things clear and updated, when someone chooses a funder from the Funding Information Agency dropdown list that this javascript produces from the CrossRef API, the funder's identifier should not go in the Funding Information Identifier field. Instead, the Funding Information Identifier field (the second child field) is for the award ID.

  • Where should the funder's identifier go?

    • We could create a new field and have it populated with the funder's identifier. Seeing the funder ID of the funder name they chose might give depositors more assurance that they've chosen the right funder from the suggestions. But if depositors don't choose from the suggested list, can they type in their own funder name and enter a funder identifier in the new field?
  • Is it possible to also get suggested award IDs from Crossref's APIs to help people fill in the Funding Information Identifier? Would it be the same "javascript" providing suggestions for both fields? Or one "javascript" for each field? I read the white paper and what Mike wrote in an earlier comment about Jim's suggestions, but this isn't clear to me.

@mreekie
Copy link
Author

mreekie commented Jan 23, 2023

Waiting right now on some input from the ROR work (#9151).

@jggautier
Contributor

jggautier commented Jan 23, 2023

I think it's important to publicize the use-cases that this work will support. This will help with evaluation/testing.

The NIH GREI group is interested in helping funders find the data produced by the research they've funded, specifically NIH funded research. This is one of the use-cases they want each repository of the GREI group to support. And this javascript supports that goal by helping the Harvard Dataverse more consistently record funding agency names and record persistent identifiers for those agencies.

Because a lot of the funding agency name metadata in the Harvard Dataverse looks like this today...

Screen Shot 2023-01-23 at 3 37 18 PM

... some funders need to do more complex searches. There are more than 57 datasets whose metadata says that the data is created from NIH-funded research, but only 57 have the text "NIH" in the Funding Information Agency field.

During the GREI training workshops on Jan 24 and 25, we'll have a chance to learn about funders' experiences with finding data from the research they fund. And when it's ready, I plan to review this new javascript with users or potential users of the repository who often add, or would add, funding metadata to their datasets.

@jggautier
Contributor

jggautier commented Jan 23, 2023

We'll eventually want to send this metadata to DataCite, so I took a look at the funding metadata that's already included in the OpenAIRE export. The OpenAIRE schema is based on the DataCite schema, so members of the Dataverse community have been looking at the design decisions there to inform how best to send metadata to DataCite. I also looked at the examples that DataCite provides for sending funding metadata.

How funding metadata is included in Dataverse's OpenAIRE export:

<fundingReferences>
	<fundingReference>
		<funderName>Contributor: Funder</funderName>
	</fundingReference>
	<fundingReference>
		<funderName>Grant Agency1</funderName>
		<awardNumber>Grant Number1</awardNumber>
	</fundingReference>
</fundingReferences>

The first <fundingReference> property contains the name entered in the Contributor field when Contributor Type "Funder" is chosen, so there are two places where depositors can enter names of funders. There's more about this in the GitHub issue at #4859 and I'm hoping we can address this soon. But let's ignore this for now.

The second <fundingReference> property shows the funder name entered in the Funding Information Agency field and award number entered in the Funding Information Identifier field.

Sending funder identifier metadata to DataCite

To add the funder identifier, which the javascript would grab from the Crossref API when a depositor chooses a funder from the suggested list, we could use DataCite's <funderIdentifier> property from their example XML:

<funderIdentifier funderIdentifierType="Crossref Funder ID">https://doi.org/10.13039/5011</funderIdentifier>

So when depositors enter a funder name and award number, what's sent to DataCite (and included in the OpenAIRE export) would look like this:

<fundingReferences>
	<fundingReference>
		<funderName>Grant Agency1</funderName>
		<awardNumber>Grant Number1</awardNumber>
		<funderIdentifier funderIdentifierType="Crossref Funder ID">https://doi.org/10.13039/5011</funderIdentifier>
	</fundingReference>
</fundingReferences>

This suggests that DataCite prefers the DOI, such as http://dx.doi.org/10.13039/100000001 from the example in an earlier comment, instead of the ID number "100000001".

And when <funderIdentifier> is used, funderIdentifierType must be included with one of 5 values (see page 32 and 33 of the DataCite schema doc).

So Dataverse will need to know that, if the depositor chooses a funder from the suggested list that the javascript shows, the funder identifier is a Crossref Funder ID, so that it can put that value in the funderIdentifierType attribute.

If we add a new funder ID field to the Citation metadatablock, and the depositor types in their own funder name instead of choosing from the Crossref suggestions and then enters a funder ID, Dataverse could use the "Other" value. Or could we add a Funder Identifier Type field so the depositor could choose from the list?

This all depends on what's possible with the javascript and design decisions we make when using it for the funding metadata. And of course considering how to send this metadata to DataCite could be postponed and a new GitHub issue could be added to tackle it later. I just wanted to mention that sending this metadata to DataCite is a goal of the NIH GREI group in case there's something we can do to prepare for it earlier. When folks in the NIH GREI meetings ask about being able to use one interface to search for datasets from many repositories, DataCite Commons is mentioned as a way to do this, and that works only when DataCite is sent metadata from the different repositories.
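A minimal sketch of assembling the `<fundingReference>` element from stored values, assuming the rule described above: an identifier that is a Crossref funder DOI (prefix 10.13039) gets "Crossref Funder ID", and any identifier a depositor typed in gets "Other". The input property names and function names are hypothetical, and the child element order simply follows the example above.

```javascript
// Escape text for use in XML element content and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Assumed rule: a DOI under the Crossref funder prefix 10.13039 came from the
// registry; any other identifier typed by a depositor is labeled "Other".
function funderIdentifierType(id) {
  return /^https?:\/\/(dx\.)?doi\.org\/10\.13039\//i.test(id) ? "Crossref Funder ID" : "Other";
}

// Build one <fundingReference> element; input property names are hypothetical.
function fundingReferenceXml({ funderName, awardNumber, funderIdentifier }) {
  const lines = ["<fundingReference>", `\t<funderName>${escapeXml(funderName)}</funderName>`];
  if (awardNumber) {
    lines.push(`\t<awardNumber>${escapeXml(awardNumber)}</awardNumber>`);
  }
  if (funderIdentifier) {
    const type = funderIdentifierType(funderIdentifier);
    lines.push(
      `\t<funderIdentifier funderIdentifierType="${type}">${escapeXml(funderIdentifier)}</funderIdentifier>`
    );
  }
  lines.push("</fundingReference>");
  return lines.join("\n");
}

console.log(
  fundingReferenceXml({
    funderName: "National Science Foundation",
    awardNumber: "Grant Number1",
    funderIdentifier: "http://dx.doi.org/10.13039/100000001",
  })
);
```

Deriving the type from the identifier itself avoids storing a separate type field for registry picks; a depositor-facing Funder Identifier Type field would only be needed for the typed-in case.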

@mreekie
Author

mreekie commented Jan 25, 2023

Resizing for next sprint:

  • We have discovered there are a lot of unknowns and work associated with doing this for the first time.
  • This will be for test. It won't be for production.
  • This is really answering the question of what UI/UX we want, although it's in test.
  • Likely to be done in the same way Jim described storing the ORCID ID.
  • Sizing it at: 80

@mreekie mreekie added Size: 80 A percentage of a sprint. 56 hours. and removed Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) labels Jan 25, 2023
@pdurbin
Member

pdurbin commented Jan 27, 2023

I've been bad about leaving updates here. It's early days for me getting this working. I've switched from trying to understand the ROR code to trying to understand the creator and SKOSMOS code. I'm too embarrassed to push a branch of my hacking to the main external vocab repo, so I forked it to "pdurbin" and put it here: pdurbin/dataverse-external-vocab-support@c8fb9c2 (That said, it's mostly the same code, just with comments and such, some config, and some experiments.)

I said this a couple times but I'm ok with another developer taking on this issue if they want. It's potentially fun but there's a lot to do. I mostly picked it up because we want to make progress on the NIH stuff and there wasn't much left toward the end of the last sprint.

@pdurbin
Member

pdurbin commented Jan 30, 2023

Decent progress today. I pushed another scratch commit: pdurbin/dataverse-external-vocab-support@35d4eac

The code is now somewhat demo-able and I put it on https://dev1.dataverse.org in preparation for tomorrow's meeting.

In case the demo gods are not with me, below are some screenshots as well.

[16 screenshots of the demo on https://dev1.dataverse.org, taken between 2023-01-24 and 2023-01-31]

Related issue:

@pdurbin
Member

pdurbin commented Jan 31, 2023

After standup we talked through the screenshots above and I did a demo. We decided to store the funder PID URL (e.g. http://dx.doi.org/10.13039/100000002) where we've been putting the string for the funder's name (e.g. NIH), when we can. This is highlighted in red below. From https://design.penpot.app/#/view/2ca7d284-ea89-8144-8001-fd5447dbc79e?page-id=2ca7d284-ea89-8144-8001-fd5447dbc79f&section=interactions&index=0&share-id=e666624f-1b00-80b5-8001-fd5638e43b73

Screen Shot 2023-01-31 at 5 43 32 PM
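Since the stored value is now a PID URL when possible and free text otherwise, view mode has to handle both. A minimal sketch, assuming a `nameCache` (a `Map` from URL to funder name, standing in for whatever cache the script actually keeps); `renderFunder` is a hypothetical helper, and it falls back to showing the raw URL when no name is cached.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render a stored Funding Information Agency value for display:
// - PID URL with a cached name: link labeled with the funder's name
// - PID URL without a cached name: link labeled with the URL itself
// - anything else: the free-text name the depositor typed, escaped
function renderFunder(storedValue, nameCache) {
  const value = String(storedValue).trim();
  if (!/^https?:\/\//i.test(value)) return escapeHtml(value);
  const label = nameCache.get(value) || value;
  return `<a href="${escapeHtml(value)}" target="_blank" rel="noopener">${escapeHtml(label)}</a>`;
}

// Hypothetical cache contents, for illustration only.
const cache = new Map([["http://dx.doi.org/10.13039/100000002", "National Institutes of Health"]]);
console.log(renderFunder("http://dx.doi.org/10.13039/100000002", cache));
```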

@mreekie
Author

mreekie commented Feb 7, 2023

Review for monthly

Bump: @pdurbin @jggautier

  • Thank you 1000X for the discussion and comments captured to this issue!
  • I thought someone mentioned that the FundRef name has changed? Can you remind me?
  • I'm working on the monthly update and want to be sure I understand what will come out of this issue.

Here's my shot:

We are exploring using a methodology similar to the GDCC-authored script that retrieves ROR/Author Affiliation, in order to retrieve the Agency and Agency identifier from the FundRef registry. At the end we will have a demoable UI.

Coming out of our meeting on the topic of PIDs we decided that our initial level of support for Fundref will include:

  • The UI displaying information that is easily interpreted by the user
    • Code, javascript and a test metadatablock in the other repo
  • The end result is data into the database
  • The data to be stored to the database will include the text and a key.
    • There was some discussion around what the key field should be.
    • For this issue it was agreed to use the funder PID URL
  • Once it's working we'll get it out on demo.

@mreekie
Author

mreekie commented Feb 7, 2023

Bump: @pdurbin @jggautier

  • I went through the meeting notes and ended up with a bit more on the update

@mreekie mreekie added the D: 5 Core PIDs Deliverable Increment defining how we support the 5 core PIDs label Feb 7, 2023
@mreekie
Author

mreekie commented Feb 8, 2023

Sprint resizing

There are 2 streams of work:

  • JSON export
  • API related issues
  • There continues to be a lot of uncertainty around the details, so discovery still happening.
  • Probably some more follow-up with Julian to nail down the scope/definition of done.
  • no change in sizing for new sprint

@jggautier
Contributor

jggautier commented Feb 9, 2023

Thought I'd try summarizing some of the uncertainty as I understand it:

  • Results from the API can be pretty poor (as @pdurbin has said)
    This is based just on searches for well-known funding agencies, and it can make the suggestions less useful. When you type in National Science Foundation or NIH, what we expect isn't at the top of the results. The order of the results is exactly the order in the JSON document that's returned. I'm not sure what that order is, but it doesn't follow guidelines for prioritizing more relevant results (such as prioritizing whole-word matches and matches in certain fields, like name instead of alternative name).

    I don't know if it's possible to make the API return results that are sorted by relevance. I tried some of the parameters of the API (documented at https://github.com/CrossRef/rest-api-doc as @pdurbin pointed out) and there's a sort by relevance/score parameter, but when I try it on https://api.crossref.org/funders, e.g. https://api.crossref.org/funders?query=NIH&sort=relevance, I get errors that I think say that sorting by relevance isn't supported when searching for funders. The parameter does work when searching for "works", e.g. https://api.crossref.org/works?query=R34MH115769&sort=relevance.

    @pdurbin have you played around with this? If it's possible to have the API sort funder results by relevance, is this useful for this javascript? Or if it's not possible, should we reach out to Crossref to ask if it's currently possible to get the API to sort funders in the results by relevance?

  • Funder name can't be searched on
    When I looked at the functionality on https://dev1.dataverse.org today, it seemed like the PID (DOI) of the funder is indexed and so it can be searched on, but the funder name isn't indexed. There's a dataset on that test server with National Institutes of Health in the Funding Information Agency field, but when I search for National Institutes of Health, that dataset isn't returned. And when I search for it with quotes, "National Institutes of Health", nothing is returned.

    This will affect how findable those datasets are when people search by funder name, which I suspect they'll do more often than they search by PID.

  • Display issue after choosing funder from suggestions
    When I choose from the list of suggested funders, there's a UI display issue that @pdurbin's already mentioned (and might have already fixed?)
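Until the funders endpoint supports sorting by relevance, the script could re-rank the returned items itself. A minimal sketch of the kind of prioritization described in the first point above (exact and whole-word matches in the name ahead of alternative names); the scores are arbitrary assumptions, not Crossref's ranking, and `rankFunders` is a hypothetical helper.

```javascript
// Split text into lowercase word tokens (letters and digits, any script).
function words(s) {
  return String(s).toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Arbitrary scores: 3 = exact match on name or an alt-name,
// 2 = every query word is a whole word in the name,
// 1 = the same, but in an alt-name, 0 = anything else.
function relevance(item, term) {
  const q = String(term).trim().toLowerCase();
  const name = String(item.name || "").toLowerCase();
  const alts = (item["alt-names"] || []).map((a) => String(a).toLowerCase());
  if (name === q || alts.includes(q)) return 3;
  const qWords = words(q);
  const containsAll = (text) => {
    const w = words(text);
    return qWords.every((x) => w.includes(x));
  };
  if (containsAll(name)) return 2;
  if (alts.some(containsAll)) return 1;
  return 0;
}

// Highest score first; ties keep the order the API returned them in.
function rankFunders(items, term) {
  return items
    .map((item, index) => ({ item, index, score: relevance(item, term) }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .map((entry) => entry.item);
}
```

Whole-word matching is what keeps a prefix hit like "Nihon" from outranking a real "NIH" acronym match.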

@mreekie you wrote in the first comment that the definition of done includes "Code, javascript and a test metadatablock in the other repo". That's so that we can evaluate it, right? I've been writing that I plan to test it with users (particularly users who often add funding metadata to their datasets). But I think right now we're trying to resolve what we know would be issues if this were tested with users right now.

@qqmyers
Member

qqmyers commented Feb 9, 2023

FWIW: Names should appear if there is a facet and in advanced search. These, and I think basic search as well, depend on the expandedValue that @pdurbin hasn't yet completed. Once that's done, I think these will be findable even by any i18n names they have.

@jggautier
Contributor

jggautier commented Feb 10, 2023

Thanks @qqmyers. Then I'll hold off on reviewing and asking more questions.

@pdurbin pdurbin moved this from IQSS Team - In Progress 💻 to This Sprint 🏃‍♀️ 🏃 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Feb 15, 2023
@pdurbin pdurbin removed their assignment Feb 15, 2023
@pdurbin
Member

pdurbin commented Feb 17, 2023

I unassigned myself and haven't been actively working on this. Anyone else is welcome to pick it up. I'm happy to give a brain dump.

Some scratch work on the Dataverse side: pdurbin@ba4575a

Some scratch work on the external vocab side: gdcc/dataverse-external-vocab-support@main...pdurbin:dataverse-external-vocab-support:scratch3

@qqmyers qqmyers self-assigned this Feb 21, 2023
@scolapasta scolapasta moved this from This Sprint 🏃‍♀️ 🏃 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Mar 1, 2023
@qqmyers
Member

qqmyers commented Mar 13, 2023

With #9402 and gdcc/dataverse-external-vocab-support#14, there's basic support for using the Crossref funder registry with the grantNumberAgency field and the Research Organization Registry with the authorAffiliation field. As is, they work roughly like the original ORCID script: they allow selection from a list of choices, display that choice correctly in the various parts of the display (see #9150), and capture/provide additional info in the JSON and OAI-ORE exports. Both sort results to prioritize entries where the entry is 'active' (ROR only), where the typed text matches the acronym or relevant tokens (fundreg only), and, lastly, where the entry has been used by that user before. This means that NIH appears near the top in both fields to start and, once the user has selected the real NIH entry once, it is always at the top. (The same applies if they once select a bad one, until they clear the browser's localStorage cache.) Values are truncated to fit the narrow inputs allowed for child fields. The fundreg script sends a mailto: address so that Crossref will put the requests on its priority queue. Both use cached values rather than pinging the services every time a term is displayed.
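Two of the behaviors described above can be sketched in a few lines, under assumptions: the `mailto` query parameter is the one Crossref's REST API documentation asks clients to send, and the history lives in any `localStorage`-compatible object (something with `getItem`/`setItem`). The storage key `fundregHistory` and all function names are hypothetical, not taken from the actual fundreg script; this sketch also puts previously used entries ahead of everything else, which is only one reading of the ordering described.

```javascript
// Hypothetical storage key; the real fundreg script may use a different one.
const HISTORY_KEY = "fundregHistory";

// Build a Crossref /funders search URL. The mailto parameter identifies the
// client so Crossref can route the request to its "polite" pool.
function funderQueryUrl(term, mailto) {
  const params = new URLSearchParams({ query: term });
  if (mailto) params.set("mailto", mailto);
  return "https://api.crossref.org/funders?" + params.toString();
}

// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function readHistory(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY));
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    return []; // missing or corrupt entry: start with an empty history
  }
}

// Record a selection, most recent first, capped at an arbitrary 50 entries.
function rememberFunder(storage, uri) {
  const history = readHistory(storage).filter((u) => u !== uri);
  history.unshift(uri);
  storage.setItem(HISTORY_KEY, JSON.stringify(history.slice(0, 50)));
}

// Move previously selected funders to the front, keeping relative order otherwise.
function preferPreviouslyUsed(items, storage) {
  const used = new Set(readHistory(storage));
  return [...items.filter((it) => used.has(it.uri)), ...items.filter((it) => !used.has(it.uri))];
}
```

In the browser this would be called as `rememberFunder(window.localStorage, selected.uri)` when a suggestion is chosen; clearing that storage resets the ordering, which matches the "bad selection" caveat above.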

That said, there are a variety of issues that could be addressed now or later:

  • There is no internationalization - something similar to what was done for the previewers could be added (possibly to the cvocutils.js script)
  • The ROR search for NIH can still be confusing since there are several orgs with that acronym. The real NIH could be prioritized in a couple ways:
    • pre-cache the NIH and any other important entries so that sort will prioritize them
    • inspect the response from ROR and see if other fields would allow a better sort (possibly location based sort?) - this is constrained by what ROR provides.
  • The current widget is not responsive and doesn't shrink/grow well with page size changes. This might be improvable via CSS, although I've seen some limitations in the select2 widget itself in terms of it hardcoding width as it is launched (based on querying its current parent). An alternative would perhaps be to switch to the pop-up concept used in the original (non-working) ror script. The main display then only includes a button to launch the widget, and the widget can then be sized independently of the size of the underlying input space on the Dataverse page.
  • The current scripts do not make the entries links out to external web information. For fundreg, the main issue is that there is no info about a web page for the org, just links to the raw json information in the registry (which is ugly/confusing). For ROR, there are optional fields that could be used - "links" seems to usually have one entry to the org homepage. There is also an optional wikipedia link that could be used. If someone can propose a concrete algorithm (i.e. use the first "links" entry if it exists, use the wikipedia link if that exists and links is empty, and if there is no entry in either, don't make a link) then it should be relatively easy to implement (though the example here means that some ROR entries won't be linked and others will which could be confusing).
  • I have not checked a11y - there isn't much styling that isn't inherited from Dataverse, so I expect these scripts are probably OK.
  • Fundreg can be slow and did give a 504/gateway timeout during my testing at one point. This hopefully won't occur often, but would result in the raw id number showing if the user had never cached the entry before. As noted, I did do what Fundreg requests to get improved performance. Their next suggestion is basically to run your own copy of the api instead which could be done if this becomes a real issue. (By contrast, ROR seems to be very quick.)
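The linking rule proposed in the list above (first "links" entry, else the Wikipedia link, else no link) is concrete enough to sketch. This assumes ROR v1 record fields named `links` (an array of URLs) and `wikipedia_url` (a string that may be empty); those names should be checked against the ROR API version in use, and `rorHomepage` is a hypothetical helper.

```javascript
// Return the URL to link a ROR organization to, or null for no link.
// Assumed ROR v1 fields: org.links (array of URLs), org.wikipedia_url (string).
function rorHomepage(org) {
  const links = Array.isArray(org.links) ? org.links.filter(Boolean) : [];
  if (links.length > 0) return links[0]; // 1) first "links" entry
  if (org.wikipedia_url) return org.wikipedia_url; // 2) Wikipedia, if links is empty
  return null; // 3) neither: show the name as plain text
}

console.log(rorHomepage({ links: [], wikipedia_url: "https://en.wikipedia.org/wiki/Example" }));
// prints: https://en.wikipedia.org/wiki/Example
```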

@qqmyers
Member

qqmyers commented Mar 13, 2023

Also FWIW: I think this means that the current ORCID script (possibly with some of the cleanup/improvements applied to the fundreg/ror scripts) could be applied to the author child field now. (It was designed to work with a primitive field and/or to fill in all children of a parent field, and may or may not have worked to just fill in the author name child field in the citation block.)

@mreekie
Author

mreekie commented Mar 14, 2023

Daily

  • Sounds like the plan is to get through what Jim has done for a PR.
  • Do a demo for Sonia for appropriateness. If what we have is enough then close this.
  • Then revisit what's left as separate issues.

4 participants