Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that datasets published using the Dataverse APIs include a license or have Terms of Use #9148

Closed
jggautier opened this issue Nov 8, 2022 · 18 comments
Assignees
Labels
Feature: API Feature: Terms & Licensing Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Milestone

Comments

@jggautier
Copy link
Contributor

jggautier commented Nov 8, 2022

Overview of the Feature Request

What kind of user is the feature intended for?
API User, Curator, Superuser, Sysadmin

What inspired the request?
Noticing that after the multiple license update (v.5.10) was applied to the Harvard Dataverse Repository, some datasets have been published using the Dataverse APIs without a license (CC0 is the only one available in the dropdown for now) and with nothing entered in the Terms of Use field.

The GUI enforces the requirement to choose a license from the License/Data Use Agreement dropdown or enter something in the Terms of Use field, in order to ensure that the metadata includes some information that makes explicit how the data can be used, but the API doesn't enforce this requirement.

What existing behavior do you want changed?
Depositors publishing datasets using the APIs must include a license or add something in the Terms of Use field, to match the rules enforced by the GUI.

Any related open or closed issues to this feature request?
#7440

@jggautier
Copy link
Contributor Author

jggautier commented Nov 8, 2022

Since the multiple license update, many of the datasets published in the Harvard Dataverse Repository with no license and no terms of use are from one collection. I'm speaking with the admin of that collection (see email thread at https://help.hmdc.harvard.edu/Ticket/Display.html?id=329217) to learn why the datasets continue to be published without a license or terms of use and what they will do if/when it's no longer possible to do this (when the API enforces the same requirement as the GUI).

@mreekie mreekie added the pm.netcdf-hdf5.d All 3 aims are currently under this deliverable label Nov 8, 2022
@pdurbin
Copy link
Member

pdurbin commented Nov 18, 2022

Over in this issue @sergejzr seems to be having trouble adding a license via API:

@mreekie mreekie removed the pm.netcdf-hdf5.d All 3 aims are currently under this deliverable label Nov 28, 2022
@mreekie
Copy link

mreekie commented Nov 28, 2022

@pdurbin pinged me on this. I don't recall this issue and I didn't leave breadcrumbs 😞 So I removed the NIH: NetCDF label.

@mreekie
Copy link

mreekie commented Jan 18, 2023

prio meeting:

  • Depositors publishing datasets using the APIs must include a license or add something in the Terms of Use field, to match the rules enforced by the GUI.
  • Sized at a 10.

@mreekie mreekie added the Size: 10 A percentage of a sprint. 7 hours. label Jan 18, 2023
@mreekie mreekie added this to New in deleteMeAfterTesting via automation Feb 22, 2023
@mreekie mreekie removed this from New in deleteMeAfterTesting Feb 23, 2023
@mreekie mreekie added this to New in deleteMeAfterTesting via automation Feb 23, 2023
@mreekie mreekie removed this from New in deleteMeAfterTesting Feb 23, 2023
@mreekie mreekie added this to New in deleteMeAfterTesting via automation Feb 23, 2023
@mreekie mreekie removed this from New in deleteMeAfterTesting Feb 23, 2023
@mreekie mreekie added this to New in deleteMeAfterTesting via automation Feb 23, 2023
@mreekie mreekie removed this from New in deleteMeAfterTesting Feb 23, 2023
@pdurbin pdurbin moved this from ▶ SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Mar 29, 2023
@landreev landreev moved this from This Sprint 🏃‍♀️ 🏃 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Apr 26, 2023
@landreev landreev self-assigned this Apr 26, 2023
@landreev
Copy link
Contributor

Happy to work on this, since I just spent some time looking into terms of use bugs for a user;
not sure if it's only a 10 though, on our sausage scale. As it does require fixing a couple of known bugs (will link).

@landreev
Copy link
Contributor

landreev commented Apr 26, 2023

There is the #9155 (already linked, above): "round trip" for license is broken in the api; i.e., the format in which we export the license is incompatible with what the update api accepts. Needs to be fixed for sure as part of this.
But there is also #9364 - this is is pretty much a duplicate of this issue, if I'm reading it correctly [Edit: not exactly a duplicate, but related].
There is also yet another bug, that may not be explicitly mentioned in the ones above, where a dataset with defined terms of use loses them if updated via API, without the license being explicitly defined in the supplied json. (This is not how the API is supposed to work - it should not erase or modify any fields not specifically mentioned [EDIT: not true! that's exactly how the API works. So this part may be addressed simply by fixing the information in the Guide] ).

And yet another, small thing is that for the CC BY 4.0 license its name is spelled differently in the json that we distribute (and, therefore, in the database definition) and in the License.properties file. But this should be an easy fix.

All in all, not sure if all of the above is a "10", but I'll do my best to fix it in the 2 days remaining this week.

@landreev
Copy link
Contributor

(So I will just add the extra "closes" tags for the related issues above to the final pr)

@landreev landreev changed the title Feature Request/Idea: As a curator or collection manager, I'd like to ensure that datasets published using the Dataverse APIs include a license or have Terms of Use Ensure that datasets published using the Dataverse APIs include a license or have Terms of Use Apr 27, 2023
@jggautier
Copy link
Contributor Author

Just an update about the bit of discovery I wrote about in November: Last month the collection admins let us know that the datasets that were missing licenses and terms metadata were always meant to be released in the public domain, and not indicating this in the metadata was "an oversight." They didn't realize that the datasets were missing this metadata.

I helped them adjust their code so that they can update their datasets with CC0 waivers and continue adding datasets with CC0 waivers.

@landreev
Copy link
Contributor

Cool, thanks.
So, you don't want to default to CC0 when the dataset is created, unless something else is specified - ?

@landreev
Copy link
Contributor

The way the publish GUI enforces this requirement, that the user chooses a license or enters some terms of use, is it builtin? Or do we allow other instances to disable the requirement somehow? (I took a quick look, but couldn't immediately find the answer)

@jggautier
Copy link
Contributor Author

jggautier commented Apr 27, 2023

I would say that if we decide for CC0 to be the default when API users pass metadata that doesn't include a license or custom terms of use, then they should be made aware of that default before they can publish the dataset. The one user I spoke with didn't know that there was no waiver/license/terms of use being applied to the datasets they publish.

Sherry suggested adding text in the API guides about how a license is required and how a default will be applied if none is included in the JSON file, which I agree should be done. But there's at least one user, the one I chatted with in November, and maybe there are more, who are already using the API and wouldn't think to visit the API guide to learn about the new behavior and requirements.

Considering how installations can configure what options their depositors have for waivers, licenses, and custom terms is what I think Jim was writing about in the other issue, too. Installations can require that datasets published through the GUI need to include a license from the list and can prevent them from entering custom terms in the Terms of Use fields.

What if, when someone tries to publish a dataset using an API but doesn't provide a "valid" waiver/license or custom terms, they get an error message saying what's required? Or a warning message saying that the installation's default will be applied if they proceed to publish the dataset? I think this would more closely mirror the GUI's depositor workflow, where depositors are told before they publish what license or custom terms will be applied to their dataset.

@landreev
Copy link
Contributor

OK, so it sounds like defaulting to CC0 (or defaulting to anything) quietly would not be a good idea, so I'm not going to consider it going forward.

@landreev
Copy link
Contributor

So this was my idea of how to resolve this issue:

  • The publish API would require a valid license or terms defined. If not defined, it would return an error message.
  • Fix a couple of bugs in the create/modify dataset api that are currently making it difficult or impossible to set or change the license via the api. I.e., make it possible/easy for a user to add the license before publishing.

I wasn't going to add a way to add a license during publishing, by the same /publish api call. I really hope that's not necessary.

Does this make sense?

@jggautier
Copy link
Contributor Author

Yeah I'm not a fan of being quiet here (although I'm quiet in most places).

Your idea makes sense to me.

@landreev
Copy link
Contributor

One thing I should've added, to the list of things/bugs that need to be fixed to resolve this, the guide needs to be fixed where it currently has misleading information.

@landreev
Copy link
Contributor

I agree with Sherry, that we should add the default CC0 to the example json we provide in the guide and suggest to use as a model.

@landreev landreev added Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) and removed Size: 10 A percentage of a sprint. 7 hours. labels Apr 27, 2023
landreev added a commit that referenced this issue Apr 27, 2023
landreev added a commit that referenced this issue Apr 28, 2023
landreev added a commit that referenced this issue Apr 28, 2023
…n files... this

of course may break some other tests. one way to find out. #9148
@landreev
Copy link
Contributor

As for "defaulting quietly [to CC0]", to be clear, I'm leaving the behavior intact, where it does in fact default to the default license, but ONLY if the instance does NOT allow custom terms. So, this does not affect our prod. instance.

@pdurbin
Copy link
Member

pdurbin commented May 9, 2023

Closing manually because this PR was merged even though it says closed:

@pdurbin pdurbin closed this as completed May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: API Feature: Terms & Licensing Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

4 participants