Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API only returns Deaccessioned Dataset Version if user has edit privileges on the version #10164

Closed
ekraffmiller opened this issue Dec 4, 2023 · 5 comments · Fixed by #10191
Labels
pm.GREI-d-2.7.1 NIH, yr2, aim7, task1: R&D UI modules for creating datasets and supporting publishing workflows pm.GREI-d-2.7.2 NIH, yr2, aim7, task2: Implement UI modules for creating datasets and publishing workflows Size: 10 A percentage of a sprint. 7 hours. Type: Bug a defect
Milestone

Comments

@ekraffmiller
Copy link
Contributor

What steps does it take to reproduce the issue?

  1. Create a deaccessioned version
  2. from the API, try to get the version using another account that doesn't have edit privileges on that version:

http://localhost:8000/api/v1/datasets/:persistentId/versions/1.0?persistentId=doi:10.5072/FK2/VV0A0P&includeDeaccessioned=true&includeFiles=false

  1. The API returns an error message instead of the dataset version
    {"status":"ERROR","message":"Dataset version 1.0 of dataset 2 not found"}
  • When does this issue occur?
    Using API from SPA to view Deaccessioned version

  • Which page(s) does it occurs on?
    SPA Dataset View Page

  • What happens?
    It shows the latest published page, or page not found

  • To whom does it occur (all users, curators, superusers)?
    all

  • What did you expect to happen?
    Return the Dataset Version

Which version of Dataverse are you using?
The latest unstable version from Docker Hub:
https://hub.docker.com/layers/gdcc/dataverse/unstable/images/sha256-d552309321453989fc55ddabe12c8ff996e039db77738353fe15246dc85c6aca?context=explore

Any related open or closed issues to this bug report?
no

Screenshots:

Screenshot 2023-12-02 at 4 57 01 PM Screenshot 2023-12-02 at 4 55 32 PM
@ekraffmiller ekraffmiller added the Type: Bug a defect label Dec 4, 2023
@GPortas GPortas added Size: 10 A percentage of a sprint. 7 hours. pm.GREI-d-2.7.1 NIH, yr2, aim7, task1: R&D UI modules for creating datasets and supporting publishing workflows pm.GREI-d-2.7.2 NIH, yr2, aim7, task2: Implement UI modules for creating datasets and publishing workflows labels Dec 6, 2023
@GPortas GPortas moved this from ▶ SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Dec 6, 2023
@jp-tosca jp-tosca self-assigned this Dec 8, 2023
@jp-tosca jp-tosca moved this from This Sprint 🏃‍♀️ 🏃 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Dec 8, 2023
@jp-tosca
Copy link
Contributor

Hi @ekraffmiller 👋🏼

I have a possible solution but I wanted to check that "this is a bug not a feature" 😆.

After having a fix on my environment I ran the DatasetsIT to verify that this change is not breaking any tests but I found that the following tests will show an error:

  • getDownloadSize
  • getVersionFileCounts
  • getVersionFile

by the comments on the code:

// Test that the dataset files for a deaccessioned dataset cannot be accessed by a guest

@qqmyers
Copy link
Member

qqmyers commented Dec 12, 2023

@ekraffmiller - trying clarify what is needed: For an unauthorized user, you need the API call with &includeDeaccessioned=true&includeFiles=false to return the datasetversion info, but we still don't want &includeDeaccessioned=true&includeFiles=true to succeed, or for any other API that uses the includeDeaccessioned flag to succeed (like /api/datasets/id/versions/id/files)? Do you also need the {id}/versions/{versionId}/citation api to work? The reason for the question is that all of these calls use the getDatasetVersionOrDie method right now and if we need some but not all of the api calls to change behavior, we can't just let getDatasetVersionOrDie always return the deaccessioned version when asked regardless of permission. (I think trying that kind of fix is what causes the tests to fail for @jp-tosca.) If only some of the API calls change, we probably need something like a boolean flag for getDatasetVersionOrDie to ignore permissions that only works for those calls.

@pdurbin
Copy link
Member

pdurbin commented Dec 13, 2023

This new email from Dario seems related:

"Subject: Deaccessioning a dataset

We still have a problem with a deaccessioned dataset.
We already had a similar problem, as described here: (https://groups.google.com/g/dataverse-community/c/rTMkdCfYzg8/m/1NzV40DQAwAJ)
We solved that problem upgrading the version.

But now, the problem is still there, we cannot list the dataset in the API, because the dataverse platform won't return any answer when we issue API of the following form:
https://dataverse.unimi.it/api/search?q=*&type=dataset&key=
while the dataset is here: (https://dataverse.unimi.it/dataset.xhtml?persistentId=doi:10.13130/RD_UNIMI/KI5KBY). The dataverse platform loops without answering.

At the end, can someone clear to us the following questions:

  • In general, what is the use of deaccessioning a dataset and how long maximum should we keep deaccessioned?
  • Our deaccessioned dataset causes that we cannot list all the datasets anymore. Is this a known behaviour or a bug?"

From https://groups.google.com/g/dataverse-community/c/MgkprLAlkfg/m/1Chom4dzAAAJ

I'm not sure if the Search API is in scope for this issue or not. @jp-tosca specifically, perhaps as you're writing tests you could make assertions about search behavior and visibility while you're in there.

Here are the rules according to the Dataverse 4.0 design doc on Deaccessioning:

Screenshot 2023-12-13 at 10 59 12 AM

@ekraffmiller
Copy link
Contributor Author

ekraffmiller commented Dec 13, 2023

@ekraffmiller - trying clarify what is needed: For an unauthorized user, you need the API call with &includeDeaccessioned=true&includeFiles=false to return the datasetversion info, but we still don't want &includeDeaccessioned=true&includeFiles=true to succeed, or for any other API that uses the includeDeaccessioned flag to succeed (like /api/datasets/id/versions/id/files)? Do you also need the {id}/versions/{versionId}/citation api to work? The reason for the question is that all of these calls use the getDatasetVersionOrDie method right now and if we need some but not all of the api calls to change behavior, we can't just let getDatasetVersionOrDie always return the deaccessioned version when asked regardless of permission. (I think trying that kind of fix is what causes the tests to fail for @jp-tosca.) If only some of the API calls change, we probably need something like a boolean flag for getDatasetVersionOrDie to ignore permissions that only works for those calls.

@qqmyers yes, we want &includeDeaccessioned=true&includeFiles=false to succeed. Also we need the citation ({id}/versions/{versionId}/citation api). For the View Dataset Page, we need to show the dataset version information and the citation, but not the files.

@ekraffmiller
Copy link
Contributor Author

Hi @ekraffmiller 👋🏼

I have a possible solution but I wanted to check that "this is a bug not a feature" 😆.

After having a fix on my environment I ran the DatasetsIT to verify that this change is not breaking any tests but I found that the following tests will show an error:

  • getDownloadSize
  • getVersionFileCounts
  • getVersionFile

by the comments on the code:

// Test that the dataset files for a deaccessioned dataset cannot be accessed by a guest

@jp-tosca I think those calls you mention are all related to files, and since an unauthorized user doesn't have access to the deaccessioned files, then it is ok for these calls to fail, but we should check that with the rest of the team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pm.GREI-d-2.7.1 NIH, yr2, aim7, task1: R&D UI modules for creating datasets and supporting publishing workflows pm.GREI-d-2.7.2 NIH, yr2, aim7, task2: Implement UI modules for creating datasets and publishing workflows Size: 10 A percentage of a sprint. 7 hours. Type: Bug a defect
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants