Bulk edit file tags and description via update_file_metadata() #49

Open
adam3smith opened this issue Jan 28, 2020 · 4 comments

@adam3smith
Contributor

adam3smith commented Jan 28, 2020

a suggested code or documentation change, improvement to the code, or feature request:

User story: As a user or curator, I have a list of files on dataverse and want to add tags and descriptions from a spreadsheet to them.

While it may be(?) possible to update file metadata as part of update_dataset(), that's quite messy (and I'm not 100% sure it's even possible -- I couldn't get it to work). The native API offers a dedicated endpoint for this, which I think we should expose through the functions named in the issue title:

http://guides.dataverse.org/en/latest/api/native-api.html#updating-file-metadata
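
To make the user story concrete, here is a minimal sketch of turning spreadsheet rows into one metadata payload per file. It is Python rather than R and is not part of the package. The column names (`file_id`, `description`, `tags`), the `;` tag separator, and the helper name are assumptions for illustration; the JSON field names `description` and `categories` are as I recall them from the native API guide linked above, so check them against the guide before relying on them.

```python
import csv
import io
import json


def payloads_from_spreadsheet(csv_text):
    """Map each file ID to a JSON string describing its new metadata.

    Hypothetical helper: rows with neither a description nor tags are skipped.
    """
    payloads = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        metadata = {}
        description = (row.get("description") or "").strip()
        if description:
            metadata["description"] = description
        # Tags are assumed to be separated by ";" inside a single cell.
        tags = [t.strip() for t in (row.get("tags") or "").split(";") if t.strip()]
        if tags:
            metadata["categories"] = tags
        if metadata:
            payloads[row["file_id"].strip()] = json.dumps(metadata, sort_keys=True)
    return payloads


sheet = """file_id,description,tags
101,Interview transcript,Documentation;Data
102,,Code
103,,
"""
for file_id, json_data in payloads_from_spreadsheet(sheet).items():
    print(file_id, json_data)
```

Each resulting string would then be sent as the body of one update call per file. Note that sending only these fields runs into the overwrite problem discussed further down the thread.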

Thoughts?

@wibeasley
Contributor

I've never encountered this use case, but it sounds helpful for your scenario.

  1. Do you think the R package should have any additional validity or safety checks, beyond what the API would perform? (Such as making sure the files exist, or tags aren't overwritten, or something like that.) Or is that overkill and one more thing to unnecessarily maintain?

  2. This seems like a function that not many people would think to use. I'd hate for you to spend time on it and for it not to be adopted like it should. Is there some way you can advertise it so more people see they could benefit from it? Maybe a section in a vignette?

@adam3smith
Contributor Author

  1. I think if the file doesn't exist or the user in question doesn't have edit permission, the API will just throw an error (as it should), so I'm less worried about that -- overwriting existing metadata is a real concern, though. One option would be to have a parameter overwrite = FALSE (or so) in the update function. Since the API call overwrites, that would mean first making a get-metadata call and then updating only the fields passed explicitly in the call, leaving the rest as retrieved. That would work and would actually be quite useful, I think -- is that inelegant, though?

  2. I honestly don't know how people use this package. Issues I am raising are all motivated by my/our own use cases. My guess would be that this is most attractive to power-users and curators, though I do think that the ability to programmatically add folder names post-hoc is actually quite attractive. I don't think it'd be used less than other functions -- e.g. I'd be surprised if update_dataset() sees much use.
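
The overwrite = FALSE idea from point 1 can be sketched as a small merge step. This is an illustrative Python sketch, not the package's implementation: the function name and the convention that `None` means "not supplied" are assumptions, and the field names (`description`, `directoryLabel`, `categories`) are as I recall them from the native API guide.

```python
def build_update(current, explicit, overwrite=False):
    """Return the metadata to send for one file.

    current:  metadata fetched beforehand with a get-metadata call
    explicit: fields the caller passed; None means "not supplied"
    """
    supplied = {field: value for field, value in explicit.items() if value is not None}
    if overwrite:
        # Send only what was supplied; the API drops everything else.
        return supplied
    # Start from the existing metadata and replace only the supplied fields.
    merged = dict(current)
    merged.update(supplied)
    return merged


current = {
    "description": "Old description",
    "directoryLabel": "data/raw",
    "categories": ["Data"],
}
explicit = {"description": "New description", "categories": None}

print(build_update(current, explicit))
print(build_update(current, explicit, overwrite=True))
```

With the default, the folder and the tags survive and only the description changes; with `overwrite=True` the payload holds the description alone, which is the behaviour that currently strips the other fields.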

I'm personally fine with having functionality that's going to be used by a small number of people, and my approach would be to aim to have the client mirror the native API in its abilities pretty closely, but I'm perfectly fine to be convinced otherwise.

A different approach would be to make it easier to construct native API queries from scratch using the client, so that we don't have to code everything but it's still readily accessible. I don't know how hard that would be, but when I wrote API functionality (for downloading .zip files) locally, it required quite a bit of code duplication from the dataverse library.
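
As a rough illustration of that second approach, a single generic request builder could look like the Python sketch below. The function name and signature are hypothetical, and it only constructs the request without sending it. The `X-Dataverse-key` header and the `/api/files/{id}/metadata` path are taken from the native API guide as I remember it.

```python
import urllib.request


def native_api_request(server, path, api_token, method="GET"):
    """Build (but do not send) a request against a native API endpoint."""
    url = server.rstrip("/") + "/api/" + path.lstrip("/")
    request = urllib.request.Request(url, method=method)
    # Dataverse reads the API token from this header.
    request.add_header("X-Dataverse-key", api_token)
    return request


request = native_api_request(
    "https://demo.dataverse.org/",   # example server
    "files/42/metadata",             # 42 is a made-up file ID
    api_token="xxxx-xxxx",           # placeholder token
    method="POST",
)
print(request.get_method(), request.full_url)
```

With a helper like this, each new endpoint needs only a path and a body instead of its own copy of the URL and authentication code.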

@pdurbin
Member

pdurbin commented Jan 29, 2020

In open source we scratch our own itches and it sounds to me like @adam3smith has one. 😄

Also, this is a real use case I've talked to @sbarbosadataverse about, perhaps when I gave an API talk a few months back: "As a user or curator, I have a list of files on dataverse and want to add tags and descriptions from a spreadsheet to them."

@adam3smith
Contributor Author

FWIW, for now I have implemented this in our local curation package -- it works pretty great (there's a bulk function further down).
If you look at that function, I think lines 76-86 could be a one-liner with #51 implemented, so that'd be great.

The other thing I noticed while writing this is that existing file-handling functions like add_dataset_file() and update_dataset_file() are, at least for my purposes, effectively useless because they don't expose all file metadata fields -- since the API overwrites unspecified file metadata, they remove folders and tags from files...

@kuriwaki changed the title from "Add update_file_metadata() a and get_file_metadata() functions" to "Bulk edit file tags and description via update_file_metadata()" on Dec 30, 2020