Downloading multiple files #46

Closed
1 task done
adam3smith opened this issue Jan 10, 2020 · 6 comments · Fixed by #47
Labels
data-download Functions that are about downloading, not uploading, data

Comments

@adam3smith
Contributor

adam3smith commented Jan 10, 2020

Please specify whether your issue is about:

  • a question about package functionality

I think this is just a question, but it might also be an enhancement request or bug report: the Dataverse API allows downloading multiple files as a .zip. This is particularly relevant now because the zip preserves the folder structure where available.
There is code in the get_file() function that accesses this functionality, but I don't think it's ever possible to get there: I find no way of specifying multiple file IDs.

So first question:

  1. Am I right about this? Or could someone give me syntax to do this in get_file()?
  2. If I'm right that this isn't possible, what would be a good way to do this? Allow a vector of ids as input for the file parameter?
@pdurbin
Member

pdurbin commented Jan 10, 2020

I don't mean to muddy the waters, but there is a conversation going on about the :ZipDownloadLimit setting that comes into play here: https://groups.google.com/d/msg/dataverse-community/V1gExuDnm0A/nR4FIU1QBgAJ . Just something to be conscious of.

The Dataverse API absolutely does allow you to ask the Dataverse server to zip up a bunch of files by passing a comma-separated list of database IDs for files: http://guides.dataverse.org/en/4.18.1/api/dataaccess.html#multiple-file-bundle-download
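As a rough illustration of the comma-separated form described above, the request URL could be assembled like this. This is only a sketch: `bundle_download_url()` is a hypothetical helper, not part of the dataverse package, and the `https://<server>/api/` prefix is an assumption about how the base URL is formed.

```r
# Hypothetical helper (not part of the dataverse package): build the URL for
# the multiple-file bundle download from a numeric vector of file database IDs.
bundle_download_url <- function(server, fileids) {
  ids <- as.integer(fileids)          # avoids "1e+05"-style formatting of large IDs
  stopifnot(length(ids) > 0, !anyNA(ids))
  # Assumption: the API root is https://<server>/api/
  paste0("https://", server, "/api/access/datafiles/",
         paste(ids, collapse = ","))
}

u <- bundle_download_url("demo.dataverse.org", c(101, 102, 103))
print(u)
```

The server answers such a request with a single zip archive, subject to whatever :ZipDownloadLimit the installation has configured.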

You could also create the zip file client side, but this is more work (though easier on the server). You'd need to get the file hierarchy from the directoryLabel field in the metadata: https://dev2.dataverse.org/api/datasets/export?exporter=dataverse_json&persistentId=doi%3A10.5072/FK2/V8C0XO
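A minimal sketch of the client-side idea, assuming each file entry in the dataset metadata carries a `label` (the file name) and an optional `directoryLabel` (the folder). The `files` list below is made-up illustration data, and `relative_path()` is a hypothetical helper.

```r
# Made-up metadata entries, shaped like the file records in the dataset JSON
# (assumption: `label` is the file name, `directoryLabel` the optional folder).
files <- list(
  list(label = "readme.txt"),                              # no folder: top level
  list(label = "survey.csv", directoryLabel = "data/raw"),
  list(label = "clean.R",    directoryLabel = "code")
)

# Hypothetical helper: rebuild the path of one file relative to the dataset root.
relative_path <- function(f) {
  dir <- f$directoryLabel
  if (is.null(dir) || !nzchar(dir)) f$label else file.path(dir, f$label)
}

paths <- vapply(files, relative_path, character(1))

# Recreate the hierarchy locally. Empty placeholder files are written here;
# in practice the bytes would come from downloading each file individually.
out <- file.path(tempdir(), "dataset")
for (p in paths) {
  target <- file.path(out, p)
  dir.create(dirname(target), recursive = TRUE, showWarnings = FALSE)
  writeBin(raw(0), target)
}
print(paths)
```

Once the files are on disk in their folders, the directory could be zipped locally if an archive is still wanted.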

@adam3smith
Contributor Author

Thanks @pdurbin -- yes, aware of the file zip limit discussion, but at least I'm using this with QDR where we have a more generous limit.

> The Dataverse API absolutely does allow you to ask the Dataverse server to zip up a bunch of files by passing a comma-separated list of database IDs for files: http://guides.dataverse.org/en/4.18.1/api/dataaccess.html#multiple-file-bundle-download

Yes, that's what I was referring to, and the linked code in get_file() actually implements it; it just never gets called (I think).

@pdurbin
Member

pdurbin commented Jan 10, 2020

@adam3smith ah, I just clicked and I see what you mean:

    fileid <- paste0(fileid, collapse = ",")
    u <- paste0(api_url(server), "access/datafiles/", file)

Yes, that should do the trick, if it gets called. 😄

@adam3smith
Contributor Author

Ah, got it -- this is possible in principle using a numeric vector (as one would expect), but there's a regression from 5ec375b, which missed one of the file --> fileid renames.

I'll submit a PR with added documentation, a test, and the fix.
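To make the effect of the missed rename concrete, here is a small reproduction outside the package. It assumes that `file` still holds the caller's uncollapsed input at that point in get_file(), which the thread does not confirm. `base` is a stand-in for the value of `api_url(server)`.

```r
base   <- "https://demo.dataverse.org/api/"      # stand-in for api_url(server)
file   <- c(101, 102)                            # assumed: the caller's original input
fileid <- paste0(c(101, 102), collapse = ",")    # "101,102"

# As quoted above: the collapsed `fileid` is computed but never used,
# so one URL per ID is produced instead of a single bundle URL.
buggy <- paste0(base, "access/datafiles/", file)

# With the rename applied: a single URL carrying the comma-separated IDs.
fixed <- paste0(base, "access/datafiles/", fileid)

print(buggy)
print(fixed)
```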

@kuriwaki kuriwaki added the data-download Functions that are about downloading, not uploading, data label Dec 3, 2020
@kuriwaki kuriwaki linked a pull request Dec 27, 2020 that will close this issue
3 tasks
@kuriwaki
Member

@adam3smith

Given that #47 "[does not] use the zip functionality of the API at all" and instead stores each file's content in an R list, does this mean we cannot implement a get_zip_* function that returns a zipped file (preferably one that keeps the nested directory structure)?

@kuriwaki
Member

The current functionality and examples (e.g. here in doc) should be enough for the immediate task for this issue.

Further considerations are writing a test for multi-file structures and considering a get_zip_* function that will return a zip file (per #46 (comment)). If that is a useful feature, please open a new issue.


3 participants