
Data Access API file download problem #4373

Closed
conjugateprior opened this issue Dec 10, 2017 · 4 comments

@conjugateprior

I've been trying to figure out how to download one of my own data files (say id 109356) from http://hdl.handle.net/1902.1/FYXLAWZRIA.

The dataverse R package fails to let me do this. Bug report here. (tl;dr the metadata downloads but requesting the file itself gives a 503 error.)

So I turned to the docs to see if I could get it working with curl. The introduction suggests a line of the form

curl -H "X-Dataverse-key: MYKEY" https://demo.dataverse.org/api/datasets/:persistentId?persistentId=hdl:1902.1/FYXLAWZRIA

using the key MYKEY I generated. This failed with the message

{"status":"ERROR","message":"Bad api key 'MYKEY'"}

which seemed odd, since I had just generated that key and copied it from my own user account. So I generated a fresh one and got the same error.

curl "https://demo.dataverse.org/api/datasets/:persistentId?persistentId=hdl:1902.1/FYXLAWZRIA&key=MYKEY"

also fails.
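For reference, the two authentication forms tried above can be wrapped in a small shell sketch. `SERVER`, `API_TOKEN`, and the function names are hypothetical placeholders, not part of Dataverse. One assumption worth checking: an API token is issued by, and only known to, the Dataverse installation where it was generated, so `SERVER` has to match that installation. The commands above query demo.dataverse.org, while the file in question lives on Harvard Dataverse; the thread does not confirm whether that mismatch explains the "Bad api key" error.

```shell
#!/bin/sh
# Sketch only: SERVER and API_TOKEN are placeholders you set yourself.
# The token must come from your account on the same installation as SERVER.
SERVER="${SERVER:-https://dataverse.harvard.edu}"

# Build the native-API URL for a dataset identified by its persistent ID,
# e.g. hdl:1902.1/FYXLAWZRIA
dataset_url() {
  printf '%s/api/datasets/:persistentId?persistentId=%s\n' "$SERVER" "$1"
}

# Form 1: pass the token in the X-Dataverse-key header.
get_dataset_header() {
  curl -H "X-Dataverse-key: $API_TOKEN" "$(dataset_url "$1")"
}

# Form 2: pass the token as the "key" query parameter.
get_dataset_query() {
  curl "$(dataset_url "$1")&key=$API_TOKEN"
}
```

Usage: export `API_TOKEN` set to the key from your account on `SERVER`, then run `get_dataset_header hdl:1902.1/FYXLAWZRIA`. Keeping URL construction in its own function makes it easy to print and inspect the exact URL before sending it.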

@pdurbin
Member

pdurbin commented Dec 10, 2017

@conjugateprior thanks for opening this issue as well as the one about the documentation being confusing: #4374

"how to download one of my own data files (say id 109356)"

This question has come up so often that for #3584 we started showing "Download URL" on file landing pages. It looks like this for the file id you mentioned above:

[Screenshot, 2017-12-10: "Download URL" shown on the landing page for file 109356]

Once you have the download URL for a file you can just paste it into your browser...

[Screenshot, 2017-12-10: the Download URL pasted into the browser's address bar]

... and the file should begin to download, like this:

[Screenshot, 2017-12-10: the file download starting in the browser]

This should work fine from the command line with curl, but I thought screenshots like this would be easier to follow.

Please note that if you're actually using curl, you'll need to add -J or --remote-header-name to have curl save the file with the name that shows in Dataverse (1995-1999 Levels of Source _ Target.tab) rather than the file id (109356):

curl -O -J https://dataverse.harvard.edu/api/access/datafile/109356

shows

curl: Saved to filename '1995-1999 Levels of Source _ Target.tab'
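The same download can be scripted by file id alone, since the Data Access URL needs no dataset or dataverse path. This is a minimal sketch: `SERVER`, `datafile_url`, and `download_datafile` are hypothetical names, and the `-f` flag is an addition not used in the thread, included so that an HTTP error such as a 503 makes curl exit with an error instead of saving the error page as if it were the data file.

```shell
#!/bin/sh
# Sketch only: SERVER defaults to Harvard Dataverse, where file 109356 lives.
SERVER="${SERVER:-https://dataverse.harvard.edu}"

# Data Access API URL for a numeric file id.
datafile_url() {
  printf '%s/api/access/datafile/%s\n' "$SERVER" "$1"
}

# -O saves to a file, -J takes the file name from the Content-Disposition
# header (the name shown in Dataverse), -f fails on HTTP errors instead of
# writing the server's error page to disk.
download_datafile() {
  curl -f -O -J "$(datafile_url "$1")"
}
```

Usage: `download_datafile 109356`.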

@pdurbin added the "UX & UI: Design" and "Feature: File Upload & Handling" labels Dec 10, 2017
@conjugateprior
Author

Great, thanks.

From this I conclude that:

  • the Data Access API is not actually necessary for this kind of data access.
  • files get an id that is not relative to the DV they are in.
  • it is still unclear why API access does not work for my repo and, relatedly, why the API does not accept my token.

The first two would be great things to have, or to feature more prominently, in the API documentation.

Re #3584, I didn't know how public the file was, having previously been obliged to click through some T&Cs in the browser.

For reference, I was planning to embed the data request in an R package. That now works as

> library(httr)
> resp <- GET("https://dataverse.harvard.edu/api/access/datafile/109356")
> content(resp)

No encoding supplied: defaulting to UTF-8.
Parsed with column specification:
cols(
  CODE = col_character(),
  LEVEL = col_character(),
  DESCRIPT = col_character()
)
# A tibble: 17 x 3
     CODE                       LEVEL
    <chr>                       <chr>
 1 <CAPI>                    Capitals

[snip]

which is perfect.

So, this fixes the immediate problem and resolves the ticket. Thanks again.

@pdurbin
Member

pdurbin commented Dec 10, 2017

@conjugateprior sure, but you mentioned a 503 before, and now that I've dug in a bit more, I think you're onto something. Check this out.

I go to the file landing page at https://dataverse.harvard.edu/file.xhtml?fileId=109356 and click "Download" and then "Original File Format (UNKNOWN)" (seeing "UNKNOWN" here is already somewhat suspicious to me):

[Screenshot, 2017-12-10: the "Download" menu offering "Original File Format (UNKNOWN)"]

Then I get a 503 error (with format=original) at https://dataverse.harvard.edu/api/access/datafile/109356?format=original&gbrecs=true (last I checked, gbrecs=true doesn't actually do anything):

[Screenshot, 2017-12-10: the 503 error returned for the format=original request]

So there's something strange going on.
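To compare the two requests without a browser, one can check the HTTP status of the default download and of the `format=original` variant side by side. A minimal sketch with hypothetical helper names (`access_url`, `http_status`); `-w '%{http_code}'` makes curl print only the status code and `-o /dev/null` discards the body. `gbrecs=true` is left out since, as noted above, it did not appear to do anything.

```shell
#!/bin/sh
# Sketch only: SERVER defaults to Harvard Dataverse.
SERVER="${SERVER:-https://dataverse.harvard.edu}"

# Data Access API URL; an optional second argument is sent as format=...
access_url() {
  if [ -n "$2" ]; then
    printf '%s/api/access/datafile/%s?format=%s\n' "$SERVER" "$1" "$2"
  else
    printf '%s/api/access/datafile/%s\n' "$SERVER" "$1"
  fi
}

# Print only the HTTP status code; -s hides progress, -o /dev/null drops the body.
http_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$1"
}
```

Usage: run `http_status "$(access_url 109356)"` and `http_status "$(access_url 109356 original)"` and compare the two codes.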

@pdurbin
Member

pdurbin commented Jul 13, 2018

The "original file" download option is no longer offered for the dataset above because of work on this issue: Make "download as original" disappear from download options when there is no saved original #4796

Here's how it looks now:

[Screenshot, 2018-07-12: the download options for the file as they appear now]

@conjugateprior, who opened this issue, seems fine with using the tab-delimited version of the file, so I think it's safe to close this issue. It's specific to a file on a particular installation of Dataverse (Harvard Dataverse).

@mheppler removed the "UX & UI: Design" label Feb 14, 2020
@pdurbin closed this as completed Oct 7, 2023