Wrong file name for download from a FigShare URL #760

briochemc · 2021-09-18T02:27:08Z

Julia 1.6.2
HTTP v0.9.14
MbedTLS v1.0.3

This minimal example,

using DataDeps, XLSX

register(DataDep(
            "FigShare_dataset",
            "No description needed",
            "https://ndownloader.figshare.com/files/6294558",
            "ba250b1b64b8c43d1130a7aee39674e9ee5936ebb49050c4c224c740bff588b0"
))

XLSX.readxlsx(joinpath(datadep"FigShare_dataset", "rsta20150293_si_001.xlsx"))

downloads a table from FigShare (doi: https://doi.org/10.6084/m9.figshare.3980064.v1) using DataDeps.jl and tries to read it, but fails with "No such file or directory". It fails because the file is saved as 6294558 (the last "word" of the URL) instead of under its correct name, "rsta20150293_si_001.xlsx".

This exact same snippet used to work at some point in the past year, so something has changed since then. When I talked briefly with @oxinabox, he suggested that this could be an issue in HTTP.jl or on FigShare's side. I tried to dig into when/where something changed using blame here, but I could not figure it out, and pinning earlier package versions did not help either.

FWIW, I don't come with just a problem: a workaround (thanks to Lyndon as well) is to rename the file in a post-processing step after the download. So at this stage this is no longer an issue for me, but hopefully posting all these details will help someone here find a fix! 😃
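A minimal sketch of that rename workaround, using only Julia's standard library. The helper name `rename_download` is hypothetical (it is not part of DataDeps.jl or HTTP.jl); it moves the downloaded file to the expected name within the same directory.

```julia
# Hypothetical helper: give a freshly downloaded file its expected name,
# keeping it in the same directory. Returns the new path.
function rename_download(path::AbstractString, newname::AbstractString)
    target = joinpath(dirname(path), newname)
    mv(path, target; force=true)  # overwrite a stale copy if one exists
    return target
end
```

Assuming DataDeps.jl calls `post_fetch_method` with the local path of the fetched file, it could be wired in as `post_fetch_method = path -> rename_download(path, "rsta20150293_si_001.xlsx")` in the `DataDep` constructor.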

fredrikekre · 2021-09-18T07:58:55Z

What filename does curl give?

briochemc · 2021-09-18T10:25:05Z

Apologies if this is dumb, but I'm not sure how to answer that! 😅

oxinabox · 2021-09-18T10:33:31Z

Don't worry, I got this.

❱ curl -Li "https://ndownloader.figshare.com/files/6294558"                    
HTTP/1.1 302 Found
Server: nginx
Date: Sat, 18 Sep 2021 10:28:44 GMT
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Content-Length: 0
Connection: keep-alive
...
Location: https://s3-eu-west-1.amazonaws.com/pstorage-rs-4828782598/6294558/rsta20150293_si_001.xlsx?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-
...
Content-Disposition: attachment;filename=rsta20150293_si_001.xlsx
...

That Content-Disposition header is what we should (and normally do) use to determine the file name.

I wonder if we are getting tricked by the 302 redirect?
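For illustration, here is a rough sketch of pulling the file name out of a Content-Disposition value like the one in the curl output above. The function `disposition_filename` is a hypothetical helper, not HTTP.jl's actual implementation, and it deliberately ignores the extended `filename*=` form.

```julia
# Hypothetical helper (not HTTP.jl's code): extract the `filename` parameter
# from a Content-Disposition header value, or return `nothing` if absent.
function disposition_filename(value::AbstractString)
    # Accept either a quoted value or a bare token after `filename=`.
    m = match(r"filename\s*=\s*(?:\"([^\"]*)\"|([^;\s]+))"i, value)
    m === nothing && return nothing
    # Exactly one of the two capture groups is set (quoted or bare form).
    name = m.captures[1] === nothing ? m.captures[2] : m.captures[1]
    # Drop any directory components so the name cannot escape the target folder.
    return basename(String(name))
end

disposition_filename("attachment;filename=rsta20150293_si_001.xlsx")
```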

oxinabox · 2021-09-18T10:46:09Z

Yeah, it looks like the redirect is confusing things.
HTTP.jl sees status code 200 and no Content-Disposition header:

julia> resp = HTTP.headers(HTTP.request("GET", "https://ndownloader.figshare.com/files/6294558"))
12-element Vector{Pair{SubString{String}, SubString{String}}}:
               "x-amz-id-2" => "t5e96lUgqwlnTS65M5hrdcLtnZ/K3vhlDScYBehbxxFL85CqPMfrqsc8nMbXy4KG1FL8nB/3NCw="
         "x-amz-request-id" => "PQM7QC3VMVZPTGFZ"
                     "Date" => "Sat, 18 Sep 2021 10:45:02 GMT"
 "x-amz-replication-status" => "COMPLETED"
            "Last-Modified" => "Fri, 03 Sep 2021 08:47:47 GMT"
                     "ETag" => "\"bf518a09be3cf14d4d7abb47489cbae8\""
      "x-amz-tagging-count" => "1"
         "x-amz-version-id" => "U6trxKUd0lhNhFEHAmnrHQsVVofp9yxk"
            "Accept-Ranges" => "bytes"
             "Content-Type" => "binary/octet-stream"
                   "Server" => "AmazonS3"
           "Content-Length" => "463645"


oxinabox self-assigned this Sep 18, 2021

oxinabox mentioned this issue Sep 18, 2021

Make HTTP.download support redirects when the redirected URL does not set Content-Disposition #761

Merged

fredrikekre added a commit that referenced this issue Sep 26, 2021

Use Content-Disposition for 3xx requests for filename detection in HTTP.download, fixes #760. Co-authored-by: Lyndon White <lyndon.white@invenialabs.co.uk> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>

f96574d

fredrikekre closed this as completed in #761 Sep 26, 2021
