Inconsistent filenames when url download redirects #1374

Closed
philwinder opened this issue Nov 24, 2022 · 1 comment · Fixed by #2416
Labels: type/bug (Something is not working as expected)

Comments

philwinder (Contributor) commented Nov 24, 2022

When I run this:

bacalhau docker run --input-urls=https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt ubuntu -- ls -lah /inputs

The result is:

total 15M
drwxr-xr-x 2 root root 4.0K Nov 24 15:12 .
drwxr-xr-x 1 root root 4.0K Nov 24 15:12 ..
-rw-r--r-- 1 root root  15M Nov 24 15:12 76813c2d-b52b-47af-95fb-e92c1b0b2783

When curling that URL you'll see that it redirects to another with a (random?) UUID for the name. Something that looks like this:

https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/14327886-3839-4fa5-96c3-d52cfa73cdc5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221124T151717Z&X-Amz-Expires=300&X-Amz-Signature=c3a324c90a4f6ac3e54d6cdeefbfddc70ccd0472667cfb9652a3f0e15ee37fa1&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=264818686&response-content-disposition=attachment%3B%20filename%3Dyolov5s.pt&response-content-type=application%2Foctet-stream
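The intended name is still recoverable from that redirect target: its query string carries a `response-content-disposition` parameter. A minimal Python sketch that decodes it, using a shortened copy of the URL above (the signing parameters are dropped for readability; this is an illustration, not Bacalhau code):

```python
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the redirect target quoted above: the X-Amz-* signing
# parameters are omitted, only the two response-* parameters are kept.
redirected = (
    "https://objects.githubusercontent.com/github-production-release-asset-2e65be/"
    "264818686/14327886-3839-4fa5-96c3-d52cfa73cdc5"
    "?response-content-disposition=attachment%3B%20filename%3Dyolov5s.pt"
    "&response-content-type=application%2Foctet-stream"
)

parts = urlsplit(redirected)
query = parse_qs(parts.query)  # parse_qs percent-decodes the values

# Last path segment of the redirected URL: the UUID-like name.
path_name = parts.path.rsplit("/", 1)[-1]

# Decoded disposition value, which names the file the uploader intended.
disposition = query["response-content-disposition"][0]

print(path_name)
print(disposition)
```

The path segment is the UUID-like name, while the decoded parameter reads `attachment; filename=yolov5s.pt`.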

As a user I expect the file name to match the base name of the original URL, not the redirected one.
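To make that expectation concrete, here is a small Python sketch of "base name of the original URL". The helper name `url_base_name` is hypothetical and only illustrates the rule; it is not taken from Bacalhau.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_base_name(url: str) -> str:
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.basename(unquote(path))


# The URL passed to --input-urls in the command above.
original = "https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt"
print(url_base_name(original))
```

Applied to the original URL this gives `yolov5s.pt`. Applied to a redirect target, it gives whatever the last path segment of that target happens to be.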

An alternative fix would be to allow volume mapping to rename the input, e.g. foo:bar.

Source: https://filecoinproject.slack.com/archives/C02RLM3JHUY/p1669302839546539

philwinder added the type/bug (Something is not working as expected) label Nov 24, 2022

aronchick (Collaborator) commented:

The problem is that the opposite is true too: https://picsum.photos/200/200/ redirects to get the final file name.

This answer, https://superuser.com/questions/301044/how-to-wget-a-file-with-correct-name-when-redirected, suggests we should look at the Content-Disposition header. curl doesn't work unless you give it -o to name the output; wget uses the original file name.

I've seen many sites out there with URLs like this: http://www.vim.org/scripts/download_script.php?src_id=9750, and I think that naming the file download_script.php would not be expected either.

rossjones self-assigned this Apr 26, 2023
rossjones added a commit that referenced this issue Apr 27, 2023:
Occasionally we get strange behaviour where the filename during a
download is set to a strange value, because the original URL redirects
to a CDN which uses a different name. In these cases the response often
contains a content-disposition telling us the intended filename.

This PR reads the content-disposition and uses it for the filename when
it is available.

Fixes #1374
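
The approach described in the commit message can be sketched in Python as follows. This is not the code from the referenced commit. The function names `filename_from_disposition` and `choose_filename` are hypothetical, and the fallback to the base name of the final URL is an assumption about what happens when the header is missing.

```python
import posixpath
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlsplit


def filename_from_disposition(header: Optional[str]) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition value."""
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename()  # reads the filename parameter, if present
    if not name:
        return None
    # Drop directory components so a hostile header cannot pick a path.
    name = posixpath.basename(name.replace("\\", "/"))
    if name in ("", ".", ".."):
        return None
    return name


def choose_filename(final_url: str, disposition: Optional[str] = None) -> str:
    """Prefer the Content-Disposition name, else the URL's last path segment."""
    name = filename_from_disposition(disposition)
    if name:
        return name
    # Assumed fallback: base name of the URL the download ended up at.
    return posixpath.basename(unquote(urlsplit(final_url).path))


cdn_url = (
    "https://objects.githubusercontent.com/github-production-release-asset-2e65be/"
    "264818686/14327886-3839-4fa5-96c3-d52cfa73cdc5"
)
print(choose_filename(cdn_url, "attachment; filename=yolov5s.pt"))
print(choose_filename(cdn_url))
```

With the header present the sketch picks `yolov5s.pt`; without it, the UUID-like path segment is all that is left to go on.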