Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle filenames for basic streaming transfer adapter. #68

Merged
merged 8 commits into from
Feb 24, 2021

Conversation

jonathansberry
Copy link
Contributor

@jonathansberry jonathansberry commented Feb 15, 2021

This is a possible solution to issue #67. I have valued the opportunity to dig into giftless and understand how it works a bit. I don't know if this PR is useful or not, but I'd find your review and otherwise proposed solution really interesting to understand.

Some thoughts:

  • I've used the GET args to wrap up the filename into the download url returned by batch.
  • From what I can see there are no tests for basic streaming transfer adapter to update?
  • There is a small amount of reuse of code from here. It's only a few lines, but could be refactored. Just wasn't sure where would be best to put it?
  • The Content-Type header is still not set. I think this would probably require a PR to ckanext-blob-storage as well, so that the content type and filename are both passed through to the giftless batch request?

Copy link
Contributor

@shevron shevron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all this is great, the query param approach is the one I would take as well. A couple of minor comments and I think this is good for merging.

As to some of the other points you've raised:

  • Testing - you're right, there aren't any. If you wish, feel free to add some ;-)

  • Content-type: it is indeed not handled here, and may require changes to blob-storage to handle or adding some kind of metadata DB to local storage. In general it's a more complex problem and the solution may be worth thinking about. Solving it "right" also means it can have some interaction with the storage module, e.g. Azure / GCP may already know the file's content-type and it can be piped though. Local storage could use a mime type detection library to automagically guess the mime type if not specified. So all in all a lot of options. I'd suggest opening a separate ticket to discuss (if there isn't already one).

giftless/transfer/basic_streaming.py Outdated Show resolved Hide resolved
giftless/transfer/basic_streaming.py Outdated Show resolved Hide resolved
giftless/transfer/basic_streaming.py Outdated Show resolved Hide resolved
@jonathansberry
Copy link
Contributor Author

@shevron Thanks for the great feedback - really helpful. I'm hoping I've resolved most of the issues. The only one I remain unsure of is whether and how we should be checking that the filename is safe to return in the Content-Disposition header.

giftless/util.py Outdated Show resolved Hide resolved
@shevron
Copy link
Contributor

shevron commented Feb 24, 2021

Looks great. I think we can be less strict with filename sanitisation - there is really no problem with many "special" characters like space, plus, @ and others, which may appear in file names, as long as it can't be used to inject something into the HTTP headers (so newlines, semicolons, double quotes). Moreover, non-latin characters (umlauts, Cyrillic, Hebrew, Arabic etc. etc. ...) are perfectly safe and common for international users. If the entire file name is in Japanese, users may end up with an empty filename here.

But - I think this version is better than what we have, so I'm merging it, and we can open a new issue to be fixed at some point in the future re allowing more characters in file names.

@shevron
Copy link
Contributor

shevron commented Feb 24, 2021

@jonathansberry see #70 if you have time / wish to further improve this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants