Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Range requests in HTTP::StaticFileHandler #12886

Conversation

jgaskins
Copy link
Contributor

When fetching a large file, such as a media file, that gets picked up by HTTP::StaticFileHandler, the handler copies the entire file to the response, even if the client is just trying to fetch metadata about the file. This can make that metadata request take a lot longer and lead to extra bandwidth being used, which can be expensive.

At the moment, this PR does not handle the If-Range header, which it looks like is used to ensure that multiple Range requests to get small chunks of a file will return chunks of the same file, falling back to returning the entire file in case of an Etag or Last-Modified mismatch.

Fixes #8954

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see some additional specs for an empty file and where start byte is in range, but the end byte is not (I suppose it should clamp to the end of data?).

The Multipart.parse method is an independent feature. Could you extract that into a separate PR, please? We should try to separate concerns.

Comment on lines 105 to 107
if (range = context.request.headers["Range"]?) && (match = range.match(/bytes=([0-9\-,]+)/))
ranges = match[1].split(',').map do |range|
start, finish = range.strip.split('-', 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parsing mechanism looks like it doesn't do proper error handling. It breaks on invalid range values such as bytes=1 or bytes=1-2-3.
Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parsing mechanism looks like it doesn't do proper error handling. It breaks on invalid range values such as bytes=1 or bytes=1-2-3.

This is not actionable feedback. Define “breaks”. What do you think the expected and actual behaviors are here?

Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.

What do you mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, how likely is this header to be malformed? How critical is it to get this precisely correct before the feature is merged at all? If it has to be perfect before it gets merged I’ll just close the PR and publish it as a shard.

Copy link
Member

@straight-shoota straight-shoota Jan 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes=1 raises IndexError and bytes=1-2-3 is interpreted the same as bytes=1-. I think both kind of errors should cause an HTTP error response.

Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.

What do you mean here?

It's just that validation is essentially split between the regex and the string methods. So some of that is redundant. For example the regex would be unnecessary, if the parsing was bit more precise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, how likely is this header to be malformed? How critical is it to get this precisely correct before the feature is merged at all? If it has to be perfect before it gets merged I’ll just close the PR and publish it as a shard.

I doesn't has to be perfect. But the basics should be covered and I think proper error handling is necessary for that.
As said below, I can take care of finishing it up and I expect this shouldn't take long to be merged.
The base implementation looks solid, just needs some adjustments to handle edge cases / errors properly.

src/http/server/handlers/static_file_handler.cr Outdated Show resolved Hide resolved
@jgaskins
Copy link
Contributor Author

jgaskins commented Dec 31, 2022

I'd like to see some additional specs for an empty file and where start byte is in range, but the end byte is not (I suppose it should clamp to the end of data?).

I was sure I had written a test for omitting the end of the range. Looks like it got deleted or rewritten into one of the ones I did end up committing. But it does clamp to the end of the file as written.

The Multipart.parse method is an independent feature. Could you extract that into a separate PR, please? We should try to separate concerns.

It was required to be able to parse the response for multiple ranges. This PR can’t exist without it.

@straight-shoota
Copy link
Member

It was required to be able to parse the response for multiple ranges. This PR can’t exist without it.

This PR can depend on the other one.
It's only used for the specs, so you could alternatively turn it a spec helper for this PR in order to merge it independently.

@jgaskins
Copy link
Contributor Author

jgaskins commented Jan 1, 2023

This PR can depend on the other one.

If you would like to go to all that effort yourself of creating multiple PRs and wiring up their dependencies, you’re more than welcome to, but I will not be splitting PRs for the sake of accounting that ultimately makes no difference.

I’ve already spent over a day on this PR already between trying to understand how these requests work, implementing the feature itself, and doing additional testing with a browser, not counting the time I spent on a previous attempt at this several months ago.

@straight-shoota
Copy link
Member

Sure, no problem. I can take care of these administrative aspects. Thanks for all the work you have already put into it.

I realize that it seems a bit cumbersome to go the extra steps, but we've realized that it's worth it in the long run for the project to have small and self-contained changes. And we'll take the burdon of this extra work (it's not that much actually, though).

@straight-shoota straight-shoota changed the base branch from master to feature/multipart-parse January 1, 2023 17:36
@straight-shoota straight-shoota marked this pull request as draft January 1, 2023 17:36
@straight-shoota
Copy link
Member

I extracted the implementation of MIME::Multipart.parse into #12890. This PR should be merged first, then this one is good to go on top.

@asterite
Copy link
Member

asterite commented Jan 2, 2023

I said it before and I'll say it again: the Ruby core team happily merges incomplete or almost good PRs. Then they introduce extra commits (here we'll use extra PRs) with those additional changes. The end result is that contributory are happier, they don't get frustrated, they don't get to do extra work, and as a bonus, they don't get the final word on how things should be. The last one might be a downside to them, but I'm sure the happiness is of contributing to is greater (I contributed to Ruby a couple of times and that was my experience)

@asterite
Copy link
Member

asterite commented Jan 2, 2023

Like, instead of saying "please do this and that" it's so much easier to do it ourselves with exactly the code that we have in mind.

@asterite
Copy link
Member

asterite commented Jan 2, 2023

That said, everything is depends on the chosen process. If contribution is a friction, providing a diff of the intended change could also work. That way the rest of the work will surely depend on the core team.

Happy new year everyone!

@straight-shoota straight-shoota deleted the branch crystal-lang:master January 25, 2023 18:50
@straight-shoota straight-shoota changed the base branch from feature/multipart-parse to master January 25, 2023 19:12
@straight-shoota straight-shoota marked this pull request as ready for review January 31, 2023 22:07
@straight-shoota straight-shoota marked this pull request as draft February 1, 2023 14:21
@straight-shoota straight-shoota force-pushed the support-range-requests-for-static-file-handler branch from 1641e22 to 65b6f26 Compare February 1, 2023 14:22
@straight-shoota straight-shoota marked this pull request as ready for review February 1, 2023 14:45
Also extract implementation into helper method
straight-shoota and others added 2 commits February 1, 2023 19:08
Co-authored-by: Jamie Gaskins <jgaskins@hey.com>
Co-authored-by: Jamie Gaskins <jgaskins@hey.com>
@straight-shoota
Copy link
Member

straight-shoota commented Feb 1, 2023

Going through the RFC again, there are a couple more edge cases missing. I'll address that later.

@straight-shoota straight-shoota marked this pull request as draft February 1, 2023 18:47
@beta-ziliani
Copy link
Member

I don't know if that's expected, but CI is failing for these specs already.

@straight-shoota
Copy link
Member

Yes, it's because I applied one of the suggestions which only changed the spec, not the implementation.

@straight-shoota straight-shoota marked this pull request as ready for review February 1, 2023 21:44
@straight-shoota straight-shoota self-assigned this Feb 28, 2023
Comment on lines 341 to 347
it "start >= file_size" do
headers = HTTP::Headers{"Range" => "bytes=12-"}
response = handle HTTP::Request.new("GET", "/range.txt", headers)

response.status_code.should eq(416)
response.headers["Content-Range"]?.should eq "bytes */12"
end
Copy link
Contributor

@HertzDevil HertzDevil Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this should also be grouped into the above describe, because if that refers to syntactic validity, it would be confusing if this describe refers to semantic invalidity

@straight-shoota straight-shoota added this to the 1.8.0 milestone Mar 27, 2023
@straight-shoota straight-shoota merged commit 61edd54 into crystal-lang:master Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP::StaticFileHandler should support 206 Partial Content feature
7 participants