New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Etag support in HTTP::StaticFileHandler #6145

Merged
merged 1 commit into from Jun 15, 2018

Conversation

Projects
None yet
6 participants
@emq
Contributor

emq commented May 30, 2018

Hi all

I was trying to extract some logic in Kemal - kemalcr/kemal#440 - it seems at this point static file handler is kinda hard to re-use as it's quite coupled - a lot of stuff is going on there at the moment.

Maybe injecting conditional into handle arguments might be a good idea? So you can basically plug-in anything you want there (you could also remove caching by injecting some class that will simply return false for matches? class method).

It's just a totally random proposal / proof of concept (wip) - please let me know what you think about it so we can proceed with it / change the approach or simply reject it 馃挘

Keep up great work, I'm really enjoying having 'ruby experience' in a compiled language - it's truly awesome 鉂わ笍

@emq emq changed the title from [WIP] Redo HTTP::StaticFileHandler to [WIP/Proposal] Redo HTTP::StaticFileHandler May 31, 2018

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 4, 2018

It's certainly a good idea to clean up and reorder some of the logic in that class. But I don't see the point in having a Conditional type. It just complicates things. This static file handler doesn't need to be super flexible. It's just a default implementation. It doesn't need to support complex scenarios (although solutions to that can be based upon).

It's particularly odd that this PR adds support for Etag/If-None-Match but is seemingly not to be able to use it alongside Last-Modified/If-Modified-Since simultaneously. That's not really helpful.

In my opinion, Etag support should be added without sacrificing any of the current behaviour. It could make sense to extract some features to private instance methods (for example matches? and add_headers) that isolate behaviour and can be used for customization in inheriting classes (if needed).

@emq

This comment has been minimized.

Contributor

emq commented Jun 5, 2018

Thanks for quick feedback! We can pack logic internally and still keep it extendable - I thought why not making it more configurable out of the box.

As for using both simultaneously - I will give it another go; by (quickly :x) looking at rfc7232 before submitting that proposal I thought it made no sense because of:

A recipient MUST ignore If-Modified-Since if the request contains an
If-None-Match header field; the condition in If-None-Match is
considered to be a more accurate replacement for the condition in
If-Modified-Since, and the two are only combined for the sake of
interoperating with older intermediaries that might not implement
If-None-Match.

But I guess we still have to deal with some older clients and support both at the same time on server side 馃 good point!

Will ping you hopefully this week for further discussion and I'm still hoping for more feedback ;).

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 5, 2018

Yes, some clients or intermediaries might only use one of them so we should support both. And If-None-Match takes precedence. Please also be aware that its value can be a number of etags and you need to validate each of them.

@emq

This comment has been minimized.

Contributor

emq commented Jun 7, 2018

Another quick go, please lemme know wdyt - I'm still collecting feedback here (will rebase whole thing anyway)

Note/Fun (?) fact: Implementation in Rails contradicts specification linked here, but I guess this should follow rfc 馃檴

@@ -6,6 +6,8 @@ require "uri"
class HTTP::StaticFileHandler
include HTTP::Handler

ALLOWED_METHODS = ["GET", "HEAD"]

This comment has been minimized.

@Sija

Sija Jun 7, 2018

Contributor
ALLOWED_METHODS = %w(GET HEAD)
@@ -24,12 +26,13 @@ class HTTP::StaticFileHandler
end

def call(context)
unless context.request.method == "GET" || context.request.method == "HEAD"
request_method = context.request.method

This comment has been minimized.

@Sija

Sija Jun 7, 2018

Contributor

Useless assign.

@straight-shoota

This looks really good overall 馃憤 I've added a few line comments about simple improvements.

A few notes regarding general design:

  • I don't see a benefit from isolating the behaviour in an additional Cache type. These methods should just be (private) instance methods in StaticFileHandler. They're of no use outside of that class.
  • Combining methods etag? and modified_since? in match? would avoid lot's of boiler plate and would even make it better understandable what's going on. IMHO it doesn't make sense to separate these methods.

Support for a list of etags in If-None-Match header and validating each of them would be great, but that can be added when the general design is settled (should be fairly easy).

@@ -6,7 +6,10 @@ require "uri"
class HTTP::StaticFileHandler
include HTTP::Handler

ALLOWED_METHODS = ["GET", "HEAD"]

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

I'm not sure why we need to expose this as a constant. But it should be a tuple if so.

if @fallthrough
call_next(context)
else
context.response.status_code = 405
context.response.headers.add("Allow", "GET, HEAD")
context.response.headers.add("Allow", ALLOWED_METHODS.join(", "))

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Generating that string dynamically is useless as the allowed methods are not gonna change. Better leave it as before and scrap ALLOWED_METHODS.

response.status_code.should eq(200)
response.headers["Last-Modified"].should eq(HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time))
response.body.should eq(File.read("#{__DIR__}/static/test.txt"))
end
end

context "with If-None-Match header" do
it "should return 304 Not Modified if header matches etag" do
etag = %{W/"#{File.info("#{__DIR__}/static/test.txt").modification_time.epoch.to_s}"}

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

You shouldn't generate a etag manually. How it's generated is irrelevant to this spec.

Just send a prior request to get the etag.

context "with both If-None-Match and If-Modified-Since headers" do
it "ignores If-Modified-Since as specified in RFC 7232" do
headers = HTTP::Headers.new
etag = %{W/"#{File.info("#{__DIR__}/static/test.txt").modification_time.epoch.to_s}"}

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

ditto

end

context "with both If-None-Match and If-Modified-Since headers" do
it "ignores If-Modified-Since as specified in RFC 7232" do

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

How does this spec test if If-Modified-Since is ignored? The date needs to be before modifcation time of the file for this.

This comment has been minimized.

@emq

emq Jun 7, 2018

Contributor

Because of current 1s padding that I did not touched I don't think it's correct, see spec

it "should return 304 Not Modified if file mtime is equal" do

update: ouh, I think I can see it now ;)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Oh, it seems there is something off... We need more integration tests for this.

ref_response = handle HTTP::Request.new("GET", "/test.txt")
headers = HTTP::Headers.new
headers["If-Modified-Since"] = ref_response.headers["Last-Modified"]
response = handle HTTP::Request.new("GET", "/test.txt")

This should result in 304 Not Modified (both with and without this PR), but it doesn't.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

My bad, this works as intended.

headers["If-None-Match"] = "some random etag"
response = handle HTTP::Request.new("GET", "/test.txt", headers)
response.status_code.should eq(200)
response.headers["Etag"].should match(/W\/"\d+"$/)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Validation of the etag format and should be a separate spec.

response = handle HTTP::Request.new("GET", "/test.txt", headers)
response.status_code.should eq(200)
response.headers["Etag"].should match(/W\/"\d+"$/)
response.body.should eq(File.read("#{__DIR__}/static/test.txt"))

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

I don't think this expectation is necessary. Checking the status code should be enough.

This comment has been minimized.

@emq

emq Jun 7, 2018

Contributor

Actually re-added body assertion after I removed it because of this comment 馃槄 as first (original) spec does it and I guess it won't hurt, right? 馃

@@ -24,12 +26,13 @@ class HTTP::StaticFileHandler
end

def call(context)
unless context.request.method == "GET" || context.request.method == "HEAD"
request_method = context.request.method
unless ALLOWED_METHODS.includes?(request_method)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

{"GET", HEAD"}.includes?(context.request.method)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Should support OPTIONS as well, but that's for another PR.

@@ -0,0 +1,48 @@
class HTTP::StaticFileHandler::Cache

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

This should be a module since there are no instances for this type.

if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
return false unless if_none_match

if_none_match == etag(file_path)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

You don't need to recompute the etag, it's already available as context.response.headers[HTTP_ETAG]. Ditto below.

@emq

This comment has been minimized.

Contributor

emq commented Jun 7, 2018

Thanks for comments guys! 鉂わ笍 Assuming this is acceptable way I will wrap it up (cleanup and address comments)

@straight-shoota I tried to keep it in a single file/class as suggested - but damn it seemed cluttered - I will give it another go tho and let's decide one more important issues are addressed ;-)

end

private def self.etag(file_path)
%{W/"#{File.info(file_path).modification_time.epoch.to_s}"}

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Don't need .to_s here. In fact, it results in a additional, useless string allocation.


# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
return if_none_match == context.response.headers[HTTP_ETAG] if if_none_match

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Suggestion for better readable control flow:

# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
  if_none_match == context.response.headers[HTTP_ETAG]
elsif if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
  header_time = HTTP.parse_time(if_modified_since)
  # ...
else
  false
end

This comment has been minimized.

@Sija

Sija Jun 7, 2018

Contributor

Using case would be even more readable.

This comment has been minimized.

@emq

emq Jun 7, 2018

Contributor

case by using somehow by using the implicit-object syntax

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 7, 2018

Please move the methods from Change type directly into StaticFileHandler. There is no reason to have an extra type for that.

def self.match?(context : HTTP::Server::Context, file_path : String) : Bool
# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?

This comment has been minimized.

@Sija

Sija Jun 7, 2018

Contributor
case
when if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
  # ...
when if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
  # ...
else
  # ...
end

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

why??

This comment has been minimized.

@Sija

Sija Jun 7, 2018

Contributor

Arguably reads better than:

if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
  # ...
elsif if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
  # ...
else
  # ...
end

This comment has been minimized.

@emq

emq Jun 8, 2018

Contributor

to be honest by first glance I don't see how one is better than other (and vice versa) 馃槄 I thought at first with crystal's implicit-object syntax there might be some more sophisticated approach, maybe let's just leave it as-it?

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

This is clearly a case for if. Yes, it could be written as case but that's a bit different semantically. And it's more to type. Theoretically, you could rewrite every if ... elseif as case ... when, but what would be the point?

@emq

This comment has been minimized.

Contributor

emq commented Jun 8, 2018

Please move the methods from Change

um, from Cache you mean? I would leave it there for readability / re-usability of internals and (imo) cleaner testability but I won't fight over that, I would rather see this getting merged sooner than later :P

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 8, 2018

Yes, I mean Cache. I don't think it really improves readability, because it spreads the implementation across different files and types. These methods are not really re-usable outside the static file handler. In fact, it would even help reusability if these methods were instance methods on the handler class, so that inheriting classes can easily overwrite them.

We could certainly generalize match? and etag to make them reusable for other handlers, but we're not there yet. This needs some discussion about design (where to put them?) and I guess we can leave that for another day.

Currently, these methods are only internal implementation and as such, don't need isolated unit tests at all.

@emq emq force-pushed the emq:redo_static_file_handler branch from 632b72d to 83ddbfd Jun 8, 2018

@emq

This comment has been minimized.

Contributor

emq commented Jun 8, 2018

Resolved all comments (I think) and rebased whole thing; gonna eat some breakfast and make sure I didn't messed smtg up, I don't trust myself 馃槄

again thanks for all the feedback!

@emq emq changed the title from [WIP/Proposal] Redo HTTP::StaticFileHandler to Add Etag support in HTTP::StaticFileHandler Jun 8, 2018

@emq

This comment has been minimized.

Contributor

emq commented Jun 8, 2018

any ideas why test_linux is failing? is this a known issue? 馃檲

edit: this #5890 ?

@straight-shoota
  • spec add Etag header should also test the etag is different when mtime changes.
  • Would you rewrite the other existing examples to use initial_response as well? Name suggestions: returns 304 Not Modified for younger than Last-Modified and serves content for older than Last-Modified. You can use HTTP.parse_time to parse the Last-Modified header for calculations.
@@ -21,29 +21,94 @@ describe HTTP::StaticFileHandler do
response.body.should eq(File.read("#{__DIR__}/static/test.txt"))
end

context "with header If-Modified-Since" do
it "should add Etag header" do

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

Just a nit-bit: don't use should. Just it "adds Etag header" do is fine. Less noise.
I know, some of the existing examples in this file use should but it doesn't add anything. Of course all specs describe something that should work. Feel free to remove all of them.

headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time)
headers["If-Modified-Since"] = initial_response.headers["Last-Modified"]

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

Maybe also add a expectation response.headers["Last-Modified"].should eq initial_response.headers["Last-Modified"] later to ensure the header doesn't change between requests.

headers["If-Modified-Since"] = initial_response.headers["Last-Modified"]

response = handle HTTP::Request.new("GET", "/test.txt", headers), ignore_body: true
response.headers["Content-Type"]?.should eq(nil)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

I don't think this needs to be a separate example. Just add this expectation to the previous one.

s/eq(nil)/be_nil/

end

it "should serve file if file mtime is younger" do
headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time - 1.hour)
response = handle HTTP::Request.new("GET", "/test.txt")
response = handle HTTP::Request.new("GET", "/test.txt", headers)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

Maybe add ignore_body: false to make it explicit that this is expecting content.

headers = HTTP::Headers.new
headers["If-None-Match"] = "some random etag"

response = handle HTTP::Request.new("GET", "/test.txt", headers)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

ditto ignore_body: false

initial_response = handle HTTP::Request.new("GET", "/test.txt")

headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time - 1.hour)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

This would also benefit from using Last-Modified:

HTTP.parse_time(initial_response.headers["Last-Modified"]) - 1.hour

This comment has been minimized.

@emq

emq Jun 8, 2018

Contributor

but you would have to parse it back with rfc1123_date (not mentioning that parse_time returns Time?, so not_nil! would have to be added as well) - is it really worth it? 馃


it "should serve a file if header does not match etag" do
headers = HTTP::Headers.new
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

ditto

response.status_code.should eq(304)
end

it "should serve a file if header does not match etag" do

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

Maybe add even If-Modified-Since is fresh.

@emq emq force-pushed the emq:redo_static_file_handler branch from 83ddbfd to 64b23e5 Jun 8, 2018

end
end

private def etag(file_path)

This comment has been minimized.

@RX14

RX14 Jun 8, 2018

Member

This should take a modification_time so that we only call File.info(path).modification_time once, this saves performance and avoids race conditions where the etag and last modified are different.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

Race condition doesn't matter, but performance and reusability 馃憤

This comment has been minimized.

@straight-shoota

straight-shoota Jun 8, 2018

Contributor

Could even move this as a property of Response for even more reusability.

HTTP_IF_MODIFIED_SINCE = "If-Modified-Since"
HTTP_IF_NONE_MATCH = "If-None-Match"
HTTP_ETAG = "Etag"
HTTP_LAST_MODIFIED = "Last-Modified"

This comment has been minimized.

@RX14

RX14 Jun 8, 2018

Member

these consts should be private, or maybe even just removed and inlined. We don't typically agressively extract literals in crystal.

@emq emq force-pushed the emq:redo_static_file_handler branch 2 times, most recently from 5d540a0 to c5fcce7 Jun 11, 2018

@emq

This comment has been minimized.

Contributor

emq commented Jun 11, 2018

How about now? 馃檹

@straight-shoota

Just a small improvement, apart from that: 馃憤

# An exact comparison might be slightly off, so we add 1s padding.
# Static files should generally not be modified in subsecond intervals, so this is perfectly safe.
# This might be replaced by a more sophisticated time comparison when it becomes available.
last_modified = modification_time(file_path)

This comment has been minimized.

@straight-shoota

straight-shoota Jun 11, 2018

Contributor

Maybe move this to the call method and pass modification_time to both add_cache_header and cache_request? to avoid another duplicate file stat.

This comment has been minimized.

@emq

emq Jun 11, 2018

Contributor

good point! rebased PR once again 馃槄

@emq emq force-pushed the emq:redo_static_file_handler branch from c5fcce7 to 2163d40 Jun 11, 2018

@RX14

RX14 approved these changes Jun 11, 2018

@sdogruyol

This comment has been minimized.

Member

sdogruyol commented Jun 15, 2018

This is a great improvement over the previous StaticFileHandler. I can already see many frameworks / libraries benefiting from this 馃暫

@sdogruyol sdogruyol merged commit 5d431c7 into crystal-lang:master Jun 15, 2018

4 checks passed

ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@emq emq deleted the emq:redo_static_file_handler branch Jun 15, 2018

@bcardiff bcardiff added this to the 0.25.1 milestone Jun 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment