Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize HTTP.header_name #14503

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jgaskins
Copy link
Contributor

Using Indexable#bsearch operates in logarithmic time, but Hash#[]? operates in constant time.

Benchmark code
require "set"

COMMON_HEADERS = %w(
  Accept-Encoding
  Accept-Language
  Accept-encoding
  Accept-language
  Allow
  Cache-Control
  Cache-control
  Connection
  Content-Disposition
  Content-Encoding
  Content-Language
  Content-Length
  Content-Type
  Content-disposition
  Content-encoding
  Content-language
  Content-length
  Content-type
  ETag
  Etag
  Expires
  Host
  Last-Modified
  Last-modified
  Location
  Referer
  User-Agent
  User-agent
  accept-encoding
  accept-language
  allow
  cache-control
  connection
  content-disposition
  content-encoding
  content-language
  content-length
  content-type
  etag
  expires
  host
  last-modified
  location
  referer
  user-agent
)
  .map { |header| {header.to_slice, header} }
  .to_h

# :nodoc:
def header_name(slice : Bytes) : String
  # Check if the header name is a common one.
  # If so we avoid having to allocate a string for it.
  if slice.size < 20 && (name = COMMON_HEADERS[slice]?)
    name
  else
    String.new(slice)
  end
end

require "benchmark"
require "http"

value = nil
puts "hit"
bench "allow"
puts
puts "miss"
bench "florp"
puts
puts "miss - long name"
bench "Strict-Transport-Security"

# side effect so LLVM won't optimize out the benchmark blocks
pp value unless value

def bench(header_name : String)
  Benchmark.ips do |x|
    x.report "HTTP.header_name" { value = HTTP.header_name(header_name.to_slice) }
    x.report "optimized" { value = header_name(header_name.to_slice) }
  end
end
➜  Code crystal run --release bsearch_vs_set.cr
hit
HTTP.header_name  43.35M ( 23.07ns) (± 0.37%)  0.0B/op   2.95× slower
       optimized 127.99M (  7.81ns) (± 0.63%)  0.0B/op        fastest

miss
HTTP.header_name  36.37M ( 27.50ns) (± 0.54%)  32.0B/op   1.79× slower
       optimized  65.16M ( 15.35ns) (± 2.01%)  32.0B/op        fastest

miss - long name
HTTP.header_name  74.37M ( 13.45ns) (± 0.75%)  48.0B/op        fastest
       optimized  73.93M ( 13.53ns) (± 1.10%)  48.0B/op   1.01× slower

Using `Indexable#bsearch` operates in logarithmic time, but `Hash#[]?` operates in constant time
@HertzDevil
Copy link
Contributor

HertzDevil commented Apr 17, 2024

Just wondering, could StringPool be even faster? Something like %w(...).each_with_object(StringPool.new) { |pool, str| pool.get(str) }

@jgaskins
Copy link
Contributor Author

Maybe, I can check it out.

@jgaskins
Copy link
Contributor Author

Hmm, is there a way to avoid adding the string to the StringPool? We don't want to store every single header ever sent/received. That's a vector for denial of service through resource depletion.

@straight-shoota
Copy link
Member

Not yet, but there's a feature request: #11217

src/http/common.cr Outdated Show resolved Hide resolved
jgaskins and others added 2 commits April 19, 2024 16:54
Rather than creating the hash in 2 steps, this commit creates it in 1.

Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
These are no longer order-dependent since we're using a hash now.
@straight-shoota
Copy link
Member

@jgaskins StringPool#get? is now available (#14508) if you want to try this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants