HTTP::Client hangs on www #8770

Closed
hah opened this issue Feb 9, 2020 · 9 comments

@hah

hah commented Feb 9, 2020

When I try to send a GET request to certain sites using www, it simply hangs.

e.g.:

require "http/client"
response = HTTP::Client.get "https://www.adidas.com/"
puts response.status_code

but this works without a problem:

require "http/client"
response = HTTP::Client.get "https://adidas.com/"
puts response.status_code

However, as soon as I try to get the redirect location from the response header, it hangs again, since the site redirects to www.

I haven't had time yet to debug the issue, but weirdly it's not happening on all sites. (youtube redirects to www as well, and it's working fine)

os: macOS 10.15.2
crystal: 0.32.1

edit: after a while it died with a timeout:

Unhandled exception: Error reading socket: Operation timed out (Errno)
  from /usr/local/Cellar/crystal/0.32.1/src/socket.cr:61:9 in 'unbuffered_read'
  from /usr/local/Cellar/crystal/0.32.1/src/io/buffered.cr:79:16 in 'read'
  from /usr/local/Cellar/crystal/0.32.1/src/openssl/bio.cr:46:13 in '->'
  from bio_read_intern
  from BIO_read
  from ssl3_read_n
  from ssl3_get_record
  from ssl3_read_bytes
  from ssl3_read_internal
  from SSL_read
  from /usr/local/Cellar/crystal/0.32.1/src/openssl/ssl/socket.cr:116:5 in 'unbuffered_read'
  from /usr/local/Cellar/crystal/0.32.1/src/io/buffered.cr:214:12 in 'fill_buffer'
  from /usr/local/Cellar/crystal/0.32.1/src/io/buffered.cr:102:7 in 'peek'
  from /usr/local/Cellar/crystal/0.32.1/src/io.cr:632:37 in 'gets'
  from /usr/local/Cellar/crystal/0.32.1/src/io.cr:591:5 in 'gets'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client/response.cr:127:5 in 'from_io?'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client.cr:594:5 in 'exec_internal_single'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client.cr:580:5 in 'exec_internal'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client.cr:576:5 in 'exec'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client.cr:698:5 in 'exec'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client.cr:730:7 in 'exec'
  from /usr/local/Cellar/crystal/0.32.1/src/http/client.cr:402:3 in 'get'
  from tls.cr:5:1 in '__crystal_main'
  from /usr/local/Cellar/crystal/0.32.1/src/crystal/main.cr:97:5 in 'main_user_code'
  from /usr/local/Cellar/crystal/0.32.1/src/crystal/main.cr:86:7 in 'main'
  from /usr/local/Cellar/crystal/0.32.1/src/crystal/main.cr:106:3 in 'main'
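
For anyone reproducing this: the trace above shows the client blocked in a socket read until the operating system gave up. A minimal sketch of failing fast instead, assuming a local stand-in server (not part of the original report) that accepts connections but never replies. The timeout values are arbitrary, and the rescue is deliberately generic because the timeout exception class differs between Crystal versions.

```crystal
require "http/client"
require "socket"

# Hypothetical stand-in for a remote host that accepts the TCP
# connection but never sends an HTTP response.
server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = server.local_address.port

spawn do
  while socket = server.accept?
    sleep 5.seconds # hold the connection open without answering
    socket.close
  end
end

client = HTTP::Client.new("127.0.0.1", port)
client.connect_timeout = 2.seconds # give up if the connection cannot be established
client.read_timeout = 1.second     # give up if no response bytes arrive

timed_out = false
begin
  client.get("/")
rescue ex
  # Generic rescue: the name of the timeout exception is version dependent.
  timed_out = true
  puts "request failed: #{ex.class}: #{ex.message}"
ensure
  client.close
  server.close
end
```

Setting the timeouts on an HTTP::Client instance, rather than using the HTTP::Client.get class method, is what makes the hang surface as an exception within seconds.
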
@jgillich
Contributor

jgillich commented Feb 9, 2020

Adidas have lots of measures against automated crawlers, it's possible they're just not responding (I tried to do some scraping on their site a while back). Might want to try a different user agent. Did you also experience this with other sites?

@hah
Author

hah commented Feb 9, 2020

I tried to curl it and it instantly gave me a 403 page, so I added a user agent and it worked.
So technically this should work:

require "http/client"

headers = HTTP::Headers.new
headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36"

response = HTTP::Client.get("https://www.adidas.com/", headers: headers)
puts response.status_code

but the issue is the same

@asterite
Member

asterite commented Feb 9, 2020

So adding a user agent makes it work? I think that's expected.

@hah
Author

hah commented Feb 9, 2020

No, it does not. It only works via curl.

@jkthorne
Contributor

When using curl, it seems the server is responding over HTTP/2. I think you will have to force HTTP/1.1 if you want results that are comparable to your Crystal code. Also, https://adidas.com/ returns a 301 to https://www.adidas.com/, so there should be no body.
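
To illustrate the 301 point without touching the network, here is a small sketch that parses a canned response. The raw response text is a hypothetical stand-in for what https://adidas.com/ returns, not a capture of the real reply.

```crystal
require "http/client"

# Hypothetical raw HTTP/1.1 redirect, modelled on the behaviour described above.
raw = "HTTP/1.1 301 Moved Permanently\r\n" \
      "Location: https://www.adidas.com/\r\n" \
      "Content-Length: 0\r\n" \
      "\r\n"

response = HTTP::Client::Response.from_io(IO::Memory.new(raw))

puts response.status_code          # 301
puts response.headers["Location"]? # https://www.adidas.com/
puts response.body.empty?          # true
```

HTTP::Client does not follow redirects by itself, so the caller has to read the Location header and issue a second request, which is the request that hangs in the original report.
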

@rdp
Contributor

rdp commented Feb 10, 2020

This hangs for me: curl -v --http1.1 https://www.adidas.com/ so I'd blame Adidas on this one. Though maybe Crystal should support HTTP/2? Is there an issue for that?

@straight-shoota
Member

@rdp #2125

@asterite
Member

I think we can close this. It seems to be an issue with Adidas' website not supporting HTTP/1.1 or something like that.

HTTP/2 is very hard to implement, but that's a separate issue.

@jkthorne
Contributor

Here is an implementation of HTTP/2: https://github.com/ysbaddaden/http2

Here is the client class. It does not have docs, so there might be some trial and error. https://github.com/ysbaddaden/http2/blob/5db786ee42f9193894720f3bb9eefdb24038d83e/client.cr#L6
