Connection closed too early on some websites #30
Comments
I've tried setting really high timeouts for both connection and read (5_000_000), but the issue persists.
5_000_000 is a high timeout indeed. #32 might do something to address this, but it seems strange that an infinite timeout would be any better than 5_000_000.
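(For context, a minimal sketch of what very generous connect and read timeouts look like, written against the ureq 2.x agent API rather than whatever version the original report used; the URL is a placeholder and the values are illustrative, not taken from the reporter's code.)

```rust
use std::time::Duration;

fn main() -> Result<(), ureq::Error> {
    // Very generous timeouts, mirroring the 5_000_000 figure mentioned above
    // (presumably milliseconds in the older API).
    let agent = ureq::AgentBuilder::new()
        .timeout_connect(Duration::from_millis(5_000_000))
        .timeout_read(Duration::from_millis(5_000_000))
        .build();

    // Placeholder URL; one of the slow sites from the report would go here.
    let response = agent.get("http://example.com/").call()?;
    println!("status: {}", response.status());
    Ok(())
}
```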
I've tried and failed to reproduce this; as of right now, on my Ubuntu 20.04 system, the fetch completes successfully.
I've also adapted one of the unittests from #67 to dribble out data every 10ms for 300 seconds, and found that ureq successfully fetches the whole thing. @Shnatsel, does this still reproduce for you? Do you have a more self-contained test case?
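(For reference, a minimal sketch, not the actual test from #67, of a local server that dribbles a body out slowly; this is the shape of server behaviour being tested above. The address, body length, and delay are arbitrary.)

```rust
use std::io::Write;
use std::net::TcpListener;
use std::thread;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    // Handles one connection at a time, which is enough for a sketch.
    for stream in listener.incoming() {
        let mut stream = stream?;
        let body_len = 1000; // ~10 seconds of dribbled body

        // Send the headers immediately, then one byte of body every 10 ms.
        write!(
            stream,
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
            body_len
        )?;
        stream.flush()?;

        for _ in 0..body_len {
            stream.write_all(b"x")?;
            stream.flush()?;
            thread::sleep(Duration::from_millis(10));
        }
    }
    Ok(())
}
```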
I suppose that website has fixed their weird behavior, whatever it was. I discovered this issue by downloading the front pages of the top million websites with a bunch of different HTTP clients and comparing the results. A more thorough writeup can be found here. I suppose that's the easiest way to discover cases where it happens, and it might also uncover other issues.
Good plan; I've been meaning to put together a tool to do that. I didn't see a link to your code in that blog post. If you still have it around, you could save me a bit of time.
https://gist.github.com/deltaphc/2949ed292c7d1169e744e5ffa7fd0687

The invocation looked like this: […] where “fetch-one-url.sh” sets […]. I’ve also tried setting […].
Oh, and I've compiled with Address Sanitizer, of course. Nightly Rust, […].
Thanks for the tips on compile flags. I've set up a test harness and am running a similar test over the Tranco list. Once I've gotten through a run of that, I'll do the Majestic Million as well. It's already yielded useful fruit like #84. For the specific test code you shared:
This doesn't call res.into_read(), so it's not reading the body. That would produce the behavior you describe - the program would exit, successfully, as soon as it received headers, rather than waiting to read the whole (slow) body.
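(For illustration, a minimal sketch of a fetch that actually drains the body, written against the ureq 2.x API, where the equivalent method is into_reader(); the URL is a placeholder and this is not the reporter's original test code.)

```rust
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder URL; one of the slow sites from the report would go here.
    let response = ureq::get("http://example.com/").call()?;

    // Without this step the program exits as soon as the headers arrive,
    // which looks exactly like "connection closed too early".
    let mut body = Vec::new();
    response.into_reader().read_to_end(&mut body)?;

    println!("read {} bytes", body.len());
    Ok(())
}
```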
By the way, I've added the smoke test code I used at #85 if you'd like to try it out. |
Ah, that's probably a bug in my code then, and not in ureq itself. Closing. |
FYI I've run a smoke test across the Tranco list and the Majestic Million with RUSTFLAGS="-Zsanitizer=address -C debug-assertions -C overflow-checks --target x86_64-unknown-linux-gnu", and had no panics or crashes on the latest release. |
Original issue description:

Some websites transmit data really slowly, but ureq drops the connection almost immediately after establishing it, without actually downloading the content and without reporting an error either. An example of where this happens is 7911game.com (warning: malicious website; it ships some kind of VBScript, so I assume it exploits Internet Explorer). curl takes a long time to download it and loads it gradually, as does https://github.com/jayjamesjay/http_req.