
Connection closed too early on some websites #30

Closed
Shnatsel opened this issue Jan 15, 2020 · 11 comments
Labels: bug (Something isn't working)

Comments

@Shnatsel

Some websites transmit data really slowly, but ureq drops the connection almost immediately after establishing it, without actually downloading the content, and without reporting an error either.

An example of where this happens is 7911game.com (warning: malicious website; it ships some kind of VBScript, so I assume it exploits Internet Explorer). curl takes a long time to download it and loads it gradually, as does https://github.com/jayjamesjay/http_req.

@Shnatsel
Author

I've tried setting really high timeouts for both connection and read (5_000_000), but the issue persists.
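
For reference, that configuration looked roughly like this against the ureq 0.x request-builder API of the time (timeout_connect appears elsewhere in this thread; timeout_read is an assumption for how the read timeout is set):

    // Sketch only: very large connect and read timeouts, in milliseconds,
    // using the ureq 0.x request-builder API.
    let res = ureq::get("http://7911game.com")
        .timeout_connect(5_000_000)
        .timeout_read(5_000_000)
        .call();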

@lolgesten
Contributor

5_000_000 is a high timeout indeed. #32 might do something to address this, but it seems strange that an infinite timeout would be any better than 5_000_000.

@algesten added the bug label on Feb 6, 2020
@jsha
Collaborator

jsha commented Jun 20, 2020

I've tried and failed to reproduce this. As of right now, on my Ubuntu 20.04 system, 7911game.com produces connection timeouts with both ureq and curl:

    $ time cargo run http://7911game.com
        Finished dev [unoptimized + debuginfo] target(s) in 0.04s
         Running `target/debug/curl3 'http://7911game.com'`
    Error: Oops("Connection Failed: Connection timed out (os error 110)")

    real    2m10.222s
    user    0m0.036s
    sys     0m0.018s

    $ time curl http://7911game.com -i -vvv
    *   Trying 39.96.202.49:80...
    * TCP_NODELAY set
    * connect to 39.96.202.49 port 80 failed: Connection timed out
    * Failed to connect to 7911game.com port 80: Connection timed out
    * Closing connection 0
    curl: (28) Failed to connect to 7911game.com port 80: Connection timed out

    real    2m11.390s
    user    0m0.000s
    sys     0m0.016s

I've also adapted one of the unit tests from #67 to dribble out data every 10ms for 300 seconds, and found that ureq successfully fetches the whole thing. A sketch of that setup follows.
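
A minimal sketch of such a dribbling server (a std-only reconstruction for illustration, not the actual test code from #67):

    use std::io::{Read, Write};
    use std::net::TcpListener;
    use std::thread;
    use std::time::Duration;

    // Serve a valid HTTP/1.1 response whose 30,000-byte body trickles
    // out one byte every 10ms, i.e. over roughly 300 seconds in total.
    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf); // consume (part of) the request
            stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 30000\r\n\r\n")?;
            for _ in 0..30_000 {
                stream.write_all(b"a")?;
                thread::sleep(Duration::from_millis(10));
            }
        }
        Ok(())
    }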

@Shnatsel, does this still reproduce for you? Do you have a more self-contained test case?

@Shnatsel
Author

I suppose that website has fixed its weird behavior, whatever it was.

I discovered this issue by downloading the front pages of the top million websites with a bunch of different HTTP clients and comparing the results. A more thorough writeup can be found here.

I suppose that's the easiest way to discover cases where it happens, and it might also uncover other issues.

@jsha
Collaborator

jsha commented Jun 20, 2020

Good plan; I've been meaning to put together a tool to do that. I didn't see a link to your code in that blog post, so if you still have it around, you could save me a bit of time.

@Shnatsel
Author

Shnatsel commented Jun 21, 2020

https://gist.github.com/deltaphc/2949ed292c7d1169e744e5ffa7fd0687

The invocation looked like this: cut -d , -f 3 majestic_million.csv | parallel -j 50 ./fetch-one-url.sh

where “fetch-one-url.sh” sets RUST_BACKTRACE=1, calls the Rust binary wrapped in timeout, and sorts the outputs into folders depending on the exit code. I don't have the contents of that script anymore.

I’ve also tried setting the -C debug-assertions -C overflow-checks rustc flags when compiling; fortunately, that didn’t reveal anything interesting back then. It's probably a good idea to set them regardless.

@Shnatsel
Author

Oh, and I compiled with AddressSanitizer, of course: nightly Rust with -Zsanitizer=address.

@jsha
Collaborator

jsha commented Jun 23, 2020

Thanks for the tips on compile flags. I've set up a test harness and am running a similar test over the Tranco list. Once I've gotten through a run of that, I'll do the Majestic Million as well. It has already borne fruit, such as #84.

For the specific test code you shared:

    let res = ureq::get(&url)
        .timeout_connect(5_000)
        .call();

    if res.error() {
        println!("{:?}", res);
    }
    println!("\n\nDone.");

This doesn't call res.into_read(), so it's not reading the body. That would produce the behavior you describe: the program exits, successfully, as soon as it receives the headers, rather than waiting to read the whole (slow) body.
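
For comparison, here is a sketch of the same fetch that actually drains the body, so the program waits for the slow response to complete. It uses Response::into_string() from the same ureq 0.x-era API; exact method names may differ by version.

    let res = ureq::get(&url)
        .timeout_connect(5_000)
        .call();

    if res.error() {
        println!("{:?}", res);
    } else {
        // into_string() consumes the response body, blocking until it
        // has been fully received (or a read timeout fires).
        match res.into_string() {
            Ok(body) => println!("read {} bytes", body.len()),
            Err(e) => println!("body read failed: {:?}", e),
        }
    }
    println!("\n\nDone.");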

@jsha
Collaborator

jsha commented Jun 23, 2020

By the way, I've added the smoke test code I used in #85, if you'd like to try it out.

@Shnatsel
Author

Ah, that's probably a bug in my code then, and not in ureq itself. Closing.

@jsha
Collaborator

jsha commented Jun 28, 2020

FYI, I've run a smoke test across the Tranco list and the Majestic Million with RUSTFLAGS="-Zsanitizer=address -C debug-assertions -C overflow-checks --target x86_64-unknown-linux-gnu", and had no panics or crashes on the latest release.
