
Replaying requests for large files #217

Conversation

ironsmile

I wanted to use gor for testing an HTTP server while mirroring real traffic. Unfortunately the traffic was for relatively big media files (from 100MB up to a few GBs). In that case the buffer for reading the response body was too small and the gor replayer was failing to replicate the load which real users generate on the mirrored server.

Another limitation was the fixed deadline for reading the response body. With very large files a timeout of 5s, 30s or whatever is not practical, as very big files can take a long time to download. On the other hand, a relatively small timeout is required to stop stalled connections which no longer move any data around, be it because of a faulty HTTP server under test, network problems or something else.

This pull request aims to remedy both problems. It does so by:

  • Reading the "whole" body of the response, or at least trying to do so. The reading is done in small chunks which are discarded immediately in order not to consume too much memory for large responses. An upper limit on the read data is still present to make sure a faulty server cannot create a never-ending chunked response or some other problem of that kind.
  • Introducing a rolling timeout while reading the response, instead of a fixed deadline. The connection will time out only if there is no network activity at all for the set amount of time. (A sketch of these two points follows this list.)
  • Leaving the behaviour for small files unchanged.
  • Fixing a few crashes which were happening from time to time during prolonged tests.
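
For illustration, here is a minimal sketch of the approach described in the first two bullets. It is not the exact code of this pull request; the names drainBody, readChunkSize, maxResponseSize and idleTimeout are assumptions made for the example.

package example

import (
    "io"
    "io/ioutil"
    "net"
    "time"
)

// drainBody reads and discards the rest of the response on conn in small
// chunks. The read deadline is refreshed before every chunk, so the
// connection only times out after idleTimeout of complete inactivity,
// and reading stops once maxResponseSize bytes have been consumed.
func drainBody(conn net.Conn, idleTimeout time.Duration) (int64, error) {
    const readChunkSize = 64 * 1024  // assumed chunk size
    const maxResponseSize = 10 << 30 // assumed upper bound (~10 GB)

    var readBytes int64
    for {
        // Rolling timeout: push the deadline forward before each read.
        conn.SetReadDeadline(time.Now().Add(idleTimeout))

        n, err := io.CopyN(ioutil.Discard, conn, readChunkSize)
        readBytes += n
        if err == io.EOF {
            return readBytes, nil // server closed the connection: body is done
        }
        if err != nil {
            return readBytes, err // inactivity timeout or another network error
        }
        if readBytes >= maxResponseSize {
            return readBytes, nil // give up on a never-ending body
        }
    }
}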

Every packet which had 4 bytes of data or less was
causing a panic - slice bounds out of range.

Closes buger#215
This makes sure replaying with big responses works well. Without
reading the whole response, the server under test was behaving
differently from the one whose traffic is mirrored.
The meaning of the --output-http-timeout option is changed. Previously it was
a hard limit within which the whole body had to be read, but most of
the time this does not work for large bodies.

The timeout is now set before every Read operation, effectively meaning
the body will continue to be read as long as there is any data flowing.
The timeout will be triggered only after a period of inactivity.
}
toConn.readTimeout = c.config.Timeout

var readBytes int64 = 0
Collaborator

should drop the "= 0" from the declaration of var readBytes; it is the zero value

Author

You're right. Forgot to run go vet against it.

@ironsmile
Author

The tests failed. This has happened to me a few times too, but other times they all pass. Should I look into the tests as well?

Unreachable code and default value for the int64 type.
@buger
Owner

buger commented Oct 8, 2015

Thanks for doing it! Can you clarify how you use the response from the HTTP client? input-raw is unable to handle such big files by design, so does that mean you are using another input? Basically, please describe how you run gor.

@ironsmile
Author

Well, the setup is this:

A production machine which serves big files has a gor instance running with

gor --http-original-host --input-raw :80 -output-tcp-stats --output-tcp <replayer:port>|20%

Another machine, the replayer, is running

gor --http-original-host --input-tcp <replayer:port> -output-http-timeout 10s --output-http <server-under-test:port>

As far as I can tell it works great with more than 5 Gbps of traffic and the replayer is not struggling at all.

UPDATE:
I have some more numbers now. -input-raw may be missing some of the requests but those it catches are good enough for me. In my setup the production server has about 9k concurrent connections and is serving about 8GBps. The server under test receives about 300 concurrent connections. This means -input-raw is able to relay about 16% of the supposed requests, right? Because 20% of 9000 is 1800 and 300 is 16% of 1800.

@buger
Owner

buger commented Oct 8, 2015

Well, it's not really about the number of requests but their consistency. Payloads larger than 200kb (or those that span more than 3 packets of 64kb each) have a big chance of corruption. See #167

@ironsmile
Author

Unfortunately our users do not make POST requests. Pretty much all of the requests are GETs and HEADs, which do not have a body. So far I haven't noticed any broken requests, in the HTTP 400 return code sense.

Maybe we can devise some test for consistency and try it that way?

@buger
Owner

buger commented Oct 8, 2015

Then I do not really get what reading the whole response gives you. Can you clarify?

@ironsmile
Author

The gor replayer reads the whole response of the HTTP request it makes to its --output-http. Previously it was closing the connection without reading the whole body.

@buger
Owner

buger commented Oct 9, 2015

So the main problem for you is that it closes the connection and does not properly emulate user behaviour, right?

Sorry if I'm too pedantic :)

@ironsmile
Author

Yes, this is the problem I am trying to solve with this pull request. It is not a small change in the behaviour of the software, so you are right in your effort to understand it completely.

@buger
Owner

buger commented Oct 9, 2015

I'm not sure a custom conn timeout implementation is needed here; a few lines above, a timeout is already set with c.conn.SetReadDeadline(timeout). As you mentioned, it is fixed, but when you read in batches using io.CopyN it should apply a separate timeout for each chunk, so it will not really be fixed, right?

@ironsmile
Author

The custom timeout is required. The documentation of net.Conn says this about deadlines:

        // A deadline is an absolute time after which I/O operations
        // fail with a timeout (see type Error) instead of
        // blocking. The deadline applies to all future I/O, not just
        // the immediately following call to Read or Write.

An absolute deadline is set for all I/O, no matter how many times Read is called (io.CopyN uses the connection's Read method). This means that if it is now 15:30:05 and the deadline is set to 15:30:10, any Read after 15:30:10 will result in a timeout error. So if reading the whole body takes more than 5 seconds, the first Read after 15:30:10 will time out.
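
To illustrate the difference (this is only a sketch; it assumes conn is a net.Conn and the usual io, io/ioutil and time imports, and is not the exact code under review):

// Absolute deadline: applies to all future reads. A body that legitimately
// takes longer than 5s to arrive fails on the first Read after the deadline,
// even though data is still flowing.
conn.SetReadDeadline(time.Now().Add(5 * time.Second))
for {
    if _, err := io.CopyN(ioutil.Discard, conn, 4096); err != nil {
        break
    }
}

// Rolling deadline: refreshed before each read, so it only fires after 5s
// with no data at all, however long the whole body takes.
for {
    conn.SetReadDeadline(time.Now().Add(5 * time.Second))
    if _, err := io.CopyN(ioutil.Discard, conn, 4096); err != nil {
        break // io.EOF when the body is done, a timeout only after 5s of silence
    }
}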

@buger
Owner

buger commented Oct 9, 2015

I see, but you can just call c.conn.SetReadDeadline(timeout) inside the for loop after each io.CopyN call; then it will be reset each time.

@ironsmile
Author

Yes, the c.conn.SetReadDeadline call on line 47 is exactly what makes sure reading the body will not time out unless there is no network activity at all.

Have another look at the second bullet in the pull request description and my last comment. Or do you think this approach is not enough, or does not work as intended?

@buger
Owner

buger commented Oct 9, 2015

I see what you mean, but in my view inlining it in the for loop will make things easier, like this:

for {
    if n, err := io.CopyN(ioutil.Discard, c.conn, readChunkSize); err == io.EOF {
        break
    } else if err != nil {
        Debug("[HTTPClient] Read the whole body error:", err, c.baseURL)
        break
    } else {
        readBytes += n

        // Setting new timeout for next chunk
        timeout = time.Now().Add(c.config.Timeout)
        c.conn.SetReadDeadline(timeout)
    }

    if readBytes >= maxResponseSize {
        Debug("[HTTPClient] Body is more than the max size", maxSizeFromBody,
            c.baseURL)
        break
    }
}

@@ -184,6 +184,12 @@ func (o *HTTPOutput) Read(data []byte) (int, error) {

func (o *HTTPOutput) sendRequest(client *HTTPClient, request []byte) {
meta := payloadMeta(request)

if len(meta) < 2 {
Owner

Meta is constructed manually and should always be (type, uuid, time). Do you know a repeatable case where meta gets corrupted (because it never should)?

Author

It does happen fairly often in my setup. Normally it takes from a few minutes to a few hours to happen. I was not able to reproduce it in an automated test.

But I have stack traces from the crashes and can reproduce it easily in my setup.
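
For context, a sketch of what the guard in the hunk above could do; only the len(meta) < 2 check comes from the diff, while the early return and the log message are assumptions made for the example:

meta := payloadMeta(request)

// meta should always be (type, uuid, time), but corrupted payloads have been
// observed during long runs, so skip the request instead of panicking with
// "slice bounds out of range".
if len(meta) < 2 {
    Debug("[HTTPOutput] Malformed payload meta, skipping request")
    return
}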

@ironsmile
Author

Ah, I see. I didn't understand you before. Yes, it does make it a lot easier to read. I will amend the pull request shortly.

@buger
Owner

buger commented Nov 18, 2015

@ironsmile ping?:)

@salimane
Collaborator

Any update on this?

@buger
Owner

buger commented May 19, 2016

Fixed in #277
