-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timeout hang bug #168
Timeout hang bug #168
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of questions, but basically LGTM! 🎉
@@ -26,6 +26,11 @@ type Toxic interface { | |||
Pipe(*ToxicStub) | |||
} | |||
|
|||
type CleanupToxic interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clean approach. I like it.
// Drop the data on the ground. | ||
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would there not be a timeout? This is essentially a "drop data" proxy in that case, right? Perhaps it should be created as a different proxy and have a different name? We could also just alias the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is keeping consistent with the previous implementation of the timeout
toxic. Where a 0
timeout value immediately stops all data.
I agree that the name timeout
might be confusing. I would propose we open an issue unless you are confident in how this should be renamed / changed.
toxics/timeout_test.go
Outdated
proxy.Toxics.AddToxicJson(ToxicToJson(t, "to_delete", "timeout", "upstream", &toxics.TimeoutToxic{Timeout: 0})) | ||
|
||
serverConnRecv := make(chan net.Conn) | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It almost seems at this point we need a helper for the pattern of starting a server and accepting connections, as it's somewhat common and should be encouraged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm going to refactor this before shipping.
toxics/timeout_test.go
Outdated
t.Fatal("Unable to write to proxy", err) | ||
} | ||
|
||
time.Sleep(1 * time.Second) // Shitty sync waiting for latency toxi to get data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary for waiting for the latency toxic to get data to shut it down? Isn't this a race that could have problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This frustrates me.
Our bug only occurs when the latency
toxic is blocked attempting to send data to the timeout
toxic. Without the sleep, the test removes the latency
toxic before it receives data which means the bug does not occur because the latency
toxic is not blocking.
I can't think of a non racy way to wait for the latency
toxic to receive input data. We can't wait for the upstream to receive the data because the timeout
toxic is blocking data from reaching the upstream.
// Drop the data on the ground. | ||
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is keeping consistent with the previous implementation of the timeout
toxic. Where a 0
timeout value immediately stops all data.
I agree that the name timeout
might be confusing. I would propose we open an issue unless you are confident in how this should be renamed / changed.
toxics/timeout_test.go
Outdated
proxy.Toxics.AddToxicJson(ToxicToJson(t, "to_delete", "timeout", "upstream", &toxics.TimeoutToxic{Timeout: 0})) | ||
|
||
serverConnRecv := make(chan net.Conn) | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm going to refactor this before shipping.
toxics/timeout_test.go
Outdated
t.Fatal("Unable to write to proxy", err) | ||
} | ||
|
||
time.Sleep(1 * time.Second) // Shitty sync waiting for latency toxi to get data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This frustrates me.
Our bug only occurs when the latency
toxic is blocked attempting to send data to the timeout
toxic. Without the sleep, the test removes the latency
toxic before it receives data which means the bug does not occur because the latency
toxic is not blocking.
I can't think of a non racy way to wait for the latency
toxic to receive input data. We can't wait for the upstream to receive the data because the timeout
toxic is blocking data from reaching the upstream.
link_test.go
Outdated
n, err = link.output.Read(buf) | ||
if n != 0 || err != io.EOF { | ||
t.Fatalf("Read did not get EOF: %d %v", n, err) | ||
done := make(chan struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also going to refactor this into a timeout test helper.
This is a proposed fix to the API hang bug from issue #159.
Overview of what this PR does.
timeout
toxic to drop incoming tcp data on the ground.timeout
toxic implementsCleanupToxic
and closes it's connection on cleanup to avoid stream corruption because of dropped data.Cleanup
when a toxic is deleted and it implements theCleanupToxic
interface.timeout
toxic is deleted.TLDR of Bug (Described in issue #159.)
timeout
toxic attempt to forward TCP data to thetimeout
toxic.timeout
toxic does not read incoming data, these forwarding toxics block forever.timeout
toxic.TLDR of Fix (Described in comment #159 (comment).)
We make the
timeout
toxic read incoming data so that writers don't block forever. Thetimeout
toxic then ignores this data ("drops it on the ground"). Because the data is dropped on the ground, this corrupts the TCP stream so the connection needs to be closed when thetimeout
toxic is deleted.Does this break backwards compatibility?
Yes but not in realistic use cases. If you remove a
timeout
toxic, connections to this proxy will be closed. Before this change, data blocked by thetimeout
toxic would be forwarded after its deletion. (Assuming this hang bug was not encountered.). IMO the new behaviour is the correct behaviour becausetimeout
implies the connection will be closed. If you want long delays, use a large value on alatency
toxic.