test: TestIssue2746 #5022

Closed
xiang90 opened this issue Apr 9, 2016 · 6 comments

Comments

@xiang90
Contributor

xiang90 commented Apr 9, 2016

=== RUN   TestIssue2746
--- FAIL: TestIssue2746 (1.67s)
    cluster_test.go:360: #1: watch on http://127.0.0.1:20114 error: client: etcd cluster is unavailable or misconfigured
@xiang90 xiang90 self-assigned this Apr 21, 2016
@xiang90
Contributor Author

xiang90 commented Apr 22, 2016

Not able to reproduce... Will try more...

@AkihiroSuda
Contributor

Still reproducible (in less than 1% of runs) with the latest version (d32113a) on my machine (Xeon E3, 4 cores).

@xiang90
Contributor Author

xiang90 commented Apr 22, 2016

@AkihiroSuda

Can you type-assert that error to *client.ClusterError and print out its detail? (https://github.com/coreos/etcd/blob/master/client/cluster_error.go#L19-L33)

@AkihiroSuda
Contributor

I got this ClusterError.

--- FAIL: TestIssue2746 (6.36s)
        cluster_test.go:351: create on http://127.0.0.1:20950 error: client: etcd cluster is unavailable or misconfigured(detail: error #0: read tcp 127.0.0.1:49676->127.0.0.1:20950: i/o timeout

Note that this error was raised at a slightly different point than the original failure: the create call, not the watch.

diff --git a/integration/cluster_test.go b/integration/cluster_test.go
index 4d7e9e0..c1be43d 100644
--- a/integration/cluster_test.go
+++ b/integration/cluster_test.go
@@ -347,7 +347,8 @@ func clusterMustProgress(t *testing.T, membs []*member) {
        key := fmt.Sprintf("foo%d", rand.Int())
        resp, err := kapi.Create(ctx, "/"+key, "bar")
        if err != nil {
-               t.Fatalf("create on %s error: %v", membs[0].URL(), err)
+               cerr := err.(*client.ClusterError)
+               t.Fatalf("create on %s error: %v(detail: %s)", membs[0].URL(), err, cerr.Detail())
        }
        cancel()

@@ -357,7 +358,9 @@ func clusterMustProgress(t *testing.T, membs []*member) {
                mkapi := client.NewKeysAPI(mcc)
                mctx, mcancel := context.WithTimeout(context.Background(), requestTimeout)
                if _, err := mkapi.Watcher(key, &client.WatcherOptions{AfterIndex: resp.Node.ModifiedIndex - 1}).Next(mctx); err != nil {
-                       t.Fatalf("#%d: watch on %s error: %v", i, u, err)
+                       cerr := err.(*client.ClusterError)
+                       t.Fatalf("#%d: watch on %s error: %v(detail: %s)", i, u, err, cerr.Detail())
+
                }
                mcancel()
        }

@xiang90
Contributor Author

xiang90 commented May 16, 2016

@heyitsanthony Can you take this over? I cannot reproduce this on my local machine :(. Thanks!

mitake added a commit to mitake/etcd that referenced this issue May 19, 2016
TestIssue2746 fails occasionally because the cluster has no leader. To
fix the problem, this commit makes the test call waitLeader() before
sending requests.

The fix is only partial, because a campaign can also happen during
the test itself (not only during initialization). Handling that would
require letting clients retry the request.

Partially fixes etcd-io#5022
@heyitsanthony
Contributor

ETCD_ELECTION_TIMEOUT_TICKS wasn't set in Semaphore as it is in Travis, so the default election timeout was triggering a new election, which caused the deposed leader to drop messages. I tried to reproduce with the election ticks set to 600 and it seemed to work OK. I updated the Semaphore configuration and am marking this as closed.


3 participants