test: TestIssue2746 #5022

xiang90 · 2016-04-09T02:12:28Z

=== RUN   TestIssue2746
--- FAIL: TestIssue2746 (1.67s)
    cluster_test.go:360: #1: watch on http://127.0.0.1:20114 error: client: etcd cluster is unavailable or misconfigured

xiang90 · 2016-04-22T04:03:04Z

Not able to reproduce... Will try more...

AkihiroSuda · 2016-04-22T04:37:43Z

Still reproducible (less than 1%) with the latest version (d32113a) on my machine (Xeon E3, 4 cores)

xiang90 · 2016-04-22T05:14:58Z

@AkihiroSuda

Can you type assert that error to client.ClusterError and print out its detail? (https://github.com/coreos/etcd/blob/master/client/cluster_error.go#L19-L33)

AkihiroSuda · 2016-04-22T06:32:24Z

I got this ClusterError.

--- FAIL: TestIssue2746 (6.36s)
        cluster_test.go:351: create on http://127.0.0.1:20950 error: client: etcd cluster is unavailable or misconfigured(detail: error #0: read tcp 127.0.0.1:49676->127.0.0.1:20950: i/o timeout

Note that this error is raised from a slightly different point than the original one.

diff --git a/integration/cluster_test.go b/integration/cluster_test.go
index 4d7e9e0..c1be43d 100644
--- a/integration/cluster_test.go
+++ b/integration/cluster_test.go
@@ -347,7 +347,8 @@ func clusterMustProgress(t *testing.T, membs []*member) {
        key := fmt.Sprintf("foo%d", rand.Int())
        resp, err := kapi.Create(ctx, "/"+key, "bar")
        if err != nil {
-               t.Fatalf("create on %s error: %v", membs[0].URL(), err)
+               cerr := err.(*client.ClusterError)
+               t.Fatalf("create on %s error: %v(detail: %s)", membs[0].URL(), err, cerr.Detail())
        }
        cancel()

@@ -357,7 +358,9 @@ func clusterMustProgress(t *testing.T, membs []*member) {
                mkapi := client.NewKeysAPI(mcc)
                mctx, mcancel := context.WithTimeout(context.Background(), requestTimeout)
                if _, err := mkapi.Watcher(key, &client.WatcherOptions{AfterIndex: resp.Node.ModifiedIndex - 1}).Next(mctx); err != nil {
-                       t.Fatalf("#%d: watch on %s error: %v", i, u, err)
+                       cerr := err.(*client.ClusterError)
+                       t.Fatalf("#%d: watch on %s error: %v(detail: %s)", i, u, err, cerr.Detail())
+
                }
                mcancel()
        }

xiang90 · 2016-05-16T17:38:22Z

@heyitsanthony Can you take this over? I cannot reproduce this on my local machine :(. Thanks!

TestIssue2746 fails occasionally because the cluster has no leader. To address this, this commit makes the test call waitLeader() before sending requests. This fixes the failure only partially, because a campaign can also happen during the test itself (not only during the initialization phase). To handle that case, clients would need to retry the request. Partially fixes etcd-io#5022

heyitsanthony · 2016-06-01T23:02:12Z

ETCD_ELECTION_TIMEOUT_TICKS wasn't set on Semaphore as it is on Travis, so a new election was being triggered, which caused the lost leader to drop messages. I tried to reproduce with the election ticks set to 600 and it seemed to work OK. Updated Semaphore and marking this as closed.

xiang90 added the area/testing label Apr 9, 2016

xiang90 self-assigned this Apr 21, 2016

xiang90 added this to the v3.0.0 milestone Apr 29, 2016

xiang90 mentioned this issue May 13, 2016

integration: make read/write io timeout longer #5350

Closed

heyitsanthony assigned heyitsanthony and unassigned xiang90 May 16, 2016

This was referenced May 19, 2016

integration: call waitLeader() before sending requests in TestIssue2746 #5395

Closed

Evaluation of process inspector osrg/namazu#125

Open

heyitsanthony mentioned this issue Jun 1, 2016

integration: make clusterMustProgress retry on lost leader #5522

Closed

heyitsanthony closed this as completed Jun 1, 2016

philips unassigned heyitsanthony Aug 28, 2018
