
etcdserver: initial read index implementation #6212

Merged
merged 1 commit into etcd-io:master on Sep 27, 2016

Conversation

@xiang90 (Contributor) commented Aug 18, 2016

The actual readindex implementation in raft still depends on the clock.

But this is pretty much what I would expect in etcdserver.

@gyuho Can you please run a benchmark for this and see if there is any perf improvement?

@xiang90 (Contributor, Author) commented Aug 18, 2016

On my local machine, there is a 500% improvement.

base s-read:

./benchmark --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 range --total=100000 --consistency=s foo
bench with serializable range
 100000 / 100000 Booooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%7s

Summary:
  Total:        7.4628 secs.
  Slowest:      0.1260 secs.
  Fastest:      0.0002 secs.
  Average:      0.0047 secs.
  Stddev:       0.0050 secs.
  Requests/sec: 13399.8265

Response time histogram:
  0.000 [1]     |
  0.013 [95488] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.025 [3483]  |∎
  0.038 [639]   |
  0.050 [247]   |
  0.063 [119]   |
  0.076 [10]    |
  0.088 [3]     |
  0.101 [6]     |
  0.113 [2]     |
  0.126 [2]     |

Latency distribution:
  10% in 0.0009 secs.
  25% in 0.0020 secs.
  50% in 0.0037 secs.
  75% in 0.0057 secs.
  90% in 0.0086 secs.
  95% in 0.0122 secs.
  99% in 0.0258 secs.

before

./benchmark --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 range --total=100000 foo
bench with linearizable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%47s

Summary:
  Total:        47.4401 secs.
  Slowest:      0.1776 secs.
  Fastest:      0.0089 secs.
  Average:      0.0472 secs.
  Stddev:       0.0185 secs.
  Requests/sec: 2107.9199

Response time histogram:
  0.009 [1]     |
  0.026 [3574]  |∎∎∎
  0.043 [40043] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.060 [39865] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.076 [11188] |∎∎∎∎∎∎∎∎∎∎∎
  0.093 [2753]  |∎∎
  0.110 [974]   |
  0.127 [637]   |
  0.144 [740]   |
  0.161 [158]   |
  0.178 [67]    |

Latency distribution:
  10% in 0.0292 secs.
  25% in 0.0323 secs.
  50% in 0.0472 secs.
  75% in 0.0549 secs.
  90% in 0.0656 secs.
  95% in 0.0773 secs.
  99% in 0.1266 secs.

after

./benchmark --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 range --total=100000 foo
bench with linearizable range
 100000 / 100000 Booooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%8s

Summary:
  Total:        8.8687 secs.
  Slowest:      0.1732 secs.
  Fastest:      0.0002 secs.
  Average:      0.0064 secs.
  Stddev:       0.0061 secs.
  Requests/sec: 11275.5991

Response time histogram:
  0.000 [1]     |
  0.017 [96911] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.035 [2552]  |∎
  0.052 [402]   |
  0.069 [67]    |
  0.087 [14]    |
  0.104 [17]    |
  0.121 [1]     |
  0.139 [1]     |
  0.156 [0]     |
  0.173 [34]    |

Latency distribution:
  10% in 0.0016 secs.
  25% in 0.0030 secs.
  50% in 0.0053 secs.
  75% in 0.0080 secs.
  90% in 0.0114 secs.
  95% in 0.0145 secs.
  99% in 0.0272 secs.

@gyuho (Contributor) commented Aug 18, 2016

With patch

$ ./benchmark --endpoints=${HOST_1}:2379,${HOST_2}:2379,${HOST_3}:2379 --clients=100 --conns=100 range --total=100000 --consistency=l foo
bench with linearizable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%2s

Summary:
  Total:    2.9057 secs.
  Slowest:  0.0223 secs.
  Fastest:  0.0003 secs.
  Average:  0.0024 secs.
  Stddev:   0.0016 secs.
  Requests/sec: 34414.8033

Response time histogram:
  0.000 [1]     |
  0.003 [63837] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.005 [30936] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.007 [2276]  |∎
  0.009 [1982]  |∎
  0.011 [661]   |
  0.014 [183]   |
  0.016 [41]    |
  0.018 [50]    |
  0.020 [20]    |
  0.022 [13]    |

Latency distribution:
  10% in 0.0011 secs.
  25% in 0.0015 secs.
  50% in 0.0021 secs.
  75% in 0.0028 secs.
  90% in 0.0037 secs.
  95% in 0.0049 secs.
  99% in 0.0091 secs.

Without patch

$ ./benchmark --endpoints=${HOST_1}:2379,${HOST_2}:2379,${HOST_3}:2379 --clients=100 --conns=100 range --total=100000 --consistency=l foo
bench with linearizable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%6s

Summary:
  Total:    6.6010 secs.
  Slowest:  0.0257 secs.
  Fastest:  0.0018 secs.
  Average:  0.0065 secs.
  Stddev:   0.0024 secs.
  Requests/sec: 15149.2299

Response time histogram:
  0.002 [1]     |
  0.004 [16913] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.007 [37914] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.009 [30898] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.011 [11044] |∎∎∎∎∎∎∎∎∎∎∎
  0.014 [2263]  |∎∎
  0.016 [690]   |
  0.019 [189]   |
  0.021 [72]    |
  0.023 [12]    |
  0.026 [4]     |

Latency distribution:
  10% in 0.0038 secs.
  25% in 0.0046 secs.
  50% in 0.0062 secs.
  75% in 0.0080 secs.
  90% in 0.0095 secs.
  95% in 0.0106 secs.
  99% in 0.0137 secs.

Great. More than 2x faster!

@xiang90 (Contributor, Author) commented Aug 18, 2016

@gyuho Is this the same setup with https://github.com/coreos/etcd/blob/master/Documentation/op-guide/performance.md?

In the perf doc, we use 1000 clients not 100 clients.

@gyuho (Contributor) commented Aug 18, 2016

@xiang90 It's with slower machines (--custom-cpu=4 --custom-memory=8). I will run more tests with the same environment.

@gyuho (Contributor) commented Aug 18, 2016

For reference, here are new test results with 8-CPU, 16GB-memory machines.

Now linearizable reads are almost the same as serializable reads:

benchmark --endpoints=${HOST_1}:2379,${HOST_2}:2379,${HOST_3}:2379 --conns=100 --clients=1000 range foo --total=100000 --consistency=l

bench with linearizable range
 100000 / 100000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%0s

Summary:
  Total:    0.9486 secs.
  Slowest:  0.0673 secs.
  Fastest:  0.0004 secs.
  Average:  0.0074 secs.
  Stddev:   0.0054 secs.
  Requests/sec: 105416.6610

Response time histogram:
  0.000 [1]     |
  0.007 [56413] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.014 [33022] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.020 [7868]  |∎∎∎∎∎
  0.027 [1865]  |∎
  0.034 [565]   |
  0.041 [181]   |
  0.047 [52]    |
  0.054 [14]    |
  0.061 [11]    |
  0.067 [8]     |

Latency distribution:
  10% in 0.0021 secs.
  25% in 0.0034 secs.
  50% in 0.0062 secs.
  75% in 0.0099 secs.
  90% in 0.0140 secs.
  95% in 0.0172 secs.
  99% in 0.0262 secs.

bench with linearizable range
 100000 / 100000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%0s

Summary:
  Total:    0.9435 secs.
  Slowest:  0.0547 secs.
  Fastest:  0.0003 secs.
  Average:  0.0070 secs.
  Stddev:   0.0047 secs.
  Requests/sec: 105991.6876

Response time histogram:
  0.000 [1]     |
  0.006 [50145] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.011 [33038] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.017 [12831] |∎∎∎∎∎∎∎∎∎∎
  0.022 [2853]  |∎∎
  0.028 [869]   |
  0.033 [181]   |
  0.038 [55]    |
  0.044 [21]    |
  0.049 [5]     |
  0.055 [1]     |

Latency distribution:
  10% in 0.0023 secs.
  25% in 0.0035 secs.
  50% in 0.0058 secs.
  75% in 0.0096 secs.
  90% in 0.0132 secs.
  95% in 0.0157 secs.
  99% in 0.0226 secs.

benchmark --endpoints=${HOST_1}:2379,${HOST_2}:2379,${HOST_3}:2379 --conns=100 --clients=1000 range foo --total=100000 --consistency=s

bench with serializable range
 100000 / 100000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1s

Summary:
  Total:    1.1777 secs.
  Slowest:  0.2462 secs.
  Fastest:  0.0003 secs.
  Average:  0.0071 secs.
  Stddev:   0.0128 secs.
  Requests/sec: 84911.0008

Response time histogram:
  0.000 [1]     |
  0.025 [98661] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.049 [792]   |
  0.074 [96]    |
  0.099 [93]    |
  0.123 [58]    |
  0.148 [21]    |
  0.172 [25]    |
  0.197 [14]    |
  0.222 [158]   |
  0.246 [81]    |

Latency distribution:
  10% in 0.0012 secs.
  25% in 0.0025 secs.
  50% in 0.0047 secs.
  75% in 0.0092 secs.
  90% in 0.0133 secs.
  95% in 0.0164 secs.
  99% in 0.0283 secs.

bench with serializable range
 100000 / 100000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%0s

Summary:
  Total:    0.8832 secs.
  Slowest:  0.0445 secs.
  Fastest:  0.0003 secs.
  Average:  0.0059 secs.
  Stddev:   0.0045 secs.
  Requests/sec: 113228.7147

Response time histogram:
  0.000 [1]     |
  0.005 [51281] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.009 [28305] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.014 [14520] |∎∎∎∎∎∎∎∎∎∎∎
  0.018 [3956]  |∎∎∎
  0.022 [1092]  |
  0.027 [576]   |
  0.031 [215]   |
  0.036 [47]    |
  0.040 [5]     |
  0.044 [2]     |

Latency distribution:
  10% in 0.0014 secs.
  25% in 0.0025 secs.
  50% in 0.0046 secs.
  75% in 0.0083 secs.
  90% in 0.0117 secs.
  95% in 0.0142 secs.
  99% in 0.0215 secs.

benchmark --endpoints=${HOST_1}:2379 --conns=1 --clients=1 range foo --total=100000 --consistency=l

bench with linearizable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%37s

Summary:
  Total:    37.6047 secs.
  Slowest:  0.0065 secs.
  Fastest:  0.0003 secs.
  Average:  0.0004 secs.
  Stddev:   0.0001 secs.
  Requests/sec: 2659.2406

Response time histogram:
  0.000 [1]     |
  0.001 [99320] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.002 [234]   |
  0.002 [312]   |
  0.003 [108]   |
  0.003 [10]    |
  0.004 [4]     |
  0.005 [6]     |
  0.005 [2]     |
  0.006 [1]     |
  0.007 [2]     |

Latency distribution:
  10% in 0.0003 secs.
  25% in 0.0003 secs.
  50% in 0.0003 secs.
  75% in 0.0004 secs.
  90% in 0.0004 secs.
  95% in 0.0005 secs.
  99% in 0.0006 secs.

ok := make(chan struct{})

select {
case s.readwaitc <- ok:

Contributor:

I'm wary of this pattern. Would it be possible to do something like

func (s *EtcdServer) notifyLinearRead() <-chan struct{} {
    s.readMu.RLock() // lock/field names here are illustrative, as in the original sketch
    defer s.readMu.RUnlock()
    return s.notifyNextLinearRead
}

Contributor Author:

Not really. We need one chan per read.

Contributor Author:

Or do you want the server side to return the chan instead of creating the chan here?

Contributor:

Why does it need a channel per read? The logic looks a lot like a barrier...
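
For context, a minimal, self-contained sketch of the barrier-style alternative being discussed here, with one shared channel per batch of reads. All names (notifier, readNotf, readwait) are illustrative, not code from this PR:

package readwait

import (
    "context"
    "sync"
)

// notifier is a one-shot broadcast: close(c) wakes every waiter at once,
// which is why a single channel can serve a whole batch of reads.
type notifier struct {
    c   chan struct{}
    err error
}

func newNotifier() *notifier { return &notifier{c: make(chan struct{})} }

// notify records the outcome of a ReadIndex round and wakes all waiters.
func (nc *notifier) notify(err error) {
    nc.err = err
    close(nc.c)
}

type server struct {
    mu       sync.RWMutex
    readNotf *notifier     // shared by every read waiting on the next round
    readwait chan struct{} // tells the read loop that readers are waiting
}

func (s *server) linearizableReadNotify(ctx context.Context) error {
    s.mu.RLock()
    nc := s.readNotf // join the current batch
    s.mu.RUnlock()

    // ask the read loop to start a round, unless one is already pending
    select {
    case s.readwait <- struct{}{}:
    default:
    }

    select {
    case <-nc.c: // the batch's ReadIndex round completed
        return nc.err
    case <-ctx.Done():
        return ctx.Err()
    }
}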

@xiang90 (Contributor, Author) commented Aug 19, 2016

@heyitsanthony

Clock-independent version:

./benchmark --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 range --total=100000 foo
bench with linearizable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%9s

Summary:
  Total:        9.5157 secs.
  Slowest:      1.0616 secs.
  Fastest:      0.0005 secs.
  Average:      0.0079 secs.
  Stddev:       0.0366 secs.
  Requests/sec: 10508.9396

~10% slower than the clock-dependent one, but acceptable.

If the latency between the leader and followers is high, though, the difference can be significant, since the clock-independent version waits for a quorum of heartbeat responses before serving each batch of reads.

@mitake (Contributor) commented Aug 24, 2016

@xiang90 Does this PR implement the optimization described in section 6.4 of the Raft thesis? If so, I can close my PR (#5912), because it is almost the same as this PR (I noticed this based on the discussion on raft-dev).

@siddontang (Contributor) commented:

Hi @xiang90,
Do you plan to use this to speed up reads and replace quorum reads?

@gyuho (Contributor) commented Aug 24, 2016

@mitake Yeah, this is the etcd implementation of Raft thesis §6.4, "Processing read-only queries more efficiently," p. 72.
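
For readers following the thread: §6.4's read-only optimization, in a rough, self-contained Go sketch (illustrative names, not etcd's raft code). The leader answers reads without appending to the log by recording its commit index, confirming leadership with one heartbeat round, and waiting for the apply loop to catch up:

package readindex

import "errors"

// leaderState stands in for the leader's raft state.
type leaderState struct {
    commitIndex   uint64
    appliedIndex  func() uint64 // current applied index
    quorumAck     func() bool   // one heartbeat round acked by a quorum
    appliedNotify chan struct{} // pulsed as entries are applied
}

var errLostLeadership = errors.New("lost leadership")

// ReadIndex returns an index at or after which local reads are linearizable.
func (l *leaderState) ReadIndex() (uint64, error) {
    idx := l.commitIndex // 1. record the current commit index as the read index
    if !l.quorumAck() {  // 2. confirm leadership with one heartbeat round
        return 0, errLostLeadership
    }
    for l.appliedIndex() < idx { // 3. wait until the state machine applies idx
        <-l.appliedNotify
    }
    return idx, nil // 4. serve the read from local state; no log entry needed
}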

@mitake (Contributor) commented Aug 25, 2016

@gyuho I see, thanks!

@xiang90 force-pushed the readindex branch 6 times, most recently from 8d14c11 to b13a23a, on September 13, 2016 08:38
@xiang90 changed the title from "WIP etcdserver: initial read index implementation" to "etcdserver: initial read index implementation" on Sep 13, 2016

@xiang90 (Contributor, Author) commented Sep 13, 2016

@heyitsanthony All fixed, PTAL. I still need to add backward compatibility for this feature.

result, err := s.processInternalRaftRequest(ctx, pb.InternalRaftRequest{Range: r})
if err != nil {
    return nil, err
}
var resp *pb.RangeResponse

Contributor:

why shuffle this around?

Contributor:

nm, I see it falls through from linearizable to serializable once it gets the notify

@heyitsanthony (Contributor) commented:

Approach looks OK, but the synchronization/error handling is a little iffy.

@xiang90 (Contributor, Author) commented Sep 22, 2016

> the rate one appears on par or faster than the buffered way of doing it. Where's the latency going?

I will try with more testing and benchmarks.

@xiang90 (Contributor, Author) commented Sep 23, 2016

@heyitsanthony All fixed. I made some minor changes, and the performance of the unbuffered case improved, somewhat magically. I do not really understand why, but the result seems to match the simple benchmark you wrote. The test assumes that we can almost always batch the concurrent requests together. The unbuffered approach can still be slower than the other two solutions when some requests reach the etcd server at slightly different times and miss the batch (basically, if the requests end up divided into N groups, it consumes N times the resources). But the difference is not huge in most cases, so I guess we should just go with the simplest solution for now.
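
A sketch of the batching described above, reusing the shared-notifier shape from the review thread earlier; requestReadIndex is a placeholder for the raft ReadIndex round plus the wait for the applied index to catch up, not a real function in this PR:

// linearizableReadLoop drains readwait once per round, so every reader that
// arrives while a round is in flight joins that round's notifier: one raft
// round trip serves the whole batch.
func (s *server) linearizableReadLoop(requestReadIndex func() error) {
    for {
        <-s.readwait // block until at least one reader is waiting

        s.mu.Lock()
        nc := s.readNotf
        s.readNotf = newNotifier() // later arrivals form the next batch
        s.mu.Unlock()

        nc.notify(requestReadIndex()) // wake the whole batch with the result
    }
}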

@xiang90 (Contributor, Author) commented Sep 23, 2016

If this approach looks good, I will resolve the conflicts and fix the tests.

}
}

func (nc *notifier) close(err error) {

Contributor:

notify?

@heyitsanthony (Contributor) commented:

@xiang90 approach looks good. Thanks!

@xiang90 force-pushed the readindex branch 8 times, most recently from e1cb8e1 to b06b789, on September 26, 2016 10:46

@xiang90 (Contributor, Author) commented Sep 26, 2016

@heyitsanthony All fixed. PTAL.

@heyitsanthony (Contributor) left a review:

a few final nits

    return nc.err
case <-ctx.Done():
    return ctx.Err()
case <-s.done:

Contributor:

s.stopping

Contributor Author:

This is not a goroutine that the etcd server created; it is a per-request routine, so I assume we should use s.done?

Contributor:

OK, it doesn't really make a difference.

for !timeout && !done {
    select {
    case rs = <-s.r.readStateC:
        if !bytes.Equal(rs.RequestCtx, ctx) {

Contributor:

done = bytes.Equal(rs.RequestCtx, ctx)?
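
The suggestion in context, roughly; the timer and stop cases here are assumptions about the surrounding loop, not part of the excerpt:

for !timeout && !done {
    select {
    case rs = <-s.r.readStateC:
        done = bytes.Equal(rs.RequestCtx, ctx) // a matching ReadState ends the wait
    case <-timer.C:
        timeout = true
    case <-s.stopping:
        return
    }
}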

func TestCtlV3Elect(t *testing.T) { testCtl(t, testElect) }
func TestCtlV3Elect(t *testing.T) {
    for i := 0; ; i++ {
        fmt.Println(i)

Contributor:

stray debugging output?

@@ -86,6 +95,31 @@ type Authenticator interface {
}

func (s *EtcdServer) Range(ctx context.Context, r *pb.RangeRequest) (*pb.RangeResponse, error) {

Contributor:

also support l-read Txn?

Use read index to achieve l-read.

@xiang90 (Contributor, Author) commented Sep 27, 2016

@heyitsanthony Can we improve Txn in another pull request? It is more complicated than Range. We need to add some code to detect read-only Txns; then, if the Txn is serializable, it can access the local KV immediately, or else it needs to wait for the linearizable notify.
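
A sketch of the follow-up shape being described; isTxnReadonly, isTxnSerializable, and applyTxnLocally are assumed helper names, linearizableReadNotify stands for this PR's wait-for-notify step, and the result fields follow the Range path shown in the diff:

func (s *EtcdServer) Txn(ctx context.Context, r *pb.TxnRequest) (*pb.TxnResponse, error) {
    if isTxnReadonly(r) { // no puts or deletes anywhere in the Txn
        if !isTxnSerializable(r) {
            // linearizable read-only Txn: wait for the read-index notify first
            if err := s.linearizableReadNotify(ctx); err != nil {
                return nil, err
            }
        }
        return s.applyTxnLocally(r) // either way, then serve from the local KV
    }
    // Txns containing writes keep going through the raft proposal path
    result, err := s.processInternalRaftRequest(ctx, pb.InternalRaftRequest{Txn: r})
    if err != nil {
        return nil, err
    }
    return result.resp.(*pb.TxnResponse), result.err
}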

@heyitsanthony (Contributor) commented:

@xiang90 OK, we can defer the Txn stuff, but it needs to go in before 3.1.

@heyitsanthony (Contributor) commented:

lgtm

@xiang90 merged commit 150576f into etcd-io:master on Sep 27, 2016
@xiang90 deleted the readindex branch on September 27, 2016 16:51

@ericpai commented Mar 15, 2017

Will this improvement be backported to v2.3.x?

@heyitsanthony (Contributor) commented:

@ericpai no

@mitake mentioned this pull request on Apr 1, 2017