Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

G2-item anomaly with master kills #2125

Closed
aphyr opened this issue Aug 22, 2019 · 5 comments

Comments

@aphyr
Copy link

commented Aug 22, 2019

In rare cases (1 in ~100 hours of testing), YugaByte DB 1.3.1.0 can exhibit G2 (anti-dependency cycles) in the Jepsen append test workload.

Let:
  T1 = {:type :ok, :f :txn, :value [[:r 1125 nil] [:r 1122 [5]] [:r 1124 nil] [:r 1121 [31]]], :process 1640, :time 198131935175, :op-timing ["2019-08-21_23:14:08.700" "2019-08-21_23:14:08.729"], :index 48754}
  T2 = {:type :ok, :f :txn, :value [[:r 1123 [1]] [:r 1121 nil] [:r 1119 nil] [:append 1122 11] [:r 1117 nil]], :process 1619, :time 198218022005, :op-timing ["2019-08-21_23:14:00.789" "2019-08-21_23:14:08.815"], :index 48792}
  T3 = {:type :ok, :f :txn, :value [[:append 1121 31] [:append 1119 32] [:r 1123 [1]]], :process 1571, :time 198113058388, :op-timing ["2019-08-21_23:14:08.671" "2019-08-21_23:14:08.710"], :index 48744}

Then:
  - T1 < T2, because T1 did not observe T2's append of 11 to 1122.
  - T2 < T3, because T2 observed the initial (nil) state of 1121, which T3 created by appending 31.
  - However, T3 < T1, because T1 observed T3's append of 31 to key 1121: a contradiction!

This anomaly occurred during a kill-master and start-master combo: we killed all three masters from 14:00 to 14:01:

2019-08-21 23:14:00,731{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :kill-master    nil
2019-08-21 23:14:01,139{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :kill-master    {"ec2-54-172-38-176.compute-1.amazonaws.com" "", "ec2-35-173-249-185.compute-1.amazonaws.com" "", "ec2-54-226-85-143.compute-1.amazonaws.com" ""}

... and started those nodes from 14:06 to 14:25:

2019-08-21 23:14:06,049{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-master   {"ec2-54-226-85-143.c
ompute-1.amazonaws.com" "", "ec2-54-172-38-176.compute-1.amazonaws.com" "", "ec2-35-173-249-185.compute-1.amazonaws.com" ""}
2019-08-21 23:14:25,251{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :kill-master    nil

Full logs are here.

This anomaly occurred in a 1200 second test with master kills, on a 5-node EC2 cluster. You can reproduce it (extremely slowly) using Jepsen 47fba9e4c5e8477a51e24a5185703bcb596b52ab, and running:

lein run test-all --username admin --nodes-file ~/nodes --os debian -w ysql/append --concurrency 3n --time-limit 1200 --test-count 5 --version 1.3.1.0
@aphyr

This comment has been minimized.

Copy link
Author

commented Aug 22, 2019

I've been exploring different techniques to speed reproduction, and I have a second example of G2 for you:

Let:
  T1 = {:type :ok, :f :txn, :value [[:append 63 923] [:r 61 [2 3 11 12 7 13 14 15 17 22 23 27 29 35 39 40 42 47 43 49 50 62 64 66 67 74 75 77 76 78 83 82 84 86 87 90 92 93 94 102 101 103 104 105 108 109 110 116 118 119 121 120 134 135 136 138 139 140 141 143 144 148 153 157 159 161 162 166 170 171 172 173 176 180 183 182 186 187 188 191 194 200 201 202 203 209 213 214 215 216 212 217 218 220 222 225 227 230 226 231 229 234 236 240 239 241 242 249 250 254 256 258 257 260 265 266 267 271 275 276 277 279 281 283 284 288 285 290 291 292 293 297 298 302 303 305 308 309 310 314 315 316 317 318 326 329 330 333 334 336 338 339 340 344 346 345 348 349 350 351 353 359 360 363 364 365 367 370 372 368 375 377 379 398 401 403 404 405 406 407 408 409 412 414 415 416 418 419 423 424 425 426 429 427 428 431 433 449 451 485 487 488 489 490 493 494 495 499 496 502 507 511 512 509 522 524 526 529 555 561 568 573 576 603 605 604 606 607 608 611 612 625 628 629 630 638 639 635 642 643 644 650 651 652 655 658 663 664 665 666 667 670 671 675 676 678 681 684 686 696 697 698 699 709 710 711 712 713 716 717 718 721 750 753 754 755 756 757 760 761 763 766 767 765 770 775 778 781 783 789 792 794 795 797 799 801 802 803 805 806 808 809 810 813 815 812 818 820 823 824 828 829 831 832 849 855 856 857 858 859 868 869 880 886 887 893 894 900 901 902 903 904 905 908 910 912 913 914 916 919 921 922 923 929 930 931 937 939 942]]], :process 2805, :time 976981179523, :op-timing ["2019-08-22_18:23:14.755" "2019-08-22_18:23:15.140"], :index 88230}
  T2 = {:type :ok, :f :txn, :value [[:append 62 896] [:append 61 958] [:r 63 [6 8 9 10 11 17 30 31 32 33 34 35 36 39 41 42 54 55 57 58 59 63 65 70 71 74 75 76 80 91 96 98 99 100 108 109 114 115 117 131 132 133 134 137 138 140 141 144 146 155 156 157 158 161 160 166 169 171 170 174 175 176 177 184 186 187 196 197 202 205 203 209 213 215 217 219 224 226 237 240 241 248 252 253 244 251 255 257 261 262 264 260 266 268 269 265 271 273 274 276 277 279 281 287 286 293 294 295 296 297 298 299 304 300 307 308 310 313 315 316 317 319 322 326 330 328 329 334 335 336 337 341 342 344 345 346 347 348 349 350 351 352 354 356 360 361 362 363 364 365 366 367 370 372 373 374 375 386 390 391 398 399 400 403 406 407 408 409 411 412 413 414 418 419 420 421 425 427 428 431 436 439 440 446 447 449 464 466 468 470 471 475 476 477 481 482 484 486 487 488 492 491 493 494 499 504 508 509 563 564 566 572 575 580 583 584 588 589 590 591 592 594 596 598 603 597 605 608 609 611 626 631 633 641 642 643 646 648 651 652 653 649 654 655 656 657 658 659 661 677 678 682 687 689 690 697 708 709 711 712 713 714 717 719 720 718 723 727 738 739 740 741 745 746 765 767 768 769 770 773 776 777 774 780 781 782 785 788 790 787 791 793 795 799 802 805 810 809 811 812 813 814 816 817 821 822 823 820 828 830 831 835 872 874 876 885 890 891 895 901 899 902 903 905 907 908]]], :process 3032, :time 977024498559, :op-timing ["2019-08-22_18:23:15.106" "2019-08-22_18:23:15.183"], :index 88237}

Then:
  - T1 < T2, because T1 did not observe T2's append of 958 to 61.
  - However, T2 < T1, because T2 did not observe T1's append of 923 to 63: a contradiction!

We killed all three masters:

2019-08-22 18:22:34,180{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :kill-master    nil
2019-08-22 18:22:34,591{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :kill-master    {"ec2-54-226-85-143.compute-1.amazonaws.com" "", "e
c2-35-173-249-185.compute-1.amazonaws.com" "", "ec2-54-172-38-176.compute-1.amazonaws.com" ""}

Then restarted all three nodes, and hit both of these errors three seconds after that start operation completed:

2019-08-22 18:23:11,694{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-master   nil
2019-08-22 18:23:12,001{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-master   {"ec2-54-226-85-143.compute-1.amazonaws.com" "", "ec2-54-172-38-176.compute-1.amazonaws.com" "", "ec2-35-173-249-185.compute-1.amazonaws.com" ""}

Full logs are here: 20190822T180648.000Z.zip

And can be reproduced on Jepsen 64ea9ea via:

lein run test --username admin --nodes-file ~/nodes --os debian -w ysql/append --concurrency 3n --time-limit 1200 --test-count 10 --version 1.3.1.0 --nemesis kill-master --nemesis-interval 30

This time it took roughly 4 test runs. Following @spolitov's hunch that this might be related to a timing anomaly, I lowered the clock skew threshold to 1 us (from 50 ms). I also increased the number of writes per key to give us a better picture of how the system evolves over time (which makes the examples much longer), and added some delays to increase the fraction of successful transactions; maybe 90% were aborting in earlier runs.

@aphyr aphyr changed the title G2 anomaly with master kills G2-item anomaly with master kills Aug 22, 2019

@aphyr

This comment has been minimized.

Copy link
Author

commented Aug 23, 2019

With improved append test tuning, and more frequent crashes affecting all master processes, we can reproduce a G2 anomalies in most 500-second tests. With Jepsen 3ee5900213bfe5b6eecbaf663ebab422a6f6eaae, try:

lein run test --username admin --nodes-file ~/nodes --os debian -w ysql/append --concurrency 3n --time-limit 500 --test-count 1 --version 1.3.1.0 --nemesis kill-master --nemesis-interval 5
@aphyr

This comment has been minimized.

Copy link
Author

commented Aug 23, 2019

I can also reproduce G2 anomalies with master pauses! 20190823T211304.000Z.zip

spolitov added a commit that referenced this issue Aug 24, 2019
[#2125] Avoid picking read time for the serializable transactions
Summary:
For backward compatibility, we have a list of capabilities, supported by tserver.
Right after master restart, while it did not receive heartbeats from tservers it does not know their capabilities.
So, when meta cache is updated tserver with empty capabilities could be stored in it.

Also, we have a feature that read time for the transaction with snapshot isolation is picked at tablet server,
while processing the first request from this transaction.

But, when iterating with tablet server that does not support picking read time,
we set transaction read time at the proxy layer (i.e. YCQL or YSQL proxy).
This particular code contained a bug and also set read time for the transaction with serializable isolation.

Read time for the transaction with serializable isolation should not be set, since
if it would be set transaction with serializable isolation could miss writes made by committed transactions.

The scenario that happened in #2125:
1) All masters are turned off.
2) Meta cache is invalidated and right after master restarted client fetches tablet server information with empty capabilities.
3) When the client tries to send the next transaction request, the buggy code is executed and read time is picked for the transaction with serializable isolation
(it is T2 transaction in #2125).
4) Operation executed for this transaction missed values that were recently committed by other transactions. Following by incorrect state.

So actually it is not just serializable violation but could cause more serious consistency bugs.

Fixed by avoid setting read time for the transaction with serializable isolation.

Extended PgLibPqTest.OnConflict test:
1) Add reads to transaction
2) Find order cycles

Added PgLibPqTest.OnConlictWithKillMaster.

Test Plan: ybd --gtest_filter PgLibPqTest.OnConflict -n 100

Reviewers: mikhail

Reviewed By: mikhail

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7108
@aphyr

This comment has been minimized.

Copy link
Author

commented Aug 27, 2019

It looks to me like this is fixed with yugabyte-1.3.1.2-b1-release-centos-x86_64, with mbautin@3e09352!

@kmuthukk

This comment has been minimized.

Copy link
Collaborator

commented Aug 27, 2019

Thx @aphyr !

@kmuthukk kmuthukk closed this Aug 27, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.