Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dragonfly is about 10x slower than Redis when used by JuiceFS #1285

Closed
zhijian-pro opened this issue May 24, 2023 · 27 comments · Fixed by #1375 or #1777
Closed

Dragonfly is about 10x slower than Redis when used by JuiceFS #1285

zhijian-pro opened this issue May 24, 2023 · 27 comments · Fixed by #1375 or #1777
Assignees
Labels
bug Something isn't working

Comments

@zhijian-pro
Copy link

zhijian-pro commented May 24, 2023

JuiceFS supports using redis as the metadata engine, and we had feedback from the community that they wanted to use dragonfly to replace redis, but after testing, I found that dragonfly is very unstable. It is about 10 times slower than the average redis latency.(calculated by the mean of the latency) I'm not sure if it's my usage or dragonfly's own problem. I would like to ask for help to determine the cause.

As you can see on this graph, Dragonfly's performance is similar to that of Redis under normal circumstances, but there are often exceptionally slow requests, resulting in very large latency fluctuations.

image

To Reproduce

  1. install dragonfly by binary, listen on 127.0.0.1:6378
  2. start dragonfly
    ./dragonfly --logtostderr -dbnum 10 --bind 127.0.0.1 --port 6378
  3. install redis by binary, listen on 127.0.0.1:6379
  4. install golang first
     wget https://go.dev/dl/go1.20.4.linux-amd64.tar.gz
     rm -rf /usr/local/go && tar -C /usr/local -xzf go1.20.4.linux-amd64.tar.gz
     export PATH=$PATH:/usr/local/go/bin
    
  5. git clone -b test_for_dragonfly https://github.com/juicedata/juicefs.git && cd juicefs
  6. go mod tidy
  7. test redis
    addr=redis://127.0.0.1:6379/1 go test -count=1 -v ./pkg/meta/... -run=TestDgfAndRedis
root@bench2:~/juicefs# addr=redis://127.0.0.1:6379/2 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/05/24 04:12:16.195448 juicefs[10992] <INFO>: Meta address: redis://127.0.0.1:6379/2 [interface.go:494]
2023/05/24 04:12:16.196587 juicefs[10992] <INFO>: Ping redis latency: 97.831µs [redis.go:3523]
2023/05/24 04:12:16.197723 juicefs[10992] <INFO>: Create session 1 OK with version: 1.1.0-dev+unknown [base.go:474]
    benchmarks_test.go:673: number:  0: cost: 494 us
    benchmarks_test.go:673: number:  1: cost: 523 us
    benchmarks_test.go:673: number:  2: cost: 507 us
    benchmarks_test.go:673: number:  3: cost: 493 us
    benchmarks_test.go:673: number:  4: cost: 497 us
    benchmarks_test.go:673: number:  5: cost: 555 us
    benchmarks_test.go:673: number:  6: cost: 544 us
    benchmarks_test.go:673: number:  7: cost: 514 us
    benchmarks_test.go:673: number:  8: cost: 511 us
    benchmarks_test.go:673: number:  9: cost: 524 us
    benchmarks_test.go:673: number: 10: cost: 487 us
    benchmarks_test.go:673: number: 11: cost: 485 us
    benchmarks_test.go:673: number: 12: cost: 475 us
    benchmarks_test.go:673: number: 13: cost: 477 us
    benchmarks_test.go:673: number: 14: cost: 474 us
    benchmarks_test.go:673: number: 15: cost: 484 us
    benchmarks_test.go:673: number: 16: cost: 475 us
    benchmarks_test.go:673: number: 17: cost: 509 us
    benchmarks_test.go:673: number: 18: cost: 513 us
    benchmarks_test.go:673: number: 19: cost: 477 us
    benchmarks_test.go:673: number: 20: cost: 504 us
    benchmarks_test.go:673: number: 21: cost: 486 us
    benchmarks_test.go:673: number: 22: cost: 476 us
    benchmarks_test.go:673: number: 23: cost: 475 us
    benchmarks_test.go:673: number: 24: cost: 475 us
    benchmarks_test.go:673: number: 25: cost: 528 us
    benchmarks_test.go:673: number: 26: cost: 510 us
    benchmarks_test.go:673: number: 27: cost: 488 us
    benchmarks_test.go:673: number: 28: cost: 469 us
    benchmarks_test.go:673: number: 29: cost: 469 us
    benchmarks_test.go:673: number: 30: cost: 476 us
    benchmarks_test.go:673: number: 31: cost: 470 us
    benchmarks_test.go:673: number: 32: cost: 468 us
    benchmarks_test.go:673: number: 33: cost: 474 us
    benchmarks_test.go:673: number: 34: cost: 497 us
    benchmarks_test.go:673: number: 35: cost: 522 us
    benchmarks_test.go:673: number: 36: cost: 475 us
    benchmarks_test.go:673: number: 37: cost: 469 us
    benchmarks_test.go:673: number: 38: cost: 481 us
    benchmarks_test.go:673: number: 39: cost: 479 us
    benchmarks_test.go:673: number: 40: cost: 464 us
    benchmarks_test.go:673: number: 41: cost: 495 us
    benchmarks_test.go:673: number: 42: cost: 470 us
    benchmarks_test.go:673: number: 43: cost: 478 us
    benchmarks_test.go:673: number: 44: cost: 485 us
    benchmarks_test.go:673: number: 45: cost: 595 us
    benchmarks_test.go:673: number: 46: cost: 542 us
    benchmarks_test.go:673: number: 47: cost: 514 us
    benchmarks_test.go:673: number: 48: cost: 474 us
    benchmarks_test.go:673: number: 49: cost: 484 us
    benchmarks_test.go:673: number: 50: cost: 490 us
    benchmarks_test.go:673: number: 51: cost: 490 us
    benchmarks_test.go:673: number: 52: cost: 478 us
    benchmarks_test.go:673: number: 53: cost: 493 us
    benchmarks_test.go:673: number: 54: cost: 491 us
    benchmarks_test.go:673: number: 55: cost: 541 us
    benchmarks_test.go:673: number: 56: cost: 489 us
    benchmarks_test.go:673: number: 57: cost: 487 us
    benchmarks_test.go:673: number: 58: cost: 480 us
    benchmarks_test.go:673: number: 59: cost: 486 us
    benchmarks_test.go:673: number: 60: cost: 491 us
    benchmarks_test.go:673: number: 61: cost: 487 us
    benchmarks_test.go:673: number: 62: cost: 484 us
    benchmarks_test.go:673: number: 63: cost: 489 us
    benchmarks_test.go:673: number: 64: cost: 493 us
    benchmarks_test.go:673: number: 65: cost: 542 us
    benchmarks_test.go:673: number: 66: cost: 492 us
    benchmarks_test.go:673: number: 67: cost: 512 us
    benchmarks_test.go:673: number: 68: cost: 493 us
    benchmarks_test.go:673: number: 69: cost: 490 us
    benchmarks_test.go:673: number: 70: cost: 492 us
    benchmarks_test.go:673: number: 71: cost: 478 us
    benchmarks_test.go:673: number: 72: cost: 494 us
    benchmarks_test.go:673: number: 73: cost: 510 us
    benchmarks_test.go:673: number: 74: cost: 512 us
    benchmarks_test.go:673: number: 75: cost: 501 us
    benchmarks_test.go:673: number: 76: cost: 475 us
    benchmarks_test.go:673: number: 77: cost: 483 us
    benchmarks_test.go:673: number: 78: cost: 481 us
    benchmarks_test.go:673: number: 79: cost: 584 us
    benchmarks_test.go:673: number: 80: cost: 486 us
    benchmarks_test.go:673: number: 81: cost: 518 us
    benchmarks_test.go:673: number: 82: cost: 521 us
    benchmarks_test.go:673: number: 83: cost: 476 us
    benchmarks_test.go:673: number: 84: cost: 570 us
    benchmarks_test.go:673: number: 85: cost: 577 us
    benchmarks_test.go:673: number: 86: cost: 479 us
    benchmarks_test.go:673: number: 87: cost: 475 us
    benchmarks_test.go:673: number: 88: cost: 501 us
    benchmarks_test.go:673: number: 89: cost: 514 us
    benchmarks_test.go:673: number: 90: cost: 485 us
    benchmarks_test.go:673: number: 91: cost: 489 us
    benchmarks_test.go:673: number: 92: cost: 516 us
    benchmarks_test.go:673: number: 93: cost: 492 us
    benchmarks_test.go:673: number: 94: cost: 501 us
    benchmarks_test.go:673: number: 95: cost: 478 us
    benchmarks_test.go:673: number: 96: cost: 473 us
    benchmarks_test.go:673: number: 97: cost: 478 us
    benchmarks_test.go:673: number: 98: cost: 577 us
    benchmarks_test.go:673: number: 99: cost: 478 us
    benchmarks_test.go:676: --------------------test redis://127.0.0.1:6379/2--------------------
    benchmarks_test.go:677: mean: 497.02, standard deviation: 27.18
--- PASS: TestDgfAndRedis (0.05s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	0.070s
  1. test dragonfly
    addr=redis://127.0.0.1:6378/1 go test -count=1 -v ./pkg/meta/... -run=TestDgfAndRedis
root@bench2:~/juicefs# addr=redis://127.0.0.1:6378/2 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/05/24 04:11:15.817867 juicefs[10886] <INFO>: Meta address: redis://127.0.0.1:6378/2 [interface.go:494]
2023/05/24 04:11:15.819105 juicefs[10886] <INFO>: Ping redis latency: 86.584µs [redis.go:3523]
2023/05/24 04:11:15.819355 juicefs[10886] <WARNING>: Existing volume will be overwrited: {
  "Name": "benchmarkAll",
  "UUID": "",
  "Storage": "",
  "Bucket": "",
  "BlockSize": 0,
  "TrashDays": 0
} [config.go:96]
2023/05/24 04:11:15.820231 juicefs[10886] <INFO>: Create session 2 OK with version: 1.1.0-dev+unknown [base.go:474]
2023/05/24 04:11:15.820845 juicefs[10886] <WARNING>: unexpected error for lookup: ERR Error running script (call to 8526c64984678fc27f67f44755c8ba7dcdf7ea60): @user_script:12: script tried accessing undeclared key [redis.go:695]
    benchmarks_test.go:673: number:  0: cost: 714 us
    benchmarks_test.go:673: number:  1: cost: 716 us
    benchmarks_test.go:673: number:  2: cost: 722 us
    benchmarks_test.go:673: number:  3: cost: 732 us
    benchmarks_test.go:673: number:  4: cost: 690 us
    benchmarks_test.go:673: number:  5: cost: 698 us
    benchmarks_test.go:673: number:  6: cost: 731 us
    benchmarks_test.go:673: number:  7: cost: 673 us
    benchmarks_test.go:673: number:  8: cost: 615 us
    benchmarks_test.go:673: number:  9: cost: 714 us
    benchmarks_test.go:673: number: 10: cost: 717 us
    benchmarks_test.go:673: number: 11: cost: 700 us
    benchmarks_test.go:673: number: 12: cost: 710 us
    benchmarks_test.go:673: number: 13: cost: 6720 us
    benchmarks_test.go:673: number: 14: cost: 43235 us
    benchmarks_test.go:673: number: 15: cost: 692 us
    benchmarks_test.go:673: number: 16: cost: 694 us
    benchmarks_test.go:673: number: 17: cost: 698 us
    benchmarks_test.go:673: number: 18: cost: 715 us
    benchmarks_test.go:673: number: 19: cost: 733 us
    benchmarks_test.go:673: number: 20: cost: 856 us
    benchmarks_test.go:673: number: 21: cost: 717 us
    benchmarks_test.go:673: number: 22: cost: 683 us
    benchmarks_test.go:673: number: 23: cost: 672 us
    benchmarks_test.go:673: number: 24: cost: 640 us
    benchmarks_test.go:673: number: 25: cost: 726 us
    benchmarks_test.go:673: number: 26: cost: 703 us
    benchmarks_test.go:673: number: 27: cost: 709 us
    benchmarks_test.go:673: number: 28: cost: 699 us
    benchmarks_test.go:673: number: 29: cost: 2740 us
    benchmarks_test.go:673: number: 30: cost: 43136 us
    benchmarks_test.go:673: number: 31: cost: 705 us
    benchmarks_test.go:673: number: 32: cost: 618 us
    benchmarks_test.go:673: number: 33: cost: 708 us
    benchmarks_test.go:673: number: 34: cost: 740 us
    benchmarks_test.go:673: number: 35: cost: 713 us
    benchmarks_test.go:673: number: 36: cost: 704 us
    benchmarks_test.go:673: number: 37: cost: 676 us
    benchmarks_test.go:673: number: 38: cost: 672 us
    benchmarks_test.go:673: number: 39: cost: 696 us
    benchmarks_test.go:673: number: 40: cost: 705 us
    benchmarks_test.go:673: number: 41: cost: 715 us
    benchmarks_test.go:673: number: 42: cost: 44201 us
    benchmarks_test.go:673: number: 43: cost: 710 us
    benchmarks_test.go:673: number: 44: cost: 698 us
    benchmarks_test.go:673: number: 45: cost: 728 us
    benchmarks_test.go:673: number: 46: cost: 667 us
    benchmarks_test.go:673: number: 47: cost: 724 us
    benchmarks_test.go:673: number: 48: cost: 723 us
    benchmarks_test.go:673: number: 49: cost: 724 us
    benchmarks_test.go:673: number: 50: cost: 739 us
    benchmarks_test.go:673: number: 51: cost: 710 us
    benchmarks_test.go:673: number: 52: cost: 702 us
    benchmarks_test.go:673: number: 53: cost: 708 us
    benchmarks_test.go:673: number: 54: cost: 678 us
    benchmarks_test.go:673: number: 55: cost: 610 us
    benchmarks_test.go:673: number: 56: cost: 709 us
    benchmarks_test.go:673: number: 57: cost: 692 us
    benchmarks_test.go:673: number: 58: cost: 690 us
    benchmarks_test.go:673: number: 59: cost: 44570 us
    benchmarks_test.go:673: number: 60: cost: 712 us
    benchmarks_test.go:673: number: 61: cost: 708 us
    benchmarks_test.go:673: number: 62: cost: 672 us
    benchmarks_test.go:673: number: 63: cost: 638 us
    benchmarks_test.go:673: number: 64: cost: 4791 us
    benchmarks_test.go:673: number: 65: cost: 44406 us
    benchmarks_test.go:673: number: 66: cost: 717 us
    benchmarks_test.go:673: number: 67: cost: 605 us
    benchmarks_test.go:673: number: 68: cost: 724 us
    benchmarks_test.go:673: number: 69: cost: 710 us
    benchmarks_test.go:673: number: 70: cost: 706 us
    benchmarks_test.go:673: number: 71: cost: 705 us
    benchmarks_test.go:673: number: 72: cost: 646 us
    benchmarks_test.go:673: number: 73: cost: 679 us
    benchmarks_test.go:673: number: 74: cost: 694 us
    benchmarks_test.go:673: number: 75: cost: 699 us
    benchmarks_test.go:673: number: 76: cost: 689 us
    benchmarks_test.go:673: number: 77: cost: 44305 us
    benchmarks_test.go:673: number: 78: cost: 760 us
    benchmarks_test.go:673: number: 79: cost: 694 us
    benchmarks_test.go:673: number: 80: cost: 716 us
    benchmarks_test.go:673: number: 81: cost: 6768 us
    benchmarks_test.go:673: number: 82: cost: 47010 us
    benchmarks_test.go:673: number: 83: cost: 748 us
    benchmarks_test.go:673: number: 84: cost: 2813 us
    benchmarks_test.go:673: number: 85: cost: 728 us
    benchmarks_test.go:673: number: 86: cost: 700 us
    benchmarks_test.go:673: number: 87: cost: 705 us
    benchmarks_test.go:673: number: 88: cost: 698 us
    benchmarks_test.go:673: number: 89: cost: 645 us
    benchmarks_test.go:673: number: 90: cost: 723 us
    benchmarks_test.go:673: number: 91: cost: 691 us
    benchmarks_test.go:673: number: 92: cost: 712 us
    benchmarks_test.go:673: number: 93: cost: 721 us
    benchmarks_test.go:673: number: 94: cost: 691 us
    benchmarks_test.go:673: number: 95: cost: 669 us
    benchmarks_test.go:673: number: 96: cost: 44559 us
    benchmarks_test.go:673: number: 97: cost: 677 us
    benchmarks_test.go:673: number: 98: cost: 680 us
    benchmarks_test.go:673: number: 99: cost: 804 us
    benchmarks_test.go:676: --------------------test redis://127.0.0.1:6378/2--------------------
    benchmarks_test.go:677: mean: 4402.53, standard deviation: 11906.39
--- PASS: TestDgfAndRedis (2.45s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	2.469s

Expected behavior
Performance on the same order of magnitude as redis or better

Environment (please complete the following information):

  • OS: [ubuntu 20.04]
  • Kernel:
    Linux bench2 5.4.0-1029-aws renamenx is not supported #30-Ubuntu SMP Tue Oct 20 10:06:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Dragonfly Version:
    dragonfly v1.3.0-f80afca9c23e2f30373437520a162c591eaa2005
    build time: 2023-05-18 07:11:31
@zhijian-pro zhijian-pro added the bug Something isn't working label May 24, 2023
@zhijian-pro zhijian-pro changed the title Testing found that dragonfly is about 400 times less stable than redis DragonFlyDB is about 10x slower than Redis when used by JuiceFS May 24, 2023
@zhijian-pro zhijian-pro changed the title DragonFlyDB is about 10x slower than Redis when used by JuiceFS DragonflyDB is about 10x slower than Redis when used by JuiceFS May 24, 2023
@zhijian-pro zhijian-pro changed the title DragonflyDB is about 10x slower than Redis when used by JuiceFS Dragonfly is about 10x slower than Redis when used by JuiceFS May 24, 2023
@romange
Copy link
Collaborator

romange commented May 24, 2023

@zhijian-pro how many cpus do you have on your machine? Is your memory store supposed to run locally? i.e. to be colocated with JuiceFS?

@royjacobson
Copy link
Contributor

Hi, thanks for the clear reproduction instructions! I've looked at the latency problem and could reproduce it. I saw constant latencies of ~41ms, not spikes. What I saw looked like some kind of network misconfiguration in the TCP socket that causes packets to be delayed until dragonfly gets an ACK from JuiceFS:

image

When I switched to using unix sockets with I saw latencies of ~700us and no spikes.

mkfifo /tmp/my_unix_socket
dragonfly --unixsocket /tmp/my_unix_socket
addr=unix:///tmp/my_unix_socket?db=1 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis

So that's a temporary workaround and, if you plan to deploy locally, probably a better alternative anyway. We'll try to understand if that's some socket configuration problem we can fix ourselves.

@royjacobson
Copy link
Contributor

Alternatively, giving dragonfly --tcp_nodelay also seems to solve the problem.

@zhijian-pro
Copy link
Author

@romange In fact, this phenomenon was discovered during a multi-machine test where redis and Dragonfly were running simultaneously on a 4c8g linux and the juicefs test program was running on another 4c8g machine. I simplified the model for that test to make it easier for everyone its easier to reproduce the problem.

But I can confirm that redis, dragonfly and juicefs all have enough cpu and memory. This should not be a resource allocation issue.

@zhijian-pro
Copy link
Author

@royjacobson I tested it, and after adding --tcp_nodelay to Dragonfly, the result is much better than before.

root@bench2:~/juicefs# addr=redis://10.0.101.xx:6378/3 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/05/24 12:17:07.307180 juicefs[1145] <INFO>: Meta address: redis://10.0.101.xx:6378/3 [interface.go:494]
2023/05/24 12:17:07.318705 juicefs[1145] <INFO>: Ping redis latency: 90.234µs [redis.go:3523]
2023/05/24 12:17:07.321449 juicefs[1145] <INFO>: Create session 1 OK with version: 1.1.0-dev+unknown [base.go:474]
    benchmarks_test.go:673: number:  0: cost: 595 us
    benchmarks_test.go:673: number:  1: cost: 675 us
    benchmarks_test.go:673: number:  2: cost: 703 us
    benchmarks_test.go:673: number:  3: cost: 652 us
    benchmarks_test.go:673: number:  4: cost: 666 us
    benchmarks_test.go:673: number:  5: cost: 583 us
    benchmarks_test.go:673: number:  6: cost: 570 us
    benchmarks_test.go:673: number:  7: cost: 552 us
    benchmarks_test.go:673: number:  8: cost: 620 us
    benchmarks_test.go:673: number:  9: cost: 660 us
    benchmarks_test.go:673: number: 10: cost: 691 us
    benchmarks_test.go:673: number: 11: cost: 634 us
    benchmarks_test.go:673: number: 12: cost: 673 us
    benchmarks_test.go:673: number: 13: cost: 658 us
    benchmarks_test.go:673: number: 14: cost: 626 us
    benchmarks_test.go:673: number: 15: cost: 672 us
    benchmarks_test.go:673: number: 16: cost: 659 us
    benchmarks_test.go:673: number: 17: cost: 652 us
    benchmarks_test.go:673: number: 18: cost: 654 us
    benchmarks_test.go:673: number: 19: cost: 666 us
    benchmarks_test.go:673: number: 20: cost: 646 us
    benchmarks_test.go:673: number: 21: cost: 598 us
    benchmarks_test.go:673: number: 22: cost: 529 us
    benchmarks_test.go:673: number: 23: cost: 620 us
    benchmarks_test.go:673: number: 24: cost: 655 us
    benchmarks_test.go:673: number: 25: cost: 684 us
    benchmarks_test.go:673: number: 26: cost: 695 us
    benchmarks_test.go:673: number: 27: cost: 670 us
    benchmarks_test.go:673: number: 28: cost: 641 us
    benchmarks_test.go:673: number: 29: cost: 668 us
    benchmarks_test.go:673: number: 30: cost: 667 us
    benchmarks_test.go:673: number: 31: cost: 671 us
    benchmarks_test.go:673: number: 32: cost: 673 us
    benchmarks_test.go:673: number: 33: cost: 711 us
    benchmarks_test.go:673: number: 34: cost: 676 us
    benchmarks_test.go:673: number: 35: cost: 699 us
    benchmarks_test.go:673: number: 36: cost: 606 us
    benchmarks_test.go:673: number: 37: cost: 583 us
    benchmarks_test.go:673: number: 38: cost: 627 us
    benchmarks_test.go:673: number: 39: cost: 733 us
    benchmarks_test.go:673: number: 40: cost: 715 us
    benchmarks_test.go:673: number: 41: cost: 666 us
    benchmarks_test.go:673: number: 42: cost: 680 us
    benchmarks_test.go:673: number: 43: cost: 655 us
    benchmarks_test.go:673: number: 44: cost: 654 us
    benchmarks_test.go:673: number: 45: cost: 679 us
    benchmarks_test.go:673: number: 46: cost: 666 us
    benchmarks_test.go:673: number: 47: cost: 683 us
    benchmarks_test.go:673: number: 48: cost: 669 us
    benchmarks_test.go:673: number: 49: cost: 657 us
    benchmarks_test.go:673: number: 50: cost: 686 us
    benchmarks_test.go:673: number: 51: cost: 1644 us
    benchmarks_test.go:673: number: 52: cost: 650 us
    benchmarks_test.go:673: number: 53: cost: 660 us
    benchmarks_test.go:673: number: 54: cost: 8705 us
    benchmarks_test.go:673: number: 55: cost: 693 us
    benchmarks_test.go:673: number: 56: cost: 649 us
    benchmarks_test.go:673: number: 57: cost: 656 us
    benchmarks_test.go:673: number: 58: cost: 699 us
    benchmarks_test.go:673: number: 59: cost: 675 us
    benchmarks_test.go:673: number: 60: cost: 655 us
    benchmarks_test.go:673: number: 61: cost: 684 us
    benchmarks_test.go:673: number: 62: cost: 671 us
    benchmarks_test.go:673: number: 63: cost: 675 us
    benchmarks_test.go:673: number: 64: cost: 675 us
    benchmarks_test.go:673: number: 65: cost: 679 us
    benchmarks_test.go:673: number: 66: cost: 686 us
    benchmarks_test.go:673: number: 67: cost: 579 us
    benchmarks_test.go:673: number: 68: cost: 574 us
    benchmarks_test.go:673: number: 69: cost: 612 us
    benchmarks_test.go:673: number: 70: cost: 719 us
    benchmarks_test.go:673: number: 71: cost: 668 us
    benchmarks_test.go:673: number: 72: cost: 673 us
    benchmarks_test.go:673: number: 73: cost: 661 us
    benchmarks_test.go:673: number: 74: cost: 656 us
    benchmarks_test.go:673: number: 75: cost: 680 us
    benchmarks_test.go:673: number: 76: cost: 658 us
    benchmarks_test.go:673: number: 77: cost: 677 us
    benchmarks_test.go:673: number: 78: cost: 662 us
    benchmarks_test.go:673: number: 79: cost: 684 us
    benchmarks_test.go:673: number: 80: cost: 684 us
    benchmarks_test.go:673: number: 81: cost: 661 us
    benchmarks_test.go:673: number: 82: cost: 611 us
    benchmarks_test.go:673: number: 83: cost: 580 us
    benchmarks_test.go:673: number: 84: cost: 622 us
    benchmarks_test.go:673: number: 85: cost: 658 us
    benchmarks_test.go:673: number: 86: cost: 663 us
    benchmarks_test.go:673: number: 87: cost: 678 us
    benchmarks_test.go:673: number: 88: cost: 678 us
    benchmarks_test.go:673: number: 89: cost: 659 us
    benchmarks_test.go:673: number: 90: cost: 663 us
    benchmarks_test.go:673: number: 91: cost: 688 us
    benchmarks_test.go:673: number: 92: cost: 690 us
    benchmarks_test.go:673: number: 93: cost: 705 us
    benchmarks_test.go:673: number: 94: cost: 671 us
    benchmarks_test.go:673: number: 95: cost: 669 us
    benchmarks_test.go:673: number: 96: cost: 667 us
    benchmarks_test.go:673: number: 97: cost: 615 us
    benchmarks_test.go:673: number: 98: cost: 641 us
    benchmarks_test.go:673: number: 99: cost: 634 us
    benchmarks_test.go:676: --------------------test redis://10.0.101.xx:6378/3--------------------
    benchmarks_test.go:677: mean: 747.19, standard deviation: 810.68
--- PASS: TestDgfAndRedis (0.09s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	0.107s

@royjacobson
Copy link
Contributor

For the record, the latency issue is reproducible with the following snippet:

import time, redis
r = redis.Redis()

def test_multi():
    r.execute_command("MULTI")
    r.execute_command(f"SET val1 whatever")
    start = time.time()
    r.execute_command("EXEC")
    lat_ms = 1000 * (time.time() - start)
    print(f"Latency: {lat_ms}")

romange added a commit that referenced this issue May 24, 2023
Fixes #1285

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 25, 2023
Fixes #1285

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 29, 2023
Also pull the latest helio.

Fixes #1285

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
@zhijian-pro
Copy link
Author

@romange @royjacobson I tested it manually using the main branch, but I don't think it works well, and there are still very slow requests.

  1. The effect of the current main branch using tcp_nodelay is worse than the previous version(now mean: 839.36, standard deviation: 1577.91, before mean: 747.19, standard deviation: 810.68).

  2. The effect of the current main branch without tcp_nodelay is still poor(now mean: 4326.63, standard deviation: 12204.32 before: mean: 4402.53, standard deviation: 11906.39)

Maybe I didn't compile correctly ?

# dragonfly version   build form main branch
root@bench1:~/repo/dragonfly/build-opt# ./dragonfly --version
dragonfly dev-0000000
build time: bigbang
# start dragonfly with tcp_nodelay on bench1 instance
$ ./dragonfly --logtostderr  -dbnum 10 --bind 0.0.0.0 --port 6378 --tcp_nodelay

# run test on bench2 instance 
root@bench2:~/juicefs# addr=redis://10.0.101.xx:6378/5 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/06/07 02:47:15.359327 juicefs[19937] <INFO>: Meta address: redis://10.0.101.xx:6378/5 [interface.go:494]
2023/06/07 02:47:15.370262 juicefs[19937] <INFO>: Ping redis latency: 77.046µs [redis.go:3523]
2023/06/07 02:47:15.371322 juicefs[19937] <INFO>: Create session 1 OK with version: 1.1.0-dev+unknown [base.go:474]
    benchmarks_test.go:673: number:  0: cost: 572 us
    benchmarks_test.go:673: number:  1: cost: 548 us
    benchmarks_test.go:673: number:  2: cost: 571 us
    benchmarks_test.go:673: number:  3: cost: 581 us
    benchmarks_test.go:673: number:  4: cost: 12658 us
    benchmarks_test.go:673: number:  5: cost: 576 us
    benchmarks_test.go:673: number:  6: cost: 575 us
    benchmarks_test.go:673: number:  7: cost: 560 us
    benchmarks_test.go:673: number:  8: cost: 536 us
    benchmarks_test.go:673: number:  9: cost: 589 us
    benchmarks_test.go:673: number: 10: cost: 575 us
    benchmarks_test.go:673: number: 11: cost: 572 us
    benchmarks_test.go:673: number: 12: cost: 562 us
    benchmarks_test.go:673: number: 13: cost: 561 us
    benchmarks_test.go:673: number: 14: cost: 561 us
    benchmarks_test.go:673: number: 15: cost: 577 us
    benchmarks_test.go:673: number: 16: cost: 570 us
    benchmarks_test.go:673: number: 17: cost: 565 us
    benchmarks_test.go:673: number: 18: cost: 618 us
    benchmarks_test.go:673: number: 19: cost: 579 us
    benchmarks_test.go:673: number: 20: cost: 574 us
    benchmarks_test.go:673: number: 21: cost: 509 us
    benchmarks_test.go:673: number: 22: cost: 557 us
    benchmarks_test.go:673: number: 23: cost: 557 us
    benchmarks_test.go:673: number: 24: cost: 527 us
    benchmarks_test.go:673: number: 25: cost: 488 us
    benchmarks_test.go:673: number: 26: cost: 511 us
    benchmarks_test.go:673: number: 27: cost: 543 us
    benchmarks_test.go:673: number: 28: cost: 557 us
    benchmarks_test.go:673: number: 29: cost: 540 us
    benchmarks_test.go:673: number: 30: cost: 576 us
    benchmarks_test.go:673: number: 31: cost: 557 us
    benchmarks_test.go:673: number: 32: cost: 570 us
    benchmarks_test.go:673: number: 33: cost: 580 us
    benchmarks_test.go:673: number: 34: cost: 557 us
    benchmarks_test.go:673: number: 35: cost: 603 us
    benchmarks_test.go:673: number: 36: cost: 568 us
    benchmarks_test.go:673: number: 37: cost: 553 us
    benchmarks_test.go:673: number: 38: cost: 531 us
    benchmarks_test.go:673: number: 39: cost: 489 us
    benchmarks_test.go:673: number: 40: cost: 545 us
    benchmarks_test.go:673: number: 41: cost: 572 us
    benchmarks_test.go:673: number: 42: cost: 502 us
    benchmarks_test.go:673: number: 43: cost: 470 us
    benchmarks_test.go:673: number: 44: cost: 533 us
    benchmarks_test.go:673: number: 45: cost: 572 us
    benchmarks_test.go:673: number: 46: cost: 551 us
    benchmarks_test.go:673: number: 47: cost: 565 us
    benchmarks_test.go:673: number: 48: cost: 545 us
    benchmarks_test.go:673: number: 49: cost: 538 us
    benchmarks_test.go:673: number: 50: cost: 578 us
    benchmarks_test.go:673: number: 51: cost: 558 us
    benchmarks_test.go:673: number: 52: cost: 595 us
    benchmarks_test.go:673: number: 53: cost: 563 us
    benchmarks_test.go:673: number: 54: cost: 4644 us
    benchmarks_test.go:673: number: 55: cost: 550 us
    benchmarks_test.go:673: number: 56: cost: 551 us
    benchmarks_test.go:673: number: 57: cost: 556 us
    benchmarks_test.go:673: number: 58: cost: 562 us
    benchmarks_test.go:673: number: 59: cost: 563 us
    benchmarks_test.go:673: number: 60: cost: 785 us
    benchmarks_test.go:673: number: 61: cost: 675 us
    benchmarks_test.go:673: number: 62: cost: 580 us
    benchmarks_test.go:673: number: 63: cost: 549 us
    benchmarks_test.go:673: number: 64: cost: 591 us
    benchmarks_test.go:673: number: 65: cost: 554 us
    benchmarks_test.go:673: number: 66: cost: 487 us
    benchmarks_test.go:673: number: 67: cost: 496 us
    benchmarks_test.go:673: number: 68: cost: 562 us
    benchmarks_test.go:673: number: 69: cost: 562 us
    benchmarks_test.go:673: number: 70: cost: 488 us
    benchmarks_test.go:673: number: 71: cost: 481 us
    benchmarks_test.go:673: number: 72: cost: 9607 us
    benchmarks_test.go:673: number: 73: cost: 516 us
    benchmarks_test.go:673: number: 74: cost: 550 us
    benchmarks_test.go:673: number: 75: cost: 576 us
    benchmarks_test.go:673: number: 76: cost: 551 us
    benchmarks_test.go:673: number: 77: cost: 526 us
    benchmarks_test.go:673: number: 78: cost: 550 us
    benchmarks_test.go:673: number: 79: cost: 539 us
    benchmarks_test.go:673: number: 80: cost: 570 us
    benchmarks_test.go:673: number: 81: cost: 544 us
    benchmarks_test.go:673: number: 82: cost: 538 us
    benchmarks_test.go:673: number: 83: cost: 539 us
    benchmarks_test.go:673: number: 84: cost: 558 us
    benchmarks_test.go:673: number: 85: cost: 3603 us
    benchmarks_test.go:673: number: 86: cost: 512 us
    benchmarks_test.go:673: number: 87: cost: 573 us
    benchmarks_test.go:673: number: 88: cost: 546 us
    benchmarks_test.go:673: number: 89: cost: 563 us
    benchmarks_test.go:673: number: 90: cost: 558 us
    benchmarks_test.go:673: number: 91: cost: 541 us
    benchmarks_test.go:673: number: 92: cost: 551 us
    benchmarks_test.go:673: number: 93: cost: 553 us
    benchmarks_test.go:673: number: 94: cost: 574 us
    benchmarks_test.go:673: number: 95: cost: 575 us
    benchmarks_test.go:673: number: 96: cost: 557 us
    benchmarks_test.go:673: number: 97: cost: 573 us
    benchmarks_test.go:673: number: 98: cost: 611 us
    benchmarks_test.go:673: number: 99: cost: 506 us
    benchmarks_test.go:676: --------------------test redis://10.0.101.xx:6378/5--------------------
    benchmarks_test.go:677: mean: 839.36, standard deviation: 1577.91
--- PASS: TestDgfAndRedis (0.10s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	0.114s
# start dragonfly without tcp_nodelay on bench1 instance
$ ./dragonfly --logtostderr  -dbnum 10 --bind 0.0.0.0 --port 6378

# run test on bench2 instance 
root@bench2:~/juicefs# addr=redis://10.0.101.xx:6378/6 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/06/07 02:48:30.240476 juicefs[20046] <INFO>: Meta address: redis://10.0.101.xx:6378/6 [interface.go:494]
2023/06/07 02:48:30.251426 juicefs[20046] <INFO>: Ping redis latency: 71.527µs [redis.go:3523]
2023/06/07 02:48:30.252793 juicefs[20046] <INFO>: Create session 1 OK with version: 1.1.0-dev+unknown [base.go:474]
    benchmarks_test.go:673: number:  0: cost: 583 us
    benchmarks_test.go:673: number:  1: cost: 575 us
    benchmarks_test.go:673: number:  2: cost: 573 us
    benchmarks_test.go:673: number:  3: cost: 587 us
    benchmarks_test.go:673: number:  4: cost: 46990 us
    benchmarks_test.go:673: number:  5: cost: 628 us
    benchmarks_test.go:673: number:  6: cost: 599 us
    benchmarks_test.go:673: number:  7: cost: 606 us
    benchmarks_test.go:673: number:  8: cost: 606 us
    benchmarks_test.go:673: number:  9: cost: 615 us
    benchmarks_test.go:673: number: 10: cost: 610 us
    benchmarks_test.go:673: number: 11: cost: 525 us
    benchmarks_test.go:673: number: 12: cost: 556 us
    benchmarks_test.go:673: number: 13: cost: 593 us
    benchmarks_test.go:673: number: 14: cost: 592 us
    benchmarks_test.go:673: number: 15: cost: 615 us
    benchmarks_test.go:673: number: 16: cost: 538 us
    benchmarks_test.go:673: number: 17: cost: 559 us
    benchmarks_test.go:673: number: 18: cost: 44258 us
    benchmarks_test.go:673: number: 19: cost: 592 us
    benchmarks_test.go:673: number: 20: cost: 653 us
    benchmarks_test.go:673: number: 21: cost: 571 us
    benchmarks_test.go:673: number: 22: cost: 529 us
    benchmarks_test.go:673: number: 23: cost: 574 us
    benchmarks_test.go:673: number: 24: cost: 580 us
    benchmarks_test.go:673: number: 25: cost: 592 us
    benchmarks_test.go:673: number: 26: cost: 548 us
    benchmarks_test.go:673: number: 27: cost: 520 us
    benchmarks_test.go:673: number: 28: cost: 561 us
    benchmarks_test.go:673: number: 29: cost: 607 us
    benchmarks_test.go:673: number: 30: cost: 569 us
    benchmarks_test.go:673: number: 31: cost: 605 us
    benchmarks_test.go:673: number: 32: cost: 44357 us
    benchmarks_test.go:673: number: 33: cost: 561 us
    benchmarks_test.go:673: number: 34: cost: 564 us
    benchmarks_test.go:673: number: 35: cost: 568 us
    benchmarks_test.go:673: number: 36: cost: 580 us
    benchmarks_test.go:673: number: 37: cost: 559 us
    benchmarks_test.go:673: number: 38: cost: 520 us
    benchmarks_test.go:673: number: 39: cost: 591 us
    benchmarks_test.go:673: number: 40: cost: 597 us
    benchmarks_test.go:673: number: 41: cost: 589 us
    benchmarks_test.go:673: number: 42: cost: 50778 us
    benchmarks_test.go:673: number: 43: cost: 609 us
    benchmarks_test.go:673: number: 44: cost: 608 us
    benchmarks_test.go:673: number: 45: cost: 578 us
    benchmarks_test.go:673: number: 46: cost: 610 us
    benchmarks_test.go:673: number: 47: cost: 600 us
    benchmarks_test.go:673: number: 48: cost: 599 us
    benchmarks_test.go:673: number: 49: cost: 527 us
    benchmarks_test.go:673: number: 50: cost: 543 us
    benchmarks_test.go:673: number: 51: cost: 645 us
    benchmarks_test.go:673: number: 52: cost: 624 us
    benchmarks_test.go:673: number: 53: cost: 596 us
    benchmarks_test.go:673: number: 54: cost: 495 us
    benchmarks_test.go:673: number: 55: cost: 497 us
    benchmarks_test.go:673: number: 56: cost: 44338 us
    benchmarks_test.go:673: number: 57: cost: 598 us
    benchmarks_test.go:673: number: 58: cost: 4601 us
    benchmarks_test.go:673: number: 59: cost: 540 us
    benchmarks_test.go:673: number: 60: cost: 9645 us
    benchmarks_test.go:673: number: 61: cost: 44547 us
    benchmarks_test.go:673: number: 62: cost: 612 us
    benchmarks_test.go:673: number: 63: cost: 633 us
    benchmarks_test.go:673: number: 64: cost: 608 us
    benchmarks_test.go:673: number: 65: cost: 561 us
    benchmarks_test.go:673: number: 66: cost: 551 us
    benchmarks_test.go:673: number: 67: cost: 581 us
    benchmarks_test.go:673: number: 68: cost: 618 us
    benchmarks_test.go:673: number: 69: cost: 592 us
    benchmarks_test.go:673: number: 70: cost: 514 us
    benchmarks_test.go:673: number: 71: cost: 566 us
    benchmarks_test.go:673: number: 72: cost: 583 us
    benchmarks_test.go:673: number: 73: cost: 578 us
    benchmarks_test.go:673: number: 74: cost: 580 us
    benchmarks_test.go:673: number: 75: cost: 579 us
    benchmarks_test.go:673: number: 76: cost: 581 us
    benchmarks_test.go:673: number: 77: cost: 543 us
    benchmarks_test.go:673: number: 78: cost: 571 us
    benchmarks_test.go:673: number: 79: cost: 574 us
    benchmarks_test.go:673: number: 80: cost: 595 us
    benchmarks_test.go:673: number: 81: cost: 547 us
    benchmarks_test.go:673: number: 82: cost: 44221 us
    benchmarks_test.go:673: number: 83: cost: 573 us
    benchmarks_test.go:673: number: 84: cost: 536 us
    benchmarks_test.go:673: number: 85: cost: 533 us
    benchmarks_test.go:673: number: 86: cost: 553 us
    benchmarks_test.go:673: number: 87: cost: 570 us
    benchmarks_test.go:673: number: 88: cost: 547 us
    benchmarks_test.go:673: number: 89: cost: 582 us
    benchmarks_test.go:673: number: 90: cost: 561 us
    benchmarks_test.go:673: number: 91: cost: 4622 us
    benchmarks_test.go:673: number: 92: cost: 42820 us
    benchmarks_test.go:673: number: 93: cost: 567 us
    benchmarks_test.go:673: number: 94: cost: 572 us
    benchmarks_test.go:673: number: 95: cost: 598 us
    benchmarks_test.go:673: number: 96: cost: 609 us
    benchmarks_test.go:673: number: 97: cost: 597 us
    benchmarks_test.go:673: number: 98: cost: 678 us
    benchmarks_test.go:673: number: 99: cost: 584 us
    benchmarks_test.go:676: --------------------test redis://10.0.101.xx:6378/6--------------------
    benchmarks_test.go:677: mean: 4326.63, standard deviation: 12204.32
--- PASS: TestDgfAndRedis (0.45s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	0.463s

@royjacobson
Copy link
Contributor

Thanks @zhijian-pro, it's very helpful. The small latency increase might be explained by lack of link-time-optimization that we use in the release binaries, but the fact that you still see latency spikes is concerning.

Since I couldn't reproduce the exact latency patterns locally, I think the difference might be in the networking setup you use. If it's not too much trouble, do you think you could do a network capture of the traffic in the benchmark? I think it should be something like this

sudo tcpdump -i <interface> -w benchmark.cap port 6378

@royjacobson royjacobson reopened this Jun 7, 2023
@zhijian-pro
Copy link
Author

@royjacobson benchmark.cap.zip

royjacobson added a commit that referenced this issue Jun 8, 2023
This removes the "25" limit for batched messages.

Turns out the aggregation in #1287 was not aggressive enough, because
it's quite possible to reach the specified max capacity of  io vectors.
For example, each "QUEUED" is actually "+", "QUEUED", "\r\n" so we can
reach the limit with about 8 batched commands and then finish\
aggregating prematurely.

Closes #1285
romange pushed a commit that referenced this issue Jun 8, 2023
This removes the "25" limit for batched messages.

Turns out the aggregation in #1287 was not aggressive enough, because
it's quite possible to reach the specified max capacity of  io vectors.
For example, each "QUEUED" is actually "+", "QUEUED", "\r\n" so we can
reach the limit with about 8 batched commands and then finish\
aggregating prematurely.

Closes #1285
dranikpg pushed a commit that referenced this issue Jun 9, 2023
This removes the "25" limit for batched messages.

Turns out the aggregation in #1287 was not aggressive enough, because
it's quite possible to reach the specified max capacity of  io vectors.
For example, each "QUEUED" is actually "+", "QUEUED", "\r\n" so we can
reach the limit with about 8 batched commands and then finish\
aggregating prematurely.

Closes #1285
@vimaxwell
Copy link

@royjacobson they said it's still not stable

juicedata/juicefs#3363 (comment)

@vimaxwell
Copy link

@romange

@dranikpg dranikpg reopened this Aug 13, 2023
@romange romange closed this as completed Aug 26, 2023
@vimaxwell
Copy link

@romange is it fixed?

@romange
Copy link
Collaborator

romange commented Aug 26, 2023

No, because we have not received any new information. Our tests do not show any significant latency.

@vimaxwell
Copy link

@romange Sorry for bothering but It's confused between your test and @zhijian-pro test.
These are their feedback

juicedata/juicefs#3363 (comment)
juicedata/juicefs#3363 (comment)

@zhijian-pro
Copy link
Author

zhijian-pro commented Aug 26, 2023

I'm going to retest this problem.
Which version of the Dragonfly should I use to test, and whether to use the tcp_nodelay parameter when testing ? @royjacobson

@zhijian-pro
Copy link
Author

zhijian-pro commented Aug 30, 2023

2023/08/30 Test results:
Dragonfly build from main branch
Redis

root@ip-10-0-101-131:~/repo/juicefs# addr=redis://10.0.101.178:6379/7 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/08/30 05:52:54.820961 juicefs[81643] <INFO>: Meta address: redis://10.0.101.178:6379/7 [interface.go:494]
2023/08/30 05:52:54.821797 juicefs[81643] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly. [info.go:84]
2023/08/30 05:52:54.821953 juicefs[81643] <INFO>: Ping redis latency: 80.821µs [redis.go:3523]
2023/08/30 05:52:54.822138 juicefs[81643] <WARNING>: Existing volume will be overwrited: {
  "Name": "benchmarkAll",
  "UUID": "",
  "Storage": "",
  "Bucket": "",
  "BlockSize": 0,
  "TrashDays": 0
} [config.go:96]
2023/08/30 05:52:54.822780 juicefs[81643] <INFO>: Create session 2 OK with version: 1.1.0-dev+unknown [base.go:474]
    benchmarks_test.go:673: number:  0: cost: 380 us
    benchmarks_test.go:673: number:  1: cost: 371 us
    benchmarks_test.go:673: number:  2: cost: 368 us
    benchmarks_test.go:673: number:  3: cost: 378 us
    benchmarks_test.go:673: number:  4: cost: 385 us
    benchmarks_test.go:673: number:  5: cost: 372 us
    benchmarks_test.go:673: number:  6: cost: 367 us
    benchmarks_test.go:673: number:  7: cost: 372 us
    benchmarks_test.go:673: number:  8: cost: 371 us
    benchmarks_test.go:673: number:  9: cost: 360 us
    benchmarks_test.go:673: number: 10: cost: 372 us
    benchmarks_test.go:673: number: 11: cost: 379 us
    benchmarks_test.go:673: number: 12: cost: 373 us
    benchmarks_test.go:673: number: 13: cost: 364 us
    benchmarks_test.go:673: number: 14: cost: 366 us
    benchmarks_test.go:673: number: 15: cost: 399 us
    benchmarks_test.go:673: number: 16: cost: 381 us
    benchmarks_test.go:673: number: 17: cost: 366 us
    benchmarks_test.go:673: number: 18: cost: 389 us
    benchmarks_test.go:673: number: 19: cost: 380 us
    benchmarks_test.go:673: number: 20: cost: 372 us
    benchmarks_test.go:673: number: 21: cost: 361 us
    benchmarks_test.go:673: number: 22: cost: 369 us
    benchmarks_test.go:673: number: 23: cost: 365 us
    benchmarks_test.go:673: number: 24: cost: 370 us
    benchmarks_test.go:673: number: 25: cost: 391 us
    benchmarks_test.go:673: number: 26: cost: 371 us
    benchmarks_test.go:673: number: 27: cost: 374 us
    benchmarks_test.go:673: number: 28: cost: 366 us
    benchmarks_test.go:673: number: 29: cost: 371 us
    benchmarks_test.go:673: number: 30: cost: 363 us
    benchmarks_test.go:673: number: 31: cost: 376 us
    benchmarks_test.go:673: number: 32: cost: 367 us
    benchmarks_test.go:673: number: 33: cost: 365 us
    benchmarks_test.go:673: number: 34: cost: 366 us
    benchmarks_test.go:673: number: 35: cost: 436 us
    benchmarks_test.go:673: number: 36: cost: 383 us
    benchmarks_test.go:673: number: 37: cost: 368 us
    benchmarks_test.go:673: number: 38: cost: 360 us
    benchmarks_test.go:673: number: 39: cost: 365 us
    benchmarks_test.go:673: number: 40: cost: 371 us
    benchmarks_test.go:673: number: 41: cost: 392 us
    benchmarks_test.go:673: number: 42: cost: 378 us
    benchmarks_test.go:673: number: 43: cost: 402 us
    benchmarks_test.go:673: number: 44: cost: 362 us
    benchmarks_test.go:673: number: 45: cost: 375 us
    benchmarks_test.go:673: number: 46: cost: 474 us
    benchmarks_test.go:673: number: 47: cost: 382 us
    benchmarks_test.go:673: number: 48: cost: 373 us
    benchmarks_test.go:673: number: 49: cost: 367 us
    benchmarks_test.go:673: number: 50: cost: 381 us
    benchmarks_test.go:673: number: 51: cost: 369 us
    benchmarks_test.go:673: number: 52: cost: 366 us
    benchmarks_test.go:673: number: 53: cost: 365 us
    benchmarks_test.go:673: number: 54: cost: 366 us
    benchmarks_test.go:673: number: 55: cost: 398 us
    benchmarks_test.go:673: number: 56: cost: 376 us
    benchmarks_test.go:673: number: 57: cost: 360 us
    benchmarks_test.go:673: number: 58: cost: 359 us
    benchmarks_test.go:673: number: 59: cost: 380 us
    benchmarks_test.go:673: number: 60: cost: 365 us
    benchmarks_test.go:673: number: 61: cost: 377 us
    benchmarks_test.go:673: number: 62: cost: 378 us
    benchmarks_test.go:673: number: 63: cost: 377 us
    benchmarks_test.go:673: number: 64: cost: 375 us
    benchmarks_test.go:673: number: 65: cost: 386 us
    benchmarks_test.go:673: number: 66: cost: 381 us
    benchmarks_test.go:673: number: 67: cost: 364 us
    benchmarks_test.go:673: number: 68: cost: 402 us
    benchmarks_test.go:673: number: 69: cost: 372 us
    benchmarks_test.go:673: number: 70: cost: 375 us
    benchmarks_test.go:673: number: 71: cost: 386 us
    benchmarks_test.go:673: number: 72: cost: 373 us
    benchmarks_test.go:673: number: 73: cost: 373 us
    benchmarks_test.go:673: number: 74: cost: 375 us
    benchmarks_test.go:673: number: 75: cost: 368 us
    benchmarks_test.go:673: number: 76: cost: 375 us
    benchmarks_test.go:673: number: 77: cost: 376 us
    benchmarks_test.go:673: number: 78: cost: 374 us
    benchmarks_test.go:673: number: 79: cost: 374 us
    benchmarks_test.go:673: number: 80: cost: 367 us
    benchmarks_test.go:673: number: 81: cost: 374 us
    benchmarks_test.go:673: number: 82: cost: 364 us
    benchmarks_test.go:673: number: 83: cost: 399 us
    benchmarks_test.go:673: number: 84: cost: 408 us
    benchmarks_test.go:673: number: 85: cost: 376 us
    benchmarks_test.go:673: number: 86: cost: 371 us
    benchmarks_test.go:673: number: 87: cost: 389 us
    benchmarks_test.go:673: number: 88: cost: 381 us
    benchmarks_test.go:673: number: 89: cost: 373 us
    benchmarks_test.go:673: number: 90: cost: 378 us
    benchmarks_test.go:673: number: 91: cost: 378 us
    benchmarks_test.go:673: number: 92: cost: 382 us
    benchmarks_test.go:673: number: 93: cost: 374 us
    benchmarks_test.go:673: number: 94: cost: 420 us
    benchmarks_test.go:673: number: 95: cost: 376 us
    benchmarks_test.go:673: number: 96: cost: 392 us
    benchmarks_test.go:673: number: 97: cost: 387 us
    benchmarks_test.go:673: number: 98: cost: 395 us
    benchmarks_test.go:673: number: 99: cost: 447 us
    benchmarks_test.go:676: --------------------test redis://10.0.101.178:6379/7--------------------
    benchmarks_test.go:677: mean: 378.05, standard deviation: 17.42
--- PASS: TestDgfAndRedis (0.10s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	0.115s

Dragonfly without tcp_nodelay
(Second run result)

root@ip-10-0-101-131:~/repo/juicefs# addr=redis://10.0.101.178:6378/7 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/08/30 06:55:50.887379 juicefs[93142] <INFO>: Meta address: redis://10.0.101.178:6378/7 [interface.go:494]
2023/08/30 06:55:50.888428 juicefs[93142] <INFO>: Ping redis latency: 77.277µs [redis.go:3523]
2023/08/30 06:55:50.888640 juicefs[93142] <WARNING>: Existing volume will be overwrited: {
  "Name": "benchmarkAll",
  "UUID": "",
  "Storage": "",
  "Bucket": "",
  "BlockSize": 0,
  "TrashDays": 0
} [config.go:96]
2023/08/30 06:55:50.889807 juicefs[93142] <INFO>: Create session 2 OK with version: 1.1.0-dev+unknown [base.go:474]
2023/08/30 06:55:50.933023 juicefs[93142] <WARNING>: unexpected error for lookup: ERR Error running script (call to 8526c64984678fc27f67f44755c8ba7dcdf7ea60): @user_script:12: script tried accessing undeclared key [redis.go:695]
    benchmarks_test.go:673: number:  0: cost: 43978 us
    benchmarks_test.go:673: number:  1: cost: 43987 us
    benchmarks_test.go:673: number:  2: cost: 44007 us
    benchmarks_test.go:673: number:  3: cost: 43996 us
    benchmarks_test.go:673: number:  4: cost: 43953 us
    benchmarks_test.go:673: number:  5: cost: 43980 us
    benchmarks_test.go:673: number:  6: cost: 43972 us
    benchmarks_test.go:673: number:  7: cost: 43993 us
    benchmarks_test.go:673: number:  8: cost: 44000 us
    benchmarks_test.go:673: number:  9: cost: 43974 us
    benchmarks_test.go:673: number: 10: cost: 43990 us
    benchmarks_test.go:673: number: 11: cost: 44003 us
    benchmarks_test.go:673: number: 12: cost: 43988 us
    benchmarks_test.go:673: number: 13: cost: 43963 us
    benchmarks_test.go:673: number: 14: cost: 43981 us
    benchmarks_test.go:673: number: 15: cost: 44021 us
    benchmarks_test.go:673: number: 16: cost: 43965 us
    benchmarks_test.go:673: number: 17: cost: 44024 us
    benchmarks_test.go:673: number: 18: cost: 43949 us
    benchmarks_test.go:673: number: 19: cost: 43973 us
    benchmarks_test.go:673: number: 20: cost: 43998 us
    benchmarks_test.go:673: number: 21: cost: 43957 us
    benchmarks_test.go:673: number: 22: cost: 44009 us
    benchmarks_test.go:673: number: 23: cost: 44017 us
    benchmarks_test.go:673: number: 24: cost: 43935 us
    benchmarks_test.go:673: number: 25: cost: 44056 us
    benchmarks_test.go:673: number: 26: cost: 43976 us
    benchmarks_test.go:673: number: 27: cost: 43956 us
    benchmarks_test.go:673: number: 28: cost: 43978 us
    benchmarks_test.go:673: number: 29: cost: 43964 us
    benchmarks_test.go:673: number: 30: cost: 43980 us
    benchmarks_test.go:673: number: 31: cost: 44031 us
    benchmarks_test.go:673: number: 32: cost: 43971 us
    benchmarks_test.go:673: number: 33: cost: 43980 us
    benchmarks_test.go:673: number: 34: cost: 44028 us
    benchmarks_test.go:673: number: 35: cost: 43989 us
    benchmarks_test.go:673: number: 36: cost: 43941 us
    benchmarks_test.go:673: number: 37: cost: 44005 us
    benchmarks_test.go:673: number: 38: cost: 43986 us
    benchmarks_test.go:673: number: 39: cost: 44002 us
    benchmarks_test.go:673: number: 40: cost: 43981 us
    benchmarks_test.go:673: number: 41: cost: 43976 us
    benchmarks_test.go:673: number: 42: cost: 43971 us
    benchmarks_test.go:673: number: 43: cost: 43995 us
    benchmarks_test.go:673: number: 44: cost: 43977 us
    benchmarks_test.go:673: number: 45: cost: 44038 us
    benchmarks_test.go:673: number: 46: cost: 43967 us
    benchmarks_test.go:673: number: 47: cost: 43979 us
    benchmarks_test.go:673: number: 48: cost: 43984 us
    benchmarks_test.go:673: number: 49: cost: 43983 us
    benchmarks_test.go:673: number: 50: cost: 44000 us
    benchmarks_test.go:673: number: 51: cost: 43980 us
    benchmarks_test.go:673: number: 52: cost: 43968 us
    benchmarks_test.go:673: number: 53: cost: 43985 us
    benchmarks_test.go:673: number: 54: cost: 44004 us
    benchmarks_test.go:673: number: 55: cost: 43985 us
    benchmarks_test.go:673: number: 56: cost: 43997 us
    benchmarks_test.go:673: number: 57: cost: 43987 us
    benchmarks_test.go:673: number: 58: cost: 44001 us
    benchmarks_test.go:673: number: 59: cost: 43972 us
    benchmarks_test.go:673: number: 60: cost: 43988 us
    benchmarks_test.go:673: number: 61: cost: 43990 us
    benchmarks_test.go:673: number: 62: cost: 43982 us
    benchmarks_test.go:673: number: 63: cost: 43986 us
    benchmarks_test.go:673: number: 64: cost: 43985 us
    benchmarks_test.go:673: number: 65: cost: 43945 us
    benchmarks_test.go:673: number: 66: cost: 43995 us
    benchmarks_test.go:673: number: 67: cost: 43991 us
    benchmarks_test.go:673: number: 68: cost: 44001 us
    benchmarks_test.go:673: number: 69: cost: 43997 us
    benchmarks_test.go:673: number: 70: cost: 43954 us
    benchmarks_test.go:673: number: 71: cost: 44010 us
    benchmarks_test.go:673: number: 72: cost: 44033 us
    benchmarks_test.go:673: number: 73: cost: 43964 us
    benchmarks_test.go:673: number: 74: cost: 43954 us
    benchmarks_test.go:673: number: 75: cost: 43979 us
    benchmarks_test.go:673: number: 76: cost: 43989 us
    benchmarks_test.go:673: number: 77: cost: 44008 us
    benchmarks_test.go:673: number: 78: cost: 43979 us
    benchmarks_test.go:673: number: 79: cost: 44004 us
    benchmarks_test.go:673: number: 80: cost: 43995 us
    benchmarks_test.go:673: number: 81: cost: 43977 us
    benchmarks_test.go:673: number: 82: cost: 43997 us
    benchmarks_test.go:673: number: 83: cost: 43978 us
    benchmarks_test.go:673: number: 84: cost: 43980 us
    benchmarks_test.go:673: number: 85: cost: 44000 us
    benchmarks_test.go:673: number: 86: cost: 43973 us
    benchmarks_test.go:673: number: 87: cost: 44030 us
    benchmarks_test.go:673: number: 88: cost: 43959 us
    benchmarks_test.go:673: number: 89: cost: 43970 us
    benchmarks_test.go:673: number: 90: cost: 43999 us
    benchmarks_test.go:673: number: 91: cost: 43972 us
    benchmarks_test.go:673: number: 92: cost: 43990 us
    benchmarks_test.go:673: number: 93: cost: 44005 us
    benchmarks_test.go:673: number: 94: cost: 43984 us
    benchmarks_test.go:673: number: 95: cost: 43990 us
    benchmarks_test.go:673: number: 96: cost: 44014 us
    benchmarks_test.go:673: number: 97: cost: 43948 us
    benchmarks_test.go:673: number: 98: cost: 43990 us
    benchmarks_test.go:673: number: 99: cost: 44000 us
    benchmarks_test.go:676: --------------------test redis://10.0.101.178:6378/7--------------------
    benchmarks_test.go:677: mean: 43986.71, standard deviation: 21.64
--- PASS: TestDgfAndRedis (9.07s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	9.083s

Dragonfly with tcp_nodelay

root@ip-10-0-101-131:~/repo/juicefs# addr=redis://10.0.101.178:6378/7 go test -count=1  -v  ./pkg/meta/... -run=TestDgfAndRedis
=== RUN   TestDgfAndRedis
2023/08/30 05:51:49.680699 juicefs[81430] <INFO>: Meta address: redis://10.0.101.178:6378/7 [interface.go:494]
2023/08/30 05:51:49.681852 juicefs[81430] <INFO>: Ping redis latency: 81.495µs [redis.go:3523]
2023/08/30 05:51:49.682073 juicefs[81430] <WARNING>: Existing volume will be overwrited: {
  "Name": "benchmarkAll",
  "UUID": "",
  "Storage": "",
  "Bucket": "",
  "BlockSize": 0,
  "TrashDays": 0
} [config.go:96]
2023/08/30 05:51:49.682834 juicefs[81430] <INFO>: Create session 4 OK with version: 1.1.0-dev+unknown [base.go:474]
2023/08/30 05:51:49.683599 juicefs[81430] <WARNING>: unexpected error for lookup: ERR Error running script (call to 8526c64984678fc27f67f44755c8ba7dcdf7ea60): @user_script:12: script tried accessing undeclared key [redis.go:695]
    benchmarks_test.go:673: number:  0: cost: 559 us
    benchmarks_test.go:673: number:  1: cost: 580 us
    benchmarks_test.go:673: number:  2: cost: 566 us
    benchmarks_test.go:673: number:  3: cost: 610 us
    benchmarks_test.go:673: number:  4: cost: 578 us
    benchmarks_test.go:673: number:  5: cost: 543 us
    benchmarks_test.go:673: number:  6: cost: 558 us
    benchmarks_test.go:673: number:  7: cost: 561 us
    benchmarks_test.go:673: number:  8: cost: 548 us
    benchmarks_test.go:673: number:  9: cost: 545 us
    benchmarks_test.go:673: number: 10: cost: 561 us
    benchmarks_test.go:673: number: 11: cost: 531 us
    benchmarks_test.go:673: number: 12: cost: 572 us
    benchmarks_test.go:673: number: 13: cost: 555 us
    benchmarks_test.go:673: number: 14: cost: 550 us
    benchmarks_test.go:673: number: 15: cost: 544 us
    benchmarks_test.go:673: number: 16: cost: 552 us
    benchmarks_test.go:673: number: 17: cost: 549 us
    benchmarks_test.go:673: number: 18: cost: 556 us
    benchmarks_test.go:673: number: 19: cost: 564 us
    benchmarks_test.go:673: number: 20: cost: 554 us
    benchmarks_test.go:673: number: 21: cost: 547 us
    benchmarks_test.go:673: number: 22: cost: 556 us
    benchmarks_test.go:673: number: 23: cost: 587 us
    benchmarks_test.go:673: number: 24: cost: 586 us
    benchmarks_test.go:673: number: 25: cost: 594 us
    benchmarks_test.go:673: number: 26: cost: 566 us
    benchmarks_test.go:673: number: 27: cost: 551 us
    benchmarks_test.go:673: number: 28: cost: 543 us
    benchmarks_test.go:673: number: 29: cost: 532 us
    benchmarks_test.go:673: number: 30: cost: 536 us
    benchmarks_test.go:673: number: 31: cost: 537 us
    benchmarks_test.go:673: number: 32: cost: 536 us
    benchmarks_test.go:673: number: 33: cost: 541 us
    benchmarks_test.go:673: number: 34: cost: 538 us
    benchmarks_test.go:673: number: 35: cost: 553 us
    benchmarks_test.go:673: number: 36: cost: 590 us
    benchmarks_test.go:673: number: 37: cost: 565 us
    benchmarks_test.go:673: number: 38: cost: 574 us
    benchmarks_test.go:673: number: 39: cost: 556 us
    benchmarks_test.go:673: number: 40: cost: 550 us
    benchmarks_test.go:673: number: 41: cost: 570 us
    benchmarks_test.go:673: number: 42: cost: 583 us
    benchmarks_test.go:673: number: 43: cost: 564 us
    benchmarks_test.go:673: number: 44: cost: 552 us
    benchmarks_test.go:673: number: 45: cost: 533 us
    benchmarks_test.go:673: number: 46: cost: 565 us
    benchmarks_test.go:673: number: 47: cost: 551 us
    benchmarks_test.go:673: number: 48: cost: 567 us
    benchmarks_test.go:673: number: 49: cost: 544 us
    benchmarks_test.go:673: number: 50: cost: 533 us
    benchmarks_test.go:673: number: 51: cost: 533 us
    benchmarks_test.go:673: number: 52: cost: 545 us
    benchmarks_test.go:673: number: 53: cost: 558 us
    benchmarks_test.go:673: number: 54: cost: 551 us
    benchmarks_test.go:673: number: 55: cost: 542 us
    benchmarks_test.go:673: number: 56: cost: 563 us
    benchmarks_test.go:673: number: 57: cost: 600 us
    benchmarks_test.go:673: number: 58: cost: 590 us
    benchmarks_test.go:673: number: 59: cost: 568 us
    benchmarks_test.go:673: number: 60: cost: 572 us
    benchmarks_test.go:673: number: 61: cost: 564 us
    benchmarks_test.go:673: number: 62: cost: 552 us
    benchmarks_test.go:673: number: 63: cost: 575 us
    benchmarks_test.go:673: number: 64: cost: 600 us
    benchmarks_test.go:673: number: 65: cost: 554 us
    benchmarks_test.go:673: number: 66: cost: 563 us
    benchmarks_test.go:673: number: 67: cost: 528 us
    benchmarks_test.go:673: number: 68: cost: 537 us
    benchmarks_test.go:673: number: 69: cost: 522 us
    benchmarks_test.go:673: number: 70: cost: 547 us
    benchmarks_test.go:673: number: 71: cost: 571 us
    benchmarks_test.go:673: number: 72: cost: 585 us
    benchmarks_test.go:673: number: 73: cost: 592 us
    benchmarks_test.go:673: number: 74: cost: 598 us
    benchmarks_test.go:673: number: 75: cost: 577 us
    benchmarks_test.go:673: number: 76: cost: 565 us
    benchmarks_test.go:673: number: 77: cost: 556 us
    benchmarks_test.go:673: number: 78: cost: 550 us
    benchmarks_test.go:673: number: 79: cost: 543 us
    benchmarks_test.go:673: number: 80: cost: 573 us
    benchmarks_test.go:673: number: 81: cost: 546 us
    benchmarks_test.go:673: number: 82: cost: 544 us
    benchmarks_test.go:673: number: 83: cost: 539 us
    benchmarks_test.go:673: number: 84: cost: 538 us
    benchmarks_test.go:673: number: 85: cost: 562 us
    benchmarks_test.go:673: number: 86: cost: 546 us
    benchmarks_test.go:673: number: 87: cost: 519 us
    benchmarks_test.go:673: number: 88: cost: 546 us
    benchmarks_test.go:673: number: 89: cost: 568 us
    benchmarks_test.go:673: number: 90: cost: 545 us
    benchmarks_test.go:673: number: 91: cost: 555 us
    benchmarks_test.go:673: number: 92: cost: 573 us
    benchmarks_test.go:673: number: 93: cost: 557 us
    benchmarks_test.go:673: number: 94: cost: 581 us
    benchmarks_test.go:673: number: 95: cost: 566 us
    benchmarks_test.go:673: number: 96: cost: 543 us
    benchmarks_test.go:673: number: 97: cost: 543 us
    benchmarks_test.go:673: number: 98: cost: 539 us
    benchmarks_test.go:673: number: 99: cost: 655 us
    benchmarks_test.go:676: --------------------test redis://10.0.101.178:6378/7--------------------
    benchmarks_test.go:677: mean: 558.75, standard deviation: 20.96
--- PASS: TestDgfAndRedis (0.16s)
PASS
ok  	github.com/juicedata/juicefs/pkg/meta	0.174s

When the tcp_nodelay parameter is used, the previous instability problem does not exist. The test results is about 30-50% slower than redis, so we think that in the case of `tcp_nodelay``, Dragonfly can be used as a metadata engine for Juicefs just like redis.

When Dragonfly as the metadata engine for JuiceFS,must use tcp_nodelay or the performance is close to unuseable.

@vimaxwell @royjacobson

@royjacobson royjacobson reopened this Aug 30, 2023
@royjacobson
Copy link
Contributor

royjacobson commented Aug 30, 2023

Hi @zhijian-pro, thanks again for your great help reporting and debugging this issue.

I've tried to reproduce now, both with an up-to-date version and with the release binary from here and unfortunately I couldn't reproduce the latency behavior you describe.

Could you maybe record and upload a new tcpdump capture of the latency issue? It was very helpful last time. The reason we'd like to avoid tcp_nodelay is that in our experience that it causes higher CPU usage.

@romange
Copy link
Collaborator

romange commented Aug 30, 2023

Also, can you please describe your OS (kernel, distro etc).
Also, did you apply any network-related changes to the default image?

@zhijian-pro
Copy link
Author

zhijian-pro commented Aug 30, 2023

@royjacobson sudo tcpdump -i ens5 -w benchmark.cap port 6378 on Dragonfly host without tcp_nodelay benchmark.cap.zip

@romange I'm using c5d.xlarge from aws.

# uname -a
Linux ip-10-0-101-178 5.15.0-1043-aws #48~20.04.1-Ubuntu SMP Wed Aug 16 18:26:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

No network changes have been made

@zhijian-pro
Copy link
Author

zhijian-pro commented Aug 30, 2023

I just tested it and found that your pre-compiled 1.8 version works fine without using tcp_nodelay. Is it possible that my main branch compilation is incorrect?
My compilation steps is https://github.com/dragonflydb/dragonfly/blob/main/docs/build-from-source.md
You can download my self-compiled version by clicking here

@romange
Copy link
Collaborator

romange commented Aug 30, 2023

@zhijian-pro what's /etc/os-release ?
what's the commit hash of your HEAD during the compilation?
can you attach the /tmp/dragonfly.INFO log as well?

@royjacobson
Copy link
Contributor

Huh. I think I just reproduced when the latency issue happens?
If I start a new dragonfly instance with no data, it works great. If it has the data from the previous profiling session, I see the 40ms latency issue. At least that's what I see locally, and it can explain why it seems to depend on such random things.

I don't think it's a bug in the test, but now that I can reproduce it I'm optimistic about solving it soon :)

@royjacobson royjacobson self-assigned this Aug 30, 2023
@zhijian-pro
Copy link
Author

# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

git log
head hash is 0bbd14bee8b1fcf6089ed7a67a6fb482f49594e3
image

There is no log output when the test code is run

root@ip-10-0-101-178:~/dragonfly# ./dragonfly --logtostderr  -dbnum 10 --bind 0.0.0.0 --port 6378
I20230830 11:47:03.318627  3502 init.cc:69] ./dragonfly running in opt mode.
* Logs will be written to the first available of the following paths:
/tmp/dragonfly.*
./dragonfly.*
* For the available flags type dragonfly [--help | --helpfull]
* Documentation can be found at: https://www.dragonflydb.io/docs
I20230830 11:47:03.318701  3502 dfly_main.cc:716] Starting dragonfly df-dev-0000000
I20230830 11:47:03.318881  3502 dfly_main.cc:758] maxmemory has not been specified. Deciding myself....
I20230830 11:47:03.318892  3502 dfly_main.cc:762] Found 6.92GiB available memory. Setting maxmemory to 5.54GiB
I20230830 11:47:03.319913  3503 uring_proactor.cc:158] IORing with 1024 entries, allocated 102720 bytes, cq_entries is 2048
I20230830 11:47:03.323329  3502 proactor_pool.cc:147] Running 4 io threads
I20230830 11:47:03.333969  3502 server_family.cc:240] Data directory is "/root/dragonfly"
I20230830 11:47:03.334321  3502 server_family.cc:1058] Loading /root/dragonfly/dump-2023-08-30T09:28:09-summary.dfs
I20230830 11:47:03.334632  3503 listener_interface.cc:97] sock[15] AcceptServer - listening on port 6378
I20230830 11:47:03.504679  3505 server_family.cc:1105] Load finished, num keys read: 214919

@zhijian-pro what's /etc/os-release ? what's the commit hash of your HEAD during the compilation? can you attach the /tmp/dragonfly.INFO log as well?

@zhijian-pro
Copy link
Author

zhijian-pro commented Aug 30, 2023

Huh. I think I just reproduced when the latency issue happens? If I start a new dragonfly instance with no data, it works great. If it has the data from the previous profiling session, I see the 40ms latency issue. At least that's what I see locally, and it can explain why it seems to depend on such random things.

I don't think it's a bug in the test, but now that I can reproduce it I'm optimistic about solving it soon :)

That's why I noted the second run.
I don't think the high second run latency has anything to do with the test program, but it could be that the test program triggered some behavior in Dragonfly that caused it to run particularly slow.

However, when I use your pre-compiled version 1.8, it can run very well without tcp_nodelay. Is that a compilation problem?
Is the binary you reproduce also the main branch of manual compilation ?

@zhijian-pro
Copy link
Author

This branch is based on test_dragonfly changes, it disables the lua script. Use this branch so that the test case will be able to reproduce the high latency problem every time (using my own compiled binary).

This may be helpful to you.

git clone -b test_for_dragonfly_without_lua https://github.com/juicedata/juicefs.git && cd juicefs

@pveentjer
Copy link

The 40ms problem might be caused by the bad interaction of Nagle and delayed acks.

https://news.ycombinator.com/item?id=24785405

@chakaz
Copy link
Collaborator

chakaz commented Dec 10, 2023

@pveentjer indeed, that's why we actually flipped --tcp_nodelay to true be default

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
7 participants