
What is the expected throughput? #7

Closed
makorne opened this issue Aug 20, 2021 · 15 comments

makorne commented Aug 20, 2021

Hi!
Thank you for your great crate!

I am testing sqlxmq_stress and I don't see high load on any of the cores.

My results:

num_jobs = 1000; set_concurrency(50, 1000)

min: 8.296434179s
max: 9.840498547s
median: 8.851534467s
95th percentile: 9.600073887s
throughput: 99.19286908241159/s

num_jobs = 10000; set_concurrency(50, 1000)

It took more than 2 hours and was still running on a Ryzen 5900HX / SSD.

Could it have hung?
How can such situations be prevented, and what is the expected throughput on recent hardware?


makorne commented Aug 20, 2021

I did this test several times with num_jobs = 3000; set_concurrency(50, 100)

One run gave this result:

min: 3.804157088s
max: 40.62486232s
median: 34.72999443s
95th percentile: 39.781787279s
throughput: 72.50754711855541/s

Another run gave this result:

min: 8.497855039s
max: 23.348883659s
median: 18.169892469s
95th percentile: 22.225424099s
throughput: 127.19332590396806/s

In the other 4 runs, the jobs in the mq_msgs table finished, but the program kept running endlessly.

@makorne makorne closed this as completed Aug 29, 2021

Diggsey commented Aug 29, 2021

Hi @makorne, sorry I didn't get around to investigating this earlier - I would like to figure out what the problem is though.

@Diggsey Diggsey reopened this Aug 29, 2021

Diggsey commented Aug 29, 2021

I can't seem to reproduce it. When I try with the parameters that caused it to hang for you (num_jobs = 10000, concurrency = [50, 1000]) I get these results:

min: 0.0021288s
max: 0.3401738s
median: 0.0037295s
95th percentile: 0.1190739s
throughput: 1427.4979093579368/s

Did you ever figure out what caused this?


imbolc commented Sep 18, 2021

I've tried it too, with different concurrency settings up to (5000, 10000), but I couldn't reproduce it on my laptop.


imbolc commented Sep 18, 2021

Though after a couple of runs with (num_jobs = 100000, concurrency = [5000, 10000]) the process hangs with no activity and empty mq_payloads and mq_msgs tables.


imbolc commented Sep 18, 2021

I've tried to narrow down the bug and got these numbers:

  180s total: 100000, started:  61975, got json:  61975, completed:  55702, sent:  55702, payloads:  13924, msgs:  13924
  185s total: 100000, started:  66977, got json:  66977, completed:  60204, sent:  60204, payloads:   9670, msgs:   9670
  190s total: 100000, started:  71977, got json:  71977, completed:  64772, sent:  64772, payloads:   5102, msgs:   5102
  195s total: 100000, started:  76099, got json:  76099, completed:  70456, sent:  70456, payloads:    399, msgs:    399
  200s total: 100000, started:  76099, got json:  76099, completed:  72049, sent:  72049, payloads:      0, msgs:      0
  205s total: 100000, started:  76099, got json:  76099, completed:  72049, sent:  72049, payloads:      0, msgs:      0
  hanging ...

Here's the code I used: imbolc@3399b41


imbolc commented Sep 20, 2021

Got it: at some point, sqlxmq_stress::start_job results in a PoolTimedOut error. So it hangs because some tasks just aren't scheduled.


Diggsey commented Sep 20, 2021

Ah, nice find! We should probably just abort if sending fails.


imbolc commented Sep 20, 2021

Sure, but I'm new to async and couldn't find a way to pass the error back from a task without sacrificing performance.


Diggsey commented Sep 20, 2021

I've addressed this in 0.3.0.

@Diggsey Diggsey closed this as completed Sep 20, 2021
sbeckeriv commented

@Diggsey Sorry to comment on a closed issue. I am not seeing the expected throughput on the stress test either. My system specs are at the bottom. I am running Postgres 12.8 installed via the tool asdf, if that matters.

With or without --release I get about the same results. I even tried editing main to use [50, 1000].

min: 0.075423397s
max: 37.357557689s
median: 29.706058712s
95th percentile: 34.275748728s
throughput: 266.1852449625013/s

I know benchmarks depend on a lot of things and are really good for relative changes. I am wondering if there is anything you can think of that would cause the large difference?

Thanks for your hard work! I am excited about the project.
Becker

System info (neofetch output, ASCII art omitted):

becker
OS: Ubuntu 21.04 x86_64
Host: HP Z2 Tower G5 Workstation
Kernel: 5.11.0-41-generic
Uptime: 45 days, 22 hours, 12 mins
Packages: 1957 (dpkg), 9 (snap)
Shell: bash 5.1.4
Resolution: 3840x2160
WM: Mutter
WM Theme: Adwaita
Theme: Yaru [GTK3]
Icons: Adwaita [GTK3]
Terminal: /dev/pts/2
CPU: Intel i9-10900K (20) @ 5.300GHz
GPU: Intel CometLake-S GT2 [UHD Graphics 630]
Memory: 6744MiB / 31882MiB


Diggsey commented Feb 5, 2022

@sbeckeriv I'm not sure, to be honest; this queue is not really designed for high throughput, but I do see much higher throughput than you're getting with much worse system specs. You are using an SSD, right?

sbeckeriv commented

@Diggsey Yes: a Samsung PM981a NVMe 1024GB (15302129), Ext4 filesystem with full-disk encryption.

I know it doesn't have the symbols; I am working on that. It looks like there is a long pause for some reason.
[flamegraph image]

I will keep digging and let you know what I find.


sbeckeriv commented Feb 17, 2022

Hello again,

I got a flame graph to report things, but I don't know what to make of it. Maybe something in it will spark an idea for you. Thanks again for your work on this.

[flamegraph image]

https://gist.githubusercontent.com/sbeckeriv/8b97f44a88364afdd1ba0d2b87f9527e/raw/bd706be7105e05f15eb4f1d91541e0aeadbd099d/flame.svg

GitHub does something funky with the SVG file; I can zoom in on it locally. The gist file at least supports hover.
Becker


makorne commented Apr 18, 2022

I've tried to locate the bug somehow and got this numbers:

  180s total: 100000, started:  61975, got json:  61975, completed:  55702, sent:  55702, payloads:  13924, msgs:  13924
  185s total: 100000, started:  66977, got json:  66977, completed:  60204, sent:  60204, payloads:   9670, msgs:   9670
  190s total: 100000, started:  71977, got json:  71977, completed:  64772, sent:  64772, payloads:   5102, msgs:   5102
  195s total: 100000, started:  76099, got json:  76099, completed:  70456, sent:  70456, payloads:    399, msgs:    399
  200s total: 100000, started:  76099, got json:  76099, completed:  72049, sent:  72049, payloads:      0, msgs:      0
  205s total: 100000, started:  76099, got json:  76099, completed:  72049, sent:  72049, payloads:      0, msgs:      0
  hanging ...

Here's the code I used: imbolc@3399b41

I tried your code on PostgreSQL 14 and the latest sqlxmq.
It hangs too, on a Ryzen 5900HX / NVMe SSD.
It looks like the bug still exists.

const MIN_CONCURRENCY: usize = 50;
const MAX_CONCURRENCY: usize = 1000;

32556s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32561s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32566s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32571s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32576s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32581s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32587s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32592s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32597s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32602s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0
32607s total: 100000, started:  82306, got json:  82306, completed:  81357, sent:  81357, payloads:      0, msgs:      0

const MIN_CONCURRENCY: usize = 50;
const MAX_CONCURRENCY: usize = 100;

  591s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  596s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  601s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  606s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  611s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  616s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  621s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  626s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
  631s total: 100000, started:  81555, got json:  81555, completed:  81555, sent:  81555, payloads:      0, msgs:      0
