Improve query performance during indexing #212

Open
1 of 4 tasks
msm-code opened this issue Feb 18, 2023 · 2 comments · May be fixed by #213
msm-code commented Feb 18, 2023

Currently, running a query gets very slow if an indexing operation is in progress at the same time. This is (probably) because of how disk queues work: indexing is very disk-heavy and saturates the disk with reads of new files to index.

In practice, indexing new files is less important than responding to queries quickly; ideally, running a query should always take priority. I think we can solve this with Linux's IO priorities: https://www.kernel.org/doc/html/latest/block/ioprio.html.

Things to do:

  • Create a benchmark: measure query performance without indexing and during indexing. It doesn't have to be very precise, but it must show that performance during indexing is significantly degraded.
  • Investigate whether the ioprio_set/ioprio_get syscalls can be used to work around this issue per worker.
  • Run the benchmark again and make sure query performance is better (and that indexing performance is not hugely impacted, though I don't expect it to be).
  • Hopefully this solves the issue; if not, we can consider other measures (for example, pausing all indexing workers while processing a query).
@msm-code msm-code self-assigned this Feb 18, 2023
@msm-code msm-code linked a pull request Feb 18, 2023 that will close this issue
@msm-code commented:

Benchmark: compacting a big dataset collection while querying the DB at the same time. All tests were done after dropping the VM cache, and repeated 3 times.

  1. Performance when not compacting (baseline, best-case performance):
  • 0:39
  • 0:40
  • 0:40
  2. Performance when compacting (master):
  • 1:03
  • 1:07
  • 1:11
  3. Performance when compacting (after Lower iopriority when indexing to IDLE #213):
  • 1:17
  • 1:13
  • 1:09

Yeah, on average the database got slower. But I realised that's because IO priority is applied per process, not per thread, and ursadb is a single (multi-threaded) process. So I can't actually do what I hoped to do.

But that's not all. I tried to work around this by running a second ursadb process (a "slow" process just for compacting), and the results are:

  4. Performance when compacting (after Lower iopriority when indexing to IDLE #213, with a separate process):
  • 1:11
  • 1:24
  • 1:16

And this is... even slower? This is surprising to me; I think this time all the priorities were set the way I wanted them to be.

But just to be sure, I ran a second DB again, this time changing the priority manually with ionice, and:

  5. Performance when compacting (after Lower iopriority when indexing to IDLE #213, with a separate process started by ionice -c 3):
  • 1:10
  • 1:10
  • 1:15

I have to say this is underwhelming.

I also suspect that a big part of the slowdown comes from the OS disk cache being filled with useless (never-used-again) data. Maybe I should experiment with MADV_DONTNEED instead?

Anyway, this approach looks more challenging than I suspected. I need to ponder this topic a bit more 🤔

@msm-code commented:

Initial tests suggest that adding fadvise may not help either:

  • 1:03
  • 1:16
  • 1:15
