
Scavenge index #1638

Merged: 10 commits merged into EventStore:master from lscpike:feature/scavenge_index on Nov 27, 2018

Conversation

@lscpike (Contributor) commented May 24, 2018

This is based on PR #1637.

The index currently gets scavenged during a level merge. This can happen at any time and can impact the performance of the node. It is also very inefficient, as each level performs a new scavenge even though it may have just been scavenged for a lower level. If the database hasn't been scavenged since the last merge, this is a complete waste of time.

This PR adds an explicit scavenge operation to the TableIndex. This is called at the end of a DB scavenge, meaning there is a definite benefit to doing the work. It also means it can be stopped during high-load situations (thanks to PR #1632).

Notes

  • The index scavenge blocks the awaiting tables queue. This shouldn't be a problem as the impact is no different to a normal merge.
  • The scavenge code is based upon the merge code.
  • There's a decent test suite in the PR.
  • The merge still has a lot of the scavenge code in there for the v1 upgrade path. If v1 support is dropped in the future, this code gets a lot smaller.
  • The activity is logged to the $scavenges stream to fit the general pattern.
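
(For illustration, a rough sketch of the idea in C#. All type and method names below are stand-ins, not the PR's actual TableIndex/PTable code: it shows an index scavenge invoked once at the end of a DB scavenge, with a cancellation token so it can be stopped under load.)

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative stand-ins only; the real TableIndex/PTable types look different.
public sealed class IndexTableSketch
{
    public List<long> Entries { get; } = new List<long>();
}

public sealed class TableIndexSketch
{
    private readonly List<IndexTableSketch> _tables = new List<IndexTableSketch>();

    // Called once at the end of a DB scavenge, rather than on every level merge,
    // so the work is only done when there is a definite benefit.
    public void Scavenge(Func<long, bool> shouldKeep, CancellationToken ct)
    {
        foreach (var table in _tables)
        {
            // Honour the stop mechanism from PR #1632: abort cleanly under high load.
            ct.ThrowIfCancellationRequested();

            var kept = new List<long>();
            foreach (var entry in table.Entries)
                if (shouldKeep(entry))
                    kept.Add(entry);

            table.Entries.Clear();
            table.Entries.AddRange(kept);
        }
    }
}
```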

@lscpike (Contributor, Author) commented May 24, 2018

Just to note, this is my last PR around scavenging.

@gregoryyoung (Contributor) commented:

Can you squash the changes?

@lscpike (Contributor, Author) commented May 25, 2018 via email

@gregoryyoung (Contributor) commented:

I meant just for this one (in case we later need to revert). It's far easier to have the commit history with one commit for the feature.

@lscpike (Contributor, Author) commented May 25, 2018

Makes sense! Do you want the same for the others, faster scavenge and scavenge log clean-up? I need to rebase anyway.

@lscpike force-pushed the feature/scavenge_index branch 3 times, most recently from e30f853 to 724043a on May 29, 2018 10:07
@lscpike (Contributor, Author) commented Jun 1, 2018

We realized that the OptimizeIndexMerge flag results in a lot of memory sitting unused when the index scavenge is complete. As this optimisation now only applies when doing a scavenge, it makes sense to free the memory at the end of the scavenge process.
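
(Rough sketch of the shape of that change; the cache field and delegates are illustrative stand-ins for whatever the OptimizeIndexMerge optimisation actually allocates.)

```csharp
using System;

// Illustrative sketch only: the field name and delegate shapes are stand-ins for
// the data the OptimizeIndexMerge optimisation keeps in memory.
public sealed class IndexScavengerSketch
{
    private byte[] _existenceCache;   // memory the optimisation builds up front

    public void Scavenge(Func<byte[]> buildCache, Action<byte[]> scavengeWithCache)
    {
        _existenceCache = buildCache();
        try
        {
            scavengeWithCache(_existenceCache);
        }
        finally
        {
            // Release the optimisation's memory as soon as the scavenge finishes,
            // instead of leaving it sitting unused afterwards.
            _existenceCache = null;
        }
    }
}
```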

I can squash in if you want, but figured it's easier to see what we've changed as is for now.

@megakid (Contributor) commented Jun 20, 2018

Any movement on this, or if not, a timeline for progression?

@riccardone self-requested a review July 26, 2018 10:24
@riccardone self-requested a review July 26, 2018 14:48
@riccardone (Contributor) commented:

@lscpike @megakid we like the stop operation that is implemented in your PR. Thanks for that.
We are not sure about the changes you have made with multi-threading. The scavenge operation as it is today is very simple, and we would like to keep it simple in order to serve all the different scenarios and environments where our clients are running ES.
Would it be possible to keep the current implementation as is (or with minimal modification) and provide a separate implementation with parallel/multi-threaded execution?
Something driven by a command-line param (--parallel-scavenge) or HTTP?
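
(Sketch of the suggested split; the flag wiring and helper names are hypothetical, not EventStore's real option parsing.)

```csharp
using System;
using System.Threading.Tasks;

public static class ScavengeDispatchSketch
{
    // parallelScavenge would come from a hypothetical --parallel-scavenge option.
    public static void Run(bool parallelScavenge, string[] chunks, Action<string> scavengeChunk)
    {
        if (parallelScavenge)
            Parallel.ForEach(chunks, scavengeChunk);   // opt-in multi-threaded path
        else
            foreach (var chunk in chunks)              // default: the existing simple, sequential path
                scavengeChunk(chunk);
    }
}
```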

@lscpike (Contributor, Author) commented Jul 27, 2018 via email

@riccardone (Contributor) commented Jul 27, 2018

I get that there is a thread count param. I was asking about the possibility of keeping your new multi-threaded scavenge implementation separate from the existing one and using yours on demand. That's the reason for the --parallel-scavenge suggestion. As an alternative to the command-line param, the presence of the HTTP thread count option could be used to decide whether to use your new implementation or the original.

@lscpike (Contributor, Author) commented Jul 27, 2018

The parallel implementation is only about 3 lines of code. Are you more concerned about the performance optimizations we've made?

Would it be possible to start merging the prior PRs to make the faster scavenge and index scavenge PRs easier to review? I'll rebase the others as each PR is merged to give a clean diff. We can then focus on getting the scavenge optimizations as low risk as possible for you.

@gregoryyoung (Contributor) commented:

The comment was more that the parallel version can work out to be slower, due to disk head thrashing vs. sequential reading.

@lscpike (Contributor, Author) commented Jul 27, 2018

The default thread count of 1 will result in sequential reads of the chunks. It's then up to the user not to use the hidden option (not in the UI) if it might decrease performance on their disks. That can be covered in the documentation if required.
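
(To illustrate, not the PR's code: with a ParallelOptions-style degree of parallelism, a thread count of 1 degenerates to a sequential pass over the chunks, so higher counts are an explicit opt-in.)

```csharp
using System;
using System.Threading.Tasks;

public static class ChunkScavengeSketch
{
    // With threads = 1 (the default), MaxDegreeOfParallelism = 1 means chunks are
    // processed one at a time, i.e. sequential reads.
    public static void ScavengeChunks(string[] chunks, int threads, Action<string> scavengeChunk)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = threads };
        Parallel.ForEach(chunks, options, scavengeChunk);
    }
}
```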

@riccardone added this to the v4.2 (RC) milestone Aug 22, 2018
@lscpike (Contributor, Author) commented Sep 20, 2018

We're running this in production as a custom build now. Thought you might be interested to see the scavenge log.

[image: scavenge log screenshot]

@shaan1337 (Member) commented:

@lscpike @megakid Kindly also rebase these commits (including 6b18816)

@lscpike (Contributor, Author) commented Nov 5, 2018

Rebased as requested.

riccardone previously approved these changes Nov 9, 2018
@riccardone (Contributor) commented:

@lscpike I'm doing the final tests of this PR.
I wonder if it would be possible to log the 'total space saved' as it was before the changes?
https://github.com/lscpike/EventStore/blob/feature/scavenge_index/src/EventStore.Core/TransactionLog/Chunks/TFChunkScavenger.cs#L204
example:
SCAVENGING: total time taken: 00:08:47.3769352, total space saved: 802301658
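
(A minimal sketch of how such a line could be produced; the stopwatch and size totals are stand-ins for values TFChunkScavenger already tracks, and the real code uses EventStore's logger rather than Console.)

```csharp
using System;
using System.Diagnostics;

public static class ScavengeLogSketch
{
    // oldSize/newSize stand in for the before/after chunk totals the scavenger collects.
    public static void LogResult(Stopwatch sw, long oldSize, long newSize)
    {
        var spaceSaved = oldSize - newSize;
        Console.WriteLine($"SCAVENGING: total time taken: {sw.Elapsed}, total space saved: {spaceSaved}");
    }
}
```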

@shaan1337 added the area/documentation label Nov 12, 2018
@riccardone (Contributor) commented:

@lscpike, in PTableConstruction, can I ask why the new method is called Scavenged and not Scavenge?

@lscpike (Contributor, Author) commented Nov 12, 2018 via email

@lscpike (Contributor, Author) commented Nov 12, 2018

I just went to fix it and realised it's a factory method. It's creating a "Scavenged" PTable. I'll leave it as is unless you have a different opinion.
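
(To spell out the convention with an illustrative signature, not the actual PTable.Scavenged overload: the method is a factory that returns a new, already-scavenged table rather than scavenging one in place, so the past-participle name reads naturally at the call site.)

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: the real PTable.Scavenged factory works on index files, not in-memory lists.
public sealed class PTableSketch
{
    private readonly List<long> _entries = new List<long>();

    // A factory: it does not scavenge 'source' in place, it builds and returns
    // a new, already-scavenged table, hence "Scavenged" rather than "Scavenge".
    public static PTableSketch Scavenged(PTableSketch source, Func<long, bool> shouldKeep)
    {
        var result = new PTableSketch();
        foreach (var entry in source._entries)
            if (shouldKeep(entry))
                result._entries.Add(entry);
        return result;
    }
}
```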

riccardone previously approved these changes Nov 13, 2018

@riccardone (Contributor) left a review:

I tested these changes with small, medium and large databases and I confirm that the code works as expected. Well done, this is a very good improvement. Thanks.

riccardone previously approved these changes Nov 26, 2018

@riccardone (Contributor) commented Nov 26, 2018

@lscpike it's probably better if you rebase your PR (not me). Now fw is on 4.7.1 and there is only a simple conflict to fix on index_map. After that I will proceed with merging the Faster_scavenge PR and this one into master. Thanks.

James Connor and others added 10 commits November 26, 2018 16:01
- Ensure we don't have too many threads
- More efficient memory use to avoid BitArray.
- Smaller struct for in memory copy of chunk.
… scavenge twice to collect up commits spanning multiple chunks.
- Test the TableIndex scavenge process.
- Test the index is scavenged at the end of a DB scavenge.
- Remove the ExistsAt check when doing an index table merge. The existsAt is still used when upgrading from a v1 index.
@lscpike (Contributor, Author) commented Nov 26, 2018

I rebased and resolved this. This branch includes the faster_scavenge changes. Do you want me to rebase that PR too or are you happy to merge all from here?

@riccardone (Contributor) commented:

I'm happy to merge all from here, thanks.

@riccardone (Contributor) approved these changes.

@lscpike mentioned this pull request Nov 26, 2018
@riccardone merged commit ae4d359 into EventStore:master Nov 27, 2018
@ChrisChinchilla (Contributor) commented:

Docs noted @riccardone and @shaan1337

@lscpike deleted the feature/scavenge_index branch December 6, 2018 11:46
@megakid mentioned this pull request Dec 27, 2018
Labels: area/documentation
Participants: 6