
Archive index rework to make loading faster #8078

Merged on Jun 27, 2024 (5 commits)

@macneale4 (Contributor) commented Jun 26, 2024

The initial implementation of archive indexes over-optimized for space. As a result, loading an archive's index was about 10x slower than loading the index of a noms table file. To address this:

  • Dropped end-to-end compression of the index.
  • Dropped varint encoding for offset deltas and chunk refs.
  • Changed how byte span offsets are recorded, using an end-offset approach that requires no delta processing on load.
  • Used only slices of primitive types for the in-memory index. The read path is constant time, at the cost of a little more complexity, and the index can be read directly off disk into memory.
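To make the space-versus-load-time trade-off concrete, here is a minimal Go sketch. It is a hypothetical illustration, not the archive index format from this PR: it assumes chunks are packed back to back, so the delta between consecutive end offsets equals each chunk's length, and every function name is made up. The varint layout needs a sequential decode with a running sum before any lookup. The fixed-width end-offset layout loads with one bulk `binary.Read` into a `[]uint64` and answers span lookups in constant time.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeDeltaVarints stores each chunk length (the delta between consecutive
// end offsets, assuming chunks are packed back to back) as a uvarint.
func encodeDeltaVarints(lengths []uint64) []byte {
	buf := make([]byte, 0, len(lengths)*binary.MaxVarintLen64)
	tmp := make([]byte, binary.MaxVarintLen64)
	for _, l := range lengths {
		n := binary.PutUvarint(tmp, l)
		buf = append(buf, tmp[:n]...)
	}
	return buf
}

// loadDeltaVarints must decode every varint in order and keep a running sum
// before any span can be looked up: a sequential pass on every load.
func loadDeltaVarints(buf []byte) ([]uint64, error) {
	var ends []uint64
	var sum uint64
	for len(buf) > 0 {
		l, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, errors.New("malformed uvarint")
		}
		sum += l
		ends = append(ends, sum)
		buf = buf[n:]
	}
	return ends, nil
}

// encodeEndOffsets stores the cumulative end offset of each chunk as a
// fixed-width big-endian uint64 (8 bytes per entry).
func encodeEndOffsets(lengths []uint64) []byte {
	buf := make([]byte, 8*len(lengths))
	var end uint64
	for i, l := range lengths {
		end += l
		binary.BigEndian.PutUint64(buf[8*i:], end)
	}
	return buf
}

// loadEndOffsets copies the fixed-width entries into a []uint64 with one bulk
// binary.Read; no running sum or per-entry delta decoding is needed.
func loadEndOffsets(buf []byte) ([]uint64, error) {
	if len(buf)%8 != 0 {
		return nil, errors.New("length is not a multiple of 8")
	}
	ends := make([]uint64, len(buf)/8)
	if err := binary.Read(bytes.NewReader(buf), binary.BigEndian, ends); err != nil {
		return nil, err
	}
	return ends, nil
}

// span returns the [start, end) byte range of chunk i in constant time.
func span(ends []uint64, i int) (start, end uint64) {
	if i > 0 {
		start = ends[i-1]
	}
	return start, ends[i]
}

func main() {
	// Hypothetical chunk lengths, in bytes.
	lengths := []uint64{100, 4096, 37, 250}

	varints := encodeDeltaVarints(lengths)
	fixed := encodeEndOffsets(lengths)
	fmt.Println(len(varints), len(fixed)) // 6 32

	a, err := loadDeltaVarints(varints)
	if err != nil {
		panic(err)
	}
	b, err := loadEndOffsets(fixed)
	if err != nil {
		panic(err)
	}
	fmt.Println(a) // [100 4196 4233 4483]
	fmt.Println(b) // [100 4196 4233 4483]

	start, end := span(b, 2)
	fmt.Println(start, end) // 4196 4233
}
```

Both layouts yield the same end offsets, but the varint form here is 6 bytes against 32 for the fixed-width form. That is the space the original design saved, paid for with a sequential decode on every load. Because each fixed-width entry is independent, such a table could also be read from disk into a byte buffer in a single call and indexed without any prior pass.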

Testing on a 41 GB archive file indicates that load performance now matches classic table files, while the index grew by about 350 MB (to roughly 1 GB total).
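As a rough check on these figures, reading the sizes as gigabytes and megabytes and assuming "total ~1 GB" refers to the new index size, the index grew by a bit over half and remains a small fraction of the archive:

```latex
\begin{aligned}
S_{\text{old}} &\approx 1\,\text{GB} - 350\,\text{MB} \approx 650\,\text{MB} \\
\frac{\Delta S}{S_{\text{old}}} &\approx \frac{350\,\text{MB}}{650\,\text{MB}} \approx 54\% \\
\frac{S_{\text{new}}}{S_{\text{archive}}} &\approx \frac{1\,\text{GB}}{41\,\text{GB}} \approx 2.4\%
\end{aligned}
```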

@coffeegoddd (Contributor):

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version  result  total
5ef8ca5  ok      5937457

version  total_tests
5ef8ca5  5937457

correctness_percentage
100.0

@coffeegoddd (Contributor):

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version  result  total
0b83495  ok      5937457

version  total_tests
0b83495  5937457

correctness_percentage
100.0

macneale4 marked this pull request as ready for review on June 27, 2024 at 16:00
@coffeegoddd (Contributor):

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version  result  total
24185cc  ok      5937457

version  total_tests
24185cc  5937457

correctness_percentage
100.0

@max-hoffman (Contributor) left a comment:

LGTM! Most of my questions were related to archives overall, and I messaged Neil separately.

Additional work is required for integration with DoltgreSQL.

macneale4 merged commit 605b4de into main on Jun 27, 2024
21 checks passed
macneale4 deleted the macneale4/archive-index branch on June 27, 2024 at 19:16
@coffeegoddd (Contributor):

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version  result  total
b4f995a  ok      5937457

version  total_tests
b4f995a  5937457

correctness_percentage
100.0

3 participants