
Reformat journal index #7780

Merged
merged 25 commits into from
May 8, 2024

max-hoffman
Contributor

@max-hoffman max-hoffman commented Apr 25, 2024

Changes the way we write journal index lookups. Each write appends a lookup to a bufio.Writer, which lazily writes to disk. At regular intervals we also flush a CRC/root value record, which is used to consistency-check the index during bootstrap. This avoids the long stalls caused by flushing a whole batch of index records at once. We also now write only an addr16, because that is what we load into the default chunk address map.

Databases with the older format will pay a one-time startup penalty to rewrite the journal index. In testing, this penalty appears to be 5-10% of the database's import time.

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| a1b2b7c | ok | 5937457 |

| version | total_tests |
|---|---|
| a1b2b7c | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 99.510960 |

| version | result | total |
|---|---|---|
| 7a0804b | did not run | 7423 |
| 7a0804b | not ok | 21613 |
| 7a0804b | ok | 5908511 |
| 7a0804b | timeout | 1 |

| version | total_tests |
|---|---|
| 7a0804b | 5937548 |

| correctness_percentage |
|---|
| 99.51096 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 99.378724 |

| version | result | total |
|---|---|---|
| 8e519a7 | did not run | 996 |
| 8e519a7 | not ok | 35891 |
| 8e519a7 | ok | 5900570 |
| 8e519a7 | timeout | 1 |

| version | total_tests |
|---|---|
| 8e519a7 | 5937458 |

| correctness_percentage |
|---|
| 99.378724 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 99.387347 |

| version | result | total |
|---|---|---|
| 7bf0ce8 | did not run | 17061 |
| 7bf0ce8 | not ok | 19314 |
| 7bf0ce8 | ok | 5901082 |
| 7bf0ce8 | timeout | 1 |

| version | total_tests |
|---|---|
| 7bf0ce8 | 5937458 |

| correctness_percentage |
|---|
| 99.387347 |

@max-hoffman
Contributor Author

#benchmark


@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 99.686701 |

| version | result | total |
|---|---|---|
| a008969 | did not run | 5107 |
| a008969 | not ok | 13493 |
| a008969 | ok | 5918856 |
| a008969 | timeout | 2 |

| version | total_tests |
|---|---|
| a008969 | 5937458 |

| correctness_percentage |
|---|
| 99.686701 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| ee36b06 | ok | 5937457 |

| version | total_tests |
|---|---|
| ee36b06 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@max-hoffman
Contributor Author

#benchmark

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 290471d | ok | 5937457 |

| version | total_tests |
|---|---|
| 290471d | 5937457 |

| correctness_percentage |
|---|
| 100.0 |


github-actions bot commented May 3, 2024

@coffeegoddd
Contributor

@max-hoffman DOLT

| test_name | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 97.55 | 90.78 | 0 |

| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 34c3613 | 22.27 | tpcc-scale-factor-1 | dolt | 290471d | 24.97 | 0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 3.07 | 3.02 | 0 |
| groupby_scan | 17.63 | 17.63 | 0 |
| index_join | 5.18 | 5.18 | 0 |
| index_join_scan | 2.22 | 2.26 | 0 |
| index_scan | 52.89 | 53.85 | 0 |
| oltp_point_select | 0.51 | 0.51 | 0 |
| oltp_read_only | 8.43 | 8.58 | 0 |
| select_random_points | 0.8 | 0.81 | 0 |
| select_random_ranges | 0.97 | 0.99 | 0 |
| table_scan | 53.85 | 54.83 | 0 |
| types_table_scan | 134.9 | 161.51 | -1 |

| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.79 | 6.79 | 0 |
| oltp_insert | 3.36 | 3.36 | 0 |
| oltp_read_write | 16.12 | 16.12 | 0 |
| oltp_update_index | 3.49 | 3.49 | 0 |
| oltp_update_non_index | 3.43 | 3.43 | 0 |
| oltp_write_only | 7.84 | 7.84 | 0 |
| types_delete_insert | 7.56 | 7.56 | 0 |

@max-hoffman max-hoffman requested a review from reltuk May 3, 2024 23:03
Contributor

@reltuk reltuk left a comment

Generally seems fine. A few comments about recovery and the bootstrap process.

As discussed offline, I'm not sure whether this will impact existing databases with large journals: losing their index will cause a high one-time startup cost on upgrade.

```go
recTag, err := rd.ReadByte()
if err != nil {
	if errors.Is(err, io.EOF) {
		return nil
```
Contributor

We need to pass the number of bytes read up to the last successful indexRecMeta callback back to the caller, so that they can file.Seek() and file.Truncate() the output file to that point. Then we can start writing new records without causing a CRC failure in a later bootstrap.

Contributor

Same for the points where we return nil from ErrUnexpectedEOF down below.

Contributor Author

Some context I was missing: the way this used to work, if anything went wrong, we deleted the index and exited the process. The next startup loads all chunks into novel, and the next batch flush writes all of those chunks into a fresh index. Startup/clear/exit is the repeatable retry loop if anything goes wrong. Shitty for the next index flush, but it works.

The rewrite has the different semantics you pointed out. Now we are concerned with hanging index lookups after the last metadata record. I added a seek/truncate, which clears the hanging lookups. We then have to do an extra step where we add the hanging lookups back to the index for consistency. So I basically ignore most errors here (missing records, malformed records, io.EOF): we just truncate and rebuild. I think that should be equally repeatable, as long as there isn't a pathological loop where the index can't get a foothold for some reason.

One thing that is maybe a bit annoying in both versions: we could rewrite the entire index, the server could quit before writing a root value, and then the next startup has to do it all over again. It rereads all of the lookups, deletes the index because there is no batch metadata, and then rebuilds the same index. A 45-minute startup becoming something like 3 hours would be annoying.

```go
	batch = nil
	batchCrc = 0
default:
	return fmt.Errorf("expected record to start with a chunk or metadata type tag")
```
Contributor

In some ways different, in some ways... maybe not so different? Why is this an error when ErrUnexpectedEOF is not? If there could be garbage beyond where we read...

```go
if _, err = wr.index.Write(buf); err != nil {
	return err
}
writeJournalIndexMeta(wr.indexWriter, root, wr.indexed, end, wr.batchCrc)
```
Contributor

Update the comment here to be more accurate.

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 0cc415b | ok | 5937457 |

| version | total_tests |
|---|---|
| 0cc415b | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 9042954 | ok | 5937457 |

| version | total_tests |
|---|---|
| 9042954 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| f51e89a | ok | 5937457 |

| version | total_tests |
|---|---|
| f51e89a | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| cf2cf03 | ok | 5937457 |

| version | total_tests |
|---|---|
| cf2cf03 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@coffeegoddd
Contributor

@coffeegoddd DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 9db5679 | ok | 5937457 |

| version | total_tests |
|---|---|
| 9db5679 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@max-hoffman max-hoffman merged commit 084b835 into main May 8, 2024
19 of 20 checks passed
@max-hoffman max-hoffman deleted the max/streamline-journal-index branch May 8, 2024 19:58

3 participants