Index rebuilds with external key sorting #7754
Conversation
…lt into max/external-sort-index-build
I tested the preliminary speedup by comparing 1 million -> 3 million -> 10 million row build times on an HDD.
3 million rows on an SSD, for comparison, goes from 10 seconds -> 30 seconds. Tuning has some effect on the scaling constants, but the difference is disk seeks on the HDD. We now build the tree only once, with sorted edits, by pre-sorting keys with a merge sort. The previous version would build the whole tree log(n) times, with each rebuild doing random IO to read every chunk. The merge sort shifts the work upfront to sequential IO.
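For readers following along, here is a minimal, self-contained sketch of the run-then-merge pattern described above. It is not the Dolt code; `writeRun`, `mergeRuns`, and plain string keys are illustrative assumptions, and the merge uses a naive min-selection over run heads rather than the heap used later in this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
)

// writeRun sorts one in-memory batch and writes it to a temp file as a sorted run.
func writeRun(keys []string) (string, error) {
	sort.Strings(keys)
	f, err := os.CreateTemp("", "key_run_")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, k := range keys {
		fmt.Fprintln(w, k)
	}
	return f.Name(), w.Flush()
}

// mergeRuns merges the sorted runs by repeatedly emitting the smallest head key.
// Each run file is read once, front to back, i.e. purely sequential IO.
func mergeRuns(paths []string, emit func(string)) error {
	var scanners []*bufio.Scanner
	var heads []string
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer os.Remove(p)
		defer f.Close()
		s := bufio.NewScanner(f)
		if s.Scan() {
			scanners = append(scanners, s)
			heads = append(heads, s.Text())
		}
	}
	for len(scanners) > 0 {
		min := 0
		for i := 1; i < len(heads); i++ {
			if heads[i] < heads[min] {
				min = i
			}
		}
		emit(heads[min])
		if scanners[min].Scan() {
			heads[min] = scanners[min].Text()
		} else {
			scanners = append(scanners[:min], scanners[min+1:]...)
			heads = append(heads[:min], heads[min+1:]...)
		}
	}
	return nil
}

func main() {
	batches := [][]string{{"c", "a"}, {"d", "b"}}
	var runs []string
	for _, b := range batches {
		p, err := writeRun(b)
		if err != nil {
			panic(err)
		}
		runs = append(runs, p)
	}
	if err := mergeRuns(runs, func(k string) { fmt.Println(k) }); err != nil {
		panic(err)
	}
}
```

The relevant property is that every file is written once and read once sequentially, which is what makes the HDD numbers above scale so much better than rebuilding the tree with random IO on every flush.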
…lt into max/external-sort-index-build
This looks like a great start. Had a number of initial comments. Happy to follow up on any of these. It's quite close, just needs a bit of cleanup, and it's going to be great :)
go/store/prolly/sort/external.go
Outdated
}
}
})
for i := 0; i < len(a.files); i += 2 {
If I understand the whole logic here, the asymptotics of this strategy as implemented seem off to me, at least compared to optimality. AFAIU:
- we're scanning the entire data set on every compact, and
- every compact comes at fixed size intervals; for example, every time we have 64 new files of 32MB, we scan every existing file...

Don't we get N^2 asymptotics?
If you take a look at the file sizes in each slot, you'll also see we're just building one big file at index 0 while everything else stays quite a bit smaller than it... so it's also probably not great for parallelism, because we're going to block on the linear scan of that file on each compact...
I think an approach where we only merge the n/2 smallest files each compact (we only get by n/4 slots in that case) gets us back to better asymptotics, but there are probably much better approaches which take advantage of a k-way merge (instead of 2-way...)
@reltuk With your changes, index building looks like it runs about 2x as quickly as the first version. Compaction strategies seem edge-casey, so I picked what seems like a simple one: leveled tiers that get compacted (parallelism=1) to the next level after a threshold. 1 million rows (18s) -> 3 million rows (1 min) -> 10 million rows (4 min)
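A rough sketch of that leveled-tier idea, under assumed names (`leveledFiles`, `fileMax`, and the injected merge function are illustrative, not the actual Dolt types): each level holds up to `fileMax` sorted files, and filling a level triggers a merge whose output is promoted to the next level, so a single compaction never rescans the whole data set.

```go
package sketch

import "os"

// leveledFiles sketches tiered compaction: level 0 receives fresh sorted runs,
// and a full level is merged into one file that moves up a level.
type leveledFiles struct {
	fileMax int
	levels  [][]*os.File
	merge   func(files []*os.File) (*os.File, error) // k-way merge into a new sorted file
}

// add places a freshly written sorted run at level 0 and cascades compactions.
func (l *leveledFiles) add(f *os.File) error {
	return l.addAt(0, f)
}

func (l *leveledFiles) addAt(level int, f *os.File) error {
	for len(l.levels) <= level {
		l.levels = append(l.levels, nil)
	}
	l.levels[level] = append(l.levels[level], f)
	if len(l.levels[level]) < l.fileMax {
		return nil
	}
	// Level is full: merge its files into one and promote the result.
	merged, err := l.merge(l.levels[level])
	if err != nil {
		return err
	}
	l.levels[level] = nil
	return l.addAt(level+1, merged)
}
```

With a threshold of fileMax files per level, each key is rewritten roughly once per level rather than on every compaction, which is where the better asymptotics come from.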
LGTM. I like the level change and the heap-based multi-merger.
I think it's worth taking a short pass to get timely resource finalization on the file handles and cleanup from disk of the temp files. Somewhat your call.
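For reference, a minimal `container/heap` k-way merge in the spirit of the multi-merger mentioned above; in-memory string sources stand in for the file iterators, and the names are illustrative rather than the actual fileMerger API.

```go
package main

import (
	"container/heap"
	"fmt"
)

// source is one sorted input; pos tracks its current head key.
type source struct {
	keys []string // already sorted
	pos  int
}

// mergeHeap orders sources by their current head key.
type mergeHeap []*source

func (h mergeHeap) Len() int           { return len(h) }
func (h mergeHeap) Less(i, j int) bool { return h[i].keys[h[i].pos] < h[j].keys[h[j].pos] }
func (h mergeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x any)        { *h = append(*h, x.(*source)) }
func (h *mergeHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeAll repeatedly pops the smallest head key and re-pushes the source
// until every source is exhausted.
func mergeAll(sources []*source, emit func(string)) {
	h := &mergeHeap{}
	for _, s := range sources {
		if s.pos < len(s.keys) {
			heap.Push(h, s)
		}
	}
	for h.Len() > 0 {
		s := heap.Pop(h).(*source)
		emit(s.keys[s.pos])
		s.pos++
		if s.pos < len(s.keys) {
			heap.Push(h, s)
		}
	}
}

func main() {
	srcs := []*source{{keys: []string{"a", "d"}}, {keys: []string{"b", "c"}}}
	mergeAll(srcs, func(k string) { fmt.Print(k, " ") }) // prints: a b c d
}
```

The resource-finalization point maps onto this shape: in the real code each exhausted source is a file iterator that should be closed, and anything still sitting in the heap when the merge returns early needs to be closed as well.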
defer it.Close()
if err != nil {
return nil, err
}
Suggested change:
if err != nil {
return nil, err
}
defer it.Close()
More Close()es, but I think only a defer sorter.Close() up above, if we add timely resource finalization.
}

func (a *tupleSorter) newFile() (*os.File, error) {
f, err := a.tmpProv.NewFile("", "key_file_")
Something seems off about the lifetime of these files. We put them in keyIterables, and those return the Iter, but the iter's Close() method doesn't close the file. And there's no way provided in the API to close and clean them up in the case of an error. And it seems like we would want to os.Remove() these files after we're done with them, since part of the contract was to use a certain number of files?
Near at hand is something like:
- Make it explicit that keyFile.IterAll() can only ever be called once.
- On keyFile.IterAll(), set keyFile's f *os.File to nil. func (k *keyFile) Close() { if k.f != nil { k.f.Close(); os.Remove(k.f.Name()) } }
- keyFileReader gets the *os.File as well as its buffered reader. func (r keyFileReader) Close() { r.f.Close(); os.Remove(r.f.Name()) }
- func (a *tupleSorter) Close() { for _, fs := range a.files { for _, f := range fs { f.Close() } } }
Or something like that?
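One possible shape for that ownership hand-off, as a sketch only (keyFile and keyFileReader here are assumed simplifications of the real types, not the Dolt implementation): whoever currently holds the *os.File is responsible for closing and removing it, and IterAll transfers that ownership so exactly one Close path fires.

```go
package sketch

import (
	"bufio"
	"os"
)

type keyFile struct {
	f *os.File
}

// IterAll may only be called once; it hands the underlying file off to the reader.
func (k *keyFile) IterAll() *keyFileReader {
	r := &keyFileReader{f: k.f, buf: bufio.NewReader(k.f)}
	k.f = nil // ownership transferred to the reader
	return r
}

// Close is a no-op once ownership has moved to a reader.
func (k *keyFile) Close() {
	if k.f != nil {
		k.f.Close()
		os.Remove(k.f.Name())
	}
}

type keyFileReader struct {
	f   *os.File
	buf *bufio.Reader
}

// Close releases the temp file and deletes it from disk.
func (r *keyFileReader) Close() {
	r.f.Close()
	os.Remove(r.f.Name())
}
```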
outF := newKeyFile(newF, a.batchSize)
m, err := newFileMerger(ctx, a.keyCmp, outF, fileLevel[:a.fileMax]...)
if err != nil {
return err
If we add resource finalization, be careful not to leak newF or any fileLevel[:a.fileMax] which wasn't successfully IterAll'd here.
return err
}
if err := m.run(ctx); err != nil {
return err
If we add resource finalization, be careful not to leak the file newF here. m.run() should probably guarantee all iterators it made are closed.
} else {
reader.iter.Close()
if err != nil {
return err
If the contract is that these are all closed when we return, we need to add some code to close all the ones remaining in m.mq here.
Co-authored-by: Aaron Son <aaron@dolthub.com>
Index builds now write keys to intermediate files and merge sort them before materializing the prolly tree for the secondary index. This contrasts with the default approach, which rebuilds the prolly tree each time we flush keys from memory. The old approach reads most of the tree with random reads and writes whenever a memory flush contains unsorted keys. The new approach structures the work for sequential IO by flushing sorted runs that are incrementally merge sorted. Sequential IO is dramatically faster on disk-based systems.