perf: faster ingestion #25

Closed
petermattis opened this issue Dec 5, 2018 · 3 comments · Fixed by #2212

@petermattis
Collaborator

petermattis commented Dec 5, 2018

DB.Ingest currently experiences a hiccup if the table being ingested overlaps with a memtable: ingestion needs to wait for the memtable to be flushed. This is necessary because the ingested sstable is given a sequence number newer than entries in the memtable. We cannot add the ingested table to the LSM until overlapping entries with older sequence numbers are written to L0.
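The overlap condition that forces the wait can be sketched in Go. This sketch assumes key bounds are plain strings with inclusive endpoints. The names `keyRange`, `overlaps`, and `mustFlushBeforeIngest` are hypothetical and are not Pebble's API. A real implementation would check overlap against the memtable's actual contents rather than against a single bounding range.

```go
package main

// keyRange is a hypothetical stand-in for the inclusive user-key bounds of
// an sstable or a memtable.
type keyRange struct {
	Smallest, Largest string
}

// overlaps reports whether two inclusive key ranges share at least one key.
func overlaps(a, b keyRange) bool {
	return a.Smallest <= b.Largest && b.Smallest <= a.Largest
}

// mustFlushBeforeIngest reports whether an ingestion would have to wait.
// The ingested table gets a sequence number newer than every memtable entry,
// so any overlapping memtable must be flushed to L0 before the table can be
// added to the LSM.
func mustFlushBeforeIngest(ingest keyRange, memtables []keyRange) bool {
	for _, m := range memtables {
		if overlaps(ingest, m) {
			return true
		}
	}
	return false
}
```

For example, ingesting `[b, d]` while a memtable spans `[a, c]` would stall. Ingesting `[m, n]` would not stall.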

One way to avoid the hiccup is to have ingestion add the table to the LSM lazily. If the ingested table overlaps with a memtable, an entry is added to the WAL (ensuring that the action will be completed in the face of a crash) and the ingested table is appended to the list of memtables. We'll need a small wrapper around the sstable so that it implements the flushable interface. We'll also need some logic so that the table isn't actually rewritten during a flush. When it is time to flush, the table would simply be added to the L0 metadata (i.e. no new table would be created).

@jbowens
Collaborator

jbowens commented Apr 6, 2021

Would we need to create a new memtable on every ingestion so that iterators may respect sequence order within the flushables?

@petermattis
Collaborator Author

> Would we need to create a new memtable on every ingestion so that iterators may respect sequence order within the flushables?

Yes. That could leave a significant amount of wasted space in the memtables, but I think that memory is accounted for and will eventually trigger a flush.
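A small Go sketch of a point lookup over the queue of flushables shows why a fresh memtable is needed. The sketch assumes each flushable is a simple map from key to a single versioned value, which is a simplification because real memtables keep multiple versions per key. The names `entry`, `layer`, `get`, and `ordered` are hypothetical. A lookup that scans newest to oldest and stops at the first match is only correct if every sequence number in a layer is newer than every sequence number in the layers below it.

```go
package main

// entry is a hypothetical versioned value.
type entry struct {
	seqNum uint64
	value  string
}

// layer is a simplified flushable. The queue is ordered oldest first.
type layer map[string]entry

// get scans the queue newest to oldest and returns the first match. It is
// only correct when the queue satisfies ordered.
func get(queue []layer, key string) (string, bool) {
	for i := len(queue) - 1; i >= 0; i-- {
		if e, ok := queue[i][key]; ok {
			return e.value, true
		}
	}
	return "", false
}

// ordered reports whether every seqnum in each non-empty layer is greater
// than every seqnum in the layers before it.
func ordered(queue []layer) bool {
	var maxPrev uint64
	for _, l := range queue {
		if len(l) == 0 {
			continue
		}
		lo, hi := ^uint64(0), uint64(0)
		for _, e := range l {
			if e.seqNum < lo {
				lo = e.seqNum
			}
			if e.seqNum > hi {
				hi = e.seqNum
			}
		}
		if lo <= maxPrev {
			return false
		}
		maxPrev = hi
	}
	return true
}
```

Suppose a write with seqnum 3 were added to the memtable that sits below the ingested table (seqnum 2). Then `ordered` would report false, and `get` would return the stale ingested value instead of the newer write.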

As mentioned in cockroachdb/cockroach#62700, if the ingested sstables are small we could convert the ingestions into write batches. The idea is the same as above, in that the WAL entry would point to the sstable on disk. However, rather than appending the sstable to the memtable list, we would loop over the contents of the sstable and insert them into the memtable.
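The small-ingestion path can be sketched in Go as follows. The sketch assumes an in-memory stand-in for the sstable and a size cutoff. The thread names no cutoff, so `smallIngestThreshold` and its value are hypothetical, as are `sstable`, `versioned`, `db`, and `ingest`. Every key from one ingestion shares the ingestion's single sequence number. The WAL record only references the file, so crash recovery would re-read the file from disk.

```go
package main

// smallIngestThreshold is a hypothetical cutoff; the thread does not name one.
const smallIngestThreshold = 64 << 10 // 64 KiB

type kv struct{ key, value string }

// sstable is a hypothetical in-memory stand-in for an ingested file.
type sstable struct {
	path    string
	size    int // bytes on disk
	entries []kv
}

type versioned struct {
	seqNum uint64
	value  string
}

type db struct {
	wal      []string             // simulated WAL records
	memtable map[string]versioned // mutable memtable
	seqNum   uint64               // last assigned sequence number
	queued   []sstable            // large tables, handled as lazy flushables
}

// ingest assigns the ingestion one sequence number and logs a WAL record that
// points at the file. Small tables are copied into the memtable like a batch;
// large ones are queued to be added to L0 later.
func (d *db) ingest(t sstable) {
	d.seqNum++
	d.wal = append(d.wal, "ingest "+t.path)
	if t.size <= smallIngestThreshold {
		for _, e := range t.entries {
			d.memtable[e.key] = versioned{seqNum: d.seqNum, value: e.value}
		}
		return
	}
	d.queued = append(d.queued, t)
}
```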

@sumeerbhola
Collaborator

> DB.Ingest currently experiences a hiccup if the table being ingested overlaps with a memtable: ingestion needs to wait for the memtable to be flushed.

My understanding is that there is also a hiccup for concurrent normal writes that are assigned a seqnum after this ingest. They must wait until their seqnum becomes visible, and visibility is blocked behind the ingest, which is itself waiting for the memtable(s) to be flushed before it can update the manifest.
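The effect on concurrent writes can be shown with a single-threaded Go model of sequence-number visibility. The model assumes that a seqnum becomes visible only once it and every smaller seqnum have been published. The type `commitPipeline` and its methods are hypothetical and are not Pebble's commit pipeline.

```go
package main

// commitPipeline is a hypothetical model of seqnum visibility: all seqnums
// up to and including visible are readable.
type commitPipeline struct {
	next      uint64          // next seqnum to hand out
	visible   uint64          // every seqnum <= visible is readable
	published map[uint64]bool // applied but not yet visible
}

func newCommitPipeline() *commitPipeline {
	return &commitPipeline{next: 1, published: map[uint64]bool{}}
}

// allocate reserves the next sequence number for a write or an ingestion.
func (p *commitPipeline) allocate() uint64 {
	s := p.next
	p.next++
	return s
}

// publish marks s as applied and advances visible across any contiguous run
// of published seqnums starting just above the current visible seqnum.
func (p *commitPipeline) publish(s uint64) {
	p.published[s] = true
	for p.published[p.visible+1] {
		delete(p.published, p.visible+1)
		p.visible++
	}
}
```

Suppose an ingest takes seqnum 1 and a normal write takes seqnum 2. The write can finish first, but `visible` stays at 0 until the ingest, still waiting on the memtable flush, publishes seqnum 1.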

Labels
None yet
Projects
Archived in project

5 participants