Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetcher: don't hold batches across boundaries #66376

Merged
merged 3 commits into from
Jul 9, 2021

Commits on Jul 2, 2021

  1. colencoding: bugfix to interleaved seek prefixes

    Previously, there was a bug in the decoding methods that returned the
    expected "seek prefix" for interleaved table when encountering a row
    from an unwanted table. The returned seek prefix was not as precise as
    it should have been: instead of returning the first key of the desired
    table, it returned a key somewhere in the middle of the found table.
    
    This did not cause any user-visible issues, since the cfetcher code is
    resilient to finding an incorrect interleaved table row. But, it caused
    interleaved table fetches to be less efficient than necessary.
    
    This commit also adds explicit capacities to the returned keys and
    values from the KV Fetcher, to prevent issues where callers accidentally
    slice past the length of a large-capacity backing slice that backs the
    returned kvs.
    
    Release note: None
    jordanlewis committed Jul 2, 2021
    Configuration menu
    Copy the full SHA
    99a96f1 View commit details
    Browse the repository at this point in the history
  2. colfetcher: don't hold batches across boundaries

    Previously, the cfetcher was completely blind to whether or not the keys
    it retrieves from the kvfetcher lie on a batch boundary. This led to
    the cfetcher retaining a pointer to 2 fetched batches at once during
    fetches, since it doesn't deep copy key or value memory out of the
    batches. The retained pointer is required, because the cfetcher needs to
    keep the bytes of the last key and value it saw for various reasons.
    
    Now, the kvfetcher and kvbatchfetcher participate by returning whether
    or not the most recent returned key or batch lies on a *batch boundary*,
    meaning that the next invocation to the fetcher would require sending a
    BatchRequest to KV, getting more bytes back.
    
    If this condition is true, the cFetcher will now deep copy its last-seen
    KV instead of keeping a pointer into the most recent batch.
    
    This improves memory usage by 2x for a brief moment, since before, the
    cFetcher would always have a live pointer to both the last batch and the
    next batch until it had a chance to decode the first key of the next
    batch.
    
    Commit 5c00812 improved this same
    situation by drastically cutting down the window in which the last batch
    was retained. But it wasn't quite enough. After this commit, there
    should be no moment in time when the fetcher ever retains a pointer to
    more than one batch at the same time.
    
    Release note (sql change): reduce instantaneous memory usage during
    scans by up to 2x.
    jordanlewis committed Jul 2, 2021
    Configuration menu
    Copy the full SHA
    8eba7b9 View commit details
    Browse the repository at this point in the history
  3. colfetcher: use cached types array

    This commit reduces the number of interface calls in the hot path for
    fetching data by 1 per column per row.
    
    Release note: None
    jordanlewis committed Jul 2, 2021
    Configuration menu
    Copy the full SHA
    e42fe76 View commit details
    Browse the repository at this point in the history