Conversation

@zhyass commented Mar 11, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  1. refactor compact block:

    +--------------+
    |CompactSource1|  ------
    +--------------+        |      +-----------------+      +------------+
    |    ...       |  ...   | ---> |CompactAggregator| ---> |MutationSink|
    +--------------+        |      +-----------------+      +------------+
    |CompactSourceN|  ------
    +--------------+
    

    CompactSource: runs the compact tasks; each task compacts its input blocks and produces a new block.
    CompactAggregator: gathers the new blocks, then generates and writes the new segments.

    A minimal sketch of this pipeline shape follows this list.

  2. Add status for compact segment and compact block.

  3. Read segments in batches to avoid OOM (see the second sketch after this list).
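
For illustration only, here is a minimal, self-contained sketch of the fan-in shape above. It uses std::sync::mpsc and plain threads instead of Databend's processor framework; the Block and Segment types, the channel wiring, and the BLOCK_PER_SEGMENT constant are placeholders, and the sketch does not attempt the deterministic block ordering discussed later in this PR.

// Sketch (placeholders only): N sources compact their tasks into new blocks,
// a single aggregator gathers the blocks and cuts them into segments,
// and a final step stands in for committing the mutation.
use std::sync::mpsc;
use std::thread;

#[derive(Debug)]
struct Block(Vec<i32>); // stand-in for a newly compacted data block
#[derive(Debug)]
struct Segment(Vec<Block>); // stand-in for a newly written segment

fn main() {
    let (tx, rx) = mpsc::channel::<Block>();

    // CompactSource1..N: each worker compacts its own list of tasks into new blocks.
    let task_lists: Vec<Vec<Vec<i32>>> = vec![
        vec![vec![1], vec![2]],    // tasks assigned to source 1
        vec![vec![3], vec![4, 5]], // tasks assigned to source 2
    ];
    let mut handles = Vec::new();
    for tasks in task_lists {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for task in tasks {
                // "compacting" here is just forwarding the task's rows as one block
                tx.send(Block(task)).unwrap();
            }
        }));
    }
    drop(tx); // close the channel once every source owns its own sender

    // CompactAggregator: gathers the new blocks and cuts them into segments.
    const BLOCK_PER_SEGMENT: usize = 2;
    let mut segments = Vec::new();
    let mut pending = Vec::new();
    for block in rx {
        pending.push(block);
        if pending.len() == BLOCK_PER_SEGMENT {
            segments.push(Segment(std::mem::take(&mut pending)));
        }
    }
    if !pending.is_empty() {
        segments.push(Segment(pending));
    }
    for handle in handles {
        handle.join().unwrap();
    }

    // MutationSink: in the real pipeline this commits a new snapshot; here we just print.
    println!("committing {} new segments: {:?}", segments.len(), segments);
}

The point is only the shape: many independent sources feeding one aggregator, which cuts the gathered blocks into segments before a single commit step.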

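A hedged sketch of point 3 as well: instead of loading every segment's metadata at once, the location list can be walked in fixed-size chunks so only a bounded amount of metadata is resident at any time. read_segment, process, and CHUNK_SIZE below are illustrative placeholders, not the actual Databend APIs.

// Sketch (placeholders only): read segment metadata in batches to bound memory use.
fn read_segment(loc: &str) -> Vec<String> {
    // stand-in for fetching and deserializing one segment's block metas
    vec![format!("block-of-{loc}")]
}

fn process(block_metas: &[String]) {
    println!("scheduling {} blocks for compaction", block_metas.len());
}

fn main() {
    let segment_locs: Vec<String> = (0..10).map(|i| format!("seg-{i}")).collect();
    const CHUNK_SIZE: usize = 4; // in practice this would come from a setting

    for chunk in segment_locs.chunks(CHUNK_SIZE) {
        // only CHUNK_SIZE segments' metadata is held in memory at a time
        let block_metas: Vec<String> = chunk.iter().flat_map(|loc| read_segment(loc)).collect();
        process(&block_metas);
    }
}
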
Memo (draft)

A memo about the "order" of blocks; @zhyass, please correct me if anything is described incorrectly:

before this PR, given a snapshot, if the blocks are traversed in the following way (one of several possible ways), the same order of blocks can be observed before and after segment compactions

(note that segment compaction does not merge blocks):

// pseudocode
let mut block_metas = vec![];
for segment_loc in snapshot.segments {
    let segment = segment_reader.read(segment_loc);
    for block_loc in segment.block_metas {
        let block_meta = block_meta_reader.read(block_loc);
        block_metas.push(block_meta);
    }
}
- before compact:

snapshot.segments = [s5, s4, s3, s2, s1];

s5.block_metas : [b5]
s4.block_metas : [b4]
s3.block_metas : [b3]
s2.block_metas : [b2]
s1.block_metas : [b1]

order of traversed block metas : [b5, b4, b3, b2, b1]

- after segment compaction, with a setting of  2 blocks per seg
snapshot.segments = [s5, s7, s6];
s5.block_metas = [b5]
s7.block_metas = [b4, b3]
s6.block_metas = [b2, b1]

order of traversed block metas : [b5, b4, b3, b2, b1]

- after another segment compaction, with a setting of  4 blocks per seg
snapshot.segments = [s5, s8];
s5.block_metas  = [b5]
s8.block_metas  = [b4, b3, b2, b1]

order of traversed block metas : [b5, b4, b3, b2, b1]

after this PR, given a snapshot, the same order of block metas can also be observed (before and after segment compactions) if they are traversed in the following way (one of several possible ways):

// pseudocode
let mut block_metas = vec![];
for segment_loc in snapshot.segments.iter().rev() { // note: segments in reverse order
    let segment = segment_reader.read(segment_loc);
    for block_loc in segment.block_metas {
        let block_meta = block_meta_reader.read(block_loc);
        block_metas.push(block_meta);
    }
}
- before compact:

snapshot.segments = [s5, s4, s3, s2, s1];

s5.block_metas : [b5]
s4.block_metas : [b4]
s3.block_metas : [b3]
s2.block_metas : [b2]
s1.block_metas : [b1]

order of traversed block metas : [b1, b2, b3, b4, b5]

- after segment compaction, with a setting of  2 blocks per seg
snapshot.segments = [s5, s7, s6];
s5.block_metas = [b5]
s7.block_metas = [b3, b4]  // before this PR, [b4, b3]
s6.block_metas = [b1, b2]  // before this PR, [b2, b1]

order of traversed block metas : [b1, b2, b3, b4, b5]

- after another segment compaction, with a setting of  4 blocks per seg

snapshot.segments = [s5, s8];

s5.block_metas  = [b5]
s8.block_metas  = [b1, b2, b3, b4]  // before this PR [b4, b3, b2, b1]

order of traversed block metas : [b1, b2, b3, b4, b5]
  • The motivation for this adjustment:

During block compaction, the segments are traversed from the last to the first (ordered by their positions in TableSnapshot.segments: Vec<_>), and the blocks of each segment are traversed from the first to the last (ordered by their positions in SegmentInfo.blocks: Vec<_>). A sketch of this traversal is shown below.
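
As a hedged illustration (the segment contents are the s5/s7/s6 example above, not real code from this PR), reversing the segment order while keeping the block order within each segment recovers the blocks in the order they were written:

// Sketch: visit segments last-to-first and blocks first-to-last,
// which yields the blocks in write order for the example above.
fn main() {
    // snapshot.segments after the PR: [s5, s7, s6]
    let segments: Vec<Vec<&str>> = vec![
        vec!["b5"],       // s5
        vec!["b3", "b4"], // s7
        vec!["b1", "b2"], // s6
    ];

    let mut traversed = Vec::new();
    for segment in segments.iter().rev() { // segments: last to first
        for block in segment {             // blocks: first to last
            traversed.push(*block);
        }
    }
    assert_eq!(traversed, vec!["b1", "b2", "b3", "b4", "b5"]);
}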

Closes #10520

mergify bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Mar 11, 2023
zhyass marked this pull request as draft March 11, 2023 17:13
@dantengsky

related issue #10520

zhyass marked this pull request as ready for review March 15, 2023 10:21
@BohuTANG

Conflicting files
scripts/benchmark/query/load/tpch.sh

zhyass marked this pull request as draft March 15, 2023 12:59
zhyass marked this pull request as ready for review March 15, 2023 13:52
@dantengsky

@zhyass

mysql> create table t(c int)  block_per_segment=2;
Query OK, 0 rows affected (0.25 sec)

mysql> insert into t values(1);
Query OK, 1 row affected (0.28 sec)

mysql> insert into t values(2);
Query OK, 1 row affected (0.17 sec)

mysql> insert into t values(3);
Query OK, 1 row affected (0.16 sec)

mysql> insert into t values(4);
Query OK, 1 row affected (0.14 sec)

mysql> insert into t values(5);
Query OK, 1 row affected (0.14 sec)

mysql> optimize table t compact segment;
Query OK, 0 rows affected (0.13 sec)

snapshot before optimize table compact segment looks good:

   segments of snapshot
   ------------------------------
    [
      "1/8/_sg/01b95005b15d44869993e65d8640ad0f_v2.json",
      "1/8/_sg/90edb991cd1c42da886af55a0e02dd57_v2.json",
      "1/8/_sg/7101a2adf7e94dc185963857c0498057_v2.json",
      "1/8/_sg/1af80f75f37b4659b5f6d662108e1f45_v2.json",
      "1/8/_sg/8c9fe90c4fc74ed8808b55b3fa34b560_v2.json",
    ]
    
    blocks of segments:
    ----------------------------    
    01b95005b15d44869993e65d8640ad0f_v2:  [1/8/_b/0c4f4ca921e044dd87a789a6c8930327_v2.parquet]
    
    90edb991cd1c42da886af55a0e02dd57_v2:  [1/8/_b/955c028c14d94668baea6c77e7d58e1b_v2.parquet]
    
    7101a2adf7e94dc185963857c0498057_v2:  [1/8/_b/38c25ff1d87e469baf0e30f3a1a74124_v2.parquet]
    
    1af80f75f37b4659b5f6d662108e1f45_v2: [1/8/_b/58ef61c78fb74559ab515f209b4eea1f_v2.parquet]
    
    8c9fe90c4fc74ed8808b55b3fa34b560_v2: [1/8/_b/ac744122d1164f168c702227e3352a06_v2.parquet]
    
     flattened (order preserved)
    [
    1/8/_b/0c4f4ca921e044dd87a789a6c8930327_v2.parquet,  
    1/8/_b/955c028c14d94668baea6c77e7d58e1b_v2.parquet,
    1/8/_b/38c25ff1d87e469baf0e30f3a1a74124_v2.parquet,
    1/8/_b/58ef61c78fb74559ab515f209b4eea1f_v2.parquet,
    1/8/_b/ac744122d1164f168c702227e3352a06_v2.parquet,
    ]

snapshot after optimize table compact segment looks like this:

     [
      "1/8/_sg/01b95005b15d44869993e65d8640ad0f_v2.json",
      "1/8/_sg/9cd09e4e75994dc9a5279ed35dc317fc_v2.json",
      "1/8/_sg/9e7305ffbc15441ea8b25665617920ec_v2.json",
    ]
    
    
    1/8/_sg/01b95005b15d44869993e65d8640ad0f_v2:  [1/8/_b/0c4f4ca921e044dd87a789a6c8930327_v2.parquet]
    
    1/8/_sg/9cd09e4e75994dc9a5279ed35dc317fc_v2.json: [
      1/8/_b/38c25ff1d87e469baf0e30f3a1a74124_v2.parquet, 
      1/8/_b/955c028c14d94668baea6c77e7d58e1b_v2.parquet]

    1/8/_sg/9e7305ffbc15441ea8b25665617920ec_v2.json: [
         1/8/_b/ac744122d1164f168c702227e3352a06_v2.parquet, 
         1/8/_b/58ef61c78fb74559ab515f209b4eea1f_v2.parquet]
    
     flattened (order preserved)
     
    [
    1/8/_b/0c4f4ca921e044dd87a789a6c8930327_v2.parquet,
    1/8/_b/38c25ff1d87e469baf0e30f3a1a74124_v2.parquet, 
    1/8/_b/955c028c14d94668baea6c77e7d58e1b_v2.parquet,
    1/8/_b/ac744122d1164f168c702227e3352a06_v2.parquet, 
    1/8/_b/58ef61c78fb74559ab515f209b4eea1f_v2.parquet,
    ]

the order of blocks is not the same after segment compaction:

   - before:
    [
    1/8/_b/0c4f4ca921e044dd87a789a6c8930327_v2.parquet,  
    1/8/_b/955c028c14d94668baea6c77e7d58e1b_v2.parquet,
    1/8/_b/38c25ff1d87e469baf0e30f3a1a74124_v2.parquet,
    1/8/_b/58ef61c78fb74559ab515f209b4eea1f_v2.parquet,
    1/8/_b/ac744122d1164f168c702227e3352a06_v2.parquet,
    ]
    - after
    [
    1/8/_b/0c4f4ca921e044dd87a789a6c8930327_v2.parquet,
    1/8/_b/38c25ff1d87e469baf0e30f3a1a74124_v2.parquet, 
    1/8/_b/955c028c14d94668baea6c77e7d58e1b_v2.parquet,
    1/8/_b/ac744122d1164f168c702227e3352a06_v2.parquet, 
    1/8/_b/58ef61c78fb74559ab515f209b4eea1f_v2.parquet,
    ]

is this expected?

@zhyass commented Mar 16, 2023

is this expected?

Yes, this result is as expected.
The data after compaction should be stored in the order in which it was written.
The order of the blocks is the same as the order of the SQL results.

mysql> create table t(c int)  block_per_segment=2 row_per_block=1;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into t values(1),(2);
Query OK, 2 rows affected (0.13 sec)

mysql> insert into t values(3),(4);
Query OK, 2 rows affected (0.05 sec)

mysql> insert into t values(5);
Query OK, 1 row affected (0.05 sec)

@dantengsky commented Mar 16, 2023

is this expected?

Yes, this result is as expected. The data after compaction should be stored in the order in which it was written. The order of the blocks is the same as the order of the SQL results.

OK.

Let me summarize a memo about the "order" of blocks; @zhyass, please correct me if anything is described incorrectly:

before this PR, given a snapshot, if the blocks are traversed in the following way (one of several possible ways), the same order of blocks can be observed before and after segment compactions

(note that segment compaction does not merge blocks):

// pseudocode
let mut block_metas = vec![];
for segment_loc in snapshot.segments {
    let segment = segment_reader.read(segment_loc);
    for block_loc in segment.block_metas {
        let block_meta = block_meta_reader.read(block_loc);
        block_metas.push(block_meta);
    }
}
- before compact:

snapshot.segments = [s5, s4, s3, s2, s1];

s5.block_metas : [b5]
s4.block_metas : [b4]
s3.block_metas : [b3]
s2.block_metas : [b2]
s1.block_metas : [b1]

order of traversed block metas : [b5, b4, b3, b2, b1]

- after segment compaction, with a setting of  2 blocks per seg
snapshot.segments = [s5, s7, s6];
s5.block_metas = [b5]
s7.block_metas = [b4, b3]
s6.block_metas = [b2, b1]

order of traversed block metas : [b5, b4, b3, b2, b1]

- after another segment compaction, with a setting of  4 blocks per seg
snapshot.segments = [s5, s8];
s5.block_metas  = [b5]
s8.block_metas  = [b4, b3, b2, b1]

order of traversed block metas : [b5, b4, b3, b2, b1]

after this PR, given a snapshot, the same order of block metas can also be observed (before and after segment compactions) if they are traversed in the following way (one of several possible ways):

// pseudocode
let mut block_metas = vec![];
for segment_loc in snapshot.segments.iter().rev() { // note: segments in reverse order
    let segment = segment_reader.read(segment_loc);
    for block_loc in segment.block_metas {
        let block_meta = block_meta_reader.read(block_loc);
        block_metas.push(block_meta);
    }
}
- before compact:

snapshot.segments = [s5, s4, s3, s2, s1];

s5.block_metas : [b5]
s4.block_metas : [b4]
s3.block_metas : [b3]
s2.block_metas : [b2]
s1.block_metas : [b1]

order of traversed block metas : [b1, b2, b3, b4, b5]

- after segment compaction, with a setting of  2 blocks per seg
snapshot.segments = [s5, s7, s6];
s5.block_metas = [b5]
s7.block_metas = [b3, b4]  // before this PR, [b4, b3]
s6.block_metas = [b1, b2]  // before this PR, [b2, b1]

order of traversed block metas : [b1, b2, b3, b4, b5]

- after another segment compaction, with a setting of  4 blocks per seg

snapshot.segments = [s5, s8];

s5.block_metas  = [b5]
s8.block_metas  = [b1, b2, b3, b4]  // before this PR [b4, b3, b2, b1]

order of traversed block metas : [b1, b2, b3, b4, b5]
  • The motivation for this adjustment:

During block compaction, the segments are traversed from the last to the first (ordered by their positions in TableSnapshot.segments: Vec<_>), and the blocks of each segment are traversed from the first to the last (ordered by their positions in SegmentInfo.blocks: Vec<_>).

@dantengsky

@zhyass

is this expected?

Yes, this result is as expected. The data after compaction should be stored in the order in which it was written. The order of the blocks is the same as the order of the SQL results.

OK.

Let me summarize a memo about the "order" of blocks; @zhyass, please correct me if anything is described incorrectly:

@zhyass, please have a look at the memo above; if it looks good to you, I'd like to put it into the summary of this PR.

mergify bot merged commit d0fd5be into databendlabs:main Mar 16, 2023
Labels

pr-refactor: this PR changes the code base without new features or bugfix

Development

Successfully merging this pull request may close these issues:

bug: COMPACT SEGMENT LIMIT 10 cause OOM