-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[core] Introduce global index scan structure #6635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
+1 |
JingsongLi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @leaves12138 for the contribution!
Left comments.
| new DataField( | ||
| 3, | ||
| "_INDEX_META", | ||
| new VarBinaryType())))))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataTypes.BYTES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| new DataField(5, "_EXTERNAL_PATH", newStringType(true)), | ||
| new DataField( | ||
| 6, | ||
| "_GLOBAL_INDEX", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract a static field:
public static RowType GLOBAL_INDEX = XXX;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, moved to GlobalIndexMeta.SCHEMA
| "Whether to write the data into fixed bucket for batch writing a postpone bucket table."); | ||
|
|
||
| public static final ConfigOption<Long> GLOBAL_INDEX_ROW_COUNT_PER_SHARD = | ||
| key("global-index.row_count_per_shard") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use '-' instead of '_'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| private final long fileSize; | ||
| private final long rowCount; | ||
|
|
||
| @Nullable private final Long rowRangeStart; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not provide these fields. Introduce a GlobalIndexMeta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| @Override | ||
| public InternalRow convertTo(IndexManifestEntry record) { | ||
| IndexFileMeta indexFile = record.indexFile(); | ||
| InternalRow globalIndexRow = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This row should be nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fixed
| /** Path factory to create an index path. */ | ||
| public interface IndexPathFactory { | ||
|
|
||
| Path getPath(String fileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toPath, let this be unified to toPath(file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| return result; | ||
| } | ||
|
|
||
| public List<IndexManifestEntry> scan(Filter<IndexManifestEntry> readTFilter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just provide scan with Snapshot. We should generate Snapshot in outside to avoid load Snapshot many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
0ef1c78 to
f6284e9
Compare
| /** Schema for global index. */ | ||
| public class GlobalIndexMeta { | ||
|
|
||
| public static final RowType SCHEMA = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a static method.
RowType schema(int startFieldId)
JingsongLi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
(cherry picked from commit 657205a)
Purpose
Related to pip:
https://cwiki.apache.org/confluence/display/PAIMON/PIP-38%3A+Introduce+Global+Index+for+Paimon+Table
Tests
API and Format
Documentation