-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add pinot upsert features to pinot common #5175
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5175 +/- ##
==========================================
- Coverage 66.20% 56.95% -9.26%
==========================================
Files 1067 1068 +1
Lines 54532 54570 +38
Branches 8128 8133 +5
==========================================
- Hits 36104 31080 -5024
- Misses 15786 21070 +5284
+ Partials 2642 2420 -222
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see the PR.
@@ -75,6 +79,9 @@ | |||
private Map<InstancePartitionsType, InstanceAssignmentConfig> _instanceAssignmentConfigMap; | |||
private List<FieldConfig> _fieldConfigList; | |||
|
|||
@JsonPropertyDescription(value = "The update semantic of the table, either append or upsert, default as append") | |||
private UpdateSemantic _updateSemantic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to IngestMode or just have a boolean enableUpsert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for real-time? We already have push type of either APPEND or REFRESH for offline; aggregateMetrics for real-time. What does UPSERT stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kishoreg I am thinking if using enum here will be more flexible in the case of future features changes or whatnot. Boolean seems to be only limiting two options and it might limit what other changes we might want to do with pinot. Please let me what you think. ingestionMode definitely sounds more explicit than what I am using here and I will definite consider about this.
@Jackie-Jiang yes. So basically it is previous old design about we want to support update in pinot realtime ingestion by adding override by primary key (eg, a second message in Kafka with the same primary key represents the update of existing key instead of two records with the same primary key). Upsert stands for this new ingestion semantic (insert if no duplicated primary key, update if duplicate primary key). This feature will be applying to realtime table only
* @param type the type of the table for it to fill in if the type info is missing | ||
* @return the table type name with the type info | ||
*/ | ||
public static String ensureTableNameWithType(String tableName, TableType type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use TableNameBuilder.forType(type).tableNameWithType(tableName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this method, @kishoreg this method is used in other components that I will submit PR later
@@ -26,7 +26,8 @@ | |||
* | |||
*/ | |||
public enum BrokerGauge implements AbstractMetrics.Gauge { | |||
QUERY_QUOTA_CAPACITY_UTILIZATION_RATE("tables", false), NETTY_CONNECTION_CONNECT_TIME_MS("nettyConnection", true); | |||
QUERY_QUOTA_CAPACITY_UTILIZATION_RATE("tables", false), NETTY_CONNECTION_CONNECT_TIME_MS("nettyConnection", true), | |||
TABLE_MIN_LOW_WATER_MARK("tables", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this represent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It represents how many table pinot broker is collecting the metrics for. The exact usage of the metrics will show up in later diffs
|
||
// upsert related config | ||
// primary key refers to the column name that is the primary key of this upsert table | ||
private String _primaryKey; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the primary key is made up of two columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or multiple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now we don't have support for multiple primary keys. Do you think it will be an important feature to add multiple primary key support right now or you think it is fine to postpone it to later diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to design it for multiple, even if the implementation can handle only one primary key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't have support for multiple primary keys
There will be some cases where multiple keys make sense right? Like in clickstream: userId and timestamp
Will support for multiple keys be added later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kishoreg @chethanuk I have changed the primary key in schema config to be a list instead of a single string. For now we are still performing check to ensure that primary key will be one and only one. However we should be able to change that once we have mulitple-primary-key support in place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please link the design doc in the pr description so that it is easier to understand the concept of upsert?
Since this pr is touching the core configs, we need to spend some time discussing how to design the config keys because they are very hard to change in the future
@@ -30,17 +30,21 @@ | |||
import java.util.List; | |||
import java.util.Map; | |||
import javax.annotation.Nullable; | |||
import javax.validation.constraints.Null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue with merging, fixed. Thanks for pointing it out
pinot-common/src/main/java/org/apache/pinot/common/config/TableConfig.java
Outdated
Show resolved
Hide resolved
@@ -75,6 +79,9 @@ | |||
private Map<InstancePartitionsType, InstanceAssignmentConfig> _instanceAssignmentConfigMap; | |||
private List<FieldConfig> _fieldConfigList; | |||
|
|||
@JsonPropertyDescription(value = "The update semantic of the table, either append or upsert, default as append") | |||
private UpdateSemantic _updateSemantic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for real-time? We already have push type of either APPEND or REFRESH for offline; aggregateMetrics for real-time. What does UPSERT stand for?
* @param type the type of the table for it to fill in if the type info is missing | ||
* @return the table type name with the type info | ||
*/ | ||
public static String ensureTableNameWithType(String tableName, TableType type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use TableNameBuilder.forType(type).tableNameWithType(tableName)
pinot-common/src/main/java/org/apache/pinot/common/metrics/AbstractMetrics.java
Outdated
Show resolved
Hide resolved
Linked the design doc in the description. also google doc design doc with comments. We have discussed this a long time back but I definitely open to go through it again as things have already changed. Do you think we should do some remote meeting sync or something else? |
One concern here would be that the update related configuration appears in multiple places. One thing that can simplify this PR is to add a UpdateConfig under TableConfig and move everything under that. That allows you to mark that config as beta/unstable and iterate quickly
|
It's asking uber login. Can you change share settings?
@kishoreg At least monthly meeting and sync up on project status will be helpful to all... |
Yeah our google doc account sharing setting is kind of wanky that it doesn't allow blanket share with everyone outside our organization. I hate it but it is really out of my control. However, just submit a request to view/comment and I can add you as soon as I see it. We originally plan to have a meeting earlier this month but the pandemic kind of mess up with all of our plans. Maybe we can have a work-group specific remote meeting (eg, upsert meeting with kishore/jackie/chethan etc) in the meantime and do the pinot committee meeting later @Jackie-Jiang @kishoreg @chethanuk let me know if you feel meeting is necessary or you prefer me to update the schema first and see if a remote meeting is necessary |
pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking better. I still feel that it might be easier to iterate and less invasive if we move everything related to upsertfeature into TableConfig.UpsertConfig and move all the changed in Schema and few things in TableConfig like UpdateSemantic etc to TableConfig.UpsertConfig.
That will allow you to iterate on UpsertConfig very fast, you can mark that config as unstable for now.
_updateSemantic = updateSemantic; | ||
} | ||
|
||
public boolean isTableForUpsert() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's avoid adding additional utility methods to config classes if possible
@kishoreg @mcvsubbu I have updated the diff per suggestion you guys gave in the last meeting, please feel free to take a look @Jackie-Jiang I have updated the diff to make sure #1: it is only related table config changes required by pinot upsert feature (see doc for details) and #2 push other pinot-common related changes for pinot upsert features to later PRs |
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/TableConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/IngestionModeConfig.java
Outdated
Show resolved
Hide resolved
@mcvsubbu @Jackie-Jiang updated per your feedback and make the config to be used only if the table is for upsert and clean up on naming, please take a look when you have time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise. Please make sure the test can pass
pinot-common/src/main/java/org/apache/pinot/common/utils/config/TableConfigUtils.java
Show resolved
Hide resolved
pinot-common/src/test/java/org/apache/pinot/common/utils/config/TableConfigSerDeTest.java
Outdated
Show resolved
Hide resolved
pinot-common/src/test/java/org/apache/pinot/common/utils/config/TableConfigSerDeTest.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/test/java/org/apache/pinot/spi/config/UpsertConfigTest.java
Outdated
Show resolved
Hide resolved
hi @mcvsubbu we have updated the pr following the comments. Appreciate if you can take a look to see if anything else you feel we need to update further |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, other than that, lgtm thanks
pinot-spi/src/main/java/org/apache/pinot/spi/config/UpsertConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/UpsertConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/UpsertConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/UpsertConfig.java
Outdated
Show resolved
Hide resolved
…ig.java Co-Authored-By: Subbu Subramaniam <mcvsubbu@users.noreply.github.com>
…ig.java Co-Authored-By: Subbu Subramaniam <mcvsubbu@users.noreply.github.com>
…ig.java Co-Authored-By: Subbu Subramaniam <mcvsubbu@users.noreply.github.com>
…ig.java Co-Authored-By: Subbu Subramaniam <mcvsubbu@users.noreply.github.com>
merged the comments diff, thanks for checking on those details. Do you think any further changes you want? It would be great if you can change your review status from requesting-changes so I can start merging it in |
@jamesyfshao I am good with this PR, thanks. |
This diff will create necessary pinot schema/constant value changes to support pinot upsert features.
The following will be changed:
This field will contains several sub-fields:
a. ingestionMode:
ingestion mode could be either upsert/append.
b. primaryKeys:
a list of string that indicate which columns would be use as primary key of the current upsert table. However we are going to enforce primary key to be one only only one column. In the future we can explore the support for multiple primary keys
c. offsetKey:
a string that indicate which column would be use as offset of the current pinot upsert table.
design doc for pinot upsert feature:
https://cwiki.apache.org/confluence/display/PINOT/%5BProposal%5D+Pinot+upsert+design+doc
this change focus on the changes in pinot common repo