Transform functions in Pinot schema #5135
Detailed design proposal here: https://docs.google.com/document/d/13BywJncHrLAFLm-qy4kfKaPxXfAg9XE5v3_fk9sGVSo/edit?usp=sharing
This was referenced Apr 15, 2020
Next steps:
Closing this, as there's an issue for every follow-up.
Consider:
X: Data at the source. This can be either a stream or data files. The formats are typically JSON, Avro, CSV, etc.
Y: Data in Pinot. This is the record/document as stored in Pinot.
When data is ingested into Pinot (through either realtime or batch ingestion), every column in Y must map directly to a column in X. The only exception is the time column, where we allow conversion from one time format to another, but this is limited to a single column. In other words, every column in the destination schema, other than the time column, must be present in the source schema exactly as it is.
This is not always practical. It is often desirable to apply some transformations to the source columns before they reach the destination.
For example, consider this sample ads data schema:
Source columns - userID, name.firstName, name.lastName, IP, eventType, cost, timestamp
Destination columns - userId, fullName, country, zipcode, impressions, clicks, cost, hoursSinceEpoch, daysSinceEpoch
userId - Map userID to userId
fullName - Concat name.firstName and name.lastName
country - Extract country from IP
zipcode - Extract zipcode from IP
impressions - 1 if eventType=IMPRESSION, 0 otherwise
clicks - 1 if eventType=CLICK, 0 otherwise
cost - Directly maps from cost, no transformations
hoursSinceEpoch - Convert timestamp to epoch hours
daysSinceEpoch - Convert timestamp to epoch days
Today, the only way to achieve this in Pinot is for the user to write a custom transformation job that prepares the data to match the destination schema.
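To make the column mappings listed above concrete, here is a minimal Python sketch of the per-record logic such a job would need. It is illustrative only and is not Pinot code. It assumes that `timestamp` is in epoch milliseconds, that `name` arrives as a nested object, and that fullName joins the two names with a space. `lookup_geo` is a hypothetical stand-in for a real IP geolocation lookup.

```python
MILLIS_PER_HOUR = 60 * 60 * 1000
MILLIS_PER_DAY = 24 * MILLIS_PER_HOUR


def lookup_geo(ip):
    """Hypothetical IP-to-location lookup; a real job would query a GeoIP database."""
    sample_table = {"203.0.113.7": ("US", "94043")}  # made-up sample entry
    return sample_table.get(ip, (None, None))


def transform(source):
    """Map one source record (X) to one destination record (Y)."""
    country, zipcode = lookup_geo(source["IP"])
    ts_millis = source["timestamp"]  # assumption: epoch milliseconds
    return {
        "userId": source["userID"],  # rename only
        # assumption: names are joined with a single space
        "fullName": source["name"]["firstName"] + " " + source["name"]["lastName"],
        "country": country,
        "zipcode": zipcode,
        "impressions": 1 if source["eventType"] == "IMPRESSION" else 0,
        "clicks": 1 if source["eventType"] == "CLICK" else 0,
        "cost": source["cost"],  # passed through unchanged
        "hoursSinceEpoch": ts_millis // MILLIS_PER_HOUR,
        "daysSinceEpoch": ts_millis // MILLIS_PER_DAY,
    }
```

Even in this small example, seven source columns fan out into nine destination columns, and a single source column (`timestamp`) feeds two time columns, which the current one-time-column restriction cannot express.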
Hence, the motivations for this proposal are as follows: