Pinot transform functions are very useful for extracting pluggable information in ad-hoc use cases. For example, users can use them to extract timestamp-related information from the Kafka metadata.
Meanwhile, there are use cases where users want to extract information from virtual columns. For example, a column transform such as
{ "columnName": "pinotPartitionNumber", "transformFunction": "splitPart($segmentName, '__', 1)" }
would extract the partition information from the message. (Although the same function can be used at query time, it becomes very slow as the table grows.)
However, this is not currently possible because of a limitation in how transform functions fetch data: they can only read from the current row's data (a Map<String, Object>) and cannot access segment-level virtual column data. It would be great to remove this limitation and make transform functions more flexible.
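
To make the request concrete, here is a minimal, language-neutral sketch (written in Python, not Pinot's actual Java code) of the lookup behaviour being asked for. Everything in it is an assumption for illustration: `split_part`, `resolve_column`, and `segment_context` are hypothetical names, and the sample segment name assumes the `table__partition__sequence__creationTime` layout. The idea is that a column reference is resolved from the row first and, if absent, from segment-level virtual columns.

```python
def split_part(value, delimiter, index):
    """Hypothetical stand-in for splitPart: return the index-th piece, or None if out of range."""
    parts = value.split(delimiter)
    return parts[index] if 0 <= index < len(parts) else None


def resolve_column(name, row, segment_context):
    """Look the column up in the row first, then fall back to segment-level virtual columns."""
    if name in row:
        return row[name]
    return segment_context.get(name)


# The row map only carries ingested fields; $segmentName lives at the segment level.
row = {"userId": "u-42"}
segment_context = {"$segmentName": "myTable__3__100__20230101T0000Z"}  # assumed sample name

# Today's behaviour (row-only lookup): the virtual column cannot be found.
row_only = resolve_column("$segmentName", row, {})

# Requested behaviour: the segment-level value is visible to the transform.
with_context = resolve_column("$segmentName", row, segment_context)
partition = split_part(with_context, "__", 1)
```

With the fallback in place, `partition` holds the partition component of the segment name, while the row-only lookup yields nothing. Whether the real change should use a fallback like this or inject virtual columns into the row map is a design question for the implementation.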