You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We cannot remove this enum because we need to keep backward compatibility, but we should decide what to do with the four literals that are not documented and are not being used (or used in an inconsistent way).
Description
FieldConfig.indexTypes was intended to be the new way to create all indexes, but it was superseded by FieldConfig.indexes attribute introduced in #10183. The main problems with FieldConfig.indexTypes were that:
Most indexes can be configured in some way. To do so, FieldConfig.indexTypes required to introduce unsafe string-based configs in FieldConfig.properties.
IndexTypes is an enum that doesn't contain all valid indexes and includes some extra values that are not valid.
This issue is related to the last point. Reading the documentation, the valid values here should be: Text, FST, Timestamp and h3. But the enum right now contains the following literals:
TEXT
FST
JSON
H3
INVERTED
SORTED
RANGE
TIMESTAMP
The last 4 are not included in the documentation and they are not being consistently used in the code.
Inconsistent usage
To do this analysis I used tag release-0.12.1. The reason to do not use master is that #10183 (already merged) modifies a lot of code related to indexes, so I want to be sure that the inconsistent usage of these literals was already there before index-spi was merged:
Inverted
It was introduced in the first version of this document (2020-02-07).
It is not read in production code. The only two usages are in MutableSegmentImpl (here, and here). In both cases it is being used to report some error.
It is being used in test, but its usage doesn't seem actually important.
Sorted
It was introduced in the first version of this document (2020-02-07).
It is only being used in HadoopSegmentPreprocessingJob. This class does not exist in master and it is not being used in master. The fact that it is not an actual index type and it was only used here makes me think that it was introduced as an attempt to do something, but it wasn't finished.
The only usage in production code is a check. This check fails if the FieldConfig points to a column that is not a timestamp but indexTypes contains the TIMESTAMP literal.
Timestamp indexes are not actual indexes. There is no timestamp index writer, reader or handler. They are syntactic sugar (or an alias). When pinot detects that this syntactic sugar has to be applied (more on that later) what actually happens is that Pinot generates one extra row for each timestamp granularity and index that column with a range index.
Contrary to what documentation says, FieldConfig.indexTypes doesn't have to contain the TIMESTAMP literal in order to activate this syntactic sugar. It is activated if and only if the FieldConfig.timestampConfig is not null. Whether or not TIMESTAMP is included in FieldConfig.indexTypes has zero impact on that decision.
The timestamp syntactic sugar is just a modification of the table config and the schema and it is applied in:
Objective
The objective of this issue is to discuss about FieldConfig.IndexType, which:
We cannot remove this enum because we need to keep backward compatibility, but we should decide what to do with the four literals that are not documented and are not being used (or used in an inconsistent way).
Description
FieldConfig.indexTypeswas intended to be the new way to create all indexes, but it was superseded byFieldConfig.indexesattribute introduced in #10183. The main problems withFieldConfig.indexTypeswere that:FieldConfig.indexTypesrequired to introduce unsafe string-based configs inFieldConfig.properties.This issue is related to the last point. Reading the documentation, the valid values here should be: Text, FST, Timestamp and h3. But the enum right now contains the following literals:
The last 4 are not included in the documentation and they are not being consistently used in the code.
Inconsistent usage
To do this analysis I used tag
release-0.12.1. The reason to do not use master is that #10183 (already merged) modifies a lot of code related to indexes, so I want to be sure that the inconsistent usage of these literals was already there beforeindex-spiwas merged:Inverted
Sorted
Range
indexTypes.Timestamp
indexTypescontains the TIMESTAMP literal.FieldConfig.indexTypesdoesn't have to contain the TIMESTAMP literal in order to activate this syntactic sugar. It is activated if and only if theFieldConfig.timestampConfigis not null. Whether or not TIMESTAMP is included inFieldConfig.indexTypeshas zero impact on that decision.