fix issue with projection __time equivalent column mapping#18215
Conversation
| .setDataSource("test") | ||
| .setInterval(Intervals.ETERNITY) | ||
| .addAggregator(new LongSumAggregatorFactory("c_sum", "c")); | ||
| if (sortByDim) { |
There was a problem hiding this comment.
i thought this meant query has a sortBy at first, but i guess here's just we want to test out time as granularity & virtual column? might worth some comments.
There was a problem hiding this comment.
also, some projections are named as daily, e.x. abfoo_daily but it's not grouping by DAY right?
There was a problem hiding this comment.
i thought this meant query has a sortBy at first, but i guess here's just we want to test out time as granularity & virtual column? might worth some comments.
yea, sortByDim is whether or not the segment we are querying is stored sorted first by __time or not, its one of the test parameters which controls if we set the forceSegmentSortByTime parameter on DImensionsSpec added in #16849. When this is set, using a granularity on a query doesn't work because the assumptions the engines make about the segments when using granularity (which acts like an implicit group by time_floor(__time, granularity)) are that the rows of the cursor are ordered by __time and so it can just scan and collect the time bucket group of the granularity as it goes instead of grouping on it properly as if it were a dimension. So for group by, we have to explicitly make a time_floor and group on it instead of using the query granularity. Timeseries queries don't work either if the segment isn't first sorted by __time, so some of the tests also just skip if the segment isn't time sorted. I can rename this parameter to segmentSortedByTime or something and it might be clearer?
also, some projections are named as daily, e.x. abfoo_daily but it's not grouping by DAY right?
ah I guess I could rename them to leave the granularity off of the implicit ones, some of those are projections without a __time equivalent column, but they still live within the constraints of the segment granularity, so its an implicit time_floor of day in this case since that is when the segment interval starts.
…me-equivalent-column-remapping
Description
Fixes some issues with project
__timecolumn remapping. The first issue is that a projection with finer granularity would incorrectly replace a coarser granularitytime_floorfunction with the projection__timecolumn, instead of retaining thetime_flooroperation to further rollup the projection at query time, producing incorrect results. The added test covers this. Fixing this stumbled into another issue with remapping the__timecolumn when it was the same as the projection granularity, which would incorrectly map the matching query virtual column to the internal projection time column name instead of to__timeitself. idk what i was thinking with the previous logic 🤷