Remove "granularity" from IngestSegmentFirehose. #4110

gianm · 2017-03-24T09:07:36Z

It wasn't doing anything useful (the sequences were being concatenated, and
cursor.getTime() wasn't being called), and it defaulted to Granularities.NONE.
Changing it to Granularities.ALL gave me a 700x+ performance boost on a
small dataset I was reindexing (2m27s to 365ms). Most of that came from avoiding
the creation of a lot of unnecessary column selectors.
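
To illustrate why the granularity setting can matter this much, here is a minimal, self-contained sketch. It is not Druid's code. It assumes, as the description implies, that one cursor (with its own set of column selectors) is created per granularity bucket. The class `CursorCountSketch` and the method `countCursors` are hypothetical names; a 1 ms bucket stands in for Granularities.NONE and a single bucket stands in for Granularities.ALL.

```java
import java.util.HashSet;
import java.util.Set;

public class CursorCountSketch {
    /**
     * Counts the distinct time buckets covered by the rows.
     * bucketMillis <= 0 means "one bucket for everything" (stand-in for ALL).
     * Assumption: each bucket would need its own cursor and column selectors.
     */
    static int countCursors(long[] rowTimestamps, long bucketMillis) {
        Set<Long> buckets = new HashSet<>();
        for (long ts : rowTimestamps) {
            long bucketStart = bucketMillis <= 0
                ? Long.MIN_VALUE
                : Math.floorDiv(ts, bucketMillis) * bucketMillis;
            buckets.add(bucketStart);
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        long[] timestamps = {1000L, 1001L, 1002L, 5000L, 5000L};
        // Millisecond buckets: one cursor per distinct timestamp.
        System.out.println("1 ms buckets:  " + countCursors(timestamps, 1));
        // Single bucket: one cursor for all rows.
        System.out.println("single bucket: " + countCursors(timestamps, 0));
    }
}
```

With millisecond buckets the number of cursors grows with the number of distinct timestamps, while a single bucket always needs exactly one.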


leventov · 2017-03-24T09:12:45Z

server/src/main/java/io/druid/segment/realtime/firehose/IngestSegmentFirehose.java

@@ -77,7 +76,7 @@ public IngestSegmentFirehose(
                            Filters.toFilter(dimFilter),


leventov · 2017-03-24

Maybe this could be further simplified by not calling concat() a few lines above.

gianm · 2017-03-24

How else would this be turned into a Sequence<InputRow> rather than Sequence<Sequence<InputRow>>?

leventov · 2017-03-24

OK, it seems there is no way.
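
The flattening step discussed in this thread can be sketched with plain Java collections. This is only an analogy: it uses `java.util.stream` rather than Druid's `Sequence` type, and `FlattenSketch` and `flatten` are hypothetical names. The point is that a per-cursor mapping naturally yields a nested collection, so some concatenation step is needed to obtain a flat stream of rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlattenSketch {
    /** Concatenates the per-cursor row lists into one flat list, preserving order. */
    static <T> List<T> flatten(List<List<T>> perCursorRows) {
        return perCursorRows.stream()
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two hypothetical cursors, each producing its own rows.
        List<List<String>> perCursorRows = Arrays.asList(
            Arrays.asList("row1", "row2"),
            Arrays.asList("row3"));
        System.out.println(flatten(perCursorRows));
    }
}
```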


himanshug · 2017-10-31T17:59:45Z

@gianm It appears that this breaks re-indexing, which expects IngestSegmentFirehose to give individual rows from the segment without any truncation.
How about making ALL the default granularity but still letting the caller change the granularity, so that the re-indexing path can stay the same?
I wonder if this impacts re-indexing done via the local index task too.

gianm · 2017-10-31T18:06:41Z

@himanshug What specifically has broken? IIRC the rows still do have their original, unchanged timestamps -- the only difference is that the cursor timestamps are truncated. But reindexing shouldn't be using the cursor timestamps anyway.

himanshug · 2017-10-31T18:17:39Z

OK, I assumed CursorFactory.makeCursor(..) could truncate timestamps based on the provided granularity. It's fine if it returns rows without truncation. Thanks.

gianm · 2017-10-31T18:29:55Z

IIRC what happens is the cursor.getTime() is truncated, but the timestamp on the actual rows (row.getTimestampFromEpoch()) is not truncated, since it comes from the time column, not from the cursor. So I think it should be ok. If you notice anything different please raise it…
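
The distinction drawn here can be shown with a small sketch. It is not Druid's implementation; `CursorTimeSketch`, `bucketStart`, and `readRow` are hypothetical names, and an hourly bucket is chosen arbitrarily. It assumes only what the comment states: the cursor's time is the truncated bucket start, while the row's timestamp is read from the time column and left untouched.

```java
public class CursorTimeSketch {
    static final long HOUR_MILLIS = 3_600_000L;

    /** Truncates a timestamp to the start of its hour (stand-in for the cursor's time). */
    static long bucketStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, HOUR_MILLIS) * HOUR_MILLIS;
    }

    /** Returns {cursorTime, rowTimestamp} for one value of the time column. */
    static long[] readRow(long timeColumnValue) {
        long cursorTime = bucketStart(timeColumnValue); // truncated
        long rowTimestamp = timeColumnValue;            // taken from the time column as-is
        return new long[] {cursorTime, rowTimestamp};
    }

    public static void main(String[] args) {
        long[] row = readRow(3_600_123L);
        System.out.println("cursor time:   " + row[0]);
        System.out.println("row timestamp: " + row[1]);
    }
}
```

Because a reindexing consumer in this sketch would only ever read the second value, changing the bucket size changes nothing about the rows it sees.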

leventov · 2018-03-02T19:42:54Z

indexing-hadoop/src/main/java/io/druid/indexer/hadoop/DatasourceIngestionSpec.java

@@ -171,7 +158,6 @@ public DatasourceIngestionSpec withQueryGranularity(Granularity granularity)
        intervals,


Now this method is effectively just "clone"; the method argument is unused.

gianm added the Performance label Mar 24, 2017

gianm added this to the 0.10.1 milestone Mar 24, 2017

leventov reviewed Mar 24, 2017


leventov approved these changes Mar 24, 2017


fjy merged commit b4289c0 into apache:master Mar 24, 2017

jon-wei mentioned this pull request Jun 13, 2017

Druid 0.10.1 release notes #4384

Closed

gianm deleted the isff branch October 31, 2017 18:04

leventov reviewed Mar 2, 2018
