
Allow writing InputRowParser extensions that use hadoop/any libraries #1695

Merged

Conversation

himanshug (Contributor)

This PR is the result of the discussion in #1682. This patch

  1. allows users to write a custom InputRowParser for specific cases during Hadoop ingestion, which can depend on Hadoop or any other libraries
  2. reverts the special handling introduced for BytesWritable in "Support parsing of BytesWritable strings in HadoopDruidIndexerMapper" (#1682), handling it instead via a HadoopyStringInputRowParser (a sketch of such a parser follows below)
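
For context, a minimal sketch of what such a Hadoop-aware parser might look like, assuming Druid's InputRow, InputRowParser, and StringInputRowParser types; the signatures are illustrative, not necessarily the exact code merged here. It wraps a plain string parser and unwraps Hadoop Writable values before delegating:

    import java.nio.ByteBuffer;

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;

    // Sketch only: unwraps Hadoop Writables and delegates to a plain
    // StringInputRowParser, so non-hadoop code paths stay untouched.
    public class HadoopyStringInputRowParser implements InputRowParser<Object>
    {
      private final StringInputRowParser parser;

      public HadoopyStringInputRowParser(StringInputRowParser parser)
      {
        this.parser = parser;
      }

      @Override
      public InputRow parse(Object input)
      {
        if (input instanceof Text) {
          return parser.parse(((Text) input).toString());
        } else if (input instanceof BytesWritable) {
          BytesWritable bytes = (BytesWritable) input;
          // getBytes() may return a padded backing array, so pass the valid length
          return parser.parse(ByteBuffer.wrap(bytes.getBytes(), 0, bytes.getLength()));
        } else {
          throw new IllegalArgumentException("can't parse type [" + input.getClass().getName() + "]");
        }
      }
    }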

@himanshug (Contributor, Author)

Most of the changes are updates resulting from constructor argument changes in DataSchema. The changes to note are in DataSchema.java, HadoopDruidIndexerConfig.java, IndexingHadoopModule.java, and HadoopyStringInputRowParser.java.

@himanshug (Contributor, Author)

@gianm pls see this when you can; it is the continuation of the discussion in #1682.

I see there are some merge conflicts now; I will resolve them once the review is done or has made progress.

[Review thread on this diff hunk:]

    null,
    ImmutableList.of("timestamp", "host", "host2", "visited_num")
    )
    MAPPER.convertValue(
Contributor:

Does this work recursively? Or would you get a Map where the values are regular objects rather than other Maps?

Contributor (Author):

It does convert recursively; everything becomes a Map. Also, it wouldn't matter for this test as long as, for an InputRowParser parser,

mapper.convertValue(mapper.convertValue(parser, Map.class), InputRowParser.class)

produces the right thing. The specific changes for DataSchema serde are tested in DataSchemaTest.
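
To make the recursion point concrete, here is a small self-contained Jackson example (the beans are hypothetical, purely for illustration): convertValue turns nested beans into nested Maps, and converting back reproduces an equivalent object.

    import java.util.Map;

    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ConvertValueRoundTrip
    {
      public static void main(String[] args)
      {
        ObjectMapper mapper = new ObjectMapper();
        Outer original = new Outer("2015-09-08", new Inner("host1", 3));

        // convertValue recurses: the nested bean becomes a nested Map
        Map<?, ?> asMap = mapper.convertValue(original, Map.class);
        System.out.println(asMap.get("inner") instanceof Map); // true

        // ...and the round trip yields an equivalent bean
        Outer back = mapper.convertValue(asMap, Outer.class);
        System.out.println(back.inner.host); // host1
      }

      // hypothetical beans, stand-ins for a real InputRowParser
      public static class Outer
      {
        public String timestamp;
        public Inner inner;

        public Outer() {}

        public Outer(String timestamp, Inner inner)
        {
          this.timestamp = timestamp;
          this.inner = inner;
        }
      }

      public static class Inner
      {
        public String host;
        public int visitedNum;

        public Inner() {}

        public Inner(String host, int visitedNum)
        {
          this.host = host;
          this.visitedNum = visitedNum;
        }
      }
    }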

@gianm (Contributor) commented Sep 8, 2015

@himanshug had a couple of minor comments, but overall this looks good to me. I don't see a better way of having hadoopy parsers while still respecting the classpath situation…

@gianm (Contributor) commented Sep 8, 2015

Discussed this on the dev sync; @cheddar thought that in the future it would make sense for the hadoop input stuff to be loadable from extensions, similar to how firehoses can be loaded from extensions.

👍 on this PR for now though, since that's a longer-term approach.

@himanshug himanshug force-pushed the allow_hadoop_based_input_row_parser branch from 70d4478 to 7b305c2 Compare September 9, 2015 16:05
@himanshug (Contributor, Author)

@gianm addressed review comments and rebased against latest master.

[Review thread on this diff hunk:]

    private static final ObjectMapper jsonMapper;
    static {
      jsonMapper = new DefaultObjectMapper();
      jsonMapper.setInjectableValues(new InjectableValues.Std().addValue(ObjectMapper.class, jsonMapper));
Contributor:

Injected ObjectMappers should be tagged with @Json or @Smile. Is there somewhere that is not doing this?

Contributor (Author):

not needed
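
For readers following the thread, this is the Jackson-level mechanism in play: registering the mapper under ObjectMapper.class via InjectableValues.Std lets any class deserialized by that mapper pull it in with @JacksonInject, independently of Guice @Json/@Smile bindings. A sketch with a hypothetical class:

    import com.fasterxml.jackson.annotation.JacksonInject;
    import com.fasterxml.jackson.annotation.JsonCreator;
    import com.fasterxml.jackson.annotation.JsonProperty;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Hypothetical class for illustration: Jackson supplies the value
    // registered for ObjectMapper.class at deserialization time.
    public class NeedsMapper
    {
      private final ObjectMapper mapper;
      private final String name;

      @JsonCreator
      public NeedsMapper(
          @JacksonInject ObjectMapper mapper,
          @JsonProperty("name") String name
      )
      {
        this.mapper = mapper;
        this.name = name;
      }
    }

A jsonMapper.convertValue(map, NeedsMapper.class) call against the mapper configured above would then hand NeedsMapper that same mapper instance.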

@drcrallen (Contributor)

@gianm / @himanshug the problem of lazily converting Map<String, Object> --> ClassOfInterest is not unique to this PR. It is also present in LoadSpec as well as QTL (QTL is not fixed yet; routers require it in order to serde the query).

[Review thread on this diff hunk:]

    public HadoopIngestionSpecUpdateDatasourcePathSpecSegmentsTest()
    {
      jsonMapper = new DefaultObjectMapper();
      jsonMapper.setInjectableValues(
Contributor:

same @Json comment

Contributor (Author):

not needed

@xvrl (Member) commented Sep 9, 2015

@himanshug does this change introduce any performance overhead?

@himanshug (Contributor, Author)

@xvrl none observable. That said, in theory the InputRowParser is created each time getParser() is called (see #1695 (comment)); however, that never happens in a tight loop, so it does not matter.
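
A minimal sketch of the behavior described, with illustrative names (the merged code may differ in detail): the schema holds the parser spec as a raw Map and only converts it to an InputRowParser inside getParser(), so Hadoop-dependent parser classes stay off the classpath until a parser is actually needed.

    import java.util.Map;

    import com.fasterxml.jackson.databind.ObjectMapper;

    // Sketch: Druid's InputRowParser type is assumed. Deserializing the
    // schema leaves the parser as a plain Map, so e.g. the overlord can
    // accept the spec without the parser's classes being loadable.
    public class DataSchemaSketch
    {
      private final ObjectMapper jsonMapper;
      private final Map<String, Object> parserMap;

      public DataSchemaSketch(ObjectMapper jsonMapper, Map<String, Object> parserMap)
      {
        this.jsonMapper = jsonMapper;
        this.parserMap = parserMap;
      }

      public InputRowParser getParser()
      {
        // converted on every call, as noted above; getParser() is never
        // invoked in a tight loop, so the cost is not observable
        return jsonMapper.convertValue(parserMap, InputRowParser.class);
      }
    }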

@himanshug (Contributor, Author)

@drcrallen this PR is pending response on
#1695 (comment)
#1695 (comment)
#1695 (comment)

pls see whenever you have time. thanks.

@drcrallen (Contributor)

Thanks @himanshug, responded.

[Commit message:]
so that user hadoop-related InputRowParsers are created only when needed;
this allows the overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because the hadoopy InputRowParser might need hadoop libraries
@himanshug himanshug force-pushed the allow_hadoop_based_input_row_parser branch from 7b305c2 to e8b9ee8 Compare September 16, 2015 15:58
@himanshug (Contributor, Author)

@drcrallen all review comments have been addressed.

@drcrallen (Contributor)

👍

drcrallen added a commit that referenced this pull request Sep 16, 2015:

    …parser
    Allow writing InputRowParser extensions that use hadoop/any libraries
@drcrallen drcrallen merged commit 42bd4f6 into apache:master Sep 16, 2015
@himanshug himanshug deleted the allow_hadoop_based_input_row_parser branch September 22, 2015 13:56