optimize InputRowSerde #2047
Conversation
int metricSize = WritableUtils.readVInt(in);
for (int i = 0; i < metricSize; i++) {
  String metric = readString(in);
  String type = readString(in);
The type can be encoded as a number instead of a string; you can probably use ValueType.XXX.ordinal() as the encoding.
io.druid.segment.column.ValueType is an enum that defines the types.
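A minimal sketch of what the suggestion would look like, assuming a mirror of the `ValueType` enum (the exact constants here are illustrative, not copied from Druid): the type round-trips as a single ordinal byte instead of a length-prefixed UTF-8 string.

```java
import java.io.*;

public class Main {
    // Stand-in for io.druid.segment.column.ValueType; constants assumed for illustration.
    enum ValueType { FLOAT, LONG, COMPLEX }

    // Write the type as one ordinal byte instead of a UTF-8 string.
    static void writeType(DataOutput out, ValueType type) throws IOException {
        out.writeByte(type.ordinal());
    }

    static ValueType readType(DataInput in) throws IOException {
        return ValueType.values()[in.readUnsignedByte()];
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeType(new DataOutputStream(bytes), ValueType.LONG);
        DataInput in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readType(in));  // round-trips to LONG
        System.out.println(bytes.size());  // 1 byte on the wire
    }
}
```

One caveat with ordinal encoding: reordering or inserting enum constants changes the wire format, so the enum order becomes part of the serialization contract.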
I just changed it to not write the type at all.
looks good overall, faster is better
@binlijin for the sake of knowledge, why is there a performance gain?
@b-slim The performance gain comes from: (1) far fewer `new Text(String)` calls, each of which re-encodes the String to UTF-8; (2) writing each dimension name only once.
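A hypothetical sketch of point (1), not the actual patch: encode each string to UTF-8 once and cache the bytes, instead of constructing a new `Text(String)` (which re-encodes on every write).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class Main {
    // Hypothetical cache: each distinct string is UTF-8 encoded only once.
    static final Map<String, byte[]> utf8Cache = new HashMap<>();

    static void writeString(DataOutput out, String s) throws IOException {
        byte[] utf8 = utf8Cache.computeIfAbsent(s, k -> k.getBytes(StandardCharsets.UTF_8));
        out.writeInt(utf8.length);  // the real code uses a vint; a plain int keeps this short
        out.write(utf8);
    }

    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        for (int row = 0; row < 1000; row++) {
            writeString(out, "dim1");  // "dim1" is encoded on the first call only
        }
        System.out.println(utf8Cache.size());  // one encoding for 1000 writes
    }
}
```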
@binlijin @himanshug does this mean that the InputRowSerde introduced in #1472 caused a performance degradation? How does this new ser/de compare to just passing Text between mapper and reducer (ignoring the combiner for a moment)?
@xvrl this is used in batch ingestion only. Batch ingestion runtime is mostly dominated by the actual indexing and the trips of data to/from HDFS, so serde time plays a small role here. We did not notice any significant difference in our batch ingestion times after that patch. Also, this was really added so that we could support batch delta ingestion (and other arbitrary InputFormats that produce non-Text data); the combiner optimization was an added benefit.
if (aggs[i].getName().equals(metric)) {
  return aggs[i].getTypeName();
}
for (AggregatorFactory agg : aggs) {
please add a log.warn(..) here so that we know if, in a real use case, this falls back to the for loop.
Also, it would be nice to have a UT verifying that, if the ordering of AggregatorFactory[] stays the same between serialization and deserialization, the for loop is not run.
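A minimal sketch of the suggested shape, with parallel name/type arrays standing in for `AggregatorFactory[]` (names and signatures here are hypothetical): try the positional lookup first, and warn when the order has changed and a scan is needed.

```java
public class Main {
    // Fast path: positional lookup assuming serialization order was preserved.
    // Slow path: warn (stand-in for log.warn) and scan the whole array.
    static String getTypeName(String metric, int expectedIndex, String[] names, String[] types) {
        if (names[expectedIndex].equals(metric)) {
            return types[expectedIndex];  // ordering preserved, no scan
        }
        System.err.println("WARN: aggregator order changed for [" + metric + "], scanning");
        for (int i = 0; i < names.length; i++) {  // fallback for loop
            if (names[i].equals(metric)) {
                return types[i];
            }
        }
        throw new IllegalArgumentException("no aggregator named " + metric);
    }

    public static void main(String[] args) {
        String[] names = {"count", "sum_visits"};
        String[] types = {"longSum", "doubleSum"};
        System.out.println(getTypeName("sum_visits", 1, names, types)); // fast path
        System.out.println(getTypeName("count", 1, names, types));      // warns, then scans
    }
}
```

A UT along these lines could assert on the warning (or a scan counter) to prove the fast path is taken when ordering is stable.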
sigh, I don't know how to add a UT to verify it...
👍
I tested it with some data (1 million records, 100 dimensions, 20 metrics):

Before the patch:
serTime 68122 ms (InputRowSerde.toBytes)
deTime 76761 ms (InputRowSerde.fromBytes)
serde size 2721126345 bytes

With the patch:
serTime 34231 ms (InputRowSerde.toBytes)
deTime 27868 ms (InputRowSerde.fromBytes)
serde size 2048581349 bytes
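Working out the ratios from the figures above: roughly a 2x serialization speedup, a 2.75x deserialization speedup, and about a 25% smaller serialized size.

```java
public class Main {
    public static void main(String[] args) {
        // Figures copied from the benchmark above (ms and bytes).
        double serBefore = 68122, serAfter = 34231;
        double deBefore = 76761, deAfter = 27868;
        double sizeBefore = 2721126345.0, sizeAfter = 2048581349.0;

        System.out.printf("ser speedup:    %.2fx%n", serBefore / serAfter);                       // 1.99x
        System.out.printf("de speedup:     %.2fx%n", deBefore / deAfter);                         // 2.75x
        System.out.printf("size reduction: %.1f%%%n", 100.0 * (sizeBefore - sizeAfter) / sizeBefore); // 24.7%
    }
}
```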