Skip to content

[SPARK-15795] [SQL] Enable more optimizations in whole stage codegen when isNull is a compile-time constant#13539

Closed
inouehrs wants to merge 4 commits intoapache:masterfrom
inouehrs:dev_nullcheck_opt
Closed

[SPARK-15795] [SQL] Enable more optimizations in whole stage codegen when isNull is a compile-time constant#13539
inouehrs wants to merge 4 commits intoapache:masterfrom
inouehrs:dev_nullcheck_opt

Conversation

@inouehrs
Copy link
Contributor

@inouehrs inouehrs commented Jun 7, 2016

What changes were proposed in this pull request?

Whole stage codegen often creates isNull variable initialized with constant false, like
boolean mapelements_isNull = false || false;

If there is no further assignment for this isNull variable, whole stage codegen can do more optimizations by assuming isNull as a compile-time constant.

In the example below, which is generated for a dataset map operation, mapelements_isNull defined at line 115 can be assumed by a compile-time constant (false).
By assuming this as a constant, the whole stage codegen eliminates zeroOutNullBytes at line 119 and an if-statement at line 121.
In addition to the benefits of improved readability of generated code, eliminating zeroOutNullBytes will give performance advantage since it is difficult to remove for Java JIT compiler.

without this patch

/* 107 */       // CONSUME: Project [id#0L AS l#3L]
/* 108 */       // CONSUME: DeserializeToObject l#3: bigint, obj#16: bigint
/* 109 */       // CONSUME: MapElements <function1>, obj#17: bigint
/* 110 */       // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#18L]
/* 111 */       // <function1>.apply
/* 112 */       Object mapelements_obj = ((Expression) references[1]).eval(null);
/* 113 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 114 */
/* 115 */       boolean mapelements_isNull = false || false;
/* 116 */       final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
/* 117 */
/* 118 */       // CONSUME: WholeStageCodegen
/* 119 */       serializefromobject_rowWriter.zeroOutNullBytes();
/* 120 */
/* 121 */       if (mapelements_isNull) {
/* 122 */         serializefromobject_rowWriter.setNullAt(0);
/* 123 */       } else {
/* 124 */         serializefromobject_rowWriter.write(0, mapelements_value);
/* 125 */       }
/* 126 */       append(serializefromobject_result);

with this patch

/* 107 */       // CONSUME: Project [id#0L AS l#3L]
/* 108 */       // CONSUME: DeserializeToObject l#3: bigint, obj#9: bigint
/* 109 */       // CONSUME: MapElements <function1>, obj#10: bigint
/* 110 */       // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#11L]
/* 111 */       // <function1>.apply
/* 112 */       Object mapelements_obj = ((Expression) references[1]).eval(null);
/* 113 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 114 */
/* 115 */       final boolean mapelements_isNull = false || false;
/* 116 */       final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
/* 117 */
/* 118 */       // CONSUME: WholeStageCodegen
/* 119 */       serializefromobject_rowWriter.write(0, mapelements_value);
/* 120 */       append(serializefromobject_result);

How was this patch tested?

by unit tests

@kiszk
Copy link
Member

kiszk commented Jun 7, 2016

test this please

@inouehrs
Copy link
Contributor Author

@rxin could you please review this pull request (or please suggest someone I should ask for the review)?

@cloud-fan
Copy link
Contributor

ok to test

@cloud-fan
Copy link
Contributor

This PR seems kind of hacky to me. The optimization in GenerateUnsafeProject which depends on the foldability of generated code is also hacky. This is something our string based codegen framework doesn't support essentially.

I think we should think about it more, and come up with a holistic solution, i.e. find out all the patterns that represent a foldable value.

@cloud-fan
Copy link
Contributor

cc @davies

@SparkQA
Copy link

SparkQA commented Jun 13, 2016

Test build #60412 has finished for PR 13539 at commit 186283e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Jun 13, 2016

As @cloud-fan said, the implementation is hacky, the improvements is not not obvious (I believe JIT compile do these very well, correct me if I'm wrong), I'd like not do this. There are millions ways to improve the readability of the generated code, but since it's not designed to read by human, so we should not trade in readability of the Scala code for generated code.

@inouehrs
Copy link
Contributor Author

@cloud-fan @davies Thank you so much for the comments.
I agree that my implementation is hacky. I tried to do optimization without adding members in ExprCode or Expression. I look for a less-hacky way.

(BTW, my original motivation is to eliminate zeroOutNullBytes from the inner-most loop for better optimization in Java JIT compiler. It seems that zeroOutNullBytes affected the performance by few percents for simple map operation.)

@gatorsmile
Copy link
Member

@inouehrs Since the solution is kind of hacky, could you close this PR, or you have some other better solution? Also cc @kiszk

@inouehrs inouehrs closed this Jun 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants