Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24999][SQL]Reduce unnecessary 'new' memory operations #21968

Closed
wants to merge 1 commit into from

Conversation

heary-cao
Copy link
Contributor

What changes were proposed in this pull request?

This PR is to solve the CodeGen code generated by fast hash, and there is no need to apply for a block of memory for every new entry, because unsafeRow's memory can be reused.

How was this patch tested?

the existed test cases.

val loc = new map.Location // this could be allocated in stack
binaryMap.safeLookup(unsafeKey.getBaseObject, unsafeKey.getBaseOffset,
unsafeKey.getSizeInBytes, loc, unsafeKey.hashCode())
val loc = map.lookup(unsafeKey.getBaseObject, unsafeKey.getBaseOffset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this change makes this part thread-unsafe. Is it OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is safe to lookup, and It is different from the get(key: InternalRow).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, loc is allocated at each call of getValue(). After this PR, loc will be shared within each binaryMap that is passed to a constructor of UnsafeHashedRelation.
Is this behavior change safe?

@@ -44,6 +44,12 @@ class RowBasedHashMapGenerator(
groupingKeySchema, bufferSchema) {

override protected def initializeAggregateHashMap(): String = {
val numVarLenFields = groupingKeys.map(_.dataType).count {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can't this just be .count(!UnsafeRow.isFixedLength(_))?

@heary-cao heary-cao force-pushed the updateNewMemory branch 2 times, most recently from e223446 to e0748e1 Compare August 27, 2018 11:18
@@ -48,6 +48,8 @@ class RowBasedHashMapGenerator(
val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)

val numVarLenFields = groupingKeys.map(_.dataType).count(!UnsafeRow.isFixedLength(_))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not remove the TODO comment below.

@@ -141,9 +141,6 @@ class RowBasedHashMapGenerator(
| if (buckets[idx] == -1) {
| if (numRows < capacity && !isBatchFull) {
| // creating the unsafe for new entry
| org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter
| = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(
| ${groupingKeySchema.length}, ${numVarLenFields * 32});
| agg_rowWriter.reset(); //TODO: investigate if reset or zeroout are actually needed
| agg_rowWriter.zeroOutNullBytes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, if groupingKeySchema has no nullable field, can we drop agg_rowWriter.zeroOutNullBytes()?

@maropu
Copy link
Member

maropu commented Aug 27, 2018

The change looks reasonable to me, so can you trigger tests? @gatorsmile @cloud-fan @hvanhovell

@SparkQA
Copy link

SparkQA commented Aug 27, 2018

Test build #4294 has finished for PR 21968 at commit e0748e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Copy link
Contributor Author

cc @maropu ,@kiszk, @cloud-fan

@cloud-fan
Copy link
Contributor

can we do the same thing for the columnar one?

@heary-cao
Copy link
Contributor Author

heary-cao commented Aug 28, 2018

cc @cloud-fan I'm sorry, I look VectorizedHashMapGenerator is fine. and i don't know whether you refer to other. thanks.

@heary-cao
Copy link
Contributor Author

cc @cloud-fan @maropu

@@ -141,11 +151,8 @@ class RowBasedHashMapGenerator(
| if (buckets[idx] == -1) {
| if (numRows < capacity && !isBatchFull) {
| // creating the unsafe for new entry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or update this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, updated. thanks.

@@ -141,11 +151,8 @@ class RowBasedHashMapGenerator(
| if (buckets[idx] == -1) {
| if (numRows < capacity && !isBatchFull) {
| // creating the unsafe for new entry
| org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter
| = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(
| ${groupingKeySchema.length}, ${numVarLenFields * 32});
| agg_rowWriter.reset(); //TODO: investigate if reset or zeroout are actually needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now reset and zero out is needed? So maybe remove this TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,

@@ -48,6 +48,12 @@ class RowBasedHashMapGenerator(
val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)

val numVarLenFields = groupingKeys.map(_.dataType).count {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks.

@@ -130,6 +134,12 @@ class RowBasedHashMapGenerator(
}
}.mkString(";\n")

val nullByteWriter = if (groupingKeySchema.map(_.nullable).forall(_ == false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe name it resetNullBits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks.

@@ -48,6 +48,8 @@ class RowBasedHashMapGenerator(
val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)

val numVarLenFields = groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: .count(!UnsafeRow.isFixedLength(_))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz keep the comment // TODO: consider large decimal and interval type below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan We want to discuss, how to modify?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code style doesn't matter, both are fine. but let's keep the comment.

@heary-cao
Copy link
Contributor Author

cc @cloud-fan @hvanhovell @maropu

@kiszk
Copy link
Member

kiszk commented Sep 7, 2018

LGTM cc @cloud-fan @hvanhovell @maropu

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in e7853dc Sep 10, 2018
@heary-cao
Copy link
Contributor Author

@cloud-fan thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants