
[SPARK-24763][SS] Remove redundant key data from value in streaming aggregation #21733

Closed
wants to merge 13 commits

Conversation

HeartSaVioR
Contributor

@HeartSaVioR HeartSaVioR commented Jul 9, 2018

What changes were proposed in this pull request?

This patch proposes a new option for stateful aggregation: remove redundant key data from the value.
With the option enabled, the query behaves the same as before but uses less memory for state, with the saving determined by the key/value fields of the state operator.

Please refer to the link below for the detailed performance test results:
https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16536539&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16536539

Since state written with the option enabled is not compatible with state written with it disabled, the option is disabled by default (to ensure backward compatibility), and OffsetSeqMetadata prevents the option from being modified after the query has started.
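
For illustration (not part of the patch), here is a typical windowed streaming aggregation whose state the option shrinks: under the old format, the state value row repeats the grouping columns (window and word) that are already stored in the state key, while the new format keeps them only in the key. `spark` is assumed to be an active SparkSession.

import org.apache.spark.sql.functions._

val wordCounts = spark.readStream
  .format("rate") // illustrative source that provides timestamp/value columns
  .load()
  .withColumn("word", (col("value") % 10).cast("string"))
  .groupBy(window(col("timestamp"), "10 minutes"), col("word"))
  .count()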

How was this patch tested?

Modified unit tests to cover both enabling and disabling the option.
Also ran manual tests to verify that the proposed patch improves state memory usage.

@SparkQA

SparkQA commented Jul 9, 2018

Test build #92734 has finished for PR 21733 at commit 2a9cc49.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 9, 2018

Test build #92735 has finished for PR 21733 at commit 89a30ab.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -53,7 +54,30 @@ class StreamingAggregationSuite extends StateStoreMetricsTest

import testImplicits._

test("simple count, update mode") {
val confAndTestNamePostfixMatrix = List(
Member

Can we do this within beforeAll and afterAll with spark.conf.set / spark.conf.unset?

Contributor Author

withSQLConf looks widely used across the SQL unit tests, and it does additional work (SparkSession.setActiveSession), so I'm not sure it would behave exactly the same. Moreover, we need to run the same tests multiple times with different configurations.

Could you propose your code if you don't mind? Thanks in advance!

Member

Oops, I missed that we need to run the same tests multiple times with different configurations. We could consider something like:

override def beforeEach(): Unit = {
  super.beforeEach()
  // Note that there are many tests here that require record-level filtering set to be true.
  spark.conf.set(SQLConf.PARQUET_RECORD_FILTER_ENABLED.key, "true")
}

override def afterEach(): Unit = {
  try {
    spark.conf.unset(SQLConf.PARQUET_RECORD_FILTER_ENABLED.key)
  } finally {
    super.afterEach()
  }
}
e.g.,

StreamingAggregationSuite {
  override def afterAll(): Unit = {
    // false
  }

  override def beforeAll(): Unit = {
    // false
  }
}

RemoveRedundantStreamingAggregationSuite extends StreamingAggregationSuite {
  override def afterAll(): Unit = {
    // true
  }

  override def beforeAll(): Unit = {
    // true
  }
}

but I believe the current implementation works too in this case.

Contributor Author

OK. I'd like to wait for opinions/suggestions from other reviewers on this. Let me keep it as it is until then.
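
For illustration, a withSQLConf-based variant of the matrix approach discussed above could look like the sketch below; the conf entry name is assumed from the version conf this PR eventually adds, and the test body is elided.

Seq("1", "2").foreach { version =>
  test(s"simple count, update mode (state format version $version)") {
    withSQLConf(SQLConf.STREAMING_AGGREGATION_STATE_FORMAT_VERSION.key -> version) {
      // shared test body goes here
    }
  }
}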

@SparkQA

SparkQA commented Jul 9, 2018

Test build #92738 has finished for PR 21733 at commit bb5f672.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this, please

@SparkQA

SparkQA commented Jul 9, 2018

Test build #92755 has finished for PR 21733 at commit bb5f672.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this, please

@SparkQA

SparkQA commented Jul 10, 2018

Test build #92791 has finished for PR 21733 at commit bb5f672.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -825,6 +825,16 @@ object SQLConf {
.intConf
.createWithDefault(100)

val ADVANCED_REMOVE_REDUNDANT_IN_STATEFUL_AGGREGATION =
Contributor

I get worried when I see things described as "advanced features". What will go wrong if a user who's insufficiently advanced tries to use it?

Contributor Author

@HeartSaVioR HeartSaVioR Jul 10, 2018

This is not compatible with the current stateful aggregation state format (that incompatibility is exactly where the improvement comes from), and there is no undo. So only a new query can turn the option on, and once end users enable it for a query, it must stay enabled unless they clear out the checkpoint as well as the state. (I've added the new option to OffsetSeqMetadata to remember the initial setting, like partition count.)

I'm seeing performance on par or even slightly better on a specific workload (published at the link in the description), but I can't claim to have tried exhaustive workloads. I actually expected a tradeoff between performance and state memory usage, so assuming other workloads follow that tradeoff, end users may need to try this option out in a non-production environment (for example, staging) to make sure that enabling it doesn't break their performance expectations while giving the benefit of reduced state memory.

That's why I made the change available as an option instead of modifying the default behavior. If we apply this to the default behavior, we need to provide state migration.

Contributor

Can this be a query option instead of a SparkConf, then? I worry it will be very hard to reason about the current scenario, where the conf defines how all states are stored - except that some streams started with a different value will silently override it.

Contributor Author

I'm sorry, but I'm not sure I understand your suggestion correctly. I'd guess that defining the configuration in the Spark conf makes it easier to guard against modification after the query starts, via the existing approach of adding the conf to OffsetSeqMetadata, whereas I'm not sure we could guard against modification of a query option. I might be missing something here.
Could you elaborate a bit more? Thanks in advance!

Contributor Author

And since the default value of this is false, end users will be aware of the option's existence and have a chance to read the explanation before setting it to true.

We could elaborate a bit more in the config description: the tradeoff between reduced memory usage and a possible performance hit, and a suggestion to try it in a non-production environment before applying it to production. If elaborating more feels safer, I'm happy to update it.
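
As a rough sketch of the guard described in this thread, modeled on how OffsetSeqMetadata already pins confs such as the shuffle partition count (the helper name is illustrative, not the PR's exact code):

import org.apache.spark.sql.RuntimeConfig
import org.apache.spark.sql.execution.streaming.OffsetSeqMetadata
import org.apache.spark.sql.internal.SQLConf

// Re-apply the value recorded at the query's first run, overriding the current
// session conf, so the state format cannot silently change across restarts.
def pinStateFormatConf(metadata: OffsetSeqMetadata, sessionConf: RuntimeConfig): Unit = {
  val key = SQLConf.STREAMING_AGGREGATION_STATE_FORMAT_VERSION.key
  metadata.conf.get(key) match {
    case Some(valueAtFirstRun) => sessionConf.set(key, valueAtFirstRun)
    case None => // the query predates the conf; keep the legacy format
  }
}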

@arunmahadevan
Contributor

@HeartSaVioR, the results look promising. I am wondering if there's a way to make this the default rather than introducing new configs. Since this is an internal detail anyway, there's no need to expose any config if we can identify the old vs. new format by looking at the fields in the row or by introducing a row version to differentiate old vs. new.

@HeartSaVioR
Contributor Author

HeartSaVioR commented Jul 11, 2018

@arunmahadevan
I'm actually in favor of changing the default behavior; I'm just not 100% sure the result would be promising across exhaustive use cases. I might need to prepare more kinds of key/value pairs (key size bigger than value size, key size smaller than value size, key size equal to value size; what else am I missing here?), run some tests, and back it up with new numbers.

Btw, as you commented, there seem to be two approaches to identifying the old and new formats:

looking at the fields in the row

Actually I tried this before (by checking the count of fields in the value row, since this patch reduces that count), and soon realized I can't do it: HDFSBackedStateStoreProvider relies on the provided keySchema and valueSchema when serializing/deserializing rows (only byte arrays are stored, no numFields) rather than leveraging UnsafeRow's own serialization/deserialization mechanism (writeExternal/readExternal, or write/read via Kryo), so the numFields in a deserialized row could be wrong if the schema doesn't match the actual rows, and we can't verify this.

The current approach saves the cost of writing/reading one additional integer per row at the price of losing a way to verify the rows. If we ever want to add that verification back, state migration would have to happen, or we would need a row version to differentiate the two.

introducing a row version to differentiate old vs new

We could do this by applying the same approach as in #21739, so this is valid, but a query with the old state format would have to go through state migration (not easy, since it would have to be done against multiple versions of state) or continue relying on the old state format (same as above).

@jose-torres Could you please take a look at @arunmahadevan's comment as well as this one, and add yours? Thanks in advance!
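
To illustrate the numFields point above (a sketch, not the provider's exact code): the HDFS-backed provider reconstructs each row from raw bytes with the field count taken from the caller-supplied schema, so a row written under a different schema deserializes silently wrong instead of failing.

import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.types.StructType

def deserializeValue(valueBytes: Array[Byte], valueSchema: StructType): UnsafeRow = {
  // numFields comes from the schema we were handed, not from the stored bytes,
  // so there is nothing in the bytes to validate the schema against.
  val valueRow = new UnsafeRow(valueSchema.fields.length)
  valueRow.pointTo(valueBytes, valueBytes.length)
  valueRow
}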

@jose-torres
Contributor

We could still save the value of the option to offsetSeqMetadata and error out if it's changed. The value of using an option would just be that there's no global default; a power user can set the option for the queries they think would benefit without affecting all the other queries that get run.

I agree it would be nice to just have some safe path allowing us to always use the new strategy. Absent that, there's an unfortunate tradeoff of reduced memory footprint vs added complexity. I think we ultimately need a committer to decide whether that's worth it.

@HeartSaVioR
Contributor Author

HeartSaVioR commented Jul 11, 2018

I'd argue that reducing state memory size is worth doing: as described in the commit above, we already optimized HDFSBackedStateStoreProvider to reduce state store disk size (as well as network transfer) by not storing 4 bytes per row (for both key and value). This approach would normally save more than that earlier optimization did on the value row, given that the key usually contains window information with two values: start and end.

The main issue with this approach, for me, is the possible performance impact on workloads. Happily, the workload I've covered shows even a slight performance improvement, but I'm not sure about other workloads yet. I might argue for changing the default behavior once I have good overall numbers to back it up, but either way, I agree that a decision from committer(s) is necessary. Would it be better to start a thread on the dev mailing list?

@HeartSaVioR
Contributor Author

HeartSaVioR commented Jul 12, 2018

@arunmahadevan @jose-torres

https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16541367&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16541367

I had a chance to test this patch with more kinds of use cases, and overall, enabling the option shows on-par or slightly better performance while reducing state size according to the size ratio of the key-value pair. I now feel it would make sense to adopt the new strategy as the default and keep the old behavior as a fallback for supporting old apps, but these numbers are for persuading committers, and I still agree a decision from committer(s) is necessary.

@SparkQA

SparkQA commented Jul 18, 2018

Test build #93221 has finished for PR 21733 at commit db9d9ce.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait StreamingAggregationStateManager extends Serializable
  • abstract class StreamingAggregationStateManagerBaseImpl(
  • class StreamingAggregationStateManagerImplV1(
  • class StreamingAggregationStateManagerImplV2(

@SparkQA

SparkQA commented Jul 18, 2018

Test build #93222 has finished for PR 21733 at commit 4754469.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait StreamingAggregationStateManager extends Serializable
  • abstract class StreamingAggregationStateManagerBaseImpl(
  • class StreamingAggregationStateManagerImplV1(
  • class StreamingAggregationStateManagerImplV2(

@SparkQA

SparkQA commented Jul 19, 2018

Test build #93277 has finished for PR 21733 at commit ca198ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait StreamingAggregationStateManager extends Serializable
  • abstract class StreamingAggregationStateManagerBaseImpl(
  • class StreamingAggregationStateManagerImplV1(
  • class StreamingAggregationStateManagerImplV2(

…ggregation

* add option to configure enabling new feature: remove redundant key data from value
* modify code to respect new option (turning on/off feature)
* modify tests to run tests with both on/off
* Add guard in OffsetSeqMetadata to prevent modifying option after executing query
@HeartSaVioR
Contributor Author

I'd now like to propose changing the default behavior to the new path while keeping backward compatibility, so I've applied that to the patch. I'm still open to shipping it as an advanced option as in the first approach, and happy to roll back if we decide to go that way.

@HeartSaVioR
Contributor Author

HeartSaVioR commented Jul 20, 2018

Added tests for StatefulOperatorsHelper itself as well. (Sorry for pushing commits multiple times, which triggers multiple builds. It might be ideal if older test builds were terminated once a newer test build for a given PR is launched: it looks like that already happens, via failing with -9.)

@HeartSaVioR
Contributor Author

HeartSaVioR commented Aug 7, 2018

@tdas
Done running performance tests with 4 more cases:

https://github.com/HeartSaVioR/iot-trucking-app-spark-structured-streaming/blob/master/src/main/scala/com/hortonworks/spark/benchmark/

BenchmarkMovingAggregationsListenerKeyMuchBigger

rate: 160000

| version | input rows per second | processed rows per second | total state rows | used bytes of current state version |
| --- | --- | --- | --- | --- |
| latest master (c9914cf) | 159877.232 | 149537.817 | 65000 | 133511303 |
| patch (on top of c9914cf) | 160049.118 | 152497.945 | 65000 | 73236351 |

state size: 54.854% of master (a 45.15% reduction)

BenchmarkMovingAggregationsListenerManyKeys

rate: 120000

| version | input rows per second | processed rows per second | total state rows | used bytes of current state version |
| --- | --- | --- | --- | --- |
| latest master (c9914cf) | 120266.810 | 107482.042 | 65000 | 38433719 |
| patch (on top of c9914cf) | 119865.855 | 109268.772 | 65000 | 24900343 |

state size: 64.787% of master (a 35.21% reduction)

BenchmarkMovingAggregationsListenerManyValues

rate: 25000

| version | input rows per second | processed rows per second | total state rows | used bytes of current state version |
| --- | --- | --- | --- | --- |
| latest master (c9914cf) | 25009.236 | 21216.126 | 90000 | 77161711 (857.352 per row) |
| patch (on top of c9914cf) | 25060.635 | 20774.500 | 99495 | 78230335 (786.274 per row) |

state size: 91.709% of master (an 8.29% reduction)

BenchmarkMovingAggregationsListenerValueMuchBigger

rate: 85000

| version | input rows per second | processed rows per second | total state rows | used bytes of current state version |
| --- | --- | --- | --- | --- |
| latest master (c9914cf) | 85310.774 | 79091.271 | 1000 | 1324255 |
| patch (on top of c9914cf) | 84791.761 | 79755.905 | 1000 | 1282687 |

state size: 96.861% of master (a 3.14% reduction)

I don't see any notable performance hit, and the expected state size reduction shows up across all the cases.

@HeartSaVioR
Contributor Author

Also added javadoc. Most of the content comes from StateStore, but I didn't copy the implementation note for the state store since it would be duplicated. Please let me know if we want to add content for the target state store parameter as well.

@SparkQA

SparkQA commented Aug 8, 2018

Test build #94403 has finished for PR 21733 at commit e0ee04a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@tdas tdas left a comment

Looking much better, thank you for the changes! There are still a few improvements that can be made.

@@ -81,4 +85,221 @@ package object state {
storeCoordinator)
}
}

/**
Contributor

Ummm, why is this in the package object and not in a separate file? Is there any reason it has to be in the state package object when not all stateful operators require it, only streaming aggregation?

Contributor Author

Maybe I misinterpreted your earlier suggestion. I thought you were suggesting moving it to the state package object. Will place it in a separate file.

def remove(store: StateStore, key: UnsafeRow): Unit

/**
* Return an iterator containing all the key-value pairs in target state store.
Contributor

super nit: some of these can be compressed to a single line doc.

Contributor Author
The reason will be displayed to describe this comment to others. Learn more.

Will address.

* Commit all the updates that have been made to the target state store, and return the
* new version.
*
* @param store The target StateStore instance.
Contributor

superfluous. just the main statement has all the information.

* Extract columns consisting key from input row, and return the new row for key columns.
*
* @param row The input row.
* @return The row instance which only contains key columns.
Contributor

nit: a lot of the @param and @return in the docs are a bit superfluous as it just repeats what the main statement already says.

Contributor Author

Will just remove all the @param and @return entries that merely repeat the main statement.

* @param row The input row.
* @return The row instance which only contains key columns.
*/
def getKey(row: InternalRow): UnsafeRow
Contributor

nit: why is the input typed InternalRow where everything else is UnsafeRow? seems inconsistent.

Contributor Author

getKey was basically an UnsafeProjection in statefulOperator, so it didn't strictly require UnsafeRow. I just followed the existing usage to make it less restrictive, but we know the row will always be an UnsafeRow in practice, so OK to fix if it improves consistency.
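
For context, a self-contained sketch of the key extraction under discussion (the wrapper class name is hypothetical; the projector pattern matches the diff quoted later in this review):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeRow}
import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection

class KeyExtractor(keyExpressions: Seq[Attribute], inputRowAttributes: Seq[Attribute]) {
  // An UnsafeProjection built from the key expressions pulls only the key
  // columns out of the wider input row; its result is always an UnsafeRow,
  // which is why typing the input as InternalRow is merely less restrictive.
  private lazy val keyProjector =
    GenerateUnsafeProjection.generate(keyExpressions, inputRowAttributes)

  def getKey(row: InternalRow): UnsafeRow = keyProjector(row)
}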

// state manager should return row which is same as input row regardless of format version
assert(inputRow === stateManager.get(memoryStateStore, keyRow))
}

Contributor

nit: extra line.

Contributor Author

Will remove.

return savedState
}

val joinedRow = joiner.join(key, savedState)
Contributor

can't you dedup this code with the restoreOriginRow method?

Contributor Author

Missed spot. Will leverage restoreOriginRow.

store.iterator().map(rowPair => restoreOriginRow(rowPair))
}

private def restoreOriginRow(rowPair: UnsafeRowPair): UnsafeRow = {
Contributor

rename to restoreOriginalRow

Contributor Author

Will rename.
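
For reference, a sketch of what the restore step does in the new implementation, reconstructed from the diffs quoted in this review (joiner, needToProjectToRestoreValue, and restoreValueProjector are the fields shown elsewhere in this conversation; the exact code may differ):

// The stored value row lacks the key columns, so the original row is rebuilt
// by joining the key back onto the saved value; a final projection is only
// needed if key ++ value doesn't already match the input row's column order.
private def restoreOriginalRow(rowPair: UnsafeRowPair): UnsafeRow = {
  val joinedRow = joiner.join(rowPair.key, rowPair.value)
  if (needToProjectToRestoreValue) restoreValueProjector(joinedRow) else joinedRow
}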

@@ -167,6 +165,18 @@ trait WatermarkSupport extends UnaryExecNode {
}
}
}

protected def removeKeysOlderThanWatermark(storeManager: StreamingAggregationStateManager,
Contributor

incorrect indent of parameters

Contributor

Actually... where is this used? This does not seem to be used anywhere?

Contributor Author

Missed spot. It will be called from Update mode of StateStoreSaveExec. Will address.

val hasInput = iter.hasNext
if (!hasInput && keyExpressions.isEmpty) {
// If our `keyExpressions` are empty, we're getting a global aggregation. In that case
// the `HashAggregateExec` will output a 0 value for the partial merge. We need to
// restore the value, so that we don't overwrite our state with a 0 value, but rather
// merge the 0 with existing state.
// In this case the value should represent origin row, so no need to restore.
Contributor

what does this mean? I think this is not needed any more.

Contributor Author

Yes it can be removed now. Will remove.

Contributor Author

@HeartSaVioR HeartSaVioR left a comment

Thanks again @tdas for the super detailed and thoughtful review! Will address your review comments and update the PR.

"be modified after running.")
.intConf
.checkValue(v => Set(1, 2).contains(v), "Valid versions are 1 and 2")
.createWithDefault(2)
Contributor Author

Nice suggestion. Will add the test.
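
Reconstructed for context, the conf definition this diff is part of plausibly reads as follows (the key name and doc wording are assumed from the merged change, not quoted from this PR):

val STREAMING_AGGREGATION_STATE_FORMAT_VERSION =
  buildConf("spark.sql.streaming.aggregation.stateFormatVersion")
    .internal()
    .doc("State format version used by streaming aggregation operations in a streaming " +
      "query. State between versions is not compatible, so the version shouldn't " +
      "be modified after running.")
    .intConf
    .checkValue(v => Set(1, 2).contains(v), "Valid versions are 1 and 2")
    .createWithDefault(2)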




def getKey(row: InternalRow): UnsafeRow = keyProjector(row)

override def commit(store: StateStore): Long = store.commit()
Contributor Author

This change is actually based on your review comment: always use the state manager and never access the state store directly whenever possible. If your suggestion only applies to row operations, I can remove commit() from this interface.
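
For context, the manager interface under discussion has roughly this shape, a simplified sketch assembled from the class names reported by the test builds (exact members may differ; commit() is included per the comment above):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.execution.streaming.state.{StateStore, UnsafeRowPair}
import org.apache.spark.sql.types.StructType

sealed trait StreamingAggregationStateManager extends Serializable {
  def getKey(row: InternalRow): UnsafeRow
  def getStateValueSchema: StructType
  def get(store: StateStore, key: UnsafeRow): UnsafeRow
  def put(store: StateStore, row: UnsafeRow): Unit
  def remove(store: StateStore, key: UnsafeRow): Unit
  def commit(store: StateStore): Long
  def iterator(store: StateStore): Iterator[UnsafeRowPair]
}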



GenerateUnsafeRowJoiner.create(StructType.fromAttributes(keyExpressions),
StructType.fromAttributes(valueExpressions))
@transient private lazy val restoreValueProjector = GenerateUnsafeProjection.generate(
keyValueJoinedExpressions, inputRowAttributes)
Contributor Author

My bad, you're right. Will fix. Btw, needToProjectToRestoreValue is always false unless the sequences of columns for key and value get mixed up.


private val valueExpressions: Seq[Attribute] = inputRowAttributes.diff(keyExpressions)
private val keyValueJoinedExpressions: Seq[Attribute] = keyExpressions ++ valueExpressions
private val needToProjectToRestoreValue: Boolean =
Contributor Author

Will add.
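
A sketch of the check being discussed, assumed from the expressions quoted just above:

// A projection on restore is only needed when key ++ value doesn't already
// line up with the input row's column order.
private val needToProjectToRestoreValue: Boolean =
  keyValueJoinedExpressions != inputRowAttributes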

* - key: Same as key expressions.
* - value: Same as input row attributes. The schema of value contains key expressions as well.
*
* This implementation only works when input row attributes contain all the key attributes.
Contributor Author

I intended that sentence as a requirement/precondition for possible future usages, but if you think we don't need to state it explicitly, I can remove it.


TODO list
* add test which reads checkpoint from 2.3.1 and runs query
* add test which restores query from Spark 2.3.1
@HeartSaVioR
Contributor Author

@tdas Addressed review comments. Please take a look again. Thanks in advance.

@SparkQA

SparkQA commented Aug 9, 2018

Test build #94469 has finished for PR 21733 at commit 65801a6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this, please

@SparkQA

SparkQA commented Aug 9, 2018

Test build #94474 has finished for PR 21733 at commit 65801a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

@tdas Kind reminder.

@tdas
Contributor

tdas commented Aug 21, 2018

This looks good!! Only one comment: please don't add the .crc files. They are useless and add unnecessary clutter.

@tdas
Contributor

tdas commented Aug 21, 2018

LGTM. Will merge when tests pass. :)

@HeartSaVioR
Contributor Author

@tdas Removed the .crc files. Thanks for reviewing!
Btw, it might be good to find a way to add .crc files to .gitignore if they are mostly useless all the time. Might be worth another minor PR?

@tdas
Contributor

tdas commented Aug 21, 2018

Good point. That can be a minor PR.

@SparkQA

SparkQA commented Aug 21, 2018

Test build #95003 has finished for PR 21733 at commit 19888ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon pushed a commit to HyukjinKwon/spark that referenced this pull request Aug 21, 2018
## What changes were proposed in this pull request?

Add .crc files to .gitignore so that we don't commit the .crc files from state checkpoints that could end up in test resources to the git repo.
This is based on comments in apache#21733, apache#21733 (comment).

## How was this patch tested?

Added `.1.delta.crc` and `.2.delta.crc` in `<spark root>/sql/core/src/test/resources`, and confirmed git doesn't suggest staging the files.

Closes apache#22170 from HeartSaVioR/add-crc-files-to-gitignore.

Authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
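
For reference, the follow-up likely amounts to a one-line .gitignore entry along these lines (the exact pattern is assumed, not quoted from apache#22170):

# Hadoop writes hidden checksum files such as .1.delta.crc next to state files.
.*.crc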
@asfgit asfgit closed this in 6c5cb85 Aug 21, 2018
@HeartSaVioR
Contributor Author

Thanks all for reviewing and thanks @tdas for merging this in!
