Skip to content

[SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index#27763

Closed
gengliangwang wants to merge 2 commits intoapache:masterfrom
gengliangwang:useNaturalIndex
Closed

[SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index#27763
gengliangwang wants to merge 2 commits intoapache:masterfrom
gengliangwang:useNaturalIndex

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

The method removeAllByIndexValues in KVStore is to delete all the objects which have certain values in the given index.
However, in the current implementation of InMemoryStore, when the given index is the natural key index, there is no special handling for it and a linear search over all the task data is performed.

We can improve it by deleting the natural keys directly from the internal hashmap.

Why are the changes needed?

Better performance if the given index for removeAllByIndexValues is the natural key index in
InMemoryStore

Does this PR introduce any user-facing change?

No

How was this patch tested?

Enhance the existing test.

@SparkQA
Copy link

SparkQA commented Mar 2, 2020

Test build #119187 has finished for PR 27763 at commit ded0bbd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@gengliangwang gengliangwang requested a review from cloud-fan March 2, 2020 23:32
Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@HeartSaVioR
Copy link
Contributor

Btw, out-of-topic, but while we are here, is it intentional or missing spot CountingRemoveIfForEach only removes the data and doesn't touch parentToChildrenMap? If it's just a missing spot maybe I could submit a PR to fix it.

@gengliangwang
Copy link
Member Author

gengliangwang commented Mar 2, 2020

Btw, out-of-topic, but while we are here, is it intentional or missing spot CountingRemoveIfForEach only removes the data and doesn't touch parentToChildrenMap? If it's just a missing spot maybe I could submit a PR to fix it.

You are right. It is a missing part of #27716 .
Feel free to submit a PR.

@HeartSaVioR
Copy link
Contributor

Thanks for confirming, @gengliangwang !

@SparkQA
Copy link

SparkQA commented Mar 3, 2020

Test build #119191 has finished for PR 27763 at commit 60ed5d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Mar 3, 2020

Test build #119188 has finished for PR 27763 at commit ded0bbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor

Just submitted a PR #27765. It might be possible to be conflicted - I'll rebase if necessary once this PR gets merged.

@SparkQA
Copy link

SparkQA commented Mar 3, 2020

Test build #119195 has finished for PR 27763 at commit 60ed5d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 3, 2020

Test build #119205 has finished for PR 27763 at commit 60ed5d0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 3, 2020

Test build #119210 has finished for PR 27763 at commit 60ed5d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b5166aa Mar 3, 2020
gengliangwang added a commit to gengliangwang/spark that referenced this pull request Mar 4, 2020
…ues over natural key index

### What changes were proposed in this pull request?

The method `removeAllByIndexValues` in KVStore is to delete all the objects which have certain values in the given index.
However, in the current implementation of `InMemoryStore`, when the given index is the natural key index, there is no special handling for it and a linear search over all the task data is performed.

We can improve it by deleting the natural keys directly from the internal hashmap.

### Why are the changes needed?

Better performance if the given index for `removeAllByIndexValues` is the natural key index in
`InMemoryStore`
### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Enhance the existing test.

Closes apache#27763 from gengliangwang/useNaturalIndex.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ues over natural key index

### What changes were proposed in this pull request?

The method `removeAllByIndexValues` in KVStore is to delete all the objects which have certain values in the given index.
However, in the current implementation of `InMemoryStore`, when the given index is the natural key index, there is no special handling for it and a linear search over all the task data is performed.

We can improve it by deleting the natural keys directly from the internal hashmap.

### Why are the changes needed?

Better performance if the given index for `removeAllByIndexValues` is the natural key index in
`InMemoryStore`
### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Enhance the existing test.

Closes apache#27763 from gengliangwang/useNaturalIndex.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants