
update proportion of memory #66

Closed

CrazyJvm wants to merge 1 commit

Conversation

@CrazyJvm (Contributor) commented Mar 3, 2014

The default value of "spark.storage.memoryFraction" has been changed from 0.66 to 0.6, so 60% of the memory is now used to cache blocks while 40% is used for task execution.
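
For readers who want to try it, a minimal sketch of overriding this default under the legacy memory manager; the `0.7` value and the app name are illustrative, not from this PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: raise the storage fraction so ~70% of executor
// memory is reserved for cached blocks, leaving ~30% for task execution.
val conf = new SparkConf()
  .setAppName("memory-fraction-demo") // illustrative app name
  .set("spark.storage.memoryFraction", "0.7")
val sc = new SparkContext(conf)
```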

@AmplabJenkins

Can one of the admins verify this patch?

@rxin (Contributor) commented Mar 3, 2014

Thanks. I've merged this.

@asfgit closed this in 9d225a9 on Mar 3, 2014
wli600 pushed a commit to wli600/spark that referenced this pull request Jul 29, 2015
SKIPME merging Apache branch-1.4 bug fixes
JasonMWhite pushed a commit to JasonMWhite/spark that referenced this pull request Dec 2, 2015
add spark streaming requirements to pomfile
marcosdotps pushed a commit to marcosdotps/spark that referenced this pull request Sep 21, 2017
* revert change hosts

* Update Jenkinsfile
cenyuhai added a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
[SPARK-21414] Refine SlidingWindowFunctionFrame to avoid OOM

Refine SlidingWindowFunctionFrame to avoid OOM  
resolve apache#66 

See merge request !59
ashangit added a commit to ashangit/spark that referenced this pull request Feb 22, 2018
[SPARK-22683][CORE] Allow tuning the number of dynamically allocated executors
jamesrgrinter pushed a commit to jamesrgrinter/spark that referenced this pull request Apr 22, 2018
Signed-off-by: Rostyslav Sotnychenko <rsotnychenko@maprtech.com>

(cherry picked from commit de237dc)
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
clems4ever pushed a commit to clems4ever/spark that referenced this pull request Feb 11, 2019
[SPARK-22683][CORE] Allow tuning the number of dynamically allocated executors
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
UT: dims/openstack-cloud-controller-manager + devstack [in vm]
yuexingri pushed a commit to yuexingri/spark that referenced this pull request Dec 9, 2019
apache#39 performance issue in function getAliasedConstraints of LogicalPlan
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
cloud-fan pushed a commit that referenced this pull request Jan 14, 2021
…join can be planned as broadcast join

### What changes were proposed in this pull request?

LeftSemi/LeftAnti joins should not be pushed down through an Aggregate in some cases, for example:

```scala
spark.range(50000000L).selectExpr("id % 10000 as a", "id % 10000 as b").write.saveAsTable("t1")
spark.range(40000000L).selectExpr("id % 8000 as c", "id % 8000 as d").write.saveAsTable("t2")
spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM t2").explain
```

Before this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#16L, b#17L], functions=[])
   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
      +- HashAggregate(keys=[a#16L, b#17L], functions=[])
         +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#72]
            +- HashAggregate(keys=[a#16L, b#17L], functions=[])
               +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
                  :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS FIRST], false, 0
                  :  +- Exchange hashpartitioning(coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, [id=#65]
                  :     +- FileScan parquet default.t1[a#16L,b#17L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
                  +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS FIRST], false, 0
                     +- Exchange hashpartitioning(coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, [id=#66]
                        +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                           +- Exchange hashpartitioning(c#18L, d#19L, 5), ENSURE_REQUIREMENTS, [id=#61]
                              +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                                 +- FileScan parquet default.t2[c#18L,d#19L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c:bigint,d:bigint>
```

After this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#16L, b#17L], functions=[])
   +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#74]
      +- HashAggregate(keys=[a#16L, b#17L], functions=[])
         +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
            :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, [id=#67]
            :     +- HashAggregate(keys=[a#16L, b#17L], functions=[])
            :        +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#61]
            :           +- HashAggregate(keys=[a#16L, b#17L], functions=[])
            :              +- FileScan parquet default.t1[a#16L,b#17L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
            +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, [id=#68]
                  +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                     +- Exchange hashpartitioning(c#18L, d#19L, 5), ENSURE_REQUIREMENTS, [id=#63]
                        +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                           +- FileScan parquet default.t2[c#18L,d#19L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c:bigint,d:bigint>
```
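
The commit title conditions the pushdown on the join being plannable as a broadcast join. As a rough illustration only (not the actual optimizer rule; it assumes Spark's `LogicalPlan.stats` and `SQLConf.autoBroadcastJoinThreshold`), such a guard could look like:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.internal.SQLConf

// Hypothetical guard, not the real rule: keep the LeftSemi/LeftAnti
// pushdown only when the other side's estimated size is within the
// broadcast threshold, so the join could become a broadcast join.
def couldBeBroadcast(plan: LogicalPlan, conf: SQLConf): Boolean = {
  val threshold = conf.autoBroadcastJoinThreshold
  threshold > 0 && plan.stats.sizeInBytes <= threshold
}
```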

### Why are the changes needed?

1. Pushing LeftSemi/LeftAnti down through an Aggregate can hurt performance.
2. It removes a user-added DISTINCT operator, e.g.: [q38](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q38.sql), [q87](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q87.sql).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test and benchmark test.

SQL | Before this PR (seconds) | After this PR (seconds)
-- | -- | --
q14a | 660 | 594
q14b | 660 | 600
q38 | 55 | 29
q87 | 66 | 35

Before this PR:
![image](https://user-images.githubusercontent.com/5399861/104452849-8789fc80-55de-11eb-88da-44059899f9a9.png)

After this PR:
![image](https://user-images.githubusercontent.com/5399861/104452899-9a043600-55de-11eb-9286-d8f3a23ca3b8.png)

Closes #31145 from wangyum/SPARK-34081.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request Sep 12, 2024
…r `postgreSQL/float4.sql` and `postgreSQL/int8.sql`

### What changes were proposed in this pull request?
This PR regenerates the Java 21 golden files for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` to fix the Java 21 daily test.
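
For context, a sketch of the usual regeneration flow for these golden files (per the generator flag documented in `SQLQueryTestSuite`); that this patch was produced exactly this way is an assumption:

```
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
```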

### Why are the changes needed?
Fixes the Java 21 daily test:
- https://github.com/apache/spark/actions/runs/10823897095/job/30030200710

```
[info] - postgreSQL/float4.sql *** FAILED *** (1 second, 100 milliseconds)
[info]   postgreSQL/float4.sql
[info]   Expected "...arameters" : {
[info]       "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]       "]expression" : "'N A ...", but got "...arameters" : {
[info]       "[]expression" : "'N A ..." Result did not match for query #11
[info]   SELECT float('N A N') (SQLQueryTestSuite.scala:663)
...
[info] - postgreSQL/int8.sql *** FAILED *** (2 seconds, 474 milliseconds)
[info]   postgreSQL/int8.sql
[info]   Expected "...arameters" : {
[info]       "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]       "]sourceType" : "\"BIG...", but got "...arameters" : {
[info]       "[]sourceType" : "\"BIG..." Result did not match for query #66
[info]   SELECT CAST(q1 AS int) FROM int8_tbl WHERE q2 <> 456 (SQLQueryTestSuite.scala:663)
...
[info] *** 2 TESTS FAILED ***
[error] Failed: Total 3559, Failed 2, Errors 0, Passed 3557, Ignored 4
[error] Failed tests:
[error] 	org.apache.spark.sql.SQLQueryTestSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked: ran `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"` with Java 21; all tests passed.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48089 from LuciferYang/SPARK-49578-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>