Branch 3.2 switch jdk #38855
Closed
deepak-shivanandappa wants to merge 945 commits into apache:branch-3.3 from
Conversation
### What changes were proposed in this pull request? This PR proposes to pin the Python package `markupsafe` to 2.0.1 to fix the CI failure below. ``` ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/home/runner/work/_temp/setup-sam-43osIE/.venv/lib/python3.10/site-packages/markupsafe/__init__.py) ``` Since `markupsafe==2.1.0` has removed `soft_unicode`, `from markupsafe import soft_unicode` no longer works. See aws/aws-sam-cli#3661 for more detail. ### Why are the changes needed? To fix the CI failure on branch-3.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The existing tests should pass. Closes apache#35602 from itholic/SPARK-38279. Authored-by: itholic <haejoon.lee@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? when `replacement=true`, `Sample.maxRows` returns `None` ### Why are the changes needed? the underlying impl of `SampleExec` can not guarantee that its number of output rows <= `Sample.maxRows` ``` scala> val df = spark.range(0, 1000) df: org.apache.spark.sql.Dataset[Long] = [id: bigint] scala> df.count res0: Long = 1000 scala> df.sample(true, 0.999999, 10).count res1: Long = 1004 ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing testsuites Closes apache#35593 from zhengruifeng/fix_sample_maxRows. Authored-by: Ruifeng Zheng <ruifengz@foxmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit b683279) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
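A minimal sketch of the rule described above (an illustration, not the actual Spark code; the helper name is made up): with replacement, the sampled row count can exceed the child's, so no upper bound can be promised.
```scala
// Hedged sketch: compute maxRows for a Sample-like node.
// With replacement, more rows than the child has may be emitted, so return None.
def sampleMaxRows(withReplacement: Boolean, childMaxRows: Option[Long]): Option[Long] =
  if (withReplacement) None else childMaxRows
```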
…o_numpy in POS MyPy build currently fails as below: ``` starting mypy annotations test... annotations failed mypy checks: python/pyspark/pandas/generic.py:585: error: Incompatible return value type (got "Union[ndarray[Any, Any], ExtensionArray]", expected "ndarray[Any, Any]") [return-value] Found 1 error in 1 file (checked 324 source files) 1 ``` https://github.com/apache/spark/runs/5298261168?check_suite_focus=true I tried to reproduce it locally by matching NumPy and MyPy versions but failed. So I decided to work around the problem first by explicitly casting to make MyPy happy. To make the build pass. No, dev-only. CI in this PR should verify if it's fixed. Closes apache#35617 from HyukjinKwon/SPARK-38297. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit b46b74c) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…SI mode Elt() should return null if the input index is null under ANSI mode, which is consistent with MySQL where the function is from. Before changes: <img width="824" alt="image" src="https://user-images.githubusercontent.com/1097932/155308033-2e47b49a-b98b-4fd6-b1f1-d89762452fba.png"> After changes: The query returns null. Bug fix Yes, SQL function Elt() returns null if the input index is null under ANSI mode, instead of runtime error. UT Closes apache#35629 from gengliangwang/fixEltErrorMsg. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit a2448a4) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
check Union's maxRows and maxRowsPerPartition
### Why are the changes needed?
Union's maxRows and maxRowsPerPartition may overflow:
case 1:
```
scala> val df1 = spark.range(0, Long.MaxValue, 1, 1)
df1: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> val df2 = spark.range(0, 100, 1, 10)
df2: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> val union = df1.union(df2)
union: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> union.queryExecution.logical.maxRowsPerPartition
res19: Option[Long] = Some(-9223372036854775799)
scala> union.queryExecution.logical.maxRows
res20: Option[Long] = Some(-9223372036854775709)
```
case 2:
```
scala> val n = 2000000
n: Int = 2000000
scala> val df1 = spark.range(0, n, 1, 1).selectExpr("id % 5 as key1", "id as value1")
df1: org.apache.spark.sql.DataFrame = [key1: bigint, value1: bigint]
scala> val df2 = spark.range(0, n, 1, 2).selectExpr("id % 3 as key2", "id as value2")
df2: org.apache.spark.sql.DataFrame = [key2: bigint, value2: bigint]
scala> val df3 = spark.range(0, n, 1, 3).selectExpr("id % 4 as key3", "id as value3")
df3: org.apache.spark.sql.DataFrame = [key3: bigint, value3: bigint]
scala> val joined = df1.join(df2, col("key1") === col("key2")).join(df3, col("key1") === col("key3"))
joined: org.apache.spark.sql.DataFrame = [key1: bigint, value1: bigint ... 4 more fields]
scala> val unioned = joined.select(col("key1"), col("value3")).union(joined.select(col("key1"), col("value2")))
unioned: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [key1: bigint, value3: bigint]
scala> unioned.queryExecution.optimizedPlan.maxRows
res32: Option[Long] = Some(-2446744073709551616)
scala> unioned.queryExecution.optimizedPlan.maxRows
res33: Option[Long] = Some(-2446744073709551616)
```
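The fix boils down to overflow-aware accumulation. A rough sketch (illustrative only, assuming non-negative per-child estimates; `safeSum` is a made-up helper, not the actual patch):
```scala
// Hedged sketch: sum per-child row estimates, returning None when any child is
// unbounded or when the running sum overflows Long (a non-negative sum turning negative).
def safeSum(estimates: Seq[Option[Long]]): Option[Long] =
  estimates.foldLeft(Option(0L)) {
    case (Some(acc), Some(n)) =>
      val sum = acc + n
      if (sum >= 0) Some(sum) else None // overflow detected
    case _ => None
  }
```
For example, `safeSum(Seq(Some(Long.MaxValue), Some(100L)))` yields `None` instead of a negative estimate like the ones shown above.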
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
added testsuite
Closes apache#35609 from zhengruifeng/union_maxRows_validate.
Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 683bc46)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…aFrame.to_numpy in POS" This reverts commit a0d2be5.
…o_numpy in POS MyPy build currently fails as below: ``` starting mypy annotations test... annotations failed mypy checks: python/pyspark/pandas/generic.py:585: error: Incompatible return value type (got "Union[ndarray[Any, Any], ExtensionArray]", expected "ndarray[Any, Any]") [return-value] Found 1 error in 1 file (checked 324 source files) 1 ``` https://github.com/apache/spark/runs/5298261168?check_suite_focus=true I tried to reproduce it locally by matching NumPy and MyPy versions but failed. So I decided to work around the problem first by explicitly casting to make MyPy happy. To make the build pass. No, dev-only. CI in this PR should verify if it's fixed. Closes apache#35617 from HyukjinKwon/SPARK-38297. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit b46b74c) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…new Path(locationUri).isAbsolute" in create/alter table ### What changes were proposed in this pull request? After apache#28527, we changed to creating tables under the database location when the table location is relative. However, the criterion to determine if a table location is relative/absolute is `URI.isAbsolute`, which basically checks if the table location URI has a scheme defined. So table URIs like `/table/path` are treated as relative and the scheme and authority of the database location URI are used to create the table. For example, when the database location URI is `s3a://bucket/db`, the table will be created at `s3a://bucket/table/path`, while it should be created under the file system defined in `SessionCatalog.hadoopConf` instead. This change fixes that by treating the table location as absolute when the first character of its path is a slash. This also applies to alter table. ### Why are the changes needed? This is to fix the behavior described above. ### Does this PR introduce _any_ user-facing change? Yes. When users try to create/alter a table with a location that starts with a slash but without a scheme defined, the table will be created under/altered to the file system defined in `SessionCatalog.hadoopConf`, instead of the one defined in the database location URI. ### How was this patch tested? Updated unit tests. Closes apache#35591 from bozhang2820/spark-31709-3.2. Authored-by: Bo Zhang <bo.zhang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
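For illustration, the distinction the change relies on (standard `java.net.URI` semantics): `isAbsolute` only reports whether a scheme is present, so a rooted but scheme-less path still counts as relative.
```scala
import java.net.URI

// URI.isAbsolute is true only when a scheme is defined.
new URI("s3a://bucket/table/path").isAbsolute // true  -- scheme present
new URI("/table/path").isAbsolute             // false -- rooted path, but no scheme
```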
…ying input streams
### What changes were proposed in this pull request?
Wrap the DataInputStream in the SparkPlan.decodeUnsafeRows method with a NextIterator instead of a plain Iterator; this allows us to close the DataInputStream properly. This happens in the Spark driver only.
### Why are the changes needed?
SPARK-34647 replaced the ZstdInputStream with ZstdInputStreamNoFinalizer. This meant that all usages of `CompressionCodec.compressedInputStream` would need to manually close the stream as this would no longer be handled by the finaliser mechanism.
In SparkPlan, the result of `CompressionCodec.compressedInputStream` is wrapped in an Iterator which never calls close.
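A minimal sketch of the idea (Spark's actual `NextIterator` differs in detail; `readNextOrNull` is a hypothetical decoder standing in for the real row decoding):
```scala
import java.io.{Closeable, DataInputStream}

// Hedged sketch: an iterator that closes the underlying stream once the input is drained,
// instead of relying on a finalizer that no longer exists.
class ClosingIterator[T >: Null](ins: DataInputStream, readNextOrNull: DataInputStream => T)
  extends Iterator[T] with Closeable {
  private var nextValue: T = readNextOrNull(ins)
  override def hasNext: Boolean = nextValue != null
  override def next(): T = {
    val value = nextValue
    nextValue = readNextOrNull(ins)
    if (nextValue == null) close() // release the stream eagerly
    value
  }
  override def close(): Unit = ins.close()
}
```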
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
#### Spark Shell Configuration
```bash
$> export SPARK_SUBMIT_OPTS="-XX:+AlwaysPreTouch -Xms1g"
$> $SPARK_HOME/bin/spark-shell --conf spark.io.compression.codec=zstd
```
#### Test Script
```scala
import java.sql.Timestamp
import java.time.Instant
import spark.implicits._
case class Record(timestamp: Timestamp, batch: Long, value: Long)
(1 to 300).foreach { batch =>
sc.parallelize(1 to 1000000).map(Record(Timestamp.from(Instant.now()), batch, _)).toDS.write.parquet(s"test_data/batch_$batch")
}
(1 to 300).foreach(batch => spark.read.parquet(s"test_data/batch_$batch").as[Record].repartition().collect())
```
#### Memory Monitor
```shell
$> while true; do echo \"$(date +%Y-%m-%d' '%H:%M:%S)\",$(pmap -x <PID> | grep "total kB" | awk '{print $4}'); sleep 10; done;
```
#### Results
##### Before
```
"2022-02-22 11:55:23",1400016
"2022-02-22 11:55:33",1522024
"2022-02-22 11:55:43",1587812
"2022-02-22 11:55:53",1631868
"2022-02-22 11:56:03",1657252
"2022-02-22 11:56:13",1659728
"2022-02-22 11:56:23",1664640
"2022-02-22 11:56:33",1674152
"2022-02-22 11:56:43",1697320
"2022-02-22 11:56:53",1689636
"2022-02-22 11:57:03",1783888
"2022-02-22 11:57:13",1896920
"2022-02-22 11:57:23",1950492
"2022-02-22 11:57:33",2010968
"2022-02-22 11:57:44",2066560
"2022-02-22 11:57:54",2108232
"2022-02-22 11:58:04",2158188
"2022-02-22 11:58:14",2211344
"2022-02-22 11:58:24",2260180
"2022-02-22 11:58:34",2316352
"2022-02-22 11:58:44",2367412
"2022-02-22 11:58:54",2420916
"2022-02-22 11:59:04",2472132
"2022-02-22 11:59:14",2519888
"2022-02-22 11:59:24",2571372
"2022-02-22 11:59:34",2621992
"2022-02-22 11:59:44",2672400
"2022-02-22 11:59:54",2728924
"2022-02-22 12:00:04",2777712
"2022-02-22 12:00:14",2834272
"2022-02-22 12:00:24",2881344
"2022-02-22 12:00:34",2935552
"2022-02-22 12:00:44",2984896
"2022-02-22 12:00:54",3034116
"2022-02-22 12:01:04",3087092
"2022-02-22 12:01:14",3134432
"2022-02-22 12:01:25",3198316
"2022-02-22 12:01:35",3193484
"2022-02-22 12:01:45",3193212
"2022-02-22 12:01:55",3192872
"2022-02-22 12:02:05",3191772
"2022-02-22 12:02:15",3187780
"2022-02-22 12:02:25",3177084
"2022-02-22 12:02:35",3173292
"2022-02-22 12:02:45",3173292
"2022-02-22 12:02:55",3173292
```
##### After
```
"2022-02-22 12:05:03",1377124
"2022-02-22 12:05:13",1425132
"2022-02-22 12:05:23",1564060
"2022-02-22 12:05:33",1616116
"2022-02-22 12:05:43",1637448
"2022-02-22 12:05:53",1637700
"2022-02-22 12:06:03",1653912
"2022-02-22 12:06:13",1659532
"2022-02-22 12:06:23",1673368
"2022-02-22 12:06:33",1687580
"2022-02-22 12:06:43",1711076
"2022-02-22 12:06:53",1849752
"2022-02-22 12:07:03",1861528
"2022-02-22 12:07:13",1871200
"2022-02-22 12:07:24",1878860
"2022-02-22 12:07:34",1879332
"2022-02-22 12:07:44",1886552
"2022-02-22 12:07:54",1884160
"2022-02-22 12:08:04",1880924
"2022-02-22 12:08:14",1876084
"2022-02-22 12:08:24",1878800
"2022-02-22 12:08:34",1879068
"2022-02-22 12:08:44",1880088
"2022-02-22 12:08:54",1880160
"2022-02-22 12:09:04",1880496
"2022-02-22 12:09:14",1891672
"2022-02-22 12:09:24",1878552
"2022-02-22 12:09:34",1876136
"2022-02-22 12:09:44",1890056
"2022-02-22 12:09:54",1878076
"2022-02-22 12:10:04",1882440
"2022-02-22 12:10:14",1893172
"2022-02-22 12:10:24",1894216
"2022-02-22 12:10:34",1894204
"2022-02-22 12:10:44",1894716
"2022-02-22 12:10:54",1894720
"2022-02-22 12:11:04",1894720
"2022-02-22 12:11:15",1895232
"2022-02-22 12:11:25",1895496
"2022-02-22 12:11:35",1895496
```
Closes apache#35613 from kevins-29/spark-38273.
Lead-authored-by: Kevin Sewell <kevins_25@apple.com>
Co-authored-by: kevins-29 <100220899+kevins-29@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 43c89dc)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev ### Why are the changes needed? [CVE-2021-3807](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3807) [Release notes at GitHub](https://github.com/chalk/ansi-regex/releases) By upgrading ansi-regex from 5.0.0 to 5.0.1 we will resolve this issue. ### Does this PR introduce _any_ user-facing change? Some users run remote security scanners and this is one of the issues that comes up. Whether this can do any damage to Spark is highly uncertain, but let's remove the uncertainty that any user may have. ### How was this patch tested? All tests must pass. Closes apache#35628 from bjornjorgensen/ansi-regex-from-5.0.0-to-5.0.1. Authored-by: bjornjorgensen <bjornjorgensen@gmail.com> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com> (cherry picked from commit 9758d55) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
…in.extractKeyExprAt() ### What changes were proposed in this pull request? SubqueryBroadcastExec retrieves the partition key from the broadcast results based on the type of HashedRelation returned. If the key is packed inside a Long, we extract it through bitwise operations and cast it as Byte/Short/Int if necessary. The casting here can cause a potential runtime error. This PR is to fix it. ### Why are the changes needed? Bug fix ### Does this PR introduce _any_ user-facing change? Yes, avoid potential runtime error in dynamic pruning under ANSI mode ### How was this patch tested? UT Closes apache#35659 from gengliangwang/fixHashJoin. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit 29eca8c) Signed-off-by: Gengliang Wang <gengliang@apache.org>
…ore calling FileUtil methods ### What changes were proposed in this pull request? Explicitly check existence of source file in Utils.unpack before calling Hadoop FileUtil methods ### Why are the changes needed? A discussion from the Hadoop community raised a potential issue in calling these methods when a file doesn't exist. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes apache#35632 from srowen/SPARK-38305. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 64e1f28) Signed-off-by: Sean Owen <srowen@gmail.com>
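Roughly what such a guard looks like (a sketch under the assumptions above, not the exact Spark change; the delegation step is elided):
```scala
import java.io.{File, FileNotFoundException}

// Hedged sketch: fail fast with a clear error before delegating to Hadoop's FileUtil.
def unpack(source: File, dest: File): Unit = {
  if (!source.exists()) {
    throw new FileNotFoundException(s"Source file ${source.getPath} does not exist")
  }
  // ... delegate to FileUtil.unZip / FileUtil.unTar as before ...
}
```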
### What changes were proposed in this pull request? This PR fixes the SQL query in the doc so that the query conforms to the query result that follows. ### Why are the changes needed? Just a fix to the doc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Used the project tests. Closes apache#35624 from redsnow1992/patch-1. Authored-by: Alfonso <alfonso_men@yahoo.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit daa5f9d) Signed-off-by: Sean Owen <srowen@gmail.com>
…ty vulnerabilities This is a backport of apache#34362 to branch 3.2. ### What changes were proposed in this pull request? This PR ported HIVE-21498, HIVE-25098 and upgraded libthrift to 0.16.0. The CHANGES list for libthrift 0.16.0 is available at: https://github.com/apache/thrift/blob/v0.16.0/CHANGES.md ### Why are the changes needed? To address [CVE-2020-13949](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test. Closes apache#35646 from wangyum/SPARK-37090-branch-3.2. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <srowen@gmail.com>
…liased array types
An aliased array type in a product, in a Dataset or Dataframe, causes an exception:
```
type Data = Array[Long]
val xs:List[(Data,Int)] = List((Array(1),1), (Array(2),2))
sc.parallelize(xs).toDF("a", "b")
```
Causing
```
scala.MatchError: Data (of class scala.reflect.internal.Types$AliasNoArgsTypeRef)
at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:104)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:904)
at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:903)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88)
at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$6(ScalaReflection.scala:573)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:562)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:904)
at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:903)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:432)
at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:421)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:904)
at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:903)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:413)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:55)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:285)
at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:251)
at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:251)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
... 48 elided
```
It seems that this can be fixed by changing, in ScalaReflection.dataTypeFor:
```
val TypeRef(_, _, Seq(elementType)) = tpe
```
to
```
val TypeRef(_, _, Seq(elementType)) = tpe.dealias
```
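For context, a quick REPL-style check (illustrative; printed forms may vary by Scala version) of why `dealias` matters: the alias does not structurally match `Array[_]` until it is dealiased.
```scala
import scala.reflect.runtime.universe._

type Data = Array[Long]
val tpe = typeOf[Data]
println(tpe)         // the alias itself, e.g. Data
println(tpe.dealias) // the underlying type, Array[Long]
```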
### Why are the changes needed?
Without this change, any attempt to create datasets or dataframes using such types throws the exception above.
### Does this PR introduce _any_ user-facing change?
No, except for preventing this exception from being thrown.
### How was this patch tested?
Added a test to DatasetSuite
Closes apache#35370 from jtnystrom/spark-38042.
Lead-authored-by: Johan Nystrom <johan@monomorphic.org>
Co-authored-by: Johan Nystrom-Persson <johan@jnpersson.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 89799b8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ity predicates This PR relaxes the constraint added in [SPARK-35080](https://issues.apache.org/jira/browse/SPARK-35080) by allowing safe up-cast expressions in correlated equality predicates. Cast expressions are often added by the compiler during query analysis. Correlated equality predicates can be less restrictive to support this common pattern if a cast expression guarantees one-to-one mapping between the child expression and the output datatype (safe up-cast). Yes. Safe up-cast expressions are allowed in correlated equality predicates: ```sql SELECT (SELECT SUM(b) FROM VALUES (1, 1), (1, 2) t(a, b) WHERE CAST(a AS STRING) = x) FROM VALUES ('1'), ('2') t(x) ``` Before this change, this query will throw AnalysisException "Correlated column is not allowed in predicate...", and after this change, this query can run successfully. Unit tests. Closes apache#35486 from allisonwang-db/spark-38180-cast-in-predicates. Authored-by: allisonwang-db <allison.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 2f5cfb0) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…utput ### What changes were proposed in this pull request? In `updateAttr`, let the new Attribute have the same nullability as the Attribute to be replaced. ### Why are the changes needed? `attrMap` can possibly be populated below an outer join and the outer join changes nullability. ### How was this patch tested? New unit test - verified that it fails without the fix. Closes apache#35685 from sigmod/nullability2. Authored-by: Yingyi Bu <yingyi.bu@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org>
… as logical plan
In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both
of them with `View`. This causes a regression: the case below works in 3.1 but fails in 3.2
```sql
sql("select 1").createOrReplaceTempView("v")
sql("select * from v").createOrReplaceTempView("v")
-- in 3.1 it works well, and select will output 1
-- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)"
```
The root cause is that in 3.1 we never did the view cyclic check for dataset views, because they
are wrapped in `SubqueryAlias` instead of `View`.
In this PR, we skip the cyclic check if the view is stored as a logical plan,
i.e., `storeAnalyzedPlanForView = true` or the view is created by the Dataset API.
fix regression
No
newly added ut
Closes apache#35653 from linhongliu-db/SPARK-38318.
Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1d068ce)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… DeduplicateRelations
When we join with a duplicate view like
```
SELECT l1.id FROM v1 l1
INNER JOIN (
SELECT id FROM v1
GROUP BY id HAVING COUNT(DISTINCT name) > 1
) l2
ON l1.id = l2.id
GROUP BY l1.name, l1.id;
```
The error stack is:
```
Resolved attribute(s) name#26 missing from id#31,name#32 in operator !Aggregate [id#31], [id#31, count(distinct name#26) AS count(distinct name#26)#33L]. Attribute(s) with the same name appear in the operation: name. Please check if the right attribute(s) are used.;
Aggregate [name#26, id#25], [id#25]
+- Join Inner, (id#25 = id#31)
:- SubqueryAlias l1
: +- SubqueryAlias spark_catalog.default.v1
: +- View (`default`.`v1`, [id#25,name#26])
: +- Project [cast(id#20 as int) AS id#25, cast(name#21 as string) AS name#26]
: +- Project [id#20, name#21]
: +- SubqueryAlias spark_catalog.default.t
: +- Relation default.t[id#20,name#21] parquet
+- SubqueryAlias l2
+- Project [id#31]
+- Filter (count(distinct name#26)#33L > cast(1 as bigint))
+- !Aggregate [id#31], [id#31, count(distinct name#26) AS count(distinct name#26)#33L]
+- SubqueryAlias spark_catalog.default.v1
+- View (`default`.`v1`, [id#31,name#32])
+- Project [cast(id#27 as int) AS id#31, cast(name#28 as string) AS name#32]
+- Project [id#27, name#28]
+- SubqueryAlias spark_catalog.default.t
+- Relation default.t[id#27,name#28] parquet
```
Spark will consider the two views to be duplicates, which will cause the query to fail.
Fix a bug when joining duplicate views.
Yes. When we join with a duplicate view, the query now succeeds.
DeduplicateRelations should only kick in if the plan's children are all resolved and valid.
Add new UT
Closes apache#35684 from chenzhx/SPARK-37932.
Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a633f77)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…escribe() when ANSI mode is on
### What changes were proposed in this pull request?
When executing `df.summary()` or `df.describe()`, Spark SQL converts String columns to Double for the
percentiles/mean/stddev stats.
```
scala> val person2: DataFrame = Seq(
| ("Bob", 16, 176),
| ("Alice", 32, 164),
| ("David", 60, 192),
| ("Amy", 24, 180)).toDF("name", "age", "height")
scala> person2.summary().show()
+-------+-----+------------------+------------------+
|summary| name| age| height|
+-------+-----+------------------+------------------+
| count| 4| 4| 4|
| mean| null| 33.0| 178.0|
| stddev| null|19.148542155126762|11.547005383792515|
| min|Alice| 16| 164|
| 25%| null| 16| 164|
| 50%| null| 24| 176|
| 75%| null| 32| 180|
| max|David| 60| 192|
+-------+-----+------------------+------------------+
```
This can cause runtime errors with ANSI mode on.
```
org.apache.spark.SparkNumberFormatException: invalid input syntax for type numeric: Bob
```
This PR is to fix it by using `TryCast` for String columns.
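A rough illustration of the difference (assuming Spark 3.2+, where `try_cast` is also exposed as a SQL function), run with ANSI mode on:
```scala
spark.conf.set("spark.sql.ansi.enabled", "true")

// A plain cast of a non-numeric string throws under ANSI mode:
// spark.sql("SELECT CAST('Bob' AS DOUBLE)").show() // SparkNumberFormatException

// A try-cast degrades to null, which is what summary()/describe() need:
spark.sql("SELECT try_cast('Bob' AS DOUBLE) AS d").show() // d = null
```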
### Why are the changes needed?
For better adoption of the ANSI mode. Since both APIs are for getting a quick summary of the Dataframe, I suggest using `TryCast` for the problematic stats so that both APIs still work under ANSI mode.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
UT
Closes apache#35699 from gengliangwang/fixSummary.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 80f25ad)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…d security vulnerabilities" This reverts commit 286891b.
…ver` suffix to drivers during IT ### What changes were proposed in this pull request? There are two small proposals: 1) prefix the names of the temporary K8s namespaces with `"spark-"` so that the output of `kubectl get ns` is clearer. 2) unify the name of the driver pod in non-test and IT tests to always use `-driver` as a suffix. ### Why are the changes needed? At the moment the name of the temporary namespace is just a UUID without the `-`s. When one reads the result of `kubectl get ns` it is a bit cryptic to see UUIDs. The names of the driver pods in ITs do not indicate that they are drivers. In non-test (i.e. production) the driver pod names are suffixed with `-driver`. I propose the same for IT tests. Executor pods always use `-exec-` in their pod names, both in non-test and ITs. ### Does this PR introduce _any_ user-facing change? Yes! Developers who debug IT tests will see clearer names now. ### How was this patch tested? Manually with `kubectl get ns --watch` and `kubectl get po --watch`. Closes apache#35711 from martin-g/k8s-test-names-improvement. Authored-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4d4c044) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
This PR aims to support K8S integration tests in SBT. Currently, SBT only supports `minikube` in a hard-coded way. No. Manually, because this is an integration test. Closes apache#35327 from williamhyun/sbt_k8s. Authored-by: William Hyun <william@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 69c213d) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? This PR aims to support K8s namespace parameter in SBT K8s integration test. ### Why are the changes needed? - This allows the users to set the test namespace name - When there is no given namespace, it will generate a random namespace and use it like `Maven` test ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually using the following command ``` build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=minikube -Dspark.kubernetes.test.deployMode=docker-for-desktop -Dspark.kubernetes.test.namespace=spark-it-test "kubernetes-integration-tests/test" ``` Closes apache#35364 from williamhyun/sbtnamespace. Authored-by: William Hyun <william@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 71c34b4) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? This PR aims to support K8s `imageTag` parameter in SBT K8s integration test. ### Why are the changes needed? To make maven and SBT consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually. Closes apache#35365 from williamhyun/imagetag. Authored-by: William Hyun <william@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 419d173) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…o support all K8s test backends ### What changes were proposed in this pull request? This PR aims to add `IntegrationTestBackend.describePods` to support all K8s test backends ### Why are the changes needed? Currently the docker based K8s tests cannot get the pod information when it fails. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually. Closes apache#35344 from williamhyun/describePOD. Authored-by: William Hyun <william@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4f75577) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? SPARK-31007 introduced auxiliary statistics to speed up computation in KMeans. However, it needs an array of size `k * (k + 1) / 2`, which may cause overflow or OOM when k is too large. So we should skip this optimization in this case. ### Why are the changes needed? avoid overflow or OOM when k is too large (like 50,000) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing testsuites Closes apache#35457 from zhengruifeng/kmean_k_limit. Authored-by: Ruifeng Zheng <ruifengz@foxmail.com> Signed-off-by: huaxingao <huaxin_gao@apple.com> (cherry picked from commit ad5427e) Signed-off-by: huaxingao <huaxin_gao@apple.com>
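As a back-of-the-envelope check (illustrative arithmetic only), the auxiliary structure grows quadratically in k:
```scala
// For k = 50,000 the triangular array needs k * (k + 1) / 2 entries,
// and the count itself overflows Int arithmetic if computed with Ints.
val k = 50000L
val entries = k * (k + 1) / 2  // 1,250,025,000
val approxBytes = entries * 8L // ~10,000,200,000 bytes (~9.3 GiB) if stored as doubles
```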
…le-datasource ### What changes were proposed in this pull request? Add more examples to sql-ref-syntax-ddl-create-table-datasource: 1. Create partitioned and bucketed table through CTAS. 2. Create bucketed table through CTAS and CTE ### Why are the changes needed? Improve doc. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual test. Closes apache#35712 from wangyum/sql-ref-syntax-ddl-create-table-datasource. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: huaxingao <huaxin_gao@apple.com> (cherry picked from commit 829d7fb) Signed-off-by: huaxingao <huaxin_gao@apple.com>
…-desktop` for Docker K8S IT deployMode and context name ### What changes were proposed in this pull request? Change `docker-for-desktop` to `docker-desktop`. ### Why are the changes needed? The context name of Kubernetes on Docker Desktop should be `docker-desktop` rather than `docker-for-desktop` ``` $ k config current-context docker-desktop ``` According to the [comments](docker/for-win#5089 (comment)), since Docker Desktop v2.4 (current is v4.5.1), Docker has used the alias `docker-for-desktop` to link to the `docker-desktop` cluster for legacy reasons. See also here: apache#35557 (comment) . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - CI passed - build/sbt -Dspark.kubernetes.test.deployMode=docker-for-desktop -Pvolcano -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=minikube,r "kubernetes-integration-tests/test" - build/sbt -Dspark.kubernetes.test.deployMode=docker-desktop -Pvolcano -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=minikube,r "kubernetes-integration-tests/test" Closes apache#35595 from Yikun/SPARK-38272. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit ceb32c9) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…or small index files ### What changes were proposed in this pull request? Increasing the shuffle index weight with a constant number to avoid underestimating retained memory size caused by the bookkeeping objects: the `java.io.File` (depending on the path ~ 960 bytes) object and the `ShuffleIndexInformation` object (~180 bytes). ### Why are the changes needed? Underestimating cache entry size can easily cause OOM in the Yarn NodeManager. In the following analysis of a prod issue (HPROF file) we can see the leak suspect, Guava's `LocalCache$Segment` objects: <img width="943" alt="Screenshot 2022-02-17 at 18 55 40" src="https://user-images.githubusercontent.com/2017933/154541995-44014212-2046-41d6-ba7f-99369ca7d739.png"> Going further we can see a `ShuffleIndexInformation` for a small index file (16 bytes) but the retained heap memory is 1192 bytes: <img width="1351" alt="image" src="https://user-images.githubusercontent.com/2017933/154645212-e0318d0f-cefa-4ae3-8a3b-97d2b506757d.png"> Finally we can see this is very common within this heap dump (using MAT's Object Query Language): <img width="1418" alt="image" src="https://user-images.githubusercontent.com/2017933/154547678-44c8af34-1765-4e14-b71a-dc03d1a304aa.png"> I have even exported the data to a CSV and done some calculations with `awk`: ``` $ tail -n+2 export.csv | awk -F, 'BEGIN { numUnderEstimated=0; } { sumOldSize += $1; corrected=$1 + 1176; sumCorrectedSize += corrected; sumRetainedMem += $2; if (corrected < $2) numUnderEstimated+=1; } END { print "sum old size: " sumOldSize / 1024 / 1024 " MB, sum corrected size: " sumCorrectedSize / 1024 / 1024 " MB, sum retained memory:" sumRetainedMem / 1024 / 1024 " MB, num under estimated: " numUnderEstimated }' ``` It gives the following: ``` sum old size: 76.8785 MB, sum corrected size: 1066.93 MB, sum retained memory:1064.47 MB, num under estimated: 0 ``` So using the old calculation we were at 76.9 MB, under the default cache limit (100 MB). Using the correction (applying 1176 as an increment to the size) we are at 1066.93 MB (~1GB), which is close to the real retained heap sum: 1064.47 MB (~1GB), and there is no entry which was underestimated. But we can go further and get rid of `java.io.File` completely and store the `ShuffleIndexInformation` for the file path. This way not only is the cache size estimate improved but its size is decreased as well. Here the path size is not counted into the cache size as that string is interned. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? With the calculations above. Closes apache#35714 from attilapiros/SPARK-33206-3.2. Authored-by: attilapiros <piros.attila.zsolt@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ask.cpus by default for spark executor JVM processes Signed-off-by: Weichen Xu <weichen.xudatabricks.com> ### What changes were proposed in this pull request? Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for Spark executor JVM processes. ### Why are the changes needed? This is for limiting the number of threads used by OpenBLAS routines to the number of cores assigned to this executor, because some Spark ML algorithms call OpenBLAS via netlib-java. For example, Spark ALS estimator training calls the LAPACK API `dppsv` (internally it will call the BLAS lib); if it calls the OpenBLAS lib, by default OpenBLAS will try to use all CPU cores. But Spark will launch multiple tasks on a worker, and each task might call the `dppsv` API at the same time, and each call will internally create multiple threads (thread count equal to the number of CPU cores), which causes CPU oversubscription. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually. Closes apache#38699 from WeichenXu123/SPARK-41188. Authored-by: Weichen Xu <weichen.xu@databricks.com> Signed-off-by: Weichen Xu <weichen.xu@databricks.com> (cherry picked from commit 82a41d8) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
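For reference, the same effect can be set explicitly through the `spark.executorEnv.*` prefix (a sketch with illustrative values; the change above simply derives the default from `spark.task.cpus`):
```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: keep native BLAS (OpenBLAS) threads in line with the cores a task gets.
val spark = SparkSession.builder()
  .appName("als-example") // illustrative name
  .config("spark.task.cpus", "2")
  .config("spark.executorEnv.OMP_NUM_THREADS", "2") // matches spark.task.cpus
  .getOrCreate()
```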
…oYarnResource key existence ### What changes were proposed in this pull request? Bug fix: a misuse of ConcurrentHashMap.contains caused the map YarnAllocator.rpIdToYarnResource to always be updated ### Why are the changes needed? It caused duplicated logs during YARN resource allocation and unnecessary object creation and GC ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes apache#38790 from CavemanIV/SPARK-41254. Authored-by: John Caveman <selnteer@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit bccfe5b) Signed-off-by: Sean Owen <srowen@gmail.com>
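The root of the misuse (a standalone illustration, not the YarnAllocator code): on `ConcurrentHashMap`, `contains` is a `Hashtable`-era method that tests values, not keys.
```scala
import java.util.concurrent.ConcurrentHashMap

val resources = new ConcurrentHashMap[Int, String]()
resources.put(1, "yarn-resource")

resources.contains(1)    // false -- checks values, so an "already present" test never succeeds
resources.containsKey(1) // true  -- the check that actually avoids re-creating the entry
```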
…the lock is unlocked gracefully ### What changes were proposed in this pull request? `BlockManager#removeBlockInternal` should ensure the lock is unlocked gracefully. `removeBlockInternal` tries to call `removeBlock` in the finally block. ### Why are the changes needed? When the driver submits a job, `DAGScheduler` calls `sc.broadcast(taskBinaryBytes)`. `TorrentBroadcast#writeBlocks` may fail due to disk problems during `blockManager#putBytes`. `BlockManager#doPut` calls `BlockManager#removeBlockInternal` to clean up the block. `BlockManager#removeBlockInternal` calls `DiskStore#remove` to clean up blocks on disk. `DiskStore#remove` will try to create the directory because the directory does not exist, and an exception will be thrown at this time. `BlockInfoManager#blockInfoWrappers` block info and lock not removed. The catch block in `TorrentBroadcast#writeBlocks` will call `blockManager.removeBroadcast` to clean up the broadcast. Because the block lock in `BlockInfoManager#blockInfoWrappers` is not released, the `dag-scheduler-event-loop` thread of `DAGScheduler` will wait forever. ``` 22/11/01 18:27:48 WARN BlockManager: Putting block broadcast_0_piece0 failed due to exception java.io.IOException: XXXXX. 22/11/01 18:27:48 ERROR TorrentBroadcast: Store broadcast broadcast_0 fail, remove all pieces of the broadcast ``` ``` "dag-scheduler-event-loop" apache#54 daemon prio=5 os_prio=31 tid=0x00007fc98e3fa800 nid=0x7203 waiting on condition [0x0000700008c1e000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007add3d8c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at org.apache.spark.storage.BlockInfoManager.$anonfun$acquireLock$1(BlockInfoManager.scala:221) at org.apache.spark.storage.BlockInfoManager.$anonfun$acquireLock$1$adapted(BlockInfoManager.scala:214) at org.apache.spark.storage.BlockInfoManager$$Lambda$3038/1307533457.apply(Unknown Source) at org.apache.spark.storage.BlockInfoWrapper.withLock(BlockInfoManager.scala:105) at org.apache.spark.storage.BlockInfoManager.acquireLock(BlockInfoManager.scala:214) at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:293) at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1979) at org.apache.spark.storage.BlockManager.$anonfun$removeBroadcast$3(BlockManager.scala:1970) at org.apache.spark.storage.BlockManager.$anonfun$removeBroadcast$3$adapted(BlockManager.scala:1970) at org.apache.spark.storage.BlockManager$$Lambda$3092/1241801156.apply(Unknown Source) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1970) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:179) at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:99) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:38) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:78) at org.apache.spark.SparkContext.broadcastInternal(SparkContext.scala:1538) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1520) at 
org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1539) at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1355) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1297) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2929) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2921) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2910) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Throw an exception before `Files.createDirectory` to simulate disk problems. DiskBlockManager#getFile ```java if (filename.contains("piece")) { throw new java.io.IOException("disk issue") } Files.createDirectory(path) ``` ``` ./bin/spark-shell ``` ```scala spark.sql("select 1").collect() ``` ``` 22/11/24 19:29:58 WARN BlockManager: Putting block broadcast_0_piece0 failed due to exception java.io.IOException: disk issue. 22/11/24 19:29:58 ERROR TorrentBroadcast: Store broadcast broadcast_0 fail, remove all pieces of the broadcast org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.io.IOException: disk issue java.io.IOException: disk issue at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:109) at org.apache.spark.storage.DiskBlockManager.containsBlock(DiskBlockManager.scala:160) at org.apache.spark.storage.DiskStore.contains(DiskStore.scala:153) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$getCurrentBlockStatus(BlockManager.scala:879) at org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1998) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1484) at org.apache.spark.storage.BlockManager$BlockStoreUpdater.save(BlockManager.scala:378) at org.apache.spark.storage.BlockManager.putBytes(BlockManager.scala:1419) at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$writeBlocks$1(TorrentBroadcast.scala:170) at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$writeBlocks$1$adapted(TorrentBroadcast.scala:164) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:99) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:38) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:78) at org.apache.spark.SparkContext.broadcastInternal(SparkContext.scala:1538) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1520) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1539) at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1355) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1297) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2929) ``` Closes apache#38467 from cxzl25/SPARK-40987. 
Authored-by: sychen <sychen@ctrip.com> Signed-off-by: Mridul <mridul<at>gmail.com> (cherry picked from commit bbab0af) Signed-off-by: Mridul <mridulatgmail.com>
…ch On/OffHeapStorageMemory info ### What changes were proposed in this pull request? This PR aims to fix `SparkStatusTracker.getExecutorInfos` to return a correct `on/offHeapStorageMemory`. ### Why are the changes needed? `SparkExecutorInfoImpl` used the following parameter order. https://github.com/apache/spark/blob/54c57fa86906f933e089a33ef25ae0c053769cc8/core/src/main/scala/org/apache/spark/StatusAPIImpl.scala#L42-L45 SPARK-20659 introduced a bug with wrong parameter order at Apache Spark 2.4.0. - https://github.com/apache/spark/pull/20546/files#diff-7daca909d33ff8e9b4938e2b4a4aaa1558fbdf4604273b9e38cce32c55e1508cR118-R121 ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Manually review. Closes apache#38843 from ylybest/master. Lead-authored-by: Lingyun Yuan <ylybest@gmail.com> Co-authored-by: ylybest <119458293+ylybest@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 388824c) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Switch JDK from openjdk to eclipse temurin