{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":215379814,"defaultBranch":"master","name":"spark","ownerLogin":"a0x8o","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2019-10-15T19:26:36.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/22206500?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1688767690.0","currentOid":""},"activityList":{"items":[{"before":"3867fb0aa65ff3610ebf10fc4508e9008ba737c8","after":"becea4be12acb78919cc4957c1d58ba50fed713d","ref":"refs/heads/master","pushedAt":"2024-07-11T17:05:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[MINOR][SQL][TESTS] Remove a duplicate test case in `CSVExprUtilsSuite`\n\n### What changes were proposed in this pull request?\n\nThis PR aims to remove a duplicate test case in `CSVExprUtilsSuite`.\n\n### Why are the changes needed?\n\nClean duplicate code.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #47298 from wayneguow/csv_suite.\n\nAuthored-by: Wei Guo <guow93@gmail.com>\nSigned-off-by: Dongjoon Hyun <dhyun@apple.com>","shortMessageHtmlLink":"[MINOR][SQL][TESTS] Remove a duplicate test case in <code>CSVExprUtilsSuite</code>"}},{"before":"457796c46d126edc1fab8d0df6f481f1b65c0075","after":"3867fb0aa65ff3610ebf10fc4508e9008ba737c8","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-07-11T17:05:11.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48791][CORE] Fix perf regression caused by the accumulators registration overhead using CopyOnWriteArrayList\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to use the `ArrayBuffer` together with the read/write lock rather than `CopyOnWriteArrayList` for `TaskMetrics._externalAccums`.\n\n### Why are the changes needed?\n\nFix the perf regression that caused by the accumulators registration overhead using `CopyOnWriteArrayList`.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nManually tested.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #47197 from Ngone51/SPARK-48791.\n\nAuthored-by: Yi Wu <yi.wu@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-48791][CORE] Fix perf regression caused by the accumulators re…"}},{"before":"aa3208d4243c43c1cb1de20c93f639c312d839a5","after":"3867fb0aa65ff3610ebf10fc4508e9008ba737c8","ref":"refs/heads/master","pushedAt":"2024-07-10T14:10:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48791][CORE] Fix perf regression caused by the accumulators registration overhead using CopyOnWriteArrayList\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to use the `ArrayBuffer` together with the read/write lock rather than `CopyOnWriteArrayList` for `TaskMetrics._externalAccums`.\n\n### Why are the changes needed?\n\nFix the perf regression that caused by the accumulators registration overhead using `CopyOnWriteArrayList`.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nManually tested.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #47197 from Ngone51/SPARK-48791.\n\nAuthored-by: Yi Wu <yi.wu@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-48791][CORE] Fix perf regression caused by the accumulators re…"}},{"before":"457796c46d126edc1fab8d0df6f481f1b65c0075","after":"aa3208d4243c43c1cb1de20c93f639c312d839a5","ref":"refs/heads/master","pushedAt":"2024-07-09T18:40:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48716] Add jobGroupId to SparkListenerSQLExecutionStart\n\n### What changes were proposed in this pull request?\nAdd jobGroupId to SparkListenerSQLExecutionStart\n\n### Why are the changes needed?\nJobGroupId can be used to combine jobs within the same group. This is going to be useful in the listener so it makes the job grouping easy to do\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nUnit Test\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #47092 from gjxdxh/gjxdxh/SPARK-48716.\n\nAuthored-by: Lingkai Kong <lingkai.kong@databricks.com>\nSigned-off-by: Josh Rosen <joshrosen@databricks.com>","shortMessageHtmlLink":"[SPARK-48716] Add jobGroupId to SparkListenerSQLExecutionStart"}},{"before":"42a4dd2dff81a758c72e8a782a147862a10ca303","after":"457796c46d126edc1fab8d0df6f481f1b65c0075","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-07-09T18:39:59.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-46625] CTE with Identifier clause as reference\n\n### What changes were proposed in this pull request?\nDECLARE agg = 'max';\nDECLARE col = 'c1';\nDECLARE tab = 'T';\n\nWITH S(c1, c2) AS (VALUES(1, 2), (2, 3)),\n      T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd'))\nSELECT IDENTIFIER(agg)(IDENTIFIER(col)) FROM IDENTIFIER(tab);\n\n-- OR\n\nWITH S(c1, c2) AS (VALUES(1, 2), (2, 3)),\n      T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd'))\nSELECT IDENTIFIER('max')(IDENTIFIER('c1')) FROM IDENTIFIER('T');\n\nCurrently we don't support Identifier clause as part of CTE reference.\n\n### Why are the changes needed?\nAdding support for Identifier clause as part of CTE reference for both constant string expressions and session variables.\n\n### Does this PR introduce _any_ user-facing change?\nIt contains user facing changes in sense that identifier clause as cte reference will now be supported.\n\n### How was this patch tested?\nAdded tests as part of this PR.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #47180 from nebojsa-db/SPARK-46625.\n\nAuthored-by: Nebojsa Savic <nebojsa.savic@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-46625] CTE with Identifier clause as reference"}},{"before":"cf28ca623dd79489ffb0a0dc2579a9969ec48e84","after":"457796c46d126edc1fab8d0df6f481f1b65c0075","ref":"refs/heads/master","pushedAt":"2024-07-09T18:12:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-46625] CTE with Identifier clause as reference\n\n### What changes were proposed in this pull request?\nDECLARE agg = 'max';\nDECLARE col = 'c1';\nDECLARE tab = 'T';\n\nWITH S(c1, c2) AS (VALUES(1, 2), (2, 3)),\n      T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd'))\nSELECT IDENTIFIER(agg)(IDENTIFIER(col)) FROM IDENTIFIER(tab);\n\n-- OR\n\nWITH S(c1, c2) AS (VALUES(1, 2), (2, 3)),\n      T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd'))\nSELECT IDENTIFIER('max')(IDENTIFIER('c1')) FROM IDENTIFIER('T');\n\nCurrently we don't support Identifier clause as part of CTE reference.\n\n### Why are the changes needed?\nAdding support for Identifier clause as part of CTE reference for both constant string expressions and session variables.\n\n### Does this PR introduce _any_ user-facing change?\nIt contains user facing changes in sense that identifier clause as cte reference will now be supported.\n\n### How was this patch tested?\nAdded tests as part of this PR.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #47180 from nebojsa-db/SPARK-46625.\n\nAuthored-by: Nebojsa Savic <nebojsa.savic@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-46625] CTE with Identifier clause as reference"}},{"before":"42a4dd2dff81a758c72e8a782a147862a10ca303","after":"cf28ca623dd79489ffb0a0dc2579a9969ec48e84","ref":"refs/heads/master","pushedAt":"2024-07-02T16:00:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48764][PYTHON] Filtering out IPython-related frames from user stack\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to fix internal function `_capture_call_site` for filtering out IPython-related frames from user stack.\n\n### Why are the changes needed?\n\nIPython-related frames are unnecessarily polluting the user stacks so it harms debuggability of IPython Notebook.\n\nFor example, there are some garbage stacks recorded from `IPython` and `ipykernel` such as:\n\n- `...lib/python3.9/site-packages/IPython/core/interactiveshell.py...`\n- `...lib/python3.9/site-packages/ipykernel/zmqshell.py...`\n\n### Does this PR introduce _any_ user-facing change?\n\nNo API changes, but the user stack from IPython will be cleaned up as below:\n\n**Before**\n<img width=\"457\" alt=\"Screenshot 2024-07-01 at 3 26 45 PM\" src=\"https://github.com/apache/spark/assets/44108233/67ba8b49-f52f-4a7d-8031-b7272fceb581\">\n\n**After**\n<img width=\"456\" alt=\"Screenshot 2024-07-01 at 3 25 07 PM\" src=\"https://github.com/apache/spark/assets/44108233/950035cd-4397-41a5-9664-7040b84ebd6f\">\n\n### How was this patch tested?\n\nThe existing CI should pass\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #47159 from itholic/ipython_followup.\n\nAuthored-by: Haejoon Lee <haejoon.lee@databricks.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[SPARK-48764][PYTHON] Filtering out IPython-related frames from user …"}},{"before":"157bebb9ff3dbc4e5a3860de22966e7aaae96c7a","after":"42a4dd2dff81a758c72e8a782a147862a10ca303","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-06-27T16:53:42.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48735][SQL] Performance Improvement for BIN function\n\n### What changes were proposed in this pull request?\n\nThis PR implemented a long-to-binary form UTF8String method directly to improve the performance of the BIN function. It omits the procedure of encoding/decoding and array copying.\n\n### Why are the changes needed?\n\nperformance improvement\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\n- new unit tests\n- offline benchmarking ~2x\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #47119 from yaooqinn/SPARK-48735.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[SPARK-48735][SQL] Performance Improvement for BIN function"}},{"before":"63b67ba1644755b907d860210e7f1b54b983c645","after":"42a4dd2dff81a758c72e8a782a147862a10ca303","ref":"refs/heads/master","pushedAt":"2024-06-27T16:31:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48735][SQL] Performance Improvement for BIN function\n\n### What changes were proposed in this pull request?\n\nThis PR implemented a long-to-binary form UTF8String method directly to improve the performance of the BIN function. It omits the procedure of encoding/decoding and array copying.\n\n### Why are the changes needed?\n\nperformance improvement\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\n- new unit tests\n- offline benchmarking ~2x\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #47119 from yaooqinn/SPARK-48735.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[SPARK-48735][SQL] Performance Improvement for BIN function"}},{"before":"157bebb9ff3dbc4e5a3860de22966e7aaae96c7a","after":"63b67ba1644755b907d860210e7f1b54b983c645","ref":"refs/heads/master","pushedAt":"2024-06-21T15:35:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48675][SQL] Fix cache table with collated column\n\n### What changes were proposed in this pull request?\n\nFollowing sequence of queries produces the error:\n```\n> cache lazy table t as select col from values ('a' collate utf8_lcase) as (col);\n> select col from t;\norg.apache.spark.SparkException: not support type: org.apache.spark.sql.types.StringType1.\n        at org.apache.spark.sql.errors.QueryExecutionErrors$.notSupportTypeError(QueryExecutionErrors.scala:1069)\n        at org.apache.spark.sql.execution.columnar.ColumnBuilder$.apply(ColumnBuilder.scala:200)\n        at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.$anonfun$next$1(InMemoryRelation.scala:85)\n        at scala.collection.immutable.List.map(List.scala:247)\n        at scala.collection.immutable.List.map(List.scala:79)\n        at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.next(InMemoryRelation.scala:84)\n        at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.next(InMemoryRelation.scala:82)\n        at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.next(InMemoryRelation.scala:296)\n        at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.next(InMemoryRelation.scala:293)\n...\n```\nThis is also the problem on non-lazy cached tables.\n\nIt turns out that the problem happens to occur during the execution of `InMemoryTableScanExec` where we need to update `ColumnAccessor`, `ColumnBuilder`, `ColumnType` and `ColumnStats`.\n\n### Why are the changes needed?\n\nTo fix the described error.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, the described sequence of queries should produce valid results after these changes are applied instead of throwing error.\n\n### How was this patch tested?\n\nAdded checks to columnar suites for the mentioned classes and integration test to `CollationSuite`.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #47045 from nikolamand-db/SPARK-48675.\n\nAuthored-by: Nikola Mandic <nikola.mandic@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-48675][SQL] Fix cache table with collated column"}},{"before":"2680353d22f703370770aa84f0edfcc351661b0b","after":"157bebb9ff3dbc4e5a3860de22966e7aaae96c7a","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-06-17T15:10:16.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type widen capability\n\n### What changes were proposed in this pull request?\n\nSPARK-40876 enhanced the built-in Parquet data source reader to support widening type promotions, we should update the test to reflect this capability.\n\n### Why are the changes needed?\n\nUpdate test to cover new feature.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\n```\nbuild/sbt \"sql/testOnly *ParquetReadSchemaSuite\"\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46959 from pan3793/SPARK-48603.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type wide…"}},{"before":"34d7cae2441aa0e8bf7bc856fb19d807820aec6f","after":"157bebb9ff3dbc4e5a3860de22966e7aaae96c7a","ref":"refs/heads/master","pushedAt":"2024-06-17T14:09:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type widen capability\n\n### What changes were proposed in this pull request?\n\nSPARK-40876 enhanced the built-in Parquet data source reader to support widening type promotions, we should update the test to reflect this capability.\n\n### Why are the changes needed?\n\nUpdate test to cover new feature.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\n```\nbuild/sbt \"sql/testOnly *ParquetReadSchemaSuite\"\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46959 from pan3793/SPARK-48603.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type wide…"}},{"before":"2680353d22f703370770aa84f0edfcc351661b0b","after":"34d7cae2441aa0e8bf7bc856fb19d807820aec6f","ref":"refs/heads/master","pushedAt":"2024-06-17T13:26:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type widen capability\n\n### What changes were proposed in this pull request?\n\nSPARK-40876 enhanced the built-in Parquet data source reader to support widening type promotions, we should update the test to reflect this capability.\n\n### Why are the changes needed?\n\nUpdate test to cover new feature.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\n```\nbuild/sbt \"sql/testOnly *ParquetReadSchemaSuite\"\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46959 from pan3793/SPARK-48603.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type wide…"}},{"before":"4ddd7cf3c18cdbdb0e2e26844f903e8ba7f91dbf","after":"2680353d22f703370770aa84f0edfcc351661b0b","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-06-03T15:40:07.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48507][INFRA] Use Hadoop 3.3.6 winutils in `build_sparkr_window`\n\n### What changes were proposed in this pull request?\nThe pr aims to use `Hadoop 3.3.6` winutils in `build_sparkr_window`.\n\n### Why are the changes needed?\nLet's use the latest version.\nhttps://github.com/cdarlint/winutils/tree/master\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46846 from panbingkun/SPARK-48507.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[SPARK-48507][INFRA] Use Hadoop 3.3.6 winutils in <code>build_sparkr_window</code>"}},{"before":"6f9c335c520950d4db57a5bcb8ad3a94dcacd13f","after":"2680353d22f703370770aa84f0edfcc351661b0b","ref":"refs/heads/master","pushedAt":"2024-06-03T15:31:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48507][INFRA] Use Hadoop 3.3.6 winutils in `build_sparkr_window`\n\n### What changes were proposed in this pull request?\nThe pr aims to use `Hadoop 3.3.6` winutils in `build_sparkr_window`.\n\n### Why are the changes needed?\nLet's use the latest version.\nhttps://github.com/cdarlint/winutils/tree/master\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46846 from panbingkun/SPARK-48507.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[SPARK-48507][INFRA] Use Hadoop 3.3.6 winutils in <code>build_sparkr_window</code>"}},{"before":"4ddd7cf3c18cdbdb0e2e26844f903e8ba7f91dbf","after":"6f9c335c520950d4db57a5bcb8ad3a94dcacd13f","ref":"refs/heads/master","pushedAt":"2024-05-29T14:09:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[MINOR][PS] Fallback code clean up\n\n### What changes were proposed in this pull request?\nFallback code clean up\n\n### Why are the changes needed?\n```\n   DataFrame.to_feather\n   DataFrame.to_stata\n```\n\nthe two methods were already added as normal methods, no need to be in the fallback list\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46784 from zhengruifeng/ps_fallback_cleanup.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[MINOR][PS] Fallback code clean up"}},{"before":"4677bc3429dfa78c5b70d49ca02aa71add84e972","after":"4ddd7cf3c18cdbdb0e2e26844f903e8ba7f91dbf","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-05-29T14:09:35.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48168][SQL] Add bitwise shifting operators support\n\n### What changes were proposed in this pull request?\n\nThis PR introduces three bitwise shifting operators as aliases for existing shifting functions.\n\n### Why are the changes needed?\n\nThe bit shifting functions named in alphabet form vary from one platform to anthor. Take our shiftleft as an example,\n- Hive, shiftleft (where we copied it from)\n- MsSQL Server LEFT_SHIFT\n- MySQL, N/A\n- PostgreSQL, N/A\n- Presto,  bitwise_left_shift\n\nThe [bit shifting operators](https://en.wikipedia.org/wiki/Bitwise_operations_in_C) share a much more common and consistent way for users to port their queries.\n\nFor self-consistent with existing bit operators in Spark, `AND   &`, `OR    |`, `XOR   ^` and `NOT   ~`, we now add `<<`, `>>` and `>>>`.\n\nFor other systems that we can refer to:\n\nhttps://learn.microsoft.com/en-us/sql/t-sql/functions/left-shift-transact-sql?view=sql-server-ver16\nhttps://www.postgresql.org/docs/9.4/functions-bitstring.html\nhttps://dev.mysql.com/doc/refman/8.0/en/bit-functions.html\n### Does this PR introduce _any_ user-facing change?\n\nYes, new operators were added but no behavior change\n\n### How was this patch tested?\n\nnew tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46440 from yaooqinn/SPARK-48168.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: youxiduo <youxiduo@corp.netease.com>","shortMessageHtmlLink":"[SPARK-48168][SQL] Add bitwise shifting operators support"}},{"before":"3abb20fd64f0755903f9825df1b849e79137654c","after":"4ddd7cf3c18cdbdb0e2e26844f903e8ba7f91dbf","ref":"refs/heads/master","pushedAt":"2024-05-24T16:12:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48168][SQL] Add bitwise shifting operators support\n\n### What changes were proposed in this pull request?\n\nThis PR introduces three bitwise shifting operators as aliases for existing shifting functions.\n\n### Why are the changes needed?\n\nThe bit shifting functions named in alphabet form vary from one platform to anthor. Take our shiftleft as an example,\n- Hive, shiftleft (where we copied it from)\n- MsSQL Server LEFT_SHIFT\n- MySQL, N/A\n- PostgreSQL, N/A\n- Presto,  bitwise_left_shift\n\nThe [bit shifting operators](https://en.wikipedia.org/wiki/Bitwise_operations_in_C) share a much more common and consistent way for users to port their queries.\n\nFor self-consistent with existing bit operators in Spark, `AND   &`, `OR    |`, `XOR   ^` and `NOT   ~`, we now add `<<`, `>>` and `>>>`.\n\nFor other systems that we can refer to:\n\nhttps://learn.microsoft.com/en-us/sql/t-sql/functions/left-shift-transact-sql?view=sql-server-ver16\nhttps://www.postgresql.org/docs/9.4/functions-bitstring.html\nhttps://dev.mysql.com/doc/refman/8.0/en/bit-functions.html\n### Does this PR introduce _any_ user-facing change?\n\nYes, new operators were added but no behavior change\n\n### How was this patch tested?\n\nnew tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46440 from yaooqinn/SPARK-48168.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: youxiduo <youxiduo@corp.netease.com>","shortMessageHtmlLink":"[SPARK-48168][SQL] Add bitwise shifting operators support"}},{"before":"4677bc3429dfa78c5b70d49ca02aa71add84e972","after":"3abb20fd64f0755903f9825df1b849e79137654c","ref":"refs/heads/master","pushedAt":"2024-05-23T15:42:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[MINOR][TESTS] Add a helper function for `spark.table` in dsl\n\n### What changes were proposed in this pull request?\nAdd a helper function for `spark.table` in dsl\n\n### Why are the changes needed?\nto be used in tests\n\n### Does this PR introduce _any_ user-facing change?\nno, test only\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46717 from zhengruifeng/dsl_read.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[MINOR][TESTS] Add a helper function for <code>spark.table</code> in dsl"}},{"before":"59ac4a1cb938cfac4bad7442382a92b6dedcb178","after":"4677bc3429dfa78c5b70d49ca02aa71add84e972","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-05-22T16:39:43.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings\n\n### What changes were proposed in this pull request?\nFollowing up on the introduction of AbstractMapType (https://github.com/apache/spark/pull/46458) and changes that introduce collation awareness for RaiseError expression (https://github.com/apache/spark/pull/46461), this PR should add the appropriate type casting rules for AbstractMapType.\n\n### Why are the changes needed?\nFix the CI failure for the `Support RaiseError misc expression with collation` test when ANSI is off.\n\n### Does this PR introduce _any_ user-facing change?\nYes, type casting is now allowed for map types with collated strings.\n\n### How was this patch tested?\nExtended suite `CollationSQLExpressionsANSIOffSuite` with ANSI disabled.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46661 from uros-db/fix-abstract-map.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseErro…"}},{"before":"a372a6e4265a5f048194d5f50ba115e7ae921aeb","after":"4677bc3429dfa78c5b70d49ca02aa71add84e972","ref":"refs/heads/master","pushedAt":"2024-05-22T16:32:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseError parameter map to work with collated strings\n\n### What changes were proposed in this pull request?\nFollowing up on the introduction of AbstractMapType (https://github.com/apache/spark/pull/46458) and changes that introduce collation awareness for RaiseError expression (https://github.com/apache/spark/pull/46461), this PR should add the appropriate type casting rules for AbstractMapType.\n\n### Why are the changes needed?\nFix the CI failure for the `Support RaiseError misc expression with collation` test when ANSI is off.\n\n### Does this PR introduce _any_ user-facing change?\nYes, type casting is now allowed for map types with collated strings.\n\n### How was this patch tested?\nExtended suite `CollationSQLExpressionsANSIOffSuite` with ANSI disabled.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46661 from uros-db/fix-abstract-map.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[SPARK-48364][SQL] Add AbstractMapType type casting and fix RaiseErro…"}},{"before":"59ac4a1cb938cfac4bad7442382a92b6dedcb178","after":"a372a6e4265a5f048194d5f50ba115e7ae921aeb","ref":"refs/heads/master","pushedAt":"2024-05-21T13:46:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48336][PS][CONNECT] Implement `ps.sql` in Spark Connect\n\n### What changes were proposed in this pull request?\nImplement `ps.sql` in Spark Connect\n\n### Why are the changes needed?\nfeature parity in Spark Connect\n\n### Does this PR introduce _any_ user-facing change?\nyes:\n\n```\nIn [4]: spark\nOut[4]: <pyspark.sql.connect.session.SparkSession at 0x105136390>\n\nIn [5]:     >>> ps.sql('''\n   ...:     ...   SELECT m1.a, m2.b\n   ...:     ...   FROM {table1} m1 INNER JOIN {table2} m2\n   ...:     ...   ON m1.key = m2.key\n   ...:     ...   ORDER BY m1.a, m2.b''',\n   ...:     ...   table1=ps.DataFrame({\"a\": [1,2], \"key\": [\"a\", \"b\"]}),\n   ...:     ...   table2=pd.DataFrame({\"b\": [3,4,5], \"key\": [\"a\", \"b\", \"b\"]}))\n/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1018: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.\n  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1018: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.\n  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n\n   a  b\n0  1  3\n1  2  4\n2  2  5\n```\n\n### How was this patch tested?\n\n1. enabled UTs\n2. also manually tested all the examples\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46658 from zhengruifeng/ps_sql.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[SPARK-48336][PS][CONNECT] Implement <code>ps.sql</code> in Spark Connect"}},{"before":"e06e909bd948e0d97af19a42d24702411b7c1a13","after":"59ac4a1cb938cfac4bad7442382a92b6dedcb178","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-05-21T13:46:21.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation\n\n### What changes were proposed in this pull request?\n\nThis PR replaces AmIpFilter with a forked implementation, and removes the dependency `hadoop-yarn-server-web-proxy`\n\n### Why are the changes needed?\n\nSPARK-47118 upgraded Spark built-in Jetty from 10 to 11, and migrated from `javax.servlet` to `jakarta.servlet`, which breaks the Spark on YARN.\n\n```\nCaused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter\n    at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)\n    at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)\n    at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)\n    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)\n    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)\n    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)\n    at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)\n    ... 38 more\n```\n\nDuring the investigation, I found a comment here https://github.com/apache/spark/pull/31642#issuecomment-786257114\n\n> Agree that in the long term we should either: 1) consider to re-implement the logic in Spark which allows us to get away from server-side dependency in Hadoop ...\n\nThis should be a simple and clean way to address the exact issue, then we don't need to wait for Hadoop `jakarta.servlet` migration, and it also strips a Hadoop dependency.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, this recovers the bootstrap of the Spark application on YARN mode, keeping the same behavior with Spark 3.5 and earlier versions.\n\n### How was this patch tested?\n\nUTs are added. (refer to `org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`)\n\nI tested it in a YARN cluster.\n\nSpark successfully started.\n```\nroothadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17\nWARNING: Using incubator modules: jdk.incubator.vector\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME.\nSpark Web UI available at http://hadoop-master1.orb.local:4040\nSpark master: yarn, Application Id: application_1716005503866_0001\nspark-sql (default)> select version();\n4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01\nTime taken: 1.707 seconds, Fetched 1 row(s)\nspark-sql (default)>\n```\n\nWhen access `http://hadoop-master1.orb.local:4040`, it redirects to `http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`, and the UI looks correct.\n\n<img width=\"1474\" alt=\"image\" src=\"https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae\">\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46611 from pan3793/SPARK-48238.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: yangjie01 <yangjie01@baidu.com>","shortMessageHtmlLink":"[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked impl…"}},{"before":"5a735a268be87cd7f9257e20d0f68900b7a69ce7","after":"59ac4a1cb938cfac4bad7442382a92b6dedcb178","ref":"refs/heads/master","pushedAt":"2024-05-20T16:35:42.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation\n\n### What changes were proposed in this pull request?\n\nThis PR replaces AmIpFilter with a forked implementation, and removes the dependency `hadoop-yarn-server-web-proxy`\n\n### Why are the changes needed?\n\nSPARK-47118 upgraded Spark built-in Jetty from 10 to 11, and migrated from `javax.servlet` to `jakarta.servlet`, which breaks the Spark on YARN.\n\n```\nCaused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter\n    at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)\n    at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)\n    at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)\n    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)\n    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)\n    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)\n    at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)\n    ... 38 more\n```\n\nDuring the investigation, I found a comment here https://github.com/apache/spark/pull/31642#issuecomment-786257114\n\n> Agree that in the long term we should either: 1) consider to re-implement the logic in Spark which allows us to get away from server-side dependency in Hadoop ...\n\nThis should be a simple and clean way to address the exact issue, then we don't need to wait for Hadoop `jakarta.servlet` migration, and it also strips a Hadoop dependency.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, this recovers the bootstrap of the Spark application on YARN mode, keeping the same behavior with Spark 3.5 and earlier versions.\n\n### How was this patch tested?\n\nUTs are added. (refer to `org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`)\n\nI tested it in a YARN cluster.\n\nSpark successfully started.\n```\nroothadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17\nWARNING: Using incubator modules: jdk.incubator.vector\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME.\nSpark Web UI available at http://hadoop-master1.orb.local:4040\nSpark master: yarn, Application Id: application_1716005503866_0001\nspark-sql (default)> select version();\n4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01\nTime taken: 1.707 seconds, Fetched 1 row(s)\nspark-sql (default)>\n```\n\nWhen access `http://hadoop-master1.orb.local:4040`, it redirects to `http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`, and the UI looks correct.\n\n<img width=\"1474\" alt=\"image\" src=\"https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae\">\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46611 from pan3793/SPARK-48238.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: yangjie01 <yangjie01@baidu.com>","shortMessageHtmlLink":"[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked impl…"}},{"before":"e06e909bd948e0d97af19a42d24702411b7c1a13","after":"5a735a268be87cd7f9257e20d0f68900b7a69ce7","ref":"refs/heads/master","pushedAt":"2024-05-20T13:17:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation\n\n### What changes were proposed in this pull request?\n\nThis PR replaces AmIpFilter with a forked implementation, and removes the dependency `hadoop-yarn-server-web-proxy`\n\n### Why are the changes needed?\n\nSPARK-47118 upgraded Spark built-in Jetty from 10 to 11, and migrated from `javax.servlet` to `jakarta.servlet`, which breaks the Spark on YARN.\n\n```\nCaused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter\n    at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)\n    at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)\n    at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)\n    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)\n    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)\n    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)\n    at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)\n    ... 38 more\n```\n\nDuring the investigation, I found a comment here https://github.com/apache/spark/pull/31642#issuecomment-786257114\n\n> Agree that in the long term we should either: 1) consider to re-implement the logic in Spark which allows us to get away from server-side dependency in Hadoop ...\n\nThis should be a simple and clean way to address the exact issue, then we don't need to wait for Hadoop `jakarta.servlet` migration, and it also strips a Hadoop dependency.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, this recovers the bootstrap of the Spark application on YARN mode, keeping the same behavior with Spark 3.5 and earlier versions.\n\n### How was this patch tested?\n\nUTs are added. (refer to `org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`)\n\nI tested it in a YARN cluster.\n\nSpark successfully started.\n```\nroothadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17\nWARNING: Using incubator modules: jdk.incubator.vector\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME.\nSpark Web UI available at http://hadoop-master1.orb.local:4040\nSpark master: yarn, Application Id: application_1716005503866_0001\nspark-sql (default)> select version();\n4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01\nTime taken: 1.707 seconds, Fetched 1 row(s)\nspark-sql (default)>\n```\n\nWhen access `http://hadoop-master1.orb.local:4040`, it redirects to `http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`, and the UI looks correct.\n\n<img width=\"1474\" alt=\"image\" src=\"https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae\">\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46611 from pan3793/SPARK-48238.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: yangjie01 <yangjie01@baidu.com>","shortMessageHtmlLink":"[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked impl…"}},{"before":"797b545c26af13768a5a08eb1cc253a0a4b9b14d","after":"e06e909bd948e0d97af19a42d24702411b7c1a13","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-05-16T19:53:17.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite*\n\n### What changes were proposed in this pull request?\nThe pr is follow up https://github.com/apache/spark/pull/46600\n\n to . Similarly, to maintain consistency,  should be renamed to\n\n### Why are the changes needed?\nAfter `org.apache.spark.internal.Logger` is renamed to `org.apache.spark.internal.SparkLogger` and `org.apache.spark.internal.LoggerFactory` is renamed to `org.apache.spark.internal.SparkLoggerFactory.`, the related UT's names should also be `renamed`, so that developers can easily locate the related UT.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46615 from panbingkun/SPARK-48291_follow_up.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Gengliang Wang <gengliang@apache.org>","shortMessageHtmlLink":"[SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLogg…"}},{"before":"c51d0b47c242fe481646e1047fb03d63d3614457","after":"e06e909bd948e0d97af19a42d24702411b7c1a13","ref":"refs/heads/master","pushedAt":"2024-05-16T19:37:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite*\n\n### What changes were proposed in this pull request?\nThe pr is follow up https://github.com/apache/spark/pull/46600\n\n to . Similarly, to maintain consistency,  should be renamed to\n\n### Why are the changes needed?\nAfter `org.apache.spark.internal.Logger` is renamed to `org.apache.spark.internal.SparkLogger` and `org.apache.spark.internal.LoggerFactory` is renamed to `org.apache.spark.internal.SparkLoggerFactory.`, the related UT's names should also be `renamed`, so that developers can easily locate the related UT.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46615 from panbingkun/SPARK-48291_follow_up.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Gengliang Wang <gengliang@apache.org>","shortMessageHtmlLink":"[SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLogg…"}},{"before":"797b545c26af13768a5a08eb1cc253a0a4b9b14d","after":"c51d0b47c242fe481646e1047fb03d63d3614457","ref":"refs/heads/master","pushedAt":"2024-05-15T20:53:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"[SPARK-48256][BUILD] Add a rule to check file headers for the java side, and fix inconsistent files\n\n### What changes were proposed in this pull request?\nThe pr aims to add a `rule` to `check file headers` for `the java side`, and `fix` inconsistent files.\n\n### Why are the changes needed?\nOn the scala side, we have a rule to `check file header`, but we have no corresponding rules on the `java` side to do the same thing.\n\n### Does this PR introduce _any_ user-facing change?\nYes, only for spark developers.\n\n### How was this patch tested?\n- Manually test\n```\nsh dev/lint-java\nUsing `mvn` from path: /Users/panbingkun/Developer/infra/maven/maven/bin/mvn\n-e Checkstyle checks passed.\n```\n- Pass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46557 from panbingkun/java_header_check.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Dongjoon Hyun <dhyun@apple.com>","shortMessageHtmlLink":"[SPARK-48256][BUILD] Add a rule to check file headers for the java si…"}},{"before":"34cb59d34fb81e98134457b7f359c9ff51be49b8","after":"797b545c26af13768a5a08eb1cc253a0a4b9b14d","ref":"refs/heads/0x1CAB5A3","pushedAt":"2024-05-15T20:53:09.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-48250][PYTHON][CONNECT][TESTS] Enable array inference tests at test_parity_types.py\"\n\nThis reverts commit 13b0d1aab36740293814ce54e38cb4d86f8b762d.","shortMessageHtmlLink":"Revert \"[SPARK-48250][PYTHON][CONNECT][TESTS] Enable array inference …"}},{"before":"9b08059a293253043422d1f7e2d3fddd819ba31e","after":"797b545c26af13768a5a08eb1cc253a0a4b9b14d","ref":"refs/heads/master","pushedAt":"2024-05-13T16:13:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a0x8o","name":"Alex","path":"/a0x8o","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22206500?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-48250][PYTHON][CONNECT][TESTS] Enable array inference tests at test_parity_types.py\"\n\nThis reverts commit 13b0d1aab36740293814ce54e38cb4d86f8b762d.","shortMessageHtmlLink":"Revert \"[SPARK-48250][PYTHON][CONNECT][TESTS] Enable array inference …"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEfROwowA","startCursor":null,"endCursor":null}},"title":"Activity · a0x8o/spark"}