Insights: apache/spark
Overview
- 0 Active issues
- 0 Merged pull requests
- 88 Open pull requests
- 0 Closed issues
- 0 New issues
88 Pull requests opened by 58 people
-
[SPARK-52923][CORE] Allow ShuffleManager to control push merge during shuffle registration
#51629 opened
Jul 23, 2025 -
[SPARK-52937][SDP] Sinks
#51644 opened
Jul 24, 2025 -
[SPARK-52930][CONNECT] Use DataType.Array/Map for Array/Map Literals
#51653 opened
Jul 24, 2025 -
[SPARK-52953][SQL] Incorrect parameter order in some ExpressionEvalHelper.checkResult() method invocations
#51664 opened
Jul 25, 2025 -
[SPARK-52844][PYTHON][TESTS] Update black to 24.3.0
#51687 opened
Jul 28, 2025 -
[SPARK-52978][SQL] Make FileFormatWriter customizable via SQL configuration
#51690 opened
Jul 28, 2025 -
[SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 Data Source
#51698 opened
Jul 29, 2025 -
[SPARK-53019][SQL] Fix job attempt path conflicts in o.a.hadoop..FileOutputCommitter
#51724 opened
Jul 30, 2025 -
[SPARK-53022][TESTS] Add MemoryConsumerBenchmark
#51728 opened
Jul 30, 2025 -
[SPARK-53038][SQL][HIVE] Call initialize only once per GenericUDF instance
#51743 opened
Jul 31, 2025 -
[SPARK-52844][PYTHON] Update protobuf to 5.29.5
#51747 opened
Jul 31, 2025 -
[SPARK-53044] Change Declarative Pipelines import alias convention from "sdp" to "dp"
#51752 opened
Jul 31, 2025 -
Wip naming sources
#51756 opened
Jul 31, 2025 -
[SPARK-42360][SQL] Rule to convert Left Outer Join with suitable filter to Left Anti Join
#51762 opened
Aug 1, 2025 -
[SPARK-53060] Test to showcase Aggregate followed by ORDER BY doesn't preserve orders
#51768 opened
Aug 1, 2025 -
[SPARK-52844][PYTHON] Update mlflow to 3.1.0
#51774 opened
Aug 1, 2025 -
Fix invalid exit codes and enhance CLI validation tools
#51797 opened
Aug 3, 2025 -
[SPARK-53094][SQL] Fix cube-related data quality problem
#51810 opened
Aug 4, 2025 -
[SPARK-53103][SS] Throw an error if state directory is not empty when query starts
#51817 opened
Aug 4, 2025 -
[SPARK-53113][SQL] Support the time type by try_make_timestamp()
#51824 opened
Aug 4, 2025 -
[SPARK-53108][SQL] Implement the time_diff function in Scala
#51826 opened
Aug 4, 2025 -
[SPARK-53109][SQL] Support TIME in the make_timestamp_ntz and try_make_timestamp_ntz functions in Scala
#51828 opened
Aug 4, 2025 -
[SPARK-53111][SQL][PYTHON][CONNECT] Implement the time_diff function in PySpark
#51829 opened
Aug 4, 2025 -
[SPARK-53105][Structured Streaming] Fix tests for checkpoint v2 in RocksDBSuite
#51834 opened
Aug 4, 2025 -
[SPARK-53125][TEST] RemoteSparkSession prints whole `spark-submit` command
#51846 opened
Aug 5, 2025 -
[SPARK-53127][SQL] Enable LIMIT ALL to override recursion row limit
#51847 opened
Aug 5, 2025 -
[SPARK-53128][CORE] Include unmanaged memory bytes in the usage log before execution memory OOM
#51848 opened
Aug 5, 2025 -
[SPARK-49133][CORE] Make member `MemoryConsumer#used` atomic to avoid user code causing deadlock
#51849 opened
Aug 5, 2025 -
[SPARK-53182][PYTHON][DOCS] Fix broken and missing links in PySpark DataFrames user guide
#51851 opened
Aug 5, 2025 -
[SPARK-53142][SQL] Support dynamic expression addition in SemanticComparator
#51871 opened
Aug 6, 2025 -
[SPARK-53143][SQL] Fix self join in DataFrame API - Join is not the only expected output from analyzer
#51873 opened
Aug 6, 2025 -
[SPARK-53156][CORE] Track Driver Memory Metrics when the Application ends
#51882 opened
Aug 6, 2025 -
[SPARK-53157][CORE] Decouple driver and executor heartbeat polling intervals
#51885 opened
Aug 6, 2025 -
[SPARK-53158][WEBUI] Missing metricsProperties in KV Store should be handled correctly
#51887 opened
Aug 6, 2025 -
[DRAFT] Cluster mode HiveThriftServer2
#51899 opened
Aug 7, 2025 -
[SPARK-53174][CORE] Add TMPDIR environment variable with the value of java.io.tmpdir
#51902 opened
Aug 7, 2025 -
[SPARK-53148][CONNECT][SQL] Make SqlCommand in SparkConnectPlanner side effect free
#51903 opened
Aug 7, 2025 -
[SPARK-53324][K8S] Introduce pending pod limit per ResourceProfile
#51913 opened
Aug 7, 2025 -
[SPARK-53193][DOCS] Add advanced JVM optimization parameters to tuning guide
#51920 opened
Aug 8, 2025 -
[SPARK-53212][PYTHON] improve error handling for scalar Pandas UDFs
#51937 opened
Aug 8, 2025 -
[SPARK-53209][YARN] Add ActiveProcessorCount JVM option to YARN executor and AM
#51948 opened
Aug 9, 2025 -
[SPARK-53230][SQL] Assign a name to error class _LEGACY_ERROR_TEMP_1011
#51955 opened
Aug 10, 2025 -
[SPARK-53207][SDP] Send Pipeline Event to Client Asynchronously
#51956 opened
Aug 10, 2025 -
[WIP][TESTS] Upgrade pypy to 3.11
#51966 opened
Aug 11, 2025 -
[SPARK-53254][PYTHON][TESTS] Skip hanging tests in PyPy daily workflow
#51983 opened
Aug 12, 2025 -
[SPARK-52844][PYTHON] Update pyyaml to 5.4
#51993 opened
Aug 12, 2025 -
[SPARK-53262][SS] Support schema evolution for streaming dedupe operation
#51996 opened
Aug 12, 2025 -
temp:
#52001 opened
Aug 13, 2025 -
[SPARK-53264][SQL][CATALYST]. Incorrect nullability when correlated scalar subquery ge…
#52003 opened
Aug 13, 2025 -
[SPARK-52969][SQL] Support DSv2 OrcScan Dynamic Partition Pruning
#52009 opened
Aug 13, 2025 -
[TEST-ONLY] Test PyPy 7.3.19 with Python 3.10
#52014 opened
Aug 13, 2025 -
[SPARK-52677][SQL] Simplify DataTypeUtils.canWrite and TableOutputResolver
#52016 opened
Aug 13, 2025 -
[SPARK-53273][CONNECT][SQL] Make RegisterUserDefinedFunction in SparkConnectPlanner side effect free
#52026 opened
Aug 14, 2025 -
[SPARK-52336][CORE] Prepend Spark identifier to GCS user agent
#52027 opened
Aug 14, 2025 -
[SPARK-53275][SQL] Handle stateful expressions when ordering in interpreted mode
#52028 opened
Aug 14, 2025 -
[TEST-ONLY] A branch-3.5 PR to check the CI
#52040 opened
Aug 15, 2025 -
[SPARK-53293][SQL] Modify exprIdToOrdinal implementation for speedup on queries with wide tables
#52046 opened
Aug 15, 2025 -
[SPARK-53294][SS] Enable StateDataSource with state checkpoint v2 (only batchId option)
#52047 opened
Aug 15, 2025 -
[SPARK-52982][PYTHON] Disallow lateral join with Arrow Python UDTFs
#52048 opened
Aug 15, 2025 -
[SPARK-53296][CORE] ESS exit main thread in case boss thread exits
#52050 opened
Aug 16, 2025 -
[SPARK-53298][SQL] Make an isolation to control Shuffle partitionSizeInBytes converted from `REBALANCE` hint
#52052 opened
Aug 17, 2025 -
[SPARK-53318][SQL] Support the time type by make_timestamp_ltz()
#52062 opened
Aug 18, 2025 -
[SPARK-53319][SQL] Support the time type by try_make_timestamp_ltz()
#52063 opened
Aug 18, 2025 -
[WIP][SPARK-53292] Make CreateResourceProfileCommand in SparkConnectPlanner side effect free
#52064 opened
Aug 18, 2025 -
[SPARK-52873][SQL] Further restrict when SHJ semi/anti join can ignore duplicate keys on the build side
#52067 opened
Aug 18, 2025 -
[SPARK-53335][K8S] Optionally capture diagnostics for jobs on Kubernetes
#52068 opened
Aug 18, 2025 -
[SPARK-53309][SQL] Allow to propagate extra conf to executor
#52071 opened
Aug 19, 2025 -
[SPARK-53329] Improve exception handling when adding artifacts
#52073 opened
Aug 19, 2025 -
[WIP][SPARK-52090] retry fetch on ExecutorDeadException if block is found on another executor
#52076 opened
Aug 19, 2025 -
[SPARK-53330][SQL][PYTHON] Fix Arrow UDF with DayTimeIntervalType (bounds != start/end)
#52077 opened
Aug 19, 2025 -
[SPARK-53320] Make RegisterUserDefinedTableFunction in SparkConnectPlanner side effect free
#52081 opened
Aug 20, 2025 -
[SPARK-53321] Make RegisterUserDefinedDataSource in SparkConnectPlanner side effect free
#52082 opened
Aug 20, 2025 -
[SPARK-53339][CONNECT] Fix an issue which occurs when an operation in pending state is interrupted
#52083 opened
Aug 20, 2025 -
[SPARK-53337][CORE] Ensure the application name get escaped.
#52084 opened
Aug 20, 2025 -
[WIP] Preparser
#52085 opened
Aug 20, 2025 -
[SPARK-53341][Core] Expand golden test coverage on multivariable DECLARE
#52086 opened
Aug 20, 2025 -
[SPARK-53344][DOCS] Add user guide for Arrow Python UDTFs
#52087 opened
Aug 21, 2025 -
[SPARK-53345][SS][TESTS] Use withTempDir for consistent directory across restarts in streaming test
#52088 opened
Aug 21, 2025 -
[SPARK-53342][SQL] Fix Arrow converter to handle multiple record batches in single IPC stream
#52090 opened
Aug 21, 2025 -
[SPARK-53348] [SQL] Always persist ANSI value when creating a view or assume it when querying if not stored
#52092 opened
Aug 21, 2025 -
[SPARK-53349][SQL] Optimized XML parser can't handle corrupted files correctly
#52093 opened
Aug 22, 2025
37 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[SPARK-52777][SQL] Enable shuffle cleanup mode configuration in Spark SQL
#51458 commented on
Aug 21, 2025 • 21 new comments -
[SPARK-52729][SQL] Add MetadataOnlyTable in DS v2 API
#51419 commented on
Aug 4, 2025 • 15 new comments -
[SPARK-52444][SQL][CONNECT] Add support for Variant/Char/Varchar Literal
#51215 commented on
Jul 28, 2025 • 4 new comments -
[SPARK-52798] [SQL] Add function approx_top_k_combine
#51505 commented on
Jul 24, 2025 • 3 new comments -
[SPARK-52621][SQL] Cast TIME to/from VARIANT
#51553 commented on
Aug 11, 2025 • 2 new comments -
[SPARK-52617][SQL] Cast TIME to/from TIMESTAMP_NTZ
#51381 commented on
Jul 24, 2025 • 1 new comment -
[SPARK-52544][SQL] Allow configuring Json datasource string length limit through SQLConf
#51235 commented on
Aug 8, 2025 • 1 new comment -
[SPARK-51069][SQL] Add big-endian support to UnsafeRowUtils.validateStructuralIntegrityWithReasonImpl
#49773 commented on
Jul 29, 2025 • 1 new comment -
[SPARK-52388][SQL] Handle named and positional parameters under `PlanWithUnresolvedIdentifier`
#51073 commented on
Aug 6, 2025 • 1 new comment -
[SPARK-52449][CONNECT][PYTHON][ML] Make datatypes for Expression.Literal.Map/Array optional
#51473 commented on
Jul 23, 2025 • 1 new comment -
[SPARK-52226] [SQL] Fix unusual equality checks in three operators
#50949 commented on
Jul 29, 2025 • 1 new comment -
[SPARK-52669][PySpark]Improvement PySpark choose pythonExec in cluster yarn client mode
#51357 commented on
Aug 6, 2025 • 0 new comments -
approx_top_k_combine
#51393 commented on
Jul 24, 2025 • 0 new comments -
[SPARK-33737][K8S] Support getting pod state using Informers + Listers
#51396 commented on
Aug 15, 2025 • 0 new comments -
[SPARK-52769][SQL] InjectRuntimeFilter should take into account join type and hints
#51453 commented on
Aug 11, 2025 • 0 new comments -
[SPARK-52807][SDP] Proto changes to support analysis inside Declarative Pipelines query functions
#51502 commented on
Jul 31, 2025 • 0 new comments -
[WIP][SPARK-51169] Set up a daily job for Python 3.14
#51532 commented on
Aug 19, 2025 • 0 new comments -
[WIP][SPARK-52764][PYTHON][ML][CONNECT][TESTS] Retry flaky tests in `test_parity_classification`
#51535 commented on
Jul 24, 2025 • 0 new comments -
[SPARK-52867][SQL] Remove redundant GetTimestamp
#51556 commented on
Jul 29, 2025 • 0 new comments -
[SPARK-52868][SQL] CBO: OOM-risky stats underestimation for some filters and sources
#51558 commented on
Jul 28, 2025 • 0 new comments -
[DRAFT][DO-NOT-REVIEW][SPARK-51XXX][SQL] Enable implicit cast from STRING to TIME type
#51583 commented on
Jul 27, 2025 • 0 new comments -
[SPARK-52407][SQL] Add support for Theta Sketch
#51298 commented on
Aug 10, 2025 • 0 new comments -
[SPARK-50603][SQL] Respect user-provided basePath for streaming file source reads without glob
#51267 commented on
Aug 15, 2025 • 0 new comments -
[SPARK-51168][BUILD] Test Hadoop 3.4.2
#51127 commented on
Aug 18, 2025 • 0 new comments -
Increase report interval of spaming logs to 10 seconds
#51012 commented on
Jul 25, 2025 • 0 new comments -
[SPARK-52020][TEST] Build hive-test-udfs.jar from source
#50790 commented on
Jul 28, 2025 • 0 new comments -
[SPARK-51554][SQL] Add the time_trunc() function for TIME datatype
#50607 commented on
Jul 22, 2025 • 0 new comments -
Enable -Xsource:3 compiler flag
#50474 commented on
Jul 29, 2025 • 0 new comments -
[SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters
#50230 commented on
Jul 23, 2025 • 0 new comments -
[SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files
#50215 commented on
Jul 24, 2025 • 0 new comments -
[SPARK-51243][CORE][ML] Configurable allow native BLAS
#49986 commented on
Aug 8, 2025 • 0 new comments -
[BUILD] Upgrade `RoaringBitmap` to 1.5.2
#49710 commented on
Aug 18, 2025 • 0 new comments -
[SPARK-49547][SQL][PYTHON] Add iterator of `RecordBatch` API to `applyInArrow`
#49005 commented on
Aug 15, 2025 • 0 new comments -
[SPARK-22876][YARN] Respect YARN AM failure validity interval
#42570 commented on
Aug 13, 2025 • 0 new comments -
[SPARK-44639][SS][YARN] Use Java tmp dir for local RocksDB state storage on Yarn
#42301 commented on
Aug 15, 2025 • 0 new comments -
[SPARK-37019][SQL] Add codegen support to array higher-order functions
#34558 commented on
Aug 13, 2025 • 0 new comments -
[SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions
#32987 commented on
Aug 15, 2025 • 0 new comments