[SparkConnector][No Review] Fix NoClassDefFoundError for MetadataVersionUtil #48837
Merged
FabianMeiswinkel merged 10 commits into Azure:main on Apr 17, 2026
Conversation
…ector
Inline version validation logic in ChangeFeedInitialOffsetWriter instead of depending on the Spark-internal MetadataVersionUtil, which has been relocated in Databricks Runtime 17.3 LTS (Spark 4.0).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member (Author):
@sdkReviewAgent

Contributor:
Pull request overview
This PR removes a runtime dependency on Spark’s MetadataVersionUtil (which can be relocated in some Spark distributions) by inlining equivalent log-version validation logic into the Cosmos Spark connector’s change feed offset metadata reader/writer.
Changes:
- Removed the import/reference to org.apache.spark.sql.execution.streaming.MetadataVersionUtil.
- Added an internal validateVersion implementation and switched deserialize to use it.
Member (Author):
@sdkReviewAgent

Member (Author):
✅ Review complete (14:39). No new comments; existing review coverage is sufficient. Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
Member:
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
Add ChangeFeedInitialOffsetWriterSpec with tests covering:
- Valid version strings within the supported range
- Version exceeding max supported (UnsupportedLogVersion)
- Malformed versions: non-numeric, empty, missing v prefix, v0, negative, bare v
Widen companion object visibility to private[spark] for testability.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…st notebooks
Add structured streaming scenarios using cosmos.oltp.changeFeed to both the basicScenario.scala and basicScenarioAadManagedIdentity.scala notebooks. These scenarios exercise the ChangeFeedInitialOffsetWriter and HDFSMetadataLog code paths that can break on certain Spark distributions (e.g. Databricks Runtime 17.3+). Each scenario:
- Creates a sink container
- Reads the change feed from the source via readStream with micro-batches
- Writes to the sink container via writeStream
- Validates that records were copied
- Cleans up both containers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member (Author):
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
Use file:/tmp/ instead of /tmp/ for the checkpoint location to avoid DBFS access issues on Unity Catalog-enabled Databricks clusters. Also:
- Remove the unused Trigger import
- Stop the query before reading the sink to avoid race conditions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace cosmos.oltp sink with in-memory sink to eliminate the need for a separate sink container. This avoids 404 errors from sink container creation/resolution and removes checkpoint path concerns. The test still exercises the full ChangeFeedInitialOffsetWriter and HDFSMetadataLog code paths (readStream with cosmos.oltp.changeFeed), which is the goal for validating the MetadataVersionUtil fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
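In notebook form, the scenario described in this commit looks roughly like the sketch below. This is an illustrative reconstruction, not the exact notebook code: it assumes `spark` (the notebook's SparkSession) and a `changeFeedCfg` map carrying the endpoint/auth/database/container options, and the query name `changeFeedSink` is made up for the example.

```scala
// Sketch only — runs inside a Spark/Databricks notebook where `spark` exists
// and changeFeedCfg already carries endpoint, auth, database, and container options.
val df = spark.readStream
  .format("cosmos.oltp.changeFeed")   // exercises ChangeFeedInitialOffsetWriter + HDFSMetadataLog
  .options(changeFeedCfg)
  .load()

val query = df.writeStream
  .format("memory")                   // in-memory sink: no sink container needed
  .queryName("changeFeedSink")
  .outputMode("append")
  .start()

query.processAllAvailable()           // drain the currently available micro-batches
query.stop()                          // stop before reading the sink to avoid races

val copied = spark.sql("SELECT COUNT(*) FROM changeFeedSink").first().getLong(0)
assert(copied > 0, "expected change feed records in the in-memory sink")
```

Reading from `cosmos.oltp.changeFeed` is what forces the offset metadata log to be written and deserialized, so even with a memory sink the `validateVersion` path is covered.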
Member (Author):
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member (Author):
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
Both notebooks now use the same pattern: derive changeFeedCfg from the existing cfg map (which already has the correct auth config) plus the change feed-specific options. Write to an in-memory sink to avoid container creation issues. This ensures both key-based and AAD/MSI notebooks exercise identical streaming logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
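The derived-config pattern above can be sketched as follows. The endpoint, database, and container values are placeholders, and the exact option set is an assumption based on the documented Cosmos Spark connector option names; the point is only that `changeFeedCfg` inherits auth from the shared `cfg` map.

```scala
// Hypothetical base config; the real notebooks build cfg with the correct
// auth settings (account key or AAD/MSI) already in place.
val cfg = Map(
  "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
  "spark.cosmos.database" -> "SampleDatabase",
  "spark.cosmos.container" -> "SampleContainer"
)

// Derive the change feed config from cfg so auth settings are inherited,
// then layer on the change feed-specific options.
val changeFeedCfg = cfg ++ Map(
  "spark.cosmos.read.inferSchema.enabled" -> "false",
  "spark.cosmos.changeFeed.startFrom" -> "Beginning",
  "spark.cosmos.changeFeed.mode" -> "Incremental"
)
```

Because `++` merges maps with the right-hand side winning, any change-feed-specific override takes effect while the shared auth entries pass through untouched.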
Member (Author):
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
The MSI notebook shares a cluster with basicScenario, and the Cosmos client cache retains references from the first notebook's proactive connection init. When basicScenario drops the source container during cleanup, the MSI notebook's change feed streaming fails with 404 on the cached (now-deleted) container. The change feed streaming test in basicScenario already provides sufficient coverage for the ChangeFeedInitialOffsetWriter code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add detailed logging to capture:
- Endpoint, database, container, and auth config used
- Source container record count before streaming
- Streaming query ID
- Full exception details on failure
This will help diagnose why the change feed streaming fails on the MSI notebook but succeeds on the key-based one.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The MSI change feed test passes on a fresh cluster but fails when basicScenario runs first on the same cluster without restart. The basicScenario leaves cached Cosmos client state (proactive connection init on the ephemeral endpoint) that causes the MSI streaming query to resolve to the wrong endpoint, resulting in a 404. The change feed test in basicScenario provides sufficient coverage for the ChangeFeedInitialOffsetWriter/HDFSMetadataLog code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member (Author):
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
tvaron3 added a commit to tvaron3/azure-sdk-for-java that referenced this pull request on Apr 17, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tvaron3 added a commit that referenced this pull request on Apr 17, 2026
* Release azure-cosmos-spark 4.47.0
  Version bumps and CHANGELOG updates for:
  - azure-cosmos-spark_3-3_2-12 4.47.0
  - azure-cosmos-spark_3-4_2-12 4.47.0
  - azure-cosmos-spark_3-5_2-12 4.47.0
  - azure-cosmos-spark_3-5_2-13 4.47.0
  - azure-cosmos-spark_4-0_2-13 4.47.0
  Features Added:
  - Added support for change feed with startFrom point-in-time on merged partitions (PR #48752)
  Bugs Fixed:
  - Fixed readContainerThroughput unnecessary permission requirement (PR #48800)
  Also updated the azure-cosmos CHANGELOG to reclassify the startFrom fix as a feature.
* Address PR review: add clinit fix to CHANGELOGs and DBR 17.3 known issue
  - Added the JVM <clinit> deadlock fix (PR #48689) to all 5 Spark connector CHANGELOGs
  - Added a Known Issues section to the Spark 4.0 README for Structured Streaming incompatibility with Databricks Runtime 17.3
* Reword DBR 17.3 known issue based on IcM 779484786
  Updated with accurate details: MetadataVersionUtil$ class removal, DBR 17.3 includes Spark 4.1 changes while reporting 4.0.0, and a recommendation to stay on the previous LTS until DBR 18 LTS.
* Remove DBR 17.3 known issue - will be fixed before release
* Update spark release date to 2026-04-17
* Add MetadataVersionUtil fix to Spark 4.0 CHANGELOG (PR #48837)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Fixes a NoClassDefFoundError for MetadataVersionUtil in the Cosmos Spark connector that occurs on certain Spark distributions (e.g., Databricks Runtime 17.3+) where MetadataVersionUtil has been relocated or removed.

Problem

ChangeFeedInitialOffsetWriter directly references org.apache.spark.sql.execution.streaming.MetadataVersionUtil, which is an internal Spark class. Some Spark distributions relocate or remove this class, causing a NoClassDefFoundError at runtime when the change feed offset writer attempts to deserialize a log file.

Solution

Inline the validateVersion logic from MetadataVersionUtil into a private companion object (the ChangeFeedInitialOffsetWriter object) to eliminate the runtime dependency on MetadataVersionUtil. The inlined implementation preserves the same validation semantics:
- Parses the version string (e.g. "v1") from the log file header
- Throws IllegalStateException with descriptive messages for malformed or unsupported versions

Changes

ChangeFeedInitialOffsetWriter.scala: Removed the import of MetadataVersionUtil, replaced the call to MetadataVersionUtil.validateVersion(...) with a local ChangeFeedInitialOffsetWriter.validateVersion(...), and added a companion object with the inlined validation logic.

Tests

HDFSMetadataLog
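The inlined validation can be sketched as below. This is a hypothetical reconstruction (object name and messages are assumptions, and the real helper lives in the ChangeFeedInitialOffsetWriter companion object); it mirrors the semantics of Spark's MetadataVersionUtil.validateVersion rather than reproducing the connector's exact source.

```scala
object VersionValidationSketch {
  // Validate a metadata log header like "v1": require a leading 'v',
  // a positive integer, and a value no greater than maxSupportedVersion.
  def validateVersion(text: String, maxSupportedVersion: Int): Int = {
    if (text != null && text.nonEmpty && text(0) == 'v') {
      val version =
        try {
          text.substring(1).toInt
        } catch {
          case _: NumberFormatException =>
            throw new IllegalStateException(
              s"Log file was malformed: failed to read correct log version from '$text'.")
        }
      if (version > 0) {
        if (version > maxSupportedVersion) {
          throw new IllegalStateException(
            s"UnsupportedLogVersion: maximum supported log version is v$maxSupportedVersion, " +
              s"but encountered v$version.")
        }
        return version
      }
    }
    // Covers empty strings, missing 'v' prefix, v0, negative versions, and bare "v".
    throw new IllegalStateException(
      s"Log file was malformed: failed to read correct log version from '$text'.")
  }
}
```

Because the logic is a few lines of string parsing, inlining it removes the fragile dependency on Spark's internal class while keeping deserialize's behavior identical for both valid and malformed headers.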