
[SparkConnector][No Review] Fix NoClassDefFoundError for MetadataVersionUtil #48837

Merged
FabianMeiswinkel merged 10 commits into Azure:main from xinlian12:fix/cosmos-spark-metadataversion-noclass
Apr 17, 2026

Conversation

@xinlian12
Member

@xinlian12 xinlian12 commented Apr 16, 2026

Description

Fixes a NoClassDefFoundError for MetadataVersionUtil in the Cosmos Spark connector that occurs on certain Spark distributions (e.g., Databricks Runtime 17.3+) where MetadataVersionUtil has been relocated or removed.

Problem

ChangeFeedInitialOffsetWriter directly references org.apache.spark.sql.execution.streaming.MetadataVersionUtil, which is an internal Spark class. Some Spark distributions relocate or remove this class, causing a NoClassDefFoundError at runtime when the change feed offset writer attempts to deserialize a log file.

Solution

Inline the validateVersion logic from MetadataVersionUtil into a private companion object of ChangeFeedInitialOffsetWriter, eliminating the runtime dependency on MetadataVersionUtil. The inlined implementation preserves the same validation semantics:

  • Parses the version string (e.g., "v1") from the log file header
  • Validates the version is within the supported range
  • Throws IllegalStateException with descriptive messages for malformed or unsupported versions
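The inlined helper presumably mirrors Spark's own validateVersion. A minimal self-contained sketch of the semantics listed above (object and method names follow the PR description; the exact merged code may differ):

```scala
// Sketch of the inlined validation logic described above. It mirrors the
// semantics of Spark's MetadataVersionUtil.validateVersion; this is an
// illustration, not the exact code merged in this PR.
object ChangeFeedInitialOffsetWriter {
  // Parses a "v<N>" version string from the offset log header and validates
  // it against the highest log version this reader supports.
  def validateVersion(text: String, maxSupportedVersion: Int): Int = {
    if (text.nonEmpty && text(0) == 'v') {
      val version =
        try {
          text.substring(1).toInt
        } catch {
          case _: NumberFormatException =>
            throw new IllegalStateException(
              s"Log file was malformed: failed to read correct log version from $text.")
        }
      if (version > 0) {
        if (version > maxSupportedVersion) {
          throw new IllegalStateException(
            s"UnsupportedLogVersion: maximum supported log version is " +
              s"v$maxSupportedVersion, but encountered v$version.")
        }
        return version
      }
    }
    throw new IllegalStateException(
      s"Log file was malformed: failed to read correct log version from $text.")
  }
}
```

Malformed headers ("1", "v", "v0", "v-1", non-numeric) all surface as IllegalStateException with a descriptive message, matching the semantics above.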

Changes

  • ChangeFeedInitialOffsetWriter.scala: Removed the import of MetadataVersionUtil, replaced the call to MetadataVersionUtil.validateVersion(...) with a local ChangeFeedInitialOffsetWriter.validateVersion(...), and added a companion object with the inlined validation logic.

Tests

  • Added a Spark live test for change feed streaming
  • Manual testing on Databricks Runtime 17.3
  • Also confirmed the code path for HDFSMetadataLog

…ector

Inline version validation logic in ChangeFeedInitialOffsetWriter instead
of depending on Spark-internal MetadataVersionUtil, which has been
relocated in Databricks Runtime 17.3 LTS (Spark 4.0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 marked this pull request as ready for review April 16, 2026 21:39
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners April 16, 2026 21:39
Copilot AI review requested due to automatic review settings April 16, 2026 21:39
@xinlian12 xinlian12 changed the title from Fix NoClassDefFoundError for MetadataVersionUtil in Cosmos Spark conn… to [SparkConnector][No Review] Fix NoClassDefFoundError for MetadataVersionUtil on Apr 16, 2026
@xinlian12
Member Author

@sdkReviewAgent

Contributor

Copilot AI left a comment


Pull request overview

This PR removes a runtime dependency on Spark’s MetadataVersionUtil (which can be relocated in some Spark distributions) by inlining equivalent log-version validation logic into the Cosmos Spark connector’s change feed offset metadata reader/writer.

Changes:

  • Removed the import/reference to org.apache.spark.sql.execution.streaming.MetadataVersionUtil.
  • Added an internal validateVersion implementation and switched deserialize to use it.

@xinlian12
Member Author

@sdkReviewAgent

@xinlian12
Member Author

Review complete (14:39)

No new comments — existing review coverage is sufficient.

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

@FabianMeiswinkel
Member

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

xinlian12 and others added 2 commits April 16, 2026 15:12
Add ChangeFeedInitialOffsetWriterSpec with tests covering:
- Valid version strings within supported range
- Version exceeding max supported (UnsupportedLogVersion)
- Malformed versions: non-numeric, empty, missing v prefix, v0, negative, bare v

Widen companion object visibility to private[spark] for testability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…st notebooks

Add structured streaming scenarios using cosmos.oltp.changeFeed to both
basicScenario.scala and basicScenarioAadManagedIdentity.scala notebooks.
These scenarios exercise the ChangeFeedInitialOffsetWriter and
HDFSMetadataLog code paths that can break on certain Spark distributions
(e.g. Databricks Runtime 17.3+).

Each scenario:
- Creates a sink container
- Reads change feed from source via readStream with micro-batch
- Writes to sink container via writeStream
- Validates records were copied
- Cleans up both containers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

xinlian12 and others added 2 commits April 16, 2026 20:01
Use file:/tmp/ instead of /tmp/ for checkpoint location to avoid DBFS
access issues on Unity Catalog-enabled Databricks clusters. Also:
- Remove unused Trigger import
- Stop query before reading sink to avoid race conditions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace cosmos.oltp sink with in-memory sink to eliminate the need for
a separate sink container. This avoids 404 errors from sink container
creation/resolution and removes checkpoint path concerns.

The test still exercises the full ChangeFeedInitialOffsetWriter and
HDFSMetadataLog code paths (readStream with cosmos.oltp.changeFeed),
which is the goal for validating the MetadataVersionUtil fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
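Put together, the streaming scenario described in these commits might look roughly like the following sketch. All option values, the query name, and the checkpoint path are placeholders rather than the actual notebook code, and running it requires a live Spark session and Cosmos DB account:

```scala
// Illustrative sketch only: exercises the readStream path
// (ChangeFeedInitialOffsetWriter / HDFSMetadataLog) and writes to an
// in-memory sink so no sink container is needed.
val changeFeedCfg = Map(
  "spark.cosmos.accountEndpoint" -> "<accountEndpoint>",   // placeholder
  "spark.cosmos.accountKey"      -> "<accountKey>",        // placeholder
  "spark.cosmos.database"        -> "<database>",          // placeholder
  "spark.cosmos.container"       -> "<sourceContainer>",   // placeholder
  "spark.cosmos.read.inferSchema.enabled" -> "false"
)

val stream = spark.readStream
  .format("cosmos.oltp.changeFeed")
  .options(changeFeedCfg)
  .load()

val query = stream.writeStream
  .format("memory")                       // in-memory sink: no sink container
  .queryName("changeFeedValidation")
  .option("checkpointLocation", "file:/tmp/changefeed-checkpoint/")
  .start()

query.processAllAvailable()
query.stop()                              // stop before reading the sink
val copied = spark.sql("SELECT * FROM changeFeedValidation").count()
```

Using `file:/tmp/` for the checkpoint location and stopping the query before counting the sink match the adjustments described in the commit messages above.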
@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Both notebooks now use the same pattern: derive changeFeedCfg from the
existing cfg map (which already has the correct auth config) plus the
change feed-specific options. Write to an in-memory sink to avoid
container creation issues. This ensures both key-based and AAD/MSI
notebooks exercise identical streaming logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

The MSI notebook shares a cluster with basicScenario, and the Cosmos
client cache retains references from the first notebook's proactive
connection init. When basicScenario drops the source container during
cleanup, the MSI notebook's change feed streaming fails with 404 on
the cached (now-deleted) container. The change feed streaming test in
basicScenario already provides sufficient coverage for the
ChangeFeedInitialOffsetWriter code paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xinlian12 and others added 2 commits April 16, 2026 22:03
Add detailed logging to capture:
- Endpoint, database, container, auth config used
- Source container record count before streaming
- Streaming query ID
- Full exception details on failure

This will help diagnose why the change feed streaming fails
on the MSI notebook but succeeds on the key-based one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The MSI change feed test passes on a fresh cluster but fails when
basicScenario runs first on the same cluster without restart. The
basicScenario leaves cached Cosmos client state (proactive connection
init on the ephemeral endpoint) that causes the MSI streaming query
to resolve to the wrong endpoint, resulting in a 404. The change feed
test in basicScenario provides sufficient coverage for the
ChangeFeedInitialOffsetWriter/HDFSMetadataLog code paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@FabianMeiswinkel FabianMeiswinkel left a comment


LGTM

@FabianMeiswinkel FabianMeiswinkel merged commit df7614a into Azure:main Apr 17, 2026
36 checks passed
tvaron3 added a commit to tvaron3/azure-sdk-for-java that referenced this pull request Apr 17, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tvaron3 added a commit that referenced this pull request Apr 17, 2026
* Release azure-cosmos-spark 4.47.0

Version bumps and CHANGELOG updates for:
- azure-cosmos-spark_3-3_2-12 4.47.0
- azure-cosmos-spark_3-4_2-12 4.47.0
- azure-cosmos-spark_3-5_2-12 4.47.0
- azure-cosmos-spark_3-5_2-13 4.47.0
- azure-cosmos-spark_4-0_2-13 4.47.0

Features Added:
- Added support for change feed with startFrom point-in-time on merged partitions (PR #48752)

Bugs Fixed:
- Fixed readContainerThroughput unnecessary permission requirement (PR #48800)

Also updated azure-cosmos CHANGELOG to reclassify the startFrom fix as a feature.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: add clinit fix to CHANGELOGs and DBR 17.3 known issue

- Added JVM <clinit> deadlock fix (PR #48689) to all 5 spark connector CHANGELOGs
- Added Known Issues section to Spark 4.0 README for Structured Streaming
  incompatibility with Databricks Runtime 17.3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reword DBR 17.3 known issue based on IcM 779484786

Updated with accurate details: MetadataVersionUtil$ class removal,
DBR 17.3 includes Spark 4.1 changes while reporting 4.0.0, and
recommendation to stay on previous LTS until DBR 18 LTS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove DBR 17.3 known issue - will be fixed before release

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update spark release date to 2026-04-17

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add MetadataVersionUtil fix to Spark 4.0 CHANGELOG (PR #48837)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
