
[Cosmos] Migrate Java Cosmos weekly pipelines to TME#48877

Draft
tvaron3 wants to merge 10 commits into Azure:main from tvaron3:users/tomasvaron/migrate-live-tests-to-tme

tvaron3 commented Apr 20, 2026

Description

Migrate the Java Cosmos weekly test pipelines (tests.yml, spark.yml, kafka.yml) from the Microsoft corp tenant to the Azure SDK Test Resources – TME tenant / subscription.

  • Tenant: 70a036f6-8e4d-4615-bad6-149c02e7720d
  • Subscription: 4d042dc6-fe17-4698-a23f-ec6a8d1e98f4
  • New service connection: azure-sdk-tests-cosmos-tme

The TME tenant id is already in $wellKnownTMETenants in eng/common/TestResources/New-TestResources.ps1, so per-run resource groups are automatically prefixed with SSS3PT_ to keep local-auth Cosmos keys from tripping S360. For the one long-lived RG we hard-code (oltp-spark-ci), this PR adds the prefix manually.
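A rough Python model of the naming rule described above may help reviewers reason about which resource groups end up prefixed. It is an assumption-laden sketch, not a transcription of New-TestResources.ps1: `WELL_KNOWN_TME_TENANTS` is a hypothetical stand-in for `$wellKnownTMETenants` that lists only the TME tenant id from this PR, and `resource_group_name` is a hypothetical helper.

```python
# Hypothetical model of the SSS3PT_ naming rule; the real logic lives in
# eng/common/TestResources/New-TestResources.ps1 and may differ in detail.
TME_TENANT_ID = "70a036f6-8e4d-4615-bad6-149c02e7720d"
SSS3PT_PREFIX = "SSS3PT_"

# Stand-in for $wellKnownTMETenants (assumption: only the tenant from this PR).
WELL_KNOWN_TME_TENANTS = {TME_TENANT_ID}


def resource_group_name(base_name: str, tenant_id: str) -> str:
    """Return the RG name to use for `base_name` when deploying into `tenant_id`."""
    in_tme = tenant_id.lower() in WELL_KNOWN_TME_TENANTS
    # Prefix only TME deployments, and never double-prefix a name.
    if in_tme and not base_name.startswith(SSS3PT_PREFIX):
        return SSS3PT_PREFIX + base_name
    return base_name
```

Under this model, `resource_group_name("rg-cosmos-java-weekly", TME_TENANT_ID)` yields `SSS3PT_rg-cosmos-java-weekly`. The hard-coded oltp-spark-ci group in spark.yml never passes through such a code path, which is why the PR writes `SSS3PT_oltp-spark-ci` out by hand.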

Changes

| File | Change |
|---|---|
| sdk/cosmos/tests.yml | ServiceConnection: azure-sdk-tests-cosmos → azure-sdk-tests-cosmos-tme across all 7 stages |
| sdk/cosmos/spark.yml | Replace hardcoded corp SubscriptionId / TenantId with TME values; rename ResourceGroupName: oltp-spark-ci → SSS3PT_oltp-spark-ci across all 6 stages |
| sdk/cosmos/kafka.yml | Update ACCOUNT_TENANT_ID to the TME tenant and switch the service connection to azure-sdk-tests-cosmos-tme |

Draft PR — Manual prerequisites (not in this PR)

This PR only covers the in-repo YAML. The following must be completed out-of-band by eng-sys / Cosmos test owners before the pipelines can run green. I'm marking the PR as Draft until they're all done.

1. Azure DevOps service connection

  • File a request with Azure SDK Eng-Sys (SOP: https://dev.azure.com/azure-sdk/internal/_wiki/wikis/internal.wiki/206/Subscription-and-Tenant-Usage) to provision an SPN in the TME tenant with Contributor on the Azure SDK Test Resources – TME subscription.
  • Have eng-sys create the ADO service connection named azure-sdk-tests-cosmos-tme pointing at that subscription and grant the cosmos weekly pipelines permission to use it.
  • Capture the SPN object id — needed for the kafkaTestApplicationOid follow-up (see §4).

2. TME variable group

Create a pipeline variable group attached to the new service connection. Use the same variable names as the existing corp variable group so that no additional YAML changes are needed. Variables consumed by these pipelines:

  • tests.yml (ThinClient stages): thinclient-test-endpoint, thinclient-test-key, thin-client-canary-multi-region-session-endpoint/key, thin-client-canary-multi-writer-session-endpoint/key
  • spark.yml: spark-databricks-cosmos-endpoint, spark-databricks-cosmos-endpoint-msi, spark-databricks-cosmos-key, spark-databricks-endpoint-with-msi, spark-databricks-token-with-msi, spark-databricks-cosmos-spn-clientId, spark-databricks-cosmos-spn-clientSecret, spark-databricks-cosmos-spn-clientIdCert, spark-databricks-cosmos-spn-clientCertBase64
  • kafka.yml: cosmos-client-telemetry-endpoint, cosmos-client-telemetry-cosmos-account, kafka-mcr-name (also reuses spark-databricks-cosmos-spn-clientId/Secret)
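Because the YAML only works if the new group reuses exactly these names, a small completeness check can be run against whatever list of names the TME group ends up with. The sketch below is hypothetical: `REQUIRED_VARIABLES` is transcribed from the bullets above, the `endpoint/key` and `clientId/Secret` shorthands are assumed to mean separate variables, and how the defined names are exported from Azure DevOps is left open.

```python
# Transcribed from the variable list above; "endpoint/key" shorthand is
# assumed to expand to one "-endpoint" and one "-key" variable.
REQUIRED_VARIABLES: dict[str, list[str]] = {
    "tests.yml": [
        "thinclient-test-endpoint",
        "thinclient-test-key",
        "thin-client-canary-multi-region-session-endpoint",
        "thin-client-canary-multi-region-session-key",
        "thin-client-canary-multi-writer-session-endpoint",
        "thin-client-canary-multi-writer-session-key",
    ],
    "spark.yml": [
        "spark-databricks-cosmos-endpoint",
        "spark-databricks-cosmos-endpoint-msi",
        "spark-databricks-cosmos-key",
        "spark-databricks-endpoint-with-msi",
        "spark-databricks-token-with-msi",
        "spark-databricks-cosmos-spn-clientId",
        "spark-databricks-cosmos-spn-clientSecret",
        "spark-databricks-cosmos-spn-clientIdCert",
        "spark-databricks-cosmos-spn-clientCertBase64",
    ],
    "kafka.yml": [
        "cosmos-client-telemetry-endpoint",
        "cosmos-client-telemetry-cosmos-account",
        "kafka-mcr-name",
        # Reused from the Spark entries.
        "spark-databricks-cosmos-spn-clientId",
        "spark-databricks-cosmos-spn-clientSecret",
    ],
}


def missing_variables(defined: set[str]) -> dict[str, list[str]]:
    """Return, per pipeline, the required variable names absent from `defined`."""
    missing: dict[str, list[str]] = {}
    for pipeline, names in REQUIRED_VARIABLES.items():
        absent = sorted(name for name in names if name not in defined)
        if absent:
            missing[pipeline] = absent
    return missing
```

An empty result from `missing_variables(...)` means every pipeline's variables are covered; a shared name such as `spark-databricks-cosmos-spn-clientId` is reported under both spark.yml and kafka.yml when it is missing.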

3. Long-lived resources to pre-create in TME

All must live in SSS3PT_-prefixed resource groups (e.g. SSS3PT_rg-cosmos-java-weekly, SSS3PT_oltp-spark-ci) to suppress S360 alerts for local-auth:

| # | Resource | Feeds variables | Used by |
|---|---|---|---|
| 1 | Cosmos DB – ThinClient single-region | thinclient-test-endpoint/key | tests.yml Cosmos_Live_Test_ThinClient |
| 2 | Cosmos DB – ThinClient multi-region (session) | thin-client-canary-multi-region-session-endpoint/key | tests.yml ThinClient_MultiRegion |
| 3 | Cosmos DB – ThinClient multi-master (session) | thin-client-canary-multi-writer-session-endpoint/key | tests.yml ThinClient_MultiMaster |
| 4 | Cosmos DB – Spark long-lived | spark-databricks-cosmos-endpoint/key and -msi | spark.yml (all 6 stages) |
| 5 | Azure Databricks workspace + long-lived PAT | spark-databricks-endpoint-with-msi, spark-databricks-token-with-msi | spark.yml (the existing oltp-spark-ci workspace can't be moved cross-tenant, so a brand-new one is required in TME) |
| 6 | Storage account for spark connector jars (analog of oltpsparkcijarstore0326) + cert/SAS | spark-databricks-cosmos-spn-clientIdCert, spark-databricks-cosmos-spn-clientCertBase64 | spark.yml |
| 7 | Service principal for Spark/Kafka driver with data-plane RBAC on (1)–(4) + (8) | spark-databricks-cosmos-spn-clientId/Secret/clientIdCert/clientCertBase64; its object id also feeds kafkaTestApplicationOid (§4) | spark.yml, kafka.yml |
| 8 | Cosmos DB – Kafka client-telemetry sink | cosmos-client-telemetry-endpoint, cosmos-client-telemetry-cosmos-account | kafka.yml |
| 9 | Validate / mirror MCR registry (kafka-mcr-name) reachable from TME pipeline agents | kafka-mcr-name | kafka.yml |

4. Follow-up YAML change (requires SPN object id from §1)

  • Update the default for kafkaTestApplicationOid in sdk/cosmos/test-resources/kafka-testcontainer/test-resources.json (line 36) from the current corp-tenant SPN (3b254cc1-3ecc-4d33-9d61-e867badcef16) to the object id of the TME SPN created in §1. Without this the Cosmos data-plane RBAC role assignment inside New-TestResources.ps1 will fail.
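The change itself is a one-line edit, but a guarded helper reduces the risk of replacing the wrong value. This sketch assumes test-resources.json is a standard ARM template (a top-level `parameters` object whose entries carry `defaultValue`); `retarget_kafka_oid` is a hypothetical helper, and it swaps the GUID as text so the file's formatting stays untouched.

```python
import json
import uuid

# Current corp-tenant default quoted in this PR.
CORP_SPN_OID = "3b254cc1-3ecc-4d33-9d61-e867badcef16"


def retarget_kafka_oid(template_text: str, new_oid: str) -> str:
    """Replace the kafkaTestApplicationOid default (corp SPN) with `new_oid`."""
    uuid.UUID(new_oid)  # Raises ValueError if new_oid is not a GUID.
    template = json.loads(template_text)
    current = template["parameters"]["kafkaTestApplicationOid"].get("defaultValue")
    if current != CORP_SPN_OID:
        raise ValueError(f"unexpected kafkaTestApplicationOid default: {current!r}")
    occurrences = template_text.count(CORP_SPN_OID)
    if occurrences != 1:
        raise ValueError(f"corp SPN OID appears {occurrences} times; edit by hand")
    # Text replacement (rather than json.dumps) keeps the file's formatting.
    updated = template_text.replace(CORP_SPN_OID, new_oid)
    reparsed = json.loads(updated)["parameters"]["kafkaTestApplicationOid"]
    if reparsed.get("defaultValue") != new_oid:
        raise RuntimeError("replacement did not land in kafkaTestApplicationOid")
    return updated
```

Parsing first confirms the corp GUID really is the parameter's default before anything is rewritten; the occurrence count refuses to proceed if the same GUID is also used elsewhere in the file.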

5. Validation (before marking ready for review)

  • Manually run tests.yml against this branch; confirm that the deploy step uses the TME subscription, that the RG is created with the SSS3PT_ prefix, and that all 7 stages complete.
  • Manually run spark.yml against this branch; confirm the Databricks notebook job succeeds on the new TME workspace + Cosmos account.
  • Manually run kafka.yml against this branch.

Testing

Validated the diff is self-consistent (no leftover corp tenant/subscription ids, SSS3PT_ prefix applied to all 6 spark.yml stages, service connection renamed in all 7 tests.yml stages + kafka.yml).
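The self-consistency check described above can be scripted roughly as follows. This is a hypothetical sketch: `audit` and `EXPECTED_COUNTS` are illustrative names, it assumes each stage references the service connection or resource group exactly once, and the old corp tenant/subscription ids are supplied by the caller because this PR does not quote them.

```python
import re

# Matches GUIDs such as tenant, subscription, and object ids.
GUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")
# The old connection name, but not azure-sdk-tests-cosmos-tme.
OLD_CONNECTION_RE = re.compile(r"azure-sdk-tests-cosmos(?![-\w])")

# Assumption: one occurrence per stage (7 tests.yml stages, 6 spark.yml stages).
EXPECTED_COUNTS = {
    ("sdk/cosmos/tests.yml", "azure-sdk-tests-cosmos-tme"): 7,
    ("sdk/cosmos/spark.yml", "SSS3PT_oltp-spark-ci"): 6,
}


def audit(files: dict[str, str], corp_ids: set[str]) -> list[str]:
    """Return human-readable problems found in the given {path: text} map."""
    corp = {guid.lower() for guid in corp_ids}
    problems: list[str] = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for guid in GUID_RE.findall(line):
                if guid.lower() in corp:
                    problems.append(f"{path}:{lineno}: leftover corp id {guid}")
            if OLD_CONNECTION_RE.search(line):
                problems.append(f"{path}:{lineno}: old service connection name")
    for (path, needle), expected in EXPECTED_COUNTS.items():
        actual = files.get(path, "").count(needle)
        if actual != expected:
            problems.append(f"{path}: expected {expected} x {needle!r}, found {actual}")
    return problems
```

In practice `files` would be read from the branch's working tree for sdk/cosmos/tests.yml, spark.yml, and kafka.yml; an empty list means none of the checks fired.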

tvaron3 and others added 7 commits April 16, 2026 15:24
Version bumps and CHANGELOG updates for:
- azure-cosmos-spark_3-3_2-12 4.47.0
- azure-cosmos-spark_3-4_2-12 4.47.0
- azure-cosmos-spark_3-5_2-12 4.47.0
- azure-cosmos-spark_3-5_2-13 4.47.0
- azure-cosmos-spark_4-0_2-13 4.47.0

Features Added:
- Added support for change feed with startFrom point-in-time on merged partitions (PR Azure#48752)

Bugs Fixed:
- Fixed readContainerThroughput unnecessary permission requirement (PR Azure#48800)

Also updated azure-cosmos CHANGELOG to reclassify the startFrom fix as a feature.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Added JVM <clinit> deadlock fix (PR Azure#48689) to all 5 spark connector CHANGELOGs
- Added Known Issues section to Spark 4.0 README for Structured Streaming
  incompatibility with Databricks Runtime 17.3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Updated with accurate details: MetadataVersionUtil$ class removal,
DBR 17.3 includes Spark 4.1 changes while reporting 4.0.0, and
recommendation to stay on previous LTS until DBR 18 LTS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Point tests.yml, spark.yml, and kafka.yml at the Azure SDK Test Resources –
TME tenant/subscription and the new azure-sdk-tests-cosmos-tme service
connection. Prefix the long-lived Spark resource group with SSS3PT_ so
that local-auth Cosmos keys do not trip S360 alerts (see
eng/common/TestResources/New-TestResources.ps1 lines 130/314).

Per-run resource groups created by New-TestResources.ps1 are prefixed
automatically because the TME tenant id is in $wellKnownTMETenants.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tvaron3 and others added 2 commits April 20, 2026 14:24
…grate-live-tests-to-tme

# Conflicts:
#	eng/versioning/version_client.txt
#	sdk/cosmos/azure-cosmos-spark-account-data-resolver-sample/pom.xml
#	sdk/cosmos/azure-cosmos-spark_3-3_2-12/CHANGELOG.md
#	sdk/cosmos/azure-cosmos-spark_3-3_2-12/pom.xml
#	sdk/cosmos/azure-cosmos-spark_3-4_2-12/CHANGELOG.md
#	sdk/cosmos/azure-cosmos-spark_3-4_2-12/pom.xml
#	sdk/cosmos/azure-cosmos-spark_3-5_2-12/CHANGELOG.md
#	sdk/cosmos/azure-cosmos-spark_3-5_2-12/pom.xml
#	sdk/cosmos/azure-cosmos-spark_3-5_2-13/CHANGELOG.md
#	sdk/cosmos/azure-cosmos-spark_3-5_2-13/pom.xml
#	sdk/cosmos/azure-cosmos-spark_4-0_2-13/CHANGELOG.md
#	sdk/cosmos/azure-cosmos-spark_4-0_2-13/pom.xml
#	sdk/cosmos/fabric-cosmos-spark-auth_3/pom.xml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Override CloudConfig.Public.ServiceConnection to azure-sdk-tests-cosmos-tme
for the IT_Cosmos and Spring_Data_Cosmos_Integration stages in
sdk/spring/tests.yml. Thread a CloudConfig passthrough parameter through
tests-supported-spring-versions-template.yml and
tests-supported-spring-versions-filter-template.yml so the override reaches
archetype-sdk-tests-isolated.yml. Defaults are unchanged so non-cosmos
Spring stages (AppConfig, ServiceBus, EventHubs_Storage, KeyVault,
AppConfig_IT) continue to use their current service connections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>