addAadSupportInSpark #32393

Merged
merged 29 commits into from Feb 2, 2023

xinlian12
Member

@xinlian12 xinlian12 commented Dec 1, 2022

Description

Add AAD support in Spark.

Why the need

You cannot use any Azure Cosmos DB data plane SDK to authenticate management operations with an Azure AD identity.
Azure Cosmos DB RBAC

Design

Cosmos Config

New config entries have been introduced that allow customers to provide service principal related settings:
spark.cosmos.auth.type
spark.cosmos.account.subscriptionId
spark.cosmos.account.tenantId
spark.cosmos.account.resourceGroupName
spark.cosmos.account.azureEnvironment
spark.cosmos.auth.aad.clientId
spark.cosmos.auth.aad.clientSecret
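
As an illustration of how these entries might be supplied together, here is a minimal sketch that collects them into one options dictionary. The helper name `build_aad_options` is hypothetical, `spark.cosmos.accountEndpoint` is the connector's existing endpoint setting, and the `ServicePrinciple` auth type value and the `Azure` environment default are assumptions based on the naming in this description; check the connector's configuration reference for the accepted values.

```python
from typing import Dict


def build_aad_options(
    account_endpoint: str,
    subscription_id: str,
    tenant_id: str,
    resource_group_name: str,
    client_id: str,
    client_secret: str,
    azure_environment: str = "Azure",  # assumed default, not confirmed by this PR
) -> Dict[str, str]:
    """Collect the service principal settings listed above into one dict."""
    options = {
        "spark.cosmos.accountEndpoint": account_endpoint,
        # Assumed value, following the spelling used in this description.
        "spark.cosmos.auth.type": "ServicePrinciple",
        "spark.cosmos.account.subscriptionId": subscription_id,
        "spark.cosmos.account.tenantId": tenant_id,
        "spark.cosmos.account.resourceGroupName": resource_group_name,
        "spark.cosmos.account.azureEnvironment": azure_environment,
        "spark.cosmos.auth.aad.clientId": client_id,
        "spark.cosmos.auth.aad.clientSecret": client_secret,
    }
    # Fail early if any setting was left empty.
    missing = sorted(key for key, value in options.items() if not value)
    if missing:
        raise ValueError("Missing values for: " + ", ".join(missing))
    return options


# Placeholder values only; none of these identify a real account.
options = build_aad_options(
    account_endpoint="https://example-account.documents.azure.com:443/",
    subscription_id="00000000-0000-0000-0000-000000000000",
    tenant_id="11111111-1111-1111-1111-111111111111",
    resource_group_name="example-rg",
    client_id="22222222-2222-2222-2222-222222222222",
    client_secret="example-secret",
)
```

In a notebook with a live SparkSession, a dictionary like this would typically be passed along with the database and container settings, for example through `spark.read.format("cosmos.oltp").options(**options)`.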

Spark catalog

The catalog is the API surface that lets customers interact with the metadata store. CosmosCatalogBase and CosmosCatalog contain the related implementations, which cover most of the management operations (such as creating a database or creating a container). The idea is to use the Management SDK for the underlying operations when the customer uses ServicePrinciple-based AAD authentication, and the Cosmos DB Java SDK V4 when the customer uses masterKey authentication.
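
The dispatch described above can be sketched as follows. This is a simplified Python illustration, not the connector's Scala implementation: the class and function names are hypothetical stand-ins for the two SDK-backed code paths, and treating `MasterKey` as the default auth type is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict


class CatalogClient(ABC):
    """Hypothetical interface for the management operations the catalog needs."""

    @abstractmethod
    def create_database(self, name: str) -> str:
        ...


class DataPlaneCatalogClient(CatalogClient):
    """Stand-in for the path backed by the Cosmos DB Java SDK V4 (masterKey auth)."""

    def create_database(self, name: str) -> str:
        return f"data-plane SDK: create database {name}"


class ManagementCatalogClient(CatalogClient):
    """Stand-in for the path backed by the Management SDK (AAD auth)."""

    def create_database(self, name: str) -> str:
        return f"management SDK: create database {name}"


def select_catalog_client(config: Dict[str, str]) -> CatalogClient:
    """Pick the backing client from the configured auth type."""
    # Assumption: MasterKey is used when no auth type is configured.
    auth_type = config.get("spark.cosmos.auth.type", "MasterKey")
    if auth_type == "ServicePrinciple":
        return ManagementCatalogClient()
    if auth_type == "MasterKey":
        return DataPlaneCatalogClient()
    raise ValueError(f"Unsupported auth type: {auth_type}")


aad_client = select_catalog_client({"spark.cosmos.auth.type": "ServicePrinciple"})
key_client = select_catalog_client({})
```

Keeping both paths behind one interface means the catalog code that issues the operations does not need to know which authentication mode is active.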

Throughput control

Querying database/container throughput through AAD authentication is also not currently supported by the Cosmos DB data plane SDK, so throughput control needs another way to query throughput successfully.
A new internal method is being introduced that accepts a throughputQueryMono, which will be used in ThroughputContainerController.
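
The idea of injecting the throughput query can be sketched like this. It is a loose Python analogy rather than the actual Java code: a zero-argument callable stands in for the lazily evaluated throughputQueryMono, and every name below is hypothetical.

```python
from typing import Callable, Optional


class ThroughputControllerSketch:
    """Hypothetical controller that can be handed a custom throughput query."""

    def __init__(
        self,
        default_query: Callable[[], int],
        throughput_query: Optional[Callable[[], int]] = None,
    ) -> None:
        # Prefer the injected query (for example one backed by the Management
        # SDK); otherwise fall back to the data plane query.
        self._query = throughput_query if throughput_query is not None else default_query

    def allowed_throughput(self, target_threshold: float) -> float:
        """Return the share of provisioned RU/s this client may consume."""
        # The query runs only when the value is needed, like a Mono on subscribe.
        return self._query() * target_threshold


def data_plane_query() -> int:
    # Stand-in for the data plane call that does not work with AAD auth.
    raise PermissionError("throughput query is not available with AAD auth")


def management_query() -> int:
    # Stand-in for a Management SDK lookup; 4000 RU/s is a made-up value.
    return 4000


controller = ThroughputControllerSketch(data_plane_query, management_query)
budget = controller.allowed_throughput(0.5)
```

With this shape, the controller logic stays the same for both authentication modes and only the source of the throughput value changes.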

Test

Currently the Spark CI pipeline only targets the Cosmos DB Emulator. However, using the emulator to test this change is challenging because:

  • Emulator does not support management operations with AAD authentication
  • Emulator only accepts a pre-crafted AAD token

So the following tests are used as gates:

  • Added a new basicScenarioAad notebook, which can be triggered through /azp run java - cosmos - spark
  • Tested locally with a prod database account and valid ServicePrinciple authentication in CosmosClientCacheITest and CosmosCatalogITest
  • Ran the 01_Batch notebook with both masterKey and AAD auth
  • Will follow up on [BUG]Follow up questions for Management SDK #33245

@ghost ghost added the Cosmos label Dec 1, 2022
@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@xinlian12 xinlian12 force-pushed the aadSupportInSpark branch 3 times, most recently from 86cf08e to 205be9c Compare January 24, 2023 07:09
@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member

LGTM - thanks!

@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 xinlian12 merged commit 6511cff into Azure:main Feb 2, 2023
@XiaofeiCao
Contributor

XiaofeiCao commented Feb 2, 2023

Hi @xinlian12, the azure-core-http-netty version has been bumped to 1.13.0. The version referenced here somehow didn't get bumped correctly.

I've created a PR to bump the version.

The PR is merged, so this should be fixed.


Successfully merging this pull request may close these issues.

[QUERY] Azure CosmosDB Spark OLTP Connector with Managed Identity
5 participants