Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance cosmos aad token error logging #37977

Merged
merged 13 commits into from
Dec 12, 2023

Conversation

TheovanKraay
Copy link
Member

@TheovanKraay TheovanKraay commented Dec 7, 2023

Description

Currently when customers use AAD/Microsoft Entra ID and there is a problem, the error message is not sufficiently informative/helpful - it is hard coded when DatabaseAccount is null after client initialization :

"Check if the endpoint is reachable and if your auth token is valid. More info: https://aka.ms/cosmosdb-tsg-service-unavailable-java"

There are multiple scenarios possible (i.e. AAD endpoint not reachable, invalid secret, invalid tenant id, client id, etc) and ideally user needs to see the exception that actually comes back from AAD code paths if the problem lies there.

Change is to capture the error logged earlier in the flow that causes DatabaseAccount to be null, and throw error details along with current error text so it is clear what the AAD related error is.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@azure-sdk
Copy link
Collaborator

azure-sdk commented Dec 7, 2023

API change check

APIView has identified API level changes in this PR and created following API reviews.

com.azure:azure-cosmos

@TheovanKraay TheovanKraay changed the title enhance cosmos aad token validation enhance cosmos aad token error logging Dec 8, 2023
@@ -310,6 +323,7 @@ private Mono<DatabaseAccount> getDatabaseAccountAsync(URI serviceEndpoint) {
.doOnNext(databaseAccount -> {
if(databaseAccount != null) {
this.latestDatabaseAccount = databaseAccount;
this.setLatestDatabaseRefreshError(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this to null might reflect as NullPointerException somewhere, IMO, it would be better to set it to empty string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I think as ong as it is consumed correctly it shoudl be fine. Initial version was indeed not thread-safe - I have changed that part

@@ -310,6 +323,7 @@ private Mono<DatabaseAccount> getDatabaseAccountAsync(URI serviceEndpoint) {
.doOnNext(databaseAccount -> {
if(databaseAccount != null) {
this.latestDatabaseAccount = databaseAccount;
this.setLatestDatabaseRefreshError(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we should track the exception itself instead of the error message, and then later when new RunTimeException(message, throwable)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I made change to track exception, which I think addresses also Kushagra concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really - but with my changes it should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In RxDocumentClientImp - the important part is that the throwable in both logging and when wrapping in RuntimException should be passed in as cause - doing that with null is fine. For consistency you also have to use a snapshot - otherwise the value might have changed between logging and throwing inner exception

@@ -543,8 +543,10 @@ private void initializeGatewayConfigurationReader() {
if (databaseAccount == null) {
logger.error("Client initialization failed."
+ " Check if the endpoint is reachable and if your auth token is valid. More info: https://aka.ms/cosmosdb-tsg-service-unavailable-java");
logger.error(" More error details: "+this.globalEndpointManager.getLatestDatabaseRefreshError());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not thread-safe - getLatestDatabaseRefreshError could have been changed (set to null) between these two invocations - instead capture a snapshot - then you can safely log it and throw it if != null - also the refresh error should be passed as inner exception in both cases.

Copy link
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

Copy link
Member

@xinlian12 xinlian12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@@ -79,6 +80,27 @@ public void createAadTokenCredential() throws InterruptedException {

Thread.sleep(TIMEOUT);

TokenCredential dummyServicePrincipal = new ClientSecretCredentialBuilder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also track the change in changelog :)

Copy link
Member

@jeet1995 jeet1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@TheovanKraay TheovanKraay merged commit 73604e7 into Azure:main Dec 12, 2023
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants