-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance cosmos aad token error logging #37977
enhance cosmos aad token error logging #37977
Conversation
API change check APIView has identified API level changes in this PR and created following API reviews. |
@@ -310,6 +323,7 @@ private Mono<DatabaseAccount> getDatabaseAccountAsync(URI serviceEndpoint) { | |||
.doOnNext(databaseAccount -> { | |||
if(databaseAccount != null) { | |||
this.latestDatabaseAccount = databaseAccount; | |||
this.setLatestDatabaseRefreshError(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting this to null might reflect as NullPointerException
somewhere, IMO, it would be better to set it to empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I think as ong as it is consumed correctly it shoudl be fine. Initial version was indeed not thread-safe - I have changed that part
...cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GlobalEndpointManager.java
Outdated
Show resolved
Hide resolved
@@ -310,6 +323,7 @@ private Mono<DatabaseAccount> getDatabaseAccountAsync(URI serviceEndpoint) { | |||
.doOnNext(databaseAccount -> { | |||
if(databaseAccount != null) { | |||
this.latestDatabaseAccount = databaseAccount; | |||
this.setLatestDatabaseRefreshError(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering whether we should track the exception itself instead of the error message, and then later when new RunTimeException(message, throwable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I made change to track exception, which I think addresses also Kushagra concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really - but with my changes it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In RxDocumentClientImp - the important part is that the throwable in both logging and when wrapping in RuntimException should be passed in as cause - doing that with null is fine. For consistency you also have to use a snapshot - otherwise the value might have changed between logging and throwing inner exception
@@ -543,8 +543,10 @@ private void initializeGatewayConfigurationReader() { | |||
if (databaseAccount == null) { | |||
logger.error("Client initialization failed." | |||
+ " Check if the endpoint is reachable and if your auth token is valid. More info: https://aka.ms/cosmosdb-tsg-service-unavailable-java"); | |||
logger.error(" More error details: "+this.globalEndpointManager.getLatestDatabaseRefreshError()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is not thread-safe - getLatestDatabaseRefreshError could have been changed (set to null) between these two invocations - instead capture a snapshot - then you can safely log it and throw it if != null - also the refresh error should be passed as inner exception in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
@@ -79,6 +80,27 @@ public void createAadTokenCredential() throws InterruptedException { | |||
|
|||
Thread.sleep(TIMEOUT); | |||
|
|||
TokenCredential dummyServicePrincipal = new ClientSecretCredentialBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also track the change in changelog :)
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
Description
Currently when customers use AAD/Microsoft Entra ID and there is a problem, the error message is not sufficiently informative/helpful - it is hard coded when DatabaseAccount is
null
after client initialization :"Check if the endpoint is reachable and if your auth token is valid. More info: https://aka.ms/cosmosdb-tsg-service-unavailable-java"
There are multiple scenarios possible (i.e. AAD endpoint not reachable, invalid secret, invalid tenant id, client id, etc) and ideally user needs to see the exception that actually comes back from AAD code paths if the problem lies there.
Change is to capture the error logged earlier in the flow that causes DatabaseAccount to be null, and throw error details along with current error text so it is clear what the AAD related error is.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines