
[Bug] Public cloud regionalization breaks existing services before 4.49.x #4022

Closed
tomkerkhove opened this issue Mar 22, 2023 · 9 comments

@tomkerkhove

Logs and network traces
Without logs or traces, it is unlikely that the team can investigate your issue. Capturing logs and network traces is described in the Logging wiki.

Which version of MSAL.NET are you using?

4.49.1

Platform

What authentication flow has the issue?

  • Desktop / Mobile
    • Interactive
    • Integrated Windows Authentication
    • Username Password
    • Device code flow (browserless)
  • Web app
    • Authorization code
    • On-Behalf-Of
  • Daemon app
    • Service to Service calls

Other?

Relates to #3252

Is this a new or existing app?

The 1P service is in production, and we have upgraded to MSAL 4.49.1 or above. This breaks customers who use VNETs, because they now have to allow traffic to the new regional endpoints.

Repro

builder.WithAzureRegion(region);
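
For context, the call above sits in a standard confidential client setup, roughly like this (the identifiers are placeholders and the region value is illustrative):

```csharp
using Microsoft.Identity.Client;

// Placeholders for real app registration values; WithAzureRegion is the call
// that routes token requests to a regional ESTS endpoint.
var app = ConfidentialClientApplicationBuilder
    .Create("<client-id>")
    .WithClientSecret("<client-secret>")
    .WithAuthority("https://login.microsoftonline.com/<tenant-id>")
    .WithAzureRegion("westeurope") // or ConfidentialClientApplication.AttemptRegionDiscovery
    .Build();

AuthenticationResult result = await app
    .AcquireTokenForClient(new[] { "https://graph.microsoft.com/.default" })
    .ExecuteAsync();
```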

Expected behavior

Upgrading MSAL should not break existing customers.

Actual behavior

Customers have to change network to allow traffic to be sent to new regional endpoints.

Possible solution

Make the DNS suffix configurable so that 1P services can use a later version of the package but still use the old DNS suffix.

For example:
builder.WithAzureRegion(region, regionalHostSuffix: "r.login.microsoftonline.com")

@tomkerkhove tomkerkhove changed the title [Bug] Public cloud regionalization breaks existing services before 4.49 [Bug] Public cloud regionalization breaks existing services before 4.49.x Mar 22, 2023
@bgavrilMS
Member

bgavrilMS commented Mar 22, 2023

Hi @tomkerkhove - can you please contact us via email so that we can add the ESTS-R folks responsible for the DNS? bogavril

@bgavrilMS bgavrilMS closed this as not planned Mar 22, 2023
@bgavrilMS bgavrilMS reopened this Mar 30, 2023
@bgavrilMS
Member

After discussions with the ESTS-R team, we have reached the conclusion that we cannot enforce the new alias, as some end-app devs block it and it is outside our control.

We need a mechanism to:

  • perform the regional call on <region>.microsoft.login.com (like today)
  • catch exception and log
  • retry with <region>.r.login.microsoftonline.com

I propose we use either HttpClient extensibility or OnBeforeTokenRequest to achieve this. https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/blob/main/src/client/Microsoft.Identity.Client/Extensibility/AbstractConfidentialClientAcquireTokenParameterBuilderExtension.cs#L23

MSAL team to provide a POC.
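
For illustration, a rough sketch of what such a POC could look like via the HttpClient extensibility route. The handler and factory classes below are assumptions, not MSAL APIs; only `IMsalHttpClientFactory` (and the builder's `WithHttpClientFactory`) is a real extension point:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Identity.Client;

// Illustrative fallback handler (not an MSAL API): sends the token request to the
// legacy regional host and, on a network-level failure, logs and retries once
// against the new <region>.r.login.microsoftonline.com alias.
public class RegionalFallbackHandler : DelegatingHandler
{
    public RegionalFallbackHandler() : base(new HttpClientHandler()) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        try
        {
            return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        }
        catch (HttpRequestException ex)
        {
            string host = request.RequestUri!.Host;

            // Only fall back for regional ESTS hosts, e.g. westeurope.login.microsoft.com.
            if (!host.EndsWith(".login.microsoft.com", StringComparison.OrdinalIgnoreCase))
                throw;

            Console.WriteLine($"Regional call to {host} failed ({ex.Message}); retrying on the new alias.");

            // Simplification: assumes the failure happened before the request body was
            // consumed (e.g. a DNS or connect failure), so the same message can be resent.
            string region = host.Split('.')[0];
            request.RequestUri = new UriBuilder(request.RequestUri)
            {
                Host = $"{region}.r.login.microsoftonline.com"
            }.Uri;

            return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        }
    }
}

// MSAL consumes the handler through its IMsalHttpClientFactory extension point.
public class FallbackHttpClientFactory : IMsalHttpClientFactory
{
    private static readonly HttpClient s_client = new HttpClient(new RegionalFallbackHandler());
    public HttpClient GetHttpClient() => s_client;
}
```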

@bgavrilMS bgavrilMS added scenario:DaemonApp Feature Request ICM This issue has a corresponding ICM, either for our team or another. labels Mar 30, 2023
@bgavrilMS bgavrilMS added this to the 4.52.1 milestone Mar 30, 2023
@bgavrilMS bgavrilMS self-assigned this Mar 30, 2023
@rayluo
Contributor

rayluo commented Mar 31, 2023

We need a mechanism to:

* perform the regional call on `<region>.microsoft.login.com` (like today)

Typo. You meant to say <region>.login.microsoft.com.

I propose we use either HttpClient extensibility or OnBeforeTokenRequest to achieve this.

This change is also applicable to other confidential MSALs, isn't it? Some may not have OnBeforeTokenRequest().

Why did the original proposal builder.WithAzureRegion(region, regionalHostSuffix: "r.login.microsoftonline.com") not make it to the final suggestions? That sounds more generic.

@tomkerkhove
Author

While that approach would work, it does not encourage moving to the new DNS name, nor does it easily let you find out whether the new DNS entry has become available without making a code change.

The proposed approach above, however, does give you that by doing a failover to the old DNS name in case the new one is not reachable.

@rayluo
Contributor

rayluo commented Mar 31, 2023

(the suffix parameter) does not encourage moving to the new DNS name

Good point.

doing a failover to the old DNS name in case the new one is not reachable

Will that work well, though? Before that failover happens, an attempt on an unreachable domain name could hang indefinitely.

Besides, if "some end-app devs block it and it is outside our control", those app devs would control which domain suffix would work. Does that mean at any given time, there would be one and only one suffix that works? Then we might as well also let those app devs control which suffix to use accordingly, in a deterministic way.

@tomkerkhove
Author

Besides, if "some end-app devs block it and it is outside our control", those app devs would control which domain suffix would work. Does that mean at any given time, there would be one and only one suffix that works? Then we might as well also let those app devs control which suffix to use accordingly, in a deterministic way.

I hope not, because that would mean platform builders such as Azure API Management have to make code changes, even though customers own the networking and can change it without us changing any code.

Hence our preference to go with the fallback approach.

Will that work well, though? Before that failover happens, an attempt on an unreachable domain name could hang indefinitely.

We could set a time threshold after which we consider the attempt failed though, no?
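
For illustration, a minimal sketch of such a threshold around a plain HttpClient attempt; the helper and the 2-second default are assumptions, not MSAL behavior:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSend
{
    // Bounds a single attempt with a time threshold so an unreachable host
    // fails fast instead of hanging, without cancelling the caller's own token.
    public static async Task<HttpResponseMessage> SendWithThresholdAsync(
        HttpClient client,
        HttpRequestMessage request,
        CancellationToken callerToken,
        TimeSpan? threshold = null)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(threshold ?? TimeSpan.FromSeconds(2)); // assumed threshold

        try
        {
            return await client.SendAsync(request, cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
        {
            // Threshold elapsed: surface it as a network failure so the caller
            // can fall back to the other DNS name.
            throw new HttpRequestException("Endpoint did not respond within the threshold.");
        }
    }
}
```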

@pmaytak
Contributor

pmaytak commented Apr 1, 2023

Will that work well, though? Before that failover happens, an attempt on an unreachable domain name could hang indefinitely.

We could set a time threshold after which we consider the attempt failed though, no?

Is the proposal here to do a failover for each acquire-token request? We allow users to pass a cancellation token with a timeout, so I think adding our own timeout and then figuring out which endpoint works adds complexity.

If the assumption is that the valid regional endpoint stays constant for the app's lifetime, we can do some sort of ping during the initial regional discovery and then statically cache which regional endpoint is available. We already do an IMDS call with a 2-second timeout, so this would be similar.
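
A minimal sketch of that probe idea; the region, path, and caching shape are assumptions for illustration, not MSAL internals:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RegionalEndpointProbe
{
    // One-time reachability check, cached for the app's lifetime. Any HTTP
    // response (even an error status) proves the host resolves and is reachable
    // through the customer's network.
    private static readonly Lazy<Task<bool>> s_newAliasReachable =
        new Lazy<Task<bool>>(ProbeNewAliasAsync);

    public static Task<bool> IsNewAliasReachableAsync() => s_newAliasReachable.Value;

    private static async Task<bool> ProbeNewAliasAsync()
    {
        using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };
        try
        {
            using var response = await client.GetAsync(
                "https://westeurope.r.login.microsoftonline.com/common/discovery/instance");
            return true;
        }
        catch (HttpRequestException)  { return false; }
        catch (TaskCanceledException) { return false; } // timeout
    }
}
```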

@bgavrilMS
Member

bgavrilMS commented Apr 3, 2023

@tomkerkhove - here's a sample implementation using MSAL's existing HttpClient extensibility. Let me know what you think.

#4059
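
For reference, wiring a custom HttpClient into MSAL looks roughly like this; `WithHttpClientFactory` is the existing extension point, while `FallbackHttpClientFactory` is the illustrative factory sketched earlier in this thread:

```csharp
using Microsoft.Identity.Client;

// Placeholders for real app registration values.
var app = ConfidentialClientApplicationBuilder
    .Create("<client-id>")
    .WithClientSecret("<client-secret>")
    .WithAuthority("https://login.microsoftonline.com/<tenant-id>")
    .WithAzureRegion("westeurope")
    .WithHttpClientFactory(new FallbackHttpClientFactory()) // e.g. the factory sketched above
    .Build();
```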

@bgavrilMS
Member

Closing this as resolved; please re-open if more guidance is needed.
