MSI Latency #3290

@Ro3A

Description

We are experiencing significant latency and frequent timeouts during token acquisition when using a connection string with the authentication type set to ActiveDirectoryMsi.

This is especially impactful for Azure Functions on Consumption plans, which often have no pre-warmed instance retaining cached tokens. We have also experienced it with Azure App Services and in local development (Visual Studio credential, Visual Studio Code credential, user-assigned identity, and DefaultAzureCredential with most auth types disabled).

We confirmed that this slowness is not due to cold starts. Our logs indicate that the initial response to a trigger is nearly instant, or within a reasonable duration after a cold start. The traces shown below occur well after the function responds to the trigger and show the elapsed time for authenticating and opening a connection to an Azure SQL database using an MSI auth connection string:

Image
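
For anyone trying to capture comparable numbers, a minimal timing sketch follows. It is not the instrumentation that produced the traces above; the helper name `timed` and the `results` dictionary are hypothetical and exist only for illustration. It wraps a step, such as the `pyodbc.connect` call, and records the wall-clock time that step took.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Optional


@contextmanager
def timed(step: str, results: Optional[Dict[str, float]] = None):
    """Log (and optionally record) the wall-clock duration of a code block.

    `step` is a free-form label; `results` is an optional dict that receives
    the elapsed seconds keyed by that label. Both names are illustrative.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logging.info("%s took %.3f s", step, elapsed)
        if results is not None:
            results[step] = elapsed
```

Wrapping the connection call as `with timed("open SQL connection"): conn = pyodbc.connect(conn_str)` would separate the authentication and connect time from the rest of the function's execution.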

To reproduce

Create a Python function that connects to an Azure SQL database via the pyodbc library, using MSI authentication in the connection string. I'm unable to provide a complete repro; however, the code below is our implementation:

    def create_sql_alchemy_engine(self) -> Engine:
        """
        Creates and returns a SQLAlchemy engine based on the environment configuration.

        Returns:
        - engine: SQLAlchemy engine object.
        """
        # Check if Managed Identity is available
        msi_available = os.getenv('MSI_ENDPOINT') and os.getenv('MSI_SECRET')

        # Create credentials based on the availability of Managed Identity
        if msi_available:
            logging.info("Creating SQLAlchemy Engine using Managed Identity")
            
            # Create the connection string
            conn_str = f'DRIVER=ODBC Driver 17 for SQL Server;' \
                    f'SERVER={self.server};' \
                    f'DATABASE={self.database};' \
                    f'Authentication=ActiveDirectoryMsi'
            
            # Create the pyodbc connection
            conn = pyodbc.connect(conn_str)
            
            # Create the SQLAlchemy engine
            engine = create_engine('mssql+pyodbc://', creator=lambda: conn)
        else:
            logging.info("Creating SQLAlchemy Engine using Local Credentials")
            engine = create_engine(
                f'mssql+pyodbc://{self.LDWusername}:{self.LDWpassword}@{self.server}/{self.database}?driver=ODBC Driver 17 for SQL Server'
            )

        return engine
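
One side note on the snippet above, separate from the latency itself: `creator=lambda: conn` hands the same pyodbc connection to SQLAlchemy every time the pool asks for a new one. A hedged sketch of a variant that opens a fresh connection per pool checkout is shown here. The function names `build_msi_conn_str` and `create_msi_engine` are hypothetical, and the sketch assumes pyodbc and SQLAlchemy are installed. It does not change how the driver acquires the token, so it would not be expected to fix the delay described in this issue.

```python
def build_msi_conn_str(server: str, database: str) -> str:
    """Build the ODBC connection string for managed identity authentication."""
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Authentication=ActiveDirectoryMsi"
    )


def create_msi_engine(server: str, database: str):
    """Create an engine whose pool opens a new pyodbc connection on demand."""
    # Imported lazily so the string helper above can be used without the drivers.
    import pyodbc
    from sqlalchemy import create_engine

    conn_str = build_msi_conn_str(server, database)
    # The creator callable runs each time the pool needs a new connection,
    # so every pooled connection is a distinct pyodbc connection.
    return create_engine("mssql+pyodbc://", creator=lambda: pyodbc.connect(conn_str))
```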

Expected behavior

Based on documentation found here: https://learn.microsoft.com/en-us/sql/connect/ado-net/sql/azure-active-directory-authentication?view=sql-server-ver16

Authentication with Managed Identities for Azure resources is the recommended authentication method for programmatic access to SQL. A client application can use the system-assigned or user-assigned managed identity of a resource to authenticate to SQL with Microsoft Entra ID, by providing the identity and using it to obtain access tokens. This method eliminates the need to manage credentials and secrets, and can simplify access management.

Given that we experience this issue locally, in App Services, and in Azure Functions, in both .NET Core and Python, it needs to be addressed promptly if managed identity is to remain Microsoft's recommended approach. We have also seen token acquisition delays when fetching tokens manually, so I'm unconvinced that explicit acquisition, the currently recommended alternative, would resolve the issue.
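
For readers unfamiliar with explicit acquisition, the general pattern with pyodbc is sketched below. This is an illustration under assumptions, not necessarily the exact code we tested or the workaround support recommended. It assumes a credential object that exposes `get_token`, such as `ManagedIdentityCredential` from the azure-identity package, and the function names are hypothetical. The token is packed into the length-prefixed UTF-16-LE structure the ODBC driver expects and passed through the `SQL_COPT_SS_ACCESS_TOKEN` connection attribute.

```python
import struct

# Connection attribute defined by the Microsoft ODBC driver (msodbcsql.h).
SQL_COPT_SS_ACCESS_TOKEN = 1256


def pack_access_token(token: str) -> bytes:
    """Encode a token as a 4-byte little-endian length followed by UTF-16-LE bytes."""
    raw = token.encode("utf-16-le")
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)


def connect_with_token(server: str, database: str, credential):
    """Open a pyodbc connection using an explicitly acquired access token.

    `credential` is assumed to be an azure-identity credential (for example
    ManagedIdentityCredential); any object with a compatible get_token works.
    """
    import pyodbc  # imported lazily; requires the ODBC driver to be installed

    token = credential.get_token("https://database.windows.net/.default").token
    # No Authentication, UID, or PWD keywords: the token carries the identity.
    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};"
        f"DATABASE={database}"
    )
    return pyodbc.connect(
        conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

Because the token request is a separate call here, it can be timed and cached independently of the connection, which at least makes it possible to see where the delay occurs.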

Token acquisition should be in the millisecond range, as it is for most other flows, such as obtaining tokens for App Registrations from JavaScript.

Additional context

I have confirmation from Microsoft support that this is an issue within the Microsoft stack, and I have already spent several hours working with that team to prove it. The following screenshot shows their conclusion and recommended workaround.

Image

Azure/azure-sdk-for-net#26584
#1403
Azure/azure-functions-docker#1008

It appears that each of the GitHub bugs support linked to is either already closed due to lack of participation or likely to be. My hope is that this bug can be used to consolidate and represent those, and that I've provided a sufficient level of detail that my own continued participation is minimal and I can use this to track Microsoft's acknowledgement of the issue and their progress in resolving it.

Impact

Aside from the additional compute time caused by the slow performance we're currently experiencing, we have a significant infrastructure project with a non-trivial component related to configuration management and a reduction of secrets. Using system-assigned identities is a key component of that effort, especially with regard to SQL Server connections. We cannot achieve the desired state unless this token issue is resolved.

The workaround provided above is sufficient for the functions that are experiencing this issue. However, we have a substantial number of other applications, APIs, and Functions that would require code changes to employ it, which makes it infeasible. Those apps use connection string values set in ADO pipelines, and it's quite easy for us to enable Identity on these Azure resources and update the connection string to use ActiveDirectoryMsi instead.

Metadata

Labels

External 🔗Issue is in an external component