@kathiehuang
Contributor

@kathiehuang kathiehuang commented Sep 25, 2025

What does this PR do?

Adds a check for whether the trace agent is running in an Azure Function on the Flex Consumption plan without the DD_AZURE_RESOURCE_GROUP env var set. If so, the trace agent shuts down.

  • We updated the libdatadog Azure metadata detection logic to check for the env var similarly
  • We also plan to update the serverless-compat layers to check for these env vars too for defense in depth.
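For readers skimming the description, the check can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the `WEBSITE_SKU` setting and the `FlexConsumption` value are assumptions about how the plan is detected, and `flex_without_resource_group` is a hypothetical helper that takes a lookup closure so it can be exercised without touching the real environment. Only `DD_AZURE_RESOURCE_GROUP` and the error message come from this PR.

```rust
use std::env;

// Assumption: the plan is exposed through an app setting with this name and value.
const SKU_VAR: &str = "WEBSITE_SKU";
const FLEX_SKU: &str = "FlexConsumption";
// From this PR: the env var the customer must set on Flex Consumption plans.
const RESOURCE_GROUP_VAR: &str = "DD_AZURE_RESOURCE_GROUP";

/// Hypothetical helper: returns true when the function appears to run on a
/// Flex Consumption plan and no non-empty resource group is configured.
fn flex_without_resource_group<F>(get: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    let on_flex = get(SKU_VAR).as_deref() == Some(FLEX_SKU);
    let has_group = get(RESOURCE_GROUP_VAR)
        .map(|v| !v.trim().is_empty())
        .unwrap_or(false);
    on_flex && !has_group
}

fn main() {
    // Read from the real process environment at startup.
    if flex_without_resource_group(|key| env::var(key).ok()) {
        eprintln!(
            "ERROR: Resource group not found. If you are using Azure Functions on Flex Consumption plan, please add your resource group name as an environment variable called DD_AZURE_RESOURCE_GROUP in Azure app settings."
        );
        // Return before any listeners start, so no traces are sent.
        return;
    }
    println!("startup check passed");
}
```

Passing the lookup as a closure keeps the decision logic separate from the environment, which makes the three interesting cases (not on Flex, on Flex with the variable, on Flex without it) easy to test.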

Motivation

  • Currently the aas.resource.group span attribute for functions on flex consumption plans is set incorrectly in Datadog - they're all set to "flex"
    • This is important to fix because aas.resource.id is built using aas.resource.group, and the resource id is used in billing, which needs to be accurate
    • We had to decide what to do when a customer who uses Datadog to monitor an Azure function on a flex consumption plan doesn't set the DD_AZURE_RESOURCE_GROUP env var. Rather than handling that in libdatadog, we decided to handle it at a higher level, in the trace agent, so that we can inform the customer of the error, shut down the trace agent gracefully, and prevent any traces from being sent to Datadog
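To illustrate why a wrong resource group matters, here is a rough sketch of how a resource ID could be assembled from its parts. The `build_resource_id` function and the example subscription, group, and site names are hypothetical, and the path layout is an assumption modeled on the usual Azure resource ID shape rather than the exact string libdatadog produces.

```rust
/// Hypothetical helper: assembles a resource ID from its components.
/// The path layout is an assumption, not taken from libdatadog.
fn build_resource_id(subscription_id: &str, resource_group: &str, site_name: &str) -> String {
    format!(
        "/subscriptions/{}/resourcegroups/{}/providers/microsoft.web/sites/{}",
        subscription_id, resource_group, site_name
    )
}

fn main() {
    // Made-up example values.
    let subscription = "00000000-0000-0000-0000-000000000000";
    let site = "my-function-app";

    // What gets reported today on Flex Consumption: the group is always "flex".
    let wrong = build_resource_id(subscription, "flex", site);
    // What gets reported once the customer supplies the real group.
    let right = build_resource_id(subscription, "my-resource-group", site);

    println!("incorrect: {wrong}");
    println!("correct:   {right}");
}
```

Because the group is embedded in the ID, every function app whose group is reported as "flex" ends up with an ID that does not match its real Azure resource, which is why the billing data needs the correct value.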

Jira Ticket

Describe how to test/QA your changes

  1. Update the commit hash in datadog-trace-agent/Cargo.toml everywhere that libdatadog is used to the most recent commit hash of the PR in libdatadog (currently d1b35ef21fff3c4588073504905081c8923bbc4b)
  2. Follow the instructions in the Serverless Compatibility Layer docs to build the Rust binary
  3. Deploy an Azure function on flex consumption plan using the terraform tool, setting use_serverless_compat_local_path to true and making sure the built binary is in your python folder
  4. Hit an endpoint in your function, or wait a minute for the invoker to invoke it, then check the Datadog traces for your function. Without the DD_AZURE_RESOURCE_GROUP env var, you should see no traces. Check the logs in Azure Portal - you should see an error log with the message "ERROR: Resource group not found. If you are using Azure Functions on Flex Consumption plan, please add your resource group name as an environment variable called DD_AZURE_RESOURCE_GROUP in Azure app settings."
  5. Go to Settings > Environment Variables in the Azure Portal for your function and add DD_AZURE_RESOURCE_GROUP as an environment variable with your resource group. Repeat step 4; you should now see traces with the correct resource group in the aas.resource.group span attribute!

[Screenshot: expected error without DD_AZURE_RESOURCE_GROUP; no traces sent to DD]

[Screenshot: with DD_AZURE_RESOURCE_GROUP, traces are sent to DD with the correct resource group/id]

@kathiehuang kathiehuang marked this pull request as ready for review September 25, 2025 17:24
@lucaspimentel
Member

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

@kathiehuang kathiehuang marked this pull request as ready for review September 25, 2025 22:14
@kathiehuang
Contributor Author

@codex review

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.

@kathiehuang
Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!


// Check for Azure Flex Consumption plan without DD_AZURE_RESOURCE_GROUP
if is_azure_flex_without_resource_group() {
error!(
Contributor

What log level does this use? How much logging would it cause?

Contributor Author

I was following the pattern of how other parts of this file handle errors, but I'm a little confused: in the Azure logs, the error is being mapped to a severityLevel of 1, whereas the Azure docs say errors should be severity level 3.
[Screenshot: Azure log entry showing the error's severityLevel]

As for the amount of logging, it seems to log this error whenever the Azure Function app instance starts (a couple of logs after "Initializing Warm up Extension"). But the trace agent seems to keep trying to restart, so a lot of logs like these get output:
[Screenshot: repeated log output from the trace agent restarting]

I'm a little bit confused but happy to call about this!

Collaborator

@duncanpharvey duncanpharvey left a comment

LGTM!

@kathiehuang kathiehuang merged commit ab05b89 into main Oct 1, 2025
26 checks passed
@kathiehuang kathiehuang deleted the kathie.huang/check-env-var-for-rg branch October 1, 2025 18:36