
Azure Functions – Observability #9273

Closed
RohitRanjanMS opened this issue May 13, 2023 · 9 comments · Fixed by #9985
@RohitRanjanMS
Member

Azure Functions has robust integration with Application Insights for monitoring function execution. Along with Application Insights, Functions also supports integration with other APM systems through Azure Monitor. Logs and metrics are also sent to Geneva/Kusto for debugging/diagnostics and billing purposes. These integrations allow developers to gain insights into the performance and behavior of their functions, as well as diagnose issues and troubleshoot problems.

Logs, Traces and Metrics are primarily consumed by:

  • Microsoft - supportability and billing purposes (platform logs and metrics only).
  • Customers

[image]

In Azure Functions, the host process is responsible for capturing and processing all logs, including logs generated by the worker process. The host process then sends these logs to all configured sinks, which can include Application Insights, Azure Monitor, File, Kusto and Log Streaming.
The host process in Azure Functions captures logs from several sources, including:

  • Platform logs (Host): These are logs generated by the Azure Functions host process itself, which provide information about the startup and shutdown of the host process, as well as any errors or warnings encountered during execution.
  • Platform logs (worker): These are logs generated by the worker process.
  • Function logs: These are logs generated by the customer code that runs in the worker process. These logs can be captured using the logging framework provided by the language runtime, such as the ILogger interface in .NET (see the sketch after this list).
  • Dependency logs: These are logs generated by dependencies used by the function code, such as database drivers, HTTP clients, or messaging libraries.
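
As a concrete illustration of the function-log path, here is a minimal sketch of a .NET isolated-worker function writing logs through ILogger; the class, function, and queue names are hypothetical:

```csharp
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class OrderFunctions
{
    private readonly ILogger<OrderFunctions> _logger;

    public OrderFunctions(ILogger<OrderFunctions> logger) => _logger = logger;

    [Function(nameof(ProcessOrder))]
    public void ProcessOrder([QueueTrigger("orders")] string message)
    {
        // Written in the worker process; today the host captures this log and
        // forwards it to the configured sinks (e.g. Application Insights).
        _logger.LogInformation("Processing order message: {Message}", message);
    }
}
```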

Challenges and Gaps

  • While Azure Functions offers built-in integration with Application Insights for monitoring and logging, this can make it difficult to use other third-party Application Performance Management (APM) providers. This lack of flexibility can be challenging for organizations that have already invested in other APM solutions.
  • The host process in Azure Functions is responsible for handling all logs, including logs from the worker process, and forwarding them to the appropriate sinks. This can limit the flexibility and control that developers have over logging and tracing in their applications.
  • Some of the logs/traces are emitted by the host, e.g. trigger- and binding-related traces and linked activities.
  • Azure Functions does not currently offer native support for OpenTelemetry, a popular observability framework that enables developers to collect, analyze, and export telemetry data from their applications. Some of the telemetry in Azure Functions is emitted by the host process, such as metrics, platform logs, and trigger/binding-related traces. There is no way for a customer to export these using OpenTelemetry.

Goals

  • Adopt OpenTelemetry: Azure Functions plans to adopt OpenTelemetry to enable developers to collect, analyze, and export telemetry data from their applications.
  • Follow FaaS specification: All the logs, traces, and metrics emitted by Azure Functions will follow the Function-as-a-Service (FaaS) specification, enabling developers to use a variety of observability tools and services.
  • Empower customers: Azure Functions will empower customers to take control of their observability by allowing them to define their own configuration and exporter.
  • Default option: Azure Functions will provide a default option for telemetry data for customers who are not interested in managing it themselves. The default option will send all the telemetry data to Application Insights.

Proposal

[image]

Customers have complete control over the telemetry data from the worker process and can configure the OpenTelemetry Protocol (OTLP) exporter to send this data to any supported endpoint.
Customers may be interested in some of the telemetry data emitted by the host; they can also choose to export host telemetry data to an OTLP-supported endpoint.
As part of the default option, Azure Functions will send both worker and host telemetry data to Application Insights.
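
On the worker side, a minimal sketch of what this could look like with the OpenTelemetry .NET SDK and the OTLP exporter in a .NET isolated worker is shown below. The service, source, and meter names are placeholders, and this uses the general-purpose OTel packages rather than any Functions-specific integration, whose exact shape is still being designed:

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices(services =>
    {
        services.AddOpenTelemetry()
            .ConfigureResource(r => r.AddService("my-function-app"))   // placeholder service name
            .WithTracing(tracing => tracing
                .AddSource("MyFunctionApp")                            // placeholder ActivitySource
                .AddOtlpExporter())   // endpoint/headers come from OTEL_EXPORTER_OTLP_* settings
            .WithMetrics(metrics => metrics
                .AddMeter("MyFunctionApp")                             // placeholder Meter
                .AddOtlpExporter());
    })
    .Build();

host.Run();
```

Logs could be wired up the same way through the OpenTelemetry logging provider; the host telemetry described above would be exported separately by the platform.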

Step Back

File system logging

The initial implementation of file system logging in Azure Functions was done because there was no other option to log/stream data at that time. However, the plan is to stop supporting file system logging because it does not add much value, has significant overhead, and can potentially cause other issues.
Going forward, Application Insights will be the only option to stream logs on the portal and other client tooling.

Azure Monitor Integration

The Azure Monitor integration carries both platform logs (generated by the Azure Functions platform) and customer logs (generated by the customer's function code). This was very helpful, as there was no other supported way to export telemetry data to an external ecosystem.
It also requires Microsoft to process and store customer data, which must be done securely and is a potential security risk. The plan is to stop processing customer logs through the Azure Monitor integration.
By separating customer logs from platform logs, the customer will have complete control over their data, including where it is stored and who has access to it. Customers will have the option to configure their Application Insights resource to forward logs to a Log Analytics workspace. This approach will provide customers with a richer experience and will also prevent Microsoft from processing sensitive data.

Telemetry Initializers and Processors

The Azure Functions in-proc model does provide out-of-the-box support for enriching/overriding telemetry with additional information. Because the host process and the customer code run in the same process, any initializers and processors, if implemented, apply to the logs and traces emitted by both the host and the customer code.
In the proposed solution, since the host and the worker will each have an independent integration with Application Insights, the host OTLP extension won't support initializers and processors. This is also true for the Isolated Application Insights package.

@RohitRanjanMS self-assigned this May 13, 2023
@ghost assigned kshyju May 13, 2023
@jviau
Contributor

jviau commented May 15, 2023

> Going forward, Application Insights will be the only option to stream logs on the portal and other client tooling.

Can you elaborate on this a bit more? Have we evaluated / ruled out other solutions? What does application insights give us here that we cannot solve otherwise?

@martinjt

> the host OTLP extension won’t support initializers and processors

Can you clarify what you mean by this? Are you saying that we won't be able to implement head sampling, add custom attributes, or add custom authentication to the OTLP exporter? That seems like it won't really be that useful if it's limited to static sampling; being able to add additional context is the core of what makes OpenTelemetry and observability so useful and inexpensive compared to APM.

@RohitRanjanMS
Member Author

Hi @martinjt, there won't be any "programmatic/script approach" to adding processors and initializers. Our intent is to expose most of the options through a config-based approach, for example meter configuration, logging configuration, and OtlpExporterOptions. We will also extend the config to support out-of-the-box samplers and processors. Obviously, anything custom would be a challenge.
This is on the host side; users will have full control of the worker.
We have just started the work; let me find an opportunity to collaborate with the wider community and get feedback.
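
For reference, OtlpExporterOptions on the worker side exposes settings such as the endpoint, protocol, and headers; a minimal code-based sketch is below (endpoint, source name, and header values are placeholders), and a host-side config surface like the one described above would presumably map onto the same kinds of settings:

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Trace;

// Sketch: configuring the OTLP exporter's options in code on the worker side.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("MyFunctionApp")                                      // placeholder ActivitySource
    .AddOtlpExporter(options =>
    {
        options.Endpoint = new Uri("https://otlp.example.com:4317"); // placeholder endpoint
        options.Protocol = OtlpExportProtocol.Grpc;
        options.Headers = "x-api-key=<your-key>";                    // placeholder auth header
    })
    .Build();
```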

@martinjt

That's my point really: custom head sampling is a key part of what makes tracing manageable; without that ability, we're hindering the users.

Random/probabilistic sampling isn't sufficient.

Head sampling means rules like probabilistic sampling for routes like health checks or production, then no sampling for others, and full sampling for some more. Those are the kinds of rules you need at scale.

Then there's the ability to add additional context from the HTTP context (like adding some product information for a call to get a product).

Finally, the ability to add span events for exceptions and validation-type errors.

For both of those we'd be forcing users to create additional spans (using more data) just so they can add more context.

This feels like it doesn't fulfil the use case that users need, beyond the probabilistic sampling and server sampling AppInsights has implemented.

It seems like it may be better to make all the information available to user code so they can have full control of the spans, rather than only prescribed configuration.

Right now, this will be a weird and subpar experience for people coming from ASP.NET Core or literally anywhere else that tracing is supported, in or out of the .NET ecosystem.

@jviau
Contributor

jviau commented Jan 22, 2024

@martinjt is your concern with in-proc dotnet or out-of-proc? I believe @RohitRanjanMS's comments primarily apply to out-of-proc.

For in-proc, we will need to evaluate this work and how customers will be able to hook into the OTel SDK.

For out-of-proc, customer code is not expected to be part of the host process and we will not be changing that for OTel. In this mode it is better to view the host as its own platform - we will be providing customers with a curated set of telemetry and configurability. You will have full control of the worker process though, including the OTel SDK. We are still working on what the host telemetry looks like here.
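
To make "full control of the worker process, including the OTel SDK" concrete, here is a hedged sketch of the kind of custom head sampler and enrichment processor discussed above, registered on the worker's tracer provider; the class names, the health-check heuristic, and the REGION variable are hypothetical:

```csharp
using System;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

// Hypothetical head sampler: drop spans that look like health checks, keep the rest.
public sealed class HealthCheckSampler : Sampler
{
    private readonly Sampler _fallback = new AlwaysOnSampler();

    public override SamplingResult ShouldSample(in SamplingParameters parameters) =>
        parameters.Name.Contains("health", StringComparison.OrdinalIgnoreCase)
            ? new SamplingResult(SamplingDecision.Drop)
            : _fallback.ShouldSample(parameters);
}

// Hypothetical processor: attach extra context to every span as it starts.
public sealed class EnrichmentProcessor : BaseProcessor<Activity>
{
    public override void OnStart(Activity activity) =>
        activity.SetTag("app.region", Environment.GetEnvironmentVariable("REGION") ?? "unknown");
}

// Registration in the worker (inside ConfigureServices):
// services.AddOpenTelemetry().WithTracing(tracing => tracing
//     .SetSampler(new HealthCheckSampler())
//     .AddProcessor(new EnrichmentProcessor())
//     .AddOtlpExporter());
```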

@martinjt

This is all to do with isolated. InProc has different problems related to the hard coding of System.Diagnostics versions.

The issue is the assumption that the "server" span is generated in the function host, and that this is the one that contains information like HTTP routes, URLs, etc. that the user code doesn't have. One such thing is routePrefix, which is apparently only available to the host runtime.

If the idea is that a user should use the server spans from the host runtime, it breaks the way that users should be doing tracing, including head sampling, augmenting spans with needed data (like taking headers, or values from a database, so spans can be filtered and queried), and various other things as I mentioned above.

If there is a mechanism for them to ignore those spans, turn them off, and then have access to all the same data to generate them in user code, then they'll still be able to do all that. But if there is information needed to generate an OTel-compliant server span for HTTP that isn't available to anything other than the host, people can't use OTel with Functions, which is a problem.

@jviau
Contributor

jviau commented Jan 23, 2024

We will be improving that story as well; it just won't be solved by giving access to the OTel SDK in the host. We are still designing it, but it will involve improving the spans the host emits and how spans flow to the worker. For HTTP specifically, we are moving towards a reverse-proxy model (GA'd for dotnet isolated), so the worker will have a more robust HTTP experience, including access to more information for its own spans. (I can't speak for other language workers here though.)

The way I see it, the host and worker are two pieces of a distributed system. Much like you do not get to minutely configure telemetry for other reverse proxies, the same will be true for the host. We will provide our own set of telemetry with select configurability. But the bulk of the telemetry you, the end user, rely on should come from the worker process.

I know this is not how it is today with Functions, but it is what we would like to move towards.
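
As an illustration of the reverse-proxy direction for HTTP mentioned above (not the final design), with the ASP.NET Core integration for the .NET isolated worker the worker runs a real ASP.NET Core pipeline, so the standard ASP.NET Core instrumentation package can produce and enrich server spans there; the tenant header below is hypothetical:

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry.Trace;

var host = new HostBuilder()
    // ASP.NET Core integration: HTTP requests are proxied into an ASP.NET Core
    // pipeline running inside the worker process.
    .ConfigureFunctionsWebApplication()
    .ConfigureServices(services =>
    {
        services.AddOpenTelemetry().WithTracing(tracing => tracing
            // Server spans are created in the worker, with route/URL details available here.
            .AddAspNetCoreInstrumentation(o =>
                o.EnrichWithHttpRequest = (activity, request) =>
                    activity.SetTag("app.tenant", request.Headers["X-Tenant-Id"].ToString())) // hypothetical header
            .AddOtlpExporter());
    })
    .Build();

host.Run();
```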

@lsl-2020

lsl-2020 commented Mar 14, 2024

Hi @jviau and Azure Functions team, are there any updates on this one, or somewhere to track the progress? Last time @RohitRanjanMS mentioned that its public preview is expected in April/May. I just want to check whether we're still on track for that ETA. Thanks.

And to justify our team's needs here, in case there are other ways to achieve them: we need to pass trace context from a parent function (say an orchestrator function) to its child function (say a sub-orchestrator function). My understanding is there are two ways:

  1. Make host traces accessible to be exported by OTel (covered in the current thread), so we could simply leverage the existing "DurableTask.Core" ActivitySource by subscribing to it (see the sketch at the end of this comment).
  2. When building StreamingMessage for MsgType.InvocationRequest, make sure it collects trace context from Activity.Current, so that when InvocationRequest and FunctionContext are later built, they can reuse the trace context in their TraceContext.

My team is migrating to the isolated-worker model and we are eagerly looking forward to your suggestions on this issue so we can build an end-to-end distributed tracing path compatible with OpenTelemetry. Much appreciated!
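
For option 1 above, if the DurableTask.Core activities were visible to the worker's tracer provider, subscribing to them would be a one-line addition; a minimal sketch follows, with the big assumption (which this issue is tracking) that those activities are actually created in, or flowed to, the worker process:

```csharp
using OpenTelemetry;
using OpenTelemetry.Trace;

// Sketch: listen to the DurableTask.Core ActivitySource and export via OTLP,
// assuming its activities are emitted where the worker's SDK can see them.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("DurableTask.Core")
    .AddOtlpExporter()
    .Build();
```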

@BigMorty
Member

Hi @lsl-2020, I am a program manager working with @RohitRanjanMS and @jviau on the OTel integration in Azure Functions. We are very actively working on this. I can't say exactly when we will have a preview, but late May/June is a rough timeframe, unless we run into unknown issues of course.
