Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AI (Internal) Errors when using version 2.8.0 or 2.8.1 in Azure Functions #973

Closed
Royhoey opened this issue Nov 5, 2018 · 23 comments
Closed
Assignees
Labels

Comments

@Royhoey
Copy link

Royhoey commented Nov 5, 2018

When using TelemetryClient in an Azure Function, AI internal errors are added in application insights.
This does not occur when downgrading to 2.7.2.

Repro Steps

  1. Create a Function App with Application Insights enabled in Azure
  2. Create a queue 'trigger-test' in the storage account that is created with the function app
  3. Create an Azure Function in Visual Studio 2017
[FunctionName("Function1")]
public static void Run([QueueTrigger("trigger-test", Connection = "AzureWebJobsStorage")]string message, ILogger log, ExecutionContext context)
{
        var config = new ConfigurationBuilder()
	.SetBasePath(context.FunctionAppDirectory)
	.AddJsonFile("local.settings.json", true, true)
	.AddEnvironmentVariables()
	.Build();

	log.LogInformation($"Starting {message}");
	var telemetryClient = new TelemetryClient(TelemetryConfiguration.Active);
        log.LogInformation("Created tel client");
}

Project file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp2.1</TargetFramework>
    <AzureFunctionsVersion>v2</AzureFunctionsVersion>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.ApplicationInsights" Version="2.8.1" />
    <PackageReference Include="Microsoft.NET.Sdk.Functions" Version="1.0.23" />
	  <PackageReference Include="Microsoft.Azure.WebJobs" Version="3.0.0" />
	  <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="3.0.0" />
	  <PackageReference Include="Microsoft.Azure.WebJobs.ServiceBus" Version="2.2.0" />
	  <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage" Version="3.0.0" />	
  </ItemGroup>
  <ItemGroup>
    <None Update="host.json">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
    <None Update="local.settings.json">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
      <CopyToPublishDirectory>Never</CopyToPublishDirectory>
    </None>
  </ItemGroup>
</Project>
  1. Publish the function to the Function app
  2. Add a message to the trigger-test queue
  3. Check application insights

Actual Behavior

Traces with AI Internal Errors:
AI (Internal): ERROR: Exception in Command Processing for EventSource Microsoft-ApplicationInsights-Core: An instance of EventSource with Guid 74af9f20-af6a-5582-9382-f21f674fb271 already exists.
and
AI (Internal): ERROR: Exception in Command Processing for EventSource Microsoft-ApplicationInsights-Core: Object reference not set to an instance of an object.

appinsights error

appinsights error 2

Expected Behavior

No AI internal errors

Version Info

Azure Function runtime version: 2.0.12134.0 (~2)
.NET Version : .NET Core 2.1

@Royhoey Royhoey changed the title AI (Internal) Errors when using version 2.8.1 in Azure Functions AI (Internal) Errors when using version 2.8.0 or 2.8.1 in Azure Functions Nov 5, 2018
@TimothyMothra
Copy link
Member

Hello @Royhoey, thank you for reporting this.

As a workaround, please continue to use 2.7.2.

What you're seeing is expected behavior, but obviously not customer friendly.
Behind the scenes, Azure Functions is running the 2.7.2 SDK. (More details here)

The error you're seeing An instance of EventSource with Guid ### already exists occurs because in your app both 2.7 and 2.8 are trying to register EventSource listeners.
We need to improve our documentation around this.

@lmolkova would be able to provide more information when she is back in the office next week.

@lmolkova
Copy link
Member

Having several sub-applications n the same process is a valid customer scenario. It reproduces in functions and in ServiceFabric.

Unfortunately, RichPayloadEventSource attempts to create EventSource each time ApplicationInsights is initialized

https://github.com/Microsoft/ApplicationInsights-dotnet/blob/f47679b87f978fc5769f74a15368890c42a6d955/src/Microsoft.ApplicationInsights/Extensibility/Implementation/RichPayloadEventSource.cs#L371

https://github.com/Microsoft/ApplicationInsights-dotnet/blob/f47679b87f978fc5769f74a15368890c42a6d955/src/Microsoft.ApplicationInsights/Extensibility/Implementation/RichPayloadEventSource.cs#L395

So I believe this is a bug and fix is to have static event source and never dispose it. It requires some testing as it might not be appropriate solution. @pharring do you know if this approach could work?

@lmolkova lmolkova added the bug label Nov 20, 2018
@pharring
Copy link
Member

@lmolkova The expectation is that the RPES is a singleton -- at least per AppDomain. Accessed through the Log static field. That appears to be the case already.
For this case, do we have two different versions of the AI SDK loaded in a process? Are they in different AppDomains?

@pharring
Copy link
Member

BTW, this only accounts for one of the error messages from the screenshot.
There's a NullReferenceException inside LogVerbose and at least one other EventSource listed.
I suspect these all come down to having two SDKs loaded in the same process. We need to harden against that.

@lmolkova
Copy link
Member

lmolkova commented Nov 20, 2018

The expectation is that the RPES is a singleton -- at least per AppDomain. Accessed through the Log static field. That appears to be the case already.

Indeed, this is the case.

I'll try to repro it and check what happens. AFAIK there is no multi app-domain code in functions. Before function V2 there was no supported way of having explicit AI SDK version different than one implicitly referenced by functions host. Perhaps functions do some magic to load explicit version along with implicit one - I'll check.

@lmolkova lmolkova self-assigned this Nov 20, 2018
@lmolkova
Copy link
Member

I've experimented with it and was able to repro the issue.

Apparently, there are two versions of AppInsights loaded in the same appdomain

Microsoft.ApplicationInsights, Version=2.7.2.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35, D:\Program Files (x86)\SiteExtensions\Functions\2.0.12180\32bit\Microsoft.ApplicationInsights.dll

Microsoft.ApplicationInsights, Version=2.8.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35, D:\home\site\wwwroot\bin\Microsoft.ApplicationInsights.dll

@brettsam @fabiocav is it expected? It should be a common problem for all libs having event sources (i.e. almost all Azure Sdks), so it means some features or diagnostics capabilities are still breken when customer brings their own version of dependency. Am I missing something?

@lmolkova
Copy link
Member

AFAIK everything works as expected, except showing noisy error messages. The feature that would be affected is service profiler, but I think we don't have it for functions. @pharring is it correct?

@Royhoey do you see any issue except messages? As a workaround, can you downgrade explicit AppInsights to 2.7.2?

@fabiocav
Copy link

Host dependencies are isolated from extensions/user dependencies. If you examine the two assemblies, they'll be loaded in different assembly load contexts.

We have automatic unification, but will never downgrade a user/extension assembly to match the host. We do have set of assemblies that we always unify to host versions (if within the same major), but those are primarily framework, ASP.NET and WebJobs core assemblies. We could consider adding AI to this, but we'd essentially be snapping customers to what the host has.

@lmolkova
Copy link
Member

Thanks for the explanation, @fabiocav!

So what happens is that we have two different types of RichPayloadEventSource defined in two different ApplicationInsight dlls and both define the same event source, which is not allowed.
It does not reproduce if user dependency has lower or same version of AppInsights.

For now the resolution is: functions do not support bringing explicit ApplicationInsights dependency with version higher than host has.
It's natural to install the leatest stable, and since there is no obvious way of knowing which version host uses, we'll get more issues like this.

It seems that loading two different versions of assembly may lead to unpredictable issues similar to this one and there is nothing specific to ApplicationInsights here.
@fabiocav, have you heard similar concerns for other extensions in V2?

@fabiocav
Copy link

We haven't had similar issues.

From a runtime (CLR) perspective, those are essentially two distinct types. This pattern will become more common with other hosts with similar requirements adopt the new isolation features in .NET Core.

What you've confirmed is what I mentioned above; we'll unify within safe ranges, but we'll never downgrade the user. We could add a unification policy for AI, but currently, the behavior means that users would run into this if they're referencing something newer than the runtime.

Is the side effect here just failures when with the second event source and noise related to that? Any other impact?

@lmolkova
Copy link
Member

right, as far as I can tell, there is no impact except the noise so we can tolerate it for now.

@Royhoey
Copy link
Author

Royhoey commented Nov 22, 2018

@lmolkova
AppInsights works fine, it's just annoying that it includes all the internal errors.
For now I'm using 2.7.2 in my project as a workaround.

@oluatte
Copy link

oluatte commented Jan 11, 2019

Hey @lmolkova, @fabiocav

When we say we are tolerating this for now? Can you confirm that this is temporary and will be fixed in the future?

I ask because this noise is only tolerable in small volume scenarios. We sometimes execute 20+ million functions while ingesting a batch of data. If we're getting this internal traces on every one of them that's a ton of noise that we have to wade through (and that the client has to pay for).

The traces don't have any category so we can't even filter them out in App Insights.

Right now it sounds like the only work around is to downgrade and that's fine in the short term but not long term.

@lmolkova
Copy link
Member

@mayoatbm
we do not have a good solution to solve the issue: there are two instances of ApplicationInsights in the process and they fight for resources - for the event source which has to be unique within the process and does not respect .NET isolation boundaries.

For now, the options are:

  1. use the same version of AppInsihgts that Functions use (or lower)
  2. Filter out internal SDK messages. You can do this by creating and configuring telemetry processor. You can have a filter that checks if telemetry is TraceTelemetry and traceTelemetry.Context.Operation.SyntheticSource is SDKTelemetry

We'll simplify p2 a bit and will expose a configuration that disables SDK error collection.

@oluatte
Copy link

oluatte commented Jan 14, 2019

@lmolkova

Thanks for the fast response. That sounds like a thorny problem for sure. A few followup questions/comments?

  • Will the upcoming dependency injection work improve the situation at all? I know there are lots of requests for access to the telemetry client, access to request telemetry, initializers, distributed tracing via parentid etc. There's a lot of app insights integration work left to do in functions (i've discussed some issues with @brettsam in the past). And having two instances of Application Insights seems like it is a blocker for all these features in the long run.
  • Do telemetry processors currently work in Azure Functions v2? I know initializers are temporarily broken but I don't know if processors ever worked.
  • I think the internal sdk telemetry should be disabled by default. Right now it doesn't grant any benefits to customers (what do we do with these AI internal errors) and I'm not even sure if the AI engineers have access to this telemetry. If it's disabled by default, customers don't have to pay for both processing time and storage of all the messages they will never use and they can turn it on if there's a specific issue.

Thanks again.

@lmolkova
Copy link
Member

lmolkova commented Jan 14, 2019

@mayoatbm

Thanks for the feedback.

And having two instances of Application Insights seems like it is a blocker for all these features in the long run.

I agree. However always upgrading to user's version of ApplicationInsights could also be problematic. We'll consider different options, but there is nothing planned immediately to solve it.

Do telemetry processors currently work in Azure Functions v2?

Both initializers and processors work in V2. What does not work is the ability to access Functions's DI container (Azure/azure-functions-host#3731)
You can always control the TelemetryConfiguration you create explicitly (or TelemetryConfiguration.Active) and configure processors on it. I expect that functions configure their TelemetryConfiguration first and then errors are coming from the custom one you've created.

I think the internal sdk telemetry should be disabled by default.

I agree that users should not pay for our bugs. By default there should be no such errors - only some critical issues appear there and these errors intend to indicate something useful for users: problems with configuration or custom code: e.g. some messages indicate that operation was not tracked because of erroneous parameters - this is sent instead of the telemetry and is actionable for users. These errors substitute exceptions as we cannot disrupt normal application flow, but still have to communicate them somehow to users.
We have some known issues that may cause ~dozen of AI internal messages per app lifetime (e.g. #1029) and we are working on fixing it, but, unfortunately, it takes time.
We (AI engineers) have access to these messages and we monitor the number of customers affected and new error appearances.

So, these errors are generally useful and we still want to send them making sure they are actionable and do not create noise. When something unexpected happens, we want to let you disable this feature to minimize the impact.

@oluatte
Copy link

oluatte commented Jan 15, 2019

@lmolkova

Thanks for responding and for being patient with my questions. These are good answers and i feel better about the current situation.

We're pretty big users of Azure Functions & App Insights together and we are really looking forward to getting over that integration hump as it would improve our monitoring so much.

Thanks again.

TimothyMothra pushed a commit that referenced this issue Oct 25, 2019
* update base to beta and related changes
* read correlation context always
@pksorensen
Copy link

This is an older post, what version does Functions SDK 1.0.29 use? I dont know which version to downgrade to.

@tedterrill
Copy link

@pksorensen

This is an older post, what version does Functions SDK 1.0.29 use? I dont know which version to downgrade to.

Not sure if you figured this out, but in case someone else has the same problem, I used v2.9.1 and seems to be working for me. I happened upon the page below which seems to confirm they are now using 2.9.1 sdk in Azure v2 functions.

https://docs.microsoft.com/en-us/azure/azure-monitor/app/azure-functions-supported-features#supported-features

@cijothomas
Copy link
Contributor

@pksorensen @tedterrill Please take a dependency on WebJobs SDK, and not directly on ApplicationInsights. This means you'll never have to worry about which version of application insights sdk to use.
#1507 (comment)

@MaxiPigna
Copy link

I have the same problem using Microsoft.NET.Sdk.Functions 3.0.3. How can I solve the problem for Azure Function v3?

@tntwist
Copy link

tntwist commented Jun 8, 2020

Still getting this exceptions with Microsoft.NET.Sdk.Functions 3.0.7

@TimothyMothra
Copy link
Member

Please report these issues to the Azure Functions repo (https://github.com/Azure/azure-webjobs-sdk).
They own the integration and we support them.

@brettsam FYI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests