Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SB][pyamqp][OTEL] pyamqp tracing is missing values for amqpLink and amqpSession #32190

Closed
macieyng opened this issue Sep 22, 2023 · 18 comments · Fixed by #34458
Closed

[SB][pyamqp][OTEL] pyamqp tracing is missing values for amqpLink and amqpSession #32190

macieyng opened this issue Sep 22, 2023 · 18 comments · Fixed by #34458
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Messaging Messaging crew needs-team-attention This issue needs attention from Azure service team or SDK team Service Bus

Comments

@macieyng
Copy link
Contributor

  • Package Name: azure-servicebus==7.11.2
  • Package Version: azure-servicebus==7.11.2, azure-core==1.29.4, azure-core-tracing-opentelemetry==1.0.0b11, azure-monitor-opentelemetry-exporter==1.0.0b17, opentelemetry-*==1.20.0/0.41b
  • Operating System: macOS, Ubuntu, Debian
  • Python Version: 3.10

Describe the bug
We get logs like

2023-09-22 07:40:48.193 [WARNING] [opentelemetry.attributes] __init__._clean_attribute -: Invalid type NoneType for attribute 'amqpLink' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
2023-09-22 07:40:48.040 [WARNING] [opentelemetry.attributes] __init__._clean_attribute -: Invalid type NoneType for attribute 'amqpSession' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types

Looks like pyamqp distributed tracing is trying to add attributes to the span that are clearly empty.

To Reproduce
Use Azure SB with pyamqp and enable opentelemetry instrumentation. Try sending some messages.

Expected behavior
amqpLink and amqpSession is not an empty value or if it's an empty value then it's not added as attribute to the span.

Screenshots
Zrzut ekranu 2023-09-22 o 10 08 01

Additional context
As a library user I would like to know if there's an issue, but this logs would come up too often and would only increase our bill.

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Sep 22, 2023
@swathipil swathipil added bug This issue requires a change to an existing behavior in the product in order to be resolved. Service Bus Client This issue points to a problem in the data-plane of the library. Messaging Messaging crew and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that needs-triage This is a new issue that needs to be triaged to the appropriate team. labels Sep 22, 2023
@github-actions github-actions bot added the needs-team-attention This issue needs attention from Azure service team or SDK team label Sep 22, 2023
@swathipil
Copy link
Member

Hi @macieyng - Thanks for opening an issue! We'll take a look asap! Can you also provide the following info?

  • Are you using the sync or async client?
  • Do you see any other errors in the logs originating from the azure-servicebus library around the same time that this error pops up?

@swathipil swathipil added the needs-author-feedback More information is needed from author to address the issue. label Sep 22, 2023
@github-actions github-actions bot removed the needs-team-attention This issue needs attention from Azure service team or SDK team label Sep 22, 2023
@github-actions
Copy link

Hi @macieyng. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@macieyng
Copy link
Contributor Author

  1. Sync client
  2. Here are some logs query_data (17).csv

@github-actions github-actions bot added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Sep 25, 2023
@annatisch
Copy link
Member

Ah this is very interesting - thanks for reporting @macieyng!
@swathipil - this is because we are using the built-in log formatting. We may need to "pre format" all the log messages...

It should be a simple update - though it would be a relatively extensive one.

@annatisch
Copy link
Member

While there's probably other ways to resolve this, I think simply changing the log formatting might be the easiest. We will however need to test what this means for people logging with the existing parameters.
@swathipil, @kashifkhan - do you know if we documented the logging parameters in the troubleshooting guide?
I've started a draft PR here:
#32215

@annatisch
Copy link
Member

Actually after some thought and re-reading the error message - this might simply be because we are defaulting amqpSession and amqpLink to None rather than empty str:
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/servicebus/azure-servicebus/azure/servicebus/_pyamqp/_connection.py#L116

So this might be a very simple fix - but needs some further testing.

@swathipil
Copy link
Member

@annatisch - We don't have the logging parameters documented but have an issue here (#32220) to address the troubleshooting documentation during the upcoming MQ.

@macieyng - Thanks for your logs! I see a few errors in there that look like:
Invalid type NoneType for attribute 'order_id' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types and
Invalid type NoneType for attribute '***_id' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types

We're not passing order_id explicitly within the azure-servicebus library anywhere that I know of. Do you know where these logs above could be originating from?

@kashifkhan kashifkhan added the needs-author-feedback More information is needed from author to address the issue. label Sep 26, 2023
@github-actions github-actions bot removed the needs-team-attention This issue needs attention from Azure service team or SDK team label Sep 26, 2023
@github-actions
Copy link

Hi @macieyng. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@macieyng
Copy link
Contributor Author

macieyng commented Sep 27, 2023

Actually after some thought and re-reading the error message - this might simply be because we are defaulting amqpSession and amqpLink to None rather than empty str:

@annatisch

I don't agree with that. This will result in sending empty strings to analytic tools (like Application Insights) which would add more noise and increase size of ingested data (not a lot for a single request, but quite a lot if you have a lot of traffic). If you think that it's crucial for users to know that amqpSession or amqpLink is an empty value, then I would prefer this to by "null" (or any other reserved value) that is also mentioned in the docs so folks would not use it (ie for service bus sessions - I'm assuming amqpSession is the same as service bus session). From my experience, if there's no session then I don't care and probably wouldn't be using this key/attribtue/field for searching for stuff in Application Insights.

We're not passing order_id explicitly within the azure-servicebus library anywhere that I know of. Do you know where these logs above could be originating from?

@swathipil yes, this is something we're sending. I was filtering logs with regex, so that's a leftover probably. Not something you should be concerned about.

@github-actions github-actions bot added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Sep 27, 2023
@swathipil
Copy link
Member

swathipil commented Sep 29, 2023

Hi @macieyng -

Having "amqpLink"/"amqpSession" logged as either an empty string or None is useful for debugging scenarios to establish that they haven't been created yet or are being re-created. We're currently looking into whether this error might be an issue with either opentelemetry or the Azure Monitor otel exporter package. Will keep you updated.

I'm assuming amqpSession is the same as service bus session

For clarification, "amqpSession" and ServiceBusSession are two different concepts. A Service Bus session is used for ordered handling/processing of messages related by "session_id". An AMQP Session is a protocol level concept for grouping links that are used for communication, in our case, with the service.

From my experience, if there's no session then I don't care and probably wouldn't be using this key/attribtue/field for searching for stuff in Application Insights.

For your purposes of tracking a Service Bus session, you should not be tracking the "amqpSession" logging parameter. The "amqpLink/Session/Connection" are for advanced usage & debugging. Thanks.

@swathipil
Copy link
Member

After digging through and chatting with @pvaneck, it looks like this may be an issue with one of the opentelemetry packages. @macieyng - Are you using the AzureMonitorLogExporter?

@swathipil
Copy link
Member

swathipil commented Oct 2, 2023

@lzchen @jeremydvoss - Would you be able to take a look and see if this could be an issue with the azure-monitor-opentelemetry-exporter package?

Currently, we're passing in "amqpLink"/"amqpSession"=None as extra arguments when logging in the azure-servicebus package. However, we don't add these attributes to the tracing/spans in the azure-servicebus package anywhere. So, we're thinking this could be an issue with one of the other packages not handling attributes with None values in the tracing.

@lzchen
Copy link
Member

lzchen commented Oct 2, 2023

This error is generated by the upstream opentelemetry api library. This is because None attribute values are actually invalid according to the spec. @swathipil azure-servicebus may not be adding attributes to the log record directly but the opentelemetry python logging sdk adds entries in the extra argument as attributes directly on the LogRecord. This means that any values defined in extra will be subject to OpenTelemetry attribute validations. That is probably what is causing the error, and because @macieyng is most likely using the AzureMonitorLogExporter, they are capturing these errorneous log messages and they get sent to their app insights resource. There are a couple of things we can do here:

  1. Enforce OT attribute validation on all azure-sdk libraries that log messages (or at least for now, azure-servicebus
  2. Go to OpenTelemetry Python and request for configuration of what fields of a LogRecord can be omitted from becoming opentelemetry attributes
  3. (Advised to do anyway) Don't add the LoggingHandler to your root logger as this will cause telemetry collection for ALL Python loggers in your application. This will be an issue regardless of the original problem that was reported. Instead, use namespaced loggers specifically for your application and add the handler to those so you only collect and export logs related to your application.

@swathipil
Copy link
Member

Thanks for the info @lzchen!
Quick clarification on 1: How will the attribute validation be enforced for the azure-sdk libraries? Will this be through logging an error or just a written guideline somewhere? IOW, how will this validation be surfaced?

@lzchen
Copy link
Member

lzchen commented Oct 3, 2023

@swathipil

The only validation from the opentelemetry python sdk that is surfaced will be warning logs that you see above (they should never throw an exception). OpenTelemetry api uses _clean_attributes to validate OpenTelemetry attributes from a dictionary. Technically you can use that for any additional fields (specifically extra) that you place on LogRecord but I think for this specific case maybe you just need to check for None values and default them to something appropriate. As well, additionally I will to bring this up in upstream OpenTelemetry to perform validation of attributes on the additional fields as well since we don't do that today (this is a bug). The validation will still need to be performed in azure-sdk however because upstream will eventually be changed to drop invalid attributes (instead of spitting out warnings like above).

@macieyng
Copy link
Contributor Author

macieyng commented Oct 3, 2023

@swathipil

Are you using the AzureMonitorLogExporter?

Yes, we're using AzureMonitorLogExporter.

@lzchen

Don't add the LoggingHandler to your root logger as this will cause telemetry collection for ALL Python loggers in your application. This will be an issue regardless of the original problem that was reported. Instead, use namespaced loggers specifically for your application and add the handler to those so you only collect and export logs related to your application.

This is smart and when you said it it seems obvious! Should be a part of guidelines, recommendation or docs. And actually kind of aligns with the fact that we only recently have introduced our own namespace.

But then, how I would be able to tell if some of my application issues are not related to other packages I'm using. If I attach to root logger then I have a big picture of the situation when error occur in one place (UI).

Listing all packages I have in dictConfig doesn't make sense. Picking specific packages - that's still few of them, but doable.

upstream will eventually be changed to drop invalid attributes

I do agree with that!

@lzchen
Copy link
Member

lzchen commented Oct 6, 2023

@macieyng

But then, how I would be able to tell if some of my application issues are not related to other packages I'm using. If I attach to root logger then I have a big picture of the situation when error occur in one place (UI).

That is a good question. Ideally we have some way to suppress logging calls made with our exporter (this would be a feature in upstream opentelemetry). If you think this would be a useful feature, you could make a feature request to configure to ignore loggers in a logger tree that a handler is added to.

Depending on your project, you can probably rearrange your folder structure to take advantage of python logger's hierarchy structure with namespace to capture only the logs you want.

@swathipil
Copy link
Member

Hi @macieyng - Sorry for the delay on this. The fix will be included in the upcoming release. Thanks!

@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Messaging Messaging crew needs-team-attention This issue needs attention from Azure service team or SDK team Service Bus
Projects
None yet
6 participants