RFC: Support for external observability providers - Tracer #2030

Open
Vandita2020 opened this issue Mar 21, 2023 · 10 comments
Labels: RFC, tracer (Tracer utility)

Comments
@Vandita2020 (Contributor) commented Mar 21, 2023

Is this related to an existing feature request or issue?

Issue: #1433
Logger RFC: #2014
Metrics RFC: #2015

Which AWS Lambda Powertools utility does this relate to?

Tracer

Summary

This RFC is one of three that define the format for setting up loggers, metrics, and traces for better integration with other observability providers.

This RFC is specifically for the Tracer. Currently, we have an undocumented BaseProvider for Tracer, but we need to decide what minimum features the BaseProvider should support. The RFC discusses the features that could be part of a custom tracer so users can integrate other observability providers easily.

Use case

The use case for this utility is developers who want to trace their application with an observability provider other than AWS X-Ray.

Proposal

Current tracer experience

The Powertools Tracer utility is essentially a wrapper around the AWS X-Ray SDK. Some key features include automatically capturing cold starts as an annotation, automatically capturing responses or full exceptions as metadata, and auto-disabling when not running in an AWS Lambda environment. Tracer also auto-patches the modules supported by AWS X-Ray.

from aws_lambda_powertools import Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext

# Current experience in using the tracer should not change
tracer = Tracer(service="ServerlessAirline")

def collect_payment(charge_id: str) -> str:
    return f"dummy payment collected for charge: {charge_id}"

@tracer.capture_lambda_handler
def handler(event: dict, context: LambdaContext) -> str:
    charge_id = event.get("charge_id", "")
    return collect_payment(charge_id=charge_id)
JSON output
{
    "trace_id": "1-5e367daf-6c7f6d9f6c3a6e5800c7d42d",
    "id": "e986a861d4590d97",
    "name": "payment",
    "start_time": 1580441546.023,
    "end_time": 1580441552.983,
    "http": {
        "request": {
            "method": "GET",
            "url": "https://api.example.com/",
            "client_ip": "192.168.1.1",
            "user_agent": "Mozilla/5.0",
        },
        "response": {
            "status": 200,
            "content_length": 1024,
            "headers": {
                "Content-Type": "application/json"
            }
        }
    },
    "subsegments": [
        {
            "id": "3b3b3d8ba74fa7fe",
            "name": "my-subsegment",
            "start_time": 1580441548.023,
            "end_time": 1580441551.983,
            "http": {
                "request": {
                    "method": "POST",
                    "url": "https://api.example.com/submit",
                    "headers": {
                        "Content-Type": "application/json",
                        "Authorization": "Bearer abc123"
                    },
                    "body": "{\"data\": \"example\"}"
                },
                "response": {
                    "status": 200,
                    "content_length": 128,
                    "headers": {
                        "Content-Type": "application/json"
                    }
                }
            },
            "annotations": {
                "example": "annotation"
            }
        }
    ],
    "annotations": {
        "example": "annotation"
    },
    "metadata": {
        "example": "metadata"
    }
}

Tracer proposal

We propose adding a new parameter to the existing Tracer utility that developers can use to specify which observability provider they would like their traces to be pushed to. The code snippet below is a rudimentary look at how this utility can be used and how it will function. Out of the box, we will support Datadog; other providers are TBD.

from aws_lambda_powertools import Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext

tracer = Tracer(service="ServerlessAirline", format=Tracer.DATADOG)

def collect_payment(charge_id: str) -> str:
    return f"dummy payment collected for charge: {charge_id}"

@tracer.capture_lambda_handler
def handler(event: dict, context: LambdaContext) -> str:
    charge_id = event.get("charge_id", "")
    return collect_payment(charge_id=charge_id)
JSON output
{
   "trace_id": "3541457326329954564",
   "span_id": "467508042476235233",
   "parent_id": "3541457326329954564",
   "name": "payment",
   "resource": "GET /api",
   "start": 1647370203.4475,
   "duration": 0.0325,
   "service": "serverlessAirline",
   "type": "web",
   "meta": {
       "http": {
           "method": "GET",
           "url": "http://localhost:8000/api",
           "status_code": 200
       }
   }
}
Bring your own provider

For customers who would like to use an observability provider not supported out of the box, or to define their own tracer functions, we will define an interface they can implement and pass in to the Tracer class.

classDiagram
    class BaseProvider {
        +start_span() -> Span
        +end_span() -> Span
        +put_annotation(key: str, value: Union[str, numbers.Number, bool]) -> None
        +put_metadata(key: str, value: Any, namespace: str = "default") -> None
    }
    class CustomTracerProvider {
        +start_span() -> Span
        +end_span() -> Span
        +put_annotation(key: str, value: Union[str, numbers.Number, bool]) -> None
        +put_metadata(key: str, value: Any, namespace: str = "default") -> None
    }
    BaseProvider <|-- CustomTracerProvider

Example

from contextlib import contextmanager
from typing import Any, Callable, List, Optional, Union
import asyncio
import numbers
import threading

from aws_lambda_powertools.tracing.tracer import Tracer
from aws_lambda_powertools.tracing.base import BaseProvider
from aws_lambda_powertools.utilities.typing import LambdaContext


class CustomTracerProvider(BaseProvider):
    _thread_local = threading.local()

    @contextmanager
    def trace_context(self):
        # When we enter this context, we start a new span and store it in thread-local storage
        if not hasattr(self._thread_local, "span"):
            self._thread_local.span = self.start_span()
        try:
            yield
        finally:
            self.end_span(self._thread_local.span)

    @contextmanager
    def trace(self, span: "Span", parent_context: Optional[Any] = None):
        # When we enter this context, we start a child span with the given parent context
        try:
            self.start_span(span, parent_context=parent_context)
            yield
        finally:
            self.end_span(span)

    def start_span(self, span: Optional["Span"] = None, parent_context: Optional[Any] = None) -> "Span":
        """This method is proposed as a solution as it exists for other providers.
        It is responsible for starting a span. This might involve initializing some data structures,
        connecting to an external service, or performing some other setup work."""
        return span

    def end_span(self, span: "Span") -> None:
        """This method is proposed as a solution as it exists for other providers.
        It is responsible for ending a span. This might involve finalizing data structures,
        sending data to an external service, or performing some other cleanup work."""

    def put_annotation(self, key: str, value: Union[str, numbers.Number, bool]) -> None:
        """Annotate the current active trace entity with a key-value pair."""

    def put_metadata(self, key: str, value: Any, namespace: str = "default") -> None:
        """Add metadata to the current active trace entity."""

    def add_exception(self, exception: BaseException) -> None:
        """Add an exception to trace entities."""

    def ignore_endpoint(self, hostname: Optional[str] = None, urls: Optional[List[str]] = None) -> None:
        """Ignore endpoints whose requests should not be traced,
        perhaps due to the volume of calls or sensitive URLs."""

    def inject_context(self, context: Any) -> None:
        """Inject missing context/information, such as the service name."""

    def capture_method_async(
        self,
        method: Callable,
        capture_response: Optional[Union[bool, str]] = None,
        capture_error: Optional[Union[bool, str]] = None,
    ):
        """Capture an async method."""


tracer = Tracer(service="ServerlessAirline", provider=CustomTracerProvider())

@tracer.capture_method_async
async def collect_payment_async(charge_id: str) -> str:
    tracer.put_annotation(key="PaymentId", value=charge_id)
    await asyncio.sleep(0.5)
    return f"dummy payment collected for charge: {charge_id}"
    
@tracer.capture_method
def collect_payment(charge_id: str) -> str:
    return f"dummy payment collected for charge: {charge_id}"

@tracer.capture_lambda_handler   
def handler(event: dict, context: LambdaContext) -> str:
    charge_id = event.get("charge_id", "")
    
    with tracer.provider.trace(span="charge"):
        return collect_payment(charge_id=charge_id)

The methods defined above are a combination of methods that already exist in the BaseProvider and ones that are most common in other observability providers.

The current BaseProvider supports most of the features used by the major observability providers. There are a couple of differences I noticed while researching other observability providers.

  1. There is a difference in nomenclature used to describe the data received from services: Powertools calls them segments, whereas other observability providers call them spans.
  2. Observability providers like Datadog, Lumigo, and New Relic provide an option to start and end tracing through their start_span and end_span methods, whereas Powertools has no such methods. The likely reason is that AWS X-Ray already keeps track of how the request flows within the application from the start, and where it doesn’t support a service, it offers the option to add a subsegment to keep track of it. For most of the other providers, we need to explicitly start and end tracing instead. We will add those methods so people can use them when working with providers that require this (see the sketch below).
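
To make the difference concrete, here is a minimal sketch contrasting today's X-Ray-backed escape hatch with the proposed explicit span control; start_span/end_span and the Span object returned are the proposed additions, not existing Powertools APIs:

from aws_lambda_powertools import Tracer

tracer = Tracer(service="ServerlessAirline")

# Today: X-Ray already tracks the request end to end, so extra work is traced
# by opening a subsegment where needed (existing escape hatch).
with tracer.provider.in_subsegment(name="## collect_payment") as subsegment:
    subsegment.put_annotation(key="PaymentId", value="charge-123")

# Proposed: providers such as Datadog, Lumigo, or New Relic expect tracing to be
# started and ended explicitly, hence the new start_span/end_span contract.
span = tracer.provider.start_span(name="collect_payment")  # proposed method
try:
    span.put_annotation(key="PaymentId", value="charge-123")
finally:
    tracer.provider.end_span(span)  # proposed method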

Out of scope

Sending traces from Powertools to the customer's desired observability platform will not be in the scope of this project. The implementation should only support modifying the output of the Tracer so that the customer can push them to their platform of choice.

Potential challenges

We need to determine which platforms we want to support out-of-the-box (apart from Datadog).

Dependencies and Integrations

We will have to integrate with (and thus, have a dependency on) Datadog and any other platforms we decide to support out-of-the-box.

Alternative solutions

No response

Acknowledgment

Vandita2020 added the RFC and triage (Pending triage from maintainers) labels Mar 21, 2023
@boring-cyborg (bot) commented Mar 21, 2023

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link

@heitorlessa (Contributor) commented:

Got dragged into meetings and will reply to you properly tomorrow. It'll be along the lines of what I posted on Metrics: #2015 (comment)

TracerProvider is good, but format will confuse customers.

heitorlessa removed the triage (Pending triage from maintainers) label Mar 24, 2023
heitorlessa self-assigned this Mar 24, 2023
@Vandita2020 (Contributor, Author) commented:

Thank you @heitorlessa

TracerProvider is good, but format will confuse customers.

I see how it can be a bit confusing to customers. I have revised the RFC and tried to make it simple by removing the format part while keeping the BaseProvider as a parent class of CustomTracerProvider.

heitorlessa added the tracer (Tracer utility) label Mar 31, 2023
@heitorlessa (Contributor) commented:

I underestimated how much feedback I needed to write for the Metrics RFC, so I ran out of time (apologies!) - I've blocked time on Monday afternoon (morning PST) to go through this.

At a quick glance, the piece I'm missing in the contract is patching - a custom provider will have to own the responsibility to patch one or all supported libraries.
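
For illustration, a rough sketch of what such a patching hook in the contract could look like; the patch method name and signature here are assumptions for this sketch, not part of the current BaseProvider:

from typing import List, Optional

from aws_lambda_powertools.tracing.base import BaseProvider


class CustomTracerProvider(BaseProvider):
    def patch(self, modules: Optional[List[str]] = None) -> None:
        """Patch all supported libraries, or only those listed in `modules`.

        A custom provider owns this responsibility, typically by delegating to its
        own SDK's instrumentation (auto-instrumentation, monkey patching, etc.).
        """
        ...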

@heitorlessa (Contributor) commented:

hey @Vandita2020 that's a great start!! I took the liberty to address some low-hanging fruit and made a list of suggestions similar to the one I made for Metrics.

There are some minor changes, like using the span terminology instead of the X-Ray-specific segment to ease authoring custom providers. Two major ones are deciding that we need a Span representation, and what strategies we should consider to support threading without leaking too much.


Changes

  • Enabled syntax highlighting
  • Fixed Tracer initialization to use Python Tracer instead of TypeScript Tracer (e.g., Tracer(service=""))
  • Split JSON output to enable syntax highlighting and ease reading
  • Renamed Custom tracer usage to Bring your own provider
  • Added a sample MermaidJS Class Diagram to quickly visualize contract

Asks

  • Update Tracer proposal section to use provider instead of format (leftover)
  • Similar to Metrics, we should receive an instance of a Provider instead of the Provider class. This gives customers control over how to best configure the provider.
  • Add a trace context manager in the BaseProvider signature. This acts as a shortcut so customers can create/close spans within parts of their code instead of the entire function (see the usage sketch after this list).
  • Missing capture_method and capture_lambda_handler
  • Add capture_method_async (something we should move towards given the complexity of handling both under a single method)
  • Switch from segment to span terminology to ease integration with partners, e.g.: start_span. We can then create an X-Ray Provider and use the equivalent methods/terminologies, with the added benefit of demonstrating how other providers could do it.
  • Missing a method to filter/ignore HTTP Endpoints from traces
  • Similar to Metrics, it's missing a mechanism where a custom provider receives contextual information like service name, etc. For example, inject_context.
  • A start_span should return a Span. A Span can start another span (parent/child), end, receive annotations/metadata, etc. This part needs more research. We need to strike a balance of keeping operations simple since the provider does all the hard work of keeping tracing context, asyncio support, etc., and allowing extension due to Liskov substitution principle.
  • Review the use of tracer.start_segment(), tracer.end_segment() in the Bring your own provider section, I suspect you meant something else entirely.
  • Add a section on threading (e.g., what does the Base provider need to implement to support it?)
    • What are the common patterns that we could use to provide a contract to support threading properly? e.g., they all use public methods to pass or receive tracing context (contextvars)
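
To illustrate a few of these asks together, a rough usage sketch, assuming the Tracer accepts a provider instance and the BaseProvider exposes a trace context manager and inject_context; CustomTracerProvider refers to the example class earlier in this RFC, and none of these names are final APIs:

from aws_lambda_powertools import Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext

# Instance, not class: customers control how the provider is configured
provider = CustomTracerProvider()
provider.inject_context({"service": "ServerlessAirline"})  # contextual information

tracer = Tracer(service="ServerlessAirline", provider=provider)


@tracer.capture_method
def collect_payment(charge_id: str) -> str:
    return f"dummy payment collected for charge: {charge_id}"


@tracer.capture_lambda_handler
def handler(event: dict, context: LambdaContext) -> str:
    charge_id = event.get("charge_id", "")
    # Shortcut: create/close a span around part of the code only
    with tracer.provider.trace(span="charge"):
        return collect_payment(charge_id=charge_id)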

@Vandita2020 (Contributor, Author) commented Apr 6, 2023

Hey @heitorlessa,

Thanks so much for the review. I have made the changes in the RFC accordingly; for a couple of the comments, I have provided some context below.

  1. Add capture_method_async (something we should move towards given the complexity of having under a single method

Currently, async methods are handled using @tracer.capture_method, which uses a couple of if-else conditions to check whether the method is sync or async. To simplify the execution of async methods, we will create a new method specifically to handle them.

  2. Review the use of tracer.start_segment(), tracer.end_segment() in the Bring your own provider section, I suspect you meant something else entirely.

I have corrected it. Earlier I was using tracer.start_segment() to start tracing for everything and tracer.end_segment() to end it, but it is clearer to me now: we now use capture_lambda_handler to trace the handler, capture_method to trace any method, and start_span/end_span to trace any particular span.

  3. Add a section on threading (e.g., what does the Base provider need to implement to support it?)

For threading, the concept I used is that the BaseProvider class needs to implement methods to create and manage thread-local storage for each trace. When a new child thread is created, the trace context can be copied into the new thread's local storage, allowing it to continue the trace without interfering with the parent thread's trace. When the thread finishes, the trace context can be removed from thread-local storage. A rough sketch of the idea follows below.
I've implemented a context manager for it now, but I still need to provide an example of how to use it. Before that, I want to make sure this sounds like a good way to support threading.
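
A minimal sketch of that idea, assuming the BaseProvider grows thread-local context helpers; the method names (get_trace_context, set_trace_context, clear_trace_context) are illustrative only:

import threading

from aws_lambda_powertools.tracing.base import BaseProvider


class CustomTracerProvider(BaseProvider):
    _thread_local = threading.local()

    def get_trace_context(self) -> dict:
        """Return the current thread's trace context (e.g., trace id, active span)."""
        return getattr(self._thread_local, "context", {})

    def set_trace_context(self, context: dict) -> None:
        """Attach a copy of a trace context to the current thread."""
        self._thread_local.context = dict(context)

    def clear_trace_context(self) -> None:
        """Drop the trace context once the thread finishes its traced work."""
        self._thread_local.context = {}


def worker(provider: CustomTracerProvider, parent_context: dict) -> None:
    provider.set_trace_context(parent_context)  # continue the parent's trace
    try:
        ...  # traced work in the child thread
    finally:
        provider.clear_trace_context()


# In the parent thread: copy the current context into each child thread, e.g.
# thread = threading.Thread(target=worker, args=(provider, provider.get_trace_context()))
# thread.start()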

@heitorlessa (Contributor) commented:

Great!! As for threading, before you dive into the implementation per se, take a look at how the Datadog, New Relic, and OpenTelemetry tracers handle threading + asyncio.

The reason I ask is that we wouldn't necessarily need to implement thread-local storage or contextvars ourselves, because the Provider would be a wrapper on top of the actual implementation (e.g., Datadog Tracer, OpenTelemetry Tracer, etc.).

What's missing in the RFC is a section comparing how observability providers out there handle threading, and then a call-out on whether we need additional public methods in the BaseProvider that would be implemented by the actual provider, which already handles thread-local storage or contextvars.

As this is a complex topic, to recap: our Tracer Observability Provider is a thin wrapper on top of the actual Observability Provider SDK (e.g., Datadog SDK, OpenTelemetry SDK, New Relic SDK, etc.). This helps customers use the same Powertools DX across providers, and each provider simply implements our interface while bringing its own SDK flavour that already handles this use case.
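
To recap the wrapper idea in code, a hedged sketch of a provider that simply delegates to a vendor SDK; vendor_sdk and its method names (start_active_span, current_span, set_tag) are placeholders, not any real SDK's API:

from contextlib import contextmanager
from typing import Any

from aws_lambda_powertools.tracing.base import BaseProvider


class VendorBackedProvider(BaseProvider):
    """Thin wrapper: Powertools DX on the outside, vendor SDK doing the hard work.

    The SDK (Datadog, OpenTelemetry, New Relic, ...) already handles thread-local
    storage, contextvars, and asyncio, so this provider does not reimplement any of it.
    """

    def __init__(self, vendor_sdk: Any):
        self._sdk = vendor_sdk

    @contextmanager
    def trace(self, name: str):
        # Span lifecycle and context propagation are delegated entirely to the SDK
        with self._sdk.start_active_span(name) as span:
            yield span

    def put_annotation(self, key: str, value: Any) -> None:
        self._sdk.current_span().set_tag(key, value)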

Please do let me know if I can make this any clearer.

Tks a lot!!

@roger-zhangg (Member) commented May 5, 2023

Hi @heitorlessa! I'm taking over this issue and trying to catch up on the current progress. From what I can tell, the current pending item is to investigate "How do OTel, New Relic, and Datadog handle threading in tracing?" (related to issue #2047). Please confirm, and remind me what else I should take a look at. Thank you!

@heitorlessa (Contributor) commented May 5, 2023 via email

@roger-zhangg (Member) commented:

Hello everyone! I'm updating this issue with some decisions we made regarding the Tracer provider.

Naming

We decided to follow the naming convention of OpenTelemetry tracing. Compared to the current provider:

BaseSegment -> BaseSpan

Current -> Proposed
close -> removed
add_subsegment -> removed
remove_subsegment -> removed
put_annotation -> set_attribute
put_metadata -> set_attribute
record_exception -> add_exception

BaseProvider

Current -> Proposed
in_subsegment -> trace
in_subsegment_async -> trace_async
put_annotation -> set_attribute
put_metadata -> set_attribute

signature of set_attribute

The currently proposed signature:

@abc.abstractmethod
def set_attribute(self, key: str, value: Any, **kwargs) -> None:
    """Set an attribute for a span with a key-value pair."""
We decided to accept an Any value here; the actual supported data types will be decided by the specific provider.
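
Putting the renames together, an abridged sketch of how the OpenTelemetry-aligned interface might look; exact signatures are still open for discussion:

import abc
from typing import Any


class BaseSpan(abc.ABC):
    """Proposed replacement for BaseSegment, following OpenTelemetry naming."""

    @abc.abstractmethod
    def set_attribute(self, key: str, value: Any, **kwargs) -> None:
        """Set an attribute (annotation or metadata) on this span."""

    @abc.abstractmethod
    def add_exception(self, exception: BaseException, **kwargs) -> None:
        """Record an exception on this span (was record_exception)."""


class BaseProvider(abc.ABC):
    """Proposed provider contract, following OpenTelemetry naming."""

    @abc.abstractmethod
    def trace(self, name: str, **kwargs):
        """Context manager creating and closing a span (was in_subsegment)."""

    @abc.abstractmethod
    def trace_async(self, name: str, **kwargs):
        """Async counterpart of trace (was in_subsegment_async)."""

    @abc.abstractmethod
    def set_attribute(self, key: str, value: Any, **kwargs) -> None:
        """Set an attribute on the current active span."""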

Deprecation of current BaseSegment and BaseProvider

These two classes will remain for backwards compatibility until Powertools V3

docstring change in the current Subsegment.close:

https://github.com/aws-powertools/powertools-lambda-python/pull/2342/files#diff-af72ce002a5d3a9cfd406fddde5c809367a781bae77dad2bef6009d79dde6502R12-R18

We came to the conclusion that using a float in epoch seconds is more appropriate in this case. I also created aws/aws-xray-sdk-python#424 for this in the X-Ray Python SDK, as I believe the docstring here is referencing the X-Ray Python SDK.

Backwards compatibility of X-Ray provider

We support full backwards compatibility except for escape-hatch usage directly on the X-Ray recorder. For example:

tracer = Tracer()

@tracer.provider.capture('subsegment_name')
def myfunc():
    # Do something here

We don't have a capture function in the current BaseProvider, so this behavior will not be supported in the new provider. Existing functions in the current BaseProvider, like in_subsegment, will still be supported.
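
For contrast, a minimal example of escape-hatch usage that goes through a method the current BaseProvider does have (in_subsegment) and is therefore expected to keep working:

from aws_lambda_powertools import Tracer

tracer = Tracer()


def myfunc():
    # in_subsegment exists in the current BaseProvider, so this usage remains supported
    with tracer.provider.in_subsegment(name="## myfunc") as subsegment:
        subsegment.put_annotation(key="Operation", value="myfunc")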
