Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTel Template adds aws.* bc it represents X-Ray expected data #383

Conversation

NathanielRN
Copy link
Contributor

@NathanielRN NathanielRN commented Nov 3, 2021

Description

We discovered that removing the aws.operation attribute on the S3 span resulted in a regression in the OTel Python + X-Ray experience: aws-observability/aws-otel-lambda#165

Because of this, we filed (BLOCKED ON) open-telemetry/opentelemetry-collector-contrib#6109 to make the Collector add these attributes.

This is the discrepancy in X-Ray:

Before (Expected):

{
    ...,
    "Segments": [
        ...,
        ...,
        {
            ...,
            "Document": {
                ...,
                "subsegments": [
                    ...,
                    {
                        "id": "889eb8ec2a65dea8",
                        "name": "S3",
                        ...,
                        "aws": {
                            "xray": {
                                "auto_instrumentation": false,
                                "sdk_version": "1.5.0",
                                "sdk": "opentelemetry for python"
                            },
                            "region": "us-west-2",
                            "operation": "ListBuckets",
                            "request_id": "Q21YRKZZZ5Y3NKN0"
                        },
                        "metadata": {
                            "default": {
                                "aws.service": "s3",
                                "retry_attempts": 0
                            }
                        },
                        "namespace": "aws"
                    }
                ]
            }
        },
        ...,
        {
            "Id": "3b5da1970904eded",
            "Document": {
                "id": "3b5da1970904eded",
                "name": "S3",
                ...,
                "inferred": true,
                ...,
                "aws": {
                    "xray": {
                        "auto_instrumentation": false,
                        "sdk_version": "1.5.0",
                        "sdk": "opentelemetry for python"
                    },
                    "region": "us-west-2",
                    "operation": "ListBuckets",
                    "request_id": "Q21YRKZZZ5Y3NKN0"
                },
                "origin": "AWS::S3"
            }
        }
    ]
}

After (Regression):

{
    ...,
    "Segments": [
        {
            ...,
            "Document": {
                ...,
                "subsegments": [
                    ...,
                    {
                        "id": "93db1e4f7259e9ff",
                        "name": "S3",
                        ...,
                        "aws": {
                            "xray": {
                                "auto_instrumentation": false,
                                "sdk_version": "1.6.2",
                                "sdk": "opentelemetry for python"
                            },
                            "region": "us-west-2",
                            "request_id": "W5YH3M930Q6JKM8S"
                        },
                        "metadata": {
                            "default": {
                                "rpc.service": "S3",
                                "rpc.method": "ListBuckets",
                                "retry_attempts": 0,
                                "rpc.system": "aws-api"
                            }
                        },
                        "namespace": "remote"
                    }
                ]
            }
        },
        ...,
        {
            "Id": "0734305e3d141d9f",
            "Document": {
                "id": "0734305e3d141d9f",
                "name": "S3",
                ...,
                "inferred": true,
                ...,
            }
        },
        ...
    ]
}

So we use the Before to guide our template here.

"metadata": {
"default": {
"aws.service": "s3",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this one is needed? Maybe AWS X-Ray backend uses this to set the AWS::S3 origin?

@anuraaga
Copy link
Contributor

anuraaga commented Nov 4, 2021

Oops I had meant to comment on the test framework PR, not lambda one. But same comment here

aws-observability/aws-otel-lambda#171 (review)

@willarmiros
Copy link
Contributor

Yep +1 to @anuraaga's comment - if we keep the template to fields that we know are important to X-Ray's backend, then we can much more easily enforce the rule of "don't remove stuff from the template unless we're really, really sure it won't cause regressions"

@@ -21,14 +21,23 @@
"status": 200
}
},
"aws": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't service go here, not in metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was surprised about this too, but based on my test with the most recent release layer I don't see it there? Here is a screenshot of the raw data on AWS X-Ray:

image

I put a more concise version of the entire raw data in the PR description (it is using the default Lambda function we provide in the opentelemetry-lambda repo), but here is the entire output if that is convenient 🙂

I guess this means the X-Ray backend must be reading the metadata since the collector is not populating service in the "aws" key?

{
    "Id": "1-6189643c-0a5ed1e06cae8505021d6d20",
    "Duration": 3.315,
    "LimitExceeded": false,
    "Segments": [
        {
            "Id": "51014f1583193090",
            "Document": {
                "id": "51014f1583193090",
                "name": "Python-1-5-API-GateWay-OTel",
                "start_time": 1636394046.2079146,
                "trace_id": "1-6189643c-0a5ed1e06cae8505021d6d20",
                "end_time": 1636394046.9450457,
                "parent_id": "aa80a1f3c893f2d4",
                "fault": false,
                "error": false,
                "throttle": false,
                "aws": {
                    "xray": {
                        "auto_instrumentation": false,
                        "sdk_version": "1.5.0",
                        "sdk": "opentelemetry for python"
                    }
                },
                "metadata": {
                    "default": {
                        "otel.resource.telemetry.sdk.name": "opentelemetry",
                        "faas.id": "arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:Python-1-5-API-GateWay-OTel",
                        "otel.resource.faas.name": "Python-1-5-API-GateWay-OTel",
                        "faas.name": "Python-1-5-API-GateWay-OTel",
                        "faas.version": "$LATEST",
                        "faas.execution": "4cdd4679-1173-47e6-a42c-2b58aa55db7c",
                        "otel.resource.cloud.region": "us-west-2",
                        "otel.resource.service.name": "Python-1-5-API-GateWay-OTel",
                        "otel.resource.telemetry.sdk.language": "python",
                        "otel.resource.cloud.provider": "aws",
                        "otel.resource.faas.version": "$LATEST",
                        "otel.resource.telemetry.sdk.version": "1.5.0"
                    }
                },
                "subsegments": [
                    {
                        "id": "d9f4fddc5bc49889",
                        "name": "HTTP GET",
                        "start_time": 1636394046.2276938,
                        "end_time": 1636394046.4599159,
                        "fault": false,
                        "error": false,
                        "throttle": false,
                        "http": {
                            "request": {
                                "url": "http://httpbin.org/",
                                "method": "GET"
                            },
                            "response": {
                                "status": 200,
                                "content_length": 0
                            }
                        },
                        "aws": {
                            "xray": {
                                "auto_instrumentation": false,
                                "sdk_version": "1.5.0",
                                "sdk": "opentelemetry for python"
                            }
                        },
                        "namespace": "remote"
                    },
                    {
                        "id": "7b60c864be47092d",
                        "name": "S3",
                        "start_time": 1636394046.5654962,
                        "end_time": 1636394046.9063969,
                        "fault": false,
                        "error": false,
                        "throttle": false,
                        "http": {
                            "request": {},
                            "response": {
                                "status": 200,
                                "content_length": 0
                            }
                        },
                        "aws": {
                            "xray": {
                                "auto_instrumentation": false,
                                "sdk_version": "1.5.0",
                                "sdk": "opentelemetry for python"
                            },
                            "region": "us-west-2",
                            "operation": "ListBuckets",
                            "request_id": "QGNTZRZRSTMNJ7WN"
                        },
                        "metadata": {
                            "default": {
                                "aws.service": "s3",
                                "retry_attempts": 0
                            }
                        },
                        "namespace": "aws"
                    }
                ]
            }
        },
        {
            "Id": "210945ae60512faf",
            "Document": {
                "id": "210945ae60512faf",
                "name": "Python-1-5-API-GateWay-OTel",
                "start_time": 1636394044.454,
                "trace_id": "1-6189643c-0a5ed1e06cae8505021d6d20",
                "end_time": 1636394047.769,
                "http": {
                    "response": {
                        "status": 200
                    }
                },
                "aws": {
                    "request_id": "4cdd4679-1173-47e6-a42c-2b58aa55db7c"
                },
                "origin": "AWS::Lambda",
                "resource_arn": "arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:Python-1-5-API-GateWay-OTel"
            }
        },
        {
            "Id": "6d190c52179b6225",
            "Document": {
                "id": "6d190c52179b6225",
                "name": "Python-1-5-API-GateWay-OTel",
                "start_time": 1636394046.206664,
                "trace_id": "1-6189643c-0a5ed1e06cae8505021d6d20",
                "end_time": 1636394047.7654328,
                "parent_id": "210945ae60512faf",
                "aws": {
                    "account_id": "<ACCOUNT_ID>",
                    "function_arn": "arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:Python-1-5-API-GateWay-OTel",
                    "resource_names": [
                        "Python-1-5-API-GateWay-OTel"
                    ]
                },
                "origin": "AWS::Lambda::Function",
                "subsegments": [
                    {
                        "id": "fe2b475aacb3f09f",
                        "name": "Initialization",
                        "start_time": 1636394044.5885403,
                        "end_time": 1636394046.2052553,
                        "aws": {
                            "function_arn": "arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:Python-1-5-API-GateWay-OTel"
                        }
                    },
                    {
                        "id": "aa80a1f3c893f2d4",
                        "name": "Invocation",
                        "start_time": 1636394046.2069714,
                        "end_time": 1636394047.7519643,
                        "aws": {
                            "function_arn": "arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:Python-1-5-API-GateWay-OTel"
                        }
                    },
                    {
                        "id": "c3650172c503402d",
                        "name": "Overhead",
                        "start_time": 1636394047.7520638,
                        "end_time": 1636394047.7653346,
                        "aws": {
                            "function_arn": "arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:Python-1-5-API-GateWay-OTel"
                        }
                    }
                ]
            }
        },
        {
            "Id": "2221f6e3138a697f",
            "Document": {
                "id": "2221f6e3138a697f",
                "name": "HTTP GET",
                "start_time": 1636394046.2276938,
                "trace_id": "1-6189643c-0a5ed1e06cae8505021d6d20",
                "end_time": 1636394046.4599159,
                "parent_id": "d9f4fddc5bc49889",
                "inferred": true,
                "http": {
                    "request": {
                        "url": "http://httpbin.org/",
                        "method": "GET"
                    },
                    "response": {
                        "status": 200,
                        "content_length": 0
                    }
                }
            }
        },
        {
            "Id": "043674cb1059683b",
            "Document": {
                "id": "043674cb1059683b",
                "name": "S3",
                "start_time": 1636394046.5654962,
                "trace_id": "1-6189643c-0a5ed1e06cae8505021d6d20",
                "end_time": 1636394046.9063969,
                "parent_id": "7b60c864be47092d",
                "inferred": true,
                "http": {
                    "request": {},
                    "response": {
                        "status": 200,
                        "content_length": 0
                    }
                },
                "aws": {
                    "xray": {
                        "auto_instrumentation": false,
                        "sdk_version": "1.5.0",
                        "sdk": "opentelemetry for python"
                    },
                    "region": "us-west-2",
                    "operation": "ListBuckets",
                    "request_id": "QGNTZRZRSTMNJ7WN"
                },
                "origin": "AWS::S3"
            }
        }
    ]
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh this is unexpected but it looks to be the case, e.g. in the X-Ray SDK for Java we record the aws.operation attribute in the AWS SDK instrumentation but never aws.service, and there is no key for service in our reserved aws attribute keys: https://github.com/aws/aws-xray-sdk-java/blob/c2d101b5c0ea9c087776e1d27c3d264ed5f4ac86/aws-xray-recorder-sdk-aws-sdk/src/main/java/com/amazonaws/xray/handlers/TracingHandler.java#L188

I guess that makes sense because we're recording the service name as the name of the subsegment.

I guess this means the X-Ray backend must be reading the metadata since the collector is not populating service in the "aws" key?

@NathanielRN can you explain what you mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this may just be my incorrect assumption. I was assuming that the "aws" key in the segment document is created by the AWS X-Ray exporter in the ADOT collector. i.e. this part of the "Document":

"aws": {
    "xray": {
        "auto_instrumentation": false,
        "sdk_version": "1.5.0",
        "sdk": "opentelemetry for python"
    }
},

Then, later when AWS X-Ray backend creates the inferred: true span, you see the following updated value for this "aws" key:

"aws": {
    "xray": {
        "auto_instrumentation": false,
        "sdk_version": "1.5.0",
        "sdk": "opentelemetry for python"
    },
    "region": "us-west-2",
    "operation": "ListBuckets",
    "request_id": "QGNTZRZRSTMNJ7WN"
},

So I assumed the X-Ray backend is reading the default.metadata key to be able to set things like "operation": "ListBuckets".

Finally, to update the X-Ray console UI, I guess X-Ray backend reads from aws.operation for operation (as seen above). Because (as we noticed) aws.serivce does not exist, I assumed it reads from default.metadata."aws.service" to determine the "origin": "AWS::S3" and name and other things.

But maybe it does not do that at all and just reads from the segment name. Maybe that's the same conclusion you came to when you said this @willarmiros

I guess that makes sense because we're recording the service name as the name of the subsegment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think generally speaking because metadata is typically customer-defined (except in the case of ADOT which the backend is still not really aware of) the backend rarely if ever attempts to parse the content of the metadata. But hopefully that can change in the future as the backend becomes more aware of the otel semantic conventions/they become more stable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I guess service was leftover from code initially inherited from the opencensus x-ray exporter. In that case we probably don't need to validate that here, and presumably do validate the subsegment name which seems to be what the backend uses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I assumed you meant we don't need to verify that aws.service is set in the metadata.default so I have removed it from this template.

@NathanielRN NathanielRN force-pushed the change-expected-default-otel-sdk-template branch from 7315415 to bbf6ea2 Compare November 9, 2021 23:07
@NathanielRN NathanielRN merged commit a9930f6 into aws-observability:terraform Nov 16, 2021
@NathanielRN NathanielRN deleted the change-expected-default-otel-sdk-template branch November 16, 2021 20:40
@NathanielRN NathanielRN changed the title Default OTel SDK Template expects aws.* regards of spec definition OTel Template adds aws.* bc it represents X-Ray expected data Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants