Tracing Context Preservation (async/await and various of libraries) #60

haotianw465 · 2018-10-19T00:02:54Z

Goal

The goal of this issue is to track all tracing context preservation related problems, dive deep on why some of them work and some of them don't, discuss and gather feedback for a way going forward to provide much better customer experience with X-Ray node tracing and further contribute/shape the node context propagation in general.

Problem Statement

The Node.js event loop model has its downsides. One of those is the context preservation through the call chain within a single http request. It is hard to bind arbitrary data (user name, transaction id, tracing context etc) to a full request/response cycle and all relevent context stops at event loop. It's hard to know what is the context for the next callback being executed by event loop.

There is a library continuation-local-storage (CLS for short) that solved most of the problem by binding request context to callbacks. However, this library has two limitations:

It's both a bug and a feature that CLS doesn't work with all node modules. The full reasoning from CLS author is here. In short it's difficult for CLS to know whether it should bind the context or not all the time.
Node community evolves fast and with the major adoption of async/await syntax from Node 8 CLS doesn't fit some new use cases anymore.

Challenges for X-Ray SDK for Node.js

With all the context preservation issues listed above, they are solvable for most of the node application owners who need to preserve contextual data within requests.

For the first issue, CLS provides various of wrappers with name like cls-* for explicitly binding context to callbacks in a particular module. Here is an example fork for node-redis that patches Redis driver for context preservation.
For the second issue (async/await), cls-hooked are reported to be worked well with Node 8 so far by several SDK users. Zone.js is also a potential candidate for the new dependency of the SDK for context binding.

These workarounds are not ideal but work for each service owner as each service only has limited dependency and the owner knows what dependencies the service depends on. It's inevitable and necessary work for service owners to understand this problem space and make necessary code changes since it is business data (user name, transaction id etc) through a request/response cycle.

But for X-Ray SDK one of the core functionalities is to preserve the tracing context through the async call chain and the SDK should abstract this technical challenge away from users who just want all traced functions work once they import the SDK. This issue #28 has some detailed discussion regarding this concern. There are also limitations and known issues on cls-hooked one of which is it doesn't work with some certain libraries (example) while the X-Ray SDK should try not to have the SDK users to deal with these caveats.

Going Forward

(This section will likely be updated more frequently based on both internal and external discussions and feedback.)

The SDK should re-evaluate CLS as its core dependency and leverage the new tools (cls-hooked or zone.js) for preserving context for most of the major use cases today as well as being flexible enough to be forward compatible with new coming features.

A short-term solution could be to start with an experimental branch that replaces CLS with cls-hooked or Zone.js and thorough testing with existing supported libraries and new use cases we intend to cover. Or an alternative approach is to have something in user code with

var cls = process.env.AWS_XRAY_EXPERIMENTAL_ASYNC_AWAIT ? require('cls-hooked') : require('continuation-local-storage');

as suggested by @nakedible-p but the issue is that it makes the bundle size more bloated when the SDK is a dependency. Additionally, these libraries are not owned by AWS X-Ray and some critical fixes/supports for the library from X-Ray's prospective might not be as critical for those libraries' owners.

Long Term: We will share this portion with broader audience after internal discussions. Any suggestion is welcome.

The text was updated successfully, but these errors were encountered:

rochdev · 2018-11-19T15:29:39Z

We're having a similar conversation in opentracing/opentracing-javascript#112. Since the purpose and the concerns are the same, I'm thinking maybe you would be interested to join the conversation and maybe this can be made as a library that is not specific to either OpenTracing or AWS X-Ray.

Thoughts?

P.S.: there is a know memory leak in async-listener which is used by continuation-local-storage and promise context handling is different between continuation-local-storage and cls-hooked which is described in opentracing/opentracing-javascript#112 (comment). Please be careful if you allow switching between the 2.

chrisradek · 2018-11-21T21:41:43Z

I've created an experimental branch to test out cls-hooks. The branch is currently published to npm using the @experimental prerelease tag. This is to immediately unblock usage of async/await and give us a chance to experiment with other options as well.

dougmoscrop · 2018-11-30T16:24:21Z

hey @chrisradek - do you expect this experimental release to work? I can't seen to pinpoint the issue but subsegments are missing, sometimes they seem to show up a minute or so after the x-ray trace, but other times never.

dougmoscrop · 2018-11-30T16:59:18Z

Edit: rewriting this comment for brevity:

It looks like the last subsegment is not transmitted until the next trace.

Consider the following code:

'use strict';

const AWSXRay = require('aws-xray-sdk');

AWSXRay.capturePromise();

module.exports.handler = async () => {
  for (let i = 0; i < 10; i++) {
    console.log('loop', i);
    await new Promise(resolve => {
      AWSXRay.captureAsyncFunc(`loop ${i}`, async subsegment => {
        await new Promise(resolve => {
          setTimeout(resolve, 100);
        });
        subsegment.close();
        resolve();
      });
    });
  }

  return 'Done';
};

Logs look like this:

First invoke:

2018-11-30 19:03:42.544 +00:00 [DEBUG] Starting the AWS X-Ray SDK in automatic mode (default).
2018-11-30 19:03:42.550 +00:00 [DEBUG] AWS_XRAY_CONTEXT_MISSING is set. Configured context missing strategy to LOG_ERROR.
2018-11-30 19:03:42.681 +00:00 [DEBUG] Subsegment streaming threshold set to: 0
START RequestId: a72f62a2-f4d2-11e8-a842-a79ab7976182 Version: $LATEST
2018-11-30 14:03:42.688 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 0
2018-11-30 19:03:42.689 +00:00 [DEBUG] Lambda trace data found: Root=1-5c01898e-31a001dd2fdc0f7321499b69;Parent=037843747bb819be;Sampled=1
2018-11-30 19:03:42.689 +00:00 [DEBUG] Segment started: {"Root":"1-5c01898e-31a001dd2fdc0f7321499b69","Parent":"037843747bb819be","Sampled":"1"}
2018-11-30 14:03:42.794 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 1
2018-11-30 19:03:42.799 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"e7dfbfcd75436281"}
2018-11-30 19:03:42.799 +00:00 [DEBUG] UDP message sent: {"id":"e7dfbfcd75436281","name":"loop 0","start_time":1543604622.689,"end_time":1543604622.791,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:42.895 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 2
2018-11-30 19:03:42.895 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"21ce9b9d08ce2054"}
2018-11-30 19:03:42.912 +00:00 [DEBUG] UDP message sent: {"id":"21ce9b9d08ce2054","name":"loop 1","start_time":1543604622.794,"end_time":1543604622.894,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:42.995 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 3
2018-11-30 19:03:42.996 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"90cc23cf106b1681"}
2018-11-30 19:03:42.996 +00:00 [DEBUG] UDP message sent: {"id":"90cc23cf106b1681","name":"loop 2","start_time":1543604622.895,"end_time":1543604622.995,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:43.096 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 4
2018-11-30 19:03:43.096 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"fd232f3f6d595171"}
2018-11-30 19:03:43.096 +00:00 [DEBUG] UDP message sent: {"id":"fd232f3f6d595171","name":"loop 3","start_time":1543604622.995,"end_time":1543604623.095,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:43.197 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 5
2018-11-30 19:03:43.197 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"00d7bbc1755fe7d6"}
2018-11-30 19:03:43.197 +00:00 [DEBUG] UDP message sent: {"id":"00d7bbc1755fe7d6","name":"loop 4","start_time":1543604623.096,"end_time":1543604623.197,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:43.298 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 6
2018-11-30 19:03:43.298 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"40e5a8db710501be"}
2018-11-30 19:03:43.299 +00:00 [DEBUG] UDP message sent: {"id":"40e5a8db710501be","name":"loop 5","start_time":1543604623.197,"end_time":1543604623.298,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:43.398 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 7
2018-11-30 19:03:43.399 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"73333e6ea3021d16"}
2018-11-30 19:03:43.399 +00:00 [DEBUG] UDP message sent: {"id":"73333e6ea3021d16","name":"loop 6","start_time":1543604623.298,"end_time":1543604623.398,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:43.499 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 8
2018-11-30 19:03:43.499 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"da8569a027a13dc0"}
2018-11-30 19:03:43.499 +00:00 [DEBUG] UDP message sent: {"id":"da8569a027a13dc0","name":"loop 7","start_time":1543604623.398,"end_time":1543604623.498,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:03:43.600 (-05:00)	a72f62a2-f4d2-11e8-a842-a79ab7976182	loop 9
2018-11-30 19:03:43.600 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"047d2ef42419b69b"}
2018-11-30 19:03:43.600 +00:00 [DEBUG] UDP message sent: {"id":"047d2ef42419b69b","name":"loop 8","start_time":1543604623.499,"end_time":1543604623.6,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
END RequestId: a72f62a2-f4d2-11e8-a842-a79ab7976182
REPORT RequestId: a72f62a2-f4d2-11e8-a842-a79ab7976182	Duration: 1015.84 ms	Billed Duration: 1100 ms 	Memory Size: 1024 MB	Max Memory Used: 44 MB

Second invoke (you can see the first thing it does is send the last subsegment for trace 1-5c01898e-31a001dd2fdc0f7321499b69)

START RequestId: e44c38f6-f4d2-11e8-a610-a561c6876dc2 Version: $LATEST
2018-11-30 19:05:24.610 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c01898e-31a001dd2fdc0f7321499b69","id":"ce80317ec63a16de"}
2018-11-30 19:05:24.610 +00:00 [DEBUG] UDP message sent: {"id":"ce80317ec63a16de","name":"loop 9","start_time":1543604623.6,"end_time":1543604623.701,"type":"subsegment","parent_id":"037843747bb819be","trace_id":"1-5c01898e-31a001dd2fdc0f7321499b69"}
2018-11-30 14:05:24.611 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 0
2018-11-30 19:05:24.611 +00:00 [DEBUG] Lambda trace data found: Root=1-5c0189f4-df208b36e5d0be99515006f5;Parent=7fc7587c19777e0e;Sampled=1
2018-11-30 19:05:24.611 +00:00 [DEBUG] Segment started: {"Root":"1-5c0189f4-df208b36e5d0be99515006f5","Parent":"7fc7587c19777e0e","Sampled":"1"}
2018-11-30 14:05:24.712 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 1
2018-11-30 19:05:24.712 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"592f5f95b4502724"}
2018-11-30 19:05:24.712 +00:00 [DEBUG] UDP message sent: {"id":"592f5f95b4502724","name":"loop 0","start_time":1543604724.611,"end_time":1543604724.712,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:24.813 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 2
2018-11-30 19:05:24.813 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"f34c185682339d91"}
2018-11-30 19:05:24.814 +00:00 [DEBUG] UDP message sent: {"id":"f34c185682339d91","name":"loop 1","start_time":1543604724.712,"end_time":1543604724.813,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:24.913 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 3
2018-11-30 19:05:24.914 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"8bf5b42ee0883e35"}
2018-11-30 19:05:24.914 +00:00 [DEBUG] UDP message sent: {"id":"8bf5b42ee0883e35","name":"loop 2","start_time":1543604724.813,"end_time":1543604724.913,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:25.014 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 4
2018-11-30 19:05:25.014 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"f3e54a6646c673bf"}
2018-11-30 19:05:25.014 +00:00 [DEBUG] UDP message sent: {"id":"f3e54a6646c673bf","name":"loop 3","start_time":1543604724.913,"end_time":1543604725.013,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:25.115 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 5
2018-11-30 19:05:25.115 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"d723888ad9e92f67"}
2018-11-30 19:05:25.116 +00:00 [DEBUG] UDP message sent: {"id":"d723888ad9e92f67","name":"loop 4","start_time":1543604725.014,"end_time":1543604725.115,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:25.215 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 6
2018-11-30 19:05:25.216 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"ff429de06db0f5c2"}
2018-11-30 19:05:25.216 +00:00 [DEBUG] UDP message sent: {"id":"ff429de06db0f5c2","name":"loop 5","start_time":1543604725.115,"end_time":1543604725.215,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:25.315 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 7
2018-11-30 19:05:25.316 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"6f36294a32882eaf"}
2018-11-30 19:05:25.316 +00:00 [DEBUG] UDP message sent: {"id":"6f36294a32882eaf","name":"loop 6","start_time":1543604725.215,"end_time":1543604725.315,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:25.417 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 8
2018-11-30 19:05:25.417 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"1bf01e6cd64a2193"}
2018-11-30 19:05:25.417 +00:00 [DEBUG] UDP message sent: {"id":"1bf01e6cd64a2193","name":"loop 7","start_time":1543604725.316,"end_time":1543604725.417,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
2018-11-30 14:05:25.517 (-05:00)	e44c38f6-f4d2-11e8-a610-a561c6876dc2	loop 9
2018-11-30 19:05:25.518 +00:00 [DEBUG] Subsegment sent: {"trace_id:"1-5c0189f4-df208b36e5d0be99515006f5","id":"26ecc12d7a9476bf"}
2018-11-30 19:05:25.518 +00:00 [DEBUG] UDP message sent: {"id":"26ecc12d7a9476bf","name":"loop 8","start_time":1543604725.417,"end_time":1543604725.517,"type":"subsegment","parent_id":"7fc7587c19777e0e","trace_id":"1-5c0189f4-df208b36e5d0be99515006f5"}
END RequestId: e44c38f6-f4d2-11e8-a610-a561c6876dc2
REPORT RequestId: e44c38f6-f4d2-11e8-a610-a561c6876dc2	Duration: 1015.74 ms	Billed Duration: 1100 ms 	Memory Size: 1024 MB	Max Memory Used: 44 MB

chrisradek · 2018-12-03T17:39:51Z

@dougmoscrop
Thanks for reporting your findings on using the experimental branch. I'll take a look at your test case and investigate why the subsegment isn't being sent when each Lambda invocation ends.

By chance, have you seen this issue happening outside of Lambda functions?

dougmoscrop · 2018-12-03T17:55:19Z

Sorry, I don't have any non-Lambda based use of X-Ray.

I think if anything this can make it appear like the experimental branch isn't working although It seems a separate issue from actual async/await support.

jboehle · 2019-01-23T16:12:45Z

Any update on this? AWS Lambda has had a Node.js 8 runtime since 4/2/18. When is this SDK going to have a stable release that supports async/await?

dougmoscrop · 2019-01-23T16:14:50Z

I haven't had time to really dive in to it, but after deploying the experimental branch, our traces still lack hierarchy too. It looked like it was working but wasn't. I realize this is a crappy/unhelpfully vague comment, but yeah this incompatibility between AWS services is being woefully under-addressed :(

chrisradek · 2019-01-23T17:42:37Z

@dougmoscrop
Can you share an example of what your traces look like that lack the hierarchy you'd expect?

dougmoscrop · 2019-01-23T18:15:24Z

Sure, here's an example:

This is a Lambda processing a Kinesis stream. It looks up a "pointer" from Dynamo to the current S3 key then it updates that object with the new records, saves it as a new S3 object, then moves the Dynamo pointer to the new S3 key (key is a uuid), then deletes the old one.

So it's Dynamo.Get -> S3.Get -> (business logic) -> S3.Put -> Dynamo.Update -> S3.Delete

We open a custom segment for this workflow called Context View. But most of the S3/Dynamo operations do not appear as children of it. You can see that they 'belong' to the Context View because it spans the entire time of all the operations. However, the records in our stream are 'grouped' in to separate Context Views depending on some property; so while the attached image shows only one, there could be a dozen or a hundred depending on what's in the Kinesis stream. When this happens, it's much less clear which sub-segments correlate to which job.

What I would expect is that simply all the S3 and Dynamo calls you see in the above screenshot show up as children of Context View.

Edit: Oh and then there are times where there's a parent-child hierarchy that is incorrect, such as:

PutObject happens entirely after GetObject, not within it as a call. The gap you see is our "business logic" as I mentioned above, so its like this:

await s3.get(...)
await businessLogic(...)
await s3.put(...)

But it shows up as if s3.get was calling s3.put

Stuff like this makes it really hard to know what to believe, like seeing S3 taking 100+ms to get an object that is probably a kilobyte at most; we checked with AWS support and S3 logs show response times more like 13ms. It could totally be a bug in the SDK or the way we're calling it, but it makes us wonder if it's just doing accounting wrong. For that issue specifically, we can start instrumenting our code ourselves, but that kinda compromises the X-Ray value prop.

styfle · 2019-02-01T21:03:01Z

Hello 👋

I want to be able to log the time it takes for an async function to return.

I tried the latest version of aws-xray-sdk available on npm (2.2.0) as well as the experimental version (2.1.0-experimental.1) and both of these resulted in the same error.

2019-02-01 20:34:44.496 +00:00 [ERROR] Error: Failed to get the current sub/segment from the context.
at Object.contextMissingLogError [as contextMissing] (/usr/src/app/node_modules/aws-xray-sdk-core/lib/context_utils.js:26:19)
at Object.getSegment (/usr/src/app/node_modules/aws-xray-sdk-core/lib/context_utils.js:92:45)
at Object.resolveSegment (/usr/src/app/node_modules/aws-xray-sdk-core/lib/context_utils.js:73:19)
at Object.captureAsyncFunc (/usr/src/app/node_modules/aws-xray-sdk-core/lib/capture.js:55:32)

Here's my code

const AWSXRay = require('aws-xray-sdk');
AWSXRay.captureAWS(require('aws-sdk'));
AWSXRay.middleware.enableDynamicNaming();

// This sleep function is to demonstrate an async function
function sleep(ms) {
    return new Promise((resolve, reject) => {
        setTimeout(resolve, ms);
    });
}

function captureAsync(func, args) {
	return new Promise((resolve, reject) => {
		AWSXRay.captureAsyncFunc(func.name, (seg) => {
			// The seg is `undefined` here
			func.apply(this, args)
				.then((result) => {
					seg.close();
					resolve(result);
				})
				.catch((error) => {
					seg.close(error);
					reject(error);
				});
		});
	});
}

const trace = (handler) => async (req, res) => {
	const open = AWSXRay.express.openSegment('some-name');
	if (!res.req) {
		// This was necessary for another error
		res.req = req;
	}
	open(req, res);
    // Throws in `captureAsync`
	await captureAsync(sleep, [1000]);
	const handlerResponse = await handler(req, res);
	const close = AWSXRay.express.closeSegment();
	close(req, res);
	return handlerResponse;
}

module.exports = { trace };

Am I using this API incorrectly or is this the same bug others are experiencing?

chrisradek · 2019-02-01T21:14:46Z

@styfle
Are you running this example in Lambda or another environment?

styfle · 2019-02-01T21:24:34Z

@chrisradek No lambda. This is running on my own machine in a Docker container.

chrisradek · 2019-02-01T23:01:23Z

@styfle
This looks like an issue with the way you've added the x-ray middleware to your express app, combined with using async/await. You're right in that you would want to use the experimental release since you're using async/await.

I roughly converted your example to the following:

const trace = require('./trace').trace;
const AWSXRay = require('aws-xray-sdk');
const express = require('express');

const app = express();

const handle = trace(console.log);
const open = AWSXRay.express.openSegment('some-name');
const close = AWSXRay.express.closeSegment();
app.use(open);
app.get('/', handle);
app.use(close);
app.listen(3001);

I also removed the open/close segment logic from your trace handler:

const trace = (handler) => async (req, res) => {
	if (!res.req) {
		// This was necessary for another error
		res.req = req;
	}
	await captureAsync(sleep, [1000]);
	const handlerResponse = await handler(req, res);

	return handlerResponse;
}

It looks like we currently require the next function to be passed to the openSegment handler, which express will handle for us. I'm not sure what would happen if you also used multiple async middleware. Does the above change workaround the issue you're seeing?

This issue might be specific to our express middleware if it doesn't handle multiple async middlewares in the chain, so we should open another issue to track that if this is the case.

styfle · 2019-02-01T23:06:34Z

@chrisradek Thanks for the response! 😃

I'm not using express, I'm using micro which looks a lot like the Lambda API.

Maybe the missing next function is my problem?

chrisradek · 2019-02-01T23:56:41Z

@styfle
Ah, the middleware you were using is meant for express, so I'm not sure it will work with micro. You might be able to get it to 'work' by passing the rest of your function as a callback to the open method you've defined, and calling close inside that as well.

You can also instrument your code manually. Here's an example of what that could look like, though I haven't used micro so not sure if there are any problems with this approach:

const AWSXRay = require('aws-xray-sdk');

async function example(foo, bar) {
  const namespace = AWSXRay.getNamespace();
  const segment = new AWSXRay.Segment('some-name');
  return await namespace.runAndReturn(async () => {
    AWSXRay.setSegment(segment);

    await captureAsync(sleep, [1000]);
    await sleep(100);

    segment.close();
    return 'done';
  });
}

// This sleep function is to demonstrate an async function
function sleep(ms) {
  return new Promise((resolve, reject) => {
      setTimeout(resolve, ms);
  });
}

function captureAsync(func, args) {
  return new Promise((resolve, reject) => {
    AWSXRay.captureAsyncFunc(func.name, (seg) => {
      // The seg is `undefined` here
      func.apply(this, args)
        .then((result) => {
          seg.close();
          resolve(result);
        })
        .catch((error) => {
          seg.close(error);
          reject(error);
        });
    });
  });
}

example('a', 'b');

This example creates a segment everytime example (the handler) is called, and closes it when it is done.

styfle · 2019-02-02T00:19:09Z

@chrisradek You are amazing!!!!! 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

I spent hours trying to figure this one out and your solution (with a minor modification) worked 😄

Now the only thing missing is the URL disappeared from the console 🤔

styfle · 2019-02-04T16:41:57Z

@chrisradek On second thought, there are still problems when calling capture more than once. You can use this repository to reproduce the issue https://github.com/styfle/aws-xray-bug

Would you like me to create a new issue here or is this thread fine?

chrisradek · 2019-02-04T16:45:55Z

@styfle
Would you mind creating a new thread for this issue?

dougmoscrop · 2019-02-05T15:43:39Z

Hey @chrisradek - would it be helpful if we opened an issue with our enterprise support? Or is this thread enough?

okonon · 2019-04-06T04:25:59Z

Also interested in async/await support. More and more of modern apps are switching to this pattern and reducing X-Ray's value proposition.

awssandra · 2019-04-09T18:51:53Z

@okonon Agreed. Proper context propagation is still a difficult problem to solve in the JS world. The discussion is also present on the OpenTracing github page, and it seems zone.js will not fix the async/await issue.

A experimental branch is available, however, cls-hooked still has known issues, and does not appear to be in active development at this time.
For further reading: opentracing/opentracing-javascript#112

alvis · 2019-04-12T12:16:09Z

I have the same issue as @dougmoscrop that the last subsegment is not transmitted until the next trace.

I tried both the ordinary and the experimental releases, and both of them have the same issue. My code is deployed as a lambda function, using the serverless framework for packaging and deployment.

awssandra · 2019-04-23T20:02:45Z

@alvis

Sorry for the issues, this is a known issue with the Lambda Async handler. We are working internally to have it fixed.

joebowbeer · 2019-05-28T01:35:40Z

Now that DataDog, Honeycomb, NewRelic, Stackdriver, and even Bluebird are all leveraging async_hooks, is it time to mainline the experimental branch?

akleiber · 2019-07-01T00:39:27Z

@awssandra any update on the async problem with not sending the last segment?
I tried to workaround with sending an extra segment "workaround" but I failed. The only way the last segment is sent, is if the lambda if the same container is called again.

:-( x-ray is amazing but is not useable this way in production environments

okonon · 2019-08-22T12:19:20Z

Also interested in the update regarding this issue

keyvhinng · 2020-02-05T15:41:00Z

I wonder what's the current status of this. I'm very interested in this issue #4 (Koa support)

I see that some PRs were merged but I'm not sure if everything needed to support koa is already done

willarmiros · 2020-02-05T17:57:04Z

This issue supporting tracing of async/await calls has been resolved using the cls-hooked library and is available in two different distribution tags:

# For stable release of SDK, version 2.5.0-experimental.1 
npm install --save aws-xray-sdk@experimental

# For prerelease of next major version of SDK, version 3.0.0-alpha.1
npm install --save aws-xray-sdk@alpha

This issue will be closed when 3.0.0 is released, which is tracked in issue #157. As for explicitly supporting the Koa framework, that has not been prioritized yet on our roadmap.

jgcmarins · 2020-03-03T18:35:50Z

Any updates here?
I am about to setup X-Ray on a Koa + GraphQL server.

willarmiros · 2020-03-03T18:43:00Z

For Koa support, please post on #4. At this time we don't have the bandwidth to add support for the Koa framework ourselves, but are always open to PRs.

willarmiros · 2020-04-23T16:11:17Z

Closing this issue since the use of cls-hooked was mainlined in 3.0.0.

haotianw465 added the enhancement label Oct 19, 2018

This was referenced Oct 19, 2018

Async/Await context perservation #12

Closed

Does not work with node-redis #28

Closed

Koa Support #4

Closed

charsleysa mentioned this issue Oct 19, 2018

Promise support for Postgres #19

Closed

haotianw465 mentioned this issue Oct 25, 2018

Capturing Subsegment doesn't work with async AWS Lambdas #58

Closed

kh01 mentioned this issue Oct 25, 2018

Add support for nodejs mysql2 package #6

Closed

chrisradek mentioned this issue Nov 16, 2018

feat: switches continuation-local-storage to cls-hooked for async support #74

Merged

awssandra mentioned this issue Dec 3, 2018

Support for MYSQL / Elasticsearch #50

Closed

styfle mentioned this issue Feb 4, 2019

xray middleware for micro fails with Error: Failed to get the current sub/segment from the context #95

Closed

yogiraj07 mentioned this issue Jul 23, 2019

Node SDK release V3.0.0 #157

Closed

jasonwilson mentioned this issue Jul 30, 2019

Add support for x-ray aws-samples/aws-bookstore-demo-app#16

Open

awssandra mentioned this issue Sep 13, 2019

Support for Node ^v12 #154

Closed

pauldraper mentioned this issue Sep 20, 2019

Scope Manager should use continuation passing opentracing/specification#126

Open

willarmiros mentioned this issue Dec 4, 2019

Replace continuation-local-storage with cls-hooked #227

Merged

willarmiros closed this as completed Apr 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracing Context Preservation (async/await and various of libraries) #60

Tracing Context Preservation (async/await and various of libraries) #60

haotianw465 commented Oct 19, 2018 •

edited

Loading

rochdev commented Nov 19, 2018

chrisradek commented Nov 21, 2018

dougmoscrop commented Nov 30, 2018

dougmoscrop commented Nov 30, 2018 •

edited

Loading

chrisradek commented Dec 3, 2018

dougmoscrop commented Dec 3, 2018

jboehle commented Jan 23, 2019

dougmoscrop commented Jan 23, 2019

chrisradek commented Jan 23, 2019

dougmoscrop commented Jan 23, 2019 •

edited

Loading

styfle commented Feb 1, 2019

chrisradek commented Feb 1, 2019

styfle commented Feb 1, 2019

chrisradek commented Feb 1, 2019

styfle commented Feb 1, 2019

chrisradek commented Feb 1, 2019

styfle commented Feb 2, 2019

styfle commented Feb 4, 2019

chrisradek commented Feb 4, 2019

dougmoscrop commented Feb 5, 2019

okonon commented Apr 6, 2019

awssandra commented Apr 9, 2019

alvis commented Apr 12, 2019 •

edited

Loading

awssandra commented Apr 23, 2019 •

edited

Loading

joebowbeer commented May 28, 2019

akleiber commented Jul 1, 2019 •

edited

Loading

okonon commented Aug 22, 2019

keyvhinng commented Feb 5, 2020

willarmiros commented Feb 5, 2020

jgcmarins commented Mar 3, 2020

willarmiros commented Mar 3, 2020

willarmiros commented Apr 23, 2020

Tracing Context Preservation (async/await and various of libraries) #60

Tracing Context Preservation (async/await and various of libraries) #60

Comments

haotianw465 commented Oct 19, 2018 • edited Loading

Goal

Problem Statement

Challenges for X-Ray SDK for Node.js

Going Forward

rochdev commented Nov 19, 2018

chrisradek commented Nov 21, 2018

dougmoscrop commented Nov 30, 2018

dougmoscrop commented Nov 30, 2018 • edited Loading

chrisradek commented Dec 3, 2018

dougmoscrop commented Dec 3, 2018

jboehle commented Jan 23, 2019

dougmoscrop commented Jan 23, 2019

chrisradek commented Jan 23, 2019

dougmoscrop commented Jan 23, 2019 • edited Loading

styfle commented Feb 1, 2019

chrisradek commented Feb 1, 2019

styfle commented Feb 1, 2019

chrisradek commented Feb 1, 2019

styfle commented Feb 1, 2019

chrisradek commented Feb 1, 2019

styfle commented Feb 2, 2019

styfle commented Feb 4, 2019

chrisradek commented Feb 4, 2019

dougmoscrop commented Feb 5, 2019

okonon commented Apr 6, 2019

awssandra commented Apr 9, 2019

alvis commented Apr 12, 2019 • edited Loading

awssandra commented Apr 23, 2019 • edited Loading

joebowbeer commented May 28, 2019

akleiber commented Jul 1, 2019 • edited Loading

okonon commented Aug 22, 2019

keyvhinng commented Feb 5, 2020

willarmiros commented Feb 5, 2020

jgcmarins commented Mar 3, 2020

willarmiros commented Mar 3, 2020

willarmiros commented Apr 23, 2020

haotianw465 commented Oct 19, 2018 •

edited

Loading

dougmoscrop commented Nov 30, 2018 •

edited

Loading

dougmoscrop commented Jan 23, 2019 •

edited

Loading

alvis commented Apr 12, 2019 •

edited

Loading

awssandra commented Apr 23, 2019 •

edited

Loading

akleiber commented Jul 1, 2019 •

edited

Loading