Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datadog tracing to compiled graphql queries #5681

Merged
merged 6 commits into from
Dec 2, 2021

Conversation

Dschoordsch
Copy link
Contributor

@Dschoordsch Dschoordsch commented Nov 18, 2021

Adding the tracing functions is a 2 step process. Step 1 wraps all field
resolvers in the schema itself. Step 2 wraps the CompiledQuery to
initialize the tracing.
The tracing funcitonality itself is heavily inspired by dd-trace-js
graphql plugin.

This changeset will only enable tracing for graphql-jit compiled queries. It can be added to subscriptions after #5468.

Resolves: #5333

@Dschoordsch Dschoordsch force-pushed the feat/5333/graphql-jit-dd-traces branch from 5635524 to 120bc60 Compare November 19, 2021 11:19
Adding the tracing functions is a 2 step process. Step 1 wraps all field
resolvers in the schema itself. Step 2 wraps the CompiledQuery to
initialize the tracing.
The tracing funcitonality itself is heavily inspired by dd-trace-js
graphql plugin.
Copy link
Contributor

@tianrunhe tianrunhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, works well. Great work! Two suggestions:

  1. Comparing the output between this version vs. vanilla DataDog GraphQL plugin, it looks like this version is missing the query doc on the resource column. I think with the query content there is super helpful when we investigate performance issue later:

Screen Shot 2021-11-22 at 3 09 18 PM

2. Similarly, is it possible to include the `variables` in the query doc, or maybe tags?

@nickoferrall
Copy link
Contributor

Great PR, @Dschoordsch! I'm trying to get a better understanding of how it's working. Did the old implementation slow down the app because we were tracing all queries, including cached queries, in executeGraphQL? Whereas now that we're wrapping the schema, it will only trace queries that aren't cached and require a network request?

@Dschoordsch
Copy link
Contributor Author

The previous version required us to use plain graphql while now we can use the compiled queries from graphql-jit. This brings some performance boost as the field resolvers don't need to be collected etc. This is what we normally use anyways but didn't work with dd-trace.
I think we also ran with the default config which enables signature for the tracing, which is an expensive operation and it was done again for each query.
Tracing is still done for all queries and it's not super cheap, maybe there is still some room for improvement, or we decrease the sample factor, but the 2 biggest issues should be solved.

@nickoferrall
Copy link
Contributor

Makes sense, thanks for the explanation!

Also add execution hook to add custom tags like the graphql plugin
allows.
@Dschoordsch
Copy link
Contributor Author

Dschoordsch commented Nov 24, 2021

@tianrunhe I addressed the resource name and added the viewerId hook back. I'm not sure if that's what you meant with adding tags?
I don't want to add variables by default, since it may contain sensitive data.

Copy link
Contributor

@tianrunhe tianrunhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good enough! All I want is that when we have this in production and say we see a query/mutation takes much longer than expected to execute, we could dive into:

  1. What that query/mutation is about? I think now by having the query name, we could look that up in the code. Would be easier to have the entire query there as the vanilla GraphQL plugin, but I think what we have here is good enough and additionally it saves a lot of traffic sent to DataDog.
  2. Does the query perform badly in general or is it just slow in certain scenario (e.g., a meeting with hundreds of users)? That's my initial request on query variables but I agree with your points on avoiding sending sensitive data to DataDog. Having the viewerId there will definitely help us!

GREAT work! Can't wait to see this in production!

Copy link
Member

@mattkrick mattkrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I had more feedback to give, but all the changes are pretty minor. It's really solid work! I'll do some testing with it tomorrow morning

packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
packages/server/graphql/traceGraphQL.ts Outdated Show resolved Hide resolved
@mattkrick
Copy link
Member

mattkrick commented Dec 1, 2021

Testing complete!
Results here: https://app.datadoghq.com/apm/traces?query=%40_top_level%3A1%20service%3Aparabol-local-dev%20env%3Alocal&cols=core_service%2Ccore_resource_name%2Clog_duration%2Clog_http.method%2Clog_http.status_code&historicalData=false&messageDisplay=inline&sort=desc&streamTraces=true&start=1638375774866&end=1638376674866&paused=false

Extra thoughts:

  • Let's add DD_TRACE_ENABLED='false' to the .env.example, to signify that the var is used in the app
  • We should either add docs or add an entry to our dev.yml docker-compose setup to make this easier to setup locally. As a starting point: docker run -d --name dd-agent -v /var/run/docker.sock:/var/run/docker.sock:ro -v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro --env-file .env -p 8126:8126 gcr.io/datadoghq/agent:7. I then just call yarn build && yarn start to run in prod so it kicks in
  • I noticed the webserver is using the tracer from dd-trace instead of ./tracer.ts, which means it's ignoring the env var. To remove confusion, should we delete tracer.ts & just call init directly on each server? https://github.com/ParabolInc/parabol/blob/4406819325be2c2f9c6362504bccc044dec053b8/packages/server/server.ts

For your consideration:

  tracer.init({
    enabled: process.env.DD_TRACE_ENABLED === 'true',
   // might be worthwhile knowing which service it came from
    service: `GQLExecutor ${process.env.SERVER_ID}`,
    plugins: false
  })

@Dschoordsch
Copy link
Contributor Author

I tested locally with the dev server as dd agent, this way we don't add unwanted infrastructure to datadog.
DD_TRACE_ENABLED is a standard variable from datadog and evaluated if enabled is not passed to init. It's already present in .env.example.

Always initialize dd-trace if DD_TRACE_ENABLED is set. Also pass the
correct service name.
import './tracer'

tracer.init({
enabled: process.env.DD_TRACE_ENABLED === 'true',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was incorrect, I read the code & apparently enabled isn't required, dd-trace reads directly from DD_TRACE_ENABLED. 1 less thing to worry about! https://github.com/DataDog/dd-trace-js/blob/f50ff8817d5ff21b29868ace03558ac25cf5cc18/packages/dd-trace/src/config.js#L91

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I've written in my comment. I added it anyways because it's more explicit.

packages/server/graphql/traceGraphQL.ts Show resolved Hide resolved
Instead wrap all object type's fields directly. This also ensures
wrapping each field only once, removing the need to mark these
separately.
@Dschoordsch
Copy link
Contributor Author

@mattkrick mattkrick merged commit 9d5a854 into master Dec 2, 2021
@mattkrick mattkrick deleted the feat/5333/graphql-jit-dd-traces branch December 2, 2021 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Datadog APM for GraphQL
4 participants