Add distributed tracing for Celery #1194

Merged: 4 commits into DataDog:master on May 6, 2020

Conversation

@thieman (Contributor) commented Feb 5, 2020

This is an attempt to implement distributed tracing in the Celery contrib per #1179.

I ran into one major snag: we rely on a signal that Celery simply does not execute when running tasks synchronously, which is how our Celery integration tests work: celery/celery#3864

Due to this bug, as far as I can tell nothing in our test suite actually exercises our pre-publish signal handlers. I tested as much of the integration as I could, but I'm all ears if anybody sees a way around this restriction.
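
For context, here is a minimal sketch of the signal-based approach described above. The signal names come from Celery's public signals API; the propagator and activation calls are an assumption about ddtrace's header-propagation helpers and are illustrative, not the exact code in this PR.

from celery.signals import before_task_publish, task_prerun

from ddtrace import tracer
from ddtrace.propagation.http import HTTPPropagator

propagator = HTTPPropagator()


@before_task_publish.connect
def inject_trace_context(headers=None, **kwargs):
    # Publisher side: write the active trace context into the outgoing
    # task message headers so the worker can continue the same trace.
    span = tracer.current_span()
    if span is not None and headers is not None:
        propagator.inject(span.context, headers)


@task_prerun.connect
def activate_trace_context(task=None, **kwargs):
    # Worker side: read the propagated context back out of the incoming
    # request (assuming the headers survive on task.request) and activate it.
    request_headers = getattr(task.request, 'headers', None) or {}
    context = propagator.extract(request_headers)
    if context.trace_id:
        tracer.context_provider.activate(context)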

@thieman thieman requested a review from a team as a code owner February 5, 2020 19:39
@thieman (Contributor Author) commented Feb 5, 2020

I'd also like to find a way to test this against an actual running Celery deployment. Is there an established way of doing that, or would I need to package up my fork and install it in another application somewhere?

@thieman (Contributor Author) commented Feb 5, 2020

I'm skeptical that the segfault in the SQLAlchemy tests has anything to do with this PR, but I'm not sure how to re-run CI 🤔

@jd (Contributor) commented Feb 6, 2020

> I'm skeptical that the segfault in the SQLAlchemy tests has anything to do with this PR, but I'm not sure how to re-run CI 🤔

Don't worry, this happens all the time; it's not related to your PR.

@brettlangdon (Member) left a comment

In general this looks good. I'm not sure I have any immediate feedback on the tests; I might need to sit with this for a minute to provide some.

Have you tested this on a Celery app? Curious how it's working!

# Note: adding tags from `traceback` or `state` calls will make an
# API call to the backend for the properties so we should rely
# only on the given `Context`
attach_span(task, task_id, span, is_publish=True)

headers = kwargs.get('headers')
if config.celery['distributed_tracing'] and headers is not None:
Member:

Should we add headers to kwargs if it doesn't exist? What are the cases when headers won't be present?

Can headers be set in args?

Contributor Author:

I believe headers should always exist and be a dictionary in the current version of Celery, this is being cautious for previous versions. I'll attempt to verify that further when I get this installed in a running Celery app for testing.

Member:

Should we inject headers if it is None?

e.g.

if config.celery['distributed_tracing']:
    if kwargs.get('headers') is None:
        kwargs['headers'] = dict()

    # inject into `headers` ....

If there ever was a case when headers is not provided, do we still want to attempt to distribute?

(also, totally down for being cautious! in general the approach is good)

Contributor Author:

I'm trying to dig into Celery more to understand exactly how all this works, but I'm not having much luck with either the documentation or the code itself.

I did find this note in the Task docs:

headers: Mapping of message headers sent with this task message (may be None).

I'm not sure that the headers being None has any special meaning, though, as you're definitely allowed to write to them during these signals. I think we should still be fine to distribute in that case.
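
In code, the defensive pattern being discussed would look something like this (a hypothetical sketch of injecting even when headers is missing, not the code that was merged; propagator stands in for whatever writes the trace context into the headers):

if config.celery['distributed_tracing']:
    # Create the headers dict if the caller didn't provide one, then inject.
    headers = kwargs.setdefault('headers', {})
    propagator.inject(span.context, headers)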

@thieman (Contributor Author) commented Feb 6, 2020

Alright, I was able to bundle up my fork as a wheel and test this against a distributed Celery install. Good thing, too, because I found yet another bug in Celery (celery/celery#4875) causing differences between how things work in our tests and how they work in a real distributed environment. With this last commit, though, I think we should be good to go.

@Kyle-Verhoog (Member) left a comment

Just two small things and I think we're good to go! Thanks for the contribution @thieman! 😄

Two review comments on ddtrace/contrib/celery/signals.py (outdated, resolved)


class CeleryDistributedTracingIntegrationTask(CeleryBaseTestCase):
"""Distributed tracing is tricky to test for two reasons:
Member:

👍 thanks for the test and great accompanying documentation too! Much appreciated 😄

@thieman thieman force-pushed the tnt/celery-distributed-tracing branch from 1ffaeb4 to 0dde980 Compare March 28, 2020 15:28
@thieman (Contributor Author) commented Mar 28, 2020

@Kyle-Verhoog comments addressed and rebased onto master, thanks!

edit: oh, I see there's a black check now; I'll try to get that to pass. I'll squash these commits once it's passing.

@thieman thieman force-pushed the tnt/celery-distributed-tracing branch from 57bc0e3 to 69d6cfe Compare March 28, 2020 16:00
@Kyle-Verhoog (Member):

@thieman thanks for the quick response! I'm going to try to find some time today to set up a sample app to see what the traces look like. Also need to think about whether we want the distributed tracing enabled by default or not 🤔.

@thieman (Contributor Author) commented Mar 30, 2020

> Also need to think about whether we want the distributed tracing enabled by default or not 🤔.

Out of curiosity, what do you see as the downsides of enabling it?

@Kyle-Verhoog (Member):

One reason is tasks/jobs that may run minutes/hours/days later; the resulting trace UI isn't great for that at the moment.

@Kyle-Verhoog (Member):

@thieman I created a small demo app in our trace-examples repo to play around with the distributed tracing and it looks great, good work!

[screenshot: the distributed Celery trace from the demo app]

I do think we want to change the integration over to be disabled by default because of the above reason. But then this is good to merge 😄

@thieman thieman force-pushed the tnt/celery-distributed-tracing branch from 69d6cfe to d570279 Compare April 1, 2020 13:28
@thieman (Contributor Author) commented Apr 1, 2020

Switched the default to off. I didn't add any explanation of why to the init docstring; I wasn't sure if that was the right place for it.
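
For reference, with the default now off, an application has to opt in explicitly. A sketch of what that opt-in looks like, using the config.celery['distributed_tracing'] switch referenced in the diff above (the patch() call assumes ddtrace's standard patching entry point):

from ddtrace import config, patch

# Enable distributed tracing for Celery; it now defaults to disabled.
config.celery['distributed_tracing'] = True
patch(celery=True)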

@chrono (Contributor) commented May 6, 2020

I've been stalking this PR since February. What's left to do to get this merged?

@thieman (Contributor Author) commented May 6, 2020

I didn't even realize the tests actually failed; I thought it was another flake like before. Will take a look.

@thieman (Contributor Author) commented May 6, 2020

@Kyle-Verhoog apologies for the delay but this PR is hopefully good to go now

@Kyle-Verhoog (Member) left a comment

@thieman no worries! Thanks for seeing it through! 😄

@Kyle-Verhoog Kyle-Verhoog merged commit d21817c into DataDog:master May 6, 2020
@chrono (Contributor) commented May 6, 2020

Thanks everyone!

@thieman thieman deleted the tnt/celery-distributed-tracing branch May 6, 2020 17:23
@Kyle-Verhoog Kyle-Verhoog added this to the 0.38.0 milestone May 12, 2020