Description
TL;DR
ddtrace keeps adding spans from a long-running celery task to a trace that is already larger than the API payload limit, using a lot of memory and causing a memory spike during serialization, all for a trace that's going to be dropped anyway.
Details
We have a celery task that occasionally handles a large payload, taking up to an hour to complete and performing tens of thousands of queries. Since the ddtrace.contrib.celery patch creates a single parent span for the whole task execution, all child spans are attached to it and flushed at the end, when the whole trace is serialized and either sent to the Datadog API or dropped because it exceeds the API payload limit.
This works as expected for small tasks, but for these large, long-running tasks it always results in a trace larger than the API payload limit, and we get the "Trace is larger than the max payload size, dropping it" or "Trace is too big to fit in a payload, dropping it" warnings. If it's the API limit, fair enough, but ddtrace keeps pointlessly adding to the trace even after it's already too large for the API. Also, during serialization it holds both the raw and the serialized payload in memory, causing a short memory spike that in our application sometimes hits Heroku's memory quota and causes it to start swapping.
Our temporary workaround was to disable the celery patch with DD_TRACE_CELERY_ENABLED=False, so we still get traces for queries and other inner calls in the task, but they aren't grouped under a single parent trace.
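For reference, a minimal sketch of the workaround. The environment variable is what we set on Heroku; calling patch_all with the integration disabled is an alternative for apps that patch in code rather than via ddtrace-run:

```python
# Disable only the celery integration so inner spans (django, database
# drivers, etc.) are still traced, just not grouped under one parent.
#
# Option 1: environment variable (what we use):
#   DD_TRACE_CELERY_ENABLED=False
#
# Option 2: skip celery when patching in code:
from ddtrace import patch_all

patch_all(celery=False)  # patch all supported integrations except celery
```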
Ideally, ddtrace should be able to roughly estimate when a trace has already grown past the API payload limit and stop adding spans to it.
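To illustrate the idea (this is a hypothetical sketch, not ddtrace's internals; MAX_PAYLOAD_SIZE, SizeCappedTraceBuffer, and _estimate_size are invented names): keep a running size estimate per trace and discard the buffered spans as soon as the estimate exceeds the limit, instead of discovering the overflow at flush time.

```python
MAX_PAYLOAD_SIZE = 10 * 1024 * 1024  # assumed payload limit, ~10 MiB


class SizeCappedTraceBuffer:
    """Buffers spans for one trace and drops the trace early once a
    running size estimate exceeds the payload limit."""

    def __init__(self, limit=MAX_PAYLOAD_SIZE):
        self.limit = limit
        self.spans = []
        self.approx_size = 0
        self.dropped = False

    def _estimate_size(self, span):
        # Crude estimate: fixed per-span overhead plus tag sizes. The real
        # serialized size would differ, but only a rough bound is needed.
        return 200 + sum(len(str(k)) + len(str(v)) for k, v in span.items())

    def add(self, span):
        if self.dropped:
            return  # trace is already doomed; stop accumulating memory
        self.approx_size += self._estimate_size(span)
        if self.approx_size > self.limit:
            self.dropped = True
            self.spans.clear()  # free spans we can never send
            return
        self.spans.append(span)
```

This trades a slightly pessimistic size estimate for bounded memory: once a trace is doomed, no further spans are kept and the big serialize-then-drop spike at the end never happens.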
Which version of dd-trace-py are you using?
0.41.0
Which version of the libraries are you using?
django 3.0.5
celery 4.4.7
How can we reproduce your problem?
Create a long-running celery task that creates a lot of spans and activate the ddtrace.contrib.celery integration. It will generate a large trace that is dropped at the end.
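Something along these lines should reproduce it (task and span names are made up; it assumes celery has already been patched, e.g. by running the worker under ddtrace-run):

```python
from celery import Celery
from ddtrace import tracer

app = Celery("repro", broker="redis://localhost:6379/0")


@app.task
def long_running_task():
    # Tens of thousands of child spans, mimicking many small queries.
    # All of them are buffered under the task's parent span until the
    # task finishes, then serialized into a single oversized payload.
    for i in range(50_000):
        with tracer.trace("inner.query", resource="query-%d" % i):
            pass
```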
What is the result that you get?
ddtrace keeps adding to a doomed trace, wasting memory and causing a huge memory spike during serialization at the end.
What is the result that you expected?
ddtrace shouldn't keep adding to a trace that's already past the API payload limit.