Memory leak while handling large trace from long-running celery tasks #1632

@pjwerneck

Description

TL;DR

A long-running celery task keeps adding spans to a trace that's already larger than the API payload limit, using a lot of memory and causing a memory spike during serialization, all for a trace that will be dropped anyway.

Details

We have a celery task that occasionally handles a large payload, taking up to an hour to complete and performing tens of thousands of queries. Since the ddtrace.contrib.celery patch creates a single parent span for the whole task execution, all child spans are attached to it and flushed at the end, when the whole trace is serialized and either sent to the Datadog API or dropped because it exceeds the API payload limit.

This works as expected for small tasks, but for these large, long-running tasks it always results in a trace larger than the API payload limit. We get the "Trace is larger than the max payload size, dropping it" or "Trace is too big to fit in a payload, dropping it" warnings. If it's the API limit, fair enough, but ddtrace keeps pointlessly adding to the trace even after it's already too large for the API. Also, during serialization it holds both the raw and serialized payload in memory, causing a short memory spike that in our application sometimes hits Heroku's memory quota and causes it to start swapping.
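The serialization spike is easy to illustrate in plain Python. This is a sketch, not ddtrace internals: json stands in for the msgpack encoding ddtrace actually uses, and the span fields are illustrative.

```python
import json

# Build a large fake trace: many small span-like dicts (fields are illustrative).
spans = [
    {"trace_id": 1, "span_id": i, "name": "postgres.query", "duration": 42}
    for i in range(200_000)
]

# Encoding the whole trace in one shot keeps BOTH the span objects and the
# encoded payload alive at the same time, roughly doubling peak memory for
# the duration of the flush.
payload = json.dumps(spans)

print(f"encoded payload: {len(payload) / 1024 / 1024:.1f} MiB")
```

A streaming or chunked encoder would avoid holding both copies, but as the issue notes, the simpler fix is to stop buffering a trace that can no longer fit in a payload at all.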

Our temporary workaround was to disable the celery patch with DD_TRACE_CELERY_ENABLED=False, so we still get traces for queries and other inner calls in the tasks, but they aren't grouped under a single parent span.
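For reference, the workaround is just the environment toggle named above (shown here as a shell export; how you set it depends on your deployment, e.g. a Heroku config var):

```shell
# Disable only the celery integration; other integrations (django, database
# clients, etc.) keep producing their own, smaller traces.
export DD_TRACE_CELERY_ENABLED=false
```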

Ideally, ddtrace should be able to get a rough estimate of when a parent trace is already too large for the API and stop adding to it.
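The suggested behavior could look roughly like the sketch below: a span buffer that keeps a running size estimate and silently drops spans once the trace can no longer fit in a payload. All names, the limit, and the json-based size estimate are hypothetical, not ddtrace internals.

```python
import json

MAX_PAYLOAD_BYTES = 8 * 1024 * 1024  # hypothetical API payload limit


class TraceBuffer:
    """Accumulates spans, but stops once the estimated encoded size
    exceeds the payload limit, instead of growing a doomed trace."""

    def __init__(self, max_bytes=MAX_PAYLOAD_BYTES):
        self.max_bytes = max_bytes
        self.spans = []
        self.estimated_size = 2  # brackets of the outer JSON array
        self.dropped = 0

    def add(self, span):
        # Cheap per-span estimate; real code would use the msgpack size.
        size = len(json.dumps(span)) + 2  # + separator
        if self.estimated_size + size > self.max_bytes:
            self.dropped += 1  # trace is already too big: stop buffering
            return False
        self.spans.append(span)
        self.estimated_size += size
        return True


# Tiny limit to demonstrate the cutoff behavior.
buf = TraceBuffer(max_bytes=1024)
for i in range(100):
    buf.add({"span_id": i, "name": "postgres.query"})
print(len(buf.spans), "kept,", buf.dropped, "dropped")
```

Even a rough estimate like this would cap the memory held by a trace that is guaranteed to be dropped, and the flush could emit a single warning with the drop count instead of buffering to the end.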

Which version of dd-trace-py are you using?

0.41.0

Which version of the libraries are you using?

django 3.0.5
celery 4.4.7

How can we reproduce your problem?

Create a long-running celery task that generates a lot of spans and activate the ddtrace.contrib.celery patch. It will produce a large trace that's dropped at the end.

What is the result that you get?

ddtrace keeps adding to a doomed trace, wasting memory, and causing a huge memory spike during serialization at the end.

What is the result that you expected?

ddtrace shouldn't keep adding to a trace that's already past the API payload limit.
