Add a processing pipeline to AsyncWorker #303
Conversation
This makes it possible to do some processing/filtering of the traces before sending them to the agent. Add a FilterRequestsOnUrl processor to remove traces of incoming requests that match a regexp.
@bmermet thanks for the PR! Will take a look soon, in the meantime can you fix the
thanks a lot!
Good job! Left some comments, feel free to ping me if you need more details!
It will help a lot other developers! Thanks!
ddtrace/tracer.py
Outdated
self.writer = AgentWriter(hostname or self.DEFAULT_HOSTNAME, port or self.DEFAULT_PORT)
processing_pipeline = None
if settings is not None and PP_KEY in settings:
    processing_pipeline = settings[PP_KEY]
you can use settings.get(PP_KEY) so you don't need the PP_KEY in settings condition.
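The suggested simplification can be sketched as follows (PP_KEY matches the constant used in this PR; the helper name read_pipeline is made up for illustration):

```python
PP_KEY = 'PROCESSING_PIPELINE'

def read_pipeline(settings):
    # dict.get() already returns None for a missing key, so no separate
    # "PP_KEY in settings" membership check is needed.
    processing_pipeline = None
    if settings is not None:
        processing_pipeline = settings.get(PP_KEY)
    return processing_pipeline
```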
ddtrace/settings.py
Outdated
@@ -0,0 +1,4 @@
PROCESSING_PIPELINE_KEY = "PROCESSING_PIPELINE"
we may call that module constants.py since it contains (at least for now) only key strings. Also, in general, prefer ' over " for consistency with the majority of modules in this project (at some point I will enforce one or the other via a tool!).
ddtrace/tracer.py
Outdated
@@ -7,6 +7,7 @@
from .sampler import AllSampler
from .writer import AgentWriter
from .span import Span
from .settings import PP_KEY
instead of doing that, I'd suggest removing PP_KEY from the previous file and using:

    from .constants import PROCESSING_PIPELINE_KEY as PP_KEY
ddtrace/writer.py
Outdated
    traces = self._apply_processing_pipeline(traces)
except Exception as err:
    log.error("error while processing traces:{0}".format(err))
if traces:
I think you can simply put everything inside the same if traces, while keeping the try-except separated so a failure in the pipeline will not prevent traces from being sent. Generally speaking, this is correct now and it's the best we can do without introducing a possibly huge change.
I kept two if traces: checks because self._apply_processing_pipeline may return no traces if all of them are filtered out by the pipeline.
oh right, we don't have any further check; let's keep it that way then!
ddtrace/writer.py
Outdated
@@ -155,6 +168,19 @@ def _log_error_status(self, result, result_name):
            getattr(result, "status", None), getattr(result, "reason", None),
            getattr(result, "msg", None))

def _apply_processing_pipeline(self, traces):
I'd say to add a docstring here to briefly explain how the pipeline works. What I'm more interested in, though, is specifying that traces is owned by the AsyncWorker thread, so it can be freely modified without using a mutex.
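One possible shape for the requested docstring, in a minimal stand-in for the AsyncWorker (the loop mirrors the code under review; the docstring wording is a suggestion, not the merged text):

```python
class AsyncWorkerSketch(object):
    def __init__(self, processing_pipeline):
        self._processing_pipeline = processing_pipeline

    def _apply_processing_pipeline(self, traces):
        """Return `traces` after running them through the pipeline.

        Each processor's process_trace() returns the (possibly
        modified) trace, or None to drop it; a dropped trace skips
        the remaining processors. `traces` is owned by the
        AsyncWorker thread, so it can be modified freely without
        holding a mutex.
        """
        processed = []
        for trace in traces:
            for processor in self._processing_pipeline:
                trace = processor.process_trace(trace)
                if trace is None:
                    break
            if trace is not None:
                processed.append(trace)
        return processed
```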
ddtrace/processors.py
Outdated
from .ext import http


class FilterRequestsOnUrl():
For sanity in the Python 2 MRO, always inherit from object:

    class FilterRequestsOnUrl(object):
        # ...
docs/index.rst
Outdated
**Write a custom processor**

Creating your own processors is as simple as implementing a class with a process_trace method and adding it to the processing pipeline parameter of Tracer.configure. process_trace should either return a trace to be fed to the next step of the pipeline or None if the trace should be discarded. (see processors.py for example implementations)
I'd say to add a really small example instead of asking developers to check the processors.py module. What about something like:

    class ProcessorExample(object):
        def process_trace(self, trace):
            # write here your logic to return the `trace` or None;
            # the `trace` instance is owned by the thread and you can
            # alter each single span if needed

Or something similar, because I still prefer a really concise snippet rather than long sentences. What do you think?
You're right, it's quicker to grasp.
ddtrace/writer.py
Outdated
for processor in self._processing_pipeline:
    trace = processor.process_trace(trace)
    if trace is None:
        break
Can we add a test for the case when the pipeline is short-circuited? You can have another test with two processors where the first one returns None, so that the second is not invoked. If it helps, you can use Mock on the second one, such as:

    processor = FilterRequestsOnUrl(r'http://example\.com/health')
    processor = mock.Mock(processor, wraps=processor)
    eq_(processor.process_trace.call_count, 0)
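A self-contained version of the suggested test might look like this (unittest.mock stands in for the mock package; the processors and the apply_pipeline helper are simplified stand-ins, not the PR's actual code):

```python
import re
from unittest import mock

class FilterRequestsOnUrl(object):
    """Simplified stand-in exposing the process_trace interface."""
    def __init__(self, pattern):
        self._regexp = re.compile(pattern)

    def process_trace(self, trace):
        return trace  # the real filter inspects the request url tag

class DropAll(object):
    def process_trace(self, trace):
        return None  # None drops the trace

def apply_pipeline(trace, processors):
    for processor in processors:
        trace = processor.process_trace(trace)
        if trace is None:
            break  # short-circuit: later processors never run
    return trace

second = FilterRequestsOnUrl(r'http://example\.com/health')
second = mock.Mock(second, wraps=second)
result = apply_pipeline(['span'], [DropAll(), second])
assert result is None
assert second.process_trace.call_count == 0  # never invoked
```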
done in tests_writer.py
Nice!
ddtrace/tracer.py
Outdated
    processing_pipeline = settings[PP_KEY]

if hostname is not None or port is not None or processing_pipeline is not None:
    self.writer = AgentWriter(
I think we have some indentation issues here. Don't know why flake8 didn't catch that!
Force-pushed from 34082d4 to 4f88de8
docs/index.rst
Outdated
configuring the tracer with a processing pipeline. For instance to filter out
all traces of incoming requests to a specific url::

    processing_pipeline = [FilterRequestsOnUrl(r'http://test\.example\.com')]
I think our example code can become:

    Tracer.configure(settings={
        'PROCESSING_PIPELINE': [
            FilterRequestsOnUrl(r'http://test\.example\.com'),
        ],
    })

just to make it simpler. Also, as we discussed, this is a WIP functionality, so very likely PROCESSING_PIPELINE will be named differently.
ddtrace/constants.py
Outdated
@@ -0,0 +1 @@
PROCESSING_PIPELINE_KEY = 'PROCESSING_PIPELINE'
@bmermet what about calling it just FILTERS? Considering that the main usage of the pipeline is to filter traces (or spans).
ddtrace/tracer.py
Outdated
if settings is not None:
    processing_pipeline = settings.get(PP_KEY)

if hostname is not None or port is not None or processing_pipeline is not None:
Not something we have to change now, but I think the configure() method must be changed into something else. I don't like the idea of initializing the AgentWriter again, especially because if you set the processing_pipeline and then change the hostname in another call, you end up with the right hostname but the wrong pipeline (and vice versa). Keeping this minor refactoring in another PR so that it doesn't impact this one.
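A hedged sketch of the problem described above: if configure() rebuilds the writer from only the arguments it receives, a later call that sets the hostname silently drops the previously configured pipeline. All names here are illustrative stand-ins, not the real ddtrace API.

```python
class Writer(object):
    def __init__(self, hostname='localhost', processing_pipeline=None):
        self.hostname = hostname
        self.processing_pipeline = processing_pipeline

class Tracer(object):
    def configure(self, hostname=None, processing_pipeline=None):
        # naive: always builds a fresh writer from this call's args only
        self.writer = Writer(hostname or 'localhost', processing_pipeline)

tracer = Tracer()
tracer.configure(processing_pipeline=[object()])
tracer.configure(hostname='agent.local')  # second call...
assert tracer.writer.processing_pipeline is None  # ...lost the pipeline
```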
LGTM! We can ship that in the next major release
For instance filtering out all traces of incoming requests to http://test.example.com can be done simply by configuring the tracer with:
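Based on the review discussion, the configuration might look like the following, shown here as a self-contained sketch (stub classes stand in for ddtrace's Tracer and FilterRequestsOnUrl, and the 'PROCESSING_PIPELINE' key may still be renamed, e.g. to FILTERS):

```python
import re

class FilterRequestsOnUrl(object):
    """Drop traces whose root span's http.url matches the regexp."""
    def __init__(self, pattern):
        self._regexp = re.compile(pattern)

    def process_trace(self, trace):
        # assumes the root span carries its url under an 'http.url' tag
        url = trace[0].get('http.url', '')
        return None if self._regexp.match(url) else trace

class Tracer(object):
    """Stub with the configure(settings=...) shape from this PR."""
    def configure(self, settings=None):
        self._pipeline = (settings or {}).get('PROCESSING_PIPELINE') or []

tracer = Tracer()
tracer.configure(settings={
    'PROCESSING_PIPELINE': [
        FilterRequestsOnUrl(r'http://test\.example\.com'),
    ],
})
```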