[cassandra] trace execute_async operations #333

bmermet · 2017-08-24T13:14:58Z

This commit addresses #316

Since Session.execute() calls Session.execute_async(), I've moved the instrumentation to the latter. However, the behavior is slightly different from before: because tracing now relies on a callback, we no longer have the guarantee that the span is closed before the result is returned to application code.
I've changed the tests to account for this by creating a custom tracer that can wait for all created spans to be closed before running the asserts.

One downside of this change is that, to prevent our callback from being removed by application code, I had to replicate the code of ResponseFuture.clear_callbacks and access internal variables directly, which makes the integration more likely to break on future changes in the cassandra library.

Also, while making this change I noticed that when the query results are paginated we only trace the first page of results. I think it would be nice to also patch ResponseFuture.start_fetching_next_page() and create a span for each result page.
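To illustrate the test-side problem described above, here is a minimal sketch of a helper that lets a test block until every started span has been closed. `SpanCounter` and its methods are hypothetical names for this sketch and are not part of ddtrace or of the tests in this PR; the timer stands in for a driver callback firing on another thread.

```python
import threading


class SpanCounter:
    """Hypothetical test helper: counts open spans and lets a test wait for them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._open = 0

    def span_started(self):
        with self._cond:
            self._open += 1

    def span_finished(self):
        with self._cond:
            self._open -= 1
            if self._open == 0:
                self._cond.notify_all()

    def wait_all_finished(self, timeout=1.0):
        # True if every started span was finished before the timeout expired.
        with self._cond:
            return self._cond.wait_for(lambda: self._open == 0, timeout=timeout)


counter = SpanCounter()
counter.span_started()
# Simulate a callback that closes the span slightly later, on another thread.
threading.Timer(0.05, counter.span_finished).start()
all_closed = counter.wait_all_finished(timeout=2.0)
```

A test would call `wait_all_finished()` after `future.result()` and only then inspect the recorded spans.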

bmermet · 2017-08-24T13:41:07Z

ddtrace/contrib/cassandra/session.py

+def traced_execute_async_errback(exc, span):
+    # FIXME how should we handle an exception that hasn't be rethrown yet
+    try:
+        raise exc


This is certainly not the right way to do it. Any idea how we usually handle this type of error?
BTW, this means we will log the cassandra errors differently even in the synchronous scenario.

palazzem

As we've discussed, we may want to change this approach a bit, mostly because using callbacks may cause "race conditions" with respect to when the DONE signal is propagated.

Looking forward to the other implementation you had in mind, with the pagination support! Thanks a lot!

bmermet · 2017-08-30T14:53:15Z

ddtrace/contrib/cassandra/session.py

+    if not span:
+        log.debug('traced_set_final_exception was not able to get the current span from the ResponseFuture')
+        return func(*args, **kwargs)
+    with span:


How do we want to handle that? I don't think that's the right way to do it, since the stacktrace will be in our code, which will be confusing to the user. Is there a way to set an error on a span without throwing an exception?

palazzem

One change to how the exception is handled, plus some minor nitpicks! This implementation is more reliable and allows us to properly trace async execution! Thanks!

palazzem · 2017-08-31T08:12:29Z

ddtrace/contrib/cassandra/session.py

-def traced_execute(func, instance, args, kwargs):
-    cluster = getattr(instance, 'cluster', None)
+def _close_span_on_success(result, future):
+    span = getattr(future, CURRENT_SPAN, None)


Using the Future as a carrier worked very well in general, so it's safe to retrieve the span that way.

So is it good as it is, or do you think I should change something here?

Yes! That one is a good approach! 👍
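The carrier idea discussed here can be sketched with stand-in classes. `FakeFuture` and `FakeSpan` are placeholders for the driver's ResponseFuture and the tracer's span. The attribute name is borrowed from the test added in this PR; that `CURRENT_SPAN` holds exactly this value is an assumption.

```python
# Attribute name taken from the test in this PR; assumed to match CURRENT_SPAN.
CURRENT_SPAN = '_ddtrace_current_span'


class FakeFuture:
    """Stand-in for the driver's ResponseFuture (assumption for this sketch)."""


class FakeSpan:
    """Stand-in span that only records whether finish() was called."""

    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True


def attach_span(future, span):
    # The future carries the span so later hooks can find it again.
    setattr(future, CURRENT_SPAN, span)


def close_span_on_success(future):
    span = getattr(future, CURRENT_SPAN, None)
    if span is None:
        return None
    span.finish()
    # Drop the reference so a later hook cannot reuse a closed span.
    setattr(future, CURRENT_SPAN, None)
    return span


future = FakeFuture()
span = FakeSpan()
attach_span(future, span)
closed = close_span_on_success(future)
```

Because the lookup uses `getattr` with a default, a future that never went through the traced path simply yields `None` instead of raising.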

palazzem · 2017-08-31T08:14:08Z

ddtrace/contrib/cassandra/session.py

+        span.set_tags(_extract_result_metas(cassandra.cluster.ResultSet(future, result)))
+
+def traced_set_final_result(func, instance, args, kwargs):
+    result = kwargs.get('result') or args[0]


I assume the original signature always has an args[0]. If not, we should put that in a try-except.

palazzem · 2017-08-31T08:43:12Z

ddtrace/contrib/cassandra/session.py

+        try:
+            raise exc
+        except:
+            span.set_exc_info(*sys.exc_info())


In that case it's better not to use a try-except, because we're overriding the stacktrace at this point (besides introducing overhead). Looking at their code, it seems that everything that goes through _set_final_exception() has already been caught by another component, so in any case we don't have a stacktrace. What we can do instead is handle the Exception object manually, with something like:

    try:
        # handling the exception manually because we
        # don't have an ongoing exception here
        span.error = 1
        span.set_tag(errors.ERROR_MSG, exc.args[0])
        span.set_tag(errors.ERROR_TYPE, exc.__class__.__name__)
    except Exception:
        log.debug('Unable to set error')  # provide a better message
    finally:
        span.finish()

Two things about the implementation above:

We should make the implementation "safe", so we can either use a try-except-finally statement or simply make all accessors safe. The try is really cheap unless an exception is raised, which should never happen in general (unless exc is not always an Exception).

Be sure to check the span tags in our tests.
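A self-contained sketch of the manual handling suggested above, including the "safe accessor" variant for exceptions with no arguments. `FakeSpan` is a stand-in for the tracer's span, and the two tag names are assumed to mirror the constants in ddtrace.ext.errors.

```python
import logging

log = logging.getLogger(__name__)

ERROR_MSG = 'error.msg'    # assumed to mirror ddtrace.ext.errors.ERROR_MSG
ERROR_TYPE = 'error.type'  # assumed to mirror ddtrace.ext.errors.ERROR_TYPE


class FakeSpan:
    """Stand-in span exposing only what this sketch needs."""

    def __init__(self):
        self.error = 0
        self.tags = {}
        self.finished = False

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.finished = True


def record_exception(span, exc):
    """Tag the span from an exception object that is not currently being raised."""
    try:
        span.error = 1
        # Safe accessor: some exceptions are created without arguments.
        span.set_tag(ERROR_MSG, exc.args[0] if exc.args else str(exc))
        span.set_tag(ERROR_TYPE, exc.__class__.__name__)
    except Exception:
        log.debug('unable to set error tags on the span', exc_info=True)
    finally:
        # The span is closed whatever happened while tagging.
        span.finish()


span = FakeSpan()
record_exception(span, ValueError('boom'))

span_no_args = FakeSpan()
record_exception(span_no_args, KeyError())
```

A test along the lines of the second point would then assert on `span.tags` and `span.error` rather than only on the span count.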

palazzem · 2017-08-31T08:44:28Z

ddtrace/contrib/cassandra/session.py

    pin = Pin.get_from(cluster)
    if not pin or not pin.enabled():
        return func(*args, **kwargs)

-    service = pin.service
-    tracer = pin.tracer
+    # In case the current span is not finished we make sure to finish it


Could this happen quite often? Is it expected behavior in the execute_async implementation anyway?

This should not happen if the ResponseFuture is used properly. But if someone were to call start_fetching_next_page before the previous call has finished, I don't want us to keep spans open.

In general, even if it could happen just once, I'd prefer to have a safeguard in any case. Leaving spans open leads to definitely wrong traces. So that's good for me.
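The safeguard being discussed can be sketched as follows: before a span is opened for the next page, any span still attached to the future is finished. All names here (`FakeSpan`, `FakeFuture`, `start_page_span`, the span name string) are hypothetical stand-ins, and the attribute name is assumed from the test in this PR.

```python
CURRENT_SPAN = '_ddtrace_current_span'  # assumed attribute name


class FakeSpan:
    def __init__(self, name):
        self.name = name
        self.finished = False

    def finish(self):
        self.finished = True


class FakeFuture:
    """Stand-in for the driver's ResponseFuture."""


def start_page_span(future, page_number):
    previous = getattr(future, CURRENT_SPAN, None)
    if previous is not None and not previous.finished:
        # The previous page never completed: close its span so it cannot leak.
        previous.finish()
    span = FakeSpan('cassandra.query page %d' % page_number)
    setattr(future, CURRENT_SPAN, span)
    return span


future = FakeFuture()
first = start_page_span(future, 1)
# The next page is requested before the first one completed.
second = start_page_span(future, 2)
```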

palazzem · 2017-08-31T08:45:11Z

ddtrace/contrib/cassandra/session.py

+        return func(*args, **kwargs)
+    except:
+        span.set_exc_info(*sys.exc_info())
+        span.finish()


Put the span.finish() in the finally block so the span is closed even if we have an exception in set_exc_info().

Putting span.finish() in a finally would close the span even when the method returns without exception, which is not the intended behavior. I'll put a with block in the except block instead.

OK, so if we don't have exceptions it must not be closed at this stage.

That's correct.
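The behavior agreed on in this thread, sketched with a stand-in span: the span stays open when the wrapped call succeeds, and is closed through a with block only on the exception path. `FakeSpan` and `call_keeping_span_open` are hypothetical, and re-raising the exception is an assumption, since the quoted hunk does not show what follows span.finish().

```python
import sys


class FakeSpan:
    """Stand-in span that can be used as a context manager."""

    def __init__(self):
        self.finished = False
        self.exc_info = None

    def set_exc_info(self, exc_type, exc_val, exc_tb):
        self.exc_info = (exc_type, exc_val, exc_tb)

    def finish(self):
        self.finished = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.finish()
        return False


def call_keeping_span_open(span, func, *args, **kwargs):
    try:
        # On success the span is left open; a later hook is expected to close it.
        return func(*args, **kwargs)
    except Exception:
        # The with block runs finish() even if set_exc_info() itself raises.
        with span:
            span.set_exc_info(*sys.exc_info())
        raise  # assumption: the original error still reaches the caller


def failing():
    raise RuntimeError('no hosts available')


ok_span = FakeSpan()
value = call_keeping_span_open(ok_span, lambda: 42)

bad_span = FakeSpan()
try:
    call_keeping_span_open(bad_span, failing)
except RuntimeError:
    raised = True
else:
    raised = False
```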

palazzem · 2017-08-31T08:45:55Z

ddtrace/contrib/cassandra/session.py

+        return result
+    except:
+        span.set_exc_info(*sys.exc_info())
+        span.finish()


palazzem · 2017-08-31T08:47:00Z

ddtrace/contrib/cassandra/session.py

+    _sanitize_query(span, query)
+    span.set_tags(_extract_session_metas(session))     # FIXME[matt] do once?
+    span.set_tags(_extract_cluster_metas(cluster))
+    span.set_tag(cassx.PAGE_NUMBER, page_number)


Do we always have this tag? If the response is not paginated, I'd say we shouldn't add it.

palazzem · 2017-08-31T08:47:22Z

ddtrace/ext/cassandra.py

@@ -8,3 +8,4 @@
 CONSISTENCY_LEVEL = "cassandra.consistency_level"
 PAGINATED = "cassandra.paginated"
 ROW_COUNT = "cassandra.row_count"
+PAGE_NUMBER = "cassandra.page_number"


palazzem

I have one open question; if we're sure about that comment, all the rest should be good.

palazzem · 2017-09-01T08:54:59Z

ddtrace/contrib/cassandra/session.py

+        is_paginated = has_more_pages or page_number > 1
+        metas[cassx.PAGINATED] = is_paginated
+        if is_paginated:
+            metas[cassx.PAGE_NUMBER] = page_number
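The effect of the hunk above can be checked in isolation. This sketch reuses the tag values from ddtrace/ext/cassandra.py as quoted in this PR; `pagination_metas` is a hypothetical wrapper around the same logic.

```python
PAGINATED = 'cassandra.paginated'
PAGE_NUMBER = 'cassandra.page_number'


def pagination_metas(has_more_pages, page_number):
    metas = {}
    # A result is paginated if more pages follow or if we are already past page 1.
    is_paginated = has_more_pages or page_number > 1
    metas[PAGINATED] = is_paginated
    if is_paginated:
        # The page number is only reported for paginated results.
        metas[PAGE_NUMBER] = page_number
    return metas


single_page = pagination_metas(False, 1)
first_of_many = pagination_metas(True, 1)
last_of_many = pagination_metas(False, 3)
```

The `page_number > 1` clause matters for the last page of a paginated result, where no further pages remain but the tag should still be present.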


palazzem · 2017-09-07T13:14:11Z

tests/contrib/cassandra/test.py

+        future = session.execute_async(self.TEST_QUERY)
+        future.result()
+        span = getattr(future, '_ddtrace_current_span', None)
+        ok_(span is None)


bmermet requested review from palazzem and talwai August 24, 2017 13:15

bmermet force-pushed the bmermet/cassandra-executeasync-bug branch from 11d757d to 4a3bf79 Compare August 24, 2017 13:27

bmermet commented Aug 24, 2017

palazzem added this to the 0.9.2 milestone Aug 29, 2017

palazzem added bug integrations labels Aug 29, 2017

palazzem reviewed Aug 30, 2017

bmermet commented Aug 30, 2017

palazzem suggested changes Aug 31, 2017

bmermet force-pushed the bmermet/cassandra-executeasync-bug branch 2 times, most recently from 3055aea to c205cef Compare August 31, 2017 10:50

palazzem reviewed Sep 1, 2017

bmermet added 17 commits September 1, 2017 15:50

[Cassandra] patch execute_async instead of execute

733113c

Refactor tests

227984b

Store the current span in ResponseFuture

e60ded0

Cassandra paginated queries with callbacks

a3b9b94

Switched impletation from callbacks to overriding functions

db63501

Clean up tests

5aee3bc

Fix flake errors

57e2411

Add safegards against raceconditions

6372d57

Fix flake tests

541aa84

Put finish in with statements

74d0a11

Removed kwargs for result

3fd99a6

Correct exceptions handling

e60cade

Improve paging handling

998c923

Fix Flake tests

1598727

Fix exception syntax to be compatible with python3

591d9d2

Add safeguard against race condition in callback execution (for cassandra 3.5)

d62098d

Addressing Manu's last comment

d199e7e

bmermet force-pushed the bmermet/cassandra-executeasync-bug branch from 7102241 to d199e7e Compare September 1, 2017 13:51

Delete span in future

5f3aac5

palazzem approved these changes Sep 7, 2017

palazzem merged commit 737a996 into master Sep 7, 2017

palazzem deleted the bmermet/cassandra-executeasync-bug branch September 7, 2017 16:07

palazzem mentioned this pull request Sep 8, 2017

[cassandra] Queries generated by Session.execute_async do not generate spans. #316

Closed
