-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cassandra] trace execute_async operations #333
Conversation
11d757d
to
4a3bf79
Compare
ddtrace/contrib/cassandra/session.py
Outdated
def traced_execute_async_errback(exc, span): | ||
# FIXME how should we handle an exception that hasn't be rethrown yet | ||
try: | ||
raise exc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is certainly not the right way to do it. Any idea, how do we usually handle this type of errors ?
BTW this means we will log the cassandra errors differently even in the synchronous scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we've discussed, we may think to change a bit this approach, mostly because using callbacks may cause "race conditions" in the sense of when the DONE
signal is propagated.
Looking forward the other implementation you had in mind, with the pagination support! Thank you a lot!
ddtrace/contrib/cassandra/session.py
Outdated
if not span: | ||
log.debug('traced_set_final_exception was not able to get the current span from the ResponseFuture') | ||
return func(*args, **kwargs) | ||
with span: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we want to handle that? I don't think that's the right way to do it since the stacktrace be in our code, which will be confusing to the user. Is there a way to set an error in a span, without throwing an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A change how the exception is handled, plus some minor nitpicks! That implementation is more reliable and allows us to properly trace async execution! Thanks!
def traced_execute(func, instance, args, kwargs): | ||
cluster = getattr(instance, 'cluster', None) | ||
def _close_span_on_success(result, future): | ||
span = getattr(future, CURRENT_SPAN, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked very well in general, using the Future
as a carrier, so it's safe to retrieve it that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is it good that way or do you think I should change something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! that one is a good approach! 👍
ddtrace/contrib/cassandra/session.py
Outdated
span.set_tags(_extract_result_metas(cassandra.cluster.ResultSet(future, result))) | ||
|
||
def traced_set_final_result(func, instance, args, kwargs): | ||
result = kwargs.get('result') or args[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the original signature always have an args[0]
I assume. If not, we should put that in a try-catch
.
ddtrace/contrib/cassandra/session.py
Outdated
try: | ||
raise exc | ||
except: | ||
span.set_exc_info(*sys.exc_info()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case it's better not to use a try-except
because we're overriding the stacktrace at this point (other than introducing an overhead). Looking at their code, it seems that everything that goes through _set_final_exception()
has already been caught by another component, so in any case we don't have a stacktrace. What we can do instead, is handling the Exception
object manually, with something like:
try:
# handling the exception manually because we
# don't have an ongoing exception here
span.error = 1
span.set_tag(errors.ERROR_MSG, exc.args[0])
span.set_tag(errors.ERROR_TYPE, exc.__class__.__name__)
except Exception:
log.debug('Unable to se error') # provide a better message
finally:
span.finish()
Two things about the implementation above:
- we should make the implementation "safe" so we can use a
try-except-finally
statement, or simply make all accessors safe. The try is really efficient unless there is an exception, that should never happen in general (unlessexc
is not always anException
) - be sure to check the span tags in our tests
pin = Pin.get_from(cluster) | ||
if not pin or not pin.enabled(): | ||
return func(*args, **kwargs) | ||
|
||
service = pin.service | ||
tracer = pin.tracer | ||
# In case the current span is not finished we make sure to finish it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could happen quite often? Is it anyway an expected behavior in the execute_async
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not happen if the ResponseFuture is used properly. But if someone where to call the start_fetching_next_page
before the previous call is finished I don't want us to keep spans open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, if it could happen at most once, I'd prefer to have a safe-guard in any case. Keeping opened spans is something that leads to definitely wrong traces. So good to me.
ddtrace/contrib/cassandra/session.py
Outdated
return func(*args, **kwargs) | ||
except: | ||
span.set_exc_info(*sys.exc_info()) | ||
span.finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the span.finish()
in the finally
block so it's closed even if we have an exception in set_exc_info()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting span.finish()
in a finally
would close the span even when the method returns without exception, which is not the intended behavior. I'll put a with
block in the except
block instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so if we don't have exceptions it must not be closed at this stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct.
ddtrace/contrib/cassandra/session.py
Outdated
return result | ||
except: | ||
span.set_exc_info(*sys.exc_info()) | ||
span.finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
ddtrace/contrib/cassandra/session.py
Outdated
_sanitize_query(span, query) | ||
span.set_tags(_extract_session_metas(session)) # FIXME[matt] do once? | ||
span.set_tags(_extract_cluster_metas(cluster)) | ||
span.set_tag(cassx.PAGE_NUMBER, page_number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we always have this tag? because if the response is not paginated, I'd say not to add this tag.
@@ -8,3 +8,4 @@ | |||
CONSISTENCY_LEVEL = "cassandra.consistency_level" | |||
PAGINATED = "cassandra.paginated" | |||
ROW_COUNT = "cassandra.row_count" | |||
PAGE_NUMBER = "cassandra.page_number" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
3055aea
to
c205cef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question open, think that if we're sure in that comment all the rest should be good.
is_paginated = has_more_pages or page_number > 1 | ||
metas[cassx.PAGINATED] = is_paginated | ||
if is_paginated: | ||
metas[cassx.PAGE_NUMBER] = page_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
7102241
to
d199e7e
Compare
future = session.execute_async(self.TEST_QUERY) | ||
future.result() | ||
span = getattr(future, '_ddtrace_current_span', None) | ||
ok_(span is None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This commit addresses #316
Since Cluster.execute() calls Cluster.execute_async(), I've moved instrumentation to the latter. However the behavior is slightly different from before. Since the tracing is now based on a callback, we no longer have the guarantee that the span is closed before the result is returned to application code.
I've changed the tests to account for this, by creating a custom tracer that allows to wait for all created spans to be closed before doing asserts.
One downside of this change, is that in order to prevent our callback from being removed by application code I had to replicate the code of ResponseFuture.clear_callbacks and directly access internal variables, which makes it more likely to break on future code changes in the cassandra library.
Also while doing this change I noticed that in case the query results are paginated we only trace the first page of the results, I think it would be nice to also patch ResponseFuture.start_fetching_next_page() and create a span for each result page.