Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cassandra] trace execute_async operations #333

Merged
merged 18 commits into from
Sep 7, 2017

Conversation

bmermet
Copy link
Contributor

@bmermet bmermet commented Aug 24, 2017

This commit addresses #316

Since Cluster.execute() calls Cluster.execute_async(), I've moved instrumentation to the latter. However the behavior is slightly different from before. Since the tracing is now based on a callback, we no longer have the guarantee that the span is closed before the result is returned to application code.
I've changed the tests to account for this, by creating a custom tracer that allows to wait for all created spans to be closed before doing asserts.

One downside of this change, is that in order to prevent our callback from being removed by application code I had to replicate the code of ResponseFuture.clear_callbacks and directly access internal variables, which makes it more likely to break on future code changes in the cassandra library.

Also while doing this change I noticed that in case the query results are paginated we only trace the first page of the results, I think it would be nice to also patch ResponseFuture.start_fetching_next_page() and create a span for each result page.

@bmermet bmermet force-pushed the bmermet/cassandra-executeasync-bug branch from 11d757d to 4a3bf79 Compare August 24, 2017 13:27
def traced_execute_async_errback(exc, span):
# FIXME how should we handle an exception that hasn't be rethrown yet
try:
raise exc
Copy link
Contributor Author

@bmermet bmermet Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly not the right way to do it. Any idea, how do we usually handle this type of errors ?
BTW this means we will log the cassandra errors differently even in the synchronous scenario.

@palazzem palazzem added this to the 0.9.2 milestone Aug 29, 2017
Copy link

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we've discussed, we may think to change a bit this approach, mostly because using callbacks may cause "race conditions" in the sense of when the DONE signal is propagated.

Looking forward the other implementation you had in mind, with the pagination support! Thank you a lot!

if not span:
log.debug('traced_set_final_exception was not able to get the current span from the ResponseFuture')
return func(*args, **kwargs)
with span:
Copy link
Contributor Author

@bmermet bmermet Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we want to handle that? I don't think that's the right way to do it since the stacktrace be in our code, which will be confusing to the user. Is there a way to set an error in a span, without throwing an exception?

Copy link

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change how the exception is handled, plus some minor nitpicks! That implementation is more reliable and allows us to properly trace async execution! Thanks!

def traced_execute(func, instance, args, kwargs):
cluster = getattr(instance, 'cluster', None)
def _close_span_on_success(result, future):
span = getattr(future, CURRENT_SPAN, None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked very well in general, using the Future as a carrier, so it's safe to retrieve it that way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is it good that way or do you think I should change something here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! that one is a good approach! 👍

span.set_tags(_extract_result_metas(cassandra.cluster.ResultSet(future, result)))

def traced_set_final_result(func, instance, args, kwargs):
result = kwargs.get('result') or args[0]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original signature always have an args[0] I assume. If not, we should put that in a try-catch.

try:
raise exc
except:
span.set_exc_info(*sys.exc_info())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case it's better not to use a try-except because we're overriding the stacktrace at this point (other than introducing an overhead). Looking at their code, it seems that everything that goes through _set_final_exception() has already been caught by another component, so in any case we don't have a stacktrace. What we can do instead, is handling the Exception object manually, with something like:

try:
  # handling the exception manually because we 
  # don't have an ongoing exception here
  span.error = 1
  span.set_tag(errors.ERROR_MSG, exc.args[0])
  span.set_tag(errors.ERROR_TYPE, exc.__class__.__name__)
except Exception:
  log.debug('Unable to se error')  # provide a better message
finally:
  span.finish()

Two things about the implementation above:

  • we should make the implementation "safe" so we can use a try-except-finally statement, or simply make all accessors safe. The try is really efficient unless there is an exception, that should never happen in general (unless exc is not always an Exception)
  • be sure to check the span tags in our tests

pin = Pin.get_from(cluster)
if not pin or not pin.enabled():
return func(*args, **kwargs)

service = pin.service
tracer = pin.tracer
# In case the current span is not finished we make sure to finish it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could happen quite often? Is it anyway an expected behavior in the execute_async implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not happen if the ResponseFuture is used properly. But if someone where to call the start_fetching_next_page before the previous call is finished I don't want us to keep spans open.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, if it could happen at most once, I'd prefer to have a safe-guard in any case. Keeping opened spans is something that leads to definitely wrong traces. So good to me.

return func(*args, **kwargs)
except:
span.set_exc_info(*sys.exc_info())
span.finish()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the span.finish() in the finally block so it's closed even if we have an exception in set_exc_info()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting span.finish() in a finally would close the span even when the method returns without exception, which is not the intended behavior. I'll put a with block in the except block instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so if we don't have exceptions it must not be closed at this stage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

return result
except:
span.set_exc_info(*sys.exc_info())
span.finish()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

_sanitize_query(span, query)
span.set_tags(_extract_session_metas(session)) # FIXME[matt] do once?
span.set_tags(_extract_cluster_metas(cluster))
span.set_tag(cassx.PAGE_NUMBER, page_number)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we always have this tag? because if the response is not paginated, I'd say not to add this tag.

@@ -8,3 +8,4 @@
CONSISTENCY_LEVEL = "cassandra.consistency_level"
PAGINATED = "cassandra.paginated"
ROW_COUNT = "cassandra.row_count"
PAGE_NUMBER = "cassandra.page_number"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bmermet bmermet force-pushed the bmermet/cassandra-executeasync-bug branch 2 times, most recently from 3055aea to c205cef Compare August 31, 2017 10:50
Copy link

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question open, think that if we're sure in that comment all the rest should be good.

is_paginated = has_more_pages or page_number > 1
metas[cassx.PAGINATED] = is_paginated
if is_paginated:
metas[cassx.PAGE_NUMBER] = page_number
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bmermet bmermet force-pushed the bmermet/cassandra-executeasync-bug branch from 7102241 to d199e7e Compare September 1, 2017 13:51
future = session.execute_async(self.TEST_QUERY)
future.result()
span = getattr(future, '_ddtrace_current_span', None)
ok_(span is None)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@palazzem palazzem merged commit 737a996 into master Sep 7, 2017
@palazzem palazzem deleted the bmermet/cassandra-executeasync-bug branch September 7, 2017 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants