Handle exceptions from Exporters #270

Closed · kornholi opened this issue Aug 21, 2018 · 5 comments · Fixed by #297
Comments

@kornholi (Contributor)

Currently, if there's an exception while emitting spans, BackgroundThreadTransport stops working.

Exception in thread opencensus.trace.Worker:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/app/service/image.binary.runfiles/pypi__opencensus_0_1_4/opencensus/trace/exporters/transports/background_thread.py", line 104, in _thread_main
    self.exporter.emit(span_datas)
  File "/app/service/image.binary.runfiles/pypi__opencensus_0_1_4/opencensus/trace/exporters/stackdriver_exporter.py", line 149, in emit
    self.client.batch_write_spans(name, stackdriver_spans)
  File "/app/service/image.binary.runfiles/pypi__google_cloud_trace_0_19_0/google/cloud/trace/client.py", line 104, in batch_write_spans
    timeout=timeout)
  File "/app/service/image.binary.runfiles/pypi__google_cloud_trace_0_19_0/google/cloud/trace/_gapic.py", line 95, in batch_write_spans
    timeout=timeout)
  File "/app/service/image.binary.runfiles/pypi__google_cloud_trace_0_19_0/google/cloud/trace_v2/gapic/trace_service_client.py", line 200, in batch_write_spans
    request, retry=retry, timeout=timeout, metadata=metadata)
  File "/app/service/image.binary.runfiles/pypi__google_api_core_1_2_1/google/api_core/gapic_v1/method.py", line 139, in __call__
    return wrapped_func(*args, **kwargs)
  File "/app/service/image.binary.runfiles/pypi__google_api_core_1_2_1/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "/app/service/image.binary.runfiles/pypi__six_1_11_0/six.py", line 737, in raise_from
    raise value
ServiceUnavailable: 503 Connect Failed
Exception in thread opencensus.trace.Worker:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/app/service/image.binary.runfiles/pypi__opencensus_0_1_5/opencensus/trace/exporters/transports/background_thread.py", line 104, in _thread_main
    self.exporter.emit(span_datas)
  File "/app/service/image.binary.runfiles/pypi__opencensus_0_1_5/opencensus/trace/exporters/stackdriver_exporter.py", line 152, in emit
    self.client.batch_write_spans(name, stackdriver_spans)
  File "/app/service/image.binary.runfiles/pypi__google_cloud_trace_0_19_0/google/cloud/trace/client.py", line 104, in batch_write_spans
    timeout=timeout)
  File "/app/service/image.binary.runfiles/pypi__google_cloud_trace_0_19_0/google/cloud/trace/_gapic.py", line 96, in batch_write_spans
    timeout=timeout)
  File "/app/service/image.binary.runfiles/pypi__google_cloud_trace_0_19_0/google/cloud/trace_v2/gapic/trace_service_client.py", line 200, in batch_write_spans
    request, retry=retry, timeout=timeout, metadata=metadata)
  File "/app/service/image.binary.runfiles/pypi__google_api_core_1_2_1/google/api_core/gapic_v1/method.py", line 139, in __call__
    return wrapped_func(*args, **kwargs)
  File "/app/service/image.binary.runfiles/pypi__google_api_core_1_2_1/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "/app/service/image.binary.runfiles/pypi__six_1_11_0/six.py", line 737, in raise_from
    raise value
ResourceExhausted: 429 Insufficient tokens for quota 'cloudtrace.googleapis.com/write_requests' and limit 'WriteRequestsPerMinutePerProject' of service 'cloudtrace.googleapis.com' for consumer 'project_number:<project id>'.

Should it be up to the exporters themselves to handle exceptions? There could be some weird interactions in that case: for example, an exporter that sleeps between retries would block the calling thread when used with SyncTransport.
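For what it's worth, here's a minimal sketch of the transport-side option: catch exporter exceptions in the worker loop so a failed batch is logged instead of killing the thread. The class and helper names are made up (this is not the actual background_thread.py code, and it uses Python 3 naming); only the self.exporter.emit(span_datas) call is taken from the traceback above.

import logging
import queue
import threading

logger = logging.getLogger(__name__)

_SHUTDOWN = object()  # sentinel placed on the queue to stop the worker


class _Worker(object):
    """Illustrative crash-proof background worker."""

    def __init__(self, exporter):
        self.exporter = exporter
        self._queue = queue.Queue()

    def enqueue(self, span_datas):
        self._queue.put(span_datas)

    def _thread_main(self):
        while True:
            span_datas = self._queue.get()
            if span_datas is _SHUTDOWN:
                break
            try:
                # Only this call comes from the traceback above.
                self.exporter.emit(span_datas)
            except Exception:
                # Never let an exporter error kill the worker thread;
                # log it and keep draining the queue.
                logger.exception('Failed to export %d spans', len(span_datas))

    def start(self):
        thread = threading.Thread(target=self._thread_main, daemon=True)
        thread.start()
        return thread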

@ocervell (Contributor)

+1 to this. Sometimes the Stackdriver exporter temporarily can't reach the Stackdriver APIs, which crashes the whole background thread.

@ocervell (Contributor) commented Aug 31, 2018

I'm frequently getting the following exception with the StackdriverExporter. It seems to be caused by transient unavailability of the metadata service, and it crashes the BackgroundThread and stops trace collection completely.

ServiceUnavailable: 503 Getting metadata from plugin failed with error: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24fd5277d0>: Failed to establish a new connection: [Errno 24] Too many open files',))

We should add retry logic to the StackdriverExporter's emit function, for example:

from retrying import retry

RETRY_EXP_MULTIPLIER = 1000
RETRY_EXP_MAX = 10000

class StackdriverExporter(base.Exporter):
    ...
    @retry(wait_exponential_multiplier=RETRY_EXP_MULTIPLIER, wait_exponential_max=RETRY_EXP_MAX)
    def emit(self, span_datas):
        ...

Should we do this for all exporters that contact remote APIs, to make sure we're not crashing the collection thread?

@ocervell (Contributor) commented Sep 4, 2018

Actually, I think we should just make sure the BackgroundThreadTransport never crashes (or restarts itself when it does), since retries are already included in the Google Cloud SDKs.

@ocervell (Contributor) commented Sep 8, 2018

This is linked to googleapis/google-auth-library-python#211

@eddie-scio

Quoting @ocervell's suggestion above: "Actually I think we should just make sure the BackgroundThreadTransport never crashes (or restart itself when it does), as the retries are already included in Google Cloud SDKs."

Is the client's default here actually retrying? I'm seeing single 503s with no retries. Thanks to #297 it doesn't crash anything, but 1) I still get an error in reporting and 2) the items get dropped from the queue.
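If the client isn't retrying on its own, a small bounded retry with exponential backoff around the emit call would at least keep a transient 503 from dropping the batch. Just a sketch (the constants, the helper name, and the logger are made up; this is not what #297 does):

import logging
import time

logger = logging.getLogger(__name__)

# Illustrative values, not from the library.
MAX_ATTEMPTS = 3
INITIAL_BACKOFF_SECONDS = 1.0


def _emit_with_retry(exporter, span_datas):
    """Retry transient export failures so batches aren't silently dropped."""
    backoff = INITIAL_BACKOFF_SECONDS
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            exporter.emit(span_datas)
            return
        except Exception:
            if attempt == MAX_ATTEMPTS:
                logger.exception('Dropping %d spans after %d attempts',
                                 len(span_datas), attempt)
                return
            logger.warning('Export attempt %d failed; retrying in %.1fs',
                           attempt, backoff)
            time.sleep(backoff)
            backoff *= 2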
