fix(profiling): lock Recorder on reset #1560

Merged (4 commits) on Jul 10, 2020

Commits on Jul 10, 2020

  1. refactor(profiling): move gevent/no-gevent code in a _nogevent module

    This should make it easier to access non-gevent functions when running under gevent.
    jd committed Jul 10, 2020
    5eed924
  2. d36bc6c
  3. fix(profiling): lock Recorder on reset

    The lock-free approach does not work because the following scenario can happen:
    - Recorder.push_events() is called from a Collector
    - The Collector gets a reference to the event queue and is interrupted
    - The Scheduler calls Recorder.reset() and passes the old event queue to the
      Exporter
    - The Exporter starts exporting and iterating over the list of events
    - The Exporter gets interrupted and the Collector resumes
    - The Collector finally pushes its events and goes back to sleep
    - The Exporter resumes and the iteration breaks because the event queue changed
    
    This can be seen in the wild with the following backtrace:
    
      Traceback (most recent call last):
        File "/lib/python2.7/site-packages/ddtrace/profiling/scheduler.py", line 44, in flush
          exp.export(events, start, self._last_export)
        File "/lib/python2.7/site-packages/ddtrace/profiling/exporter/http.py", line 132, in export
          profile = super(PprofHTTPExporter, self).export(events, start_time_ns, end_time_ns)
        File "/lib/python2.7/site-packages/ddtrace/profiling/exporter/pprof.py", line 327, in export
          for event in events.get(stack.StackSampleEvent, []):
      RuntimeError: deque mutated during iteration
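
The error itself can be reproduced without any threads. The following is a minimal sketch, not ddtrace code: the function name and the sample values are hypothetical, and the "Collector push" is simulated by appending to the deque from inside the loop that plays the role of the Exporter.

```python
import collections


def iterate_with_concurrent_push(queue):
    """Iterate over `queue` while appending to it, as if a Collector
    pushed an event in the middle of the Exporter's loop.

    Returns the error message if the iteration breaks, else None.
    """
    try:
        for index, _event in enumerate(queue):
            if index == 0:
                # Simulated Collector push while the Exporter is iterating.
                queue.append("late event")
    except RuntimeError as exc:
        return str(exc)
    return None


events = collections.deque(["event-1", "event-2", "event-3"])
error = iterate_with_concurrent_push(events)
print(error)
```

On CPython the deque iterator detects the mutation on its next step and raises the same RuntimeError as in the backtrace above.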
    
    To fix this, access to the event queue must be protected by a lock.
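
As an illustration only, a recorder that serializes pushes and resets could look like the sketch below. This is a simplified, hypothetical class written for this example, not the ddtrace Recorder: the attribute names and the use of a defaultdict of deques keyed by event type are assumptions.

```python
import collections
import threading


class Recorder(object):
    """Simplified, hypothetical recorder; not the ddtrace implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = self._new_queue()

    @staticmethod
    def _new_queue():
        # One deque per event type (an assumption made for this sketch).
        return collections.defaultdict(collections.deque)

    def push_events(self, events):
        # The queue lookup and the appends happen under the same lock,
        # so reset() cannot swap the queue in between.
        with self._lock:
            for event in events:
                self._events[type(event)].append(event)

    def reset(self):
        # Swap in a fresh queue and hand the old one to the caller.
        # Once the lock is released, no pusher still holds a reference
        # to the old queue, so it is safe to iterate over.
        with self._lock:
            old_events, self._events = self._events, self._new_queue()
        return old_events


recorder = Recorder()
recorder.push_events([1, 2, "a"])
exported = recorder.reset()
recorder.push_events([3])  # lands in the new queue, not in `exported`
```

The point of the design is that a pusher never keeps a reference to the queue outside the lock, which is exactly what the lock-free version allowed.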
    
    For the gevent case, we actually need two locks:
    - one for locking out the coroutines
    - one for locking out the OS thread that might be used (e.g. stack collector)
    
    We introduce a DoubleLock class that does exactly that.
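
The idea of holding two locks at once can be sketched as follows. This is a hypothetical outline, not the class added by this commit: the constructor arguments, attribute names and acquisition order are assumptions, and both locks default to threading.Lock so the example runs without gevent. Under gevent, one would presumably pass a coroutine-level lock and an unpatched OS-thread lock instead.

```python
import threading


class DoubleLock(object):
    """Hypothetical sketch of a lock that wraps two underlying locks."""

    def __init__(self, coroutine_lock=None, thread_lock=None):
        # Both default to plain thread locks so the sketch is runnable
        # without gevent (an assumption made for this example).
        if coroutine_lock is None:
            coroutine_lock = threading.Lock()
        if thread_lock is None:
            thread_lock = threading.Lock()
        self._coroutine_lock = coroutine_lock
        self._thread_lock = thread_lock

    def acquire(self):
        # Always acquire in the same order to avoid lock-order inversion.
        self._coroutine_lock.acquire()
        self._thread_lock.acquire()

    def release(self):
        # Release in the reverse order of acquisition.
        self._thread_lock.release()
        self._coroutine_lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()


lock = DoubleLock()
with lock:
    held_inside = lock._coroutine_lock.locked() and lock._thread_lock.locked()
held_after = lock._coroutine_lock.locked() or lock._thread_lock.locked()
```

A caller uses it like any other lock through the `with` statement; both underlying locks are held for the duration of the block.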
    jd committed Jul 10, 2020
    6ced7f2
  4. 00878fc