Skip to content

Commit

Permalink
Fix placeholder for skipped samples in allocation profiler breaking h…
Browse files Browse the repository at this point in the history
…eap profiler

I missed it on my original commits, but the placeholder for skipped
samples broke the heap profiler, as it started failing with:

> W, [2024-07-08T15:41:27.128532 #366230]  WARN -- datadog: [datadog]
> CpuAndWallTimeWorker thread error. Cause: RuntimeError Ended a heap
> recording that was not started Location:
> missing-testcase.rb:2:in `new'
> W, [2024-07-08T15:41:27.128583 #366230]  WARN -- datadog: [datadog]
> Detected issue with profiler (worker component), stopping profiling.
> See previous log messages for details.

This is triggered by how our current heap profiler integrates with
the rest of the profiler: it expects to be called twice for each
allocation sample -- once before the stack is collected, and once
after.

The implementation of the placeholder for skipped samples correctly
did not call the heap profiler before the placeholder stack got
collected BUT we were still calling the heap profiler after.

This caused its internal consistency checks to fail.

The fix for this is to not call the heap profiler after the
placeholder sample. I chose to add a new argument to `record_sample`
which is not particularly pretty, but I guess until we come up with
a better way to have the heap profiler integrate with the rest of
the components, it'll do.
  • Loading branch information
ivoanjo committed Jul 8, 2024
1 parent 366df16 commit 6f8f756
Show file tree
Hide file tree
Showing 4 changed files with 17 additions and 5 deletions.
6 changes: 4 additions & 2 deletions ext/datadog_profiling_native_extension/collectors_stack.c
Original file line number Diff line number Diff line change
Expand Up @@ -264,7 +264,8 @@ void sample_thread(
recorder_instance,
(ddog_prof_Slice_Location) {.ptr = buffer->locations, .len = captured_frames},
values,
labels
labels,
/* placeholder: */ false
);
}

Expand Down Expand Up @@ -339,7 +340,8 @@ void record_placeholder_stack(
recorder_instance,
(ddog_prof_Slice_Location) {.ptr = buffer->locations, .len = 1},
values,
labels
labels,
/* placeholder: */ true
);
}

Expand Down
4 changes: 2 additions & 2 deletions ext/datadog_profiling_native_extension/stack_recorder.c
Original file line number Diff line number Diff line change
Expand Up @@ -586,7 +586,7 @@ static VALUE ruby_time_from(ddog_Timespec ddprof_time) {
return rb_time_timespec_new(&time, utc);
}

void record_sample(VALUE recorder_instance, ddog_prof_Slice_Location locations, sample_values values, sample_labels labels) {
void record_sample(VALUE recorder_instance, ddog_prof_Slice_Location locations, sample_values values, sample_labels labels, bool placeholder) {
struct stack_recorder_state *state;
TypedData_Get_Struct(recorder_instance, struct stack_recorder_state, &stack_recorder_typed_data, state);

Expand All @@ -605,7 +605,7 @@ void record_sample(VALUE recorder_instance, ddog_prof_Slice_Location locations,
metric_values[position_for[ALLOC_SAMPLES_VALUE_ID]] = values.alloc_samples;
metric_values[position_for[TIMELINE_VALUE_ID]] = values.timeline_wall_time_ns;

if (values.alloc_samples != 0) {
if (!placeholder && values.alloc_samples > 0) {
// If we got an allocation sample end the heap allocation recording to commit the heap sample.
// FIXME: Heap sampling currently has to be done in 2 parts because the construction of locations is happening
// very late in the allocation-sampling path (which is shared with the cpu sampling path). This can
Expand Down
2 changes: 1 addition & 1 deletion ext/datadog_profiling_native_extension/stack_recorder.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ typedef struct sample_labels {
int64_t end_timestamp_ns;
} sample_labels;

void record_sample(VALUE recorder_instance, ddog_prof_Slice_Location locations, sample_values values, sample_labels labels);
void record_sample(VALUE recorder_instance, ddog_prof_Slice_Location locations, sample_values values, sample_labels labels, bool placeholder);
void record_endpoint(VALUE recorder_instance, uint64_t local_root_span_id, ddog_CharSlice endpoint);
void track_object(VALUE recorder_instance, VALUE new_object, unsigned int sample_weight, ddog_CharSlice *alloc_class);
VALUE enforce_recorder_instance(VALUE object);
10 changes: 10 additions & 0 deletions spec/datadog/profiling/collectors/thread_context_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1237,6 +1237,16 @@ def self.otel_sdk_available?
expect(single_sample.locations.size).to be 1
expect(single_sample.locations.first.path).to eq 'Skipped Samples'
end

context 'when heap sampling is enabled' do
let(:recorder) { build_stack_recorder(heap_samples_enabled: true) }

it 'records only the number of skipped allocations, and does not record any heap samples' do
GC.start # Force any incorrect heap samples to have age > 1

expect(single_sample.values).to include('alloc-samples': 123, 'heap-live-samples': 0)
end
end
end

describe '#thread_list' do
Expand Down

0 comments on commit 6f8f756

Please sign in to comment.