Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/telemetry: enable instrumentation telemetry in the tracer and profiler #1769

Merged
merged 99 commits into from
Mar 27, 2023

Conversation

lievan
Copy link
Contributor

@lievan lievan commented Mar 1, 2023

What does this PR do?

This PR is meant to enable telemetry by default in the tracer and profiler. It is a "minimal version" of adding instrumentation telemetry data, because:

  1. Only the tracer can start the instrumentation telemetry client on tracer.Start. If the profiler attempts to send telemetry info before the tracer has started, no info is sent.
  2. Nothing telemetry-related happens when the profiler stops.

The important file changes are in:
Profiler:
profiler/profiler.go
profiler/telemetry.go
profiler.telemetry_test.go
Tracer:
ddtrace/tracer/tracer.go
ddtrace/tracer/telemetry.go
ddtrace/tracer/telemetry_test.go

The rest of the changes are in the tracer test files, since telemetry logs caused some logging-related unit tests to fail

Motivation

Describe how to test/QA your changes

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Changed code has unit tests for its functionality.
  • If this interacts with the agent in a new way, a system test has been added.

@lievan lievan added this to the v1.49.0 milestone Mar 1, 2023
@lievan lievan changed the title internal/telemetry: enable v2 instrumentation telemetry by default internal/telemetry: enable instrumentation telemetry in the tracer and profiler Mar 17, 2023
@lievan lievan marked this pull request as draft March 17, 2023 17:58
@lievan lievan marked this pull request as ready for review March 24, 2023 17:43
Copy link
Contributor

@nsrip-dd nsrip-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In general this looks good to go for profiling, just a few things to point out (see comments)

profiler/telemetry.go Show resolved Hide resolved
profiler/telemetry_test.go Outdated Show resolved Hide resolved
profiler/telemetry_test.go Outdated Show resolved Hide resolved
profiler/telemetry_test.go Show resolved Hide resolved
ajgajg1134
ajgajg1134 previously approved these changes Mar 24, 2023
Copy link
Contributor

@ajgajg1134 ajgajg1134 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good to me, just one small nit

ddtrace/tracer/tracer_test.go Outdated Show resolved Hide resolved
@lievan lievan merged commit 5a392f3 into main Mar 27, 2023
@lievan lievan deleted the evan.li/migrate-telemetry-v2 branch March 27, 2023 13:33
ajgajg1134 added a commit that referenced this pull request Mar 29, 2023
nsrip-dd pushed a commit that referenced this pull request Mar 30, 2023
…d profiler (#1769)

Send instrumentation telemetry data from the tracer and profiler.

Only the tracer can start the instrumentation telemetry client on tracer.Start. If the profiler attempts to send telemetry info before the tracer has started, no info is sent.

Nothing telemetry-related happens when the profiler stops.

* start telemetry client when tracer starts

* enable telemetry by default and refactor transport unit tests

* fixed race condition in transport tests

* add new telemetry function with starter options to telemetry package

* add options file

* find API key when agentless and disable telemetry for clean stop test

* remove telemetry logs in tracer log tests

* call telemetry.NewClient to init profiler telemetry client, turn telemetry on by default for profiler

* change test telemetry enabled

* lint

* disable instrumentation telemetry for transport tests to prevent flakiness

* remove non-valid config values

* change config to single test value

* send empty payload

* add service mappings and global tags to telemetry config

* add product version and enabled details

* add product field to newrequest

* enabled field change to bool

* required field of enabled

* remove profiler from client products info

* remove telemetry client from the profiler for now, keep it for the tracer

* add sampling rules to config data, clarify sampling rule comments

* retry sending telemetry with agentless endpoint given agent errors

* update headers to v2

* update host spec to v2

* remove dependencies from app-started into seperate event

* add support for dependency collection env var

* remove lib fields from metrics

* fix race on temp agentless bool

* add products to app-started and fix errors in configuration schema

* error should not be arr

* lint

* fix mismatched type err'

* send start error data for telemetry

* changed config key values to be consistent with telemetry backend normalization

* del withstarterrors fn

* user tracer log for instrumentation telemetry client:

* fix lint and flaky tests

* implement global telemetry client

* fix race cond on temp agentless var

* refactor telemetry starting logic, reset global telemetry client default values on tracer start

* enable instrumentation telemetry for the profiler

* fix hanging test case

* fix timeout in profiler test, rememebr to stop telemetry

* remove sending error telemetry information for now

* refactor telemetry code in the tracer and profiler

* remove flush timer and just flush during heartbeat interval

* fix lint

* remove the ability for the profiler to start telemetry client

* dont send app-product-change if client hasn't started, update api version in headers

* Update ddtrace/tracer/telemetry.go

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

* Update internal/telemetry/client.go

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

* Update internal/telemetry/message.go

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

* update comments and reformat

* add api key to headers when retrying:

* add agentlessRetry function for retry logic

* switch statement for namespace selection when sending product data

* git rid of black blox testing to speed up the tests, minimize possibility of ci timeout

* fix lint

* move around some functions for clarity

* differentiate namespace for each metric

* Update ddtrace/tracer/telemetry.go

Co-authored-by: Katie Hockman <katie@hockman.dev>

* Update internal/telemetry/client.go

Co-authored-by: Katie Hockman <katie@hockman.dev>

* Update internal/telemetry/client.go

Co-authored-by: Katie Hockman <katie@hockman.dev>

* Update internal/telemetry/client.go

Co-authored-by: Katie Hockman <katie@hockman.dev>

* Update internal/telemetry/option.go

Co-authored-by: Katie Hockman <katie@hockman.dev>

* Update internal/telemetry/client.go

Co-authored-by: Katie Hockman <katie@hockman.dev>

* export Disabled fn for the purpose of checking whether instrumentation telemetry is disabled

* log at info level

* rename submit fn and add comments describing functionality

* add kernel architecture

* dont stop telemetry client when profiler stops

* move all env var config logic into Start

* take away responsibility of disabling the client from With.. functions

* removed unused field

* add an events file and move all events-related code to the file

* rename events file to telemetry

* send produc change on profiler stop

* lint

* ignore telemetry logs

* set api key manually for testing

* no need to check for api key in applydefaultops

* api key validation not our job i think

* readd checking fro agentless for setting api key

* update with latest telemetry changes

* fix lint

* resolve missing telemetry when start is called multiple times without stop

* Update ddtrace/tracer/tracer_test.go

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

* add note about telemetry when tracing is disabled

---------

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
Co-authored-by: Katie Hockman <katie@hockman.dev>
nsrip-dd added a commit that referenced this pull request Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants