Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opentelemetry][callback] Add support for http exporter #8321

Merged

Conversation

wilfriedroset
Copy link
Contributor

SUMMARY

The previous version of the callback was supporting only the grpc exporter. This was counter intuitive as the documentation was mentioning <your endpoint (OTLP/HTTP)>. Users were left with a error similar to
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to <endpoint>, retrying in 1s.

The following commit fix this situation by support both HTTP and GRPC via the standard environment variables and ansible.cfg

Fixes #7888

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

community.general.opentelemetry

ADDITIONAL INFORMATION

To reproduce and validate the correct behavior
Consider the following directory setup

$ tree
.
├── ansible.cfg
├── ansible-execution_environment
│   └── Dockerfile
├── docker-compose.yaml
├── opentelemetry.py
└── test.yaml

2 directories, 5 files

Set ansible.cfg

$ cat ansible.cfg
[defaults]
callbacks_enabled = community.general.opentelemetry
[callback_opentelemetry]
enable_from_environment = ANSIBLE_OPENTELEMETRY_ENABLED

Set a sample playbook:

$ cat test.yaml
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Print the gateway for each host when defined
      ansible.builtin.debug:
        msg: Test

The execution environment is the following, please note that any environment with ansible and the correct python deps should work as well

$ cat ansible-execution_environment/Dockerfile
FROM quay.io/ansible/awx-ee:23.9.0

USER root

RUN pip install --no-cache-dir \
  opentelemetry-api==1.24.0 \
  opentelemetry-exporter-otlp==1.24.0 \
  opentelemetry-sdk==1.24.0

USER 1000

Here is the docker-compose file

$ cat docker-compose.yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.56
    container_name: jaeger
    ports:
      - 45400:16686
    environment:
      - COLLECTOR_OTLP_ENABLED=true
  ansible:
    image: reproducer
    build: ./ansible-execution_environment
    container_name: ansible
    volumes:
      - ./:/mnt
      # - ./opentelemetry.py:/usr/share/ansible/collections/ansible_collections/community/general/plugins/callback/opentelemetry.py # Mount the patched file in the correct directory
    environment:
      - ANSIBLE_OPENTELEMETRY_ENABLED=true
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
      - OTEL_SERVICE_NAME=testing-ansible-tracing
    entrypoint: sleep 3000

Build the image

docker-compose build

Start the stack

docker-compose up

Now enter the execution environment

docker exec -it -u root -w /mnt ansible /bin/bash

Witness ansible complaining about the error before the patch

# ansible-playbook test.yaml
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match
'all'

PLAY [localhost] *****************************************************************************************************

TASK [Print the gateway for each host when defined] ******************************************************************
ok: [localhost] => {
    "msg": "Test"
}

PLAY RECAP ***********************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Transient error StatusCode.UNAVAILABLE encountered while exporting traces to jaeger:4318, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to jaeger:4318, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to jaeger:4318, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to jaeger:4318, retrying in 8s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to jaeger:4318, retrying in 16s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to jaeger:4318, retrying in 32s.

After the file is patched and the correct variable is exported there is no error the trace is visible in the backend

# export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
# ansible-playbook test.yaml
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match
'all'

PLAY [localhost] *****************************************************************************************************

TASK [Print the gateway for each host when defined] ******************************************************************
ok: [localhost] => {
    "msg": "Test"
}

PLAY RECAP ***********************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

@ansibullbot
Copy link
Collaborator

cc @v1v
click here for bot help

@ansibullbot ansibullbot added WIP Work in progress bug This issue/PR relates to a bug callback callback plugin plugins plugin (any type) labels May 6, 2024
@wilfriedroset
Copy link
Contributor Author

Ping @mihai-satmarean this should fix your error.
Ping @v1v for the review maybe 🙏🏾

@wilfriedroset wilfriedroset marked this pull request as ready for review May 6, 2024 13:38
The previous version of the callback was supporting only the grpc
exporter. This was counter intuitive as the documentation was
mentioning `<your endpoint (OTLP/HTTP)>`. Users were left with a error
similar to
`Transient error StatusCode.UNAVAILABLE encountered while exporting traces to <endpoint>, retrying in 1s.`

The following commit fix this situation by support both HTTP and GRPC
via the standard environment variables and ansible.cfg

See as well ansible-collections#7888

Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
@mihai-satmarean
Copy link

Ping @mihai-satmarean this should fix your error. Ping @v1v for the review maybe 🙏🏾

Hi, thank you! I will find some time these days!

@ansibullbot ansibullbot removed the WIP Work in progress label May 6, 2024
@felixfontein felixfontein added the check-before-release PR will be looked at again shortly before release and merged if possible. label May 6, 2024
Copy link
Collaborator

@felixfontein felixfontein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! I've added some comments below.

changelogs/fragments/8321-fix-opentelemetry-callback.yml Outdated Show resolved Hide resolved
plugins/callback/opentelemetry.py Outdated Show resolved Hide resolved
plugins/callback/opentelemetry.py Outdated Show resolved Hide resolved
plugins/callback/opentelemetry.py Outdated Show resolved Hide resolved
plugins/callback/opentelemetry.py Outdated Show resolved Hide resolved
plugins/callback/opentelemetry.py Outdated Show resolved Hide resolved
@v1v
Copy link
Contributor

v1v commented May 6, 2024

Thanks @wilfriedroset , that's super great. I'll run some manual tests to validate your changes on my end this week (likely by the end of the week)

@wilfriedroset
Copy link
Contributor Author

Thank for the quick review @felixfontein, I've taken into account all your comments.

@ansibullbot ansibullbot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR label May 6, 2024
Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
@ansibullbot ansibullbot removed the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR label May 7, 2024
@russoz
Copy link
Collaborator

russoz commented May 12, 2024

LGTM (although I am not familiar with the callback nor I use OT at the moment)

@felixfontein
Copy link
Collaborator

@v1v did you have a chance to look at this?

@v1v
Copy link
Contributor

v1v commented May 15, 2024

@v1v did you have a chance to look at this?

I've been working on setting a test environment and it took me longer. I'm gonna run some tests shortly, thanks for your patient 🙏

@ansibullbot ansibullbot added the stale_ci CI is older than 7 days, rerun before merging label May 15, 2024
@v1v
Copy link
Contributor

v1v commented May 15, 2024

All good, I did run a few manual executions. Thanks so much @wilfriedroset

@felixfontein felixfontein merged commit 6889e04 into ansible-collections:main May 15, 2024
132 of 133 checks passed
@felixfontein felixfontein removed the check-before-release PR will be looked at again shortly before release and merged if possible. label May 15, 2024
@felixfontein
Copy link
Collaborator

@wilfriedroset thanks a lot for your contribution!
@v1v @russoz thanks a lot for reviewing and testing this!

v1v added a commit to v1v/community.general that referenced this pull request May 16, 2024
…t-flag

* upstream/main:
  Add REUSE badge (ansible-collections#8365)
  Initial commit for django modutils and django_command module (ansible-collections#8349)
  [opentelemetry][callback] Add support for http exporter (ansible-collections#8321)
  Fix sanitize for keycloak_identitiy_provider. (ansible-collections#8355)
austinlucaslake pushed a commit to austinlucaslake/community.general that referenced this pull request May 25, 2024
…ections#8321)

* [opentelemetry][callback] Add support for http exporter

The previous version of the callback was supporting only the grpc
exporter. This was counter intuitive as the documentation was
mentioning `<your endpoint (OTLP/HTTP)>`. Users were left with a error
similar to
`Transient error StatusCode.UNAVAILABLE encountered while exporting traces to <endpoint>, retrying in 1s.`

The following commit fix this situation by support both HTTP and GRPC
via the standard environment variables and ansible.cfg

See as well ansible-collections#7888

Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>

* [opentelemetry][callback] Take into account review

Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>

---------

Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue/PR relates to a bug callback callback plugin plugins plugin (any type) stale_ci CI is older than 7 days, rerun before merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

open telemetry callback error: Transient error StatusCode.UNAVAILABLE
6 participants