Add OTel tracing configuration to VSAIO #134

Open
wants to merge 1 commit into
base: master

Collaborator

@matthewoliver matthewoliver commented Sep 27, 2022

When TRACING=true, the VSAIO will be configured to use tracing. This
involves:

  • Add trace middleware to all the WSGI server pipelines and config
    (not internal client... but hmm)
  • Add bin/reset_jaeger.sh helper script to start and clear/restart the
  • Start a jaeger all-in-one in the virtual machine using reset_jaeger.sh
  • Add /etc/swift/jaeger_exporter.json file which points to the running Docker
    container.
  • Set up some basic configuration for the traces, namely tracing every
    request going through the proxies.

Because the proxy spans can get quite big, they can exceed the
opentracing UDP max packet size. So enable udp_split_oversize_batches in the
jaeger_exporter; otherwise the oversized batch is just dropped.
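
To illustrate why splitting helps, here is a minimal sketch of the general idea: greedily packing already-encoded spans into batches that each fit under a packet limit. This is not the exporter's actual code. The function name `split_batch` and the 65000-byte limit are assumptions chosen for illustration only.

```python
from typing import List

# Assumed packet limit, for illustration only; the real limit depends on
# the exporter and agent configuration.
MAX_PACKET_BYTES = 65000


def split_batch(spans: List[bytes],
                max_bytes: int = MAX_PACKET_BYTES) -> List[List[bytes]]:
    """Greedily pack encoded spans into batches no larger than max_bytes."""
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for span in spans:
        if len(span) > max_bytes:
            # Splitting batches cannot help a single span that is too large.
            raise ValueError("a single span exceeds the packet limit")
        if current and current_size + len(span) > max_bytes:
            # The next span would overflow this batch, so start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(span)
        current_size += len(span)
    if current:
        batches.append(current)
    return batches
```

Without this kind of splitting, one oversized batch cannot be sent as a single UDP datagram, which matches the dropped spans described above.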

Also added a bunch of tracing notes to the end of the chef run, which
currently reads:

==> default: Recipe: swift::tracing_info
==> default: * execute[Start jaeger all-in-one docker image] action run
==> default:
==> default: [execute] Unable to find image 'jaegertracing/all-in-one:1.27' locally
==> default: 1.27: Pulling from jaegertracing/all-in-one
==> default: a0d0a0d46f8b: Pulling fs layer
==> default: b45576136ee2: Pulling fs layer
==> default: d22e8500bf73: Pulling fs layer
==> default: 1a972b89c2b0: Pulling fs layer
==> default: 1a972b89c2b0: Waiting
==> default: b45576136ee2: Verifying Checksum
==> default: b45576136ee2: Download complete
==> default: a0d0a0d46f8b: Verifying Checksum
==> default: a0d0a0d46f8b: Download complete
==> default: a0d0a0d46f8b: Pull complete
==> default: b45576136ee2: Pull complete
==> default: 1a972b89c2b0: Verifying Checksum
==> default: 1a972b89c2b0: Download complete
==> default: d22e8500bf73: Verifying Checksum
==> default: d22e8500bf73: Download complete
==> default: d22e8500bf73: Pull complete
==> default: 1a972b89c2b0: Pull complete
==> default: Digest: sha256:8d0bff43db3ce5c528cb6f957520511d263d7cceee012696e4afdc9087919bb9
==> default: Status: Downloaded newer image for jaegertracing/all-in-one:1.27
==> default: 59364e3d2e876234fc1b43d01ddd434d61dbc4bc1e659cd9bfb90d05aa13d15c
==> default: [2022-09-28T05:59:41+00:00] INFO: execute[Start jaeger all-in-one docker image] ran successfully
==> default:
==> default: - execute /vagrant/bin/reset_jaeger.sh
==> default:
==> default: * log[show jaeger docker info] action write
==> default: [2022-09-28T05:59:41+00:00] INFO:
==> default: A Jaeger all-in-one has been started in the vagrant environment. It was started with the bin/reset_jaeger tool.
==> default: You can view all your traces at: http://saio:16686/search
==> default:
==> default: The reset_jaeger.sh script basically runs:
==> default:
==> default: docker run -d --name jaeger \
==> default: -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
==> default: -p 5775:5775/udp \
==> default: -p 6831:6831/udp \
==> default: -p 6832:6832/udp \
==> default: -p 5778:5778 \
==> default: -p 16686:16686 \
==> default: -p 14268:14268 \
==> default: -p 14250:14250 \
==> default: -p 9411:9411 \
==> default: jaegertracing/all-in-one:1.27
==> default:
==> default: See: https://www.jaegertracing.io/docs/1.27/getting-started/
==> default:
==> default: If you want to reset and clear the traces just run:
==> default:
==> default: reset_jaeger.sh
==> default:
==> default: NOTE: We should go via OTel collector, but I havn't implemented that yet. But there are some notes on that below
==> default:
==> default: For the OTel collector we should be able to run something like:
==> default:
==> default: docker pull otel/opentelemetry-collector:latest
==> default: docker run -d otel/opentelemetry-collector:latest
==> default:
==> default: See: https://opentelemetry.io/docs/collector/getting-started/
==> default: NOTE: of course you can also specify a version. We should probably pick whatever we use for prod (when we get that far)
==> default:
==> default: If you want to have a custom config, volume mount in a config:
==> default:
==> default: docker run -v $(pwd)/config.yaml:/etc/otelcol/config.yaml otel/opentelemetry-collector

Collaborator

@clayg clayg left a comment

I think we should go ahead and get docker working on a vsaio by default, maybe steal the few useful things from #131

I think the cookbook/recipes should go ahead and start the jaeger container and whatever else so that after vagrant up it "just works"

I tested this and got my jaeger container running with reset_jaeger.sh, but in the GUI drop-down under service I only see "jaeger-query" and can't find my swift traces... so I don't really know if it's working.
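
One way to check whether any Swift traces have arrived, without clicking through the GUI, is to ask the Jaeger query service which services it knows about. The sketch below assumes the `/api/services` endpoint that the Jaeger UI itself appears to use (it is not a documented stable API) and assumes the response is a JSON object with a `data` list of names. The `proxy-server` name in the usage note is hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_services(base_url="http://saio:16686"):
    """Fetch the raw service listing from the Jaeger query service.

    Assumes the UI-internal /api/services endpoint; not a stable public API.
    """
    with urlopen(base_url + "/api/services", timeout=5) as resp:
        return json.load(resp)


def application_services(payload):
    """Return service names other than Jaeger's own (e.g. "jaeger-query")."""
    names = payload.get("data") or []
    return sorted(name for name in names if not name.startswith("jaeger"))
```

If `application_services(fetch_services())` comes back empty, the exporter is probably not reaching the collector; a non-empty list such as `["proxy-server"]` would suggest traces are being received.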

cookbooks/swift/recipes/configs.rb (outdated)
cookbooks/swift/recipes/configs.rb (outdated)
cookbooks/swift/recipes/default.rb
cookbooks/swift/recipes/setup.rb
cookbooks/swift/recipes/tracing_info.rb (outdated)
bin/reset_jaeger.sh
You will find a helper script in the bin/ directory that you can run on your host to start (or reset) the
running jaeger container. Just run:

bin/reset_jaeger.sh
Collaborator

Do we run this on startup, so that after TRACING=true vagrant up you can make requests and view http://saio:16686/search?

Collaborator Author

Made this change. It now runs docker internally and you can browse to http://saio:16686/search on your host and view your traces. Works nicely for me. Great idea!

Collaborator

@clayg clayg left a comment

I checked out https://review.opendev.org/c/openstack/swift/+/857559/3 and this worked great out of the box!

log 'show jaeger docker info' do
message %(
A Jaeger all-in-one has been started in the vagrant environment. It was started with the bin/reset_jaeger tool.
You can view all your traces at: http://saio:16686/search
Collaborator

this is the only output that I actually wanted


If you want to have a custom config, volume mount in a config:

docker run -v $(pwd)/config.yaml:/etc/otelcol/config.yaml otel/opentelemetry-collector
Collaborator

Please get rid of all this extra output at the end of vagrant up

either put it in a bin/ script or docs

Collaborator

@matthewoliver if we can get rid of these extra warnings at the end of provisioning the vagrant VM, you have my approval as well to get this merged. The tool has been helpful for tracing the path of each request and for decomposing latency across the functions invoked while interacting with swift using the swift client.

@matthewoliver
Collaborator Author

matthewoliver commented Oct 11, 2022 via email

Collaborator

@clayg clayg left a comment

I'll work on this some more next time I check out otel. It would be great to land support in vsaio shortly after we merge it upstream: what are the next steps there?

docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-p 5775:5775/udp -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 \
-p 16686:16686 -p 14268:14268 -p 14250:14250 -p 9411:9411 \
jaegertracing/all-in-one:1.27
Collaborator

IME with docker run we'll want a --rm
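
A minimal sketch of what adding `--rm` would look like, built here as an argument list in Python so it can be inspected without a Docker daemon. The ports and image tag are copied from the command quoted above; the helper name `jaeger_run_command` is hypothetical, and the actual reset_jaeger.sh may be structured differently.

```python
import shlex

JAEGER_IMAGE = "jaegertracing/all-in-one:1.27"
UDP_PORTS = [5775, 6831, 6832]
TCP_PORTS = [5778, 16686, 14268, 14250, 9411]


def jaeger_run_command(remove_on_exit=True):
    """Build the `docker run` argument list for the Jaeger all-in-one."""
    cmd = ["docker", "run", "-d", "--name", "jaeger"]
    if remove_on_exit:
        # --rm asks Docker to delete the container once it stops, so a
        # later run can reuse the name "jaeger".
        cmd.append("--rm")
    cmd += ["-e", "COLLECTOR_ZIPKIN_HOST_PORT=:9411"]
    for port in UDP_PORTS:
        cmd += ["-p", f"{port}:{port}/udp"]
    for port in TCP_PORTS:
        cmd += ["-p", f"{port}:{port}"]
    cmd.append(JAEGER_IMAGE)
    return cmd


# Print the command as a single shell-quoted line.
print(shlex.join(jaeger_run_command()))
```

Without `--rm`, a stopped container keeps the name `jaeger`, and a second `docker run --name jaeger` fails with a name conflict until the old container is removed.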

"opentelemetry-sdk",
"opentelemetry-semantic-conventions",
"opentelemetry-exporter-jaeger",
]
Collaborator

I wonder if there's a pip install -e .[opentelem=true] sort of invocation we could get working
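
For reference, pip's "extras" syntax takes a bare name, as in `pip install -e .[name]`, rather than a `name=true` pair. The sketch below shows how the three packages listed above could be grouped under an extra in a setuptools-style project. The extra name `opentelemetry` and the helper function are assumptions; Swift is packaged with pbr, which as far as I know declares extras in setup.cfg instead, so treat this as an illustration of the mechanism only.

```python
# Packages taken from the list quoted above.
OTEL_REQUIREMENTS = [
    "opentelemetry-sdk",
    "opentelemetry-semantic-conventions",
    "opentelemetry-exporter-jaeger",
]

# Hypothetical extra name. In a setup.py this mapping would be passed as
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE).
EXTRAS_REQUIRE = {"opentelemetry": OTEL_REQUIREMENTS}


def editable_install_args(extra):
    """Return the pip argument list for an editable install with one extra."""
    if extra not in EXTRAS_REQUIRE:
        raise KeyError(f"unknown extra: {extra}")
    return ["pip", "install", "-e", f".[{extra}]"]
```

With this in place, `editable_install_args("opentelemetry")` yields the arguments for `pip install -e .[opentelemetry]`, which would pull in the tracing dependencies only when asked for.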

@@ -183,6 +184,7 @@
group node["username"]
variables({
:disable_encryption => ! node['encryption'],
Collaborator

It seems like for encryption we always create the filters and put them in the pipeline, but just turn them on/off based on the config option. I wonder if that strategy makes more sense after we get otel merged.
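
A minimal sketch of that "always in the pipeline, toggled by config" strategy, written as a paste-deploy style `filter_factory`. The option name `tracing_enabled` and the `TracingMiddleware` class are hypothetical stand-ins, not the middleware from the upstream Swift patch; the placeholder only records request paths where a real filter would start a span.

```python
TRUE_VALUES = {"1", "true", "yes", "on"}


class TracingMiddleware:
    """Placeholder tracing filter: records paths instead of emitting spans."""

    def __init__(self, app):
        self.app = app
        self.seen = []

    def __call__(self, environ, start_response):
        # A real implementation would start a span here.
        self.seen.append(environ.get("PATH_INFO", ""))
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    """Return a filter that is a pass-through unless tracing is enabled."""
    conf = dict(global_conf or {}, **local_conf)
    # "tracing_enabled" is an assumed option name for this sketch.
    enabled = str(conf.get("tracing_enabled", "false")).lower() in TRUE_VALUES

    def factory(app):
        # When disabled, hand back the wrapped app untouched so the filter
        # adds no per-request overhead while still sitting in the pipeline.
        return TracingMiddleware(app) if enabled else app

    return factory
```

The appeal of this design is that the pipeline templates stay identical whether or not TRACING=true was set, and only one config value changes.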

@rshaw1467

I just found this and the work you're doing here is awesome! Quick question though: why use the Jaeger exporter instead of a generic OTel exporter like opentelemetry-exporter-otlp, which can send to any tool supporting OTel over RPC/HTTP?

@matthewoliver
Collaborator Author

Thanks @rshaw1467. Great question, and you're right! I do plan to move it over to the OTel collector. In fact, I now have an OTel collector set up in our dev environment to do just that. Jaeger was the first version because this OTel tracing in swift was forked from a slightly older OpenTracing version I was working on before OTel, which used jaeger. I still had that infrastructure up, so it just worked.
Having said that, when people wanted to test this, throwing up a jaeger all-in-one in docker was easy. For a dev environment I'll need to throw up an OTel collector and something to view the traces, unless opentelemetry-exporter-otlp also allows you to view traces. Or do I still need to send them to something like jaeger?

@rshaw1467

You are correct, you'll still need a backend to view the trace data. I was just thinking from a production standpoint: where people are using existing observability tools like Dynatrace, Datadog, New Relic, etc., a generic exporter makes sense.

@matthewoliver
Collaborator Author

matthewoliver commented Nov 8, 2022 via email
