Make Zipkin trace rate configurable #3968

Merged
merged 4 commits into from
Oct 9, 2018

Conversation

Contributor

@negz negz commented Oct 2, 2018

What does this PR do?

Allows Zipkin traces to be sent for only a sample of requests. The sample rate is expressed as a float between 0.0 (0% of requests) and 1.0 (100% of requests), and defaults to 1.0, which keeps the current behaviour of sampling every request.
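To illustrate what a sample rate in that range means in practice, here is a minimal sketch of a rate-based sampler. It is not the code from this PR or from the Zipkin client library; `sampler` and `newRateSampler` are hypothetical names, and the bucket count of 10000 is an assumption chosen only to give two decimal places of percentage precision.

```go
package main

import (
	"fmt"
	"math"
)

// sampler reports whether the trace with the given ID should be recorded.
// Hypothetical type, used only for this illustration.
type sampler func(traceID uint64) bool

// newRateSampler is a hypothetical helper that turns a rate between
// 0.0 and 1.0 into a deterministic sampling decision based on the trace ID.
func newRateSampler(rate float64) (sampler, error) {
	// The negated form also rejects NaN, which fails every comparison.
	if !(rate >= 0.0 && rate <= 1.0) {
		return nil, fmt.Errorf("sample rate %v is outside [0.0, 1.0]", rate)
	}
	// Assumption: 10000 buckets, so rates resolve to 0.01% steps.
	const buckets = 10000
	threshold := uint64(math.Round(rate * buckets))
	return func(traceID uint64) bool {
		// A rate of 1.0 yields threshold == buckets, so every ID is sampled;
		// a rate of 0.0 yields threshold == 0, so no ID is sampled.
		return traceID%buckets < threshold
	}, nil
}

func main() {
	s, err := newRateSampler(0.02) // sample roughly 2% of requests
	if err != nil {
		panic(err)
	}
	sampled := 0
	for id := uint64(0); id < 100000; id++ {
		if s(id) {
			sampled++
		}
	}
	fmt.Printf("sampled %d of 100000 traces\n", sampled)
}
```

Deciding from the trace ID rather than from a random draw means every service that sees the same trace ID and uses the same rate reaches the same decision, which is one common reason to prefer this style of sampler.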

Fixes #3959.

Motivation

In large-scale deployments it's not always practical or feasible to emit traces for 100% of incoming HTTP requests. For example, at Planet Labs we typically sample 2% of requests for tracing in our service mesh, and a similar proportion at our (not-yet-Traefik) edge router.

More

  • Updated tests
  • Updated documentation

Additional Notes

I couldn't resist fixing a typo and a few differing stylisations of the word Zipkin in the docs and docstrings, sorry! I did so before I saw the "don't reformat code" rule.

@dduportal
Contributor

Hello @negz, could you please fix the tests that are failing in the CI (ref. https://semaphoreci.com/containous/traefik/branches/pull-request-3968/builds/3 )?

Thanks for this PR 👏

@ldez ldez added this to the next milestone Oct 2, 2018
@negz
Contributor Author

negz commented Oct 2, 2018

@dduportal I'm starting to take a look now, but at first glance it looks pretty unlikely that my PR had anything to do with this test failure. Is there a chance it's a flaky test?

@ldez
Contributor

ldez commented Oct 2, 2018

The main issue:

INFO[0715] TracingSuite.TestZipkinAuth: Traefik logs:   
INFO[0715] time="2018-10-02T08:19:49Z" level=info msg="Using TOML configuration file /go/src/github.com/containous/traefik/integration/fixtures/tracing/simple.toml065753929"
time="2018-10-02T08:19:49Z" level=warning msg="Jaeger configuration will be ignored"
time="2018-10-02T08:19:49Z" level=info msg="Traefik version 4d43bff2eda8dc127792417af0f0eb4ba0e03308 built on 2018-10-02_08:07:40AM"
time="2018-10-02T08:19:49Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/basics/#collected-data\n"
time="2018-10-02T08:19:49Z" level=debug msg="Global configuration loaded "
time="2018-10-02T08:19:49Z" level=debug msg="Zipkin tracer configured"
time="2018-10-02T08:19:49Z" level=debug msg="Added entrypoint tracing middleware"
time="2018-10-02T08:19:49Z" level=info msg="Preparing server http &{Address::8000 TLS:<nil> Redirect:<nil> Auth:<nil> WhiteList:<nil> Compress:<nil> ProxyProtocol:<nil> ForwardedHeaders:0xc00040c6c0 ClientIPStrategy:<nil>} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-10-02T08:19:49Z" level=debug msg="Added entrypoint tracing middleware"
time="2018-10-02T08:19:49Z" level=info msg="Preparing server traefik &{Address::8080 TLS:<nil> Redirect:<nil> Auth:<nil> WhiteList:<nil> Compress:<nil> ProxyProtocol:<nil> ForwardedHeaders:0xc00040c6e0 ClientIPStrategy:<nil>} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-10-02T08:19:49Z" level=info msg="Starting provider configuration.ProviderAggregator {}"
time="2018-10-02T08:19:49Z" level=info msg="Starting server on :8000"
time="2018-10-02T08:19:49Z" level=info msg="Starting server on :8080"
time="2018-10-02T08:19:49Z" level=info msg="Starting provider *file.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":null,\"Trace\":false,\"DebugLogGeneratedTemplate\":false,\"Directory\":\"\",\"TraefikFile\":\"/go/src/github.com/containous/traefik/integration/fixtures/tracing/simple.toml065753929\"}"
time="2018-10-02T08:19:49Z" level=debug msg="Backend backend2: no load-balancer defined, fallback to 'wrr' method"
time="2018-10-02T08:19:49Z" level=debug msg="Backend backend3: no load-balancer defined, fallback to 'wrr' method"
time="2018-10-02T08:19:49Z" level=debug msg="Backend backend1: no load-balancer defined, fallback to 'wrr' method"
time="2018-10-02T08:19:49Z" level=debug msg="Configuration received from provider file: {\"backends\":{\"backend1\":{\"servers\":{\"server-ratelimit\":{\"url\":\"http://172.17.0.3:80\",\"weight\":1}},\"loadBalancer\":{\"method\":\"wrr\"}},\"backend2\":{\"servers\":{\"server-retry\":{\"url\":\"http://172.17.0.3:80\",\"weight\":1}},\"loadBalancer\":{\"method\":\"wrr\"}},\"backend3\":{\"servers\":{\"server-auth\":{\"url\":\"http://172.17.0.3:80\",\"weight\":1}},\"loadBalancer\":{\"method\":\"wrr\"}}},\"frontends\":{\"frontend1\":{\"entryPoints\":[\"http\"],\"backend\":\"backend1\",\"routes\":{\"test_ratelimit\":{\"rule\":\"Path:/ratelimit\"}},\"passHostHeader\":true,\"priority\":0,\"ratelimit\":{\"rateset\":{\"rateset1\":{\"period\":60000000000,\"average\":4,\"burst\":5},\"rateset2\":{\"period\":3000000000,\"average\":1,\"burst\":2}},\"extractorFunc\":\"client.ip\"}},\"frontend2\":{\"entryPoints\":[\"http\"],\"backend\":\"backend2\",\"routes\":{\"test_retry\":{\"rule\":\"Path:/retry\"}},\"passHostHeader\":true,\"priority\":0},\"frontend3\":{\"entryPoints\":[\"http\"],\"backend\":\"backend3\",\"routes\":{\"test_auth\":{\"rule\":\"Path:/auth\"}},\"passHostHeader\":true,\"priority\":0,\"auth\":{\"basic\":{\"users\":[\"test:$apr1$H6uskkkW$IgXLP6ewTrSuBkTrqE8wj/\",\"test2:$apr1$d9hr9HBB$4HxwgUir3HP4EsggP/QNo0\"]}}}}}"
time="2018-10-02T08:19:49Z" level=debug msg="Wiring frontend frontend1 to entryPoint http"
time="2018-10-02T08:19:49Z" level=debug msg="Creating backend backend1"
time="2018-10-02T08:19:49Z" level=debug msg="Adding TLSClientHeaders middleware for frontend frontend1"
time="2018-10-02T08:19:49Z" level=debug msg="Added outgoing tracing middleware frontend1"
time="2018-10-02T08:19:49Z" level=debug msg="Creating load-balancer wrr"
time="2018-10-02T08:19:49Z" level=debug msg="Creating server server-ratelimit at http://172.17.0.3:80 with weight 1"
time="2018-10-02T08:19:49Z" level=debug msg="Creating load-balancer rate limiter"
time="2018-10-02T08:19:49Z" level=debug msg="Creating retries max attempts 3"
time="2018-10-02T08:19:49Z" level=debug msg="Creating route test_ratelimit Path:/ratelimit"
time="2018-10-02T08:19:49Z" level=debug msg="Wiring frontend frontend2 to entryPoint http"
time="2018-10-02T08:19:49Z" level=debug msg="Creating backend backend2"
time="2018-10-02T08:19:49Z" level=debug msg="Adding TLSClientHeaders middleware for frontend frontend2"
time="2018-10-02T08:19:49Z" level=debug msg="Added outgoing tracing middleware frontend2"
time="2018-10-02T08:19:49Z" level=debug msg="Creating load-balancer wrr"
time="2018-10-02T08:19:49Z" level=debug msg="Creating server server-retry at http://172.17.0.3:80 with weight 1"
time="2018-10-02T08:19:49Z" level=debug msg="Creating retries max attempts 3"
time="2018-10-02T08:19:49Z" level=debug msg="Creating route test_retry Path:/retry"
time="2018-10-02T08:19:49Z" level=debug msg="Wiring frontend frontend3 to entryPoint http"
time="2018-10-02T08:19:49Z" level=debug msg="Creating backend backend3"
time="2018-10-02T08:19:49Z" level=debug msg="Adding TLSClientHeaders middleware for frontend frontend3"
time="2018-10-02T08:19:49Z" level=debug msg="Added outgoing tracing middleware frontend3"
time="2018-10-02T08:19:49Z" level=debug msg="Creating load-balancer wrr"
time="2018-10-02T08:19:49Z" level=debug msg="Creating server server-auth at http://172.17.0.3:80 with weight 1"
time="2018-10-02T08:19:49Z" level=debug msg="Creating retries max attempts 3"
time="2018-10-02T08:19:49Z" level=debug msg="Creating route test_auth Path:/auth"
time="2018-10-02T08:19:49Z" level=info msg="Server configuration reloaded on :8000"
time="2018-10-02T08:19:49Z" level=info msg="Server configuration reloaded on :8080"
time="2018-10-02T08:19:49Z" level=debug msg="Basic auth failed"
 

----------------------------------------------------------------------
FAIL: tracing_test.go:116: TracingSuite.TestZipkinAuth

tracing_test.go:136:
    c.Assert(err, checker.IsNil)
... value *errors.errorString = &errors.errorString{s:"try operation failed: could not find 'entrypoint http' in body '[]'"} ("try operation failed: could not find 'entrypoint http' in body '[]'")

@negz
Contributor Author

negz commented Oct 3, 2018

Just to update, I've been fighting the integration tests for quite some time now. I'm quite confident that even the broken Zipkin test is unrelated to my changes, but it's hard to prove that because I can't get it to pass on master on my laptop in the first place.

@negz
Contributor Author

negz commented Oct 3, 2018

To add some context, I'm seeing the following failure on master. So far I've run the test maybe 10 times. It's passed once but failed all the other times, despite no differences in how it was invoked.

tracing_test.go:133:
    c.Assert(err, checker.IsNil)
... value *errors.errorString = &errors.errorString{s:"try operation failed: Get http://127.0.0.1:8000/auth: dial tcp 127.0.0.1:8000: connect: connection refused"} ("try operation failed: Get http://127.0.0.1:8000/auth: dial tcp 127.0.0.1:8000: connect: connection refused")

@ldez
Contributor

ldez commented Oct 3, 2018

Your PR changes the behavior because of:

https://github.com/openzipkin-contrib/zipkin-go-opentracing/blob/b85dc675b16b116a00c351d2c41305bf9ae71293/sample.go#L32-L37

before:

https://github.com/openzipkin-contrib/zipkin-go-opentracing/blob/b85dc675b16b116a00c351d2c41305bf9ae71293/tracer.go#L227

The "set effective configuration" step for tracing doesn't do what you think:
https://github.com/containous/traefik/pull/3968/files#diff-786c8376feaa68e9ebee88ca950d2774R204

You can add this test case to TestSetEffectiveConfigurationTracing to see the problem:

		{
			desc: "tracing zipkin TEST",
			tracing: &tracing.Tracing{
				Backend: "zipkin",
				Zipkin: &zipkin.Config{
					HTTPEndpoint: "http://powpow:9411/api/v1/spans",
				},
			},
			expected: &tracing.Tracing{
				Backend: "zipkin",
				Jaeger:  nil,
				Zipkin: &zipkin.Config{
					HTTPEndpoint: "http://powpow:9411/api/v1/spans",
					SameSpan:     false,
					ID128Bit:     false,
					Debug:        false,
					SampleRate:   0,
				},
			},
		},

@negz
Contributor Author

negz commented Oct 4, 2018

@ldez Thanks for the pointer! I had intended to make the default SampleRate: 1.0 to maintain the existing behaviour of not using a sampler (and thus defaulting to alwaysSample), but I mistakenly thought the defaults were set in initTracing(), and did not notice defaultTracing.
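The pitfall described here comes from Go's zero values: a float64 field that is never set is 0, which as a sample rate means "trace nothing". The sketch below shows the difference between building a config from scratch and starting from a defaults function. It is an illustration only; `zipkinConfig` and `defaultZipkinConfig` are hypothetical stand-ins, not Traefik's actual types, and the default endpoint value is an assumption.

```go
package main

import "fmt"

// zipkinConfig is a hypothetical, trimmed-down stand-in for the Zipkin
// settings discussed in this thread.
type zipkinConfig struct {
	HTTPEndpoint string
	SampleRate   float64
}

// defaultZipkinConfig returns a config whose SampleRate is 1.0, so that
// users who never mention the sample rate keep tracing every request.
// The endpoint below is an assumed placeholder value.
func defaultZipkinConfig() zipkinConfig {
	return zipkinConfig{
		HTTPEndpoint: "http://localhost:9411/api/v1/spans",
		SampleRate:   1.0,
	}
}

func main() {
	// Built from the zero value: SampleRate is silently 0.0.
	bare := zipkinConfig{HTTPEndpoint: "http://powpow:9411/api/v1/spans"}

	// Built from the defaults, then overridden: SampleRate stays 1.0.
	withDefaults := defaultZipkinConfig()
	withDefaults.HTTPEndpoint = "http://powpow:9411/api/v1/spans"

	fmt.Println("without defaults:", bare.SampleRate)
	fmt.Println("with defaults:   ", withDefaults.SampleRate)
}
```

Because 0.0 is also a legitimate user choice, the default has to be applied before user settings are decoded on top of it; it cannot be inferred afterwards by checking for zero.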

Nic Cope and others added 3 commits October 3, 2018 21:27
This maintains the previous behaviour of Traefik, which did not set a
sampler and thus defaulted to always sampling.
Member

@mmatur mmatur left a comment

LGTM

Contributor

@nmengin nmengin left a comment

LGTM

Contributor

@ldez ldez left a comment

LGTM

@traefiker traefiker merged commit 32f7fb8 into traefik:master Oct 9, 2018
@negz negz deleted the zipple branch October 9, 2018 08:19
Development

Successfully merging this pull request may close these issues.

Sample Zipkin traces
6 participants