Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add datadog metric provider. Fixes #702 #705

Merged
merged 13 commits into from Oct 9, 2020

Conversation

stephen-harris
Copy link
Contributor

@stephen-harris stephen-harris commented Sep 7, 2020

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed the CLA.
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@CLAassistant
Copy link

CLAassistant commented Sep 7, 2020

CLA assistant check
All committers have signed the CLA.

@codecov-commenter
Copy link

codecov-commenter commented Sep 7, 2020

Codecov Report

Merging #705 into master will decrease coverage by 1.62%.
The diff coverage is 68.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #705      +/-   ##
==========================================
- Coverage   84.51%   82.88%   -1.63%     
==========================================
  Files          95       96       +1     
  Lines        8988     8034     -954     
==========================================
- Hits         7596     6659     -937     
+ Misses        983      964      -19     
- Partials      409      411       +2     
Impacted Files Coverage Δ
metricproviders/datadog/datadog.go 67.44% <67.44%> (ø)
utils/analysis/factory.go 100.00% <100.00%> (ø)
utils/unstructured/unstructured.go 57.77% <0.00%> (-26.23%) ⬇️
rollout/context.go 90.00% <0.00%> (-8.57%) ⬇️
utils/log/redactor_formatter.go 66.66% <0.00%> (-8.34%) ⬇️
rollout/replicaset.go 57.81% <0.00%> (-8.02%) ⬇️
utils/tolerantinformer/rollout.go 67.85% <0.00%> (-5.12%) ⬇️
utils/tolerantinformer/experiment.go 67.85% <0.00%> (-5.12%) ⬇️
utils/tolerantinformer/analysisrun.go 67.85% <0.00%> (-5.12%) ⬇️
utils/tolerantinformer/analysistemplate.go 67.85% <0.00%> (-5.12%) ⬇️
... and 88 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1fd695...e7b5bae. Read the comment docs.

@jessesuen jessesuen self-assigned this Sep 8, 2020
@jessesuen
Copy link
Member

I think we need to treat DataDog API Keys and App Keys more first class in the metric provider spec. For reference, this is how we do something similar wavefront:
https://argoproj.github.io/argo-rollouts/features/analysis/#wavefront-metrics

I've read through https://docs.datadoghq.com/account_management/api-app-keys/, but I still have some questions on where would be an appropriate place to configure API keys and app keys.

With Wavefront, a single API token can be specified for an entire domain, e.g. example1.wavefront.com. For this reason, API tokens are configured at the rollout controller level, in a secret in the same namespace of argo-rollouts controller. The rollout end users do not need to specify API keys when using the wavefront metric provider, since this is configured at the controller.

Since I don't understand DataDog, can you answer some questions about it:

  1. Does it make more sense to configure DataDog API keys and/or App keys at a controller level? With wavefront, it made sense to configure it at a controller level, since the same token could be used for the entire wavefront address. By having centrally configured keys, it relieved the burden from developers having to configure and duplicate keys in every one of their application's namespace. This would be a pain to manage, especially with a lot of namespaces.

  2. What is the scope of API keys and App keys? In a large organization, who (account administrators, developers, operators) typically generates API keys vs. App keys? How many of either of these keys typically exist? I see from their documentation, "Your org must have at least one API key and at most five API keys." I'm not sure how large an "org" is in this context, but the statement leads me to believe that API keys are quite limited, and would lend itself to be centrally configured vs. configured at a namespace secret.

pkg/apis/rollouts/v1alpha1/analysis_types.go Outdated Show resolved Hide resolved
metricproviders/datadog/datadog.go Outdated Show resolved Hide resolved
metricproviders/datadog/datadog_test.go Outdated Show resolved Hide resolved
@stephen-harris
Copy link
Contributor Author

Hi @jessesuen I'll make those changes over the weekend, but to answer your questions around DataDog:

I wasn't aware of the 5-API key limit. Our organisation is a heavy datadog user and we appear to have had that limit waivered as we have over 200 API keys (each team has a key per environment, and some are using different keys for different servcies). In our org each team is encouraged to create their own API & App keys, but not through DataDog directly (presumably because only admins can create API keys)

Application keys are tied to a user, so I imagine even in other organisations they are creating app keys per service.

Just checking the issues, I think @johnkost & @ozooxo might be using DataDog - I don't know if they want to weigh in here.

@johnkost
Copy link

We use DataDog as well and haven't had any issues with the 5-API key limit (roughly same setup as @stephen-harris).

I think it makes sense to have this setup similarly at the controller level as Argo-Rollouts is an application, just like any of the other apps we have.

@stephen-harris
Copy link
Contributor Author

metricproviders/datadog/datadog_test.go Outdated Show resolved Hide resolved
Comment on lines 910 to 921
name: wavefront-api-tokens
type: Opaque
data:
api-key: <datadog-api-key>
app-key: <datadog-app-key>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. docs still say "wavefront-api-tokens". I think this should be datadog-auth-keys
  2. don't we need to allow multiple API/APP keys to be configured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) - oops! It should be datadog. Fixed now

(2) - I don't thinks so. All key pairs are assigned to the organisation, and using them you can pull any metrics from your organisation. Our organisation encourages teams to own their own keys, but team a can use its keys to access metrics reported by team b. We could make the secret name configurable, and you specify it as an AnalysisTemplate parameter, but I don't see the value in doing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'll take your word for it, but I'll also pose the question in #argo-rollouts slack in case there's users who may need to configure multiple, in case there's something we're missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear:

With the proposed spec, we allow the DataDog address to be specified in the metric (so it can vary between AnalysisTemplates), but API/APP keys are configured centrally in the controller. This presumes that the same API/APP key can be used for different DataDog URLs. This may very well be possible with DataDog (I don't know the answer to this), but I wanted to call that out as a potential flaw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. No API/APP keys are not cross-site. In fact I believe an organisation is in either the EU or US site. Although not a secret, for ease of configuration we could move these to the datadog secret (alongside app_key and api_key). What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I agree we should make the address and api/app keys centrally configured, and remove the address field from the metric. I'm thinking the Secret should look something like:

data:
  default: |
    address: https://api.datadoghq.com
    api-key: igq7imbxlm97vnlw61bo
    app-key: 35bu8ofvc0osyjmqxtgi

This also leaves the door open so if in the future, multiple addresses and/or app keys could to be specified in the same cluster. e.g.:

data:
  us-app1: |
    address: https://api.datadoghq.com
    api-key: igq7imbxlm97vnlw61bo
    app-key: 35bu8ofvc0osyjmqxtgi
  us-app2: |
    address: https://api.datadoghq.com
    api-key: igq7imbxlm97vnlw61bo
    app-key: cb7ehxo1jwagyceth8v8
  eu: |
    address: https://api.datadoghq.eu
    api-key: cb7ehxo1jwagyceth8v8
    app-key: timk0do15pafuvmfwimg

And the names us-app1 and us-app2 could be referenced in the metric.

But for MVP, for simplicity, we can go out with a single configured DataDog region and single api/app key combination.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this, though I'm starting to think perhaps we should have just one level (e.g. a secret per app/region), and then reference the name of the secret in the metric.

apiVersion: v1
kind: Secret
metadata:
   name: datadog
data:
  address: https://api.datadoghq.com
  api-key: igq7imbxlm97vnlw61bo
  app-key: 35bu8ofvc0osyjmqxtgi
---
apiVersion: v1
kind: Secret
metadata:
   name: datadog-app2
data:
  address: https://api.datadoghq.eu
  api-key: cb7ehxo1jwagyceth8v8
  app-key: timk0do15pafuvmfwimg

Not too fussed though; I think in most use cases organisations will only have one region, and will only have key pair used by rollouts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though I'm starting to think perhaps we should have just one level (e.g. a secret per app/region), and then reference the name of the secret in the metric.

Actually I agree with you. A secret per app/region sounds better as it doesn't put a special datastructure in the secret payload. Then things like kustomize patching the secret will just work. Let's go with your suggestion, but we don't have to do anything yet with regards to referencing until someone has a need to reference multiple regions.


func NewDatadogProvider(logCtx log.Entry, kubeclientset kubernetes.Interface) (*Provider, error) {
ns := Namespace()
secret, err := kubeclientset.CoreV1().Secrets(ns).Get(DatadogTokensSecretName, metav1.GetOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I updated the k8s libraries from underneath you, which now require context objects passed into every call. You can just use context.TODO() for now.

@codecov-io
Copy link

codecov-io commented Oct 9, 2020

Codecov Report

Merging #705 into master will decrease coverage by 0.10%.
The diff coverage is 73.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #705      +/-   ##
==========================================
- Coverage   82.66%   82.56%   -0.11%     
==========================================
  Files          95       96       +1     
  Lines        8071     8162      +91     
==========================================
+ Hits         6672     6739      +67     
- Misses        997     1013      +16     
- Partials      402      410       +8     
Impacted Files Coverage Δ
metricproviders/datadog/datadog.go 73.03% <73.03%> (ø)
utils/analysis/factory.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be3495d...f42d41e. Read the comment docs.

Copy link
Member

@jessesuen jessesuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

@jessesuen jessesuen merged commit 6c3fcfe into argoproj:master Oct 9, 2020
@jessesuen jessesuen mentioned this pull request Oct 9, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants