do not merge - additional resources support #189

Draft · wants to merge 23 commits into base: main
Conversation

thomas-riccardi

Requirements for Contributing to this repository

  • Fill out the template below. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • The pull request must only fix one issue, or add one feature, at a time.
  • The pull request must update the test suite to demonstrate the changed functionality.
  • After you create the pull request, all status checks must pass before a maintainer reviews your contribution. For more details, please see CONTRIBUTING.

What does this PR do?

To migrate our Datadog organization from the Datadog US site to the Datadog EU1 site, we used datadog-sync-cli, but needed support for more resources.

This PR is opened not to be merged, but to help potential future users: this is not supported code, but we used it for our migration, and it can serve as a base after review.

We do not intend to work on this anymore: our migration is finished.

If this is not the right place for this, sorry for the inconvenience; don't hesitate to close.

Description of the Change

Warning: some resources don't have an official API, so we used the API the web frontend uses, with cookie authentication (and CSRF tokens).

Warning: we migrated what we could, following our own priorities; these are often partial migrations.

Info: we opened Datadog support tickets for missing APIs and small bugs in the API; some were already fixed (!), and some upstream fixes landed here from our tickets (logs pipelines order). More details in the individual commit messages.

See the README for more details.

additional resources

  • logs_facets
  • logs_views
  • metric_metadatas
  • incidents
  • incidents_integrations
  • incidents_todos
  • incident_org_settings
  • incidents_config_fields
  • incidents_config_notifications_templates
  • incidents_config_integrations_workflows
  • integrations_slack_channels

Alternate Designs

  • with official APIs for more resources, we would have avoided the cookie-auth hack

Possible Drawbacks

  • non-homogeneous model/resource naming
  • cookie auth / non-official API usage
  • no automated tests
  • we didn't fully understand everything the library does or how to develop new models, especially on the first ones or for advanced patterns

Verification Process

We tested and used this as-is for our migration.

Additional Notes

Some fixes are in datadog_sync/utils/resource_utils.py and could be useful upstream.

Release Notes

do not merge/release

Review checklist (to be filled by reviewers)

  • Feature or bug fix MUST have appropriate tests (unit, integration, etc...)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have one changelog/ label attached. If applicable it should have the backward-incompatible label attached.
  • PR should not have do-not-merge/ label attached.
  • If applicable, the issue must have at least kind/ and severity/ labels attached.

needed for the hacky web frontend API usage (e.g. logs_facets)

when defining the config cookie_dogweb value, the client *switches* to
cookie auth *only*: the API doesn't support both API/APP key & cookie auth.
=> setting the cookie_dogweb config will break all standard
resource types.

also needs a csrf_token for the destination, passed as the _authentication_token key in
the request payload for write actions (see the sketch after this list).
- hardcoded source & target scopeid for now (they seem to be internal index ids, but
  which ones?)
- use x-csrf-token as the _authentication_token key in the request payload for
  write actions
- use dogweb cookie auth
- use x-csrf-token as the _authentication_token key in the request payload for
  write actions
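For future readers, a minimal sketch of what the cookie-auth hack looks like (illustrative only: the endpoint path, payload shape, and placeholder values are assumptions, not the actual datadog-sync-cli code):

```python
import requests

DD_SITE = "https://app.datadoghq.eu"        # destination (EU1) web frontend
DOGWEB_COOKIE = "<dogweb session cookie>"   # placeholder: taken from a logged-in browser session
CSRF_TOKEN = "<x-csrf-token value>"         # placeholder: taken from the same session

session = requests.Session()
# Cookie auth only: the frontend API does not accept API/APP key headers
# together with the dogweb cookie, so no DD-API-KEY headers are set here.
session.cookies.set("dogweb", DOGWEB_COOKIE)

# Write actions additionally require the CSRF token, passed as the
# `_authentication_token` key inside the request payload itself.
payload = {
    "_authentication_token": CSRF_TOKEN,
    "facet": {"path": "@http.status_code", "type": "number"},  # hypothetical body
}
resp = session.post(f"{DD_SITE}/api/v1/logs/facets", json=payload)  # hypothetical endpoint
resp.raise_for_status()
```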
- get all metric metadata with the standard v2 API, but it returns only ids
- hack the import_resource standard pattern to populate the dict that
  contains just 'id'; it's called during import just after
  get_resources()
- get & update of metric metadata use the v1 API, which doesn't have 'id' in
  its dict; hack it in everywhere for consistency with the v2/standard
  pattern
- creating metric metadata is *not* supported by the Datadog API; metadata
  can only be updated on an already existing metric: raise an error explaining
  that the user should first push data points to the metric, then rerun the
  script (see the sketch after this list)
  - use the standard pattern to get the destination resources (just ids
    here), and call update_resource() instead if the metric already exists
  - the initial diff won't be the actual diff: it says it will create everything
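A hedged sketch of that metric-metadata flow, using the documented v2 list endpoint and v1 metadata endpoints (simplified; the real logic lives in the resource class, and the error handling here is illustrative):

```python
import requests

BASE = "https://api.datadoghq.eu"
HEADERS = {"DD-API-KEY": "<api-key>", "DD-APPLICATION-KEY": "<app-key>"}

# The v2 list endpoint returns only ids (metric names), no metadata.
metrics = requests.get(f"{BASE}/api/v2/metrics", headers=HEADERS).json()["data"]

for m in metrics:
    metric_id = m["id"]
    # v1 get/update carry the metadata, but their payload has no 'id' field:
    # add it manually for consistency with the v2/standard pattern.
    meta = requests.get(f"{BASE}/api/v1/metrics/{metric_id}", headers=HEADERS).json()
    meta["id"] = metric_id

    # Creation is not supported: the metric must already exist on the
    # destination (i.e. have received data points), so we can only update.
    resp = requests.put(
        f"{BASE}/api/v1/metrics/{metric_id}", headers=HEADERS,
        json={k: v for k, v in meta.items() if k != "id"},
    )
    if resp.status_code == 404:
        raise RuntimeError(
            f"metric {metric_id} does not exist on the destination: "
            "push data points to it first, then rerun the script"
        )
```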
Only the base incident data is supported; related
resources (integrations (slack), todos (remediations), attachments) may
be handled later with dedicated resources.

The import is lossy: for example, the creation date becomes the sync
date, the timeline is lost, etc.

- the Datadog API documentation lists only a subset of accepted fields
  for creation; in practice it handles only a subset and ignores
  the others
  => auto-update just after create to sync more data (see the sketch after this list)
- attributes.notification_handles is ignored: too hard to reproduce
  properly, and it spams people during sync (or the dates are all wrong); we
  don't really care about that piece of information => skip it
- avoid a perpetual diff on old incidents without `attributes.visibility`:
  hardcode the `organization` value, it's what will be created on the
  destination organization anyway
- the incidents list API initially skipped old incidents without the
  `visibility` field; an exchange with Datadog support resulted in a
  production fix
- after a full import: it seems users not yet active on the destination
  organization are in fact *not* notified (active incident commanders
  are notified, and are mandatory according to the API documentation)
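The create-then-update dance, roughly (assumes the documented v2 incidents endpoints; attribute handling is heavily simplified):

```python
import requests

BASE = "https://api.datadoghq.eu"
HEADERS = {"DD-API-KEY": "<api-key>", "DD-APPLICATION-KEY": "<app-key>"}

def sync_incident(source_incident: dict) -> dict:
    attrs = dict(source_incident["attributes"])
    attrs.pop("notification_handles", None)         # skipped: would spam people during sync
    attrs.setdefault("visibility", "organization")  # avoid the perpetual diff on old incidents

    # POST accepts only a subset of the fields and silently ignores the rest...
    created = requests.post(
        f"{BASE}/api/v2/incidents", headers=HEADERS,
        json={"data": {"type": "incidents", "attributes": attrs}},
    ).json()["data"]

    # ...so PATCH immediately after creation to sync the remaining attributes.
    return requests.patch(
        f"{BASE}/api/v2/incidents/{created['id']}", headers=HEADERS,
        json={"data": {"type": "incidents", "id": created["id"], "attributes": attrs}},
    ).json()["data"]
```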
…mespace

if kube_namespace is not null, don't overwrite it: the existing data is better (see the sketch below).
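In code terms, something like this hypothetical merge helper (not the actual implementation):

```python
def merge_host_data(existing: dict, incoming: dict) -> dict:
    """Overlay incoming data, but keep a non-null kube_namespace:
    the value already present is better than the synced one."""
    merged = {**existing, **incoming}
    if existing.get("kube_namespace") is not None:
        merged["kube_namespace"] = existing["kube_namespace"]
    return merged
```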
…s() when last level is an array of non-nested objects
- undocumented API, but a standard v2 API used by the web frontend; works
  with API/APP key
- just one resource per org, so force update, ignore ids, etc.
- unique by attributes.names
- perpetual diff on 'metadata' for the ootb service & team:
  - PATCH is ok (maybe it ignores metadata?)
  - but the PATCH response contains `metadata: null`
  => `diffs` always shows it; that's ok, we can ignore those (see the sketch after this list)
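One way to silence that known-noisy diff would be to drop the nulled key before comparing (hypothetical helper, not part of this PR):

```python
def strip_nulled_metadata(resource: dict) -> dict:
    # The PATCH response comes back with `metadata: null` even when the field
    # was accepted, so exclude that case from the comparison to avoid the
    # perpetual diff on the ootb service & team.
    return {k: v for k, v in resource.items()
            if not (k == "metadata" and v is None)}
```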
Covers General>Integrations & Notifications>Rules

- API inconsistency: `attributes.triggers.variables.severity_values`
  and `attributes.triggers.variables.status_values` are `null` in read
  calls, but require an array in write calls
  => skipping them with non_nullable_attr (fixed to support lists too; see the sketch after this list)
- errors (probably because some workflows are hardcoded and not duplicable, but there is no obvious attribute to distinguish them):
  - Error: 400 Bad Request - {"errors":["a workflow like that already exists"]}
  - Error: 400 Bad Request - {"errors":["Invalid payload: 'name' is invalid"]}
  => ignoring those errors for now; manually fixed `Send all incident updates to a global channel` via the web frontend.
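A sketch of the null-skipping idea extended to lists, in the spirit of the fixed non_nullable_attr helper (name and exact semantics here are assumptions; the real helper is in datadog_sync/utils/resource_utils.py):

```python
def delete_null_attr(path: str, resource) -> None:
    """Drop the attribute at the dot-separated `path` when its value is None,
    recursing into lists so a path like
    'attributes.triggers.variables.severity_values' still resolves when an
    intermediate level is an array."""
    if isinstance(resource, list):
        for item in resource:
            delete_null_attr(path, item)
        return
    key, _, rest = path.partition(".")
    if not isinstance(resource, dict) or key not in resource:
        return
    if rest:
        delete_null_attr(rest, resource[key])
    elif resource[key] is None:
        del resource[key]
```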
- iterate on all incidents, then for each incident, iterate on its
  'todo' relationships (see the sketch below)
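Roughly (pagination elided; the relationship endpoint path is an assumption based on the commit message):

```python
import requests

BASE = "https://api.datadoghq.eu"
HEADERS = {"DD-API-KEY": "<api-key>", "DD-APPLICATION-KEY": "<app-key>"}

def get_all_incident_todos() -> list:
    todos = []
    incidents = requests.get(f"{BASE}/api/v2/incidents", headers=HEADERS).json()["data"]
    for incident in incidents:
        # Follow each incident's 'todo' relationship individually.
        resp = requests.get(
            f"{BASE}/api/v2/incidents/{incident['id']}/relationships/todos",
            headers=HEADERS,
        )
        todos.extend(resp.json().get("data", []))
    return todos
```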