Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for organization level metrics #12375

Merged
merged 38 commits into from
Jun 23, 2022
Merged

Conversation

sarah-witt
Copy link
Contributor

@sarah-witt sarah-witt commented Jun 16, 2022

What does this PR do?

Adds support for querying metrics from the ORGANIZATION_USAGE schema https://docs.snowflake.com/en/sql-reference/organization-usage.html

Motivation

Collecting more data

Additional Notes

Currently, although snowflake supports configuring schema, the default queries are only available in ACCOUNT_USAGE. This PR adds queries specific to ORGANIZATION_USAGE, as well as default metric groups, etc.

Some differences to note:

  • Almost all organization usage views have a latency of 24 hours, so the information is collected from the previous day, since the data would hardly be available if queried for the current day
  • The recommended organization usage collection interval is 12 hours due to this high latency, but can be configured differently
  • organization and account usage queries cannot be collected at the same time, they have to be in separate instances
  • some tables have since moved into "preview mode" after initial implementation, the column names could potentially change https://docs.snowflake.com/en/sql-reference/organization-usage/data_transfer_history.html

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

@codecov
Copy link

codecov bot commented Jun 16, 2022

Codecov Report

Merging #12375 (dcabb07) into master (760fb50) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Flag Coverage Δ
snowflake 96.46% <100.00%> (+1.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@sarah-witt sarah-witt marked this pull request as ready for review June 16, 2022 13:57
@sarah-witt sarah-witt requested review from a team as code owners June 16, 2022 13:57
apigirl
apigirl previously approved these changes Jun 16, 2022
Copy link
Contributor

@apigirl apigirl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs look good!


By default, this integration monitors the `ACCOUNT_USAGE` schema, but can also monitor the `ORGANIZATION_USAGE` schema to monitor organization-level metrics.

To collect organization metrics, you can create configure the integration by changing the schema field to `ORGANIZATION_USAGE` and increasing the `min_collection_interval` to reduce the number of queries to Snowflake, as most organization queries have a latency of up to 24 hours.
Copy link
Contributor

@hithwen hithwen Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To collect organization metrics, you can create configure the integration by changing the schema field to `ORGANIZATION_USAGE` and increasing the `min_collection_interval` to reduce the number of queries to Snowflake, as most organization queries have a latency of up to 24 hours.
To collect organization metrics, you can configure the integration by changing the `schema` field to `ORGANIZATION_USAGE`. You should also increase the `min_collection_interval` to reduce the number of queries to Snowflake as most organization queries have a latency of up to 24 hours.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the recommended value for min_collection_interval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 43200. I will add this here as well.

@@ -52,7 +52,7 @@

# https://docs.snowflake.com/en/sql-reference/account-usage/warehouse_metering_history.html
WarehouseCreditUsage = {
'name': 'billings.warehouse.metrics',
'name': 'billing.warehouse.metrics',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not incorrect, just inconsistent with other naming conventions. The name is not used anywhere besides logging. I can make this change in a separate PR if you think it's cleaner.

snowflake/README.md Outdated Show resolved Hide resolved
- schema: ORGANIZATION_USAGE
min_collection_interval: 43200
```

#### Collecting data for multiple environments

If you want to collect data for multiple Snowflake environments, add each environment as an instance in your `snowflake.d/conf.yaml` file. For example, if you needed to collect data for two users named `DATADOG_SYSADMIN` and `DATADOG_USER`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and add an example at the bottom of this section to collect both account and organization metrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that!

@sarah-witt
Copy link
Contributor Author

Hi @apigirl I added another section to the readme, do you mind re-reviewing?

Comment on lines 136 to 151
```yaml
instances:
- account: example-inc
username: DATADOG_ORG_ADMIN
password: '<PASSWORD>'
role: SYSADMIN
database: ORGANIZATION_USAGE
min_collection_interval: 3600

- account: example-inc
username: DATADOG_ACCOUNT_ADMIN
password: '<PASSWORD>'
role: DATADOG_ADMIN
database: ACCOUNT_USAGE
min_collection_interval: 43200
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be schema rather than database here? It also looks like the min_collection_interval are reversed, I would expect the instance with ORGANIZATION_USAGE to have it set to 43200 as per the section above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes that's right, I was copying the other example 🤦‍♀️

snowflake/assets/configuration/spec.yaml Outdated Show resolved Hide resolved
except KeyError:
self.errors.append(mgroup)

if self.errors:
self.log.warning('Invalid metric_groups found in snowflake conf.yaml: %s', (', '.join(self.errors)))
self.log.warning('Invalid metric_groups forfound in snowflake conf.yaml: %s', (', '.join(self.errors)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an accidental change.

Copy link
Contributor Author

@sarah-witt sarah-witt Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I meant to include the schema as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow so the test covering this was actually broken which is why I didn't catch this earlier! Good spot! https://github.com/DataDog/integrations-core/blob/master/snowflake/tests/test_unit.py#L258

@@ -45,3 +45,79 @@
Decimal('0.000000'),
)
]
ORGANIZATION_USAGE_IN_CURRENCY_DAILY = [('test', 'Standard', 'Compute', 'USD', Decimal('0.4'), Decimal('0.7'))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are adding these here maybe we can add assertions for these metrics on the e2e test, which is where this data is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the e2e test!

Co-authored-by: Alex Lopez <alex.lopez.zorzano@gmail.com>
@github-actions
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

buraizu
buraizu previously approved these changes Jun 22, 2022
Copy link
Contributor

@buraizu buraizu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I made a small suggestion to split up a lengthy sentence, but otherwise this looks fine from a docs perspective.

snowflake/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@alopezz alopezz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good to go, just a little thing that concerns me for testing.

@@ -47,7 +47,7 @@ def execute(self, query):
# look for ACCOUNT_NAME or CONTRACT_NUMBER since some views have
# the same name in ACCOUNT_USAGE and ORGANIZATION_USAGE
table_prefix = ""
if "ACCOUNT_NAME" or "CONTRACT_NUMBER" in query:
if "ACCOUNT_NAME" in query or "CONTRACT_NUMBER" in query:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit brittle. Looking a bit deeper into how the code uses the snowflake library, here's what I think it might make most sense:

Since the schema is passed when calling connect

How about we pass that value in our mock down to this mock Cursor and rely on that to know whether it's an organization query or not? I think that would match the faked behavior better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, good idea. I updated to better handle multiple schemas

sarah-witt and others added 3 commits June 23, 2022 10:37
Co-authored-by: Bryce Eadie <bryce.eadie@datadoghq.com>
alopezz
alopezz previously approved these changes Jun 23, 2022
Comment on lines 48 to 52
# check the schema since some views have
# the same name in ACCOUNT_USAGE and ORGANIZATION_USAGE
table_prefix = ''
if self.schema == 'ORGANIZATION_USAGE':
table_prefix = 'ORGANIZATION_'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Works for me! A tiny nit would be that the comment mentioning view names might be a bit confusing with the final implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I removed the comment

@sarah-witt sarah-witt merged commit c9f96e3 into master Jun 23, 2022
@sarah-witt sarah-witt deleted the sarah/snowflake-org-usage branch June 23, 2022 16:07
github-actions bot pushed a commit that referenced this pull request Jun 23, 2022
* Add snowflake queries

* Update and test currency usage queries

* Add more queries

* Complete all new queries

* Clean up queries, add another test

* Add more tests

* Add more tests, split up test files

* Add organization docs, start metadata

* Remove billing prefix

* Update defaults

* Fix query syntax

* Add more queries

* Allow org usage in e2e

* Clean up queries, update metadata

* Update readme

* fix readme

* Add metric groups tests

* Add currency tag

* Update snowflake/README.md

Co-authored-by: Julia <611228+hithwen@users.noreply.github.com>

* Update readme

* Update snowflake/assets/configuration/spec.yaml

Co-authored-by: Alex Lopez <alex.lopez.zorzano@gmail.com>

* Update readme

* Sync config

* Update grammar

* Fix logging and tests

* Add organization e2e test

* Assert tags

* Add credit prefix, remove count assertion

* Also look for contract number as org usage

* Fix syntax

* Better handle multiple schemas in mock library

* Update snowflake/README.md

Co-authored-by: Bryce Eadie <bryce.eadie@datadoghq.com>

* Fix style

* Remove comment

Co-authored-by: Julia <611228+hithwen@users.noreply.github.com>
Co-authored-by: Alex Lopez <alex.lopez.zorzano@gmail.com>
Co-authored-by: Bryce Eadie <bryce.eadie@datadoghq.com> c9f96e3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants