Internally compile the include patterns in the autodiscovery feature #14768

Merged

Member

@FlorentClarret FlorentClarret commented Jun 15, 2023

What does this PR do?

The autodiscovery feature added in this PR lets us provide an include pattern either as a string or directly as a compiled regular expression. When the pattern is a string, we do not compile it: we always forward the string to re.search, which in turn compiles it and relies on the re module's internal cache. This PR pre-compiles string patterns in the discovery class so that the same pattern is not compiled multiple times.

I also added a test to ensure the filter works as expected when we directly provide a compiled regular expression.
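The idea described above can be sketched as follows. This is only a minimal illustration, not the actual code from datadog_checks_base: the class name `IncludeFilter` and the method `matching_patterns` are hypothetical names chosen for the example. Only `re.compile` and `Pattern.search` are real standard-library calls.

```python
import re


class IncludeFilter:
    """Hypothetical filter used for illustration only."""

    def __init__(self, include):
        # Compile string keys once, up front. Keys that are already
        # compiled patterns are stored unchanged.
        self._compiled = [
            re.compile(p) if isinstance(p, str) else p
            for p in include
        ]

    def matching_patterns(self, name):
        # Each call reuses the compiled objects, so no pattern string
        # is passed to re.search (and looked up in its cache) per call.
        return [p.pattern for p in self._compiled if p.search(name)]


# Usage: mix a plain string and a pre-compiled pattern.
f = IncludeFilter({'a.*': None, re.compile('b.*'): None})
print(f.matching_patterns('apple'))
print(f.matching_patterns('berry'))
```

Compiling in the constructor trades a small one-time cost for not depending on the size or eviction behavior of the module-level cache.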

Motivation

Even though re has an internal cache, I think we should compile the patterns on our side to be completely sure they are not recompiled each time the check runs.

Additional Notes

  • I'm going to use this feature in this PR

  • The autodiscovery feature is already used in the cloudera integration

  • This PR is a proposal; let me know what you think!

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached
  • If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

@FlorentClarret FlorentClarret requested a review from a team as a code owner June 15, 2023 08:10
@ghost ghost added the base_package label Jun 15, 2023
@FlorentClarret FlorentClarret changed the title from "Internally compile the include patterns in the discovery feature" to "Internally compile the include patterns in the autodiscovery feature" Jun 15, 2023
@codecov

codecov bot commented Jun 15, 2023

Codecov Report

Merging #14768 (f175a8d) into master (0b68479) will increase coverage by 0.28%.
The diff coverage is 100.00%.

Flag Coverage Δ
airflow 90.00% <ø> (ø)
amazon_msk 89.07% <ø> (ø)
apache 95.08% <ø> (ø)
arangodb 98.23% <ø> (ø)
aspdotnet 100.00% <ø> (+26.19%) ⬆️
azure_iot_edge 82.08% <ø> (ø)
boundary 100.00% <ø> (ø)
btrfs 82.91% <ø> (ø)
cacti 87.90% <ø> (ø)
cassandra_nodetool 93.16% <ø> (ø)
ceph 91.02% <ø> (ø)
cilium 75.46% <ø> (+0.92%) ⬆️
citrix_hypervisor 87.50% <ø> (ø)
clickhouse 95.49% <ø> (ø)
cloud_foundry_api 96.35% <ø> (+0.12%) ⬆️
cloudera 99.82% <ø> (ø)
cockroachdb 91.90% <ø> (ø)
consul 91.65% <ø> (ø)
coredns 94.57% <ø> (ø)
couch 95.43% <ø> (+0.24%) ⬆️
couchbase 84.28% <ø> (ø)
datadog_checks_base 89.58% <100.00%> (+0.34%) ⬆️
datadog_checks_dev 82.77% <ø> (+0.07%) ⬆️
datadog_checks_downloader 81.65% <ø> (ø)
ddev 99.21% <ø> (ø)
disk 85.03% <ø> (-6.30%) ⬇️
dns_check 93.90% <ø> (ø)
dotnetclr 91.39% <ø> (+12.90%) ⬆️
eks_fargate 94.05% <ø> (ø)
elastic 93.22% <ø> (ø)
envoy 95.02% <ø> (+0.42%) ⬆️
etcd 95.56% <ø> (-4.44%) ⬇️
exchange_server 96.85% <ø> (+11.81%) ⬆️
fluentd 94.77% <ø> (ø)
gearmand 78.26% <ø> (+1.24%) ⬆️
gitlab 92.46% <ø> (+1.22%) ⬆️
gitlab_runner 91.94% <ø> (ø)
glusterfs 80.09% <ø> (+0.92%) ⬆️
gunicorn 92.10% <ø> (-0.76%) ⬇️
haproxy 95.13% <ø> (+0.16%) ⬆️
harbor 80.04% <ø> (ø)
hdfs_datanode 89.74% <ø> (ø)
hdfs_namenode 86.72% <ø> (ø)
http_check 96.09% <ø> (+2.15%) ⬆️
ibm_ace 91.79% <ø> (ø)
ibm_db2 95.30% <ø> (ø)
ibm_mq 91.26% <ø> (+0.13%) ⬆️
ibm_was 96.08% <ø> (ø)
iis 95.00% <ø> (+37.60%) ⬆️
istio 77.43% <ø> (+0.55%) ⬆️
kafka_consumer 93.43% <ø> (ø)
kong 87.56% <ø> (ø)
kube_dns 95.97% <ø> (ø)
lighttpd 83.64% <ø> (ø)
linkerd 85.14% <ø> (+1.14%) ⬆️
marathon 83.43% <ø> (ø)
marklogic 96.46% <ø> (ø)
mesos_master 89.75% <ø> (ø)
mesos_slave 93.63% <ø> (ø)
mongo 96.55% <ø> (ø)
mysql 87.23% <ø> (ø)
nfsstat 95.20% <ø> (ø)
nginx 95.24% <ø> (+0.54%) ⬆️
openmetrics 98.08% <ø> (ø)
openstack 51.45% <ø> (ø)
oracle 89.78% <ø> (ø)
pdh_check 97.82% <ø> (ø)
pgbouncer 91.33% <ø> (ø)
postfix 88.04% <ø> (ø)
postgres 91.23% <ø> (+0.04%) ⬆️
powerdns_recursor 96.65% <ø> (ø)
prometheus 94.17% <ø> (ø)
proxysql 98.97% <ø> (ø)
rabbitmq 96.04% <ø> (ø)
redisdb 87.89% <ø> (ø)
rethinkdb 97.93% <ø> (ø)
riak 99.22% <ø> (ø)
riakcs 93.61% <ø> (ø)
sap_hana 91.64% <ø> (+0.26%) ⬆️
scylla 100.00% <ø> (ø)
singlestore 90.81% <ø> (ø)
snmp 82.34% <ø> (+0.03%) ⬆️
sonarqube 98.24% <ø> (ø)
spark 93.63% <ø> (ø)
sqlserver 84.92% <ø> (-1.62%) ⬇️
squid 100.00% <ø> (ø)
ssh_check 91.58% <ø> (ø)
strimzi 89.06% <ø> (ø)
supervisord 92.69% <ø> (ø)
system_core 90.90% <ø> (ø)
tcp_check 92.92% <ø> (ø)
teradata 94.24% <ø> (ø)
tokumx 58.40% <ø> (ø)
varnish 84.39% <ø> (+0.26%) ⬆️
vault 95.53% <ø> (+0.57%) ⬆️
vertica 98.50% <ø> (ø)
vsphere 90.58% <ø> (+0.08%) ⬆️
win32_event_log 86.40% <ø> (+0.27%) ⬆️
windows_performance_counters 98.36% <ø> (ø)
windows_service 98.00% <ø> (ø)
wmi_check 92.91% <ø> (ø)
zk 82.62% <ø> (+1.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Contributor

@jose-manuel-almaza jose-manuel-almaza left a comment

I consider this PR a small, pure refactor. In my opinion, no new unit tests are needed, since we test the public interface of the class, not its implementation.

d = Discovery(mock_get_items, include={'a.*': None, 'b.*': None})
assert list(d.get_items()) == [('a.*', 'a', 'a', None), ('b.*', 'b', 'b', None)]
assert mock_get_items.call_count == 1
assert d._filter._compiled_include_patterns == {p: re.compile(p) for p in ['a.*', 'b.*']}

You are refactoring, not adding new functionality, so no new unit tests should be needed. The proof is that you are checking private properties of the class: if they are refactored in the future, you would be forced to change your unit tests, which shouldn't be necessary.
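To illustrate the point, here is a rough sketch of a test that exercises only public behavior, so it would keep passing however the patterns are stored internally. `NameFilter` and its `keep` method are hypothetical stand-ins, not the real `Discovery` API.

```python
import re


class NameFilter:
    """Hypothetical stand-in for the class under test."""

    def __init__(self, include):
        # Private detail: re.compile returns an already-compiled
        # pattern unchanged when no flags are given.
        self._patterns = [re.compile(p) for p in include]

    def keep(self, names):
        return [n for n in names if any(p.search(n) for p in self._patterns)]


def test_include_accepts_strings_and_compiled_patterns():
    names = ['apple', 'berry', 'cherry']
    expected = ['apple', 'berry']
    # Only the observable result is asserted; no private attribute is read.
    assert NameFilter(['^a', '^b']).keep(names) == expected
    assert NameFilter([re.compile('^a'), re.compile('^b')]).keep(names) == expected


test_include_accepts_strings_and_compiled_patterns()
```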

Member Author

Sounds good to me. I think we can keep the other test even if it's not directly related to my modification, what do you think?

Contributor

@jose-manuel-almaza jose-manuel-almaza Jun 15, 2023

It is very similar to test_include_not_empty, but yes... I think we can keep it

Member Author

So similar that we can merge them together! Done. Thanks.

@github-actions

github-actions bot commented Jun 15, 2023

Test Results

975 files, 975 suites, 7h 6m 52s ⏱️
5 441 tests: 5 355 ✔️, 72 💤, 3 ❌, 11 🔥
22 826 runs: 19 277 ✔️, 3 535 💤, 3 ❌, 11 🔥

For more details on these failures and errors, see this check.

Results for commit f175a8d.

♻️ This comment has been updated with latest results.

@FlorentClarret FlorentClarret merged commit 22910f9 into master Jun 16, 2023
@FlorentClarret FlorentClarret deleted the florentclarret/base_check/discovery_compile_include branch June 16, 2023 11:46
aquiladayc added a commit that referenced this pull request Jun 28, 2023
remove lines

Update CHANGELOG.md

Draft CHANGELOG.md

Update metadata.csv

Update metadata.json

Update conftest.py

Update CHANGELOG.md

Remove change log

Update ecs_fargate.py

Add __init__ function

Fix ecs_fargate.py and add tests

Add new performance counter metrics (#14625)

* Add new cluster metrics

* Add host and VM metrics

* Add realtime fixtures

* update metadata.csv

* Add throughput metric

Move cancel waiting logic to test functions for DBMAsyncJob  (#14773)

* init commit

* revert changes to sqlserver, mysql, utils tests

Add validations for removed dependencies (#14556)

* Map out new licenses validation

* Implement validations for extra licenses

* Add constants to config.toml

* Implement license validation

* Uncomment legacy licenses validation

* Keep license command addition in same place

* Small style change

* Update config.toml override values

* Refactor

* Fix style

* Apply suggestions from code review

Co-authored-by: Ofek Lev <ofekmeister@gmail.com>

* Update suggestions

* Require CI for license validation tests and update to use empty envvars

* Fix permission for file

* Add windows version of setting github env vars

* Fix windows file

* Change to powershell script

* Output GITHUB_ENV on windows CI

* Convert entirely to powershell

* Change back to bat file

* Test DD_GITHUB_USER value

* Print github user in license test

* Manually set Github user and token in test

* Fix config_file

* Print github user

* Check if tokens are the same

* Remove additional space in bat script

* Fix style and remove test code

* Change order of scripts

* Try commenting out model.github override

* Revert previous commit

* Change to threads instead of async

* Switch out async request to requests

* Clean up

* Fix style

---------

Co-authored-by: Ofek Lev <ofekmeister@gmail.com>

DOCS-5656 gke setup links to operator/helm (#14746)

dbm-oracle-dashboard (#14736)

Internally compile the `include` patterns in the autodiscovery feature (#14768)

* Internally compile the `include` patterns in the discovery feature

* address

Use Git for versioning (#14778)

Upgrade Pydantic model code generator (#14779)

* Upgrade Pydantic model code generator

* address

build standalone binaries for ddev (#14774)

Remove `pyperclip` dependency and clipboard functionality (#14782)

[Release] Bumped datadog_checks_dev version to 20.0.0 (#14784)

* [Release] Bumped datadog_checks_dev version to 20.0.0

* [Release] Update metadata

Bump the minimum version of datadog-checks-dev (#14785)

update license path (#14783)

Allow all projects to be collected in REST implementation (#14433)

* Bug Fix Teamcity rest with all projects

* .

Rewrite Postgres size query and add `postgresql.relation.{tuples,pages,all_visible}` + toast_size metrics (#14500)

* Bumped dependency version

* Rewrite size metric query

Use new query executor
Split toast size from table size
Add partition_of tag
Add pages/tuples/allvisible metrics
Optimise query to minimise stat calls

* Add partitioned test tables

* Update metadata with new metric

* Fix version check

* Fix wal_level for tests

[SNMP] Add metadata for traps telemetry metrics (#14769)

* Add metadata for traps telemetry metrics

* Remove commas from desc 🤦‍

* Add units

Temporarily disable py2 tests on PRs (#14793)

fix(redisdb): return len of stream instead of 1 (#14722)

Currently the code computes the length of the stream but always reports 1.

This change fixes this.

Update Netflow dashboard (#14794)

* update Netflow dashboard

* remove datadog_demo_keep:true

* avg -> sum

* rename dashboard

revert manifest.json (#14797)

Add User Profiles support (#14752)

Remove Content (#14766)

Update wording and add extra install directions for ODBC (#14781)

* Update wording and add extra install directions for ODBC

* Update README.md

Add ability to choose tag to append to VM hostname (#14657)

* Add ability to choose tag to append to VM hostname

* Add a test for integration tags

* Sort list

* Change log to debug

* Fix style

* Allow user to choose a datadog tag for vm hostname

Fix ability to release ddev (#14790)

Disable server info and version collection when collect_server_info is false (#14610)

* if collect_server_info is set to false disable server info and version collection

* the collect metadata in check.py has to check for the collect_server_info before attempting to collect server info, even when the base url is well formatted

* add testing to see if metadata is collected when collect_server_info is false

* add testing to see if metadata is collected when collect_server_info is false

* fix typo

* fix typo

* commit

* commit

* commit

* commit typo

* fix check.py

[Release] Bumped datadog_checks_dev version to 20.0.1 (#14806)

* [Release] Bumped datadog_checks_dev version to 20.0.1

* [Release] Update metadata

[Release] Bumped ddev version to 3.0.0 (#14807)

* [Release] Bumped ddev version to 3.0.0

* [Release] Update metadata

fix build flake for ddev (#14808)

Fix ddev platform installers and releasing (#14812)

Bump postgres integration to Python 3 (#14813)

update changelog generation (#14810)

Update ecs_fargate/tests/fixtures/metadata_v4.json

Co-authored-by: Cedric Lamoriniere <cedric.lamoriniere@datadoghq.com>

Update ecs_fargate/tests/fixtures/stats_linux_v4.json

Co-authored-by: Cedric Lamoriniere <cedric.lamoriniere@datadoghq.com>

Update test_unit_v4.py

Update license term

Update test_unit_v4.py

update license format

Update metadata.csv

Fix unit name

Set the `marker` option to `not e2e` by default (#14804)

Add profile for hp-ilo (#14771)

* add profile for hp-ilo

* add tests for new hp-ilo profile

* fix linter

* hp-ilo4 extends hp-ilo

* delete unnecessary product_name field

* lint

* move hp-ilo to default-profiles

Update profiles with missing devices (#14695)

* update cisco-asr

* update cisco-catalyst-wlc

* update cisco-catalyst

* update cisco-legacy-wlc

* update cisco-nexus

* update dell-poweredge

* update juniper-ex

* update juniper-mx

* add cisco-isr

* add models + move cisco5700WLC to cisco-catalyst-wlc

* move cisco-isr to default-profiles

Add profile 3com-huawei (#14694)

Revert "Set the `marker` option to `not e2e` by default (#14804)" (#14815)

This reverts commit 3f4c885.

Sort assert_device_metadata tags (#14816)

Add per vendor generic profiles (#14721)

* add dell generic profile

* add fortinet generic profile

* add juniper generic profile

* move vendor profiles to default-profiles

* add test for cisco

* add test for dell

* add test for fortinet

* add test for juniper

* linter

* linter

Update formatting for changelogs (#14814)

* Update formatting for changelogs

* Update formatting for changelogs