Internally compile the `include` patterns in the autodiscovery feature #14768

FlorentClarret · 2023-06-15T08:10:57Z

What does this PR do?

The autodiscovery feature added in this PR lets us provide an include pattern either as a string or directly as a compiled regular expression. When the pattern is a string, we currently do not compile it: we forward the string to `re.search` on every call, which compiles it and relies on the `re` module's internal cache. This PR pre-compiles string patterns in the discovery class so that the same pattern is not compiled more than once.

I also added a test to ensure the filter works as expected when we directly provide a compiled regular expression.
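As a rough illustration of the idea described above (not the actual code from this PR), the sketch below pre-compiles every include key once at construction time. The class name `IncludeFilter`, its `match` method and the `_compiled` attribute are hypothetical and exist only for this example; the one library behaviour it relies on is that `re.compile` returns an already-compiled pattern unchanged when no flags are passed.

```python
import re
from typing import Dict, Optional, Pattern, Union

IncludeKey = Union[str, Pattern[str]]


class IncludeFilter:
    """Hypothetical stand-in for the discovery filter (not the real class)."""

    def __init__(self, include: Dict[IncludeKey, Optional[dict]]):
        self._config = dict(include)
        # Compile each key exactly once. For keys that are already compiled,
        # re.compile() hands back the same pattern object.
        self._compiled = {key: re.compile(key) for key in include}

    def match(self, name: str):
        """Return (key, config) for the first include pattern matching name, else None."""
        for key, regex in self._compiled.items():
            if regex.search(name):
                return key, self._config[key]
        return None


# Usage: string and compiled keys can be mixed in the same mapping.
include_filter = IncludeFilter({"a.*": None, re.compile("b.*"): {"tag": "b"}})
print(include_filter.match("abc"))
```

Keeping the original key alongside the compiled object means callers still see the pattern exactly as they configured it.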

Motivation

Even though `re` has an internal cache, I think we should compile the patterns on our side to be completely sure they are not recompiled each time the check runs.
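A small sketch of why holding the compiled object removes the dependency on the cache: `re.purge()` is the documented way to clear the regular expression cache, and a pattern compiled beforehand keeps working without any lookup. The pattern and input strings here are illustrative only.

```python
import re

# Compile once and keep a reference, as the discovery class would.
compiled = re.compile(r"a.*")

# Clear the module-level regular expression cache.
re.purge()

# The held pattern object is unaffected: no cache lookup, no recompilation.
assert compiled.search("abc") is not None

# Passing the string instead goes through the module-level function, which
# has to consult the cache (and compile again if the entry is gone).
assert re.search(r"a.*", "abc") is not None
```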

Additional Notes

I'm going to use this feature in this PR
The autodiscovery feature is already used in the cloudera integration
This PR is a proposal; let me know what you think!

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
PR title must be written as a CHANGELOG entry (see why)
Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have changelog/ and integration/ labels attached
If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

codecov · 2023-06-15T08:18:03Z

Codecov Report

Merging #14768 (f175a8d) into master (0b68479) will increase coverage by 0.28%.
The diff coverage is 100.00%.

Flag	Coverage Δ
airflow	`90.00% <ø> (ø)`
amazon_msk	`89.07% <ø> (ø)`
apache	`95.08% <ø> (ø)`
arangodb	`98.23% <ø> (ø)`
aspdotnet	`100.00% <ø> (+26.19%)`	⬆️
azure_iot_edge	`82.08% <ø> (ø)`
boundary	`100.00% <ø> (ø)`
btrfs	`82.91% <ø> (ø)`
cacti	`87.90% <ø> (ø)`
cassandra_nodetool	`93.16% <ø> (ø)`
ceph	`91.02% <ø> (ø)`
cilium	`75.46% <ø> (+0.92%)`	⬆️
citrix_hypervisor	`87.50% <ø> (ø)`
clickhouse	`95.49% <ø> (ø)`
cloud_foundry_api	`96.35% <ø> (+0.12%)`	⬆️
cloudera	`99.82% <ø> (ø)`
cockroachdb	`91.90% <ø> (ø)`
consul	`91.65% <ø> (ø)`
coredns	`94.57% <ø> (ø)`
couch	`95.43% <ø> (+0.24%)`	⬆️
couchbase	`84.28% <ø> (ø)`
datadog_checks_base	`89.58% <100.00%> (+0.34%)`	⬆️
datadog_checks_dev	`82.77% <ø> (+0.07%)`	⬆️
datadog_checks_downloader	`81.65% <ø> (ø)`
ddev	`99.21% <ø> (ø)`
disk	`85.03% <ø> (-6.30%)`	⬇️
dns_check	`93.90% <ø> (ø)`
dotnetclr	`91.39% <ø> (+12.90%)`	⬆️
eks_fargate	`94.05% <ø> (ø)`
elastic	`93.22% <ø> (ø)`
envoy	`95.02% <ø> (+0.42%)`	⬆️
etcd	`95.56% <ø> (-4.44%)`	⬇️
exchange_server	`96.85% <ø> (+11.81%)`	⬆️
fluentd	`94.77% <ø> (ø)`
gearmand	`78.26% <ø> (+1.24%)`	⬆️
gitlab	`92.46% <ø> (+1.22%)`	⬆️
gitlab_runner	`91.94% <ø> (ø)`
glusterfs	`80.09% <ø> (+0.92%)`	⬆️
gunicorn	`92.10% <ø> (-0.76%)`	⬇️
haproxy	`95.13% <ø> (+0.16%)`	⬆️
harbor	`80.04% <ø> (ø)`
hdfs_datanode	`89.74% <ø> (ø)`
hdfs_namenode	`86.72% <ø> (ø)`
http_check	`96.09% <ø> (+2.15%)`	⬆️
ibm_ace	`91.79% <ø> (ø)`
ibm_db2	`95.30% <ø> (ø)`
ibm_mq	`91.26% <ø> (+0.13%)`	⬆️
ibm_was	`96.08% <ø> (ø)`
iis	`95.00% <ø> (+37.60%)`	⬆️
istio	`77.43% <ø> (+0.55%)`	⬆️
kafka_consumer	`93.43% <ø> (ø)`
kong	`87.56% <ø> (ø)`
kube_dns	`95.97% <ø> (ø)`
lighttpd	`83.64% <ø> (ø)`
linkerd	`85.14% <ø> (+1.14%)`	⬆️
marathon	`83.43% <ø> (ø)`
marklogic	`96.46% <ø> (ø)`
mesos_master	`89.75% <ø> (ø)`
mesos_slave	`93.63% <ø> (ø)`
mongo	`96.55% <ø> (ø)`
mysql	`87.23% <ø> (ø)`
nfsstat	`95.20% <ø> (ø)`
nginx	`95.24% <ø> (+0.54%)`	⬆️
openmetrics	`98.08% <ø> (ø)`
openstack	`51.45% <ø> (ø)`
oracle	`89.78% <ø> (ø)`
pdh_check	`97.82% <ø> (ø)`
pgbouncer	`91.33% <ø> (ø)`
postfix	`88.04% <ø> (ø)`
postgres	`91.23% <ø> (+0.04%)`	⬆️
powerdns_recursor	`96.65% <ø> (ø)`
prometheus	`94.17% <ø> (ø)`
proxysql	`98.97% <ø> (ø)`
rabbitmq	`96.04% <ø> (ø)`
redisdb	`87.89% <ø> (ø)`
rethinkdb	`97.93% <ø> (ø)`
riak	`99.22% <ø> (ø)`
riakcs	`93.61% <ø> (ø)`
sap_hana	`91.64% <ø> (+0.26%)`	⬆️
scylla	`100.00% <ø> (ø)`
singlestore	`90.81% <ø> (ø)`
snmp	`82.34% <ø> (+0.03%)`	⬆️
sonarqube	`98.24% <ø> (ø)`
spark	`93.63% <ø> (ø)`
sqlserver	`84.92% <ø> (-1.62%)`	⬇️
squid	`100.00% <ø> (ø)`
ssh_check	`91.58% <ø> (ø)`
strimzi	`89.06% <ø> (ø)`
supervisord	`92.69% <ø> (ø)`
system_core	`90.90% <ø> (ø)`
tcp_check	`92.92% <ø> (ø)`
teradata	`94.24% <ø> (ø)`
tokumx	`58.40% <ø> (ø)`
varnish	`84.39% <ø> (+0.26%)`	⬆️
vault	`95.53% <ø> (+0.57%)`	⬆️
vertica	`98.50% <ø> (ø)`
vsphere	`90.58% <ø> (+0.08%)`	⬆️
win32_event_log	`86.40% <ø> (+0.27%)`	⬆️
windows_performance_counters	`98.36% <ø> (ø)`
windows_service	`98.00% <ø> (ø)`
wmi_check	`92.91% <ø> (ø)`
zk	`82.62% <ø> (+1.33%)`	⬆️

Flags with carried forward coverage won't be shown.

jose-manuel-almaza

I consider this PR a pure, small refactor. In my opinion, no new unit tests are needed, since we test the public interface of the class, not the implementation.

jose-manuel-almaza · 2023-06-15T08:40:37Z

datadog_checks_base/tests/base/utils/discovery/test_discovery.py

+    d = Discovery(mock_get_items, include={'a.*': None, 'b.*': None})
+    assert list(d.get_items()) == [('a.*', 'a', 'a', None), ('b.*', 'b', 'b', None)]
+    assert mock_get_items.call_count == 1
+    assert d._filter._compiled_include_patterns == {p: re.compile(p) for p in ['a.*', 'b.*']}


You are refactoring, not adding new functionality, so no new unit tests should be needed. The proof is that you are checking private properties of the class. Those may be refactored in the future, which would force you to change your unit tests when that shouldn't be necessary.

Sounds good to me. I think we can keep the other test even if it's not directly related to my modification, what do you think?

It is very similar to test_include_not_empty, but yes... I think we can keep it

So similar that we can merge them together! Done, thanks.
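To make the outcome of this thread concrete, here is a sketch of a test that exercises only observable behaviour, using the same assertions for string and pre-compiled include keys. The `discover` function is a hypothetical stand-in written for this example, not the real `Discovery` class; the tuple shape only loosely mirrors the one in the diff above.

```python
import re


def discover(get_items, include):
    """Hypothetical stand-in for a discovery object's public get_items()."""
    compiled = [(key, re.compile(key), config) for key, config in include.items()]
    results = []
    for item in get_items():
        for key, regex, config in compiled:
            if regex.search(item):
                results.append((key, item, item, config))
                break
    return results


def check_include(include):
    calls = []

    def get_items():
        calls.append(1)
        return ["a", "b", "c"]

    found = discover(get_items, include)
    # Assert only on what a caller can observe: the items returned and how
    # many times the item source was queried. No private attributes are read.
    assert [name for _, name, _, _ in found] == ["a", "b"]
    assert len(calls) == 1


# The same check covers both ways of providing include patterns.
check_include({"a.*": None, "b.*": None})
check_include({re.compile("a.*"): None, re.compile("b.*"): None})
```

A test written this way should survive a later change to how the patterns are stored internally.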

github-actions · 2023-06-15T09:01:04Z

Test Results

    975 files     975 suites 7h 6m 52s ⏱️
  5 441 tests   5 355 ✔️     72 💤 3 ❌ 11 🔥
22 826 runs 19 277 ✔️ 3 535 💤 3 ❌ 11 🔥

For more details on these failures and errors, see this check.

Results for commit f175a8d.

♻️ This comment has been updated with latest results.


Internally compile the include patterns in the discovery feature

e0bf0a9

FlorentClarret requested a review from a team as a code owner June 15, 2023 08:10

ghost added the base_package label Jun 15, 2023

FlorentClarret added the changelog/Added label Jun 15, 2023

FlorentClarret changed the title ~~Internally compile the include patterns in the discovery feature~~ Internally compile the include patterns in the autodiscovery feature Jun 15, 2023

jose-manuel-almaza requested changes Jun 15, 2023


address

f175a8d

FlorentClarret requested a review from jose-manuel-almaza June 15, 2023 11:09

ofek approved these changes Jun 15, 2023


jose-manuel-almaza approved these changes Jun 15, 2023


FlorentClarret merged commit 22910f9 into master Jun 16, 2023

FlorentClarret deleted the florentclarret/base_check/discovery_compile_include branch June 16, 2023 11:46
