Skip inaccessible databases in DBM schema collection instead of aborting by pierreln-dd · Pull Request #23807 · DataDog/integrations-core

pierreln-dd · 2026-05-22T09:54:15Z

What does this PR do?

When a database cannot be opened during schema collection, the per-database error is now caught and the database is skipped with a warning instead of aborting collection for all databases. All other databases on the instance continue to be collected and flushed normally.

Changes across datadog_checks_base, sqlserver, and postgres:

Error isolation: _collect_database_schemas (extracted helper) wraps each database in a try/except guarded by _is_connection_error. SQL Server catches pyodbc.Error; Postgres catches psycopg.OperationalError. Internal errors (logic bugs, uninitialized state) propagate as before.
Observability: dd.<dbms>.schema.skipped_databases_count gauge emitted each run; status:partial tag on all schema metrics when at least one database was skipped.
Logging: skip warning includes the stack trace (exc_info=True).
Protocol correctness: terminal maybe_flush(is_last_payload=True) always fires so the backend receives collection_payloads_count regardless of skip count.
Readability: extracted _collect_database_schemas to reduce nesting depth in collect_schemas.

Motivation

When database_autodiscovery discovers a database that cannot be opened, _get_cursor() raises a driver-level error. The outer collect_schemas() previously caught and re-raised this, discarding all schemas collected from databases processed earlier in the same run. Accessible databases on the same instance reported no schema data.

This affects SQL Server and Postgres integrations that inherit from the shared SchemaCollector base class.

Related escalation: SDBM-2634.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

When a database is unreachable (e.g. version mismatch, restricted state), _get_cursor raises and the outer collect_schemas re-raises, discarding all schemas already collected for other databases. Wrap per-database iteration in a try/except so that a single failing database is skipped with a warning and collection continues for the rest. Move the final flush outside the loop to ensure queued rows are submitted regardless of which database was last. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

datadog-prod-us1-3 · 2026-05-22T10:00:05Z

✨ Fix all issues with BitsAI

⚠️ Warnings

🚦 2 Pipeline jobs failed

PR All | test / j563739d / Gitlab Runner

🔄 Retry job. This looks flaky and may succeed on retry.
Http status code 502 on url http://localhost:8085/ci

PR All | test / j7faf392 / CockroachDB

🔄 Retry job. This looks flaky and may succeed on retry.
Agent check failed for 'cockroachdb', context canceled due to unable to start CMD API server: unable to listen on localhost:32967: address already in use.

🧪 1 Test failed in 1 job

PR All | run

test_version_metadata from test_integration.py

(Fix with Cursor)


Traceback (most recent call last):
  File &#34;/home/runner/work/integrations-core/integrations-core/datadog_checks_base/datadog_checks/base/checks/base.py&#34;, line 1423, in run
    self.check(instance)
    ~~~~~~~~~~^^^^^^^^^^
  File &#34;/home/runner/work/integrations-core/integrations-core/datadog_checks_base/datadog_checks/base/utils/tracing.py&#34;, line 55, in wrapper
    return f(self, *args, **kwargs)
  File &#34;/home/runner/work/integrations-core/integrations-core/datadog_checks_base/datadog_checks/base/utils/tracing.py&#34;, line 55, in wrapper
    return f(self, *args, **kwargs)
  File &#34;/home/runner/work/integrations-core/integrations-core/gitlab_runner/datadog_checks/gitlab_runner/gitlab_runner.py&#34;, line 65, in check
...

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 90.62%
• Overall Coverage: 87.47%

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 2338fe4 | Docs | Datadog PR Page | Give us feedback!}

codecov · 2026-05-22T10:15:34Z

Codecov Report

❌ Patch coverage is 90.62500% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.90%. Comparing base (411c31d) to head (2338fe4).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Add _is_connection_error hook so subclasses can restrict which errors are treated as recoverable (defaults to True to preserve behavior) - Remove redundant continue in except block - Add exc_info=True to warning so stack trace is preserved in logs - Track _skipped_databases_count and emit it as a gauge metric so partial collection failures are observable - Add unit tests for error isolation and the _is_connection_error hook Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…add sqlserver override - Guard final maybe_flush so no empty payload is emitted when all databases are skipped or no data was collected (fixes new behavioral regression) - Add SQLServerSchemaCollector._is_connection_error that only catches pyodbc errors, preventing internal errors like uninitialized pre-2017 cursor from being silently swallowed and misclassified as per-database skips - Add test asserting no payload is emitted when all databases fail Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

collect_schemas was reaching 6 levels of nesting. Extracting per-database logic into _collect_database_schemas brings the max depth to 4 and makes the high-level flow readable without scrolling past error handling. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…erminal flush - Add PostgresSchemaCollector._is_connection_error to narrow recovery to psycopg.Error only, matching the SQL Server approach - Remove the empty-payload guard: always emit the terminal flush so the backend snapshotting protocol receives collection_payloads_count regardless of how many databases were collected - Emit status:partial on schema metrics when one or more databases were skipped, distinguishing partial collection from full success or hard error - Update tests to assert status:partial tag and terminal payload presence Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

… customer-facing Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Rewrite sqlserver changelog to customer-facing language without implementation examples - Remove example details from _is_connection_error docstrings - Narrow PostgresSchemaCollector._is_connection_error to psycopg.OperationalError so only connection-level failures skip a database (ProgrammingError and other non-access errors propagate as hard failures) - Add unit tests in sqlserver and postgres verifying the override catches driver errors but not internal exceptions Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2231e1168f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T14:03:47Z

+                while next_row:
+                    self._queued_rows.append(self._map_row(database, next_row))
+                    self._total_rows_count += 1
+                    next_row = self._get_next(cursor)
+                    self.maybe_flush(is_last_payload=False)


Avoid emitting rows for a database later skipped

When a recoverable driver error happens after this loop has already appended rows for the current database (or after maybe_flush has sent a chunk), the except below counts the database as skipped but leaves those partial rows in the snapshot. This can happen for SQL Server legacy collection because _map_row runs additional per-table queries, or for any cursor that fails during fetch, so an inaccessible/flaky database may still replace backend metadata with an incomplete schema. Buffer per-database rows and only merge/flush them after the database completes, or only apply the skip path to cursor-open failures.

Useful? React with 👍 / 👎.

Good catch. Addressed in the latest commit: unflushed rows accumulated for the current database are now discarded from _queued_rows in the except block before the skip is recorded. For the common case (database fails before or early in collection, no chunk flush yet) the snapshot stays clean. For large databases where a mid-loop maybe_flush already sent a chunk, those rows can't be recalled — a separate warning is emitted to make the incomplete snapshot visible in logs.

If an error fires after some rows have already been appended to the queue (e.g. _map_row failure on the nth table), roll back the unflushed rows for that database so they don't pollute the snapshot. Rows already sent via a mid-loop chunk flush cannot be recalled; log a warning in that case so the incomplete snapshot is visible. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

dd-octo-sts · 2026-05-22T14:42:51Z

Validation Report

All 21 validations passed.

Show details

Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and Codecov settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅

View full run

temporal-github-worker-1 Bot added agent/review-requested ecosystems/review-requested product/review-requested labels May 22, 2026

dd-octo-sts Bot added the base_package label May 22, 2026

pierreln-dd added the qa/required QA is required for this PR and will generate a QA card label May 22, 2026

Add changelog entry for #23807

cb49e3e

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

pierreln-dd and others added 2 commits May 22, 2026 13:08

dd-octo-sts Bot added the integration/sqlserver label May 22, 2026

pierreln-dd and others added 2 commits May 22, 2026 14:12

dd-octo-sts Bot added the integration/postgres label May 22, 2026

pierreln-dd and others added 3 commits May 22, 2026 14:45

Update changelogs: drop postgres internal entry, rewrite sqlserver to…

483702d

… customer-facing Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Re-add postgres changelog entry required by CI

2231e11

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

pierreln-dd marked this pull request as ready for review May 22, 2026 14:01

pierreln-dd requested review from a team as code owners May 22, 2026 14:01

dd-octo-sts Bot added team/agent-integrations team/database-monitoring-agent labels May 22, 2026

chatgpt-codex-connector Bot reviewed May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip inaccessible databases in DBM schema collection instead of aborting#23807

Skip inaccessible databases in DBM schema collection instead of aborting#23807
pierreln-dd wants to merge 10 commits into
masterfrom
pierreln-dd/SDBM-2634-schema-collector-per-db-error-isolation

pierreln-dd commented May 22, 2026 •

edited

Loading

Uh oh!

datadog-prod-us1-3 Bot commented May 22, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

Uh oh!

codecov Bot commented May 22, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Uh oh!

pierreln-dd May 22, 2026

Uh oh!

dd-octo-sts Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pierreln-dd commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

Review checklist (to be filled by reviewers)

Uh oh!

datadog-prod-us1-3 Bot commented May 22, 2026 • edited by datadog-datadog-prod-us1-2 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

codecov Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

pierreln-dd May 22, 2026

Choose a reason for hiding this comment

Uh oh!

dd-octo-sts Bot commented May 22, 2026

Validation Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pierreln-dd commented May 22, 2026 •

edited

Loading

datadog-prod-us1-3 Bot commented May 22, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

codecov Bot commented May 22, 2026 •

edited

Loading