Add SAP HANA schema collection and diagnostics#23934

Merged
aboitreaud merged 33 commits into master from pawel.leszczynski/sap-hana-schema-colletor
Jun 22, 2026

Conversation

pawel-big-lebowski commented Jun 5, 2026
Contributor

What does this PR do?

Adds Database Monitoring schema collection and startup diagnostics to the SAP HANA integration.

  • Schema collection (schemas.py): a HanaSchemaCollector built on the shared datadog_checks.base.utils.db.schemas.SchemaCollector base class (same as postgres). Queries SYS.M_TABLES, SYS.VIEWS, SYS.TABLE_COLUMNS, and SYS.VIEW_COLUMNS and emits kind=saphana_databases metadata payloads (schemas → tables/views → columns). System schemas are skipped; include_schemas/exclude_schemas/max_tables/max_columns limits are honored.
  • Runtime stats: row_count (from SYS.M_TABLES.RECORD_COUNT) and last_updated_on (from SYS.M_TABLE_STATISTICS.LAST_MODIFY_TIME) are included per table. The collector probes for SYS.M_TABLE_STATISTICS access at first run and omits the LEFT JOIN silently when the monitoring user lacks the privilege.
  • Views: SYS.VIEWS is queried separately (not present in SYS.M_TABLES) with columns sourced from SYS.VIEW_COLUMNS (SYS.TABLE_COLUMNS does not cover views in HANA).
  • Memory optimizations:
    • Table/column limits pushed into SQL via a ROW_NUMBER() OVER (PARTITION BY schema, table) CTE so the server only sends capped rows.
    • HanaSchemaCollector overrides maybe_flush to flush after every 50,000 columns rather than the base class default of 10,000 tables. For wide tables this keeps _queued_rows bounded (93.7 MiB peak RSS unlimited vs 61.5 MiB limited on a 1000×1000 schema).
    • Default limits: max_tables=2000, max_columns=500.
    • Memory benchmark under benchmarks/schema_collection_memory/ (1000×1000 schema, limited vs unlimited modes, isolated subprocesses for clean RSS measurement).
  • Diagnostics (diagnose.py): checks connectivity, minimum supported version (2.x), and per-view catalog access (SYS.SCHEMAS, SYS.M_TABLES, SYS.VIEWS, SYS.TABLE_COLUMNS, SYS.VIEW_COLUMNS), distinguishing privilege errors from generic failures with actionable remediation.
  • Configuration: collect_schemas option (enabled, collection_interval, max_tables, max_columns, include_schemas, exclude_schemas) in spec.yaml. Marked hidden: true and omitted from conf.yaml.example until backend quality monitors are ready. Disabled by default.
  • Wiring (sap_hana.py): time-gated _maybe_collect_schemas() invoked from check().
  • Docs (README.md): grants for SYS.SCHEMAS, SYS.M_TABLES, SYS.VIEWS, SYS.TABLE_COLUMNS, SYS.VIEW_COLUMNS, and SYS.M_TABLE_STATISTICS (optional), plus a "Schema collection" configuration section.
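
The SQL-level capping described in the memory optimizations above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in for HANA. The `table_columns` table, the `capped_columns` and `build_demo_catalog` helpers, and the cap value are all hypothetical; the real collector reads `SYS.TABLE_COLUMNS`, and its query text may differ.

```python
import sqlite3

# Stand-in catalog: sqlite3 replaces HANA here, and table_columns is a
# hypothetical analogue of SYS.TABLE_COLUMNS.
CAPPED_COLUMNS_QUERY = """
WITH ranked AS (
    SELECT schema_name, table_name, column_name, position,
           ROW_NUMBER() OVER (
               PARTITION BY schema_name, table_name
               ORDER BY position
           ) AS rn
    FROM table_columns
)
SELECT schema_name, table_name, column_name
FROM ranked
WHERE rn <= ?
ORDER BY schema_name, table_name, position
"""


def capped_columns(conn, max_columns):
    """Return at most max_columns columns per (schema, table), capped in SQL."""
    return conn.execute(CAPPED_COLUMNS_QUERY, (max_columns,)).fetchall()


def build_demo_catalog():
    """Create an in-memory catalog with two tables of five columns each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_columns "
        "(schema_name TEXT, table_name TEXT, column_name TEXT, position INTEGER)"
    )
    rows = [
        ("APP", table, "C{}".format(i), i)
        for table in ("ORDERS", "USERS")
        for i in range(1, 6)
    ]
    conn.executemany("INSERT INTO table_columns VALUES (?, ?, ?, ?)", rows)
    return conn
```

Because the `rn <= ?` filter runs on the server side, the client only ever receives the capped rows, which is the property the bullet above relies on.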

Motivation

Bring SAP HANA in line with other Database Monitoring integrations (postgres, mysql, sqlserver, clickhouse) by surfacing catalog metadata for Data Quality features in Data Observability, including live row counts, last modification times, and view definitions.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

datadog-datadog-prod-us1-2 Bot commented Jun 5, 2026

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 97.74%
Overall Coverage: 94.26% (+6.44%)

🔗 Commit SHA: 1e6900e

@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review June 8, 2026 11:09
@pawel-big-lebowski pawel-big-lebowski requested review from a team as code owners June 8, 2026 11:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18eb96d7d2


Comment thread sap_hana/datadog_checks/sap_hana/schemas.py Outdated
Comment on lines +119 to +121
with closing(conn.cursor()) as cursor:
    cursor.execute(COLUMNS_QUERY)
    for row in cursor.fetchall():


P2 Badge Apply table limits before loading all columns

With schema collection enabled on a large HANA tenant, this fetchall() loads every row from SYS.TABLE_COLUMNS for every visible table before max_tables and max_columns are applied in Python. The default max_tables: 300 therefore does not bound the catalog scan or memory use, so an instance with thousands of tables can spend each collection interval scanning/loading millions of column rows just to emit 300 tables. Push the schema/table filters and limits into the catalog query (or stream only columns for the selected tables) so the configured limits actually cap collection cost.

Contributor Author

Already resolved: `max_tables`/`max_views` are enforced as `LIMIT` in the table/view CTEs and `max_columns` via a `ROW_NUMBER()` window, so the catalog scan is bounded in SQL. Rows are streamed through the cursor and grouped per object — there is no `fetchall()` of every column anymore.
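
As a rough illustration of the streaming approach described in this reply, the sketch below pulls rows from the cursor in bounded batches with the DB-API `fetchmany` call and groups them per table with `itertools.groupby`, so only one table's columns are held at a time. `sqlite3` stands in for HANA, and `stream_rows`, `iter_tables`, `demo`, and the `table_columns` table are hypothetical names, not the PR's actual code.

```python
import itertools
import sqlite3
from contextlib import closing


def stream_rows(cursor, batch_size=1000):
    """Yield rows one at a time, fetching from the cursor in bounded batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


def iter_tables(conn, query, batch_size=1000):
    """Yield (schema, table, [columns]) one table at a time.

    The query must be ordered by schema and table so that groupby sees
    each table's rows contiguously.
    """
    with closing(conn.cursor()) as cursor:
        cursor.execute(query)
        rows = stream_rows(cursor, batch_size)
        for (schema, table), group in itertools.groupby(rows, key=lambda r: (r[0], r[1])):
            yield schema, table, [r[2] for r in group]


def demo():
    """Build a tiny in-memory catalog and stream it back grouped by table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_columns (schema_name TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_columns VALUES (?, ?, ?)",
        [("APP", "ORDERS", "ID"), ("APP", "ORDERS", "TOTAL"), ("APP", "USERS", "ID")],
    )
    query = (
        "SELECT schema_name, table_name, column_name FROM table_columns "
        "ORDER BY schema_name, table_name, column_name"
    )
    return list(iter_tables(conn, query, batch_size=2))
```

The `ORDER BY` is what makes the grouping safe: `groupby` only merges adjacent rows, so an unordered result set would split one table across several groups.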

@drichards-87 drichards-87 self-assigned this Jun 8, 2026
drichards-87 previously approved these changes Jun 8, 2026

@drichards-87 drichards-87 left a comment

Contributor

Left a couple of suggestions from Docs and approved the PR.

Comment thread sap_hana/README.md Outdated
Comment thread sap_hana/README.md Outdated
@drichards-87 drichards-87 removed their assignment Jun 8, 2026
@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed drichards-87’s stale review June 9, 2026 09:18

Review from drichards-87 is dismissed. Related teams and files:

  • documentation
    • sap_hana/README.md
@pawel-big-lebowski pawel-big-lebowski requested review from a team as code owners June 15, 2026 08:22
    return
if time.time() - self._last_schema_collection_time < self._schema_collection_interval:
    return
self._last_schema_collection_time = time.time()

Contributor

The scheduler state is updated before collection succeeds, so a transient failure suppresses retries for the full interval.

For example, at t=0 a temporary DB timeout raises in collect_schemas(), _last_schema_collection_time is still set to t=0, and with collection_interval=600 the next healthy run at t=10 is skipped until t>=600 even though the dependency recovered.

Please move the timestamp update to a success path (or track separate attempt/success timestamps with a shorter failure backoff) so transient failures are retried promptly.

Contributor Author

Fixed: `_last_schema_collection_time` is now advanced only on the success path, so a transient failure no longer suppresses retries for the full interval — the next check run retries promptly. Added unit tests covering both the failure (no advance) and success (advance + gate) cases. Done in 3a4f66a.
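
A minimal sketch of the success-path gating discussed in this thread, assuming a hypothetical `SchemaScheduler` class with an injectable clock for testing. The names and structure are illustrative only and do not mirror the actual `_maybe_collect_schemas()` implementation.

```python
import time


class SchemaScheduler:
    """Hypothetical time gate that advances only after a successful run."""

    def __init__(self, interval, collect, clock=time.time):
        self._interval = interval
        self._collect = collect  # callable that may raise
        self._clock = clock
        self._last_success = None

    def maybe_collect(self):
        """Run collect() if due; return True only when a collection succeeded."""
        now = self._clock()
        if self._last_success is not None and now - self._last_success < self._interval:
            return False  # gated: a successful run happened recently
        try:
            self._collect()
        except Exception:
            # Leave _last_success untouched so the next check run retries.
            return False
        self._last_success = now
        return True
```

With `interval=600`, a failure at t=0 leaves the gate open, so a run at t=10 retries immediately; only after that success is the next run deferred until t>=610.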

Comment thread sap_hana/datadog_checks/sap_hana/diagnose.py

@iliakur iliakur left a comment

Contributor

Thanks for the large feature push — I left inline comments for three behavior issues that need resolution.

Also, from the abstraction review: HanaSchemaCollector currently carries query-policy composition, row grouping, and custom flush logic in one place, and the large SQL template increases navigation cost during maintenance. Please either add focused maintainer docs that explain the control flow and policy decisions end-to-end, or restructure this collector to reduce the amount of cross-method/context jumping required to reason about it.

pawel-big-lebowski and others added 4 commits June 18, 2026 11:16
- Add max_views config option (SQL LIMIT on the views CTE) so view
  collection is bounded like tables/columns.
- Update _maybe_collect_schemas to advance the schedule only on success
  so transient failures are retried promptly instead of being suppressed
  for a full interval.
- Classify version-query failures: privilege/access errors reading
  SYS.M_DATABASE now report the privilege/access diagnostic instead of a
  misleading "version unsupported" result.
- Drop the hostname fallback in _get_databases; skip collection and warn
  when the current database can't be determined to avoid mislabeled data.
- Extract HanaSchemaQueryBuilder to separate SQL/query-policy concerns
  from the collector's streaming and flush logic.
- Document why payloads flush by accumulated column count, referencing
  the schema-collection memory benchmark.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SYS views on single-tenant HANA Express lack the DATABASE_NAME column,
so inject a constant SYSTEMDB value and drop the GROUP BY for SYS-schema
queries, and use FILE_SIZE instead of TOTAL_SIZE for global disk usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Document the end-to-end control flow, the query-builder/collector
responsibility split, and the collection-policy decisions (SQL vs
client-side caps, system-schema exclusion, optional stats join) in a
module docstring, with a pointer to the memory benchmark. Addresses the
reviewer's request to reduce cross-method context jumping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pawel-big-lebowski

Contributor Author

@iliakur on the abstraction/maintainability point: addressed both ways you suggested.

  • Restructured: extracted `HanaSchemaQueryBuilder`, which now owns all SQL/query-policy composition (the catalog query template, schema include/exclude filters, the optional stats join, and the SQL-level table/view/column caps). `HanaSchemaCollector` is left with just the runtime concerns — streaming, row grouping, and payload flushing.
  • Documented: added a module-level docstring to `schemas.py` that maps the end-to-end control flow (`_get_databases → _get_cursor → _get_next → _map_row → maybe_flush`), the builder/collector responsibility split, and the collection-policy decisions (SQL vs client-side caps, system-schema exclusion, the graceful-degradation stats join), with a pointer to the memory benchmark that justifies the column-count flush threshold.

Restructure in 3a4f66a, docstring in 73481a5. The individual inline threads are answered above.
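
For readers unfamiliar with the column-count flush threshold mentioned above, here is a minimal sketch of the idea. `ColumnCountFlusher`, its `emit` callback, and the table dict shape are hypothetical; the real `HanaSchemaCollector.maybe_flush` override lives in `schemas.py` and may be structured differently.

```python
class ColumnCountFlusher:
    """Hypothetical buffer that flushes by accumulated column count."""

    def __init__(self, emit, column_chunk_size=50_000):
        self._emit = emit  # callable receiving a list of table dicts
        self._column_chunk_size = column_chunk_size
        self._queued_tables = []
        self._queued_columns = 0

    def add_table(self, table):
        """Queue one table and flush if the column threshold is reached."""
        self._queued_tables.append(table)
        self._queued_columns += len(table["columns"])
        self.maybe_flush()

    def maybe_flush(self, force=False):
        """Emit queued tables once the column threshold is reached (or on force)."""
        if not self._queued_tables:
            return
        if force or self._queued_columns >= self._column_chunk_size:
            self._emit(self._queued_tables)
            self._queued_tables = []
            self._queued_columns = 0
```

In this sketch, counting columns rather than tables ties the buffer size to the number of column dicts actually held in memory: with 1000-column tables and a threshold of 50,000, the buffer is emitted every 50 tables regardless of how many tables the schema contains.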


@property
def database_identifier(self):
    return '{}:{}'.format(self._server, self._port)

Contributor Author

I would like to reimplement this using templates with resolved_hostname, as done for postgres, but I would prefer to do that in a separate PR.

Contributor

Drive-by side note: I'm incrementally working on lifting much of this duplicated DBM-derived logic into the DatabaseCheck class within datadog_checks_base.

iliakur previously approved these changes Jun 19, 2026
@pawel-big-lebowski

Contributor Author

/merge

gh-worker-devflow-routing-ef8351 Bot commented Jun 19, 2026

2026-06-19 10:53:01 UTC ℹ️ Start processing command /merge


2026-06-19 10:53:09 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-06-19 11:37:12 UTC ⚠️ MergeQueue: This merge request was unqueued

pawel.leszczynski@datadoghq.com unqueued this merge request

@pawel-big-lebowski

Contributor Author

/merge -c

gh-worker-devflow-routing-ef8351 Bot commented Jun 19, 2026

2026-06-19 11:37:05 UTC ℹ️ Start processing command /merge -c

@pawel-big-lebowski

Contributor Author

/merge

gh-worker-devflow-routing-ef8351 Bot commented Jun 19, 2026

2026-06-19 11:39:51 UTC ℹ️ Start processing command /merge


2026-06-19 11:39:59 UTC ℹ️ MergeQueue: waiting for PR to be ready



2026-06-19 15:45:07 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

janine-c previously approved these changes Jun 19, 2026

@janine-c janine-c left a comment

Contributor

Looks great! These are just some really minor writing suggestions. Feel free to ping me directly if you need additional approvals 🙂

Comment thread sap_hana/assets/configuration/spec.yaml Outdated
- name: include_schemas
  description: |
    A list of schema names to include. Any schema whose name is in
    this list will be included. If empty, all schemas (other than

Contributor

Suggested change
- this list will be included. If empty, all schemas (other than
+ this list are included. If empty, all schemas (other than

We try to avoid using the future tense in docs, because then readers can be like "but when?", when present tense can kind of subtly avoid that.

Comment thread sap_hana/assets/configuration/spec.yaml Outdated
- name: exclude_schemas
  description: |
    A list of schema names to exclude. Any schema whose name is in
    this list will be excluded. SAP HANA system schemas are always

Contributor

Suggested change
- this list will be excluded. SAP HANA system schemas are always
+ this list is excluded. SAP HANA system schemas are always

# 2. Populate the database and run both modes.
python benchmark.py

# Re-run measurements without recreating the schema:

Contributor

Suggested change
- # Re-run measurements without recreating the schema:
+ # Re-run measurements without recreating the schema.

For consistency 🤓

**Column-based flush threshold (1.5x RSS reduction with limits)**
The base class `payload_chunk_size` counts tables, which is a poor proxy for memory when
tables are wide. `HanaSchemaCollector` overrides `maybe_flush` to flush after every
`PAYLOAD_COLUMN_CHUNK_SIZE` (50,000) columns instead. For 1000-column tables this flushes

Contributor

Suggested change
- `PAYLOAD_COLUMN_CHUNK_SIZE` (50,000) columns instead. For 1000-column tables this flushes
+ `PAYLOAD_COLUMN_CHUNK_SIZE` (50,000) columns instead. For 1000-column tables, this flushes

## Expected outcome

The limited run (max\_tables=300, max\_columns=50) processes 300 tables × 50 columns =
15 000 column dicts. The unlimited run processes 1000 tables × 1000 columns = 1 000 000

Contributor

Suggested change
- 15 000 column dicts. The unlimited run processes 1000 tables × 1000 columns = 1 000 000
+ 15,000 column dicts. The unlimited run processes 1000 tables × 1000 columns = 1,000,000

For consistency with other large numbers


## Notes on memory investigation

**`setfetchsize` (no effect on memory)**

Contributor

As a formatting best practice, I would suggest converting the bolded lines in this section to H3s, to make this section easier to navigate, and more accessible for users on screen readers.

Comment thread sap_hana/README.md Outdated

#### Schema collection

The Agent can collect SAP HANA catalog metadata (schemas, tables, views, and columns) for Data Quality features in Data Observability. When the monitoring user has access to `SYS.M_TABLE_STATISTICS`, the Agent also collects row counts and last modification times for tables. Collection is disabled by default. To enable it, ensure that the monitoring user can read the required views (see [Granting privileges](#granting-privileges)) and add the following block to your `sap_hana.d/conf.yaml` file:

Contributor

Suggested change
- The Agent can collect SAP HANA catalog metadata (schemas, tables, views, and columns) for Data Quality features in Data Observability. When the monitoring user has access to `SYS.M_TABLE_STATISTICS`, the Agent also collects row counts and last modification times for tables. Collection is disabled by default. To enable it, ensure that the monitoring user can read the required views (see [Granting privileges](#granting-privileges)) and add the following block to your `sap_hana.d/conf.yaml` file:
+ The Agent can collect SAP HANA catalog metadata (schemas, tables, views, and columns) for Data Quality features in Data Observability. When the monitoring user has access to `SYS.M_TABLE_STATISTICS`, the Agent also collects row counts and last modification times for tables. Collection is disabled by default. To enable schema collection, ensure that the monitoring user can read the required views (see [Granting privileges](#granting-privileges)) and add the following block to your `sap_hana.d/conf.yaml` file:

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@janine-c janine-c left a comment

Contributor

So sorry! Made a typo here

Comment thread sap_hana/assets/configuration/spec.yaml Outdated
- name: include_schemas
  description: |
    A list of schema names to include. Any schema whose name is in
    this list are included. If empty, all schemas (other than

Contributor

Suggested change
- this list are included. If empty, all schemas (other than
+ this list is included. If empty, all schemas (other than

Contributor Author

@janine-c are you sure? I thought it refers to schemas - plural.

Contributor

It refers to "any schema," and "name" is singular, so I think "is" makes more sense. We could say "any schemas whose names are in this list are included" as well, if you prefer the plural?

Contributor Author

cool, sending an agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@janine-c janine-c left a comment

Contributor

Thank you! 🚀

@pawel-big-lebowski

Contributor Author

/merge

gh-worker-devflow-routing-ef8351 Bot commented Jun 19, 2026

2026-06-19 16:56:11 UTC ℹ️ Start processing command /merge


2026-06-19 16:56:17 UTC ℹ️ MergeQueue: waiting for PR to be ready



2026-06-19 21:15:09 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

@pawel-big-lebowski

Contributor Author

/merge

gh-worker-devflow-routing-ef8351 Bot commented Jun 22, 2026

2026-06-22 07:15:36 UTC ℹ️ Start processing command /merge


2026-06-22 07:15:44 UTC ℹ️ MergeQueue: waiting for PR to be ready



2026-06-22 07:40:01 UTC ⚠️ MergeQueue: This merge request was unqueued

pawel.leszczynski@datadoghq.com unqueued this merge request

@pawel-big-lebowski

Contributor Author

/merge -c

gh-worker-devflow-routing-ef8351 Bot commented Jun 22, 2026

2026-06-22 07:39:53 UTC ℹ️ Start processing command /merge -c

dd-octo-sts Bot commented Jun 22, 2026
Contributor

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and code coverage settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |

7 participants