Add SAP HANA schema collection and diagnostics#23934
Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 1e6900e | Docs | Datadog PR Page | Give us feedback! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18eb96d7d2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| with closing(conn.cursor()) as cursor: | ||
| cursor.execute(COLUMNS_QUERY) | ||
| for row in cursor.fetchall(): |
There was a problem hiding this comment.
Apply table limits before loading all columns
With schema collection enabled on a large HANA tenant, this fetchall() loads every row from SYS.TABLE_COLUMNS for every visible table before max_tables and max_columns are applied in Python. The default max_tables: 300 therefore does not bound the catalog scan or memory use, so an instance with thousands of tables can spend each collection interval scanning/loading millions of column rows just to emit 300 tables. Push the schema/table filters and limits into the catalog query (or stream only columns for the selected tables) so the configured limits actually cap collection cost.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Already resolved: `max_tables`/`max_views` are enforced as `LIMIT` in the table/view CTEs and `max_columns` via a `ROW_NUMBER()` window, so the catalog scan is bounded in SQL. Rows are streamed through the cursor and grouped per object — there is no `fetchall()` of every column anymore.
drichards-87
left a comment
There was a problem hiding this comment.
Left a couple of suggestions from Docs and approved the PR.
Review from drichards-87 is dismissed. Related teams and files:
- documentation
- sap_hana/README.md
| return | ||
| if time.time() - self._last_schema_collection_time < self._schema_collection_interval: | ||
| return | ||
| self._last_schema_collection_time = time.time() |
There was a problem hiding this comment.
The scheduler state is updated before collection succeeds, so a transient failure suppresses retries for the full interval.
For example, at t=0 a temporary DB timeout raises in collect_schemas(), _last_schema_collection_time is still set to t=0, and with collection_interval=600 the next healthy run at t=10 is skipped until t>=600 even though the dependency recovered.
Please move the timestamp update to a success path (or track separate attempt/success timestamps with a shorter failure backoff) so transient failures are retried promptly.
There was a problem hiding this comment.
Fixed: `_last_schema_collection_time` is now advanced only on the success path, so a transient failure no longer suppresses retries for the full interval — the next check run retries promptly. Added unit tests covering both the failure (no advance) and success (advance + gate) cases. Done in 3a4f66a.
iliakur
left a comment
There was a problem hiding this comment.
Thanks for the large feature push — I left inline comments for three behavior issues that need resolution.
Also, from the abstraction review: HanaSchemaCollector currently carries query-policy composition, row grouping, and custom flush logic in one place, and the large SQL template increases navigation cost during maintenance. Please either add focused maintainer docs that explain the control flow and policy decisions end-to-end, or restructure this collector to reduce the amount of cross-method/context jumping required to reason about it.
- Add max_views config option (SQL LIMIT on the views CTE) so view collection is bounded like tables/columns. - Update _maybe_collect_schemas to advance the schedule only on success so transient failures are retried promptly instead of being suppressed for a full interval. - Classify version-query failures: privilege/access errors reading SYS.M_DATABASE now report the privilege/access diagnostic instead of a misleading "version unsupported" result. - Drop the hostname fallback in _get_databases; skip collection and warn when the current database can't be determined to avoid mislabeled data. - Extract HanaSchemaQueryBuilder to separate SQL/query-policy concerns from the collector's streaming and flush logic. - Document why payloads flush by accumulated column count, referencing the schema-collection memory benchmark. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SYS views on single-tenant HANA Express lack the DATABASE_NAME column, so inject a constant SYSTEMDB value and drop the GROUP BY for SYS-schema queries, and use FILE_SIZE instead of TOTAL_SIZE for global disk usage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document the end-to-end control flow, the query-builder/collector responsibility split, and the collection-policy decisions (SQL vs client-side caps, system-schema exclusion, optional stats join) in a module docstring, with a pointer to the memory benchmark. Addresses the reviewer's request to reduce cross-method context jumping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ries" This reverts commit fd86284.
|
@iliakur on the abstraction/maintainability point: addressed both ways you suggested.
Restructure in 3a4f66a, docstring in 73481a5. The individual inline threads are answered above. |
|
|
||
| @property | ||
| def database_identifier(self): | ||
| return '{}:{}'.format(self._server, self._port) |
There was a problem hiding this comment.
I would like to reimplement this into templates with resolved_hostname as done for postgres, but I prefer preparing a separate PR for this.
There was a problem hiding this comment.
Drive by side note, I'm incrementally working on lifting much of this duplicated DBM derived logic into DatabaseCheck class within datadog_checks_base
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
pawel.leszczynski@datadoghq.com unqueued this merge request |
|
/merge -c |
|
View all feedbacks in Devflow UI.
|
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
devflow unqueued this merge request: It did not become mergeable within the expected time |
janine-c
left a comment
There was a problem hiding this comment.
Looks great! These are just some really minor writing suggestions. Feel free to ping me directly if you need additional approvals 🙂
| - name: include_schemas | ||
| description: | | ||
| A list of schema names to include. Any schema whose name is in | ||
| this list will be included. If empty, all schemas (other than |
There was a problem hiding this comment.
| this list will be included. If empty, all schemas (other than | |
| this list are included. If empty, all schemas (other than |
We try to avoid using the future tense in docs, because then readers can be like "but when?", when present tense can kind of subtly avoid that.
| - name: exclude_schemas | ||
| description: | | ||
| A list of schema names to exclude. Any schema whose name is in | ||
| this list will be excluded. SAP HANA system schemas are always |
There was a problem hiding this comment.
| this list will be excluded. SAP HANA system schemas are always | |
| this list is excluded. SAP HANA system schemas are always |
| # 2. Populate the database and run both modes. | ||
| python benchmark.py | ||
|
|
||
| # Re-run measurements without recreating the schema: |
There was a problem hiding this comment.
| # Re-run measurements without recreating the schema: | |
| # Re-run measurements without recreating the schema. |
For consistency 🤓
| **Column-based flush threshold (1.5x RSS reduction with limits)** | ||
| The base class `payload_chunk_size` counts tables, which is a poor proxy for memory when | ||
| tables are wide. `HanaSchemaCollector` overrides `maybe_flush` to flush after every | ||
| `PAYLOAD_COLUMN_CHUNK_SIZE` (50,000) columns instead. For 1000-column tables this flushes |
There was a problem hiding this comment.
| `PAYLOAD_COLUMN_CHUNK_SIZE` (50,000) columns instead. For 1000-column tables this flushes | |
| `PAYLOAD_COLUMN_CHUNK_SIZE` (50,000) columns instead. For 1000-column tables, this flushes |
| ## Expected outcome | ||
|
|
||
| The limited run (max\_tables=300, max\_columns=50) processes 300 tables × 50 columns = | ||
| 15 000 column dicts. The unlimited run processes 1000 tables × 1000 columns = 1 000 000 |
There was a problem hiding this comment.
| 15 000 column dicts. The unlimited run processes 1000 tables × 1000 columns = 1 000 000 | |
| 15,000 column dicts. The unlimited run processes 1000 tables × 1000 columns = 1,000,000 |
For consistency with other large numbers
|
|
||
| ## Notes on memory investigation | ||
|
|
||
| **`setfetchsize` (no effect on memory)** |
There was a problem hiding this comment.
As a formatting best practice, I would suggest converting the bolded lines in this section to H3s, to make this section easier to navigate, and more accessible for users on screen readers.
|
|
||
| #### Schema collection | ||
|
|
||
| The Agent can collect SAP HANA catalog metadata (schemas, tables, views, and columns) for Data Quality features in Data Observability. When the monitoring user has access to `SYS.M_TABLE_STATISTICS`, the Agent also collects row counts and last modification times for tables. Collection is disabled by default. To enable it, ensure that the monitoring user can read the required views (see [Granting privileges](#granting-privileges)) and add the following block to your `sap_hana.d/conf.yaml` file: |
There was a problem hiding this comment.
| The Agent can collect SAP HANA catalog metadata (schemas, tables, views, and columns) for Data Quality features in Data Observability. When the monitoring user has access to `SYS.M_TABLE_STATISTICS`, the Agent also collects row counts and last modification times for tables. Collection is disabled by default. To enable it, ensure that the monitoring user can read the required views (see [Granting privileges](#granting-privileges)) and add the following block to your `sap_hana.d/conf.yaml` file: | |
| The Agent can collect SAP HANA catalog metadata (schemas, tables, views, and columns) for Data Quality features in Data Observability. When the monitoring user has access to `SYS.M_TABLE_STATISTICS`, the Agent also collects row counts and last modification times for tables. Collection is disabled by default. To enable schema collection, ensure that the monitoring user can read the required views (see [Granting privileges](#granting-privileges)) and add the following block to your `sap_hana.d/conf.yaml` file: |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
janine-c
left a comment
There was a problem hiding this comment.
So sorry! Made a typo here
| - name: include_schemas | ||
| description: | | ||
| A list of schema names to include. Any schema whose name is in | ||
| this list are included. If empty, all schemas (other than |
There was a problem hiding this comment.
| this list are included. If empty, all schemas (other than | |
| this list is included. If empty, all schemas (other than |
There was a problem hiding this comment.
@janine-c are you sure? I thought it refers to schemas - plural.
There was a problem hiding this comment.
It refers to "any schema," and "name" is singular, so I think "is" makes more sense. We could say "any schemas whose names are in this list are included" as well, if you prefer the plural?
There was a problem hiding this comment.
cool, sending an agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
devflow unqueued this merge request: It did not become mergeable within the expected time |
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
pawel.leszczynski@datadoghq.com unqueued this merge request |
|
/merge -c |
|
View all feedbacks in Devflow UI.
|
Validation ReportAll 21 validations passed. Show details
|
What does this PR do?
Adds Database Monitoring schema collection and startup diagnostics to the SAP HANA integration.
schemas.py): aHanaSchemaCollectorbuilt on the shareddatadog_checks.base.utils.db.schemas.SchemaCollectorbase class (same as postgres). QueriesSYS.M_TABLES,SYS.VIEWS,SYS.TABLE_COLUMNS, andSYS.VIEW_COLUMNSand emitskind=saphana_databasesmetadata payloads (schemas → tables/views → columns). System schemas are skipped;include_schemas/exclude_schemas/max_tables/max_columnslimits are honored.row_count(fromSYS.M_TABLES.RECORD_COUNT) andlast_updated_on(fromSYS.M_TABLE_STATISTICS.LAST_MODIFY_TIME) are included per table. The collector probes forSYS.M_TABLE_STATISTICSaccess at first run and omits the LEFT JOIN silently when the monitoring user lacks the privilege.SYS.VIEWSis queried separately (not present inSYS.M_TABLES) with columns sourced fromSYS.VIEW_COLUMNS(SYS.TABLE_COLUMNSdoes not cover views in HANA).ROW_NUMBER() OVER (PARTITION BY schema, table)CTE so the server only sends capped rows.HanaSchemaCollectoroverridesmaybe_flushto flush after every 50,000 columns rather than the base class default of 10,000 tables. For wide tables this keeps_queued_rowsbounded (93.7 MiB peak RSS unlimited vs 61.5 MiB limited on a 1000×1000 schema).max_tables=2000,max_columns=500.benchmarks/schema_collection_memory/(1000×1000 schema, limited vs unlimited modes, isolated subprocesses for clean RSS measurement).diagnose.py): checks connectivity, minimum supported version (2.x), and per-view catalog access (SYS.SCHEMAS,SYS.M_TABLES,SYS.VIEWS,SYS.TABLE_COLUMNS,SYS.VIEW_COLUMNS), distinguishing privilege errors from generic failures with actionable remediation.collect_schemasoption (enabled,collection_interval,max_tables,max_columns,include_schemas,exclude_schemas) inspec.yaml. Markedhidden: trueand omitted fromconf.yaml.exampleuntil backend quality monitors are ready. Disabled by default.sap_hana.py): time-gated_maybe_collect_schemas()invoked fromcheck().README.md): grants forSYS.SCHEMAS,SYS.M_TABLES,SYS.VIEWS,SYS.TABLE_COLUMNS,SYS.VIEW_COLUMNS, andSYS.M_TABLE_STATISTICS(optional), plus a "Schema collection" configuration section.Motivation
Bring SAP HANA in line with other Database Monitoring integrations (postgres, mysql, sqlserver, clickhouse) by surfacing catalog metadata for Data Quality features in Data Observability, including live row counts, last modification times, and view definitions.
Review checklist (to be filled by reviewers)
qa/requiredif this PR needs QA validation, orqa/skip-qaif it does not. Exactly one of the two is required.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged