[pull] master from DataDog:master#611
Merged
* Add SAP HANA schema collection and diagnostics
  Collect SAP HANA catalog metadata (schemas, tables, columns) for Database Monitoring's Schema Explorer, mirroring the postgres implementation on the shared SchemaCollector base class. Add startup diagnostics for connection, version, and catalog-view access.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add changelog entry
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Fix license header year on new files
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: handle missing DESCRIPTION column in SYS.M_DATABASE
  HANA Express does not include the DESCRIPTION column in SYS.M_DATABASE. Fetch DATABASE_NAME and DESCRIPTION in separate queries so that the absence of DESCRIPTION (silently ignored) does not prevent the database name from being resolved.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: align kind and dbms with dbm-metadata-processor expectations
  Use 'saphana_databases' as the schema payload kind and 'saphana' as the dbms identifier, matching KindSapHanaDatabases and the SapHana DBMS constant defined in the dd-go dbm-metadata-processor PR.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update tests to expect saphana kind and dbms values
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: fix schema collector privilege filter and column query efficiency
  Remove HAS_PRIVILEGES filter from schema discovery so catalog-view grants control visibility, consistent with the Postgres schema collector. Apply max_tables trimming before fetching columns to avoid loading column data for tables that will be discarded.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: push column filter to SQL WHERE clause instead of client-side
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: address PR review wording suggestions in README
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: replace "Database Monitoring's Schema Explorer" with "Data Quality features in Data Observability"
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: enforce schema collection limits in SQL and stream tables
  Replace the three fetchall() catalog queries and in-memory filtering with a single streamed JOIN. The limited_tables CTE pushes the schema filters and the max_tables LIMIT into the database, so the agent never pulls more than max_tables tables' rows into memory regardless of total schema size. Columns are joined and ordered so each table is assembled one at a time as the cursor streams, instead of materializing every table and column up front.
  Verified against a live HANA Express instance that the CTE LIMIT caps tables (not joined rows) and the LIKE ... ESCAPE system-schema filter parses correctly.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: add HanaSchemaCollector unit tests for column mapping and _get_databases
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: add schema collection memory benchmark
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: fix benchmark setup and add 50x50 baseline results
  - Use host port 39019 to avoid collision with test container on 39017
  - Grant SELECT on SYS.M_DATABASE, SYS.TABLES, SYS.SCHEMAS, SYS.TABLE_COLUMNS (CATALOG READ alone is insufficient for these views)
  - Fix global declaration order bug in setup_database.py
  - Add benchmark_results_50x50.txt as baseline (trivial data, both modes identical)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: add 1000x1000 benchmark results (18.4x RSS reduction with limits)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: set hdbcli cursor fetch size to 10k to reduce C-layer buffering
  Without a fetch size, hdbcli buffers the entire query result set in its C layer before Python iterates it, contributing ~500 MiB to RSS on a 1000x1000 schema. setfetchsize(10_000) limits the client-side buffer to 10k rows per round-trip.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "sap_hana: set hdbcli cursor fetch size to 10k to reduce C-layer buffering"
  This reverts commit 6fe80a4.

* sap_hana: push max_columns limit into SQL via ROW_NUMBER() CTE
  Previously the query returned all columns for every table and Python discarded those beyond max_columns. On a 1000-table schema with max_columns=50 this sent 285k unnecessary rows from the server (the 300 tables kept by the then-default max_tables x 950 discarded columns each). A new limited_columns CTE ranks columns per table with ROW_NUMBER() OVER (PARTITION BY schema, table ORDER BY position) and the LEFT JOIN filters on rn <= max_columns, so the server only sends the first max_columns columns per table. The client-side check in _get_next() stays as a safety net.
  Benchmark result on 1000x1000 schema: limited mode duration 8.1s -> 1.6s (5x). Peak RSS is unchanged: memory is bounded by the Python-side column dict accumulation, not the cursor row count.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: add license headers to benchmark scripts
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: flush after 50k columns instead of 10k tables
  The base class payload_chunk_size counts tables, which is a poor memory proxy for wide tables. HanaSchemaCollector now overrides maybe_flush to trigger after PAYLOAD_COLUMN_CHUNK_SIZE (50,000) columns instead, keeping _queued_rows bounded regardless of how wide the tables are.
  On the 1000x1000 benchmark schema, unlimited peak RSS drops from 1,038 MiB to 93.7 MiB (11x reduction). The limited mode (300 tables x 50 cols = 15k columns) is unaffected since it never reaches the threshold.
  The column count is tracked in _map_row rather than _get_next because the base class loop calls _get_next after appending the current table; counting there would cause the freshly-fetched table's columns to be lost on flush.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: raise default max_tables to 2000 and max_columns to 500
  Previous defaults (300 tables, 50 columns) were conservative placeholders. With the column-based flush threshold in place, peak memory is now bounded by columns processed at once (50k) rather than total tables queued, so higher defaults are safe without a proportional memory cost increase.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: hide collect_schemas config block from user-facing docs
  The feature is not yet backed by production quality monitors. Mark the entire collect_schemas section as hidden: true so it is omitted from conf.yaml.example until the backend is ready.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: switch schema collector to SYS.M_TABLES, SYS.VIEWS, and SYS.VIEW_COLUMNS
  Replace SYS.TABLES with SYS.M_TABLES to gain live RECORD_COUNT (row_count in the payload). Add SYS.VIEWS so view objects are collected alongside tables, with columns sourced from SYS.VIEW_COLUMNS (TABLE_COLUMNS does not cover views in HANA). Conditionally LEFT JOIN SYS.M_TABLE_STATISTICS at runtime for last_updated_on: the collector probes for access on first run and omits the join when the monitoring user lacks the privilege.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update spec.yaml descriptions to reference new catalog views
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update benchmark results with SYS.M_TABLES + SYS.VIEWS query
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update benchmark README with current query results
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: tidy README grant order and bump example limits to defaults
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: address schema collector review feedback
  - Add max_views config option (SQL LIMIT on the views CTE) so view collection is bounded like tables/columns.
  - Update _maybe_collect_schemas to advance the schedule only on success so transient failures are retried promptly instead of being suppressed for a full interval.
  - Classify version-query failures: privilege/access errors reading SYS.M_DATABASE now report the privilege/access diagnostic instead of a misleading "version unsupported" result.
  - Drop the hostname fallback in _get_databases; skip collection and warn when the current database can't be determined to avoid mislabeled data.
  - Extract HanaSchemaQueryBuilder to separate SQL/query-policy concerns from the collector's streaming and flush logic.
  - Document why payloads flush by accumulated column count, referencing the schema-collection memory benchmark.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: handle single-tenant HANA Express in monitoring queries
  SYS views on single-tenant HANA Express lack the DATABASE_NAME column, so inject a constant SYSTEMDB value and drop the GROUP BY for SYS-schema queries, and use FILE_SIZE instead of TOTAL_SIZE for global disk usage.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: add maintainer docstring mapping schema collector flow
  Document the end-to-end control flow, the query-builder/collector responsibility split, and the collection-policy decisions (SQL vs client-side caps, system-schema exclusion, optional stats join) in a module docstring, with a pointer to the memory benchmark. Addresses the reviewer's request to reduce cross-method context jumping.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Revert "sap_hana: handle single-tenant HANA Express in monitoring queries"
  This reverts commit fd86284.

* sap_hana: address janine-c doc review feedback
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: fix subject-verb agreement in include_schemas description
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
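The `SYS.M_DATABASE` commit reads `DATABASE_NAME` and `DESCRIPTION` separately, and the review-feedback commit drops the hostname fallback. A minimal sketch combining both rules follows. `resolve_current_database` and the `run_query` callable (standing in for an hdbcli cursor) are hypothetical names, not the check's actual API.

```python
import logging
from typing import Callable, Optional, Sequence

# Hypothetical query runner: takes SQL text, returns fetched rows.
QueryRunner = Callable[[str], Sequence[Sequence[object]]]

log = logging.getLogger("sap_hana.schemas")


def resolve_current_database(run_query: QueryRunner) -> Optional[dict]:
    """Return {'name', 'description'} for the current database, or None.

    DATABASE_NAME and DESCRIPTION are read with separate queries so that a
    missing DESCRIPTION column (as on HANA Express) cannot hide the name.
    """
    try:
        rows = run_query("SELECT DATABASE_NAME FROM SYS.M_DATABASE")
    except Exception as exc:  # real code would catch the driver's error class
        log.warning("Skipping schema collection: cannot read SYS.M_DATABASE: %s", exc)
        return None
    if not rows or not rows[0][0]:
        # No hostname fallback: skipping is better than mislabeling the data.
        log.warning("Skipping schema collection: current database is unknown")
        return None

    description = None
    try:
        desc_rows = run_query("SELECT DESCRIPTION FROM SYS.M_DATABASE")
        if desc_rows:
            description = desc_rows[0][0]
    except Exception:
        pass  # DESCRIPTION is optional; its absence is silently ignored
    return {"name": rows[0][0], "description": description}
```

Returning `None` instead of guessing means the caller skips collection for that run, matching the "avoid mislabeled data" rationale in the commit.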
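The `limited_tables` and `limited_columns` CTEs described in the commits above can be sketched as one query builder. This is an illustration, not the check's `HanaSchemaQueryBuilder`. It uses the older `SYS.TABLES`/`SYS.TABLE_COLUMNS` sources, and the catalog column names (`POSITION`, `DATA_TYPE_NAME`) plus the `SYS`/`_SYS*` exclusion are assumptions. The SQL is exercised here only against an SQLite stand-in (`demo_catalog`, test scaffolding), so HANA dialect details are unverified.

```python
import sqlite3

# Exclude SYS plus any schema starting with a literal "_SYS". "_" is a LIKE
# wildcard, so it is escaped; without ESCAPE, a schema like "ASYS" would match.
_SYSTEM_SCHEMA_FILTER = "t.SCHEMA_NAME <> 'SYS' AND t.SCHEMA_NAME NOT LIKE '\\_SYS%' ESCAPE '\\'"


def build_schema_query(max_tables: int, max_columns: int) -> str:
    """Return one streamed catalog query with both limits enforced in SQL."""
    # Limits are inlined as validated integers rather than bound parameters.
    max_tables, max_columns = int(max_tables), int(max_columns)
    return f"""
WITH limited_tables AS (
    SELECT t.SCHEMA_NAME, t.TABLE_NAME
    FROM SYS.TABLES t
    WHERE {_SYSTEM_SCHEMA_FILTER}
    ORDER BY t.SCHEMA_NAME, t.TABLE_NAME
    LIMIT {max_tables}
),
limited_columns AS (
    SELECT c.SCHEMA_NAME, c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE_NAME, c.POSITION,
           ROW_NUMBER() OVER (
               PARTITION BY c.SCHEMA_NAME, c.TABLE_NAME ORDER BY c.POSITION
           ) AS rn
    FROM SYS.TABLE_COLUMNS c
    JOIN limited_tables lt
      ON lt.SCHEMA_NAME = c.SCHEMA_NAME AND lt.TABLE_NAME = c.TABLE_NAME
)
SELECT lt.SCHEMA_NAME, lt.TABLE_NAME, lc.COLUMN_NAME, lc.DATA_TYPE_NAME
FROM limited_tables lt
LEFT JOIN limited_columns lc
  ON lc.SCHEMA_NAME = lt.SCHEMA_NAME
 AND lc.TABLE_NAME = lt.TABLE_NAME
 AND lc.rn <= {max_columns}
ORDER BY lt.SCHEMA_NAME, lt.TABLE_NAME, lc.POSITION
"""


def demo_catalog() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the HANA catalog (test scaffolding only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS SYS")
    conn.execute("CREATE TABLE SYS.TABLES (SCHEMA_NAME TEXT, TABLE_NAME TEXT)")
    conn.execute(
        "CREATE TABLE SYS.TABLE_COLUMNS (SCHEMA_NAME TEXT, TABLE_NAME TEXT,"
        " COLUMN_NAME TEXT, DATA_TYPE_NAME TEXT, POSITION INTEGER)"
    )
    tables = [("APP", "A"), ("APP", "B"), ("APP", "C"),
              ("ASYS", "Z"), ("SYS", "Y"), ("_SYS_BIC", "X")]
    for schema, table in tables:
        conn.execute("INSERT INTO SYS.TABLES VALUES (?, ?)", (schema, table))
        for pos in range(1, 5):  # four columns per table
            conn.execute(
                "INSERT INTO SYS.TABLE_COLUMNS VALUES (?, ?, ?, 'INTEGER', ?)",
                (schema, table, f"c{pos}", pos),
            )
    return conn
```

With `max_tables=2, max_columns=3` the stand-in returns six rows (two tables, three columns each), which illustrates the commit's point that the CTE `LIMIT` caps tables rather than joined rows.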
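Because the joined rows arrive ordered by schema, table, and column position, each table can be assembled while the cursor streams instead of after a `fetchall()`. A minimal sketch with `itertools.groupby` follows. The row shape, `iter_tables`, and the payload field names are hypothetical and do not describe the actual `saphana_databases` payload format.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

# (schema, table, column_name, data_type); the column fields are None when
# the LEFT JOIN finds no columns for a table.
Row = Tuple[str, str, Optional[str], Optional[str]]


def iter_tables(rows: Iterable[Row], max_columns: int) -> Iterator[Dict[str, Any]]:
    """Yield one table dict at a time from rows ordered by (schema, table).

    groupby only merges *consecutive* rows, so this relies on the query's
    ORDER BY; only the current table's columns are held in memory.
    """
    for (schema, table), group in groupby(rows, key=itemgetter(0, 1)):
        columns = []
        for _schema, _table, column_name, data_type in group:
            if column_name is None:
                continue  # table with no joined columns
            if len(columns) < max_columns:  # client-side safety net
                columns.append({"name": column_name, "data_type": data_type})
        yield {"schema": schema, "name": table, "columns": columns}
```

Any iterable of rows works. With a PEP 249 cursor, `iter(cursor.fetchone, None)` streams rows one at a time, assuming the driver's rows support indexing and unpacking like tuples.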
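The "flush after 50k columns" commit bounds memory by queued columns rather than queued tables. The sketch below shows that policy in isolation. `PAYLOAD_COLUMN_CHUNK_SIZE` and `_queued_rows` come from the commit. `ColumnChunkedQueue`, `add`, `flush`, and `submit` are hypothetical: the real collector overrides `maybe_flush` on the `SchemaCollector` base class, and that interface is not shown here.

```python
from typing import Any, Callable, Dict, List

PAYLOAD_COLUMN_CHUNK_SIZE = 50_000  # threshold named in the commit


class ColumnChunkedQueue:
    """Queue tables and flush once the queued *column* count reaches a threshold."""

    def __init__(self, submit: Callable[[List[Dict[str, Any]]], None],
                 column_chunk_size: int = PAYLOAD_COLUMN_CHUNK_SIZE) -> None:
        self._submit = submit
        self._column_chunk_size = column_chunk_size
        self._queued_rows: List[Dict[str, Any]] = []
        self._queued_columns = 0

    def add(self, table: Dict[str, Any]) -> None:
        # Count columns as the table is queued, so the table that crosses the
        # threshold is part of the flushed payload rather than dropped.
        self._queued_rows.append(table)
        self._queued_columns += len(table["columns"])
        if self._queued_columns >= self._column_chunk_size:
            self.flush()

    def flush(self) -> None:
        if self._queued_rows:
            self._submit(self._queued_rows)
        self._queued_rows = []
        self._queued_columns = 0
```

Counting at enqueue time mirrors the commit's reason for tracking columns in `_map_row`: the count must include the table that was just appended before the flush decision is made.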
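The `SYS.M_TABLES` commit probes for access to `SYS.M_TABLE_STATISTICS` on the first run and omits the join when the monitoring user lacks the privilege. A minimal sketch of that probe-once policy follows, with a hypothetical `StatsJoinPolicy` class and `run_query` callable. The probe SQL and the join columns (`SCHEMA_NAME`, `TABLE_NAME`) are assumptions.

```python
from typing import Callable, Optional, Sequence

# Hypothetical query runner: takes SQL text, returns fetched rows.
QueryRunner = Callable[[str], Sequence[Sequence[object]]]

_BASE_SOURCE = "SYS.M_TABLES t"
_STATS_JOIN = (
    " LEFT JOIN SYS.M_TABLE_STATISTICS s"
    " ON s.SCHEMA_NAME = t.SCHEMA_NAME AND s.TABLE_NAME = t.TABLE_NAME"
)


class StatsJoinPolicy:
    """Probe SYS.M_TABLE_STATISTICS once; include its join only if readable."""

    def __init__(self, run_query: QueryRunner) -> None:
        self._run_query = run_query
        self._stats_available: Optional[bool] = None  # unknown until probed

    def stats_available(self) -> bool:
        if self._stats_available is None:
            try:
                self._run_query("SELECT 1 FROM SYS.M_TABLE_STATISTICS LIMIT 1")
                self._stats_available = True
            except Exception:  # real code would catch the driver's error class
                self._stats_available = False
        return self._stats_available

    def table_source(self) -> str:
        """FROM-clause fragment for the tables part of the catalog query."""
        if self.stats_available():
            return _BASE_SOURCE + _STATS_JOIN
        return _BASE_SOURCE
```

Caching the probe result keeps the extra round-trip to the first run. The trade-off is that a privilege granted later is not picked up until the policy object is recreated.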
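The review-feedback commit changes `_maybe_collect_schemas` to advance the schedule only on success. The sketch below isolates that rule with an injectable clock. `SchemaCollectionSchedule` and its parameters are hypothetical; the check's own method signature is not shown in this PR.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("sap_hana.schemas")


class SchemaCollectionSchedule:
    """Run `collect` at most once per interval, advancing only on success."""

    def __init__(self, collect: Callable[[], None], interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._collect = collect
        self._interval_s = interval_s
        self._clock = clock
        self._next_run = float("-inf")  # the first check always runs

    def maybe_collect(self) -> bool:
        """Return True if a collection ran and succeeded on this check."""
        now = self._clock()
        if now < self._next_run:
            return False
        try:
            self._collect()
        except Exception:
            # Leave _next_run untouched so the next check retries promptly
            # instead of waiting out a full interval.
            log.warning("Schema collection failed; will retry", exc_info=True)
            return False
        self._next_run = now + self._interval_s
        return True
```

For example, a collection that fails at t=0 is retried on the next check (say t=15). Once that retry succeeds, the following run is not due until t=615 for a 600-second interval.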
Created by pull[bot] (v2.0.0-alpha.4)