Fix pydantic rootmodel extra config error by Code-Eat-Rabbit · Pull Request #6 · Code-Eat-Rabbit/OpenMetadata

Code-Eat-Rabbit · 2025-10-11T07:40:28Z

Describe your changes:

Fixes Pydantic RootModel extra config error during schema generation

I worked on fixing a PydanticUserError where RootModel classes generated from JSON Schemas with additionalProperties: false were incorrectly assigned model_config['extra']. Pydantic 2.x RootModel does not support this configuration, leading to runtime errors during ingestion.

The fix modifies scripts/datamodel_generation.py to add a post-processing step. This step scans all generated Pydantic schema files, identifies RootModel classes, and removes any extra='forbid' configuration from their model_config. It handles both the case where extra is the sole configuration and the case where it is part of a larger ConfigDict, and it cleans up any resulting empty ConfigDict or stray commas.
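The single-line case described above can be sketched as follows. This is only an illustration, not the code in scripts/datamodel_generation.py: the function name `strip_extra_from_config_line` is hypothetical, and the sketch assumes the whole `model_config = ConfigDict(...)` assignment sits on one line and that the caller has already established the line belongs to a RootModel class.

```python
import re
from typing import Optional

# Matches extra='forbid' or extra="forbid" plus one trailing comma, if any.
# The \b keeps names such as json_schema_extra from matching.
_EXTRA_KWARG = re.compile(r"""\s*\bextra\s*=\s*(['"])forbid\1\s*,?""")

# Splits a one-line assignment into prefix, keyword arguments, and suffix.
_CONFIG_LINE = re.compile(r"^(\s*model_config\s*=\s*ConfigDict\()(.*)(\)\s*)$")


def strip_extra_from_config_line(line: str) -> Optional[str]:
    """Hypothetical helper: return the line without extra='forbid'.

    Returns None when extra was the only option, meaning the caller should
    drop the line instead of leaving an empty ConfigDict() behind.
    Lines that are not a one-line model_config assignment come back unchanged.
    """
    match = _CONFIG_LINE.match(line)
    if match is None:
        return line
    head, args, tail = match.groups()
    # Remove the kwarg, then tidy whitespace and any comma left at either end.
    args = _EXTRA_KWARG.sub("", args).strip().strip(",").strip()
    if not args:
        return None
    return f"{head}{args}{tail}"
```

Returning None for the "only extra" case keeps the decision to delete a line with the caller, which also has to make sure the class body is not left empty.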

I tested these changes by:

Creating a temporary Python script to simulate and verify the post-processing logic for various RootModel model_config scenarios (only extra, extra with other configs, and no extra).
Confirming that the modified datamodel_generation.py script correctly processes these cases.
Running the original ingestion command that caused the error, which now completes successfully.

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

I have added a test that covers the exact scenario we are fixing. For complex issues, comment the issue number in the test for future reference.

Pydantic 2.x RootModel does not support model_config['extra'] setting. This change adds post-processing logic to the datamodel generation script to automatically remove 'extra' config from any RootModel classes while preserving it for regular BaseModel classes.

Fixes the error: PydanticUserError: RootModel does not support setting model_config['extra']

Changes:
- Add glob import for recursive file processing
- Add post-generation hook to scan all generated Python files
- Detect RootModel classes and remove extra='forbid' from their model_config
- Preserve other config options (e.g., frozen=True) when present
- Keep extra='forbid' for normal BaseModel classes
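The recursive scan named in the change list could look roughly like this. It is a sketch under assumptions: `postprocess_generated_files` and its `transform` parameter are hypothetical names, and the actual hook in the script may be organised differently. The only library behaviour relied on is `glob.glob(..., recursive=True)` with a `**` pattern.

```python
import glob
import os
from typing import Callable


def postprocess_generated_files(root_dir: str, transform: Callable[[str], str]) -> int:
    """Hypothetical post-generation hook.

    Applies `transform` to the text of every .py file under root_dir and
    rewrites only the files whose content changed. Returns that count.
    """
    changed = 0
    # "**" with recursive=True matches zero or more directory levels.
    pattern = os.path.join(root_dir, "**", "*.py")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, encoding="utf-8") as handle:
            original = handle.read()
        updated = transform(original)
        if updated != original:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(updated)
            changed += 1
    return changed
```

Passing the transform in as a callable keeps the file walking separate from the RootModel logic, so each part can be tested on its own.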

cursor · 2025-10-12T02:58:30Z

+                        # If the model_config only has extra, skip it
+                        if "ConfigDict(extra='forbid')" in line or "ConfigDict(extra=\"forbid\")" in line:
+                            skip_next_model_config = True
+                            continue


Bug: Incomplete Model Config Handling

The logic to remove extra='forbid' from RootModel's model_config is incomplete. It fails to identify ConfigDict declarations spanning multiple lines or with varied spacing, as the initial checks are too restrictive. The unused peek_ahead and skip_next_model_config variables further indicate incomplete handling of these cases.
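One way to sidestep the multi-line and spacing problem is to locate the assignment through Python's `ast` module instead of matching substrings. The following is a sketch of that alternative, not what the PR implements. `drop_extra_from_rootmodels` is a hypothetical name, and the sketch assumes the generated files parse as valid Python, that a class counts as a RootModel when one of its bases mentions `RootModel`, and that `model_config` is assigned a direct call such as `ConfigDict(...)`. It drops any `extra` keyword, not only `'forbid'`.

```python
import ast


def drop_extra_from_rootmodels(source: str) -> str:
    """Hypothetical helper: remove the extra= keyword from model_config
    assignments found directly in the body of RootModel subclasses."""
    lines = source.splitlines(keepends=True)
    edits = []  # (start index, end index, replacement lines)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Covers bases such as RootModel, RootModel[str], pydantic.RootModel.
        if not any("RootModel" in ast.unparse(base) for base in node.bases):
            continue
        for stmt in node.body:
            if not (
                isinstance(stmt, ast.Assign)
                and isinstance(stmt.value, ast.Call)
                and any(isinstance(t, ast.Name) and t.id == "model_config" for t in stmt.targets)
            ):
                continue
            call = stmt.value
            kept = [kw for kw in call.keywords if kw.arg != "extra"]
            if len(kept) == len(call.keywords):
                continue  # nothing to remove
            indent = " " * stmt.col_offset
            if kept or call.args:
                call.keywords = kept
                new_lines = [f"{indent}{ast.unparse(stmt)}\n"]
            elif len(node.body) == 1:
                new_lines = [f"{indent}pass\n"]  # never leave an empty class body
            else:
                new_lines = []
            # lineno/end_lineno are 1-indexed and cover multi-line statements.
            edits.append((stmt.lineno - 1, stmt.end_lineno, new_lines))
    # Apply from the bottom up so earlier line indexes stay valid.
    for start, end, new_lines in sorted(edits, key=lambda e: e[0], reverse=True):
        lines[start:end] = new_lines
    return "".join(lines)
```

Because the statement span comes from the parser, a `ConfigDict(` call spread over several lines or written with unusual spacing is treated the same as the one-line form. The trade-off is that the rewritten statement is regenerated by `ast.unparse`, so its formatting is normalised.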

cursor · 2025-10-12T02:58:30Z

+
+                # Reset flag when we hit the next class or significant dedent
+                if in_rootmodel_class and line and not line[0].isspace() and "class " in line:
+                    in_rootmodel_class = False


Bug: RootModel Flag Reset Logic Incomplete

The in_rootmodel_class flag's reset logic is incomplete. It only clears when another class definition is encountered, but should reset for any top-level code. This can lead to incorrect model_config modifications in non-RootModel classes, potentially removing extra='forbid' where it's needed.
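The reset behaviour this comment asks for can be sketched as below: the flag is recomputed on every top-level line, so it is cleared by any module-level statement and set again only by a class header that names RootModel. `tag_rootmodel_lines` is a hypothetical name, and the sketch assumes single-line class headers, top-level classes only, and no multi-line strings with text starting in column 0.

```python
import re
from typing import Iterable, Iterator, Tuple

# A top-level class header with its base list on the same line.
_CLASS_HEADER = re.compile(r"^class\s+\w+\s*\((?P<bases>[^)]*)\)\s*:")


def tag_rootmodel_lines(lines: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Hypothetical helper: yield (line, inside_rootmodel_class) pairs."""
    in_rootmodel_class = False
    for line in lines:
        stripped = line.strip()
        # Blank lines and comment-only lines never change the current scope.
        is_code = bool(stripped) and not stripped.startswith("#")
        if is_code and not line[0].isspace():
            # Any top-level statement ends the previous class body. The flag
            # is set again only if this line opens a RootModel subclass.
            match = _CLASS_HEADER.match(line)
            in_rootmodel_class = bool(match) and "RootModel" in match.group("bases")
        yield line, in_rootmodel_class
```

With this shape, a `model_config` line is edited only while the flag is true, so a BaseModel class that follows a module-level constant or function keeps its extra='forbid'.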

…tency (open-metadata#28117) * fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency RdfIndexApp ran daily and never reconciled removed relationships, so triples grew unboundedly across runs. When Fuseki crash-looped on the resulting disk pressure, every entity-write hook blocked synchronously on the unreachable server (no HTTP connect timeout, 3-retry loop on ConnectException), saturating the bounded AsyncService pool and pushing login to ~45s. Storage-side fixes (stop growth): - Drop the extractRelationshipTriples "preserve forward" path in RdfRepository.createOrUpdate; the translator is the source of truth and the surrounding orchestration already rewrites the current relationship set. This also removes a wasted CONSTRUCT round-trip per entity write. - bulkStoreRelationships now does per-source-entity DELETE WHERE with a predicate-exclusion FILTER for lineage edges, so relationships that no longer exist actually leave the store. - Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's initializeJob (the method existed but had no callers). - Flip recreateIndex default to true and move the cron to Saturday midnight ("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the ontology graph empty before indexing starts. - Include a 2.0.1 post-data migration that updates existing installed_apps rows; the app loader is insert-only on upgrade. Connectivity / concurrency fixes (isolate API latency from Fuseki health): - Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail on ConnectException / ClosedChannelException / HttpConnectTimeoutException instead of retrying. Introduce a 5-failure/30s circuit breaker. - Route all RdfUpdater mutators through AsyncService.execute with a bounded pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a dead Fuseki can no longer block request threads or starve the AsyncService pool. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): address PR review — preserve relationships, scope DELETEs, surface ontology failures PR open-metadata#28117 review feedback. Addresses 13 findings across gitar-bot and Copilot: Storage correctness: - JenaFusekiStorage.storeEntity now keeps URI-valued triples (relationships) and only refreshes literal-valued triples. A metadata-only PATCH would otherwise wipe every inter-entity edge until the next weekly recreate-index, and async ordering between updateEntity and addRelationship could leave the graph missing edges (Copilot #1, #2). - RdfRepository.removeRelationship wraps the DELETE in the knowledge named graph and uses getRelationshipPredicate so the predicate URI matches what addRelationship actually wrote (e.g. UPSTREAM → prov:wasDerivedFrom). The previous bare DELETE in the default graph was a silent no-op (Copilot #3). - RdfBatchProcessor now calls a new RdfRepository.clearOutgoingEntityRelationships for every entity in the batch, not just those with current edges. An entity whose last outgoing relationship was removed in MySQL contributes zero RelationshipData entries, so bulkStoreRelationships' per-source DELETE never fired for it (Copilot #4). - bulkStoreRelationships no longer swallows non-connect DELETE errors — DELETE WHERE on a source with no edges is a no-op, so exceptions there are real failures (malformed SPARQL, auth, server errors) and should surface (Copilot #5). Visibility: - reloadOntologies() now checks areOntologiesLoaded() after load and throws if still empty. OntologyLoader.loadOntologies catches internally, so the old reloadOntologies always appeared to succeed (Copilot #6). - clearAllGlossaryTermRelations rethrows on failure instead of silently logging — the indexer's caller can now react to cleanup failures (Copilot #10). - clearAllGlossaryTermRelations pulls custom predicate URIs from GlossaryTermRelationSettings and includes them in the DELETE FILTER. 
The hardcoded list missed any custom predicates an admin configured (Copilot #7). Quality: - Set / LinkedHashSet imported instead of using java.util.* fully qualified in JenaFusekiStorage and RdfBatchProcessor (gitar-bot #2). - RdfIndexAppTest uses InOrder to assert clearAll → reloadOntologies ordering — a plain verify would have accepted a future change that reordered the calls (Copilot #9). - Documented the residual gap that HttpClient.connectTimeout only bounds TCP connect, not request bodies; circuit breaker + bounded pendingWrites contain the blast radius (Copilot #8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(rdf): expect per-source clear on batches whose relationships are all filtered The two EventSubscription-skip tests used verifyNoInteractions on the RDF repository mock, which was valid before because filtered batches never touched RDF. The new per-source reconciliation clear in RdfBatchProcessor.processBatchRelationships now runs for every batch entity regardless of whether its relationships survive filtering — that's deliberate, since stale RDF state for those source entities still needs to be reconciled even when their current MySQL edges all point to excluded entity types. Switch the assertions to verify clearOutgoingEntityRelationships is the sole interaction (no bulkAdd, no addRelationship). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): address remaining PR review nits Three findings from the second gitar-bot review pass: - Replace the fully qualified `org.openmetadata.schema.configuration.GlossaryTermRelationSettings` / `SettingsType` / `SettingsCache` references in clearAllGlossaryTermRelations with imports, matching the project's existing convention. Other pre-existing FQN usages in the same file are left alone (not part of this PR's scope). - Make expandPredicateCurie throw IllegalArgumentException on null/empty input instead of silently defaulting to `om:relatedTo`. 
The current caller already null-guards so the path is unreachable today, but a future caller could otherwise silently miss-clean a misconfigured predicate. - Document why the lineage predicate URIs in the reconciliation DELETE filter (UPSTREAM / hasLineageDetails) are literal-hardcoded rather than baseUri-derived: they match what addLineageWithDetails actually writes (also hardcoded at RdfRepository.java:423,435). Switching the filter to be baseUri-derived would stop matching the stored lineage triples on non-default baseUri deployments and would incorrectly delete them. Comment added in both clearOutgoingEntityRelationships and bulkStoreRelationships so the next reader doesn't get nudged into "fixing" it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): surface cleanup failures, sync fallback predicates, time-bound reads Addresses the three unresolved Copilot findings from review 4295208187: - Drop the try/catch around clearAllGlossaryTermRelations in initializeJob. clearAllGlossaryTermRelations rethrows specifically so the indexer can fail loudly; wrapping it again let an unreconciled graph slip past as a "successful" run. The outer execute() handler will now mark the run FAILED. - Sync DEFAULT_GLOSSARY_TERM_RELATION_PREDICATES with what SettingsCache actually bootstraps (SettingsCache.java:355-486): adds skos:exactMatch (the real default for `synonym`), om:antonym, om:partOf, om:hasPart, rdfs:seeAlso. Keeps legacy om:* URIs from the stale getGlossaryTermRelationPredicateUri switch so a cleanup run still scrubs pre-SettingsCache data. - Apply READ_TIMEOUT_MS (10s) via QueryExecution.setTimeout on every read path (executeSparqlQuery for SELECT/CONSTRUCT/ASK/DESCRIBE, getEntity, getAllGraphs, getTripleCount, testConnection, the ontology presence check). A Fuseki that accepts the TCP connection but stalls mid-query no longer hangs reads indefinitely. 
UPDATE-side calls still rely on the connect timeout + circuit breaker + bounded pendingWrites since Jena's RDFConnection.update API doesn't expose a per-request timeout cleanly; comment near the constant notes the gap and a viable follow-up via UpdateExecHTTPBuilder.timeout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): qualify EntityRelationship in test to fix compile RdfIndexAppTest references EntityRelationship.class in two verify() calls that I added in the previous commit, but the class was never imported into the test file. CI's openmetadata-service test compile fails with "cannot find symbol class EntityRelationship", which cascades into 11 dependent checks (build x2, openmetadata-service-unit-tests, three Java integration test workflows, two Python integration test shards that build OM as a setup step, Test Report aggregate, maven-sonarcloud-ci, and the unit-test status gate). Use the fully qualified org.openmetadata.schema.type.EntityRelationship to match how every other reference in this file already spells it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): drop QueryExecution.setTimeout — removed in Jena 5 used by IT classpath GlossaryOntologyExportIT was failing on RdfUpdater.initialize with NoSuchMethodError: 'void org.apache.jena.query.QueryExecution.setTimeout(long, java.util.concurrent.TimeUnit)'. openmetadata-service builds against Jena 4.10 (apache-jena-libs), but openmetadata-integration-tests directly pulls in jena-core/jena-arq 5.0.0, and Jena 5 removed the setTimeout overloads from the QueryExecution interface. Compile passes, integration test JVM links the 5.x class and bombs at the first read path (loadOntology's ASK check). Strip the nine setTimeout calls and the READ_TIMEOUT_MS constant. 
A clean read-side timeout that works on both Jena 4 and 5 needs to be plumbed via QueryExecutionHTTPBuilder.timeout / UpdateExecHTTPBuilder.timeout instead of RDFConnection — bigger change than this PR should carry. The comment near CONNECT_TIMEOUT now records that history so the next reader knows why we don't simply re-add setTimeout. Protection against a stalled-but-accepting Fuseki still relies on the 5-failure circuit breaker + bounded pendingWrites gate in RdfUpdater. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): align ontology-loaded check, predicate URIs, and CURIE fallback Three real bugs flagged by Copilot's later review passes: - areOntologiesLoaded() looked for `"boolean" : true` (space before colon) but JenaFusekiStorage formats ASK results without that space, so the check never matched and reloadOntologies() always threw. recreateIndex=true (now the default) ran into this on the very first scheduled run. Normalise whitespace before checking. - bulkAddRelationships wrote `om:<lowercase relationshipType>` directly, while removeRelationship uses getRelationshipPredicate which maps a handful of types to prov:* (UPSTREAM → prov:wasDerivedFrom, USES → prov:used, etc.). Triples written by the indexer were therefore unreachable by the live remove hook. Pre-compute predicateUri via getRelationshipPredicate in bulkAddRelationships and pass it through a new field on RelationshipData so JenaFusekiStorage uses the same URI both paths agree on. The legacy RelationshipData(5-arg) ctor still works for callers that don't have a predicate handy; bulkStoreRelationships falls back to the old shape there. - expandPredicateCurie returned bare strings like `customRel` unchanged, but createPropertyFromUri's default branch writes `<baseUri>ontology/customRel`. Custom relation predicates expressed as local names would never match the cleanup FILTER. 
Mirror createPropertyFromUri: full URIs pass through, bare local names get the OM-ontology prefix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): schema default + migration force entities=[all] for safe full reindex - rdfIndexingAppConfig.json: flip recreateIndex.default from false to true so any UI form / config generation path that surfaces the schema default agrees with the install JSON files and the new full-rebuild semantics. - 2.0.1 migration (MySQL + Postgres): in addition to flipping recreateIndex=true and the weekly Saturday cron, also rewrite appConfiguration.entities to ["all"]. Pre-upgrade an operator could have narrowed RDF indexing to a subset of entity types; the new recreateIndex=true semantics issues CLEAR ALL before indexing, which would otherwise wipe triples for excluded entity types and leave the graph permanently missing them. Forcing entities back to ["all"] ensures the post-CLEAR-ALL run repopulates the graph fully. Operators can re-narrow after the migration if they need partial indexing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): scope storeEntity DELETE to translator-managed predicates Replace the literal-only FILTER(!isIRI(?o)) in JenaFusekiStorage.storeEntity with a predicate-scoped DELETE so translator-emitted URI triples (tags, glossary terms, owner, domain, tier, data products, structured sub-resources) are refreshed from the new model on every entity write, while hook-managed predicates (om:UPSTREAM, om:hasLineageDetails, om:owns / om:contains / ...) stay intact. Previously, with !isIRI(?o), every URI-valued triple survived storeEntity forever — when a tag was removed or an owner changed, the old URI coexisted with the new one because no hook ever cleans those up (tags live in tag_usage, not entity_relationship; owners' translator-side predicate om:hasOwner is not what the OWNS hook writes). 
The DELETE set is the union of: - RdfPropertyMapper.TRANSLATOR_MANAGED_DIRECT_PREDICATES, a static list of predicates that may shrink to empty between writes (so the current model walk wouldn't see them) — rdf:type, om:hasOwner, prov:wasAttributedTo, om:hasTag, om:hasGlossaryTerm, om:hasTier, om:belongsToDomain, om:hasDataProduct, dct:source, om:sourceUrl, plus the structured-resource attachment predicates (om:hasLifeCycle / hasCertification / hasExtension / hasCustomProperty). - the predicates the current model actually emits for the entity subject, covering JSON-LD context-driven predicates that aren't in the static list. Added two coverage tests on RdfPropertyMapperTest: the static set contains the documented core predicates, and never contains lineage-hook predicates (om:UPSTREAM, prov:wasDerivedFrom, om:hasLineageDetails) — that overlap would let storeEntity wipe lineage edges on every entity update. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): scope reconciliation DELETE to relationship-hook predicates only Both clearOutgoingEntityRelationships (in RdfRepository) and the per-source DELETE inside JenaFusekiStorage.bulkStoreRelationships used to clear ANY outgoing edge whose object was a baseUri/entity/ URI (with only the three lineage predicates excluded). That swept up translator-managed URI triples (om:hasTag, om:hasGlossaryTerm, om:hasOwner, om:belongsToDomain, …) which bulkAddRelationships does not re-emit, so reconciliation runs were permanently destroying tag/owner/domain links. Switch the filter to opt-in: only delete triples whose predicate is in RELATIONSHIP_HOOK_PREDICATES, derived from the Relationship enum via the existing getRelationshipPredicate mapping. The set excludes the lineage predicates by skipping the UPSTREAM enum value (managed by addLineageWithDetails). 
Translator-managed predicates aren't relationship types so they're naturally not in the set; the new RdfPredicatePartitionTest enforces the partition. Refactored getRelationshipPredicate into a static getRelationshipPredicateUri so it can be reused at class-init time to build the predicate set without an instance. Added a small buildPredicateInList helper exposed at package level for JenaFusekiStorage to reuse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): scope bulk reconciliation to batch entities, not all relationship sources bulkStoreRelationships used to compute its per-source DELETE set from the relationships list, so any source URI mentioned by any row in the batch was reconciled. RdfBatchProcessor passes BOTH outgoing relationships (sources inside the batch) and incoming UPSTREAM lineage (sources outside the batch where this batch's entity is the target). The outside-batch sources had their OTHER outgoing edges wiped, even though the indexer never planned to re-index them. Add a 2-arg overload to RdfStorageInterface.bulkStoreRelationships that takes an explicit Set<String> sourcesToReconcile. The default 1-arg method keeps the legacy "derive from relationships" behavior for any plugin caller that hasn't migrated. RdfRepository.bulkAddRelationships gains a matching overload taking Set<EntitySourceRef>; RdfBatchProcessor passes its batchSources (the entities IT is actually indexing in this pass). JenaFusekiStorage.bulkStoreRelationships now iterates sourcesToReconcile for the per-source DELETE instead of computing distinctSources from relationships. The new buildEntityUri helper on the interface lets callers (or the default delegate) build consistent source URIs. QLeverStorage stubs the new overload (still UnsupportedOperationException). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): time-bound HTTP request bodies via CompletableFuture wrapper Wrap every blocking RDFConnection call in the hot read/write paths (storeEntity DELETE+LOAD, storeRelationship, bulkStoreRelationships, getEntity, deleteEntity, executeSparqlQuery, executeSparqlUpdate) with a CompletableFuture-based 10s request timeout. When Fuseki accepts the TCP connection and then stalls on the response, the caller thread now frees after 10s instead of waiting until the OS gives up on the socket (~60s). We chose CompletableFuture over Jena's QueryExecution.setTimeout because that overload was removed in Jena 5 (broke integration tests already once in this PR), and over Jena's QueryExecutionHTTPBuilder / UpdateExecHTTPBuilder because their API surface differs between Jena 4 and Jena 5 and our two classpaths use different versions. The CompletableFuture wrapper is Jena- API-agnostic. On timeout the underlying HTTP request still leaks its (virtual) thread until OS-level TCP give-up; that's bounded by the existing circuit breaker (after 5 timeouts the breaker opens for 30s, short-circuiting subsequent traffic). Lower-traffic paths (loadTurtleFile, clearGraph, getAllGraphs, getTripleCount, loadOntology, testConnection) keep using the direct connection.update / connection.query / connection.load calls — they're protected by the circuit breaker and the connect timeout, and adding wrappers there is churn without proportional benefit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rdf): document RdfUpdater async-ordering trade-off in submitAsync Add a comment block in RdfUpdater.submitAsync explaining why we accept the loss of per-entity ordering when submitting through AsyncService: - EntityUpdater diff-applies changes per request, so add-then-remove of the same edge within one API call nets to no-op (no hooks fire). 
- Cross-request races reconcile at the next weekly recreate-index, which rebuilds from MySQL. - The alternative (per-entity striped lock) costs memory and adds latency for the no-contention common case. - Pointers for the future maintainer if an observed-in-production race emerges: gate via ConcurrentHashMap<UUID, Semaphore>. No behavior change. The two open Copilot threads on this trade-off (M6CQYup, M6CYbM2) stay open so a future PR can pick them up if needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): atomic clear+insert, broader fallback predicate set, close temp models Three follow-up findings from the latest Copilot pass: - Atomicity (3249716506): clearOutgoingEntityRelationships + bulkAddRelationships ran as two separate SPARQL updates. If bulkAddRelationships failed after the clear succeeded, the batch entities had their relationships wiped without the replacement edges in place — they stayed gone until the next weekly recreate-index. Combine the per-source DELETE and the INSERT DATA into a single SPARQL update inside JenaFusekiStorage.bulkStoreRelationships and drop the now-redundant separate clear call from RdfBatchProcessor. Either the whole reconciliation commits or none of it does. Also let bulkStoreRelationships handle the zero-edge case (relationships empty, sourcesToReconcile non-empty) so RdfBatchProcessor doesn't need a separate clear for entities whose relationships were all removed in MySQL. - Fallback predicate set (3249716532): when SettingsCache returns null, getGlossaryTermRelationPredicate falls back to literal `https://open-metadata.org/ontology/<relationType>` — so `broader` / `narrower` / `exactMatch` get written as om:broader/om:narrower/om:exactMatch, not skos:* equivalents. Without those URIs in DEFAULT_GLOSSARY_TERM_RELATION_ PREDICATES, a cleanup run during a transient settings-cache outage would miss them. 
Added the three om:* fallback variants alongside the existing skos:*/rdfs:* bootstrap defaults. - Temp Model leaks (3249319886): bulkAddRelationships and removeRelationship each create an ephemeral Jena Model just to mint property URIs. Wrapped both in try/finally close() so the in-memory graphs are released right after use. Jena 4's Model has a close() method but doesn't implement java.lang.AutoCloseable so try-with-resources isn't possible there. Copilot's "still deleting only non-IRI" finding (3249716480) is a stale- snapshot false positive — JenaFusekiStorage.storeEntity has used predicate- scoped DELETE via TRANSLATOR_MANAGED_DIRECT_PREDICATES since 22d5825. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): make buildPredicateInList public so JenaFusekiStorage can use it JenaFusekiStorage (org.openmetadata.service.rdf.storage) lives in a different package than RdfRepository (org.openmetadata.service.rdf), so the package-private buildPredicateInList helper introduced in 857c09 couldn't be called from JenaFusekiStorage.bulkStoreRelationships — CI was failing with: [ERROR] JenaFusekiStorage.java:[606,51] buildPredicateInList(Set<String>) is not public in RdfRepository; cannot be accessed from outside package Promote it to public alongside RELATIONSHIP_HOOK_PREDICATES (which is the only data this helper renders) so the cross-package call resolves. Local javac across the touched RDF files now reports zero new errors; the only remaining build failures are the pre-existing es.co.elastic.clients shading issues unrelated to this PR. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): normalise sourcesToReconcile to empty-set to prevent NPE in iteration bulkStoreRelationships' early-return guard accepts sourcesToReconcile == null as a valid input, but the subsequent per-source DELETE loop iterates sourcesToReconcile directly — so a caller passing null with a non-empty relationships list would skip the guard and crash at the for-loop. Today no caller hits this path (RdfRepository.bulkAddRelationships always passes non-null, and the 1-arg default interface method derives a set), but the null-check in the guard explicitly encodes null as supported, so the contract should match the iteration. Normalise once after the guard: Set<String> effectiveSources = sourcesToReconcile != null ? sourcesToReconcile : Set.of(); and use effectiveSources for both the loop and the success-log size. Local filtered compile passes cleanly (zero NEW errors from RDF files; remaining errors are the pre-existing es.co.elastic.clients shading mess). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(rdf): update RdfIndexAppTest verifications for the new bulkAddRelationships 2-arg signature Three test failures after the Fix-I / atomic-clear-insert changes: - testProcessBatchRelationshipsStoresResults verified `bulkAddRelationships(captor.capture())` (1-arg) but RdfBatchProcessor now calls the 2-arg `bulkAddRelationships(relationships, batchSources)` — Mockito surfaced this as "different arguments" because the actual call had a Set<EntitySourceRef> tail. Updated the verify to `bulkAddRelationships(captor.capture(), anySet())`. - The two event-subscription skip tests previously verified `clearOutgoingEntityRelationships(anySet())` as the only interaction; that method is no longer called from RdfBatchProcessor (the clear was folded into bulkAddRelationships' atomic SPARQL transaction for safety). 
Replace with `verify(mockRdfRepository).bulkAddRelationships(eq(List.of()), anySet())` — bulkAdd is still invoked with an empty list to drive the per-source reconciliation for the batch entity, even when the only fetched relationship pointed at an excluded entity type. Filtered local compile + test-compile passes cleanly (no NEW errors from RDF files; only pre-existing es.co.elastic.clients shading errors remain). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): four follow-up findings from Copilot review 4299978111 - collectTranslatorPredicates over-broad (3249798300): RdfRepository.addRelationship passes storeEntity a model loaded from Fuseki PLUS the new relationship, so the dynamic walk was pulling hook-managed predicates (om:owns, etc.) into the DELETE scope. With async writes, two concurrent additions for the same source could each read the old model and each storeEntity wipe the other's relationship. Exclude RELATIONSHIP_HOOK_PREDICATES from the walk result (and defensively from the static-set union too). - ForkJoinPool.commonPool starvation (3249798327): runWithTimeout used CompletableFuture.supplyAsync's default executor, so a Fuseki that stalls would leak workers on the JVM-wide commonPool and starve unrelated CompletableFuture / parallel-stream work. Introduce a dedicated virtual-thread executor (Thread.ofVirtual().name("rdf-storage-timeout-", 0)) and route all timeout wrappers through it — virtual threads are cheap to leak and the circuit breaker bounds the pile-up. - Shrink-to-empty for literal predicates (3249798383): the predicate-scoped DELETE no longer caught stale literals when a literal-valued field (description / displayName / …) was cleared and the new model simply omitted the triple. Chain a "DELETE … FILTER(!isIRI(?o))" pass with the URI-scoped pass so hook-managed URI triples stay intact while stale literals get swept on every write. 
- UI schema default (3249798439): the UI form schema at utils/ApplicationSchemas/RdfIndexApp.json still declared recreateIndex.default = false. Flipped to true to match the backend openmetadata-spec schema and the install JSON files. (The sibling jsons/applicationSchemas/ is gitignored generated output, no source change needed there.) Local verification before push: spotless:apply, filtered compile + test-compile (zero new errors), and `mvn test -Dtest='RdfIndexAppTest,RdfPropertyMapperTest, RdfPredicatePartitionTest,RdfStorageIdempotencyTest'` — 64 tests, 0 failures. The "buildPredicateInList package-private" finding from the same review (3249798351) is already addressed in 03c5d4f and surfaces here only because Copilot reviewed an earlier commit. The "lineage incremental cleanup" finding (3249798415) is a known architectural trade-off: addLineageWithDetails handles current lineage rows but removed edges have no row to trigger a per-edge delete, and adding UPSTREAM/wasDerivedFrom to RELATIONSHIP_HOOK_PREDICATES would conflict with the inline addLineageWithDetails call that runs BEFORE bulkAddRelationships in RdfBatchProcessor. The weekly recreateIndex=true run (the new default) wipes and rebuilds from MySQL, which reconciles stale lineage; left this thread open as a documented gap rather than reordering processBatchRelationships in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ta#28224)

* feat(spec): add ContextMemory + CreateContextMemory JSON schemas

* feat(jdbi3): add ContextMemoryDAO

* feat: register contextMemory entity type constant

* feat(service): add ContextMemory repository, resource, mapper

* feat(bootstrap): add context_memory table DDL

* test(service): ContextMemory resource CRUD test

* fix(context-memory): address review (relationship types, stable FQN, status msg, test name)

- storeRelationships: rootMemory -> Relationship.CONTAINS, parentMemory -> Relationship.HAS so the root-ancestor and direct-parent hierarchies are distinguishable.
- setFullyQualifiedName: derive from the immutable name only (drop mutable primaryEntity/owner derivation that destabilized nameHash on update).
- validateStatusTransition: separate "no transitions defined" from "disallowed transition".
- Rename ContextMemoryResourceTest -> ContextMemoryStatusTransitionTest (pure unit test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(context-memory): add ContextMemoryIT + SDK ContextMemoryService

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(spec): register contextMemory in EntityLink.g4 ENTITY_TYPE grammar

EntityLinkGrammarTest.testAllEntityTypesHaveGrammarOrExclusion enumerates every Entity.java constant and requires each to be in the EntityLink grammar or the test's exclusion list. ContextMemory is a normal EntityRepository-backed top-level entity (like learningResource / contextFile), so it belongs in the ENTITY_TYPE rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(context-memory): override owner ITs for creator-as-owner default

ContextMemoryMapper.defaultOwners() intentionally assigns the creating user as owner when the create request omits owners. BaseEntityIT's patch_entityUpdateOwner_200 and patch_entityUpdateOwnerFromNull_200 assert "no owner initially" for any supportsOwners entity, so both failed for ContextMemory. Override both in ContextMemoryIT: keep the PATCH-replace-owner contract, change only the precondition to expect the creator as the sole initial owner (asserted by count, not a hardcoded principal). Mapper unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update generated TypeScript types

Add the generated ContextMemory TS types (entity/context/contextMemory.ts, api/context/createContextMemory.ts). The schemas were on the branch but their generated types were missing, failing the TypeScript Type Generation check on this fork PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-memory): address review (relationship cleanup, owner scope, validations)

Copilot review on the ContextMemory entity:

- #1 record primaryEntity/relatedEntities/root/parent/source*/machineRepresentation in version history; usageCount/lastUsedAt documented as untracked telemetry
- #2 clear stale HAS/RELATED_TO/CONTAINS edges before re-adding in storeRelationships
- #4 default creator as owner only on create; PUT without owners no longer silently replaces previously set owners
- #5 schema documents that any status is allowed at creation; transitions enforced only on update
- #6 setFullyQualifiedName via FullyQualifiedName.build with skip-if-set guard
- #7 validate shared principal type is user/team/domain
- #8 reject self-reference for parentMemory/rootMemory
- #10 inline Entity.CONTEXT_MEMORY, drop redundant constant

Regenerate ContextMemory TS types for the schema doc change; add IT coverage for the self-reference and invalid-shared-principal validations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-memory): don't blanket-delete relationships (domain data loss)

The #2 cleanup via deleteTo(memory, CONTEXT_MEMORY, HAS, null) also matched the framework's domain --HAS--> memory edge (storeDomains runs before storeRelationships in storeRelationshipsInternal, on every create and update), silently dropping domain assignments.

storeRelationships is now add-only (addRelationship upserts, so re-running on update is idempotent). Stale-edge cleanup moved to ContextMemoryUpdater using the framework's updateFromRelationship(s) helpers, which delete only the specific changed refs and record the version change. parentMemory now uses Relationship.PARENT_OF (distinct from primaryEntity's HAS and the framework's domain HAS) so the parent edge can be maintained without collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap): move context_memory DDL from 2.0.1 to 2.0.0

The context_memory table belongs in the 2.0.0 migration. Relocated the MySQL and Postgres DDL verbatim; the 2.0.1 schemaChanges.sql files are restored to their original task_migration_mapping-only content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap): add ENGINE=InnoDB to context_memory MySQL DDL

Explicit engine clause, consistent with the task/search-index tables in the same migration and robust to any server default change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-memory): preserve sanitized/validated fields; validate relatedEntities

Review follow-ups:

- ContextMemoryMapper no longer re-sets description/owners/domains/tags/displayName after copy(). copy() sanitizes description (stored-XSS) and validates owners and domains; re-setting the raw request values bypassed both. Only ContextMemory-specific fields are set now.
- prepare() now assigns the result of EntityUtil.populateEntityReferences back onto relatedEntities so orphaned/invalid refs are filtered instead of persisted.
- ContextMemoryIT Javadoc now references ContextMemoryRepository#setCreatorAsDefaultOwner (the defaultOwners mapper method no longer exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
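Two of the validations in the ContextMemory commits above (distinct errors for "no transitions defined" versus "disallowed transition", and rejecting a self-referencing parentMemory/rootMemory) can be illustrated with a short Python sketch. The real code is Java; the status names and transition table below are hypothetical, since the log does not list the schema's actual statuses.

```python
# Hypothetical transition table; the real statuses and allowed moves live in
# the ContextMemory schema and are not shown in this log.
ALLOWED_TRANSITIONS = {
    "Draft": {"Active"},
    "Active": {"Archived"},
}


def validate_status_transition(current, new):
    """Raise ValueError with a distinct message for each failure mode."""
    if current is None or current == new:
        return  # creation (any status allowed) or no change
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"No transitions defined for status '{current}'")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Transition from '{current}' to '{new}' is not allowed")


def validate_no_self_reference(memory_id, parent_id=None, root_id=None):
    """Reject a memory that names itself as its own parent or root."""
    if memory_id in (parent_id, root_id):
        raise ValueError("A memory cannot be its own parentMemory or rootMemory")
```

Keeping the two messages separate lets a caller tell a misconfigured status (nothing defined for it) apart from a legitimate status that simply cannot move to the requested target.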

…cs (open-metadata#29201) * feat(ai): add DERIVED_FROM, Metric.provider, ContextMemory.ontologyStats, AISettings schema - Append DERIVED_FROM as last entry in entityRelationship.json enum (ordinal-safe) - Add provider (ProviderType) field to metric.json and createMetric.json - Add ontologyStats definition and property to contextMemory.json (OntologyStats javaType) - Create configuration/aiSettings.json (AISettings, MemoryExtractionSettings, OntologyAgentSettings, PromptConfig, AIPrompts, AIDeletionPolicy) - Register aiSettings in settings.json enum and config_value oneOf - Add default seed openmetadata-service/src/main/resources/json/data/settings/aiSettings.json Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): AISettings handler, cache seed/merge, and system settings REST branch Implements Task 2: AISettingsHandler (validate + merge), SettingsCache seed/merge block for aiSettings.json mirroring searchSettings, and SystemResource PUT branch + reset extension for AI_SETTINGS. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): propagate AISettings default-load IO errors and single-return reset - loadDefaultAiSettings now declares throws IOException instead of swallowing it with an empty AISettings fallback; call sites in createOrUpdateSetting and resetSettingToDefault wrap with try/catch and re-throw as SystemSettingsException (matching the searchSettings error-handling pattern) - resetSettingToDefault refactored to if/else-if/else with a single trailing return, eliminating the two early returns - AISettingsHandlerTest: add incomingNullReturnsDefaults and nullNestedDefaultInheritsIncoming tests covering null-guard branches Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): gate file/page extraction on AISettings and externalize the extraction prompt - Add AISettingsUtil: cache-backed AISettings accessor (never null, fails open) with isFileExtractionEnabled, isPageExtractionEnabled, isOntologyAgentEnabled, memoryExtractionPrompt, ontologyAgentPrompt helpers - ContextMemoryExtractor.callLlm resolves the system prompt from AISettings at runtime, falling back to the SYSTEM_PROMPT constant - ContextFileProcessingService.process and fileStatusAfterText gate on AISettingsUtil.isFileExtractionEnabled in addition to LLMClientHolder.isEnabled - KnowledgePageRepository.schedulePillExtraction gates on AISettingsUtil.isPageExtractionEnabled in addition to LLMClientHolder.isEnabled - TDD: AISettingsUtilTest written first (RED), then implementation (GREEN) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): ontology verdict DTO boundary Add OntologyDerivation, OntologyVerdict (Jackson record DTOs with @JsonProperty on every component and @JsonIgnoreProperties), and OntologyAction constants (REUSE/CREATE/SKIP) as the anti-corruption boundary between untrusted LLM JSON and the domain model. Mirrors the KnowledgePill pattern. 
Covered by OntologyDerivationTest (lenient parse, unknown-field tolerance). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(ai): ontology grounding + extractor (pure derive) Add OntologyContext/OntologyCandidate records, OntologyExtractor (calls LLM via completeStructured, returns SKIP/SKIP on empty result), OntologyPromptBuilder (renders memory + candidate lists), and OntologyGrounding (keyword-searches glossary-term/metric/glossary indexes via Entity.getSearchRepository, caps at 20, fails-safe per axis). Covered by OntologyExtractorTest (mocked LLM, two cases: verdict passthrough + empty-→SKIP/SKIP). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ai): use existing search/entity constants, dedupe helper, shorten prompt builder Replace raw "_score"/"desc" with SearchConstants.DEFAULT_SORT_FIELD/DEFAULT_SORT_ORDER, remove local FIELD_NAME/FIELD_DESCRIPTION in favour of Entity constants, dedupe nullToEmpty via StringUtils.defaultString, extract renderMemory() so build() fits 15 lines, use CommonUtil.nullOrEmpty for the candidates guard, and replace raw "CREATE"/"SKIP" string literals in tests with OntologyAction constants. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): ontology reconciler with ownership lifecycle (create/reuse/retire/cascade) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): correct RELATED_TO edge direction, honor deletionPolicy on re-derive, no-op on all-SKIP Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): ontology processing engine with throttle and hash-gate Adds OntologyProcessingEngine: a trailing-throttle debounce (mirroring PageContextProcessingEngine) that collapses rapid memory edits into one derivation run, protected by a SHA-256 content hash-gate so unchanged memories are never re-derived. 
stampOntologyStats persists via recordChange(updateVersion=false) in ContextMemoryUpdater, exactly mirroring KnowledgePageRepository.recordExtractionStats, so no version churn occurs and no postUpdate event fires for the stats field. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): accurate recursion-contract docs, split term/metric stat counts, dedup scheduler Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): ContextMemory lifecycle hooks trigger the ontology agent + cascade postCreate/postUpdate schedule the OntologyProcessingEngine (gated on AISettingsUtil.isOntologyAgentEnabled). softDelete/hardDelete/restore AdditionalChildren fire in-edge-window so DERIVED_FROM edges exist when OntologyReconciler.onMemoryDeleted/onMemoryRestored run. Extracted AISettingsUtil.deletionPolicy() to remove duplication between the engine and the repository. Added OntologyReconciler.onMemoryRestored which restores CASCADE-soft-deleted automation-owned entities (Include.ALL query; ORPHAN/DEPRECATE-released entities correctly excluded because their DERIVED_FROM edges were dropped at delete time). Three new unit tests cover: owned restore, human-adopted skip, orphan-released skip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): ontology-bot seed and adopt-on-touch provider release guards Adds the ontology-bot principal seed files (bot + botUser JSON, auto-loaded by BotResource.initialize at startup), the OntologyOwnership utility class (centralized ONTOLOGY_BOT_NAME constant + releaseIfHumanEdited guard), and wires the guard into the entitySpecificUpdate of GlossaryTermUpdater, MetricUpdater and GlossaryUpdater so a human PATCH that changes an agent-managed field flips provider AUTOMATION → USER permanently. 
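The adopt-on-touch guard described above can be pictured as a small pure function: if someone other than the bot changes an agent-managed field on an AUTOMATION-owned entity, the provider flips to USER. This Python sketch is an assumption-laden illustration, not the Java `releaseIfHumanEdited` itself. The bot name value, the `Entity` shape, and the choice of `description` as the managed field are all hypothetical.

```python
from dataclasses import dataclass

BOT_NAME = "ontology-bot"  # assumed value of the ONTOLOGY_BOT_NAME constant
AUTOMATION, USER = "automation", "user"


@dataclass
class Entity:
    """Hypothetical stand-in for a glossary term, metric, or glossary."""
    description: str
    provider: str


def release_if_human_edited(original, updated, updated_by, managed_fields=("description",)):
    """Flip provider AUTOMATION -> USER when a non-bot edit changes a managed field."""
    if original.provider != AUTOMATION or updated_by == BOT_NAME:
        return updated
    changed = any(getattr(original, f) != getattr(updated, f) for f in managed_fields)
    if changed:
        updated.provider = USER
    return updated
```

Because nothing in the sketch ever sets the provider back to AUTOMATION, the release is one-way, which matches the "permanently" wording in the commit.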
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(ai): AISettings admin page with toggles and prompt editors Adds the AISettingsPage under Settings > Preferences with master enable toggle, memory-extraction and ontology-agent toggles, a deletion-policy Select, and two system-prompt Textareas. Registers the route, menu item, and all i18n keys (synced to all 17 locales). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ai): i18n the AISettings deletion-policy option labels Replace raw string literals ('cascade'/'orphan'/'deprecate') in DELETION_POLICY_OPTIONS with i18n labelKey fields; add new keys label.cascade / label.deprecate / label.orphan to en-us.json (alphabetical) and sync all 17 other locales via yarn i18n. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(ai): derived-entity provenance projections on memory/term/metric Add three read-only derived fields (non-default, resolved at read, never persisted) to expose File→Memory→Term/Metric provenance: - ContextMemory.derivedEntities (entityReferenceList): terms+metrics created by the Ontology Agent via DERIVED_FROM edges (findFrom(memory, CONTEXT_MEMORY, DERIVED_FROM, GLOSSARY_TERM/METRIC)) - ContextMemory.reusedEntities (entityReferenceList): terms+metrics reused via RELATED_TO edges (findTo(memory, CONTEXT_MEMORY, RELATED_TO, GLOSSARY_TERM/METRIC)) - GlossaryTerm.derivedFrom (entityReference): memory that created the term (findTo(term, GLOSSARY_TERM, DERIVED_FROM, CONTEXT_MEMORY)) - Metric.derivedFrom (entityReference): memory that created the metric (findTo(metric, METRIC, DERIVED_FROM, CONTEXT_MEMORY)) Edge directions verified against OntologyReconciler Task-6 code: addDerivedFromEdge stores from=entity→to=memory; reuse() stores from=memory→to=entity for RELATED_TO. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(ai): integration tests for AISettings + ontology agent lifecycle AISettingsResourceIT: deterministic GET/PUT/reset tests for /system/settings/aiSettings covering default values (enabled=true, deletionPolicy=cascade), PUT persistence, and reset. OntologyAgentIT: deterministic lifecycle tests seeding DERIVED_FROM edges via in-process repository to replicate the reconciler's CREATE path, then driving cascade delete, adopt-on-touch (provider flip), and derivedFrom/derivedEntities projection fields through the public REST API without any LLM dependency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): File→Memory→Term/Metric provenance panels in the UI - Add derivedEntities/reusedEntities to ContextMemory generated type - Add derivedFrom to GlossaryTerm and Metric generated types - Add DERIVED_FROM/DERIVED_ENTITIES/REUSED_ENTITIES to TabSpecificField enum - Add getContextMemoryById to contextMemoryAPI - Create DerivedOntologyCard component (+ interface + test): fetches derivedEntities/reusedEntities from a memory and renders linked lists - Embed DerivedOntologyCard into CreateMemoryModal view-only mode so every memory's derived/reused ontology is visible when viewing a memory - Add derivedFrom field to GLOSSARY_TERM_DEFAULT_FIELDS and METRIC_DEFAULT_FIELDS - Add "Derived from memory" link in GlossaryTermsV1 and MetricDetails when derivedFrom is present on the entity - Add i18n keys: label.derived-from-memory, label.derived-ontology, label.reused, message.no-derived-ontology (synced to 19 locales) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): seed/serve aiSettings correctly so GET /system/settings/aiSettings returns defaults CollectionDAO$SettingsRowMapper.getSettings lacked an AI_SETTINGS case in its switch, causing every getConfigWithKey("aiSettings") call to throw IllegalArgumentException (swallowed, returning null) → HTTP 204 
on every GET. Added the missing case so aiSettings rows deserialise to AISettings. Also fixed resetSettingToDefault for AI_SETTINGS: it was returning defaults without persisting them to the DB (unlike the equivalent searchSettings reset path which calls systemRepository.createOrUpdate). Now it persists first. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): seed AUTOMATION provider via createInternal and set id/updatedAt so cascade, adopt-on-touch, and projection ITs all pass The test helper createAutomationTerm used create-via-REST + termRepo.update to set provider=AUTOMATION, but the EntityRepository update framework only writes to DB when entityChanged=true; provider is not tracked by recordChange so the write was silently no-op'd. Term stayed provider=null in the DB, causing: - Scenario A timeout: isAutomationOwned read null, cascade skipped the term - Scenario B: assertEquals(USER, null) failed at assertion after human PATCH - Scenario C: cascade skipped term, glossary.delete failed with "glossary is not empty" Fix: replace the two-step seed with termRepo.createInternal() directly, mirroring OntologyReconciler.createTerm exactly. Also fixed the reconciler itself: createTerm, createMetric, and resolveOrMintGlossary all called createInternal without setting id or updatedAt; PostgreSQL GENERATED columns extract both from the stored JSON with NOT NULL constraints, so omitting them would cause constraint failures in production. Result: OntologyAgentIT 3/3 GREEN; OntologyReconcilerTest 20/20 + OntologyOwnershipTest 7/7 no regression. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): wire AISettings reset-to-default, strengthen util test, import @transaction Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(ai): document AISettings + Ontology Agent (§19) in the company-context spec Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Update generated TypeScript types * fix(ai): use design tokens and carry source-memory identity in provenance links Replace palette classes (text-gray-500/400/900, text-brand-600) with semantic tokens (text-tertiary, text-brand-secondary). Link both provenance anchors to ROUTES.CONTEXT_CENTER_MEMORIES?memory=<name> so the memories-list auto-opens the correct memory's view modal on arrival. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): gate per-axis derive toggles, validate LLM entity names, narrow catches - OntologyReconciler: add 5-arg reconcile() with explicit deriveTerms/deriveMetrics flags; reconcileAxis() gates BOTH apply and retire per flag, preventing mass-retire bug in the reviewer's suggested null-implied approach - OntologyReconciler: isValidName() guards CREATE paths for term, metric, glossary mint — null/blank or FQN-reserved chars (. 
" /) become a logged SKIP instead of thrown exception (Fix #2+#6) - OntologyProcessingEngine: read AISettings once in derive(), compute axis flags via deriveTermsEnabled/deriveMetricsEnabled helpers, pass to 5-arg reconcile() — no more settings coupling inside reconciler - Narrow Exception catches: AISettingsUtil → RuntimeException; OntologyGrounding → IOException|RuntimeException; OntologyProcessingEngine#runScheduled → RuntimeException with explanatory comment (Fix #5) - OntologyReconcilerTest: 4 new tests covering null/invalid-name SKIP behavior (27 total, 0 failures) - OntologyAgentIT Scenario E: fully deterministic axis-toggle coverage — seeds AUTOMATION-owned metric, calls 5-arg reconcile with deriveMetrics=false, asserts 0 metrics created + 0 retired + owned metric survives (4 tests, 0 failures) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ai): stamp content hash on reconcile failure to prevent re-derive poison-pill loop Wrap reconciler.reconcile() in reconcileSafely() which catches RuntimeException, logs the error with memory id, and returns null so the derive() method always reaches stampOntologyStats(). buildStats() is made null-safe (zero counts when result is null). LLM/network stages (fetchCandidates, extractor.derive) remain outside the guard so transient failures still propagate and retry. Adds stampsHashEvenWhenReconcileThrows test to prove the loop is broken. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ai): fetch memories list with generic sourceEntity, not deprecated sourceFile Page-sourced memories resolve their source via the generic sourceEntity ref; the deprecated sourceFile alias only covers file-sourced ones. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): reuse existing term/metric on FQN collision and cancel pending run on memory delete P1: OntologyReconciler.createTerm/createMetric now precheck the target FQN via findByFqn(repo, fqn) (NON_DELETED) before calling createInternal; on collision they call reuseExisting() (same RELATED_TO edge + counts.reused++) instead of throwing a unique-constraint violation. resolveOrMintGlossary also checks by newGlossaryName before minting a duplicate glossary. Three new unit tests cover the term-FQN-collision, metric-FQN-collision, and glossary-reuse-by-name paths. P2: ContextMemoryRepository.softDeleteAdditionalChildren/hardDeleteAdditionalChildren now call OntologyProcessingEngine.instance().cancel(memoryId) via a shared cancelAndCascadeOntology helper before cascadeOntology, so any pending scheduled derivation is cancelled when a memory is deleted, preventing spurious EntityNotFoundException in runScheduled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): null-guard the LLM verdict list in OntologyExtractor.derive Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ui): align AISettingsPage to codegen DeletionPolicy enum name The TS-codegen bot regenerated aiSettings.ts exporting the enum as DeletionPolicy (from the deletionPolicy schema key), not AIDeletionPolicy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ai): centralize MCP Server and MCP Chat into AI Settings MCP Server and MCP Chat are no longer internal Applications; they are managed as platform settings. - MCP Chat: new aiSettings.mcpChat (enabled + systemPrompt) drives a runtime McpChatServiceHolder, re-initialized on AI settings save/reset so chat toggles without a restart; McpClientResource reads the holder. 
- MCP Server: registerMCPServer gates on mcpConfiguration.enabled (seeded by default) instead of an installed app; configurable via /system/mcp/config. - McpApplicationBot seeded as a system bot with impersonation; McpApplication and McpChatApplication entities, marketplace defs and the mcpChatAppConfig schema removed. - MCP usage telemetry re-anchored to a constant identity (read history kept). - AI Settings page adds MCP Chat and MCP Server sections; chat sidebar gated on the setting. - 2.0.0 migration carries prior app config/enablement into settings, then retires the apps (keeping the bots). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Update generated TypeScript types * refactor(ai): rename Ontology Agent to Memory Agent Rename the agent's identity from "Ontology Agent" to "Memory Agent" across all layers: the drive/ontology Java package and its Ontology* pipeline/DTO classes, OntologyOwnership, the ontology-bot principal, the aiSettings ontologyAgent config key, the ContextMemory ontologyStats/OntologyProcessingStatus fields, the OntologyStatusBadge UI component, and the ontology-agent i18n labels (synced across all locales). Java models and generated TS types are regenerated from the renamed schemas. The unrelated Ontology Explorer (RDF glossary graph) feature and the output-concept "Derived Ontology" provenance panels are intentionally left unchanged, since they name the derived term/metric graph rather than the agent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Update generated TypeScript types * fix(ai): show MCP Chat in sidebar via setting + restrict MCP Server config - MCP Chat sidebar: add authenticated GET /mcp-client/enabled and gate the McpChatPlugin sidebar entry on aiSettings.mcpChat.enabled. It was tied to the removed app, so it never appeared after enabling chat in AI Settings. 
- MCP Server settings: expose only the enable toggle and Origin Header URI; the endpoint path is fixed at /api/v1/mcp and no longer editable; drop the origin validation and allowed origins fields. - EnumBackwardCompatibilityTest: account for the appended DERIVED_FROM relationship (count 26 -> 27, new last ordinal). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): drop MCP Chat config migration; derive MCP usage appId from name - The MCP Chat app was never shipped to customers, so the 2.0.0 migration no longer carries its config into aiSettings.mcpChat — the seeded default shape is kept. Server enable-alignment and dead-app cleanup remain. - McpToolCallUsage.appId: no MCP-usage query reads it (recorder writes, resource reads by appName), but apps_extension_time_series.appId is a NOT NULL generated column, so a value is required. Derive it deterministically from MCP_APP_NAME instead of a hardcoded UUID. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * format * fix(ai): commit missing MemoryRelation DTO blocked by .gitignore The bare `memory` pattern in .gitignore (a claude-flow tooling entry) also matched the drive/memory Java package, so the newly added MemoryRelation.java was silently ignored and never committed — breaking CI with "cannot find symbol MemoryRelation" in MemoryVerdict and MemoryReconciler. Scope the ignore to root-level /memory/ and add the file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ai): make file-extraction LLM gate fully injectable (fix unit tests) ContextFileProcessingService gated knowledge-pill extraction on BOTH the injected llmEnabledSupplier AND a direct static AISettingsUtil.isFileExtractionEnabled(AISettingsUtil.get()). 
The static read needs a live SettingsCache, so ContextFileProcessingServiceTest (which injects the gate as () -> true) could not satisfy it: extraction was skipped and 3 tests failed (wrong repository.update counts, runExtraction never invoked). Fold the AISettings check into the production default supplier and route both call sites through one shouldExtractContext(...) helper, so the status machine is unit-testable and the repeated compound condition lives in one place. Production gate (LLM enabled AND file-extraction toggle) is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ui): add missing label.general i18n key for AI Settings AISettingsPage referenced t('label.general') but the key was absent from en-us.json. Added it and synced all locale files via yarn i18n. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ui): repair AISettingsPage and CreateMemoryModal jest mocks AISettingsPage imports Input from ui-core-components but the test mock omitted it, so the component rendered an undefined element ("Element type is invalid"). Add the Input mock plus the missing settingConfigAPI mocks (getMcpConfiguration/restoreSettingsConfig/updateMcpConfiguration) the component calls. CreateMemoryModal's partial DateTimeUtils mock dropped getEpochMillisForPastDays, which profiler.constant.ts invokes at module load via a deep import chain, failing the suite at import. Spread requireActual to preserve the real exports, and stub DerivedOntologyCard to cut the heavy transitive chain (EntityUtilClassBase -> DataProductsPage -> ConnectionStepCard) that also pulled in unmocked antd internals. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * build(deps): bump undici from 6.25.0 to 6.27.0 and form-data to 4.0.5 in /openmetadata-ui/src/main/resources/ui (open-metadata#29241) * build(deps): bump undici in /openmetadata-ui/src/main/resources/ui Bumps [undici](https://github.com/nodejs/undici) from 6.25.0 to 6.27.0. - [Release notes](https://github.com/nodejs/undici/releases) - [Commits](nodejs/undici@v6.25.0...v6.27.0) --- updated-dependencies: - dependency-name: undici dependency-version: 6.27.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * update yarn --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu> * change index from all to dataAsset for my data (open-metadata#29209) * change index from all to dataAsset for my data * fix jest tests * fix playwrights * fix playwrights --------- Co-authored-by: Shrabanti Paul <shrabantipaul@Shrabantis-MacBook-Pro.local> * fix(fqn): support double-quotes in fully qualified names + guard/repair corrupt FQNs (open-metadata#28697) * fix(fqn): support double-quotes in fully qualified names + guard/repair corrupt FQNs Names containing a double-quote could not be represented in an FQN: the Fqn grammar had no escape mechanism, yet quoteName() backslash-escaped the quote and stored an unparseable segment. Building the FQN is a pure string op, so such values were written successfully (insert hashes only the entity's own FQN); they then detonated later with a 500 (ParseCancellationException) the first time a nested FQN was hashed (e.g. a tags read), and were painful to migrate. Three layered fixes: - Grammar + quoteName: NAME_WITH_RESERVED now allows any character with '"' escaped by doubling it (""). quoteName/unquoteName encode/decode accordingly and are idempotent. 
Names without a quote encode identically to before, so existing FQNs and their hashes are unchanged (no reindex/migration needed). - Ingest guard: FullyQualifiedName.validateFqnName() asserts a name round-trips through encode->parse->decode, wired into every nested-FQN setter (columns, pipeline tasks, topic/searchIndex/apiEndpoint fields, mlFeatures). A name that cannot be hashed is now rejected at ingest with a clear 400 instead of being stored to fail later. - Heal-on-read: FullyQualifiedName.isValid() detects legacy-corrupt FQNs; PipelineRepository repairs unparseable task FQNs on the fly by re-deriving them from the task name, so existing poisoned data reads cleanly (200) without a migration. The repair is in-memory and persists on the next update. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(fqn): repair corrupt pipeline task FQNs via migration, not heal-on-read Heal-on-read (PipelineRepository.repairTaskFqns) ran a full ANTLR parse for every task on every pipeline read to subsidize a finite set of already-corrupt rows, was incomplete (the bulk/LIST/search path still 500'd), and could NPE on a null task FQN. Replace it with a one-time migration so the corruption leaves the stored data and reads pay no per-request cost. - Remove repairTaskFqns and its setFields() call; keep the validateFqnName write-path guard that rejects un-representable names at ingest (400). - Add migration v11211 (mysql + postgres): re-derive task FQNs where !isValid, persist only when changed. - Harden FullyQualifiedName.isValid to treat null/empty as invalid (no NPE). - Require >=1 char inside a quoted FQN segment (grammar + not *), rejecting empty quoted segments (""). FullyQualifiedNameTest: 17/17. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(fqn): reject empty nested-object names at write time validateFqnName returned early when quoteName(name) was unchanged, letting empty names through (quoteName("") == ""). 
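The doubled-quote encoding from the fix(fqn) commits above can be sketched in a few lines of Python. The actual implementation is Java backed by an ANTLR grammar; this version assumes that a name needs quoting only when it contains `.` or `"`, and the function names are Python-style stand-ins for quoteName/unquoteName rather than the real signatures.

```python
def unquote_name(segment):
    """Decode a quoted FQN segment; unquoted segments pass through unchanged."""
    if len(segment) >= 2 and segment[0] == '"' and segment[-1] == '"':
        return segment[1:-1].replace('""', '"')
    return segment


def quote_name(name):
    """Encode a name as one FQN segment, escaping '"' by doubling it.

    Assumption: only names containing '.' or '"' need quoting, so names
    without reserved characters encode exactly as they are.
    """
    raw = unquote_name(name)  # accept already-quoted input, making the call idempotent
    if "." in raw or '"' in raw:
        return '"' + raw.replace('"', '""') + '"'
    return raw
```

Since `quote_name("")` returns `""` (the empty string) in this sketch as well, quoting alone cannot catch empty names, so a separate emptiness check is still required.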
An empty pipeline task name (the schema sets no minLength on task.name) then produced an unhashable empty FQN segment ("parent.") that 500'd on the next FQN hash -- the same failure class as unrepresentable names. Treat null/empty as invalid so every nested-FQN setter (columns, tasks, fields, mlFeatures) rejects them up front with a 400. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(fqn): make pipeline task-FQN migration batched and observable Address review feedback on the one-time repair migration: - Performance: scan pipelines in pages of 1000 via listAfterWithOffset instead of selecting every id and calling findEntityById per pipeline, dropping the N+1 round-trips and the full id list held in memory. Only changed rows are written. - Observability: track scanned/repaired/failed counts and log a prominent WARN with up to 100 pipeline ids that could not be repaired, instead of swallowing each failure as a lone WARN, so operators get a concrete remediation list. - Search: document (completion log + schemaChanges) that repaired task FQNs are reflected in the search index after the standard post-upgrade reindex, matching existing FQN-fix migration behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(fqn): cover pipeline task-FQN repair migration Add MigrationUtilTest for the v11211 repairPipelineTaskFqns migration: repair correctness (re-derive unparseable/null task FQNs, leave valid ones untouched, skip task-less pipelines) and migration-path resilience -- a single unreadable row or a failing update must not abort the upgrade. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(fqn): cover repair-migration pagination across pages The existing repair-migration tests stubbed listAfterWithOffset for any offset and returned data only at offset 0, so every case exercised a single page. 
Add a test that stubs distinct pages by offset (0 -> page 1, 1000 -> page 2, 2000 -> empty) and asserts the second page is scanned and repaired, locking in correct limit/offset ordering and offset advancement in repairPipelineTaskFqns. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(migration): retarget pipeline task-FQN repair to 1.13.1 Move the pipeline task-FQN repair migration from 1.12.11 (package v11211) to 1.13.1 (package v1131): the native SQL placeholder dir, the mysql and postgres Migration handlers, the MigrationUtil, and its test. The framework derives the handler package from the version dir via MigrationFile.getVersionPackageName(), so 1.13.1 -> v1131; no logic changes. Pure data migration, no DDL. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(fqn): correct repair-migration summary counts; unscope 1.13.1 DDL Addresses code review on the 1.13.1 pipeline task-FQN repair migration. 1. Failed-to-persist pipelines were double-counted in the summary log. repairPipeline kept taskCount > 0 when pipelineDAO.update threw, so a row that never persisted was reported as both "re-derived N task FQNs" and "could not fix N pipeline(s)", overstating success. Reset taskCount to 0 on the persistence-failure path so only rows that actually persisted count as repaired. repairPipelineTaskFqns now returns a RepairSummary so the counts are asserted directly (doesNotCountFailedPersistAsRepaired fails without the fix). The Migration handlers ignore the return value. 2. Revert bootstrap/sql/migrations/native/1.13.1/{mysql,postgres}/ schemaChanges.sql to match main exactly. The intake_form_entity DDL there belongs to main (consumed by IntakeFormDAO) and arrived via the main merge, not this PR; only a local comment was added on top. Dropping that comment makes this PR's net change to those files zero and removes the "data migration only" text that contradicted the DDL. 
The FQN repair runs via the Java v1131 Migration handler and needs no SQL. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add migration for variuos childresn entities --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io> Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com> * fix(mcp): OAuth login fails with 400 when SSO returns id_token in URL fragment (open-metadata#29228) * fix(mcp): handle active-session shortcut and implicit-flow fragment at /mcp/callback Root cause: when a user has an active Google/Azure SSO session, AuthenticationCodeFlowHandler.handleLogin() bypasses pac4j entirely and calls sendRedirectWithToken() directly, committing the response as 302 /mcp/callback#id_token=... (token in URL fragment, no pac4j state). This caused three failures: 1. UserSSOOAuthProvider.handleSSOAuthorization() found no pac4j state in session (expected — pac4j was never invoked) and threw AuthorizeException. 2. AuthorizationHandler.exceptionally() turned the exception into an error redirect URL; handleAuthorizeRequest() then called sendRedirect() on an already-committed response → IllegalStateException: Committed at line 503. 3. McpCallbackServlet received the callback with #id_token=... in the URL fragment (browser-only, server never sees it), so both pac4jState and idTokenParam were null → 400 'missing state'. Fixes: - UserSSOOAuthProvider: check response.isCommitted() before throwing; return SSO_REDIRECT_INITIATED for the active-session path. - OAuthHttpStatelessServerTransportProvider: guard sendRedirect() with response.isCommitted() check to prevent Committed exception. - McpCallbackServlet: serve a JS fragment-extraction page instead of returning 400 — JS reads window.location.hash, extracts id_token, retries /mcp/callback?id_token=... so handleDirectIdTokenFlow() runs. 
Also adds debug logging throughout the MCP auth flow for easier diagnosis of future SSO/OAuth callback issues. * fix(mcp): address review — POST token from fragment, fix log field, import Collections - serveFragmentExtractionPage: switch from GET redirect to form POST so the id_token never appears in a URL, browser history, or access logs (RFC 6819 §5.3.5). Add doPost() that reads id_token from the body and delegates to handleDirectIdTokenFlow(). - McpCallbackServlet debug log: rename hasFragment→refererPresent with a boolean so the field is meaningful (server cannot observe the fragment). - UserSSOOAuthProvider: replace java.util.Collections FQN with import + simple name per project Java standards. * Update openmetadata-mcp/src/main/java/org/openmetadata/mcp/server/auth/handlers/McpCallbackServlet.java Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(mcp): address review — if/else in doPost, textContent XSS fix, unit tests - doPost: replace early-return guard clauses with if/else per project standard (one return statement per method) - serveFragmentExtractionPage: replace innerHTML with textContent in both error branches to eliminate the JS XSS antipattern (greptile finding) - McpCallbackServlet: package-private constructor for test injection; resolveSsoHandler() promoted to protected for subclass override in tests - Add McpCallbackServletTest: 7 tests covering serveFragmentExtractionPage (content-type, form POST shape, textContent usage) and doPost edge cases (null handler→503, null/empty id_token→400) - Add OAuthHttpStatelessServerTransportProviderTest: 4 tests covering sanitizeRedirectUrlForLogging (with/without query, null) and the committed-response guard * fix(mcp): CSRF protection on doPost; remove vacuous guard test; real CSRF tests Security (P1): doPost accepted cross-origin form submissions without any CSRF check. 
A malicious site holding a valid id_token for a different user could craft a form targeting /mcp/callback and hijack a victim's pending MCP auth session (victim's Claude Desktop authenticates as the attacker). Fix: add isOriginAllowed() — rejects any POST whose Origin header does not match the server's own base URL (resolved from MCP config or system settings). Absent Origin (same-origin browsers may omit for non-CORS requests) is treated as allowed. Package-private for testability. Test quality: remove the vacuous handleAuthorizeRequest guard test that never called the method under test and trivially passed — replaced with a comment noting the guard is covered at integration level. Replace with 4 real CSRF tests: Origin absent, matching, mismatched, and full doPost 403 path verification. * fix(mcp): CSRF default-port normalization and reject-on-unknown-origin * refactor(mcp): string constants + lazy-cache server origin - Extract all sendError message strings to package-visible static final constants (ERR_SSO_UNAVAILABLE, ERR_CSRF_ORIGIN_MISMATCH, ERR_MISSING_ID_TOKEN, ERR_CALLBACK_FAILED, ERR_MISSING_STATE, ERR_STATE_NOT_FOUND). Eliminates magic strings, gives callers a stable contract, catches typos at compile time. - Add cachedServerOrigin volatile field + getServerOrigin() lazy-init helper: resolveServerOrigin() is now called at most once per server lifetime instead of on every POST. Non-null results are cached; null results (transient DB miss at startup) are not cached so the next request retries — fail-secure without permanently breaking CSRF. - Update tests to reference constants instead of duplicating literals. - Add two cache-behaviour tests: verifies resolveServerOrigin() is called exactly once after a successful resolution, and called again on each request when it returns null (no caching of null). 
--------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Load learning drawer chunk without a page-loader flash (open-metadata#29280) LearningIcon always mounts the lazy LearningDrawer, so its Suspense fallback rendered the centered page Loader inline next to the icon on mount, before any user interaction. Make the fallback configurable on withSuspenseFallback (still defaults to the existing Loader, so all current callers are unchanged) and have LearningIcon opt out with null. The chunk now loads silently while the drawer stays mounted, preserving the close animation and the in-drawer resource player that relies on the drawer not unmounting. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fixes open-metadata#27945: surface exact/prefix matches first in QuickFilter aggregation (open-metadata#29231) * fix(search): surface exact/prefix matches first in QuickFilter aggregation Fixes open-metadata#27945 The /search/aggregate endpoint used a single terms aggregation with include: ".*term.*", ordered alphabetically with a fixed size=10. When more than 10 values matched the pattern, exact matches (e.g. "name") were silently dropped in favour of alphabetically-earlier contains matches (e.g. "first_name", "display_name"). Replace the single agg with three targeted sub-aggregations sent in one ES/OS round-trip: • __exact – include: "term" (size 1, O(1) dict lookup) • __prefix – include: "term.*" (size N, B-tree prefix scan) • __contains – include: ".*term.*" (size N, full wildcard, unchanged) The backend merges the three bucket lists in priority order (exact → prefix → contains), deduplicates, trims to the requested size, and rewrites the response under the original sterms#field key so the frontend requires no changes. Add SearchUtils helpers: isBestMatchSearchPattern, extractRawSearchValue, exactAggKey/prefixAggKey/containsAggKey, mergeBestMatchAggregations. 
Cover all helpers and merge edge cases in SearchUtilsTest (12 new tests). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(search): address PR review — safe fallback, dot-escaping, null guard - mergeBestMatchAggregations: on merge failure, degrade to renaming the __contains sub-agg to sterms#<field> so the UI always receives the key it expects; double-catches so even fallback parse failures are silent - buildBestMatchAggregations (ES + OS): escape '.' in rawValue before using as Lucene regexp include for exact and prefix sub-aggs, preventing field names like 'user.id' from being treated as wildcards and preventing unbalanced-regexp 500s - isBestMatchSearchPattern: guard against null input - SearchUtilsTest: fix two broken parametrized cases (.* → expected '', remove null-passing CSV row), add dedicated null test, add fallback-path and dot-escaping tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(search): add missing ObjectNode import in SearchUtilsTest Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(search): fix dot-escaping test expectations — trailing .* wildcard must not be escaped Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(search): cap oversized dataModel column tree at index time (open-metadata#29212) * fix(search): cap oversized dataModel column tree at index time A container/table with a pathologically large dataModel (a wide/nested schema with hundreds of thousands of columns) produces a multi-hundred-MB search document. The existing oversized-doc guard strips lineage but then ships the doc anyway, so the server stores ~196MB, parses it whole on every read and reindex, and can OOM. Extend stripDocMapIfOversized (live index) and stripLineageForSize (bulk reindex) to also strip the nested column children and the derived columnNames/columnNamesFuzzy when the doc is still over the cap after lineage stripping. 
Top-level columns are kept, so column search and the column grid still work; the full schema stays available via the entity API. Gated by size, so normal entities are untouched. The nested children are mapped enabled:false (stored, not indexed), so nothing searchable is lost. This bounds the indexed document at the source, so the server never holds the giant doc on read or reindex — complementing the read-side response streaming. * perf(search): compute oversized-doc size once per mutation in strip path Thread the serialized byte size through a local in stripDocMapIfOversized and stripLineageForSize instead of re-serializing the full document for each size gate and log line. On the oversized path this avoids redundant ~hundreds-of-MB pojoToJson/getBytes allocations exactly when memory and CPU are most constrained. Addresses review feedback on open-metadata#29212. * refactor(search): guard upstreamLineage strip and reuse post-strip size in logs Guard the upstreamLineage removal in stripLineageForSize with a null check to match stripDocMapIfOversized, avoiding a wasted full-doc serialization and a misleading WARN when the field is absent on an oversized doc. Store the post-strip serialized size in a local for the column-strip log lines instead of recomputing inline. Addresses review feedback on open-metadata#29212. * fix(ui): exclude dataModel from Explore/suggestion search payloads Complements the index-time column-tree strip (open-metadata#29212): exclude dataModel from Explore and search-suggestion payloads, lazy-fetch it in the container summary panel via getContainerByFQN when absent, and make the service-insights asset-count query aggregation-only (pageSize 0, fetchSource false). Ported from open-metadata#29200 so the index-side and UI-side fixes ship together. Relates to open-metadata#29210. 
* fix(search): address PR review — strip docs/logs + summary-panel error handling Backend: document the column-tree strip in stripLineageForSize JavaDoc and include columnNamesFuzzy in both oversized-doc WARN logs. UI: in the container summary panel's on-demand dataModel fetch, reset previously-fetched columns when the container changes (so the prior container's schema isn't shown) and surface fetch failures via showErrorToast instead of silently rendering 'No data'. Addresses review comments on open-metadata#29212. * fix(ui): clear stuck loader on container switch + test on-demand dataModel fetch Address follow-up review on the summary-panel lazy-fetch: clear isColumnsLoading in the effect's early-return so a now-cancelled in-flight getContainerByFQN can't leave the loader stuck on. Add tests covering the on-demand fetch (fires only when columns are absent from the search hit, not fetched when inline, and surfaces a toast on failure). Addresses review comments on open-metadata#29212. * fix(ui): prevent No-data flash on summary-panel mount + fix import order Lazily initialize isColumnsLoading (loading when columns must be fetched on demand) so the container summary panel shows the loader on first render instead of briefly flashing 'No data available' (greptile P1). Also reorder imports (TablePureUtils before ToastUtils, drop stray blank) to fix UI checkstyle. Addresses review on open-metadata#29212. 
* fix(ui): align glossary term Related Terms section inside the left panel (open-metadata#29284) * fix(ui): align glossary term Related Terms section inside the left panel * added unit test * fix lint checks * fix(ui): prevent ontology relations graph from crashing on large glossaries (open-metadata#29270) * fix(ui): prevent ontology relations graph from crashing on large glossaries * nit * fix(playwright): stop SSORenewal nightly flake from too-short token TTL (open-metadata#29268) The SSO Session Renewal suite swaps the server to a short SAML JWT TTL before logging in. At 10s, on a loaded CI runner the initial app bootstrap (loggedInUser, config, permissions) outran that window, so the first /permissions fetch 401'd mid-load; the silent refresh succeeded but the bootstrap request was not retried, leaving the app wedged on the global loading spinner. dropdown-profile never rendered and the renewal tests timed out, exhausting all retries on the 2026-06-22 nightly (run 27929661599). - Raise SHORT_ACCESS_TTL_SECONDS 10 -> 30 so the token outlives bootstrap. 30s stays under EXPIRY_THRESHOLD_MILLES (60s), so the proactive-renewal timer still fires immediately and the refresh-on-expiry behavior under test is unchanged. - Wait for dropdown-profile at the end of loginViaSaml so login is only "done" once the app shell has rendered, making any future bootstrap hang fail in beforeAll with a clear cause instead of mid-test. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(exasol): flush statistics before polling Exasol audit tables in integration tests (open-metadata#29278) * test(ui): update MCP Playwright specs for settings-driven MCP - AISettings.spec: MCP Server section now exposes only Enable + Origin Header URI (path/originValidation removed), so assert/fill the origin header instead of the removed mcp-server-path field. 
- McpChat.spec: enable MCP Chat via aiSettings.mcpChat.enabled (PUT /system/settings) and reset it afterwards, instead of installing the retired McpChatApplication app (which now 404s). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu> Co-authored-by: shrabantipaul-collate <shrabanti.paul@getcollate.io> Co-authored-by: Shrabanti Paul <shrabantipaul@Shrabantis-MacBook-Pro.local> Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com> Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io> Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com> Co-authored-by: Vishnu Jain <121681876+Vishnuujain@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Sid <30566406+siddhant1@users.noreply.github.com> Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com> Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com> Co-authored-by: Anujkumar Yadav <anujf0510@gmail.com> Co-authored-by: harshsoni2024 <64592571+harshsoni2024@users.noreply.github.com>
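The QuickFilter commit message above (open-metadata#29231) describes merging three bucket lists in priority order (exact, then prefix, then contains), deduplicating, and trimming to the requested size. A minimal Python sketch of that merge step follows. The actual change is Java code in `SearchUtils.mergeBestMatchAggregations`; the function name `merge_best_match` and the bucket shape used here are hypothetical and only illustrate the ordering rule.

```python
from typing import Dict, List

# Assumed bucket shape for illustration: {"key": "<value>", "doc_count": <int>}
Bucket = Dict[str, object]


def merge_best_match(
    exact: List[Bucket],
    prefix: List[Bucket],
    contains: List[Bucket],
    size: int,
) -> List[Bucket]:
    """Merge buckets in priority order, keeping the first occurrence of each key."""
    merged: List[Bucket] = []
    seen = set()
    for group in (exact, prefix, contains):
        for bucket in group:
            key = bucket["key"]
            if key in seen:
                continue  # already surfaced by a higher-priority group
            seen.add(key)
            merged.append(bucket)
    # Trim to the requested size; a non-positive size yields no buckets.
    return merged[: max(size, 0)]


if __name__ == "__main__":
    exact = [{"key": "name", "doc_count": 5}]
    prefix = [{"key": "name", "doc_count": 5}, {"key": "name_first", "doc_count": 2}]
    contains = [
        {"key": "display_name", "doc_count": 7},
        {"key": "first_name", "doc_count": 4},
        {"key": "name", "doc_count": 5},
        {"key": "name_first", "doc_count": 2},
    ]
    print([b["key"] for b in merge_best_match(exact, prefix, contains, 3)])
```

With this ordering, an exact match such as "name" cannot be pushed out of the result by alphabetically earlier contains matches, which is the failure the commit message describes.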

Code-Eat-Rabbit marked this pull request as ready for review October 12, 2025 02:56

Code-Eat-Rabbit closed this Oct 12, 2025

cursor Bot reviewed Oct 12, 2025

Code-Eat-Rabbit deleted the cursor/fix-pydantic-rootmodel-extra-config-error-cd1f branch October 12, 2025 02:59

Code-Eat-Rabbit wants to merge 1 commit into issue-22392-poc-patch from cursor/fix-pydantic-rootmodel-extra-config-error-cd1f

2 participants

Conversation

Code-Eat-Rabbit commented Oct 11, 2025

cursor Bot commented Oct 11, 2025

cursor Bot left a comment

cursor Bot Oct 12, 2025

Bug: Incomplete Model Config Handling

cursor Bot Oct 12, 2025

Bug: RootModel Flag Reset Logic Incomplete
