Fix 11 NLU/NLG source-divergence test failures by Copilot · Pull Request #92 · Steake/GodelOS

Copilot · 2026-03-05T19:48:42Z

Source implementations diverged from test contracts across 6 files in godelOS/nlu_nlg/. All 11 failures are genuine logic bugs, not test issues.

NLU fixes

lexical_analyzer_parser.py / semantic_interpreter.py — _is_dependent_on attempted to derive head token index by parsing the dep label string as an integer, which is nonsensical ("nsubj" is not a number). Added head_i: Optional[int] to Token (populated from token.head.i in spaCy), with a dep-label heuristic fallback when head_i is None (i.e. manually-constructed test tokens).
Review-thread follow-up: hoisted dependency-label heuristic sets to module-level constants and reused a single source of truth by importing the constants into semantic_interpreter.py, reducing recursive-call allocations and drift risk. Also corrected misleading inline comments in _infer_head to match actual ROOT-based behavior.
discourse_manager.py — _determine_topic filtered on pos == "NOUN" only, missing proper nouns. Added "PROPN". Separately, turn_count was double-incremented per utterance (once in add_dialogue_act, once in _update_dialogue_state).
lexicon_ontology_linker.py — link_term_to_concept short-circuited with return None when no lexical entry existed, skipping the ontology concept search that handles synonyms via lexical_mappings.

NLG fixes

content_planner.py — _determine_message_type only inspected top-level AST nodes; a ModalOpNode nested inside a ConnectiveNode was invisible. Added recursive descent through connective operands. Separately, _process_node recorded discourse relations using fabricated child IDs (f"{parent}_arg_{i}") that never matched the counter-based IDs actually assigned to child elements. Changed _process_node to return its element ID so callers use the real value.
Review-thread follow-up: added explicit handling for LambdaNode and DefinitionNode in _process_node, and added a fail-fast error path for unsupported node types with contextual error messaging to prevent silently inconsistent MessageSpecifications.
pipeline.py — self.discourse_manager.context fails on a MagicMock(spec=DiscourseStateManager) because context is an instance attribute, not a class attribute. Switched to getattr(..., 'context', None).

Results

79 passed, 5 skipped (integration), 0 failed across tests/nlu_nlg/. No regressions.

Original prompt

This section details on the original issue you should resolve

<issue_title>Fix: 11 remaining NLU/NLG source-divergence test failures (Phase 2 deferred)</issue_title>
<issue_description>## Context

PR #90 resolved 55 of the 66 remaining test failures from issue #73. The following 11 failures were deferred as they require genuine module-level source rewrites, not test patches.

Root causes identified

Source implementations have diverged from the interfaces the tests were written against

Method signatures, return types, or class structures no longer match test expectations

These are not environmental — spaCy skip guards were applied in Fix: remaining 66 test failures from Phase 2 (Categories 1–4) #90 for the 7 environmental failures; these 11 are genuine code divergence

What needs doing

Work through each failing test in tests/nlu_nlg/nlu/ and tests/nlu_nlg/nlg/ file by file:

Read the test expectations

Read the current source implementation

Align the source to the test contract (preferred) OR update the test if the source represents correct intended behaviour

Do not skip — these need to actually pass

Files in scope

tests/nlu_nlg/nlu/ — NLU parser/pipeline tests

tests/nlu_nlg/nlg/ — NLG generator tests

Corresponding source: godelOS/nlu_nlg/

Acceptance criteria

All 11 previously-failing tests pass without skipif guards.

Related: #73, #90</issue_description>

Comments on the Issue (you are @copilot in this section)
@Steake @copilot Please work through the 11 remaining NLU/NLG source-divergence test failures described in this issue. Work file by file through `tests/nlu_nlg/nlu/` and `tests/nlu_nlg/nlg/`, align source implementations to test contracts, and ensure all 11 tests pass without skip guards.

Custom agent used: Oli
Oliver Chimera Profile

Fixes Fix: 11 remaining NLU/NLG source-divergence test failures (Phase 2 deferred) #91

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

- discourse_manager.py: include PROPN in topic detection, fix double turn_count increment - lexical_analyzer_parser.py: add head_i field to Token, rewrite _is_dependent_on with proper head-finding - lexicon_ontology_linker.py: search ontology concepts when no lexical entry found (synonym support) - semantic_interpreter.py: rewrite _is_dependent_on with same head_i/heuristic approach - content_planner.py: recursive _determine_message_type for nested modals, fix _process_node ID tracking - pipeline.py: use getattr for discourse_manager.context access to handle mocks gracefully Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

Steake

Review: PR #92 — Fix 11 NLU/NLG source-divergence test failures

Verdict: Strong. Verify truncated diff in lexical_analyzer_parser.py, then merge.

File-by-file assessment

godelOS/nlu_nlg/nlu/lexical_analyzer_parser.py and semantic_interpreter.py
The original _is_dependent_on attempted int(t.dep.split('_')[0]) — parsing a dependency label string such as "nsubj" as an integer, which is not merely wrong but demonstrates a fundamental misunderstanding of what a dependency parse tree is. The replacement correctly uses token.head_i (populated from spaCy's token.head.i) with a heuristic label-based fallback for manually-constructed test tokens that have no head_i. The fallback _infer_head heuristic — classifying dep labels into verb_deps, noun_deps, prep_obj_deps sets — is sensible. One item: the patch view truncates mid-definition of the noun_deps set; confirm commit 51b0f0cd is complete in the actual file before merging.

godelOS/nlu_nlg/nlu/discourse_manager.py
Two independent fixes: (1) turn_count was incremented in both add_dialogue_act and _update_dialogue_state, producing a count of 2 per utterance where 1 was expected. The duplicate removed. (2) Topic detection filtered on pos == "NOUN" only; "PROPN" (proper noun) added. Both correct.

godelOS/nlu_nlg/nlu/lexicon_ontology_linker.py
link_term_to_concept short-circuited with return None when no lexical entry existed, skipping the fallback ontology concept search entirely. Restructured to if entry: [lexical path] followed unconditionally by the ontology fallback. Correct.

godelOS/nlu_nlg/nlg/content_planner.py
Two fixes: (1) _process_node was recording discourse relations using fabricated IDs (f"{element_id}_arg_{i}") that never matched the counter-based IDs actually assigned to child elements — the relations were silently broken. Fixed by returning the real element ID from _process_node and using it at the call site. (2) _determine_message_type did not recurse into ConnectiveNode operands, making any ModalOpNode nested inside a connective invisible to type detection. Recursive descent added.

godelOS/nlu_nlg/nlg/pipeline.py
self.discourse_manager.context fails on MagicMock(spec=DiscourseStateManager) because instance attributes set in __init__ are not part of a spec mock's attribute set. getattr(..., 'context', None) is the correct defensive form. Minimal, correct.

Claimed result

79 passed, 5 skipped (integration), 0 failed across tests/nlu_nlg/. No regressions.

Fixes #91.

Steake · 2026-03-05T21:16:56Z

Verify truncated diff in lexical_analyzer_parser.py

@copilot Please address the comments queries and remedy any issues.

Copilot · 2026-03-05T21:21:18Z

Verify truncated diff in lexical_analyzer_parser.py

@copilot Please address the comments queries and remedy any issues.

Verified — the file is complete. The truncation was only in GitHub's diff view. In the actual committed file (2773ae1), both lexical_analyzer_parser.py (lines 344–346) and semantic_interpreter.py (lines 515–517) have the full noun_deps set definition:

noun_deps = {"det", "amod", "compound", "nummod", "poss", "nmod",
             "appos", "acl", "relcl", "case"}

The _infer_head static method is also fully present in both files. Re-ran the full suite just now: 79 passed, 5 skipped, 0 failed.

Copilot

Pull request overview

Fixes NLU/NLG source implementations that diverged from existing unit-test contracts in godelOS/nlu_nlg/, restoring correct dependency traversal, discourse tracking, ontology linking, and NLG content planning behavior.

Changes:

Add Token.head_i (spaCy head index) and use it for dependency traversal with a heuristic fallback when unavailable.
Fix discourse topic detection for proper nouns and prevent double-incrementing of dialogue turn count.
Improve NLG content planning by recursively detecting nested modal operators and by using real (returned) child element IDs for discourse relations; harden pipeline access to discourse_manager.context for mocks.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
godelOS/nlu_nlg/nlu/semantic_interpreter.py	Replace broken dep-label parsing with `head_i`-based traversal + heuristic fallback for dependency checks.
godelOS/nlu_nlg/nlu/lexicon_ontology_linker.py	Allow ontology concept matching even when no lexical entry exists (synonym/lexical mapping path).
godelOS/nlu_nlg/nlu/lexical_analyzer_parser.py	Add `Token.head_i` populated from spaCy and update dependency traversal accordingly.
godelOS/nlu_nlg/nlu/discourse_manager.py	Fix topic detection to include `PROPN` and remove extra dialogue turn increment.
godelOS/nlu_nlg/nlg/pipeline.py	Avoid direct attribute access to `context` to support mocked discourse managers.
godelOS/nlu_nlg/nlg/content_planner.py	Recursively detect nested modal operators; fix discourse relation child IDs by returning actual element IDs.

Copilot · 2026-03-05T21:47:44Z

godelOS/nlu_nlg/nlu/lexical_analyzer_parser.py

+                    verb_deps: set, noun_deps: set, prep_obj_deps: set) -> Optional[int]:
+        """Infer the head token index from dependency label heuristics."""
+        if token.dep in verb_deps:
+            # Look for the ROOT token or nearest verb


The _infer_head inline comment says “Look for the ROOT token or nearest verb”, but the code only checks for a ROOT token. Please update the comment (or implement the nearest-verb behavior) so the heuristic is accurately documented.

Suggested change

# Look for the ROOT token or nearest verb

# Look for the ROOT token

Copilot · 2026-03-05T21:47:45Z

godelOS/nlu_nlg/nlg/content_planner.py

+                agent_id = self._process_node(node.agent_or_world, message_spec)
+                message_spec.add_discourse_relation("agent", element_id, agent_id)
+
+        return element_id


_process_node now documents that it returns “the ID of the created content element” and always returns element_id, but there’s no else/error path for unsupported node types (e.g., LambdaNode, DefinitionNode are imported). If an unsupported node is encountered, callers will record discourse relations pointing to an element ID that was never added to main_content/supporting_content. Consider explicitly handling the remaining node types or raising/logging on unsupported nodes to avoid silently producing inconsistent MessageSpecifications.

Copilot · 2026-03-05T21:47:45Z

godelOS/nlu_nlg/nlu/semantic_interpreter.py

+        # Find the head using heuristic
+        inferred_head_i = self._infer_head(token, all_tokens,
+                                           verb_deps, noun_deps, prep_obj_deps)
+        if inferred_head_i is not None:
+            if inferred_head_i == head_token.i:


The _is_dependent_on fallback heuristic and _infer_head logic appears duplicated between lexical_analyzer_parser.py and semantic_interpreter.py. Consider extracting the shared head-inference helper (and the dep-label sets) into a single utility to avoid the two copies drifting over time.

Copilot · 2026-03-05T21:47:45Z

godelOS/nlu_nlg/nlu/semantic_interpreter.py

+        # Deps that attach to a verb (typically the ROOT)
+        verb_deps = {
+            "nsubj", "nsubjpass", "dobj", "iobj", "prep", "advmod", "punct",
+            "cc", "conj", "aux", "auxpass", "neg", "ccomp", "xcomp", "advcl",
+            "attr", "agent", "oprd", "acomp", "prt", "mark", "expl",
+            "npadvmod", "intj", "csubj", "csubjpass", "dative",
+        }
+
+        # Deps that attach to a noun
+        noun_deps = {"det", "amod", "compound", "nummod", "poss", "nmod",
+                     "appos", "acl", "relcl", "case"}
+
+        # Deps that attach to a preposition
+        prep_obj_deps = {"pobj"}


In _is_dependent_on, verb_deps/noun_deps/prep_obj_deps are re-created on every call. Since this method can be called for every token (and recursively), consider making these sets class-level constants (or module-level) to avoid repeated allocations and make the heuristic easier to keep consistent.

Copilot · 2026-03-05T21:47:45Z

godelOS/nlu_nlg/nlu/semantic_interpreter.py

+            for t in all_tokens:
+                if t.dep == "ROOT":
+                    return t.i


The comment in _infer_head says “Look for the ROOT token or nearest verb”, but the implementation only looks for dep == "ROOT" and never attempts to find a nearest verb. Either adjust the comment to match the behavior or implement the “nearest verb” fallback to avoid misleading future maintainers.

Suggested change

for t in all_tokens:

if t.dep == "ROOT":

return t.i

root_i: Optional[int] = None

for t in all_tokens:

if t.dep == "ROOT":

root_i = t.i

break

if root_i is not None:

return root_i

# Fallback: find the nearest verb (e.g., VERB/AUX) to this token

best_verb_i: Optional[int] = None

for t in all_tokens:

if t.pos in ("VERB", "AUX"):

if best_verb_i is None or abs(t.i - token.i) < abs(best_verb_i - token.i):

best_verb_i = t.i

if best_verb_i is not None:

return best_verb_i

Copilot · 2026-03-05T21:47:45Z

godelOS/nlu_nlg/nlu/lexical_analyzer_parser.py

+        # Deps that attach to a verb (typically the ROOT)
+        verb_deps = {
+            "nsubj", "nsubjpass", "dobj", "iobj", "prep", "advmod", "punct",
+            "cc", "conj", "aux", "auxpass", "neg", "ccomp", "xcomp", "advcl",
+            "attr", "agent", "oprd", "acomp", "prt", "mark", "expl",
+            "npadvmod", "intj", "csubj", "csubjpass", "dative",
+        }
+
+        # Deps that attach to a noun
+        noun_deps = {"det", "amod", "compound", "nummod", "poss", "nmod",
+                     "appos", "acl", "relcl", "case"}

-        return False
+        # Deps that attach to a preposition
+        prep_obj_deps = {"pobj"}
+


verb_deps/noun_deps/prep_obj_deps are allocated inside _is_dependent_on each time it’s called (and it can recurse). Consider hoisting these to class/module constants to reduce per-token overhead and keep the heuristic definitions centralized.

Steake · 2026-03-05T22:00:30Z

@copilot apply changes based on the comments in this thread

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

…andling Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

Copilot · 2026-03-05T22:10:11Z

@copilot apply changes based on the comments in this thread

Applied the review-thread follow-ups in commits d94e0b3 and 1e7d394:

hoisted dependency-label heuristic sets out of recursive calls
corrected the misleading ROOT/nearest-verb comment
added explicit LambdaNode/DefinitionNode handling in content planning and fail-fast for unsupported node types
centralized semantic interpreter heuristic constants by importing from lexical analyzer/parser

Validation: python -m pytest tests/nlu_nlg/ -v --no-cov → 79 passed, 5 skipped, 0 failed.

Resolves all tractable failures from issue #73. Cat 1 — query_replay_harness: async methods, unified data kwarg, complete_recording rename, public attr aliases, ProcessingStep enum extension. Cat 2 — knowledge_models: 'manual' added to ImportSource Literal. Cat 3 — test_frontend_modules: godelos-frontend → svelte-frontend. Cat 4 — env skip guards for Z3 (SMT) removed for nlu_nlg after #92 fixed source; Z3 skip on test_smt_interface_enhanced retained (infrastructure absent). Also carries godelOS/core_kr, inference_engine, learning_system, symbol_grounding fixes from the #74 corpus. 11 NLU/NLG source-divergence failures resolved separately in #92 (merged prior).

Initial plan

7c85f27

Copilot AI assigned Copilot and Steake Mar 5, 2026

Copilot started work on behalf of Steake March 5, 2026 19:48 View session

Copilot AI and others added 2 commits March 5, 2026 20:05

refactor: remove unused head_token param from _infer_head

2773ae1

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix 11 remaining NLU/NLG source-divergence test failures~~ Fix 11 NLU/NLG source-divergence test failures Mar 5, 2026

Copilot finished work on behalf of Steake March 5, 2026 20:11

Steake reviewed Mar 5, 2026

View reviewed changes

Steake mentioned this pull request Mar 5, 2026

Fix: remaining 66 test failures from Phase 2 (Categories 1–4) #90

Merged

Copilot started work on behalf of Steake March 5, 2026 21:17 View session

Copilot finished work on behalf of Steake March 5, 2026 21:21

Steake marked this pull request as ready for review March 5, 2026 21:42

Copilot AI review requested due to automatic review settings March 5, 2026 21:42

Copilot started reviewing on behalf of Steake March 5, 2026 21:42 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Copilot started work on behalf of Steake March 5, 2026 22:00 View session

Copilot AI and others added 2 commits March 5, 2026 22:06

refactor: address PR review thread feedback in NLU/NLG modules

d94e0b3

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

fix: apply review-thread follow-ups for NLU/NLG heuristics and node h…

1e7d394

…andling Co-authored-by: Steake <530040+Steake@users.noreply.github.com>

Copilot finished work on behalf of Steake March 5, 2026 22:11

Steake merged commit a1bd9eb into main Mar 5, 2026

Steake added a commit that referenced this pull request Mar 5, 2026

chore: remove spaCy skip guard — NLU/NLG source fixes landed in #92

5fe1101

Steake added a commit that referenced this pull request Mar 5, 2026

chore: remove spaCy skip guard — NLU/NLG source fixes landed in #92

3f079af

This was referenced Mar 5, 2026

Fix: 167 pre-existing runtime test failures (Phase 2) #73

Closed

Fix: 11 remaining NLU/NLG source-divergence test failures (Phase 2 deferred) #91

Closed

Steake mentioned this pull request Mar 6, 2026

test: End-to-end integration test suite for cognitive subsystems #101

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix 11 NLU/NLG source-divergence test failures#92

Fix 11 NLU/NLG source-divergence test failures#92
Steake merged 5 commits intomainfrom
copilot/fix-nlu-nlg-test-failures

Copilot AI commented Mar 5, 2026 •

edited

Loading

Uh oh!

Steake left a comment

Uh oh!

Steake commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Steake commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	# Look for the ROOT token or nearest verb
	# Look for the ROOT token

-            for t in all_tokens:
-                if t.dep == "ROOT":
-                    return t.i
+            root_i: Optional[int] = None
+            for t in all_tokens:
+                if t.dep == "ROOT":
+                    root_i = t.i
+                    break
+            if root_i is not None:
+                return root_i
+            # Fallback: find the nearest verb (e.g., VERB/AUX) to this token
+            best_verb_i: Optional[int] = None
+            for t in all_tokens:
+                if t.pos in ("VERB", "AUX"):
+                    if best_verb_i is None or abs(t.i - token.i) < abs(best_verb_i - token.i):
+                        best_verb_i = t.i
+            if best_verb_i is not None:
+                return best_verb_i

Conversation

Copilot AI commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

NLU fixes

NLG fixes

Results

Root causes identified

What needs doing

Files in scope

Acceptance criteria

Comments on the Issue (you are @copilot in this section)

Uh oh!

Steake left a comment

Choose a reason for hiding this comment

Review: PR #92 — Fix 11 NLU/NLG source-divergence test failures

File-by-file assessment

Claimed result

Uh oh!

Steake commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Steake commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Mar 5, 2026 •

edited

Loading