Fix 11 NLU/NLG source-divergence test failures #92

Merged
Steake merged 5 commits into main from copilot/fix-nlu-nlg-test-failures
Mar 5, 2026

Conversation

Contributor

Copilot AI commented Mar 5, 2026

Source implementations diverged from test contracts across 6 files in godelOS/nlu_nlg/. All 11 failures are genuine logic bugs, not test issues.

NLU fixes

  • lexical_analyzer_parser.py / semantic_interpreter.py: _is_dependent_on attempted to derive the head token index by parsing the dep label string as an integer, which is nonsensical ("nsubj" is not a number). Added head_i: Optional[int] to Token (populated from token.head.i in spaCy), with a dep-label heuristic fallback when head_i is None (i.e. for manually constructed test tokens).

  • Review-thread follow-up: hoisted dependency-label heuristic sets to module-level constants and reused a single source of truth by importing the constants into semantic_interpreter.py, reducing recursive-call allocations and drift risk. Also corrected misleading inline comments in _infer_head to match actual ROOT-based behavior.

  • discourse_manager.py: _determine_topic filtered on pos == "NOUN" only, missing proper nouns. Added "PROPN". Separately, turn_count was double-incremented per utterance (once in add_dialogue_act, once in _update_dialogue_state); the duplicate increment was removed.

  • lexicon_ontology_linker.py: link_term_to_concept short-circuited with return None when no lexical entry existed, skipping the ontology concept search that handles synonyms via lexical_mappings.
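
As an illustration of the head_i fix in the first bullet, here is a minimal sketch of dependency traversal driven by an explicit head index. The Token below is a simplified stand-in, not the project's class; its field layout and the free function is_dependent_on are assumptions, and the dep-label heuristic fallback is deliberately left out (a token without head_i simply yields False).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """Simplified stand-in for the project's Token (field layout is assumed)."""
    i: int                        # position of the token in the sentence
    text: str
    dep: str                      # dependency label, e.g. "nsubj"
    head_i: Optional[int] = None  # index of the syntactic head, if known


def is_dependent_on(token: Token, head: Token, tokens: List[Token]) -> bool:
    """Return True if `head` is a (transitive) ancestor of `token`."""
    seen = set()
    current = token
    while current.head_i is not None and current.i not in seen:
        seen.add(current.i)  # guard against malformed, cyclic head links
        if current.head_i == head.i and current.i != head.i:
            return True
        if current.head_i == current.i:  # in spaCy the ROOT is its own head
            return False
        current = tokens[current.head_i]
    return False


# "The cat sleeps": The -> cat -> sleeps (ROOT)
tokens = [
    Token(0, "The", "det", head_i=1),
    Token(1, "cat", "nsubj", head_i=2),
    Token(2, "sleeps", "ROOT", head_i=2),
]
print(is_dependent_on(tokens[0], tokens[2], tokens))  # determiner depends on the verb via "cat"
```

The point of the design is that the head comes from an integer index rather than from the label string, so nothing ever has to interpret "nsubj" as a number.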

NLG fixes

  • content_planner.py: _determine_message_type only inspected top-level AST nodes; a ModalOpNode nested inside a ConnectiveNode was invisible. Added recursive descent through connective operands. Separately, _process_node recorded discourse relations using fabricated child IDs (f"{parent}_arg_{i}") that never matched the counter-based IDs actually assigned to child elements. Changed _process_node to return its element ID so callers use the real value.

  • Review-thread follow-up: added explicit handling for LambdaNode and DefinitionNode in _process_node, and added a fail-fast error path for unsupported node types with contextual error messaging to prevent silently inconsistent MessageSpecifications.

  • pipeline.py: self.discourse_manager.context fails on a MagicMock(spec=DiscourseStateManager) because context is an instance attribute, not a class attribute. Switched to getattr(..., 'context', None).
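
The recursive descent described for _determine_message_type can be sketched roughly as follows. The node classes are reduced to bare dataclasses, AtomNode is a hypothetical leaf type, and the returned strings "modal" and "statement" are placeholders rather than the project's actual message types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AtomNode:
    """Hypothetical leaf node used only for this illustration."""
    name: str


@dataclass
class ModalOpNode:
    operator: str
    proposition: object


@dataclass
class ConnectiveNode:
    connective: str
    operands: List[object] = field(default_factory=list)


def contains_modal(node: object) -> bool:
    """Search a node and, for connectives, all of its operands recursively."""
    if isinstance(node, ModalOpNode):
        return True
    if isinstance(node, ConnectiveNode):
        return any(contains_modal(op) for op in node.operands)
    return False


def determine_message_type(nodes: List[object]) -> str:
    """Placeholder type names; the real planner's categories may differ."""
    return "modal" if any(contains_modal(n) for n in nodes) else "statement"


nested = ConnectiveNode("AND", [AtomNode("p"), ModalOpNode("BELIEVES", AtomNode("q"))])
print(determine_message_type([nested]))
```

A top-level-only check would classify `nested` as a plain statement, which is the bug the bullet describes.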

Results

79 passed, 5 skipped (integration), 0 failed across tests/nlu_nlg/. No regressions.

Original prompt

This section details the original issue you should resolve

<issue_title>Fix: 11 remaining NLU/NLG source-divergence test failures (Phase 2 deferred)</issue_title>
<issue_description>## Context

PR #90 resolved 55 of the 66 remaining test failures from issue #73. The following 11 failures were deferred as they require genuine module-level source rewrites, not test patches.

Root causes identified

  • Source implementations have diverged from the interfaces the tests were written against
  • Method signatures, return types, or class structures no longer match test expectations
  • These are not environmental: spaCy skip guards were applied in #90 (Fix: remaining 66 test failures from Phase 2 (Categories 1–4)) for the 7 environmental failures; these 11 are genuine code divergence

What needs doing

Work through each failing test in tests/nlu_nlg/nlu/ and tests/nlu_nlg/nlg/ file by file:

  1. Read the test expectations
  2. Read the current source implementation
  3. Align the source to the test contract (preferred) OR update the test if the source represents correct intended behaviour
  4. Do not skip — these need to actually pass

Files in scope

  • tests/nlu_nlg/nlu/ — NLU parser/pipeline tests
  • tests/nlu_nlg/nlg/ — NLG generator tests
  • Corresponding source: godelOS/nlu_nlg/

Acceptance criteria

All 11 previously-failing tests pass without skipif guards.

Related: #73, #90</issue_description>

Comments on the Issue (you are @copilot in this section)

@Steake @copilot Please work through the 11 remaining NLU/NLG source-divergence test failures described in this issue. Work file by file through `tests/nlu_nlg/nlu/` and `tests/nlu_nlg/nlg/`, align source implementations to test contracts, and ensure all 11 tests pass without skip guards.

Custom agent used: Oli
Oliver Chimera Profile



Copilot AI and others added 2 commits March 5, 2026 20:05
- discourse_manager.py: include PROPN in topic detection, fix double turn_count increment
- lexical_analyzer_parser.py: add head_i field to Token, rewrite _is_dependent_on with proper head-finding
- lexicon_ontology_linker.py: search ontology concepts when no lexical entry found (synonym support)
- semantic_interpreter.py: rewrite _is_dependent_on with same head_i/heuristic approach
- content_planner.py: recursive _determine_message_type for nested modals, fix _process_node ID tracking
- pipeline.py: use getattr for discourse_manager.context access to handle mocks gracefully

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix 11 remaining NLU/NLG source-divergence test failures Fix 11 NLU/NLG source-divergence test failures Mar 5, 2026
Owner

@Steake Steake left a comment

Review: PR #92 — Fix 11 NLU/NLG source-divergence test failures

Verdict: Strong. Verify truncated diff in lexical_analyzer_parser.py, then merge.

File-by-file assessment

godelOS/nlu_nlg/nlu/lexical_analyzer_parser.py and semantic_interpreter.py
The original _is_dependent_on attempted int(t.dep.split('_')[0]) — parsing a dependency label string such as "nsubj" as an integer, which is not merely wrong but demonstrates a fundamental misunderstanding of what a dependency parse tree is. The replacement correctly uses token.head_i (populated from spaCy's token.head.i) with a heuristic label-based fallback for manually-constructed test tokens that have no head_i. The fallback _infer_head heuristic — classifying dep labels into verb_deps, noun_deps, prep_obj_deps sets — is sensible. One item: the patch view truncates mid-definition of the noun_deps set; confirm commit 51b0f0cd is complete in the actual file before merging.
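
To make the fallback concrete, here is a hedged sketch of a label-based head heuristic. Only the "verb deps attach to ROOT" rule is confirmed by the review thread; the rules used here for noun_deps (nearest following noun) and prep_obj_deps (nearest preceding preposition) are assumptions, the label sets are abbreviated, and the Token class is a stand-in.

```python
from dataclasses import dataclass
from typing import List, Optional

# Abbreviated label sets; the PR's sets are longer.
VERB_DEPS = frozenset({"nsubj", "dobj", "prep", "advmod", "aux"})
NOUN_DEPS = frozenset({"det", "amod", "compound"})
PREP_OBJ_DEPS = frozenset({"pobj"})


@dataclass
class Token:
    """Stand-in token for manually constructed test data (no head_i)."""
    i: int
    text: str
    pos: str
    dep: str


def infer_head(token: Token, tokens: List[Token]) -> Optional[int]:
    """Guess the head index from the dep label; `tokens` is ordered by index."""
    if token.dep in VERB_DEPS:
        # Confirmed by the thread: look for the ROOT token.
        return next((t.i for t in tokens if t.dep == "ROOT"), None)
    if token.dep in NOUN_DEPS:
        # Assumption: modifiers attach to the nearest following noun.
        return next((t.i for t in tokens
                     if t.i > token.i and t.pos in ("NOUN", "PROPN")), None)
    if token.dep in PREP_OBJ_DEPS:
        # Assumption: a pobj attaches to the nearest preceding preposition.
        return next((t.i for t in reversed(tokens)
                     if t.i < token.i and t.pos == "ADP"), None)
    return None


sentence = [
    Token(0, "The", "DET", "det"),
    Token(1, "cat", "NOUN", "nsubj"),
    Token(2, "sat", "VERB", "ROOT"),
    Token(3, "on", "ADP", "prep"),
    Token(4, "mats", "NOUN", "pobj"),
]
print([infer_head(t, sentence) for t in sentence])
```

By contrast, the original approach amounted to calling int() on a label such as "nsubj", which raises ValueError for every real dependency label.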

godelOS/nlu_nlg/nlu/discourse_manager.py
Two independent fixes: (1) turn_count was incremented in both add_dialogue_act and _update_dialogue_state, producing a count of 2 per utterance where 1 was expected. The duplicate was removed. (2) Topic detection filtered on pos == "NOUN" only; "PROPN" (proper noun) added. Both correct.
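
A small sketch of both fixes, under stated assumptions: the thread does not say which of the two increments was kept, so this version arbitrarily counts turns only in _update_dialogue_state; the class layout, the (text, pos) tuple format, and the "first matching token" topic rule are illustrative, not the project's implementation.

```python
from typing import List, Optional, Tuple

TOPIC_POS = ("NOUN", "PROPN")  # proper nouns now qualify as topics


class DiscourseManager:
    """Simplified stand-in for the real discourse manager."""

    def __init__(self) -> None:
        self.turn_count = 0
        self.acts: List[str] = []

    def add_dialogue_act(self, act: str) -> None:
        self.acts.append(act)
        # No increment here: counting happens in exactly one place.
        self._update_dialogue_state(act)

    def _update_dialogue_state(self, act: str) -> None:
        self.turn_count += 1


def determine_topic(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first token whose POS tag is a noun or proper noun."""
    return next((text for text, pos in tagged if pos in TOPIC_POS), None)


manager = DiscourseManager()
manager.add_dialogue_act("inform")
manager.add_dialogue_act("question")
print(manager.turn_count, determine_topic([("Paris", "PROPN"), ("is", "AUX")]))
```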

godelOS/nlu_nlg/nlu/lexicon_ontology_linker.py
link_term_to_concept short-circuited with return None when no lexical entry existed, skipping the fallback ontology concept search entirely. Restructured to if entry: [lexical path] followed unconditionally by the ontology fallback. Correct.
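
The restructured control flow might look like the sketch below. The Linker class, its two dictionaries, and the exact shape of lexical_mappings (concept ID to surface forms) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


class Linker:
    """Hypothetical, pared-down linker showing only the control flow."""

    def __init__(self, lexicon: Dict[str, str],
                 lexical_mappings: Dict[str, List[str]]) -> None:
        self.lexicon = lexicon                    # term -> concept ID
        self.lexical_mappings = lexical_mappings  # concept ID -> surface forms

    def link_term_to_concept(self, term: str) -> Optional[str]:
        entry = self.lexicon.get(term)
        if entry:
            return entry  # lexical path
        # Previously an early `return None` sat here; now control falls
        # through to the ontology search, which is how synonyms resolve.
        for concept_id, forms in self.lexical_mappings.items():
            if term in forms:
                return concept_id
        return None


linker = Linker({"dog": "Dog"}, {"Dog": ["dog", "hound"]})
print(linker.link_term_to_concept("hound"))
```

Here "hound" has no lexical entry of its own, so it only resolves because the ontology search still runs.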

godelOS/nlu_nlg/nlg/content_planner.py
Two fixes: (1) _process_node was recording discourse relations using fabricated IDs (f"{element_id}_arg_{i}") that never matched the counter-based IDs actually assigned to child elements — the relations were silently broken. Fixed by returning the real element ID from _process_node and using it at the call site. (2) _determine_message_type did not recurse into ConnectiveNode operands, making any ModalOpNode nested inside a connective invisible to type detection. Recursive descent added.
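
The ID mismatch in fix (1) is easy to reproduce in miniature. In this sketch MessageSpec, the "element_N" ID format, the "argument" relation name, and the (label, children) tuple encoding of nodes are all invented for illustration; only the idea of returning the real counter-based ID comes from the PR.

```python
import itertools
from typing import Dict, List, Tuple


class MessageSpec:
    """Toy message specification with counter-based element IDs."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.elements: Dict[str, str] = {}
        self.relations: List[Tuple[str, str, str]] = []

    def add_element(self, content: str) -> str:
        element_id = f"element_{next(self._counter)}"
        self.elements[element_id] = content
        return element_id

    def add_discourse_relation(self, kind: str, source: str, target: str) -> None:
        self.relations.append((kind, source, target))


def process_node(node: tuple, spec: MessageSpec) -> str:
    """Add `node` and its children; return the ID that was actually assigned."""
    label, children = node
    element_id = spec.add_element(label)
    for child in children:
        child_id = process_node(child, spec)  # real ID, not a guessed one
        spec.add_discourse_relation("argument", element_id, child_id)
    return element_id


spec = MessageSpec()
root_id = process_node(("and", [("p", []), ("q", [("r", [])])]), spec)
print(root_id, spec.relations)
```

A fabricated ID such as f"{root_id}_arg_0" never appears in spec.elements, so any relation built from it would point at nothing.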

godelOS/nlu_nlg/nlg/pipeline.py
self.discourse_manager.context fails on MagicMock(spec=DiscourseStateManager) because instance attributes set in __init__ are not part of a spec mock's attribute set. getattr(..., 'context', None) is the correct defensive form. Minimal, correct.
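
The mock behaviour can be checked in isolation with the standard library. The DiscourseStateManager below is a stub with an invented reset method; the relevant part is that context is assigned in __init__ and therefore does not appear in the class's attribute set that spec is built from.

```python
from unittest.mock import MagicMock


class DiscourseStateManager:
    """Stub: `context` exists only on instances, not on the class."""

    def __init__(self) -> None:
        self.context = {"topic": None}

    def reset(self) -> None:  # hypothetical method, present on the class
        self.context = {"topic": None}


mock_manager = MagicMock(spec=DiscourseStateManager)

# Methods defined on the class are available on the spec'd mock.
mock_manager.reset()

# Direct access to the instance-only attribute raises AttributeError.
try:
    mock_manager.context
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The defensive form used in the fix returns the default instead.
context = getattr(mock_manager, "context", None)
print(direct_access_failed, context)
```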

Claimed result

79 passed, 5 skipped (integration), 0 failed across tests/nlu_nlg/. No regressions.

Fixes #91.

@Steake
Owner

Steake commented Mar 5, 2026

Verify truncated diff in lexical_analyzer_parser.py

@copilot Please address the comments queries and remedy any issues.

Contributor Author

Copilot AI commented Mar 5, 2026

Verify truncated diff in lexical_analyzer_parser.py

@copilot Please address the comments queries and remedy any issues.

Verified — the file is complete. The truncation was only in GitHub's diff view. In the actual committed file (2773ae1), both lexical_analyzer_parser.py (lines 344–346) and semantic_interpreter.py (lines 515–517) have the full noun_deps set definition:

noun_deps = {"det", "amod", "compound", "nummod", "poss", "nmod",
             "appos", "acl", "relcl", "case"}

The _infer_head static method is also fully present in both files. Re-ran the full suite just now: 79 passed, 5 skipped, 0 failed.

@Steake Steake marked this pull request as ready for review March 5, 2026 21:42
Copilot AI review requested due to automatic review settings March 5, 2026 21:42
Contributor

Copilot AI left a comment

Pull request overview

Fixes NLU/NLG source implementations that diverged from existing unit-test contracts in godelOS/nlu_nlg/, restoring correct dependency traversal, discourse tracking, ontology linking, and NLG content planning behavior.

Changes:

  • Add Token.head_i (spaCy head index) and use it for dependency traversal with a heuristic fallback when unavailable.
  • Fix discourse topic detection for proper nouns and prevent double-incrementing of dialogue turn count.
  • Improve NLG content planning by recursively detecting nested modal operators and by using real (returned) child element IDs for discourse relations; harden pipeline access to discourse_manager.context for mocks.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| godelOS/nlu_nlg/nlu/semantic_interpreter.py | Replace broken dep-label parsing with head_i-based traversal + heuristic fallback for dependency checks. |
| godelOS/nlu_nlg/nlu/lexicon_ontology_linker.py | Allow ontology concept matching even when no lexical entry exists (synonym/lexical mapping path). |
| godelOS/nlu_nlg/nlu/lexical_analyzer_parser.py | Add Token.head_i populated from spaCy and update dependency traversal accordingly. |
| godelOS/nlu_nlg/nlu/discourse_manager.py | Fix topic detection to include PROPN and remove extra dialogue turn increment. |
| godelOS/nlu_nlg/nlg/pipeline.py | Avoid direct attribute access to context to support mocked discourse managers. |
| godelOS/nlu_nlg/nlg/content_planner.py | Recursively detect nested modal operators; fix discourse relation child IDs by returning actual element IDs. |

                    verb_deps: set, noun_deps: set, prep_obj_deps: set) -> Optional[int]:
        """Infer the head token index from dependency label heuristics."""
        if token.dep in verb_deps:
            # Look for the ROOT token or nearest verb

Copilot AI Mar 5, 2026

The _infer_head inline comment says “Look for the ROOT token or nearest verb”, but the code only checks for a ROOT token. Please update the comment (or implement the nearest-verb behavior) so the heuristic is accurately documented.

Suggested change
-            # Look for the ROOT token or nearest verb
+            # Look for the ROOT token

agent_id = self._process_node(node.agent_or_world, message_spec)
message_spec.add_discourse_relation("agent", element_id, agent_id)

return element_id

Copilot AI Mar 5, 2026

_process_node now documents that it returns “the ID of the created content element” and always returns element_id, but there’s no else/error path for unsupported node types (e.g., LambdaNode, DefinitionNode are imported). If an unsupported node is encountered, callers will record discourse relations pointing to an element ID that was never added to main_content/supporting_content. Consider explicitly handling the remaining node types or raising/logging on unsupported nodes to avoid silently producing inconsistent MessageSpecifications.
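
One way to realise the "raise on unsupported nodes" option is sketched below. The node classes are empty stubs, UnknownNode is hypothetical, the list-of-tuples store and ID format are invented, and TypeError is an assumed choice of exception; the thread does not state which exception type the follow-up commit uses.

```python
from typing import List, Tuple


class LambdaNode:
    """Empty stub standing in for the real AST node."""


class DefinitionNode:
    """Empty stub standing in for the real AST node."""


class UnknownNode:
    """Hypothetical node type the planner does not know about."""


def process_node(node: object, elements: List[Tuple[str, str]]) -> str:
    """Record a content element for `node`, or fail before returning an ID."""
    element_id = f"element_{len(elements) + 1}"
    if isinstance(node, LambdaNode):
        elements.append((element_id, "lambda"))
    elif isinstance(node, DefinitionNode):
        elements.append((element_id, "definition"))
    else:
        # Fail fast: never hand back an ID for an element that was not added.
        raise TypeError(
            f"Unsupported AST node type {type(node).__name__!r} "
            f"while planning {element_id}"
        )
    return element_id


elements: List[Tuple[str, str]] = []
print(process_node(LambdaNode(), elements))
try:
    process_node(UnknownNode(), elements)
except TypeError as exc:
    print(f"rejected: {exc}")
```

Raising keeps the invariant that every returned ID refers to an element that exists, which is what callers recording discourse relations rely on.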

Comment on lines +522 to +526
            # Find the head using heuristic
            inferred_head_i = self._infer_head(token, all_tokens,
                                               verb_deps, noun_deps, prep_obj_deps)
            if inferred_head_i is not None:
                if inferred_head_i == head_token.i:

Copilot AI Mar 5, 2026

The _is_dependent_on fallback heuristic and _infer_head logic appears duplicated between lexical_analyzer_parser.py and semantic_interpreter.py. Consider extracting the shared head-inference helper (and the dep-label sets) into a single utility to avoid the two copies drifting over time.

Comment on lines +507 to +520
        # Deps that attach to a verb (typically the ROOT)
        verb_deps = {
            "nsubj", "nsubjpass", "dobj", "iobj", "prep", "advmod", "punct",
            "cc", "conj", "aux", "auxpass", "neg", "ccomp", "xcomp", "advcl",
            "attr", "agent", "oprd", "acomp", "prt", "mark", "expl",
            "npadvmod", "intj", "csubj", "csubjpass", "dative",
        }

        # Deps that attach to a noun
        noun_deps = {"det", "amod", "compound", "nummod", "poss", "nmod",
                     "appos", "acl", "relcl", "case"}

        # Deps that attach to a preposition
        prep_obj_deps = {"pobj"}

Copilot AI Mar 5, 2026

In _is_dependent_on, verb_deps/noun_deps/prep_obj_deps are re-created on every call. Since this method can be called for every token (and recursively), consider making these sets class-level constants (or module-level) to avoid repeated allocations and make the heuristic easier to keep consistent.

Comment on lines +540 to +542
            for t in all_tokens:
                if t.dep == "ROOT":
                    return t.i

Copilot AI Mar 5, 2026

The comment in _infer_head says “Look for the ROOT token or nearest verb”, but the implementation only looks for dep == "ROOT" and never attempts to find a nearest verb. Either adjust the comment to match the behavior or implement the “nearest verb” fallback to avoid misleading future maintainers.

Suggested change
-            for t in all_tokens:
-                if t.dep == "ROOT":
-                    return t.i
+            root_i: Optional[int] = None
+            for t in all_tokens:
+                if t.dep == "ROOT":
+                    root_i = t.i
+                    break
+            if root_i is not None:
+                return root_i
+            # Fallback: find the nearest verb (e.g., VERB/AUX) to this token
+            best_verb_i: Optional[int] = None
+            for t in all_tokens:
+                if t.pos in ("VERB", "AUX"):
+                    if best_verb_i is None or abs(t.i - token.i) < abs(best_verb_i - token.i):
+                        best_verb_i = t.i
+            if best_verb_i is not None:
+                return best_verb_i

Comment on lines +336 to +350
+        # Deps that attach to a verb (typically the ROOT)
+        verb_deps = {
+            "nsubj", "nsubjpass", "dobj", "iobj", "prep", "advmod", "punct",
+            "cc", "conj", "aux", "auxpass", "neg", "ccomp", "xcomp", "advcl",
+            "attr", "agent", "oprd", "acomp", "prt", "mark", "expl",
+            "npadvmod", "intj", "csubj", "csubjpass", "dative",
+        }
+
+        # Deps that attach to a noun
+        noun_deps = {"det", "amod", "compound", "nummod", "poss", "nmod",
+                     "appos", "acl", "relcl", "case"}
+
-        return False
\ No newline at end of file
+        # Deps that attach to a preposition
+        prep_obj_deps = {"pobj"}


Copilot AI Mar 5, 2026

verb_deps/noun_deps/prep_obj_deps are allocated inside _is_dependent_on each time it’s called (and it can recurse). Consider hoisting these to class/module constants to reduce per-token overhead and keep the heuristic definitions centralized.

@Steake
Owner

Steake commented Mar 5, 2026

@copilot apply changes based on the comments in this thread

Copilot AI and others added 2 commits March 5, 2026 22:06
Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
…andling

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 5, 2026

@copilot apply changes based on the comments in this thread

Applied the review-thread follow-ups in commits d94e0b3 and 1e7d394:

  • hoisted dependency-label heuristic sets out of recursive calls
  • corrected the misleading ROOT/nearest-verb comment
  • added explicit LambdaNode/DefinitionNode handling in content planning and fail-fast for unsupported node types
  • centralized semantic interpreter heuristic constants by importing from lexical analyzer/parser

Validation: python -m pytest tests/nlu_nlg/ -v --no-cov → 79 passed, 5 skipped, 0 failed.

@Steake Steake merged commit a1bd9eb into main Mar 5, 2026
Steake pushed a commit that referenced this pull request Mar 5, 2026
Resolves all tractable failures from issue #73.

Cat 1 — query_replay_harness: async methods, unified data kwarg, complete_recording rename, public attr aliases, ProcessingStep enum extension.
Cat 2 — knowledge_models: 'manual' added to ImportSource Literal.
Cat 3 — test_frontend_modules: godelos-frontend → svelte-frontend.
Cat 4 — env skip guards for Z3 (SMT) removed for nlu_nlg after #92 fixed source;
Z3 skip on test_smt_interface_enhanced retained (infrastructure absent).

Also carries godelOS/core_kr, inference_engine, learning_system, symbol_grounding fixes from the #74 corpus.

11 NLU/NLG source-divergence failures resolved separately in #92 (merged prior).

Development

Successfully merging this pull request may close these issues.

Fix: 11 remaining NLU/NLG source-divergence test failures (Phase 2 deferred)

3 participants