Add LlmIntentClassifier and chat-to-proposal integration tests by Chris0Jeky · Pull Request #580 · Chris0Jeky/Taskdeck

Chris0Jeky · 2026-03-29T21:45:12Z

Summary

Fixes #577 (Add LlmIntentClassifier and chat-to-proposal integration tests)
Add edge case tests for LlmIntentClassifier: null input (documents NullReferenceException), very long strings, whitespace-only input, special characters, and pattern matching within strings containing special characters/newlines
Add chat-to-proposal flow integration tests in ChatServiceTests: structured syntax classifier hit with parser success, natural language classifier miss with no planner call, explicit RequestProposal with parser failure (graceful error), and actionable classification with parser failure (hint shown)

Test plan

All new tests pass with current codebase (1609 total, 0 failures)
LlmIntentClassifier tests cover all current patterns (existing) plus edge cases (new)
Known gap cases documented as tests asserting current (limited) behavior (existing)
ChatService proposal flow tested end-to-end with mocks (new + existing)
Edge cases covered: null, empty, whitespace, special chars, very long strings

Chris0Jeky · 2026-03-29T21:45:36Z

Self-Review Findings

Resilience to classifier improvements

All tests assert current behavior (not desired behavior). Tests documenting known gaps assert isActionable.Should().BeFalse() with explanatory because messages.
If the classifier is improved to detect natural language, the existing "known gap" tests in the pre-existing region will need updating, but the new edge case tests (null, long strings, special chars) will remain stable.
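
The "known gap" style described above can be sketched roughly as follows. This is a minimal illustration, assuming a test project that references xUnit and FluentAssertions; `GapSketchClassifier`, its single `create card` pattern, and the sample message are hypothetical stand-ins, not the real `LlmIntentClassifier` or its tests.

```csharp
#nullable enable
using FluentAssertions;
using Xunit;

// Hypothetical stand-in: recognizes only the structured phrase "create card".
public static class GapSketchClassifier
{
    public static (bool IsActionable, string? ActionIntent) Classify(string message)
    {
        var lower = message.ToLowerInvariant();
        if (lower.Contains("create card"))
        {
            return (true, "card.create"); // assumed intent name, for illustration only
        }

        return (false, null);
    }
}

public class KnownGapSketchTests
{
    [Fact]
    public void Classify_NaturalLanguageRequest_KnownGap_ReturnsNotActionable()
    {
        var (isActionable, actionIntent) =
            GapSketchClassifier.Classify("could you add a few tasks for the release?");

        // Asserts current behavior, not desired behavior.
        // The "because" text tells a future reader why the expectation is false.
        isActionable.Should().BeFalse(
            "because natural-language phrasing is a known gap in the pattern-based classifier");
        actionIntent.Should().BeNull();
    }
}
```

A test written this way fails as soon as the classifier starts recognizing the phrase, which is the signal to flip the assertion rather than a regression.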

Mock accuracy

ChatService tests mock ILlmProvider.CompleteAsync to return specific IsActionable values, simulating the real flow in which the provider's IsActionable flag drives proposal creation.
The planner mock correctly returns Result.Failure with ErrorCodes.ValidationError to simulate parse failures, matching real behavior.

Test quality

No flaky patterns detected: all tests are deterministic with no time-dependent assertions or external dependencies.
Classify_NullInput_ThrowsNullReferenceException documents a real gap (no null guard) without being prescriptive about fixing it.
Test names clearly describe the scenario and expected outcome.

Overlap with existing tests

SendMessageAsync_StructuredSyntax_ClassifierHit_ParserSuccess_ProposalCreated overlaps somewhat with the existing SendMessageAsync_ShouldAutoCreateProposal_WhenActionableIntentDetected_WithoutExplicitRequestProposal, but adds an explicit Verify on the exact message being passed to the planner and has a clearer name for documenting the pipeline.
SendMessageAsync_ActionableClassification_ParserFails_ShowsParseHint overlaps with SendMessageAsync_ShouldReturnStatusWithParseHint_WhenActionableButPlannerFails; both test the same code path. The new test frames the scenario in the context of #577 but is duplicative. Not blocking, since the test names are distinct and provide different documentation angles.

No issues found requiring changes

The diff is clean, all 1609 tests pass, and no tests assert aspirational behavior.

gemini-code-assist

Code Review

This pull request adds integration tests for the Chat-to-Proposal flow, covering successful parsing, classifier misses, and error handling for invalid instructions. It also introduces edge-case tests for the LLM intent classifier, addressing long strings, special characters, and null inputs. Review feedback suggests enhancing the classifier's robustness by handling null inputs gracefully and improving test specificity by verifying exact call arguments in mock setups.

gemini-code-assist · 2026-03-29T21:46:40Z

backend/tests/Taskdeck.Application.Tests/Services/LlmIntentClassifierTests.cs

+    [Fact]
+    public void Classify_NullInput_ThrowsNullReferenceException()
+    {
+        // The classifier calls message.ToLowerInvariant() without a null guard.
+        // This documents that null input is not handled gracefully.
+        var act = () => LlmIntentClassifier.Classify(null!);
+
+        act.Should().Throw<NullReferenceException>();
+    }


While it's good to document current behavior, a public static method like Classify should ideally be more robust and not throw a NullReferenceException on null input. It would be better to handle null (and whitespace) gracefully by returning (false, null). I recommend updating LlmIntentClassifier.Classify to handle this and changing this test to assert the graceful handling instead of the exception. This prevents potential unhandled exceptions in the application.

    [Fact]
    public void Classify_NullInput_ReturnsNotActionable()
    {
        // A null guard should be in place for public methods.
        var (isActionable, actionIntent) = LlmIntentClassifier.Classify(null!);
        isActionable.Should().BeFalse();
        actionIntent.Should().BeNull();
    }

gemini-code-assist · 2026-03-29T21:46:41Z

backend/tests/Taskdeck.Application.Tests/Services/ChatServiceTests.cs

+            p => p.ParseInstructionAsync(
+                It.IsAny<string>(), userId, boardId,
+                It.IsAny<CancellationToken>(), ProposalSourceType.Chat,
+                session.Id.ToString(), It.IsAny<string?>()),
+            Times.Once);


For a more robust test, it's better to verify that ParseInstructionAsync was called with the exact message content. Using It.IsAny<string>() for the instruction makes the test less specific and could potentially mask issues if the wrong content is passed to the planner.

            p => p.ParseInstructionAsync(
                "please create some tasks for the deployment checklist", userId, boardId,
                It.IsAny<CancellationToken>(), ProposalSourceType.Chat,
                session.Id.ToString(), It.IsAny<string?>()),
            Times.Once);
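
The difference between the loose and the exact matcher can be sketched as follows, assuming a test project that references xUnit and Moq. `ISketchPlanner` is a hypothetical, reduced interface introduced only for this illustration; the real `ParseInstructionAsync` takes more parameters, as the snippet above shows.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Moq;
using Xunit;

// Hypothetical, reduced planner interface (not the real planner contract).
public interface ISketchPlanner
{
    Task<bool> ParseInstructionAsync(string instruction, CancellationToken cancellationToken);
}

public class ExactArgumentVerifySketch
{
    [Fact]
    public async Task Verify_MatchesOnlyTheExactInstruction()
    {
        const string message = "please create some tasks for the deployment checklist";

        var planner = new Mock<ISketchPlanner>();
        planner
            .Setup(p => p.ParseInstructionAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(true);

        // Stands in for the service forwarding the user's message to the planner.
        await planner.Object.ParseInstructionAsync(message, CancellationToken.None);

        // Loose: passes for any instruction string, so wrong content would go unnoticed.
        planner.Verify(
            p => p.ParseInstructionAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()),
            Times.Once);

        // Exact: passes only if this specific string reached the planner.
        planner.Verify(
            p => p.ParseInstructionAsync(message, It.IsAny<CancellationToken>()),
            Times.Once);

        // A different string was never passed.
        planner.Verify(
            p => p.ParseInstructionAsync("some other text", It.IsAny<CancellationToken>()),
            Times.Never);
    }
}
```

The trade-off is that the exact form also fails if the service ever normalizes or rewrites the message before calling the planner, so it pins the forwarding behavior as well as the call count.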

gemini-code-assist · 2026-03-29T21:46:41Z

backend/tests/Taskdeck.Application.Tests/Services/ChatServiceTests.cs

+
+        result.IsSuccess.Should().BeTrue();
+        result.Value.MessageType.Should().Be("status");
+        result.Value.Content.Should().Contain("detected a task request but could not parse it");


To make this test more robust, consider adding a verification step to ensure ParseInstructionAsync was called on the planner mock with the expected input. This explicitly confirms that the flow reached the planner as intended before failing.

        result.Value.Content.Should().Contain("detected a task request but could not parse it");
        _plannerMock.Verify(
            p => p.ParseInstructionAsync(
                "create card for testing without quotes", userId, boardId,
                It.IsAny<CancellationToken>(), ProposalSourceType.Chat,
                session.Id.ToString(), It.IsAny<string?>()),
            Times.Once);

Chris0Jeky · 2026-03-29T22:00:22Z

Adversarial Review — PR #580

Critical

None found.

Major

1. Two new tests are near-exact duplicates of existing tests (false coverage inflation)

SendMessageAsync_StructuredSyntax_ClassifierHit_ParserSuccess_ProposalCreated duplicates SendMessageAsync_ShouldAutoCreateProposal_WhenActionableIntentDetected_WithoutExplicitRequestProposal (lines 182-228). Both test the same code path: LLM returns IsActionable=true, planner succeeds, result is proposal-reference. The only difference is the new test adds a Verify call on the exact message string — but the existing test already covers the behavior. The added Verify is also fragile: it asserts the exact user message string is forwarded to the planner, which will break if the message routing logic ever transforms the input.
SendMessageAsync_ActionableClassification_ParserFails_ShowsParseHint duplicates SendMessageAsync_ShouldReturnStatusWithParseHint_WhenActionableButPlannerFails (lines 279-307). Both test the identical code path: LLM returns IsActionable=true, planner fails, result contains "detected a task request but could not parse it". The self-review acknowledges this overlap but dismisses it as "not blocking." In a test suite of 1600+ tests, duplicate tests increase maintenance burden for zero additional safety.

Recommendation: Remove these two duplicate tests, or consolidate them with the originals. Adding a Verify call to the existing test would capture the only new assertion without the duplication.

2. SendMessageAsync_NaturalLanguage_ClassifierMiss_NoPlannerCall is also largely duplicative

This test covers the same path as SendMessageAsync_NaturalLanguage_WithoutRequestProposal_NoProposalAttempt (lines 813-843 in the existing NLP gap region). Both set IsActionable=false, no RequestProposal, and verify ParseInstructionAsync is never called. The existing test even uses the same Times.Never verification. The new test uses a different message string but tests no new code path.

Minor

3. Classify_NullInput_ThrowsNullReferenceException asserts on an implementation detail (NRE), not a contract

This test documents that null throws NullReferenceException. If someone later adds a null guard (returning (false, null) or throwing ArgumentNullException), this test will break. A more resilient approach would be to assert that the method throws any exception on null, or better, to assert the desired behavior with a comment noting the current gap. As written, it locks in a bug as the expected contract.

4. No test for empty-string input to Classify

The existing Classify_NonActionable_ShouldReturnFalse already covers "" (line 133), so the edge case region does not need it — but the edge case region's XML summary claims to cover "input extremes" without acknowledging that empty string is already tested elsewhere. This could mislead future readers into thinking it was overlooked.

5. SendMessageAsync_ExplicitRequestProposal_NaturalLanguage_ParserFailsGracefully is largely duplicative of the existing SendMessageAsync_NaturalLanguage_WithRequestProposal_ShowsParseError

Both tests: set RequestProposal: true, set IsActionable: false, mock planner to return Failure, and assert MessageType == "status" with content containing "Could not create the requested proposal". Same code path (lines 252-255 in ChatService.cs). The only variation is the user message string.

Nits

6. Inconsistent edge-case completeness for Classify

The edge case region tests very long strings (50K chars) but misses a boundary value: a string of exactly MaxPromptLength (4000 chars). While the classifier itself has no length limit, the ChatService enforces one upstream, so this is cosmetic.

7. Test region naming ambiguity

The new #region Chat-to-Proposal Flow — Classifier → Parser Integration (#577) is placed before the existing #region NLP Gap Tests — Documents #570. Given the significant overlap between the two regions (both test classifier-to-planner flow), a reader may not immediately understand why they are separate.

8. Classify_VeryLongStringContainingPattern_StillMatches is useful but has a minor naming issue

The name says "StillMatches" but doesn't specify what intent it matches. Classify_VeryLongStringContainingCreateCard_MatchesCardCreate would be clearer.

Overall Assessment

Pass with fixes. The LlmIntentClassifier edge-case tests (null, long strings, whitespace, special chars, newlines) are genuinely valuable and well-constructed. However, 3 of the 4 ChatService flow tests duplicate existing tests covering the same code paths. This adds maintenance cost without improving coverage. The duplicates should be removed or the overlapping existing tests should be enhanced instead.

Summary of recommended changes:

Remove SendMessageAsync_StructuredSyntax_ClassifierHit_ParserSuccess_ProposalCreated — add the Verify to the existing test if desired.
Remove SendMessageAsync_ActionableClassification_ParserFails_ShowsParseHint — identical path to existing test.
Remove SendMessageAsync_NaturalLanguage_ClassifierMiss_NoPlannerCall — identical path to existing NLP gap test.
Keep SendMessageAsync_ExplicitRequestProposal_NaturalLanguage_ParserFailsGracefully only if its framing under #577 adds distinct documentary value over the existing #570 test (Chat-to-proposal NLP gap: natural language fails to produce proposals). Otherwise remove.
Consider changing the null-input test to assert Should().Throw<Exception>() instead of the specific NullReferenceException type.

- Remove 4 ChatServiceTests that duplicated existing tests covering identical code paths (structured-syntax success, classifier miss, explicit RequestProposal failure, actionable-but-parser-fails)
- Change Classify_NullInput test to assert base Exception instead of NullReferenceException so it survives addition of a null guard

Chris0Jeky · 2026-03-29T22:05:21Z

Follow-up: Fixes Applied

Pushed commit 4c5fdb5f addressing the Major findings from the adversarial review.

Changes made:

Removed 4 duplicate ChatService flow tests (176 lines deleted):
- SendMessageAsync_StructuredSyntax_ClassifierHit_ParserSuccess_ProposalCreated — duplicated SendMessageAsync_ShouldAutoCreateProposal_WhenActionableIntentDetected_WithoutExplicitRequestProposal
- SendMessageAsync_NaturalLanguage_ClassifierMiss_NoPlannerCall — duplicated SendMessageAsync_NaturalLanguage_WithoutRequestProposal_NoProposalAttempt
- SendMessageAsync_ExplicitRequestProposal_NaturalLanguage_ParserFailsGracefully — duplicated SendMessageAsync_NaturalLanguage_WithRequestProposal_ShowsParseError
- SendMessageAsync_ActionableClassification_ParserFails_ShowsParseHint — duplicated SendMessageAsync_ShouldReturnStatusWithParseHint_WhenActionableButPlannerFails
Hardened null-input test: Changed Classify_NullInput_ThrowsNullReferenceException to Classify_NullInput_Throws, asserting base Exception instead of NullReferenceException. This way the test survives if someone later adds an ArgumentNullException null guard.
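
A rough sketch of why asserting the base `Exception` type is more tolerant, assuming xUnit and FluentAssertions. `UnguardedClassifier` and `ThrowingGuardClassifier` are hypothetical stand-ins for "no null guard" and "guard that throws `ArgumentNullException`"; neither is the real `LlmIntentClassifier`.

```csharp
#nullable enable
using System;
using FluentAssertions;
using Xunit;

// Hypothetical: no null guard, so null input fails inside ToLowerInvariant().
public static class UnguardedClassifier
{
    public static (bool IsActionable, string? ActionIntent) Classify(string message)
    {
        var lower = message.ToLowerInvariant();
        return (lower.Contains("create card"), null);
    }
}

// Hypothetical: same logic with an explicit throwing guard.
public static class ThrowingGuardClassifier
{
    public static (bool IsActionable, string? ActionIntent) Classify(string message)
    {
        if (message is null)
        {
            throw new ArgumentNullException(nameof(message));
        }

        var lower = message.ToLowerInvariant();
        return (lower.Contains("create card"), null);
    }
}

public class NullInputSketchTests
{
    [Fact]
    public void Classify_NullInput_Throws_WithoutGuard()
    {
        var act = () => UnguardedClassifier.Classify(null!);

        // Throw<Exception>() accepts any derived type, here NullReferenceException.
        act.Should().Throw<Exception>();
    }

    [Fact]
    public void Classify_NullInput_Throws_WithThrowingGuard()
    {
        var act = () => ThrowingGuardClassifier.Classify(null!);

        // The same assertion still passes for ArgumentNullException.
        act.Should().Throw<Exception>();
    }
}
```

Note that this assertion only tolerates guards that throw. A guard that returns a value for null input makes it fail, and the test then has to assert the returned value instead.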

Test results:

All 1605 tests pass (0 failures). Test count reduced from 1609 by removing the 4 duplicates — no coverage lost since the same code paths are exercised by the existing tests.

Remaining items (Minor/Nit, not blocking):

Edge case region could note that empty-string is already covered in the Non-Actionable region
Classify_VeryLongStringContainingPattern_StillMatches name could be more specific about the matched intent

Chris0Jeky · 2026-03-29T22:17:45Z

Addressed Gemini review feedback: added a string.IsNullOrWhiteSpace guard at the top of LlmIntentClassifier.Classify() so null/whitespace input returns (false, null) instead of throwing. Updated the corresponding test to assert the new graceful behavior. All 1,605 backend tests pass.
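
A minimal sketch of the guard described above. Only the `string.IsNullOrWhiteSpace` check and the `(false, null)` return come from this comment; `GuardedSketchClassifier`, its single `create card` pattern, and the `card.create` intent name are hypothetical, since the real classifier's patterns are not shown in this thread.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for LlmIntentClassifier; only the guard mirrors the PR comment.
public static class GuardedSketchClassifier
{
    public static (bool IsActionable, string? ActionIntent) Classify(string? message)
    {
        // Guard: null, empty, or whitespace-only input is never actionable.
        if (string.IsNullOrWhiteSpace(message))
        {
            return (false, null);
        }

        // Assumed pattern, for illustration only.
        var lower = message.ToLowerInvariant();
        if (lower.Contains("create card"))
        {
            return (true, "card.create");
        }

        return (false, null);
    }
}
```

With the guard first, callers get the same "not actionable" result for null, `""`, and `"   "`, and the pattern-matching code below it can assume a non-null message.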

Update two analysis docs (chat-to-proposal gap and manual testing findings) to reflect recent fixes and testing status. Key changes: add Last Updated and status notes; mark Tier 1 improvements shipped (intent classifier regex/stemming/negation fixes, substring ordering bug, PR #579), UX parse hints shipped (PR #582), unit/integration tests shipped (PR #580), and note PR range #578–#582. In manual testing findings mark OBS-2/OBS-3 resolved (PR #581) and BUG-M5 resolved (PR #578), update resolutions and remove duplicate checklist items. Minor editorial clarifications and test counts added.

Chris0Jeky added 2 commits March 29, 2026 22:44

Add edge case tests for LlmIntentClassifier null, long string, and sp…

ec11452

…ecial character inputs

Add chat-to-proposal integration tests covering classifier-parser pip…

ab59b76

…eline

github-project-automation bot added this to Taskdeck Execution Mar 29, 2026

github-project-automation bot moved this to Pending in Taskdeck Execution Mar 29, 2026

gemini-code-assist bot reviewed Mar 29, 2026

Add null guard to LlmIntentClassifier.Classify

ca74a65

Chris0Jeky merged commit b948742 into main Mar 29, 2026
18 checks passed

Chris0Jeky deleted the test/577-intent-classifier-tests branch March 29, 2026 22:22

github-project-automation bot moved this from Pending to Done in Taskdeck Execution Mar 29, 2026

Chris0Jeky merged 4 commits into main from test/577-intent-classifier-tests
