Add LlmIntentClassifier and chat-to-proposal integration tests#580
Add LlmIntentClassifier and chat-to-proposal integration tests#580Chris0Jeky merged 4 commits intomainfrom
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Self-Review FindingsResilience to classifier improvements
Mock accuracy
Test quality
Overlap with existing tests
No issues found requiring changesThe diff is clean, all 1609 tests pass, and no tests assert aspirational behavior. |
There was a problem hiding this comment.
Code Review
This pull request adds integration tests for the Chat-to-Proposal flow, covering successful parsing, classifier misses, and error handling for invalid instructions. It also introduces edge-case tests for the LLM intent classifier, addressing long strings, special characters, and null inputs. Review feedback suggests enhancing the classifier's robustness by handling null inputs gracefully and improving test specificity by verifying exact call arguments in mock setups.
| [Fact] | ||
| public void Classify_NullInput_ThrowsNullReferenceException() | ||
| { | ||
| // The classifier calls message.ToLowerInvariant() without a null guard. | ||
| // This documents that null input is not handled gracefully. | ||
| var act = () => LlmIntentClassifier.Classify(null!); | ||
|
|
||
| act.Should().Throw<NullReferenceException>(); | ||
| } |
There was a problem hiding this comment.
While it's good to document current behavior, a public static method like Classify should ideally be more robust and not throw a NullReferenceException on null input. It would be better to handle null (and whitespace) gracefully by returning (false, null). I recommend updating LlmIntentClassifier.Classify to handle this and changing this test to assert the graceful handling instead of the exception. This prevents potential unhandled exceptions in the application.
[Fact]
public void Classify_NullInput_ReturnsNotActionable()
{
// A null guard should be in place for public methods.
var (isActionable, actionIntent) = LlmIntentClassifier.Classify(null!);
isActionable.Should().BeFalse();
actionIntent.Should().BeNull();
}| p => p.ParseInstructionAsync( | ||
| It.IsAny<string>(), userId, boardId, | ||
| It.IsAny<CancellationToken>(), ProposalSourceType.Chat, | ||
| session.Id.ToString(), It.IsAny<string?>()), | ||
| Times.Once); |
There was a problem hiding this comment.
For a more robust test, it's better to verify that ParseInstructionAsync was called with the exact message content. Using It.IsAny<string>() for the instruction makes the test less specific and could potentially mask issues if the wrong content is passed to the planner.
p => p.ParseInstructionAsync(
"please create some tasks for the deployment checklist", userId, boardId,
It.IsAny<CancellationToken>(), ProposalSourceType.Chat,
session.Id.ToString(), It.IsAny<string?>()),
Times.Once);|
|
||
| result.IsSuccess.Should().BeTrue(); | ||
| result.Value.MessageType.Should().Be("status"); | ||
| result.Value.Content.Should().Contain("detected a task request but could not parse it"); |
There was a problem hiding this comment.
To make this test more robust, consider adding a verification step to ensure ParseInstructionAsync was called on the planner mock with the expected input. This explicitly confirms that the flow reached the planner as intended before failing.
result.Value.Content.Should().Contain("detected a task request but could not parse it");
_plannerMock.Verify(
p => p.ParseInstructionAsync(
"create card for testing without quotes", userId, boardId,
It.IsAny<CancellationToken>(), ProposalSourceType.Chat,
session.Id.ToString(), It.IsAny<string?>()),
Times.Once);
Adversarial Review — PR #580CriticalNone found. Major1. Two new tests are near-exact duplicates of existing tests (false coverage inflation)
Recommendation: Remove these two duplicate tests, or consolidate them with the originals. Adding a 2. This test covers the same path as Minor3. This test documents that 4. No test for empty-string input to The existing 5. Both tests: set Nits6. Inconsistent edge-case completeness for The edge case region tests very long strings (50K chars) but misses a boundary value: a string of exactly 7. Test region naming ambiguity The new 8. The name says "StillMatches" but doesn't specify what intent it matches. Overall AssessmentPass with fixes. The LlmIntentClassifier edge-case tests (null, long strings, whitespace, special chars, newlines) are genuinely valuable and well-constructed. However, 3 of the 4 ChatService flow tests duplicate existing tests covering the same code paths. This adds maintenance cost without improving coverage. The duplicates should be removed or the overlapping existing tests should be enhanced instead. Summary of recommended changes:
|
- Remove 4 ChatServiceTests that duplicated existing tests covering identical code paths (structured-syntax success, classifier miss, explicit RequestProposal failure, actionable-but-parser-fails) - Change Classify_NullInput test to assert base Exception instead of NullReferenceException so it survives addition of a null guard
Follow-up: Fixes AppliedPushed commit Changes made:
Test results:All 1605 tests pass (0 failures). Test count reduced from 1609 by removing the 4 duplicates — no coverage lost since the same code paths are exercised by the existing tests. Remaining items (Minor/Nit, not blocking):
|
|
Addressed Gemini review feedback: added a |
Update two analysis docs (chat-to-proposal gap and manual testing findings) to reflect recent fixes and testing status. Key changes: add Last Updated and status notes; mark Tier 1 improvements shipped (intent classifier regex/stemming/negation fixes, substring ordering bug, PR #579), UX parse hints shipped (PR #582), unit/integration tests shipped (PR #580), and note PR range #578–#582. In manual testing findings mark OBS-2/OBS-3 resolved (PR #581) and BUG-M5 resolved (PR #578), update resolutions and remove duplicate checklist items. Minor editorial clarifications and test counts added.
Summary
LlmIntentClassifier: null input (documents NullReferenceException), very long strings, whitespace-only input, special characters, and pattern matching within strings containing special characters/newlinesChatServiceTests: structured syntax classifier hit with parser success, natural language classifier miss with no planner call, explicitRequestProposalwith parser failure (graceful error), and actionable classification with parser failure (hint shown)Test plan