Refactor/agent #46

Conversation
```js
// NOT relative to this config file location
content: [
  "./internal/web/**/*.templ",
  "./internal/web/**/*.go",
```
Including ./internal/web/**/*.go in the content array may lead to performance degradation and missed class extraction, as Go files typically do not contain Tailwind classes directly. Consider restricting the content paths to files where Tailwind classes are actually used, such as .templ, .html, .js, or .ts files, to optimize build performance and ensure accurate class extraction.
```go
if err := session.SaveCurrentSessionID(newSess.ID); err != nil {
	slog.Warn("failed to save session state", "error", err)
}
```
If session.SaveCurrentSessionID(newSess.ID) fails, the error is only logged as a warning, but the function still returns the new session ID as if it was successfully saved. This can lead to inconsistencies if the session ID is not persisted, potentially causing confusion or loss of session continuity in future runs.
Recommendation: Consider returning an error if saving the session ID fails, or otherwise ensure the caller is aware of the failure so it can be handled appropriately.
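A minimal sketch of that recommendation, using hypothetical names (createSession, saveCurrentSessionID) rather than the project's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// saveCurrentSessionID is a hypothetical stand-in for session.SaveCurrentSessionID,
// declared as a variable so the failure path can be exercised.
var saveCurrentSessionID = func(id string) error {
	return errors.New("disk full")
}

// createSession returns the new session ID but surfaces a persistence failure
// to the caller as an error instead of only logging a warning.
func createSession(id string) (string, error) {
	if err := saveCurrentSessionID(id); err != nil {
		// Still return the ID so the caller can decide whether to proceed,
		// but make the failure visible.
		return id, fmt.Errorf("session created but not persisted: %w", err)
	}
	return id, nil
}

func main() {
	id, err := createSession("sess-123")
	fmt.Println(id, err)
}
```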
```go
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer cancel()
```
The context created with signal.NotifyContext is canceled via defer cancel(), which will execute after runCLI returns. However, if the Bubble Tea TUI program (program.Run()) is still running or has background operations tied to the context, canceling the context at this point may cause premature shutdown of resources or incomplete cleanup.
Recommendation: Ensure that context cancellation aligns precisely with the lifecycle of the TUI program and all dependent resources. Consider whether explicit context management is needed to avoid premature cancellation.
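One way to sketch the suggested explicit lifecycle management, with runTUI as a hypothetical stand-in for program.Run():

```go
package main

import (
	"context"
	"fmt"
)

// runTUI is a hypothetical stand-in for program.Run(): it must see a live
// context for its whole lifetime and returns only when the TUI exits.
func runTUI(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err // context was canceled before or during the run
	}
	return nil
}

// runCLI cancels the context only after runTUI has fully returned, so no
// background work tied to ctx is torn down mid-run.
func runCLI() error {
	ctx, cancel := context.WithCancel(context.Background())
	err := runTUI(ctx)
	cancel() // explicit cancellation after Run returns, not via defer further up the stack
	return err
}

func main() {
	fmt.Println(runCLI())
}
```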
```diff
@@ -85,7 +95,8 @@ func findOrBuildKoopa(t *testing.T) string {
 	// Get project root (parent of cmd/)
 	projectRoot, _ := filepath.Abs("..")
```
Unchecked error from filepath.Abs
The call to filepath.Abs("..") ignores the returned error, which could result in an invalid projectRoot if path resolution fails. This may cause subsequent file operations to behave unpredictably or fail silently.
Recommended solution:
Handle the error explicitly and fail the test if the project root cannot be determined:
```go
projectRoot, err := filepath.Abs("..")
require.NoError(t, err, "Failed to determine project root")
```

```go
if len(os.Args) < 2 {
	runHelp()
	return nil
}
```
When no command is provided (len(os.Args) < 2), the application prints help and returns nil. This may lead to ambiguity for automation or scripting, as the process exits with a success code despite no command being executed. Consider returning a non-nil error (e.g., fmt.Errorf("no command provided")) to signal that the invocation was incomplete.
Recommended change:
```go
if len(os.Args) < 2 {
	runHelp()
	return fmt.Errorf("no command provided")
}
```

```go
func ResetFlowForTesting() {
	flowOnce = sync.Once{}
	flow = nil
	flowInitDone = false
}
```
Potential Data Race in Test Reset
The ResetFlowForTesting function is not safe for concurrent use, as noted in the comment. If tests are run in parallel and this function is called concurrently, it could lead to data races and unpredictable behavior.
Recommendation:
Consider adding synchronization (e.g., a mutex) around the singleton reset logic, or ensure that tests using this function are always run serially. Alternatively, document this limitation more prominently and enforce serial test execution where this function is used.
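A sketch of the mutex-based option, with illustrative package-level state mirroring the flow singleton (names are not the project's actual API):

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical singleton state mirroring the flow package.
var (
	flowMu       sync.Mutex // guards the reset against concurrent use in parallel tests
	flowOnce     sync.Once
	flow         *struct{}
	flowInitDone bool
)

// ResetFlowForTesting resets the singleton under a mutex so concurrent
// callers cannot race on the shared state.
func ResetFlowForTesting() {
	flowMu.Lock()
	defer flowMu.Unlock()
	flowOnce = sync.Once{}
	flow = nil
	flowInitDone = false
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ResetFlowForTesting()
		}()
	}
	wg.Wait()
	fmt.Println("reset completed without a race")
}
```

Note that this only serializes the reset itself; readers of the singleton would need the same lock (or serial test execution) to be fully race-free.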
```go
func TestChatAgent_RAGIntegration_EndToEnd(t *testing.T) {
	framework, cleanup := SetupTest(t)
	defer cleanup()

	ctx := context.Background()

	// Ensure RAG is enabled
	require.Greater(t, framework.Config.RAGTopK, 0, "RAG must be enabled for this test")

	// STEP 1: Index test document using DocStore
	docID := uuid.New()
	testContent := "The secret password is KOOPA_TEST_123. This is a unique test string."
	framework.IndexDocument(t, testContent, map[string]any{
		"id":          docID.String(),
		"filename":    "test-rag-doc.txt",
		"source_type": rag.SourceTypeFile,
	})
	t.Logf("Indexed document %s with test content", docID)

	// STEP 2: Query should trigger RAG retrieval
	invCtx, sessionID, branch := newInvocationContext(ctx, framework.SessionID)
	resp, err := framework.Agent.ExecuteStream(invCtx, sessionID, branch,
		"What is the secret password from the test document?",
		false,
		nil,
	)

	require.NoError(t, err, "Query with RAG should succeed")
	assert.NotEmpty(t, resp.FinalText, "Response should not be empty")

	// STEP 3: Verify LLM response uses retrieved context
	assert.Contains(t, resp.FinalText, "KOOPA_TEST_123",
		"Response should contain the password from retrieved document")
	t.Logf("RAG response: %s", resp.FinalText)
}
```
Test Isolation Risk
The test TestChatAgent_RAGIntegration_EndToEnd (and others in this file) indexes documents but does not clear the knowledge base before or after the test. This can lead to test flakiness if documents from previous tests persist and interfere with the current test's retrieval results.
Recommendation:
Ensure that the knowledge base is cleared before each test, or use a unique, isolated test environment per test run. For example, if such a method exists, add this at the start of each test:

```go
framework.ClearKnowledgeBase(t)
```
```go
assert.Contains(t, resp.FinalText, "KOOPA_TEST_123",
	"Response should contain the password from retrieved document")
```
Brittle Assertion on LLM Output
The assertion assert.Contains(t, resp.FinalText, "KOOPA_TEST_123", ...) assumes that the LLM will always echo the exact string from the indexed document. LLMs may paraphrase or omit details, causing false negatives even if retrieval works correctly.
Recommendation:
Consider relaxing the assertion or supplementing it with additional checks, such as verifying that the retrieved context was included in the prompt sent to the LLM, or using a more robust matching strategy.
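One possible robust matching strategy, sketched as a standalone helper (containsToken is a hypothetical name): normalize case and separators before comparing, so that paraphrases like "Koopa Test 123" still match.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsToken reports whether text contains token, ignoring case and any
// non-alphanumeric separators the LLM may have inserted or altered.
func containsToken(text, token string) bool {
	norm := func(s string) string {
		// Lowercase, then strip everything that is not a letter or digit.
		return regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(strings.ToLower(s), "")
	}
	return strings.Contains(norm(text), norm(token))
}

func main() {
	fmt.Println(containsToken("The password is Koopa_Test-123.", "KOOPA_TEST_123")) // true
	fmt.Println(containsToken("I cannot find it.", "KOOPA_TEST_123"))               // false
}
```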
```go
func TestChatAgent_StreamingContextCancellation(t *testing.T) {
	framework, cleanup := SetupTest(t)
	defer cleanup()

	ctx, cancel := context.WithCancel(context.Background())
	_, sessionID, branch := newInvocationContext(context.Background(), framework.SessionID)
	chunks := 0

	callback := func(callbackCtx context.Context, chunk *ai.ModelResponseChunk) error {
		chunks++
		t.Logf("Received chunk %d", chunks)

		// Cancel after first chunk
		if chunks == 1 {
			cancel()
		}
		return nil
	}

	resp, err := framework.Agent.ExecuteStream(ctx, sessionID, branch,
		"Write a very long story",
		false,
		callback,
	)

	// Should fail with context canceled error
	if err != nil {
		assert.Contains(t, err.Error(), "context canceled",
			"Should fail with context canceled error")
		t.Logf("Context cancellation detected: %v", err)
	} else {
		// If somehow completed despite cancellation, that's unexpected but not wrong
		assert.NotNil(t, resp)
		t.Logf("Completed despite cancellation (unexpected): %s", resp.FinalText)
	}
}
```
Potential race condition in context cancellation test
In TestChatAgent_StreamingContextCancellation, the test cancels the context after the first chunk and expects an error. However, it does not assert that only one chunk was received. If the agent implementation is slow to respond to cancellation, more than one chunk could be processed, leading to flaky or unreliable test results.
Recommended solution:
Add an assertion after the test to verify that chunks == 1, ensuring that only one chunk was processed before cancellation:

```go
assert.Equal(t, 1, chunks, "Should have received only one chunk before cancellation")
```

This will make the test more robust and reliable.
```go
func TestChatAgent_StreamingCallbackError(t *testing.T) {
	framework, cleanup := SetupTest(t)
	defer cleanup()

	ctx, sessionID, branch := newInvocationContext(context.Background(), framework.SessionID)
	chunks := 0
	maxChunks := 3

	// Callback that errors after 3 chunks
	callback := func(ctx context.Context, chunk *ai.ModelResponseChunk) error {
		chunks++
		t.Logf("Received chunk %d: %d bytes", chunks, len(chunk.Text()))

		if chunks >= maxChunks {
			return errors.New("simulated streaming error after 3 chunks")
		}
		return nil
	}

	resp, err := framework.Agent.ExecuteStream(ctx, sessionID, branch,
		"Write a long story about a space adventure",
		false,
		callback,
	)

	// Stream should stop and propagate the error
	require.Error(t, err, "Should propagate streaming callback error")
	assert.Contains(t, err.Error(), "simulated streaming error",
		"Error should contain callback error message")
	assert.Equal(t, maxChunks, chunks,
		"Should have received exactly maxChunks chunks before error")

	// Response might be nil or partial depending on error handling
	if resp != nil {
		t.Logf("Partial response received before error: %s", resp.FinalText)
	} else {
		t.Log("No response returned (expected when streaming errors)")
	}
}
```
Incomplete verification of stream termination after callback error
In TestChatAgent_StreamingCallbackError, the test checks that the error is propagated and the number of chunks matches maxChunks. However, it does not explicitly verify that no further chunks are processed after the error. If the agent does not properly halt streaming after a callback error, this could go undetected.
Recommended solution:
After the test, add a check to ensure that no chunks are processed after the error. For example, you could set a flag in the callback when the error is returned and assert that the callback is not invoked again after that point.
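The flag-based check can be sketched in isolation; runStream is a hypothetical stand-in for the streaming loop inside ExecuteStream, with haltOnError modeling whether the stream correctly stops once the callback has failed:

```go
package main

import (
	"errors"
	"fmt"
)

// runStream simulates the streaming loop: the callback errors on the third
// chunk and records whether it is ever invoked again afterwards.
func runStream(haltOnError bool) (chunks int, calledAfterError bool) {
	maxChunks := 3
	errReturned := false

	callback := func(chunk string) error {
		if errReturned {
			calledAfterError = true // the stream kept going after the error
		}
		chunks++
		if chunks >= maxChunks {
			errReturned = true
			return errors.New("simulated streaming error")
		}
		return nil
	}

	// Stand-in for the provider's chunk loop.
	for _, chunk := range []string{"a", "b", "c", "d", "e"} {
		if err := callback(chunk); err != nil && haltOnError {
			break // a well-behaved stream stops here
		}
	}
	return chunks, calledAfterError
}

func main() {
	fmt.Println(runStream(true))  // well-behaved stream: stops at the error
	fmt.Println(runStream(false)) // buggy stream: the flag exposes extra calls
}
```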
- Remove internal/agent/session_id.go (SessionID was just a string wrapper)
- Move History type from agent to session package
- Update all session-related APIs to use uuid.UUID directly
- Simplify code by removing unnecessary type conversions

Force-pushed fb8f56d to e0bda0b
```sql
DROP TABLE IF EXISTS artifact;
DROP TABLE IF EXISTS message;
DROP TABLE IF EXISTS sessions;
DROP TABLE IF EXISTS documents;
```
Dropping tables may leave behind dependent objects such as views or standalone sequences that were created alongside them (indexes are removed automatically with their table). This can result in orphaned objects and schema inconsistencies after rollback.
Recommended solution:
Explicitly drop any dependent objects (e.g., views, sequences) associated with these tables before or after dropping the tables. For example:

```sql
DROP INDEX IF EXISTS some_index;
DROP SEQUENCE IF EXISTS some_sequence;
DROP VIEW IF EXISTS some_view;
```
```diff
 -- name: DeleteMessagesByBranch :exec
 -- Delete all messages in a specific branch
-DELETE FROM session_messages
+DELETE FROM message
 WHERE session_id = sqlc.arg(session_id) AND branch = sqlc.arg(branch);
```
Potential Data Inconsistency: DeleteMessagesByBranch Does Not Update message_count
The DeleteMessagesByBranch query (lines 98-102) removes messages from a branch but does not update the message_count field in the sessions table. This can result in the message_count becoming stale or incorrect, leading to inconsistencies in the application logic that relies on this field.
Recommendation:
After deleting messages, ensure that message_count is recalculated and updated accordingly, ideally within the same transaction. For example:

```sql
UPDATE sessions
SET message_count = (SELECT COUNT(*) FROM message WHERE session_id = sqlc.arg(session_id))
WHERE id = sqlc.arg(session_id);
```

This will keep the session metadata consistent with the actual message data.
```go
callback := func(ctx context.Context, chunk *ai.ModelResponseChunk) error {
	chunks++
	t.Logf("Received chunk %d: %d bytes", chunks, len(chunk.Text()))
	return nil // Continue streaming
}
```
Streaming Callback Error Propagation Not Tested
In TestExecuteStream_StreamingCallback, the callback always returns nil, so the test does not verify that errors returned from the callback are properly propagated and handled by ExecuteStream. This could mask issues where streaming errors are not surfaced to the caller.
Recommended Solution:
Add a subtest where the callback returns a non-nil error and assert that ExecuteStream returns the error as expected. For example:

```go
t.Run("callback error", func(t *testing.T) {
	errCallback := func(ctx context.Context, chunk *ai.ModelResponseChunk) error {
		return errors.New("streaming error")
	}
	resp, err := framework.Agent.ExecuteStream(ctx, sessionID, branch, "test", false, errCallback)
	require.Error(t, err)
	assert.Contains(t, err.Error(), "streaming error")
	assert.Nil(t, resp)
})
```

```go
	})
}

// TestChat_NilResponseDefense tests defensive check for nil responses.
// This prevents panics when execute() incorrectly returns nil without error.
func TestChat_NilResponseDefense(t *testing.T) {
	t.Parallel()

	t.Run("nil response detection", func(t *testing.T) {
		t.Parallel()
		// Simulate the defensive check in ExecuteStream
		var resp *ai.ModelResponse = nil
		if resp == nil {
			err := errors.New("internal error: execute returned nil response without error")
			assert.Contains(t, err.Error(), "nil response")
		}
	})
}

// TestChat_MaxTurnsProtection tests that conversation doesn't loop infinitely.
// Safety: Prevents runaway agent loops that could exhaust resources.
func TestChat_MaxTurnsProtection(t *testing.T) {
```
Max Turns Protection Test Is Superficial
The test in TestChat_MaxTurnsProtection only simulates a local turn counter and does not interact with the actual agent loop or logic. This means it does not verify that the agent itself enforces the max turns protection, which could allow runaway loops in production if the agent's logic is incorrect.
Recommendation:
Integrate this test with the real agent loop or use a mock agent to ensure that the max turns protection is enforced in practice, not just in a simulated counter.
```go
	return Output{SessionID: input.SessionID}, fmt.Errorf("%w: %w", agent.ErrInvalidSession, err)
}
```
Error Wrapping Requires Go 1.20+
The error wrapping in fmt.Errorf("%w: %w", agent.ErrInvalidSession, err) uses two %w verbs. This is supported only on Go 1.20 and later; on earlier versions fmt.Errorf wraps a single error, and a second %w produces a malformed error value. If the module targets older Go versions, wrap one error and add the remaining context as a string.
Recommended Solution:
Use a single %w and add context as a string:

```go
return Output{SessionID: input.SessionID}, fmt.Errorf("invalid session: %w", err)
```

Or, if you want to wrap both errors, consider using errors.Join (Go 1.20+):

```go
return Output{SessionID: input.SessionID}, fmt.Errorf("invalid session: %w", errors.Join(agent.ErrInvalidSession, err))
```

```go
numConcurrentQueries := 5
var wg sync.WaitGroup
wg.Add(numConcurrentQueries)

ctx, sessionID, branch := newInvocationContext(context.Background(), framework.SessionID)
```
Potential Data Race in Concurrent Test
The concurrent test uses the same ctx, sessionID, and branch for all goroutines. If the underlying chat agent or session store is not explicitly designed for concurrent access with the same session, this may lead to data races or inconsistent state.
Recommendation:
Consider creating separate sessions or branches per goroutine, or ensure that the chat agent and session store are thread-safe for concurrent access with the same session.
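The per-goroutine isolation idea can be sketched as follows; isolatedBranches is an illustrative helper, not the project's API:

```go
package main

import (
	"fmt"
	"sync"
)

// isolatedBranches gives each goroutine its own branch identifier, so
// concurrent queries never contend on shared session state.
func isolatedBranches(n int) []string {
	var wg sync.WaitGroup
	branches := make([]string, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only its own slot and uses its own name.
			branches[i] = fmt.Sprintf("branch-%d", i)
		}(i)
	}
	wg.Wait()
	return branches
}

func main() {
	fmt.Println(isolatedBranches(5))
}
```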