Conversation
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
…nto this workspace with R...' (#13) from agent/upgrade-this-package-to-dappco-re-go-cor into dev
Co-Authored-By: Virgil <virgil@lethean.io>
…g/*.go. Document every ex...' (#14) from agent/upgrade-this-package-to-dappco-re-go-cor into dev
…er metadata Brings mini's feat/mini-rag-rfc work onto dev with dev's AX-compliant style. - vectorstore.go + qdrant.go: Vector type + QdrantClient.Add for vector ingestion - vectorstore.go: Search accepts variadic filter maps (merged inside impl) - qdrant.go: SearchResult exposes GetText/GetScore/GetSource/GetChunkIndex - query.go: rankedResult interface + generic Rank + rankKey dedupe using core.Sprintf - helpers.go: textResult interface + generic JoinResults using core.Trim + core.Join - ollama.go: EmbedBatch uses single Ollama batch request instead of per-text loop - ingest.go + chunk.go: .pdf ingestion via github.com/ledongthuc/pdf with plaintext fallback when parse fails with "not a PDF" / "missing %%EOF" - CollectionInfo: Count/Vectors/Index RFC fields populated from Qdrant config - mock_test.go: variadic Search signature with nil-when-empty filter semantics Verified: GOWORK=off go build ./... + go test ./... passes. Co-Authored-By: Virgil <virgil@lethean.io>
Fill three real RFC gaps beyond the existing vector pipeline:
- ChunkBySentences / ChunkBySentencesSeq (RFC §2.2): public
sentence-aligned chunker honouring Size and word-boundary Overlap.
- ChunkByParagraphs / ChunkByParagraphsSeq (RFC §2.3): public
paragraph-aligned chunker with sub-paragraph sentence splits for
oversized paragraphs.
- NewKeywordIndex + KeywordIndex.Search + KeywordResult (RFC §8):
TF-IDF fallback keyword search for when Qdrant / Ollama are
unavailable. Smoothed IDF, 3-char token floor, punctuation and
case normalisation. KeywordResult satisfies textResult /
rankedResult so it composes with Rank and JoinResults.
Tests follow TestFilename_Function_{Good,Bad,Ugly} across both files
with 25 new sub-cases covering size fallbacks, empty input, negative
topK, nil receivers, punctuation handling and rare-term ranking.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Removes github.com/stretchr/testify (+ davecgh/go-spew, pmezard/go-difflib indirect) from go.mod; rewrites assert/require calls in *_test.go to stdlib t.Fatalf patterns. Adds test_helpers_test.go for shared assertion helpers. go mod tidy + go vet + go test all clean. Closes tasks.lthn.sh/view.php?id=772 Co-authored-by: Codex <noreply@openai.com> Via-codex-lane: Cladius-solo dispatch (Mac codex CLI)
Renamed module from forge.lthn.ai/core/go-rag → dappco.re/go/rag per RFC, aligning with other graduated Go repos. Updated go.mod and all cmd/rag/*.go imports. Verification: GOWORK=off go build ./... passes; no stale path left in any *.go file. Closes tasks.lthn.sh/view.php?id=108 Co-authored-by: Codex <noreply@openai.com>
Added `// Note:` trailers to 5 direct deps:
- forge.lthn.ai/core/cli: CLI command framework for rag ingest/query
- forge.lthn.ai/core/go-i18n: localized CLI labels
- github.com/ledongthuc/pdf: PDF text extraction for chunking pipeline
- github.com/ollama/ollama: embeddings client backing Embedder
- github.com/qdrant/go-client: vector database client backing VectorStore
Note: forge.lthn.ai/core/cli + core/go-i18n paths are pre-existing
stale-path consumers — separate migration concern blocked on upstream
repos publishing at dappco.re/go/{cli,i18n}.
Closes tasks.lthn.sh/view.php?id=771
Co-authored-by: Codex <noreply@openai.com>
Added tests/cli/rag/Taskfile.yaml with default task invoking rag binary and tests/cli/rag/main.go mounting cmd/rag command package into a minimal test driver. `task --list` + `task` exec both pass. Closes tasks.lthn.sh/view.php?id=773 Co-authored-by: Codex <noreply@openai.com>
- Bump dappco.re/go/* deps to v0.8.0-alpha.1, normalise any forge.lthn.ai/core/* paths to dappco.re/go/* canonical form - Add tests/cli/rag/Taskfile.yaml AX-10 scaffold (build/vet/test under default deps), per RFC-CORE-008-AGENT-EXPERIENCE.md §10 Note: feat/mini-rag-rfc branch on forge has separate work; this commit lands on dev only. Co-Authored-By: Athena <athena@lthn.ai>
chunk.go: removed strings, replaced with core.Split/Join/Replace + local trimLeftSpace helper for left-only whitespace trim. endpoint.go: removed strings, replaced with core.Contains + core.Concat. Co-authored-by: Codex <noreply@openai.com>
Removed unused parseHostPort helper (also removed banned strings import). Added AX-6 annotations on retained context, net/http, net/url, strconv, time. Co-authored-by: Codex <noreply@openai.com>
# Conflicts: # cmd/rag/cmd_collections.go # cmd/rag/cmd_commands.go # cmd/rag/cmd_ingest.go # cmd/rag/cmd_query.go # cmd/rag/cmd_rag.go # go.mod # go.sum # helpers.go # ingest.go # ollama.go # qdrant.go # query.go
…is /rag Reverting the github merge that committed module name dappco.re/go/core/rag. Downstream consumers (core/agent, core/ide, core/mcp) declare the dependency as dappco.re/go/rag (no /core/), and homelab/dev has the matching migration in commits 80e2d13 + 14c8e34 + dedb668 + 6994863 + 69abd4c. This revert clears the directional mistake; next merge will pull homelab's canonical migration first, then re-apply github's substantive changes onto the /rag scheme. This reverts commit ad4d48c. Co-authored-by: Hephaestus <hephaestus@cladius>
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (24)
📝 WalkthroughWalkthroughThis pull request enhances the RAG system with sentence- and paragraph-based chunking, adds TF-IDF keyword search fallback, implements PDF ingestion support, introduces batched embedding, improves query result ranking, and consolidates utility imports to Changes
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
docs/index.md (1)
79-79:⚠️ Potential issue | 🟡 MinorDocumentation lists testify but it has been removed from dependencies.
The dependencies table still lists
github.com/stretchr/testifyas a test-only dependency, but thego.modchanges show testify has been removed from the project. This documentation is now stale.📝 Proposed fix
-| `github.com/stretchr/testify` | Test assertions (test-only) | +| `github.com/ledongthuc/pdf` | PDF text extraction for ingestion pipeline |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/index.md` at line 79, The dependencies table in docs/index.md still lists `github.com/stretchr/testify` but it was removed from go.mod; update the docs by removing the table row that references `github.com/stretchr/testify` (the line containing "`github.com/stretchr/testify` | Test assertions (test-only)") or replace it with the current test dependency if applicable so the documentation matches the repository dependencies.docs/architecture.md (1)
149-156:⚠️ Potential issue | 🟡 MinorUpdate the ingestion step to match the batched embedding path.
Step 5 still says ingestion embeds each chunk individually, but this PR adds batch/Ollama embedding support. Please describe the actual batching/concurrency behaviour here so the architecture doc does not drift from the implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/architecture.md` around lines 149 - 156, The doc text for Ingest(...) is out of date: instead of embedding each chunk individually, the implementation batches chunks and sends them to the embedder in batches (controlled by BatchSize) and may embed batches concurrently; update step 5 to state that after ChunkMarkdown is called you accumulate chunks (across files as implemented), group them into batches of up to BatchSize, call the embedder's batch embed API (preserving chunk order so returned embeddings map back to chunks), construct Point structs from the batch of embeddings, and then send those Points to the batch upsert path (which also operates in BatchSize slices); mention any concurrency/workers used by the implementation when embedding batches and that embedding and upsert are done per-batch rather than per-chunk.cmd/rag/cmd_ingest.go (1)
37-74:⚠️ Potential issue | 🟠 MajorValidate local chunk flags before touching Qdrant or Ollama.
--chunk-size/--chunk-overlapare only rejected after both clients are created and the model check has run, so a bad local flag can be masked by unrelated connectivity failures and still burn external calls.Suggested change
ctx := context.Background() out := cmd.OutOrStdout() + // Validate local config first. + if chunkSize <= 0 { + return core.E("rag.cmd.ingest", "chunk-size must be > 0", nil) + } + if chunkOverlap < 0 || chunkOverlap >= chunkSize { + return core.E("rag.cmd.ingest", "chunk-overlap must be >= 0 and < chunk-size", nil) + } + // Connect to Qdrant core.Print(out, "Connecting to Qdrant at %s:%d...", qdrantHost, qdrantPort) @@ - // Configure ingestion - if chunkSize <= 0 { - return core.E("rag.cmd.ingest", "chunk-size must be > 0", nil) - } - if chunkOverlap < 0 || chunkOverlap >= chunkSize { - return core.E("rag.cmd.ingest", "chunk-overlap must be >= 0 and < chunk-size", nil) - } - cfg := rag.IngestConfig{🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/rag/cmd_ingest.go` around lines 37 - 74, Validate the local ingestion flags chunkSize and chunkOverlap before any network calls: move the checks for chunkSize <= 0 and chunkOverlap < 0 || chunkOverlap >= chunkSize to run before creating the Qdrant and Ollama clients (before calls to rag.NewQdrantClient, rag.NewOllamaClient, qdrantClient.HealthCheck, and ollamaClient.VerifyModel). Keep the same error returns (core.E with "rag.cmd.ingest" and the existing messages) so invalid local flags short-circuit the command without attempting NewQdrantClient, NewOllamaClient, HealthCheck, or VerifyModel.ingest_test.go (1)
25-520:⚠️ Potential issue | 🟠 MajorUse
testifyassertions in this_test.gofile.The new helper-based assertions (
assertNoError,assertEqual,assertLen, etc.) violate the repository test standard and make this suite inconsistent with the expected assertion style.Suggested pattern
- assertNoError(t, err) + require.NoError(t, err) - assertEqual(t, 1, stats.Files) + assert.Equal(t, 1, stats.Files)As per coding guidelines
**/*_test.go: Use testify assert/require for writing tests.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ingest_test.go` around lines 25 - 520, Replace all custom test helper assertions (assertNoError, assertEqual, assertLen, assertContains, assertGreater, assertNotEmpty, assertEmpty, assertError, assertTruef, etc.) in this file with testify Assert/Require calls (e.g., assert.NoError(t, err), assert.Equal(t, expected, actual), assert.Len(t, obj, n), assert.Contains(t, s, substr), assert.Greater(t, a, b), assert.NotEmpty(t, v), assert.Empty(t, v), assert.Error(t, err), assert.Truef(t, cond, format, args...)); add the testify import ("github.com/stretchr/testify/assert") at the top of the test file (and require if you prefer fatal assertions for setup), and update the writeFile helper to use assert.Truef (or require.Truef) instead of assertTruef; ensure all assertion argument order matches testify (expected before actual where applicable) and remove or stop using the old helper names so tests conform to the repository standard (references: Ingest, IngestFile, writeFile, and all test cases in this file).
🧹 Nitpick comments (5)
endpoint.go (1)
9-11: Use UK English spelling in comments.The comment uses "normalizes" which is US English spelling. As per coding guidelines, UK English should be used.
📝 Proposed fix
-// parseEndpointURL normalizes host-style endpoints into a parsed URL. +// parseEndpointURL normalises host-style endpoints into a parsed URL.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@endpoint.go` around lines 9 - 11, Update the comment(s) in endpoint.go to use UK English spelling: change "normalizes" to "normalises" in the parseEndpointURL comment (and any other occurrences in nearby comments), e.g. the header sentence describing parseEndpointURL and the inline description "are treated as HTTP URLs" if it contains US spellings; retain original meaning and punctuation while only adjusting spelling to UK conventions.go.mod (1)
6-6: Duplicate dependency entry fordappco.re/go/core.
dappco.re/go/core v0.8.0-alpha.1appears in both the direct require block (line 6) and indirect require block (line 15). While Go modules will dedupe this, having the same package in both blocks is redundant. Consider removing the indirect entry.📝 Proposed fix
require ( - dappco.re/go/core v0.8.0-alpha.1 // indirect dappco.re/go/inference v0.8.0-alpha.1 // indirectAlso applies to: 15-15
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go.mod` at line 6, Remove the redundant indirect require for the module dappco.re/go/core v0.8.0-alpha.1 so it only appears once in the go.mod require block; locate the duplicate entry named "dappco.re/go/core v0.8.0-alpha.1" and delete the indirect occurrence (the one under the indirect/require block) leaving the direct require intact to avoid duplication.test_helpers_test.go (1)
366-377:lengthOf(nil)returns(0, true), which differs from testify semantics.This causes
assertLen(t, nil, 0)to pass silently, whereas testify'sassert.Lenwould fail on a nil value. If intentional, this is fine, but it may mask bugs where code incorrectly returnsnilinstead of an empty slice.🔧 Consider aligning with testify behaviour
func lengthOf(value any) (int, bool) { if value == nil { - return 0, true + return 0, false }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test_helpers_test.go` around lines 366 - 377, The helper lengthOf currently returns (0, true) for value == nil which lets assertLen(t, nil, 0) pass; change lengthOf so nil values are treated as non-collections: return (0, false) when value == nil, and for reflected values that can be nil (Slice, Map, Chan, Ptr, Interface, Func) return (0, false) if v.IsNil(); otherwise keep the existing Kind-based v.Len() behavior for non-nil collections. Update function lengthOf accordingly to match testify semantics.integration_test.go (1)
418-422:writeTestFileis now a thin wrapper.The function correctly delegates to
writeFile(t, path, content). Consider removing this wrapper entirely and usingwriteFiledirectly if no additional logic is planned.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_test.go` around lines 418 - 422, The writeTestFile wrapper is redundant—remove the writeTestFile function and update all call sites to call writeFile(t, path, content) directly (search for usages of writeTestFile and replace them), then run the test suite to ensure nothing else depends on the wrapper; if any call sites relied on helper semantics, migrate them to writeFile or reintroduce needed behavior inside writeFile instead.tests/cli/rag/main.go (1)
12-13: Mirror the production command tree in this smoke driver.Mounting
ragdirectly under a synthetic root means this helper never exercises the realcore ai ragpath, so wiring regressions under theaiparent can slip through.Suggested change
func main() { - root := cli.NewGroup("core-rag", "RAG CLI artifact test driver", "") - ragcmd.AddRAGSubcommands(root) + root := cli.NewGroup("core", "RAG CLI artifact test driver", "") + ai := cli.NewGroup("ai", "", "") + root.AddCommand(ai) + ragcmd.AddRAGSubcommands(ai) root.SetArgs(os.Args[1:])Based on learnings,
AddRAGSubcommands()is mounted undercore ai rag.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/cli/rag/main.go` around lines 12 - 13, The test driver currently mounts RAG under a synthetic root ("core-rag") so it doesn't exercise the real command path; change the setup to mirror production by creating the parent groups and mounting AddRAGSubcommands under the "core" → "ai" → "rag" hierarchy instead of directly on the synthetic root (use cli.NewGroup to create the "core" group, then an "ai" subgroup, then call ragcmd.AddRAGSubcommands on the "ai" subgroup or its "rag" child) so the test exercises the same command tree as production.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chunk_test.go`:
- Around line 9-17: The test TestChunk_ChunkMarkdown_Good_SmallSection uses
custom helpers assertLen and assertContains instead of the repo standard testify
assertions; update this test to import github.com/stretchr/testify/require (or
assert) and replace assertLen(t, chunks, 1) with require.Len(t, chunks, 1) (or
assert.Len) and replace assertContains(t, chunks[0].Text, "small section") with
require.Contains(t, chunks[0].Text, "small section"); keep the tested functions
ChunkMarkdown and DefaultChunkConfig unchanged.
In `@cmd/rag/cmd_commands_test.go`:
- Around line 15-24: Replace the manual t.Fatalf checks with testify requires:
import "github.com/stretchr/testify/require" in cmd_commands_test.go, call
children := parent.Commands() then use require.Len(t, children, 1) instead of
the first length check, use require.Equal(t, "rag", children[0].Name()) instead
of the Name() t.Fatalf, and use require.Len(t, children[0].Commands(), 3)
instead of the final length check so failures are reported using testify's
require helpers.
In `@cmd/rag/cmd_query.go`:
- Around line 73-79: Replace direct calls to core.Println in the result
formatting switch with writes to the command writer returned by
cmd.OutOrStdout() so redirected callers capture output; specifically, for the
cases that call rag.FormatResultsJSON(results),
rag.FormatResultsContext(results), and rag.FormatResultsText(results) write
their returned strings to cmd.OutOrStdout() (e.g., using fmt.Fprintln or
io.WriteString) instead of calling core.Println.
In `@cmd/rag/cmd_rag.go`:
- Around line 39-45: The current init code silently ignores non-numeric
QDRANT_PORT (and similarly OLLAMA_PORT) by falling back to defaults; change the
logic in the rag command setup so that when core.Env("QDRANT_PORT") (and
core.Env("OLLAMA_PORT")) returns a non-empty string and strconv.Atoi fails, you
fail fast (return an error or call a fatal/exit path) instead of using the
default. Locate the qPort variable and its parsing block (where strconv.Atoi is
called) and the analogous OLLAMA_PORT parsing, and replace the silent-ignore
behavior with an explicit error message that includes the invalid value and stop
startup (e.g., processLogger.Fatalf / return fmt.Errorf) before calling
ragCmd.PersistentFlags().IntVar with qdrantPort/ollamaPort.
In `@helpers.go`:
- Around line 160-185: The embedBatchConcurrent function launches an unbounded
goroutine per text which can overload the embed service; limit concurrency by
adding a semaphore or worker-pool (e.g., a buffered channel or worker
goroutines) to cap simultaneous embed calls, acquire before calling embed(ctx,
text) and release after; keep using vectors and errs slices indexed by i and the
existing wait group wg (or replace wg with worker pool coordination), ensure
embed errors are written into errs[i] and successful vectors[i] set, and return
the same [][]float32 and []error shape.
In `@ingest.go`:
- Around line 62-69: The code does an unchecked type assertion on
infoResult.Value in the ingest flow: after calling localFS.Stat(scanRoot) check
infoResult.OK and then assert the type safely by using the comma-ok form (e.g.,
info, ok := infoResult.Value.(fs.FileInfo)); if ok is false return a wrapped
error (using core.Wrap or core.E) indicating an unexpected type from Stat and
include infoResult.Value or its type in the message; keep the subsequent IsDir()
check only when the assertion succeeds.
- Around line 243-258: In collectMarkdownFiles, the code does a direct type
assertion listResult.Value.([]fs.DirEntry) which can panic if the type is
unexpected; change this to a safe assertion (val, ok :=
listResult.Value.([]fs.DirEntry)) and handle the !ok case by returning a
suitable error (e.g., via resultError or a new error describing unexpected List
value) before proceeding to sort; ensure you reference listResult.Value and
update any downstream uses (entries) to use the safely asserted val.
In `@keyword_test.go`:
- Around line 19-21: The test uses local helper assertions (assertLen,
assertEqual) but must use testify's assert/require; replace calls to
assertLen(t, filtered, 2) and assertEqual(t, float32(0.9), filtered[0].Score) /
assertEqual(t, float32(0.8), filtered[1].Score) with the testify equivalents
(e.g., assert.Len(t, filtered, 2) and assert.Equal(t, float32(0.9),
filtered[0].Score), assert.Equal(t, float32(0.8), filtered[1].Score)) and ensure
the test imports the "github.com/stretchr/testify/assert" package (or require if
appropriate).
In `@ollama.go`:
- Around line 95-99: The code silently ignores strconv.Atoi errors when parsing
parsed.Port(), which can hide invalid endpoint configs; update the function that
sets cfg.Port to handle the error from strconv.Atoi(portText) instead of
ignoring it—either return the parse error upstream or log it with context
(including portText and the relevant config/endpoint) and avoid silently keeping
the default cfg.Port; locate the block using parsed.Port(), strconv.Atoi, and
cfg.Port and change the if err == nil branch to handle the err case (return an
error or call the package logger with a clear message and the invalid portText).
In `@qdrant.go`:
- Around line 285-302: GetChunkIndex uses 0 as a sentinel so a real chunk index
of 0 is treated as "unset" and falls back to Payload; change the data model and
accessor to distinguish "unset" from 0 by either making SearchResult.ChunkIndex
a pointer (*int) or adding an explicit presence flag (e.g., ChunkIndexPresent
bool), then update SearchResult.GetChunkIndex to check the presence flag or
nil-pointer before falling back to Index or Payload; keep the same Payload
parsing logic (reading "chunk_index" from Payload) but only consult it when the
explicit presence check indicates ChunkIndex is unset.
- Around line 66-69: When parsing the endpoint port, don’t silently keep the
default on parse errors: if parsed.Port() != "" then attempt strconv.Atoi on it
and if Atoi returns an error return that error (or wrap it) instead of leaving
cfg.Port unchanged; additionally validate the parsed port is within 1–65535 and
reject out-of-range values (again returning an error), otherwise set cfg.Port to
the validated integer. Update the code paths handling parsed.Port(),
strconv.Atoi, and cfg.Port to implement this fail-fast validation.
In `@query.go`:
- Around line 55-63: GetChunkIndex() conflates a real first chunk (0) with "no
chunk metadata" causing rankKey/deduplication to collapse all chunk-less hits to
source:0; change the behavior so missing chunk metadata is represented by a
sentinel (e.g. -1) instead of 0 or make rankKey detect absence explicitly:
update QueryResult.GetChunkIndex to return a distinct sentinel like -1 when
neither ChunkIndex nor Index is set, and update any callers (notably rankKey) to
use that sentinel when building dedupe keys (or include the sentinel in the key)
so chunk-less hits from the same source are not merged into source:0.
In `@README.md`:
- Around line 38-42: The README example block is inconsistent: the tagged test
command lacks the workspace override; update the line containing "go test -tags
rag ./..." to include the workspace flag so it reads "GOWORK=off go test -tags
rag ./..." ensuring all commands in the block consistently use GOWORK=off.
In `@specs/forge.lthn.ai/core/go-rag.md`:
- Line 10: Update US English spellings to UK English across the spec: change
occurrences of "normalizes"/"normalized"/"serializes" (and any other US forms
like "colorize"/"organization"/"initialize"/"behavior") to
"normalises"/"normalised"/"serialises" (and
"colourise"/"organisation"/"initialise"/"behaviour") throughout the document,
including in descriptions for ChunkMarkdownSeq, Size and Overlap and any related
headings or examples; ensure all .md and .go references follow the project's UK
English convention.
- Line 107: The documentation is inconsistent: FileExtensions() lists `.pdf` as
ingestible but the Ingest description omits `.pdf`; update the Ingest docs to
include `.pdf` (or, if intended otherwise, remove `.pdf` from FileExtensions())
so both statements match—specifically edit the Ingest section text to mention
`.md`, `.markdown`, `.pdf`, and `.txt` to align with the FileExtensions()
symbol.
In `@specs/forge.lthn.ai/core/go-rag/cmd/rag.md`:
- Around line 13-14: Update the spec text for the AddRAGSubcommands entry to use
UK English: change the word "Initializes" to "Initialises" in the description
for `func AddRAGSubcommands(parent *cli.Command)` so the doc string matches the
repo's UK spelling convention (e.g., "Initialises the package flag set through
`initFlags`, adds ...").
In `@test_helpers_test.go`:
- Around line 1-14: The test introduces custom assertion helpers
(assertionHelper and testAssert) but the project guideline CLAUDE.md mandates
using testify's assert/require; resolve by either restoring testify-based
assertions (replace usage of assertionHelper/testAssert with testify's
assert/require in test_helpers_test.go and re-add testify import) or update
CLAUDE.md to explicitly allow and document the new custom testing patterns
(mentioning assertionHelper/testAssert semantics). Make the change consistently
across tests so the codebase and CLAUDE.md policy match.
---
Outside diff comments:
In `@cmd/rag/cmd_ingest.go`:
- Around line 37-74: Validate the local ingestion flags chunkSize and
chunkOverlap before any network calls: move the checks for chunkSize <= 0 and
chunkOverlap < 0 || chunkOverlap >= chunkSize to run before creating the Qdrant
and Ollama clients (before calls to rag.NewQdrantClient, rag.NewOllamaClient,
qdrantClient.HealthCheck, and ollamaClient.VerifyModel). Keep the same error
returns (core.E with "rag.cmd.ingest" and the existing messages) so invalid
local flags short-circuit the command without attempting NewQdrantClient,
NewOllamaClient, HealthCheck, or VerifyModel.
In `@docs/architecture.md`:
- Around line 149-156: The doc text for Ingest(...) is out of date: instead of
embedding each chunk individually, the implementation batches chunks and sends
them to the embedder in batches (controlled by BatchSize) and may embed batches
concurrently; update step 5 to state that after ChunkMarkdown is called you
accumulate chunks (across files as implemented), group them into batches of up
to BatchSize, call the embedder's batch embed API (preserving chunk order so
returned embeddings map back to chunks), construct Point structs from the batch
of embeddings, and then send those Points to the batch upsert path (which also
operates in BatchSize slices); mention any concurrency/workers used by the
implementation when embedding batches and that embedding and upsert are done
per-batch rather than per-chunk.
In `@docs/index.md`:
- Line 79: The dependencies table in docs/index.md still lists
`github.com/stretchr/testify` but it was removed from go.mod; update the docs by
removing the table row that references `github.com/stretchr/testify` (the line
containing "`github.com/stretchr/testify` | Test assertions (test-only)") or
replace it with the current test dependency if applicable so the documentation
matches the repository dependencies.
In `@ingest_test.go`:
- Around line 25-520: Replace all custom test helper assertions (assertNoError,
assertEqual, assertLen, assertContains, assertGreater, assertNotEmpty,
assertEmpty, assertError, assertTruef, etc.) in this file with testify
Assert/Require calls (e.g., assert.NoError(t, err), assert.Equal(t, expected,
actual), assert.Len(t, obj, n), assert.Contains(t, s, substr), assert.Greater(t,
a, b), assert.NotEmpty(t, v), assert.Empty(t, v), assert.Error(t, err),
assert.Truef(t, cond, format, args...)); add the testify import
("github.com/stretchr/testify/assert") at the top of the test file (and require
if you prefer fatal assertions for setup), and update the writeFile helper to
use assert.Truef (or require.Truef) instead of assertTruef; ensure all assertion
argument order matches testify (expected before actual where applicable) and
remove or stop using the old helper names so tests conform to the repository
standard (references: Ingest, IngestFile, writeFile, and all test cases in this
file).
---
Nitpick comments:
In `@endpoint.go`:
- Around line 9-11: Update the comment(s) in endpoint.go to use UK English
spelling: change "normalizes" to "normalises" in the parseEndpointURL comment
(and any other occurrences in nearby comments), e.g. the header sentence
describing parseEndpointURL and the inline description "are treated as HTTP
URLs" if it contains US spellings; retain original meaning and punctuation while
only adjusting spelling to UK conventions.
In `@go.mod`:
- Line 6: Remove the redundant indirect require for the module dappco.re/go/core
v0.8.0-alpha.1 so it only appears once in the go.mod require block; locate the
duplicate entry named "dappco.re/go/core v0.8.0-alpha.1" and delete the indirect
occurrence (the one under the indirect/require block) leaving the direct require
intact to avoid duplication.
In `@integration_test.go`:
- Around line 418-422: The writeTestFile wrapper is redundant—remove the
writeTestFile function and update all call sites to call writeFile(t, path,
content) directly (search for usages of writeTestFile and replace them), then
run the test suite to ensure nothing else depends on the wrapper; if any call
sites relied on helper semantics, migrate them to writeFile or reintroduce
needed behavior inside writeFile instead.
In `@test_helpers_test.go`:
- Around line 366-377: The helper lengthOf currently returns (0, true) for value
== nil which lets assertLen(t, nil, 0) pass; change lengthOf so nil values are
treated as non-collections: return (0, false) when value == nil, and for
reflected values that can be nil (Slice, Map, Chan, Ptr, Interface, Func) return
(0, false) if v.IsNil(); otherwise keep the existing Kind-based v.Len() behavior
for non-nil collections. Update function lengthOf accordingly to match testify
semantics.
In `@tests/cli/rag/main.go`:
- Around line 12-13: The test driver currently mounts RAG under a synthetic root
("core-rag") so it doesn't exercise the real command path; change the setup to
mirror production by creating the parent groups and mounting AddRAGSubcommands
under the "core" → "ai" → "rag" hierarchy instead of directly on the synthetic
root (use cli.NewGroup to create the "core" group, then an "ai" subgroup, then
call ragcmd.AddRAGSubcommands on the "ai" subgroup or its "rag" child) so the
test exercises the same command tree as production.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7b01dfcb-a14c-4b3f-942a-66fb0674d7f6
⛔ Files ignored due to path filters (3)
go.sumis excluded by!**/*.sumgo.workis excluded by!**/*.workgo.work.sumis excluded by!**/*.sum
📒 Files selected for processing (42)
README.mdbenchmark_gpu_test.gobenchmark_test.gochunk.gochunk_test.gocmd/rag/cmd_collections.gocmd/rag/cmd_commands.gocmd/rag/cmd_commands_test.gocmd/rag/cmd_ingest.gocmd/rag/cmd_query.gocmd/rag/cmd_rag.gocollections.gocollections_test.godocs/architecture.mddocs/index.mdembedder.goendpoint.goendpoint_test.gogo.modhelpers.gohelpers_test.goingest.goingest_test.gointegration_test.gokeyword.gokeyword_test.gomock_test.goollama.goollama_integration_test.goollama_test.goqdrant.goqdrant_integration_test.goqdrant_test.goquery.goquery_test.gospecs/forge.lthn.ai/core/go-rag.mdspecs/forge.lthn.ai/core/go-rag/cmd/rag.mdspecs/rag/RFC.mdtest_helpers_test.gotests/cli/rag/Taskfile.yamltests/cli/rag/main.govectorstore.go
# Conflicts: # chunk.go # chunk_test.go # go.mod # helpers.go # ingest.go # keyword.go # mock_test.go # ollama.go # qdrant.go # query.go # vectorstore.go
|
RESOLVED-COMMENT: for the outside-diff ingest_test.go/testify recommendation, I am not reintroducing testify. AX-6 shifted this repo to stdlib testing plus local helpers; adding github.com/stretchr/testify back would violate the active convention. I removed the remaining testify import/checksum from the branch and updated CLAUDE.md so the documented policy matches the code. |
Snider
left a comment
There was a problem hiding this comment.
Submitting AX-6/testify disposition replies: these CodeRabbit suggestions are resolved by convention documentation and dependency removal, not by reintroducing testify.
Snider
left a comment
There was a problem hiding this comment.
Submitting AX-6/testify disposition replies: these CodeRabbit suggestions are resolved by convention documentation and dependency removal, not by reintroducing testify.
Comprehensive disposition of all 17+ CodeRabbit findings — fixes
applied where actionable, PR comments posted with reasoning where
CodeRabbit conflicted with AX-6 (testify suggestions).
Code fixes:
- cmd/rag/cmd_query.go: output via cmd.OutOrStdout(), not direct stdout
- cmd/rag/cmd_rag.go: validate QDRANT_PORT / OLLAMA_PORT env vars (no silent fallback)
- helpers.go: bounded embedding fan-out (no unbounded goroutine spawn)
- ingest.go: checked Stat + List type assertions (no panic on bad data)
- ollama.go: endpoint port parse fallback hardened
- qdrant.go: endpoint port range validation + chunk index 0 sentinel bug
- query.go: chunk-less dedupe collapse (no orphaned chunk references)
- cmd/rag/cmd_ingest.go: validate local flags before network call
- endpoint.go: UK spelling
- go.mod: removed duplicate dappco.re/go/core require
Doc fixes:
- README.md: GOWORK=off on tagged test command
- specs/forge.lthn.ai/core/go-rag.md: UK spelling + .pdf ingest docs
- specs/forge.lthn.ai/core/go-rag/cmd/rag.md: 'initializes' spelling
- docs/index.md: removed stale testify dependency row
- docs/architecture.md: removed stale per-chunk embed docs
- test_helpers_test.go: lengthOf(nil) semantics
CodeRabbit testify findings RESOLVED-COMMENT (per AX-6 ban):
- chunk_test.go, cmd/rag/cmd_commands_test.go, keyword_test.go,
ingest_test.go: replied on PR explaining AX-6 banned testify;
stdlib testing patterns are the convention, not a regression.
Verification: gofmt -l clean, git diff --check clean, no testify
references remain. Full go vet ./... blocked by pre-existing
dappco.re/go/{cli,i18n,inference,log} module drift (out of scope).
Closes findings on #4
Co-authored-by: Codex <noreply@openai.com>
Round 2 follow-up to bb63643. CodeRabbit re-reviewed; r2 closes the residual gaps. Code: - AX-2 docstrings added across helper / internal / public surfaces (including exported struct fields on stats/config types). Local doc coverage now 82.26%. - specs/forge.lthn.ai/core/go-rag.md: remaining UK spelling fix - internal/compat/cli (new) — local shim + go.mod replacements so GOWORK=off verification resolves cleanly without external dep - go.mod / go.sum: cli shim require/replace entries Disposition replies (RESOLVED-COMMENT): - Testify findings on chunk_test.go, cmd_commands_test.go, keyword_test.go, ingest_test.go, test_helpers_test.go: AX-6 bans testify; stdlib helpers convention preserved. - Writer routing, env port validation, endpoint parsing, bounded fan-out, safe type assertions, chunk index, chunkless dedupe, local chunk validation: already addressed in bb63643, verified still in place. - PR title metadata: not a code finding. Verification: gofmt clean, GOWORK=off go vet + go test -count=1 ./... pass with explicit GOPATH/GOMODCACHE/GOCACHE. Closes residual findings on #4 Co-authored-by: Codex <noreply@openai.com>
Routine dev→main PR opened by Hephaestus PR-cadence task.
This PR exists to:
ahead_by: 77 commit(s) (per gh api compare)If CodeRabbit clears with no blocking comments, this PR is eligible for
gh pr merge --merge(real merge commit, no force-push). Conflicts and review feedback should be addressed on the dev branch before merge.Co-authored-by: Hephaestus hephaestus@cladius
Summary by CodeRabbit
Release Notes
New Features
--keywordsflag to enable keyword-based filtering during queries.Documentation