refactor: centralize ingest and upload logic - BED-8313#2800
Conversation
📝 WalkthroughWalkthroughThis PR refactors ingest file handling by extracting temp file prefix logic into a reusable utility, introducing a dedicated ingest storage module to centralize JSON and ZIP file extraction, and updating the task orchestrator to delegate to the new shared extraction function. ChangesIngest File Handling Refactor
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/api/src/services/graphify/ingest_storage_test.go (1)
118-124: ⚡ Quick winMake content assertions order-independent.
Line 118 and Line 122 assume
fileDataordering, which can make this test flaky if extraction iteration order changes. Assert byIngestFileData.Nameinstead of fixed indexes.Proposed test hardening
- domainsContent, err := os.ReadFile(fileData[0].Path) - require.NoError(t, err) - require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), domainsContent) - - usersContent, err := os.ReadFile(fileData[1].Path) - require.NoError(t, err) - require.Equal(t, plainFileContent, usersContent) + contentsByName := map[string][]byte{} + for _, entry := range fileData { + content, err := os.ReadFile(entry.Path) + require.NoError(t, err) + contentsByName[entry.Name] = content + } + + require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), contentsByName["domains.json"]) + require.Equal(t, plainFileContent, contentsByName["users.json"])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/api/src/services/graphify/ingest_storage_test.go` around lines 118 - 124, The test currently assumes a stable ordering of fileData by indexing fileData[0] and fileData[1]; make the assertions order-independent by locating entries in fileData by their IngestFileData.Name (e.g., "domains.json" and the user file name) and then reading from the found.Path before asserting contents; update the variables/domainsContent and usersContent checks to use the Path from the matched entry and compare against the expected []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`) and plainFileContent respectively so the test no longer depends on iteration order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/api/src/services/graphify/ingest_storage_test.go`:
- Around line 118-124: The test currently assumes a stable ordering of fileData
by indexing fileData[0] and fileData[1]; make the assertions order-independent
by locating entries in fileData by their IngestFileData.Name (e.g.,
"domains.json" and the user file name) and then reading from the found.Path
before asserting contents; update the variables/domainsContent and usersContent
checks to use the Path from the matched entry and compare against the expected
[]byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`) and
plainFileContent respectively so the test no longer depends on iteration order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 21912fb1-8a8e-452f-ab00-f1c14055c876
📒 Files selected for processing (5)
cmd/api/src/services/graphify/ingest_storage.gocmd/api/src/services/graphify/ingest_storage_test.gocmd/api/src/services/graphify/tasks.gocmd/api/src/services/upload/upload.gocmd/api/src/services/upload/upload_test.go
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cmd/api/src/services/graphify/ingest_storage_test.go (1)
118-124: 💤 Low valueTest assumes deterministic file ordering from ZIP archive.
The assertions at lines 118-124 assume
fileData[0]is "domains.json" andfileData[1]is "users.json". While Go'sarchive/ziptypically iterates files in the order they appear in the central directory (which matches write order), this is an implementation detail. Consider making this test more robust by finding files by name rather than relying on index ordering.♻️ Suggested approach
// Find files by name for more robust assertions var domainsFile, usersFile *graphify.IngestFileData for i := range fileData { switch fileData[i].Name { case "domains.json": domainsFile = &fileData[i] case "users.json": usersFile = &fileData[i] } } require.NotNil(t, domainsFile) require.NotNil(t, usersFile) domainsContent, err := os.ReadFile(domainsFile.Path) require.NoError(t, err) require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), domainsContent) usersContent, err := os.ReadFile(usersFile.Path) require.NoError(t, err) require.Equal(t, plainFileContent, usersContent)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/api/src/services/graphify/ingest_storage_test.go` around lines 118 - 124, Test assumes deterministic ZIP ordering: don't index into fileData by position. Locate entries by name (e.g., search fileData slice for Name == "domains.json" and Name == "users.json", storing pointers/indices into variables such as domainsFile and usersFile of type graphify.IngestFileData), assert they are non-nil, then read and compare contents using those found entries (use os.ReadFile on domainsFile.Path and usersFile.Path and the same require checks).cmd/api/src/services/graphify/tasks.go (1)
51-51: 💤 Low valueConsider using a job-specific temp file prefix for consistency.
The
ExtractIngestFilescall uses a hardcoded"bh"prefix, whileSaveIngestFileinupload.gouses a job-specific prefix (file_upload_job{jobID}_). Sincetask.JobIdis available here, consider using a consistent job-scoped prefix for easier debugging and file association. If the difference is intentional (e.g., distinguishing upload-stage vs extraction-stage temp files), a brief comment would help clarify the intent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/api/src/services/graphify/tasks.go` at line 51, The ExtractIngestFiles invocation currently passes a hardcoded "bh" prefix; change it to use a job-scoped temp-file prefix (e.g., construct the same pattern used by SaveIngestFile like fmt.Sprintf("file_upload_job%d_", task.JobId)) so extraction temp files are associated with the job, or if the hardcoded "bh" is intentional, add a short comment next to the ExtractIngestFiles call explaining why the extraction-stage prefix must differ from SaveIngestFile's job-specific prefix; reference ExtractIngestFiles and SaveIngestFile to locate the related logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/api/src/services/graphify/ingest_storage_test.go`:
- Around line 118-124: Test assumes deterministic ZIP ordering: don't index into
fileData by position. Locate entries by name (e.g., search fileData slice for
Name == "domains.json" and Name == "users.json", storing pointers/indices into
variables such as domainsFile and usersFile of type graphify.IngestFileData),
assert they are non-nil, then read and compare contents using those found
entries (use os.ReadFile on domainsFile.Path and usersFile.Path and the same
require checks).
In `@cmd/api/src/services/graphify/tasks.go`:
- Line 51: The ExtractIngestFiles invocation currently passes a hardcoded "bh"
prefix; change it to use a job-scoped temp-file prefix (e.g., construct the same
pattern used by SaveIngestFile like fmt.Sprintf("file_upload_job%d_",
task.JobId)) so extraction temp files are associated with the job, or if the
hardcoded "bh" is intentional, add a short comment next to the
ExtractIngestFiles call explaining why the extraction-stage prefix must differ
from SaveIngestFile's job-specific prefix; reference ExtractIngestFiles and
SaveIngestFile to locate the related logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 63182534-23f7-4e0f-83de-50d91b994353
📒 Files selected for processing (5)
cmd/api/src/services/graphify/ingest_storage.gocmd/api/src/services/graphify/ingest_storage_test.gocmd/api/src/services/graphify/tasks.gocmd/api/src/services/upload/upload.gocmd/api/src/services/upload/upload_test.go
082af65 to
1969000
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/api/src/services/graphify/ingest_storage_test.go (1)
82-125: 💤 Low valueTest assumes deterministic file ordering from ZIP iteration.
Lines 118-124 access
fileData[0]expectingdomains.jsonandfileData[1]expectingusers.json. While Go'sarchive/zipiterates entries in their stored order (whichwriteZipFilecontrols), consider either asserting by matchingfile.Nameor adding a comment noting this relies on deterministic insertion order.♻️ Optional: Match by name for robustness
- domainsContent, err := os.ReadFile(fileData[0].Path) - require.NoError(t, err) - require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), domainsContent) - - usersContent, err := os.ReadFile(fileData[1].Path) - require.NoError(t, err) - require.Equal(t, plainFileContent, usersContent) + for _, file := range fileData { + content, err := os.ReadFile(file.Path) + require.NoError(t, err) + + switch file.Name { + case "domains.json": + require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), content) + case "users.json": + require.Equal(t, plainFileContent, content) + default: + t.Errorf("unexpected file: %s", file.Name) + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/api/src/services/graphify/ingest_storage_test.go` around lines 82 - 125, The test TestExtractIngestFilesExpandsZIPIntoTempDirectory assumes deterministic ordering by indexing fileData[0] and fileData[1]; change the assertions to locate entries by name instead of index (e.g., inspect each graphify.FileInfo.Path or graphify.FileInfo.Name, build a map from filepath.Base(file.Path) to the entry or iterate and match "domains.json" and "users.json") and then assert contents for the matched entries and other properties (ParentFile, Errors, DirExists, FileExists, tempFilePrefix) so the test no longer depends on ZIP iteration order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/api/src/services/graphify/ingest_storage_test.go`:
- Around line 82-125: The test TestExtractIngestFilesExpandsZIPIntoTempDirectory
assumes deterministic ordering by indexing fileData[0] and fileData[1]; change
the assertions to locate entries by name instead of index (e.g., inspect each
graphify.FileInfo.Path or graphify.FileInfo.Name, build a map from
filepath.Base(file.Path) to the entry or iterate and match "domains.json" and
"users.json") and then assert contents for the matched entries and other
properties (ParentFile, Errors, DirExists, FileExists, tempFilePrefix) so the
test no longer depends on ZIP iteration order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 723b1c5b-ba01-4f98-814c-17eec2ecd6c8
📒 Files selected for processing (5)
cmd/api/src/services/graphify/ingest_storage.gocmd/api/src/services/graphify/ingest_storage_test.gocmd/api/src/services/graphify/tasks.gocmd/api/src/services/upload/upload.gocmd/api/src/services/upload/upload_test.go
Description
This change refactors the ingest and upload logic to centralize some of the functions to be able to create a new file service in the future.
Motivation and Context
Resolves: BED-8313
How Has This Been Tested?
This has been tested by running the application and doing an upload.
Screenshots (optional):
Types of changes
Checklist:
Summary by CodeRabbit
New Features
Tests