refactor: centralize ingest and upload logic - BED-8313 #2800

Merged
mykeelium merged 1 commit into main from mcuomo/BED-8313 on May 27, 2026

Conversation

@mykeelium
Contributor

@mykeelium mykeelium commented May 20, 2026

Description

This change refactors the ingest and upload logic, centralizing several functions so that a new file service can be built on them in the future.

Motivation and Context

Resolves: BED-8313

How Has This Been Tested?

Tested by running the application and performing an upload.

Screenshots (optional):

Types of changes

  • Chore (a change that does not modify the application functionality)

Checklist:

Summary by CodeRabbit

  • New Features

    • Ingests JSON files and expands uploaded archives into staged temp files with per-file error tracking and automatic cleanup
    • Upload/validation flow now uses a configurable temp-file prefix for organized temporary filenames
  • Tests

    • Unit tests added for archive extraction, JSON handling (including BOM stripping), temp-file-prefix behavior, validation success, and cleanup on failure


@mykeelium mykeelium self-assigned this May 20, 2026
@mykeelium mykeelium added the api A pull request containing changes affecting the API code. label May 20, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR refactors ingest file handling by extracting temp file prefix logic into a reusable utility, introducing a dedicated ingest storage module to centralize JSON and ZIP file extraction, and updating the task orchestrator to delegate to the new shared extraction function.

Changes

Ingest File Handling Refactor

| Layer / File(s) | Summary |
| --- | --- |
| Upload temp file prefix infrastructure: cmd/api/src/services/upload/upload.go, cmd/api/src/services/upload/upload_test.go | WriteAndValidateFileWithPrefix allows callers to specify custom temp file name prefixes. WriteAndValidateFile is refactored as a thin wrapper that computes a default prefix. SaveIngestFile now uses ingestFileTempPrefix(jobID) to create job-scoped temp files. Tests verify prefix-based temp file creation and cleanup on validation failure. |
| Ingest file extraction module: cmd/api/src/services/graphify/ingest_storage.go, cmd/api/src/services/graphify/ingest_storage_test.go | IngestFileData tracks extracted/staged files with per-file error lists. ExtractIngestFiles returns JSON files directly or extracts ZIP entries to a temp directory with UTF-8 normalization, closing and removing the archive after extraction. extractToTempFile creates temp files and normalizes content streams. processSingleFile invokes ReadFileForIngest, logs errors, and cleans up files. Tests verify JSON passthrough and ZIP expansion with proper filename prefix tracking and BOM stripping. |
| Task orchestration refactor: cmd/api/src/services/graphify/tasks.go | ProcessIngestFile now calls ExtractIngestFiles directly instead of using local archive and temp file helpers. Imports are updated to remove ZIP and temp file package dependencies. The prior processSingleFile local helper is removed; its responsibilities are now in the shared module. Batch creation and error classification logic remain unchanged. |
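
The extraction behaviour summarized in the table (ZIP entries staged as temp files, BOM stripping, per-file error lists, cleanup) can be sketched roughly as below. This is an illustration only, not the code from ingest_storage.go: stagedFile, stripBOM, stageEntry, stageZipEntries, removeStaged, and buildZip are hypothetical names, and the sketch buffers each entry in memory, whereas the real module is described as normalizing content streams.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
)

// utf8BOM is the byte order mark some exporters prepend to UTF-8 JSON.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stagedFile is a hypothetical stand-in for the PR's IngestFileData.
type stagedFile struct {
	Name   string   // entry name inside the archive
	Path   string   // staged temp file; empty if staging failed
	Errors []string // per-file errors collected while staging
}

// stripBOM drops a leading UTF-8 byte order mark, if present.
func stripBOM(content []byte) []byte {
	return bytes.TrimPrefix(content, utf8BOM)
}

// stageEntry copies one archive entry into a new temp file inside dir.
func stageEntry(entry *zip.File, dir, prefix string) (string, error) {
	src, err := entry.Open()
	if err != nil {
		return "", err
	}
	defer src.Close()
	// Simplification: buffer the whole entry instead of streaming it.
	content, err := io.ReadAll(src)
	if err != nil {
		return "", err
	}
	tmp, err := os.CreateTemp(dir, prefix+"*.json")
	if err != nil {
		return "", err
	}
	defer tmp.Close()
	if _, err := tmp.Write(stripBOM(content)); err != nil {
		os.Remove(tmp.Name()) // do not leave a partial file behind
		return "", err
	}
	return tmp.Name(), nil
}

// stageZipEntries expands every regular file and records failures per file.
func stageZipEntries(archive []byte, dir, prefix string) ([]stagedFile, error) {
	reader, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, fmt.Errorf("open archive: %w", err)
	}
	var staged []stagedFile
	for _, entry := range reader.File {
		if entry.FileInfo().IsDir() {
			continue
		}
		file := stagedFile{Name: entry.Name}
		if path, err := stageEntry(entry, dir, prefix); err != nil {
			file.Errors = append(file.Errors, err.Error())
		} else {
			file.Path = path
		}
		staged = append(staged, file)
	}
	return staged, nil
}

// removeStaged deletes the staged temp files once they are no longer needed.
func removeStaged(files []stagedFile) {
	for _, file := range files {
		if file.Path != "" {
			os.Remove(file.Path)
		}
	}
}

// buildZip creates a one-entry in-memory archive (errors ignored for brevity).
func buildZip(name string, content []byte) []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, _ := w.Create(name)
	f.Write(content)
	w.Close()
	return buf.Bytes()
}

func main() {
	archive := buildZip("users.json", append(utf8BOM, []byte(`{"data":[]}`)...))
	staged, err := stageZipEntries(archive, "", "bh_demo_")
	if err != nil {
		panic(err)
	}
	defer removeStaged(staged)
	fmt.Printf("%+v\n", staged)
}
```

Passing an empty dir makes os.CreateTemp fall back to the system temp directory; a caller that wants all entries grouped together, as the summary describes, would presumably create a directory with os.MkdirTemp first and pass it in.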

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main refactoring effort to centralize ingest and upload logic, with the associated ticket reference. |
| Description check | ✅ Passed | The description covers all required sections with appropriate detail: motivation/context (BED-8313), testing approach, and all checklist items completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
cmd/api/src/services/graphify/ingest_storage_test.go (1)

118-124: ⚡ Quick win

Make content assertions order-independent.

Line 118 and Line 122 assume fileData ordering, which can make this test flaky if extraction iteration order changes. Assert by IngestFileData.Name instead of fixed indexes.

Proposed test hardening
-	domainsContent, err := os.ReadFile(fileData[0].Path)
-	require.NoError(t, err)
-	require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), domainsContent)
-
-	usersContent, err := os.ReadFile(fileData[1].Path)
-	require.NoError(t, err)
-	require.Equal(t, plainFileContent, usersContent)
+	contentsByName := map[string][]byte{}
+	for _, entry := range fileData {
+		content, err := os.ReadFile(entry.Path)
+		require.NoError(t, err)
+		contentsByName[entry.Name] = content
+	}
+
+	require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), contentsByName["domains.json"])
+	require.Equal(t, plainFileContent, contentsByName["users.json"])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/services/graphify/ingest_storage_test.go` around lines 118 - 124,
The test currently assumes a stable ordering of fileData by indexing fileData[0]
and fileData[1]; make the assertions order-independent by locating entries in
fileData by their IngestFileData.Name (e.g., "domains.json" and the user file
name) and then reading from the found.Path before asserting contents; update the
variables/domainsContent and usersContent checks to use the Path from the
matched entry and compare against the expected
[]byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`) and
plainFileContent respectively so the test no longer depends on iteration order.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 21912fb1-8a8e-452f-ab00-f1c14055c876

📥 Commits

Reviewing files that changed from the base of the PR and between 5e263ba and 082af65.

📒 Files selected for processing (5)
  • cmd/api/src/services/graphify/ingest_storage.go
  • cmd/api/src/services/graphify/ingest_storage_test.go
  • cmd/api/src/services/graphify/tasks.go
  • cmd/api/src/services/upload/upload.go
  • cmd/api/src/services/upload/upload_test.go

@mykeelium
Contributor Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
cmd/api/src/services/graphify/ingest_storage_test.go (1)

118-124: 💤 Low value

Test assumes deterministic file ordering from ZIP archive.

The assertions at lines 118-124 assume fileData[0] is "domains.json" and fileData[1] is "users.json". While Go's archive/zip typically iterates files in the order they appear in the central directory (which matches write order), this is an implementation detail. Consider making this test more robust by finding files by name rather than relying on index ordering.

♻️ Suggested approach
// Find files by name for more robust assertions
var domainsFile, usersFile *graphify.IngestFileData
for i := range fileData {
	switch fileData[i].Name {
	case "domains.json":
		domainsFile = &fileData[i]
	case "users.json":
		usersFile = &fileData[i]
	}
}
require.NotNil(t, domainsFile)
require.NotNil(t, usersFile)

domainsContent, err := os.ReadFile(domainsFile.Path)
require.NoError(t, err)
require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), domainsContent)

usersContent, err := os.ReadFile(usersFile.Path)
require.NoError(t, err)
require.Equal(t, plainFileContent, usersContent)
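
The ordering claim above can be checked in isolation. The following is a minimal sketch, assuming only the standard archive/zip package; writeArchive and entryNames are hypothetical helpers, not functions from this PR. It writes two entries to an in-memory archive and lists them back in the order zip.Reader exposes them.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// writeArchive builds an in-memory ZIP with entries created in the given order.
func writeArchive(names []string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range names {
		entry, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := entry.Write([]byte("{}")); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; the archive is unreadable without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// entryNames lists entry names in the order zip.Reader exposes them.
func entryNames(archive []byte) ([]string, error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(r.File))
	for _, f := range r.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	archive, err := writeArchive([]string{"domains.json", "users.json"})
	if err != nil {
		panic(err)
	}
	names, err := entryNames(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Even where the order round-trips like this, an archive produced by another tool may list its entries differently, which is the case the name-based lookup guards against.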
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/services/graphify/ingest_storage_test.go` around lines 118 - 124,
Test assumes deterministic ZIP ordering: don't index into fileData by position.
Locate entries by name (e.g., search fileData slice for Name == "domains.json"
and Name == "users.json", storing pointers/indices into variables such as
domainsFile and usersFile of type graphify.IngestFileData), assert they are
non-nil, then read and compare contents using those found entries (use
os.ReadFile on domainsFile.Path and usersFile.Path and the same require checks).
cmd/api/src/services/graphify/tasks.go (1)

51-51: 💤 Low value

Consider using a job-specific temp file prefix for consistency.

The ExtractIngestFiles call uses a hardcoded "bh" prefix, while SaveIngestFile in upload.go uses a job-specific prefix (file_upload_job{jobID}_). Since task.JobId is available here, consider using a consistent job-scoped prefix for easier debugging and file association. If the difference is intentional (e.g., distinguishing upload-stage vs extraction-stage temp files), a brief comment would help clarify the intent.
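
To make the suggestion concrete, here is a rough sketch of a job-scoped prefix feeding a prefix-aware write-and-validate helper. Everything in it is an assumption for illustration: jobTempPrefix, writeAndValidateWithPrefix, and writeAndValidate are hypothetical stand-ins (the real functions are not shown in this conversation), the int64 job ID type and the "upload_" default prefix are guesses, and json.Valid stands in for whatever validation the upload service actually performs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

var errInvalidJSON = errors.New("content is not valid JSON")

// jobTempPrefix mirrors the file_upload_job{jobID}_ pattern quoted above.
// The helper name and the int64 ID type are assumptions.
func jobTempPrefix(jobID int64) string {
	return fmt.Sprintf("file_upload_job%d_", jobID)
}

// writeAndValidateWithPrefix stages content in a temp file whose name starts
// with prefix, and removes the file again if writing or validation fails.
func writeAndValidateWithPrefix(content []byte, prefix string) (string, error) {
	tmp, err := os.CreateTemp("", prefix+"*")
	if err != nil {
		return "", err
	}
	_, writeErr := tmp.Write(content)
	closeErr := tmp.Close()
	if err := errors.Join(writeErr, closeErr); err != nil { // errors.Join needs Go 1.20+
		os.Remove(tmp.Name())
		return "", err
	}
	if !json.Valid(content) {
		os.Remove(tmp.Name()) // cleanup on validation failure
		return "", errInvalidJSON
	}
	return tmp.Name(), nil
}

// writeAndValidate is a thin wrapper that supplies a default prefix.
func writeAndValidate(content []byte) (string, error) {
	return writeAndValidateWithPrefix(content, "upload_")
}

func main() {
	path, err := writeAndValidateWithPrefix([]byte(`{"data":[]}`), jobTempPrefix(42))
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)
	fmt.Println(filepath.Base(path)) // name begins with file_upload_job42_

	_, err = writeAndValidate([]byte("not json"))
	fmt.Println(err) // validation error; the temp file has already been removed
}
```

With a shared prefix helper, the extraction stage could pass the same job-scoped value instead of a literal, so temp files from both stages sort together in the temp directory.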

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/services/graphify/tasks.go` at line 51, The ExtractIngestFiles
invocation currently passes a hardcoded "bh" prefix; change it to use a
job-scoped temp-file prefix (e.g., construct the same pattern used by
SaveIngestFile like fmt.Sprintf("file_upload_job%d_", task.JobId)) so extraction
temp files are associated with the job, or if the hardcoded "bh" is intentional,
add a short comment next to the ExtractIngestFiles call explaining why the
extraction-stage prefix must differ from SaveIngestFile's job-specific prefix;
reference ExtractIngestFiles and SaveIngestFile to locate the related logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 63182534-23f7-4e0f-83de-50d91b994353

📥 Commits

Reviewing files that changed from the base of the PR and between 5e263ba and 082af65.

📒 Files selected for processing (5)
  • cmd/api/src/services/graphify/ingest_storage.go
  • cmd/api/src/services/graphify/ingest_storage_test.go
  • cmd/api/src/services/graphify/tasks.go
  • cmd/api/src/services/upload/upload.go
  • cmd/api/src/services/upload/upload_test.go

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
cmd/api/src/services/graphify/ingest_storage_test.go (1)

82-125: 💤 Low value

Test assumes deterministic file ordering from ZIP iteration.

Lines 118-124 access fileData[0] expecting domains.json and fileData[1] expecting users.json. While Go's archive/zip iterates entries in their stored order (which writeZipFile controls), consider either asserting by matching file.Name or adding a comment noting this relies on deterministic insertion order.

♻️ Optional: Match by name for robustness
-	domainsContent, err := os.ReadFile(fileData[0].Path)
-	require.NoError(t, err)
-	require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), domainsContent)
-
-	usersContent, err := os.ReadFile(fileData[1].Path)
-	require.NoError(t, err)
-	require.Equal(t, plainFileContent, usersContent)
+	for _, file := range fileData {
+		content, err := os.ReadFile(file.Path)
+		require.NoError(t, err)
+
+		switch file.Name {
+		case "domains.json":
+			require.Equal(t, []byte(`{"meta":{"type":"domains","version":5,"count":0},"data":[]}`), content)
+		case "users.json":
+			require.Equal(t, plainFileContent, content)
+		default:
+			t.Errorf("unexpected file: %s", file.Name)
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/services/graphify/ingest_storage_test.go` around lines 82 - 125,
The test TestExtractIngestFilesExpandsZIPIntoTempDirectory assumes deterministic
ordering by indexing fileData[0] and fileData[1]; change the assertions to
locate entries by name instead of index (e.g., inspect each
graphify.FileInfo.Path or graphify.FileInfo.Name, build a map from
filepath.Base(file.Path) to the entry or iterate and match "domains.json" and
"users.json") and then assert contents for the matched entries and other
properties (ParentFile, Errors, DirExists, FileExists, tempFilePrefix) so the
test no longer depends on ZIP iteration order.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 723b1c5b-ba01-4f98-814c-17eec2ecd6c8

📥 Commits

Reviewing files that changed from the base of the PR and between 082af65 and 1969000.

📒 Files selected for processing (5)
  • cmd/api/src/services/graphify/ingest_storage.go
  • cmd/api/src/services/graphify/ingest_storage_test.go
  • cmd/api/src/services/graphify/tasks.go
  • cmd/api/src/services/upload/upload.go
  • cmd/api/src/services/upload/upload_test.go

@mykeelium mykeelium merged commit ccb3ed5 into main May 27, 2026
13 checks passed
@mykeelium mykeelium deleted the mcuomo/BED-8313 branch May 27, 2026 18:20
@github-actions github-actions Bot locked and limited conversation to collaborators May 27, 2026

Labels

api A pull request containing changes affecting the API code.


2 participants