
Add modular forensic processor and batch downloads #22

Merged
ChrisAdamsdevelopment merged 1 commit into main from codex/create-new-branch-for-architecture-upgrade on May 13, 2026

Conversation

@ChrisAdamsdevelopment
Owner

@ChrisAdamsdevelopment ChrisAdamsdevelopment commented May 13, 2026

Motivation

  • Move forensic metadata wipe/inject/verify logic into a focused, testable module and add paid-only batch processing plus one-time secure downloads while preserving existing single-file behavior and headers.
  • Introduce explicit marker rules and benign/allowed tag classification to reduce false positives and provide richer audit evidence for forensic review.
  • Add durable tracking for generated outputs and safe cleanup to avoid leaking files from uploads/ and support one-time batch downloads.

Description

  • Added server/metadataRules.js, exporting MARKER_RULES, ALLOWED_INJECTED_TAGS, isBenign, and isAllowedInjected for suspicious/benign/allowed tag classification.
  • Added server/processor.js, encapsulating processMediaFile(...) plus helper logic for marker detection, final-state verification, platform-aware metadata writing, and unsupported-cleanse 422 errors.
  • Added server/cleanup.js, persisting a SQLite-backed cleanup_queue with upload-dir safety guards, immediate deletion, periodic sweeping, and row pruning.
  • Added server/downloadTokens.js, persisting one-time download_tokens with TTL, account validation, consumption semantics, and expiry cleanup.
  • Integrated the new modules into server.js while preserving auth (req.user.sub), free-tier usage checks, Stripe/Gemini behavior, MP3 server-cleanse rejection, existing /api/process behavior, and SPA/API route ordering.
  • Added authenticated POST /api/process-batch and GET /api/download/:token before the API catch-all.
  • Extended frontend forensic report typing in app.tsx with optional richer audit fields while preserving legacy fields (removedCount, removedTags, timestamp).
  • Added concise README.md notes for batch processing, one-time download tokens, MP3 server-side exclusion, and deployment/body-size guard expectations.
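Under the module names above, the single-file integration might reduce to a handler shape like this sketch. The processMediaFile name and the X-Forensic-Removed header come from the PR text; the signature, the UNSUPPORTED_CLEANSE error code, and the return shape are assumptions, not the actual implementation:

```javascript
// Hedged sketch of the single-file flow: delegate to processMediaFile and map
// unsupported-cleanse failures to 422, everything else to 500.
async function handleProcess(filePath, platform, metadata, processMediaFile) {
  try {
    const report = await processMediaFile(filePath, platform, metadata);
    return {
      status: 200,
      headers: { 'X-Forensic-Removed': String(report.removedCount) },
      body: report,
    };
  } catch (err) {
    // 422 for unsupported-cleanse errors, generic 500 otherwise (assumed mapping).
    const status = err && err.code === 'UNSUPPORTED_CLEANSE' ? 422 : 500;
    return { status, body: { error: err.message } };
  }
}
```

In the real route this would sit behind the existing auth and free-tier checks; the sketch isolates only the delegation and the error mapping.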

Explicit exclusions preserved

  • Did not implement any Spotify/Apple Music album consent modal.
  • Did not add AlbumConsentModal.
  • Did not add or use albumOverrideConfirmed.
  • Did not change Spotify/Apple Music behavior; backend still writes Album = safeTitle, Description, Year, and optional Lyrics-eng.
  • Did not introduce @google/generative-ai; Gemini remains REST/fetch-based.
  • Did not implement server-side MP3 processing; MP3 server cleanse remains rejected.
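The preserved write behavior in the fourth bullet can be sketched as a pure tag builder. The field names (Album, Description, Year, Lyrics-eng) come from the PR text; the exact safeTitle sanitization shown here is an assumption for illustration:

```javascript
// Build the tag object the backend still writes for Spotify/Apple Music.
// The control-character strip in safeTitle is illustrative, not the real rule.
function buildInjectedTags({ title, description, year, lyrics }) {
  const safeTitle = String(title).replace(/[\u0000-\u001f]/g, '').trim();
  const tags = { Album: safeTitle, Description: description, Year: year };
  if (lyrics) tags['Lyrics-eng'] = lyrics; // Lyrics-eng stays optional
  return tags;
}
```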

Testing

Validated under the supported Node runtime (20.20.2):

  • npm install passed.
  • npm run build passed.
  • node --check server.js passed.
  • node --check server/processor.js passed.
  • node --check server/metadataRules.js passed.
  • node --check server/cleanup.js passed.
  • node --check server/downloadTokens.js passed.

Runtime smoke checks performed under Node 20.20.2:

  • Started server with safe local environment.
  • GET /api/health returned JSON 200.
  • GET /api/nonexistent returned JSON 404.
  • GET /nonexistent-frontend-route served index.html when dist/ existed.
  • GET /api/download/badtoken without auth returned JSON 401.
  • After registering/authenticating a test user, GET /api/download/badtoken returned JSON 404 (Download link not found.).

Code/contract checks confirmed:

  • req.user.sub is still used for authenticated user identity.
  • users.monthly_usage is not referenced.
  • jobs.status is not referenced.
  • @google/generative-ai is not referenced.
  • AlbumConsentModal is not referenced.
  • albumOverrideConfirmed is not referenced.
  • /api/process-batch and /api/download/:token are before the /api catch-all.
  • /api/generate-seo still uses REST fetch.
  • Stripe routes remain present.
  • Existing /api/process route remains present and MP3 server-side processing remains rejected.
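Checks like these can be scripted as a simple source scan. The helper below is hypothetical and not part of the PR; the identifier list comes from the checks above:

```javascript
// Identifiers the PR promises not to reference anywhere in the codebase.
const FORBIDDEN_IDENTIFIERS = [
  'users.monthly_usage',
  'jobs.status',
  '@google/generative-ai',
  'AlbumConsentModal',
  'albumOverrideConfirmed',
];

// Return the forbidden identifiers that appear in a source string.
function contractViolations(source) {
  return FORBIDDEN_IDENTIFIERS.filter((id) => source.includes(id));
}
```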

@sourcery-ai

sourcery-ai Bot commented May 13, 2026

Reviewer's Guide

Refactors the forensic metadata cleanse/inject pipeline into modular server-side components, adds a richer three-class forensic report model, and introduces paid-only batch processing with one-time secure downloads and durable cleanup of generated files, while keeping existing single-file behavior and headers compatible.

Sequence diagram for batch processing and one-time download flow

sequenceDiagram
  actor User
  participant Server
  participant Processor
  participant DownloadTokens
  participant Cleanup

  User->>Server: POST /api/process-batch
  Server->>Server: validatePlanAndSize
  alt plan is free or size too large
    Server-->>User: 4xx error
  else paid and within limit
    loop each file
      Server->>Processor: processMediaFile(outputPath, platform, metadata)
      Processor-->>Server: report
      Server->>DownloadTokens: createToken(userId, filePath, downloadName)
      DownloadTokens-->>Server: token
      Server->>Cleanup: registerForCleanup([filePath])
    end
    Server-->>User: JSON { results[ { report, downloadToken } ] }
  end

  User->>Server: GET /api/download/:token
  Server->>DownloadTokens: consumeToken(token, userId)
  alt token valid
    DownloadTokens-->>Server: { filePath, downloadName }
    Server-->>User: res.download(filePath, downloadName)
    Server->>Cleanup: deleteImmediately(filePath)
  else invalid/expired
    DownloadTokens-->>Server: { error, code }
    Server-->>User: code error
  end
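The download leg of the sequence above might reduce to handler logic like this sketch. consumeToken's result shape is taken from the diagram; sendFile and deleteImmediately are injected stand-ins for res.download and the cleanup module, and their wiring here is an assumption:

```javascript
// One-time download: consume the token, stream the file, then delete it.
function handleDownload(token, userId, { consumeToken, sendFile, deleteImmediately }) {
  const result = consumeToken(token, userId);
  if (result.error) return { status: result.code, body: { error: result.error } };
  sendFile(result.filePath, result.downloadName);
  deleteImmediately(result.filePath); // the file is gone after the first download
  return { status: 200 };
}
```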

File-Level Changes

Change | Details | Files
Modularized media processing pipeline and reused it for single-file processing.
  • Extracted exiftool-based read→wipe→verify→inject→final-verify flow into server/processor.js as processMediaFile plus helpers.
  • Replaced inline /api/process implementation in server.js to delegate to processMediaFile, while preserving auth, plan checks, MP3 rejection, and response headers.
  • Standardized error handling from processMediaFile to map 422-format errors and generic failures into appropriate HTTP responses.
server.js
server/processor.js
Introduced explicit marker rules and benign/allowed tag classification for forensic analysis.
  • Added MARKER_RULES plus isBenign and isAllowedInjected helpers to classify tags into suspicious, benign, and allowed-injected sets.
  • Implemented detectMarkers and verifyFinalState to generate detailed forensic evidence and residual marker detection.
  • Wired processor.js to use the rule set to compute statuses, summaries, and audit fields in the returned report.
server/metadataRules.js
server/processor.js
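A plausible shape for these exports is sketched below. Only the export names (MARKER_RULES, ALLOWED_INJECTED_TAGS, isBenign, isAllowedInjected, detectMarkers) come from the PR; the actual rule contents are not shown there, so the rules and tag lists here are invented placeholders:

```javascript
// Hypothetical shape for server/metadataRules.js; rule contents are stand-ins.
const MARKER_RULES = [
  // Each rule pairs a tag-name pattern with a value pattern and an audit label.
  { tag: /^(Software|CreatorTool|Encoder)$/i, value: /./, label: 'tool-signature' },
  { tag: /./, value: /\bgenerated\b/i, label: 'generation-marker' },
];
const ALLOWED_INJECTED_TAGS = new Set(['Album', 'Description', 'Year', 'Lyrics-eng']);
const BENIGN_TAGS = new Set(['FileSize', 'FileType', 'MIMEType', 'ImageWidth', 'ImageHeight', 'Duration']);

const isBenign = (tag) => BENIGN_TAGS.has(tag);
const isAllowedInjected = (tag) => ALLOWED_INJECTED_TAGS.has(tag);

// detectMarkers walks every tag and collects rule hits as forensic evidence.
function detectMarkers(tags) {
  const hits = [];
  for (const [tag, value] of Object.entries(tags)) {
    for (const rule of MARKER_RULES) {
      if (rule.tag.test(tag) && rule.value.test(String(value))) {
        hits.push({ tag, value, label: rule.label });
      }
    }
  }
  return hits;
}
```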
Added durable cleanup tracking for generated outputs under uploads/.
  • Created cleanup_queue table and sweeper in server/cleanup.js to track and delete files after a TTL with safety checks for uploads/ paths.
  • Registered cleanup module in server.js and used registerForCleanup/deleteImmediately in single-file and batch flows.
  • Ensured cleanup records are marked deleted and aged-out rows are pruned to keep the table bounded.
server/cleanup.js
server.js
Implemented one-time download tokens for batch outputs.
  • Created download_tokens table and token management in server/downloadTokens.js with TTL-based sweeping and upload-dir safety validation.
  • Initialized download token module in server.js and added GET /api/download/:token route gated by auth.
  • Made batch processing issue per-file tokens tied to user and file path, enforcing one-time, account-scoped downloads.
server/downloadTokens.js
server.js
Added paid-only batch processing endpoint with usage headers and soft size limits.
  • Added POST /api/process-batch route that accepts up to 20 files, enforces the paid-plan requirement, and applies a 2 GB soft total-size guard.
  • Sequentially processes each non-MP3 file via processMediaFile, records jobs, and registers outputs for cleanup.
  • Returns a structured JSON result with per-file reports and download tokens plus X-Usage-* headers consistent with single-file processing.
server.js
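The batch guards might validate like this sketch. The 20-file cap and 2 GB soft limit come from the PR text; the specific 403/400/413 status codes and messages are assumptions:

```javascript
const MAX_BATCH_FILES = 20;
const TWO_GB = 2 * 1024 ** 3; // soft total-size guard from the PR description

// Validate plan and file set before any processing starts (assumed codes).
function validateBatch(plan, files) {
  if (plan === 'free') {
    return { ok: false, status: 403, error: 'Batch processing requires a paid plan.' };
  }
  if (files.length === 0 || files.length > MAX_BATCH_FILES) {
    return { ok: false, status: 400, error: `Provide 1-${MAX_BATCH_FILES} files.` };
  }
  const total = files.reduce((sum, f) => sum + f.size, 0);
  if (total > TWO_GB) {
    return { ok: false, status: 413, error: 'Batch exceeds the 2 GB soft limit.' };
  }
  return { ok: true };
}
```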
Extended frontend typing and UI to consume richer forensic reports while remaining backward compatible.
  • Defined MarkerHit, ResidualTag, and ForensicReport interfaces in app.tsx with optional status, summaries, and audit arrays.
  • Updated QueueItem.report to use ForensicReport, preserving legacy fields removedCount, removedTags, and timestamp.
  • Prepared for a compact expandable audit UI (inferred from typing changes) without breaking existing consumers.
app.tsx
Documented new batch and download APIs and operational constraints.
  • Extended README with descriptions of /api/process-batch, /api/download/:token, MP3 limitations, and the 2 GB batch soft guard.
  • Clarified that MP3 server cleanse remains unsupported and that infra limits must still enforce upload/body size and disk constraints.
README.md


@ChrisAdamsdevelopment ChrisAdamsdevelopment merged commit af0ee06 into main May 13, 2026
4 of 5 checks passed

@sourcery-ai sourcery-ai Bot left a comment


Hey, I've found 2 issues and left some high-level feedback:

  • The new removedCount in processMediaFile is now based on marker hits (beforeMarkers.length) rather than the number of tags removed (removedTags.length), which changes the meaning of the X-Forensic-Removed header compared to the old implementation; if consumers expect the old semantics, consider either keeping the previous behavior or exposing both counts explicitly.
  • In /api/process-batch the JSON response always returns usage.limit: null even though the X-Usage-Limit header still carries the per-plan limit; if clients read the JSON usage object, consider populating limit consistently with the header value.
  • Several new code paths (e.g., the /api/process, /api/process-batch handlers and processMediaFile) compress complex logic into single-line try/catch blocks and long inline expressions, which makes the control flow difficult to follow; consider expanding these into multi-line statements or extracting helpers to improve readability and future maintainability.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `removedCount` in `processMediaFile` is now based on marker hits (`beforeMarkers.length`) rather than the number of tags removed (`removedTags.length`), which changes the meaning of the `X-Forensic-Removed` header compared to the old implementation; if consumers expect the old semantics, consider either keeping the previous behavior or exposing both counts explicitly.
- In `/api/process-batch` the JSON response always returns `usage.limit: null` even though the `X-Usage-Limit` header still carries the per-plan limit; if clients read the JSON usage object, consider populating `limit` consistently with the header value.
- Several new code paths (e.g., the `/api/process`, `/api/process-batch` handlers and `processMediaFile`) compress complex logic into single-line `try/catch` blocks and long inline expressions, which makes the control flow difficult to follow; consider expanding these into multi-line statements or extracting helpers to improve readability and future maintainability.

## Individual Comments

### Comment 1
<location path="server.js" line_range="523" />
<code_context>
+  const usedNow = getMonthlyJobCount(userId);
+  res.setHeader('X-Usage-This-Month', usedNow);
+  res.setHeader('X-Usage-Limit', userPlan === 'free' ? FREE_MONTHLY_LIMIT : 'unlimited');
+  return res.json({ results, usage: { thisMonth: usedNow, limit: null } });
+});
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Usage `limit` is always `null`, even for limited plans.

Here `usage.limit` is always `null` even though the headers differentiate between free and unlimited plans. This makes the batch response inconsistent with `/api/process` and can break clients that rely on the JSON body to display or enforce limits. Please set `usage.limit` in line with the headers, e.g. `userPlan === 'free' ? FREE_MONTHLY_LIMIT : null`.
</issue_to_address>

### Comment 2
<location path="server/processor.js" line_range="53-54" />
<code_context>
+  const finalTags = await exiftool.read(outputPath);
+  const finalMarkers = detectMarkers(finalTags);
+  const verification = verifyFinalState(finalTags);
+  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
+  const removedCount = beforeMarkers.length;
+  const status = (!wipeVerificationPassed || finalMarkers.length > 0) ? 'review_required' : (verification.unexpectedDescriptive.length > 0 ? 'clean_with_notes' : 'clean');
+  const seo = '';
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Mismatch between `removedCount` and `removedTags` may confuse downstream consumers.

`removedTags` tracks tags that disappeared between pre- and post-state, but `removedCount` is based on `beforeMarkers.length` (forensic markers), not removed tags. Since callers may interpret `removedCount` as “tags removed”, this mismatch can be misleading, especially when multiple markers share a tag or when non-marker tags are removed. Please either rename the field (e.g. `removedMarkersCount`) or make it derive from `removedTags.length` to keep the semantics consistent.

```suggestion
  const removedTags = beforeKeys.filter((k) => !(k in finalTags));
  const removedCount = removedTags.length;
```
</issue_to_address>



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc92395f3


Comment thread server/processor.js
const verification = verifyFinalState(finalTags);
const removedTags = beforeKeys.filter((k) => !(k in finalTags));
const removedCount = beforeMarkers.length;
const status = (!wipeVerificationPassed || finalMarkers.length > 0) ? 'review_required' : (verification.unexpectedDescriptive.length > 0 ? 'clean_with_notes' : 'clean');

P2: Base the review status on filtered residual markers

verifyFinalState explicitly filters benign and allowed injected tags, but the status computation ignores that and uses finalMarkers.length > 0 from the unfiltered tag set. This means normal user-provided metadata (for example, an allowed Title containing a brand term like "OpenAI") can force review_required even when finalVerificationPassed is true, creating false-positive forensic failures.
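One way to address this, keeping the helper names from the PR, is to filter residual markers through isBenign and isAllowedInjected before computing status. This is a sketch of the suggested fix, not the shipped code:

```javascript
// Compute the report status from residual markers that survive the benign /
// allowed-injected filters, so permitted tags cannot force review_required.
function computeStatus(finalMarkers, wipeVerificationPassed, unexpectedDescriptive,
                       { isBenign, isAllowedInjected }) {
  const residual = finalMarkers.filter(
    (m) => !isBenign(m.tag) && !isAllowedInjected(m.tag),
  );
  if (!wipeVerificationPassed || residual.length > 0) return 'review_required';
  return unexpectedDescriptive.length > 0 ? 'clean_with_notes' : 'clean';
}
```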


Comment thread server/processor.js
const finalMarkers = detectMarkers(finalTags);
const verification = verifyFinalState(finalTags);
const removedTags = beforeKeys.filter((k) => !(k in finalTags));
const removedCount = beforeMarkers.length;

P2: Keep removedCount aligned with removedTags/header semantics

removedCount is now set to beforeMarkers.length, which counts marker-rule hits, not removed metadata keys. Because one key/value can trigger multiple rules (or none), this can diverge from removedTags and from prior X-Forensic-Removed behavior, causing inconsistent audit output and misleading usage/UI metrics that assume the count corresponds to removed tags.

