
Conversation

@oscartbeaumont oscartbeaumont commented Oct 14, 2025

Replaces #1181, which I merged by mistake.

Summary by CodeRabbit

  • New Features

    • Faster multipart uploads by prefetching the next part’s upload URL in the background, improving throughput and reducing stalls for large or concurrent uploads.
    • Improved upload progress continuity and reduced chance of transient mismatches when sending many parts.
  • Refactor

    • Removed the MD5 requirement from multipart presign requests, simplifying upload payloads and improving compatibility with varied clients.


coderabbitai bot commented Oct 14, 2025

Walkthrough

Removed md5/ContentMD5 from presign-part requests/handlers. Desktop uploader now prefetches the next part’s presigned URL concurrently, conditionally reuses or refetches it based on part numbers, and stops prefetching when the stream ends.
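The prefetch-and-reuse loop described above can be sketched with plain threads standing in for tokio tasks. This is a simplified model, not the actual upload.rs code: `presign_part` is a hypothetical stand-in for the backend `/presign-part` call, and the real PUT of part bytes is elided.

```rust
use std::thread;

// Hypothetical stand-in for api::upload_multipart_presign_part; in the real
// code this is an async HTTP request to the backend's /presign-part endpoint.
fn presign_part(part_number: u32) -> String {
    format!("https://storage.example/part/{part_number}?sig=abc")
}

// Returns (part_number, url) pairs in upload order.
fn upload_parts(parts: Vec<u32>) -> Vec<(u32, String)> {
    let mut uploaded = Vec::new();
    let mut expected_part_number: u32 = 1;
    // Seed a background presign for part 1 before the first chunk arrives.
    let mut prefetch: Option<thread::JoinHandle<String>> =
        Some(thread::spawn(|| presign_part(1)));

    for part_number in parts {
        let url = if part_number == expected_part_number {
            // The prefetched URL matches the part we're about to send: reuse it.
            match prefetch.take() {
                Some(handle) => handle.join().expect("presign task panicked"),
                None => presign_part(part_number),
            }
        } else {
            // Mismatch (e.g. a re-emitted part): discard it and fetch inline.
            prefetch = None;
            presign_part(part_number)
        };
        // ... PUT the part bytes to `url` here ...
        uploaded.push((part_number, url));
        expected_part_number = part_number + 1;
        // Optimistically presign the next part while this one uploads.
        let next = expected_part_number;
        prefetch = Some(thread::spawn(move || presign_part(next)));
    }
    // Stream ended: drop any outstanding prefetch.
    drop(prefetch);
    uploaded
}
```

The mismatch branch is what handles out-of-order or re-emitted parts; the common sequential path never issues an inline presign after part 1.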

Changes

  • Desktop Tauri API md5 removal — apps/desktop/src-tauri/src/api.rs: Removed the md5_sum parameter from upload_multipart_presign_part and removed md5Sum from its JSON payload; control flow and error handling unchanged.
  • Desktop uploader optimistic presign — apps/desktop/src-tauri/src/upload.rs: Added expected_part_number and a join-based prefetch: concurrently awaits the next stream chunk and the presign for the expected part, reuses or refetches the presign URL on mismatch, computes md5/size and uploads as before, increments the expected part, and breaks when the stream ends.
  • Web presign-part schema/handler update — apps/web/app/api/upload/[...route]/multipart.ts: Removed the required md5Sum from the presign-part request schema and handler; no longer passes ContentMD5 to getPresignedUploadPartUrl. Other endpoints unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Desktop Uploader
  participant B as Backend /presign-part
  participant S as Storage

  rect rgb(245,250,240)
  note right of U: Start upload, prefetch presign for part=1
  U->>B: POST /presign-part {fileKey, uploadId, partNumber=1}
  B-->>U: {url, headers}
  end

  loop For each part N
    alt Prefetched URL matches N
      U-->>U: use cached presign URL(N)
    else Prefetch mismatch
      U->>B: POST /presign-part {partNumber=N}
      B-->>U: {url, headers}
    end
    U->>S: PUT part N using returned URL (include computed Content-MD5 header locally)
    U-->>U: expected_part_number = N+1
    U->>B: POST /presign-part {partNumber=N+1} [async]
  end

  opt Completion
    U-->>U: cancel/cleanup outstanding presign task
  end
sequenceDiagram
  autonumber
  participant C as Client (desktop/web)
  participant B as Backend /presign-part
  participant S as Storage

  note over C,B: md5Sum removed from request/response
  C->>B: POST /presign-part {fileKey, uploadId, partNumber}
  B-->>C: 200 {url, headers}  %% no md5Sum/ContentMD5 in payload
  C->>S: PUT part using returned URL

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble bytes and fetch ahead,
Prefetching links where parts are led.
No md5 crumbs to mark my way,
I stitch the upload, hop and play. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title succinctly captures the addition of optimistic presigned URL logic for multipart uploads and directly reflects the core change without extraneous details.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f97e1a and dea6aec.

📒 Files selected for processing (1)
  • apps/desktop/src-tauri/src/upload.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/desktop/src-tauri/src/upload.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Desktop (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Build Desktop (x86_64-pc-windows-msvc, windows-latest)
  • GitHub Check: Analyze (rust)


@oscartbeaumont oscartbeaumont marked this pull request as ready for review October 15, 2025 06:07

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
apps/web/app/api/upload/[...route]/multipart.ts (1)

156-156: Tighten input validation for partNumber

Constrain partNumber to an integer in S3's valid range (1–10000) to reject bad inputs. Update the schema:

- partNumber: z.number(),
+ partNumber: z.number().int().min(1).max(10000),
apps/desktop/src-tauri/src/upload.rs (4)

633-646: Good addition: optimistic presign prefetch initialization

Seeding a background presign for part 1 and tracking expected_part_number is sound. Consider documenting the invariant that the first yielded chunk is usually part 1, but may re-emit part 1 at the end (your later dedupe handles this).


654-671: Avoid blocking on unfinished prefetch to preserve the “optimistic” benefit

Currently you always await the prefetch when expected_part_number == part_number. If the task isn’t finished, this blocks and negates the optimization. Prefer using the prefetched URL only when the task has already completed; otherwise abort it and fetch inline, or just fetch inline and keep/replace the task deterministically.

Apply a non-blocking variant like:

-            let presigned_url = if expected_part_number == part_number {
-                if let Some(task) = optimistic_presigned_url_task.take() {
-                    task.await
-                        .map_err(|e| format!("uploader/part/{part_number}/task_join: {e:?}"))?
-                        .map_err(|e| format!("uploader/part/{part_number}/presign: {e}"))?
-                } else {
-                    api::upload_multipart_presign_part(&app, &video_id, &upload_id, part_number)
-                        .await?
-                }
-            } else {
+            let presigned_url = if expected_part_number == part_number {
+                if let Some(ref handle) = optimistic_presigned_url_task {
+                    if handle.is_finished() {
+                        // Use prefetched result
+                        optimistic_presigned_url_task
+                            .take()
+                            .unwrap()
+                            .await
+                            .map_err(|e| format!("uploader/part/{part_number}/task_join: {e:?}"))?
+                            .map_err(|e| format!("uploader/part/{part_number}/presign: {e}"))?
+                    } else {
+                        // Prefer not to wait; abort to avoid stale work and fetch inline
+                        handle.abort();
+                        optimistic_presigned_url_task = None;
+                        api::upload_multipart_presign_part(&app, &video_id, &upload_id, part_number)
+                            .await?
+                    }
+                } else {
+                    api::upload_multipart_presign_part(&app, &video_id, &upload_id, part_number)
+                        .await?
+                }
+            } else {
                 // The optimistic presigned URL doesn't match, abort it and generate a new correct one
                 if let Some(task) = optimistic_presigned_url_task.take() {
                     task.abort();
                 }
                 debug!("Throwing out optimistic presigned URL for part {expected_part_number} as part {part_number} was requested!");
                 api::upload_multipart_presign_part(&app, &video_id, &upload_id, part_number)
                     .await?
             };

Also consider keeping the aborted task rare by adjusting prefetch only when the stream advances sequentially.
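The non-blocking check this suggestion relies on can be modeled with std's JoinHandle::is_finished (the thread analogue of tokio's method of the same name). A minimal sketch, with the `inline` argument standing in for a fresh presign request:

```rust
use std::thread;
use std::time::Duration;

// "Reuse the prefetch only if it already finished": is_finished tells us
// whether joining would block. With tokio you would also call handle.abort()
// in the slow branch; dropping a std JoinHandle merely detaches the thread.
fn take_if_finished(handle: thread::JoinHandle<String>, inline: &str) -> String {
    if handle.is_finished() {
        handle.join().expect("prefetch panicked")
    } else {
        // Don't wait on a slow prefetch; discard it and fetch inline instead.
        drop(handle);
        inline.to_string()
    }
}
```

This keeps the hot path wait-free at the cost of occasionally wasting a presign request that completes just after the check.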


704-717: Prefetch for next part: solid; minor ergonomic nit

Spawning for expected_part_number is correct and captures by value. Optional: bound retries/timeouts in upload_multipart_presign_part to avoid hanging tasks if the server is slow/unavailable.


680-681: Reconcile Content-MD5 header with server-side presign change

Server no longer signs/consumes md5Sum for presign. Sending Content-MD5 is optional; if it wasn’t part of the signed headers, it’s usually ignored by SigV4 verification, but S3 will still validate the MD5 if present. If you hit SignatureDoesNotMatch or intermittent 403s, drop this header or gate it behind a feature flag.

-                .header("Content-MD5", &md5_sum)
+                // Consider dropping Content-MD5 now that presign doesn’t include it
+                // .header("Content-MD5", &md5_sum)

If keeping it, please confirm presigned URLs are generated without requiring Content-MD5 and uploads succeed consistently.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b200fe4 and bc40b2c.

📒 Files selected for processing (3)
  • apps/desktop/src-tauri/src/api.rs (0 hunks)
  • apps/desktop/src-tauri/src/upload.rs (2 hunks)
  • apps/web/app/api/upload/[...route]/multipart.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • apps/desktop/src-tauri/src/api.rs
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Use a 2-space indent for TypeScript code.
Use Biome for formatting and linting TypeScript/JavaScript files by running pnpm format.

Use strict TypeScript and avoid any; leverage shared types

Files:

  • apps/web/app/api/upload/[...route]/multipart.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx,js,jsx}: Use kebab-case for filenames for TypeScript/JavaScript modules (e.g., user-menu.tsx).
Use PascalCase for React/Solid components.

Files:

  • apps/web/app/api/upload/[...route]/multipart.ts
apps/web/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

On the client, always use useEffectQuery or useEffectMutation from @/lib/EffectRuntime; never call EffectRuntime.run* directly in components.

Files:

  • apps/web/app/api/upload/[...route]/multipart.ts
apps/web/app/api/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

apps/web/app/api/**/*.{ts,tsx}: Prefer Server Actions for API surface; when routes are necessary, implement under app/api and export only the handler from apiToHandler(ApiLive)
Construct API routes with @effect/platform HttpApi/HttpApiBuilder, declare contracts with Schema, and only export the handler
Use HttpAuthMiddleware for required auth and provideOptionalAuth for guests; avoid duplicating session lookups
Map domain errors to transport with HttpApiError.* and keep translation exhaustive (catchTags/tapErrorCause)
Inside HttpApiBuilder.group, acquire services with Effect.gen and provide dependencies via Layer.provide instead of manual provideService

Files:

  • apps/web/app/api/upload/[...route]/multipart.ts
apps/web/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

apps/web/**/*.{ts,tsx}: Use TanStack Query v5 for all client-side server state and fetching in the web app
Mutations should call Server Actions directly and perform targeted cache updates with setQueryData/setQueriesData
Run server-side effects via the ManagedRuntime from apps/web/lib/server.ts using EffectRuntime.runPromise/runPromiseExit; do not create runtimes ad hoc
Client code should use helpers from apps/web/lib/EffectRuntime.ts (useEffectQuery, useEffectMutation, useRpcClient); never call ManagedRuntime.make inside components

Files:

  • apps/web/app/api/upload/[...route]/multipart.ts
apps/web/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Server components needing Effect services must call EffectRuntime.runPromise(effect.pipe(provideOptionalAuth))

Files:

  • apps/web/app/api/upload/[...route]/multipart.ts
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • apps/desktop/src-tauri/src/upload.rs
🧬 Code graph analysis (1)
apps/desktop/src-tauri/src/upload.rs (1)
apps/desktop/src-tauri/src/api.rs (1)
  • upload_multipart_presign_part (50-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Desktop (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Build Desktop (x86_64-pc-windows-msvc, windows-latest)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (1)
apps/desktop/src-tauri/src/upload.rs (1)

719-721: Cleanup OK

Aborting any remaining optimistic task at the end avoids leaks.

Ensure aborted tasks don’t log panics in your tracing setup.

Comment on lines +150 to 153
// deprecated
// z.object({ md5Sum: z.string() }),
]),
),

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

md5Sum removal acknowledged; please verify no remaining callers depend on it

Dropping md5Sum from the schema is fine and aligns with the API change. Ensure no clients still send md5Sum or expect ContentMD5 to be signed.

Run to confirm repo-wide consistency:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for md5Sum payload usage..."
rg -nC2 -iP '\bmd5\s*Sum\b|"md5Sum"\s*:|md5_sum' --glob '!**/node_modules/**'

echo "Searching for Content-MD5 header usage..."
rg -nC2 -iP 'Content-MD5' --glob '!**/node_modules/**'

Length of output: 3968


Remove md5Sum payload and Content-MD5 header in desktop clients
apps/desktop/src-tauri/src/upload.rs (lines 651–652, 680–681) and upload_legacy.rs (lines 914–916, 975–976) still send these; drop them to match the updated API schema.

🤖 Prompt for AI Agents
In apps/web/app/api/upload/[...route]/multipart.ts around lines 150-153, remove
the deprecated md5Sum field from the request schema and stop accepting or
extracting a Content-MD5 header: delete the z.object({ md5Sum: z.string() })
entry and any related parsing/validation logic, and ensure incoming requests
that include Content-MD5 are ignored (or stripped) rather than causing
validation or processing to rely on it; update any docs/tests referring to
md5Sum accordingly and coordinate with the desktop clients
(apps/desktop/**/upload.rs) to stop sending that header/payload.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
apps/desktop/src-tauri/src/upload.rs (2)

17-17: Avoid stream name collision (macro vs module).

use async_stream::{stream, ...}; imports a macro, while use futures::{..., stream}; imports a module, so the two imports share one identifier. This is legal (macros and modules live in separate namespaces) but confusing to read. Consider aliasing the module to improve readability.

Example:

-use futures::{Stream, StreamExt, TryStreamExt, future::join, stream};
+use futures::{Stream, StreamExt, TryStreamExt, future::join, stream as fstream};

Then call fstream::once(...) at usage sites.


634-644: Prefetch join cannot be canceled; consider select + abort for mismatches.

futures::join waits for both futures. If the arriving chunk’s part_number differs (e.g., header re-emit of part 1), you still await the now-useless prefetch before issuing the correct presign. Prefer spawning the prefetch and using tokio::select! so you can abort it when it becomes irrelevant.

Would you like a small refactor that:

  • Spawns a prefetch task for expected_part_number
  • On chunk arrival, cancels prefetch on mismatch
  • Reuses completed prefetch on match
  • Cleans up any outstanding task when the stream ends?
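The cancel-on-mismatch idea above hinges on the prefetch being abortable. A rough stdlib model, using a shared flag where tokio would use JoinHandle::abort(); the returned URL string is a placeholder, not a real backend response:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Cooperative cancellation sketch: tokio's abort() cancels a task at its next
// await point; with plain threads the worker must poll a shared flag.
fn spawn_prefetch(cancel: Arc<AtomicBool>) -> thread::JoinHandle<Option<String>> {
    thread::spawn(move || {
        // Pretend the presign request is split into small steps, checking the
        // flag between them (analogous to checking between awaits).
        for _ in 0..50 {
            if cancel.load(Ordering::Relaxed) {
                return None; // the prefetch became irrelevant; bail out
            }
            thread::yield_now(); // stand-in for a slice of network I/O
        }
        Some("presign-url".to_string())
    })
}
```

The caller sets the flag when the arriving part number doesn't match, then issues the correct presign inline.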
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc40b2c and 292dc97.

📒 Files selected for processing (1)
  • apps/desktop/src-tauri/src/upload.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • apps/desktop/src-tauri/src/upload.rs
🧬 Code graph analysis (1)
apps/desktop/src-tauri/src/upload.rs (1)
apps/desktop/src-tauri/src/api.rs (1)
  • upload_multipart_presign_part (50-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Desktop (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Build Desktop (x86_64-pc-windows-msvc, windows-latest)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (1)
apps/desktop/src-tauri/src/upload.rs (1)

689-690: LGTM: update next expected part number.

Incrementing expected_part_number after a successful upload is correct.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 292dc97 and 2f97e1a.

📒 Files selected for processing (1)
  • apps/desktop/src-tauri/src/upload.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust code using rustfmt and ensure all Rust code passes workspace-level clippy lints.
Rust modules should be named with snake_case, and crate directories should be in kebab-case.

Files:

  • apps/desktop/src-tauri/src/upload.rs
🧬 Code graph analysis (1)
apps/desktop/src-tauri/src/upload.rs (1)
apps/desktop/src-tauri/src/api.rs (1)
  • upload_multipart_presign_part (50-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (1)
apps/desktop/src-tauri/src/upload.rs (1)

17-17: LGTM: Import added for concurrent prefetch optimization.

The future::join import is correctly added to support the concurrent prefetching of presigned URLs introduced in the multipart_uploader function.

@Brendonovich Brendonovich merged commit d35c189 into main Oct 15, 2025
15 checks passed
@oscartbeaumont oscartbeaumont deleted the optimistic-presigned-urls branch October 17, 2025 08:47