Skip to content

Optimize upload_visit_images by offloading I/O while preserving thread safety and security#714

Open
RohanExploit wants to merge 1 commit intomainfrom
jules-1540687093843833633-3d8ca3aa
Open

Optimize upload_visit_images by offloading I/O while preserving thread safety and security#714
RohanExploit wants to merge 1 commit intomainfrom
jules-1540687093843833633-3d8ca3aa

Conversation

@RohanExploit
Copy link
Copy Markdown
Owner

@RohanExploit RohanExploit commented Apr 30, 2026

Fixed an issue where uploading large image batches via the field officer /upload-images endpoint blocked the async event loop. \n\nBy carefully refactoring this logic, I have offloaded the blocking file writing open().write() loops to starlette.concurrency.run_in_threadpool, keeping the application snappy and responsive. Simultaneously, we ensured thread safety for the SQLAlchemy object by executing db.commit() safely in the primary thread. Security measures such as ALLOWED_IMAGE_EXTENSIONS allowlisting and mapping the PIL actual image formats have been applied to seal unrestricted file upload vectors.


PR created automatically by Jules for task 1540687093843833633 started by @RohanExploit


Summary by cubic

Optimized the field officer /upload-images endpoint by offloading blocking disk writes to a thread pool to keep the async server responsive. Preserves SQLAlchemy thread safety and hardens image validation and format handling.

  • Refactors
    • Offloaded file writes via starlette.concurrency.run_in_threadpool using a save_images helper.
    • Centralized image processing with backend.utils.process_uploaded_image (validation, EXIF stripping, resizing), and aligned saved extensions to PIL-detected formats with safe fallbacks.
    • Kept db.commit() on the main thread to avoid cross-thread SQLAlchemy session use.
    • Enforced route-level MAX_UPLOAD_SIZE (10MB) and strict ALLOWED_IMAGE_EXTENSIONS, preventing mismatched extension/format bypasses.

Written for commit 830da6e. Summary will update on new commits. Review in cubic

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced image upload validation—files without proper file extensions are now rejected during upload
    • Improved processing pipeline for field visit image uploads to ensure consistent quality and handling
    • Refined file format detection and automatic normalization for better organization and cross-platform compatibility
    • Optimized image storage and processing workflow to improve overall reliability and performance

…n\n- Offloaded synchronous file I/O writes to the threadpool via `save_images`.\n- Preserved SQLAlchemy thread safety by keeping `db.commit()` on the main event loop.\n- Integrated centralized `process_uploaded_image` utility for resizing, EXIF stripping, and PIL-based validation.\n- Safely preserved file extension checks against `ALLOWED_IMAGE_EXTENSIONS` while extracting the actual PIL-detected format to prevent restricted file upload bypasses.\n- Retained `MAX_UPLOAD_SIZE` validation.
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 30, 2026

Deploy Preview for fixmybharat canceled.

Name Link
🔨 Latest commit 830da6e
🔍 Latest deploy log https://app.netlify.com/projects/fixmybharat/deploys/69f33b769292350008917b36

@github-actions
Copy link
Copy Markdown

🙏 Thank you for your contribution, @RohanExploit!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Walkthrough

The upload_visit_images route handler is refactored to process uploaded images through a shared utility, apply stricter filename validation, batch enqueue file-storage tuples, and delegate disk I/O to a thread pool while keeping database operations in the main async flow.

Changes

Cohort / File(s) Summary
Image Upload Processing
backend/routers/field_officer.py
Integrates shared process_uploaded_image utility to normalize and validate image content before storage. Adds stricter filename extension requirements. Batches file-storage operations (safe_filename, image_bytes) for threadpool-based disk writes. Extracts file extension from actual image format rather than filename, with jpeg normalized to jpg. Separates I/O operations from database updates for improved async handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • ⚡ Bolt: Optimize image upload validation #327: Modifies backend/utils.py image-processing logic (removal of img.verify and resampling changes), which directly relates to the image processing behavior now integrated into the upload_visit_images handler via the shared utility.

Suggested labels

ECWoC26, ECWoC26-L1

Poem

🐰 Images hop through processing streams,
Where threadpools spin and validation gleams,
No more raw bytes in careless flight,
Just verified pixels, structured right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: optimizing the upload endpoint by offloading I/O while maintaining thread safety and security, which aligns with the changeset's primary objective.
Description check ✅ Passed The description covers the main objective (fixing blocking I/O), the solution (threadpool offloading), thread safety measures, and security improvements. However, the 'Type of Change' section lacks selection of applicable checkboxes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jules-1540687093843833633-3d8ca3aa

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the field officer visit image upload endpoint to reduce async event loop blocking by moving blocking disk I/O to a threadpool, while adding stricter file-type validation steps during upload.

Changes:

  • Offloads batched image file writes to run_in_threadpool instead of writing in the async context.
  • Centralizes image validation/processing by using backend.utils.process_uploaded_image.
  • Tightens filename extension validation with an explicit allowlist requirement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/routers/field_officer.py
Comment thread backend/routers/field_officer.py
Comment on lines 369 to +373
# Generate secure filename
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)

# Save file
with open(file_path, 'wb') as f:
f.write(content)
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"

# Store relative path
relative_path = os.path.join("data", "visit_images", safe_filename)
image_paths.append(relative_path)
images_to_save.append((safe_filename, image_bytes))
Comment on lines +356 to +362
pil_img, image_bytes = await process_uploaded_image(image)
# Ensure the saved extension matches the actual image format if possible, otherwise fallback to safe validated extension
actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
# Map some formats to standard extensions
if actual_ext == 'jpeg': actual_ext = 'jpg'
if actual_ext not in ALLOWED_IMAGE_EXTENSIONS:
actual_ext = extension
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routers/field_officer.py`:
- Around line 308-317: The current save_images function writes files to
VISIT_IMAGES_DIR before db.commit(), risking orphaned files if any file write or
the subsequent DB commit fails; update the flow to use a transactional and
cleanup strategy: within a try/except around the file writes + DB operations,
collect saved_paths (or write to a temporary directory/temporary filenames),
perform the DB operations and call db.commit(), and if any exception occurs call
db.rollback() and delete any files that were written (using the saved_paths or
temp-to-final rollback move); ensure the logic in save_images (and the
surrounding handler that calls db.commit()) either moves temp files to their
final paths only after a successful commit or removes written files on failure
so no orphaned files remain.
- Around line 370-372: The filename generation in the visit image handling (the
block creating timestamp and safe_filename) can collide under concurrent
requests; modify the logic in the same function that currently sets timestamp =
datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S') and safe_filename =
f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}" to append a random,
collision-resistant suffix (e.g., uuid4 hex) to the filename; add an import for
uuid and incorporate uuid.uuid4().hex (or a truncated form) into safe_filename
so names are unique even when timestamp and idx collide.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c3d03db-0b37-4555-8fe5-83a7d68182a6

📥 Commits

Reviewing files that changed from the base of the PR and between 3166316 and 830da6e.

📒 Files selected for processing (1)
  • backend/routers/field_officer.py

Comment on lines +308 to +317
def save_images(images_data):
saved_paths = []
for safe_filename, img_bytes in images_data:
file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
with open(file_path, 'wb') as f:
f.write(img_bytes)
relative_path = os.path.join("data", "visit_images", safe_filename)
saved_paths.append(relative_path)
return saved_paths

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent orphaned files when disk write or DB commit fails.

Line 376 writes files before db.commit() on Line 382. If write partially fails or commit fails, uploaded files can be left on disk without DB references. Add rollback + file cleanup on failure paths.

💡 Suggested fix
+        written_abs_paths = []
         def save_images(images_data):
             saved_paths = []
             for safe_filename, img_bytes in images_data:
                 file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
                 with open(file_path, 'wb') as f:
                     f.write(img_bytes)
+                written_abs_paths.append(file_path)
                 relative_path = os.path.join("data", "visit_images", safe_filename)
                 saved_paths.append(relative_path)
             return saved_paths
@@
-        image_paths = await run_in_threadpool(save_images, images_to_save)
+        try:
+            image_paths = await run_in_threadpool(save_images, images_to_save)
+            existing_images.extend(image_paths)
+            visit.visit_images = existing_images
+            visit.updated_at = datetime.now(timezone.utc)
+            db.commit()
+        except Exception:
+            db.rollback()
+            for p in written_abs_paths:
+                try:
+                    os.remove(p)
+                except FileNotFoundError:
+                    pass
+            raise
 
-        # Keep SQLAlchemy operations in main async context
-        existing_images.extend(image_paths)
-        visit.visit_images = existing_images
-        visit.updated_at = datetime.now(timezone.utc)
-        db.commit()

Also applies to: 375-383, 392-396

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/field_officer.py` around lines 308 - 317, The current
save_images function writes files to VISIT_IMAGES_DIR before db.commit(),
risking orphaned files if any file write or the subsequent DB commit fails;
update the flow to use a transactional and cleanup strategy: within a try/except
around the file writes + DB operations, collect saved_paths (or write to a
temporary directory/temporary filenames), perform the DB operations and call
db.commit(), and if any exception occurs call db.rollback() and delete any files
that were written (using the saved_paths or temp-to-final rollback move); ensure
the logic in save_images (and the surrounding handler that calls db.commit())
either moves temp files to their final paths only after a successful commit or
removes written files on failure so no orphaned files remain.

Comment on lines 370 to 372
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)

# Save file
with open(file_path, 'wb') as f:
f.write(content)
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filename generation can overwrite files under concurrency.

Line 370 uses second-level timestamp plus idx; concurrent requests for the same visit_id in the same second can produce identical filenames and overwrite existing images. Add a random suffix (e.g., uuid4) to make names collision-resistant.

💡 Suggested fix
+from uuid import uuid4
@@
-            safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
+            safe_filename = f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
# Save file
with open(file_path, 'wb') as f:
f.write(content)
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/field_officer.py` around lines 370 - 372, The filename
generation in the visit image handling (the block creating timestamp and
safe_filename) can collide under concurrent requests; modify the logic in the
same function that currently sets timestamp =
datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S') and safe_filename =
f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}" to append a random,
collision-resistant suffix (e.g., uuid4 hex) to the filename; add an import for
uuid and incorporate uuid.uuid4().hex (or a truncated form) into safe_filename
so names are unique even when timestamp and idx collide.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/routers/field_officer.py">

<violation number="1" location="backend/routers/field_officer.py:358">
P2: `actual_ext` is derived from `pil_img.format`, but that value is always unset from `process_uploaded_image`, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches.</violation>

<violation number="2" location="backend/routers/field_officer.py:371">
P2: Filename generation uses second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same `visit_id` within the same second will produce identical filenames (e.g., both start at `idx=0`) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness.</violation>

<violation number="3" location="backend/routers/field_officer.py:376">
P2: No cleanup of written files on failure. If `save_images` partially succeeds (e.g., disk-full mid-loop) or `db.commit()` raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

try:
pil_img, image_bytes = await process_uploaded_image(image)
# Ensure the saved extension matches the actual image format if possible, otherwise fallback to safe validated extension
actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: actual_ext is derived from pil_img.format, but that value is always unset from process_uploaded_image, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 358:

<comment>`actual_ext` is derived from `pil_img.format`, but that value is always unset from `process_uploaded_image`, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches.</comment>

<file context>
@@ -310,45 +324,61 @@ async def upload_visit_images(
+            try:
+                pil_img, image_bytes = await process_uploaded_image(image)
+                # Ensure the saved extension matches the actual image format if possible, otherwise fallback to safe validated extension
+                actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
+                # Map some formats to standard extensions
+                if actual_ext == 'jpeg': actual_ext = 'jpg'
</file context>

images_to_save.append((safe_filename, image_bytes))

# Offload only file writing to threadpool
image_paths = await run_in_threadpool(save_images, images_to_save)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: No cleanup of written files on failure. If save_images partially succeeds (e.g., disk-full mid-loop) or db.commit() raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 376:

<comment>No cleanup of written files on failure. If `save_images` partially succeeds (e.g., disk-full mid-loop) or `db.commit()` raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising.</comment>

<file context>
@@ -310,45 +324,61 @@ async def upload_visit_images(
+            images_to_save.append((safe_filename, image_bytes))
+
+        # Offload only file writing to threadpool
+        image_paths = await run_in_threadpool(save_images, images_to_save)
         
-        # Update visit with image paths
</file context>

# Save file
with open(file_path, 'wb') as f:
f.write(content)
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Filename generation uses second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same visit_id within the same second will produce identical filenames (e.g., both start at idx=0) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 371:

<comment>Filename generation uses second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same `visit_id` within the same second will produce identical filenames (e.g., both start at `idx=0`) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness.</comment>

<file context>
@@ -310,45 +324,61 @@ async def upload_visit_images(
-            # Save file
-            with open(file_path, 'wb') as f:
-                f.write(content)
+            safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
             
-            # Store relative path
</file context>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants