Optimize upload_visit_images by offloading I/O while preserving thread safety and security (#714)

RohanExploit wants to merge 1 commit into main.

Conversation
- Offloaded synchronous file I/O writes to the threadpool via `save_images`.
- Preserved SQLAlchemy thread safety by keeping `db.commit()` on the main event loop.
- Integrated the centralized `process_uploaded_image` utility for resizing, EXIF stripping, and PIL-based validation.
- Preserved file extension checks against `ALLOWED_IMAGE_EXTENSIONS` while extracting the actual PIL-detected format to prevent restricted file upload bypasses.
- Retained `MAX_UPLOAD_SIZE` validation.
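The offloading pattern described above can be sketched with the standard library's `asyncio.to_thread`, which plays the same role as starlette's `run_in_threadpool` here; the directory, filenames, and byte payloads below are illustrative, not the endpoint's actual code:

```python
import asyncio
import os
import tempfile

def save_images(images_dir, images_data):
    # Plain blocking writes; safe because this runs in a worker thread.
    saved = []
    for safe_filename, img_bytes in images_data:
        with open(os.path.join(images_dir, safe_filename), "wb") as f:
            f.write(img_bytes)
        saved.append(safe_filename)
    return saved

async def upload(images_dir, images_to_save):
    # Only the file I/O is offloaded; the ORM commit would stay in this
    # coroutine, on the event loop's thread, as the PR description says.
    return await asyncio.to_thread(save_images, images_dir, images_to_save)

images_dir = tempfile.mkdtemp()
paths = asyncio.run(upload(images_dir, [("a.jpg", b"one"), ("b.png", b"two")]))
print(paths)  # ['a.jpg', 'b.png']
```

The point of the split is that the event loop stays free to serve other requests while the worker thread blocks on disk.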
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Pull request overview

Refactors the field officer visit image upload endpoint to reduce async event loop blocking by moving blocking disk I/O to a threadpool, while adding stricter file-type validation during upload.

Changes:
- Offloads batched image file writes to `run_in_threadpool` instead of writing in the async context.
- Centralizes image validation/processing by using `backend.utils.process_uploaded_image`.
- Tightens filename extension validation with an explicit allowlist requirement.
```diff
 # Generate secure filename
 timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
-safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
-file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
-
-# Save file
-with open(file_path, 'wb') as f:
-    f.write(content)
-
-# Store relative path
-relative_path = os.path.join("data", "visit_images", safe_filename)
-image_paths.append(relative_path)
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
+images_to_save.append((safe_filename, image_bytes))
```

```diff
+pil_img, image_bytes = await process_uploaded_image(image)
+# Ensure the saved extension matches the actual image format if possible,
+# otherwise fall back to the safe validated extension
+actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
+# Map some formats to standard extensions
+if actual_ext == 'jpeg': actual_ext = 'jpg'
+if actual_ext not in ALLOWED_IMAGE_EXTENSIONS:
+    actual_ext = extension
```
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

In `backend/routers/field_officer.py`:
- Around lines 308-317: `save_images` writes files to `VISIT_IMAGES_DIR` before `db.commit()`, risking orphaned files if any file write or the subsequent commit fails. Use a transactional cleanup strategy: wrap the file writes and DB operations in a try/except, collect the saved paths (or write to temporary filenames), perform the DB operations and call `db.commit()`, and on any exception call `db.rollback()` and delete the files already written. Either move temp files to their final paths only after a successful commit, or remove written files on failure, so no orphaned files remain.
- Around lines 370-372: the filename generation (second-level timestamp plus `idx`) can collide under concurrent requests. Append a random, collision-resistant suffix to `safe_filename` — add an import for `uuid` and incorporate `uuid.uuid4().hex` (or a truncated form) — so names are unique even when timestamp and idx collide.
📒 Files selected for processing (1)
backend/routers/field_officer.py
```python
def save_images(images_data):
    saved_paths = []
    for safe_filename, img_bytes in images_data:
        file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
        with open(file_path, 'wb') as f:
            f.write(img_bytes)
        relative_path = os.path.join("data", "visit_images", safe_filename)
        saved_paths.append(relative_path)
    return saved_paths
```
Prevent orphaned files when disk write or DB commit fails.
Line 376 writes files before db.commit() on Line 382. If write partially fails or commit fails, uploaded files can be left on disk without DB references. Add rollback + file cleanup on failure paths.
💡 Suggested fix

```diff
+written_abs_paths = []
 def save_images(images_data):
     saved_paths = []
     for safe_filename, img_bytes in images_data:
         file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
         with open(file_path, 'wb') as f:
             f.write(img_bytes)
+        written_abs_paths.append(file_path)
         relative_path = os.path.join("data", "visit_images", safe_filename)
         saved_paths.append(relative_path)
     return saved_paths
@@
-image_paths = await run_in_threadpool(save_images, images_to_save)
-
-# Keep SQLAlchemy operations in main async context
-existing_images.extend(image_paths)
-visit.visit_images = existing_images
-visit.updated_at = datetime.now(timezone.utc)
-db.commit()
+try:
+    image_paths = await run_in_threadpool(save_images, images_to_save)
+    existing_images.extend(image_paths)
+    visit.visit_images = existing_images
+    visit.updated_at = datetime.now(timezone.utc)
+    db.commit()
+except Exception:
+    db.rollback()
+    for p in written_abs_paths:
+        try:
+            os.remove(p)
+        except FileNotFoundError:
+            pass
+    raise
```

Also applies to: 375-383, 392-396
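The temp-to-final strategy mentioned in the comment can also achieve this guarantee without tracking rollback state across functions. A minimal, self-contained sketch follows — `save_images_atomic`, the `commit` callback, and the directory are illustrative stand-ins, not the PR's actual handler:

```python
import os
import tempfile

def save_images_atomic(images_dir, images_data, commit):
    # Write every image under a temporary name first; only rename into
    # place after commit() succeeds, so a failure leaves no orphans.
    staged = []
    try:
        for safe_filename, img_bytes in images_data:
            fd, tmp_path = tempfile.mkstemp(dir=images_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(img_bytes)
            staged.append((tmp_path, os.path.join(images_dir, safe_filename)))
        commit()  # stands in for db.commit() in the real handler
        for tmp_path, final_path in staged:
            os.replace(tmp_path, final_path)  # atomic within one filesystem
        return [final for _, final in staged]
    except Exception:
        # Commit or a write failed: discard every staged temp file.
        for tmp_path, _ in staged:
            try:
                os.remove(tmp_path)
            except FileNotFoundError:
                pass
        raise

d = tempfile.mkdtemp()
ok = save_images_atomic(d, [("v.jpg", b"x")], commit=lambda: None)
print(os.path.basename(ok[0]))  # v.jpg
```

Because the final filenames only ever appear after a successful commit, readers of the directory never observe half-committed uploads.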
```diff
 timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
-safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
-file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
-
-# Save file
-with open(file_path, 'wb') as f:
-    f.write(content)
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
```
Filename generation can overwrite files under concurrency.
Line 370 uses second-level timestamp plus idx; concurrent requests for the same visit_id in the same second can produce identical filenames and overwrite existing images. Add a random suffix (e.g., uuid4) to make names collision-resistant.
💡 Suggested fix

```diff
+from uuid import uuid4
@@
-safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```python
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"
```
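The uniqueness property this comment asks for is easy to check in isolation. The helper below mirrors the names from the diff but is an illustrative sketch, not the endpoint's code:

```python
from datetime import datetime, timezone
from uuid import uuid4

def make_safe_filename(visit_id, idx, actual_ext):
    # Same second-resolution timestamp as the endpoint, plus an
    # 8-hex-char random suffix so two uploads that land in the same
    # second with the same idx can no longer produce the same name.
    timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
    return f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"

a = make_safe_filename(42, 0, "jpg")
b = make_safe_filename(42, 0, "jpg")
print(a != b)  # True: identical visit/second/index no longer overwrite
```

Eight hex characters give 2^32 possibilities per timestamp bucket, which is ample for per-visit upload batches; `uuid4().hex` in full would remove even that theoretical window.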
3 issues found across 1 file (`backend/routers/field_officer.py`)

Prompt for AI agents (unresolved issues): check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.

1. Line 358 (P2): `actual_ext` is derived from `pil_img.format`, but that value is always unset from `process_uploaded_image`, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches.
2. Line 371 (P2): filename generation uses a second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same `visit_id` within the same second will produce identical filenames (e.g., both start at `idx=0`) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness.
3. Line 376 (P2): no cleanup of written files on failure. If `save_images` partially succeeds (e.g., disk-full mid-loop) or `db.commit()` raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising.
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
+try:
+    pil_img, image_bytes = await process_uploaded_image(image)
+    # Ensure the saved extension matches the actual image format if possible,
+    # otherwise fall back to the safe validated extension
+    actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
+    # Map some formats to standard extensions
+    if actual_ext == 'jpeg': actual_ext = 'jpg'
```

P2: `actual_ext` is derived from `pil_img.format`, but that value is always unset from `process_uploaded_image`, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches (`backend/routers/field_officer.py`, line 358).
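The intent behind the flagged line — derive the saved extension from what the bytes actually are, rather than trusting a possibly-unset `pil_img.format` — can be sketched without PIL at all. Magic-byte sniffing stands in for PIL detection below; the `ALLOWED_IMAGE_EXTENSIONS` set mirrors the PR's allowlist but its contents here are assumed:

```python
ALLOWED_IMAGE_EXTENSIONS = {"jpg", "png", "webp"}  # assumed allowlist

def resolve_extension(data: bytes, fallback_ext: str) -> str:
    # Detect the real format from the file's magic bytes.
    if data.startswith(b"\xff\xd8\xff"):
        detected = "jpeg"
    elif data.startswith(b"\x89PNG\r\n\x1a\n"):
        detected = "png"
    elif data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        detected = "webp"
    else:
        return fallback_ext  # unknown: keep the validated fallback
    if detected == "jpeg":  # map PIL-style format name to standard extension
        detected = "jpg"
    return detected if detected in ALLOWED_IMAGE_EXTENSIONS else fallback_ext

print(resolve_extension(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16, "jpg"))  # png
print(resolve_extension(b"\xff\xd8\xff\xe0" + b"\x00" * 16, "png"))   # jpg
```

With a detector like this, the saved extension always matches the decoded content, so the mismatch path the review describes cannot occur.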
```diff
+images_to_save.append((safe_filename, image_bytes))
+
+# Offload only file writing to threadpool
+image_paths = await run_in_threadpool(save_images, images_to_save)
```

P2: no cleanup of written files on failure. If `save_images` partially succeeds (e.g., disk-full mid-loop) or `db.commit()` raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising (`backend/routers/field_officer.py`, line 376).
```diff
-# Save file
-with open(file_path, 'wb') as f:
-    f.write(content)
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
```

P2: filename generation uses a second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same `visit_id` within the same second will produce identical filenames (e.g., both start at `idx=0`) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness (`backend/routers/field_officer.py`, line 371).
Fixed an issue where uploading large image batches via the field officer `/upload-images` endpoint blocked the async event loop.

By carefully refactoring this logic, I have offloaded the blocking file-writing `open().write()` loops to `starlette.concurrency.run_in_threadpool`, keeping the application snappy and responsive. Simultaneously, we ensured thread safety for the SQLAlchemy object by executing `db.commit()` safely in the primary thread. Security measures such as `ALLOWED_IMAGE_EXTENSIONS` allowlisting and mapping to the actual PIL image formats have been applied to seal unrestricted file upload vectors.

PR created automatically by Jules for task 1540687093843833633 started by @RohanExploit
Summary by cubic

Optimized the field officer `/upload-images` endpoint by offloading blocking disk writes to a thread pool to keep the async server responsive. Preserves SQLAlchemy thread safety and hardens image validation and format handling.

- Moved file writes to `starlette.concurrency.run_in_threadpool` using a `save_images` helper.
- Centralized processing in `backend.utils.process_uploaded_image` (validation, EXIF stripping, resizing), and aligned saved extensions to PIL-detected formats with safe fallbacks.
- Kept `db.commit()` on the main thread to avoid cross-thread SQLAlchemy session use.
- Enforced `MAX_UPLOAD_SIZE` (10MB) and strict `ALLOWED_IMAGE_EXTENSIONS`, preventing mismatched extension/format bypasses.

Written for commit 830da6e. Summary will update on new commits. Review in cubic.