Optimize upload_visit_images by offloading I/O while preserving thread safety and security (#714)

RohanExploit wants to merge 1 commit into main.

Conversation
- Offloaded synchronous file I/O writes to the threadpool via `save_images`.
- Preserved SQLAlchemy thread safety by keeping `db.commit()` on the main event loop.
- Integrated the centralized `process_uploaded_image` utility for resizing, EXIF stripping, and PIL-based validation.
- Preserved file extension checks against `ALLOWED_IMAGE_EXTENSIONS` while extracting the actual PIL-detected format to prevent restricted file upload bypasses.
- Retained `MAX_UPLOAD_SIZE` validation.
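The offloading pattern described above can be sketched with the standard library's `asyncio.to_thread`, which plays the same role as starlette's `run_in_threadpool` here; the directory, filenames, and byte payloads below are illustrative, not the endpoint's actual code:

```python
import asyncio
import os
import tempfile

def save_images(images_dir, images_data):
    # Plain blocking writes; safe because this runs in a worker thread.
    saved = []
    for safe_filename, img_bytes in images_data:
        with open(os.path.join(images_dir, safe_filename), "wb") as f:
            f.write(img_bytes)
        saved.append(safe_filename)
    return saved

async def upload(images_dir, images_to_save):
    # Only the file I/O is offloaded; the ORM commit would stay in this
    # coroutine, on the event loop's thread, as the PR description says.
    return await asyncio.to_thread(save_images, images_dir, images_to_save)

images_dir = tempfile.mkdtemp()
paths = asyncio.run(upload(images_dir, [("a.jpg", b"one"), ("b.png", b"two")]))
print(paths)  # ['a.jpg', 'b.png']
```

The point of the split is that the event loop stays free to serve other requests while the worker thread blocks on disk.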
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Pull request overview

Refactors the field officer visit image upload endpoint to reduce async event loop blocking by moving blocking disk I/O to a threadpool, while adding stricter file-type validation during upload.

Changes:
- Offloads batched image file writes to `run_in_threadpool` instead of writing in the async context.
- Centralizes image validation/processing by using `backend.utils.process_uploaded_image`.
- Tightens filename extension validation with an explicit allowlist requirement.
```diff
 # Generate secure filename
 timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
-safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
-file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
-
-# Save file
-with open(file_path, 'wb') as f:
-    f.write(content)
-
-# Store relative path
-relative_path = os.path.join("data", "visit_images", safe_filename)
-image_paths.append(relative_path)
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
+images_to_save.append((safe_filename, image_bytes))
```

```diff
+pil_img, image_bytes = await process_uploaded_image(image)
+# Ensure the saved extension matches the actual image format if possible,
+# otherwise fall back to the safe validated extension
+actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
+# Map some formats to standard extensions
+if actual_ext == 'jpeg': actual_ext = 'jpg'
+if actual_ext not in ALLOWED_IMAGE_EXTENSIONS:
+    actual_ext = extension
```
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

In `backend/routers/field_officer.py`:
- Around lines 308-317: `save_images` writes files to `VISIT_IMAGES_DIR` before `db.commit()`, risking orphaned files if any file write or the subsequent commit fails. Use a transactional cleanup strategy: wrap the file writes and DB operations in a try/except, collect the saved paths (or write to temporary filenames), perform the DB operations and call `db.commit()`, and on any exception call `db.rollback()` and delete the files already written. Either move temp files to their final paths only after a successful commit, or remove written files on failure, so no orphaned files remain.
- Around lines 370-372: the filename generation (second-level timestamp plus `idx`) can collide under concurrent requests. Append a random, collision-resistant suffix to `safe_filename` — add an import for `uuid` and incorporate `uuid.uuid4().hex` (or a truncated form) — so names are unique even when timestamp and idx collide.
📒 Files selected for processing (1)
backend/routers/field_officer.py
```python
def save_images(images_data):
    saved_paths = []
    for safe_filename, img_bytes in images_data:
        file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
        with open(file_path, 'wb') as f:
            f.write(img_bytes)
        relative_path = os.path.join("data", "visit_images", safe_filename)
        saved_paths.append(relative_path)
    return saved_paths
```
Prevent orphaned files when disk write or DB commit fails.
Line 376 writes files before db.commit() on Line 382. If write partially fails or commit fails, uploaded files can be left on disk without DB references. Add rollback + file cleanup on failure paths.
💡 Suggested fix

```diff
+written_abs_paths = []
 def save_images(images_data):
     saved_paths = []
     for safe_filename, img_bytes in images_data:
         file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
         with open(file_path, 'wb') as f:
             f.write(img_bytes)
+        written_abs_paths.append(file_path)
         relative_path = os.path.join("data", "visit_images", safe_filename)
         saved_paths.append(relative_path)
     return saved_paths
@@
-image_paths = await run_in_threadpool(save_images, images_to_save)
-
-# Keep SQLAlchemy operations in main async context
-existing_images.extend(image_paths)
-visit.visit_images = existing_images
-visit.updated_at = datetime.now(timezone.utc)
-db.commit()
+try:
+    image_paths = await run_in_threadpool(save_images, images_to_save)
+    existing_images.extend(image_paths)
+    visit.visit_images = existing_images
+    visit.updated_at = datetime.now(timezone.utc)
+    db.commit()
+except Exception:
+    db.rollback()
+    for p in written_abs_paths:
+        try:
+            os.remove(p)
+        except FileNotFoundError:
+            pass
+    raise
```

Also applies to: 375-383, 392-396
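The temp-to-final strategy mentioned in the comment can also achieve this guarantee without tracking rollback state across functions. A minimal, self-contained sketch follows — `save_images_atomic`, the `commit` callback, and the directory are illustrative stand-ins, not the PR's actual handler:

```python
import os
import tempfile

def save_images_atomic(images_dir, images_data, commit):
    # Write every image under a temporary name first; only rename into
    # place after commit() succeeds, so a failure leaves no orphans.
    staged = []
    try:
        for safe_filename, img_bytes in images_data:
            fd, tmp_path = tempfile.mkstemp(dir=images_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(img_bytes)
            staged.append((tmp_path, os.path.join(images_dir, safe_filename)))
        commit()  # stands in for db.commit() in the real handler
        for tmp_path, final_path in staged:
            os.replace(tmp_path, final_path)  # atomic within one filesystem
        return [final for _, final in staged]
    except Exception:
        # Commit or a write failed: discard every staged temp file.
        for tmp_path, _ in staged:
            try:
                os.remove(tmp_path)
            except FileNotFoundError:
                pass
        raise

d = tempfile.mkdtemp()
ok = save_images_atomic(d, [("v.jpg", b"x")], commit=lambda: None)
print(os.path.basename(ok[0]))  # v.jpg
```

Because the final filenames only ever appear after a successful commit, readers of the directory never observe half-committed uploads.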
```diff
 timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
-safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
-file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)
-
-# Save file
-with open(file_path, 'wb') as f:
-    f.write(content)
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
```
Filename generation can overwrite files under concurrency.
Line 370 uses second-level timestamp plus idx; concurrent requests for the same visit_id in the same second can produce identical filenames and overwrite existing images. Add a random suffix (e.g., uuid4) to make names collision-resistant.
💡 Suggested fix

```diff
+from uuid import uuid4
@@
-safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.

```python
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"
```
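The uniqueness property this comment asks for is easy to check in isolation. The helper below mirrors the names from the diff but is an illustrative sketch, not the endpoint's code:

```python
from datetime import datetime, timezone
from uuid import uuid4

def make_safe_filename(visit_id, idx, actual_ext):
    # Same second-resolution timestamp as the endpoint, plus an
    # 8-hex-char random suffix so two uploads that land in the same
    # second with the same idx can no longer produce the same name.
    timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
    return f"visit_{visit_id}_{timestamp}_{idx}_{uuid4().hex[:8]}.{actual_ext}"

a = make_safe_filename(42, 0, "jpg")
b = make_safe_filename(42, 0, "jpg")
print(a != b)  # True: identical visit/second/index no longer overwrite
```

Eight hex characters give 2^32 possibilities per timestamp bucket, which is ample for per-visit upload batches; `uuid4().hex` in full would remove even that theoretical window.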
3 issues found across 1 file (`backend/routers/field_officer.py`)

Prompt for AI agents (unresolved issues): check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.

1. Line 358 (P2): `actual_ext` is derived from `pil_img.format`, but that value is always unset from `process_uploaded_image`, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches.
2. Line 371 (P2): filename generation uses a second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same `visit_id` within the same second will produce identical filenames (e.g., both start at `idx=0`) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness.
3. Line 376 (P2): no cleanup of written files on failure. If `save_images` partially succeeds (e.g., disk-full mid-loop) or `db.commit()` raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising.
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
+try:
+    pil_img, image_bytes = await process_uploaded_image(image)
+    # Ensure the saved extension matches the actual image format if possible,
+    # otherwise fall back to the safe validated extension
+    actual_ext = pil_img.format.lower() if pil_img and pil_img.format else extension
+    # Map some formats to standard extensions
+    if actual_ext == 'jpeg': actual_ext = 'jpg'
```

P2: `actual_ext` is derived from `pil_img.format`, but that value is always unset from `process_uploaded_image`, so extension checks silently fall back to the user-provided suffix and can save format/extension mismatches (`backend/routers/field_officer.py`, line 358).
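The intent behind the flagged line — derive the saved extension from what the bytes actually are, rather than trusting a possibly-unset `pil_img.format` — can be sketched without PIL at all. Magic-byte sniffing stands in for PIL detection below; the `ALLOWED_IMAGE_EXTENSIONS` set mirrors the PR's allowlist but its contents here are assumed:

```python
ALLOWED_IMAGE_EXTENSIONS = {"jpg", "png", "webp"}  # assumed allowlist

def resolve_extension(data: bytes, fallback_ext: str) -> str:
    # Detect the real format from the file's magic bytes.
    if data.startswith(b"\xff\xd8\xff"):
        detected = "jpeg"
    elif data.startswith(b"\x89PNG\r\n\x1a\n"):
        detected = "png"
    elif data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        detected = "webp"
    else:
        return fallback_ext  # unknown: keep the validated fallback
    if detected == "jpeg":  # map PIL-style format name to standard extension
        detected = "jpg"
    return detected if detected in ALLOWED_IMAGE_EXTENSIONS else fallback_ext

print(resolve_extension(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16, "jpg"))  # png
print(resolve_extension(b"\xff\xd8\xff\xe0" + b"\x00" * 16, "png"))   # jpg
```

With a detector like this, the saved extension always matches the decoded content, so the mismatch path the review describes cannot occur.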
```diff
+images_to_save.append((safe_filename, image_bytes))
+
+# Offload only file writing to threadpool
+image_paths = await run_in_threadpool(save_images, images_to_save)
```

P2: no cleanup of written files on failure. If `save_images` partially succeeds (e.g., disk-full mid-loop) or `db.commit()` raises, already-written files remain on disk with no corresponding DB record. Track written paths and delete them in an exception handler before re-raising (`backend/routers/field_officer.py`, line 376).
```diff
-# Save file
-with open(file_path, 'wb') as f:
-    f.write(content)
+safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{actual_ext}"
```

P2: filename generation uses a second-level timestamp plus loop index, which is not unique under concurrency. Two simultaneous uploads for the same `visit_id` within the same second will produce identical filenames (e.g., both start at `idx=0`) and silently overwrite each other's files. Add a random or UUID suffix to guarantee uniqueness (`backend/routers/field_officer.py`, line 371).
Fixed an issue where uploading large image batches via the field officer `/upload-images` endpoint blocked the async event loop.

By carefully refactoring this logic, I have offloaded the blocking file-writing `open().write()` loops to `starlette.concurrency.run_in_threadpool`, keeping the application snappy and responsive. Simultaneously, we ensured thread safety for the SQLAlchemy object by executing `db.commit()` safely in the primary thread. Security measures such as `ALLOWED_IMAGE_EXTENSIONS` allowlisting and mapping to the actual PIL image formats have been applied to seal unrestricted file upload vectors.

PR created automatically by Jules for task 1540687093843833633 started by @RohanExploit
Summary by cubic

Optimized the field officer `/upload-images` endpoint by offloading blocking disk writes to a thread pool to keep the async server responsive. Preserves SQLAlchemy thread safety and hardens image validation and format handling.

- Moved file writes to `starlette.concurrency.run_in_threadpool` using a `save_images` helper.
- Centralized processing in `backend.utils.process_uploaded_image` (validation, EXIF stripping, resizing), and aligned saved extensions to PIL-detected formats with safe fallbacks.
- Kept `db.commit()` on the main thread to avoid cross-thread SQLAlchemy session use.
- Enforced `MAX_UPLOAD_SIZE` (10MB) and strict `ALLOWED_IMAGE_EXTENSIONS`, preventing mismatched extension/format bypasses.

Written for commit 830da6e. Summary will update on new commits. Review in cubic.