
Conversation

@enfayz enfayz (Collaborator) commented Sep 25, 2025

Upgrades file_service.py to make uploads safer, more observable, and non-blocking. Adds a configurable upload directory (via env), enforces a server-side size cap (default 50 MiB), writes files atomically, and computes SHA-256 checksums. Introduces a metadata-returning save that includes path, size, hash, MIME type, and timestamp. Provides async wrappers to offload blocking IO and hashing to a thread pool. Adds MIME detection with optional python-magic fallback, plus lifecycle helpers to list, delete, and clean up old uploads. Strengthens filename/path validation and improves structured logging for start/finish/error events. Configuration knobs cover upload directory, max bytes, and optional thread worker tuning.
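For orientation, the core save path described above can be sketched roughly as follows. This is a minimal illustration based on the summary, not the PR's exact code; names like `save_upload_file_with_meta` and the `UPLOAD_DIR` env default are assumptions:

```python
import hashlib
import os
import tempfile
import uuid
from datetime import datetime, timezone

UPLOAD_DIR = os.environ.get("UPLOAD_DIR", "uploads")
DEFAULT_MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MiB server-side cap

def save_upload_file_with_meta(source, filename, max_bytes=None):
    if max_bytes is None:
        max_bytes = DEFAULT_MAX_UPLOAD_BYTES
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    ext = os.path.splitext(os.path.basename(filename))[1]
    final_path = os.path.join(UPLOAD_DIR, f"{uuid.uuid4().hex}{ext}")
    sha256, total = hashlib.sha256(), 0
    fd, tmp_path = tempfile.mkstemp(dir=UPLOAD_DIR)
    try:
        with os.fdopen(fd, "wb") as tmp:
            # Stream in chunks so the size cap is enforced without buffering the file
            for chunk in iter(lambda: source.read(8192), b""):
                total += len(chunk)
                if total > max_bytes:
                    raise ValueError("upload exceeds maximum allowed size")
                tmp.write(chunk)
                sha256.update(chunk)
        os.replace(tmp_path, final_path)  # atomic rename within the same filesystem
    except Exception:
        # Clean up the partial temp file on any failure
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return {
        "path": final_path,
        "size": total,
        "sha256": sha256.hexdigest(),
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
```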

Summary by CodeRabbit

  • New Features

    • Configurable upload size limits and allowed file types via settings.
    • Asynchronous uploads that return rich metadata (name, size, checksum, type, saved time).
    • Automatic checksum and MIME type detection.
    • Administrative tools to list uploads, delete files, and clean up old uploads.
  • Improvements

    • Safer file handling with sanitized names, safe paths, and atomic writes.
    • More informative errors with automatic cleanup on failures.
    • Performance boosts by offloading blocking I/O to background workers.
    • Enhanced logging for saved uploads.


coderabbitai bot commented Sep 25, 2025

Walkthrough

Major refactor of the file upload service adding size/content-type limits, safer path handling, checksum and MIME detection, atomic writes, metadata-returning save, async offloading via a shared thread pool, and administrative utilities for listing, deleting, and cleaning up uploads. New constants and multiple public functions are introduced.

Changes

Cohort / File(s) Summary of changes
Upload service refactor and utilities
services/file_service.py
Introduces MAX size and allowed content-type config; filename sanitization and safe path checks; atomic write with byte-limit enforcement; SHA-256 checksum and MIME detection; new save with metadata; async wrappers using shared ThreadPoolExecutor; functions for listing uploads, deleting by name/path, and time-based cleanup; new public exports and constants.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant FileSvc as FileService
  participant FS as Filesystem

  rect rgba(230,240,255,0.4)
  Client->>FileSvc: save_upload_file_with_meta(upload_file, filename, max_bytes?)
  Note right of FileSvc: Sanitize filename, validate path
  FileSvc->>FileSvc: Enforce size limit (DEFAULT_MAX_UPLOAD_BYTES or override)
  FileSvc->>FS: Write to temp file atomically
  FileSvc-->>FS: Truncate/abort on overflow (error)
  FileSvc->>FileSvc: calculate_checksum(temp_path)
  FileSvc->>FileSvc: detect_mime_type(temp_path)
  FileSvc->>FS: Move temp -> final path (atomic rename)
  FileSvc-->>Client: {path, original_name, size, sha256, mime_type, saved_at}
  end

  alt Error (size/path/I/O)
    FileSvc->>FS: Cleanup temp file
    FileSvc-->>Client: DocumentSaveError
  end
sequenceDiagram
  autonumber
  actor Client
  participant API as Async API
  participant Pool as ThreadPoolExecutor
  participant FileSvc as FileService (sync)

  rect rgba(240,255,240,0.5)
  Client->>API: save_upload_file_with_meta_async(...)
  API->>Pool: offload to thread
  Pool->>FileSvc: save_upload_file_with_meta(...)
  FileSvc-->>Pool: result / error
  Pool-->>API: result / error
  API-->>Client: await result (propagate errors)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Poem

I thump my paws—new files arrive!
With checksums, MIME, and tidy hive.
I stash them safe, then hop away,
Async breezes make my day.
Old crumbs cleaned, bad bits denied—
Carrot logs show where they hide. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 70.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title succinctly summarizes the key enhancements to the file_service module (safer uploads, metadata returns, async wrappers, and management helpers) and directly reflects the main changes introduced in the pull request.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/file-service-enhancements

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (5)
services/file_service.py (5)

65-79: Durability: fsync temp file before atomic replace

Atomic rename is good, but without fsync bytes may not hit disk on crash.

Apply this diff:

             for chunk in iter(lambda: source.read(8192), b""):
-                if not chunk:
-                    break
                 tmp.write(chunk)
                 total += len(chunk)
                 sha256.update(chunk)
                 if total > max_bytes:
                     logger.warning("Upload exceeds max allowed size: %d bytes", total)
                     raise DocumentSaveError("Uploaded file exceeds maximum allowed size.")
+            # Ensure bytes are on disk before atomic replace
+            tmp.flush()
+            os.fsync(tmp.fileno())
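Taken in isolation, the write/flush/fsync/replace sequence the reviewer is asking for looks roughly like this (a generic sketch of the pattern, not the service's code):

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    # Create the temp file in the destination directory so os.replace stays atomic
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffers to the OS
            os.fsync(tmp.fileno())   # push the OS buffers to disk
        os.replace(tmp_path, path)   # atomic rename on POSIX and Windows
    except Exception:
        # Remove the partial temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Without the `fsync`, a crash after the rename can leave the final name pointing at data that never reached disk; with it, readers see either the old file or the fully written new one.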

132-145: Durability: fsync temp file in metadata save path as well

Apply this diff:

             for chunk in iter(lambda: source.read(8192), b""):
-                if not chunk:
-                    break
                 tmp.write(chunk)
                 total += len(chunk)
                 sha256.update(chunk)
                 if total > max_bytes:
                     logger.warning("Upload exceeds max allowed size: %d bytes", total)
                     raise DocumentSaveError("Uploaded file exceeds maximum allowed size.")
+            # Ensure bytes are on disk before atomic replace
+            tmp.flush()
+            os.fsync(tmp.fileno())

33-37: Make thread pool size configurable

Support optional worker tuning via env (e.g., FILE_SERVICE_MAX_WORKERS).

Apply this diff:

     if _THREAD_POOL is None:
-        _THREAD_POOL = ThreadPoolExecutor(max_workers=4)
+        _THREAD_POOL = ThreadPoolExecutor(
+            max_workers=int(os.environ.get("FILE_SERVICE_MAX_WORKERS", "4"))
+        )

61-76: Minor: drop unused checksum in save_upload_file or return it

save_upload_file computes sha256 but never uses it; remove to save CPU or return it in logs/metadata.


69-73: Minor: redundant empty-chunk check

iter(lambda: read(...), b"") already stops on b"". The explicit if not chunk: break is redundant.

Apply this diff:

-            for chunk in iter(lambda: source.read(8192), b""):
-                if not chunk:
-                    break
+            for chunk in iter(lambda: source.read(8192), b""):

Also applies to: 136-139
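For reference, the two-argument form of `iter()` calls the callable until it returns the sentinel, so an empty read already terminates the loop:

```python
import io

source = io.BytesIO(b"abcdefgh")
# read(3) yields b"abc", b"def", b"gh"; the next call returns b"", the sentinel,
# which stops iteration without any explicit break
chunks = list(iter(lambda: source.read(3), b""))
```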

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77ca247 and 60ae223.

📒 Files selected for processing (1)
  • services/file_service.py (2 hunks)
🔇 Additional comments (1)
services/file_service.py (1)

26-29: ALLOWED_CONTENT_TYPES is never used

After adding allowed-type enforcement above, this becomes effective. If not enforcing now, remove the env to avoid a false sense of security.

Comment on lines 39 to 46
 def save_upload_file(upload_file: IO, filename: str) -> str:
     # Sanitize filename: remove path, reject empty/unsafe
     logger = logging.getLogger(__name__)
     base = os.path.basename(filename)
-    if not base or base in {'.', '..'} or any(c in base for c in '\/:*?"<>|'):
+    if not base or base in {'.', '..'} or any(c in base for c in '\\/:*?"<>|'):
         logger.error("Invalid filename for upload: %s", filename)
         raise DocumentSaveError("Invalid filename.")


⚠️ Potential issue | 🔴 Critical

Fix unused max_bytes: plumb through async wrapper to sync save

max_bytes in save_upload_file_async is ignored; wire it through and make sync save accept it.

Apply this diff:

-def save_upload_file(upload_file: IO, filename: str) -> str:
+def save_upload_file(upload_file: IO, filename: str, max_bytes: Optional[int] = None) -> str:
@@
-    # Default max bytes
-    max_bytes = DEFAULT_MAX_UPLOAD_BYTES
+    # Default max bytes
+    if max_bytes is None:
+        max_bytes = DEFAULT_MAX_UPLOAD_BYTES
@@
 async def save_upload_file_async(upload_file: IO, filename: str, max_bytes: Optional[int] = None) -> str:
     """Async wrapper that offloads the blocking save call to a thread pool."""
     loop = asyncio.get_running_loop()
-    return await loop.run_in_executor(_get_thread_pool(), save_upload_file, upload_file, filename)
+    return await loop.run_in_executor(_get_thread_pool(), save_upload_file, upload_file, filename, max_bytes)

Also applies to: 58-60, 179-183
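As background for the fix: `run_in_executor` forwards positional arguments to the callable, which is what makes the one-line change above work (keyword arguments would need `functools.partial`). A self-contained sketch with stand-in names:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

_POOL = ThreadPoolExecutor(max_workers=2)

def blocking_save(data: bytes, max_bytes: Optional[int] = None) -> int:
    # Stand-in for the synchronous save; enforces whatever limit it is given
    limit = max_bytes if max_bytes is not None else 50 * 1024 * 1024
    if len(data) > limit:
        raise ValueError("too large")
    return len(data)

async def save_async(data: bytes, max_bytes: Optional[int] = None) -> int:
    loop = asyncio.get_running_loop()
    # Positional args after the callable are forwarded to blocking_save,
    # so max_bytes is no longer silently dropped
    return await loop.run_in_executor(_POOL, blocking_save, data, max_bytes)
```

Exceptions raised in the worker thread propagate through the awaited future, so `DocumentSaveError` would surface to the async caller unchanged.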

🤖 Prompt for AI Agents
In services/file_service.py around lines 39-46 (and also update usages at 58-60
and 179-183), the async wrapper save_upload_file_async currently ignores the
max_bytes argument; update the sync function signature to accept an optional
max_bytes parameter, plumb max_bytes from the async wrapper into the sync call,
and enforce the byte limit inside the sync save (validate file size or stream
read versus max_bytes and raise DocumentSaveError on overflow). Also update all
callsites and tests to pass max_bytes through and ensure logging includes the
filename and reason when rejecting due to size.

Comment on lines +96 to 103
    # Optionally detect mime type (by extension); more accurate detection can be added
    try:
        mime, _ = mimetypes.guess_type(final_path)
    except Exception:
        mime = None

    logger.info("Saved upload %s as %s (%d bytes, mime=%s)", base, final_path, total, mime)
    return final_path

⚠️ Potential issue | 🟠 Major

Enforce allowed MIME types and use detect_mime_type()

ALLOWED_CONTENT_TYPES is unused. Detect MIME and reject disallowed types; clean up the saved file if rejected.

Apply this diff:

-    # Optionally detect mime type (by extension); more accurate detection can be added
-    try:
-        mime, _ = mimetypes.guess_type(final_path)
-    except Exception:
-        mime = None
-
-    logger.info("Saved upload %s as %s (%d bytes, mime=%s)", base, final_path, total, mime)
+    # Detect MIME (with python-magic fallback) and enforce allowed types if configured
+    mime = detect_mime_type(final_path)
+    if ALLOWED_CONTENT_TYPES is not None and mime not in ALLOWED_CONTENT_TYPES:
+        logger.warning("Disallowed content type: %s for %s", mime, base)
+        with contextlib.suppress(Exception):
+            os.remove(final_path)
+        raise DocumentSaveError("Uploaded file type is not allowed.")
+
+    logger.info("Saved upload %s as %s (%d bytes, mime=%s)", base, final_path, total, mime)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    # Optionally detect mime type (by extension); more accurate detection can be added
-    try:
-        mime, _ = mimetypes.guess_type(final_path)
-    except Exception:
-        mime = None
+    # Detect MIME (with python-magic fallback) and enforce allowed types if configured
+    mime = detect_mime_type(final_path)
+    if ALLOWED_CONTENT_TYPES is not None and mime not in ALLOWED_CONTENT_TYPES:
+        logger.warning("Disallowed content type: %s for %s", mime, base)
+        with contextlib.suppress(Exception):
+            os.remove(final_path)
+        raise DocumentSaveError("Uploaded file type is not allowed.")

     logger.info("Saved upload %s as %s (%d bytes, mime=%s)", base, final_path, total, mime)
     return final_path
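The `detect_mime_type` helper referenced here is described in the PR summary as extension-based guessing with an optional python-magic fallback; one plausible shape (an assumption, not the PR's actual code):

```python
import mimetypes
from typing import Optional

def detect_mime_type(path: str) -> Optional[str]:
    # Prefer content-based detection when python-magic is installed
    try:
        import magic  # optional dependency
        return magic.from_file(path, mime=True)
    except Exception:
        pass
    # Fall back to extension-based guessing from the standard library
    mime, _ = mimetypes.guess_type(path)
    return mime
```

Content-based detection matters for enforcement: extension guessing trusts the client-supplied filename, while `magic` inspects the actual bytes.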

Comment on lines +116 to +128
    base = os.path.basename(filename)
    if not base:
        raise DocumentSaveError("Invalid filename.")

    ext = os.path.splitext(base)[1]
    safe_name = f"{uuid.uuid4().hex}{ext}"
    final_path = os.path.abspath(os.path.join(UPLOAD_DIR, safe_name))

    upload_dir_abs = os.path.abspath(UPLOAD_DIR)
    if not final_path.startswith(upload_dir_abs + os.sep):
        logger.error("Unsafe file path detected: %s", final_path)
        raise DocumentSaveError("Unsafe file path.")


⚠️ Potential issue | 🟠 Major

Apply same filename sanitization here as in save_upload_file

Inconsistent checks; mirror the stricter validation to avoid edge-case surprises.

Apply this diff:

-    base = os.path.basename(filename)
-    if not base:
-        raise DocumentSaveError("Invalid filename.")
+    base = os.path.basename(filename)
+    if not base or base in {'.', '..'} or any(c in base for c in '\\/:*?"<>|'):
+        logger.error("Invalid filename for upload: %s", filename)
+        raise DocumentSaveError("Invalid filename.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    base = os.path.basename(filename)
-    if not base:
-        raise DocumentSaveError("Invalid filename.")
+    base = os.path.basename(filename)
+    if not base or base in {'.', '..'} or any(c in base for c in '\\/:*?"<>|'):
+        logger.error("Invalid filename for upload: %s", filename)
+        raise DocumentSaveError("Invalid filename.")

     ext = os.path.splitext(base)[1]
     safe_name = f"{uuid.uuid4().hex}{ext}"
     final_path = os.path.abspath(os.path.join(UPLOAD_DIR, safe_name))

     upload_dir_abs = os.path.abspath(UPLOAD_DIR)
     if not final_path.startswith(upload_dir_abs + os.sep):
         logger.error("Unsafe file path detected: %s", final_path)
         raise DocumentSaveError("Unsafe file path.")
🤖 Prompt for AI Agents
In services/file_service.py around lines 116 to 128, the current filename
handling builds a UUID-based safe_name but lacks the stricter checks used in
save_upload_file; update this block to reuse or mirror the same
sanitization/validation: validate and normalize the incoming filename (strip
path components, reject names containing os.sep or leading dots), enforce the
allowed/extensions rule and length limits used by save_upload_file, derive the
extension safely (lowercased) and build safe_name from a validated ext, then
compute final_path and re-check that final_path resides inside the absolute
UPLOAD_DIR before writing; if any check fails, log the full context and raise
DocumentSaveError.
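A note on the containment check itself: plain `startswith` needs the `+ os.sep` guard to avoid sibling-prefix matches (e.g. `uploads_evil` matching `uploads`); `os.path.commonpath` expresses the same intent more robustly. A small sketch:

```python
import os

def is_within(directory: str, path: str) -> bool:
    # Normalize both to absolute form before comparing
    directory = os.path.abspath(directory)
    path = os.path.abspath(path)
    # commonpath normalizes separators and rejects sibling-prefix paths
    return os.path.commonpath([directory, path]) == directory
```

`os.path.abspath` also collapses `..` segments, so traversal attempts resolve to their real location before the comparison runs.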

Comment on lines +162 to +176
    try:
        mime, _ = mimetypes.guess_type(final_path)
    except Exception:
        mime = None

    meta = {
        'path': final_path,
        'original_name': base,
        'size': total,
        'sha256': sha256.hexdigest(),
        'mime_type': mime,
        'saved_at': datetime.now(timezone.utc).isoformat(),
    }
    logger.info("Saved upload metadata: %s", {k: meta[k] for k in ('original_name','path','size','mime_type')})
    return meta

⚠️ Potential issue | 🟠 Major

Return accurate MIME and enforce allowed types in metadata save

Use detect_mime_type() and enforce ALLOWED_CONTENT_TYPES with cleanup on violation.

Apply this diff:

-    try:
-        mime, _ = mimetypes.guess_type(final_path)
-    except Exception:
-        mime = None
+    mime = detect_mime_type(final_path)
+    if ALLOWED_CONTENT_TYPES is not None and mime not in ALLOWED_CONTENT_TYPES:
+        logger.warning("Disallowed content type: %s for %s", mime, base)
+        with contextlib.suppress(Exception):
+            os.remove(final_path)
+        raise DocumentSaveError("Uploaded file type is not allowed.")
 
     meta = {
         'path': final_path,
         'original_name': base,
         'size': total,
         'sha256': sha256.hexdigest(),
         'mime_type': mime,
         'saved_at': datetime.now(timezone.utc).isoformat(),
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    try:
-        mime, _ = mimetypes.guess_type(final_path)
-    except Exception:
-        mime = None
+    mime = detect_mime_type(final_path)
+    if ALLOWED_CONTENT_TYPES is not None and mime not in ALLOWED_CONTENT_TYPES:
+        logger.warning("Disallowed content type: %s for %s", mime, base)
+        with contextlib.suppress(Exception):
+            os.remove(final_path)
+        raise DocumentSaveError("Uploaded file type is not allowed.")

     meta = {
         'path': final_path,
         'original_name': base,
         'size': total,
         'sha256': sha256.hexdigest(),
         'mime_type': mime,
         'saved_at': datetime.now(timezone.utc).isoformat(),
     }
     logger.info("Saved upload metadata: %s", {k: meta[k] for k in ('original_name','path','size','mime_type')})
     return meta

Comment on lines +230 to +243
def delete_upload(name_or_path: str) -> bool:
    """Delete an upload by filename or absolute path. Returns True if deleted."""
    if os.path.isabs(name_or_path):
        path = name_or_path
    else:
        path = os.path.join(UPLOAD_DIR, name_or_path)
    try:
        if os.path.exists(path):
            os.remove(path)
            return True
    except Exception:
        logging.getLogger(__name__).exception("Failed to delete upload: %s", path)
    return False


⚠️ Potential issue | 🔴 Critical

Block path traversal: restrict deletes to UPLOAD_DIR and files only

Allowing absolute paths enables deleting arbitrary files. Constrain deletions to files under UPLOAD_DIR.

Apply this diff:

 def delete_upload(name_or_path: str) -> bool:
-    """Delete an upload by filename or absolute path. Returns True if deleted."""
-    if os.path.isabs(name_or_path):
-        path = name_or_path
-    else:
-        path = os.path.join(UPLOAD_DIR, name_or_path)
-    try:
-        if os.path.exists(path):
-            os.remove(path)
-            return True
-    except Exception:
-        logging.getLogger(__name__).exception("Failed to delete upload: %s", path)
-    return False
+    """Delete an upload by filename or absolute path within UPLOAD_DIR. Returns True if deleted."""
+    logger = logging.getLogger(__name__)
+    if os.path.isabs(name_or_path):
+        path = name_or_path
+    else:
+        path = os.path.join(UPLOAD_DIR, name_or_path)
+    try:
+        path = os.path.abspath(path)
+        upload_dir_abs = os.path.abspath(UPLOAD_DIR)
+        # Ensure path is inside UPLOAD_DIR and is a regular file
+        if not path.startswith(upload_dir_abs + os.sep) or not os.path.isfile(path):
+            logger.warning("Refusing to delete path outside UPLOAD_DIR or non-file: %s", path)
+            return False
+        os.remove(path)
+        return True
+    except Exception:
+        logger.exception("Failed to delete upload: %s", path)
+        return False
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 def delete_upload(name_or_path: str) -> bool:
-    """Delete an upload by filename or absolute path. Returns True if deleted."""
-    if os.path.isabs(name_or_path):
-        path = name_or_path
-    else:
-        path = os.path.join(UPLOAD_DIR, name_or_path)
-    try:
-        if os.path.exists(path):
-            os.remove(path)
-            return True
-    except Exception:
-        logging.getLogger(__name__).exception("Failed to delete upload: %s", path)
-    return False
+    """Delete an upload by filename or absolute path within UPLOAD_DIR. Returns True if deleted."""
+    logger = logging.getLogger(__name__)
+    if os.path.isabs(name_or_path):
+        path = name_or_path
+    else:
+        path = os.path.join(UPLOAD_DIR, name_or_path)
+    try:
+        path = os.path.abspath(path)
+        upload_dir_abs = os.path.abspath(UPLOAD_DIR)
+        # Ensure path is inside UPLOAD_DIR and is a regular file
+        if not path.startswith(upload_dir_abs + os.sep) or not os.path.isfile(path):
+            logger.warning("Refusing to delete path outside UPLOAD_DIR or non-file: %s", path)
+            return False
+        os.remove(path)
+        return True
+    except Exception:
+        logger.exception("Failed to delete upload: %s", path)
+        return False
🤖 Prompt for AI Agents
In services/file_service.py around lines 230-243, the delete_upload function
currently accepts absolute paths and can delete files outside UPLOAD_DIR; change
it to resolve the requested path against UPLOAD_DIR only, compute both
UPLOAD_DIR and target via Path.resolve(), ensure the resolved target starts with
the resolved UPLOAD_DIR path (reject if not), verify the target is a regular
file (is_file()) and not a directory or symlink pointing outside, then remove it
and return True; on any check failure return False and keep the existing
exception logging for actual removal errors.
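Following the `Path.resolve()` approach the prompt suggests, a hardened delete might look like this (a sketch with assumed names; `resolve()` follows symlinks before the containment check runs):

```python
from pathlib import Path

def safe_delete(upload_dir: str, name_or_path: str) -> bool:
    root = Path(upload_dir).resolve()
    candidate = Path(name_or_path)
    # Relative names are interpreted inside the upload root
    target = (candidate if candidate.is_absolute() else root / candidate).resolve()
    # Reject anything whose resolved location is not strictly inside the root
    if root not in target.parents:
        return False
    # Only delete regular files, never directories
    if not target.is_file():
        return False
    target.unlink()
    return True
```

Because `resolve()` canonicalizes symlinks and `..` segments first, a link inside the upload directory that points elsewhere fails the `parents` check rather than escaping it.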

@fehranbit fehranbit self-requested a review September 28, 2025 15:08
@fehranbit fehranbit merged commit e5677b6 into main Sep 28, 2025
1 check passed