Add FastAPI redlining API, token auth, tests, Docker, and tooling #2
Conversation
This commit introduces a FastAPI application to provide a web interface for the document redlining functionality. The key features of this change are:

- A FastAPI application with two endpoints:
  - `/redline/single`: Accepts two `.docx` files and returns a single redlined document.
  - `/redline/double`: Accepts two `.docx` files and returns a zip archive with two redlined documents (original vs. modified and modified vs. original).
- API token authentication to secure the endpoints.
- A 10MB file size limit to prevent abuse.
- A `Dockerfile` and `docker-compose.yml` for easy containerization and deployment.
- An updated build script (`build_differ.py`) to correctly build and package the C# binaries for multiple platforms.
- Comprehensive tests for the new API endpoints, including tests for authentication and file size limits.
- An updated `pyproject.toml` to manage dependencies and configure the test environment.
- A `.gitignore` file to exclude build artifacts and environment files.
- A `.env.example` file to provide a template for setting the API token.
Summary of Changes

Hello @arthrod, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement by integrating a FastAPI-based web interface for the existing document redlining functionality. This change allows users to interact with the redlining service via a RESTful API, making it more accessible and easier to integrate into other systems. The addition of Docker support simplifies deployment, while robust authentication and file size limits ensure the service is secure and stable. The underlying build process for the C# redlining engine has also been refined to support a wider range of platforms more reliably.
PR Compliance Guide 🔍 (summary of compliance checks for this PR; legend: 🟢 fully compliant, 🟡 partially compliant, 🔴 not compliant, ⚪ requires further human verification)
@coderabbitai review
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Introduces a FastAPI-based redlining API with token auth, adds tests, containerization (Dockerfile, docker-compose), environment scaffolding (.env.example), updated build/packaging (build_differ.py, pyproject.toml), a .NET installer script, and a simplified .gitignore.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant A as FastAPI App
    participant V as verify_token
    participant E as XmlPowerToolsEngine
    participant Z as Zip Builder (in-memory)
    Note over A: Env loaded (.env) to get API_TOKEN
    C->>A: POST /redline/single (files, author_tag, X-API-Token)
    A->>V: Validate token (Header vs env)
    V-->>A: Valid / 401 on mismatch
    A->>A: Validate file sizes (<= MAX_FILE_SIZE)
    A->>E: Compare Original vs Modified
    E-->>A: DOCX bytes
    A-->>C: 200 DOCX (Content-Disposition)
    rect rgba(230,245,255,0.5)
        C->>A: POST /redline/double (files, author_tag, X-API-Token)
        A->>V: Validate token
        A->>E: Redline O->M
        E-->>A: DOCX A
        A->>E: Redline M->O
        E-->>A: DOCX B
        A->>Z: Package DOCX A,B into ZIP stream
        Z-->>A: ZIP stream
        A-->>C: 200 ZIP (streaming)
    end
    alt Error paths
        A-->>C: 401 Unauthorized (bad/missing token)
        A-->>C: 413 Payload Too Large (file limit)
        A-->>C: 500 on missing server token
    end
```
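The 401 path in the diagram hinges on how the token comparison is done. Below is a minimal, self-contained sketch of a constant-time token check; the `Unauthorized` class and this `verify_token` helper are illustrative stand-ins for the PR's actual FastAPI dependency and `HTTPException(401)`, not its real code:

```python
import os
import secrets

class Unauthorized(Exception):
    """Stand-in for FastAPI's HTTPException with status 401."""

def verify_token(x_api_token: str) -> None:
    # Compare in constant time so an attacker cannot probe the token
    # byte-by-byte via response-timing differences.
    expected = os.environ.get("API_TOKEN", "")
    if not expected or not secrets.compare_digest(x_api_token, expected):
        raise Unauthorized("Invalid API Token")

os.environ["API_TOKEN"] = "s3cret"
verify_token("s3cret")  # passes silently
try:
    verify_token("wrong")
except Unauthorized:
    print("rejected")  # → rejected
```

`secrets.compare_digest` is the piece the PR's reviewers call out below; a plain `==` comparison would short-circuit on the first mismatching character.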
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ 1 failed check (warning), ✅ 2 checks passed.
@claude review @coderabbitai review
PR Code Suggestions ✨
✅ Actions performed: Review triggered.
@gemini review
Code Review
This pull request introduces a significant enhancement by adding a FastAPI web service to expose the document redlining functionality. The inclusion of Docker support, multi-platform build scripts, and comprehensive tests is commendable. My review focuses on several key areas for improvement:
- Dockerfile Optimization: I've suggested using a multi-stage build to significantly reduce the final Docker image size.
- Configuration: There's a critical issue with a hardcoded path in `pyproject.toml` that needs to be addressed.
- API Performance and Maintainability: I've pointed out opportunities to improve the API's performance by handling blocking calls correctly and to enhance maintainability by reducing code duplication.
- Test Correctness: A minor resource leak in the tests has been identified.
Overall, this is a solid contribution. Addressing these points will make the application more robust, performant, and easier to maintain.
```toml
    "fastapi",
    "python-dotenv"
]
env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
```
There is a hardcoded, user-specific path in the PATH environment variable: /home/jules/.dotnet. This will cause the application to fail for any other user or in any CI/CD environment. This should be replaced with a generic path that resolves to the user's home directory, such as by using the $HOME environment variable.
Suggested change:

```diff
-env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
+env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }
```
```python
redline_output, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)
```
The engine.run_redline() method is a synchronous function that performs a blocking operation (running a subprocess). Calling it directly from within an async endpoint will block the entire server's event loop, preventing it from handling other requests. This can severely degrade performance. You should run blocking calls in a separate thread pool using asyncio.get_running_loop().run_in_executor().
```python
import asyncio

loop = asyncio.get_running_loop()
redline_output, _, _ = await loop.run_in_executor(
    None, engine.run_redline, author_tag, original_content, modified_content
)
```

```python
redline_om, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)

# Redline 2: Modified -> Original
redline_mo, _, _ = engine.run_redline(
    author_tag, modified_content, original_content
)
```
Similar to the other endpoint, the engine.run_redline() calls here are blocking and will stall the event loop. They should be executed in a thread pool. Since there are two independent redlining operations, you can run them concurrently using asyncio.gather to improve the endpoint's response time.
```python
import asyncio

loop = asyncio.get_running_loop()
# Redline 1: Original -> Modified
task_om = loop.run_in_executor(
    None, engine.run_redline, author_tag, original_content, modified_content
)
# Redline 2: Modified -> Original
task_mo = loop.run_in_executor(
    None, engine.run_redline, author_tag, modified_content, original_content
)
(redline_om, _, _), (redline_mo, _, _) = await asyncio.gather(task_om, task_mo)
```

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
    chmod +x ./dotnet-install.sh && \
    ./dotnet-install.sh --version 8.0.100 && \
    rm dotnet-install.sh

# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"

# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/

# Install hatch
RUN pip install hatch

# Build the project (which includes running build_differ.py)
RUN hatch run default:build

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Start the API server
CMD ["hatch", "run", "api:start"]
```
The Dockerfile can be significantly optimized for a smaller image size and better security by using a multi-stage build. Currently, the final image includes the .NET SDK and other build-time dependencies, which are not required for running the application.

A multi-stage build would involve:

- A `builder` stage to install the .NET SDK, build the C# binaries, and then discard this environment.
- A final, clean `python:3.11-slim` stage that copies only the necessary application code and the compiled binaries from the `builder` stage.

This practice leads to smaller, more secure, and faster-deploying images. Additionally, it's good practice to clean up the apt cache within the same `RUN` layer to further reduce image size by adding `&& rm -rf /var/lib/apt/lists/*`.
```python
dist_filename = f"{platform}-{version}.{ext}"
if platform.startswith('win-'):
    dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
elif platform.startswith('osx-'):
    dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

# The directory name inside the archive should match what engines.py expects
arcname = platform
if platform.startswith('win-'):
    arcname = 'win-' + platform.split('-')[1]
elif platform.startswith('osx-'):
    arcname = 'osx-' + platform.split('-')[1]
```
The logic for constructing dist_filename and arcname contains redundancy. For example, when platform is win-x64, the if condition platform.startswith('win-') is met, but the code inside re-constructs the exact same string. This logic can be simplified by removing the conditional blocks, making the code cleaner and easier to maintain.
```python
dist_filename = f"{platform}-{version}.{ext}"
# The directory name inside the archive should match what engines.py expects
arcname = platform
```

```python
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )

    redline_output, _, _ = engine.run_redline(
        author_tag, original_content, modified_content
    )

    return Response(
        content=redline_output,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )


@app.post("/redline/double", dependencies=[Depends(verify_token)])
async def redline_double(
    original: UploadFile = File(...),
    modified: UploadFile = File(...),
    author_tag: str = Form("Author"),
):
    """
    Accepts two .docx files and returns a zip archive with two redlined documents.
    """
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )
```
```python
files = {
    "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/single", files=files, data=data, headers=headers)
```
The file opened via open("tests/test_data/modified.docx", "rb") is not closed, which will result in a ResourceWarning and can lead to resource leaks in the test suite. It's best practice to use a with statement to ensure that file handles are always closed properly, even if errors occur.
Suggested change:

```python
with open("tests/test_data/modified.docx", "rb") as modified_file:
    files = {
        "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    }
    data = {"author_tag": "test_author"}
    response = await client.post("/redline/single", files=files, data=data, headers=headers)
```
@arthrod: I'll review the changes in this pull request. ✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
```python
redline_output, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)
```
Avoid blocking asyncio loop with synchronous redline subprocess
Both endpoints invoke engine.run_redline directly from an async def. run_redline performs synchronous file I/O and subprocess.run, so awaiting requests will block the event loop for the lifetime of the diff operation. Under load a single request will stall the entire worker, defeating the purpose of exposing an async API. Offload the work via asyncio.to_thread/run_in_threadpool or make the handlers synchronous so FastAPI executes them in a threadpool, and apply the same change to /redline/double.
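A minimal sketch of the `asyncio.to_thread` route suggested above; note that `blocking_redline` is a stand-in for `engine.run_redline` (which shells out to a subprocess), not the real engine call:

```python
import asyncio
import time

def blocking_redline(author: str, original: bytes, modified: bytes) -> bytes:
    # Stand-in for engine.run_redline: a synchronous, blocking operation.
    time.sleep(0.01)
    return original + modified

async def redline_single(original: bytes, modified: bytes) -> bytes:
    # Run the blocking call in a worker thread so the event loop
    # can keep serving other requests in the meantime.
    return await asyncio.to_thread(blocking_redline, "Author", original, modified)

result = asyncio.run(redline_single(b"old", b"new"))
print(result)  # → b'oldnew'
```

The alternative mentioned above, declaring the handler as a plain `def`, achieves the same effect because FastAPI runs synchronous handlers in its own threadpool.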
```python
original_content = await original.read()
modified_content = await modified.read()

if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
    raise HTTPException(
```
File-size guard runs after loading entire upload into memory
The request handlers read each uploaded file (await original.read()) into a bytes object before checking len(...) > MAX_FILE_SIZE. This means a client can upload arbitrarily large files and the server will allocate the entire payload in memory before the size check raises 413, negating the 10 MB protection described in the commit message and opening the API to memory exhaustion. Consider validating the size while streaming the upload or rejecting the request before materialising the whole body.
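A sketch of validating the size while streaming, as suggested. The `FakeUpload` class mimics the async `read()` interface of Starlette's `UploadFile` so the example is self-contained; `PayloadTooLarge` stands in for raising `HTTPException(413)`:

```python
import asyncio

class FakeUpload:
    """Minimal stand-in for starlette's UploadFile async read() interface."""
    def __init__(self, data: bytes):
        self._buf = data
        self._pos = 0

    async def read(self, size: int = -1) -> bytes:
        chunk = self._buf[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

class PayloadTooLarge(Exception):
    """Stand-in for HTTPException(status_code=413)."""

async def read_limited(upload, max_size: int) -> bytes:
    # Accumulate the body in chunks and reject as soon as the running
    # total exceeds the limit, instead of materialising it all first.
    buf = bytearray()
    while chunk := await upload.read(64 * 1024):
        buf.extend(chunk)
        if len(buf) > max_size:
            raise PayloadTooLarge(f"body exceeds {max_size} bytes")
    return bytes(buf)

small = asyncio.run(read_limited(FakeUpload(b"x" * 100), max_size=1024))
print(len(small))  # → 100
try:
    asyncio.run(read_limited(FakeUpload(b"x" * 2048), max_size=1024))
except PayloadTooLarge:
    print("rejected")  # → rejected
```

Note this caps memory per request at roughly `max_size` plus one chunk; a reverse proxy limit (e.g. nginx `client_max_body_size`) in front of the app is a common belt-and-braces addition.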
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
build_differ.py (1)

7-14: Security concern: `exec()` usage in version extraction.

The `get_version()` function uses `exec()` to extract version information, which poses security risks if the `__about__.py` file is ever compromised or modified maliciously. Apply this diff to use safer version extraction:

```diff
 def get_version():
     """
     Extracts the version from the specified __about__.py file.
     """
-    about = {}
-    with open('./src/python_redlines/__about__.py') as f:
-        exec(f.read(), about)
-    return about['__version__']
+    import re
+    from pathlib import Path
+
+    about_file = Path('./src/python_redlines/__about__.py')
+    content = about_file.read_text()
+    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', content)
+    if not match:
+        raise ValueError("Could not find __version__ in __about__.py")
+    return match.group(1)
```

pyproject.toml (1)
58-81: Critical: Remove hardcoded user-specific PATH.

The hardcoded PATH contains a user-specific directory that will break in other environments. Apply this diff to fix:

```diff
 [tool.hatch.envs.default]
 dependencies = [
   "coverage[toml]>=6.5",
   "pytest",
   "pytest-asyncio",
   "httpx",
   "python-multipart",
   "fastapi",
   "python-dotenv"
 ]
-env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
+env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }

 [tool.hatch.envs.default.scripts]
 test = "python -m pytest {args:tests}"
```

Better yet, handle .NET PATH detection dynamically in build scripts rather than hardcoding it here.
🧹 Nitpick comments (11)
.env.example (1)
1-3: Add final newline and expand environment variable documentation.

The file provides basic token guidance but could be more comprehensive:

- Missing final newline: dotenv-linter flags this (line 3)
- Limited documentation: additional environment variables might be needed (e.g., `MAX_FILE_SIZE`, `LOG_LEVEL`, `CORS_ORIGINS`)

Apply this diff:

```diff
 # Generate a secure secret token, for example, by running:
 # python -c 'import secrets; print(secrets.token_hex(32))'
 API_TOKEN=your_secret_api_token_here
+
+# Optional: Override default file size limit (bytes)
+# MAX_FILE_SIZE=10485760
+
+# Optional: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
+# LOG_LEVEL=INFO
```

Based on learnings: use `python-dotenv` (recommended package) to load these variables.

.gitignore (1)
1-33: Well-organized gitignore with good multi-language coverage.

The reorganized `.gitignore` properly handles:

- Python artifacts (`__pycache__`, `*.pyc`, `.venv/`)
- .NET build outputs (`csproj/bin/`, `csproj/obj/`)
- Hatch tooling (`.hatch/`, distribution directories)
- IDE metadata (`.idea/`, `.vscode/`)
- Environment files (`.env`)

Consider adding Docker-specific ignores if not already covered elsewhere:

```diff
+# Docker
+*.log
+docker-compose.override.yml
```

build_differ.py (3)
17-27: Add proper error handling and improve process management.

The `run_command` function has been improved but still lacks robustness:

- No timeout handling: long-running commands could hang indefinitely
- Limited error context: the exception message doesn't include stdout/stderr
- Blocking I/O: line-by-line reading blocks the main thread

Apply this diff:

```diff
-def run_command(command):
+def run_command(command: str, timeout: int = 300) -> None:
     """
-    Runs a shell command and prints its output.
+    Runs a shell command and prints its output.
+
+    Args:
+        command: Shell command to execute
+        timeout: Maximum execution time in seconds (default: 300)
+
+    Raises:
+        subprocess.CalledProcessError: If command returns non-zero exit code
+        subprocess.TimeoutExpired: If command exceeds timeout
     """
     process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+    output_lines = []
     for line in process.stdout:
-        print(line.decode().strip())
+        decoded = line.decode().strip()
+        print(decoded)
+        output_lines.append(decoded)

-    process.communicate()
+    try:
+        process.communicate(timeout=timeout)
+    except subprocess.TimeoutExpired:
+        process.kill()
+        raise

     if process.returncode != 0:
-        raise subprocess.CalledProcessError(process.returncode, command)
+        error_output = '\n'.join(output_lines[-20:])  # Last 20 lines
+        raise subprocess.CalledProcessError(
+            process.returncode,
+            command,
+            output=error_output.encode()
+        )
```
50-61: Modernize with pathlib and add type hints.

The function uses old-style string path manipulation. Apply this diff:

```diff
-def cleanup_old_builds(dist_dir, current_version):
+from pathlib import Path
+
+def cleanup_old_builds(dist_dir: str, current_version: str) -> None:
     """
     Deletes any build files ending in .zip or .tar.gz in the dist_dir with a different version tag.
+
+    Args:
+        dist_dir: Distribution directory path
+        current_version: Current version string to preserve
     """
-    if not os.path.exists(dist_dir):
+    dist_path = Path(dist_dir)
+    if not dist_path.exists():
         return
-    for file in os.listdir(dist_dir):
-        if not file.endswith((f'{current_version}.zip', f'{current_version}.tar.gz', '.gitignore')):
-            file_path = os.path.join(dist_dir, file)
-            os.remove(file_path)
-            print(f"Deleted old build file: {file}")
+
+    for file_path in dist_path.iterdir():
+        if file_path.name == '.gitignore':
+            continue
+        if not file_path.name.endswith((f'{current_version}.zip', f'{current_version}.tar.gz')):
+            file_path.unlink()
+            print(f"Deleted old build file: {file_path.name}")
```

As per coding guidelines: prefer modern Python features and pathlib over os.path.
63-110: Add uv shebang and improve error handling in main().

The main function lacks proper script metadata and could benefit from structured logging. Add a uv shebang at the top of the file and improve main():

```diff
+#!/usr/bin/env -S uv run
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#     "loguru",
+# ]
+# ///
+
 import subprocess
 import os
 import tarfile
 import zipfile
+from pathlib import Path
+from loguru import logger

 # ... rest of code ...

-def main():
+def main() -> None:
-    version = get_version()
-    print(f"Version: {version}")
+    logger.info("Starting build process")
+
+    try:
+        version = get_version()
+        logger.info(f"Building version: {version}")
+    except Exception as e:
+        logger.error(f"Failed to get version: {e}")
+        raise

     dist_dir = "./src/python_redlines/dist/"
-    if not os.path.exists(dist_dir):
-        os.makedirs(dist_dir)
+    Path(dist_dir).mkdir(parents=True, exist_ok=True)

     platforms = {
         "linux-x64": "tar.gz",
         "linux-arm64": "tar.gz",
         "win-x64": "zip",
         "win-arm64": "zip",
         "osx-x64": "tar.gz",
         "osx-arm64": "tar.gz",
     }

+    failed_platforms = []
     for platform, ext in platforms.items():
-        print(f"Building for {platform}...")
+        logger.info(f"Building for {platform}...")
         try:
             run_command(f'dotnet publish ./csproj -c Release -r {platform} --self-contained')
             build_dir = f'./csproj/bin/Release/net8.0/{platform}/publish'

             # The name of the archive file should match what engines.py expects
             dist_filename = f"{platform}-{version}.{ext}"
             if platform.startswith('win-'):
                 dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
             elif platform.startswith('osx-'):
                 dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

             # The directory name inside the archive should match what engines.py expects
             arcname = platform
             if platform.startswith('win-'):
                 arcname = 'win-' + platform.split('-')[1]
             elif platform.startswith('osx-'):
                 arcname = 'osx-' + platform.split('-')[1]

             compress_files(build_dir, f"{dist_dir}/{dist_filename}", arcname=arcname)
+            logger.success(f"Successfully built {platform}")
         except (subprocess.CalledProcessError, FileNotFoundError) as e:
-            print(f"Failed to build for {platform}: {e}")
+            logger.error(f"Failed to build for {platform}: {e}")
+            failed_platforms.append(platform)
             # Continue to next platform
             continue

     cleanup_old_builds(dist_dir, version)
-    print("Build and compression complete.")
+    if failed_platforms:
+        logger.warning(f"Build completed with failures for: {', '.join(failed_platforms)}")
+    else:
+        logger.success("Build and compression complete for all platforms.")
```

As per coding guidelines: add uv shebangs and implement loguru for structured logging.
src/python_redlines/api.py (1)
31-55: Improve error handling and token validation.

The token verification lacks explicit None handling and detailed error logging. Apply this diff:

```diff
 def get_api_token():
     """
     Loads the API token from environment variables.
+
+    Returns:
+        str: The configured API token
+
+    Raises:
+        HTTPException: If API_TOKEN environment variable is not set
     """
     token = os.getenv("API_TOKEN")
     if not token:
+        logger.critical("API_TOKEN environment variable not configured")
         raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
             detail="API token not configured on server",
         )
     return token

 async def verify_token(
     x_api_token: Annotated[str, Header()],
     api_token: str = Depends(get_api_token),
-):
+) -> None:
     """
     Verifies the API token provided in the request header.
+
+    Args:
+        x_api_token: Token from X-Api-Token header
+        api_token: Expected token from environment
+
+    Raises:
+        HTTPException: If token is invalid or missing
     """
+    if not x_api_token:
+        logger.warning("Missing API token in request")
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Missing API Token"
+        )
+
     if not secrets.compare_digest(x_api_token, api_token):
+        logger.warning("Invalid API token attempted")
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Invalid API Token"
         )
```

pyproject.toml (2)
58-101: Add modern Python tooling dependencies.

The configuration is missing several tools recommended in the coding guidelines (as per coding guidelines: use ruff, loguru, and other modern tools). Apply this diff:

```diff
 [tool.hatch.envs.default]
 dependencies = [
   "coverage[toml]>=6.5",
   "pytest",
   "pytest-asyncio",
   "httpx",
   "python-multipart",
   "fastapi",
-  "python-dotenv"
+  "python-dotenv",
+  "ruff",
+  "loguru",
+  "mypy"
 ]
 env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }

 [tool.hatch.envs.default.scripts]
 test = "python -m pytest {args:tests}"
+lint = "ruff check {args:src tests}"
+format = "ruff format {args:src tests}"
+typecheck = "mypy {args:src tests}"
 test-cov = "coverage run -m pytest {args:tests}"
 cov-report = [
   "- coverage combine",
   "coverage report",
 ]
 cov = [
   "test-cov",
   "cov-report",
 ]
 build = "python -m build_differ"

+[tool.ruff]
+target-version = "py312"
+line-length = 100
+
+[tool.ruff.lint]
+select = [
+  "E",   # pycodestyle errors
+  "W",   # pycodestyle warnings
+  "F",   # pyflakes
+  "I",   # isort
+  "B",   # flake8-bugbear
+  "C4",  # flake8-comprehensions
+  "UP",  # pyupgrade
+]
+ignore = [
+  "B008",  # FastAPI uses File() in defaults which is intentional
+]
+
+[tool.mypy]
+python_version = "3.12"
+warn_return_any = true
+warn_unused_configs = true
+disallow_untyped_defs = true
```
92-101: Add production-ready uvicorn configuration.

The API start command lacks production settings. Apply this diff:

```diff
 [tool.hatch.envs.api]
 dependencies = [
   "fastapi",
   "uvicorn[standard]",
   "python-multipart",
-  "python-dotenv"
+  "python-dotenv",
+  "loguru"
 ]

 [tool.hatch.envs.api.scripts]
-start = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000"
+start = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000 --workers 4 --log-level info"
+start-dev = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000 --reload --log-level debug"
```

tests/test_api.py (3)
1-6: Clarify test constant naming to avoid static analysis warnings.

The imports and structure are sound. However, static analysis flagged `TEST_API_TOKEN` as a potential hardcoded credential (S105). While this is acceptable in test context, consider renaming to `TEST_AUTH_TOKEN` or `MOCK_API_TOKEN` to clarify it's a test fixture, not a production secret. Apply this diff:

```diff
-TEST_API_TOKEN = "test-token"
+MOCK_API_TOKEN = "test-token-for-testing-only"
```

Then update references in fixtures and tests accordingly.
1-86: Add comprehensive edge case and parametrized tests.

The current test suite covers basic scenarios but is missing critical edge cases and could benefit from parametrization. Consider adding the following tests:

```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "pytest>=8.4.2",
#     "httpx>=0.28.1",
#     "pytest-asyncio",
# ]
# ///
import pytest
from io import BytesIO
from httpx import AsyncClient, ASGITransport
from python_redlines.api import app, MAX_FILE_SIZE

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


@pytest.mark.asyncio
async def test_file_size_limit_modified_file(headers):
    """Test /redline/single returns 413 when modified file exceeds MAX_FILE_SIZE."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        with open("tests/test_data/original.docx", "rb") as original_file:
            files = {
                "original": ("original.docx", original_file, DOCX_MIME),
                "modified": ("large.docx", large_file_content, DOCX_MIME),
            }
            data = {"author_tag": "test_author"}
            response = await client.post("/redline/single", files=files, data=data, headers=headers)
        assert response.status_code == 413


@pytest.mark.asyncio
async def test_missing_author_tag(headers):
    """Test /redline/single handles missing author_tag parameter."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, DOCX_MIME),
                    "modified": ("modified.docx", modified_file, DOCX_MIME),
                }
                response = await client.post("/redline/single", files=files, headers=headers)
        # Verify API handles missing optional parameter gracefully (422 or uses default)
        assert response.status_code in [200, 422]


@pytest.mark.asyncio
async def test_invalid_file_format(headers):
    """Test /redline/single handles non-DOCX files gracefully."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        invalid_file_content = b"Not a DOCX file"
        files = {
            "original": ("original.docx", invalid_file_content, DOCX_MIME),
            "modified": ("modified.docx", invalid_file_content, DOCX_MIME),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)
        # Should return 4xx error for invalid format
        assert 400 <= response.status_code < 500


@pytest.mark.parametrize("endpoint,expected_content_type", [
    ("/redline/single", DOCX_MIME),
    ("/redline/double", "application/zip"),
])
@pytest.mark.asyncio
async def test_endpoints_parametrized(endpoint, expected_content_type, headers):
    """Parametrized test for both endpoints verifying content types."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, DOCX_MIME),
                    "modified": ("modified.docx", modified_file, DOCX_MIME),
                }
                data = {"author_tag": "test_author"}
                response = await client.post(endpoint, files=files, data=data, headers=headers)
        assert response.status_code == 200
        assert response.headers["content-type"] == expected_content_type
```

These additions would significantly improve test robustness and coverage.
`1-86`: Consider CI/CD integration and load testing.

For production readiness, consider:
- CI/CD Integration: Configure GitHub Actions or equivalent to run tests on every push/PR with multiple Python versions (3.12, 3.13)
- Coverage Reporting: Add pytest-cov to measure code coverage and set minimum thresholds (e.g., 80%)
- Load Testing: Add load tests using locust or similar to verify the API handles concurrent requests without resource exhaustion
- Security Scanning: Integrate bandit or similar tools to catch security issues in API code
- Container Testing: Add tests that run against the Dockerized service to verify deployment configuration
Example GitHub Actions workflow:
```yaml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install uv
          uv pip install -e ".[test]"
      - name: Run tests with coverage
        run: pytest --cov=src/python_redlines --cov-report=xml --cov-report=term-missing
      - name: Upload coverage
        uses: codecov/codecov-action@v4
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `src/python_redlines/dist/.gitignore` is excluded by `!**/dist/**`
- `tests/test_data/modified.docx` is excluded by `!**/*.docx`
- `tests/test_data/original.docx` is excluded by `!**/*.docx`
📒 Files selected for processing (9)
- `.env.example` (1 hunks)
- `.gitignore` (1 hunks)
- `Dockerfile` (1 hunks)
- `build_differ.py` (2 hunks)
- `docker-compose.yml` (1 hunks)
- `dotnet-install.sh` (1 hunks)
- `pyproject.toml` (3 hunks)
- `src/python_redlines/api.py` (1 hunks)
- `tests/test_api.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Add docstrings, remove deprecated libraries (no unittest or mock), remove typing Optional, List, etc. in favor of modern syntax (list, str | None), use async methods as much as possible, when applicable import and implement uvloop, typer, loguru, uv, ruff, pytest, assert, httpx/aiohttp, polars. Add uv shebangs when possible:

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "loguru",
# ]
# ///
Refactor variables, classes, functions or constants names if not descriptive enough. Don't forget to determine where else it's used.
For JavaScript and TypeScript, favor bun/oxlint/biome/tanstack/elysia, but propose refactoring only if (1) a seamless migration to these tools and/or (2) the current implementation of the code is not working. Always ensure authentication is properly implemented via oauth2 or oidc. Always run check for contrast and visualization for both light and dark themes. Verify if we don't have dead routes (pages that don't are ever rendered). Always review the comments of other commenters and assess them meticulously, indicating for each of them what was correct and what was incorrect. Verify any mock, incomplete or, most importantly, deceiving-looking incomplete implementations. Generally, focus on correct, non-duplicative endpoints, completeness and robustness.All of your messages, either a response, a summary or an automatic review MUST include, in addition to the other requirements, the following:
- An explanation of the goal of the repo and a meticulous explanation of the goal of the PR.
- How the PR attempts to achieve EACH of these goals.
- Extremely comprehensively, you will indicate on what the PR does well, what is missing and what is wrong. This section should have several subtopics, but each subtopics must be very short, no more than a sentence. [subtopicx, subtopicy, subtopics, etc] that will be my punch list (and must be in list format).
- Then propose marginal enhancements, and full solu...
Files:
- `build_differ.py`
- `tests/test_api.py`
- `src/python_redlines/api.py`
🪛 dotenv-linter (3.3.0)
.env.example
[warning] 3-3: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
🪛 Ruff (0.13.3)
build_differ.py
43-43: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
tests/test_api.py
5-5: Possible hardcoded password assigned to: "TEST_API_TOKEN"
(S105)
src/python_redlines/api.py
59-59: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
60-60: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
88-88: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
89-89: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🪛 Shellcheck (0.11.0)
dotnet-install.sh
[warning] 29-29: bold appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 31-31: black appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 33-33: green appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 35-35: blue appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 36-36: magenta appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 38-38: white appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 141-141: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 146-146: Quote this to prevent word splitting.
(SC2046)
[warning] 175-175: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 194-194: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 287-287: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 288-288: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 351-351: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 402-402: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 403-403: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 405-405: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 448-448: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 477-477: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 502-502: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 542-542: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 581-581: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 705-705: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 740-740: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 777-777: Prefer mapfile or read -a to split command output (or quote to avoid splitting).
(SC2207)
[warning] 778-778: Prefer mapfile or read -a to split command output (or quote to avoid splitting).
(SC2207)
[warning] 939-939: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 954-954: relative_or_absolute_path appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 998-998: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 999-999: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1001-1001: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1004-1004: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1050-1050: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1221-1221: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1265-1265: Declare and assign separately to avoid masking return values.
(SC2155)
[error] 1371-1371: Double quote array expansions to avoid re-splitting elements.
(SC2068)
[error] 1384-1384: Double quote array expansions to avoid re-splitting elements.
(SC2068)
[warning] 1425-1425: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1426-1426: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1427-1427: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1469-1469: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1470-1470: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1471-1471: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1479-1479: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1480-1480: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1481-1481: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1508-1508: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1508-1508: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1512-1512: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1512-1512: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1514-1514: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1514-1514: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1520-1520: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1520-1520: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1636-1636: args appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 1638-1638: local_version_file_relative_path appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 1724-1724: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1724-1724: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1729-1729: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1729-1729: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1741-1741: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1741-1741: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1760-1760: This pattern always overrides a later one on line 1760.
(SC2221)
[warning] 1760-1760: This pattern never matches because of a previous pattern on line 1760.
(SC2222)
🔇 Additional comments (1)
dotnet-install.sh (1)
1-1888: Document vendored script source and version
Unable to fetch the official script automatically; manually verify its origin (e.g. compare with https://github.com/dotnet/install-scripts), determine its commit hash or release tag, and add the source URL and version/commit metadata at the top of dotnet-install.sh.
```diff
 def compress_files(source_dir, target_file, arcname=None):
     """
     Compresses files in the specified directory into a tar.gz or zip file.
     """
     if not arcname:
         arcname = os.path.basename(source_dir)

     print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

     if target_file.endswith('.tar.gz'):
         with tarfile.open(target_file, "w:gz") as tar:
-            tar.add(source_dir, arcname=os.path.basename(source_dir))
+            tar.add(source_dir, arcname=arcname)
     elif target_file.endswith('.zip'):
         with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
             for root, dirs, files in os.walk(source_dir):
                 for file in files:
-                    zipf.write(os.path.join(root, file),
-                               os.path.relpath(os.path.join(root, file),
-                                               os.path.join(source_dir, '..')))
+                    file_path = os.path.join(root, file)
+                    zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
+                    zipf.write(file_path, zip_path)
```
Fix unused variable and add type hints.
Static analysis correctly identifies an unused loop variable.
Apply this diff to fix the Ruff B007 warning:
-def compress_files(source_dir, target_file, arcname=None):
+def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None:
"""
Compresses files in the specified directory into a tar.gz or zip file.
+
+ Args:
+ source_dir: Directory containing files to compress
+ target_file: Output archive path (.tar.gz or .zip)
+ arcname: Name to use for root directory in archive (defaults to basename of source_dir)
"""
if not arcname:
arcname = os.path.basename(source_dir)
print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")
if target_file.endswith('.tar.gz'):
with tarfile.open(target_file, "w:gz") as tar:
tar.add(source_dir, arcname=arcname)
elif target_file.endswith('.zip'):
with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
- for root, dirs, files in os.walk(source_dir):
+ for root, _dirs, files in os.walk(source_dir):
for file in files:
file_path = os.path.join(root, file)
zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
zipf.write(file_path, zip_path)As per coding guidelines: Add type hints using modern Python | syntax instead of Optional.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None:
    """
    Compresses files in the specified directory into a tar.gz or zip file.

    Args:
        source_dir: Directory containing files to compress
        target_file: Output archive path (.tar.gz or .zip)
        arcname: Name to use for root directory in archive (defaults to basename of source_dir)
    """
    if not arcname:
        arcname = os.path.basename(source_dir)

    print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

    if target_file.endswith('.tar.gz'):
        with tarfile.open(target_file, "w:gz") as tar:
            tar.add(source_dir, arcname=arcname)
    elif target_file.endswith('.zip'):
        with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, _dirs, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
                    zipf.write(file_path, zip_path)
```
🧰 Tools
🪛 Ruff (0.13.3)
43-43: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
🤖 Prompt for AI Agents
In build_differ.py around lines 29 to 48, the compress_files function currently
declares an unused loop variable (dirs) and lacks type hints; update the
signature to include modern Python type hints (e.g., def
compress_files(source_dir: str, target_file: str, arcname: str | None = None) ->
None) and replace the unused loop variable with an underscore (for root, _,
files in os.walk(source_dir)) to resolve the Ruff B007 warning, keeping the rest
of the logic unchanged.
```yaml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - .:/app
```
🛠️ Refactor suggestion | 🟠 Major
Add environment variable loading, health checks, and resource limits.
The docker-compose configuration is minimal and lacks several production-ready features:
- Missing environment file: The API requires `API_TOKEN` from the environment, but the compose file doesn't load `.env`
- No health checks: Container health monitoring is not configured
- No resource limits: Memory/CPU constraints are absent
- No restart policy: The service won't auto-restart on failure
- Development volume mount: The `.:/app` mount is convenient for development but shouldn't be used in production
Apply this diff to add essential production configurations:
version: '3.8'
services:
api:
build: .
ports:
- "8000:8000"
- volumes:
- - .:/app
+ env_file:
+ - .env
+ healthcheck:
+ test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:8000/health', timeout=2)"]
+ interval: 30s
+ timeout: 3s
+ retries: 3
+ start_period: 10s
+ restart: unless-stopped
+ deploy:
+ resources:
+ limits:
+ cpus: '2'
+ memory: 2G
+ reservations:
+ cpus: '0.5'
+ memory: 512MNote: This assumes a /health endpoint exists. Consider adding one to api.py.
🤖 Prompt for AI Agents
In docker-compose.yml around lines 1 to 8, the service definition is missing
production-ready settings: add env_file: .env to load API_TOKEN and other
secrets, add a healthcheck that polls an existing /health endpoint with sensible
interval, timeout and retries, add restart: always (or on-failure) so the
service auto-restarts, and add resource limits under deploy.resources.limits
(e.g., cpus and memory) to constrain CPU and memory; finally remove or replace
the development bind mount (.:/app) with either no volume or a read-only volume
for production. Ensure the healthcheck command, env_file path, restart policy
and resource values are set to appropriate production-safe values.
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
    chmod +x ./dotnet-install.sh && \
    ./dotnet-install.sh --version 8.0.100 && \
    rm dotnet-install.sh

# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"

# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/

# Install hatch
RUN pip install hatch

# Build the project (which includes running build_differ.py)
RUN hatch run default:build

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Start the API server
CMD ["hatch", "run", "api:start"]
```
Critical security issues: running as root and no multi-stage build.
The Dockerfile has several security and optimization issues:
- Runs as root: Container runs as root user (security risk)
- Single-stage build: Includes build tools in final image (bloat)
- No health check: Container health monitoring not configured
- Build-time network dependency: Downloads installer during build
Apply this diff to create a secure multi-stage build:
-# Use an official Python runtime as a parent image
-FROM python:3.11-slim
+# Multi-stage build for smaller, more secure images
+FROM python:3.11-slim AS builder
# Set the working directory in the container
WORKDIR /app
# Install .NET SDK and other dependencies
RUN apt-get update && \
apt-get install -y wget && \
wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
chmod +x ./dotnet-install.sh && \
./dotnet-install.sh --version 8.0.100 && \
- rm dotnet-install.sh
+ rm dotnet-install.sh && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"
# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/
# Install hatch
RUN pip install hatch
# Build the project (which includes running build_differ.py)
RUN hatch run default:build
+# Final stage - runtime image
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Create non-root user
+RUN groupadd -r appuser && useradd -r -g appuser appuser && \
+ chown -R appuser:appuser /app
+
+# Copy built artifacts from builder
+COPY --from=builder --chown=appuser:appuser /app/dist ./dist
+COPY --from=builder --chown=appuser:appuser /app/src ./src
+
# Copy the rest of the application code
-COPY . .
+COPY --chown=appuser:appuser pyproject.toml .
+COPY --chown=appuser:appuser README.md .
+
+# Install runtime dependencies only
+RUN pip install --no-cache-dir hatch && \
+ pip install --no-cache-dir -e .
+
+# Switch to non-root user
+USER appuser
# Expose the port the app runs on
EXPOSE 8000
+# Add health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
+ CMD python -c "import httpx; httpx.get('http://localhost:8000/health', timeout=2)" || exit 1
+
# Start the API server
CMD ["hatch", "run", "api:start"]🤖 Prompt for AI Agents
In Dockerfile lines 1-38, the image currently runs as root, performs build and
downloads installers in the final image, and lacks a HEALTHCHECK; convert to a
secure multi-stage build by creating a builder stage (use an official dotnet SDK
base like mcr.microsoft.com/dotnet/sdk:8.0 or include the dotnet installer in
build context so no external network fetch at build time), perform all apt
installs, dotnet and Python build steps and hatch build in that builder, produce
only the runtime artifacts into a minimal final stage based on python:3.11-slim
(no build tools), create a non-root user in the final stage and set USER to it,
copy only needed files from builder, install runtime-only Python deps with pip
--no-cache-dir, and add a HEALTHCHECK instruction to probe the application
endpoint; ensure no wget/dotnet installer downloads happen in the final stage
and remove build-time tools from the final image.
```python
import os
import secrets
import zipfile
import io
from typing import Annotated

from dotenv import load_dotenv
from fastapi import (
    FastAPI,
    File,
    UploadFile,
    Form,
    Response,
    Depends,
    HTTPException,
    status,
    Header,
)
from fastapi.responses import StreamingResponse

from python_redlines.engines import XmlPowerToolsEngine

load_dotenv()

app = FastAPI()

engine = XmlPowerToolsEngine()
```
🛠️ Refactor suggestion | 🟠 Major
Add essential API features: CORS, health check, rate limiting.
The API lacks several production-ready features.
Add these features to the API:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware
import time
from collections import defaultdict
from threading import Lock
app = FastAPI(
title="Python Redlines API",
description="API for generating redlined Word documents",
version="1.0.0",
docs_url="/docs",
redoc_url="/redoc",
)
# CORS configuration
app.add_middleware(
CORSMiddleware,
allow_origins=os.getenv("CORS_ORIGINS", "*").split(","),
allow_credentials=True,
allow_methods=["POST"],
allow_headers=["*"],
)
# Trusted host middleware for security
app.add_middleware(
TrustedHostMiddleware,
allowed_hosts=os.getenv("ALLOWED_HOSTS", "*").split(",")
)
# Simple rate limiting (consider using slowapi for production)
request_counts = defaultdict(list)
rate_limit_lock = Lock()
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
client_ip = request.client.host
current_time = time.time()
with rate_limit_lock:
# Clean old requests (older than 1 minute)
request_counts[client_ip] = [
t for t in request_counts[client_ip]
if current_time - t < 60
]
# Check rate limit (e.g., 10 requests per minute)
if len(request_counts[client_ip]) >= 10:
logger.warning(f"Rate limit exceeded for {client_ip}")
raise HTTPException(
status_code=status.HTTP_429_TOO_MANY_REQUESTS,
detail="Rate limit exceeded. Try again later."
)
request_counts[client_ip].append(current_time)
response = await call_next(request)
return response
@app.get("/health")
async def health_check():
"""Health check endpoint for container orchestration."""
return {
"status": "healthy",
"timestamp": datetime.utcnow().isoformat(),
"version": "1.0.0"
}Critical: Authentication should use OAuth2/OIDC, not simple bearer tokens.
The current implementation uses a basic token comparison, which doesn't meet the authentication requirements specified in the coding guidelines.
As per coding guidelines: "Always ensure authentication is properly implemented via oauth2 or oidc."
The current token-based auth is insufficient for production use. Consider implementing OAuth2 with JWT tokens:
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "fastapi",
# "python-jose[cryptography]",
# "passlib[bcrypt]",
# "python-multipart",
# "loguru",
# "python-dotenv",
# ]
# ///
from datetime import datetime, timedelta
from typing import Annotated
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt
from passlib.context import CryptContext
from loguru import logger
# OAuth2 configuration
SECRET_KEY = os.getenv("SECRET_KEY") # Different from API_TOKEN
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
async def get_current_user(token: Annotated[str, Depends(oauth2_scheme)]):
credentials_exception = HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
headers={"WWW-Authenticate": "Bearer"},
)
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
username: str = payload.get("sub")
if username is None:
raise credentials_exception
except JWTError:
raise credentials_exception
return username
@app.post("/token")
async def login(form_data: Annotated[OAuth2PasswordRequestForm, Depends()]):
# Implement user authentication logic here
# This is a placeholder
access_token = create_access_token(data={"sub": form_data.username})
return {"access_token": access_token, "token_type": "bearer"}Then update endpoints to use Depends(get_current_user) instead of Depends(verify_token).
🛠️ Refactor suggestion | 🟠 Major
Add comprehensive logging and observability.
The API lacks structured logging, which is critical for debugging and monitoring in production.
Apply this diff to add loguru-based logging:
+#!/usr/bin/env -S uv run
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+# "fastapi",
+# "uvicorn",
+# "python-multipart",
+# "python-dotenv",
+# "loguru",
+# ]
+# ///
+
import os
import secrets
import zipfile
import io
from typing import Annotated
from dotenv import load_dotenv
+from loguru import logger
from fastapi import (
FastAPI,
File,
UploadFile,
Form,
Response,
Depends,
HTTPException,
status,
Header,
+ Request,
)
from fastapi.responses import StreamingResponse
from python_redlines.engines import XmlPowerToolsEngine
load_dotenv()
+# Configure logging
+logger.add(
+ "logs/api_{time:YYYY-MM-DD}.log",
+ rotation="1 day",
+ retention="30 days",
+ level=os.getenv("LOG_LEVEL", "INFO"),
+)
+
app = FastAPI()
+
+@app.middleware("http")
+async def log_requests(request: Request, call_next):
+ logger.info(f"Request: {request.method} {request.url.path}")
+ try:
+ response = await call_next(request)
+ logger.info(f"Response: {response.status_code}")
+ return response
+ except Exception as e:
+ logger.error(f"Request failed: {e}")
+ raise

As per coding guidelines: Implement loguru for structured logging.
🤖 Prompt for AI Agents
In src/python_redlines/api.py lines 1-27, add structured loguru-based logging
and basic observability: import logger from loguru and configure a sink
(stdout/stderr) with level read from environment (e.g., LOG_LEVEL), JSON
formatting or structured output, and sensible rotation/retention; add a helper
to bind request_id (generate via secrets.token_hex) and timestamp to each
request; register FastAPI middleware that on request creates a bound logger and
logs method, path, headers (non-sensitive), and on response logs status_code and
duration; add an exception handler that logs stack traces and error context
before raising HTTPException; ensure any engine creation or startup (engine =
XmlPowerToolsEngine()) is logged with success/failure and error details; do not
leave prints—use logger.info/debug/error with structured fields.
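The request-id binding the prompt describes can be sketched framework-agnostically. This is an illustrative stand-in using stdlib `logging` rather than loguru or FastAPI middleware; `with_request_logging` and `fake_handler` are hypothetical names, not part of the PR:

```python
import asyncio
import logging
import secrets
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("api")


def with_request_logging(handler):
    """Bind a request_id to each call, log method/path on entry and
    status/duration on exit, and log stack traces on failure."""
    async def wrapper(method: str, path: str):
        request_id = secrets.token_hex(8)
        start = time.perf_counter()
        log.info("request_id=%s request=%s %s", request_id, method, path)
        try:
            status_code = await handler(method, path)
        except Exception:
            log.exception("request_id=%s failed", request_id)
            raise
        duration_ms = (time.perf_counter() - start) * 1000
        log.info("request_id=%s status=%d duration_ms=%.1f", request_id, status_code, duration_ms)
        return status_code
    return wrapper


@with_request_logging
async def fake_handler(method, path):
    # Stand-in for a FastAPI endpoint; returns a status code
    return 200


print(asyncio.run(fake_handler("POST", "/redline/single")))  # → 200
```

In the actual app the same logic would live in an `@app.middleware("http")` function, with loguru's `logger.bind(request_id=...)` doing the binding.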
```python
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
    original: UploadFile = File(...),
    modified: UploadFile = File(...),
    author_tag: str = Form("Author"),
):
    """
    Accepts two .docx files and returns a single redlined .docx file.
    """
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )

    redline_output, _, _ = engine.run_redline(
        author_tag, original_content, modified_content
    )

    return Response(
        content=redline_output,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )
```
Use async file operations and add request validation.
The endpoint uses synchronous file reading and lacks content-type validation.
The Ruff B008 warnings about File() in defaults are false positives for FastAPI - this is the correct FastAPI pattern. However, the synchronous .read() is a real issue.
Apply this diff:
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
- original: UploadFile = File(...),
- modified: UploadFile = File(...),
+ original: Annotated[UploadFile, File(description="Original .docx file")],
+ modified: Annotated[UploadFile, File(description="Modified .docx file")],
author_tag: str = Form("Author"),
-):
+) -> Response:
"""
Accepts two .docx files and returns a single redlined .docx file.
+
+ Args:
+ original: Original Word document
+ modified: Modified Word document
+ author_tag: Author name for tracking (default: "Author")
+
+ Returns:
+ Response: Redlined document as .docx
+
+ Raises:
+ HTTPException: If file size exceeds limit or content type is invalid
"""
- original_content = await original.read()
- modified_content = await modified.read()
+ # Validate content types
+ valid_content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+ if original.content_type not in [valid_content_type, "application/octet-stream"]:
+ logger.warning(f"Invalid content type for original: {original.content_type}")
+ raise HTTPException(
+ status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
+ detail="Original file must be a .docx file"
+ )
+ if modified.content_type not in [valid_content_type, "application/octet-stream"]:
+ logger.warning(f"Invalid content type for modified: {modified.content_type}")
+ raise HTTPException(
+ status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
+ detail="Modified file must be a .docx file"
+ )
+
+ # Validate author_tag to prevent injection
+ if not author_tag or len(author_tag) > 100 or not author_tag.isprintable():
+ logger.warning(f"Invalid author_tag: {author_tag}")
+ raise HTTPException(
+ status_code=status.HTTP_400_BAD_REQUEST,
+ detail="Invalid author_tag: must be 1-100 printable characters"
+ )
+
+ logger.info(f"Processing single redline request: {original.filename} vs {modified.filename}")
+
+ try:
+ original_content = await original.read()
+ modified_content = await modified.read()
+ except Exception as e:
+ logger.error(f"Failed to read uploaded files: {e}")
+ raise HTTPException(
+ status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+ detail="Failed to read uploaded files"
+ )
if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
+ logger.warning(f"File size exceeded: original={len(original_content)}, modified={len(modified_content)}")
raise HTTPException(
status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
)
- redline_output, _, _ = engine.run_redline(
- author_tag, original_content, modified_content
- )
+ try:
+ redline_output, _, _ = engine.run_redline(
+ author_tag, original_content, modified_content
+ )
+ except Exception as e:
+ logger.error(f"Redline generation failed: {e}")
+ raise HTTPException(
+ status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+ detail="Failed to generate redline document"
+ )
+
+ logger.success(f"Successfully generated single redline")
return Response(
content=redline_output,
media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )

Note: `UploadFile.read()` is already async in Starlette/FastAPI, so using `await` is correct. The real issue is the need for better error handling and validation.
🧰 Tools
🪛 Ruff (0.13.3)
59-59: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
60-60: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
```python
@pytest.mark.asyncio
async def test_redline_single(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    assert "attachment; filename=redline.docx" in response.headers["content-disposition"]
```
🛠️ Refactor suggestion | 🟠 Major
Add docstring and validate response body content.
The test correctly validates the happy path but is missing:
- Docstring (coding guideline requirement)
- Response body validation (should verify it's a valid DOCX)
Apply this diff:
@pytest.mark.asyncio
async def test_redline_single(headers):
+ """Test /redline/single endpoint returns a valid redlined DOCX with correct headers."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
with open("tests/test_data/original.docx", "rb") as original_file:
with open("tests/test_data/modified.docx", "rb") as modified_file:
files = {
"original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
"modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/single", files=files, data=data, headers=headers)
assert response.status_code == 200
assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
assert "attachment; filename=redline.docx" in response.headers["content-disposition"]
+ # Verify response body is non-empty and has DOCX signature
+ assert len(response.content) > 0
+ assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP/DOCX file"
🤖 Prompt for AI Agents
In tests/test_api.py around lines 18 to 33, the test function lacks a docstring
and doesn't validate the response body is a valid DOCX; add a one-line docstring
describing the test, then after asserting status and headers load
response.content into an io.BytesIO and open it with zipfile.ZipFile to verify
it is a valid ZIP-based DOCX by asserting the presence of expected entries
(e.g., "[Content_Types].xml" and "word/document.xml") and that the file
size/content is non-empty; raise/assert appropriately if ZipFile fails or
entries are missing.
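The DOCX check described in the prompt can be sketched as a small helper; `is_valid_docx` is an illustrative name, and the in-memory stand-in file below only exercises the helper, it is not a real Word document:

```python
import io
import zipfile


def is_valid_docx(data: bytes) -> bool:
    """Check that bytes are a ZIP archive containing the core DOCX entries."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return False
    return {"[Content_Types].xml", "word/document.xml"} <= names


# Build a minimal stand-in "docx" in memory to exercise the helper
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")

print(is_valid_docx(buf.getvalue()))  # → True
print(is_valid_docx(b"not a zip"))    # → False
```

In the test, `is_valid_docx(response.content)` would replace the bare magic-number check, since it also catches truncated archives.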
```python
@pytest.mark.asyncio
async def test_redline_double(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/double", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/zip"
    assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add docstring and validate ZIP response structure.
Similar to test_redline_single, this test needs docstring and response body validation.
Apply this diff:
@pytest.mark.asyncio
async def test_redline_double(headers):
+ """Test /redline/double endpoint returns a ZIP with two redlined DOCX files."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
with open("tests/test_data/original.docx", "rb") as original_file:
with open("tests/test_data/modified.docx", "rb") as modified_file:
files = {
"original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
"modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/double", files=files, data=data, headers=headers)
assert response.status_code == 200
assert response.headers["content-type"] == "application/zip"
assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]
+ # Verify response is a valid ZIP
+ assert len(response.content) > 0
+ assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP file"
```python
@pytest.mark.asyncio
async def test_redline_single_no_token():
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        response = await client.post("/redline/single")
    assert response.status_code == 422  # Validation error for missing header
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add docstring to security test.
The test correctly validates missing token handling but needs documentation.
Apply this diff:
@pytest.mark.asyncio
async def test_redline_single_no_token():
+ """Test /redline/single returns 422 validation error when X-Api-Token header is missing."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.post("/redline/single")
        assert response.status_code == 422  # Validation error for missing header
🤖 Prompt for AI Agents
In tests/test_api.py around lines 52 to 57, the test function
test_redline_single_no_token lacks a docstring explaining its intent; add a
one-line docstring directly under the async def that describes that this test
asserts the endpoint returns 422 when the authentication header/token is
missing, e.g. "Ensure POST /redline/single returns 422 when Authorization token
is not provided." Keep it concise and place it before the test body.
```python
@pytest.mark.asyncio
async def test_redline_single_invalid_token():
    headers = {"X-Api-Token": "invalid-token"}
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)
    assert response.status_code == 401
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add docstring to security test.
The test correctly validates invalid token handling but needs documentation.
Apply this diff:
@pytest.mark.asyncio
async def test_redline_single_invalid_token():
+ """Test /redline/single returns 401 unauthorized when X-Api-Token header has invalid value."""
headers = {"X-Api-Token": "invalid-token"}
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
🤖 Prompt for AI Agents
In tests/test_api.py around lines 59 to 72, the security test lacks a docstring;
add a one-line docstring immediately below the test function definition
describing purpose (e.g., "Ensure endpoint returns 401 for invalid API token
when submitting DOCX files"), keep it brief and in present tense, and ensure it
does not change test behavior or indentation.
```python
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        files = {
            "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 413
```
Fix critical resource leak and add docstring.
This test has a critical resource leak on line 81 where a file is opened without proper cleanup. This will cause issues on Windows and under concurrent test execution.
Apply this diff to fix the leak and add docstring:
@pytest.mark.asyncio
async def test_file_size_limit(headers):
+ """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
# Create a dummy file larger than the limit
large_file_content = b"a" * (MAX_FILE_SIZE + 1)
- files = {
- "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
- "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
- }
- data = {"author_tag": "test_author"}
- response = await client.post("/redline/single", files=files, data=data, headers=headers)
+ with open("tests/test_data/modified.docx", "rb") as modified_file:
+ files = {
+ "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+ "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+ }
+ data = {"author_tag": "test_author"}
+ response = await client.post("/redline/single", files=files, data=data, headers=headers)
assert response.status_code == 413

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        with open("tests/test_data/modified.docx", "rb") as modified_file:
            files = {
                "original": (
                    "large.docx",
                    large_file_content,
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                ),
                "modified": (
                    "modified.docx",
                    modified_file,
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                ),
            }
            data = {"author_tag": "test_author"}
            response = await client.post(
                "/redline/single", files=files, data=data, headers=headers
            )
        assert response.status_code == 413
🤖 Prompt for AI Agents
In tests/test_api.py around lines 74 to 86, the test_file_size_limit opens
tests/test_data/modified.docx without closing it (line ~81) and lacks a
docstring; change the test to open the file using a context manager (with
open(...) as modified_file:) so the file is closed after constructing the files
payload (or alternatively load its bytes before building files), and add a brief
docstring at the top of the test describing what it validates (e.g., "Ensure
uploading a file larger than MAX_FILE_SIZE returns 413").
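The second option the prompt mentions — loading the fixture's bytes before building the files payload — can be sketched as below. The helper name and the `DOCX_MIME` constant are illustrative, not part of the repository:

```python
from pathlib import Path

# MIME type for .docx uploads, as used in the test above.
DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document"
)

def build_upload_payload(original_bytes: bytes, modified_path: str) -> dict:
    """Build the multipart 'files' mapping from in-memory bytes.

    Reading the fixture eagerly means no file handle outlives the
    request, so nothing leaks on Windows or under parallel test runs.
    """
    modified_bytes = Path(modified_path).read_bytes()
    return {
        "original": ("large.docx", original_bytes, DOCX_MIME),
        "modified": ("modified.docx", modified_bytes, DOCX_MIME),
    }
```

Because the payload holds plain bytes, no cleanup step is needed after the request is sent.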
User description
This commit introduces a FastAPI application to provide a web interface for the document redlining functionality.
The key features of this change are:
- A FastAPI application with two endpoints:
  - `/redline/single`: Accepts two `.docx` files and returns a single redlined document.
  - `/redline/double`: Accepts two `.docx` files and returns a zip archive with two redlined documents (original vs. modified and modified vs. original).
- API token authentication to secure the endpoints.
- A 10MB file size limit to prevent abuse.
- A `Dockerfile` and `docker-compose.yml` for easy containerization and deployment.
- An updated build script (`build_differ.py`) to correctly build and package the C# binaries for multiple platforms.
- Comprehensive tests for the new API endpoints, including tests for authentication and file size limits.
- An updated `pyproject.toml` to manage dependencies and configure the test environment.
- A `.gitignore` file to exclude build artifacts and environment files.
- A `.env.example` file to provide a template for setting the API token.

PR Type
Enhancement
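As a rough illustration, calling the `/redline/single` endpoint from a client might be assembled like this. The helper is hypothetical, and the `Authorization: Bearer` scheme plus the `API_TOKEN` variable name mirror the `.env.example` convention but are assumptions, not confirmed API details:

```python
import os
from pathlib import Path

DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document"
)

def build_single_redline_call(original_path: str, modified_path: str,
                              author_tag: str) -> dict:
    """Assemble the pieces of a POST /redline/single request.

    The returned dict can be passed to an HTTP client, e.g.
    httpx.post(base_url + kw["url"], headers=kw["headers"],
               data=kw["data"], files=kw["files"]).
    """
    token = os.environ.get("API_TOKEN", "change-me")  # assumed env var name
    return {
        "url": "/redline/single",
        "headers": {"Authorization": f"Bearer {token}"},  # assumed scheme
        "data": {"author_tag": author_tag},
        "files": {
            "original": (Path(original_path).name,
                         Path(original_path).read_bytes(), DOCX_MIME),
            "modified": (Path(modified_path).name,
                         Path(modified_path).read_bytes(), DOCX_MIME),
        },
    }
```

Reading both documents into memory keeps the helper free of open file handles, matching the cleanup concern raised in the review above.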
Description
Add FastAPI web service for document redlining
Implement API token authentication and file size limits
Create Docker containerization with multi-platform builds
Add comprehensive test coverage for API endpoints
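The authentication and size checks listed above reduce to two guards per request. A minimal sketch follows — the function name and the 401 status for a bad token are assumptions, while 413 matches the test suite and 10MB is the limit stated in the description:

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB limit from the PR description

def check_upload(token: str, expected_token: str, payload: bytes) -> tuple:
    """Return (status_code, detail) for the two request guards.

    401 for a token mismatch, 413 for an oversized body, 200 otherwise.
    """
    if token != expected_token:
        return 401, "Invalid API token"
    if len(payload) > MAX_FILE_SIZE:
        return 413, "File too large"
    return 200, "ok"
```

In the real app these guards would live in a FastAPI dependency and raise `HTTPException` rather than return tuples; the sketch only shows the decision order.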
Diagram Walkthrough
File Walkthrough
- 2 files: FastAPI service with redlining endpoints; Enhanced multi-platform build script
- 1 file: Comprehensive API endpoint tests
- 3 files: Docker containerization setup; Docker compose configuration; Environment variables template
- 2 files: .NET SDK installation script; Updated dependencies and API environment
- 2 files