Add FastAPI redlining API, token auth, tests, Docker, and tooling #2
Conversation
This commit introduces a FastAPI application to provide a web interface for the document redlining functionality. The key features of this change are:

- A FastAPI application with two endpoints:
  - `/redline/single`: Accepts two `.docx` files and returns a single redlined document.
  - `/redline/double`: Accepts two `.docx` files and returns a zip archive with two redlined documents (original vs. modified and modified vs. original).
- API token authentication to secure the endpoints.
- A 10MB file size limit to prevent abuse.
- A `Dockerfile` and `docker-compose.yml` for easy containerization and deployment.
- An updated build script (`build_differ.py`) to correctly build and package the C# binaries for multiple platforms.
- Comprehensive tests for the new API endpoints, including tests for authentication and file size limits.
- An updated `pyproject.toml` to manage dependencies and configure the test environment.
- A `.gitignore` file to exclude build artifacts and environment files.
- A `.env.example` file to provide a template for setting the API token.
Summary of Changes

Hello @arthrod, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement by integrating a FastAPI-based web interface for the existing document redlining functionality. This change allows users to interact with the redlining service via a RESTful API, making it more accessible and easier to integrate into other systems. The addition of Docker support simplifies deployment, while robust authentication and file size limits ensure the service is secure and stable. The underlying build process for the C# redlining engine has also been refined to support a wider range of platforms more reliably.
PR Compliance Guide 🔍 (summary of compliance checks for this PR; legend: 🟢 fully compliant, 🟡 partially compliant, 🔴 not compliant, ⚪ requires further human verification)
@coderabbitai review
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Introduces a FastAPI-based redlining API with token auth, adds tests, containerization (Dockerfile, docker-compose), environment scaffolding (.env.example), updated build/packaging (build_differ.py, pyproject.toml), a .NET installer script, and a simplified .gitignore.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant A as FastAPI App
    participant V as verify_token
    participant E as XmlPowerToolsEngine
    participant Z as Zip Builder (in-memory)
    Note over A: Env loaded (.env) to get API_TOKEN
    C->>A: POST /redline/single (files, author_tag, X-API-Token)
    A->>V: Validate token (Header vs env)
    V-->>A: Valid / 401 on mismatch
    A->>A: Validate file sizes (<= MAX_FILE_SIZE)
    A->>E: Compare Original vs Modified
    E-->>A: DOCX bytes
    A-->>C: 200 DOCX (Content-Disposition)
    rect rgba(230,245,255,0.5)
        C->>A: POST /redline/double (files, author_tag, X-API-Token)
        A->>V: Validate token
        A->>E: Redline O->M
        E-->>A: DOCX A
        A->>E: Redline M->O
        E-->>A: DOCX B
        A->>Z: Package DOCX A,B into ZIP stream
        Z-->>A: ZIP stream
        A-->>C: 200 ZIP (streaming)
    end
    alt Error paths
        A-->>C: 401 Unauthorized (bad/missing token)
        A-->>C: 413 Payload Too Large (file limit)
        A-->>C: 500 on missing server token
    end
```
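The 401 path in the diagram hinges on how the token comparison is done. Below is a minimal, self-contained sketch of a constant-time token check; the `Unauthorized` class and this `verify_token` helper are illustrative stand-ins for the PR's actual FastAPI dependency and `HTTPException(401)`, not its real code:

```python
import os
import secrets

class Unauthorized(Exception):
    """Stand-in for FastAPI's HTTPException with status 401."""

def verify_token(x_api_token: str) -> None:
    # Compare in constant time so an attacker cannot probe the token
    # byte-by-byte via response-timing differences.
    expected = os.environ.get("API_TOKEN", "")
    if not expected or not secrets.compare_digest(x_api_token, expected):
        raise Unauthorized("Invalid API Token")

os.environ["API_TOKEN"] = "s3cret"
verify_token("s3cret")  # passes silently
try:
    verify_token("wrong")
except Unauthorized:
    print("rejected")  # → rejected
```

`secrets.compare_digest` is the piece the PR's reviewers call out below; a plain `==` comparison would short-circuit on the first mismatching character.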
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ 1 failed check (warning), ✅ 2 checks passed.
@claude review @coderabbitai review
PR Code Suggestions ✨
✅ Actions performed: Review triggered.
@gemini review
Code Review
This pull request introduces a significant enhancement by adding a FastAPI web service to expose the document redlining functionality. The inclusion of Docker support, multi-platform build scripts, and comprehensive tests is commendable. My review focuses on several key areas for improvement:
- Dockerfile Optimization: I've suggested using a multi-stage build to significantly reduce the final Docker image size.
- Configuration: There's a critical issue with a hardcoded path in `pyproject.toml` that needs to be addressed.
- API Performance and Maintainability: I've pointed out opportunities to improve the API's performance by handling blocking calls correctly and to enhance maintainability by reducing code duplication.
- Test Correctness: A minor resource leak in the tests has been identified.
Overall, this is a solid contribution. Addressing these points will make the application more robust, performant, and easier to maintain.
```toml
    "fastapi",
    "python-dotenv"
]
env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
```
There is a hardcoded, user-specific path in the PATH environment variable: /home/jules/.dotnet. This will cause the application to fail for any other user or in any CI/CD environment. This should be replaced with a generic path that resolves to the user's home directory, such as by using the $HOME environment variable.
Suggested change:

```diff
-env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
+env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }
```
```python
redline_output, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)
```
The engine.run_redline() method is a synchronous function that performs a blocking operation (running a subprocess). Calling it directly from within an async endpoint will block the entire server's event loop, preventing it from handling other requests. This can severely degrade performance. You should run blocking calls in a separate thread pool using asyncio.get_running_loop().run_in_executor().
```python
import asyncio

loop = asyncio.get_running_loop()
redline_output, _, _ = await loop.run_in_executor(
    None, engine.run_redline, author_tag, original_content, modified_content
)
```

```python
redline_om, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)

# Redline 2: Modified -> Original
redline_mo, _, _ = engine.run_redline(
    author_tag, modified_content, original_content
)
```
Similar to the other endpoint, the engine.run_redline() calls here are blocking and will stall the event loop. They should be executed in a thread pool. Since there are two independent redlining operations, you can run them concurrently using asyncio.gather to improve the endpoint's response time.
```python
import asyncio

loop = asyncio.get_running_loop()
# Redline 1: Original -> Modified
task_om = loop.run_in_executor(
    None, engine.run_redline, author_tag, original_content, modified_content
)
# Redline 2: Modified -> Original
task_mo = loop.run_in_executor(
    None, engine.run_redline, author_tag, modified_content, original_content
)
(redline_om, _, _), (redline_mo, _, _) = await asyncio.gather(task_om, task_mo)
```

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
    chmod +x ./dotnet-install.sh && \
    ./dotnet-install.sh --version 8.0.100 && \
    rm dotnet-install.sh

# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"

# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/

# Install hatch
RUN pip install hatch

# Build the project (which includes running build_differ.py)
RUN hatch run default:build

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Start the API server
CMD ["hatch", "run", "api:start"]
```
The Dockerfile can be significantly optimized for a smaller image size and better security by using a multi-stage build. Currently, the final image includes the .NET SDK and other build-time dependencies, which are not required for running the application.

A multi-stage build would involve:

- A `builder` stage to install the .NET SDK, build the C# binaries, and then discard this environment.
- A final, clean `python:3.11-slim` stage that copies only the necessary application code and the compiled binaries from the `builder` stage.

This practice leads to smaller, more secure, and faster-deploying images. Additionally, it's good practice to clean up the apt cache within the same `RUN` layer to further reduce image size by adding `&& rm -rf /var/lib/apt/lists/*`.
```python
dist_filename = f"{platform}-{version}.{ext}"
if platform.startswith('win-'):
    dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
elif platform.startswith('osx-'):
    dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

# The directory name inside the archive should match what engines.py expects
arcname = platform
if platform.startswith('win-'):
    arcname = 'win-' + platform.split('-')[1]
elif platform.startswith('osx-'):
    arcname = 'osx-' + platform.split('-')[1]
```
The logic for constructing dist_filename and arcname contains redundancy. For example, when platform is win-x64, the if condition platform.startswith('win-') is met, but the code inside re-constructs the exact same string. This logic can be simplified by removing the conditional blocks, making the code cleaner and easier to maintain.
```python
dist_filename = f"{platform}-{version}.{ext}"
# The directory name inside the archive should match what engines.py expects
arcname = platform
```

```python
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )

    redline_output, _, _ = engine.run_redline(
        author_tag, original_content, modified_content
    )

    return Response(
        content=redline_output,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )


@app.post("/redline/double", dependencies=[Depends(verify_token)])
async def redline_double(
    original: UploadFile = File(...),
    modified: UploadFile = File(...),
    author_tag: str = Form("Author"),
):
    """
    Accepts two .docx files and returns a zip archive with two redlined documents.
    """
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )
```
```python
files = {
    "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/single", files=files, data=data, headers=headers)
```
The file opened via open("tests/test_data/modified.docx", "rb") is not closed, which will result in a ResourceWarning and can lead to resource leaks in the test suite. It's best practice to use a with statement to ensure that file handles are always closed properly, even if errors occur.
Suggested change:

```python
with open("tests/test_data/modified.docx", "rb") as modified_file:
    files = {
        "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    }
    data = {"author_tag": "test_author"}
    response = await client.post("/redline/single", files=files, data=data, headers=headers)
```
@arthrod: I'll review the changes in this pull request. ✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
```python
redline_output, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)
```
Avoid blocking asyncio loop with synchronous redline subprocess
Both endpoints invoke engine.run_redline directly from an async def. run_redline performs synchronous file I/O and subprocess.run, so awaiting requests will block the event loop for the lifetime of the diff operation. Under load a single request will stall the entire worker, defeating the purpose of exposing an async API. Offload the work via asyncio.to_thread/run_in_threadpool or make the handlers synchronous so FastAPI executes them in a threadpool, and apply the same change to /redline/double.
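A minimal sketch of the `asyncio.to_thread` route suggested above; note that `blocking_redline` is a stand-in for `engine.run_redline` (which shells out to a subprocess), not the real engine call:

```python
import asyncio
import time

def blocking_redline(author: str, original: bytes, modified: bytes) -> bytes:
    # Stand-in for engine.run_redline: a synchronous, blocking operation.
    time.sleep(0.01)
    return original + modified

async def redline_single(original: bytes, modified: bytes) -> bytes:
    # Run the blocking call in a worker thread so the event loop
    # can keep serving other requests in the meantime.
    return await asyncio.to_thread(blocking_redline, "Author", original, modified)

result = asyncio.run(redline_single(b"old", b"new"))
print(result)  # → b'oldnew'
```

The alternative mentioned above, declaring the handler as a plain `def`, achieves the same effect because FastAPI runs synchronous handlers in its own threadpool.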
```python
original_content = await original.read()
modified_content = await modified.read()

if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
    raise HTTPException(
```
File-size guard runs after loading entire upload into memory
The request handlers read each uploaded file (await original.read()) into a bytes object before checking len(...) > MAX_FILE_SIZE. This means a client can upload arbitrarily large files and the server will allocate the entire payload in memory before the size check raises 413, negating the 10 MB protection described in the commit message and opening the API to memory exhaustion. Consider validating the size while streaming the upload or rejecting the request before materialising the whole body.
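A sketch of validating the size while streaming, as suggested. The `FakeUpload` class mimics the async `read()` interface of Starlette's `UploadFile` so the example is self-contained; `PayloadTooLarge` stands in for raising `HTTPException(413)`:

```python
import asyncio

class FakeUpload:
    """Minimal stand-in for starlette's UploadFile async read() interface."""
    def __init__(self, data: bytes):
        self._buf = data
        self._pos = 0

    async def read(self, size: int = -1) -> bytes:
        chunk = self._buf[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

class PayloadTooLarge(Exception):
    """Stand-in for HTTPException(status_code=413)."""

async def read_limited(upload, max_size: int) -> bytes:
    # Accumulate the body in chunks and reject as soon as the running
    # total exceeds the limit, instead of materialising it all first.
    buf = bytearray()
    while chunk := await upload.read(64 * 1024):
        buf.extend(chunk)
        if len(buf) > max_size:
            raise PayloadTooLarge(f"body exceeds {max_size} bytes")
    return bytes(buf)

small = asyncio.run(read_limited(FakeUpload(b"x" * 100), max_size=1024))
print(len(small))  # → 100
try:
    asyncio.run(read_limited(FakeUpload(b"x" * 2048), max_size=1024))
except PayloadTooLarge:
    print("rejected")  # → rejected
```

Note this caps memory per request at roughly `max_size` plus one chunk; a reverse proxy limit (e.g. nginx `client_max_body_size`) in front of the app is a common belt-and-braces addition.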
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
build_differ.py (1)

7-14: Security concern: `exec()` usage in version extraction.

The `get_version()` function uses `exec()` to extract version information, which poses security risks if the `__about__.py` file is ever compromised or modified maliciously. Apply this diff to use safer version extraction:

```diff
 def get_version():
     """
     Extracts the version from the specified __about__.py file.
     """
-    about = {}
-    with open('./src/python_redlines/__about__.py') as f:
-        exec(f.read(), about)
-    return about['__version__']
+    import re
+    from pathlib import Path
+
+    about_file = Path('./src/python_redlines/__about__.py')
+    content = about_file.read_text()
+    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', content)
+    if not match:
+        raise ValueError("Could not find __version__ in __about__.py")
+    return match.group(1)
```

pyproject.toml (1)
58-81: Critical: Remove hardcoded user-specific PATH.

The hardcoded PATH contains a user-specific directory that will break in other environments. Apply this diff to fix:

```diff
 [tool.hatch.envs.default]
 dependencies = [
   "coverage[toml]>=6.5",
   "pytest",
   "pytest-asyncio",
   "httpx",
   "python-multipart",
   "fastapi",
   "python-dotenv"
 ]
-env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
+env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }

 [tool.hatch.envs.default.scripts]
 test = "python -m pytest {args:tests}"
```

Better yet, handle .NET PATH detection dynamically in build scripts rather than hardcoding it here.
🧹 Nitpick comments (11)
.env.example (1)
1-3: Add final newline and expand environment variable documentation.

The file provides basic token guidance but could be more comprehensive:

- Missing final newline: dotenv-linter flags this (line 3)
- Limited documentation: additional environment variables might be needed (e.g., `MAX_FILE_SIZE`, `LOG_LEVEL`, `CORS_ORIGINS`)

Apply this diff:

```diff
 # Generate a secure secret token, for example, by running:
 # python -c 'import secrets; print(secrets.token_hex(32))'
 API_TOKEN=your_secret_api_token_here
+
+# Optional: Override default file size limit (bytes)
+# MAX_FILE_SIZE=10485760
+
+# Optional: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
+# LOG_LEVEL=INFO
```

Based on learnings: use `python-dotenv` (recommended package) to load these variables.

.gitignore (1)
1-33: Well-organized gitignore with good multi-language coverage.

The reorganized `.gitignore` properly handles:

- Python artifacts (`__pycache__`, `*.pyc`, `.venv/`)
- .NET build outputs (`csproj/bin/`, `csproj/obj/`)
- Hatch tooling (`.hatch/`, distribution directories)
- IDE metadata (`.idea/`, `.vscode/`)
- Environment files (`.env`)

Consider adding Docker-specific ignores if not already covered elsewhere:

```diff
+# Docker
+*.log
+docker-compose.override.yml
```

build_differ.py (3)
17-27: Add proper error handling and improve process management.

The `run_command` function has been improved but still lacks robustness:

- No timeout handling: long-running commands could hang indefinitely
- Limited error context: the exception message doesn't include stdout/stderr
- Blocking I/O: line-by-line reading blocks the main thread

Apply this diff:

```diff
-def run_command(command):
+def run_command(command: str, timeout: int = 300) -> None:
     """
-    Runs a shell command and prints its output.
+    Runs a shell command and prints its output.
+
+    Args:
+        command: Shell command to execute
+        timeout: Maximum execution time in seconds (default: 300)
+
+    Raises:
+        subprocess.CalledProcessError: If command returns non-zero exit code
+        subprocess.TimeoutExpired: If command exceeds timeout
     """
     process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+    output_lines = []
     for line in process.stdout:
-        print(line.decode().strip())
+        decoded = line.decode().strip()
+        print(decoded)
+        output_lines.append(decoded)

-    process.communicate()
+    try:
+        process.communicate(timeout=timeout)
+    except subprocess.TimeoutExpired:
+        process.kill()
+        raise

     if process.returncode != 0:
-        raise subprocess.CalledProcessError(process.returncode, command)
+        error_output = '\n'.join(output_lines[-20:])  # Last 20 lines
+        raise subprocess.CalledProcessError(
+            process.returncode,
+            command,
+            output=error_output.encode()
+        )
```
50-61: Modernize with pathlib and add type hints.

The function uses old-style string path manipulation. Apply this diff:

```diff
-def cleanup_old_builds(dist_dir, current_version):
+from pathlib import Path
+
+def cleanup_old_builds(dist_dir: str, current_version: str) -> None:
     """
     Deletes any build files ending in .zip or .tar.gz in the dist_dir with a different version tag.
+
+    Args:
+        dist_dir: Distribution directory path
+        current_version: Current version string to preserve
     """
-    if not os.path.exists(dist_dir):
+    dist_path = Path(dist_dir)
+    if not dist_path.exists():
         return
-    for file in os.listdir(dist_dir):
-        if not file.endswith((f'{current_version}.zip', f'{current_version}.tar.gz', '.gitignore')):
-            file_path = os.path.join(dist_dir, file)
-            os.remove(file_path)
-            print(f"Deleted old build file: {file}")
+
+    for file_path in dist_path.iterdir():
+        if file_path.name == '.gitignore':
+            continue
+        if not file_path.name.endswith((f'{current_version}.zip', f'{current_version}.tar.gz')):
+            file_path.unlink()
+            print(f"Deleted old build file: {file_path.name}")
```

As per coding guidelines: prefer modern Python features and pathlib over os.path.
63-110: Add uv shebang and improve error handling in main().

The main function lacks proper script metadata and could benefit from structured logging. Add a uv shebang at the top of the file and improve main():

```diff
+#!/usr/bin/env -S uv run
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#     "loguru",
+# ]
+# ///
+
 import subprocess
 import os
 import tarfile
 import zipfile
+from pathlib import Path
+from loguru import logger

 # ... rest of code ...

-def main():
+def main() -> None:
-    version = get_version()
-    print(f"Version: {version}")
+    logger.info("Starting build process")
+
+    try:
+        version = get_version()
+        logger.info(f"Building version: {version}")
+    except Exception as e:
+        logger.error(f"Failed to get version: {e}")
+        raise

     dist_dir = "./src/python_redlines/dist/"
-    if not os.path.exists(dist_dir):
-        os.makedirs(dist_dir)
+    Path(dist_dir).mkdir(parents=True, exist_ok=True)

     platforms = {
         "linux-x64": "tar.gz",
         "linux-arm64": "tar.gz",
         "win-x64": "zip",
         "win-arm64": "zip",
         "osx-x64": "tar.gz",
         "osx-arm64": "tar.gz",
     }

+    failed_platforms = []
     for platform, ext in platforms.items():
-        print(f"Building for {platform}...")
+        logger.info(f"Building for {platform}...")
         try:
             run_command(f'dotnet publish ./csproj -c Release -r {platform} --self-contained')
             build_dir = f'./csproj/bin/Release/net8.0/{platform}/publish'

             # The name of the archive file should match what engines.py expects
             dist_filename = f"{platform}-{version}.{ext}"
             if platform.startswith('win-'):
                 dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
             elif platform.startswith('osx-'):
                 dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

             # The directory name inside the archive should match what engines.py expects
             arcname = platform
             if platform.startswith('win-'):
                 arcname = 'win-' + platform.split('-')[1]
             elif platform.startswith('osx-'):
                 arcname = 'osx-' + platform.split('-')[1]

             compress_files(build_dir, f"{dist_dir}/{dist_filename}", arcname=arcname)
+            logger.success(f"Successfully built {platform}")
         except (subprocess.CalledProcessError, FileNotFoundError) as e:
-            print(f"Failed to build for {platform}: {e}")
+            logger.error(f"Failed to build for {platform}: {e}")
+            failed_platforms.append(platform)
             # Continue to next platform
             continue

     cleanup_old_builds(dist_dir, version)
-    print("Build and compression complete.")
+    if failed_platforms:
+        logger.warning(f"Build completed with failures for: {', '.join(failed_platforms)}")
+    else:
+        logger.success("Build and compression complete for all platforms.")
```

As per coding guidelines: add uv shebangs and implement loguru for structured logging.
src/python_redlines/api.py (1)
31-55: Improve error handling and token validation.

The token verification lacks explicit None handling and detailed error logging. Apply this diff:

```diff
 def get_api_token():
     """
     Loads the API token from environment variables.
+
+    Returns:
+        str: The configured API token
+
+    Raises:
+        HTTPException: If API_TOKEN environment variable is not set
     """
     token = os.getenv("API_TOKEN")
     if not token:
+        logger.critical("API_TOKEN environment variable not configured")
         raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
             detail="API token not configured on server",
         )
     return token

 async def verify_token(
     x_api_token: Annotated[str, Header()],
     api_token: str = Depends(get_api_token),
-):
+) -> None:
     """
     Verifies the API token provided in the request header.
+
+    Args:
+        x_api_token: Token from X-Api-Token header
+        api_token: Expected token from environment
+
+    Raises:
+        HTTPException: If token is invalid or missing
     """
+    if not x_api_token:
+        logger.warning("Missing API token in request")
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Missing API Token"
+        )
+
     if not secrets.compare_digest(x_api_token, api_token):
+        logger.warning("Invalid API token attempted")
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Invalid API Token"
         )
```

pyproject.toml (2)
58-101: Add modern Python tooling dependencies.

The configuration is missing several tools recommended in the coding guidelines (as per coding guidelines: use ruff, loguru, and other modern tools). Apply this diff:

```diff
 [tool.hatch.envs.default]
 dependencies = [
   "coverage[toml]>=6.5",
   "pytest",
   "pytest-asyncio",
   "httpx",
   "python-multipart",
   "fastapi",
-  "python-dotenv"
+  "python-dotenv",
+  "ruff",
+  "loguru",
+  "mypy"
 ]
 env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }

 [tool.hatch.envs.default.scripts]
 test = "python -m pytest {args:tests}"
+lint = "ruff check {args:src tests}"
+format = "ruff format {args:src tests}"
+typecheck = "mypy {args:src tests}"
 test-cov = "coverage run -m pytest {args:tests}"
 cov-report = [
   "- coverage combine",
   "coverage report",
 ]
 cov = [
   "test-cov",
   "cov-report",
 ]
 build = "python -m build_differ"

+[tool.ruff]
+target-version = "py312"
+line-length = 100
+
+[tool.ruff.lint]
+select = [
+  "E",   # pycodestyle errors
+  "W",   # pycodestyle warnings
+  "F",   # pyflakes
+  "I",   # isort
+  "B",   # flake8-bugbear
+  "C4",  # flake8-comprehensions
+  "UP",  # pyupgrade
+]
+ignore = [
+  "B008",  # FastAPI uses File() in defaults which is intentional
+]
+
+[tool.mypy]
+python_version = "3.12"
+warn_return_any = true
+warn_unused_configs = true
+disallow_untyped_defs = true
```
92-101: Add production-ready uvicorn configuration.

The API start command lacks production settings. Apply this diff:

```diff
 [tool.hatch.envs.api]
 dependencies = [
   "fastapi",
   "uvicorn[standard]",
   "python-multipart",
-  "python-dotenv"
+  "python-dotenv",
+  "loguru"
 ]

 [tool.hatch.envs.api.scripts]
-start = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000"
+start = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000 --workers 4 --log-level info"
+start-dev = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000 --reload --log-level debug"
```

tests/test_api.py (3)
1-6: Clarify test constant naming to avoid static analysis warnings.

The imports and structure are sound. However, static analysis flagged `TEST_API_TOKEN` as a potential hardcoded credential (S105). While this is acceptable in test context, consider renaming to `TEST_AUTH_TOKEN` or `MOCK_API_TOKEN` to clarify it's a test fixture, not a production secret. Apply this diff:

```diff
-TEST_API_TOKEN = "test-token"
+MOCK_API_TOKEN = "test-token-for-testing-only"
```

Then update references in fixtures and tests accordingly.
1-86: Add comprehensive edge case and parametrized tests.

The current test suite covers basic scenarios but is missing critical edge cases and could benefit from parametrization. Consider adding the following tests:

```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "pytest>=8.4.2",
#     "httpx>=0.28.1",
#     "pytest-asyncio",
# ]
# ///
import pytest
from io import BytesIO
from httpx import AsyncClient, ASGITransport
from python_redlines.api import app, MAX_FILE_SIZE

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


@pytest.mark.asyncio
async def test_file_size_limit_modified_file(headers):
    """Test /redline/single returns 413 when modified file exceeds MAX_FILE_SIZE."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        with open("tests/test_data/original.docx", "rb") as original_file:
            files = {
                "original": ("original.docx", original_file, DOCX_MIME),
                "modified": ("large.docx", large_file_content, DOCX_MIME),
            }
            data = {"author_tag": "test_author"}
            response = await client.post("/redline/single", files=files, data=data, headers=headers)
        assert response.status_code == 413


@pytest.mark.asyncio
async def test_missing_author_tag(headers):
    """Test /redline/single handles missing author_tag parameter."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, DOCX_MIME),
                    "modified": ("modified.docx", modified_file, DOCX_MIME),
                }
                response = await client.post("/redline/single", files=files, headers=headers)
        # Verify API handles missing optional parameter gracefully (422 or uses default)
        assert response.status_code in [200, 422]


@pytest.mark.asyncio
async def test_invalid_file_format(headers):
    """Test /redline/single handles non-DOCX files gracefully."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        invalid_file_content = b"Not a DOCX file"
        files = {
            "original": ("original.docx", invalid_file_content, DOCX_MIME),
            "modified": ("modified.docx", invalid_file_content, DOCX_MIME),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)
        # Should return 4xx error for invalid format
        assert 400 <= response.status_code < 500


@pytest.mark.parametrize("endpoint,expected_content_type", [
    ("/redline/single", DOCX_MIME),
    ("/redline/double", "application/zip"),
])
@pytest.mark.asyncio
async def test_endpoints_parametrized(endpoint, expected_content_type, headers):
    """Parametrized test for both endpoints verifying content types."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, DOCX_MIME),
                    "modified": ("modified.docx", modified_file, DOCX_MIME),
                }
                data = {"author_tag": "test_author"}
                response = await client.post(endpoint, files=files, data=data, headers=headers)
        assert response.status_code == 200
        assert response.headers["content-type"] == expected_content_type
```

These additions would significantly improve test robustness and coverage.
`1-86`: Consider CI/CD integration and load testing.

For production readiness, consider:
- CI/CD Integration: Configure GitHub Actions or equivalent to run tests on every push/PR with multiple Python versions (3.12, 3.13)
- Coverage Reporting: Add pytest-cov to measure code coverage and set minimum thresholds (e.g., 80%)
- Load Testing: Add load tests using locust or similar to verify the API handles concurrent requests without resource exhaustion
- Security Scanning: Integrate bandit or similar tools to catch security issues in API code
- Container Testing: Add tests that run against the Dockerized service to verify deployment configuration
Example GitHub Actions workflow:
```yaml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install uv
          uv pip install -e ".[test]"
      - name: Run tests with coverage
        run: pytest --cov=src/python_redlines --cov-report=xml --cov-report=term-missing
      - name: Upload coverage
        uses: codecov/codecov-action@v4
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `src/python_redlines/dist/.gitignore` is excluded by `!**/dist/**`
- `tests/test_data/modified.docx` is excluded by `!**/*.docx`
- `tests/test_data/original.docx` is excluded by `!**/*.docx`
📒 Files selected for processing (9)
- `.env.example` (1 hunks)
- `.gitignore` (1 hunks)
- `Dockerfile` (1 hunks)
- `build_differ.py` (2 hunks)
- `docker-compose.yml` (1 hunks)
- `dotnet-install.sh` (1 hunks)
- `pyproject.toml` (3 hunks)
- `src/python_redlines/api.py` (1 hunks)
- `tests/test_api.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Add docstrings, remove deprecated libraries (no unittest or mock), remove typing Optional, List, etc. in favor of modern syntax (list, str | None), use async methods as much as possible, when applicable import and implement uvloop, typer, loguru, uv, ruff, pytest, assert, httpx/aiohttp, polars. Add uv shebangs when possible:

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "loguru",
# ]
# ///
Refactor variables, classes, functions or constants names if not descriptive enough. Don't forget to determine where else it's used.
For JavaScript and TypeScript, favor bun/oxlint/biome/tanstack/elysia, but propose refactoring only if (1) a seamless migration to these tools and/or (2) the current implementation of the code is not working. Always ensure authentication is properly implemented via oauth2 or oidc. Always run check for contrast and visualization for both light and dark themes. Verify if we don't have dead routes (pages that don't are ever rendered). Always review the comments of other commenters and assess them meticulously, indicating for each of them what was correct and what was incorrect. Verify any mock, incomplete or, most importantly, deceiving-looking incomplete implementations. Generally, focus on correct, non-duplicative endpoints, completeness and robustness.All of your messages, either a response, a summary or an automatic review MUST include, in addition to the other requirements, the following:
- An explanation of the goal of the repo and a meticulous explanation of the goal of the PR.
- How the PR attempts to achieve EACH of these goals.
- Extremely comprehensively, you will indicate on what the PR does well, what is missing and what is wrong. This section should have several subtopics, but each subtopics must be very short, no more than a sentence. [subtopicx, subtopicy, subtopics, etc] that will be my punch list (and must be in list format).
- Then propose marginal enhancements, and full solu...
Files:
- `build_differ.py`
- `tests/test_api.py`
- `src/python_redlines/api.py`
🪛 dotenv-linter (3.3.0)
.env.example
[warning] 3-3: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
🪛 Ruff (0.13.3)
build_differ.py
43-43: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
tests/test_api.py
5-5: Possible hardcoded password assigned to: "TEST_API_TOKEN"
(S105)
src/python_redlines/api.py
59-59: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
60-60: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
88-88: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
89-89: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🪛 Shellcheck (0.11.0)
dotnet-install.sh
[warning] 29-29: bold appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 31-31: black appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 33-33: green appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 35-35: blue appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 36-36: magenta appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 38-38: white appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 141-141: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 146-146: Quote this to prevent word splitting.
(SC2046)
[warning] 175-175: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 194-194: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 287-287: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 288-288: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 351-351: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 402-402: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 403-403: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 405-405: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 448-448: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 477-477: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 502-502: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 542-542: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 581-581: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 705-705: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 740-740: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 777-777: Prefer mapfile or read -a to split command output (or quote to avoid splitting).
(SC2207)
[warning] 778-778: Prefer mapfile or read -a to split command output (or quote to avoid splitting).
(SC2207)
[warning] 939-939: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 954-954: relative_or_absolute_path appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 998-998: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 999-999: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1001-1001: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1004-1004: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1050-1050: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1221-1221: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 1265-1265: Declare and assign separately to avoid masking return values.
(SC2155)
[error] 1371-1371: Double quote array expansions to avoid re-splitting elements.
(SC2068)
[error] 1384-1384: Double quote array expansions to avoid re-splitting elements.
(SC2068)
[warning] 1425-1425: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1426-1426: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1427-1427: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1469-1469: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1470-1470: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1471-1471: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1479-1479: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1480-1480: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1481-1481: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1508-1508: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1508-1508: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1512-1512: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1512-1512: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1514-1514: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1514-1514: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1520-1520: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1520-1520: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1636-1636: args appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 1638-1638: local_version_file_relative_path appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 1724-1724: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1724-1724: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1729-1729: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1729-1729: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1741-1741: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1741-1741: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
(SC2140)
[warning] 1760-1760: This pattern always overrides a later one on line 1760.
(SC2221)
[warning] 1760-1760: This pattern never matches because of a previous pattern on line 1760.
(SC2222)
🔇 Additional comments (1)
dotnet-install.sh (1)
1-1888: Document vendored script source and version
Unable to fetch the official script automatically; manually verify its origin (e.g. compare with https://github.com/dotnet/install-scripts), determine its commit hash or release tag, and add the source URL and version/commit metadata at the top of dotnet-install.sh.
```diff
 def compress_files(source_dir, target_file, arcname=None):
     """
     Compresses files in the specified directory into a tar.gz or zip file.
     """
     if not arcname:
         arcname = os.path.basename(source_dir)

     print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

     if target_file.endswith('.tar.gz'):
         with tarfile.open(target_file, "w:gz") as tar:
-            tar.add(source_dir, arcname=os.path.basename(source_dir))
+            tar.add(source_dir, arcname=arcname)
     elif target_file.endswith('.zip'):
         with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
             for root, dirs, files in os.walk(source_dir):
                 for file in files:
-                    zipf.write(os.path.join(root, file),
-                               os.path.relpath(os.path.join(root, file),
-                                               os.path.join(source_dir, '..')))
+                    file_path = os.path.join(root, file)
+                    zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
+                    zipf.write(file_path, zip_path)
```
Fix unused variable and add type hints.
Static analysis correctly identifies an unused loop variable.
Apply this diff to fix the Ruff B007 warning:
-def compress_files(source_dir, target_file, arcname=None):
+def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None:
"""
Compresses files in the specified directory into a tar.gz or zip file.
+
+ Args:
+ source_dir: Directory containing files to compress
+ target_file: Output archive path (.tar.gz or .zip)
+ arcname: Name to use for root directory in archive (defaults to basename of source_dir)
"""
if not arcname:
arcname = os.path.basename(source_dir)
print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")
if target_file.endswith('.tar.gz'):
with tarfile.open(target_file, "w:gz") as tar:
tar.add(source_dir, arcname=arcname)
elif target_file.endswith('.zip'):
with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
- for root, dirs, files in os.walk(source_dir):
+ for root, _dirs, files in os.walk(source_dir):
for file in files:
file_path = os.path.join(root, file)
zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
zipf.write(file_path, zip_path)As per coding guidelines: Add type hints using modern Python | syntax instead of Optional.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None:
    """
    Compresses files in the specified directory into a tar.gz or zip file.

    Args:
        source_dir: Directory containing files to compress
        target_file: Output archive path (.tar.gz or .zip)
        arcname: Name to use for root directory in archive (defaults to basename of source_dir)
    """
    if not arcname:
        arcname = os.path.basename(source_dir)

    print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

    if target_file.endswith('.tar.gz'):
        with tarfile.open(target_file, "w:gz") as tar:
            tar.add(source_dir, arcname=arcname)
    elif target_file.endswith('.zip'):
        with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, _dirs, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
                    zipf.write(file_path, zip_path)
```
🧰 Tools
🪛 Ruff (0.13.3)
43-43: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
🤖 Prompt for AI Agents
In build_differ.py around lines 29 to 48, the compress_files function currently
declares an unused loop variable (dirs) and lacks type hints; update the
signature to include modern Python type hints (e.g., def
compress_files(source_dir: str, target_file: str, arcname: str | None = None) ->
None) and replace the unused loop variable with an underscore (for root, _,
files in os.walk(source_dir)) to resolve the Ruff B007 warning, keeping the rest
of the logic unchanged.
```yaml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - .:/app
```
🛠️ Refactor suggestion | 🟠 Major
Add environment variable loading, health checks, and resource limits.
The docker-compose configuration is minimal and lacks several production-ready features:
- Missing environment file: The API requires `API_TOKEN` from the environment, but the compose file doesn't load `.env`
- No health checks: Container health monitoring is not configured
- No resource limits: Memory/CPU constraints are absent
- No restart policy: The service won't auto-restart on failure
- Development volume mount: The `.:/app` mount is convenient for development but shouldn't be used in production
Apply this diff to add essential production configurations:
version: '3.8'
services:
api:
build: .
ports:
- "8000:8000"
- volumes:
- - .:/app
+ env_file:
+ - .env
+ healthcheck:
+ test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:8000/health', timeout=2)"]
+ interval: 30s
+ timeout: 3s
+ retries: 3
+ start_period: 10s
+ restart: unless-stopped
+ deploy:
+ resources:
+ limits:
+ cpus: '2'
+ memory: 2G
+ reservations:
+ cpus: '0.5'
+ memory: 512MNote: This assumes a /health endpoint exists. Consider adding one to api.py.
🤖 Prompt for AI Agents
In docker-compose.yml around lines 1 to 8, the service definition is missing
production-ready settings: add env_file: .env to load API_TOKEN and other
secrets, add a healthcheck that polls an existing /health endpoint with sensible
interval, timeout and retries, add restart: always (or on-failure) so the
service auto-restarts, and add resource limits under deploy.resources.limits
(e.g., cpus and memory) to constrain CPU and memory; finally remove or replace
the development bind mount (.:/app) with either no volume or a read-only volume
for production. Ensure the healthcheck command, env_file path, restart policy
and resource values are set to appropriate production-safe values.
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
    chmod +x ./dotnet-install.sh && \
    ./dotnet-install.sh --version 8.0.100 && \
    rm dotnet-install.sh

# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"

# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/

# Install hatch
RUN pip install hatch

# Build the project (which includes running build_differ.py)
RUN hatch run default:build

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Start the API server
CMD ["hatch", "run", "api:start"]
```
Critical security issues: running as root and no multi-stage build.
The Dockerfile has several security and optimization issues:
- Runs as root: Container runs as root user (security risk)
- Single-stage build: Includes build tools in final image (bloat)
- No health check: Container health monitoring not configured
- Build-time network dependency: Downloads installer during build
Apply this diff to create a secure multi-stage build:
-# Use an official Python runtime as a parent image
-FROM python:3.11-slim
+# Multi-stage build for smaller, more secure images
+FROM python:3.11-slim AS builder
# Set the working directory in the container
WORKDIR /app
# Install .NET SDK and other dependencies
RUN apt-get update && \
apt-get install -y wget && \
wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
chmod +x ./dotnet-install.sh && \
./dotnet-install.sh --version 8.0.100 && \
- rm dotnet-install.sh
+ rm dotnet-install.sh && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"
# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/
# Install hatch
RUN pip install hatch
# Build the project (which includes running build_differ.py)
RUN hatch run default:build
+# Final stage - runtime image
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Create non-root user
+RUN groupadd -r appuser && useradd -r -g appuser appuser && \
+ chown -R appuser:appuser /app
+
+# Copy built artifacts from builder
+COPY --from=builder --chown=appuser:appuser /app/dist ./dist
+COPY --from=builder --chown=appuser:appuser /app/src ./src
+
# Copy the rest of the application code
-COPY . .
+COPY --chown=appuser:appuser pyproject.toml .
+COPY --chown=appuser:appuser README.md .
+
+# Install runtime dependencies only
+RUN pip install --no-cache-dir hatch && \
+ pip install --no-cache-dir -e .
+
+# Switch to non-root user
+USER appuser
# Expose the port the app runs on
EXPOSE 8000
+# Add health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
+ CMD python -c "import httpx; httpx.get('http://localhost:8000/health', timeout=2)" || exit 1
+
# Start the API server
CMD ["hatch", "run", "api:start"]🤖 Prompt for AI Agents
In Dockerfile lines 1-38, the image currently runs as root, performs build and
downloads installers in the final image, and lacks a HEALTHCHECK; convert to a
secure multi-stage build by creating a builder stage (use an official dotnet SDK
base like mcr.microsoft.com/dotnet/sdk:8.0 or include the dotnet installer in
build context so no external network fetch at build time), perform all apt
installs, dotnet and Python build steps and hatch build in that builder, produce
only the runtime artifacts into a minimal final stage based on python:3.11-slim
(no build tools), create a non-root user in the final stage and set USER to it,
copy only needed files from builder, install runtime-only Python deps with pip
--no-cache-dir, and add a HEALTHCHECK instruction to probe the application
endpoint; ensure no wget/dotnet installer downloads happen in the final stage
and remove build-time tools from the final image.
```python
import os
import secrets
import zipfile
import io
from typing import Annotated

from dotenv import load_dotenv
from fastapi import (
    FastAPI,
    File,
    UploadFile,
    Form,
    Response,
    Depends,
    HTTPException,
    status,
    Header,
)
from fastapi.responses import StreamingResponse

from python_redlines.engines import XmlPowerToolsEngine

load_dotenv()

app = FastAPI()

engine = XmlPowerToolsEngine()
```
🛠️ Refactor suggestion | 🟠 Major
Add essential API features: CORS, health check, rate limiting.
The API lacks several production-ready features.
Add these features to the API:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware
import time
from collections import defaultdict
from threading import Lock
app = FastAPI(
title="Python Redlines API",
description="API for generating redlined Word documents",
version="1.0.0",
docs_url="/docs",
redoc_url="/redoc",
)
# CORS configuration
app.add_middleware(
CORSMiddleware,
allow_origins=os.getenv("CORS_ORIGINS", "*").split(","),
allow_credentials=True,
allow_methods=["POST"],
allow_headers=["*"],
)
# Trusted host middleware for security
app.add_middleware(
TrustedHostMiddleware,
allowed_hosts=os.getenv("ALLOWED_HOSTS", "*").split(",")
)
# Simple rate limiting (consider using slowapi for production)
request_counts = defaultdict(list)
rate_limit_lock = Lock()
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
client_ip = request.client.host
current_time = time.time()
with rate_limit_lock:
# Clean old requests (older than 1 minute)
request_counts[client_ip] = [
t for t in request_counts[client_ip]
if current_time - t < 60
]
# Check rate limit (e.g., 10 requests per minute)
if len(request_counts[client_ip]) >= 10:
logger.warning(f"Rate limit exceeded for {client_ip}")
raise HTTPException(
status_code=status.HTTP_429_TOO_MANY_REQUESTS,
detail="Rate limit exceeded. Try again later."
)
request_counts[client_ip].append(current_time)
response = await call_next(request)
return response
@app.get("/health")
async def health_check():
"""Health check endpoint for container orchestration."""
return {
"status": "healthy",
"timestamp": datetime.utcnow().isoformat(),
"version": "1.0.0"
}Critical: Authentication should use OAuth2/OIDC, not simple bearer tokens.
The current implementation uses a basic token comparison, which doesn't meet the authentication requirements specified in the coding guidelines.
As per coding guidelines: "Always ensure authentication is properly implemented via oauth2 or oidc."
The current token-based auth is insufficient for production use. Consider implementing OAuth2 with JWT tokens:
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "fastapi",
# "python-jose[cryptography]",
# "passlib[bcrypt]",
# "python-multipart",
# "loguru",
# "python-dotenv",
# ]
# ///
from datetime import datetime, timedelta
from typing import Annotated
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt
from passlib.context import CryptContext
from loguru import logger
# OAuth2 configuration
SECRET_KEY = os.getenv("SECRET_KEY") # Different from API_TOKEN
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
async def get_current_user(token: Annotated[str, Depends(oauth2_scheme)]):
credentials_exception = HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
headers={"WWW-Authenticate": "Bearer"},
)
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
username: str = payload.get("sub")
if username is None:
raise credentials_exception
except JWTError:
raise credentials_exception
return username
@app.post("/token")
async def login(form_data: Annotated[OAuth2PasswordRequestForm, Depends()]):
# Implement user authentication logic here
# This is a placeholder
access_token = create_access_token(data={"sub": form_data.username})
return {"access_token": access_token, "token_type": "bearer"}Then update endpoints to use Depends(get_current_user) instead of Depends(verify_token).
🛠️ Refactor suggestion | 🟠 Major
Add comprehensive logging and observability.
The API lacks structured logging, which is critical for debugging and monitoring in production.
Apply this diff to add loguru-based logging:
+#!/usr/bin/env -S uv run
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+# "fastapi",
+# "uvicorn",
+# "python-multipart",
+# "python-dotenv",
+# "loguru",
+# ]
+# ///
+
import os
import secrets
import zipfile
import io
from typing import Annotated
from dotenv import load_dotenv
+from loguru import logger
from fastapi import (
FastAPI,
File,
UploadFile,
Form,
Response,
Depends,
HTTPException,
status,
Header,
+ Request,
)
from fastapi.responses import StreamingResponse
from python_redlines.engines import XmlPowerToolsEngine
load_dotenv()
+# Configure logging
+logger.add(
+ "logs/api_{time:YYYY-MM-DD}.log",
+ rotation="1 day",
+ retention="30 days",
+ level=os.getenv("LOG_LEVEL", "INFO"),
+)
+
app = FastAPI()
+
+@app.middleware("http")
+async def log_requests(request: Request, call_next):
+ logger.info(f"Request: {request.method} {request.url.path}")
+ try:
+ response = await call_next(request)
+ logger.info(f"Response: {response.status_code}")
+ return response
+ except Exception as e:
+ logger.error(f"Request failed: {e}")
+ raise

As per coding guidelines: Implement loguru for structured logging.
🤖 Prompt for AI Agents
In src/python_redlines/api.py lines 1-27, add structured loguru-based logging
and basic observability: import logger from loguru and configure a sink
(stdout/stderr) with level read from environment (e.g., LOG_LEVEL), JSON
formatting or structured output, and sensible rotation/retention; add a helper
to bind request_id (generate via secrets.token_hex) and timestamp to each
request; register FastAPI middleware that on request creates a bound logger and
logs method, path, headers (non-sensitive), and on response logs status_code and
duration; add an exception handler that logs stack traces and error context
before raising HTTPException; ensure any engine creation or startup (engine =
XmlPowerToolsEngine()) is logged with success/failure and error details; do not
leave prints—use logger.info/debug/error with structured fields.
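The request-id binding the prompt describes can be sketched framework-agnostically. This is an illustrative stand-in using stdlib `logging` rather than loguru or FastAPI middleware; `with_request_logging` and `fake_handler` are hypothetical names, not part of the PR:

```python
import asyncio
import logging
import secrets
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("api")


def with_request_logging(handler):
    """Bind a request_id to each call, log method/path on entry and
    status/duration on exit, and log stack traces on failure."""
    async def wrapper(method: str, path: str):
        request_id = secrets.token_hex(8)
        start = time.perf_counter()
        log.info("request_id=%s request=%s %s", request_id, method, path)
        try:
            status_code = await handler(method, path)
        except Exception:
            log.exception("request_id=%s failed", request_id)
            raise
        duration_ms = (time.perf_counter() - start) * 1000
        log.info("request_id=%s status=%d duration_ms=%.1f", request_id, status_code, duration_ms)
        return status_code
    return wrapper


@with_request_logging
async def fake_handler(method, path):
    # Stand-in for a FastAPI endpoint; returns a status code
    return 200


print(asyncio.run(fake_handler("POST", "/redline/single")))  # → 200
```

In the actual app the same logic would live in an `@app.middleware("http")` function, with loguru's `logger.bind(request_id=...)` doing the binding.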
```python
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
    original: UploadFile = File(...),
    modified: UploadFile = File(...),
    author_tag: str = Form("Author"),
):
    """
    Accepts two .docx files and returns a single redlined .docx file.
    """
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )

    redline_output, _, _ = engine.run_redline(
        author_tag, original_content, modified_content
    )

    return Response(
        content=redline_output,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )
```
Use async file operations and add request validation.
The endpoint uses synchronous file reading and lacks content-type validation.
The Ruff B008 warnings about File() in defaults are false positives for FastAPI - this is the correct FastAPI pattern. However, the synchronous .read() is a real issue.
Apply this diff:
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
- original: UploadFile = File(...),
- modified: UploadFile = File(...),
+ original: Annotated[UploadFile, File(description="Original .docx file")],
+ modified: Annotated[UploadFile, File(description="Modified .docx file")],
author_tag: str = Form("Author"),
-):
+) -> Response:
"""
Accepts two .docx files and returns a single redlined .docx file.
+
+ Args:
+ original: Original Word document
+ modified: Modified Word document
+ author_tag: Author name for tracking (default: "Author")
+
+ Returns:
+ Response: Redlined document as .docx
+
+ Raises:
+ HTTPException: If file size exceeds limit or content type is invalid
"""
- original_content = await original.read()
- modified_content = await modified.read()
+ # Validate content types
+ valid_content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+ if original.content_type not in [valid_content_type, "application/octet-stream"]:
+ logger.warning(f"Invalid content type for original: {original.content_type}")
+ raise HTTPException(
+ status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
+ detail="Original file must be a .docx file"
+ )
+ if modified.content_type not in [valid_content_type, "application/octet-stream"]:
+ logger.warning(f"Invalid content type for modified: {modified.content_type}")
+ raise HTTPException(
+ status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
+ detail="Modified file must be a .docx file"
+ )
+
+ # Validate author_tag to prevent injection
+ if not author_tag or len(author_tag) > 100 or not author_tag.isprintable():
+ logger.warning(f"Invalid author_tag: {author_tag}")
+ raise HTTPException(
+ status_code=status.HTTP_400_BAD_REQUEST,
+ detail="Invalid author_tag: must be 1-100 printable characters"
+ )
+
+ logger.info(f"Processing single redline request: {original.filename} vs {modified.filename}")
+
+ try:
+ original_content = await original.read()
+ modified_content = await modified.read()
+ except Exception as e:
+ logger.error(f"Failed to read uploaded files: {e}")
+ raise HTTPException(
+ status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+ detail="Failed to read uploaded files"
+ )
if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
+ logger.warning(f"File size exceeded: original={len(original_content)}, modified={len(modified_content)}")
raise HTTPException(
status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
)
- redline_output, _, _ = engine.run_redline(
- author_tag, original_content, modified_content
- )
+ try:
+ redline_output, _, _ = engine.run_redline(
+ author_tag, original_content, modified_content
+ )
+ except Exception as e:
+ logger.error(f"Redline generation failed: {e}")
+ raise HTTPException(
+ status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+ detail="Failed to generate redline document"
+ )
+
+ logger.success(f"Successfully generated single redline")
return Response(
content=redline_output,
media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )

Note: `UploadFile.read()` is already async in Starlette/FastAPI, so using `await` is correct. The real issue is the need for better error handling and validation.
🧰 Tools
🪛 Ruff (0.13.3)
59-59: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
60-60: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
```python
@pytest.mark.asyncio
async def test_redline_single(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    assert "attachment; filename=redline.docx" in response.headers["content-disposition"]
```
🛠️ Refactor suggestion | 🟠 Major
Add docstring and validate response body content.
The test correctly validates the happy path but is missing:
- Docstring (coding guideline requirement)
- Response body validation (should verify it's a valid DOCX)
Apply this diff:
@pytest.mark.asyncio
async def test_redline_single(headers):
+ """Test /redline/single endpoint returns a valid redlined DOCX with correct headers."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
with open("tests/test_data/original.docx", "rb") as original_file:
with open("tests/test_data/modified.docx", "rb") as modified_file:
files = {
"original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
"modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/single", files=files, data=data, headers=headers)
assert response.status_code == 200
assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
assert "attachment; filename=redline.docx" in response.headers["content-disposition"]
+ # Verify response body is non-empty and has DOCX signature
+ assert len(response.content) > 0
+ assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP/DOCX file"
🤖 Prompt for AI Agents
In tests/test_api.py around lines 18 to 33, the test function lacks a docstring
and doesn't validate the response body is a valid DOCX; add a one-line docstring
describing the test, then after asserting status and headers load
response.content into an io.BytesIO and open it with zipfile.ZipFile to verify
it is a valid ZIP-based DOCX by asserting the presence of expected entries
(e.g., "[Content_Types].xml" and "word/document.xml") and that the file
size/content is non-empty; raise/assert appropriately if ZipFile fails or
entries are missing.
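The DOCX check described in the prompt can be sketched as a small helper; `is_valid_docx` is an illustrative name, and the in-memory stand-in file below only exercises the helper, it is not a real Word document:

```python
import io
import zipfile


def is_valid_docx(data: bytes) -> bool:
    """Check that bytes are a ZIP archive containing the core DOCX entries."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return False
    return {"[Content_Types].xml", "word/document.xml"} <= names


# Build a minimal stand-in "docx" in memory to exercise the helper
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")

print(is_valid_docx(buf.getvalue()))  # → True
print(is_valid_docx(b"not a zip"))    # → False
```

In the test, `is_valid_docx(response.content)` would replace the bare magic-number check, since it also catches truncated archives.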
```python
@pytest.mark.asyncio
async def test_redline_double(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/double", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/zip"
    assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add docstring and validate ZIP response structure.
Similar to test_redline_single, this test needs docstring and response body validation.
Apply this diff:
@pytest.mark.asyncio
async def test_redline_double(headers):
+ """Test /redline/double endpoint returns a ZIP with two redlined DOCX files."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
with open("tests/test_data/original.docx", "rb") as original_file:
with open("tests/test_data/modified.docx", "rb") as modified_file:
files = {
"original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
"modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/double", files=files, data=data, headers=headers)
assert response.status_code == 200
assert response.headers["content-type"] == "application/zip"
assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]
+ # Verify response is a valid ZIP
+ assert len(response.content) > 0
+ assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP file"
```python
@pytest.mark.asyncio
async def test_redline_single_no_token():
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        response = await client.post("/redline/single")
    assert response.status_code == 422  # Validation error for missing header
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add docstring to security test.
The test correctly validates missing token handling but needs documentation.
Apply this diff:
@pytest.mark.asyncio
async def test_redline_single_no_token():
+ """Test /redline/single returns 422 validation error when X-Api-Token header is missing."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.post("/redline/single")
        assert response.status_code == 422  # Validation error for missing header
🤖 Prompt for AI Agents
In tests/test_api.py around lines 52 to 57, the test function
test_redline_single_no_token lacks a docstring explaining its intent; add a
one-line docstring directly under the async def that describes that this test
asserts the endpoint returns 422 when the authentication header/token is
missing, e.g. "Ensure POST /redline/single returns 422 when Authorization token
is not provided." Keep it concise and place it before the test body.
```python
@pytest.mark.asyncio
async def test_redline_single_invalid_token():
    headers = {"X-Api-Token": "invalid-token"}
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)
    assert response.status_code == 401
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add docstring to security test.
The test correctly validates invalid token handling but needs documentation.
Apply this diff:
@pytest.mark.asyncio
async def test_redline_single_invalid_token():
+ """Test /redline/single returns 401 unauthorized when X-Api-Token header has invalid value."""
headers = {"X-Api-Token": "invalid-token"}
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
🤖 Prompt for AI Agents
In tests/test_api.py around lines 59 to 72, the security test lacks a docstring;
add a one-line docstring immediately below the test function definition
describing purpose (e.g., "Ensure endpoint returns 401 for invalid API token
when submitting DOCX files"), keep it brief and in present tense, and ensure it
does not change test behavior or indentation.
```python
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        files = {
            "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 413
```
Fix critical resource leak and add docstring.
This test has a critical resource leak on line 81 where a file is opened without proper cleanup. This will cause issues on Windows and under concurrent test execution.
Apply this diff to fix the leak and add docstring:
@pytest.mark.asyncio
async def test_file_size_limit(headers):
+ """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
# Create a dummy file larger than the limit
large_file_content = b"a" * (MAX_FILE_SIZE + 1)
- files = {
- "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
- "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
- }
- data = {"author_tag": "test_author"}
- response = await client.post("/redline/single", files=files, data=data, headers=headers)
+ with open("tests/test_data/modified.docx", "rb") as modified_file:
+ files = {
+ "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+ "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+ }
+ data = {"author_tag": "test_author"}
+ response = await client.post("/redline/single", files=files, data=data, headers=headers)
assert response.status_code == 413

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        with open("tests/test_data/modified.docx", "rb") as modified_file:
            files = {
                "original": (
                    "large.docx",
                    large_file_content,
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                ),
                "modified": (
                    "modified.docx",
                    modified_file,
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                ),
            }
            data = {"author_tag": "test_author"}
            response = await client.post(
                "/redline/single", files=files, data=data, headers=headers
            )
        assert response.status_code == 413
🤖 Prompt for AI Agents
In tests/test_api.py around lines 74 to 86, the test_file_size_limit opens
tests/test_data/modified.docx without closing it (line ~81) and lacks a
docstring; change the test to open the file using a context manager (with
open(...) as modified_file:) so the file is closed after constructing the files
payload (or alternatively load its bytes before building files), and add a brief
docstring at the top of the test describing what it validates (e.g., "Ensure
uploading a file larger than MAX_FILE_SIZE returns 413").
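The second option the prompt mentions — loading the fixture's bytes before building the files payload — can be sketched as below. The helper name and the `DOCX_MIME` constant are illustrative, not part of the repository:

```python
from pathlib import Path

# MIME type for .docx uploads, as used in the test above.
DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document"
)

def build_upload_payload(original_bytes: bytes, modified_path: str) -> dict:
    """Build the multipart 'files' mapping from in-memory bytes.

    Reading the fixture eagerly means no file handle outlives the
    request, so nothing leaks on Windows or under parallel test runs.
    """
    modified_bytes = Path(modified_path).read_bytes()
    return {
        "original": ("large.docx", original_bytes, DOCX_MIME),
        "modified": ("modified.docx", modified_bytes, DOCX_MIME),
    }
```

Because the payload holds plain bytes, no cleanup step is needed after the request is sent.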
User description
This commit introduces a FastAPI application to provide a web interface for the document redlining functionality.
The key features of this change are:
- A FastAPI application with two endpoints:
  - `/redline/single`: Accepts two `.docx` files and returns a single redlined document.
  - `/redline/double`: Accepts two `.docx` files and returns a zip archive with two redlined documents (original vs. modified and modified vs. original).
- API token authentication to secure the endpoints.
- A 10MB file size limit to prevent abuse.
- A `Dockerfile` and `docker-compose.yml` for easy containerization and deployment.
- An updated build script (`build_differ.py`) to correctly build and package the C# binaries for multiple platforms.
- Comprehensive tests for the new API endpoints, including tests for authentication and file size limits.
- An updated `pyproject.toml` to manage dependencies and configure the test environment.
- A `.gitignore` file to exclude build artifacts and environment files.
- A `.env.example` file to provide a template for setting the API token.

PR Type
Enhancement
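As a rough illustration, calling the `/redline/single` endpoint from a client might be assembled like this. The helper is hypothetical, and the `Authorization: Bearer` scheme plus the `API_TOKEN` variable name mirror the `.env.example` convention but are assumptions, not confirmed API details:

```python
import os
from pathlib import Path

DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document"
)

def build_single_redline_call(original_path: str, modified_path: str,
                              author_tag: str) -> dict:
    """Assemble the pieces of a POST /redline/single request.

    The returned dict can be passed to an HTTP client, e.g.
    httpx.post(base_url + kw["url"], headers=kw["headers"],
               data=kw["data"], files=kw["files"]).
    """
    token = os.environ.get("API_TOKEN", "change-me")  # assumed env var name
    return {
        "url": "/redline/single",
        "headers": {"Authorization": f"Bearer {token}"},  # assumed scheme
        "data": {"author_tag": author_tag},
        "files": {
            "original": (Path(original_path).name,
                         Path(original_path).read_bytes(), DOCX_MIME),
            "modified": (Path(modified_path).name,
                         Path(modified_path).read_bytes(), DOCX_MIME),
        },
    }
```

Reading both documents into memory keeps the helper free of open file handles, matching the cleanup concern raised in the review above.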
Description
Add FastAPI web service for document redlining
Implement API token authentication and file size limits
Create Docker containerization with multi-platform builds
Add comprehensive test coverage for API endpoints
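The authentication and size checks listed above reduce to two guards per request. A minimal sketch follows — the function name and the 401 status for a bad token are assumptions, while 413 matches the test suite and 10MB is the limit stated in the description:

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB limit from the PR description

def check_upload(token: str, expected_token: str, payload: bytes) -> tuple:
    """Return (status_code, detail) for the two request guards.

    401 for a token mismatch, 413 for an oversized body, 200 otherwise.
    """
    if token != expected_token:
        return 401, "Invalid API token"
    if len(payload) > MAX_FILE_SIZE:
        return 413, "File too large"
    return 200, "ok"
```

In the real app these guards would live in a FastAPI dependency and raise `HTTPException` rather than return tuples; the sketch only shows the decision order.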
Diagram Walkthrough
File Walkthrough
- 2 files: FastAPI service with redlining endpoints; Enhanced multi-platform build script
- 1 file: Comprehensive API endpoint tests
- 3 files: Docker containerization setup; Docker compose configuration; Environment variables template
- 2 files: .NET SDK installation script; Updated dependencies and API environment
- 2 files