Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 17 additions & 20 deletions backend/app/api/v1/report_download.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,6 @@

router = APIRouter()

def cleanup_file(filepath: str):
"""
Removes the file at the given filepath.
"""
if os.path.exists(filepath):
os.remove(filepath)

def render_report_html(report_data: dict) -> str:
"""
Renders the report JSON data into a basic HTML structure.
Expand Down Expand Up @@ -139,35 +132,39 @@ async def get_report_pdf(
if not final_report_json:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Final report content not available.")

report_dir = settings.REPORT_OUTPUT_DIR
report_dir.mkdir(parents=True, exist_ok=True)
pdf_filepath = report_dir / f"report_{report_id}.pdf"

if pdf_filepath.exists():
logger.info(f"Serving existing PDF report for report_id: {report_id}")
return FileResponse(
path=pdf_filepath,
media_type="application/pdf",
filename=f"report_{report_id}.pdf",
status_code=status.HTTP_200_OK
)

logger.info(f"Generating new PDF report for report_id: {report_id}")
html_content = render_report_html(final_report_json)

pdf_path = None
try:
# Offload blocking PDF generation to a thread pool
loop = asyncio.get_event_loop()
pdf_file = await loop.run_in_executor(
None, # Use default thread pool executor
lambda: tempfile.NamedTemporaryFile(delete=False, suffix=".pdf")
)
pdf_path = pdf_file.name
await loop.run_in_executor(
None,
lambda: HTML(string=html_content).write_pdf(pdf_path)
lambda: HTML(string=html_content).write_pdf(pdf_filepath)
)
Comment on lines +139 to 157
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Race condition: concurrent PDF generation for the same report.

When multiple requests arrive simultaneously for a report that doesn't have a cached PDF, all requests will pass the pdf_filepath.exists() check and attempt to generate the PDF concurrently. This can lead to:

  • Corrupted PDF files due to simultaneous writes
  • Wasted compute resources
  • Inconsistent user experience

Consider using file-based locking or an in-memory lock (e.g., using asyncio.Lock stored in a dictionary keyed by report_id) to ensure only one request generates the PDF at a time, while others wait for the generation to complete.

Example approach using asyncio locks:

# At module level
_pdf_generation_locks: dict[str, asyncio.Lock] = {}

# In get_report_pdf function, before checking if PDF exists:
if report_id not in _pdf_generation_locks:
    _pdf_generation_locks[report_id] = asyncio.Lock()

async with _pdf_generation_locks[report_id]:
    # Re-check existence after acquiring lock
    if pdf_filepath.exists():
        logger.info(f"Serving existing PDF report for report_id: {report_id}")
        return FileResponse(...)
    
    # Generate PDF...
🤖 Prompt for AI Agents
In backend/app/api/v1/report_download.py around lines 139 to 157, multiple
concurrent requests can pass the pdf_filepath.exists() check and start PDF
generation simultaneously; add a per-report asyncio lock to serialize
generation: create a module-level dict mapping report_id to asyncio.Lock,
acquire the lock before re-checking pdf_filepath.exists(), return the
FileResponse if rediscovered, otherwise generate the PDF inside the lock (using
the existing executor call), then release the lock and optionally remove the
lock from the dict to avoid growth; this ensures only one coroutine writes the
PDF while others wait and then serve the cached file.

pdf_file.close()

# Return the PDF with appropriate headers and add cleanup task
# Return the newly generated PDF
response = FileResponse(
path=pdf_path,
path=pdf_filepath,
media_type="application/pdf",
filename=f"report_{report_id}.pdf",
status_code=status.HTTP_200_OK
)
background_tasks.add_task(cleanup_file, pdf_path)
return response
except Exception as e:
if pdf_path:
cleanup_file(pdf_path)
logger.exception(f"Failed to generate PDF report for report_id: {report_id}", extra={"report_id": report_id})
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
Expand Down
2 changes: 1 addition & 1 deletion backend/app/core/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ class Settings(BaseSettings):
TEST_DB_NAME: str | None = None

BASE_DIR: Path = Path(__file__).parent.parent.parent
REPORT_OUTPUT_DIR: Path = BASE_DIR / "reports_output"
REPORT_OUTPUT_DIR: Path = BASE_DIR / "storage" / "reports"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Missing import for Path.

The Path class is used on lines 53-54 but is not imported. This will cause a NameError at runtime when the Settings class is instantiated.

Add the missing import at the top of the file:

 from pydantic_settings import BaseSettings, SettingsConfigDict
 from typing import Dict, List
+from pathlib import Path
🧰 Tools
🪛 Ruff (0.14.8)

54-54: Undefined name Path

(F821)

🤖 Prompt for AI Agents
In backend/app/core/config.py around line 54, the Settings class uses Path
(REPORT_OUTPUT_DIR) but Path is not imported, causing a NameError; add the
missing import by importing Path from the pathlib module at the top of the file
(for example: from pathlib import Path) so the REPORT_OUTPUT_DIR assignment
resolves correctly.


model_config = SettingsConfigDict(env_file=".env")

Expand Down