
feat(spreadsheet-analysis): fail-fast size guard on analyze_spreadsheet downloads#397

Merged
DerrickF merged 1 commit into develop from feature/spreadsheet-size-guard on May 28, 2026


@DerrickF
Contributor

Summary

Closes #258 (items 1+2). Streaming download/upload (item 3) tracked separately.

Adds a pre-download file size guard to analyze_spreadsheet. Previously, arbitrarily large files could be downloaded and base64-encoded into memory. Base64 adds roughly 33% overhead, so a 100 MB XLSX becomes ~133 MB in memory once encoded. This change rejects such files before any S3 or Code Interpreter call is made.
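The ~133 MB figure is base64's fixed overhead: every 3 input bytes become 4 output characters, and the output is padded to a multiple of 4. A small standard-library check of that arithmetic (decimal megabytes assumed, to match the figures above):

```python
import base64


def base64_encoded_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)  # 4 * ceil(n / 3), in integer arithmetic


# Cross-check the formula against the stdlib encoder on a few small inputs.
for n in range(10):
    assert base64_encoded_size(n) == len(base64.b64encode(bytes(n)))

MB = 1_000_000  # decimal megabytes
print(f"{base64_encoded_size(100 * MB) / MB:.1f} MB")  # 133.3 MB
```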

Changes

analyze_tool.py

  • Hard cap (ANALYZE_MAX_FILE_SIZE_BYTES, default 25 MB): files above this size are rejected immediately, before _download_file is called, with an actionable error message. No S3 GetObject, no Code Interpreter start.
  • Soft warning (ANALYZE_WARN_FILE_SIZE_BYTES, default 10 MB): files in the 10–25 MB range proceed, but both the success and the error response carry a slow-analysis warning.
  • Both thresholds are env-tunable without a redeploy. A logger.warning fires at module load if the thresholds are misconfigured (warn >= max).
  • Uses size_bytes already present on the file_info dict from list_spreadsheets — zero extra AWS calls.
  • The docstring's Safety limits section is updated to document the new caps.
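A minimal sketch of how a guard like this could be structured. The environment variable names, defaults, the module-load misconfiguration warning, and the use of size_bytes from file_info come from the description above. The function name check_file_size, the FileTooLargeError class, the message wording, and reading "MB" as MiB are assumptions for illustration, not the actual code in analyze_tool.py.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

# Assumption: "MB" is read as MiB here; the real module may use 10**6.
_MB = 1024 * 1024

# Read once at import time; override via the environment.
ANALYZE_MAX_FILE_SIZE_BYTES = int(os.environ.get("ANALYZE_MAX_FILE_SIZE_BYTES", 25 * _MB))
ANALYZE_WARN_FILE_SIZE_BYTES = int(os.environ.get("ANALYZE_WARN_FILE_SIZE_BYTES", 10 * _MB))

# Misconfiguration check at module load: with warn >= max, no file can land
# in the soft-warning band without also exceeding the hard cap.
if ANALYZE_WARN_FILE_SIZE_BYTES >= ANALYZE_MAX_FILE_SIZE_BYTES:
    logger.warning(
        "ANALYZE_WARN_FILE_SIZE_BYTES (%d) >= ANALYZE_MAX_FILE_SIZE_BYTES (%d); "
        "the soft-warning band is empty",
        ANALYZE_WARN_FILE_SIZE_BYTES,
        ANALYZE_MAX_FILE_SIZE_BYTES,
    )


class FileTooLargeError(ValueError):
    """Hypothetical error raised for files above the hard cap."""


def _fmt_mb(n_bytes: int) -> str:
    return f"{n_bytes / _MB:.1f} MB"


def check_file_size(file_info: dict) -> Optional[str]:
    """Hypothetical pre-download guard.

    Raises FileTooLargeError above the hard cap, returns a soft-warning
    string above the warn threshold, and returns None otherwise. A missing,
    None, or zero size_bytes is treated as unknown and allowed through.
    """
    # Module globals are read at call time, so tests can override them.
    size = file_info.get("size_bytes") or 0
    if size > ANALYZE_MAX_FILE_SIZE_BYTES:
        raise FileTooLargeError(
            f"File is {_fmt_mb(size)}, above the "
            f"{_fmt_mb(ANALYZE_MAX_FILE_SIZE_BYTES)} limit. "
            "Try splitting the workbook or removing unused sheets."
        )
    if size > ANALYZE_WARN_FILE_SIZE_BYTES:
        return f"Large file ({_fmt_mb(size)}): analysis may be slow."
    return None
```

In this sketch, analyze_spreadsheet would call check_file_size(file_info) before _download_file and attach any returned warning to whichever response it eventually builds. Reading the module-level thresholds inside the function body, rather than binding them as default arguments, is what lets monkeypatch.setattr on the module take effect in tests.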

test_size_guard.py (new file, 8 tests)

  • Oversize file rejected, Code Interpreter never started
  • Error message includes actual/limit sizes and remediation hint
  • Soft warning present on success response
  • Soft warning present on error response
  • Zero/None size_bytes does not block (regression guard for legacy records)
  • Threshold overrides via monkeypatch.setattr work correctly
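The listed tests could be sketched roughly as follows. Everything here (settings, analyze, FakeCodeInterpreter, SLOW_WARNING) is a hypothetical stand-in rather than the real analyze_tool API. To stay runnable without pytest, the sketch uses the standard library's unittest.mock.patch.object where the real suite uses monkeypatch.setattr; both temporarily replace an attribute and restore it afterwards. The test_* functions are plain functions, so pytest can also collect them unchanged.

```python
import types
from unittest import mock

# Hypothetical stand-in for analyze_tool's module-level thresholds (bytes).
settings = types.SimpleNamespace(MAX_BYTES=25 * 1024 * 1024, WARN_BYTES=10 * 1024 * 1024)

SLOW_WARNING = "Large file: analysis may be slow."


class FakeCodeInterpreter:
    """Stub that records whether a session was started."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.started = False

    def start(self) -> str:
        self.started = True
        if self.fail:
            raise RuntimeError("interpreter error")
        return "analysis ok"


def analyze(file_info: dict, interpreter: FakeCodeInterpreter) -> dict:
    """Hypothetical flow: size guard first, expensive work only if it passes."""
    size = file_info.get("size_bytes") or 0
    if size > settings.MAX_BYTES:
        return {"error": f"File is {size} bytes; limit is {settings.MAX_BYTES} bytes. "
                         "Split the file and retry."}
    warning = SLOW_WARNING if size > settings.WARN_BYTES else None
    try:
        resp = {"result": interpreter.start()}
    except RuntimeError as exc:
        resp = {"error": str(exc)}
    if warning:
        resp["warning"] = warning  # attached on success and on error
    return resp


def test_oversize_rejected_before_interpreter_start():
    ci = FakeCodeInterpreter()
    resp = analyze({"size_bytes": settings.MAX_BYTES + 1}, ci)
    assert "error" in resp and not ci.started
    assert str(settings.MAX_BYTES) in resp["error"]


def test_soft_warning_on_success_and_error():
    size = settings.WARN_BYTES + 1
    assert analyze({"size_bytes": size}, FakeCodeInterpreter())["warning"] == SLOW_WARNING
    failed = analyze({"size_bytes": size}, FakeCodeInterpreter(fail=True))
    assert "error" in failed and failed["warning"] == SLOW_WARNING


def test_zero_or_none_size_does_not_block():
    for size in (0, None):
        ci = FakeCodeInterpreter()
        assert "error" not in analyze({"size_bytes": size}, ci)
        assert ci.started


def test_threshold_override():
    # Stand-in for monkeypatch.setattr(module, name, value).
    with mock.patch.object(settings, "MAX_BYTES", 100), \
         mock.patch.object(settings, "WARN_BYTES", 50):
        assert "warning" in analyze({"size_bytes": 60}, FakeCodeInterpreter())
        assert "error" in analyze({"size_bytes": 101}, FakeCodeInterpreter())


if __name__ == "__main__":
    test_oversize_rejected_before_interpreter_start()
    test_soft_warning_on_success_and_error()
    test_zero_or_none_size_does_not_block()
    test_threshold_override()
    print("size-guard sketch: all tests passed")
```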

Test results

140 passed, 1 warning in 6.51s


Adds 25 MB hard-fail and 10 MB soft-warning thresholds (env-tunable via
ANALYZE_MAX_FILE_SIZE_BYTES / ANALYZE_WARN_FILE_SIZE_BYTES). The check
runs before _download_file using the size_bytes already on file_info, so
oversize files never hit S3 GetObject or base64. A logger.warning fires
at module load when the thresholds are misconfigured (warn >= max).
Soft warning is attached to both success and error responses for files
in the 10-25 MB range. Docstring updated with the new safety limit.

Closes #258 (items 1+2). Streaming (item 3) tracked separately.
DerrickF merged commit 9d4a523 into develop on May 28, 2026
30 checks passed