-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Advisory Details
Title: Arbitrary Infinite Loop Denial of Service (DoS) via Crafted PDF Table of Contents
Description:
Summary
An unbounded while loop vulnerability in the toc_transformer function allows an unauthenticated attacker to cause a perpetual Denial of Service (DoS) and rapidly exhaust LLM API credits. By providing a PDF with an intentionally long Table of Contents, the system triggers length-truncated API responses that permanently trap the application into continuously querying the backend LLM API.
Details
The root cause resides in pageindex/page_index.py at line 303 within the toc_transformer() function. The application uses an LLM to structure a raw Table of Contents string into a hierarchical JSON format.
If the LLM's response hits the maximum output token limit (finish_reason == "length"), the application automatically attempts to instruct the model to "continue". Crucially, the while loop lacks any retry counter or iteration limits (unlike the correctly-patched extract_toc_content function which explicitly caps attempts to 5).
Consequently, if the model repeatedly truncates the JSON or rejects the completeness check, the execution falls into an inescapable infinite loop:
while not (if_complete == "yes" and finish_reason == "finished"):
# ... rebuilds prompt and calls ChatGPT_API_with_finish_reason
new_complete, finish_reason = ChatGPT_API_with_finish_reason(model=model, prompt=prompt)
# ...
if_complete = check_if_toc_transformation_is_complete(toc_content, last_complete, model)
# NO ITERATION LIMIT OR BAILOUT CONDITIONPoC
- Generate an adversarial PDF with thousands of sections in the TOC (sufficiently large to cause the LLM to truncate output), or set up a Mock OpenAI proxy that forcibly returns
finish_reason: "length". - Run the application via the CLI against the malicious PDF:
python run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
- Observe the process forever attempting to complete the TOC, utilizing 100% of a CPU thread and rapidly emitting requests. (In a real production environment, this drastically drains OpenAI API credits).
Log of Evidence
[*] Setting up Mock API environment variables on port 18080
[*] Triggering PageIndex parsing on the malicious PDF...
[*] Executing: python3 run_pageindex.py --pdf_path evil_toc.pdf --model gpt-3.5-turbo
[Target] Parsing PDF...
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
[MockAPI] Returning finish_reason: 'length' (max_output_reached)
[MockAPI] Returning completed: 'no'
...
[!] The process has been running for over 15 seconds, stuck in the infinite loop.
Impact
This vulnerability allows a complete and unauthenticated Denial of Service (DoS) by causing process hanging and unbounded API usage, resulting in service unavailability and the immediate financial exhaustion of the backend LLM service billing account.
Affected products
- Ecosystem: python
- Package name: PageIndex
- Affected versions: All versions currently in repository (
mainbranch) - Patched versions:
Severity
- Severity: High
- Vector string: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
Weaknesses
- CWE: CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')
Occurrences
| Permalink | Description |
|---|---|
| pageindex/page_index.py#L303 | The vulnerable unbounded while loop within toc_transformer failing to cap API retry attempts. |