### Batch API Folder Processing Upload Example

In [2]:
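# Install the libraries (ipython is used for displaying markdown in this demo;
# python-dotenv provides the load_dotenv import used below)
# !pip3 install --upgrade ipython
# !pip3 install --upgrade python-dotenv
# !pip3 install --upgrade any-parser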

In [3]:
import json
import os
from datetime import datetime

from dotenv import load_dotenv

from any_parser import AnyParser

In [4]:
# Load environment variables
load_dotenv(override=True)

# Get API key and create parser
api_key = os.environ.get("CAMBIO_API_KEY")
if not api_key:
    raise ValueError("CAMBIO_API_KEY is not set")
ap = AnyParser(api_key)

Create Batch Request

In [5]:
# Upload folder for batch processing
WORKING_FOLDER = "./sample_data"
responses = ap.batches.create(WORKING_FOLDER)

# Save responses to JSONL file with timestamp
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
output_file = f"./sample_data_{timestamp}.jsonl"

with open(output_file, "w") as f:
    for response in responses:
        f.write(json.dumps(response.model_dump()) + "\n")

print(f"Upload responses saved to: {output_file}")

Upload responses saved to: ./sample_data_20250102134950.jsonl
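
Each line of the saved file is one JSON object, so it can be read back one line at a time. The following is a minimal sketch of that step. `read_jsonl` is a hypothetical helper written for this example and is not part of any-parser; it only assumes the file was written as shown above.

```python
import json
from typing import Any


def read_jsonl(path: str) -> list[dict[str, Any]]:
    """Read a JSONL file into a list of dicts, skipping blank lines.

    Hypothetical helper for illustration; not part of any-parser.
    """
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate trailing or empty lines
                records.append(json.loads(line))
    return records
```

For example, `[r["requestId"] for r in read_jsonl(output_file)]` would list the request ID of every uploaded file, assuming each record carries the `requestId` field used later in this notebook.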


Check the status of the first entry in the JSONL file using its `requestId`.

In [11]:
# Get first response from the JSONL file
with open(output_file, "r") as f:
    first_response = json.loads(f.readline())

request_id = first_response["requestId"]
print(f"Checking status for file: {first_response['fileName']}")

# Retrieve status using request ID
markdown = ap.batches.retrieve(request_id)
if markdown and markdown.result:
    print("Content retrieved successfully")
else:
    print("Content not yet available")

Checking status for file: test3.pdf
Content not yet available


Note: Batch extraction is currently in beta. Processing may take up to 2 hours to complete.

After 2 hours, check the status of the first file in the folder again with the same request ID.

In [14]:
# Retrieve status using request ID
markdown = ap.batches.retrieve(request_id)
if markdown and markdown.result:
    print("Content retrieved successfully")
else:
    print("Content not yet available")

Content retrieved successfully
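
Instead of re-running the cell by hand, the same check can be repeated on a timer. The following is a rough sketch under stated assumptions: `poll_until_ready`, its parameters, and the 5-minute default interval are all hypothetical choices made for this example, not part of any-parser. It only relies on the response exposing a truthy `.result` once content is available, as in the check above.

```python
import time
from typing import Any, Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Any],
    interval_s: float = 300.0,        # assumed polling interval (5 minutes)
    timeout_s: float = 2 * 60 * 60,   # matches the 2-hour upper bound noted above
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call fetch() until it returns an object with a truthy .result.

    Returns the response, or None if the next wait would exceed the timeout.
    Hypothetical helper for illustration; not part of any-parser.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch()
        if response and getattr(response, "result", None):
            return response
        if time.monotonic() + interval_s > deadline:
            return None
        sleep(interval_s)
```

In this notebook it could be called as `poll_until_ready(lambda: ap.batches.retrieve(request_id))`. Passing the fetch step as a callable keeps the loop independent of the client, which also makes it easy to try out with a stub.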


After the job completes, see examples/parse_batch_fetch.ipynb to fetch all responses listed in the JSONL file:

https://github.com/CambioML/any-parser/blob/main/examples/parse_batch_fetch.ipynb
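
Before fetching everything, it can help to see which files are ready. The sketch below is not the code from the linked notebook; `split_by_status` is a hypothetical helper that applies the single-file check from this notebook to every record, assuming each record has the `requestId` and `fileName` fields shown earlier.

```python
from typing import Any, Callable, Iterable


def split_by_status(
    records: Iterable[dict[str, Any]],
    retrieve: Callable[[str], Any],
) -> tuple[list[str], list[str]]:
    """Return (done, pending) file names based on whether .result is available.

    Hypothetical helper for illustration; not part of any-parser.
    """
    done: list[str] = []
    pending: list[str] = []
    for record in records:
        response = retrieve(record["requestId"])
        if response and getattr(response, "result", None):
            done.append(record["fileName"])
        else:
            pending.append(record["fileName"])
    return done, pending
```

With the records parsed from the JSONL file, `split_by_status(records, ap.batches.retrieve)` would return the two lists of file names.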


## End of the notebook

Check out more [case studies](https://www.cambioml.com/blog) from CambioML!

<a href="https://www.cambioml.com/" title="Title">
    <img src="./sample_data/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>