# Unstructured API On-Demand Jobs Quickstart

This notebook shows how to use the [Unstructured Python SDK](https://docs.unstructured.io/api-reference/workflow/overview#unstructured-python-sdk) to have Unstructured process local files by using its _on-demand jobs_ functionality, which is part of the Unstructured API's collection of [workflow operations](https://docs.unstructured.io/api-reference/workflow/overview).

---

📝 **Note**: The on-demand jobs functionality is designed to work *only by processing local files*.

To process files (and data) in remote file and blob storage, databases, and vector stores, you must use other workflow operations in the Unstructured API. To learn how, see the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb).  

---

## Requirements

To run this notebook, you will need:

- An Unstructured account. To sign up for an account, go to https://unstructured.io. In the top navigation bar, click **Get started for free**, and follow the on-screen directions to finish signing up. After you sign up, you are immediately signed in to your new **Let's Go** account, at https://platform.unstructured.io.
- An Unstructured API key, as follows:

  1. After you are signed in to your account, on the sidebar, click **API Keys**.
  2. Click **Generate New Key**.
  3. Enter some meaningful display name for the key, and then click **Continue**.
  4. Next to the new key's name, click the **Copy** icon. The key's value is copied to your system's clipboard. If you lose this key, simply return to the list and click **Copy** again.

- One or more local files for Unstructured to process. This notebook assumes that the local files you want to process are all PDF files, and that these PDFs are in a folder that is accessible from this notebook. The easiest and fastest way to create this folder is as follows:

  1. On this notebook's sidebar, click the folder (**Files**) icon. The **Files** pane opens and displays the contents of the `/content` folder. (This folder should have a hidden `.config` subfolder and a `sample_data` subfolder.)
  2. Right-click any blank area within the **Files** pane, and then click **New folder**.
  3. Enter a name for the new folder. This notebook assumes the folder is named `input`, and the path to this new folder is `/content/input`.
  4. To upload files to this folder, do the following:

     a. Rest your mouse pointer on the `input` folder.<br/>
     b. Click the ellipsis (three dots) icon, and then click **Upload**.<br/>
     c. Browse to and select the files on your local machine that you want to upload to this `input` folder.<br/>

---

⚠️ **Important**: Each on-demand job is limited to 10 files, and each file is limited to 10 MB in size.

---
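Before you run the job, you might want to confirm that the `input` folder respects these limits. The following is a minimal sketch, not part of the Unstructured SDK. `check_input_dir`, `MAX_FILES`, and `MAX_FILE_BYTES` are hypothetical names, and reading 10 MB as 10,000,000 bytes is an assumption.

```python
import os

# Limits stated in this notebook: at most 10 files per on-demand job and at
# most 10 MB per file. Reading "10 MB" as 10,000,000 bytes is an assumption;
# it is the stricter of the decimal and binary interpretations.
MAX_FILES = 10
MAX_FILE_BYTES = 10 * 1000 * 1000


def check_input_dir(input_dir):
    """Return a list of human-readable problems found in input_dir (empty if none)."""
    problems = []
    # Only regular files are sent to the job; subfolders are skipped, as in Step 3.
    paths = [
        os.path.join(input_dir, name)
        for name in sorted(os.listdir(input_dir))
        if os.path.isfile(os.path.join(input_dir, name))
    ]
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files found; the limit is {MAX_FILES}.")
    for path in paths:
        size = os.path.getsize(path)
        if size > MAX_FILE_BYTES:
            problems.append(
                f"{os.path.basename(path)} is {size} bytes; the limit is {MAX_FILE_BYTES}."
            )
    return problems


# Example (in this notebook):
# print(check_input_dir("/content/input/") or "Input folder looks OK.")
```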

- A destination folder for Unstructured to send its processed results to. This notebook assumes that the destination folder is accessible from this notebook. The easiest and fastest way to create this folder is as follows:

  1. If the **Files** pane is not already showing, on this notebook's sidebar, click the folder (**Files**) icon. The **Files** pane opens and displays the contents of the `/content` folder.
  2. Right-click any blank area within the **Files** pane, and then click **New folder**.
  3. Enter a name for the new destination folder. This notebook assumes the folder is named `output`, and the path to this new folder is `/content/output`.

---

⚠️ **Warning**: Any files that you upload to these `input` or `output` folders will be deleted whenever Google Colab disconnects or resets, for example due to inactivity, manual restart, or session timeout.

---
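If you prefer code to the **Files** pane, a minimal sketch like the following creates both folders. `ensure_folders` is a hypothetical helper, and the paths match the ones this notebook assumes. Because Colab deletes these folders on disconnect or reset, you can re-run it at the start of each session; you still need to upload your PDFs into `/content/input` afterward.

```python
import os

# Folder paths this notebook assumes.
INPUT_DIR = "/content/input"
OUTPUT_DIR = "/content/output"


def ensure_folders(*paths):
    """Create each folder, plus any missing parent folders, if it does not exist yet.

    Returns a list of booleans reporting whether each path is now a directory.
    """
    for path in paths:
        # exist_ok=True makes the call safe to re-run, for example after a Colab reset.
        os.makedirs(path, exist_ok=True)
    return [os.path.isdir(path) for path in paths]


# In Colab: ensure_folders(INPUT_DIR, OUTPUT_DIR)
```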


## Step 1: Install the Unstructured Python SDK and other required packages

Run the following cell to install the Unstructured Python SDK on a virtual machine (VM) in Google's cloud. This VM is associated with this notebook.

In [None]:
!pip install unstructured-client

Collecting unstructured-client
  Downloading unstructured_client-0.42.4-py3-none-any.whl.metadata (23 kB)
Collecting pypdf>=6.2.0 (from unstructured-client)
  Downloading pypdf-6.4.0-py3-none-any.whl.metadata (7.1 kB)
Downloading unstructured_client-0.42.4-py3-none-any.whl (207 kB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m207.9/207.9 kB[0m [31m6.7 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pypdf-6.4.0-py3-none-any.whl (329 kB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m329.5/329.5 kB[0m [31m16.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pypdf, unstructured-client
Successfully installed pypdf-6.4.0 unstructured-client-0.42.4
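To confirm which version of the SDK is installed (the output above shows 0.42.4 at the time this notebook was run), you can query the package metadata with Python's standard `importlib.metadata` module. `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print("unstructured-client:", installed_version("unstructured-client"))
```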


## Step 2: Set your Unstructured API key

In the following cell, replace `<unstructured-api-key>` with the value of your API key, and then run the cell.

As a security best practice, you would typically store this key elsewhere (for example, in an environment variable or a secure key vault) and access it programmatically here. To keep this demonstration simple, however, you specify your API key in plaintext in the following cell.

In [None]:
UNSTRUCTURED_API_KEY = "<unstructured-api-key>"
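To follow the security best practice mentioned above, one option is to read the key from an environment variable and fall back to a hidden prompt. The following sketch assumes an environment variable named `UNSTRUCTURED_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something the SDK requires. Colab's **Secrets** panel, read with `google.colab.userdata.get`, is another option.

```python
import os
from getpass import getpass


def load_api_key(env_var="UNSTRUCTURED_API_KEY"):
    """Read the API key from an environment variable, or prompt for it without echoing."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to an interactive prompt; the typed value is not displayed in the notebook.
    return getpass(f"Enter your Unstructured API key ({env_var} is not set): ").strip()


# UNSTRUCTURED_API_KEY = load_api_key()
```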

## Step 3: Run an on-demand job

In this step, you use the Unstructured Python SDK to run an on-demand job. This job is based on a predefined Unstructured [workflow](https://docs.unstructured.io/ui/overview#how-does-it-work) definition that contains the following workflow nodes:

- A [High Res partitioner](https://docs.unstructured.io/ui/partitioning).
- An [image description enrichment](https://docs.unstructured.io/ui/enriching/image-descriptions).
- A [tables to HTML enrichment](https://docs.unstructured.io/ui/enriching/table-to-html).
- A [generative OCR optimization enrichment](https://docs.unstructured.io/ui/enriching/generative-ocr).

This predefined workflow definition does not apply [chunking](https://docs.unstructured.io/ui/chunking) or generate [embeddings](https://docs.unstructured.io/ui/embedding).

The job's response contains a unique job ID; a list of input file IDs, one per input file; and a list of output node file details, one per combination of input file and workflow node. Each output node file detail identifies a processing result by its file ID and output node ID.

To complete this step, run the following cell.

In [None]:
from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateJobRequest
from unstructured_client.models.shared import BodyCreateJob, InputFiles
import os, json

# Set variables for:

# - On-demand job's type.
# - On-demand job's workflow template name.
# - On-demand job's settings.
# - Path to local input files.

# - Input files array.
# - On-demand job ID.
# - On-demand job output file IDs and output node IDs.

job_type = "template"
template_id = "hi_res_and_enrichment"
request_data = json.dumps({"job_type": job_type, "template_id": template_id})
input_dir = "/content/input/"

files = []
job_id = ""
job_input_file_ids = []
job_output_node_files = []

# Read in all input files.
for filename in os.listdir(input_dir):
    full_path = os.path.join(input_dir, filename)

    # Skip non-files (for example, directories).
    if not os.path.isfile(full_path):
        continue

    # This notebook assumes that every input file is a PDF.
    files.append(
        InputFiles(
            content=open(full_path, "rb"),
            file_name=filename,
            content_type="application/pdf"
        )
    )

# Run the on-demand job, capturing the job ID and the job's
# input/output file IDs and output node IDs.
with UnstructuredClient(api_key_auth=UNSTRUCTURED_API_KEY) as client:
    response = client.jobs.create_job(
        request=CreateJobRequest(
            body_create_job=BodyCreateJob(
                request_data=request_data,
                input_files=files
            )
        )
    )

    job_id = response.job_information.id
    print(f"Job ID: {job_id}\n")

    job_input_file_ids = response.job_information.input_file_ids
    print("Input file details:\n")

    for job_input_file_id in job_input_file_ids:
        print(job_input_file_id)

    job_output_node_files = response.job_information.output_node_files
    print("\nOutput node file details:\n")

    for output_node_file in job_output_node_files:
        print(output_node_file)

Job ID: 9bd1a2f3-90a2-410a-9064-8d29dd296067

Input file details:

250713305v1-d88db765.pdf
H03-Cryptosystem-proposed-by-Nash-61a58972.pdf

Output node file details:

file_id='250713305v1-d88db765.pdf' node_id='93fc2ce8-e7c8-424f-a6aa-41460fc5d35d' node_subtype='unstructured_api' node_type='partition'
file_id='250713305v1-d88db765.pdf' node_id='4eb78731-4669-438c-9e2c-c76fcb1c9a52' node_subtype='openai_image_description' node_type='prompter'
file_id='250713305v1-d88db765.pdf' node_id='35cacdfe-3ac1-4183-bbf4-826cd88c882c' node_subtype='anthropic_ocr' node_type='prompter'
file_id='250713305v1-d88db765.pdf' node_id='ee5d4bf2-3783-4818-9f69-9ebbaa8778ea' node_subtype='anthropic_table2html' node_type='prompter'
file_id='H03-Cryptosystem-proposed-by-Nash-61a58972.pdf' node_id='93fc2ce8-e7c8-424f-a6aa-41460fc5d35d' node_subtype='unstructured_api' node_type='partition'
file_id='H03-Cryptosystem-proposed-by-Nash-61a58972.pdf' node_id='4eb78731-4669-438c-9e2c-c76fcb1c9a52' node_subtype='openai_
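Each input file appears once per workflow node in the output node file details. To see which nodes processed each file, you can group the entries by file ID. This sketch assumes each entry exposes the `file_id`, `node_type`, and `node_subtype` attributes shown in the printed output above; `nodes_by_file` is a hypothetical helper.

```python
from collections import defaultdict


def nodes_by_file(output_node_files):
    """Map each file_id to the (node_type, node_subtype) pairs that produced output for it."""
    grouped = defaultdict(list)
    for item in output_node_files:
        # Entries keep the order in which the job response listed them.
        grouped[item.file_id].append((item.node_type, item.node_subtype))
    return dict(grouped)
```

For example, `for file_id, nodes in nodes_by_file(job_output_node_files).items(): print(file_id, nodes)` prints one line per input file.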

## Step 4: Poll for job completion

In this step, you monitor your job's progress and confirm its completion.

To complete this step, run the following cell, which lets you know how the job is progressing and when the job is completed.

Do not proceed to the next step until you see the message `Job is completed`.

In [None]:
import time

def poll_job_status(client, job_id):
    while True:
        response = client.jobs.get_job(
            request={
                "job_id": job_id
            }
        )

        job = response.job_information

        if job.status == "SCHEDULED":
            print("Job is scheduled, polling again in 10 seconds...")
            time.sleep(10)
        elif job.status == "IN_PROGRESS":
            print("Job is in progress, polling again in 10 seconds...")
            time.sleep(10)
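        elif job.status == "COMPLETED":
            print("Job is completed.")
            break
        else:
            # Any other status (for example, a failed or stopped job) also ends
            # polling, but it does not mean that the job succeeded.
            print(f"Job ended with status: {job.status}")
            break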

    return job

with UnstructuredClient(api_key_auth=UNSTRUCTURED_API_KEY) as client:
    job = poll_job_status(client, job_id)
    print(f"Job details:\n---\n{job.model_dump_json(indent=4)}")

Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is in progress, polling again in 10 seconds...
Job is completed.
Job details:
---
{
    "created_at": "2025-12-10T18:45:45.911031Z",
    "id": "de7b344b-f30a-4739-880b-e7e204d4be4f",
    "status": "COMPLETED",
    "workflow_id": "a61a2082-a37b-4273-96fe-37e69763ff7b",
    "workflow_name": "Job de7b344b",
    "job_type": "ephemeral"
}
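The polling cell above waits indefinitely. If you want an upper bound on the wait, the following sketch adds a timeout. `poll_with_timeout` is a hypothetical helper; it takes a `get_status` callable so it can be tried without a live job, and the 30-minute default is an arbitrary assumption.

```python
import time

# Statuses that mean the job is still running, matching the polling cell above.
RUNNING_STATUSES = {"SCHEDULED", "IN_PROGRESS"}


def poll_with_timeout(get_status, interval=10, timeout=30 * 60,
                      sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a non-running status, then return that status.

    Raises TimeoutError if waiting another interval would pass the timeout.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in RUNNING_STATUSES:
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"Job is still {status} after about {timeout} seconds.")
        print(f"Job is {status}, polling again in {interval} seconds...")
        sleep(interval)


# Possible use with this notebook's client (assumes job.status is a plain string
# or a string enum with a .value attribute):
# with UnstructuredClient(api_key_auth=UNSTRUCTURED_API_KEY) as client:
#     def get_status():
#         job = client.jobs.get_job(request={"job_id": job_id}).job_information
#         return getattr(job.status, "value", job.status)
#     print(poll_with_timeout(get_status))
```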


## Step 5: Download the job's processed results

In this step, you use the on-demand job's job ID and the input file IDs from Step 3 to download the job's results into the `/content/output` folder that you created during this notebook's Requirements.

To complete this step, run the following cell.

In [None]:
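from unstructured_client.models.operations import DownloadJobOutputRequest
import json
import os

output_dir = "/content/output/"

# Create the output folder if it doesn't exist yet (for example, after a Colab reset).
os.makedirs(output_dir, exist_ok=True)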

with UnstructuredClient(api_key_auth=UNSTRUCTURED_API_KEY) as client:
    for job_input_file_id in job_input_file_ids:
        print(f"Attempting to get processed results from file_id '{job_input_file_id}'...")

        response = client.jobs.download_job_output(
            request=DownloadJobOutputRequest(
                job_id=job_id,
                file_id=job_input_file_id
            )
        )

        output_path = os.path.join(output_dir, f"{job_input_file_id}.json")

        with open(output_path, "w") as f:
            json.dump(response.any, f, indent=4)

        print(f"Saved output for file_id '{job_input_file_id}' to '{output_path}'.\n")

Attempting to get processed results from file_id '250713305v1-d88db765.pdf'...
Saved output for file_id '250713305v1-d88db765.pdf' to '/content/output/250713305v1-d88db765.pdf.json'.

Attempting to get processed results from file_id 'H03-Cryptosystem-proposed-by-Nash-61a58972.pdf'...
Saved output for file_id 'H03-Cryptosystem-proposed-by-Nash-61a58972.pdf' to '/content/output/H03-Cryptosystem-proposed-by-Nash-61a58972.pdf.json'.



## Step 6: View the downloaded results

To view the downloaded job's results, do the following:

1. On this notebook's sidebar, click the folder (**Files**) icon, if the **Files** pane is not already shown.
2. In the **Files** pane, click to expand the `output` folder.
3. Double-click one of the files that end in `.json`.
4. The file's contents appear in a pane on the right side of this notebook. You should notice the following:

- Unstructured outputs its results in industry-standard [JSON](https://www.json.org/) format, which is ideal for RAG, agentic AI, and model fine-tuning.
- Each object in the JSON is called a [document element](https://docs.unstructured.io/ui/document-elements) and contains a `text` representation of the content that Unstructured detected for the particular portion of the document that was analyzed.
- The `type` is the category that Unstructured assigns to the document element, such as a title (`Title`), a table (`Table`), an image (`Image`), a series of well-formulated sentences (`NarrativeText`), some kind of free text (`UncategorizedText`), a list item (`ListItem`), and so on. [Learn more](https://docs.unstructured.io/ui/document-elements#element-type).
- The `element_id` is a unique identifier that Unstructured generates to refer to each document element. [Learn more](https://docs.unstructured.io/ui/document-elements#element-id).
- `metadata` contains supporting details about each document element, such as the page number it occurred on, the file it occurred in, and so on. [Learn more](https://docs.unstructured.io/ui/document-elements#metadata).
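Instead of opening each file by hand, you can also tally the element types programmatically. This sketch assumes each downloaded file contains a JSON array of element objects with a `type` key, as described above; `summarize_element_types` is a hypothetical helper.

```python
import json
import os
from collections import Counter


def summarize_element_types(output_dir):
    """Count document element types in each downloaded .json file in output_dir."""
    summary = {}
    for name in sorted(os.listdir(output_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(output_dir, name)) as f:
            elements = json.load(f)
        # Assumes a JSON array of element objects; a missing "type" is counted separately.
        summary[name] = Counter(el.get("type", "<missing>") for el in elements)
    return summary
```

For example, `summarize_element_types("/content/output/")` returns one `Counter` per `.json` file, keyed by file name.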


## Next steps

Congratulations! You have just run an on-demand job with Unstructured.

Learn more about [on-demand jobs](https://docs.unstructured.io/api-reference/workflow/overview#run-an-on-demand-job).

You can also learn more about the [Unstructured API](https://docs.unstructured.io/api-reference/overview).

This notebook shows how to process local files only. To process files (and data) in remote file and blob storage, databases, and vector stores, you must use other workflow operations in the Unstructured API. To learn how, see the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb).