# Unstructured Partition Endpoint Quickstart

This notebook shows how to use the [Unstructured Python SDK](https://docs.unstructured.io/api-reference/partition/sdk-python) to process local files with the [Unstructured Partition Endpoint](https://docs.unstructured.io/api-reference/partition/overview).

---

📝 **Note**: The Unstructured Partition Endpoint, as described in this notebook, is intended only for rapid prototyping of some of Unstructured's [partitioning](https://docs.unstructured.io/api-reference/partition/partitioning) strategies, with limited support for [chunking](https://docs.unstructured.io/api-reference/partition/chunking). It works *only with local files*.

For production-level scenarios, switch to the [Unstructured Workflow Endpoint](https://docs.unstructured.io/api-reference/workflow/overview). It supports processing files in batches, files and data in remote locations, full [chunking](https://docs.unstructured.io/ui/chunking), generating [embeddings](https://docs.unstructured.io/ui/embedding), applying post-transform [enrichments](https://docs.unstructured.io/ui/enriching/overview), the latest and highest-performing models, and much more. [Get started](https://docs.unstructured.io/api-reference/workflow/overview).

---

## Requirements

To run this notebook, you will need:

- An Unstructured account. To sign up for an account, go to https://unstructured.io. In the top navigation bar, click **Get started for free**, and follow the on-screen directions to finish signing up. After you sign up, you are immediately signed in to your new account, at https://platform.unstructured.io.
- An Unstructured API key, as follows:

  1. After you are signed in to your account, click **API Keys**.
  2. Click **Generate New Key**.
  3. Enter a meaningful display name for the key, and then click **Continue**.
  4. Next to the new key's name, click the **Copy** icon. The key's value is copied to your system's clipboard. If you lose this key, simply return to the list and click **Copy** again.

- One or more local files for Unstructured to process. This notebook assumes that the local files you want to process are in a folder that is accessible from this notebook. The easiest and fastest way to create this folder is as follows:

  1. On this notebook's sidebar, click the folder (**Files**) icon.
  2. Right-click the folder with two dots showing after it, and then click **New folder**.
  3. Enter a name for the new folder. This notebook assumes the folder is named `input`.
  4. To upload files to this folder, do the following:

     a. Rest your mouse pointer on the `input` folder.<br/>
     b. Click the ellipsis (three dots) icon, and then click **Upload**.<br/>
     c. Browse to and select the files on your local machine that you want to upload to this `input` folder.<br/>

- A destination folder for Unstructured to send its processed results to. This notebook assumes that the destination folder is accessible from this notebook. The easiest and fastest way to create this folder is as follows:

  1. On this notebook's sidebar, click the folder (**Files**) icon.
  2. Right-click the folder with two dots showing after it, and then click **New folder**.
  3. Enter a name for the new destination folder. This notebook assumes the folder is named `output`.
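
As an alternative to the sidebar steps above, the two folders could be created from a code cell. The following is a minimal sketch using only the Python standard library; the helper name `ensure_folders` is hypothetical and not part of the Unstructured SDK.

```python
import os

def ensure_folders(base_dir=".", names=("input", "output")):
    """Create each named folder under base_dir if it does not already exist.

    Returns the list of folder paths, in the order given.
    """
    paths = []
    for name in names:
        path = os.path.join(base_dir, name)
        # exist_ok=True makes the call safe to repeat in the same session.
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_folders()` from a cell would create `input` and `output` next to the notebook. You would still need to upload your files into `input` as described above.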

---

⚠️ **Warning**: Any files that you upload to these `input` or `output` folders will be deleted whenever Google Colab disconnects or resets, for example due to inactivity, manual restart, or session timeout.

---


## Step 1: Install the Unstructured Python SDK

Run the following cell to install the Unstructured Python SDK on a virtual machine (VM) in Google's cloud. This VM is associated with this notebook.

In [None]:
!pip install unstructured-client

## Step 2: Set your Unstructured API key

In the following cell, replace `<unstructured-api-key>` with the value of your API key, and then run the cell.

As a security best practice, you would typically store this key elsewhere (for example, in an environment variable or a secure key vault) and read it programmatically here. To keep this demonstration simple, specify your API key in plaintext in the following cell.

In [2]:
UNSTRUCTURED_API_KEY = "<unstructured-api-key>"
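
The environment-variable approach mentioned above might look like the following minimal sketch. The helper name `load_api_key` and the variable name `UNSTRUCTURED_API_KEY` are assumptions chosen for illustration; neither is defined by the Unstructured SDK.

```python
import os

def load_api_key(env_var="UNSTRUCTURED_API_KEY"):
    """Return the API key stored in the named environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing
    key fails early with a readable message.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Unstructured API key."
        )
    return key
```

With the variable set in your environment, `UNSTRUCTURED_API_KEY = load_api_key()` could replace the plaintext assignment.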

## Step 3: Call the Unstructured Partition Endpoint to process the files

Run the following cell. If successful, new files are added to the `output` folder. It could take a few seconds to a minute or more for these new files to appear, depending on the number, size, and complexity of the files that you specified. Each new file is named after its input file, with `.json` appended. For example, an input file named `report.pdf` produces `report.pdf.json`.

In [3]:
import asyncio
import os
import json
import unstructured_client
from unstructured_client.models import shared, errors

client = unstructured_client.UnstructuredClient(
    api_key_auth=UNSTRUCTURED_API_KEY
)

async def partition_file_via_api(filename):
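    # Read the file's bytes up front so the file handle is closed
    # before the API call is made.
    with open(filename, "rb") as f:
        content = f.read()

    req = {
        "partition_parameters": {
            "files": {
                "content": content,
                "file_name": os.path.basename(filename),
            },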
            "strategy": shared.Strategy.AUTO,
            "vlm_model": "gpt-4o",
            "vlm_model_provider": "openai",
            "languages": ['eng'],
            "split_pdf_page": True,
            "split_pdf_allow_failed": True,
            "split_pdf_concurrency_level": 15
        }
    }

    try:
        res = await client.general.partition_async(request=req)
        return res.elements
    except errors.UnstructuredClientError as e:
        print(f"Error partitioning {filename}: {e.message}")
        return []

async def process_file_and_save_result(input_filename, output_dir):
    elements = await partition_file_via_api(input_filename)

    if elements:
        results_name = f"{os.path.basename(input_filename)}.json"
        output_filename = os.path.join(output_dir, results_name)

        with open(output_filename, "w") as f:
            json.dump(elements, f, indent=4)

def load_filenames_in_directory(input_dir):
    filenames = []
    for root, _, files in os.walk(input_dir):
        for file in files:
            if not file.endswith('.json'):
                filenames.append(os.path.join(root, file))

    return filenames

async def process_files():
    input_dir = "./input/"
    output_dir = "./output/"

    filenames = load_filenames_in_directory(input_dir)

    os.makedirs(output_dir, exist_ok=True)

    tasks = []

    for filename in filenames:
        tasks.append(
            process_file_and_save_result(filename, output_dir)
        )

    await asyncio.gather(*tasks)

await process_files()
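
Because a failed partition prints an error and writes no output file, it can help to check which inputs have no result yet. The following is a minimal sketch that mirrors the `<input name>.json` naming used in the cell above; the helper name `find_missing_outputs` is hypothetical.

```python
import os

def find_missing_outputs(input_dir="./input/", output_dir="./output/"):
    """Return the input file paths that have no matching .json file in output_dir."""
    missing = []
    for root, _, files in os.walk(input_dir):
        for name in files:
            if name.endswith(".json"):
                # The processing cell skips .json inputs, so skip them here too.
                continue
            expected = os.path.join(output_dir, f"{name}.json")
            if not os.path.isfile(expected):
                missing.append(os.path.join(root, name))
    return sorted(missing)
```

An empty list from `find_missing_outputs()` suggests that every input file produced a result.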

## Step 4: View the results

In the **Files** pane on the left, double-click any of the new files with the extension `.json` that are within the `output` folder. A display pane appears on the right, showing the file's contents.
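
To inspect a result programmatically instead, you could load the JSON and count the elements by type. The following is a minimal sketch that assumes each output file holds a JSON list of element dictionaries with a `type` key; the helper name `summarize_elements` is hypothetical.

```python
import json
from collections import Counter

def summarize_elements(json_path):
    """Count the elements in one output file, grouped by their "type" value."""
    with open(json_path, "r", encoding="utf-8") as f:
        elements = json.load(f)
    # Elements without a "type" key (an assumption-breaking case) are
    # counted under "unknown" rather than raising an error.
    return Counter(element.get("type", "unknown") for element in elements)
```

For example, `summarize_elements("./output/report.pdf.json")` would return a `Counter` that maps each element type found in that file to its count, where `report.pdf.json` stands in for one of your own output files.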

## Learn more

- For a version of this notebook's code that you can run on your own local development machine, see the [Unstructured API Quickstart](https://docs.unstructured.io/api-reference/partition/quickstart).
- [Unstructured Python SDK](https://docs.unstructured.io/api-reference/partition/sdk-python)
- [Unstructured Partition Endpoint](https://docs.unstructured.io/api-reference/partition/overview)
- [Unstructured documentation](https://docs.unstructured.io)