This project classifies text data from a CSV file using the OpenAI Batch API. The Batch API runs large workloads asynchronously (typically completing within 24 hours) at half the cost of sending the same requests to the standard OpenAI API.
Before you begin, make sure you have the following installed:

- **uv**: This project uses `uv` for package management. Install it using the following command:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

  See the [uv documentation](https://docs.astral.sh/uv/) for more details.

- **Python**: Install Python using `uv`:

  ```bash
  uv python install
  ```
- Clone the repository:

  ```bash
  git clone https://github.com/chriscarrollsmith/batch-classifier.git
  cd batch-classifier
  ```

- Install the project dependencies using `uv`:

  ```bash
  uv sync
  ```

- Create a `.env` file in the root directory and add your OpenAI API key:

  ```
  OPENAI_API_KEY=your_openai_api_key
  ```
- Prepare your input CSV file named `input.csv` and define your `prompt_template` and `ClassificationResponse` model in `prompt.py`. Ensure that the column names in your CSV match the placeholders in `prompt_template`. For example, if your prompt uses `{item}`, your CSV should have a column named `item`.

  Note: The classifier supports Pydantic models with nested structures (which will be flattened in the output CSV) and Enum fields.
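As an illustration of what `prompt.py` might contain (the field names, category labels, and prompt wording below are hypothetical, not the project's own):

```python
from enum import Enum
from pydantic import BaseModel

# Hypothetical prompt.py contents. The placeholder {item} must match a
# column named "item" in input.csv.
prompt_template = "Classify the sentiment of the following item: {item}"

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class ClassificationResponse(BaseModel):
    sentiment: Sentiment  # Enum fields are supported
    confidence: float     # nested models would be flattened in output.csv
```

A CSV for this template would need an `item` column, since `{item}` is its only placeholder.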
- Run the `submit_batch.py` script:

  ```bash
  uv run python submit_batch.py
  ```

  This script reads `input.csv`, constructs a batch request to classify each item using the LLM, and outputs helper files called `batch_info.json` and `batch_results.jsonl` to track the batch request and process the results.

- Run the `process_batch.py` script:

  ```bash
  uv run python process_batch.py
  ```

  This script reads `batch_info.json` and `batch_results.jsonl`, makes a request to retrieve the batch results, and processes the returned results into `output.csv`. If the batch request is still in progress, the script waits for it to complete (polling every 30 seconds) and then processes the results.
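For context on what the submit step produces, each line of an OpenAI Batch API input file is a JSON request object. A sketch of one such line follows; the `custom_id` scheme and model name are assumptions, not necessarily what `submit_batch.py` emits:

```python
import json

# One hypothetical line of a Batch API input JSONL file.
request = {
    "custom_id": "row-0",           # ties the result back to a CSV row
    "method": "POST",
    "url": "/v1/chat/completions",  # endpoint each batched request targets
    "body": {
        "model": "gpt-4o-mini",     # illustrative model choice
        "messages": [
            {"role": "user", "content": "Classify the following item: example item"}
        ],
    },
}
line = json.dumps(request)  # each request is serialized as one JSONL line
```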
If your output CSV contains rows that already have complete observations, these rows will be preserved unchanged when using the output as input for another run. This allows you to process missed items by feeding the output CSV back in as input.
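The preserve-and-retry behavior described above can be sketched roughly as follows; this is a simplified illustration with hypothetical column names, not the project's actual merge logic:

```python
# Simplified sketch of the preserve-complete-rows idea: rows whose result
# columns are already filled are kept unchanged, and only incomplete rows
# need (re)classification. Column names here are hypothetical.
def split_rows(rows, result_columns):
    complete, incomplete = [], []
    for row in rows:
        if all(row.get(col) for col in result_columns):
            complete.append(row)    # already classified: preserve as-is
        else:
            incomplete.append(row)  # missing a result: resubmit
    return complete, incomplete

rows = [
    {"item": "good product", "sentiment": "positive"},
    {"item": "bad product", "sentiment": ""},  # missed in a previous run
]
complete, incomplete = split_rows(rows, ["sentiment"])
```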