This project classifies text data from a CSV file using DeepSeek V3 in JSON mode (LiteLLM makes it easy to swap in a different LLM). DeepSeek V3 was chosen for its very low cost, high throughput, and permissive rate limits.
Before you begin, make sure you have the following installed:
- uv: This project uses uv for package management. Install it using the following command:

      curl -LsSf https://astral.sh/uv/install.sh | sh

  See the uv documentation for more details.
- Python: Install Python using uv:

      uv python install
- Clone the repository:

      git clone https://github.com/chriscarrollsmith/csv-classifier.git
      cd csv-classifier
- Install the project dependencies using uv:

      uv sync
- Create a .env file in the root directory and add your DeepSeek API key:

      DEEPSEEK_API_KEY=your_deepseek_api_key
- Prepare your input CSV file named input.csv and define your prompt_template and ClassificationResponse model in prompt.py. Ensure that the column names in your CSV match the placeholders in prompt_template. For example, if your prompt uses {item}, your CSV should have a column named item.

  Note: The classifier supports Pydantic models with nested structures (which will be flattened in the output CSV) and Enum fields. If your output CSV contains rows that already have complete observations, these rows will be preserved unchanged when using the output as input for another run. This allows you to process missed items by feeding the output CSV back in as input.
- Run the classifier.py script:

      uv run python classifier.py

This script will read input.csv, classify each item using the LLM, and output the results to output.csv.
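
The behavior described in the setup steps (placeholders matching CSV columns, nested results flattened into columns, and complete rows preserved on re-runs) can be illustrated with a small standard-library sketch. This is not the code in classifier.py: the function names `template_fields`, `flatten`, `classify_csv`, and `fake_classify` are hypothetical, the underscore used to join nested keys is an assumption about the output column naming, and the stub classifier stands in for the real LLM call.

```python
import csv
import io
from string import Formatter
from typing import Any, Callable


def template_fields(template: str) -> set[str]:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def flatten(data: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with underscores.

    The underscore separator is an assumption made for this sketch.
    """
    flat: dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{prefix}{key}_"))
        else:
            flat[f"{prefix}{key}"] = value
    return flat


def classify_csv(
    csv_text: str,
    template: str,
    output_fields: list[str],
    classify: Callable[[str], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Classify incomplete rows and leave complete rows unchanged."""
    reader = csv.DictReader(io.StringIO(csv_text))

    # Every placeholder in the template needs a column of the same name.
    missing = template_fields(template) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns for placeholders: {sorted(missing)}")

    results: list[dict[str, Any]] = []
    for row in reader:
        # Rows whose output columns are all filled in are kept as-is,
        # which is what makes feeding output.csv back in as input safe.
        if all(row.get(field) for field in output_fields):
            results.append(dict(row))
            continue
        prompt = template.format(**row)
        results.append({**row, **flatten(classify(prompt))})
    return results


def fake_classify(prompt: str) -> dict[str, Any]:
    """Stub standing in for the LLM call; returns a nested result."""
    return {"label": "positive", "details": {"confidence": "high"}}


# Hypothetical data: the first row is unclassified, the second is complete.
sample_csv = (
    "item,label,details_confidence\n"
    "Great product,,\n"
    "Arrived broken,negative,high\n"
)
rows = classify_csv(
    sample_csv,
    "Classify this review: {item}",
    ["label", "details_confidence"],
    fake_classify,
)
for row in rows:
    print(row)
```

In this sketch only the first row reaches the classifier; the second row already has values in both output columns and passes through untouched. A check like `template_fields` can also be run on its own before a long job to catch a mismatched column name early.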