Multi-Label Classification on Consumer Hardware

This project demonstrates multi-label text classification with two model families, mDeBERTa-v3-base and GPT-OSS-20B, trained and evaluated on consumer hardware. It covers synthetic data generation, model training, inference, and evaluation on the EuroChef+ multilingual customer support dataset.

Overview

The project compares three approaches:

mDeBERTa-v3-base: Fine-tuned multilingual transformer (Microsoft)
GPT-OSS-20B (Base): Zero-shot inference using OpenAI's GPT-OSS-20B
GPT-OSS-20B (LoRA): LoRA fine-tuned adapter on GPT-OSS-20B

Key Results:

Best F1 Micro: mDeBERTa-v3-base (0.8097)
Best Exact Match: GPT-OSS-20B + LoRA (0.4094)
Fastest Inference: mDeBERTa-v3-base (235 samples/s)

Project Structure

📁 Notebooks

mDeBERTa Experiments

1-train.ipynb: Complete training pipeline for mDeBERTa-v3-base including data preprocessing, model configuration, and training on multilingual customer support data
2-inference.ipynb: Run inference using the trained mDeBERTa model
3-evaluate.ipynb: Comprehensive evaluation with metrics (F1, precision, recall, exact match) and per-label analysis
evaluation_results.md: Detailed results showing F1 Micro: 0.8097, throughput: 235 samples/s

GPT-OSS-20B Experiments

1-test_oss.ipynb: Initial testing and exploration of GPT-OSS-20B model capabilities
2-evaluate_base.ipynb: Evaluation of base GPT-OSS-20B without fine-tuning (zero-shot)
3-finetune_lora.ipynb: LoRA fine-tuning pipeline for GPT-OSS-20B with optimized hyperparameters
4-evaluate_lora.ipynb: Evaluation of LoRA fine-tuned GPT-OSS-20B model
evaluation_results_base.md: Base model results (F1 Micro: 0.5751)
evaluation_results_lora.md: LoRA model results (F1 Micro: 0.8018, Exact Match: 0.4094)

Analysis

comparison.ipynb: Side-by-side comparison of all three models with performance analysis
comparison_results.md: Summary comparison table with metrics across all models

📁 Data Generation

synthetic_gen.py: Multilingual synthetic data generator supporting OpenAI and Gemini APIs

📁 Deployment

dockerfile: Docker configuration for deployment

Features

Synthetic Data Generation

Multi-Provider Support: Generate data using OpenAI or Google Gemini APIs
Structured Output: JSON schema-validated responses using Pydantic models
Context-Aware: Automatically includes existing messages to avoid duplicates and maintain variety
Multilingual: Supports English, French, Dutch, and German with culturally appropriate language patterns
Flexible CLI: Comprehensive command-line interface for customization
Batch Generation: Generate multiple batches with configurable parameters

Model Training & Evaluation

Multi-label Classification: 15 labels including sentiment, priority, user type, and issue categories
Consumer Hardware Optimized: All training done on consumer GPUs using efficient techniques
LoRA Fine-tuning: Memory-efficient adapter-based fine-tuning for large models
Comprehensive Metrics: F1 (micro/macro/weighted), precision, recall, exact match, Hamming loss
Per-label Analysis: Detailed performance breakdown for each classification label
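
The corpus-level metrics listed above can be sketched in plain Python. This is a minimal illustration, not the code in the evaluation notebooks: the function name `multilabel_metrics` and the two toy samples are hypothetical, and labels are represented as sets of strings. scikit-learn provides equivalent functions (`f1_score` with `average="micro"`, `hamming_loss`) for multi-hot arrays.

```python
from typing import Dict, List, Set


def multilabel_metrics(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1, exact match, and Hamming loss."""
    tp = fp = fn = 0
    exact = 0
    wrong_slots = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)          # labels predicted and present
        fp += len(pred - truth)          # labels predicted but absent
        fn += len(truth - pred)          # labels present but missed
        exact += int(truth == pred)      # whole label set must match
        wrong_slots += len(truth ^ pred) # per-label disagreements
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = len(y_true)
    return {
        "precision_micro": precision,
        "recall_micro": recall,
        "f1_micro": f1,
        "exact_match": exact / n,
        # Fraction of (sample, label) slots that are wrong.
        "hamming_loss": wrong_slots / (n * len(labels)),
    }


# Toy data (hypothetical), restricted to four labels for readability.
labels = ["technical_issue", "premium_user", "frustrated", "normal"]
y_true = [{"technical_issue", "premium_user", "normal"}, {"frustrated"}]
y_pred = [{"technical_issue", "premium_user", "frustrated"}, {"frustrated"}]

metrics = multilabel_metrics(y_true, y_pred, labels)
print(metrics)
```

Exact match only credits a sample when every label is right, which is why it sits well below F1 Micro for all three models.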

Installation & Setup

Prerequisites

Python 3.8+
CUDA-capable GPU (recommended for training)
16GB+ RAM for LoRA fine-tuning

1. Clone the repository

git clone <repository-url>
cd local_oss

2. Install dependencies

For data generation:

pip install openai google-genai pydantic

For model training and evaluation:

pip install torch transformers datasets evaluate scikit-learn peft accelerate

3. Set up API keys (for synthetic data generation)

export OPENAI_API_KEY="your-openai-api-key"
export GEMINI_API_KEY="your-gemini-api-key"

Quick Start

Generate Synthetic Data

Basic usage with OpenAI:

cd synthetic_data
python synthetic_gen.py

Using Gemini:

python synthetic_gen.py --provider gemini

Train Models

mDeBERTa:

Open mdeberta/1-train.ipynb
Run all cells to train the model
Model will be saved and optionally pushed to Hugging Face Hub

GPT-OSS-20B with LoRA:

Open oss20b/3-finetune_lora.ipynb
Configure LoRA parameters in the notebook
Run training cells
Adapter will be saved locally and optionally pushed to Hub

Run Evaluation

Each model has a dedicated evaluation notebook:

mDeBERTa: mdeberta/3-evaluate.ipynb
GPT-OSS-20B Base: oss20b/2-evaluate_base.ipynb
GPT-OSS-20B LoRA: oss20b/4-evaluate_lora.ipynb

Compare all models: analysis/comparison.ipynb

Synthetic Data Generation

Command-Line Options

Option	Short	Default	Description
`--provider`	`-p`	`openai`	API provider (`openai` or `gemini`)
`--model`	`-m`	Auto	Model to use (provider-specific)
`--num-messages`	`-n`	`40`	Messages per batch
`--batches`	`-b`	`1`	Number of batches to generate
`--french`		`12`	French messages per batch
`--dutch`		`12`	Dutch messages per batch
`--english`		`12`	English messages per batch
`--german`		`4`	German messages per batch
`--temperature`	`-t`	`0.8`/`0.6`	Generation temperature
`--output`	`-o`	`customer_support_messages.jsonl`	Output file path
`--no-existing`		`False`	Do not include existing messages in the prompt
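
For reference, the option table above maps onto `argparse` roughly as follows. This is a sketch reconstructed from the table, not the actual contents of synthetic_gen.py; `build_parser` is a hypothetical name, and the temperature default is left as `None` because the table lists two values (`0.8`/`0.6`) without stating which applies when.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Generate multilingual synthetic customer support messages."
    )
    parser.add_argument("--provider", "-p", choices=["openai", "gemini"], default="openai")
    # None stands for "Auto": the script picks a provider-specific model.
    parser.add_argument("--model", "-m", default=None)
    parser.add_argument("--num-messages", "-n", type=int, default=40)
    parser.add_argument("--batches", "-b", type=int, default=1)
    parser.add_argument("--french", type=int, default=12)
    parser.add_argument("--dutch", type=int, default=12)
    parser.add_argument("--english", type=int, default=12)
    parser.add_argument("--german", type=int, default=4)
    # Assumption: None means "use the script's own default (0.8 or 0.6)".
    parser.add_argument("--temperature", "-t", type=float, default=None)
    parser.add_argument("--output", "-o", default="customer_support_messages.jsonl")
    parser.add_argument("--no-existing", action="store_true")
    return parser


args = build_parser().parse_args(["--provider", "gemini", "--batches", "5"])
per_language_total = args.french + args.dutch + args.english + args.german
print(args.provider, args.batches, args.num_messages, per_language_total)
```

Note that the default per-language counts (12 + 12 + 12 + 4) add up to the default `--num-messages` value of 40.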

Usage Examples

Generate multiple batches:

python synthetic_gen.py --provider openai --batches 5

Customize language distribution:

python synthetic_gen.py --french 20 --dutch 10 --english 8 --german 2

Use a specific model:

python synthetic_gen.py --provider openai --model gpt-4o-mini
python synthetic_gen.py --provider gemini --model gemini-2.0-flash-exp

Adjust temperature for more/less creative outputs:

python synthetic_gen.py --temperature 0.9

Custom output file:

python synthetic_gen.py --output my_custom_dataset.jsonl

Dataset

EuroChef+ Customer Support Dataset

Source: BenTouss/eurochef-cs
Languages: English, French, Dutch, German
Labels (15): technical_issue, feature_request, content_request, content_quality, account_management, refund_request, normal, frustrated, positive, low_priority, premium_user, enterprise, trial_user, churn_risk, payment_issue
Test Set: 127 samples
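
With 15 independent labels, an encoder classifier such as mDeBERTa typically emits one logit per label, and each logit is passed through a sigmoid and thresholded separately. The sketch below illustrates that decoding step only. The 0.5 threshold, the label order, the function names, and the example logits are assumptions for illustration; the notebooks may use different choices.

```python
import math
from typing import List, Sequence

# Label names from the dataset description; the order is an assumption.
LABELS = [
    "technical_issue", "feature_request", "content_request", "content_quality",
    "account_management", "refund_request", "normal", "frustrated", "positive",
    "low_priority", "premium_user", "enterprise", "trial_user", "churn_risk",
    "payment_issue",
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_logits(
    logits: Sequence[float], labels: Sequence[str] = LABELS, threshold: float = 0.5
) -> List[str]:
    """Return every label whose sigmoid probability reaches the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical logits: three labels clearly or marginally "on", the rest "off".
example_logits = [-3.0] * len(LABELS)
example_logits[LABELS.index("technical_issue")] = 2.0
example_logits[LABELS.index("frustrated")] = 0.4
example_logits[LABELS.index("premium_user")] = 1.2

decoded = decode_logits(example_logits)
print(decoded)
```

Unlike softmax classification, any number of labels (including none) can be active for a single message.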

Model Performance

Model	F1 Micro	Exact Match	Latency (ms)	Size
mDeBERTa-v3-base	0.8097	0.3543	4.26	278M params
GPT-OSS-20B (Base)	0.5751	0.0079	8199.33	20B params
GPT-OSS-20B (LoRA)	0.8018	0.4094	740.41	20B + adapters

Key Takeaways:

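mDeBERTa offers the best balance of accuracy and speed for production deployment: the highest F1 Micro (0.8097) at the lowest latency (4.26 ms)
LoRA fine-tuning raises GPT-OSS-20B F1 Micro from 0.5751 to 0.8018, a 39% relative increase
The LoRA model achieves the highest exact match rate (0.4094), which matters most when predictions are used for automation without human review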
Consumer hardware is viable for training competitive models
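
The headline figures can be checked with a few lines of arithmetic. This sketch assumes the latency column is per sample and that samples are processed one after another, which is consistent with the 235 samples/s throughput quoted for mDeBERTa.

```python
# Values copied from the Model Performance table.
base_f1 = 0.5751
lora_f1 = 0.8018

absolute_gain = lora_f1 - base_f1       # in F1 points
relative_gain = absolute_gain / base_f1  # as a fraction of the base score


def throughput(latency_ms: float) -> float:
    """Samples per second, assuming one sample per latency interval."""
    return 1000.0 / latency_ms


print(f"Absolute F1 gain: {absolute_gain:.4f}")
print(f"Relative F1 gain: {relative_gain:.1%}")
print(f"mDeBERTa throughput: {throughput(4.26):.0f} samples/s")
print(f"GPT-OSS-20B (LoRA) throughput: {throughput(740.41):.2f} samples/s")
```

The 39% figure is therefore a relative improvement; the absolute improvement is about 0.23 F1 points.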

Technical Details

LoRA Configuration

Rank: 32
Alpha: 64
Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
Dropout: 0.05
Trainable Parameters: ~0.2% of base model
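
To make the rank and alpha values concrete, the sketch below writes the configuration as keyword arguments using the parameter names of `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) and computes two derived quantities for standard LoRA. The helper names and the 4096 x 4096 matrix size are hypothetical; the example does not try to reproduce the ~0.2% figure, which depends on the full model architecture.

```python
# Values from the LoRA Configuration list above.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return lora_alpha / r


def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one weight matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scaling = lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"])

# Hypothetical square projection, chosen only to show the order of magnitude.
d_in = d_out = 4096
added = lora_added_params(d_in, d_out, LORA_KWARGS["r"])
full = d_in * d_out

print(f"Scaling factor: {scaling}")
print(f"Adapter params for one {d_out}x{d_in} matrix: {added} ({added / full:.2%} of {full})")
```

With alpha set to twice the rank, the adapter update is scaled by 2.0 before being added to the frozen weights.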

Trained Models

All models are available on Hugging Face:

mDeBERTa: BenTouss/mdeberta-eurochef
GPT-OSS-20B LoRA: BenTouss/oss20b-eurochef-lora

Output Format

Synthetic Data Output

Messages are saved in JSONL format with the following structure:

{
  "message": "Bonjour, j'ai un problème avec...",
  "language": "French",
  "tags": ["technical_issue", "urgent", "premium_user", "frustrated"]
}
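
A JSONL file in this format can be read one line at a time with the standard library. The sketch below is an illustration under the structure shown above; `parse_jsonl` is a hypothetical helper, and the second sample record is invented for the example.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List

REQUIRED_KEYS = {"message", "language", "tags"}


def parse_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per non-empty line and check the expected keys."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        records.append(record)
    return records


# In-memory sample; the first record is from the example above,
# the second is hypothetical.
sample_lines = [
    json.dumps({
        "message": "Bonjour, j'ai un problème avec...",
        "language": "French",
        "tags": ["technical_issue", "urgent", "premium_user", "frustrated"],
    }, ensure_ascii=False),
    json.dumps({
        "message": "The app crashes on startup.",
        "language": "English",
        "tags": ["technical_issue"],
    }),
]

records = parse_jsonl(sample_lines)
language_counts = Counter(r["language"] for r in records)
tag_counts = Counter(tag for r in records for tag in r["tags"])
print(language_counts)
print(tag_counts.most_common(3))
```

To read a generated file, pass an open file handle instead of the list, for example `parse_jsonl(open("customer_support_messages.jsonl", encoding="utf-8"))`.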

Prediction Output Format

Each evaluated prediction is recorded alongside its ground truth and per-sample metrics, using the following structure:

{
  "predictions": ["technical_issue", "premium_user", "frustrated"],
  "ground_truth": ["technical_issue", "premium_user", "normal"],
  "f1_score": 0.67,
  "exact_match": false
}
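
The per-sample values in this record can be reproduced from the two label lists. The following sketch assumes the per-sample F1 is computed over label sets; `sample_scores` is a hypothetical helper, not a function from the notebooks.

```python
import json
from typing import Dict, List


def sample_scores(predictions: List[str], ground_truth: List[str]) -> Dict:
    """Set-based precision/recall/F1 and exact match for a single sample."""
    pred, truth = set(predictions), set(ground_truth)
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "predictions": predictions,
        "ground_truth": ground_truth,
        "f1_score": round(f1, 2),
        "exact_match": pred == truth,
    }


result = sample_scores(
    ["technical_issue", "premium_user", "frustrated"],
    ["technical_issue", "premium_user", "normal"],
)
print(json.dumps(result, indent=2))
```

Two of the three predicted labels are correct and two of the three true labels are recovered, so precision and recall are both 2/3 and the F1 rounds to 0.67. One wrong label is enough to make the exact match fail.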

Available Classification Labels

Problem Categories:

technical_issue - App/streaming problems, bugs, crashes
billing - Payment issues, subscription questions
account_management - Login, profile, settings
content_request - Requests for specific recipes/content
feature_request - Suggestions for new features
content_quality - Feedback on recipe quality
refund_request - Request for money back
payment_issue - Billing/payment problems

Sentiment:

frustrated - Negative emotional tone
positive - Positive feedback
normal - Neutral tone

Priority:

low_priority - Can wait for resolution
(Normal priority is default, not labeled)

User Type:

premium_user - Paid subscriber
enterprise - Business account
trial_user - Free trial period
churn_risk - Likely to cancel subscription

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT
