This project demonstrates multi-label text classification using different model architectures (mDeBERTa-v3-base and GPT-OSS-20B) fine-tuned on consumer hardware. It includes synthetic data generation, model training, inference, and comprehensive evaluation on the EuroChef+ multilingual customer support dataset.
The project compares three approaches:
- mDeBERTa-v3-base: Fine-tuned multilingual transformer (Microsoft)
- GPT-OSS-20B (Base): Zero-shot inference using OpenAI's GPT-OSS-20B
- GPT-OSS-20B (LoRA): LoRA fine-tuned adapter on GPT-OSS-20B
Key Results:
- Best overall F1: mDeBERTa-v3-base (F1 micro: 0.8097)
- Best Exact Match: GPT-OSS-20B + LoRA (0.4094)
- Fastest Inference: mDeBERTa-v3-base (235 samples/s)
- 1-train.ipynb: Complete training pipeline for mDeBERTa-v3-base including data preprocessing, model configuration, and training on multilingual customer support data
- 2-inference.ipynb: Run inference using the trained mDeBERTa model
- 3-evaluate.ipynb: Comprehensive evaluation with metrics (F1, precision, recall, exact match) and per-label analysis
- evaluation_results.md: Detailed results showing F1 Micro: 0.8097, throughput: 235 samples/s
- 1-test_oss.ipynb: Initial testing and exploration of GPT-OSS-20B model capabilities
- 2-evaluate_base.ipynb: Evaluation of base GPT-OSS-20B without fine-tuning (zero-shot)
- 3-finetune_lora.ipynb: LoRA fine-tuning pipeline for GPT-OSS-20B with optimized hyperparameters
- 4-evaluate_lora.ipynb: Evaluation of LoRA fine-tuned GPT-OSS-20B model
- evaluation_results_base.md: Base model results (F1 Micro: 0.5751)
- evaluation_results_lora.md: LoRA model results (F1 Micro: 0.8018, Exact Match: 0.4094)
- comparison.ipynb: Side-by-side comparison of all three models with performance analysis
- comparison_results.md: Summary comparison table with metrics across all models
- synthetic_gen.py: Multilingual synthetic data generator supporting OpenAI and Gemini APIs
- dockerfile: Docker configuration for deployment
- Multi-Provider Support: Generate data using OpenAI or Google Gemini APIs
- Structured Output: JSON schema-validated responses using Pydantic models
- Context-Aware: Automatically includes existing messages to avoid duplicates and maintain variety
- Multilingual: Supports English, French, Dutch, and German with culturally appropriate language patterns
- Flexible CLI: Comprehensive command-line interface for customization
- Batch Generation: Generate multiple batches with configurable parameters
- Multi-label Classification: 15 labels including sentiment, priority, user type, and issue categories
- Consumer Hardware Optimized: All training done on consumer GPUs using efficient techniques
- LoRA Fine-tuning: Memory-efficient adapter-based fine-tuning for large models
- Comprehensive Metrics: F1 (micro/macro/weighted), precision, recall, exact match, Hamming loss
- Per-label Analysis: Detailed performance breakdown for each classification label
- Python 3.8+
- CUDA-capable GPU (recommended for training)
- 16GB+ RAM for LoRA fine-tuning
```bash
git clone <repository-url>
cd local_oss
```

For data generation:

```bash
pip install openai google-genai pydantic
```

For model training and evaluation:

```bash
pip install torch transformers datasets evaluate scikit-learn peft accelerate
```

Set the API key for the provider you plan to use for data generation:

```bash
export OPENAI_API_KEY="your-openai-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
```

Basic usage with OpenAI:

```bash
cd synthetic_data
python synthetic_gen.py
```

Using Gemini:

```bash
python synthetic_gen.py --provider gemini
```

mDeBERTa:
- Open mdeberta/1-train.ipynb
- Run all cells to train the model
- Model will be saved and optionally pushed to Hugging Face Hub
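The notebook covers preprocessing, training, and saving; the core setup is standard multi-label fine-tuning with transformers. A minimal sketch, assuming the microsoft/mdeberta-v3-base checkpoint and the 15-label EuroChef+ label set (the notebook's exact arguments may differ):

```python
# Minimal multi-label setup sketch; see mdeberta/1-train.ipynb for the full pipeline.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 15  # EuroChef+ label set

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # BCE loss over independent labels
)
# Labels must be multi-hot float vectors, e.g. [1.0, 0.0, ..., 1.0], one entry per label.
```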
GPT-OSS-20B with LoRA:
- Open oss20b/3-finetune_lora.ipynb
- Configure LoRA parameters in the notebook
- Run training cells
- Adapter will be saved locally and optionally pushed to Hub
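For inference, the saved adapter is attached to the base model with peft. A hedged sketch, assuming the base checkpoint id openai/gpt-oss-20b and the adapter repo listed later in this README (swap in your local adapter directory if you did not push to the Hub):

```python
# Sketch: attach the trained LoRA adapter to the base model with peft.
# The base checkpoint id and adapter repo are assumptions; adjust to your setup.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)

# Point this at the local adapter directory or the Hub repo it was pushed to.
model = PeftModel.from_pretrained(base_model, "BenTouss/oss20b-eurochef-lora")
model.eval()
```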
Each model has a dedicated evaluation notebook:
- mDeBERTa: mdeberta/3-evaluate.ipynb
- GPT-OSS-20B Base: oss20b/2-evaluate_base.ipynb
- GPT-OSS-20B LoRA: oss20b/4-evaluate_lora.ipynb
Compare all models: analysis/comparison.ipynb
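All evaluation notebooks report the same multi-label metrics. A minimal sketch of how these can be computed with scikit-learn on multi-hot prediction and ground-truth arrays (the notebooks may organize this differently):

```python
# Sketch of the reported multi-label metrics with scikit-learn;
# y_true and y_pred are multi-hot arrays of shape (n_samples, n_labels).
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, precision_score, recall_score

y_true = np.array([[1, 0, 1], [0, 1, 0]])  # toy example with 3 labels
y_pred = np.array([[1, 0, 0], [0, 1, 1]])

metrics = {
    "f1_micro": f1_score(y_true, y_pred, average="micro"),
    "f1_macro": f1_score(y_true, y_pred, average="macro"),
    "f1_weighted": f1_score(y_true, y_pred, average="weighted"),
    "precision_micro": precision_score(y_true, y_pred, average="micro"),
    "recall_micro": recall_score(y_true, y_pred, average="micro"),
    "exact_match": float((y_true == y_pred).all(axis=1).mean()),
    "hamming_loss": hamming_loss(y_true, y_pred),
}
print(metrics)
```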
| Option | Short | Default | Description |
|---|---|---|---|
| --provider | -p | openai | API provider (openai or gemini) |
| --model | -m | Auto | Model to use (provider-specific) |
| --num-messages | -n | 40 | Messages per batch |
| --batches | -b | 1 | Number of batches to generate |
| --french | | 12 | French messages per batch |
| --dutch | | 12 | Dutch messages per batch |
| --english | | 12 | English messages per batch |
| --german | | 4 | German messages per batch |
| --temperature | -t | 0.8/0.6 | Generation temperature |
| --output | -o | customer_support_messages.jsonl | Output file path |
| --no-existing | | False | Skip existing messages in prompt |
Generate multiple batches:

```bash
python synthetic_gen.py --provider openai --batches 5
```

Customize language distribution:

```bash
python synthetic_gen.py --french 20 --dutch 10 --english 8 --german 2
```

Use a specific model:

```bash
python synthetic_gen.py --provider openai --model gpt-4o-mini
python synthetic_gen.py --provider gemini --model gemini-2.0-flash-exp
```

Adjust temperature for more/less creative outputs:

```bash
python synthetic_gen.py --temperature 0.9
```

Custom output file:

```bash
python synthetic_gen.py --output my_custom_dataset.jsonl
```

EuroChef+ Customer Support Dataset
- Source: BenTouss/eurochef-cs
- Languages: English, French, Dutch, German
- Labels (15): technical_issue, feature_request, content_request, content_quality, account_management, refund_request, normal, frustrated, positive, low_priority, premium_user, enterprise, trial_user, churn_risk, payment_issue
- Test Set: 127 samples
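The dataset can be pulled straight from the Hub with the datasets library. A minimal sketch, with split and column names assumed from the JSONL structure described further below (check the dataset card for the actual schema):

```python
# Sketch: load the EuroChef+ dataset from the Hugging Face Hub.
# The split and column names are assumptions; check the dataset card for the real schema.
from datasets import load_dataset

ds = load_dataset("BenTouss/eurochef-cs")
print(ds)              # inspect available splits and columns
print(ds["train"][0])  # e.g. {"message": ..., "language": ..., "tags": [...]}
```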
| Model | F1 Micro | Exact Match | Latency (ms) | Size |
|---|---|---|---|---|
| mDeBERTa-v3-base | 0.8097 | 0.3543 | 4.26 | 278M params |
| GPT-OSS-20B (Base) | 0.5751 | 0.0079 | 8199.33 | 20B params |
| GPT-OSS-20B (LoRA) | 0.8018 | 0.4094 | 740.41 | 20B + adapters |
Key Takeaways:
- mDeBERTa offers the best balance of accuracy and speed for production deployment
- LoRA fine-tuning dramatically improves GPT-OSS-20B performance (39% F1 increase)
- LoRA achieves the highest exact-match rate, which is crucial for automation confidence
- Consumer hardware is viable for training competitive models
- Rank: 32
- Alpha: 64
- Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0.05
- Trainable Parameters: ~0.2% of base model
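Expressed with peft, the configuration above corresponds roughly to the following (the task type and any arguments not listed above are assumptions):

```python
# Sketch of the LoRA configuration listed above, written with peft;
# see oss20b/3-finetune_lora.ipynb for the exact setup.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)
# model.print_trainable_parameters()  # roughly 0.2% of parameters are trainable
```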
All models are available on Hugging Face:
- mDeBERTa: BenTouss/mdeberta-eurochef
- GPT-OSS-20B LoRA: BenTouss/oss20b-eurochef-lora
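A minimal sketch of running the published mDeBERTa classifier directly from the Hub (the 0.5 sigmoid threshold and the id2label mapping are assumptions; mdeberta/2-inference.ipynb is the reference):

```python
# Sketch: multi-label inference with the published mDeBERTa checkpoint.
# The 0.5 threshold is an assumption; see mdeberta/2-inference.ipynb for the real setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "BenTouss/mdeberta-eurochef"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

text = "Hi, the app keeps crashing whenever I open a recipe video."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]

predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```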
Messages are saved in JSONL format with the following structure:

```json
{
  "message": "Bonjour, j'ai un problème avec...",
  "language": "French",
  "tags": ["technical_issue", "urgent", "premium_user", "frustrated"]
}
```
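For training, the JSONL file is typically read line by line and the tag lists converted into multi-hot vectors. A minimal sketch, assuming the generator's default output file name (pass classes=[...] to fix the 15-label order explicitly):

```python
# Sketch: read the generated JSONL and turn tag lists into multi-hot label vectors.
import json

from sklearn.preprocessing import MultiLabelBinarizer

records = []
with open("customer_support_messages.jsonl", encoding="utf-8") as f:
    for line in f:
        records.append(json.loads(line))

texts = [r["message"] for r in records]
tag_lists = [r["tags"] for r in records]

mlb = MultiLabelBinarizer()  # optionally: MultiLabelBinarizer(classes=[...15 labels...])
labels = mlb.fit_transform(tag_lists)
print(len(texts), labels.shape, list(mlb.classes_))
```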
Model predictions are evaluated using multi-label metrics with the following structure:

```json
{
  "predictions": ["technical_issue", "premium_user", "frustrated"],
  "ground_truth": ["technical_issue", "premium_user", "normal"],
  "f1_score": 0.67,
  "exact_match": false
}
```

Problem Categories:
- technical_issue - App/streaming problems, bugs, crashes
- billing - Payment issues, subscription questions
- account_management - Login, profile, settings
- content_request - Requests for specific recipes/content
- feature_request - Suggestions for new features
- content_quality - Feedback on recipe quality
- refund_request - Request for money back
- payment_issue - Billing/payment problems
Sentiment:
- frustrated - Negative emotional tone
- positive - Positive feedback
- normal - Neutral tone
Priority:
- low_priority - Can wait for resolution
- (Normal priority is default, not labeled)
User Type:
- premium_user - Paid subscriber
- enterprise - Business account
- trial_user - Free trial period
- churn_risk - Likely to cancel subscription
Contributions are welcome! Please feel free to submit a Pull Request.
MIT