A professional demonstration of Large Language Model (LLM) fine-tuning using Hugging Face Transformers, PEFT/LoRA, and MLflow for experiment tracking. This project showcases modern MLOps practices.
- Parameter-Efficient Fine-Tuning: Uses LoRA (Low-Rank Adaptation) for efficient training
- Modern MLOps: MLflow integration for experiment tracking and model versioning
- Reproducible: Docker support and comprehensive configuration management
- Professional Structure: Clean code organization with proper testing
- Comprehensive Evaluation: Detailed metrics, visualizations, and confusion matrices
- CLI Interface: Easy-to-use command-line tools for training and evaluation
- Installation
- Quick Start
- Project Structure
- Configuration
- Training
- Evaluation
- MLflow Integration
- Docker Support
- Testing
- API Reference
- Contributing
- License
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Docker (optional)
- Clone the repository:
git clone <your-repo-url>
cd llm_finetuning
- Install dependencies:
pip install -e .
- Install development dependencies (optional):
pip install -e ".[dev]"
- Build the Docker image:
docker build -t llm-finetuning-demo .
- Run with Docker Compose:
docker-compose up mlflow
# Local
mlflow server --backend-store-uri sqlite:///mlruns/mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000
# Or with Docker
docker-compose up mlflow
python llm_finetuning/train.py --config llm_finetuning/configs/train.yaml
python llm_finetuning/evaluate.py --model_path checkpoints/best_model
Open your browser and navigate to http://localhost:5000 to view the MLflow UI.
llm_finetuning/
├── llm_finetuning/              # Main package
│   ├── __init__.py
│   ├── train.py                 # Training script
│   ├── evaluate.py              # Evaluation script
│   ├── configs/                 # Configuration files
│   │   └── train.yaml           # Training configuration
│   └── utils/                   # Utility modules
│       ├── __init__.py
│       ├── data_utils.py        # Data processing utilities
│       ├── model_utils.py       # Model utilities
│       ├── training_utils.py    # Training utilities
│       ├── mlflow_utils.py      # MLflow integration
│       └── reproducibility.py   # Reproducibility utilities
├── tests/                       # Test suite
│   ├── __init__.py
│   ├── test_data_utils.py
│   ├── test_model_utils.py
│   └── test_reproducibility.py
├── checkpoints/                 # Model checkpoints
├── logs/                        # Training logs
├── evaluation_results/          # Evaluation results
├── mlruns/                      # MLflow runs
├── pyproject.toml               # Project configuration
├── Dockerfile                   # Docker configuration
├── docker-compose.yml           # Docker Compose configuration
├── pytest.ini                   # Test configuration
└── README.md                    # This file
The project uses YAML configuration files for easy parameter management. The main configuration file is llm_finetuning/configs/train.yaml.
model:
  name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  max_length: 512
  use_cache: false

lora:
  r: 16
  lora_alpha: 32
  lora_dropout: 0.1
  target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
  bias: "none"
  task_type: "CAUSAL_LM"

training:
  num_train_epochs: 3
  per_device_train_batch_size: 4
  learning_rate: 2e-4
  weight_decay: 0.01
  warmup_ratio: 0.1
  early_stopping_patience: 3

mlflow:
  experiment_name: "llm-finetuning-demo"
  tracking_uri: "http://localhost:5000"
  log_model: true
  log_artifacts: true
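A config like this is typically loaded with PyYAML and then merged with any CLI overrides. The sketch below is a minimal, hypothetical illustration of that pattern; it is not the exact logic in train.py.

```python
# Hypothetical sketch of YAML config loading with CLI overrides (not the exact train.py code).
import argparse
import yaml

def load_config(path: str, overrides: dict) -> dict:
    """Load a YAML config file and apply non-None CLI overrides to the training section."""
    with open(path) as f:
        config = yaml.safe_load(f)
    for key, value in overrides.items():
        if value is not None:
            config["training"][key] = value
    return config

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--learning_rate", type=float, default=None)
    args = parser.parse_args()
    cfg = load_config(args.config, {"learning_rate": args.learning_rate})
    print(cfg["model"]["name"], cfg["training"]["learning_rate"])
```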
python llm_finetuning/train.py --config llm_finetuning/configs/train.yaml
python llm_finetuning/train.py \
--config llm_finetuning/configs/train.yaml \
--learning_rate 1e-4 \
--batch_size 8 \
--epochs 5 \
--max_samples 5000
python llm_finetuning/train.py \
--config llm_finetuning/configs/train.yaml \
--resume_from_checkpoint checkpoints/checkpoint-1000
| Parameter | Description | Default |
|---|---|---|
| --config | Path to configuration file | Required |
| --output_dir | Output directory for checkpoints | From config |
| --model_name | Model name to use | From config |
| --max_samples | Maximum training samples | From config |
| --learning_rate | Learning rate | From config |
| --batch_size | Batch size | From config |
| --epochs | Number of epochs | From config |
| --seed | Random seed | From config |
| --resume_from_checkpoint | Resume from checkpoint | None |
python llm_finetuning/evaluate.py --model_path checkpoints/best_model
python llm_finetuning/evaluate.py \
--model_path checkpoints/best_model \
--test_dataset ag_news \
--max_samples 1000 \
--batch_size 16 \
--save_predictions
| Parameter | Description | Default |
|---|---|---|
| --model_path | Path to trained model | Required |
| --test_dataset | Test dataset name | ag_news |
| --max_samples | Maximum test samples | None (all) |
| --batch_size | Evaluation batch size | 8 |
| --output_dir | Output directory | ./evaluation_results |
| --mlflow_experiment | MLflow experiment name | llm-finetuning-evaluation |
| --save_predictions | Save predictions to file | False |
The evaluation script computes and logs the following metrics:
- Accuracy: Overall classification accuracy
- Precision: Weighted average precision
- Recall: Weighted average recall
- F1-Score: Weighted average F1-score
- Per-class Metrics: Precision, recall, F1, and support for each class
- Confusion Matrix: Visual representation of classification results
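These are standard scikit-learn metrics; the sketch below shows how they are typically computed. It is illustrative only and not necessarily the exact code in evaluate.py.

```python
# Sketch of computing the listed metrics with scikit-learn (illustrative only).
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

y_true = [0, 1, 2, 3, 1]  # toy labels for AG News' four classes
y_pred = [0, 1, 2, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
per_class = classification_report(y_true, y_pred, zero_division=0)  # per-class metrics + support
cm = confusion_matrix(y_true, y_pred)                               # rows: true, cols: predicted
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```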
# Local
mlflow server --backend-store-uri sqlite:///mlruns/mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000
# Docker
docker-compose up mlflow
Access the MLflow UI at http://localhost:5000 to:
- View experiment runs and metrics
- Compare different model configurations
- Download model artifacts
- View training curves and visualizations
- Track hyperparameters and system information
MLflow automatically logs:
- Hyperparameters: All configuration parameters
- Metrics: Training and validation losses, accuracy, F1-score
- Artifacts: Model checkpoints, evaluation plots, predictions
- System Info: Hardware specifications, software versions
- Code Version: Git commit hash (if available)
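This kind of logging maps onto the standard MLflow tracking API. The sketch below illustrates the typical calls; the helpers in mlflow_utils.py may wrap them differently, and the parameter values and artifact path shown are only examples.

```python
# Illustrative sketch of typical MLflow logging calls (not the project's exact code).
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("llm-finetuning-demo")

with mlflow.start_run(run_name="lora-demo"):
    # Hyperparameters from the YAML config
    mlflow.log_params({"learning_rate": 2e-4, "lora_r": 16, "epochs": 3})
    # Metrics, optionally with a step for training curves
    mlflow.log_metric("train_loss", 0.52, step=100)
    mlflow.log_metrics({"eval_accuracy": 0.91, "eval_f1": 0.90})
    # Artifacts such as plots or prediction files (path is an example)
    mlflow.log_artifact("evaluation_results/confusion_matrix.png")
```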
# Build image
docker build -t llm-finetuning-demo .
# Run MLflow server
docker run -p 5000:5000 -v $(pwd)/mlruns:/app/mlruns llm-finetuning-demo
# Run training
docker run -v $(pwd)/mlruns:/app/mlruns -v $(pwd)/checkpoints:/app/checkpoints llm-finetuning-demo python llm_finetuning/train.py --config llm_finetuning/configs/train.yaml
# Start MLflow server
docker-compose up mlflow
# Run training (in another terminal)
docker-compose run --rm training python llm_finetuning/train.py --config llm_finetuning/configs/train.yaml
# Run evaluation (in another terminal)
docker-compose run --rm evaluation python llm_finetuning/evaluate.py --model_path checkpoints/best_model
pytest
pytest --cov=llm_finetuning --cov-report=html
# Unit tests only
pytest -m unit
# Integration tests only
pytest -m integration
# Skip slow tests
pytest -m "not slow"
- Unit Tests: Test individual functions and classes
- Integration Tests: Test component interactions
- Mocking: Uses unittest.mock for external dependencies
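As a concrete illustration, a marked unit test that mocks an external dependency might look like the following hypothetical sketch (the marker names are assumed to be registered in pytest.ini; the test body is not from the project's test suite):

```python
# Hypothetical example of a marked unit test with mocking (illustrative only).
from unittest.mock import MagicMock

import pytest

@pytest.mark.unit
def test_tokenizer_is_called_with_text():
    tokenizer = MagicMock()
    tokenizer.return_value = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}
    # A real test would import DataProcessor and assert on its output;
    # this only shows the marker + mocking pattern.
    assert tokenizer("some text")["input_ids"] == [[1, 2, 3]]
    tokenizer.assert_called_once_with("some text")
```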
Handles dataset loading and preprocessing for text classification.
from llm_finetuning.utils import DataProcessor
processor = DataProcessor(tokenizer, max_length=512)
datasets = processor.load_ag_news_dataset(max_samples=1000)
tokenized_datasets = processor.prepare_datasets(datasets)
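Under the hood, a loader like load_ag_news_dataset typically wraps the Hugging Face datasets library. The following is a hedged sketch of that pattern rather than the project's exact implementation:

```python
# Sketch of loading and tokenizing AG News with Hugging Face datasets (illustrative only).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LM tokenizers often lack a pad token

raw = load_dataset("ag_news")                    # splits: train / test
raw["train"] = raw["train"].select(range(1000))  # cap samples, like max_samples=1000

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized["train"][0].keys())  # input_ids, attention_mask, label
```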
Sets up model and tokenizer for fine-tuning.
from llm_finetuning.utils import setup_model_and_tokenizer
model, tokenizer = setup_model_and_tokenizer(
model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
max_length=512
)
Applies LoRA configuration to a model.
from llm_finetuning.utils import setup_lora_model
lora_config = {
"r": 16,
"lora_alpha": 32,
"lora_dropout": 0.1,
"target_modules": ["q_proj", "v_proj"]
}
model = setup_lora_model(model, lora_config)
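For context, a helper like setup_lora_model usually builds a PEFT LoraConfig and wraps the base model with get_peft_model. The sketch below illustrates that pattern; the actual utility may differ.

```python
# Illustrative sketch of applying LoRA with the PEFT library
# (not necessarily the project's exact setup_lora_model implementation).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()  # shows the small fraction of trainable LoRA weights
```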
from llm_finetuning.utils import set_seed, get_device_info
set_seed(42) # Set random seed
device_info = get_device_info() # Get hardware information
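A seed helper of this kind usually just seeds Python, NumPy, and PyTorch in one place. The following is a hedged sketch of what set_seed might do; the project's version may differ in details.

```python
# Illustrative sketch of a set_seed helper (the actual utility may differ).
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Optional: trade speed for determinism in cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```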
from llm_finetuning.utils import setup_mlflow, log_training_metrics
setup_mlflow(config) # Setup MLflow tracking
log_training_metrics({"loss": 0.5, "accuracy": 0.8}) # Log metrics
This project is perfect for:
- Portfolio Demonstration: Showcase ML engineering skills
- Learning: Understand modern LLM fine-tuning practices
- Research: Experiment with different configurations
- Production: Use as a template for real-world projects
- Extend the DataProcessor class
- Add dataset loading logic
- Update configuration files
- Add tests
- Update the model configuration
- Modify setup_model_and_tokenizer if needed
- Test with your model
- Extend the compute_metrics function
- Update evaluation scripts
- Add visualization code
- CUDA Out of Memory (see the configuration sketch after this list)
  - Reduce batch size
  - Use gradient accumulation
  - Enable mixed precision training
- MLflow Connection Issues
  - Check if MLflow server is running
  - Verify tracking URI in configuration
- Model Loading Issues
  - Check model name and availability
  - Verify Hugging Face authentication
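For the out-of-memory case above, the usual mitigations map onto Hugging Face TrainingArguments, assuming the Trainer API is used. The sketch below is illustrative, not the project's exact training setup:

```python
# Illustrative OOM mitigations with Hugging Face TrainingArguments
# (assumes the Trainer API; the project's training_utils.py may differ).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=1,   # smaller per-step batch
    gradient_accumulation_steps=8,   # keep the effective batch size at 8
    fp16=True,                       # mixed precision on CUDA GPUs
    gradient_checkpointing=True,     # trade compute for activation memory
)
```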
- Check the logs in the logs/ directory
- Review MLflow UI for detailed metrics
- Run tests to verify installation
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
# Install development dependencies
pip install -e ".[dev]"
# Run pre-commit hooks
pre-commit install
# Run tests
pytest
# Format code
black llm_finetuning/
isort llm_finetuning/
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for the Transformers library
- MLflow for experiment tracking
- PEFT for parameter-efficient fine-tuning
- TinyLlama for the base model
For questions or suggestions, please open an issue or contact [your-email@example.com].
Happy Fine-Tuning!