ITheEqualizer/Pytorch

PyTorch Professional Project Template

A production-ready, professionally structured PyTorch project template with comprehensive utilities, logging, checkpointing, and best practices baked in.

✨ Features

  • 🏗️ Professional Architecture - Modular design with clear separation of concerns
  • ⚙️ Configuration Management - Centralized dataclass-based configuration
  • 📊 Advanced Logging - Structured logging with TensorBoard integration
  • 💾 Checkpoint Management - Automated model versioning and best model tracking
  • 📈 Metrics Tracking - Built-in accuracy, loss, and custom metrics
  • 🎯 Type Hints - Fully type-hinted codebase for better IDE support
  • 📚 Comprehensive Documentation - Docstrings throughout, zero code comments
  • 🧪 Unit Tests - Test suite for all major components
  • 🐳 Docker Support - Both GPU (CUDA) and CPU containers
  • 🎓 Examples Included - Custom datasets, transfer learning, and more

🆕 What's New

This project has been significantly enhanced with professional features:

  • Configuration System: Centralized dataclass-based config management
  • Advanced Utilities: Checkpoint management, metrics tracking, visualization
  • Enhanced Training: Early stopping, LR scheduling, comprehensive logging
  • Code Quality: Zero comments, full docstrings, complete type hints
  • Testing: Unit tests for all major components
  • Documentation: Architecture guide, usage examples, API docs
  • Examples: Custom datasets, transfer learning demonstrations

📋 Prerequisites

🚀 Quick Start

Using PowerShell (Windows)

.\docker-commands.ps1 build
.\docker-commands.ps1 run
.\docker-commands.ps1 shell

Once in the container:

cd /workspace/src
python train.py

Using Docker Compose

docker-compose up -d pytorch-gpu
docker exec -it pytorch-gpu bash

Local Installation (Without Docker)

pip install -e .
cd src
python train.py

📁 Project Structure

Pytorch/
├── src/                        # Main source code
│   ├── config.py               # Configuration management
│   ├── logger.py               # Logging utilities
│   ├── train.py                # Training script with early stopping
│   ├── inference.py            # Inference script
│   ├── models/                 # Model definitions
│   │   ├── simple_nn.py        # Simple neural network
│   │   └── __init__.py
│   └── utils/                  # Utility modules
│       ├── checkpoint.py       # Checkpoint management
│       ├── metrics.py          # Metrics calculation
│       ├── model.py            # Model utilities
│       ├── data.py             # Data utilities
│       ├── visualization.py    # Visualization tools
│       └── __init__.py
├── examples/                   # Usage examples
│   ├── custom_dataset.py       # Custom dataset integration
│   └── transfer_learning.py    # Transfer learning example
├── tests/                      # Unit tests
│   ├── test_config.py
│   ├── test_utils.py
│   └── test_models.py
├── docs/                       # Documentation
│   ├── ARCHITECTURE.md         # System architecture
│   └── USAGE.md                # Detailed usage guide
├── data/                       # Dataset directory
├── models/                     # Saved models
├── outputs/                    # Training outputs
│   ├── logs/                   # TensorBoard logs
│   └── checkpoints/            # Model checkpoints
├── notebooks/                  # Jupyter notebooks
├── pyproject.toml              # Package configuration, deps, tool config
├── .flake8                     # flake8 config (not read from pyproject)
├── requirements.txt            # Docker dependencies (no torch)
├── docker-compose.yml          # Docker Compose config
├── .github/workflows/ci.yml    # Lint, type-check, and test on CI
└── README.md                   # This file

💻 Usage

Training

from config import get_config
from models import SimpleModel
from logger import setup_logger
import torch.nn as nn

config = get_config()
logger = setup_logger(log_dir=config.paths.logs_dir)

model = SimpleModel(
    input_size=config.model.input_size,
    hidden_size=config.model.hidden_size,
    num_classes=config.model.num_classes
).to(config.device.device)

Run training:

cd src
python train.py

Inference

cd src
python inference.py

Custom Configuration

from config import Config, ModelConfig, TrainingConfig

config = Config(
    model=ModelConfig(hidden_size=256, num_classes=20),
    training=TrainingConfig(batch_size=128, num_epochs=50)
)

Performance & Reproducibility Flags

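The efficiency features below are opt-in and CPU-safe, meaning they can stay set on a CPU-only machine without breaking the run. The exceptions to opt-in are TF32 and cudnn.benchmark, which turn on automatically when CUDA is available: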

from config import Config, TrainingConfig

config = Config(
    training=TrainingConfig(
        use_amp=True,        # mixed precision (CUDA only)
        compile_model=True,  # torch.compile, falls back to eager on failure
        gradient_clip=1.0,   # max gradient norm (applied during training)
        drop_last=True,      # drop the last partial training batch
    ),
    deterministic=True,      # reproducible runs (disables TF32/benchmark)
)
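
How the template wires these flags internally is not shown in this README. As a rough sketch of what a deterministic switch commonly controls in PyTorch, the following uses a hypothetical function name (`configure_backends`) and an arbitrary seed; treat it as an illustration under those assumptions, not as the code in `src/train.py`.

```python
import random

import torch


def configure_backends(deterministic: bool, seed: int = 42) -> None:
    """Hypothetical helper: trade speed for reproducibility, or the reverse."""
    if deterministic:
        random.seed(seed)
        torch.manual_seed(seed)  # seeds the CPU and any CUDA generators
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False
    elif torch.cuda.is_available():
        # Speed-oriented defaults only make sense when a GPU is present.
        torch.backends.cudnn.benchmark = True
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True


# Re-seeding before each draw yields identical random tensors.
configure_backends(deterministic=True)
first = torch.rand(3)
configure_backends(deterministic=True)
second = torch.rand(3)
```

Seeding alone does not make every CUDA kernel deterministic, which is why a reproducibility switch usually disables cudnn.benchmark as well.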

🎯 Key Features

Configuration Management

Centralized configuration using dataclasses:

from config import get_config

config = get_config()
config.training.batch_size = 128
config.model.hidden_size = 256
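
For readers who have not used nested dataclasses for configuration, here is a minimal sketch of the pattern. The field names mirror those used in this README; the default values and the body of `get_config` are assumptions for illustration, not the contents of `src/config.py`.

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    # Defaults are illustrative assumptions, not the template's values.
    input_size: int = 784
    hidden_size: int = 128
    num_classes: int = 10


@dataclass
class TrainingConfig:
    batch_size: int = 64
    num_epochs: int = 10


@dataclass
class Config:
    # default_factory gives every Config its own sub-config instances.
    model: ModelConfig = field(default_factory=ModelConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)


def get_config() -> Config:
    """Return a fresh configuration populated with defaults."""
    return Config()


config = get_config()
config.training.batch_size = 128  # override after construction
config.model.hidden_size = 256

# A second call returns independent defaults, unaffected by the overrides.
other = get_config()
```

Using `default_factory` rather than a shared default instance is what keeps overrides on one configuration from leaking into another.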

Checkpoint Management

Automatic model versioning:

from utils import CheckpointManager

checkpoint_manager = CheckpointManager(config.paths.checkpoints_dir)
checkpoint_manager.save(model, optimizer, epoch, metrics, is_best=True)

Metrics Tracking

Built-in metrics calculation:

from utils import AverageMeter, calculate_accuracy

loss_meter = AverageMeter('Loss')
acc = calculate_accuracy(outputs, targets)
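
A running-average meter of this kind is usually only a few lines. The class below is a hypothetical stand-in written for illustration; its `update(value, n)` signature and `avg` attribute are assumptions and may differ from `AverageMeter` in `src/utils/metrics.py`.

```python
class RunningAverage:
    """Hypothetical stand-in for a meter such as AverageMeter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        # `value` is the mean over a batch of `n` samples.
        self.total += value * n
        self.count += n

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


loss_meter = RunningAverage("Loss")
loss_meter.update(0.5, n=32)  # full batch
loss_meter.update(1.0, n=16)  # smaller final batch
```

Weighting each update by batch size keeps the epoch average correct when the last batch is smaller than the rest.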

Logging

Structured logging with TensorBoard:

from logger import setup_logger, MetricsLogger

logger = setup_logger(log_dir=config.paths.logs_dir)
metrics_logger = MetricsLogger(logger)
metrics_logger.log_epoch(epoch, metrics)

📚 Documentation

🧪 Running Tests

Tests insert ../src onto sys.path, so they run from any directory:

pytest -v                          # run the suite
pytest --cov=src --cov-report=term-missing   # with coverage
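
The path setup mentioned above typically sits at the top of each test module or in a shared conftest.py. A minimal sketch, assuming tests/ and src/ are sibling directories as in the project tree; the helper names `src_dir_for` and `add_src_to_path` are hypothetical:

```python
import sys
from pathlib import Path


def src_dir_for(test_file: str) -> Path:
    """Resolve the sibling src/ directory for a file inside tests/."""
    return Path(test_file).resolve().parent.parent / "src"


def add_src_to_path(test_file: str) -> None:
    """Prepend src/ to sys.path once so `import config` works from any cwd."""
    src = str(src_dir_for(test_file))
    if src not in sys.path:
        sys.path.insert(0, src)


# In a real test module the argument would be __file__.
add_src_to_path("/repo/tests/test_config.py")
```

Because the path is derived from the test file's own location rather than the current working directory, pytest can be invoked from the repository root or from inside tests/.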

Lint, format, and type-check (matches CI):

black src tests
isort src tests
flake8 src tests
mypy src

🎓 Examples

Custom Dataset

python examples/custom_dataset.py

Transfer Learning

python examples/transfer_learning.py

🐳 Docker Commands

PowerShell (Windows)

.\docker-commands.ps1 help         # Show all commands
.\docker-commands.ps1 build        # Build GPU image
.\docker-commands.ps1 build-cpu    # Build CPU image
.\docker-commands.ps1 run          # Run GPU container
.\docker-commands.ps1 run-cpu      # Run CPU container
.\docker-commands.ps1 shell        # Open bash shell
.\docker-commands.ps1 jupyter      # Start Jupyter notebook
.\docker-commands.ps1 tensorboard  # Start TensorBoard
.\docker-commands.ps1 stop         # Stop containers
.\docker-commands.ps1 clean        # Remove containers/images

Makefile (Linux/Mac)

make help                          # Show all commands
make build                         # Build GPU image
make run                           # Run GPU container
make shell                         # Open bash shell
make jupyter                       # Start Jupyter
make tensorboard                   # Start TensorBoard
make stop                          # Stop containers
make clean                         # Cleanup

📊 TensorBoard

Start TensorBoard to visualize training:

.\docker-commands.ps1 tensorboard

Then open: http://localhost:6006

🔧 Advanced Features

Early Stopping

Automatic training termination when validation performance plateaus.
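
As an illustration of the idea, a patience-based stopper can be written in a few lines. The class name, the `patience` and `min_delta` parameters, and the choice of validation loss as the monitored value are assumptions for this sketch; the logic in `train.py` may differ.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.71, 0.72, 0.60]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch  # two epochs in a row without improvement
        break
```

With these losses the loop stops at epoch index 3 and never sees the later improvement to 0.60, which shows the trade-off a small patience value makes.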

Learning Rate Scheduling

Dynamic learning rate adjustment based on validation metrics.
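
One common way to get this behaviour in PyTorch is `torch.optim.lr_scheduler.ReduceLROnPlateau`. Which scheduler `train.py` actually uses is not stated here, so the sketch below is an assumption, and the `factor` and `patience` values are illustrative.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=1
)

# Simulated validation losses that stop improving after the second epoch.
lr_history = []
for val_loss in [1.0, 0.9, 0.9, 0.9, 0.9]:
    scheduler.step(val_loss)  # pass the monitored metric once per epoch
    lr_history.append(optimizer.param_groups[0]["lr"])
```

The learning rate stays at its initial value while the loss improves and is multiplied by `factor` once the loss has stalled for more than `patience` epochs.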

Model Utilities

  • Parameter counting
  • Weight initialization strategies
  • Model freezing/unfreezing for transfer learning
  • Layer-wise learning rate decay
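
Two of these utilities, parameter counting and freezing, are short enough to sketch directly. The function names `count_parameters` and `set_trainable` are hypothetical and may not match what `src/utils/model.py` exposes.

```python
from torch import nn


def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """Count parameters, optionally including frozen ones."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) every parameter in a module."""
    for param in module.parameters():
        param.requires_grad = trainable


backbone = nn.Linear(8, 4)  # 8*4 weights + 4 biases = 36 parameters
head = nn.Linear(4, 2)      # 4*2 weights + 2 biases = 10 parameters
model = nn.Sequential(backbone, nn.ReLU(), head)

total = count_parameters(model, trainable_only=False)
set_trainable(backbone, False)  # freeze the backbone, train only the head
trainable_when_frozen = count_parameters(model)
set_trainable(backbone, True)   # unfreeze for full fine-tuning
```

In transfer learning the usual sequence is to freeze the pretrained backbone, train the new head, and then unfreeze for fine-tuning at a lower learning rate.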

Data Utilities

  • Custom dataset classes
  • Train/val split utilities
  • Data normalization helpers
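
A reproducible train/val split can be done on indices alone. The function below is a sketch with a hypothetical name and default values; the helper in `src/utils/data.py` may have a different signature.

```python
import random
from typing import List, Sequence, Tuple


def train_val_split(
    indices: Sequence[int], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle indices with a fixed seed and split off a validation slice."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


train_idx, val_idx = train_val_split(range(10), val_fraction=0.2)
```

The resulting index lists can be passed to `torch.utils.data.Subset` to build the two datasets.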

Visualization

  • Training curve plotting
  • Confusion matrix visualization
  • Learning rate schedule plots

🎨 Code Quality

  • Zero Comments: Self-documenting code with clear naming
  • Type Hints: Full type annotations for IDE support
  • Docstrings: Google-style docstrings for all functions
  • PEP 8: Follows Python style guidelines
  • Modular: Clear separation of concerns
  • Tested: Unit tests for critical components

Customization

Adding Python Packages

Edit requirements.txt and rebuild:

.\docker-commands.ps1 build

Environment Variables

Copy .env.example to .env, then edit the copy:

Copy-Item .env.example .env

GPU Configuration

Modify docker-compose.yml:

environment:
  - CUDA_VISIBLE_DEVICES=0,1  # Use specific GPUs

🐛 Troubleshooting

GPU Not Detected

  1. Ensure the NVIDIA Container Toolkit (the NVIDIA Docker runtime) is installed
  2. Check: nvidia-smi
  3. Verify: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Port Already in Use

Change port mapping in docker-compose.yml:

ports:
  - "8889:8888"  # Use different port

❓ FAQ

How do I add my own model?

Create a new file in src/models/ and add it to src/models/__init__.py. See simple_nn.py for reference.

How do I use my own dataset?

Check out examples/custom_dataset.py for a complete example of integrating custom datasets.

Where are my trained models saved?

Models are saved in:

  • outputs/checkpoints/ - All checkpoints
  • outputs/checkpoints/best_model.pth - Best performing model

How do I resume training?

Use the checkpoint utilities to load a previous checkpoint:

from utils.checkpoint import load_checkpoint
checkpoint = load_checkpoint('outputs/checkpoints/best_model.pth', model, optimizer)
start_epoch = checkpoint['epoch'] + 1
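
The exact signature and return value of `load_checkpoint` are defined in `src/utils/checkpoint.py`. For orientation, a save-and-resume round trip in plain PyTorch might look like the sketch below; the dictionary keys and the helper names `save_state` and `load_state` are assumptions, not the template's API.

```python
import os
import tempfile

import torch
from torch import nn


def save_state(path, model, optimizer, epoch):
    """Write model and optimizer state plus the epoch counter to disk."""
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )


def load_state(path, model, optimizer):
    """Restore both state dicts in place and return the raw checkpoint."""
    # map_location="cpu" lets a GPU-trained checkpoint load on a CPU machine.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint


model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    save_state(path, model, optimizer, epoch=4)

    resumed_model = nn.Linear(4, 2)
    resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1)
    checkpoint = load_state(path, resumed_model, resumed_optimizer)

start_epoch = checkpoint["epoch"] + 1  # continue with the next epoch
```

Restoring the optimizer state matters for optimizers that keep per-parameter statistics, such as momentum buffers, because resuming without it restarts those statistics from zero.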

Can I run this without Docker?

Yes. Install with pip install -e . and run the scripts from src/ directly, as shown under Local Installation.

How do I monitor training progress?

Use TensorBoard: run .\docker-commands.ps1 tensorboard (or make tensorboard on Linux/Mac), then open http://localhost:6006

📖 Resources

📄 License

See LICENSE file for details.


Built with best practices for production ML projects
