Deep Learning Experiment Lab

A professional, config-driven workspace for vision and NLP experimentation. The project now ships with production-ready training utilities, curated notebooks, and a clean repository structure that separates reusable code from exploratory research.

Highlights

  • Single CLI for training. python -m src.cli --config <file> boots a full pipeline: dataset prep, model construction, optimisation, and metrics export.
  • Configurable data/model registry. Built-in support for CIFAR-10 classification and IMDB sentiment analysis with extensible loaders and architectures.
  • Notebook gallery without “HW” naming. Coursework notebooks have been polished, renamed, and grouped into topic-focused folders.
  • Professional hygiene. Metrics land in reports/, experiments in runs/, and the codebase follows a clean module layout.

Quick Start

  1. Create & activate an environment
    python -m venv .venv
    source .venv/bin/activate  # or `conda create -n dl-lab python=3.11 && conda activate dl-lab`
  2. Install dependencies
    pip install --upgrade pip
    pip install -r requirements.txt
  3. Run a smoke training job
    python -m src.cli --config configs/cifar10_baseline.yaml --epochs 1
  4. Explore notebooks
    jupyter lab notebooks/

Project Layout

.
├── configs/                  # YAML configs consumed by src.cli
├── data/                     # Datasets (auto-created, gitignored)
├── notebooks/
│   ├── advanced-vision/      # DINO, adversarial, diffusion studies
│   ├── foundations-neural-networks/
│   ├── generative-models/
│   ├── sequence-modeling/
│   └── vision-projects/      # CIFAR-10 classifier/colorizer, Cats-vs-Dogs CNN, Tiny YOLOv2
├── reports/
│   ├── figures/              # Curated plots
│   └── metrics/              # JSON summaries saved by the trainer
├── runs/                     # Checkpoints & logs (ignored)
├── src/                      # Reusable package (data, models, training, CLI)
├── requirements.txt
├── Makefile
└── .github/workflows/ci.yml

Training via CLI

  • Configs – Each YAML defines the dataset, optimisation hyper-parameters, and loader settings. See configs/cifar10_baseline.yaml and configs/imdb_lstm.yaml for reference.
  • Overrides – CLI arguments override config keys. Example:
    python -m src.cli --config configs/imdb_lstm.yaml --epochs 5 --batch-size 32 --device cpu
  • Outputs – At the end of training, the trainer writes <experiment>_metrics.json to reports/metrics/ with train/val/test loss and accuracy curves.
  • Extensibility – Add a new dataset by implementing a factory in src/data.py and a corresponding model in src/models.py, then reference it in a config; the sketch below illustrates the pattern.
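
A minimal sketch of that extension pattern, assuming a simple name-based registry; the actual hooks and factory signatures in src/data.py and src/models.py may differ:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    def build_toy_dataset(config: dict) -> DataLoader:
        # Stand-in factory: 100 random RGB images across 10 classes.
        images = torch.randn(100, 3, 32, 32)
        labels = torch.randint(0, 10, (100,))
        return DataLoader(TensorDataset(images, labels), batch_size=config["batch_size"])

    def build_toy_model(config: dict) -> nn.Module:
        # Stand-in model factory matching the toy dataset above.
        return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

    # A name-based registry lets the CLI dispatch on the config's `dataset` key.
    DATASETS = {"toy": build_toy_dataset}
    MODELS = {"toy": build_toy_model}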

Notebook Portfolio

  • vision-projects/
    • cifar10_classification_colorization.ipynb – custom ResNet-18 classifier and U-Net colorizer pipelines.
    • cats_vs_dogs_cnn.ipynb – a strong, regularised CNN baseline for the filtered Cats vs Dogs dataset.
    • tiny_yolo_voc.ipynb – Tiny YOLOv2 forward pass with Darknet weight loading, decoding, and NMS visualisations.
  • foundations-neural-networks/ – PyTorch primers and numpy-from-scratch exercises.
  • sequence-modeling/ – transformer/GPT practice notebooks plus helper scripts.
  • generative-models/ & advanced-vision/ – GANs, VAEs, diffusion, and modern vision research studies.

Notebook Etiquette

  • Clear outputs before committing long runs.
  • Record compute notes (GPU/CPU, epochs, metrics) in a markdown cell.
  • Persist final metrics to reports/metrics/<notebook>-summary.json when appropriate, as in the snippet below.
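
A minimal sketch of that convention; the metric names, values, and notebook filename here are illustrative only:

    import json
    from pathlib import Path

    # Example summary; a real notebook would record its actual run statistics.
    summary = {"device": "cuda", "epochs": 10, "val_accuracy": 0.87}
    out_path = Path("reports/metrics") / "cats_vs_dogs_cnn-summary.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(summary, indent=2))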

Configuration Primer

Key fields available to every config (an illustrative example follows the list):

  • experiment: Name used for metric filenames.
  • dataset: One of {cifar10, imdb} (extendable).
  • batch_size, epochs, learning_rate, optimizer, weight_decay: Passed straight to the trainer.
  • val_split, num_workers, grad_clip: Validation split fraction, data-loader worker count, and gradient-clipping threshold.
  • Dataset specific:
    • CIFAR-10: nothing extra required (transforms handled internally).
    • IMDB: embedding_dim, hidden_size, num_layers, max_tokens, dropout.
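
Putting those fields together, a config might look like the example below. The values are illustrative rather than copied from configs/imdb_lstm.yaml, and the snippet assumes PyYAML (implied by the YAML configs) to show how such a file is parsed:

    import yaml  # PyYAML

    EXAMPLE_CONFIG = """
    experiment: imdb_lstm_demo
    dataset: imdb
    batch_size: 64
    epochs: 5
    learning_rate: 0.001
    optimizer: adam
    weight_decay: 0.0
    val_split: 0.1
    num_workers: 2
    grad_clip: 1.0
    embedding_dim: 128
    hidden_size: 256
    num_layers: 2
    max_tokens: 20000
    dropout: 0.3
    """

    config = yaml.safe_load(EXAMPLE_CONFIG)
    print(config["experiment"], config["learning_rate"])  # imdb_lstm_demo 0.001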

Data & Checkpoints

  • Datasets download automatically into data/. Keep the folder out of git.
  • Long-lived assets (plots, tables, curated checkpoints) should be copied into reports/.
  • For very large checkpoints, use external storage and reference it in the documentation.

Development Notes

  • src/utils.py centralises seeding, config loading, logging, and training statistics helpers.
  • src/data.py returns a DataBundle holding loaders and dataset metadata (e.g., vocab for IMDB).
  • src/models.py exposes clean factory functions (build_model) backed by modular CNN/LSTM implementations.
  • src/train.py orchestrates the full loop with gradient clipping, optional schedulers, best-checkpoint selection, and metric persistence (see the sketch after this list).
  • TODO list: add unit tests (pytest), integrate linting (ruff/black), wire make lint|test|format targets into CI.
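
For orientation, here is an illustrative inner training step in the style src/train.py describes; the function name and signature are assumptions, not the module's actual API:

    import torch
    from torch import nn

    def train_step(model: nn.Module, batch, optimizer, criterion, grad_clip=None):
        inputs, targets = batch
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        if grad_clip is not None:
            # The `grad_clip` config key caps the global gradient norm.
            torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
        optimizer.step()
        return loss.item()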

Roadmap

  1. Add Hydra/OmegaConf for hierarchical experiment configs and sweeps.
  2. Integrate experiment tracking (Weights & Biases, MLflow) in the trainer.
  3. Introduce mixed-precision training and checkpoint saving to runs/ (sketched after this list).
  4. Publish documentation with MkDocs and ship contribution guidelines.
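
As a sketch of roadmap item 3, a mixed-precision step using PyTorch's built-in torch.cuda.amp could look like this; it is not yet wired into src/train.py:

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def amp_train_step(model, batch, optimizer, criterion):
        inputs, targets = batch
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():  # run the forward pass in reduced precision
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
        scaler.step(optimizer)           # unscale gradients, then call optimizer.step()
        scaler.update()                  # adapt the loss scale for the next iteration
        return loss.item()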
