# 00 - Local Environment Setup (CPU)

This notebook sets up the local development environment for the Format Matters project.

**Prerequisites:**
- Python 3.10 or higher
- pip package manager

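**Steps:**
1. Create and activate a virtual environment
2. Install dependencies
3. Verify the installation (Python version, core packages, PyTorch configuration)
4. Check system information
5. Create the project directories and run basic functionality tests
6. Save an environment summary to `runs/env_local.json`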

## 1. Create Virtual Environment

Run these commands in your terminal (not in this notebook):

```bash
# Navigate to project root
cd format-matters

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate

# On Windows:
.venv\Scripts\activate

# Install dependencies
pip install -r env/requirements.txt

# Launch Jupyter
jupyter notebook
```
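If you would rather script the first step, the standard-library `venv` module creates the same environment as `python -m venv .venv`. A minimal sketch; `create_venv` is a hypothetical helper, and activation plus `pip install` still happen in the terminal as shown above:

```python
from __future__ import annotations

import os
import venv
from pathlib import Path


def create_venv(env_dir: str | Path, with_pip: bool = True) -> Path:
    """Create a virtual environment (hypothetical helper) and return its interpreter path."""
    env_dir = Path(env_dir)
    # Equivalent to `python -m venv <env_dir>`; with_pip=True bootstraps pip via ensurepip
    venv.create(env_dir, with_pip=with_pip)
    # Windows places executables in Scripts\, macOS/Linux in bin/
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Run from the project root, `create_venv(".venv")` returns `.venv/bin/python` on macOS/Linux or `.venv\Scripts\python.exe` on Windows.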

## 2. Verify Python Version

In [1]:
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Check version is 3.10+
assert sys.version_info >= (3, 10), "Python 3.10 or higher required"
print("✓ Python version check passed")

Python version: 3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
Python executable: C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\.venv\Scripts\python.exe
✓ Python version check passed


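The executable path above should point inside `.venv`. A sketch of a programmatic check for `venv` environments, relying on the fact that `venv` sets `sys.prefix` to the environment directory while `sys.base_prefix` keeps pointing at the base installation (both helper names are hypothetical):

```python
from __future__ import annotations

import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv (hypothetical helper)."""
    # Inside a venv, sys.prefix is the env directory; sys.base_prefix is the base install
    return sys.prefix != sys.base_prefix


def env_name() -> str | None:
    """Name of the active environment directory (e.g. '.venv'), or None outside a venv."""
    return Path(sys.prefix).name if in_virtualenv() else None


print(f"Inside a virtual environment: {in_virtualenv()} ({env_name()})")
```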
## 3. Install Dependencies (if not already done)

Uncomment and run if you haven't installed via the terminal. `%pip` (rather than `!pip`) installs into the environment of the running kernel:

In [2]:
# %pip install -r ../env/requirements.txt
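Outside IPython (for example, when these steps run from a plain Python script), neither `!pip` nor `%pip` is available. An equivalent is to call pip through the current interpreter. A minimal sketch; `pip_install_command` and `install_requirements` are hypothetical helpers, and the default path assumes the code runs from `notebooks/`:

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements: str | Path) -> list[str]:
    """Build a pip command bound to the current interpreter (hypothetical helper)."""
    # `sys.executable -m pip` installs into the environment running this code,
    # whereas a bare `pip` may resolve to a different installation on PATH
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements: str | Path = "../env/requirements.txt") -> None:
    """Run the install; raises subprocess.CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_install_command(requirements))
```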

## 4. Verify Core Dependencies

In [3]:
import importlib
import sys

# Required packages
required_packages = [
    ('torch', '2.8'),
    ('torchvision', '0.23'),
    ('numpy', '1.26.4'),
    ('pandas', '2.2.2'),
    ('PIL', '10.3.0'),  # Pillow
    ('webdataset', '0.2.86'),
    ('tfrecord', '1.14.6'),
    ('lmdb', '1.5.1'),
    ('pyarrow', '16.1.0'),
    ('psutil', '5.9.8'),
    ('matplotlib', '3.8.4'),
    ('tqdm', '4.66.4'),
]

print("Checking installed packages:\n")
all_ok = True

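for package_name, expected_version in required_packages:
    try:
        module = importlib.import_module(package_name)
        version = getattr(module, '__version__', 'unknown')
        # Compare as many components as the expected version lists, ignoring
        # local build tags (e.g. "2.8.0+cpu" matches "2.8"; "1.21.0" fails "1.26.4")
        expected_parts = expected_version.split('.')
        installed_parts = version.split('+')[0].split('.')[:len(expected_parts)]
        status = "✓" if installed_parts == expected_parts else "⚠"
        print(f"{status} {package_name:15s} {version:15s} (expected: {expected_version})")
        if status == "⚠":
            all_ok = False
    except ImportError:
        print(f"✗ {package_name:15s} NOT INSTALLED")
        all_ok = False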

print("\n" + "="*60)
if all_ok:
    print("✓ All dependencies installed successfully!")
else:
    print("⚠ Some dependencies missing or version mismatch")
    print("  Run: pip install -r ../env/requirements.txt")

Checking installed packages:

✓ torch           2.8.0+cpu       (expected: 2.8)
✓ torchvision     0.23.0+cpu      (expected: 0.23)
✓ numpy           1.26.4          (expected: 1.26.4)
✓ pandas          2.2.2           (expected: 2.2.2)
✓ PIL             10.3.0          (expected: 10.3.0)
✓ webdataset      0.2.86          (expected: 0.2.86)
⚠ tfrecord        unknown         (expected: 1.14.6)
✓ lmdb            1.5.1           (expected: 1.5.1)
✓ pyarrow         16.1.0          (expected: 16.1.0)
✓ psutil          5.9.8           (expected: 5.9.8)
✓ matplotlib      3.8.4           (expected: 3.8.4)
✓ tqdm            4.66.4          (expected: 4.66.4)

⚠ Some dependencies missing or version mismatch
  Run: pip install -r ../env/requirements.txt
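
**Note:** The `tfrecord` warning does not by itself indicate a problem: the import succeeded, but the package does not define `__version__`, so the check reports `unknown`. Run `pip show tfrecord` to see the installed version.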

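The check above reads `__version__`, which not every package defines. A sketch of an alternative that reads versions from installed package metadata via `importlib.metadata`; the import-to-distribution mapping `DIST_NAMES` (only `PIL` → `Pillow` here) is an assumption, and both helpers are hypothetical:

```python
from __future__ import annotations

from importlib import metadata

# Import name -> PyPI distribution name, where they differ (assumed mapping)
DIST_NAMES = {"PIL": "Pillow"}


def installed_version(import_name: str) -> str | None:
    """Return the installed distribution version, or None if it is not installed."""
    dist = DIST_NAMES.get(import_name, import_name)
    try:
        # Reads package metadata, so it works even without a __version__ attribute
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_matches(installed: str, expected: str) -> bool:
    """Compare only the components given in `expected`, ignoring local tags like '+cpu'."""
    expected_parts = expected.split(".")
    installed_parts = installed.split("+")[0].split(".")[: len(expected_parts)]
    return installed_parts == expected_parts


for name, expected in [("tfrecord", "1.14.6"), ("PIL", "10.3.0")]:
    found = installed_version(name)
    status = "✗" if found is None else ("✓" if version_matches(found, expected) else "⚠")
    print(f"{status} {name:15s} {found or 'NOT INSTALLED':15s} (expected: {expected})")
```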
## 5. Check PyTorch Configuration

In [4]:
import torch

print("PyTorch Configuration:")
print(f"  Version: {torch.__version__}")
print(f"  CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"  CUDA version: {torch.version.cuda}")
    print(f"  GPU count: {torch.cuda.device_count()}")
    print(f"  GPU name: {torch.cuda.get_device_name(0)}")
else:
    print("  Running on CPU (expected for local setup)")

print(f"\n  MPS (Apple Silicon) available: {torch.backends.mps.is_available()}")
if torch.backends.mps.is_available():
    print("  ✓ Can use Apple Silicon GPU acceleration")

PyTorch Configuration:
  Version: 2.8.0+cpu
  CUDA available: False
  Running on CPU (expected for local setup)

  MPS (Apple Silicon) available: False

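Later steps need a device chosen from the checks above. A sketch that keeps the selection logic free of a `torch` import so it is easy to test; the helper name and the CUDA → MPS → CPU priority are assumptions, not a confirmed project policy:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a torch device string: CUDA first, then Apple MPS, else CPU (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# On the machine above (no CUDA, no MPS) this selects "cpu"
print(pick_device(cuda_available=False, mps_available=False))
```

In a notebook it would be called as `torch.device(pick_device(torch.cuda.is_available(), torch.backends.mps.is_available()))`.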

## 6. System Information

In [5]:
import platform
import psutil
import os

print("System Information:")
print(f"  OS: {platform.system()} {platform.release()}")
print(f"  Platform: {platform.platform()}")
print(f"  Processor: {platform.processor()}")
print(f"  CPU cores: {psutil.cpu_count(logical=False)} physical, {psutil.cpu_count(logical=True)} logical")
print(f"  RAM: {psutil.virtual_memory().total / (1024**3):.1f} GB")
print(f"  Python: {platform.python_version()}")
print(f"\nWorking directory: {os.getcwd()}")

System Information:
  OS: Windows 11
  Platform: Windows-11-10.0.26100-SP0
  Processor: AMD64 Family 25 Model 117 Stepping 2, AuthenticAMD
  CPU cores: 8 physical, 16 logical
  RAM: 15.3 GB
  Python: 3.12.6

Working directory: C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\notebooks

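The RAM figure above divides total bytes by `1024**3`, so it is in gibibytes (GiB) even though the label says GB. A stdlib-only sketch of the same summary for environments without `psutil`; it omits RAM and physical cores, which the standard library does not report portably, and both helper names are hypothetical:

```python
from __future__ import annotations

import os
import platform


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (1 GiB = 1024**3 bytes), rounded to 0.1."""
    return round(num_bytes / 1024**3, 1)


def basic_system_info() -> dict[str, object]:
    """Collect OS and CPU details using only the standard library."""
    return {
        "os": f"{platform.system()} {platform.release()}",
        "platform": platform.platform(),
        "processor": platform.processor(),
        # Logical CPUs; os.cpu_count() may return None if the count is undetermined
        "logical_cpus": os.cpu_count(),
        "python": platform.python_version(),
    }


for key, value in basic_system_info().items():
    print(f"  {key}: {value}")
```

For example, `to_gib(16 * 10**9)` gives `14.9`, while `to_gib(16 * 1024**3)` gives `16.0`.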

## 7. Create Project Directories

In [6]:
from pathlib import Path

# Define directory structure
directories = [
    "../data/raw/cifar10",
    "../data/raw/imagenet-mini",
    "../data/built",
    "../runs",
    "../scripts",
]

print("Creating project directories:\n")
for dir_path in directories:
    path = Path(dir_path)
    path.mkdir(parents=True, exist_ok=True)
    print(f"  ✓ {path.resolve()}")

print("\n✓ Directory structure created successfully!")

Creating project directories:

  ✓ C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\data\raw\cifar10
  ✓ C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\data\raw\imagenet-mini
  ✓ C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\data\built
  ✓ C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\runs
  ✓ C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\scripts

✓ Directory structure created successfully!

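The `../` paths above assume the working directory is `notebooks/`, as the section 6 output shows. A sketch that anchors paths to the project root instead, found by walking up to the first directory containing `env/requirements.txt`; using that file as the marker is an assumption, and both helpers are hypothetical:

```python
from __future__ import annotations

from pathlib import Path


def find_project_root(start: str | Path = ".", marker: str = "env/requirements.txt") -> Path:
    """Walk up from `start` until a directory containing `marker` is found (hypothetical helper)."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker}")


def make_project_dirs(root: Path, relative_dirs: list[str]) -> list[Path]:
    """Create each directory under `root` (idempotently) and return the paths."""
    created = []
    for rel in relative_dirs:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Usage: `make_project_dirs(find_project_root(), ["data/raw/cifar10", "data/built", "runs"])` creates the same tree regardless of which subdirectory the notebook is launched from.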

## 8. Test Basic Functionality

In [7]:
import torch
import torchvision
import numpy as np
import pandas as pd
from PIL import Image

print("Testing basic functionality:\n")

# Test PyTorch
x = torch.randn(2, 3, 224, 224)
print(f"✓ PyTorch tensor creation: {x.shape}")

# Test torchvision transforms
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(224),
    torchvision.transforms.ToTensor(),
])
print(f"✓ Torchvision transforms: {transform}")

# Test numpy
arr = np.random.rand(10, 10)
print(f"✓ NumPy array creation: {arr.shape}")

# Test pandas
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(f"✓ Pandas DataFrame: {df.shape}")

# Test PIL
img = Image.new('RGB', (100, 100), color='red')
print(f"✓ PIL Image creation: {img.size}")

print("\n✓ All basic functionality tests passed!")

Testing basic functionality:

✓ PyTorch tensor creation: torch.Size([2, 3, 224, 224])
✓ Torchvision transforms: Compose(
    Resize(size=224, interpolation=bilinear, max_size=None, antialias=True)
    ToTensor()
)
✓ NumPy array creation: (10, 10)
✓ Pandas DataFrame: (3, 2)
✓ PIL Image creation: (100, 100)

✓ All basic functionality tests passed!

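The cell above only constructs the transform. Applied to the 100×100 test image, `Resize(224)` upsamples it to 224×224 and `ToTensor()` then yields a float tensor of shape `3 × 224 × 224` with values in `[0, 1]`. A stdlib sketch of the output-size rule for an integer `size`, following torchvision's documented behavior (the smaller edge is matched to `size`, aspect ratio kept); truncating the longer edge to an int mirrors the current implementation and should be treated as an assumption:

```python
def resize_output_size(height: int, width: int, size: int) -> tuple:
    """(height, width) produced by Resize(size) for an int size (hypothetical helper)."""
    short, long = (width, height) if width <= height else (height, width)
    # Smaller edge becomes `size`; the longer edge scales proportionally
    new_short, new_long = size, int(size * long / short)
    # Map back to (height, width) depending on which edge was shorter
    return (new_long, new_short) if width <= height else (new_short, new_long)


print(resize_output_size(100, 100, 224))  # square test image
print(resize_output_size(480, 640, 224))  # landscape image
```

With `torchvision` installed, `transform(img).shape` on the red test image reports `torch.Size([3, 224, 224])`.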

## 9. Environment Summary

In [8]:
import json
from datetime import datetime

env_info = {
    "timestamp": datetime.now().isoformat(),
    "environment": "local",
    "python_version": sys.version,
    "pytorch_version": torch.__version__,
    "cuda_available": torch.cuda.is_available(),
    "mps_available": torch.backends.mps.is_available(),
    "os": platform.system(),
    "cpu_count": psutil.cpu_count(logical=True),
    "ram_gb": round(psutil.virtual_memory().total / (1024**3), 1),
}

print("Environment Summary:")
print(json.dumps(env_info, indent=2))

# Save to file
env_file = Path("../runs/env_local.json")
env_file.parent.mkdir(parents=True, exist_ok=True)
env_file.write_text(json.dumps(env_info, indent=2))
print(f"\n✓ Environment info saved to: {env_file.resolve()}")

Environment Summary:
{
  "timestamp": "2025-10-07T13:12:24.606323",
  "environment": "local",
  "python_version": "3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]",
  "pytorch_version": "2.8.0+cpu",
  "cuda_available": false,
  "mps_available": false,
  "os": "Windows",
  "cpu_count": 16,
  "ram_gb": 15.3
}

✓ Environment info saved to: C:\Users\arjya\Fall 2025\Systems for ML\Project 1\SML\format-matters\runs\env_local.json

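Because the summary is plain JSON, it can later be compared against a summary captured in another environment (for example the Kaggle setup) to explain performance differences. A sketch; `load_env_info` and `diff_env` are hypothetical helpers, and this notebook does not define where a second summary would be saved:

```python
from __future__ import annotations

import json
from pathlib import Path

# Keys expected to differ between any two runs, skipped when comparing
VOLATILE_KEYS = frozenset({"timestamp"})


def load_env_info(path: str | Path) -> dict:
    """Read an environment summary written by the cell above."""
    return json.loads(Path(path).read_text())


def diff_env(a: dict, b: dict, ignore: frozenset[str] = VOLATILE_KEYS) -> dict[str, tuple]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = (a.keys() | b.keys()) - ignore
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

Usage: `diff_env(load_env_info("../runs/env_local.json"), other_summary)` lists only the fields that changed, such as `cuda_available` or `cpu_count`.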

## ✅ Setup Complete!

Your local environment is ready. Next steps:

1. **Prepare datasets**: Run `01_prepare_datasets.ipynb`
2. **Build formats**: Run builder notebooks (02-05)
3. **Run experiments**: Execute training notebooks (20-21)
4. **Analyze results**: Use analysis notebooks (30-31)

**Note**: For GPU-accelerated training, use Kaggle (see `00_env_setup_kaggle.ipynb`).