# Sign Language Translator - Google Colab Training Notebook

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NKEthio/Sign_Language_Translator/blob/main/Sign_Language_Translator_Colab.ipynb)

This notebook provides a complete end-to-end workflow for training an American Sign Language (ASL) alphabet classifier using PyTorch.

## What you'll learn:
- Setting up the environment in Google Colab
- Downloading and preparing the Sign Language MNIST dataset from Kaggle
- Training a classifier with pretrained backbones (ResNet, MobileNet, EfficientNet)
- Evaluating model performance
- Running inference on test images
- Saving and exporting trained models

**GPU Recommendation**: For faster training, enable a GPU via Runtime > Change runtime type > Hardware accelerator > GPU.

## 1. Environment Setup

First, let's clone the repository and install all dependencies.

In [None]:
# Clone the repository
!git clone https://github.com/NKEthio/Sign_Language_Translator.git
%cd Sign_Language_Translator

In [None]:
# Install the package and dependencies
!pip install -e . -q

In [None]:
# Verify installation and check GPU availability
import torch
import sys

print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# Import the package to verify it works
from sign_language_translator import (
    build_classifier,
    ModelConfig,
    build_train_transforms,
    build_eval_transforms,
    train_one_epoch,
    evaluate,
    set_seed,
)
print("\n✓ Package imported successfully!")

## 2. Download Dataset from Kaggle

We'll use the Sign Language MNIST dataset from Kaggle. To download it, you need to:

1. Get your Kaggle API credentials:
   - Go to https://www.kaggle.com/account
   - Click "Create New API Token" to download `kaggle.json`
   - Upload the file when prompted below

**Note**: The dataset contains 24 classes: the letters A-Z excluding J and Z, whose signs require motion and cannot be captured in a still image.
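
As a rough illustration of where the 24 classes come from, the sketch below maps integer labels to letters. It assumes the CSV files use labels 0-25 in alphabetical order with 9 (J) and 25 (Z) never occurring; `label_to_letter` is a hypothetical helper, not part of the `sign_language_translator` package, and the folder names produced by `slt-convert` may differ.

```python
import string

# Letters whose signs involve motion and therefore have no static images.
MOTION_LETTERS = {"J", "Z"}


def label_to_letter(label: int) -> str:
    """Map an integer label (0-25) to its letter, assuming alphabetical order."""
    if not 0 <= label <= 25:
        raise ValueError(f"label out of range: {label}")
    return string.ascii_uppercase[label]


# Labels expected to occur in the data under the assumption above.
static_labels = [i for i in range(26) if label_to_letter(i) not in MOTION_LETTERS]

print(len(static_labels))  # 24
print([label_to_letter(i) for i in static_labels])
```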

In [None]:
# Upload kaggle.json file
from google.colab import files
import os

print("Please upload your kaggle.json file:")
uploaded = files.upload()

# Setup kaggle credentials
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
print("✓ Kaggle credentials configured!")
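
The cell above assumes the uploaded file is named exactly `kaggle.json`; browsers sometimes rename repeated downloads (for example `kaggle (1).json`). Since `files.upload()` returns a dictionary mapping file names to their byte content, one possible workaround is to write the credentials from that dictionary. This is a minimal sketch: `install_kaggle_credentials` is a hypothetical helper, and the demonstration uses a fake upload and a temporary directory instead of Colab.

```python
import tempfile
from pathlib import Path


def install_kaggle_credentials(uploaded: dict, home: Path) -> Path:
    """Write the first uploaded file to <home>/.kaggle/kaggle.json with mode 600."""
    if not uploaded:
        raise ValueError("no file was uploaded")
    # Ignore the uploaded file name and keep only its content.
    _, content = next(iter(uploaded.items()))
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    target.write_bytes(content)
    target.chmod(0o600)  # the Kaggle CLI warns if the file is readable by others
    return target


# Demonstration with a fake upload and a temporary "home" directory.
fake_upload = {"kaggle (1).json": b'{"username": "example", "key": "not-a-real-key"}'}
with tempfile.TemporaryDirectory() as tmp:
    path = install_kaggle_credentials(fake_upload, Path(tmp))
    demo_name = path.name
    print(demo_name)
```

In Colab, the call would look like `install_kaggle_credentials(uploaded, Path.home())`.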

In [None]:
# Download the Sign Language MNIST dataset
!kaggle datasets download -d datamunge/sign-language-mnist
!unzip -q -o sign-language-mnist.zip -d sign_language_data  # -o overwrites without prompting, so the cell can be re-run
!ls -lh sign_language_data/
print("\n✓ Dataset downloaded!")

## 3. Convert CSV Data to Images

The dataset comes as CSV files. We'll convert them to images in ImageFolder format for easier training.

In [None]:
# Convert CSV to ImageFolder format
!slt-convert \
  --train_csv sign_language_data/sign_mnist_train.csv \
  --test_csv sign_language_data/sign_mnist_test.csv \
  --out_dir data/sign_mnist

print("\n✓ Data conversion complete!")
!ls -l data/sign_mnist/
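
To make the conversion step less opaque, here is a sketch of how one CSV row can be turned back into a 28x28 pixel grid. It assumes the columns are named `label`, `pixel1`, ..., `pixel784`; `row_to_image` is a hypothetical helper and is not necessarily how `slt-convert` is implemented. A synthetic row is used so the example runs without the dataset.

```python
import csv
import io

SIDE = 28  # assumed image side length: 784 pixel columns = 28 x 28


def row_to_image(row: dict) -> tuple:
    """Split one CSV row into (label, 28x28 nested list of pixel values)."""
    label = int(row["label"])
    flat = [int(row[f"pixel{i}"]) for i in range(1, SIDE * SIDE + 1)]
    # Cut the flat list into SIDE rows of SIDE pixels each (row-major order).
    grid = [flat[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return label, grid


# Build a tiny synthetic CSV with a single row.
header = ["label"] + [f"pixel{i}" for i in range(1, SIDE * SIDE + 1)]
values = ["3"] + [str(i % 256) for i in range(SIDE * SIDE)]
csv_text = ",".join(header) + "\n" + ",".join(values) + "\n"

reader = csv.DictReader(io.StringIO(csv_text))
label, grid = row_to_image(next(reader))
print(label, len(grid), len(grid[0]))
```

With NumPy and Pillow available, the nested list could be converted to a `uint8` array and saved as a PNG into the folder for its class.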

In [None]:
# Inspect the dataset structure
import os

train_dir = "data/sign_mnist/train"
test_dir = "data/sign_mnist/test"

classes = sorted(os.listdir(train_dir))
print(f"Number of classes: {len(classes)}")
print(f"Classes: {classes}")

# Count images per class
for cls in classes[:5]:  # Show first 5 classes
    train_count = len(os.listdir(os.path.join(train_dir, cls)))
    test_count = len(os.listdir(os.path.join(test_dir, cls))) if os.path.exists(os.path.join(test_dir, cls)) else 0
    print(f"Class '{cls}': {train_count} train, {test_count} test images")

## 4. Visualize Sample Images

Let's visualize some sample images from the dataset.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image
import random

# Show sample images from different classes
fig, axes = plt.subplots(3, 8, figsize=(16, 6))
fig.suptitle('Sample Sign Language Images', fontsize=16)

for idx, ax in enumerate(axes.flat):
    if idx < len(classes):
        cls = classes[idx]
        cls_dir = os.path.join(train_dir, cls)
        img_files = os.listdir(cls_dir)
        img_path = os.path.join(cls_dir, random.choice(img_files))
        img = Image.open(img_path)
        ax.imshow(img, cmap='gray')
        ax.set_title(f'Class: {cls}')
        ax.axis('off')
    else:
        ax.axis('off')

plt.tight_layout()
plt.show()

## 5. Training Configuration

Set up the training parameters. You can modify these based on your needs:

- **backbone**: Model architecture (`resnet18`, `resnet34`, `mobilenet_v3_small`, `mobilenet_v3_large`, `efficientnet_b0`)
- **epochs**: Number of training epochs
- **batch_size**: Batch size for training
- **learning_rate**: Learning rate for optimizer
- **image_size**: Input image size (default: 224)

In [None]:
# Training configuration
BACKBONE = "resnet18"  # Options: resnet18, resnet34, mobilenet_v3_small, mobilenet_v3_large, efficientnet_b0
EPOCHS = 10
BATCH_SIZE = 64
LEARNING_RATE = 1e-3
IMAGE_SIZE = 224
GRAYSCALE = True  # Sign MNIST is grayscale
OUTPUT_DIR = "artifacts"

print("Training Configuration:")
print(f"  Backbone: {BACKBONE}")
print(f"  Epochs: {EPOCHS}")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  Learning Rate: {LEARNING_RATE}")
print(f"  Image Size: {IMAGE_SIZE}")
print(f"  Grayscale: {GRAYSCALE}")
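
To get a feel for how `BATCH_SIZE` and `EPOCHS` translate into optimizer steps, the arithmetic can be sketched as follows. The training-set size of 27,455 is the figure listed on the dataset's Kaggle page and is an assumption here; `batches_per_epoch` is a hypothetical helper that mirrors how a PyTorch `DataLoader` counts batches.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


NUM_TRAIN = 27_455  # assumed training-set size, not checked by this notebook
steps = batches_per_epoch(NUM_TRAIN, batch_size=64)
total_steps = steps * 10  # 10 epochs, matching the defaults above

print(steps, total_steps)  # 429 4290
```

Halving the batch size roughly doubles the number of steps per epoch, which is worth keeping in mind when comparing runs with different settings.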

## 6. Train the Model

Now let's train the model using the CLI tool or custom Python code.

### Option A: Using CLI Tool (Recommended)

In [None]:
# Train using the CLI tool
# Note: --grayscale is hardcoded below; remove that line if you set GRAYSCALE = False
!slt-train \
  --data_dir data/sign_mnist/train \
  --val_dir data/sign_mnist/test \
  --backbone {BACKBONE} \
  --epochs {EPOCHS} \
  --batch_size {BATCH_SIZE} \
  --lr {LEARNING_RATE} \
  --image_size {IMAGE_SIZE} \
  --grayscale \
  --output_dir {OUTPUT_DIR}

### Option B: Custom Training Loop (for more control)

If you want more control over the training process, you can use the Python API directly:

In [None]:
# Custom training loop (alternative to CLI)
# Uncomment to use this instead of the CLI tool

# from pathlib import Path
# from torch.utils.data import DataLoader
# from torch import nn
# from sign_language_translator.datasets import ImageFolderConfig, build_imagefolder_dataset

# # Set random seed
# set_seed(42)

# # Setup device
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# print(f"Using device: {device}")

# # Create transforms
# image_size = (IMAGE_SIZE, IMAGE_SIZE)
# train_tfms = build_train_transforms(image_size=image_size, grayscale=GRAYSCALE)
# val_tfms = build_eval_transforms(image_size=image_size, grayscale=GRAYSCALE)

# # Create datasets
# train_ds = build_imagefolder_dataset(
#     ImageFolderConfig(root=Path("data/sign_mnist/train"), image_size=image_size, grayscale=GRAYSCALE),
#     transform=train_tfms,
# )
# val_ds = build_imagefolder_dataset(
#     ImageFolderConfig(root=Path("data/sign_mnist/test"), image_size=image_size, grayscale=GRAYSCALE),
#     transform=val_tfms,
# )

# print(f"Training samples: {len(train_ds)}")
# print(f"Validation samples: {len(val_ds)}")
# print(f"Number of classes: {len(train_ds.classes)}")

# # Create data loaders
# train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
# val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)

# # Build model
# config = ModelConfig(
#     backbone=BACKBONE,
#     num_classes=len(train_ds.classes),
#     pretrained=True,
#     dropout=0.2,
#     in_channels=1 if GRAYSCALE else 3,
# )
# model = build_classifier(config)
# model.to(device)

# print(f"\nModel: {BACKBONE}")
# print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

# # Setup training
# criterion = nn.CrossEntropyLoss()
# optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE, weight_decay=1e-4)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

# # Training loop
# best_val_acc = 0.0
# output_dir = Path(OUTPUT_DIR)
# output_dir.mkdir(parents=True, exist_ok=True)

# print("\nStarting training...")
# for epoch in range(1, EPOCHS + 1):
#     train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
#     val_loss, val_acc = evaluate(model, val_loader, criterion, device)
#     scheduler.step()
    
#     print(f"Epoch {epoch:03d}/{EPOCHS} | "
#           f"train_loss={train_loss:.4f} train_acc={train_acc:.4f} | "
#           f"val_loss={val_loss:.4f} val_acc={val_acc:.4f}")
    
#     # Save best model
#     if val_acc > best_val_acc:
#         best_val_acc = val_acc
#         best_path = output_dir / "model_best.pt"
#         torch.save({
#             "epoch": epoch,
#             "model_state": model.state_dict(),
#             "classes": train_ds.classes,
#             "backbone": BACKBONE,
#             "grayscale": GRAYSCALE,
#             "image_size": image_size,
#         }, best_path)
#         print(f"  ✓ Saved new best model (val_acc={val_acc:.4f})")

# print(f"\nTraining complete! Best validation accuracy: {best_val_acc:.4f}")
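
Option B pairs AdamW with `CosineAnnealingLR` and calls `scheduler.step()` once per epoch. The sketch below evaluates the closed-form cosine schedule from the PyTorch documentation in plain Python to show which learning rate each epoch would use. `cosine_lr` is a hypothetical helper, and the values assume `eta_min=0` (the scheduler's default) with the learning rate and epoch count from the configuration cell.

```python
import math


def cosine_lr(step: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


BASE_LR = 1e-3
T_MAX = 10

# Epoch k (1-based) trains with the value at step k - 1, because
# scheduler.step() is called at the end of each epoch.
schedule = [cosine_lr(t, BASE_LR, T_MAX) for t in range(T_MAX)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:02d}: lr={lr:.6f}")
```

The schedule starts at the full learning rate and decays smoothly, so most of the large updates happen in the first half of training.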

## 7. Evaluate the Trained Model

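Let's check the model's performance on the test set. Note that this same split served as the validation set during training (`--val_dir`), so if the best checkpoint was selected on it, the reported accuracy is likely somewhat optimistic.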

In [None]:
# Load the best model
import torch
from pathlib import Path

model_path = Path(OUTPUT_DIR) / "model_best.pt"
if model_path.exists():
    checkpoint = torch.load(model_path, map_location='cpu')
    print("Model checkpoint loaded:")
    print(f"  Epoch: {checkpoint.get('epoch', 'N/A')}")
    print(f"  Backbone: {checkpoint['backbone']}")
    print(f"  Classes: {len(checkpoint['classes'])}")
    print(f"  Image size: {checkpoint['image_size']}")
    print(f"  Grayscale: {checkpoint['grayscale']}")
else:
    print(f"Model not found at {model_path}")

In [None]:
# Create confusion matrix and classification report
from sign_language_translator.datasets import ImageFolderConfig, build_imagefolder_dataset
from torch.utils.data import DataLoader
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# Reload model for evaluation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load(model_path, map_location=device)

# Rebuild model
config = ModelConfig(
    backbone=checkpoint['backbone'],
    num_classes=len(checkpoint['classes']),
    pretrained=False,
    dropout=0.2,
    in_channels=1 if checkpoint['grayscale'] else 3,
)
model = build_classifier(config)
model.load_state_dict(checkpoint['model_state'])
model.to(device)
model.eval()

# Load test data
val_tfms = build_eval_transforms(image_size=checkpoint['image_size'], grayscale=checkpoint['grayscale'])
val_ds = build_imagefolder_dataset(
    ImageFolderConfig(root=Path("data/sign_mnist/test"), image_size=checkpoint['image_size'], grayscale=checkpoint['grayscale']),
    transform=val_tfms,
)
val_loader = DataLoader(val_ds, batch_size=64, shuffle=False, num_workers=2)

# Collect predictions
all_preds = []
all_labels = []

print("Evaluating model...")
with torch.no_grad():
    for images, labels in val_loader:
        images = images.to(device)
        outputs = model(images)
        preds = outputs.argmax(dim=1).cpu().numpy()
        all_preds.extend(preds)
        all_labels.extend(labels.numpy())

# Classification report
print("\nClassification Report:")
print(classification_report(all_labels, all_preds, target_names=checkpoint['classes']))

# Confusion matrix
cm = confusion_matrix(all_labels, all_preds)
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=checkpoint['classes'], yticklabels=checkpoint['classes'])
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()
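
A 24x24 heatmap can be hard to scan, so it may help to pull the key numbers out of the confusion matrix directly. The sketch below computes per-class recall and the most frequent confusions from a matrix whose rows are true labels and whose columns are predictions (the convention scikit-learn uses). `per_class_recall` and `most_confused` are hypothetical helpers, shown on a toy 3-class matrix.

```python
def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def most_confused(cm, labels, top=3):
    """Largest off-diagonal cells as (count, true_label, predicted_label)."""
    pairs = [
        (cm[i][j], labels[i], labels[j])
        for i in range(len(cm))
        for j in range(len(cm))
        if i != j and cm[i][j] > 0
    ]
    return sorted(pairs, reverse=True)[:top]


# Toy matrix: rows are true classes, columns are predicted classes.
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]
toy_labels = ["A", "B", "C"]

recalls = per_class_recall(toy_cm)
confusions = most_confused(toy_cm, toy_labels)
print(recalls)     # [0.8, 0.9, 0.7]
print(confusions)  # [(3, 'C', 'B'), (2, 'A', 'B'), (1, 'B', 'A')]
```

With the notebook's variables, the equivalent call would be `most_confused(cm.tolist(), checkpoint['classes'])`.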

## 8. Test Inference on Sample Images

Let's test the model on some sample images from the test set.

In [None]:
# Test inference on random samples
import random
from PIL import Image

# Get random test samples
num_samples = 8
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle('Model Predictions on Test Samples', fontsize=16)

# Create transform
transform = build_eval_transforms(image_size=checkpoint['image_size'], grayscale=checkpoint['grayscale'])

for idx, ax in enumerate(axes.flat):
    if idx < num_samples:
        # Get random image
        cls = random.choice(checkpoint['classes'])
        cls_dir = os.path.join("data/sign_mnist/test", cls)
        if os.path.exists(cls_dir):
            img_files = os.listdir(cls_dir)
            img_path = os.path.join(cls_dir, random.choice(img_files))
            
            # Load and predict
            img = Image.open(img_path)
            img_tensor = transform(img).unsqueeze(0).to(device)
            
            with torch.no_grad():
                output = model(img_tensor)
                probs = torch.softmax(output, dim=1)
                pred_idx = output.argmax(dim=1).item()
                confidence = probs[0, pred_idx].item()
            
            pred_class = checkpoint['classes'][pred_idx]
            
            # Display
            ax.imshow(img, cmap='gray')
            color = 'green' if pred_class == cls else 'red'
            ax.set_title(f'True: {cls}\nPred: {pred_class} ({confidence:.2%})', color=color)
            ax.axis('off')
    else:
        ax.axis('off')

plt.tight_layout()
plt.show()
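
The cell above reports only the top prediction and its softmax confidence. When a prediction is wrong, the runner-up classes are often informative, so a top-k view can be useful. The sketch below shows the idea in plain Python on made-up logits; `softmax` and `top_k` are hypothetical helpers, not part of the package.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, classes, k=3):
    """Return the k most probable (class, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for four classes.
demo_classes = ["A", "B", "C", "D"]
demo_logits = [1.0, 3.0, 0.5, 2.0]

ranking = top_k(demo_logits, demo_classes, k=2)
for name, prob in ranking:
    print(f"{name}: {prob:.2%}")
```

With a PyTorch tensor, `torch.topk(probs, k)` gives the same ranking without leaving the GPU.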

## 9. Using the Inference CLI Tool

You can also use the command-line inference tool:

In [None]:
# Test inference using CLI tool
# First, get a sample image path
import os
import random

# Get a random test image
test_class = random.choice(checkpoint['classes'])
test_class_dir = f"data/sign_mnist/test/{test_class}"
if os.path.exists(test_class_dir):
    test_image = os.path.join(test_class_dir, random.choice(os.listdir(test_class_dir)))
    print(f"Testing on image: {test_image}")
    print(f"True class: {test_class}\n")
    
    !slt-infer --model {OUTPUT_DIR}/model_best.pt --image {test_image}

## 10. Export and Download Model

Download the trained model to use it locally or in other applications.

In [None]:
# Download the trained model
from google.colab import files

print("Downloading trained model...")
if os.path.exists(f"{OUTPUT_DIR}/model_best.pt"):
    files.download(f"{OUTPUT_DIR}/model_best.pt")
    print("✓ Model downloaded!")
else:
    print("Model file not found!")

### Optional: Export to ONNX Format

For deployment in production environments, you can export the model to ONNX format:

In [None]:
# Export to ONNX (optional)
import torch.onnx

# Create dummy input
dummy_input = torch.randn(1, 1 if checkpoint['grayscale'] else 3, 
                          checkpoint['image_size'][0], checkpoint['image_size'][1]).to(device)

# Export
onnx_path = f"{OUTPUT_DIR}/model.onnx"
torch.onnx.export(
    model,
    dummy_input,
    onnx_path,
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)

print(f"✓ Model exported to ONNX: {onnx_path}")
print(f"File size: {os.path.getsize(onnx_path) / 1024 / 1024:.2f} MB")

# Download ONNX model
files.download(onnx_path)

## 11. Next Steps and Tips

### Improving Model Performance

1. **Try different backbones**: EfficientNet and MobileNet variants may give better results
2. **Increase epochs**: Train for 20-30 epochs for better convergence
3. **Learning rate tuning**: Try different learning rates (1e-4, 3e-4, 1e-3)
4. **Data augmentation**: The training transforms include rotation, scaling, and other augmentations
5. **Model ensembling**: Train multiple models and average their predictions
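
The ensembling tip can be sketched as averaging the per-class probabilities of several models and taking the class with the highest mean. The example is illustrative only: `ensemble_predict` is a hypothetical helper, and the three probability vectors are made up rather than produced by trained models.

```python
def ensemble_predict(prob_sets, classes):
    """Average per-class probabilities across models and pick the best class."""
    n_models = len(prob_sets)
    avg = [sum(p[i] for p in prob_sets) / n_models for i in range(len(classes))]
    best = max(range(len(classes)), key=avg.__getitem__)
    return classes[best], avg


ens_classes = ["A", "B", "C"]
# One softmax output per model for the same input image (made-up values).
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
]

ens_pred, ens_avg = ensemble_predict(model_probs, ens_classes)
print(ens_pred, [round(p, 3) for p in ens_avg])
```

Here the first model alone would predict "A", but the averaged probabilities favour "B", which is the kind of disagreement ensembling is meant to smooth out.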

### Using the Model Locally

1. Download the `model_best.pt` file
2. Clone the repository on your local machine
3. Install dependencies: `pip install -e .`
4. Run inference:
   ```bash
   slt-infer --model model_best.pt --image your_image.png
   # or for webcam
   slt-infer --model model_best.pt --webcam
   ```

### Custom Dataset

To train on your own sign language dataset:
1. Organize images in ImageFolder format: `data/train/<CLASS>/*.png`
2. Adjust `--grayscale` flag based on your images (omit for RGB)
3. Modify `--image_size` if needed
4. Run training with your custom data directory
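
Before training on a custom dataset, it can be worth checking that the folder layout matches the ImageFolder convention and that no class is empty. The sketch below counts image files per class directory. `count_images_per_class` and the suffix list are assumptions for illustration, and the demonstration builds a throwaway directory tree rather than reading real data.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed set of accepted extensions


def count_images_per_class(root: Path) -> dict:
    """Map each class sub-directory of root to its number of image files."""
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Build a throwaway layout: train/A/{0,1}.png and train/B/0.png
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train"
    for cls, n in {"A": 2, "B": 1}.items():
        (root / cls).mkdir(parents=True)
        for i in range(n):
            (root / cls / f"{i}.png").touch()
    (root / "notes.txt").touch()  # stray files at the top level are ignored
    demo_counts = count_images_per_class(root)

print(demo_counts)  # {'A': 2, 'B': 1}
```

A class with a count of zero, or a very uneven distribution, is usually worth fixing before starting a long training run.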

### Resources

- **Repository**: https://github.com/NKEthio/Sign_Language_Translator
- **Dataset**: https://www.kaggle.com/datamunge/sign-language-mnist
- **PyTorch Documentation**: https://pytorch.org/docs/
- **Torchvision Models**: https://pytorch.org/vision/stable/models.html

---

**Happy Training! 🚀**