# Week 14: CNN Lab - Rock, Paper, Scissors

**Objective:** Build, train, and test a Convolutional Neural Network (CNN) to classify images of hands playing Rock, Paper, or Scissors.

### Step 1: Setup and Data Download

The cells in this step install `kagglehub`, download the dataset from Kaggle, and copy the three class folders into a local `dataset` directory.

In [2]:
%pip install kagglehub


In [3]:
import kagglehub

path = kagglehub.dataset_download("drgfreeman/rockpaperscissors")

print("Path to dataset files:", path)


Downloading from https://www.kaggle.com/api/v1/datasets/download/drgfreeman/rockpaperscissors?dataset_version_number=2...


100%|██████████| 306M/306M [00:30<00:00, 10.7MB/s] 

Extracting files...





Path to dataset files: C:\Users\diyap\.cache\kagglehub\datasets\drgfreeman\rockpaperscissors\versions\2


In [5]:
import kagglehub
import shutil
import os

# 1. Download the dataset and capture the actual local path
# This ensures you point to where the files actually are on YOUR computer
path = kagglehub.dataset_download("drgfreeman/rockpaperscissors")
print("Dataset downloaded to:", path)

# 2. Inspect the downloaded folder to find the correct source root
# Sometimes datasets have nested folders (e.g., the images might be inside a subfolder)
print("Contents of download:", os.listdir(path))

# --- UPDATE THIS BASED ON THE PRINT ABOVE ---
# If 'rock', 'paper', 'scissors' are directly in the list printed above, use:
src_root = path
# If they are inside a subfolder (like 'rps-cv-images'), use:
# src_root = os.path.join(path, "rps-cv-images") 

dst_root = "dataset" # Relative path: the copy lands next to the notebook, wherever it is run

os.makedirs(dst_root, exist_ok=True)

folders_to_copy = ["rock", "paper", "scissors"]

for folder in folders_to_copy:
    src_path = os.path.join(src_root, folder)
    dst_path = os.path.join(dst_root, folder)

    if os.path.exists(src_path):
        shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
        print(f"Successfully copied: {folder}")
    else:
        print(f"Folder not found: {folder} (Looked at: {src_path})")

Dataset downloaded to: C:\Users\diyap\.cache\kagglehub\datasets\drgfreeman\rockpaperscissors\versions\2
Contents of download: ['paper', 'README_rpc-cv-images.txt', 'rock', 'rps-cv-images', 'scissors']
Successfully copied: rock
Successfully copied: paper
Successfully copied: scissors
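
To confirm the copy worked before moving on, a quick per-class file count can help. The sketch below is not part of the lab: `count_images_per_class` and `IMAGE_EXTENSIONS` are hypothetical names, and it assumes the images use `.png`, `.jpg`, or `.jpeg` extensions. Copying only the three class folders also matters because `ImageFolder` treats each subfolder of its root as a class, so pointing it at the raw download would likely pick up `rps-cv-images` as a fourth class.

```python
import os

# Hypothetical helper (not part of the lab): count image files per class folder.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # assumed extensions

def count_images_per_class(root):
    """Return {class_folder: image_count} for each subfolder of root."""
    counts = {}
    if not os.path.isdir(root):
        return counts  # nothing has been copied yet
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            counts[name] = sum(
                1 for f in os.listdir(folder)
                if f.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts

print(count_images_per_class("dataset"))
```

If the folders contain only images, the three counts should add up to the total that Step 3 reports.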


### Step 2: Imports and Device Setup

Import the necessary libraries and check if a GPU is available.

In [7]:
%pip install torch torchvision



In [8]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from PIL import Image
import numpy as np

# Use the GPU (CUDA) when one is available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Using device:", device)

Using device: cpu


### Step 3: Data Loading and Preprocessing

Here we will define our image transformations, load the dataset, split it, and create DataLoaders.
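
The transform in the cell below applies `ToTensor` and then `Normalize` with a mean and standard deviation of 0.5 per channel. As a plain-Python sketch of that arithmetic (the helper names `normalize` and `denormalize` are illustrative and not part of torchvision), the formula `(value - mean) / std` rescales pixel values from [0, 1] to [-1, 1]:

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel formula applied by transforms.Normalize: (value - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying a normalized image."""
    return value * std + mean

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

Black (0.0) maps to -1.0, mid-gray (0.5) to 0.0, and white (1.0) to 1.0, so the inputs are centered around zero.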

In [10]:
# Assumes the imports from Step 2 have already been run

# DATA_DIR must match dst_root from Step 1 (the folder the class folders were copied into)
DATA_DIR = "dataset"

# ---------------------------------------------------------
# 1. Define Transforms
# ---------------------------------------------------------
transform = transforms.Compose([
    transforms.Resize((128, 128)),      # Resize every image to 128x128
    transforms.ToTensor(),              # Convert to a float tensor with values in [0, 1]
    # Normalize each RGB channel: (input - mean) / std
    # With mean = std = 0.5 per channel, values are rescaled from [0, 1] to [-1, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load dataset
full_dataset = datasets.ImageFolder(DATA_DIR, transform=transform)
class_names = full_dataset.classes
print("Classes found:", class_names)

# ---------------------------------------------------------
# 2. Split the Dataset
# ---------------------------------------------------------
# Calculate lengths (must be integers)
total_count = len(full_dataset)
train_count = int(0.8 * total_count)
test_count = total_count - train_count  # Use subtraction to ensure no images are left out

# Use random_split
train_dataset, test_dataset = random_split(full_dataset, [train_count, test_count])

# ---------------------------------------------------------
# 3. Create DataLoaders
# ---------------------------------------------------------
# DataLoaders handle the batching and shuffling logic
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# ---------------------------------------------------------
# Verification
# ---------------------------------------------------------
print(f"Total images: {len(full_dataset)}")
print(f"Training images: {len(train_dataset)}")
print(f"Test images: {len(test_dataset)}")

# Optional: Check the shape of one batch to ensure it works
data_iter = iter(train_loader)
images, labels = next(data_iter)
print(f"Batch shape: {images.shape}") 
# Expected: torch.Size([32, 3, 128, 128]) -> [Batch_Size, Channels, Height, Width]

Classes found: ['paper', 'rock', 'scissors']
Total images: 2188
Training images: 1750
Test images: 438
Batch shape: torch.Size([32, 3, 128, 128])
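
The counts printed above follow directly from the 80/20 split and the batch size of 32. The sketch below reproduces that arithmetic in plain Python; `split_counts` and `batches_per_epoch` are hypothetical helper names, and the batch count assumes the `DataLoader` default of keeping the last, smaller batch.

```python
import math

def split_counts(total, train_fraction=0.8):
    """Mirror the lab's split: truncate the training count, give the rest to test."""
    train = int(train_fraction * total)  # int() truncates 1750.4 to 1750
    return train, total - train

def batches_per_epoch(n_samples, batch_size=32):
    """Number of batches when the final partial batch is kept (drop_last=False)."""
    return math.ceil(n_samples / batch_size)

train_count, test_count = split_counts(2188)
print(train_count, test_count)         # 1750 438
print(batches_per_epoch(train_count))  # 55
print(batches_per_epoch(test_count))   # 14
```

Because `random_split` is called without a seed, which images land in each split changes from run to run. Passing a seeded `generator` argument to `random_split` is one way to make the split repeatable.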


### Step 4: Define the CNN Model

Fill in the `conv_block` and `fc` blocks with the correct layers.
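
When choosing layer sizes, it helps to trace how the spatial size changes. A convolution or pooling layer with input size n, kernel k, padding p, and stride s produces floor((n + 2p - k) / s) + 1 outputs per dimension. The sketch below applies that formula to three conv + pool stages like the ones in the next cell; `conv2d_out` and `maxpool_out` are hypothetical helper names, not PyTorch functions.

```python
def conv2d_out(size, kernel_size=3, stride=1, padding=1):
    """Output height/width of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

def maxpool_out(size, kernel_size=2):
    """Output height/width of max pooling; stride equals kernel_size by default."""
    return (size - kernel_size) // kernel_size + 1

size = 128
channels = 3
for out_channels in (16, 32, 64):
    size = maxpool_out(conv2d_out(size))  # 3x3 conv with padding=1 keeps size; pool halves it
    channels = out_channels
    print(f"after block: ({channels}, {size}, {size})")

flattened = channels * size * size
print("flattened features:", flattened)  # 16384
```

The final value is the input size the first `nn.Linear` layer has to accept.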

In [11]:
class RPS_CNN(nn.Module):
    def __init__(self):
        super(RPS_CNN, self).__init__()

        # ---------------------------------------------------------
        # 1. Convolutional Block
        # ---------------------------------------------------------
        # Input: (3, 128, 128) -> Output: (64, 16, 16)
        self.conv_block = nn.Sequential(
            # Block 1
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2), # Image becomes 64x64
            
            # Block 2
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2), # Image becomes 32x32
            
            # Block 3
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Image becomes 16x16
        )

        # ---------------------------------------------------------
        # 2. Fully Connected Block
        # ---------------------------------------------------------
        # The flattened size is 64 channels * 16 height * 16 width = 16,384
        self.fc = nn.Sequential(
            nn.Flatten(), # Flattens each sample from (64, 16, 16) to a vector of 16,384 values
            nn.Linear(64 * 16 * 16, 256),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(256, 3) # Output layer: 3 classes (Rock, Paper, Scissors)
        )

    def forward(self, x):
        x = self.conv_block(x)
        x = self.fc(x)
        return x

# ---------------------------------------------------------
# 3. Initialization
# ---------------------------------------------------------
# Create the model and move it to GPU if available
model = RPS_CNN().to(device)

# Loss Function
# CrossEntropyLoss combines LogSoftmax and NLLLoss in one single class.
# It is the standard loss function for multi-class classification.
criterion = nn.CrossEntropyLoss()

# Optimizer
# Adam adapts the step size of each parameter using running estimates of its gradients.
# lr=0.001 sets the base learning rate.
optimizer = optim.Adam(model.parameters(), lr=0.001)

print(model)

RPS_CNN(
  (conv_block): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
    (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=16384, out_features=256, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=3, bias=True)
  )
)
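
The printed architecture is enough to work out how many trainable parameters the model has. The following sketch does the count by hand; `conv2d_params` and `linear_params` are hypothetical helpers, and the layer labels are made up for readability.

```python
def conv2d_params(in_channels, out_channels, kernel_size=3):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features

layer_params = {
    "conv1": conv2d_params(3, 16),
    "conv2": conv2d_params(16, 32),
    "conv3": conv2d_params(32, 64),
    "fc1": linear_params(64 * 16 * 16, 256),
    "fc2": linear_params(256, 3),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"total: {total_params:,}")  # total: 4,218,915
```

More than 99% of the parameters sit in the first fully connected layer. In the notebook itself, `sum(p.numel() for p in model.parameters())` should report the same total.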


### Step 5: Train the Model

Fill in the core training steps inside the loop.

In [12]:
EPOCHS = 10

for epoch in range(EPOCHS):
    model.train() # Set the model to training mode
    total_loss = 0

    for images, labels in train_loader:
        # Move data to the correct device
        images, labels = images.to(device), labels.to(device)

        # -----------------------------------------------------
        # THE TRAINING STEPS
        # -----------------------------------------------------
        
        # 1. Clear the gradients
        # PyTorch accumulates gradients by default. We must reset them 
        # before calculating new ones for this batch.
        optimizer.zero_grad()

        # 2. Forward pass
        # Pass the images through the model to get predictions
        outputs = model(images)

        # 3. Calculate the loss
        # Compare the model's 'outputs' against the actual 'labels'
        loss = criterion(outputs, labels)

        # 4. Backward pass (Backpropagation)
        # Calculate the gradient of the loss with respect to model parameters
        loss.backward()

        # 5. Update the weights
        # Adjust the weights based on the gradients calculated above
        optimizer.step()

        # -----------------------------------------------------

        total_loss += loss.item()

    print(f"Epoch {epoch+1}/{EPOCHS}, Loss = {total_loss/len(train_loader):.4f}")

print("Training complete!")

Epoch 1/10, Loss = 0.6361
Epoch 2/10, Loss = 0.1332
Epoch 3/10, Loss = 0.0693
Epoch 4/10, Loss = 0.0462
Epoch 5/10, Loss = 0.0277
Epoch 6/10, Loss = 0.0163
Epoch 7/10, Loss = 0.0101
Epoch 8/10, Loss = 0.0260
Epoch 9/10, Loss = 0.0067
Epoch 10/10, Loss = 0.0209
Training complete!
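
To put these loss values in context, it helps to see what cross-entropy computes for a single sample: the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below is illustrative only (`cross_entropy` is a hypothetical helper and the logits are made up); `nn.CrossEntropyLoss` does the equivalent on tensors and, by default, averages over the batch.

```python
import math

def cross_entropy(logits, label):
    """Loss for one sample: -log(softmax(logits)[label])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

# A model with no preference between 3 classes scores ln(3)
print(round(cross_entropy([0.0, 0.0, 0.0], 1), 4))  # 1.0986

# A confident, correct prediction gives a loss close to 0
print(round(cross_entropy([0.0, 5.0, 0.0], 1), 4))

# A confident, wrong prediction is penalized heavily
print(round(cross_entropy([0.0, 5.0, 0.0], 0), 4))
```

The epoch 1 average of 0.6361 is already below the uniform-guess value of about 1.0986, and the later epochs are close to zero on the training data.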


### Step 6: Evaluate the Model

Test the model's accuracy on the unseen test set.

In [13]:
model.eval() # Set the model to evaluation mode
correct = 0
total = 0

# 1. Disable gradient calculation
# This saves memory and computation since we aren't updating weights
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)

        # 2. Get raw model outputs (logits)
        outputs = model(images)

        # 3. Get the predicted class
        # torch.max returns a tuple: (max_value, index_of_max_value)
        # We only care about the index (which corresponds to class 0, 1, or 2)
        _, predicted = torch.max(outputs, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")

Test Accuracy: 97.26%
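
On 438 test images, 97.26% corresponds to 426 correct predictions and 12 errors. A single accuracy figure does not show which classes get confused, so a confusion matrix is a common next step. The sketch below builds one in plain Python from toy labels invented for illustration; `confusion_matrix` and `per_class_accuracy` are hypothetical helpers. In the evaluation loop, the label lists could be collected with `labels.cpu().tolist()` and `predicted.cpu().tolist()`.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy labels, invented for illustration (0 = paper, 1 = rock, 2 = scissors)
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 0, 1, 2, 2, 2]

matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
print(matrix)                      # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(per_class_accuracy(matrix))  # [1.0, 0.5, 1.0]
```

In this toy example, one rock image is mistaken for scissors, which the off-diagonal entry `matrix[1][2]` makes visible.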


### Step 7: Test on a Single Image

Let's see how the model performs on one image.

In [14]:
def predict_image(model, img_path):
    model.eval()

    img = Image.open(img_path).convert("RGB")
    
    # Apply the same transforms as training
    # unsqueeze(0) adds a "batch" dimension: [3, 128, 128] -> [1, 3, 128, 128]
    img = transform(img).unsqueeze(0).to(device)

    with torch.no_grad():
        # -----------------------------------------------------
        # 1. Get the raw model outputs
        output = model(img)

        # 2. Get the predicted class index
        _, pred = torch.max(output, 1)
        # -----------------------------------------------------

    return class_names[pred.item()]


# ---------------------------------------------------------
# Test with a real file from your local dataset
# ---------------------------------------------------------
import os
import random

# Use the first image listed in the 'paper' folder as a quick check.
# Note: this file may have been part of the training split, so this is a
# sanity check rather than a measure of generalization.
test_folder = os.path.join(DATA_DIR, "paper")

# List the files only if the folder exists
if os.path.exists(test_folder):
    files = os.listdir(test_folder)
    if files:
        # Pick the first file returned by os.listdir
        first_file = files[0]
        test_img_path = os.path.join(test_folder, first_file)
        
        print(f"Testing image: {test_img_path}")
        prediction = predict_image(model, test_img_path)
        print(f"Model prediction: {prediction}")
    else:
        print("Paper folder is empty.")
else:
    print(f"Could not find folder: {test_folder}. Check your DATA_DIR variable.")

Testing image: dataset\paper\04l5I8TqdzF9WDMJ.png
Model prediction: paper
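
`predict_image` returns only the class name. To see how confident the model is, the raw outputs (logits) can be converted to probabilities with a softmax. The sketch below shows the idea in plain Python with made-up logits; `softmax` and `top_prediction` are hypothetical helpers. Inside the notebook, `torch.softmax(output, dim=1)` performs the same conversion on the model output.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, class_names):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Made-up logits for illustration; real ones would come from model(img)
name, confidence = top_prediction([2.0, -1.0, 0.5], ["paper", "rock", "scissors"])
print(f"{name} ({confidence:.1%})")
```

A prediction with a low top probability is a good candidate for manual inspection.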


### Step 8: Play the Game!

This cell is complete. Once your model is trained, run it to classify two randomly chosen images and decide the winner from the model's predictions.

In [15]:
import random
import os

# Ensure this matches your local folder name
DATA_DIR = "dataset" 

def pick_random_image(class_name):
    # Build the class folder path from DATA_DIR
    folder = os.path.join(DATA_DIR, class_name)
    
    if not os.path.exists(folder):
        print(f"Error: Folder not found {folder}")
        return None
        
    files = os.listdir(folder)
    img = random.choice(files)
    return os.path.join(folder, img)

def rps_winner(move1, move2):
    if move1 == move2:
        return "It's a Draw!"

    rules = {
        "rock": "scissors",
        "paper": "rock",
        "scissors": "paper"
    }

    if rules[move1] == move2:
        return f"Player 1 wins! ({move1} beats {move2})"
    else:
        return f"Player 2 wins! ({move2} beats {move1})"


# -----------------------------------------------------------
# 1. Choose any two random classes (The "Truth")
# -----------------------------------------------------------
choices = ["rock", "paper", "scissors"]
c1 = random.choice(choices)
c2 = random.choice(choices)

# Pick actual image files from your drive
img1_path = pick_random_image(c1)
img2_path = pick_random_image(c2)

print(f"Player 1 selected a random '{c1}' image.")
print(f"Player 2 selected a random '{c2}' image.")


# -----------------------------------------------------------
# 2. Predict their labels using the model (The "AI Vision")
# -----------------------------------------------------------
# The model doesn't know c1/c2; it has to look at the pixels to guess!

p1 = predict_image(model, img1_path)
p2 = predict_image(model, img2_path)

print(f"\nAI sees Player 1 as: {p1}")
print(f"AI sees Player 2 as: {p2}")

# -----------------------------------------------------------
# 3. Decide the winner
# -----------------------------------------------------------

print("\n--- RESULT ---")
print(rps_winner(p1, p2))

Player 1 selected a random 'scissors' image.
Player 2 selected a random 'rock' image.

AI sees Player 1 as: scissors
AI sees Player 2 as: rock

--- RESULT ---
Player 2 wins! (rock beats scissors)
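
The winner logic is small enough to check exhaustively over all nine move pairs. The sketch below uses a simplified stand-in for `rps_winner` that returns a short label instead of a sentence; `BEATS` and `outcome` are hypothetical names.

```python
from itertools import product

# Each key beats its value (same rules as the game cell)
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def outcome(move1, move2):
    """Simplified stand-in for rps_winner: 'draw', 'player1', or 'player2'."""
    if move1 == move2:
        return "draw"
    return "player1" if BEATS[move1] == move2 else "player2"

# Evaluate every ordered pair of moves and tally the outcomes
results = {pair: outcome(*pair) for pair in product(BEATS, repeat=2)}
tally = {}
for result in results.values():
    tally[result] = tally.get(result, 0) + 1

print(sorted(tally.items()))  # [('draw', 3), ('player1', 3), ('player2', 3)]
```

Since both moves are drawn independently and uniformly, about one game in three ends in a draw. The result is decided from the model's predictions rather than the true classes, so a misclassified image can change who wins.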
