# Action Recognition From Still Images Using Deep Learning Networks

Action recognition, the ability to identify and categorize human actions from visual data, is a
long-standing challenge in computer vision. Traditionally, this task has been tackled using video
footage, where the temporal information provided by consecutive frames allows for a more robust
understanding of the action's dynamics. Recent advances in deep learning, however, have made it
possible to recognize actions from still images as well, even in challenging conditions.

Indeed, everyday human actions like "climbing," "fishing," or "phoning" can be described effectively
in still images. Furthermore, certain actions captured in videos, such as "taking photos," are
inherently static and may require recognition methods based solely on static cues. Motivated by
the potential implications of recognizing actions in still images and the comparative neglect of this
problem in computer vision, this assignment addresses the recognition of human actions from a
single photograph.

For this project, the accompanying Stanford 40 Action Dataset is split into a training set and a
test set. It comprises images of people performing 40 different actions. For each image, we
provide a bounding box surrounding the person performing the action, as indicated by the image's
filename. The dataset contains 9532 images in total, with 180-300 images per action category. The
dataset is attached to this file for your convenience.

## 1. Data Loader to read the training and testing sets from the Stanford 40 dataset

In [1]:
import torch
from tqdm import tqdm


# Use the GPU when available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
print(f"Using device: {device_name}")


Using device: NVIDIA GeForce RTX 3060 Laptop GPU


In [2]:
import StanfordDataLoader as DL
from torchvision import transforms

# Define transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Create an instance of StanfordDataLoader
data_loader = DL.StanfordDataLoader(base_dir="./Stanford40")

# Create DataLoaders with bounding boxes
train_loader, test_loader = data_loader.create_dataloaders(transform, batch_size=32)

# Print dataset statistics
print(f"Dataset with boxes (training): {len(train_loader)} batches")
print(f"Dataset with boxes (testing): {len(test_loader)} batches")


Dataset with boxes (training): 125 batches
Dataset with boxes (testing): 173 batches
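
The batch counts above can be checked against the dataset size. As a rough sketch, assuming the standard Stanford 40 split of 100 training images per class (4000 training images, which leaves 5532 of the 9532 images for testing) and assuming the loaders keep the final partial batch, the number of batches is the ceiling of the set size divided by the batch size. The helper `num_batches` is an illustrative name and is not part of `StanfordDataLoader`.

```python
import math


def num_batches(num_images: int, batch_size: int) -> int:
    """Number of batches when the final partial batch is kept."""
    return math.ceil(num_images / batch_size)


TOTAL_IMAGES = 9532        # stated in the dataset description
TRAIN_IMAGES = 40 * 100    # assumption: 100 training images per class
TEST_IMAGES = TOTAL_IMAGES - TRAIN_IMAGES

print(num_batches(TRAIN_IMAGES, 32))  # 125
print(num_batches(TEST_IMAGES, 32))   # 173
```

Both values agree with the printed batch counts, which is consistent with the assumed split.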


## 2. Custom CNN

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
from StanfordCNN import CustomCNN, train_model, evaluate_model

# Define the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the model
num_classes = 40
model = CustomCNN(num_classes=num_classes).to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_model(model, train_loader, criterion, optimizer, device, num_epochs=15)

# Evaluate the model
evaluate_model(model, test_loader, device)


Epoch 1/15: 100%|██████████| 125/125 [00:19<00:00,  6.37it/s]


Epoch [1/15], Loss: 6.1416


Epoch 2/15: 100%|██████████| 125/125 [00:18<00:00,  6.58it/s]


Epoch [2/15], Loss: 3.6895


Epoch 3/15: 100%|██████████| 125/125 [00:18<00:00,  6.78it/s]


Epoch [3/15], Loss: 3.6932


Epoch 4/15: 100%|██████████| 125/125 [00:18<00:00,  6.61it/s]


Epoch [4/15], Loss: 3.6895


Epoch 5/15: 100%|██████████| 125/125 [00:20<00:00,  6.21it/s]


Epoch [5/15], Loss: 3.6923


Epoch 6/15: 100%|██████████| 125/125 [00:20<00:00,  6.05it/s]


Epoch [6/15], Loss: 3.6894


Epoch 7/15: 100%|██████████| 125/125 [00:20<00:00,  5.96it/s]


Epoch [7/15], Loss: 3.6894


Epoch 8/15: 100%|██████████| 125/125 [00:21<00:00,  5.91it/s]


Epoch [8/15], Loss: 3.6895


Epoch 9/15: 100%|██████████| 125/125 [00:21<00:00,  5.86it/s]


Epoch [9/15], Loss: 3.6894


Epoch 10/15: 100%|██████████| 125/125 [00:20<00:00,  6.09it/s]


Epoch [10/15], Loss: 3.6894


Epoch 11/15: 100%|██████████| 125/125 [00:20<00:00,  5.97it/s]


Epoch [11/15], Loss: 3.6894


Epoch 12/15: 100%|██████████| 125/125 [00:20<00:00,  6.14it/s]


Epoch [12/15], Loss: 3.6894


Epoch 13/15: 100%|██████████| 125/125 [00:19<00:00,  6.41it/s]


Epoch [13/15], Loss: 3.6894


Epoch 14/15: 100%|██████████| 125/125 [00:19<00:00,  6.36it/s]


Epoch [14/15], Loss: 3.6893


Epoch 15/15: 100%|██████████| 125/125 [00:19<00:00,  6.34it/s]


Epoch [15/15], Loss: 3.6894
Test Accuracy: 3.49%
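
The training loss settles at about 3.689 from the second epoch onward. One way to read this, offered as an interpretation rather than a diagnosis, is that a classifier assigning equal probability to all 40 classes has a cross-entropy of ln(40) and an expected accuracy of about 2.5% when classes are balanced, so the network appears to have stopped distinguishing between classes. The short check below computes both reference values; the function names are illustrative.

```python
import math


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)


def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy, in percent, of random guessing with balanced classes."""
    return 100.0 / num_classes


print(f"{uniform_cross_entropy(40):.4f}")  # 3.6889
print(f"{chance_accuracy(40):.2f}%")       # 2.50%
```

The reported loss of 3.6894 and test accuracy of 3.49% are both close to these reference values.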


In [4]:
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR
from tqdm import tqdm
from StanfordCNN import CustomResNet, evaluate_model

# Instantiate the model
num_classes = 40
model = CustomResNet(num_classes=num_classes).to(device)

# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Training loop
def train_model(model, train_loader, criterion, optimizer, scheduler, device, num_epochs=20):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}"):
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        scheduler.step()
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

# Train with the new loop; evaluate_model is reused unchanged
train_model(model, train_loader, criterion, optimizer, scheduler, device)
evaluate_model(model, test_loader, device)


Epoch 1/20: 100%|██████████| 125/125 [00:20<00:00,  6.04it/s]


Epoch [1/20], Loss: 4.0045


Epoch 2/20: 100%|██████████| 125/125 [00:20<00:00,  6.22it/s]


Epoch [2/20], Loss: 3.6695


Epoch 3/20: 100%|██████████| 125/125 [00:19<00:00,  6.43it/s]


Epoch [3/20], Loss: 3.5845


Epoch 4/20: 100%|██████████| 125/125 [00:19<00:00,  6.45it/s]


Epoch [4/20], Loss: 3.5187


Epoch 5/20: 100%|██████████| 125/125 [00:20<00:00,  5.95it/s]


Epoch [5/20], Loss: 3.4378


Epoch 6/20: 100%|██████████| 125/125 [00:21<00:00,  5.75it/s]


Epoch [6/20], Loss: 3.3823


Epoch 7/20: 100%|██████████| 125/125 [00:22<00:00,  5.62it/s]


Epoch [7/20], Loss: 3.3015


Epoch 8/20: 100%|██████████| 125/125 [00:21<00:00,  5.93it/s]


Epoch [8/20], Loss: 3.2486


Epoch 9/20: 100%|██████████| 125/125 [00:22<00:00,  5.67it/s]


Epoch [9/20], Loss: 3.1470


Epoch 10/20: 100%|██████████| 125/125 [00:22<00:00,  5.62it/s]


Epoch [10/20], Loss: 3.0536


Epoch 11/20: 100%|██████████| 125/125 [00:23<00:00,  5.38it/s]


Epoch [11/20], Loss: 2.8143


Epoch 12/20: 100%|██████████| 125/125 [00:21<00:00,  5.83it/s]


Epoch [12/20], Loss: 2.6888


Epoch 13/20: 100%|██████████| 125/125 [00:21<00:00,  5.86it/s]


Epoch [13/20], Loss: 2.6348


Epoch 14/20: 100%|██████████| 125/125 [00:19<00:00,  6.30it/s]


Epoch [14/20], Loss: 2.5710


Epoch 15/20: 100%|██████████| 125/125 [00:19<00:00,  6.43it/s]


Epoch [15/20], Loss: 2.5144


Epoch 16/20: 100%|██████████| 125/125 [00:19<00:00,  6.30it/s]


Epoch [16/20], Loss: 2.4540


Epoch 17/20: 100%|██████████| 125/125 [00:19<00:00,  6.34it/s]


Epoch [17/20], Loss: 2.3800


Epoch 18/20: 100%|██████████| 125/125 [00:19<00:00,  6.39it/s]


Epoch [18/20], Loss: 2.3321


Epoch 19/20: 100%|██████████| 125/125 [00:19<00:00,  6.26it/s]


Epoch [19/20], Loss: 2.2269


Epoch 20/20: 100%|██████████| 125/125 [00:19<00:00,  6.39it/s]


Epoch [20/20], Loss: 2.1415
Test Accuracy: 22.90%
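
The drop in loss between epochs 10 and 11 (3.0536 to 2.8143) coincides with the learning-rate step configured in `StepLR(optimizer, step_size=10, gamma=0.1)`. As a minimal sketch, the schedule can be reproduced without PyTorch using the closed form lr = base_lr * gamma^floor(epoch / step_size), where `epoch` counts completed calls to `scheduler.step()`. The name `step_lr` is illustrative and is not a PyTorch function.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate after `epoch` completed scheduler steps under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Values taken from the cell above: lr=0.1, step_size=10, gamma=0.1, 20 epochs.
for epoch in range(20):
    lr = step_lr(0.1, epoch, step_size=10, gamma=0.1)
    print(f"epoch {epoch + 1:2d}: lr = {lr:.3f}")
```

With these settings the first ten epochs run at a learning rate of 0.1 and the last ten at 0.01.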


## 3. Pre-trained Deep Learning Models

### 3.1 ResNet

In [4]:
import torch
import torch.nn as nn
import torch.optim as optim
from StanfordResNet import CustomResNet, train_model, evaluate_model

# Initialize the model
model = CustomResNet(num_classes=40).to(device)

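# Define the loss function and optimizer.
# Only the parameters of the final fully connected layer are passed to the
# optimizer, so it never updates the rest of the network.
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.resnet.fc.parameters(), lr=0.001)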

# Train the model
train_model(model, train_loader, criterion, optimizer, device, num_epochs=15)

# Evaluate the model
evaluate_model(model, test_loader, device)

Epoch 1/15: 100%|██████████| 125/125 [00:14<00:00,  8.74it/s]


Epoch [1/15], Loss: 3.1313


Epoch 2/15: 100%|██████████| 125/125 [00:14<00:00,  8.75it/s]


Epoch [2/15], Loss: 2.1530


Epoch 3/15: 100%|██████████| 125/125 [00:14<00:00,  8.48it/s]


Epoch [3/15], Loss: 1.7688


Epoch 4/15: 100%|██████████| 125/125 [00:15<00:00,  8.22it/s]


Epoch [4/15], Loss: 1.5554


Epoch 5/15: 100%|██████████| 125/125 [00:14<00:00,  8.38it/s]


Epoch [5/15], Loss: 1.4056


Epoch 6/15: 100%|██████████| 125/125 [00:15<00:00,  8.01it/s]


Epoch [6/15], Loss: 1.3006


Epoch 7/15: 100%|██████████| 125/125 [00:14<00:00,  8.74it/s]


Epoch [7/15], Loss: 1.2109


Epoch 8/15: 100%|██████████| 125/125 [00:14<00:00,  8.78it/s]


Epoch [8/15], Loss: 1.1285


Epoch 9/15: 100%|██████████| 125/125 [00:14<00:00,  8.67it/s]


Epoch [9/15], Loss: 1.0679


Epoch 10/15: 100%|██████████| 125/125 [00:14<00:00,  8.54it/s]


Epoch [10/15], Loss: 1.0248


Epoch 11/15: 100%|██████████| 125/125 [00:14<00:00,  8.50it/s]


Epoch [11/15], Loss: 0.9877


Epoch 12/15: 100%|██████████| 125/125 [00:13<00:00,  8.94it/s]


Epoch [12/15], Loss: 0.9328


Epoch 13/15: 100%|██████████| 125/125 [00:14<00:00,  8.92it/s]


Epoch [13/15], Loss: 0.9047


Epoch 14/15: 100%|██████████| 125/125 [00:14<00:00,  8.69it/s]


Epoch [14/15], Loss: 0.8696


Epoch 15/15: 100%|██████████| 125/125 [00:14<00:00,  8.56it/s]


Epoch [15/15], Loss: 0.8406
Test Accuracy: 56.53%
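
Because only `model.resnet.fc.parameters()` is passed to the optimizer, the optimizer updates just the final linear layer. As a rough illustration of how small that layer is, the count below assumes the wrapped network is a torchvision ResNet-18 or ResNet-50, whose final layers take 512 and 2048 input features respectively. The notebook does not say which variant `CustomResNet` wraps, and `linear_param_count` is an illustrative helper.

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


NUM_CLASSES = 40  # from the notebook

# Assumed backbones; the notebook does not state which ResNet is wrapped.
for name, in_features in [("ResNet-18", 512), ("ResNet-50", 2048)]:
    print(name, linear_param_count(in_features, NUM_CLASSES))
```

Under either assumption the trained layer has well under 100,000 parameters, a small fraction of the full network.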


### 3.2 GoogLeNet

In [8]:
import torch.nn as nn
import torch.optim as optim
from StanfordGoogleNet import GooglenetModel, train_model, evaluate_model

# Instantiate the model
model = GooglenetModel(num_classes=40).to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_model(model, train_loader, criterion, optimizer, device, num_epochs=7)

# Evaluate the model
evaluate_model(model, test_loader, device)


Epoch [1/7], Loss: 2.5721
Epoch [2/7], Loss: 1.5681
Epoch [3/7], Loss: 1.0757
Epoch [4/7], Loss: 0.7325
Epoch [5/7], Loss: 0.4391
Epoch [6/7], Loss: 0.3168
Epoch [7/7], Loss: 0.2326
Model accuracy on the test set: 52.71%


52.711496746203906
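
The final training loss and the test accuracy are measured on different scales, which makes the gap between them easy to overlook. One hedged way to relate them: a mean cross-entropy L corresponds to a geometric-mean probability of exp(-L) assigned to the correct class on the training data. The sketch below applies this to the reported loss of 0.2326, assuming the printed value is the mean cross-entropy over the training set; `mean_correct_probability` is an illustrative name.

```python
import math


def mean_correct_probability(mean_cross_entropy: float) -> float:
    """Geometric-mean probability given to the true class, from mean cross-entropy in nats."""
    return math.exp(-mean_cross_entropy)


final_train_loss = 0.2326  # Epoch [7/7] loss reported above
print(f"{mean_correct_probability(final_train_loss):.2f}")  # 0.79
```

A value near 0.79 on the training set against 52.71% accuracy on the test set suggests that the model fits the training images considerably better than it generalizes, although the two quantities are not directly comparable.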

### 3.3 VGG

In [3]:
import torch.optim as optim
import torch.nn as nn
from StanfordVGG import VGGModel, train_model, evaluate_model

model = VGGModel(num_classes=40).to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_model(model, train_loader, criterion, optimizer, device, num_epochs=7)

# Evaluate the model
evaluate_model(model, test_loader, device)


Epoch 1/7: 100%|██████████| 125/125 [01:10<00:00,  1.78it/s]


Epoch [1/7], Loss: 3.7248


Epoch 2/7: 100%|██████████| 125/125 [01:07<00:00,  1.84it/s]


Epoch [2/7], Loss: 3.6868


Epoch 3/7: 100%|██████████| 125/125 [01:07<00:00,  1.86it/s]


Epoch [3/7], Loss: 3.6963


Epoch 4/7: 100%|██████████| 125/125 [01:06<00:00,  1.88it/s]


Epoch [4/7], Loss: 3.7826


Epoch 5/7: 100%|██████████| 125/125 [01:08<00:00,  1.84it/s]


Epoch [5/7], Loss: 3.6956


Epoch 6/7: 100%|██████████| 125/125 [01:08<00:00,  1.82it/s]


Epoch [6/7], Loss: 3.6938


Epoch 7/7: 100%|██████████| 125/125 [01:08<00:00,  1.83it/s]


Epoch [7/7], Loss: 3.6930
Model accuracy on the test set: 3.40%


### 3.4 MobileNet

In [6]:
import torch
import torch.optim as optim
from StanfordMobileNet import MobileNet, train_model, evaluate_model

# Instantiate the model
model = MobileNet(num_classes=40).to(device)

# Define loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_model(model, train_loader, criterion, optimizer, device, num_epochs=10)

# Evaluate the model
evaluate_model(model, test_loader, device)


Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /home/user/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|██████████| 13.6M/13.6M [00:00<00:00, 22.7MB/s]


Epoch [1/10], Loss: 2.6651
Epoch [2/10], Loss: 1.7730
Epoch [3/10], Loss: 1.3718
Epoch [4/10], Loss: 1.0172
Epoch [5/10], Loss: 0.7689
Epoch [6/10], Loss: 0.5749
Epoch [7/10], Loss: 0.4864
Epoch [8/10], Loss: 0.4763
Epoch [9/10], Loss: 0.3495
Epoch [10/10], Loss: 0.3612
Model accuracy on test set: 50.89%


50.88575560375994
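
To compare the runs side by side, the reported test accuracies can be collected and ranked. The snippet below only restates the numbers printed in the cells above. The labels are shorthand chosen here, and 2.5% is the accuracy of random guessing over 40 balanced classes, used purely as a reference point.

```python
# Test accuracies (percent) as printed by the cells above.
reported_accuracy = {
    "Custom CNN": 3.49,
    "Custom ResNet (from StanfordCNN)": 22.90,
    "Pre-trained ResNet": 56.53,
    "GoogLeNet": 52.71,
    "VGG": 3.40,
    "MobileNet": 50.89,
}

CHANCE = 100.0 / 40  # random guessing over 40 balanced classes

# Sort from best to worst and show each result relative to chance.
ranked = sorted(reported_accuracy.items(), key=lambda item: item[1], reverse=True)
for name, accuracy in ranked:
    print(f"{name:<34} {accuracy:6.2f}%  ({accuracy / CHANCE:.1f}x chance)")
```

Note that the runs used different numbers of epochs and optimizer settings, so the ranking is a summary of these particular experiments rather than a controlled comparison of the architectures.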