# Action Recognition From Still Images Using Deep Learning Networks

Action recognition, the ability to identify and categorize human actions from visual data, has been
a long-standing challenge in computer vision. Traditionally, this task has been tackled
using video footage, where the temporal information provided by consecutive frames allows for a
more robust understanding of the action's dynamics. Recent advances in deep learning, in particular
convolutional networks pretrained on large image datasets, have made it practical to recognize actions
from still images as well, because a single frame often contains enough pose, object, and scene cues
to identify the action.

Indeed, everyday human actions like "climbing," "fishing," or "phoning" can be described effectively
in still images. Furthermore, certain actions captured in videos, such as "taking photos,"
are inherently static and may require recognition methods based solely on static cues. Motivated by
the potential applications of recognizing actions in still images and the comparative neglect of this
problem in computer vision, this assignment addresses the recognition of human actions from
a single photograph.

For this project, the accompanying dataset is split into a training set and a test set covering
40 distinct action categories. The Stanford 40 Action Dataset contains images of people performing
40 different actions. For each image, a bounding box is provided around the person performing the
action indicated by the image's filename. The dataset comprises 9532 images in total, with 180-300
images per action category. The dataset is attached to this file for your convenience.
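Because each image's action label is encoded in its filename and each person box comes from a separate annotation, a loader has to recover both. The sketch below assumes filenames of the form `<action>_<index>.jpg` (for example `blowing_bubbles_012.jpg`) and a Pascal VOC-style XML annotation with `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>` tags. Both the helper names and the XML layout are assumptions for illustration, not the interface of the `StanfordDataLoader` module used below.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def label_from_filename(filename: str) -> str:
    """Hypothetical helper: 'blowing_bubbles_012.jpg' -> 'blowing_bubbles'."""
    stem = Path(filename).stem        # drop the extension: 'blowing_bubbles_012'
    return stem.rsplit("_", 1)[0]     # split only on the last underscore to drop the index


def boxes_from_xml(xml_text: str) -> list[tuple[int, ...]]:
    """Hypothetical helper: (xmin, ymin, xmax, ymax) for every <bndbox> in an assumed VOC-style file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for bb in root.iter("bndbox"):
        boxes.append(tuple(int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes


# Made-up annotation used only to exercise the parser (assumed layout, not copied from the dataset).
sample_xml = """
<annotation>
  <filename>blowing_bubbles_012.jpg</filename>
  <object>
    <bndbox><xmin>35</xmin><ymin>20</ymin><xmax>210</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

print(label_from_filename("blowing_bubbles_012.jpg"))  # blowing_bubbles
print(boxes_from_xml(sample_xml))                      # [(35, 20, 210, 300)]
```

A returned box can be passed directly to PIL's `Image.crop((xmin, ymin, xmax, ymax))` to cut out the person region before resizing.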

## 1. Data Loader to Read the Training and Test Sets from the Stanford 40 Dataset

The first cell selects the device used to train and evaluate the models: a CUDA GPU if one is available, otherwise the CPU. It then prints the name of the selected device. Using a GPU matters here because the convolutions and matrix multiplications that dominate deep network training and inference run many times faster on a GPU than on a CPU.
Notes:

- GPU Utilization: Ensure that an NVIDIA driver and a CUDA-enabled build of PyTorch are installed; otherwise PyTorch will not detect the GPU.
- Portability: Because the device is chosen at runtime, the same code runs unchanged on machines with or without a GPU.

In [1]:
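```python
import torch
from Utils import *  # project helper module; presumably provides train_model and evaluate_model

# Use the first CUDA GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
print(f"Using device: {device_name}")
```

```
Using device: NVIDIA GeForce RTX 3060 Laptop GPU
```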
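If the cell above reports `CPU` on a machine that has an NVIDIA GPU, the usual causes are a CPU-only PyTorch build or a driver problem. The following minimal sketch gathers the facts that determine whether PyTorch can see a GPU, using only standard `torch` attributes; the helper name `describe_torch_setup` is hypothetical.

```python
import torch


def describe_torch_setup() -> dict:
    """Hypothetical helper: collect the facts that decide whether PyTorch can use a CUDA GPU."""
    info = {
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,            # None for a CPU-only build of PyTorch
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),   # 0 when no usable GPU is found
    }
    info["device_names"] = [torch.cuda.get_device_name(i) for i in range(info["device_count"])]
    return info


for key, value in describe_torch_setup().items():
    print(f"{key}: {value}")
```

A `cuda_build` of `None` points to reinstalling PyTorch with CUDA support; a CUDA build with `cuda_available: False` points to the driver.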


In [2]:
```python
import StanfordDataLoader as DL
from torchvision import transforms

# Resize every image to 224x224 (the resolution used by standard ImageNet CNNs) and convert to a tensor.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

data_loader = DL.StanfordDataLoader(base_dir="./Stanford40")
train_loader, test_loader = data_loader.create_dataloaders(transform, batch_size=32)

print(f"Dataset with boxes (training): {len(train_loader)} batches")
print(f"Dataset with boxes (testing): {len(test_loader)} batches")
```

```
Dataset with boxes (training): 125 batches
Dataset with boxes (testing): 173 batches
```
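The batch counts printed above follow from `len(DataLoader) = ceil(N / batch_size)` when the last, partial batch is kept (PyTorch's default, `drop_last=False`). Assuming the official Stanford 40 split of 4,000 training images (100 per class) and 5,532 test images, which together give the 9,532 images stated earlier, the numbers check out. The helper `num_batches` is a hypothetical illustration.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch for a map-style dataset, mirroring DataLoader's len()."""
    if drop_last:
        return num_samples // batch_size        # the incomplete final batch is discarded
    return math.ceil(num_samples / batch_size)  # the final batch may be smaller


TRAIN_IMAGES = 4000  # assumed: official split, 100 images per class
TEST_IMAGES = 5532   # assumed: the remaining images of the 9,532 total
BATCH_SIZE = 32

print(num_batches(TRAIN_IMAGES, BATCH_SIZE))  # 125
print(num_batches(TEST_IMAGES, BATCH_SIZE))   # 173 (the last batch holds 5532 - 172 * 32 = 28 images)
```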


## 2. Custom CNN

In [None]:
```python
import torch.nn as nn
import torch.optim as optim
from ModelsCNN import CustomCNN

model = CustomCNN(num_classes=40).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=5)
evaluate_model(model, test_loader, device)
```

```
Epoch 1/5: 100%|██████████| 125/125 [00:19<00:00,  6.57it/s]
Epoch [1/5], Loss: 7.0626
Epoch 2/5: 100%|██████████| 125/125 [00:18<00:00,  6.68it/s]
Epoch [2/5], Loss: 3.6897
Epoch 3/5: 100%|██████████| 125/125 [00:18<00:00,  6.64it/s]
Epoch [3/5], Loss: 3.6895
Epoch 4/5: 100%|██████████| 125/125 [00:19<00:00,  6.46it/s]
Epoch [4/5], Loss: 3.6899
Epoch 5/5: 100%|██████████| 125/125 [00:19<00:00,  6.43it/s]
Epoch [5/5], Loss: 3.6882
Test Accuracy: 3.42%
```
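From epoch 2 onward the loss stays at about 3.689, which is the cross-entropy of a model that assigns the same probability, 1/40, to every class: -ln(1/40) = ln 40 ≈ 3.689. Together with a test accuracy (3.42%) close to the 2.5% chance level, this suggests the network collapsed to a near-constant prediction rather than learning slowly. The diagnostic sketch below, which is not part of the original notebook, reproduces that value with uniform logits.

```python
import math

import torch
import torch.nn.functional as F

NUM_CLASSES = 40
logits = torch.zeros(8, NUM_CLASSES)           # identical scores for every class = uniform softmax
labels = torch.randint(0, NUM_CLASSES, (8,))   # the loss is the same whatever the labels are
loss = F.cross_entropy(logits, labels)

print(f"uniform-prediction loss: {loss.item():.4f}")            # 3.6889
print(f"ln(40):                  {math.log(NUM_CLASSES):.4f}")  # 3.6889
print(f"chance accuracy:         {1 / NUM_CLASSES:.2%}")        # 2.50%
```

A training loss stuck at this value is often a cue to revisit the learning rate, the weight initialization, or the output layer before training for more epochs.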


In [7]:
```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR
from tqdm import tqdm
from ModelsCNN import CustomResNet

model = CustomResNet(num_classes=40).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)  # multiply the LR by 0.1 every 10 epochs

def train_model_custom(model, train_loader, criterion, optimizer, scheduler, device, num_epochs=20):
    """Train for num_epochs, stepping the LR scheduler once per epoch."""
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}"):
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        scheduler.step()
        # Report the mean batch loss for this epoch.
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

train_model_custom(model, train_loader, criterion, optimizer, scheduler, device)
evaluate_model(model, test_loader, device)
```

```
Epoch 1/20: 100%|██████████| 125/125 [00:19<00:00,  6.26it/s]
Epoch [1/20], Loss: 3.9963
Epoch 2/20: 100%|██████████| 125/125 [00:19<00:00,  6.29it/s]
Epoch [2/20], Loss: 3.6929
Epoch 3/20: 100%|██████████| 125/125 [00:20<00:00,  6.18it/s]
Epoch [3/20], Loss: 3.6637
Epoch 4/20: 100%|██████████| 125/125 [00:19<00:00,  6.29it/s]
Epoch [4/20], Loss: 3.6218
Epoch 5/20: 100%|██████████| 125/125 [00:19<00:00,  6.26it/s]
Epoch [5/20], Loss: 3.5845
Epoch 6/20: 100%|██████████| 125/125 [00:19<00:00,  6.29it/s]
Epoch [6/20], Loss: 3.5390
Epoch 7/20: 100%|██████████| 125/125 [00:19<00:00,  6.27it/s]
Epoch [7/20], Loss: 3.4437
Epoch 8/20: 100%|██████████| 125/125 [00:20<00:00,  6.24it/s]
Epoch [8/20], Loss: 3.3231
Epoch 9/20: 100%|██████████| 125/125 [00:20<00:00,  6.16it/s]
Epoch [9/20], Loss: 3.2598
Epoch 10/20: 100%|██████████| 125/125 [00:20<00:00,  6.21it/s]
Epoch [10/20], Loss: 3.1544
Epoch 11/20: 100%|██████████| 125/125 [00:19<00:00,  6.26it/s]
Epoch [11/20], Loss: 2.9098
Epoch 12/20: 100%|██████████| 125/125 [00:20<00:00,  6.24it/s]
Epoch [12/20], Loss: 2.7732
Epoch 13/20: 100%|██████████| 125/125 [00:19<00:00,  6.31it/s]
Epoch [13/20], Loss: 2.7027
Epoch 14/20: 100%|██████████| 125/125 [00:20<00:00,  6.25it/s]
Epoch [14/20], Loss: 2.6524
Epoch 15/20: 100%|██████████| 125/125 [00:19<00:00,  6.32it/s]
Epoch [15/20], Loss: 2.5655
Epoch 16/20: 100%|██████████| 125/125 [00:19<00:00,  6.30it/s]
Epoch [16/20], Loss: 2.5062
Epoch 17/20: 100%|██████████| 125/125 [00:20<00:00,  6.24it/s]
Epoch [17/20], Loss: 2.4278
Epoch 18/20: 100%|██████████| 125/125 [00:20<00:00,  6.19it/s]
Epoch [18/20], Loss: 2.3511
Epoch 19/20: 100%|██████████| 125/125 [00:19<00:00,  6.28it/s]
Epoch [19/20], Loss: 2.2447
Epoch 20/20: 100%|██████████| 125/125 [00:19<00:00,  6.26it/s]
Epoch [20/20], Loss: 2.1361
Test Accuracy: 23.25%
```
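`StepLR(step_size=10, gamma=0.1)` multiplies the learning rate by 0.1 every 10 calls to `scheduler.step()`, and `train_model_custom` calls it once per epoch. The learning rate is therefore 0.1 for epochs 1-10 and 0.01 for epochs 11-20, which lines up with the larger loss drop between epochs 10 and 11 (3.1544 to 2.9098). The sketch below traces the schedule with a dummy parameter instead of the real model.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

# A single dummy parameter is enough to drive the optimizer/scheduler pair.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = SGD([param], lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(20):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used during this epoch
    optimizer.step()   # no gradients here, so the parameter is left unchanged
    scheduler.step()   # called once per epoch, as in train_model_custom

for epoch, lr in enumerate(lrs, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.3g}")
```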


## 3. Pre-trained Deep Learning Models

### 3.1 ResNet

In [8]:
```python
from ModelsCNN import ResNetModel

model = ResNetModel(num_classes=40).to(device)
criterion = nn.CrossEntropyLoss()
# Only the final fully connected layer is passed to the optimizer, so only the new head is updated.
optimizer = optim.Adam(model.resnet.fc.parameters(), lr=0.001)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=15)
evaluate_model(model, test_loader, device)
```

```
Epoch 1/15: 100%|██████████| 125/125 [00:13<00:00,  8.94it/s]
Epoch [1/15], Loss: 3.1078
Epoch 2/15: 100%|██████████| 125/125 [00:13<00:00,  9.04it/s]
Epoch [2/15], Loss: 2.1413
Epoch 3/15: 100%|██████████| 125/125 [00:13<00:00,  9.01it/s]
Epoch [3/15], Loss: 1.7520
Epoch 4/15: 100%|██████████| 125/125 [00:13<00:00,  9.02it/s]
Epoch [4/15], Loss: 1.5197
Epoch 5/15: 100%|██████████| 125/125 [00:13<00:00,  9.07it/s]
Epoch [5/15], Loss: 1.3782
Epoch 6/15: 100%|██████████| 125/125 [00:13<00:00,  9.17it/s]
Epoch [6/15], Loss: 1.2872
Epoch 7/15: 100%|██████████| 125/125 [00:13<00:00,  9.09it/s]
Epoch [7/15], Loss: 1.2022
Epoch 8/15: 100%|██████████| 125/125 [00:13<00:00,  9.12it/s]
Epoch [8/15], Loss: 1.1325
Epoch 9/15: 100%|██████████| 125/125 [00:13<00:00,  9.17it/s]
Epoch [9/15], Loss: 1.0775
Epoch 10/15: 100%|██████████| 125/125 [00:13<00:00,  9.09it/s]
Epoch [10/15], Loss: 1.0162
Epoch 11/15: 100%|██████████| 125/125 [00:13<00:00,  9.12it/s]
Epoch [11/15], Loss: 0.9808
Epoch 12/15: 100%|██████████| 125/125 [00:14<00:00,  8.78it/s]
Epoch [12/15], Loss: 0.9302
Epoch 13/15: 100%|██████████| 125/125 [00:13<00:00,  9.39it/s]
Epoch [13/15], Loss: 0.9047
Epoch 14/15: 100%|██████████| 125/125 [00:13<00:00,  9.40it/s]
Epoch [14/15], Loss: 0.8769
Epoch 15/15: 100%|██████████| 125/125 [00:13<00:00,  9.39it/s]
Epoch [15/15], Loss: 0.8289
Test Accuracy: 57.59%
```


### 3.2 GoogLeNet

In [9]:
```python
import torch.optim as optim
from ModelsCNN import GooglenetModel
import torch.nn as nn

model = GooglenetModel(num_classes=40).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=7)
evaluate_model(model, test_loader, device)
```

```
Epoch 1/7: 100%|██████████| 125/125 [00:21<00:00,  5.76it/s]
Epoch [1/7], Loss: 2.5719
Epoch 2/7: 100%|██████████| 125/125 [00:21<00:00,  5.76it/s]
Epoch [2/7], Loss: 1.6039
Epoch 3/7: 100%|██████████| 125/125 [00:21<00:00,  5.74it/s]
Epoch [3/7], Loss: 1.0832
Epoch 4/7: 100%|██████████| 125/125 [00:21<00:00,  5.74it/s]
Epoch [4/7], Loss: 0.6847
Epoch 5/7: 100%|██████████| 125/125 [00:21<00:00,  5.74it/s]
Epoch [5/7], Loss: 0.4551
Epoch 6/7: 100%|██████████| 125/125 [00:21<00:00,  5.74it/s]
Epoch [6/7], Loss: 0.2940
Epoch 7/7: 100%|██████████| 125/125 [00:21<00:00,  5.71it/s]
Epoch [7/7], Loss: 0.2393
Test Accuracy: 54.75%
```


### 3.3 VGG 

In [10]:
```python
import torch.optim as optim
import torch.nn as nn
from ModelsCNN import VGGModel

model = VGGModel(num_classes=40).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=7)
evaluate_model(model, test_loader, device)
```

```
Epoch 1/7: 100%|██████████| 125/125 [01:06<00:00,  1.89it/s]
Epoch [1/7], Loss: 3.7280
Epoch 2/7: 100%|██████████| 125/125 [01:05<00:00,  1.90it/s]
Epoch [2/7], Loss: 3.6980
Epoch 3/7: 100%|██████████| 125/125 [01:05<00:00,  1.92it/s]
Epoch [3/7], Loss: 3.6969
Epoch 4/7: 100%|██████████| 125/125 [01:05<00:00,  1.92it/s]
Epoch [4/7], Loss: 3.6942
Epoch 5/7: 100%|██████████| 125/125 [01:05<00:00,  1.92it/s]
Epoch [5/7], Loss: 3.6961
Epoch 6/7: 100%|██████████| 125/125 [01:05<00:00,  1.92it/s]
Epoch [6/7], Loss: 3.6955
Epoch 7/7: 100%|██████████| 125/125 [01:05<00:00,  1.92it/s]
Epoch [7/7], Loss: 3.6938
Test Accuracy: 3.49%
```
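VGG shows the same plateau as the custom CNN: a loss of about 3.69 ≈ ln 40 and 3.49% test accuracy, i.e. chance-level predictions. This cell fine-tunes all parameters with Adam at `lr=0.001`. If `VGGModel` wraps a plain (non-`_bn`) VGG variant, the network has no batch normalization, which can make full fine-tuning at that learning rate unstable; the notebook does not isolate the cause. One commonly used alternative, sketched below, gives the pretrained feature extractor a much smaller learning rate than the new classifier through optimizer parameter groups. `TinyVGGLike` is a hypothetical stand-in that exposes the same `features`/`classifier` attribute names as torchvision's VGG, so the example runs without building the full network.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class TinyVGGLike(nn.Module):
    """Hypothetical stand-in exposing .features and .classifier like torchvision's VGG."""

    def __init__(self, num_classes: int = 40):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))


model = TinyVGGLike(num_classes=40)

# Parameter groups: pretrained layers move slowly, the new head learns at the usual rate.
optimizer = optim.Adam([
    {"params": model.features.parameters(), "lr": 1e-5},
    {"params": model.classifier.parameters(), "lr": 1e-3},
])

for group in optimizer.param_groups:
    print(group["lr"], sum(p.numel() for p in group["params"]))
```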


### 3.4 MobileNet

In [11]:
```python
import torch
import torch.optim as optim
from ModelsCNN import MobileNetModel

model = MobileNetModel(num_classes=40).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=10)
evaluate_model(model, test_loader, device)
```

```
Epoch 1/10: 100%|██████████| 125/125 [00:21<00:00,  5.84it/s]
Epoch [1/10], Loss: 2.5913
Epoch 2/10: 100%|██████████| 125/125 [00:21<00:00,  5.84it/s]
Epoch [2/10], Loss: 1.7228
Epoch 3/10: 100%|██████████| 125/125 [00:21<00:00,  5.84it/s]
Epoch [3/10], Loss: 1.2845
Epoch 4/10: 100%|██████████| 125/125 [00:21<00:00,  5.81it/s]
Epoch [4/10], Loss: 0.9705
Epoch 5/10: 100%|██████████| 125/125 [00:21<00:00,  5.77it/s]
Epoch [5/10], Loss: 0.7333
Epoch 6/10: 100%|██████████| 125/125 [00:21<00:00,  5.77it/s]
Epoch [6/10], Loss: 0.5672
Epoch 7/10: 100%|██████████| 125/125 [00:21<00:00,  5.83it/s]
Epoch [7/10], Loss: 0.4890
Epoch 8/10: 100%|██████████| 125/125 [00:21<00:00,  5.83it/s]
Epoch [8/10], Loss: 0.4508
Epoch 9/10: 100%|██████████| 125/125 [00:21<00:00,  5.83it/s]
Epoch [9/10], Loss: 0.3512
Epoch 10/10: 100%|██████████| 125/125 [00:21<00:00,  5.81it/s]
Epoch [10/10], Loss: 0.3134
Test Accuracy: 52.37%
```
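With 40 classes, top-1 accuracy alone does not show whether the correct action is among a model's leading guesses. The sketch below reports top-1 and top-5 accuracy; `topk_accuracy` is a hypothetical helper, not the `evaluate_model` function from `Utils`, whose implementation is not shown. It assumes the loader yields `(images, labels)` pairs like the loaders above.

```python
import torch


@torch.no_grad()
def topk_accuracy(model, loader, device, ks=(1, 5)):
    """Hypothetical helper: percentage of samples whose label is among the k highest-scoring classes."""
    model.eval()
    max_k = max(ks)
    correct = {k: 0 for k in ks}
    total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        top = logits.topk(max_k, dim=1).indices   # (batch, max_k) class indices, best first
        hits = top.eq(labels.unsqueeze(1))        # (batch, max_k) True where the label appears
        for k in ks:
            correct[k] += hits[:, :k].any(dim=1).sum().item()
        total += labels.size(0)
    return {k: 100.0 * correct[k] / total for k in ks}
```

For example, `topk_accuracy(model, test_loader, device)` would return a dictionary such as `{1: ..., 5: ...}` for the MobileNet model above; evaluating all four pretrained models with the same helper keeps the comparison consistent.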


## 4. Analysis of Models