# Human action recognition in still images
Author: [Genevieve Masioni](https://genevievemasioni.com)

## Abstract
Human action recognition is a standard, well-studied Computer Vision problem. It is usually addressed in the context of video understanding, but some actions are static by nature (for example, "taking a photo") and can be recognized from static cues alone. Recognizing human actions in still images is therefore useful in its own right. In this project, we propose an algorithm that recognizes seven human actions from a single photograph.

The seven actions (classes) are:
- Interacting with computer
- Photographing
- Playing Instrument
- Riding Bike
- Riding Horse
- Running
- Walking

## Keywords
Convolutional Neural Networks (CNN), Keras, Image Classification

## Algorithm
1. Pre-processing: smoothing, resizing, and normalisation using the Histogram Equalization (HE) technique.
2. Feature extraction: using the Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) feature extraction algorithms.
3. Model training: comparison of multiple models (KNN, SVM) using the training set.
4. Model validation: evaluation using the test set.
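
To illustrate the normalisation step, here is a minimal sketch of histogram equalization on a flat list of grey levels. It is written in plain Python for readability and is not the code used in this notebook; the function name `equalize_histogram` and the sample pixel values are assumptions made for the example.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a non-empty flat list of integer grey levels in [0, levels)."""
    total = len(pixels)
    # Histogram: number of pixels at each grey level
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    # Cumulative distribution function (CDF) of the grey levels
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    # Smallest non-zero CDF value, so that the darkest level present maps to 0
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in pixels]


# Hypothetical low-contrast patch: all values sit between 100 and 103
low_contrast = [100, 100, 101, 101, 102, 102, 103, 103]
equalized = equalize_histogram(low_contrast)
print(equalized)  # [0, 0, 85, 85, 170, 170, 255, 255]
```

The four neighbouring grey levels are spread over the full 0 to 255 range, which is the contrast-stretching effect HE is used for.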

## Pre-processing

In [17]:
from tensorflow.keras.utils import image_dataset_from_directory

image_size = (500,500)
#image_size = (32,32)
batch_size = 1
class_names = [
    "Interacting with computer",
    "Photographing",
    "Playing Instrument",
    "Riding Bike",
    "Riding Horse",
    "Running",
    "Walking"
]

dataset_path = "./dataset/"

print("Training set:")
data_train = image_dataset_from_directory(
    directory=dataset_path + "TrainSet", 
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=batch_size,
    image_size=image_size,
    crop_to_aspect_ratio=True
)

print("Test set:")
data_test = image_dataset_from_directory(
    directory=dataset_path + "TestSet", 
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=batch_size,
    image_size=image_size,
    crop_to_aspect_ratio=True
)

Training set:
Found 420 files belonging to 7 classes.
Test set:
Found 140 files belonging to 7 classes.
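
With `label_mode="categorical"`, each label is delivered as a one-hot vector whose positions follow the order of `class_names`. The sketch below reproduces that encoding in plain Python to make the label layout explicit; the helper `one_hot` is a hypothetical name used only for this illustration.

```python
class_names = [
    "Interacting with computer",
    "Photographing",
    "Playing Instrument",
    "Riding Bike",
    "Riding Horse",
    "Running",
    "Walking",
]


def one_hot(label, classes):
    """Return a one-hot vector with a 1.0 at the position of `label` in `classes`."""
    vector = [0.0] * len(classes)
    vector[classes.index(label)] = 1.0
    return vector


running_label = one_hot("Running", class_names)
print(running_label)  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```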


## Model creation/ autoencoder

![imagen.png](attachment:imagen.png)

Let's create a CNN model with three convolutional blocks. Each block stacks two convolutional layers followed by a Max Pooling layer, which reduces the spatial size and hence the number of parameters, the computation time and the risk of overfitting.

- Convolution: ![imagen-2.png](attachment:imagen-2.png)
    - N: size of input -> 500x500
    - F: filter/kernel size -> 32 filters of size 3x3
    - S: stride -> 1
    - P: padding -> 1 pixel
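    - M: size of output -> same as input for the layers that use padding (`padding='same'`); the second convolution of each block has no padding, so its output is 2 pixels smaller in each dimension
- Max Pooling: filter of size 2x2 with a stride of 2 -> halves the input size.
- Dropout layer to reduce overfitting.
- Fully connected (Dense) layer followed by a softmax layer (because there are multiple classes).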
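
The sizes listed above can be checked with a few lines of arithmetic. The sketch below assumes the standard output-size formula M = (N - F + 2P) / S + 1 with integer division (the formula in the figure is not reproduced here) and traces the side length of a square input through three blocks of "padded convolution, unpadded convolution, 2x2 max pooling". The function names are chosen for this example only.

```python
def conv_output_size(n, f, s=1, p=0):
    """Side length after a convolution: M = (N - F + 2P) // S + 1."""
    return (n - f + 2 * p) // s + 1


def pool_output_size(n, f=2, s=2):
    """Side length after max pooling without padding."""
    return (n - f) // s + 1


def trace_sizes(n, blocks=3):
    """List the side length after each conv/conv/pool layer, starting with the input."""
    sizes = [n]
    for _ in range(blocks):
        n = conv_output_size(n, f=3, s=1, p=1)  # 3x3 kernel, 1 pixel of padding
        sizes.append(n)
        n = conv_output_size(n, f=3, s=1, p=0)  # 3x3 kernel, no padding
        sizes.append(n)
        n = pool_output_size(n)                 # 2x2 pooling, stride 2
        sizes.append(n)
    return sizes


print(trace_sizes(500))  # [500, 500, 498, 249, 249, 247, 123, 123, 121, 60]
```

Under these assumptions a 500x500 input reaches the Flatten layer as a 60x60 feature map per filter.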

In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
from time import time
from datetime import timedelta

def createCNN(input_shape, nb_classes):
    # plain stack of layers with one input and one output
    model = Sequential() 
    nb_filters = 32
    filter_size = (3, 3)
    max_pooling_size = (2, 2)
    dropout_ratio = 0.25
    
    model.add(Conv2D(nb_filters, filter_size, padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(nb_filters, filter_size, activation='relu'))
    model.add(MaxPooling2D(pool_size=max_pooling_size))
    model.add(Dropout(dropout_ratio))

    # use two times more filters
    model.add(Conv2D(nb_filters*2, filter_size, padding='same', activation='relu'))
    model.add(Conv2D(nb_filters*2, filter_size, activation='relu'))
    model.add(MaxPooling2D(pool_size=max_pooling_size))
    model.add(Dropout(dropout_ratio))
    
    model.add(Conv2D(nb_filters*2, filter_size, padding='same', activation='relu'))
    model.add(Conv2D(nb_filters*2, filter_size, activation='relu'))
    model.add(MaxPooling2D(pool_size=max_pooling_size))
    model.add(Dropout(dropout_ratio))

    # flatten the 3D feature maps coming out of the last block into a 1D vector
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    # classification over nb_classes classes
    model.add(Dense(nb_classes, activation='softmax'))

    return model

input_shape = data_train.element_spec[0].shape[1:]
nb_classes = data_train.element_spec[1].shape[1]
model = createCNN(input_shape, nb_classes)

Let's check the model summary.

In [19]:
# Run a single batch through the model, then print the layer summary
for images, _ in data_train.take(1):
    _ = model(images)
model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_24 (Conv2D)          (None, 350, 350, 32)      896       
                                                                 
 conv2d_25 (Conv2D)          (None, 348, 348, 32)      9248      
                                                                 
 max_pooling2d_12 (MaxPoolin  (None, 174, 174, 32)     0         
 g2D)                                                            
                                                                 
 dropout_16 (Dropout)        (None, 174, 174, 32)      0         
                                                                 
 conv2d_26 (Conv2D)          (None, 174, 174, 64)      18496     
                                                                 
 conv2d_27 (Conv2D)          (None, 172, 172, 64)      36928     
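
The `Param #` column of the summary above can be reproduced by hand. A Conv2D layer has one weight per kernel position and input channel, plus one bias, for each filter. The helper below is a small sketch written for this check (its name is not part of Keras).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer with bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


# (input channels, filters) for the four convolutions shown in the summary
layers = [(3, 32), (32, 32), (32, 64), (64, 64)]
param_counts = [conv2d_params(3, 3, c_in, f) for c_in, f in layers]
print(param_counts)  # [896, 9248, 18496, 36928]
```

Pooling and dropout layers have no trainable weights, which is why they show 0 parameters.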
                                                      

## Model Training

Let's start by defining an optimizer, a loss function and metrics to keep track of.

- Optimizer: RMSProp
- Loss: categorical cross-entropy, because this is a 7-class classification problem
- Metrics: categorical accuracy
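
To make the loss concrete, here is a plain-Python sketch of softmax followed by categorical cross-entropy for a single sample. It is meant as an illustration of the formula, not as a replacement for the Keras implementation; the clipping constant `eps` and the example values are assumptions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# A model that cannot tell the 7 classes apart outputs equal scores
uniform_probs = softmax([0.0] * 7)
label = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # one-hot label, class index 5
loss_uniform = categorical_cross_entropy(label, uniform_probs)
print(loss_uniform)
```

For seven classes, a uniform prediction gives a loss of ln(7), roughly 1.95. This is a useful baseline: a training loss stuck near that value means the model is doing no better than guessing.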

In [22]:
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy

# Optimizer
optimizer = RMSprop()
# Loss function to minimize
loss = CategoricalCrossentropy()
# Metrics to monitor (passed as a list)
metrics = [CategoricalAccuracy()]

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)


Then we define the training parameters: number of epochs (50 to 100) and the batch size. The test data is used for validation.
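
As a rough guide to what these parameters imply, the number of weight updates is the number of batches per epoch times the number of epochs. The sketch below uses the 420 training images reported when the dataset was loaded; the function name is an assumption made for this example.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


steps = steps_per_epoch(420, 1)  # 420 training images, batch size 1
total_updates = steps * 100      # 100 epochs
print(steps, total_updates)      # 420 42000
```

A larger batch size would reduce the number of steps per epoch (for example, 14 steps with a batch size of 32), at the cost of more memory per step.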

In [23]:
# The batch size is already fixed by the dataset (batch_size = 1 when it was loaded),
# so it is not passed to fit() again
epochs = 100
time_start = time()
history = model.fit(data_train, epochs=epochs,
          verbose=1,
          validation_data=data_test)
elapsed_time = time() - time_start
print(f"Training time: {timedelta(seconds=elapsed_time)}. ")

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 1:43:03.908446 seconds. 


## Model validation

Let's evaluate our model and check its accuracy on the test set.
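
The categorical accuracy reported during evaluation is the fraction of samples whose highest predicted probability falls on the true class. A minimal plain-Python sketch, with made-up predictions over three classes for brevity:

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples where the argmax of the prediction matches the one-hot label."""
    hits = 0
    for truth, pred in zip(y_true, y_pred):
        if truth.index(max(truth)) == pred.index(max(pred)):
            hits += 1
    return hits / len(y_true)


# Hypothetical labels and predictions (not results from this notebook)
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]
accuracy = categorical_accuracy(y_true, y_pred)
print(accuracy)
```

Here two of the three predictions are correct, so the accuracy is 2/3.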

In [129]:
time_start = time()
model.evaluate(data_test)
elapsed_time = time() - time_start
print(f"Testing time: {timedelta(seconds=elapsed_time)}. ")

Testing time: 0.4710991382598877 seconds. 


### Loss and accuracy


In [2]:
import matplotlib.pyplot as plt

# Accuracy curve
plt.figure(figsize=[8,6])
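# The metric is a CategoricalAccuracy instance, so its history key is 'categorical_accuracy'
plt.plot(history.history['categorical_accuracy'],'r',linewidth=3.0)
plt.plot(history.history['val_categorical_accuracy'],'b',linewidth=3.0)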
plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Accuracy',fontsize=16)
plt.title('Accuracy Curves',fontsize=16)

# Loss curve
plt.figure(figsize=[8,6])
plt.plot(history.history['loss'],'r',linewidth=3.0)
plt.plot(history.history['val_loss'],'b',linewidth=3.0)
plt.legend(['Training Loss', 'Validation Loss'],fontsize=18)
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.title('Loss Curves',fontsize=16)

NameError: name 'history' is not defined

<Figure size 576x432 with 0 Axes>