<a href="https://colab.research.google.com/github/carmenbarriga/Violence-Detection-in-Videos-with-Transformers/blob/main/Transformers/ViViT/ViolenceInMovies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Video Vision Transformer (ViViT) for Violence Detection**
@InProceedings{arnab2021vivit,
  title={ViViT: A Video Vision Transformer},
  author={Arnab, Anurag and Dehghani, Mostafa and Heigold, Georg and Sun, Chen and Lu{\v{c}}i{\'c}, Mario and Schmid, Cordelia},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021}
}

## **1.- Installation of the necessary libraries**

*   **Einops:** Library that provides readable tensor operations (used here to rearrange and repeat tensors)

In [1]:
! pip install einops



## **2.- Mount Google Drive**
Mount Google Drive to access the dataset files and the directories where the weights and results are stored

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## **3.- Import the necessary libraries**

In [3]:
import copy
import cv2
import math
import numpy as np
import os
import pandas as pd
import time
import torch

from einops import rearrange, repeat
from einops.layers.torch import Rearrange
from skimage.transform import resize
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
from sklearn import model_selection
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torch.optim import lr_scheduler
from tqdm.notebook import tqdm

## **4.- Make some initial configurations**
The function `seed_everything` is used to set seeds across various libraries and environments in Python to ensure reproducibility of results. Seed 1001 will be used.

In [4]:
def seed_everything(seed):
  os.environ["PYTHONHASHSEED"] = str(seed)
  # Sets the seed for the numpy library's random number generator
  np.random.seed(seed)
  # Sets the seed for the torch library's random number generator (PyTorch) for both the CPU and GPU
  torch.manual_seed(seed)
  torch.cuda.manual_seed(seed)
  # To ensure that calculations performed with the torch library on the GPU are deterministic
  torch.backends.cudnn.deterministic = True
  # Disable the cuDNN auto-tuner so that the same algorithms are selected on every run
  torch.backends.cudnn.benchmark = False

seed_everything(1001)
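
As a quick illustration of what seeding buys, the sketch below re-seeds NumPy's global generator and checks that the same seed reproduces the same numbers. The helper `draw_samples` and the second seed `1002` are hypothetical and not part of the notebook; the same idea applies to the PyTorch generators seeded in `seed_everything`.

```python
import numpy as np

def draw_samples(seed, n=3):
    # Re-seed NumPy's global generator, then draw n uniform samples
    np.random.seed(seed)
    return np.random.rand(n)

first = draw_samples(1001)
second = draw_samples(1001)      # same seed as the notebook
different = draw_samples(1002)   # hypothetical second seed, for contrast

print(np.array_equal(first, second))     # same seed, same numbers
print(np.array_equal(first, different))  # a different seed gives a different stream
```

Note that `seed_everything` does not seed Python's built-in `random` module, so any code relying on it would need `random.seed(seed)` as well.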

Release the GPU memory cached by PyTorch and display the current PyTorch version

In [5]:
torch.cuda.empty_cache()
torch.__version__

'2.0.1+cu118'

Select the device on which the PyTorch computations will run: the GPU (CUDA) if one is available, otherwise the CPU

In [6]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

'cuda'

## **5.- Prepare the data**

Set the paths to the **Violence in Movies** dataset folder, the file where the best model weights will be saved, and the results folder


In [7]:
violence_in_movies_folder = '/content/drive/MyDrive/transformers-for-violence-detection-in-videos/Datasets/Violence in Movies/'
violence_in_movies_weights_dir = '/content/drive/MyDrive/transformers-for-violence-detection-in-videos/ViViT/Weights/violence_in_movies_best_model_weights.pth'
violence_in_movies_results_folder = '/content/drive/MyDrive/transformers-for-violence-detection-in-videos/ViViT/Results/'

Load the predefined training and test splits (video paths and labels) from CSV files:
*   80% train
*   20% test

In [8]:
train_data = pd.read_csv('/content/drive/MyDrive/transformers-for-violence-detection-in-videos/Data Divisions/Violence in Movies/train.csv')
test_data = pd.read_csv('/content/drive/MyDrive/transformers-for-violence-detection-in-videos/Data Divisions/Violence in Movies/test.csv')

Show train data information

In [9]:
print('Train data shape: ', train_data.shape)
print('Number of violence videos in train data: ', train_data['label'].value_counts()[1])
print('Number of non violence videos in train data: ', train_data['label'].value_counts()[0])

Train data shape:  (160, 2)
Number of violence videos in train data:  79
Number of non violence videos in train data:  81


Show test data information

In [10]:
print('Test data shape: ', test_data.shape)
print('Number of violence videos in test data: ', test_data['label'].value_counts()[1])
print('Number of non violence videos in test data: ', test_data['label'].value_counts()[0])

Test data shape:  (40, 2)
Number of violence videos in test data:  21
Number of non violence videos in test data:  19


Defining some video properties

In [11]:
time_steps = 32   # Number of frames of each video
color_channels = 3  # Number of color channels
height = 128  # Height of each frame
width = 128   # Width of each frame

Function and class to preprocess the videos. Videos with more than `time_steps` frames are truncated to the first `time_steps` frames. Videos with fewer frames are padded with zero-valued frames until they reach `time_steps`.

In [12]:
def capture(filename, time_steps, color_channels, height, width):
  # Create an array to store the video frames after being processed
  frames = np.zeros((time_steps, color_channels, height, width), dtype=float)
  # VideoCapture object to open and read the video
  video_capture = cv2.VideoCapture(filename)
  # To check if the VideoCapture object was able to open the video
  if video_capture.isOpened():
    # To keep track of how many frames have been stored in the frames array
    frames_counter = 0
    while frames_counter < time_steps:
      # Read the next frame
      is_frame_read, frame = video_capture.read()
      # Check if there are no more frames available
      if not is_frame_read:
        break
      # Resize the frame to (height, width, color_channels); the aspect ratio is not preserved unless the source frame has the same ratio
      frame = resize(frame, (height, width, color_channels))
      # To add an extra dimension (1, height, width, color_channels)
      frame = np.expand_dims(frame, axis=0)
      # Moves axis -1 (last axis) to index 1 (1, color_channels, height, width)
      frame = np.moveaxis(frame, -1, 1)
      # Normalization of the pixel values of the frame (if necessary)
      if np.max(frame) > 1:
        frame = frame / 255.0
      # Store the processed frame in the corresponding position within the frames array
      frames[frames_counter][:] = frame
      frames_counter += 1

    # Release the video file handle now that the frames have been read
    video_capture.release()
  frames = np.moveaxis(frames, 1, 0)  # [channels, frames, height, width]

  return frames


class TaskDataset(Dataset):
  def __init__(self, data, time_steps=40, color_channels=3, height=256, width=256):
    # data is a pandas dataframe that contains the paths to the video files with their labels
    self.data_locations = data
    self.time_steps, self.color_channels, self.height, self.width = time_steps, color_channels, height, width

  def __len__(self):
    return len(self.data_locations)

  def __getitem__(self, idx):
    if torch.is_tensor(idx):
      idx = idx.tolist()
    # To process the video and get its frames
    video = capture(self.data_locations.iloc[idx, 0], self.time_steps, self.color_channels, self.height, self.width)
    # Dictionary containing the processed video, its corresponding label and its path
    sample = {
      'video': torch.from_numpy(video),
      'label': torch.from_numpy(np.asarray(self.data_locations.iloc[idx, 1])),
      'path': self.data_locations.iloc[idx, 0]
    }

    return sample
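
The truncate-or-pad behavior of `capture` can be isolated from the video decoding. The sketch below is a minimal illustration on synthetic arrays; the helper `fit_to_length` and the clip sizes are hypothetical and only mirror the logic of the zero-filled `frames` buffer above.

```python
import numpy as np

def fit_to_length(frames, time_steps):
    """Truncate or zero-pad a (num_frames, ...) array to exactly time_steps frames."""
    # Zero-filled buffer, like the one allocated at the start of `capture`
    fitted = np.zeros((time_steps,) + frames.shape[1:], dtype=float)
    # Copy at most time_steps frames; any remaining slots stay zero
    kept = min(len(frames), time_steps)
    fitted[:kept] = frames[:kept]
    return fitted

long_clip = np.ones((50, 3, 4, 4))   # longer than 32 frames: truncated
short_clip = np.ones((20, 3, 4, 4))  # shorter than 32 frames: zero-padded

print(fit_to_length(long_clip, 32).shape)        # (32, 3, 4, 4)
print(fit_to_length(short_clip, 32)[20:].sum())  # 0.0, the padded frames are all zeros
```

One consequence of this design is that padded frames are indistinguishable from genuinely black frames, so the model receives no explicit signal about where the real video ends.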

Passing the training data to the TaskDataset class

In [13]:
train_dataset = TaskDataset(
  data=train_data, time_steps=time_steps, color_channels=color_channels, height=height, width=width
)

Passing the test data to the TaskDataset class

In [14]:
test_dataset = TaskDataset(
  data=test_data, time_steps=time_steps, color_channels=color_channels, height=height, width=width
)

Defining the train batch size

In [15]:
BATCH_SIZE = 16

Creating a `DataLoader` to load data in batches during training

In [16]:
train_loader = DataLoader(
  dataset=train_dataset,
  batch_size=BATCH_SIZE,
  pin_memory=True,
  drop_last=True,
  num_workers=0,
  shuffle=True
)

Defining the test batch size and creating a `DataLoader` to load data in batches during testing

In [17]:
TEST_BATCH_SIZE = 10

In [18]:
test_loader = DataLoader(
  dataset=test_dataset,
  batch_size=TEST_BATCH_SIZE,
  pin_memory=True,
  drop_last=True,
  num_workers=0,
  shuffle=False
)

Putting the `DataLoaders` in the `dataloaders` dictionary and their sizes in the `dataset_sizes` dictionary

In [19]:
dataloaders = {'train': train_loader, 'test': test_loader}
dataset_sizes = {'train': len(train_dataset), 'test': len(test_dataset)}
print(dataloaders)
print(dataset_sizes)

{'train': <torch.utils.data.dataloader.DataLoader object at 0x7f48577c0be0>, 'test': <torch.utils.data.dataloader.DataLoader object at 0x7f48577c2aa0>}
{'train': 160, 'test': 40}
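
Both loaders use `drop_last=True`, so it is worth checking that no samples are silently discarded. The following sketch computes the number of batches per epoch for the sizes used here; `batches_per_epoch` is a hypothetical helper, and the 45-video case is an invented example showing what would happen with a size that is not a multiple of the batch size.

```python
def batches_per_epoch(num_samples, batch_size, drop_last):
    # With drop_last=True an incomplete final batch is discarded
    full, remainder = divmod(num_samples, batch_size)
    return full if drop_last or remainder == 0 else full + 1

print(batches_per_epoch(160, 16, drop_last=True))  # train: 10
print(batches_per_epoch(40, 10, drop_last=True))   # test: 4

# Hypothetical case: 45 test videos would lose 5 samples with drop_last=True
print(batches_per_epoch(45, 10, drop_last=True),
      batches_per_epoch(45, 10, drop_last=False))  # 4 5
```

With 160 and 40 videos every sample is used. If the split sizes change, `drop_last=True` on the test loader would drop videos while the metrics still divide by `dataset_sizes['test']`, so `drop_last=False` is probably the safer choice for evaluation.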


Release the memory used by `train_data` and `test_data`; the datasets keep their own references to the data, so these variables are no longer needed

In [20]:
del train_data
del test_data

## **6.- ViViT**

In [21]:
class PreNorm(nn.Module):
  def __init__(self, dimension, fn):
    super(PreNorm, self).__init__()
    self.norm = nn.LayerNorm(dimension)
    self.fn = fn

  def forward(self, x, **kwargs):
    return self.fn(self.norm(x), **kwargs)

In [22]:
class Attention(nn.Module):
  def __init__(self, dimension, heads=8, head_dimension=64, dropout=0.):
    super(Attention, self).__init__()
    inner_dim = head_dimension * heads
    project_out = not (heads == 1 and head_dimension == dimension)

    self.heads = heads
    self.scale = head_dimension ** -0.5

    self.attend = nn.Softmax(dim=-1)
    self.dropout = nn.Dropout(dropout)

    self.to_qkv = nn.Linear(dimension, inner_dim * 3, bias=False)

    self.to_out = nn.Sequential(
      nn.Linear(inner_dim, dimension),
      nn.Dropout(dropout)
    ) if project_out else nn.Identity()

  def forward(self, x):
    qkv = self.to_qkv(x).chunk(3, dim=-1)
    q, k, v = map(lambda t: rearrange(
      t, 'b n (h d) -> b h n d', h=self.heads), qkv)

    dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

    attn = self.attend(dots)
    attn = self.dropout(attn)

    out = torch.matmul(attn, v)
    out = rearrange(out, 'b h n d -> b n (h d)')

    return self.to_out(out)

In [23]:
class FeedForward(nn.Module):
  def __init__(self, dimension, hidden_dimension, dropout=0.):
    super(FeedForward, self).__init__()
    self.network = nn.Sequential(
      nn.Linear(dimension, hidden_dimension),
      nn.GELU(),
      nn.Dropout(dropout),
      nn.Linear(hidden_dimension, dimension),
      nn.Dropout(dropout)
    )

  def forward(self, x):
    return self.network(x)

In [24]:
class Transformer(nn.Module):
  def __init__(self, dimension, layers, heads, head_dimension, mlp_dimension, dropout=0.):
    super(Transformer, self).__init__()
    self.layers = nn.ModuleList([])
    for _ in range(layers):
      self.layers.append(nn.ModuleList([
        PreNorm(dimension, Attention(dimension, heads=heads,
          head_dimension=head_dimension, dropout=dropout)),
        PreNorm(dimension, FeedForward(
          dimension, mlp_dimension, dropout=dropout))
      ]))

  def forward(self, x):
    for attn, ff in self.layers:
      x = attn(x) + x
      x = ff(x) + x
    return x

In [25]:
class ViViT(nn.Module):
  def __init__(
    self,
    height,
    width,
    frames,
    patch_height,
    patch_width,
    patch_frame,
    number_classes,
    dimension,
    layers=4,
    heads=3,
    in_channels=3,
    head_dimension=64,
    dropout=0.,
    embedding_dropout=0.,
    mlp_dimension=4
  ):
    super(ViViT, self).__init__()

    assert height % patch_height == 0 and width % patch_width == 0, 'Image dimensions must be divisible by the patch size'
    assert frames % patch_frame == 0, 'Frames must be divisible by frame patch size'

    number_image_patches = (height // patch_height) * \
      (width // patch_width)
    number_frame_patches = (frames // patch_frame)

    patch_dimension = in_channels * patch_height * patch_width * patch_frame

    self.patch_embedding = nn.Sequential(
      Rearrange('b c (f pf) (h p1) (w p2) -> b f (h w) (p1 p2 pf c)',
        p1=patch_height, p2=patch_width, pf=patch_frame),
      nn.LayerNorm(patch_dimension),
      nn.Linear(patch_dimension, dimension),
      nn.LayerNorm(dimension)
    )

    self.pos_embedding = nn.Parameter(torch.randn(
      1, number_frame_patches, number_image_patches, dimension))
    self.dropout = nn.Dropout(embedding_dropout)

    self.spatial_cls_token = nn.Parameter(torch.randn(1, 1, dimension))
    self.spatial_transformer = Transformer(
      dimension, layers, heads, head_dimension, mlp_dimension, dropout)

    self.temporal_cls_token = nn.Parameter(torch.randn(1, 1, dimension))
    self.temporal_transformer = Transformer(
      dimension, layers, heads, head_dimension, mlp_dimension, dropout)

    self.to_latent = nn.Identity()

    self.mlp_head = nn.Sequential(
      nn.LayerNorm(dimension),
      nn.Linear(dimension, number_classes)
    )

  def forward(self, x):
    x = self.patch_embedding(x)
    b, f, n, _ = x.shape

    x = x + self.pos_embedding[:, :f, :n]

    spatial_cls_tokens = repeat(
      self.spatial_cls_token, '1 1 d -> b f 1 d', b=b, f=f)
    x = torch.cat((spatial_cls_tokens, x), dim=2)

    x = self.dropout(x)

    x = rearrange(x, 'b f n d -> (b f) n d')

    x = self.spatial_transformer(x)
    x = rearrange(x, '(b f) n d -> b f n d', b=b)
    x = x[:, :, 0]

    temporal_cls_tokens = repeat(
      self.temporal_cls_token, '1 1 d-> b 1 d', b=b)
    x = torch.cat((temporal_cls_tokens, x), dim=1)

    x = self.temporal_transformer(x)
    x = x[:, 0]

    x = self.to_latent(x)

    return self.mlp_head(x)
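
To make the patch arithmetic in `ViViT.__init__` concrete, the sketch below repeats it in plain Python for the configuration this notebook uses (128x128 frames, 32 frames per video, 8x8x8 patches). The helper `vivit_token_shapes` is hypothetical and only restates the formulas from the class.

```python
def vivit_token_shapes(height, width, frames, patch_height, patch_width,
                       patch_frame, in_channels=3):
    # Same arithmetic as in ViViT.__init__
    spatial_patches = (height // patch_height) * (width // patch_width)
    temporal_patches = frames // patch_frame
    patch_dimension = in_channels * patch_height * patch_width * patch_frame
    return spatial_patches, temporal_patches, patch_dimension

spatial, temporal, patch_dim = vivit_token_shapes(128, 128, 32, 8, 8, 8)
print(spatial, temporal, patch_dim)  # 256 4 1536

# The spatial transformer sees 256 patches + 1 class token per temporal slice,
# and the temporal transformer sees 4 slice summaries + 1 class token
print(spatial + 1, temporal + 1)     # 257 5
```

The factorised design keeps the attention sequences short: the spatial transformer attends over 257 tokens and the temporal one over 5, instead of a single sequence covering all 1024 patches of the video at once.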

## **7.- Training**

In [26]:
def train_model(model, criterion, optimizer, scheduler, device='cuda', num_epochs=7):
  model.to(device)

  # Start the training time
  since = time.time()

  # Save the best loss value during model training
  best_loss = float('inf')

  # Create a copy of the current model weights
  best_model_weights = copy.deepcopy(model.state_dict())

  for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch + 1, num_epochs))
    print('-' * 10)

    model.train()
    running_loss = 0.0
    correct_predictions_counter = 0

    # To create a progress bar to iterate over the 'train' dataloader using the tqdm library
    progress_bar = tqdm(dataloaders['train'], total=int(len(dataloaders['train'])))

    for batch, sample in enumerate(progress_bar):
      # Get the videos and labels and move them to the corresponding device memory
      inputs = sample['video'].to(device, dtype=torch.float)  # [batch_size, color_channels, time_steps, height, width]
      labels = sample['label'].view(sample['label'].shape[0], 1).to(device, dtype=torch.float)  # [batch_size] -> [batch_size, 1]

      # To clean up the accumulated gradients and ensure that the gradients are calculated correctly
      # for the current batch during backpropagation and updating of the weights
      optimizer.zero_grad()

      # Get the outputs predicted by the model
      outputs = model(inputs)

      # Calculate the loss with the function specified in the criterion variable
      loss = criterion(outputs, labels)

      # Computes the gradients of all model parameters with respect to the loss function
      loss.backward()

      # Update model parameters based on gradients computed during backpropagation
      optimizer.step()

      # To get the total loss of the current batch:
      #   - loss.item() is the scalar value of the current batch loss
      #   - inputs.size(0) gets the batch size
      running_loss += loss.item() * inputs.size(0)

      # Apply a sigmoid activation function to the outputs to obtain the predictions
      # and round the predictions to be binary (0 or 1)
      predictions = torch.round(torch.sigmoid(outputs))

      # Adds the number of correct predictions in the current batch to the accumulated correct predictions counter
      correct_predictions_counter += torch.sum(predictions == labels.data)

    # Calculates the average loss for each epoch
    epoch_loss = running_loss / dataset_sizes['train']
    # Calculates the accuracy for each epoch
    epoch_accuracy = correct_predictions_counter.double() / dataset_sizes['train']
    print('Train Loss: {:.4f} Accuracy: {:.4f}'.format(epoch_loss, epoch_accuracy))

    # Let the scheduler lower the learning rate if the epoch loss has stopped improving
    scheduler.step(epoch_loss)

    # Stores the model weights that correspond to the best loss achieved so far
    if epoch_loss < best_loss:
      best_loss = epoch_loss
      best_model_weights = copy.deepcopy(model.state_dict())

  # End the training time
  time_elapsed = time.time() - since
  print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

  # The model is loaded with the weights corresponding to the best saved model
  model.load_state_dict(best_model_weights)

  return model
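
The model outputs a single raw logit per video, and `torch.round(torch.sigmoid(outputs))` turns it into a class. The sketch below, in plain Python with a hypothetical `predict` helper, shows that this is equivalent to checking whether the logit is positive.

```python
import math

def sigmoid(logit):
    return 1.0 / (1.0 + math.exp(-logit))

def predict(logit):
    # Python's round, like torch.round, rounds halves to the nearest even
    # integer, so a probability of exactly 0.5 maps to class 0
    return round(sigmoid(logit))

for logit in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(logit, round(sigmoid(logit), 4), predict(logit))
```

A video is therefore labeled as violent only when its logit is strictly greater than zero, which corresponds to a probability threshold of 0.5.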

Initialize the model

In [27]:
model = ViViT(
  height=height,
  width=width,
  frames=time_steps,
  patch_height=8,
  patch_width=8,
  patch_frame=8,
  number_classes=1,
  dimension=128,
  layers=8,
  heads=8,
  in_channels=3,
  head_dimension=64,
  dropout=0.1,
  embedding_dropout=0.2,
  mlp_dimension=256
)
model

ViViT(
  (patch_embedding): Sequential(
    (0): Rearrange('b c (f pf) (h p1) (w p2) -> b f (h w) (p1 p2 pf c)', p1=8, p2=8, pf=8)
    (1): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
    (2): Linear(in_features=1536, out_features=128, bias=True)
    (3): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (spatial_transformer): Transformer(
    (layers): ModuleList(
      (0-7): 8 x ModuleList(
        (0): PreNorm(
          (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (fn): Attention(
            (attend): Softmax(dim=-1)
            (dropout): Dropout(p=0.1, inplace=False)
            (to_qkv): Linear(in_features=128, out_features=1536, bias=False)
            (to_out): Sequential(
              (0): Linear(in_features=512, out_features=128, bias=True)
              (1): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1): PreNorm(
          (norm): LayerNorm((128,), e

In [28]:
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.00001)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=2, verbose=True)
model = train_model(model, criterion, optimizer, scheduler, device=device, num_epochs=30)

Epoch 1/30
----------
Train Loss: 0.6840 Accuracy: 0.5813
Epoch 2/30
----------
Train Loss: 0.6573 Accuracy: 0.6313
Epoch 3/30
----------
Train Loss: 0.5999 Accuracy: 0.6313
Epoch 4/30
----------
Train Loss: 0.5489 Accuracy: 0.7250
Epoch 5/30
----------
Train Loss: 0.4842 Accuracy: 0.7750
Epoch 6/30
----------
Train Loss: 0.4243 Accuracy: 0.8250
Epoch 7/30
----------
Train Loss: 0.2724 Accuracy: 0.9000
Epoch 8/30
----------
Train Loss: 0.1273 Accuracy: 0.9438
Epoch 9/30
----------
Train Loss: 0.0847 Accuracy: 0.9813
Epoch 10/30
----------
Train Loss: 0.1634 Accuracy: 0.9500
Epoch 11/30
----------
Train Loss: 0.0997 Accuracy: 0.9750
Epoch 12/30
----------
Train Loss: 0.1032 Accuracy: 0.9563
Epoch 00012: reducing learning rate of group 0 to 5.0000e-05.
Epoch 13/30
----------
Train Loss: 0.0777 Accuracy: 0.9750
Epoch 14/30
----------
Train Loss: 0.0555 Accuracy: 0.9875
Epoch 15/30
----------
Train Loss: 0.0384 Accuracy: 0.9938
Epoch 16/30
----------
Train Loss: 0.0221 Accuracy: 0.9938
Epoch 17/30
----------
Train Loss: 0.0156 Accuracy: 1.0000
Epoch 18/30
----------
Train Loss: 0.0150 Accuracy: 0.9938
Epoch 19/30
----------
Train Loss: 0.0080 Accuracy: 1.0000
Epoch 20/30
----------
Train Loss: 0.0074 Accuracy: 1.0000
Epoch 21/30
----------
Train Loss: 0.0057 Accuracy: 1.0000
Epoch 22/30
----------
Train Loss: 0.0050 Accuracy: 1.0000
Epoch 23/30
----------
Train Loss: 0.0047 Accuracy: 1.0000
Epoch 24/30
----------
Train Loss: 0.0043 Accuracy: 1.0000
Epoch 25/30
----------
Train Loss: 0.0040 Accuracy: 1.0000
Epoch 26/30
----------
Train Loss: 0.0039 Accuracy: 1.0000
Epoch 27/30
----------
Train Loss: 0.0036 Accuracy: 1.0000
Epoch 28/30
----------
Train Loss: 0.0034 Accuracy: 1.0000
Epoch 29/30
----------
Train Loss: 0.0033 Accuracy: 1.0000
Epoch 30/30
----------
Train Loss: 0.0031 Accuracy: 1.0000
Training complete in 82m 6s
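
The single learning-rate reduction at epoch 12 follows from the scheduler settings (`factor=0.5`, `patience=2`). The sketch below is a simplified re-implementation of the plateau rule in plain Python, not PyTorch's own code: `simulate_plateau_schedule` is a hypothetical helper, it ignores cooldown and minimum learning rate, and the relative threshold of `1e-4` is assumed to match the scheduler's default. The losses are copied from the log above.

```python
def simulate_plateau_schedule(losses, lr=1e-4, factor=0.5, patience=2, threshold=1e-4):
    """Simplified min-mode plateau rule with a relative threshold."""
    best = float('inf')
    bad_epochs = 0
    reductions = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best * (1 - threshold):
            # The loss improved: remember it and reset the counter
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # More than `patience` epochs without improvement: scale the rate
            lr *= factor
            bad_epochs = 0
            reductions.append((epoch, lr))
    return reductions

train_losses = [
    0.6840, 0.6573, 0.5999, 0.5489, 0.4842, 0.4243, 0.2724, 0.1273, 0.0847, 0.1634,
    0.0997, 0.1032, 0.0777, 0.0555, 0.0384, 0.0221, 0.0156, 0.0150, 0.0080, 0.0074,
    0.0057, 0.0050, 0.0047, 0.0043, 0.0040, 0.0039, 0.0036, 0.0034, 0.0033, 0.0031,
]
print(simulate_plateau_schedule(train_losses))
```

The best loss before the reduction is 0.0847 at epoch 9; epochs 10, 11 and 12 are all worse, so the counter exceeds the patience of 2 at epoch 12 and the rate is halved to 5e-05. After that the loss improves at every epoch, so no further reduction occurs.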


## **8.- Test**

In [29]:
@torch.no_grad()  # Gradients are not needed during evaluation, which saves memory
def test_model(model, criterion, device='cuda'):
  model.to(device)

  # To start the evaluation time
  since = time.time()

  model.eval()

  running_loss = 0.0
  correct_predictions_counter = 0

  # Paths, true labels and predictions of every evaluated video
  pred_vs_real = {'path': [], 'label': [], 'prediction': []}

  # To create a progress bar to iterate over the 'test' dataloader using the tqdm library
  progress_bar = tqdm(dataloaders['test'], total=int(len(dataloaders['test'])))

  processed_batch_counter = 0
  for batch, sample in enumerate(progress_bar):
    # Get the videos and labels and move them to the corresponding device memory
    inputs = sample['video'].to(device, dtype=torch.float)
    labels = sample['label'].view(sample['label'].shape[0], 1).to(device, dtype=torch.float)
    paths = sample['path']

    # Get the outputs predicted by the model
    outputs = model(inputs)

    # Apply a sigmoid activation function to the outputs to obtain the predictions
    # and round the predictions to be binary (0 or 1)
    predictions = torch.round(torch.sigmoid(outputs))

    # Add the predictions and labels to the dictionary pred_vs_real
    # converted to a numpy array and move them to CPU memory
    pred_vs_real['prediction'].extend(predictions.cpu().detach().numpy().flatten())
    pred_vs_real['label'].extend(labels.cpu().detach().numpy().flatten())
    pred_vs_real['path'].extend(list(paths))

    # Calculate the loss with the function specified in the criterion variable
    loss = criterion(outputs, labels)

    # To get the total loss of the current batch:
    #   - loss.item() is the scalar value of the current batch loss
    #   - inputs.size(0) gets the batch size
    running_loss += loss.item() * inputs.size(0)
    # Adds the number of correct predictions in the current batch to the accumulated correct predictions counter
    correct_predictions_counter += torch.sum(predictions == labels.data)

    # Updates the progress message in the progress_bar iterator showing the average loss
    # To do this, divide the accumulated loss by the total number of samples processed so far
    processed_batch_counter += 1
    progress_bar.set_postfix(loss=(running_loss / (processed_batch_counter * dataloaders['test'].batch_size)))

  final_loss = running_loss / dataset_sizes['test']
  accuracy = correct_predictions_counter.double() / dataset_sizes['test']
  precision = precision_score(pred_vs_real['label'], pred_vs_real['prediction'])
  recall = recall_score(pred_vs_real['label'], pred_vs_real['prediction'])
  f1 = f1_score(pred_vs_real['label'], pred_vs_real['prediction'])
  print('{} Loss: {:.4f} Accuracy: {:.4f} Precision: {:.4f} Recall: {:.4f} F1 Score: {:.4f}'.format('Test', final_loss, accuracy, precision, recall, f1))

  # Calculate and print the confusion matrix
  confusion = confusion_matrix(pred_vs_real['label'], pred_vs_real['prediction'])
  print("Confusion Matrix:")
  print(confusion)

  time_elapsed = time.time() - since
  print('Testing complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

  return pred_vs_real

In [30]:
pred_vs_real = test_model(model, criterion, device)

Test Loss: 0.5185 Accuracy: 0.9000 Precision: 1.0000 Recall: 0.8095 F1 Score: 0.8947
Confusion Matrix:
[[19  0]
 [ 4 17]]
Testing complete in 0m 57s
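
The reported metrics can be re-derived by hand from the confusion matrix. The sketch below does this in plain Python, assuming scikit-learn's layout where rows are true labels and columns are predicted labels, both in the order (0, 1).

```python
# Rows: true label (0 = non-violence, 1 = violence); columns: predicted label
confusion = [[19, 0],
             [4, 17]]
(tn, fp), (fn, tp) = confusion

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print('{:.4f} {:.4f} {:.4f} {:.4f}'.format(accuracy, precision, recall, f1))
# 0.9000 1.0000 0.8095 0.8947
```

The model never flags a non-violent video as violent (no false positives), but it misses 4 of the 21 violent videos, which is what lowers recall and F1.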


Save model test results in a CSV file

In [31]:
# Create a DataFrame with the data from pred_vs_real
pred_vs_real_dataframe = pd.DataFrame({'path': pred_vs_real['path'], 'label': pred_vs_real['label'], 'prediction': pred_vs_real['prediction']})

# Save the DataFrame to a CSV file
pred_vs_real_dataframe.to_csv(violence_in_movies_results_folder + 'violence_in_movies_results.csv', index=False)

In [32]:
# Save the weights
model_weights = copy.deepcopy(model.state_dict())
torch.save(model_weights, violence_in_movies_weights_dir)