<h1> Accident Detection From CCTV Footage </h1>


## Overview
### Baseline papers:
Large Multi-Modal Foundation Model for Traffic Accident Analysis [https://arxiv.org/pdf/2401.03040] <br>
LLM Multimodal Traffic Accident Forecasting [https://www.mdpi.com/1424-8220/23/22/9225]

## Brief Overview
Multi-modal traffic accident analysis for safer roads: develop a model that analyzes diverse traffic data, uncovers the root causes of accidents, and proactively suggests preventive measures.

## Description
### The Challenge
Traffic accidents remain a persistent global threat despite extensive safety efforts.
Traditional models often centre on single data sources, failing to capture the complex interplay of factors contributing to accidents.
A holistic, multi-modal approach is needed to understand and mitigate traffic risks effectively.

### The Task
Construct a model that seamlessly integrates and analyzes data from various sources:

1. Vehicular data (speed, GPS, sensor readings)
2. Pedestrian behavior (movement patterns, crossings)
3. CCTV footage (traffic flow, potential incidents)
4. Weather conditions (visibility, precipitation)
5. Road infrastructure (layout, signage, condition)

The model's insights should pinpoint the leading causes of accidents and inform potential preventive measures.

Given a dataset containing multimodal information such as images, videos, and textual descriptions of road scenes, the goal is to develop a robust accident detection system using a multimodal large language model (LLM). The system should classify each scene into one of two categories: "no accident" (label 0) or "accident" (label 1).


## Scoring
Data preparation - 20% <br>
Evaluation - 20% <br>
Plots - 20% <br>
Model finetuning - 40% <br>

<h1>1. Loading Data</h1>

In [13]:
import warnings
warnings.filterwarnings("ignore")

import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [14]:
# Kaggle dataset path; the next line overrides it with the local copy
training_data_dir = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/train")
training_data_dir = os.path.join("data/train")
training_data = tf.keras.utils.image_dataset_from_directory(
    training_data_dir,
    image_size=(256, 256),
    seed=42,
)

Found 791 files belonging to 2 classes.
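
When `class_names` is not passed, `image_dataset_from_directory` infers integer labels from the subdirectory names in alphanumeric order. A minimal sketch of that mapping in plain Python is shown below. The folder name `Non Accident` is an assumption for illustration, since only `Accident` appears in the paths used in this notebook.

```python
# Hypothetical folder names; only "Accident" is confirmed by this notebook.
class_dirs = ["Non Accident", "Accident"]

# Labels follow the sorted order of the subdirectory names.
class_names = sorted(class_dirs)
label_of = {name: index for index, name in enumerate(class_names)}

print(label_of)
```

Under this assumption the loader's label 0 means "Accident", which is the opposite of the task's convention (label 1 = accident).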


In [15]:
# The iterator yields one batch of 32 images at a time
training_data_iterator = training_data.as_numpy_iterator()
training_batch = training_data_iterator.next()
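
As a rough check of how many batches one epoch contains, the arithmetic can be sketched as follows. It assumes the loader's default `batch_size` of 32 and the 791 training files reported above.

```python
import math

num_files = 791   # number of training files reported by the loader
batch_size = 32   # default batch size of image_dataset_from_directory

# The last batch is smaller when num_files is not a multiple of batch_size.
num_batches = math.ceil(num_files / batch_size)
last_batch_size = num_files - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)
```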

<h1>2. Preprocessing Data </h1>

In [16]:
# Normalize RGB pixel values to the range [0, 1]
training_data = training_data.map(lambda x,y: (x/255, y))
training_batch = training_data.as_numpy_iterator().next()

# Sanity check: min/max pixel values after normalization
print("Max pixel value : ",training_batch[0].max())
print("Min pixel value : ",training_batch[0].min())

Max pixel value :  1.0
Min pixel value :  0.0


<h2>Loading Validation Data for Hyper-parameter Tuning</h2>

In [17]:
validation_data_dir = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/val")
validation_data_dir = os.path.join("data/val")

validation_data = tf.keras.utils.image_dataset_from_directory(validation_data_dir)
validation_data_iterator = validation_data.as_numpy_iterator()
validation_batch = validation_data_iterator.next()

Found 98 files belonging to 2 classes.


In [18]:
# Normalizing Validation data
validation_data = validation_data.map(lambda x,y: (x/255, y))
validation_batch = validation_data.as_numpy_iterator().next()

# Sanity check: min/max pixel values after normalization
print("Max pixel value : ",validation_batch[0].max())
print("Min pixel value : ",validation_batch[0].min())

Max pixel value :  1.0
Min pixel value :  0.0


<h1> 3. Building CNN Architecture  </h1>


In [19]:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Add, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam

In [26]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from transformers import AutoModel

class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        # Load the transformer-based model
        self.text_model = AutoModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
        
        # Freeze the transformer-based model
        for param in self.text_model.parameters():
            param.requires_grad = False
        
        # CNN layers for image processing
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 16, kernel_size=3, padding=1)
        
        # Fully connected layers for final classification
        # Assuming 256x256 inputs: three 2x2 poolings give 32x32 maps with 16 channels
        self.fc1 = nn.Linear(16 * 32 * 32 + 768, 128)  # Image features + text embedding
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, images, text_input):
        # Process image through CNN layers
        x = self.pool(F.relu(self.conv1(images)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 32 * 32)  # Flatten; must match the image part of fc1
        
        # Process text through transformer-based model
        text_outputs = self.text_model(**text_input).last_hidden_state[:, 0, :]  # Use the CLS token
        
        # Concatenate image and text embeddings
        concatenated_features = torch.cat((x, text_outputs), dim=1)
        
        # Fully connected layers for final classification
        x = F.relu(self.fc1(concatenated_features))
        x = self.dropout(x)
        x = torch.sigmoid(self.fc2(x))
        
        return x

# Create an instance of the custom model
model = CustomModel()
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param}")


ImportError: 
AutoModel requires the PyTorch library but it was not found in your environment.
However, we were able to find a TensorFlow installation. TensorFlow classes begin
with "TF", but are otherwise identically named to our PyTorch classes. This
means that the TF equivalent of the class you tried to import would be "TFAutoModel".
If you want to use TensorFlow, please use TF classes instead!

If you really do want to use PyTorch please go to
https://pytorch.org/get-started/locally/ and follow the instructions that
match your environment.
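
The input size of the first fully connected layer depends on how many pooling steps the image passes through. A minimal sketch of that arithmetic is shown below. It assumes 256x256 inputs (the size used by the loaders in this notebook), 3x3 convolutions with `padding=1`, and 2x2 max pooling. The helper `flattened_size` is hypothetical and not part of any library.

```python
def flattened_size(input_hw, channels_out, num_pools, pool=2):
    """Flattened feature count after `num_pools` pooling steps.

    Assumes 3x3 convolutions with padding=1 (spatial size unchanged) and
    non-overlapping pool x pool max pooling.
    """
    side = input_hw
    for _ in range(num_pools):
        side //= pool  # each pooling step halves height and width
    return channels_out * side * side

# Three conv+pool stages ending in 16 channels on a 256x256 image.
three_stage = flattened_size(256, 16, 3)
# Two conv+pool stages ending in 32 channels on a 256x256 image.
two_stage = flattened_size(256, 32, 2)

print(three_stage, two_stage)
```

With three pooling stages the feature maps are 32x32, so a 16-channel output flattens to 16 * 32 * 32 values. With two stages they are 64x64.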


In [27]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from transformers import AutoModel

class CustomModel(nn.Module):
    def __init__(self, image_embedding_dim=768):
        super(CustomModel, self).__init__()
        
        # Load the pre-trained image-to-text model
        self.img_to_text_model = AutoModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
        
        # Freeze the image-to-text model
        for param in self.img_to_text_model.parameters():
            param.requires_grad = False
        
        # Define the CNN architecture for image feature extraction
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        
        # Define the fully connected layer
        self.fc1 = nn.Linear(in_features=32 * 64 * 64 + image_embedding_dim, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=1)
        
    def forward(self, images, text_input):
        # Process images through CNN
        x = self.pool(F.relu(self.conv1(images)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 64 * 64)  # Flatten
        
        # Process text through the image-to-text model
        text_outputs = self.img_to_text_model(**text_input).last_hidden_state[:, 0, :]  # Use the CLS token
        
        # Concatenate image and text embeddings
        x = torch.cat((x, text_outputs), dim=1)
        
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = torch.sigmoid(x)
        
        return x

# Create an instance of the custom model
model = CustomModel()


ImportError: 
AutoModel requires the PyTorch library but it was not found in your environment.
However, we were able to find a TensorFlow installation. TensorFlow classes begin
with "TF", but are otherwise identically named to our PyTorch classes. This
means that the TF equivalent of the class you tried to import would be "TFAutoModel".
If you want to use TensorFlow, please use TF classes instead!

If you really do want to use PyTorch please go to
https://pytorch.org/get-started/locally/ and follow the instructions that
match your environment.
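
The `ImportError` above says that `transformers` found TensorFlow but not PyTorch. One way to check which backend is installed before choosing between `AutoModel` and `TFAutoModel` is sketched below, using only the standard library. The helper name `backend_available` is hypothetical.

```python
import importlib.util

def backend_available(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

has_torch = backend_available("torch")
has_tf = backend_available("tensorflow")

print("PyTorch installed:", has_torch)
print("TensorFlow installed:", has_tf)
```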


In [None]:
# # Define input layer
# inputs = Input(shape=(256, 256, 3))

# # First Convolutional Block
# x = Conv2D(16, (3,3), 1, activation='relu', padding='same')(inputs)
# x = MaxPooling2D()(x)

# # Second Convolutional Block with residual connection
# conv1 = Conv2D(32, (3,3), 1, activation='relu', padding='same')(x)
# conv2 = Conv2D(32, (3,3), 1, activation='relu', padding='same')(conv1)
# # Adding convolutional layer to match the number of channels
# residual = Conv2D(32, (1, 1), strides=(1, 1), padding='same')(x)
# residual = Add()([residual, conv2])
# x = MaxPooling2D()(residual)

# # Third Convolutional Block with residual connection
# conv3 = Conv2D(16, (3,3), 1, activation='relu', padding='same')(x)
# conv4 = Conv2D(16, (3,3), 1, activation='relu', padding='same')(conv3)
# # Adding convolutional layer to match the number of channels
# residual = Conv2D(16, (1, 1), strides=(1, 1), padding='same')(x)
# residual = Add()([residual, conv4])
# x = MaxPooling2D()(residual)

# # Flatten layer
# x = Flatten()(x)

# # Fully connected layers
# x = Dense(256, activation='relu')(x)
# outputs = Dense(1, activation='sigmoid')(x)

# model = Model(inputs = inputs, outputs=outputs)

In [None]:
# model = Sequential()

# model.add(Conv2D(16, (3,3), 1, activation='relu', input_shape=(256,256,3)))
# model.add(MaxPooling2D())
# model.add(Conv2D(32, (3,3), 1, activation='relu'))
# model.add(MaxPooling2D())
# model.add(Conv2D(16, (3,3), 1, activation='relu'))
# model.add(MaxPooling2D())
# model.add(Flatten())
# # Adding neural Layer
# model.add(Dense(256, activation='relu'))
# model.add(Dense(1, activation='sigmoid'))

In [None]:
learning_rate = 0.001 
optimizer = Adam(learning_rate=learning_rate)
model.compile(optimizer = optimizer, loss='binary_crossentropy', metrics = ['accuracy'])

In [None]:
model.summary()

<h1> 4.  Training Convolutional Neural Network </h1>

In [None]:
# setting up for logging 
logdir='logs'
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
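
To illustrate what `patience=5` means for the callback above, here is a simplified plain-Python sketch of the stopping rule. It is not the Keras implementation. The helper `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of val losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: the best value is reached at epoch 2.
losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)
```

With `restore_best_weights=True`, the weights kept after stopping are those from the best epoch rather than the last one.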

In [None]:
# Train once with both callbacks; a second fit() call would retrain and overwrite the history
bst_model = model.fit(training_data, epochs=50, validation_data=validation_data,
                      callbacks=[early_stopping_callback, tensorboard_callback])
model.save("/kaggle/working/accidents.keras")

In [None]:
bst_model.history['val_accuracy'][-1]

<h1>5. Plotting Training Loss and Accuracy Curves over Epochs</h1>

In [None]:
fig = plt.figure()
plt.plot(bst_model.history['loss'], color='red', label='training loss')
plt.plot(bst_model.history['val_loss'], color='blue', label='validation loss')
fig.suptitle('Loss', fontsize=20)
plt.legend(loc="upper left")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.show()

In [None]:
fig = plt.figure()
plt.plot(bst_model.history['accuracy'], color='red', label='training accuracy')
plt.plot(bst_model.history['val_accuracy'], color='blue', label='validation accuracy')
fig.suptitle('Accuracy', fontsize=20)
plt.legend(loc="upper left")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.show()

# 6. Evaluation
Model performance will be measured by its F1 scores in predicting and analyzing actual traffic accidents.
Solutions offering actionable insights and demonstrable potential to reduce accident frequency and impact will be favoured.
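
For reference, the F1 score is the harmonic mean of precision and recall. A minimal sketch computing all three from confusion-matrix counts is shown below. The helper `precision_recall_f1` is hypothetical and the counts are made up for illustration; they are not results from this model.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Guard against division by zero when there are no true positives.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up counts for illustration only.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=10)
print(p, r, f1)
```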

In [None]:
test_data_dir = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/test")
test_data = tf.keras.utils.image_dataset_from_directory(test_data_dir)
# Normalize test data the same way as the training and validation data
test_data = test_data.map(lambda x, y: (x/255, y))
test_data_iterator = test_data.as_numpy_iterator()
test_batch = test_data_iterator.next()

In [None]:
from sklearn.metrics import confusion_matrix, classification_report, f1_score, recall_score, precision_score

y_true = []  # true labels
y_pred = []  # predicted labels

for batch in test_data.as_numpy_iterator():  # NumPy batches, so y.flatten() works
    X, y = batch
    yhat = model.predict(X)
    y_pred.extend(yhat.flatten().round().astype(int))
    y_true.extend(y.flatten().astype(int))
    
# Calculate evaluation metrics
# Loader labels follow folder order (0 = Accident, 1 = Not Accident),
# so the accident class is scored with pos_label=0
f1 = f1_score(y_true, y_pred, pos_label=0)
precision = precision_score(y_true, y_pred, pos_label=0)
recall = recall_score(y_true, y_pred, pos_label=0)
cm = confusion_matrix(y_true, y_pred)

# Print evaluation metrics and confusion matrix
print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)
print("Confusion Matrix:")
print(cm)

# Generate classification report (names listed in loader label order)
target_names = ['Accident', 'Not Accident']
classification_rep = classification_report(y_true, y_pred, target_names=target_names)
print("Classification Report:")
print(classification_rep)

In [None]:
def F1_score(precision, recall):
    # Harmonic mean of precision and recall
    return (2*precision*recall)/(precision+recall)

pre = tf.keras.metrics.Precision()
re = tf.keras.metrics.Recall()

for batch in test_data:
    X, y = batch
    yhat = model.predict(X)
    # Flip loader labels (0 = Accident) so that the accident class is the positive one
    pre.update_state(1 - y, 1 - yhat)
    re.update_state(1 - y, 1 - yhat)

print("Model achieved a precision score of {:.5f}".format(float(pre.result())))
print("Model achieved a recall score of {:.5f}".format(float(re.result())))

# Named f1_value so it does not shadow sklearn's f1_score imported above
f1_value = F1_score(float(pre.result()), float(re.result()))
print("Model achieved an F1-score of {:.5f}".format(f1_value))

<h1> 7. Quick Test to See the Model Working </h1>

In [None]:
import cv2

# List the images in the Accident folder of the test set
random_data_dirname = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/test/Accident")
pics = [os.path.join(random_data_dirname, filename) for filename in os.listdir(random_data_dirname)]

# Load one sample (index 1); OpenCV reads BGR, so convert to RGB as used in training
sample = cv2.imread(pics[1], cv2.IMREAD_COLOR)
sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
sample = cv2.resize(sample, (256, 256))

# Loader label 0 is Accident, so 1 - output is the accident probability
prediction = 1 - model.predict(np.expand_dims(sample/255, 0))

if prediction >= 0.5: 
    label = 'Predicted class is Accident'
else:
    label = 'Predicted class is Not Accident'

plt.title(label)
plt.imshow(sample)
plt.show()
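
The `1 - model.predict(...)` step above can be read as a conversion from the loader's label order to the task's convention. A minimal sketch is shown below. It assumes the loader assigned label 0 to the Accident folder, so the sigmoid output is the probability of the non-accident class. The helper `accident_label` is hypothetical.

```python
def accident_label(p_label1, threshold=0.5):
    """Map the sigmoid output (probability of loader label 1) to the task label.

    Assumes loader label 0 is the Accident folder, so the accident
    probability is 1 - p_label1. Returns 1 for accident, 0 otherwise.
    """
    p_accident = 1.0 - p_label1
    return 1 if p_accident >= threshold else 0

low_output = accident_label(0.1)   # model leans towards loader label 0 (Accident)
high_output = accident_label(0.9)  # model leans towards loader label 1
print(low_output, high_output)
```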

### Create CSV Files for Submission

In [None]:
import cv2
import pandas as pd

# Root directory of the test set; each class has its own subdirectory
test_data_dirname = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/test")

filenames = []
predictions = []

for dirname in os.listdir(test_data_dirname):
    for filename in os.listdir(os.path.join(test_data_dirname, dirname)):
        if not filename.endswith(".jpg"):
            continue
        filepath = os.path.join(test_data_dirname, dirname, filename)
        
        # Load the image; OpenCV reads BGR, so convert to RGB as used in training
        sample = cv2.imread(filepath, cv2.IMREAD_COLOR)
        sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
        sample = cv2.resize(sample, (256, 256))
        
        # Predict using the model.
        # The loader assigned label 0 to the first folder (Accident) and 1 to
        # Not Accident, but the task expects 1 = accident, so the output is flipped.
        prediction = 1 - model.predict(np.expand_dims(sample/255, 0))
        
        filenames.append(filename)
        
        output = 1 if float(prediction[0][0]) >= 0.5 else 0
        predictions.append(output)

df = pd.DataFrame(columns=["ID", "Column ID"])
df["ID"] = filenames
df["Column ID"] = predictions
df.to_csv("/kaggle/working/submission.csv",index=False)