<h1> Accident Detection From CCTV Footage </h1>


## 0. Overview
### Baseline papers:
Large Multi-Modal Foundation Model for Traffic Accident Analysis [https://arxiv.org/pdf/2401.03040] <br>
LLM Multimodal Traffic Accident Forecasting [https://www.mdpi.com/1424-8220/23/22/9225] <br>

## Brief Overview:
Multi-Modal Traffic Accident Analysis for Safer Roads: develop a model that analyzes diverse traffic data, uncovers the root causes of accidents, and proactively suggests preventive measures.

## Description
### The Challenge
Traffic accidents remain a persistent global threat despite extensive safety efforts.
Traditional models often centre on single data sources, failing to capture the complex interplay of factors contributing to accidents.
A holistic, multi-modal approach is needed to understand and mitigate traffic risks effectively.

### The Task
Construct a model that seamlessly integrates and analyzes data from various sources:

1. Vehicular data (speed, GPS, sensor readings)
2. Pedestrian behavior (movement patterns, crossings)
3. CCTV footage (traffic flow, potential incidents)
4. Weather conditions (visibility, precipitation)
5. Road infrastructure (layout, signage, condition)

The model's insights should pinpoint the leading causes of accidents and inform potential preventive measures.

Given a dataset containing multimodal information such as images, videos, and textual descriptions of road scenes, the goal is to develop a robust accident detection system using a multimodal large language model (LLM). The system should classify each scene into one of two categories: "no accident" (label 0) or "accident" (label 1).


## Scoring
Data preparation - 20% <br>
Evaluation - 20% <br>
Plots - 20% <br>
Model finetuning - 40% <br>

<h1>1. Loading Data</h1>

In [210]:
import warnings
warnings.filterwarnings("ignore")

import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [211]:
training_data_dir = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/train")
training_data = tf.keras.utils.image_dataset_from_directory(
                            training_data_dir,image_size=(256, 256),
                            seed = 42
                            )

Found 791 files belonging to 2 classes.
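
The loader infers the two classes from the sub-folder names. The sketch below illustrates the resulting index mapping. It assumes the training folders use the same names as the test folders referenced later in this notebook (`Accident`, `Non Accident`) and that indices follow the sorted order of the folder names, which is how `image_dataset_from_directory` behaves when labels are inferred. The helper `to_task_label` is hypothetical and only shows how a loader index relates to the task convention (accident = 1).

```python
# Assumed folder names, taken from the test paths used later in this notebook.
folder_names = ["Non Accident", "Accident"]

# Inferred labels follow the alphanumeric order of the folder names.
class_names = sorted(folder_names)
class_index = {name: i for i, name in enumerate(class_names)}


def to_task_label(dataset_index):
    """Convert a loader class index to the task label (accident = 1, no accident = 0)."""
    return 1 if class_names[dataset_index] == "Accident" else 0


print(class_index)
print([to_task_label(i) for i in range(len(class_names))])
```

Under these assumptions the loader index is the opposite of the task label, so a sigmoid output trained on the loader labels estimates the probability of "Non Accident".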


In [None]:
# Pull one batch from the dataset (the default batch size is 32 images)
training_data_iterator = training_data.as_numpy_iterator()
training_batch = next(training_data_iterator)

<h1>2. Preprocessing Data </h1>

In [None]:
# Normalize RGB pixel values to the range [0, 1]
training_data = training_data.map(lambda x,y: (x/255, y))
training_batch = training_data.as_numpy_iterator().next()

# Sanity check: min/max pixel values after normalization
print("Max pixel value : ",training_batch[0].max())
print("Min pixel value : ",training_batch[0].min())

<h2>Loading Validation Data for Hyper-parameter Tuning</h2>

In [None]:
validation_data_dir = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/val")
validation_data = tf.keras.utils.image_dataset_from_directory(validation_data_dir)
validation_data_iterator = validation_data.as_numpy_iterator()
validation_batch = validation_data_iterator.next()

In [None]:
# Normalizing Validation data
validation_data = validation_data.map(lambda x,y: (x/255, y))
validation_batch = validation_data.as_numpy_iterator().next()

# Sanity check: min/max pixel values after normalization
print("Max pixel value : ",validation_batch[0].max())
print("Min pixel value : ",validation_batch[0].min())

<h1> 3. Building CNN Architecture  </h1>


In [None]:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Add, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam

#### Arch-1
Deep CNN with residual connections, followed by dense classification layers (commented out, kept for reference).

In [None]:
# # Define input layer
# inputs = Input(shape=(256, 256, 3))

# # First Convolutional Block
# x = Conv2D(16, (3,3), 1, activation='relu', padding='same')(inputs)
# x = MaxPooling2D()(x)

# # Second Convolutional Block with residual connection
# conv1 = Conv2D(32, (3,3), 1, activation='relu', padding='same')(x)
# conv2 = Conv2D(32, (3,3), 1, activation='relu', padding='same')(conv1)
# # Adding convolutional layer to match the number of channels
# residual = Conv2D(32, (1, 1), strides=(1, 1), padding='same')(x)
# residual = Add()([residual, conv2])
# x = MaxPooling2D()(residual)

# # Third Convolutional Block with residual connection
# conv3 = Conv2D(16, (3,3), 1, activation='relu', padding='same')(x)
# conv4 = Conv2D(16, (3,3), 1, activation='relu', padding='same')(conv3)
# # Adding convolutional layer to match the number of channels
# residual = Conv2D(16, (1, 1), strides=(1, 1), padding='same')(x)
# residual = Add()([residual, conv4])
# x = MaxPooling2D()(residual)

# # Add another Convolutional Layer
# x = Conv2D(8, (3,3), 1, activation='relu', padding='same')(x)
# x = MaxPooling2D()(x)

# # Flatten layer
# x = Flatten()(x)

# # Fully connected layers
# x = Dense(32, activation='relu')(x)
# x = Dense(16, activation='relu')(x)
# x = Dense(8, activation='relu')(x)
# outputs = Dense(1, activation='sigmoid')(x)

# model = Model(inputs = inputs, outputs=outputs)

## Load older model

In [None]:
# from tensorflow.keras.models import load_model

# # Provide the path to the saved model
# # model_path = "/kaggle/working/accidents.keras"
# model_path = "model.keras"


# # Load the model
# loaded_model = load_model(model_path)
# model = loaded_model

In [None]:
model = Sequential()

model.add(Conv2D(16, (3,3), 1, activation='relu', input_shape=(256,256,3)))
model.add(MaxPooling2D())

model.add(Conv2D(32, (3,3), 1, activation='relu'))
model.add(MaxPooling2D())

model.add(Conv2D(16, (3,3), 1, activation='relu'))
model.add(MaxPooling2D())

model.add(Flatten())


# Fully connected classification head
model.add(Dense(256, activation='relu'))
# model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
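
As a cross-check for `model.summary()`, the feature-map sizes and parameter count of the sequential model above can be worked out by hand. This is a sketch that assumes the Keras defaults used in the cell: `valid` padding for the 3x3 convolutions and 2x2 max pooling with stride 2. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Spatial size after max pooling with pool size == stride."""
    return size // pool


size, channels = 256, 3
total_params = 0

# Three Conv2D + MaxPooling2D blocks with 16, 32, and 16 filters.
for filters in (16, 32, 16):
    total_params += (3 * 3 * channels + 1) * filters  # weights + one bias per filter
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels
total_params += (flat + 1) * 256  # Dense(256)
total_params += (256 + 1) * 1     # Dense(1)

print("final feature map:", (size, size, channels))
print("flattened units:", flat)
print("total parameters:", total_params)
```

Nearly all parameters sit in the first dense layer, so shrinking the flattened feature map is the most direct way to reduce the model size.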

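## Arch - 2

Functional-API variant of the sequential CNN above.
The commented-out code below does not yet include the residual connections or the deeper dense layers; those remain to be added.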

In [None]:
# from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
# from tensorflow.keras.models import Model

# # Define input layer
# inputs = Input(shape=(256, 256, 3))

# # First Convolutional Block
# x = Conv2D(16, (3,3), 1, activation='relu', padding='same')(inputs)
# x = MaxPooling2D()(x)

# # Second Convolutional Block
# x = Conv2D(32, (3,3), 1, activation='relu', padding='same')(x)
# x = MaxPooling2D()(x)

# # Third Convolutional Block
# x = Conv2D(16, (3,3), 1, activation='relu', padding='same')(x)
# x = MaxPooling2D()(x)

# # Flatten layer
# x = Flatten()(x)

# # Dense layers
# x = Dense(256, activation='relu')(x)
# # x = Dense(64, activation='relu')(x)
# outputs = Dense(1, activation='sigmoid')(x)

# # Create the model
# model = Model(inputs=inputs, outputs=outputs)

In [None]:
learning_rate = 0.00003 
optimizer = Adam(learning_rate=learning_rate)
model.compile(optimizer = optimizer, loss='binary_crossentropy', metrics = ['accuracy'])

model.summary()
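
For reference, the `binary_crossentropy` loss used in `model.compile` is the mean of `-(y*log(p) + (1-y)*log(1-p))` over the batch. The sketch below computes it in plain Python. The clipping constant `eps` is an assumption added to avoid `log(0)` and is not taken from this notebook.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() finite (assumed clipping value)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


confident = binary_crossentropy([1, 0], [0.9, 0.1])
hesitant = binary_crossentropy([1, 0], [0.6, 0.4])
print(confident, hesitant)
```

Predictions that are both correct and confident give a lower loss than correct but hesitant ones, which is why the loss curve can keep falling after accuracy has plateaued.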

<h1> 4.  Training Convolutional Neural Network </h1>

In [None]:
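# Note: with patience=5 and epochs=5 the stop condition cannot be reached within this run;
# raise epochs (or lower patience) for early stopping to take effect.
early_stopping_callback = EarlyStopping(monitor='val_loss', mode='min', patience=5, restore_best_weights=True)
bst_model = model.fit(training_data, epochs=5, validation_data=validation_data, callbacks=[early_stopping_callback])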
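
The patience rule behind `EarlyStopping` can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: it ignores options such as `min_delta` and `baseline`, and the validation losses are made-up numbers.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (last_epoch_run, best_epoch), both 0-indexed, under a patience rule."""
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training would stop here
    return len(val_losses) - 1, best_epoch  # never triggered


# Illustrative validation losses (not results from this notebook).
losses = [0.90, 0.70, 0.60, 0.65, 0.66, 0.70, 0.80, 0.90]
print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses[:5], patience=5))
```

With a patience of 5 and only five epochs the rule never fires, because the first epoch always counts as an improvement and at most four epochs can follow it.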

In [None]:
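# Set up TensorBoard logging, then continue training the same model for 6 more epochs.
# Note: this reassigns bst_model, so the history plotted below covers only this second run.
logdir='logs'
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
bst_model = model.fit(training_data, epochs=6, validation_data=validation_data, callbacks=[tensorboard_callback])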

In [None]:
model.save("/kaggle/working/accidents.keras")

In [None]:
bst_model.history
bst_model.history['val_accuracy'][-1]

<h1>5. Plotting Training Loss and Accuracy Curves over Epochs</h1>

In [None]:
fig = plt.figure()
plt.plot(bst_model.history['loss'], color='red', label='training loss')
plt.plot(bst_model.history['val_loss'], color='blue', label='validation loss')
fig.suptitle('Loss', fontsize=20)
plt.legend(loc="upper left")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()

In [None]:
fig = plt.figure()
plt.plot(bst_model.history['accuracy'], color='red', label='training accuracy')
plt.plot(bst_model.history['val_accuracy'], color='blue', label='validation accuracy')
fig.suptitle('Accuracy', fontsize=20)
plt.legend(loc="upper left")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.show()

# 6. Evaluation
Model performance will be measured by its F1 scores in predicting and analyzing actual traffic accidents.
Solutions offering actionable insights and demonstrable potential to reduce accident frequency and impact will be favoured.
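
The F1 score is the harmonic mean of precision and recall for the positive class. The sketch below computes all three from raw label lists in plain Python, assuming "accident" is the positive class (label 1). The labels are made-up values for illustration, not results from this notebook.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative labels: 3 true positives, 1 false positive, 1 false negative.
example_true = [1, 1, 1, 0, 0, 1, 0, 0]
example_pred = [1, 0, 1, 0, 1, 1, 0, 0]
precision, recall, f1 = binary_metrics(example_true, example_pred)
print(precision, recall, f1)
```

Which class counts as positive matters: the score changes if the two labels are swapped, so the label convention should be fixed before comparing results.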

In [None]:
test_data_dir = os.path.join("/kaggle/input/accident-detection-from-cctv-footage/data/test")
test_data = tf.keras.utils.image_dataset_from_directory(test_data_dir)
test_data_iterator = test_data.as_numpy_iterator()
test_batch = test_data_iterator.next()

In [None]:
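import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix, classification_report
warnings.filterwarnings("ignore")

# image_dataset_from_directory assigns class indices in alphanumeric order of the
# folder names, so here "Accident" -> 0 and "Non Accident" -> 1. The task defines
# accident = 1 and no accident = 0, so labels and predictions are both flipped below.
y_true = []  # true labels (task convention)
y_pred = []  # predicted labels (task convention)

for X, y in test_data:
    # The test set was not rescaled when loaded, so normalize as done for training
    yhat = model.predict(X / 255, verbose=0)
    y_pred.extend(1 - yhat.flatten().round().astype(int))
    y_true.extend(1 - np.array(y).flatten().astype(int))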

# Calculate evaluation metrics
f1 = f1_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

# Print evaluation metrics and confusion matrix
print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)
print("Confusion Matrix:")
print(cm)

# Generate classification report
target_names = ['Not Accident', 'Accident']
classification_rep = classification_report(y_true, y_pred, target_names=target_names)
print("Classification Report:")
print(classification_rep)


# 7. Sanity check on Test

In [None]:
import cv2
import os
import numpy as np
import matplotlib.pyplot as plt
import random

# Define the directories for each class
class_directories = ["/kaggle/input/accident-detection-from-cctv-footage/data/test/Accident",
                     "/kaggle/input/accident-detection-from-cctv-footage/data/test/Non Accident"]

# Randomly select one image from each class directory
selected_images = []
for directory in class_directories:
    filenames = os.listdir(directory)
    selected_image = random.choice(filenames)
    selected_image_path = os.path.join(directory, selected_image)
    selected_images.append(selected_image_path)

# Load and resize the selected images
samples = []
for image_path in selected_images:
    sample = cv2.imread(image_path, cv2.IMREAD_COLOR)
    # OpenCV loads images as BGR; the model was trained on RGB images
    sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
    sample = cv2.resize(sample, (256, 256))
    samples.append(sample)

# Perform prediction for each sample
true_labels = ["Accident", "Non Accident"]  # same order as class_directories
for sample, t_label in zip(samples, true_labels):
    # The sigmoid output is P("Non Accident") (class index 1), so flip it
    prediction = 1 - model.predict(np.expand_dims(sample / 255, 0))[0][0]

    if prediction >= 0.5:
        label = f'Predicted class is Accident; Actual is {t_label}'
    else:
        label = f'Predicted class is Not Accident; Actual is {t_label}'

    plt.title(label)
    plt.imshow(sample)
    plt.show()


### Create CSV Files for Submission

In [None]:
import cv2
import os
import pandas as pd

# Define the directory containing the test data
test_data_dir = "/kaggle/input/accident-detection-from-cctv-footage/data/test"

# Initialize lists to store filenames and predictions
filenames = []
labels=[]
predictions = []

# Iterate through subdirectories in the test data directory
for subdir in os.listdir(test_data_dir):
    subdir_path = os.path.join(test_data_dir, subdir)
    # True label from the folder name (task convention: accident = 1, no accident = 0)
    c = 1 if subdir == "Accident" else 0
    # Check if the item in the directory is a subdirectory
    if os.path.isdir(subdir_path):
        # Iterate through files in the subdirectory
        for filename in os.listdir(subdir_path):
            # Check if the file is a JPEG image
            if filename.endswith(".jpg"):
                filepath = os.path.join(subdir_path, filename)

                # Load the image, convert OpenCV's BGR to RGB, and resize
                sample = cv2.imread(filepath, cv2.IMREAD_COLOR)
                sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
                sample = cv2.resize(sample, (256, 256))

                # Predict using the model
                prediction = model.predict(np.expand_dims(sample / 255, 0))[0][0]

                # The model outputs P("Non Accident"), so a value below 0.5 means accident (1)
                output = 1 if prediction < 0.5 else 0

                # Append filename, prediction, and true label to lists
                filenames.append(filename)
                predictions.append(output)
                labels.append(c)

# Create a DataFrame to store filenames and predictions
df = pd.DataFrame({"ID": filenames, "Column ID": predictions})

# Save the DataFrame to a CSV file
output_csv_path = "/kaggle/working/submission.csv"
df.to_csv(output_csv_path, index=False)
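
To make the layout of the submission file concrete, here is a minimal sketch that builds the same two-column CSV with the standard library instead of pandas. The file names and predictions are placeholders, and `build_submission` is a hypothetical helper, not part of this notebook.

```python
import csv
import io


def build_submission(filenames, predictions):
    """Return CSV text with the columns used above: 'ID' and 'Column ID'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "Column ID"])
    for name, pred in zip(filenames, predictions):
        writer.writerow([name, int(pred)])
    return buffer.getvalue()


# Placeholder rows for illustration only.
csv_text = build_submission(["a.jpg", "b.jpg"], [1, 0])
print(csv_text)
```

Each row pairs an image file name with its predicted label, one image per line.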

In [None]:
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix, classification_report

# Calculate evaluation metrics
f1 = f1_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

# Print evaluation metrics and confusion matrix
print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)
print("Confusion Matrix:")
print(cm)

# Generate classification report
target_names = ['Not Accident', 'Accident']
classification_rep = classification_report(y_true, y_pred, target_names=target_names)
print("Classification Report:")
print(classification_rep)


## Future work

Generate captions with an image-to-text model that describes each image in detail, using pre-trained models such as
ViT-GPT2 (https://huggingface.co/nlpconnect/vit-gpt2-image-captioning) or
BLIP (https://huggingface.co/Salesforce/blip-image-captioning-large).

In [None]:
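from transformers import AutoTokenizer, VisionEncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
img_to_text_model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Freeze the model: this is a PyTorch module, so disable gradients
# and switch to inference mode instead of setting a Keras-style `trainable` flag
img_to_text_model.requires_grad_(False)
img_to_text_model.eval()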