**Automated Detection of Violent Events in Video Streams**
The goal of this project is to develop a deep learning model that automatically recognizes violent activity in video streams, an important concern for public safety and surveillance systems. The project combines spatial analysis with convolutional neural networks (CNNs) and temporal analysis with recurrent neural networks (RNNs) to identify instances of violence among individuals or groups in video footage.

**1. Data Collection and Preprocessing**
Our dataset comprises 1000 violence and 1000 non-violence videos, all sourced from YouTube. The violence videos show real street fights recorded in various environments and conditions. The non-violence videos cover a broad range of human actions, including sports, eating, and walking.

**Import Necessary Dependencies**

In [1]:
import cv2
import os
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.models import Model
from tensorflow.keras import Input
from tqdm import tqdm
from tensorflow.keras.models import save_model

**Video Dataset Compilation**

**Mount Google Drive**

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**List the Videos in Our Dataset**

In [4]:
violence = os.listdir('/content/drive/MyDrive/Violence')
nonviolence = os.listdir('/content/drive/MyDrive/NonViolence')

In [5]:
violence_path = [os.path.join('/content/drive/MyDrive/Violence',name) for name in violence]
nonviolence_path = [os.path.join('/content/drive/MyDrive/NonViolence',name) for name in nonviolence]
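`os.listdir` returns entries in arbitrary order and includes every file in the folder, so slices such as `[:500]` depend on whatever order the filesystem happens to return. A minimal sketch of a more predictable listing follows. The helper name `list_videos` and the set of extensions are assumptions for illustration and are not part of the notebook.

```python
import tempfile
from pathlib import Path

# Hypothetical helper (not in the notebook): collect video files in a stable order.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}  # assumed set of extensions

def list_videos(directory, extensions=VIDEO_EXTENSIONS):
    """Return sorted paths of files in `directory` whose suffix is a known video extension."""
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )

# Small demonstration on a temporary folder instead of the real Drive directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["V_2.mp4", "V_1.mp4", "notes.txt", "V_3.MP4"]:
        (Path(tmp) / name).touch()
    demo_names = [Path(p).name for p in list_videos(tmp)]
```

With the real data, the same helper could be pointed at `/content/drive/MyDrive/Violence` and `/content/drive/MyDrive/NonViolence`.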

In [6]:
violence_path[1]

'/content/drive/MyDrive/Violence/V_101.mp4'

**Preprocessing**

In [7]:
def preprocess_video(video_path, frame_interval=1, target_size=(224, 224)):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # Reduce the frame rate: keep a frame only when the reader position
        # is a multiple of frame_interval
        if cap.get(cv2.CAP_PROP_POS_FRAMES) % frame_interval == 0:
            # Resize the frame to the target size
            frame = cv2.resize(frame, target_size)
            frames.append(frame)
    cap.release()
    return frames

def data_augmentation(frames):
    augmented_frames = []
    for frame in frames:
        # Example transformation: horizontal flip
        flipped_frame = cv2.flip(frame, 1)
        augmented_frames.append(flipped_frame)
    return augmented_frames

In [8]:
# Example usage (violence_path[1] is the V_101.mp4 file shown above)
video_frames = preprocess_video(violence_path[1], frame_interval=5, target_size=(224, 224))
augmented_frames = data_augmentation(video_frames)
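Note that `data_augmentation` returns only the flipped frames. If the intent is to enlarge the dataset, one option is to keep the originals alongside the flipped copies. The sketch below assumes frames are NumPy arrays of shape (height, width, channels) and uses slicing, which reverses the columns in the same way as `cv2.flip(frame, 1)`. The name `augment_with_flips` is hypothetical.

```python
import numpy as np

def augment_with_flips(frames):
    """Return the original frames followed by their horizontally flipped copies."""
    flipped = [np.ascontiguousarray(frame[:, ::-1]) for frame in frames]
    return list(frames) + flipped

# Tiny 2x3 frame with one channel, so the flip is easy to see.
demo_frame = np.arange(6).reshape(2, 3, 1)
augmented = augment_with_flips([demo_frame])
```

Here `augmented[0]` is the untouched frame and `augmented[1]` is its mirror image.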

**Model Design and Implementation:**

**Spatial Feature Extraction**: Use a pre-trained model (a CNN or a transformer) known for its effectiveness in image recognition to extract spatial features from individual video frames. This notebook uses InceptionV3.

In [9]:
pretrained_model = InceptionV3()
# Create a new model for feature extraction
# Extract features from the second-to-last layer of the InceptionV3 model
pretrained_model = Model(inputs=pretrained_model.input,outputs=pretrained_model.layers[-2].output)
pretrained_model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, 299, 299, 3)]        0         []                            
                                                                                                  
 conv2d (Conv2D)             (None, 149, 149, 32)         864       ['input_1[0][0]']             
                                                                                                  
 batch_normalization (Batch  (None, 149, 149, 32)         96        ['conv2d[0][0]']              
 Normalization)                                                                                   
                                              

**Frame Feature Extraction Function**
In this section, we define a function for extracting features from an individual frame using the previously configured feature extraction model.

In [10]:
def feature_extractor(frame):
    # Expand the dimensions of the frame for model compatibility
    img = np.expand_dims(frame, axis=0)

    # Use the pre-trained feature extraction model to obtain the feature vector
    feature_vector = pretrained_model.predict(img, verbose=0)

    # Return the extracted feature vector
    return feature_vector
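Calling `predict` once per frame adds overhead and leaves a singleton axis in each result, which is where the later `(16, 1, 2048)` shape comes from. A possible alternative is to stack the frames of one video and run a single forward pass. In the sketch below, `fake_predict` is a stand-in for the real model (an assumption so the example runs without TensorFlow); with the notebook's model one would pass something like `lambda batch: pretrained_model.predict(batch, verbose=0)`.

```python
import numpy as np

def extract_features_batched(frames, predict_fn):
    """Stack frames into one (N, H, W, C) batch and call predict_fn once."""
    batch = np.stack(frames, axis=0)
    return predict_fn(batch)

# Stand-in for pretrained_model.predict (assumption): averages each frame
# over height and width, giving one small vector per frame.
def fake_predict(batch):
    return batch.mean(axis=(1, 2))

demo_frames = [np.full((4, 4, 3), fill_value=i, dtype=np.float32) for i in range(16)]
demo_features = extract_features_batched(demo_frames, fake_predict)
```

The result has one row per frame, so with InceptionV3 features the per-video array would be `(16, 2048)` with no extra axis to reshape away.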

**Video Frames Extraction Function**

In [11]:
def frames_extraction(video_path, SEQUENCE_LENGTH=16, IMAGE_WIDTH=299, IMAGE_HEIGHT=299, total_video=0):
    # List to store features for all videos
    all_video_features = []

    # Loop through each video
    for pos in tqdm(range(total_video)):
        frames_list = []

        # Open the video file for reading
        video_reader = cv2.VideoCapture(video_path[pos])

        # Get the total number of frames in the video
        video_frames_count = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))

        # Calculate the number of frames to skip in order to achieve the desired sequence length
        skip_frames_window = max(int(video_frames_count / SEQUENCE_LENGTH), 1)

        # Loop through each frame in the sequence
        for frame_counter in range(SEQUENCE_LENGTH):
            # Set the position of the video reader to the current frame
            video_reader.set(cv2.CAP_PROP_POS_FRAMES, frame_counter * skip_frames_window)

            # Read the frame
            success, frame = video_reader.read()

            # Break if unable to read the frame
            if not success:
                break

            # Convert the frame to RGB and resize it (cv2.resize expects (width, height))
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            resized_frame = cv2.resize(frame_rgb, (IMAGE_WIDTH, IMAGE_HEIGHT))

            # Normalize the frame
            normalized_frame = resized_frame / 255

            # Extract features using the previously defined feature extraction function
            features = feature_extractor(normalized_frame)

            # Append the features to the list
            frames_list.append(features)

        # Append the list of features for the current video to the overall list
        all_video_features.append(frames_list)

        # Release the video reader
        video_reader.release()

    # Convert the list of features to a numpy array
    return np.array(all_video_features)
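The sampling arithmetic in `frames_extraction` can be checked in isolation. The sketch below mirrors the index computation and shows what happens when a video has fewer frames than `SEQUENCE_LENGTH`: some requested indices lie past the end, the read fails, and the video yields fewer than 16 feature vectors. The helper name `sample_frame_indices` is hypothetical.

```python
def sample_frame_indices(frame_count, sequence_length=16):
    """Mirror the index arithmetic in frames_extraction and keep only readable indices."""
    skip = max(int(frame_count / sequence_length), 1)
    requested = [i * skip for i in range(sequence_length)]
    readable = [idx for idx in requested if idx < frame_count]
    return skip, readable

long_skip, long_indices = sample_frame_indices(160)   # plenty of frames
short_skip, short_indices = sample_frame_indices(10)  # fewer frames than the sequence length
```

For the 160-frame case every tenth frame is used. For the 10-frame case only 10 indices are readable, so such a video would need to be padded or skipped before all videos can be stacked into one regular array.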

**We will use only 500 videos from each of the violence and non-violence classes**

In [12]:
violence_features = frames_extraction(violence_path[:500],total_video=len(violence_path[:500]))
non_violence_features = frames_extraction(nonviolence_path[:500],total_video=len(nonviolence_path[:500]))

100%|██████████| 500/500 [1:12:05<00:00,  8.65s/it]
100%|██████████| 500/500 [59:40<00:00,  7.16s/it]


In [13]:
np.save('/content/drive/MyDrive/violence_features.npy', violence_features)  # cache the features on Drive so they can be reused

In [14]:
np.save('/content/drive/MyDrive/non_violence_features.npy', non_violence_features)  # cache the features on Drive so they can be reused

**Loading Non-Violence and Violence Feature Data**
In this section, we load the precomputed feature data for non-violence and violence videos. The features are stored in NumPy arrays.

In [15]:
non_violence_data = np.load('/content/drive/MyDrive/non_violence_features.npy')
violence_data = np.load('/content/drive/MyDrive/violence_features.npy')
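As an alternative to two separate `.npy` files, both arrays could be stored in a single compressed archive with `np.savez_compressed`. The sketch below uses small random arrays and a temporary directory in place of the real features and the Drive path; the file name `features.npz` is hypothetical.

```python
import os
import tempfile
import numpy as np

# Small random stand-ins for the real (500, 16, 1, 2048) feature arrays.
rng = np.random.default_rng(0)
demo_violence = rng.random((4, 16, 1, 8)).astype(np.float32)
demo_non_violence = rng.random((4, 16, 1, 8)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "features.npz")  # hypothetical file name
    np.savez_compressed(archive_path, violence=demo_violence, non_violence=demo_non_violence)
    # Loading an .npz file returns a dict-like archive keyed by the names used when saving.
    with np.load(archive_path) as archive:
        restored_violence = archive["violence"]
        restored_non_violence = archive["non_violence"]
```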

In [16]:
violence_data[0].shape

(16, 1, 2048)

**Temporal Feature Extraction with an LSTM**

Creating the LSTM Model and Preparing the Data
In this section, we define a bidirectional LSTM (Long Short-Term Memory) model for video classification and prepare the data for training.

In [17]:
# Import layers from tensorflow.keras so they match the Input and Model imported earlier
from tensorflow.keras.layers import LSTM, Dense, Bidirectional, BatchNormalization, Dropout
from sklearn.model_selection import train_test_split

# Create labels: 0 = violence, 1 = non-violence
violence_labels = np.zeros(len(violence_data))
nonviolence_labels = np.ones(len(non_violence_data))

# Combine features and labels
X = np.concatenate([violence_data, non_violence_data], axis=0)
y = np.concatenate([violence_labels, nonviolence_labels], axis=0)

In [18]:
len(X)  # total number of samples

1000

In [19]:
X[0].shape  # shape of each sample

(16, 1, 2048)

In [20]:
y[0:20]  # first 20 labels

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

In [21]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)

X_train_reshaped = X_train.reshape((X_train.shape[0], 16, 2048))# reshape to (16,2048)
X_test_reshaped = X_test.reshape((X_test.shape[0], 16, 2048))# reshape to (16,2048)
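The reshape above removes the singleton axis left by the per-frame `predict` call. A minimal sketch of two equivalent ways to do this without hard-coding `16` and `2048` is shown below, using a small stand-in array (feature size 8 instead of 2048, an assumption to keep the example light).

```python
import numpy as np

# Stand-in for X_train with a small feature size (8 instead of 2048).
demo_X = np.arange(3 * 16 * 1 * 8, dtype=np.float32).reshape(3, 16, 1, 8)

# np.squeeze drops only the named axis and raises an error if that axis is not size 1.
squeezed = np.squeeze(demo_X, axis=2)

# Equivalent reshape that infers the feature size from the data.
reshaped = demo_X.reshape(demo_X.shape[0], demo_X.shape[1], -1)
```

Using `np.squeeze` with an explicit axis has the side benefit of failing loudly if the array does not have the expected layout.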

**LSTM Model Definition using Keras Functional API**

In [22]:
# Define the input layer
inputs = Input(shape=(16, 2048))

# Build the LSTM model using Functional API
x = Bidirectional(LSTM(200, return_sequences=True))(inputs)
x = BatchNormalization()(x)
x = Dropout(0.3)(x)
x = Bidirectional(LSTM(100))(x)
x = BatchNormalization()(x)
x = Dropout(0.3)(x)
x = Dense(200, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)

# Create the model
model = Model(inputs=inputs, outputs=outputs)

In [23]:
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 16, 2048)]        0         
                                                                 
 bidirectional (Bidirection  (None, 16, 400)           3598400   
 al)                                                             
                                                                 
 batch_normalization_94 (Ba  (None, 16, 400)           1600      
 tchNormalization)                                               
                                                                 
 dropout (Dropout)           (None, 16, 400)           0         
                                                                 
 bidirectional_1 (Bidirecti  (None, 200)               400800    
 onal)                                                           
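The parameter counts in the summary can be reproduced by hand. An LSTM layer has four gates, each with input weights, recurrent weights, and a bias; a bidirectional wrapper holds two such layers; and batch normalization keeps four values per feature. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM layer: 4 gates, each with input, recurrent, and bias weights."""
    return 4 * ((input_dim + units + 1) * units)

def bidirectional_lstm_params(input_dim, units):
    """A Bidirectional wrapper holds two independent copies of the LSTM."""
    return 2 * lstm_params(input_dim, units)

first_bilstm = bidirectional_lstm_params(2048, 200)  # input is the 2048-dim InceptionV3 feature
first_batchnorm = 4 * 400                            # gamma, beta, moving mean, moving variance
second_bilstm = bidirectional_lstm_params(400, 100)  # input is the 400-dim output of the first block
```

These evaluate to 3,598,400, 1,600, and 400,800, matching the `Param #` column above.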
                                                           

**Compiling The Model**

In [24]:
# Compile your model with an appropriate loss and optimizer
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train_reshaped,y_train,validation_data=(X_test_reshaped,y_test),epochs=5,batch_size=32)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7d856d1365f0>

**Benchmark & Evaluation**

**Model evaluation**

In [25]:
# Evaluate the model on the test set
accuracy = model.evaluate(X_test_reshaped, y_test)
print("Test Accuracy:", accuracy[1])

Test Accuracy: 0.9549999833106995


**Let's Test with Unseen Videos**

In [26]:
violence_features_test = frames_extraction(violence_path[500:510],total_video=len(violence_path[500:510]))
non_violence_features_test = frames_extraction(nonviolence_path[500:510],total_video=len(nonviolence_path[500:510]))

100%|██████████| 10/10 [01:04<00:00,  6.44s/it]
100%|██████████| 10/10 [01:02<00:00,  6.26s/it]


In [27]:
test_violence = violence_features_test.reshape((violence_features_test.shape[0], 16, 2048))
test_non_violence = non_violence_features_test.reshape((non_violence_features_test.shape[0], 16, 2048))

In [28]:
test_violence[0].shape

(16, 2048)

In [29]:
np.expand_dims(test_violence[0], axis=0).shape  # to predict on a single video, add a batch dimension first

(1, 16, 2048)

In [30]:
class_names = ['violence','non_violence']# class names

**Model Testing**

In [31]:
predicted_non_violence = [class_names[1] if i > 0.5 else class_names[0] for i in model.predict(test_non_violence)]# tested with non violence video
predicted_violence = [class_names[1] if i > 0.5 else class_names[0] for i in model.predict(test_violence)]# tested with violence video
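The thresholding rule used in both list comprehensions can be factored into one small function. This is a sketch only: the probabilities below are made up for illustration and are not model output, and the function name is hypothetical. With the real model, `model.predict(...)` returns an array of shape `(N, 1)`, so one would pass `model.predict(test_violence).ravel()`.

```python
CLASS_NAMES = ["violence", "non_violence"]  # same order as class_names in the notebook

def labels_from_probabilities(probabilities, threshold=0.5):
    """Map sigmoid outputs to class names; values above the threshold mean non_violence."""
    return [CLASS_NAMES[1] if p > threshold else CLASS_NAMES[0] for p in probabilities]

# Made-up probabilities for illustration only, not model output.
demo_labels = labels_from_probabilities([0.02, 0.49, 0.51, 0.97])
```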



In [32]:
predicted_non_violence

['non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence']

In [33]:
predicted_violence

['non_violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence']
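The two prediction lists above can be tallied into a single figure. The sketch below copies the printed predictions by hand; with only 20 videos, the result is a rough sanity check rather than a reliable estimate.

```python
# Predictions copied from the two cells above (10 videos per class).
predicted_non_violence = ["non_violence"] * 10
predicted_violence = ["non_violence"] + ["violence"] * 9

correct = (
    sum(label == "non_violence" for label in predicted_non_violence)
    + sum(label == "violence" for label in predicted_violence)
)
total = len(predicted_non_violence) + len(predicted_violence)
held_out_accuracy = correct / total  # 19 correct out of 20
```

One violence video is misclassified as non-violence, giving 19 correct predictions out of 20.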

**Classification Report For The Model Prediction**

In [34]:
from sklearn.metrics import classification_report

y_pred = model.predict(X_test_reshaped)
y_preds = [1 if i > 0.5 else 0 for i in y_pred]
# Generate a classification report
report = classification_report(y_test, y_preds)

# Print the classification report
print("Classification Report:\n", report)

Classification Report:
               precision    recall  f1-score   support

         0.0       0.94      0.97      0.95        98
         1.0       0.97      0.94      0.96       102

    accuracy                           0.95       200
   macro avg       0.96      0.96      0.95       200
weighted avg       0.96      0.95      0.96       200
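To see how the per-class figures in the report arise, the sketch below recomputes them from a confusion table. The counts are inferred from the reported support, recall, and accuracy; they are an assumption, since the notebook does not print the confusion matrix itself.

```python
# Confusion counts inferred from the report (an assumption, not printed by the notebook):
# keys are (true class, predicted class).
confusion = {
    (0, 0): 95, (0, 1): 3,   # 98 true violence clips
    (1, 0): 6,  (1, 1): 96,  # 102 true non-violence clips
}

def per_class_metrics(confusion, cls):
    """Precision, recall and F1 for one class of a binary confusion table."""
    other = 1 - cls
    tp = confusion[(cls, cls)]
    fp = confusion[(other, cls)]
    fn = confusion[(cls, other)]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

metrics = {cls: per_class_metrics(confusion, cls) for cls in (0, 1)}
accuracy = (confusion[(0, 0)] + confusion[(1, 1)]) / sum(confusion.values())
```

Rounded to two decimals, these values agree with the precision, recall, and f1-score rows above, and the accuracy of 191/200 agrees with the test accuracy reported earlier.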



In [35]:
# Save the model to disk
model.save('model.h5')

  saving_api.save_model(
