Crop an image to remove black bars around the scene.

Parameters:
- image (numpy.ndarray): Input image to be cropped.
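- y_nonzero (numpy.ndarray or None): Row indices of the pixels above the brightness threshold, or None to compute them from the image.
- x_nonzero (numpy.ndarray or None): Column indices of the pixels above the brightness threshold, or None to compute them from the image.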

Returns:
* tuple: A tuple containing:
  * numpy.ndarray: Cropped image with black bars removed.
  * numpy.ndarray: Row indices of the pixels above the brightness threshold.
  * numpy.ndarray: Column indices of the pixels above the brightness threshold.

This function crops an input image to remove the black bars surrounding the scene.
The crop window is the bounding box of all pixels whose grayscale brightness exceeds a threshold.
Keeping this window fixed across a frame burst stabilizes the videos and improves training performance: if the window were recomputed on every frame, noise could shift it and make the video flicker.

The crop window is computed only on the first frame of a frame burst and then reused for the subsequent frames, which keeps the burst consistent.
If the pixel indices (y_nonzero and x_nonzero) are not provided,
the function computes them from a grayscale version of the input image, keeping the pixels with a brightness strictly greater than 1 (on a 0-255 scale).

The cropped image is obtained by slicing the original image using the minimum and maximum indices of nonzero values
along rows and columns. The function returns the cropped image along with the corresponding row and column indices,
which can be used for cropping subsequent frames in a video burst.
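The bounding-box computation described above can be sketched with NumPy alone. This is a minimal illustration, not the notebook's function: `bounding_box_crop` is a hypothetical helper, and to avoid an OpenCV dependency it uses the channel mean as a stand-in for the grayscale conversion (an assumption; `cv2.COLOR_BGR2GRAY` weights the channels differently). Because Python slice ends are exclusive, the maximum index needs a `+ 1` to keep the last bright row and column.

```python
import numpy as np

def bounding_box_crop(image, threshold=1):
  """Crop an H x W x 3 image to the bounding box of pixels brighter than `threshold`.

  Hypothetical helper: the channel mean approximates a grayscale conversion.
  """
  gray = image.mean(axis=2)
  rows, cols = np.where(gray > threshold)
  if rows.size == 0:
    # All-black image: there is nothing to crop to
    return image
  # Slice ends are exclusive, so add 1 to keep the last bright row and column
  return image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

# A 10 x 12 frame with black bars: the scene occupies rows 2-7 and columns 3-8
frame = np.zeros((10, 12, 3), dtype=np.uint8)
frame[2:8, 3:9] = 120
cropped = bounding_box_crop(frame)
print(cropped.shape)  # (6, 6, 3)
```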

In [None]:
import cv2
import numpy as np

# Crop an image to remove black bars around the scene
def crop(image, y_nonzero, x_nonzero):
  # The rows and columns to remove are computed only on the first frame of a frame burst
  # and then reused for the following frames. This prevents a subtle and rare bug in which
  # noise shifts the crop window between frames and the video 'flickers'.
  # Stable videos, in turn, improve training performance
  if y_nonzero is None or x_nonzero is None:
    # We use a grayscale version of the image to find all rows and columns
    # containing pixels brighter than a certain threshold
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # On a [0-255] scale, a threshold of 1 is enough to remove all black bars
    # without removing important information, especially in very dark or night-time images
    brightness_threshold = 1
    y_nonzero, x_nonzero = np.where(gray_image > brightness_threshold)
  if y_nonzero.size == 0 or x_nonzero.size == 0:
    # Completely black frame: there is nothing to crop to, so keep the image as it is
    return image, y_nonzero, x_nonzero
  # Return a slice of the original image containing only the rows and columns
  # with relevant information (+1 because the slice end is exclusive)
  # The indices are also returned so they can be reused on all subsequent frames of the burst
  cropped = image[y_nonzero.min():y_nonzero.max() + 1, x_nonzero.min():x_nonzero.max() + 1]
  return cropped, y_nonzero, x_nonzero

`load_and_preprocess_video` takes the path to a video file and a configuration dictionary as input. It extracts frames from the video file, preprocesses them according to the configuration, and returns a list of numpy arrays, one per frame burst.

In [None]:
# Function to load and preprocess video frames with a fixed number of frames
def load_and_preprocess_video(video_path, pkl_config):
  # We use the OpenCV library (cv2) to read video files and extract the relevant frames
  cap = cv2.VideoCapture(video_path)
  # The FPS declared by the video file often doesn't match the actual video content:
  # probably due to re-encoding by the dataset provider, many videos have more frames
  # than the original source, which introduces repeated frames.
  # The value is kept as a float: truncating e.g. 29.97 to 29 would make the sampling drift
  declared_fps = cap.get(cv2.CAP_PROP_FPS)
  try:
    # We calculate how many sets of 'FRAMES' frames (a.k.a. frame bursts)
    # we can extract from the video when resampling it to the target 'FPS'
    frame_factor = declared_fps / pkl_config['FPS']
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) / frame_factor
    number_of_sets = int(total_frames // pkl_config['FRAMES'])
  except ZeroDivisionError:
    # Sometimes the file doesn't provide an FPS value (cv2 then reports 0),
    # which leads to a division by zero. We cannot recover from this
    cap.release()
    return []

  frame_sets = []
  for fset in range(number_of_sets):
    # Consecutive, equidistant indices on the resampled timeline: each burst starts
    # right after the previous one ends, so no frame is skipped or shared between bursts
    frame_indices = np.arange(pkl_config['FRAMES'] * fset, pkl_config['FRAMES'] * (fset + 1))
    frames = []
    # Set to None so that the crop window is computed only on the first frame of each burst
    y_nonzero, x_nonzero = None, None
    for idx in frame_indices:
      # Seek to the source frame matching the resampled index (rounded down) and read it
      cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx * frame_factor))
      ret, frame = cap.read()
      if not ret:
        # Some videos have variable frame rates, so the read might fail
        break
      if pkl_config['CROP']:
        frame, y_nonzero, x_nonzero = crop(frame, y_nonzero, x_nonzero)  # prevents flickering
      frame = cv2.resize(frame, (pkl_config['SIZE'], pkl_config['SIZE']))
      frames.append(frame)

    frames = np.array(frames)
    # Keep only complete bursts with the expected shape (an incomplete burst means a read failed)
    if frames.shape == (pkl_config['FRAMES'], pkl_config['SIZE'], pkl_config['SIZE'], 3):
      if pkl_config['FRAMES'] == 1:  # allows for 2D CNNs
        frame_sets.append(frames[0])  # return the single frame instead of a burst of 1 frame
      else:
        frame_sets.append(frames)  # append the burst of frames
  cap.release()
  # A (possibly empty) list of numpy arrays of shape (FRAMES, SIZE, SIZE, 3),
  # or (SIZE, SIZE, 3) when FRAMES == 1
  return frame_sets

Load and preprocess video frames with a fixed number of frames.

Parameters:
- video_path (str): Path to the video file.
- pkl_config (dict): Dictionary containing configuration parameters for preprocessing.

Returns:
- list: A list containing numpy arrays representing sets of preprocessed video frames.

This function loads and preprocesses video frames from the specified video file.
It samples frames at equidistant time intervals and groups them into sets (frame bursts) with a fixed number of frames, defined by `pkl_config['FRAMES']`.

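The function uses OpenCV (cv2) to read the video file. Rather than using every frame in the file, it resamples the video to the target frame rate `pkl_config['FPS']`: the ratio between the FPS declared by the file and the target FPS gives the step, in source frames, between consecutive sampled frames. Assuming the declared FPS is accurate, each burst therefore spans `FRAMES / FPS` seconds regardless of the source frame rate.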
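The resampling arithmetic can be checked without opening a video. The sketch below is a hypothetical helper (`burst_source_indices` is not part of the notebook) that assumes the declared FPS is accurate; it uses `np.arange` so that consecutive bursts neither skip nor share a frame, and rounds the source positions down so they stay below the frame count.

```python
import numpy as np

def burst_source_indices(declared_fps, frame_count, target_fps, frames_per_burst):
  """Return, for each burst, the source frame positions to seek to (hypothetical helper)."""
  frame_factor = declared_fps / target_fps   # source frames per resampled frame
  total_frames = frame_count / frame_factor  # length of the resampled timeline
  number_of_sets = int(total_frames // frames_per_burst)
  bursts = []
  for fset in range(number_of_sets):
    # Consecutive indices on the resampled timeline, no overlap between bursts
    resampled = np.arange(frames_per_burst * fset, frames_per_burst * (fset + 1))
    # Map back to source positions, rounding down (positive values, so truncation = floor)
    bursts.append((resampled * frame_factor).astype(int))
  return bursts

# A 30 FPS clip with 300 frames (10 s) resampled to 5 FPS, 15 frames per burst
bursts = burst_source_indices(30, 300, 5, 15)
print(len(bursts))               # 3
print(bursts[0][:4].tolist())    # [0, 6, 12, 18]
print(int(bursts[2][-1]))        # 264
```

With these numbers each burst covers 3 seconds, so the last second of the 10-second clip does not fill a complete burst and is dropped.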

The preprocessing optionally crops each frame to remove the black bars around the scene (`pkl_config['CROP']`).
The crop window is computed only on the first frame of each burst and reused for the remaining frames, which prevents flickering and keeps the burst stable.
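Reusing the first frame's window can be illustrated with a small NumPy sketch. `crop_burst` is a hypothetical helper, and the channel mean again stands in for a grayscale conversion (an assumption). The second frame contains one noisy pixel inside the black bar; because the window comes from the first frame, both cropped frames keep the same shape.

```python
import numpy as np

def crop_burst(frames, threshold=1):
  """Crop every frame of a burst with the window computed on the first frame (hypothetical helper)."""
  rows, cols = np.where(frames[0].mean(axis=2) > threshold)
  if rows.size == 0:
    return list(frames)  # all-black first frame: leave the burst unchanged
  top, bottom = rows.min(), rows.max() + 1
  left, right = cols.min(), cols.max() + 1
  return [f[top:bottom, left:right] for f in frames]

first = np.zeros((8, 8, 3), dtype=np.uint8)
first[2:6, 2:6] = 100
second = first.copy()
second[0, 0] = 50  # noise that would enlarge a window computed on this frame alone
cropped = crop_burst([first, second])
print([c.shape for c in cropped])  # [(4, 4, 3), (4, 4, 3)]
```

Computing the window on `second` alone would give a 6 x 6 crop, which after resizing would show up as a sudden zoom, i.e. flickering.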

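Each frame is resized to `pkl_config['SIZE']` x `pkl_config['SIZE']` pixels.
Bursts that do not have the expected shape (FRAMES, SIZE, SIZE, 3), for example because a frame could not be read, are discarded. When `pkl_config['FRAMES']` is 1, the single frame is returned instead of a burst of one frame, so that the data can be fed to 2D CNNs.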
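The filtering rule amounts to a shape check followed by an optional unwrapping of single-frame bursts. The sketch below isolates it in a hypothetical helper, `keep_complete_burst`, which is not part of the notebook.

```python
import numpy as np

def keep_complete_burst(frames, n_frames, size):
  """Return the burst if it has the expected shape, else None (hypothetical helper)."""
  burst = np.array(frames)
  if burst.shape != (n_frames, size, size, 3):
    return None  # incomplete burst, e.g. a frame could not be read
  # With one frame per burst, return the frame itself so 2D CNNs can consume it
  return burst[0] if n_frames == 1 else burst

frame = np.zeros((4, 4, 3), dtype=np.uint8)
print(keep_complete_burst([frame] * 3, 3, 4).shape)  # (3, 4, 4, 3)
print(keep_complete_burst([frame] * 2, 3, 4))        # None
print(keep_complete_burst([frame], 1, 4).shape)      # (4, 4, 3)
```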

If the video file does not provide FPS information, the function returns an empty list.

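Note: This function assumes that the input video is in color. OpenCV decodes frames in BGR channel order and no conversion is applied, so the returned frames are BGR, not RGB.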

In [None]:
import os
import pickle
import random

import numpy as np
from tqdm import tqdm

# Function to create a dataset of videos with corresponding labels if not already present
def create_video_dataset(dataset_path, pkl_config, pickle_name):
  train_data = []
  train_labels = []
  test_data = []
  test_labels = []

  pickles_dir = f'{ROOT_PATH}/pickles'

  train_data_pkl_filepath = f'{pickles_dir}/{pickle_name}-train_data.pkl'
  train_label_pkl_filepath = f'{pickles_dir}/{pickle_name}-train_labels.pkl'
  test_data_pkl_filepath = f'{pickles_dir}/{pickle_name}-test_data.pkl'
  test_label_pkl_filepath = f'{pickles_dir}/{pickle_name}-test_labels.pkl'
  try:  # don't create the pickle files if they are already present
    with open(train_data_pkl_filepath, 'rb') as trd, open(train_label_pkl_filepath, 'rb') as trl, open(test_data_pkl_filepath, 'rb') as ted, open(test_label_pkl_filepath, 'rb') as tel:
      train_data = pickle.load(trd)
      train_labels = pickle.load(trl)
      test_data = pickle.load(ted)
      test_labels = pickle.load(tel)
    print("extracted from cached pickles")
  except FileNotFoundError:  # files are missing: create them
    # The fight-detection dataset keeps its videos in two subdirectories (fight and noFight),
    # so those replace its top-level directory in the list
    dirs = os.listdir(dataset_path)
    dirs.append('fight-detection-surv-dataset-master/fight')
    dirs.append('fight-detection-surv-dataset-master/noFight')
    dirs.remove('fight-detection-surv-dataset-master')
    assert len(dirs) == 5
    print(f"loading dataset {dataset_path} {dirs}")

    labelled_video_paths = []
    # Collect every video inside all directories
    for subdir in tqdm(dirs, desc="all dirs"):
      subdir = os.path.join(dataset_path, subdir)
      for f in os.listdir(subdir):
        if f.endswith(('.mp4', '.mov')):
          # The first letter of a non-violence video filename is always 'n'
          labelled_video_paths.append((os.path.join(subdir, f), 0 if f[0] == 'n' else 1))

    # All videos are shuffled so that rebalancing doesn't always cut out the same source
    random.shuffle(labelled_video_paths)
    labelled_frame_sets = [[], []]  # [[non-violence], [violence]]
    for video_path, label in tqdm(labelled_video_paths, desc="all labelled_video_paths"):
      # We parse each video to extract as many frame bursts as possible
      labelled_frame_sets[label].extend(load_and_preprocess_video(video_path, pkl_config))

    violence_amount = len(labelled_frame_sets[1])
    non_violence_amount = len(labelled_frame_sets[0])
    print(f"extracted ({violence_amount = })+({non_violence_amount = }) = {violence_amount + non_violence_amount} frame sets from {len(labelled_video_paths)} videos")
    # We rebalance the dataset to get the same number of violence and non-violence frame sets
    min_amount = min(violence_amount, non_violence_amount)
    # Truncate the lists in place, without making a copy
    del labelled_frame_sets[0][min_amount:]
    del labelled_frame_sets[1][min_amount:]

    # We use the TRAIN_SPLIT parameter to create two independent datasets for training and testing
    split = int(pkl_config['TRAIN_SPLIT'] * min_amount)  # frame sets per class used for training
    train_data = labelled_frame_sets[0][:split] + labelled_frame_sets[1][:split]
    train_labels = [0] * split + [1] * split
    test_data = labelled_frame_sets[0][split:] + labelled_frame_sets[1][split:]
    test_labels = [0] * (min_amount - split) + [1] * (min_amount - split)
    # No need to shuffle these arrays: shuffling is already done by batch scheduling during model fitting

    train_data = np.array(train_data)
    train_labels = np.array(train_labels)
    test_data = np.array(test_data)
    test_labels = np.array(test_labels)

    # The cache directory may not exist yet on the first run
    os.makedirs(pickles_dir, exist_ok=True)
    with open(train_data_pkl_filepath, 'wb') as trd, open(train_label_pkl_filepath, 'wb') as trl, open(test_data_pkl_filepath, 'wb') as ted, open(test_label_pkl_filepath, 'wb') as tel:
      pickle.dump(train_data, trd)
      pickle.dump(train_labels, trl)
      pickle.dump(test_data, ted)
      pickle.dump(test_labels, tel)
    print(f"pickle files generated: {train_data.shape = } {train_labels.shape = } {test_data.shape = } {test_labels.shape = }")

  except Exception as e:
    print(f"An unexpected error occurred: {e}")
  return train_data, train_labels, test_data, test_labels


This demonstrates how to use the create_video_dataset function to load and preprocess the video data located at dataset_path, using the configuration specified in dataset_configs.

The function creates training and testing splits (X_train, Y_train, X_test, Y_test) based on the TRAIN_SPLIT parameter: after rebalancing the two classes, 80% of each class goes to training and 20% to testing. It preprocesses the video frames according to the configuration: resizing to 224x224 pixels, extracting 15 frames per set, resampling the videos to 5 frames per second (so each set spans 15 / 5 = 3 seconds of video), and cropping to remove black bars.
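The rebalancing and per-class split can be checked in isolation. The sketch below is a hypothetical helper, `balanced_split`, operating on plain lists; the counts in the example are illustrative only and are not results from the dataset.

```python
def balanced_split(non_violence, violence, train_split):
  """Rebalance two classes and split each one into train/test (hypothetical helper)."""
  min_amount = min(len(non_violence), len(violence))
  split = int(train_split * min_amount)  # items per class used for training
  train = non_violence[:split] + violence[:split]
  test = non_violence[split:min_amount] + violence[split:min_amount]
  train_labels = [0] * split + [1] * split
  test_labels = [0] * (min_amount - split) + [1] * (min_amount - split)
  return train, train_labels, test, test_labels

# Illustrative counts: 700 non-violence and 1000 violence frame sets, TRAIN_SPLIT = 0.8
train, train_labels, test, test_labels = balanced_split(list(range(700)), list(range(1000)), 0.8)
print(len(train), len(test))  # 1120 280
```

The smaller class fixes the size of both classes (700 each here), so 300 violence frame sets are discarded before splitting into 560 + 560 training and 140 + 140 test items.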

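If the cached pickle files are missing, the function builds the dataset from the raw videos and saves it under `{ROOT_PATH}/pickles`; otherwise it loads the cached files directly. Finally, the function returns the training and testing data splits for further use in model training and evaluation.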
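The caching logic follows a common load-or-build pattern. The sketch below reduces it to a single pickled object (the notebook writes four files); `load_or_build` is a hypothetical helper, and the temporary directory only serves the demonstration.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
  """Load a pickled object from `path`, or build it with `build()` and cache it (hypothetical helper)."""
  try:
    with open(path, 'rb') as f:
      return pickle.load(f)
  except FileNotFoundError:
    result = build()
    # Create the cache directory if needed before writing
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    with open(path, 'wb') as f:
      pickle.dump(result, f)
    return result

calls = []

def build():
  calls.append(1)  # record how many times the expensive step runs
  return [1, 2, 3]

with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, 'pickles', 'demo.pkl')
  first = load_or_build(path, build)   # cache miss: builds and writes the file
  second = load_or_build(path, build)  # cache hit: loads the file
  print(first, second, len(calls))     # [1, 2, 3] [1, 2, 3] 1
```

Deleting the cache file forces a rebuild, which is also how a changed configuration should be handled, since the cached files are looked up by name only.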

In [None]:
dataset_configs = {
  'SIZE':  224,
  'FRAMES':  15,
  'TRAIN_SPLIT':  0.8,
  'FPS':  5,
  'CROP':  True
}

ROOT_PATH = 'drive/MyDrive/Piras_Quint_Volpi'
dataset_path = f'{ROOT_PATH}/Dataset originale'

X_train, Y_train, X_test, Y_test = create_video_dataset(dataset_path, dataset_configs, 'default')