
**Applying Active Learning Models to Unseen Video Data (Google Cloud)**
---



**Mount Google Drive in Google Colab to access the models and scripts. Colab may ask for Google authentication before it can access your Drive. This notebook uses the T4 GPU runtime in Colab, so check your runtime type and change it if necessary.**

In [1]:
# Connect google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Change to the original project directory to access the code and models.**

In [None]:
cd /content/drive/MyDrive/Google_colab/AL-MDN


/content/drive/MyDrive/Google_colab/AL-MDN


**This cell loads the necessary packages, such as Google Cloud Storage, PyTorch, and common Python packages. It also imports `build_ssd_gmm` and `VOC_CLASSES` from our active learning model.**

In [None]:
# load packages
from google.cloud import storage
import os
import cv2
import numpy as np
import sys
import torch
import torch.nn as nn
import argparse
import csv

from ssd_gmm import build_ssd_gmm
from data import VOC_CLASSES
import warnings
warnings.filterwarnings("ignore")

**This function lists all the files under a given prefix in a Google Cloud Storage bucket. The videos are stored in Cloud Storage for processing.**

In [None]:
# Function to list files from GCS
def list_files_from_bucket(bucket_name, prefix):
    client = storage.Client.create_anonymous_client()
    files = [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
    return files
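
Listing by prefix can also return folder placeholders or non-video objects. The following is a minimal sketch, not part of the original notebook, of filtering the listed names before processing. The helper name `filter_video_files`, the extension list, and the `notes.txt` entry are assumptions made for illustration.

```python
def filter_video_files(blob_names, extensions=('.avi', '.mp4', '.mov')):
    """Keep only names ending in a known video extension (case-insensitive).

    Hypothetical helper: the extension list is an assumption, so adjust it
    to match the files actually stored in the bucket.
    """
    return [name for name in blob_names if name.lower().endswith(extensions)]


# Example input: a folder placeholder, one video, and a made-up text file
names = [
    'PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/',
    'PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/942205001_cam3.avi',
    'PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/notes.txt',
]
print(filter_video_files(names))
# ['PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/942205001_cam3.avi']
```

Applied to the result of `list_files_from_bucket`, a filter like this would keep the loop from trying to open objects that are not videos.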

**This code block sets the following inputs:**
1. Trained Model
2. Google Cloud Bucket name & directory
3. Frame Rate to extract video frames



In [None]:
# Configuration
trained_model = 'weights/ssd300_AL_VOC_id_1_num_labels_110000_120000.pth'
bucket = 'nmfs_odp_sefsc'
prefix = 'PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022'
frame_rate = 5

**This code block loads the trained model. The function `build_ssd_gmm` initializes the model.**

In [None]:
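# Select the device once; `device` is reused later when input frames are prepared
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    torch.set_default_tensor_type('torch.cuda.FloatTensor')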

# Initialize SSD
net = build_ssd_gmm('test', 300, 145)
net = nn.DataParallel(net)
net.load_state_dict(torch.load(trained_model, map_location=torch.device('cpu')))
net.eval()

DataParallel(
  (module): SSD_GMM(
    (vgg): ModuleList(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU(inplace=True)
      (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (3): ReLU(inplace=True)
      (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (6): ReLU(inplace=True)
      (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (8): ReLU(inplace=True)
      (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (11): ReLU(inplace=True)
      (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): ReLU(inplace=True)
      (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (15): ReLU(inplace=True)
      (

**This function extracts frames at a defined frame rate. It saves the frames in JPG format and stores them in a temporary directory.**

In [None]:
def extract_frames(video_file, output_dir, desired_fps):
    cap = cv2.VideoCapture(video_file)
    if not cap.isOpened():
        print(f"Error opening video file {video_file}")
        return []

    fps = cap.get(cv2.CAP_PROP_FPS)
    # Keep every `step`-th frame; the guard avoids a zero step when fps < desired_fps
    step = max(1, int(fps / desired_fps))
    frame_count = -1
    extracted_frames = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1
        if frame_count % step == 0:
            frame_identifier = frame_count // step
            filename = f"{os.path.splitext(os.path.basename(video_file))[0]}_{frame_identifier}.jpg"
            output_path = os.path.join(output_dir, filename)
            cv2.imwrite(output_path, frame)
            extracted_frames.append((output_path, frame_identifier))
            print(f"Frame {frame_identifier} saved as {filename}")

    cap.release()
    return extracted_frames
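
To make the sampling rule concrete, here is a small sketch of which source frames end up on disk for a given video frame rate and requested rate. It is an illustration only: the function name `sampled_frame_indices` is hypothetical, the 30 fps value is an assumed example rather than a property of the videos, and the `max(1, ...)` guard simply keeps the step positive when the video frame rate is lower than the requested one.

```python
def sampled_frame_indices(total_frames, fps, desired_fps):
    """Return the indices of the source frames that would be saved."""
    # Same integer step as in extract_frames, guarded so it is never zero
    step = max(1, int(fps / desired_fps))
    return list(range(0, total_frames, step))


# One second of an assumed 30 fps video sampled at 5 frames per second
print(sampled_frame_indices(30, 30.0, 5))
# [0, 6, 12, 18, 24]
```

Because the step is truncated to an integer, a frame rate such as 29.97 gives a step of 5 rather than 6, so the effective sampling rate can be slightly higher than requested.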

**This code block defines the temporary directory for the extracted frames and the output directory where the results are saved in VIAME CSV format. If a folder doesn't exist, it is created.**

In [None]:
# Define directories
temp_dir = 'temp_image'
output_dir = 'output'

os.makedirs(temp_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

In [None]:

# Function to process a single video
def process_video(video_path):
    extracted_frames = extract_frames(video_path, temp_dir, frame_rate)
    if not extracted_frames:
        return

    video_name = os.path.splitext(os.path.basename(video_path))[0]
    csv_file_path = os.path.join(output_dir, f'detection_{video_name}.csv')

    csv_header = ['# 1: Detection or Track-id', '2: Video or Image Identifier', '3: Unique Frame Identifier',
                  '4-7: Img-bbox(TL_x', 'TL_y', 'BR_x', 'BR_y)', '8: Detection or Length Confidence',
                  '9: Target Length (0 or -1 if invalid)', '10-11+: Repeated Species', 'Confidence Pairs or Attributes']

    csv_data = [csv_header]

    for frame_path, frame_id in extracted_frames:
        image = cv2.imread(frame_path)

        x = cv2.resize(image, (300, 300)).astype(np.float32)
        x -= (104.0, 117.0, 123.0)
        x = x.astype(np.float32)
        x = x[:, :, ::-1].copy()
        x = torch.from_numpy(x).permute(2, 0, 1).to(device)
        xx = x.unsqueeze(0)

        with torch.no_grad():
            detections = net(xx)

        detections_to_save = []

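        for i in range(detections.size(1)):
            j = 0
            # Stop at the last detection slot so the index never runs past the tensor
            while j < detections.size(2) and detections[0, i, j, 0] >= 0.5: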
                score = detections[0, i, j, 0].item()
                label_name = VOC_CLASSES[i - 1]
                pt = (detections[0, i, j, 1:5] * torch.Tensor([image.shape[1], image.shape[0], image.shape[1], image.shape[0]])).cpu().numpy()
                bbox = [int(pt[0]), int(pt[1]), int(pt[2]), int(pt[3])]
                confidence = f"{score:.2f}"
                class_name = label_name
                detections_to_save.append([j + 1, os.path.basename(frame_path), frame_id, *bbox, confidence, -1, class_name, confidence])
                j += 1

        csv_data.extend(detections_to_save)

    with open(csv_file_path, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerows(csv_data)

    print(f'Detections saved to {csv_file_path}')

    for frame_path, _ in extracted_frames:
        os.remove(frame_path)
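
The row written for each detection can be sketched in isolation as follows. This is a simplified illustration of the scaling and row layout used in `process_video`, not a replacement for it: the helper name `to_viame_row`, the 1920x1080 frame size, and the `EXAMPLE_SPECIES` label are assumptions chosen for the example.

```python
def to_viame_row(detection_id, image_name, frame_id, box_norm,
                 width, height, score, class_name):
    """Scale a normalized (x1, y1, x2, y2) box to pixels and build one CSV row."""
    x1, y1, x2, y2 = box_norm
    # Normalized coordinates are multiplied by the original image size
    bbox = [int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height)]
    confidence = f"{score:.2f}"
    # Column order: id, image, frame, box, confidence, length (-1 = invalid),
    # followed by the species/confidence pair
    return [detection_id, image_name, frame_id, *bbox,
            confidence, -1, class_name, confidence]


row = to_viame_row(1, '942205001_cam3_0.jpg', 0, (0.25, 0.5, 0.75, 1.0),
                   1920, 1080, 0.873, 'EXAMPLE_SPECIES')
print(row)
# [1, '942205001_cam3_0.jpg', 0, 480, 540, 1440, 1080, '0.87', -1, 'EXAMPLE_SPECIES', '0.87']
```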

In [None]:
# Process videos from GCS
files = list_files_from_bucket(bucket, prefix)
for file_name in files:
    # Build the public URL directly; os.path.join is meant for filesystem paths
    video_url = f'https://storage.googleapis.com/{bucket}/{file_name}'
    print(f"Processing video: {video_url}")
    process_video(video_url)

Processing video: https://storage.googleapis.com/nmfs_odp_sefsc/PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/942205001_cam3.avi
Frame 0 saved as 942205001_cam3_0.jpg
Frame 1 saved as 942205001_cam3_1.jpg
Frame 2 saved as 942205001_cam3_2.jpg
Frame 3 saved as 942205001_cam3_3.jpg
Frame 4 saved as 942205001_cam3_4.jpg
Frame 5 saved as 942205001_cam3_5.jpg
Frame 6 saved as 942205001_cam3_6.jpg
Frame 7 saved as 942205001_cam3_7.jpg
Frame 8 saved as 942205001_cam3_8.jpg
Frame 9 saved as 942205001_cam3_9.jpg
Frame 10 saved as 942205001_cam3_10.jpg
Frame 11 saved as 942205001_cam3_11.jpg
Frame 12 saved as 942205001_cam3_12.jpg
Frame 13 saved as 942205001_cam3_13.jpg
Frame 14 saved as 942205001_cam3_14.jpg
Frame 15 saved as 942205001_cam3_15.jpg
Frame 16 saved as 942205001_cam3_16.jpg
Frame 17 saved as 942205001_cam3_17.jpg
Frame 18 saved as 942205001_cam3_18.jpg
Frame 19 saved as 942205001_cam3_19.jpg
Frame 20 saved as 942205001_cam3_20.jpg
Frame 21 saved as 942205001_cam3_21.jpg
Frame 22 saved as

KeyboardInterrupt: 
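
The video URL in the output above follows the public `https://storage.googleapis.com/<bucket>/<object>` pattern. As a hedged sketch, object names that contain spaces or other special characters could be percent-encoded before being passed to OpenCV. The helper name `public_gcs_url` is hypothetical, and the approach assumes the bucket is publicly readable.

```python
from urllib.parse import quote


def public_gcs_url(bucket_name, blob_name):
    """Build the public HTTPS URL for an object in a publicly readable bucket."""
    # Percent-encode the object name but keep '/' so the path structure survives
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name, safe='/')}"


print(public_gcs_url('nmfs_odp_sefsc',
                     'PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/942205001_cam3.avi'))
# https://storage.googleapis.com/nmfs_odp_sefsc/PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022/942205001_cam3.avi
```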

**This shell command is OPTIONAL. It runs the whole pipeline in one step as a script. Skip it if you have already run the cells above.**

In [None]:
!python eval_voc_viame_videos_2.py --trained_model weights/ssd300_AL_VOC_id_1_num_labels_110000_120000.pth --bucket nmfs_odp_sefsc --prefix PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022 --frame_rate 5
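
The source of `eval_voc_viame_videos_2.py` is not shown in this notebook. As a minimal sketch of how a script could accept the four flags used in the command above, the following uses `argparse`. The argument types, defaults, and help strings are assumptions, and the real script may handle its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the shell command above."""
    parser = argparse.ArgumentParser(
        description='Run detection on videos stored in Google Cloud Storage')
    parser.add_argument('--trained_model', type=str, required=True,
                        help='Path to the trained weights file (.pth)')
    parser.add_argument('--bucket', type=str, required=True,
                        help='Google Cloud Storage bucket name')
    parser.add_argument('--prefix', type=str, default='',
                        help='Directory prefix inside the bucket')
    parser.add_argument('--frame_rate', type=int, default=5,
                        help='Number of frames to extract per second of video')
    return parser


# Parse the same values that the notebook's configuration cell uses
args = build_parser().parse_args([
    '--trained_model', 'weights/ssd300_AL_VOC_id_1_num_labels_110000_120000.pth',
    '--bucket', 'nmfs_odp_sefsc',
    '--prefix', 'PEMD/VIDEO_DATA/GOM_REEF_FISH/SoJo_2022',
    '--frame_rate', '5',
])
print(args.bucket, args.frame_rate)
```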



**References**

```
This work is adapted from AL-MDN and SSD.

If you find this work useful, please feel free to cite:

@inproceedings{nabi2023probabilistic,
  title={Probabilistic Model-Based Active Learning with Attention Mechanism for Fish Species Recognition},
  author={Nabi, MM and Shah, Chiranjibi and Alaba, Simegnew Yihunie and Prior, Jack and Campbell, Matthew D and Wallace, Farron and Moorhead, Robert and Ball, John E},
  booktitle={OCEANS 2023-MTS/IEEE US Gulf Coast},
  pages={1--8},
  year={2023},
  organization={IEEE}
}
```

