<a href="https://colab.research.google.com/github/Chunder1/Google-Colab-/blob/main/DataProcessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Processing and Active Learning with Roboflow
What this notebook does:
1.) Turns a video input into a frame output
  1a.) Video input path = unprocessed_Data @ "video_dir"
  1b.) Frame output path = datasetDirectory @ "output_dir"
2.) Sends the output frames to a Roboflow model through the hosted inference endpoint, with active learning enabled, to improve model performance.

# 1.) Install Dependencies

First, install the required dependencies:
1.) OpenCV (`opencv-python`)
2.) Roboflow (`roboflow`)
3.) Inference SDK (`inference_sdk`)

Run the code cell below.

In [None]:
!pip install opencv-python
!pip install roboflow
!pip install inference_sdk
!pip install google-api-python-client
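
After the install cell finishes, it can help to confirm that the packages are importable before running the rest of the notebook. The following is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper, and the names in `REQUIRED_MODULES` are the import names used later in this notebook.

```python
import importlib.util

# Import names used later in this notebook (cv2 comes from opencv-python).
REQUIRED_MODULES = ["cv2", "roboflow", "inference_sdk"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list means every package was found. Any name that is printed needs its install line re-run.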

# 2.) Data Processing
This script processes videos into frames and stores those frames in a directory called "datasetDirectory". Once the videos are split into frames, the active learning portion of this notebook reads from that directory.

First, mount Google Drive so that the input data can be accessed. The "drive.flush_and_unmount()" line unmounts any previously mounted drive so that the mount that follows does not run into errors.

### 2a.) Mount the Drive

In [None]:
from google.colab import drive
drive.flush_and_unmount()  # This will unmount the drive
drive.mount('/content/gdrive')  # Remount the drive
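
Once the drive is mounted, a quick look at the input folder can catch problems before any video is processed. The sketch below assumes the videos live in the `unprocessed_Data` folder named at the top of this notebook. `unexpected_entries` is a hypothetical helper that flags nested folders and files that do not end in `.avi`.

```python
import os


def unexpected_entries(video_dir, extension=".avi"):
    """Return names in video_dir that are sub-folders or lack the expected extension."""
    problems = []
    for name in sorted(os.listdir(video_dir)):
        full_path = os.path.join(video_dir, name)
        if os.path.isdir(full_path) or not name.lower().endswith(extension):
            problems.append(name)
    return problems


# Assumed location of the input videos on the mounted drive
video_dir = "/content/gdrive/MyDrive/unprocessed_Data"
if os.path.isdir(video_dir):
    print("Entries to review:", unexpected_entries(video_dir))
else:
    print(f"{video_dir} not found; check that the drive is mounted")
```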


### 2b.) Process videos into frames
The four important variables of this script are:
1.) "video_dir": the path to where the videos are stored. Note: the script only picks up files whose names start with "vid_" and end with ".avi", and it does not look inside nested folders.

2.) "output_dir": the destination directory for the output, which in this case is frames (PNG images).

3.) "global_frame_count": the number given to the next saved frame. Set it to the number of the last frame already in output_dir plus 1, or to 0 if output_dir is empty.

4.) "capture_interval": how often frames are captured. If the video is filmed at 30 frames per second and this is set to 1, every frame is captured (30 frames per second). If it is set to 2, every other frame is captured, and so on. We have it set to 10, so every tenth frame is captured, which works out to 3 frames per second.
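
The arithmetic behind "capture_interval" can be checked with a short sketch. Both functions are hypothetical helpers written for illustration. They mirror the `frame_count % capture_interval == 0` test used in the cell below and assume a constant source frame rate.

```python
def saved_frame_indices(total_frames, capture_interval):
    """Indices of the frames kept when every capture_interval-th frame is saved."""
    return [i for i in range(total_frames) if i % capture_interval == 0]


def effective_fps(source_fps, capture_interval):
    """Frames saved per second of video for a given source frame rate."""
    return source_fps / capture_interval


# One second of 30 fps video with the interval used in this notebook
print(saved_frame_indices(30, 10))  # [0, 10, 20]
print(effective_fps(30, 10))        # 3.0
```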

In [None]:
import cv2
import os

# Directory where your videos are stored
video_dir = '/content/gdrive/MyDrive/unprocessed_Data'

# The directory to store the output PNG images
output_dir = '/content/gdrive/MyDrive/datasetDirectory'
os.makedirs(output_dir, exist_ok=True)  # Ensure the output directory exists

# List all files in the video directory that match your naming convention,
# sorted so that frames are numbered in the same order on every run
video_files = sorted(f for f in os.listdir(video_dir) if f.startswith("vid_") and f.endswith(".avi"))

# Global frame counter: use 0 for an empty output_dir, otherwise the number of
# the last frame already saved plus 1
global_frame_count = 0

# Process each video file
for video_file in video_files:
    video_path = os.path.join(video_dir, video_file)

    # Open the video file
    video = cv2.VideoCapture(video_path)

    # Check if the video file is opened successfully
    if not video.isOpened():
        print(f"Error opening video file: {video_file}")
        continue
    else:
        print(f"Video file {video_file} opened successfully")

    # Read frames from the video and convert them to PNG images
    frame_count = 0
    capture_interval = 10
    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break

        # Only save every Nth frame (as defined by capture_interval)
        if frame_count % capture_interval == 0:
            # Use the global frame count in the filename to ensure uniqueness
            output_path = os.path.join(output_dir, f'img_{global_frame_count:08d}.png')
            cv2.imwrite(output_path, frame)
            global_frame_count += 1

        frame_count += 1

    # Release the video file
    video.release()

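    # frame_count is the number of frames read; every capture_interval-th one was saved
    saved_count = (frame_count + capture_interval - 1) // capture_interval
    print(f"Frames read from {video_file}: {frame_count}, saved: {saved_count}")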

print(f"Total frames extracted: {global_frame_count}")
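
When this cell is re-run against an output folder that already holds frames, "global_frame_count" has to start at the last frame number plus 1. Rather than looking that number up by hand, it could be derived from the existing filenames. This is a sketch only: `next_frame_index` is a hypothetical helper, and it assumes every frame follows the `img_<number>.png` naming used above.

```python
import os
import re

# Matches the filenames written by the extraction cell, e.g. img_00000042.png
FRAME_NAME = re.compile(r"^img_(\d+)\.png$")


def next_frame_index(output_dir):
    """Return the highest existing frame number plus 1, or 0 if there are no frames."""
    if not os.path.isdir(output_dir):
        return 0
    numbers = []
    for name in os.listdir(output_dir):
        match = FRAME_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) + 1 if numbers else 0
```

With this helper, `global_frame_count = next_frame_index(output_dir)` could take the place of the hard-coded starting value.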


Now that all of the videos have been turned into images, we run each image through a model hosted on Roboflow, calling it from within this notebook. The result returned for each image contains the annotations from the inference, which can then be saved to a file associated with that image.

# 3.) Run Inference
This portion of the notebook uses Roboflow's hosted inference endpoint. The purpose is to feed a large dataset to the desired model while active learning is enabled.

In [None]:
from roboflow import Roboflow
from inference_sdk import InferenceHTTPClient
import os
from PIL import Image
import requests
from google.colab import drive

folder_path = "/content/gdrive/MyDrive/datasetDirectory"  # Update this path to where your data is located

# Initialize the Roboflow client
CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="###################" # Roboflow API Key
)

# Iterate through each file in the folder
for file_name in os.listdir(folder_path):
    if file_name.lower().endswith(('.png', '.jpg', '.jpeg')):  # Check if the file is an image
        file_path = os.path.join(folder_path, file_name)

        # Perform inference on the image
        result = CLIENT.infer(file_path, model_id="")  # Set model_id to the model you want to access, in the form "project-name/version"

        # Print inference completion message for each image
        print(f"Inference Completed for {file_name}")

print("All inferences completed.")
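
The loop above discards each `result` after the request completes. To keep the annotations, each result could be written to a JSON file named after its image. This is a sketch under assumptions: `save_annotations` and the `annotation_dir` argument are hypothetical, and it assumes that `result` is a JSON-serializable dictionary. The keys in `example_result` are placeholders, not a documented response format.

```python
import json
import os


def save_annotations(result, image_path, annotation_dir):
    """Write one inference result to <annotation_dir>/<image name>.json and return the path."""
    os.makedirs(annotation_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(image_path))[0]
    annotation_path = os.path.join(annotation_dir, f"{stem}.json")
    with open(annotation_path, "w") as f:
        json.dump(result, f, indent=2)
    return annotation_path


# Stand-in for the value returned by CLIENT.infer(...); the keys are assumed
example_result = {"predictions": [{"class": "example", "confidence": 0.9}]}
```

Inside the loop, a call such as `save_annotations(result, file_path, annotation_dir)` placed right after `CLIENT.infer(...)` would store one file per image.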