---
**Author:** Arifa Kokab  
**For:** AAI-540 Machine Learning Operations  
**Institution:** University of San Diego

# Video Frame Extraction & Upload Pipeline

## Introduction

This notebook demonstrates an automated workflow that downloads a video file from AWS S3, extracts frames from it with OpenCV, and uploads the frames back to S3. The resulting images can feed downstream machine learning tasks such as batch inference, facial expression analysis, or dataset creation. Each step runs as a separate cell with its inputs declared at the top, so the workflow can be rerun on another video or adapted into a larger MLOps pipeline.

---

## 1. Setup: Import Libraries and Initialize S3 Client

This section imports the required libraries (`boto3` for AWS access, OpenCV's `cv2` for video decoding, and the standard-library `os` module for file paths) and initializes the AWS S3 client used by the later cells.

In [1]:
import boto3
import cv2
import os

# Initialize S3 client
s3 = boto3.client('s3')

## 2. Download Video File from S3

We specify the S3 bucket and object key (video filename) and download the video file locally. This ensures the video is available for frame extraction.

In [2]:
# Specify your bucket name and object key (filename in S3)
bucket_name = 'sagemaker-us-east-1-301806113644'
object_key = 'videos/WIN_20250622_02_25_17_Pro.mp4'
local_file = 'WIN_20250622_02_25_17_Pro.mp4'

# Download from S3
s3.download_file(bucket_name, object_key, local_file)
print(f"Downloaded '{object_key}' from S3 bucket '{bucket_name}' to '{local_file}'")

Downloaded 'videos/WIN_20250622_02_25_17_Pro.mp4' from S3 bucket 'sagemaker-us-east-1-301806113644' to 'WIN_20250622_02_25_17_Pro.mp4'
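
The download step can also be wrapped in a small helper that derives the local filename from the object key, so the two values cannot drift apart. The sketch below is an illustration rather than part of the original notebook: `local_name_for_key` and `download_video` are hypothetical names, and `client` is assumed to be any object exposing boto3's `download_file(Bucket, Key, Filename)` method, such as the `s3` client created in Section 1.

```python
import os
import posixpath


def local_name_for_key(object_key: str) -> str:
    """Return the final path component of an S3 object key."""
    # S3 keys use '/' as the separator regardless of the local operating system
    name = posixpath.basename(object_key)
    if not name:
        raise ValueError(f"Object key '{object_key}' does not name a file")
    return name


def download_video(client, bucket: str, object_key: str, dest_dir: str = ".") -> str:
    """Download one object into dest_dir and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, local_name_for_key(object_key))
    # Assumption: client follows boto3's S3 client interface
    client.download_file(bucket, object_key, local_path)
    return local_path
```

With the notebook's values, `download_video(s3, bucket_name, object_key)` would write `WIN_20250622_02_25_17_Pro.mp4` to the working directory.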


## 3. Extract Frames from Video

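Using OpenCV, we sample the video at a fixed interval, here one frame for every second of footage. The loop reads every frame and saves one whenever the frame position is a multiple of the video's frames-per-second value times the sampling interval. Each saved frame is written as a JPEG image (`frame_0000.jpg`, `frame_0001.jpg`, and so on) in the local `video_frames` directory, ready for per-frame analysis or batch processing in later steps.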

In [3]:
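video_path = "WIN_20250622_02_25_17_Pro.mp4"
output_dir = "video_frames"
os.makedirs(output_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    raise IOError(f"Could not open video file '{video_path}'")

seconds_per_frame = 1  # Save one frame for every second of video
# Number of decoded frames between saves; clamped to at least 1 so the
# modulo below never divides by zero if the file reports no FPS
frame_interval = max(int(cap.get(cv2.CAP_PROP_FPS) * seconds_per_frame), 1)
frame_count = 0  # Frames read so far
saved_count = 0  # Frames written to disk

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_count += 1
    if frame_count % frame_interval == 0:
        frame_filename = os.path.join(output_dir, f"frame_{saved_count:04d}.jpg")
        cv2.imwrite(frame_filename, frame)
        saved_count += 1
cap.release()

print(f"✓ {saved_count} frames extracted to '{output_dir}'")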

✓ 22 frames extracted to 'video_frames'
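
The sampling rule in the loop above can be checked without decoding a video. The sketch below reproduces the arithmetic in plain Python. `frame_interval` and `sampled_positions` are hypothetical helper names, and clamping the interval to a minimum of 1 is an added assumption that guards against a file reporting 0 FPS. Positions are 1-based counts of frames read, which matches the OpenCV convention that `CAP_PROP_POS_FRAMES` holds the index of the next frame to decode and therefore equals 1 after the first `read()`.

```python
def frame_interval(fps: float, seconds_between_frames: float = 1.0) -> int:
    """Number of decoded frames between two saved frames."""
    # int() truncates, so 29.97 FPS yields an interval of 29 frames
    interval = int(fps * seconds_between_frames)
    # Assumption: fall back to saving every frame if FPS is missing or zero
    return max(interval, 1)


def sampled_positions(total_frames: int, fps: float,
                      seconds_between_frames: float = 1.0) -> list:
    """1-based positions of the frames that the sampling rule would save."""
    step = frame_interval(fps, seconds_between_frames)
    return [pos for pos in range(1, total_frames + 1) if pos % step == 0]
```

For a 30 FPS clip with 90 frames, `sampled_positions(90, 30.0)` returns `[30, 60, 90]`: the first saved image is the 30th frame rather than the first, and a trailing partial second produces no image.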


## 4. Upload Extracted Frames to S3

Each extracted JPEG is uploaded to the same S3 bucket under the `batch_input/` key prefix, which S3 tools display as a folder. Once in S3, the frames are available to other AWS services for storage, distributed processing, or further machine learning tasks.

In [4]:
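# Re-create the client and bucket name so this cell can run on its own
s3 = boto3.client('s3')
bucket_name = 'sagemaker-us-east-1-301806113644'
s3_prefix = 'batch_input/'  # Key prefix ("folder") for the uploaded frames

# Upload every JPEG in the output directory, in filename order
for filename in sorted(os.listdir(output_dir)):
    if filename.endswith(".jpg"):
        local_path = os.path.join(output_dir, filename)
        s3.upload_file(local_path, bucket_name, s3_prefix + filename)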

print("✓ All frames uploaded to S3.")

✓ All frames uploaded to S3.
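
One way to make the upload step easier to inspect is to build the list of local paths and S3 keys first, then upload from that list. The sketch below is a hedged illustration, not the notebook's own method: `build_upload_plan` and `upload_frames` are hypothetical names, and `client` is assumed to be any object exposing boto3's `upload_file(Filename, Bucket, Key)` method.

```python
import os
import posixpath


def build_upload_plan(frame_dir, key_prefix, extension=".jpg"):
    """Return (local_path, s3_key) pairs for matching files, sorted by name."""
    plan = []
    for filename in sorted(os.listdir(frame_dir)):
        if filename.lower().endswith(extension):
            local_path = os.path.join(frame_dir, filename)
            # posixpath.join adds a '/' only if the prefix lacks a trailing one
            s3_key = posixpath.join(key_prefix, filename)
            plan.append((local_path, s3_key))
    return plan


def upload_frames(client, bucket, plan):
    """Upload every planned file and return how many were uploaded."""
    for local_path, s3_key in plan:
        # Assumption: client follows boto3's S3 client interface
        client.upload_file(local_path, bucket, s3_key)
    return len(plan)
```

Calling `upload_frames(s3, bucket_name, build_upload_plan(output_dir, s3_prefix))` would mirror the cell above, and the returned count could be reported in place of a fixed success message.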


## Conclusion

This notebook downloads a video from AWS S3, samples one frame per second with OpenCV, and uploads the resulting JPEG images back to S3 under a dedicated prefix. Because every step is scripted, the same preprocessing can be repeated on other videos to prepare data for batch inference, annotation, or training within a cloud-based MLOps workflow.