# **Video dataset visualization**

Copyright 2024, Denis Rothman

The video dataset was generated by Denis Rothman with the [OpenAI Sora diffusion transformer model on inVideo](https://ai.invideo.io/) specifically for a Video Production educational use case.

Read the [inVideo Terms and Conditions](https://invideo.io/terms-and-conditions/) for the usage of videos you may want to generate yourself.

The notebook process:

*   Installing the environment
*   Video downloading and displaying functions
*   Introduction video (with audio)
*   Dataset of AI-generated videos
*   Displaying the thumbnails and videos in the dataset

**Note:** To avoid possible memory overflows caused by displaying multiple videos, run the sections one at a time, in sequence.









# Installing the environment

## Importing modules and libraries




In [1]:
from IPython.display import HTML, display # to display videos and images
import base64 # to encode videos as base64
from base64 import b64encode # to encode videos as base64
import os # to interact with the operating system
import subprocess # to run commands
import time # to measure execution time
import csv # to save comments
import uuid # to generate unique ids
import cv2 # to split videos
from PIL import Image # to display videos
import pandas as pd # to display comments
import numpy as np # to use Numerical Python
from io import BytesIO #for a binary stream of data in memory

## GitHub

In [2]:
def download(directory, filename):
    # The base URL of the video files in the GitHub repository
    base_url = 'https://raw.githubusercontent.com/Denis2054/RAG-Driven-Generative-AI/main/'

    # Complete URL for the file
    file_url = f"{base_url}{directory}/{filename}"

    # Use curl to download the file
    try:
        # Read an optional token from the environment instead of hard-coding it
        # (GITHUB_TOKEN is an assumed variable name; no token is needed for a public repository)
        token = os.environ.get("GITHUB_TOKEN")
        auth_header = ['-H', f'Authorization: token {token}'] if token else []

        # -f makes curl exit with an error on HTTP failures such as 404,
        # so that check=True can detect a failed download
        curl_command = ['curl', '-f', *auth_header, '-o', filename, file_url]

        # Execute the curl command without a shell to avoid quoting issues
        subprocess.run(curl_command, check=True)
        print(f"Downloaded '{filename}' successfully.")
    except subprocess.CalledProcessError:
        print(f"Failed to download '{filename}'. Check the URL, your internet connection and the file path")
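The `download` function assembles the file URL by plain string concatenation, which works for the simple file names used in this notebook. As a minimal sketch, assuming a hypothetical helper named `build_raw_url` (not part of the original notebook), the path components could be percent-encoded first so that names containing spaces or special characters still yield a valid URL:

```python
from urllib.parse import quote

# Same base URL as in download()
BASE_URL = 'https://raw.githubusercontent.com/Denis2054/RAG-Driven-Generative-AI/main/'

def build_raw_url(directory, filename, base_url=BASE_URL):
    # Hypothetical helper: percent-encode the path components.
    # "/" is kept as a separator in the directory but encoded in the file name.
    return f"{base_url}{quote(directory)}/{quote(filename, safe='')}"

print(build_raw_url("Chapter10/videos", "jogging1.mp4"))
```

For the file names in this notebook, the result is identical to the URL built in `download`.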

# Video downloading and displaying functions

In [3]:
# downloading file from GitHub
def download_video(file_name):
  # Directory of the videos in the repository
  directory = "Chapter10/videos"
  download(directory, file_name)

In [4]:
def display_video(file_name):
  # Open the file in binary mode
  with open(file_name, 'rb') as file:
      video_data = file.read()

  # Encode the video file as base64
  video_url = b64encode(video_data).decode()

  # Create an HTML string with the embedded video
  html = f'''
  <video width="640" height="480" controls>
    <source src="data:video/mp4;base64,{video_url}" type="video/mp4">
  Your browser does not support the video tag.
  </video>
  '''
  # Return the HTML object; the notebook renders it when it is the last expression in a cell
  return HTML(html)
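Embedding a video as base64 in the HTML string makes the displayed payload larger than the file on disk, which is one reason multiple video displays can strain memory. A minimal sketch, assuming a hypothetical helper named `base64_size` (not part of the original notebook), estimates the encoded size from the raw byte count:

```python
import base64
import math

def base64_size(n_bytes):
    # Base64 emits 4 output characters for every 3 input bytes,
    # padding the last group so the length is a multiple of 4
    return 4 * math.ceil(n_bytes / 3)

# 1000 zero bytes standing in for video data (illustrative only)
sample = bytes(1000)
print(len(sample), "raw bytes ->", base64_size(len(sample)), "base64 characters")
print(len(base64.b64encode(sample)) == base64_size(len(sample)))
```

The encoded string is roughly one third larger than the original file, and the notebook keeps it in the cell output in addition to the bytes read from disk.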

In [5]:
def display_video_frame(file_name, frame_number, size):
    # Open the video file
    cap = cv2.VideoCapture(file_name)

    # Move to the frame_number
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)

    # Read the frame, then release the capture to free the file handle
    success, frame = cap.read()
    cap.release()
    if not success:
        return "Failed to grab frame"

    # Convert the color from BGR to RGB
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Convert to PIL image and resize
    img = Image.fromarray(frame)
    img = img.resize(size, Image.LANCZOS)  # Resize image to specified size

    # Convert the PIL image to a base64 string to embed in HTML
    buffered = BytesIO()
    img.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue()).decode()

    # Create an HTML string with the embedded image
    html_str = f'''
    <img src="data:image/jpeg;base64,{img_str}" width="{size[0]}" height="{size[1]}">
    '''
    # Display the image
    display(HTML(html_str))
    # Return the HTML object for further use if needed
    return HTML(html_str)
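`display_video_frame` resizes the frame to exactly the `size` it receives, so a thumbnail size with a different aspect ratio than the video will stretch the image. A minimal sketch, assuming a hypothetical helper named `fit_size` and illustrative frame dimensions (neither is part of the original notebook), computes the largest size that fits a bounding box while preserving the aspect ratio:

```python
def fit_size(src_width, src_height, max_width, max_height):
    # Largest scale factor that keeps the frame inside the bounding box
    scale = min(max_width / src_width, max_height / src_height)
    # Round to whole pixels and never go below 1 pixel
    return max(1, round(src_width * scale)), max(1, round(src_height * scale))

# 1920x1080 is an assumed frame size used only for illustration
print(fit_size(1920, 1080, 135, 90))
```

In the notebook, the actual frame dimensions could be read from the capture with `cap.get(cv2.CAP_PROP_FRAME_WIDTH)` and `cap.get(cv2.CAP_PROP_FRAME_HEIGHT)` before calling such a helper.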


# Introduction video (with audio)

In [15]:
# select file
print("Collecting video")
file_name = "AI_Professor_Introduces_New_Course.mp4" # Enter the name of the video file to process here
print(f"Video: {file_name}")

# Downloading video
print("Downloading video: downloading from GitHub")
download_video(file_name)

Collecting video
Video: AI_Professor_Introduces_New_Course.mp4
Downloading video: downloading from GitHub
Downloaded 'AI_Professor_Introduces_New_Course.mp4' successfully.


Display the thumbnail of the AI-generated introduction video

In [16]:
print("Displaying a frame of video: ",file_name)
frame_number=5
display_video_frame(file_name, frame_number, size=(135, 90));

Displaying a frame of video:  AI_Professor_Introduces_New_Course.mp4


Display the introduction video

Uncomment the following code to display the video

In [8]:
#print("Displaying video: ",file_name)
#display_video(file_name)

# Dataset of AI-generated videos



In [9]:
lfiles = [
    "jogging1.mp4",
    "jogging2.mp4",
    "skiing1.mp4",
    "soccer_pass.mp4",
    "soccer_player_head.mp4",
    "soccer_player_running.mp4",
    "surfer1.mp4",
    "surfer2.mp4",
    "swimming1.mp4",
    "walking1.mp4",
    "alpinist1.mp4",
    "ball_passing_goal.mp4",
    "basketball1.mp4",
    "basketball2.mp4",
    "basketball3.mp4",
    "basketball4.mp4",
    "basketball5.mp4",
    "female_player_after_scoring.mp4",
    "football1.mp4",
    "football2.mp4",
    "hockey1.mp4"
]
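Because each entry in the list is later passed to `download_video` as the exact file name, an entry without its extension would fail to download. A minimal sketch, assuming a hypothetical helper named `missing_extension` and an illustrative sample list (neither is part of the original notebook), flags such entries before any download starts:

```python
import os

def missing_extension(file_names, extension=".mp4"):
    # Hypothetical helper: return the entries whose extension
    # differs from the expected one (case-insensitive)
    return [name for name in file_names
            if os.path.splitext(name)[1].lower() != extension]

# Illustrative sample, not the notebook's dataset
sample_files = ["jogging1.mp4", "skiing1.mp4", "clip_without_extension"]
print(missing_extension(sample_files))
```

Running the same check on `lfiles` should return an empty list when every entry is well formed.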

# Displaying the videos and thumbnails in the dataset

## Collecting the video dataset

In [10]:
for file_name in lfiles:
  print("Collecting video",file_name)
  print("Downloading video",file_name)
  download_video(file_name)

Collecting video jogging1.mp4
Downloading video jogging1.mp4
Downloaded 'jogging1.mp4' successfully.
Collecting video jogging2.mp4
Downloading video jogging2.mp4
Downloaded 'jogging2.mp4' successfully.
Collecting video skiing1.mp4
Downloading video skiing1.mp4
Downloaded 'skiing1.mp4' successfully.
Collecting video soccer_pass.mp4
Downloading video soccer_pass.mp4
Downloaded 'soccer_pass.mp4' successfully.
Collecting video soccer_player_head.mp4
Downloading video soccer_player_head.mp4
Downloaded 'soccer_player_head.mp4' successfully.
Collecting video soccer_player_running.mp4
Downloading video soccer_player_running.mp4
Downloaded 'soccer_player_running.mp4' successfully.
Collecting video surfer1.mp4
Downloading video surfer1.mp4
Downloaded 'surfer1.mp4' successfully.
Collecting video surfer2.mp4
Downloading video surfer2.mp4
Downloaded 'surfer2.mp4' successfully.
Collecting video swimming1.mp4
Downloading video swimming1.mp4
Downloaded 'swimming1.mp4' successfully.
Collecting video wa

## Thumbnails of the videos

In [None]:
for file_name in lfiles:
  print("Displaying a frame of video: ",file_name)
  display_video_frame(file_name, frame_number=5, size=(100, 110))

## Displaying a video

1. Select a video from the list of downloaded videos

In [12]:
file_name="football1.mp4" # Enter the name of the video file to process here

2. Uncomment the following code to display the video


In [13]:
#print("Displaying video: ",file_name)
#display_video(file_name)