# Youtube Video to Deduplicated Screenshots


| Project |GitHub| Colab |
|:--|:-:|:-:|
| ⭐ **Youtube Video to Deduplicated Screenshots** | [![GitHub](https://img.shields.io/badge/GitHub-Visit-brightgreen.svg)](https://github.com/citronlegacy/Video-to-Screenshots/blob/main/Youtube_Video_to_Deduplicated_Screenshots.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/citronlegacy/Video-to-Screenshots/blob/main/Youtube_Video_to_Deduplicated_Screenshots.ipynb) |
| 🎬 **Video To Screenshots** | [![GitHub](https://img.shields.io/badge/GitHub-Visit-brightgreen.svg)](https://github.com/citronlegacy/Video-to-Screenshots/blob/main/Video-to-Screenshots.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/citronlegacy/Video-to-Screenshots/blob/main/Video-to-Screenshots.ipynb) |
| 🎬 **Youtube Video to Screenshots** | [![GitHub](https://img.shields.io/badge/GitHub-Visit-brightgreen.svg)](https://github.com/citronlegacy/Video-to-Screenshots/blob/main/Youtube-Video-to-Screenshots.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/citronlegacy/Video-to-Screenshots/blob/main/Youtube-Video-to-Screenshots.ipynb) |
| 🔄 **Duplicate Image Deleter** | [![GitHub](https://img.shields.io/badge/GitHub-Visit-brightgreen.svg)](https://github.com/citronlegacy/Video-to-Screenshots/blob/main/Duplicate_Image_Deleter.ipynb) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/citronlegacy/Video-to-Screenshots/blob/main/Duplicate_Image_Deleter.ipynb) |



### Project Description

This Google Colab notebook automates the following steps:
1. Download a YouTube video
2. Take screenshots of the video at a given frame interval
3. Automatically remove duplicate screenshots
4. Zip the output and create a log file

For most use cases you can fill in and run the Step 1 cell, then run the Step 2 and Step 3 cells without entering anything.
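
To get a feel for how many screenshots a run will produce, the arithmetic can be sketched as below. This is only a rough estimate, assuming the interval is measured in frames (as the `frame_interval` parameter in Step 2 suggests) and a constant frame rate; the function name `estimate_screenshot_count` is hypothetical and not part of the notebook.

```python
import math


def estimate_screenshot_count(fps: float, duration_seconds: float, frame_interval: int) -> int:
    """Estimate how many frames are saved when every `frame_interval`-th frame is kept."""
    if frame_interval < 1:
        raise ValueError("frame_interval must be at least 1")
    total_frames = int(round(fps * duration_seconds))
    # Frames 0, frame_interval, 2 * frame_interval, ... are kept
    return math.ceil(total_frames / frame_interval)


# A 1-minute clip at 30 FPS, sampled at a few different intervals
for interval in (1, 10, 30):
    print(interval, estimate_screenshot_count(30, 60, interval))
```

Real videos with a variable frame rate may produce a slightly different count.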

---

### Libraries Used
- Some code is copied from [LoRA-Dataset-Automaker](https://github.com/Maximax67/LoRA-Dataset-Automaker).
- **FFmpeg:** A multimedia framework for handling audio, video, and other multimedia files.
- **tqdm:** A library for displaying progress bars in Python.
- **subprocess:** A module to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
- **shlex:** A module for parsing strings into tokens, especially useful when dealing with command-line-like syntax.
- **os:** A module for interacting with the operating system, providing functionality to manage directories and files.
- **ipywidgets:** A library for creating interactive widgets in Jupyter notebooks.
- **pytube:** A library for downloading YouTube videos.
- **zipfile:** A module to work with zip archives in Python.
- **pillow:** A fork of the Python Imaging Library (PIL) for processing and manipulating images.
- **fiftyone:** A library for exploring, analyzing, and visualizing datasets for computer vision tasks.
- **toml:** A library for parsing and creating TOML (Tom's Obvious Minimal Language) configuration files.
- **datetime:** A module for working with dates and times in Python.
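
As a small illustration of why `shlex` is paired with `subprocess` here, the sketch below tokenizes a command-line string the way a POSIX shell would, so a quoted path containing a space stays a single argument. The video path is a hypothetical example, and the command is only split, never executed.

```python
import shlex

# Hypothetical path containing a space
video_path = "/content/video2screens/my video.mp4"

# Quoting the path keeps it as one token after splitting
command = f'ffprobe -v error -select_streams v:0 "{video_path}"'
tokens = shlex.split(command)
print(tokens)

# shlex.quote escapes an arbitrary string so it survives a later shlex.split
quoted = shlex.quote(video_path)
round_trip = shlex.split(f"ffprobe {quoted}")
print(round_trip)
```

The resulting token list can be passed directly to `subprocess.run` or `subprocess.Popen` without `shell=True`.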




### Project Disclaimer

This Colab notebook is provided for educational and informational purposes only. The content and code within this notebook are not intended for production use, and any actions taken based on the provided information are at your own risk.
When using the `pytube` library to download videos from YouTube, please be aware of YouTube's terms of service. Unauthorized downloading of videos may violate YouTube's terms.

---

In [None]:
#@title Install requirements and connect to Google Drive
#@markdown Installation usually takes about 1 minute
import os
import time
from google.colab import drive
if not os.path.exists('/content/drive'):
    print("📂 Connecting to Google Drive...")
    drive.mount('/content/drive')

from IPython import get_ipython
from IPython.display import display, Markdown
from google.colab.output import clear as clear_output
!apt-get install -y ffmpeg
import subprocess
import shlex
import re
!pip install tqdm
!pip install fiftyone
!pip install pytube
from tqdm import tqdm
from ipywidgets import widgets
from pytube import YouTube
import zipfile
import fiftyone as fo
import fiftyone.zoo as foz
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import torch
from fiftyone import ViewField as VF
import shutil

def countNumberOfFilesInFolder(folder):
    count = 0
    for f in os.listdir(folder):
        if (os.path.isfile(os.path.join(folder, f))):
            count += 1
    return count

def delete_dir(directory):
    try:
        shutil.rmtree(directory)
        print(f"Directory '{directory}' deleted successfully.")
    except Exception as e:
        print(f"Error deleting directory '{directory}': {e}")

def get_file_size(file_path):
    try:
        size_bytes = os.path.getsize(file_path)
        size_kilobytes = size_bytes / 1024.0
        size_megabytes = size_kilobytes / 1024.0
        size_gigabytes = size_megabytes / 1024.0

        print(f"File Size: {size_kilobytes:.2f} KB | {size_megabytes:.2f} MB | {size_gigabytes:.2f} GB")
        return size_bytes
    except Exception as e:
        print(f"Error: {e}")
        return None

def zip_folder(folder_path, zip_path):
    try:
        with zipfile.ZipFile(f"{zip_path}.zip", 'w', zipfile.ZIP_DEFLATED) as zipf:
            total_files = sum([len(files) for root, dirs, files in os.walk(folder_path)])

            with tqdm(total=total_files, desc="Zipping", unit="file") as pbar:
                for foldername, subfolders, filenames in os.walk(folder_path):
                    for filename in filenames:
                        file_path = os.path.join(foldername, filename)
                        arcname = os.path.relpath(file_path, folder_path)
                        zipf.write(file_path, arcname=arcname)
                        pbar.update(1)

        print(f"\nFolder '{folder_path}' \nSuccessfully zipped to '{zip_path}.zip'")
        zipfileReference = zip_path + ".zip"
        get_file_size(zipfileReference)
    except Exception as e:
        print(f"Error: {e}")


def clean_string(input_string):
    # Remove special characters
    cleaned_string = re.sub(r'[^\w\s]', '', input_string)

    # Replace whitespaces with underscores
    cleaned_string = cleaned_string.replace(' ', '_')

    return cleaned_string

start_time = time.time()

# Make sure fiftyone is installed (a no-op if the install above succeeded)
!pip install fiftyone

clear_output()

elapsed_time = time.time() - start_time
print(f"Libraries installed in {elapsed_time:.2f} seconds.")
print("Install successful!")
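
The `clean_string` helper defined above is later used to turn a video file name into a default output folder name. The sketch below repeats the helper so it runs on its own and applies it to a hypothetical file name; note that the dot before the extension is removed along with the other punctuation.

```python
import re


def clean_string(input_string):
    # Remove every character that is not a word character or whitespace
    cleaned_string = re.sub(r'[^\w\s]', '', input_string)
    # Replace spaces with underscores
    return cleaned_string.replace(' ', '_')


# Hypothetical file name, used only for illustration
video_file_name = "My Video: Part 1!.mp4"
default_output_folder = clean_string(f"{video_file_name}_output")
print(default_output_folder)
```

Because `\w` also matches non-ASCII letters in Python 3, titles in other scripts keep their letters and lose only punctuation.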


In [None]:
#@title Step 1. Get a Youtube Video
#@markdown Define your folder in Colab or Google Drive

# Set variables
input_storage_location = "Video file in Colab (/content/)" #@param ["Video file in Colab (/content/)", "Video file in Google Drive (/content/drive/MyDrive/)"]
project_name = "video2screens" #@param {type:"string"}

# Set working folder
if "Video file in Google Drive" in input_storage_location:
    working_folder = os.path.join("/content/drive/MyDrive/", project_name)
else:
    working_folder = os.path.join("/content", project_name)

# Check if the working_folder exists, create if not
if not os.path.exists(working_folder):
    os.makedirs(working_folder)

# YouTube video URL
youtube_video_url = "https://www.youtube.com/watch?v=XE8fGjPx6MI"  #@param {type:"string"}

# Download the video
print("Downloading video. Please wait...")
try:
    yt = YouTube(youtube_video_url)
    video_stream = yt.streams.get_highest_resolution()
    # download() returns the path of the saved file; use it rather than yt.title,
    # because pytube sanitizes the title when it builds the file name
    downloaded_path = video_stream.download(output_path=working_folder)
    print(f"Video downloaded successfully to: {downloaded_path}")
    video_file_name = os.path.basename(downloaded_path)
    print(f"Video File name: {video_file_name}")

except Exception as e:
    print(f"Error downloading the video: {e}")


In [None]:
#@title Step 2. Get Screenshots for a Video

# Define a function to check if a directory exists
def check_directory_exists(directory):
    return os.path.exists(directory)

# Set the working_folder based on input_storage_location
if "Video file in Google Drive" in input_storage_location:
    working_folder = os.path.join("/content/drive/MyDrive/", project_name)
else:
    working_folder = os.path.join("/content", project_name)

# Check if the working_folder exists
if not check_directory_exists(working_folder):
    print(f"The directory '{working_folder}' does not exist.")


print(f"There are {countNumberOfFilesInFolder(working_folder)} files in {working_folder}")
output_storage_location = "Store in colab (/content/)" #@param ["Store in colab (/content/)", "Store in Google Drive (/content/drive/MyDrive/)"]
#@markdown If not provided, the code will attempt to use the `<video_file_name>_output` as the output folder
screenshots_output_folder = "" #@param {type:"string"}
#@markdown If not provided, the code will attempt to use the `video_file_name` from a previous cell.
user_specified_video_file_name = "" #@param {type:"string"}
if user_specified_video_file_name:
    video_file_name = user_specified_video_file_name
print(f"Video file name is set to {video_file_name}")

default_output_folder = f"{video_file_name}_output"
cleaned_default_output_folder = clean_string(default_output_folder)

screenshots_output_folder = screenshots_output_folder or cleaned_default_output_folder
#@markdown Adjust how often you want to screenshot frames. (Example: 30 FPS for 1 minute is 1800 screenshots)
frame_interval = 10 #@param {type:"integer"}
#@markdown NOTE: Progress bar is not accurate if you adjust the frame_interval

#@markdown Check this box if you want to delete the output folder before creating new output
delete_output_flag = True #@param {type:"boolean"}

#@markdown Automatic Zip Output Options
zip_output = False #@param {type:"boolean"}

# Check if video_file_name is empty
assert video_file_name, "Error: video_file_name is empty. Please provide a valid file name."


def delete_output_directory(output_directory):
    if os.path.exists(output_directory):
        subprocess.run(['rm', '-r', output_directory])
        print(f"Output directory '{output_directory}' deleted.")

def run_ffmpeg_command(input_video, screenshots_output_folder, frame_interval):
    # Create the output directory if it doesn't exist
    subprocess.run(['mkdir', '-p', screenshots_output_folder])

    # Get total number of frames in the video
    ffprobe_command = f'ffprobe -v error -select_streams v:0 -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 "{input_video}"'
    total_frames = int(subprocess.check_output(shlex.split(ffprobe_command)).decode('utf-8').strip())

    # FFmpeg command to extract every Nth frame; the comma inside mod() is escaped for the filter parser
    ffmpeg_command = [
        'ffmpeg', '-i', input_video,
        '-vf', f'select=not(mod(n\\,{frame_interval})),setpts=N/FRAME_RATE/TB',
        '-vsync', 'vfr',
        os.path.join(screenshots_output_folder, 'output_frames_%04d.png'),
    ]

    # Progress is parsed from stderr; stdout is discarded so an unread pipe cannot block FFmpeg
    process = subprocess.Popen(ffmpeg_command, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True, bufsize=1)

    # Parse progress information
    duration_pattern = re.compile(r"Duration: (\d+:\d+:\d+\.\d+),")
    time_pattern = re.compile(r"time=(\d+:\d+:\d+\.\d+)")
    total_duration = None

    with tqdm(total=total_frames, unit="frame", unit_scale=True, desc="Processing") as pbar:
        for line in process.stderr:
            duration_match = duration_pattern.search(line)
            time_match = time_pattern.search(line)

            if duration_match:
                total_duration = duration_match.group(1)

            if time_match and total_duration:
                current_time = time_match.group(1)
                progress_percentage = (time_to_seconds(current_time) / time_to_seconds(total_duration)) * 100
                frames_processed = int(progress_percentage * total_frames / 100)
                pbar.update(frames_processed - pbar.n)

    # Wait for the process to finish
    process.wait()

    # Check for errors
    if process.returncode != 0:
        print(f"\nError: FFmpeg process failed with return code {process.returncode}")
    else:
        print(f"\nFrames extracted successfully. Output directory: {screenshots_output_folder}")

def time_to_seconds(time_str):
    h, m, s = map(float, time_str.split(':'))
    return h * 3600 + m * 60 + s


input_video_path = os.path.join(working_folder, video_file_name)
# Default to the working folder; overridden below based on output_storage_location
output_frames_directory = os.path.join(working_folder, screenshots_output_folder)

if output_storage_location == "Store in colab (/content/)":
    # Code for storing in colab session
    storage_path = "/content/"
    output_frames_directory = os.path.join(storage_path, project_name, screenshots_output_folder)
    print("Storing in colab session.")

elif output_storage_location == "Store in Google Drive (/content/drive/MyDrive/)":
    # Code for storing in Google Drive
    storage_path = "/content/drive/MyDrive/"
    output_frames_directory = os.path.join(storage_path, project_name, screenshots_output_folder)
    print("Storing in Google Drive.")


if (delete_output_flag):
  print("delete_output_flag is true")
  delete_output_directory(output_frames_directory)

run_ffmpeg_command(input_video_path, output_frames_directory, frame_interval)

print(f"There are {countNumberOfFilesInFolder(output_frames_directory)} images in the output directory")



if (zip_output):
  print("_" * 50) #Print Horizontal bar
  zip_output_storage_location = "Store in colab (/content/)" #@param ["Store in colab (/content/)", "Store in Google Drive (/content/drive/MyDrive/)"]

  if zip_output_storage_location == "Store in colab (/content/)":
      # Code for storing in colab session
      storage_path = "/content/"
      output_zip_directory = os.path.join(storage_path, screenshots_output_folder)
      print(f"Zipping output in Google Colab: {output_zip_directory}")

  elif zip_output_storage_location == "Store in Google Drive (/content/drive/MyDrive/)":
      # Code for storing in Google Drive
      storage_path = "/content/drive/MyDrive/"
      output_zip_directory = os.path.join(storage_path, project_name, screenshots_output_folder)
      print(f"Zipping output in Google Drive: {output_zip_directory}")

  folder_to_zip = output_frames_directory
  zip_output_path = output_zip_directory

  zip_folder(folder_to_zip, zip_output_path)
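
The progress bar in Step 2 is driven by two patterns matched against FFmpeg's status output: the total `Duration:` and the running `time=` value. A minimal standalone sketch of that calculation follows. The sample lines are made up for illustration and only imitate the general shape of FFmpeg's output, and `progress_fraction` is a hypothetical helper name.

```python
import re

DURATION_PATTERN = re.compile(r"Duration: (\d+:\d+:\d+\.\d+),")
TIME_PATTERN = re.compile(r"time=(\d+:\d+:\d+\.\d+)")


def time_to_seconds(time_str):
    # "HH:MM:SS.ss" -> seconds as a float
    h, m, s = map(float, time_str.split(':'))
    return h * 3600 + m * 60 + s


def progress_fraction(lines):
    """Return the last reported position divided by the total duration (0.0 to 1.0)."""
    total = None
    fraction = 0.0
    for line in lines:
        duration_match = DURATION_PATTERN.search(line)
        if duration_match:
            total = time_to_seconds(duration_match.group(1))
        time_match = TIME_PATTERN.search(line)
        if time_match and total:
            fraction = min(time_to_seconds(time_match.group(1)) / total, 1.0)
    return fraction


# Illustrative lines only, not captured from a real run
sample_lines = [
    "  Duration: 00:01:00.00, start: 0.000000, bitrate: 1205 kb/s",
    "frame=   45 fps=0.0 q=-0.0 size=N/A time=00:00:15.00 bitrate=N/A speed=29x",
    "frame=   90 fps= 88 q=-0.0 size=N/A time=00:00:30.00 bitrate=N/A speed=29x",
]
print(progress_fraction(sample_lines))
```

Multiplying the fraction by the total frame count gives the value the progress bar is advanced to.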


In [None]:
#@title Step 3. Detect and Delete Duplicates in Directory
#@markdown ### Note: this cell works best with a GPU runtime


#@markdown Choose Google Drive or Google Colab based on where your images folder is
image_directory_location = "Directory in Colab (/content/)" #@param ["Directory in Colab (/content/)", "Directory in Google Drive (/content/drive/MyDrive/)"]
#@markdown This is the folder where your images folder is located and where the filtered output folder will be created

#@markdown If not provided, the code will attempt to use the `project_name` from a previous cell.
user_specified_project_name = "" #@param {type:"string"}
if user_specified_project_name:
    project_name = user_specified_project_name

#@markdown This is the folder with your images

#@markdown If not provided, the code will attempt to use the `screenshots_output_folder` from a previous cell.
user_specified_image_directory_name = "" #@param {type:"string"}
if user_specified_image_directory_name:
    screenshots_output_folder = user_specified_image_directory_name


filtered_directory_name = screenshots_output_folder + "_filtered"

# Set working folder
if "Google Drive" in image_directory_location:
    directory_location = os.path.join("/content/drive/MyDrive/", project_name)
else:
    directory_location = os.path.join("/content", project_name)

working_folder = os.path.join(directory_location, screenshots_output_folder)
#working_folder = directory_location


assert os.path.isdir(working_folder), f"Error: {working_folder} does not exist"
filtered_dir = os.path.join(directory_location, filtered_directory_name)


#@markdown Images whose similarity exceeds this threshold are marked for deletion. A value between 0.96 and 0.99 is recommended, depending on your needs:
similarity_threshold = 0.98 # @param {type:"number"}

#@markdown Batch sizes. If you are unsure what these do, leave the defaults:
embedding_batch_size = 200 # @param {type:"integer"}
similarity_matrix_batch_size = 1000 # @param {type:"integer"}

#@markdown CLIP model name. You can choose another model from the FiftyOne model zoo if you want; enter its name here.
model_name = "clip-vit-base32-torch" # @param {type:"string"}

print ("Detecting Duplicates!")
print (f"There are {countNumberOfFilesInFolder(working_folder)} images in the working folder")
dataset = fo.Dataset.from_dir(working_folder, dataset_type=fo.types.ImageDirectory)

# @markdown This cell will load the images, make embeddings using the selected model, calculate the similarity matrix and find samples to remove.

def make_embeddings(model_name, batch_size):
    model = foz.load_zoo_model(model_name)
    embeddings = dataset.compute_embeddings(model, batch_size=batch_size)

    # Unload the model from the GPU to free up memory
    del model
    torch.cuda.empty_cache()

    return embeddings

def calculate_similarity_matrix(embeddings, batch_size):
    # batch_size is kept so existing calls keep working; the full pairwise
    # cosine-similarity matrix is computed in a single call
    similarity_matrix = cosine_similarity(embeddings)
    # Zero the diagonal so an image is never reported as a duplicate of itself
    similarity_matrix -= np.identity(len(similarity_matrix))

    return similarity_matrix

def make_samples(dataset, similarity_matrix, threshold=0.98):
    dataset.match(VF("max_similarity") > threshold)
    dataset.tags = ["delete", "has_duplicates"]
    id_map = [s.id for s in dataset.select_fields(["id"])]
    samples_to_remove = set()
    samples_to_keep = set()
    for idx, sample in enumerate(dataset):
      if sample.id not in samples_to_remove:
        # Keep the first instance of two duplicates
        samples_to_keep.add(sample.id)

        dup_idxs = np.where(similarity_matrix[idx] > threshold)[0]
        for dup in dup_idxs:
            # We kept the first instance so remove all other duplicates
            samples_to_remove.add(id_map[dup])
        if len(dup_idxs) > 0:
            sample.tags.append("has_duplicates")
            sample.save()
      else:
        sample.tags.append("delete")
        sample.save()

    return samples_to_remove, samples_to_keep


embeddings = make_embeddings(model_name, embedding_batch_size)

clear_output()
print("Embeddings calculated!")

similarity_matrix = calculate_similarity_matrix(embeddings, similarity_matrix_batch_size)
print("Similarity matrix calculated!")

samples_to_remove, samples_to_keep = make_samples(dataset, similarity_matrix, similarity_threshold)
print(f"Remove percentage: {len(samples_to_remove) / (len(samples_to_remove) + len(samples_to_keep)) * 100}")

del embeddings, similarity_matrix, samples_to_remove, samples_to_keep
torch.cuda.empty_cache()

session = None
print("Detection Done!")
print("_" *50)
print("Deleting Duplicates...")

# @markdown Delete all images marked as "delete".

# @markdown If you want to delete previously filtered images in the folder (if you ran this cell before), select this:
delete_previous_filtered_images = True # @param {type:"boolean"}
zip_filtered_output = True # @param {type:"boolean"}

os.makedirs(filtered_dir, exist_ok=True)

if (delete_previous_filtered_images):
    delete_dir(filtered_dir)

kys = [s for s in dataset if "delete" in s.tags]
dataset.delete_samples(kys)
n_filtered = len(dataset)
dataset.export(export_dir=filtered_dir, dataset_type=fo.types.ImageDirectory)

if session is not None:
    session.refresh()
    fo.close_app()

#clear_output()


print(f"Done! Filtered directory is {filtered_dir}")

print("\nSummary:")
print("+-------------------------------------------------+")
print("| Total Images in Screenshots Folder              |")
print("|-------------------------------------------------|")
print(f"| {countNumberOfFilesInFolder(working_folder):<47} |")
print("+-------------------------------------------------+")
print("| Duplicates Removed                              |")
print("|-------------------------------------------------|")
print(f"| {len(kys):<47} |")
print("+-------------------------------------------------+")
print("| Total Images in Filtered Screenshots Folder     |")
print("|-------------------------------------------------|")
print(f"| {countNumberOfFilesInFolder(filtered_dir):<47} |")
print("+-------------------------------------------------+")

if (zip_filtered_output):
  print("_" * 50) #Print Horizontal bar
  zip_filtered_output_storage_location = "Store in Google Drive (/content/drive/MyDrive/)" #@param ["Store in colab (/content/)", "Store in Google Drive (/content/drive/MyDrive/)"]

  if zip_filtered_output_storage_location == "Store in colab (/content/)":
      # Code for storing in colab session
      storage_path = "/content/"
      output_zip_directory = os.path.join(storage_path, project_name)
      print(f"Zipping output in Google Colab: {output_zip_directory}")

  elif zip_filtered_output_storage_location == "Store in Google Drive (/content/drive/MyDrive/)":
      # Code for storing in Google Drive
      storage_path = "/content/drive/MyDrive/"
      output_zip_directory = os.path.join(storage_path, project_name)
      print(f"Zipping output in Google Drive: {output_zip_directory}")

  folder_to_zip = filtered_dir
  zip_output_path = os.path.join(output_zip_directory, filtered_directory_name)

  zip_folder(folder_to_zip, zip_output_path)
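
The keep-first rule used in Step 3 can be illustrated without FiftyOne or a CLIP model. The sketch below is a simplified NumPy-only version under stated assumptions: the embeddings are tiny hand-written vectors rather than real model output, and the function names `cosine_similarity_matrix` and `greedy_deduplicate` are hypothetical. It mirrors the idea of the cell above (keep the first image of each near-duplicate group, mark the rest for removal) rather than reproducing its exact code.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity with the diagonal zeroed out."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    # An image should never count as a duplicate of itself
    np.fill_diagonal(similarity, 0.0)
    return similarity


def greedy_deduplicate(similarity, threshold=0.98):
    """Keep the first item of each near-duplicate group; return (keep, remove) index lists."""
    to_remove = set()
    to_keep = []
    for idx in range(similarity.shape[0]):
        if idx in to_remove:
            continue
        to_keep.append(idx)
        duplicates = np.where(similarity[idx] > threshold)[0]
        to_remove.update(int(d) for d in duplicates)
    return to_keep, sorted(to_remove)


# Toy embeddings: rows 0 and 1 point in almost the same direction, row 2 is different
toy_embeddings = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
keep, remove = greedy_deduplicate(cosine_similarity_matrix(toy_embeddings), threshold=0.98)
print("keep:", keep, "remove:", remove)
```

Lowering the threshold removes more images, which is why the notebook suggests staying in the 0.96 to 0.99 range.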


In [None]:
#@title Utility: Create Log

import os
import toml
from datetime import datetime

log_storage_location = "Store in Google Drive (/content/drive/MyDrive/)" #@param ["Store in colab (/content/)", "Store in Google Drive (/content/drive/MyDrive/)"]

if log_storage_location == "Store in colab (/content/)":
    # Code for storing in colab session
    storage_path = "/content/"
    output_log_directory = os.path.join(storage_path, project_name)

elif log_storage_location == "Store in Google Drive (/content/drive/MyDrive/)":
    # Code for storing in Google Drive
    storage_path = "/content/drive/MyDrive/"
    output_log_directory = os.path.join(storage_path, project_name)

# Get the current date and time
current_datetime = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

# Extract the video name from the video_file_name
video_name = os.path.splitext(video_file_name)[0] if video_file_name else "null_or_empty_video_name"
video_name = clean_string(video_name)
# Count the number of files before and after filtering
num_images_before_filtering = countNumberOfFilesInFolder(working_folder)
num_images_after_filtering = countNumberOfFilesInFolder(filtered_dir)

# Create a dictionary with the log information
log_data = {
    "ProjectName": video_name or "null or empty",
    "DateTime": current_datetime,
    "YoutubeURL": youtube_video_url or "null or empty",
    "FrameInterval": frame_interval or "null or empty",
    "NumImagesBeforeFiltering": num_images_before_filtering,
    "DuplicateImagesDeleted": len(kys),
    "NumImagesAfterFiltering": num_images_after_filtering
}

# Convert the dictionary to TOML format
log_toml = toml.dumps(log_data)

# Create the log file name
log_file_name = f"{video_name}_{current_datetime}.toml"

# Save the log to a file in the project directory, creating the directory if needed
os.makedirs(output_log_directory, exist_ok=True)
log_file_path = os.path.join(output_log_directory, log_file_name)
with open(log_file_path, "w") as log_file:
    log_file.write(log_toml)

# Print a message indicating the log file location
print(f"Log file created at: {log_file_path}")


In [None]:
#@title Utility: Search for a Project
#@markdown Use this Cell to find the full path to the folder you want to work on.

#@markdown Note: This is slow for the first run but will be fast on subsequent runs


#@markdown Select location where your project is
directory_location = "Directory in Google Drive (/content/drive/MyDrive/)" #@param ["Directory in Colab (/content/)", "Directory in Google Drive (/content/drive/MyDrive/)"]
#@markdown Specify a more detailed path (optional; if you don't provide one, the entire Google Drive or Colab instance will be searched)
extra_path = "" #@param {type:"string"}
#@markdown Text to search for in file and folder names
name_to_search_for = "a" #@param {type:"string"}


def search_directory(root_directory, search_string):
    matching_files = []
    print(f"Searching in directory: '{root_directory}' for something containing: '{search_string}'")
    # Get the total number of files/folders to process
    total_items = sum([len(files) + len(subfolders) for _, subfolders, files in os.walk(root_directory)])

    # Set up the tqdm progress bar
    progress_bar = tqdm(total=total_items, desc="Searching", unit="file")

    for foldername, subfolders, filenames in os.walk(root_directory):
        for item in filenames + subfolders:
            if search_string in item:
                matching_files.append(os.path.join(foldername, item))

            # Update the progress bar
            progress_bar.update(1)

    # Close the progress bar
    progress_bar.close()

    return matching_files

if "Google Drive" in directory_location:
    directory_location = "/content/drive/MyDrive/"
else:
    directory_location = "/content"

root_directory = os.path.join(directory_location, extra_path)
search_string = name_to_search_for

result = search_directory(root_directory, search_string)

if result:
    print(f"Matching files/folders:")
    for item in result:
        print(item)
else:
    print("No matching files/folders found.")



In [None]:
#@title Utility: Reconnect to Google Drive
#@markdown Use this if Google Colab is not seeing recently changed items in Google Drive.

# Unmount Google Drive
drive.flush_and_unmount()

# Mount Google Drive
print("📂 Connecting to Google Drive...")
drive.mount('/content/drive')


In [None]:
#@title Zip the Screenshots Output Folder

zip_output_storage_location = "Store in colab (/content/)" #@param ["Store in colab (/content/)", "Store in Google Drive (/content/drive/MyDrive/)"]

if zip_output_storage_location == "Store in colab (/content/)":
    # Code for storing in colab session
    storage_path = "/content/"
    output_zip_directory = os.path.join(storage_path, screenshots_output_folder)
    print(f"Zipping output in Google Colab: {output_zip_directory}")

elif zip_output_storage_location == "Store in Google Drive (/content/drive/MyDrive/)":
    # Code for storing in Google Drive
    storage_path = "/content/drive/MyDrive/"
    output_zip_directory = os.path.join(storage_path, project_name, screenshots_output_folder)
    print(f"Zipping output in Google Drive: {output_zip_directory}")

folder_to_zip = output_frames_directory
zip_output_path = output_zip_directory

zip_folder(folder_to_zip, zip_output_path)