<a href="https://colab.research.google.com/github/LifeHackInnovationsLLC/whisper-video-transcription/blob/main/LHI_WhisperVideoDrive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [61]:
# LHI_WhisperVideoDrive.py

In [62]:
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.16.5
#   kernelspec:
#     display_name: Python 3
#     name: python3
# ---

<a href="https://colab.research.google.com/github/LifeHackInnovationsLLC/whisper-video-transcription/blob/main/LHI_WhisperVideoDrive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Jupytext Initialization (Sync Logic)
Ensure Jupytext is installed and the notebook is paired with the `.py` file.

import subprocess
import sys

def ensure_module(module_name, install_name=None):
    """Install a module if it's not already installed."""
    try:
        __import__(module_name)
        print(f"Module '{module_name}' is already installed.")
    except ImportError:
        install_name = install_name or module_name
        print(f"Module '{module_name}' not found. Installing...")
        subprocess.run([sys.executable, "-m", "pip", "install", install_name], check=True)

Ensure Jupytext is installed
ensure_module("jupytext")

Sync the notebook with its paired `.py` file
try:
    subprocess.run(["jupytext", "--sync", "LHI_WhisperVideoDrive.ipynb"], check=True)
    print("Jupytext synchronization successful.")
except subprocess.CalledProcessError as e:
    print(f"Error during Jupytext synchronization: {e}")

In [63]:
# Handle missing modules and Google Colab environment checks

import subprocess
import sys


# Install and import required modules
required_modules = {
    "google.colab": "google-colab",
    "whisper": "openai-whisper",
    "librosa": "librosa",
    "soundfile": "soundfile",
    "colorama": "colorama",
    "google-api-python-client": "google-api-python-client",
    "google-auth-httplib2": "google-auth-httplib2",
    "google-auth-oauthlib": "google-auth-oauthlib"
}


for module, install_name in required_modules.items():
    try:
        __import__(module)
        print(f"Module '{module}' is already installed.")
    except ImportError:
        print(f"Module '{module}' not found. Installing...")
        subprocess.run([sys.executable, "-m", "pip", "install", install_name], check=True)

# Conditional import for Google Colab
try:
    from google.colab import drive
    print("Google Colab environment detected.")
except ImportError:
    print("Google Colab environment not detected. Skipping Colab imports.")

# Import other required modules
import whisper
import librosa
import soundfile as sf



Module 'google.colab' is already installed.
Module 'whisper' is already installed.
Module 'librosa' is already installed.
Module 'soundfile' is already installed.
Module 'colorama' is already installed.
Module 'google-api-python-client' not found. Installing...
Module 'google-auth-httplib2' not found. Installing...
Module 'google-auth-oauthlib' not found. Installing...
Google Colab environment detected.



#📼 OpenAI Whisper + Google Drive Video Transcription

📺 Getting started video: https://youtu.be/YGpYinji7II

###This application will extract audio from all the video files in a Google Drive folder and create a high-quality transcription with OpenAI's Whisper automatic speech recognition system.

*Note: This requires giving the application permission to connect to your drive. Only you will have access to the contents of your drive, but please read the warnings carefully.*

This notebook application:
1. Connects to your Google Drive when you give it permission.
2. Creates a WhisperVideo folder and three subfolders (ProcessedVideo, AudioFiles and TextFiles.)
3. When you run the application it will search for all the video files (.mp4, .mov, mkv and .avi) in your WhisperVideo folder, transcribe them and then move the file to WhisperVideo/ProcessedVideo and save the transcripts to WhisperVideo/TextFiles. It will also add a copy of the new audio file to WhisperVideo/AudioFiles

###**For faster performance set your runtime to "GPU"**
*Click on "Runtime" in the menu and click "Change runtime type". Select "GPU".*


**Note: If you add a new file after running this application you'll need to remount the drive in step 1 to make them searchable**

##0. Choose which 'LHI Client' or folder to add transcriptions to

In [64]:
import os
import subprocess
import sys
from colorama import Fore, Style, init
from google.colab import drive
from google.colab import auth
from googleapiclient.discovery import build
from tabulate import tabulate


init(autoreset=True)

# Global registry
registry_entries = []

def add_to_registry(entry_type, name, path, entity_id=None, is_file=False):
    """Add or update an entity in the registry."""
    url = None
    if entity_id:
        if is_file:
            url = f"https://drive.google.com/file/d/{entity_id}/view"
        else:
            url = f"https://drive.google.com/drive/folders/{entity_id}"

    # Update or add
    for e in registry_entries:
        if e["path"] == path:
            e["type"] = entry_type
            e["name"] = name
            e["id"] = entity_id
            e["url"] = url if url else e["url"]
            return

    registry_entries.append({
        "type": entry_type,
        "name": name,
        "path": path,
        "id": entity_id,
        "url": url
    })

def remove_from_registry_by_path(path):
    global registry_entries
    registry_entries = [e for e in registry_entries if e["path"] != path]

def print_registry_table():
    """Print a table of all registered entries."""
    headers = ["Type", "Name", "Path", "ID", "URL"]
    table_data = []
    for e in registry_entries:
        table_data.append([
            e["type"],
            e["name"],
            e["path"],
            e["id"] if e["id"] else "-",
            e["url"] if e["url"] else "-"
        ])
    print(Fore.CYAN + "=== REGISTRY TABLE ===")
    print(tabulate(table_data, headers=headers, tablefmt="fancy_grid"))

def check_and_mount_drive():
    print("Checking /content/drive status...")
    if os.path.exists("/content/drive"):
        print("Mount directory exists. Checking contents...")
        if os.listdir("/content/drive"):
            print("Mountpoint already contains files. Attempting to unmount...")
            print("Unmounted successfully or already unmounted.")

    # Mount Google Drive
    print("Mounting Google Drive...")
    drive.mount("/content/drive", force_remount=True)
    print("Google Drive mounted successfully.")

    # Verify mount
    if os.path.exists("/content/drive/MyDrive"):
        print("Drive is mounted and ready.")
        return True
    else:
        print("Mounting seems incomplete. Please check your drive configuration.")
        return False

def initialize_drive_api():
    """
    Initialize Google Drive API using OAuth user credentials.
    This will prompt for user authentication.
    """
    print(Fore.CYAN + "Initializing Google Drive API using OAuth (User Credentials)...")
    try:
        auth.authenticate_user()  # This will prompt you to authorize the app
        service = build("drive", "v3")
        print(Fore.GREEN + "Google Drive API service initialized successfully as the user.")
        return service
    except Exception as e:
        print(Fore.RED + f"Failed to initialize Google Drive API: {e}")
        return None

drive_service = initialize_drive_api()


def get_file_id(file_name, folder_id):
    """
    Retrieve the file ID for a given file name in a specific folder on Google Drive.
    """
    try:
        results = drive_service.files().list(
            q=f"name='{file_name}' and '{folder_id}' in parents",
            spaces="drive",
            fields="files(id, name)",
            pageSize=1
        ).execute()
        items = results.get("files", [])
        if items:
            return items[0]["id"]
        else:
            print(Fore.YELLOW + f"File '{file_name}' not found in folder {folder_id}.")
            return None
    except Exception as e:
        print(Fore.RED + f"Error retrieving file ID for '{file_name}': {e}")
        return None

def generate_shareable_link(file_id):
    """
    Generate a shareable link for a given Google Drive file.
    """
    print(Fore.CYAN + f"Generating shareable link for file ID: {file_id}...")
    if drive_service is None:
        print(Fore.RED + "Drive service not initialized. Cannot generate link.")
        return None
    try:
        permission = {"type": "anyone", "role": "reader"}
        drive_service.permissions().create(fileId=file_id, body=permission).execute()
        link = f"https://drive.google.com/file/d/{file_id}/view"
        print(Fore.GREEN + f"Shareable link generated successfully: {link}")
        return link
    except Exception as e:
        print(Fore.RED + f"Failed to generate shareable link: {e}")
        return None

def get_or_create_folder(drive_service, folder_name, parent_id):
    """
    Retrieve or create a folder in Google Drive given a name and parent folder ID.
    """
    try:
        query = f"name='{folder_name}' and mimeType='application/vnd.google-apps.folder' and '{parent_id}' in parents"
        results = drive_service.files().list(
            q=query,
            spaces="drive",
            fields="files(id, name)",
            pageSize=1
        ).execute()
        items = results.get("files", [])

        if items:
            folder_id = items[0]["id"]
            print(Fore.GREEN + f"Folder '{folder_name}' found with ID: {folder_id}")
            return folder_id
        else:
            folder_metadata = {
                "name": folder_name,
                "mimeType": "application/vnd.google-apps.folder",
                "parents": [parent_id]
            }
            folder = drive_service.files().create(body=folder_metadata, fields="id").execute()
            folder_id = folder.get("id")
            print(Fore.GREEN + f"Folder '{folder_name}' created with ID: {folder_id}")
            return folder_id
    except Exception as e:
        print(Fore.RED + f"Error creating or retrieving folder '{folder_name}': {e}")
        return None

folder_id_cache = {}

def get_folder_id_from_path(drive_service, local_path):
    if local_path in folder_id_cache:
        return folder_id_cache[local_path]

    prefix = "/content/drive/MyDrive/"
    if not local_path.startswith(prefix):
        print(Fore.RED + "The path does not start with /content/drive/MyDrive/.")
        return None

    relative_path = local_path[len(prefix):].strip("/")
    if not relative_path:
        folder_id_cache[local_path] = "root"
        return "root"

    parts = relative_path.split("/")
    current_parent_id = "root"
    for part in parts:
        folder_id = get_or_create_folder(drive_service, part, current_parent_id)
        if not folder_id:
            print(Fore.RED + f"Failed to navigate/create the folder for part: {part}")
            return None
        current_parent_id = folder_id

    # Cache the final folder ID
    folder_id_cache[local_path] = current_parent_id
    return current_parent_id


# Attempt to check and mount the drive
if check_and_mount_drive():
    print("Proceeding...")
else:
    print("Drive mount failed. Exiting.")
    raise SystemExit("Drive mount failed.")

drive_service = initialize_drive_api()

# Predefined options for client folders
clients = {
    "1": "/content/drive/MyDrive/Clients/WCBradley/Videos/",
    "2": "/content/drive/MyDrive/Clients/SiriusXM/Videos/",
    "3": "/content/drive/MyDrive/Clients/LHI/Videos/"
}

print("Select a client folder:")
print("1: WCBradley")
print("2: SiriusXM")
print("3: LHI")
print("4: Enter a custom folder path")

choice = input("Enter the number corresponding to your choice (default: 1): ").strip()
if choice in clients:
    client_videos_folder = clients[choice]
elif choice == "4":
    client_videos_folder = input("Enter the full path to your Videos folder: ").strip()
else:
    client_videos_folder = clients["1"]

rootFolder = client_videos_folder + "WhisperVideo/"
audio_folder = rootFolder + "AudioFiles/"
text_folder = rootFolder + "TextFiles/"
processed_folder = rootFolder + "ProcessedVideo/"

# Ensure local folders exist
folders = [rootFolder, audio_folder, text_folder, processed_folder]
for folder in folders:
    try:
        print(f"Checking folder: {folder}")
        folder_name = os.path.basename(os.path.normpath(folder))
        if not os.path.exists(folder):
            os.makedirs(folder)
            print(Fore.GREEN + f"Created folder: {folder}")
        else:
            print(Fore.GREEN + f"Folder already exists: {folder}")
        # Register locally. No ID yet.
        add_to_registry("folder", folder_name, folder)
    except Exception as e:
        print(Fore.RED + f"Error ensuring folder {folder}: {e}")

print(Fore.CYAN + f"WhisperVideo folder and subfolders initialized for client:")
print(Fore.GREEN + f"WhisperVideo folder: {rootFolder}")
print(Fore.GREEN + f"Audio files folder: {audio_folder}")
print(Fore.GREEN + f"Text files folder: {text_folder}")
print(Fore.GREEN + f"Processed videos folder: {processed_folder}")

# Now get or create these folders in Google Drive to get their IDs
if drive_service:

# Get folder names
    audio_name = os.path.basename(os.path.normpath(audio_folder))
    text_name = os.path.basename(os.path.normpath(text_folder))
    processed_name = os.path.basename(os.path.normpath(processed_folder))

    rootFolderID = get_folder_id_from_path(drive_service, rootFolder)
    audio_folder_id = get_or_create_folder(drive_service, audio_name, rootFolderID)
    text_folder_id = get_or_create_folder(drive_service, text_name, rootFolderID)
    processed_folder_id = get_or_create_folder(drive_service, processed_name, rootFolderID)

    if rootFolderID:
        root_name = os.path.basename(os.path.normpath(rootFolder))
        add_to_registry("folder", root_name, rootFolder, rootFolderID, is_file=False)

    audio_name = os.path.basename(os.path.normpath(audio_folder))
    text_name = os.path.basename(os.path.normpath(text_folder))
    processed_name = os.path.basename(os.path.normpath(processed_folder))

    audio_id = get_or_create_folder(drive_service, audio_name, rootFolderID)
    if audio_id:
        add_to_registry("folder", audio_name, audio_folder, audio_id, is_file=False)

    text_id = get_or_create_folder(drive_service, text_name, rootFolderID)
    if text_id:
        add_to_registry("folder", text_name, text_folder, text_id, is_file=False)

    processed_id = get_or_create_folder(drive_service, processed_name, rootFolderID)
    if processed_id:
        add_to_registry("folder", processed_name, processed_folder, processed_id, is_file=False)

# Print the updated registry table with IDs and URLs
print_registry_table()


Initializing Google Drive API using OAuth (User Credentials)...
Google Drive API service initialized successfully as the user.
Checking /content/drive status...
Mount directory exists. Checking contents...
Mountpoint already contains files. Attempting to unmount...
Unmounted successfully or already unmounted.
Mounting Google Drive...
Mounted at /content/drive
Google Drive mounted successfully.
Drive is mounted and ready.
Proceeding...
Initializing Google Drive API using OAuth (User Credentials)...
Google Drive API service initialized successfully as the user.
Select a client folder:
1: WCBradley
2: SiriusXM
3: LHI
4: Enter a custom folder path
Enter the number corresponding to your choice (default: 1): 1
Checking folder: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/
Folder already exists: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/
Checking folder: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/
Created folder: /content/drive/M

##1. Load the code libraries

In [65]:
!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install ffmpeg
!pip install librosa
!pip install audioread

import whisper
import time
import librosa
import soundfile as sf
import re
import os

# model = whisper.load_model("tiny.en")
model = whisper.load_model("base.en")
# model = whisper.load_model("small.en") # load the small model
# model = whisper.load_model("medium.en")
# model = whisper.load_model("large")

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-ns08rq3_
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-ns08rq3_
  Resolved https://github.com/openai/whisper.git to commit 90db0de1896c23cbfaf0c58bc2d30665f709f170
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu 

  checkpoint = torch.load(fp, map_location=device)


In [66]:
# from colorama import Fore, Style, init
# from googleapiclient.discovery import build
# from google.oauth2.service_account import Credentials  # Ensure this import is included
# from google.colab import drive

# print(Fore.CYAN + "Attempting to mount Google Drive...")
# drive.mount('/content/drive', force_remount=True)
# print(Fore.GREEN + "Google Drive mounted successfully.")

# # Initialize colorama for console color support
# init(autoreset=True)
# print(Fore.CYAN + "Colorama initialized for console color support.")

# # Google Drive API setup
# def initialize_drive_api():
#     """
#     Initialize Google Drive API service account for generating shareable links.
#     """
#     print(Fore.CYAN + "Initializing Google Drive API...")
#     try:
#         credentials = Credentials.from_service_account_file(
#             "/content/drive/MyDrive/key.json",
#             scopes=["https://www.googleapis.com/auth/drive"]
#         )
#         service = build("drive", "v3", credentials=credentials)
#         print(Fore.GREEN + "Google Drive API service initialized successfully.")
#         return service
#     except Exception as e:
#         print(Fore.RED + f"Failed to initialize Google Drive API: {e}")
#         return None

# drive_service = initialize_drive_api()

# def get_file_id(file_name, folder_id):
#     """
#     Retrieve the file ID for a given file name in a specific folder on Google Drive.
#     """
#     print(Fore.CYAN + f"Searching for file '{file_name}' in folder ID '{folder_id}'...")
#     if drive_service is None:
#         print(Fore.RED + "Drive service not initialized. Cannot proceed.")
#         return None
#     try:
#         results = drive_service.files().list(
#             q=f"name='{file_name}' and '{folder_id}' in parents",
#             spaces="drive",
#             fields="files(id, name)",
#             pageSize=1
#         ).execute()
#         items = results.get("files", [])
#         if items:
#             file_id = items[0]["id"]
#             print(Fore.GREEN + f"File '{file_name}' found with ID: {file_id}")
#             return file_id
#         else:
#             print(Fore.YELLOW + f"File '{file_name}' not found in folder {folder_id}.")
#             return None
#     except Exception as e:
#         print(Fore.RED + f"Error retrieving file ID for '{file_name}': {e}")
#         return None

# def generate_shareable_link(file_id):
#     """
#     Generate a shareable link for a given Google Drive file.
#     """
#     print(Fore.CYAN + f"Generating shareable link for file ID: {file_id}...")
#     if drive_service is None:
#         print(Fore.RED + "Drive service not initialized. Cannot generate link.")
#         return None
#     try:
#         permission = {"type": "anyone", "role": "reader"}
#         drive_service.permissions().create(fileId=file_id, body=permission).execute()
#         link = f"https://drive.google.com/file/d/{file_id}/view"
#         print(Fore.GREEN + f"Shareable link generated successfully: {link}")
#         return link
#     except Exception as e:
#         print(Fore.RED + f"Failed to generate shareable link: {e}")
#         return None

# # Example usage (uncomment to test):
# # folder_id = "YOUR_FOLDER_ID"
# # file_name = "test.txt"
# # file_id = get_file_id(file_name, folder_id)
# # if file_id:
# #     link = generate_shareable_link(file_id)


##2. Give the application permission to mount the drive and create the folders

In [67]:
# # Mount Google Drive
# from google.colab import drive
# drive.mount("/content/drive", force_remount=True)  # This will prompt for authorization.

# import os

# # Ensure WhisperVideo folder and its subfolders exist
# folders = [rootFolder, audio_folder, text_folder, processed_folder]
# for folder in folders:
#     try:
#         if not os.path.exists(folder):
#             os.makedirs(folder)
#             print(f"Created folder: {folder}")
#         else:
#             print(f"Folder already exists: {folder}")
#     except Exception as e:
#         print(f"Error ensuring folder {folder}: {e}")

# print(f"All folders verified and ready under: {rootFolder}")

In [68]:
import os
import shutil
import subprocess
import logging
import csv
import time
from datetime import datetime, timedelta
import librosa
import soundfile as sf
import whisper
from googleapiclient.http import MediaFileUpload
import sys


# === Processing Script ===

# Assuming the initial setup has already been run:
# - drive_service is initialized and authenticated
# - helper functions like get_folder_id_from_path, get_or_create_folder are defined
# - registry_entries are loaded from registry.json
# - folders are created and registered


import torch
import subprocess

def detect_hardware_accelerator():
    """Detect the hardware accelerator (GPU type) being used."""
    if torch.cuda.is_available():
        gpu_name = torch.cuda.get_device_name(0)
        return gpu_name
    else:
        # Check if TPU is available
        try:
            import tensorflow as tf
            tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
            return "TPU"
        except ValueError:
            return "None"


print(f"Hardware Accelerator Detected: {hardware_accel}")

def detect_high_ram():
    """Determine if the Colab environment is using high RAM."""
    try:
        # Check the total memory available
        total_ram = subprocess.check_output(['grep', 'MemTotal', '/proc/meminfo']).decode('utf-8')
        total_ram_kb = int(total_ram.split()[1])
        total_ram_gb = total_ram_kb / (1024 ** 2)  # Convert KB to GB

        # High RAM in Colab is typically around 25GB
        if total_ram_gb >= 24:
            return "Yes"
        else:
            return "No"
    except Exception as e:
        print(f"Error detecting high RAM: {e}")
        return "Unknown"



def get_runtime_type():
    """Retrieve the runtime type information."""
    try:
        python_version = sys.version.split()[0]  # e.g., '3.8.10'
        runtime_info = f"Python {python_version}"
        return runtime_info
    except Exception as e:
        print(f"Error retrieving runtime type: {e}")
        return "Unknown"



def initialize_environment():
    global hardware_accel, high_ram, runtime_type
    hardware_accel = detect_hardware_accelerator()
    high_ram = detect_high_ram()
    runtime_type = get_runtime_type()
    print(f"Hardware Accelerator Detected: {hardware_accel}")
    print(f"High RAM Enabled: {high_ram}")
    print(f"Runtime Type: {runtime_type}")

initialize_environment()


# Define your folder paths (ensure these match your initial setup)

audio_folder = os.path.join(rootFolder, "AudioFiles/")
text_folder = os.path.join(rootFolder, "TextFiles/")
processed_folder = os.path.join(rootFolder, "ProcessedVideo/")

# Load the existing registry (if not already loaded)
# If your initial setup already loaded it, you can skip this
import json

registry_file_path = os.path.join(rootFolder, "registry.json")

def load_registry():
    """Load the registry from a JSON file."""
    global registry_entries
    if os.path.exists(registry_file_path):
        with open(registry_file_path, "r", encoding="utf-8") as f:
            registry_entries = json.load(f)
        print("Registry loaded successfully.")
    else:
        registry_entries = []
        print("Registry file not found. Starting with an empty registry.")

def save_registry():
    """Save the registry to a JSON file."""
    with open(registry_file_path, "w", encoding="utf-8") as f:
        json.dump(registry_entries, f, ensure_ascii=False, indent=4)
    print("Registry saved successfully.")

# If registry_entries is not already loaded, load it
try:
    registry_entries
except NameError:
    load_registry()

# Ensure local folders exist
for folder in [rootFolder, audio_folder, text_folder, processed_folder]:
    if not os.path.exists(folder):
        os.makedirs(folder)
        print(f"Created folder: {folder}")
    else:
        print(f"Folder exists: {folder}")

# Initialize Whisper model
model = whisper.load_model("base")  # Adjust the model as needed



# Configure logging
logging.basicConfig(
    filename=os.path.join(rootFolder, "processing_log.txt"),
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

# Helper functions to check registry

# Import timedelta for format_time
from datetime import timedelta

def format_time(seconds):
    """Convert seconds to HH:MM:SS format."""
    return str(timedelta(seconds=int(seconds)))

def move_file_in_drive(drive_service, file_id, old_parent_id, new_parent_id):
    """
    Move a file in Google Drive from old_parent_id to new_parent_id.

    Args:
        drive_service: Authenticated Google Drive service instance.
        file_id (str): The ID of the file to move.
        old_parent_id (str): The ID of the current parent folder.
        new_parent_id (str): The ID of the target parent folder.

    Returns:
        str: The updated file ID after moving.
    """
    try:
        # Retrieve the existing parents to remove
        file_info = drive_service.files().get(fileId=file_id, fields='parents').execute()
        parents = file_info.get('parents', [])

        if old_parent_id in parents:
            parents.remove(old_parent_id)

        # Update the file's parents
        updated_file = drive_service.files().update(
            fileId=file_id,
            addParents=new_parent_id,
            removeParents=old_parent_id,
            fields='id, parents'
        ).execute()

        print(f"Moved file ID {file_id} from parent {old_parent_id} to {new_parent_id}.")
        return updated_file.get('id')

    except Exception as e:
        print(f"Error moving file ID {file_id}: {e}")
        return None

# Restore folder IDs
# Ensure that `text_name` and `processed_name` are correctly defined
text_folder_id = get_or_create_folder(drive_service, text_name, rootFolderID)
processed_folder_id = get_or_create_folder(drive_service, processed_name, rootFolderID)

# Register these folders in the registry
if text_folder_id:
    add_to_registry("folder", text_name, text_folder, entity_id=text_folder_id, is_file=False)

if processed_folder_id:
    add_to_registry("folder", processed_name, processed_folder, entity_id=processed_folder_id, is_file=False)

# Optional: Clear old local audio and text files before starting
def clear_old_files(folder, extension):
    for f in os.listdir(folder):
        if f.endswith(extension):
            try:
                os.remove(os.path.join(folder, f))
                print(f"Removed old file: {f}")
            except Exception as e:
                print(f"Error removing file {f}: {e}")

clear_old_files(audio_folder, ".wav")
clear_old_files(text_folder, ".txt")

print("Initial Audio directory:", os.listdir(audio_folder))
print("Initial Text directory:", os.listdir(text_folder))

def file_in_registry_with_id(path):
    return any(e["path"] == path and e.get("id") for e in registry_entries)

def file_in_registry(path):
    return any(e["path"] == path for e in registry_entries)

# Function to get file bases (without extension)
def get_file_bases(folder):
    return {os.path.splitext(f)[0] for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))}

# Function to verify folder state
def verify_folder_state():
    videos = get_file_bases(processed_folder)
    audios = get_file_bases(audio_folder)
    texts = get_file_bases(text_folder)
    all_match = (videos == audios == texts)

    if not all_match:
        print("WARNING: Folder parity mismatch detected:")
        print(f"Processed Videos: {len(videos)} ({videos})")
        print(f"Audio Files: {len(audios)} ({audios})")
        print(f"Text Files: {len(texts)} ({texts})")

    return all_match

# Function to remove duplicate files with "(1)" in their names
def remove_duplicates(folder):
    for fname in os.listdir(folder):
        base, ext = os.path.splitext(fname)
        if "(1)" in base:
            try:
                print(f"Removing duplicate file: {fname}")
                os.remove(os.path.join(folder, fname))
            except Exception as e:
                print(f"Error removing duplicate file {fname}: {e}")

# Function to clean incomplete text files
def clean_incomplete_text_files(text_folder):
    incomplete_files = []
    for fname in os.listdir(text_folder):
        if fname.lower().endswith(".txt"):
            file_path = os.path.join(text_folder, fname)
            try:
                with open(file_path, "r", encoding="utf-8") as f:
                    first_line = f.readline().strip()
                    if not first_line.startswith("Original Video Link:"):
                        print(f"Removing incomplete text file: {fname}")
                        os.remove(file_path)
                        incomplete_files.append(fname)
            except Exception as e:
                print(f"Error checking file {fname}: {e}")
                # Optionally, remove or handle the file
    # Remove incomplete files from registry
    for fname in incomplete_files:
        path = os.path.join(text_folder, fname)
        remove_from_registry_by_path(path)
    return incomplete_files

# Function to verify and cleanup registry
def verify_and_cleanup_registry(text_folder, registry_entries):
    # Get all .txt files in text_folder
    txt_files = {f for f in os.listdir(text_folder) if f.lower().endswith(".txt")}

    # Get all .txt file names from registry
    registry_txt_filenames = {os.path.basename(e["path"]) for e in registry_entries if e["path"].lower().endswith(".txt")}

    # Identify extra .txt files not in registry
    extra_txt_files = txt_files - registry_txt_filenames

    # Remove extra .txt files
    for fname in extra_txt_files:
        file_path = os.path.join(text_folder, fname)
        try:
            print(f"Removing extra text file not in registry: {fname}")
            os.remove(file_path)
            # Optionally, log this removal
            logging.info(f"Removed extra text file not in registry: {fname}")
        except Exception as e:
            print(f"Error removing extra text file {fname}: {e}")

    return extra_txt_files

# Function to upload final files and register them
def register_and_upload_final_file(drive_service, entry_type, file_name, file_path, parent_folder_id, url=None):
    """
    Upload the final file to Drive and register it in the registry.
    """
    if file_in_registry_with_id(file_path):
        return None, None
    else:
        if file_in_registry(file_path):
            remove_from_registry_by_path(file_path)
        file_id = upload_file_to_drive(drive_service, file_path, parent_folder_id)
        if file_id:
            shareable_link = generate_shareable_link(file_id)
            add_to_registry(entry_type, file_name, file_path, entity_id=file_id, is_file=True)

            return file_id, shareable_link
        else:
            return None, None

# Function to upload file to Drive
def upload_file_to_drive(drive_service, file_path, parent_folder_id):
    """
    Upload a file to Google Drive and return its file ID.
    """
    file_name = os.path.basename(file_path)
    file_metadata = {
        'name': file_name,
        'parents': [parent_folder_id]
    }
    media = MediaFileUpload(file_path, resumable=True)
    try:
        file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
        print(f"Uploaded file '{file_name}' with ID: {file.get('id')}")
        return file.get('id')
    except Exception as e:
        print(f"Error uploading file '{file_name}': {e}")
        return None

# Initialize logs
success_log = []
error_log = []
skipped_log = []

# List all video files in the rootFolder
video_files = [f for f in os.listdir(rootFolder) if os.path.isfile(os.path.join(rootFolder, f))]

# Store processing details for videos
video_details = []

for video_file in video_files:
    if video_file == "processing_report.txt":
        continue
    if not video_file.lower().endswith((".mp4", ".mov", ".avi", ".mkv")):
        skipped_log.append((video_file, "Invalid video format"))
        print(f"Skipped {video_file}: Invalid video format.")
        continue

    start_time = time.time()
    runtime_type = "Python"
    hardware_accel = "None"  # Adjust if you know GPU usage
    high_ram = "No"
    library_used = "librosa"
    original_video_size_mb = os.path.getsize(os.path.join(rootFolder, video_file)) / (1024*1024)

    base_name = os.path.splitext(video_file)[0]
    video_path = os.path.join(rootFolder, video_file)
    audio_path = os.path.join(audio_folder, base_name + ".wav")
    text_path = os.path.join(text_folder, base_name + ".txt")
    processed_path = os.path.join(processed_folder, video_file)

    if os.path.exists(processed_path):
        print(f"Video {video_file} already processed. Skipping.")
        skipped_log.append((video_file, "Already processed"))
        continue

    if not file_in_registry(video_path):
        add_to_registry("file", video_file, video_path, entity_id=None, is_file=True)

    video_id = get_file_id(video_file, rootFolderID)
    if video_id:
        remove_from_registry_by_path(video_path)
        add_to_registry("file", video_file, video_path, entity_id=video_id, is_file=True)

    print(f"\nProcessing {video_file}:")
    print("Audio directory:", os.listdir(audio_folder))
    print("Text directory:", os.listdir(text_folder))

    try:
        need_ffmpeg = False
        if not os.path.exists(audio_path):
            print(f"Extracting audio for {video_file} to {audio_path}")
            try:
                y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
                sf.write(audio_path, y, sr)
                print(f"Audio extraction successful using librosa for {video_file}")
            except Exception as e_librosa:
                print(f"Librosa extraction failed for {video_file}: {e_librosa}. Using ffmpeg...")
                subprocess.run([
                    "ffmpeg", "-i", os.path.join(rootFolder, video_file),
                    "-ar", "16000", "-ac", "1", audio_path
                ], check=True)
                print(f"Audio extraction successful using ffmpeg for {video_file}")
                need_ffmpeg = True
        else:
            print(f"Audio file {audio_path} already exists.")

        if need_ffmpeg:
            library_used = "ffmpeg"

        print(f"Uploading audio file {os.path.basename(audio_path)}...")
        # Upload and register only the final audio file
        audio_file_id, audio_link = register_and_upload_final_file(
            drive_service, "file", os.path.basename(audio_path), audio_path, audio_folder_id, None
        )

        need_transcription = not os.path.exists(text_path)

        if need_transcription:
            print(f"Starting transcription for {audio_path}")
            result = model.transcribe(audio_path)
            print(f"Transcription completed for {audio_path}")

            transcription_text = ""
            for segment in result["segments"]:
                start_s = segment["start"]
                end_s = segment["end"]
                start_time_str = format_time(start_s)
                end_time_str = format_time(end_s)
                text_segment = segment["text"].strip()
                transcription_text += f"[{start_time_str} - {end_time_str}] {text_segment}\n\n"

            # Write transcription to a temporary local file
            temp_text_path = "/tmp/" + base_name + ".txt.tmp"
            print(f"Saving transcription to temporary local file {temp_text_path}")
            with open(temp_text_path, "w", encoding="utf-8") as f:
                f.write(transcription_text)

            # Generate shareable link for the processed video
            print(f"Generating shareable link for processed video {video_file}...")
            processed_video_link = generate_shareable_link(video_id) if video_id else ""

            if processed_video_link:
                # Prepend the original video link to the transcription
                final_transcription_content = f"Original Video Link: {processed_video_link}\n\n{transcription_text}"
                print(f"Saving final transcription file to {text_path}")
                with open(text_path, "w", encoding="utf-8") as f:
                    f.write(final_transcription_content)

                # Upload and register only the final .txt file
                print(f"Uploading final text file {os.path.basename(text_path)}...")
                text_file_id, text_link = register_and_upload_final_file(
                    drive_service, "file", os.path.basename(text_path), text_path, text_folder_id, None
                )

                # Remove the temporary local .txt file
                os.remove(temp_text_path)
                print(f"Removed temporary local file {temp_text_path}")

            else:
                print(f"No video ID found for {video_file}. Skipping link insertion.")
                # Optionally, handle cases where processed_video_link is not available
                text_file_id, text_link = register_and_upload_final_file(
                    drive_service, "file", os.path.basename(text_path), text_path, text_folder_id, None
                )

        else:
            print(f"Text file {text_path} already exists, not retranscribing.")
            # Retrieve the existing text file's ID and URL from registry
            existing_entry = next((e for e in registry_entries if e["path"] == text_path), None)
            text_file_id, text_link = (existing_entry["id"], existing_entry["url"]) if existing_entry else (None, None)

        print(f"Moving file {video_file} to processed folder")
        shutil.move(os.path.join(rootFolder, video_file), processed_path)

        if video_id:
            move_file_in_drive(drive_service, video_id, rootFolderID, processed_folder_id)
            remove_from_registry_by_path(video_path)
            add_to_registry("file", video_file, processed_path, entity_id=video_id, is_file=True)

        print("Registry after processing this video:")
        print_registry_table()

        if not verify_folder_state():
            print("Folder parity mismatch after processing", video_file)

        # Calculate processing metrics
        end_time = time.time()
        processing_time = end_time - start_time
        efficiency = processing_time / original_video_size_mb if original_video_size_mb > 0 else ""

        success_log.append(video_file)
        logging.info(f"Successfully processed {video_file}")

        # Store details for CSV
        registry_entry = next((e for e in registry_entries if e["path"] == text_path and e["id"] == text_file_id), None)
        video_details.append({
            "Name": video_file,
            "Path": processed_path,
            "Type": "video",
            "ID": video_id if video_id else "",
            "URL": registry_entry["url"] if registry_entry and registry_entry.get("url") else "",
            "Status": "Processed",
            "ProcessingTime": processing_time,
            "RuntimeType": runtime_type,
            "HardwareAccelerator": hardware_accel,
            "HighRamUsed": high_ram,
            "LibraryUsed": library_used,
            "OriginalVideoSizeMB": original_video_size_mb,
            "Efficiency": efficiency
        })

    except subprocess.CalledProcessError as ffmpeg_error:
        error_message = f"FFmpeg error for {video_file}: {ffmpeg_error}"
        print(error_message)
        error_log.append((video_file, error_message))
        logging.error(error_message)

    except Exception as general_error:
        error_message = f"General error for {video_file}: {general_error}"
        print(error_message)
        error_log.append((video_file, error_message))
        logging.error(error_message)

# === Post-processing Cleanup and Verification ===

# Remove duplicate files
remove_duplicates(audio_folder)
remove_duplicates(text_folder)

# Clean incomplete text files
cleaned_files = clean_incomplete_text_files(text_folder)

# Final cleanup to remove extra .txt files not in registry
extra_cleaned_files = verify_and_cleanup_registry(text_folder, registry_entries)

# Re-check parity after deduplication and cleanup
videos = get_file_bases(processed_folder)
audios = get_file_bases(audio_folder)
texts = get_file_bases(text_folder)
all_match = (videos == audios == texts)

# === Generate Processing Report ===

report = "Processing Report\n"
report += f"\nSuccessfully Processed Files ({len(success_log)}):\n"
report += "\n".join(success_log)
report += f"\n\nSkipped Files ({len(skipped_log)}):\n"
report += "\n".join([f"{file} - {reason}" for file, reason in skipped_log])
report += f"\n\nErrors ({len(error_log)}):\n"
report += "\n".join([f"{file} - {reason}" for file, reason in error_log])
report += f"\n\nRemoved Incomplete Text Files ({len(cleaned_files)}):\n"
report += "\n".join(cleaned_files)
report += f"\n\nRemoved Extra Text Files Not in Registry ({len(extra_cleaned_files)}):\n"
report += "\n".join(extra_cleaned_files)
report += f"\n\nFolder Parity Check:\n"
report += f"All folders have matching files: {'Yes' if all_match else 'No'}\n"
report += f"Processed Videos: {len(videos)}\n"
report += f"Audio Files: {len(audios)}\n"
report += f"Text Files: {len(texts)}\n"

with open(os.path.join(rootFolder, "processing_report.txt"), "w", encoding="utf-8") as f:
    f.write(report)

print("=== COMPLETION REPORT ===")
print(report)

# === Writing to CSV ===

csv_path = os.path.join(rootFolder, "processing_log.csv")
file_exists = os.path.isfile(csv_path)
current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Now include entire registry in CSV
# For non-video entries (folders, audio files, text files), we leave processing-specific fields blank

registry_data = []
for e in registry_entries:
    # Find if this entry corresponds to a processed video in video_details
    matching_video = next((vd for vd in video_details if vd["Name"] == e["name"] and vd["Path"] == e["path"]), None)
    if matching_video:
        row = {
            "Timestamp": current_time,
            "Name": e["name"],
            "Path": e["path"],
            "ID": e["id"] if e["id"] else "",
            "URL": e["url"] if e.get("url") else "",
            "Type": matching_video["Type"],
            "Status": matching_video["Status"],
            "ProcessingTime": matching_video["ProcessingTime"],
            "RuntimeType": matching_video["RuntimeType"],
            "HardwareAccelerator": matching_video["HardwareAccelerator"],
            "HighRamUsed": matching_video["HighRamUsed"],
            "LibraryUsed": matching_video["LibraryUsed"],
            "OriginalVideoSizeMB": matching_video["OriginalVideoSizeMB"],
            "Efficiency": matching_video["Efficiency"]
        }
    else:
        # Non-video or no details
        # Determine type from registry entry type and extension
        entry_type = e["type"]  # folder or file
        row = {
            "Timestamp": current_time,
            "Name": e["name"],
            "Path": e["path"],
            "ID": e["id"] if e["id"] else "",
            "URL": e["url"] if e.get("url") else "",
            "Type": entry_type,
            "Status": "",  # no status for non-video or unprocessed entries
            "ProcessingTime": "",
            "RuntimeType": "",
            "HardwareAccelerator": "",
            "HighRamUsed": "",
            "LibraryUsed": "",
            "OriginalVideoSizeMB": "",
            "Efficiency": ""
        }
    registry_data.append(row)

# Define CSV fields
fields = [
    "Timestamp", "Name", "Path", "ID", "URL", "Type", "Status",
    "ProcessingTime", "RuntimeType", "HardwareAccelerator", "HighRamUsed",
    "LibraryUsed", "OriginalVideoSizeMB", "Efficiency"
]

# Write registry_data to CSV
with open(csv_path, "a", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fields)
    if not file_exists:
        writer.writeheader()
    for row in registry_data:
        writer.writerow(row)

print("\nCurrent CSV log entries:")
with open(csv_path, "r", encoding="utf-8") as csvfile:
    print(csvfile.read())

# Save the updated registry
save_registry()


Hardware Accelerator Detected: None
Hardware Accelerator Detected: NVIDIA A100-SXM4-40GB
High RAM Enabled: Yes
Runtime Type: Python 3.10.12
Folder exists: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/
Folder exists: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/
Folder exists: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/TextFiles/
Folder exists: /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/ProcessedVideo/


  checkpoint = torch.load(fp, map_location=device)


Folder 'TextFiles' found with ID: 11BSRsvsP_VcAlJBLK7CJf63s_9bIDLmO
Folder 'ProcessedVideo' found with ID: 11Abuw26F1oF-kBp1LInveLmZ8OQYXI1b
Initial Audio directory: []
Initial Text directory: []

Processing CT Loyalty Demo.mp4:
Audio directory: []
Text directory: []
Extracting audio for CT Loyalty Demo.mp4 to /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/CT Loyalty Demo.wav


  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for CT Loyalty Demo.mp4
Uploading audio file CT Loyalty Demo.wav...
Uploaded file 'CT Loyalty Demo.wav' with ID: 1RzacQiOhnFrAQ4l69m2xNAfuMC6TxbQg
Generating shareable link for file ID: 1RzacQiOhnFrAQ4l69m2xNAfuMC6TxbQg...
Shareable link generated successfully: https://drive.google.com/file/d/1RzacQiOhnFrAQ4l69m2xNAfuMC6TxbQg/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/CT Loyalty Demo.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/CT Loyalty Demo.wav
Saving transcription to temporary local file /tmp/CT Loyalty Demo.txt.tmp
Generating shareable link for processed video CT Loyalty Demo.mp4...
Generating shareable link for file ID: 1-xrPP-vHxPUB9_sCZhpCL7dI0pYJmBNy...
Shareable link generated successfully: https://drive.google.com/file/d/1-xrPP-vHxPUB9_sCZhpCL7dI0pYJmBNy/view
Saving final transcription file to /content/drive/MyDrive

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for WCBradley Onboarding.mov
Uploading audio file WCBradley Onboarding.wav...
Uploaded file 'WCBradley Onboarding.wav' with ID: 1ynK6MBENTiuJVRFPPK-yzvPGCgiWsLLl
Generating shareable link for file ID: 1ynK6MBENTiuJVRFPPK-yzvPGCgiWsLLl...
Shareable link generated successfully: https://drive.google.com/file/d/1ynK6MBENTiuJVRFPPK-yzvPGCgiWsLLl/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/WCBradley Onboarding.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/WCBradley Onboarding.wav
Saving transcription to temporary local file /tmp/WCBradley Onboarding.txt.tmp
Generating shareable link for processed video WCBradley Onboarding.mov...
Generating shareable link for file ID: 1hRU7AO2Unr5CFX5UzsYL-p2Fvbn3Mi4r...
Shareable link generated successfully: https://drive.google.com/file/d/1hRU7AO2Unr5CFX5UzsYL-p2Fvbn3Mi4r/view
Saving final transcrip

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for First Standup.mov
Uploading audio file First Standup.wav...
Uploaded file 'First Standup.wav' with ID: 1jPt35VbKwCNdUqpUpDCmB2Ks3ee_aVh6
Generating shareable link for file ID: 1jPt35VbKwCNdUqpUpDCmB2Ks3ee_aVh6...
Shareable link generated successfully: https://drive.google.com/file/d/1jPt35VbKwCNdUqpUpDCmB2Ks3ee_aVh6/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/First Standup.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/First Standup.wav
Saving transcription to temporary local file /tmp/First Standup.txt.tmp
Generating shareable link for processed video First Standup.mov...
Generating shareable link for file ID: 1-4Oy8d1_VBLcmel6TlVakFBGpAmaMZ0P...
Shareable link generated successfully: https://drive.google.com/file/d/1-4Oy8d1_VBLcmel6TlVakFBGpAmaMZ0P/view
Saving final transcription file to /content/drive/MyDrive/Clients/WCBra

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for 1 on 1 with Amy.mov
Uploading audio file 1 on 1 with Amy.wav...
Uploaded file '1 on 1 with Amy.wav' with ID: 1QBAA2vtLbB_g_vsxukVAgFSvCyqksMT_
Generating shareable link for file ID: 1QBAA2vtLbB_g_vsxukVAgFSvCyqksMT_...
Shareable link generated successfully: https://drive.google.com/file/d/1QBAA2vtLbB_g_vsxukVAgFSvCyqksMT_/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/1 on 1 with Amy.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/1 on 1 with Amy.wav
Saving transcription to temporary local file /tmp/1 on 1 with Amy.txt.tmp
Generating shareable link for processed video 1 on 1 with Amy.mov...
Generating shareable link for file ID: 1-HmRFNuRhSSuiEDShjCjczzcsVEVdaYm...
Shareable link generated successfully: https://drive.google.com/file/d/1-HmRFNuRhSSuiEDShjCjczzcsVEVdaYm/view
Saving final transcription file to /content/drive/MyDrive

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for (PBM Daily) Screen Recording 2024-02-20 at 11.02.21 AM.mov
Uploading audio file (PBM Daily) Screen Recording 2024-02-20 at 11.02.21 AM.wav...
Uploaded file '(PBM Daily) Screen Recording 2024-02-20 at 11.02.21 AM.wav' with ID: 1MNFbgKnvXsbtZakGXJz1tniVFPPU-13Z
Generating shareable link for file ID: 1MNFbgKnvXsbtZakGXJz1tniVFPPU-13Z...
Shareable link generated successfully: https://drive.google.com/file/d/1MNFbgKnvXsbtZakGXJz1tniVFPPU-13Z/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/(PBM Daily) Screen Recording 2024-02-20 at 11.02.21 AM.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/(PBM Daily) Screen Recording 2024-02-20 at 11.02.21 AM.wav
Saving transcription to temporary local file /tmp/(PBM Daily) Screen Recording 2024-02-20 at 11.02.21 AM.txt.tmp
Generating shareable link for processed video (PBM Daily) Screen Recording 202

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for (Long video explaining how to use Jira to Dylan Horton) Screen Recording 2024-02-20 at 1.32.28 PM.mov
Uploading audio file (Long video explaining how to use Jira to Dylan Horton) Screen Recording 2024-02-20 at 1.32.28 PM.wav...
Uploaded file '(Long video explaining how to use Jira to Dylan Horton) Screen Recording 2024-02-20 at 1.32.28 PM.wav' with ID: 1x_RmktzWzjnxVUWyWa6MqPoQuXSIgsef
Generating shareable link for file ID: 1x_RmktzWzjnxVUWyWa6MqPoQuXSIgsef...
Shareable link generated successfully: https://drive.google.com/file/d/1x_RmktzWzjnxVUWyWa6MqPoQuXSIgsef/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/(Long video explaining how to use Jira to Dylan Horton) Screen Recording 2024-02-20 at 1.32.28 PM.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/(Long video explaining how to use Jira to Dylan Horton) Screen Recording 2024-

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for Testflight build confusion (v1.5.94 does not contain the code from the commit SHA it references).mov
Uploading audio file Testflight build confusion (v1.5.94 does not contain the code from the commit SHA it references).wav...
Uploaded file 'Testflight build confusion (v1.5.94 does not contain the code from the commit SHA it references).wav' with ID: 1wHs5ZB8d_nI8whODFEqVBJQG30IlII2u
Generating shareable link for file ID: 1wHs5ZB8d_nI8whODFEqVBJQG30IlII2u...
Shareable link generated successfully: https://drive.google.com/file/d/1wHs5ZB8d_nI8whODFEqVBJQG30IlII2u/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/Testflight build confusion (v1.5.94 does not contain the code from the commit SHA it references).wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/Testflight build confusion (v1.5.94 does not contain the code from the commit SHA 

  y, sr = librosa.load(os.path.join(rootFolder, video_file), sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Audio extraction successful using librosa for Second Standup.mov
Uploading audio file Second Standup.wav...
Uploaded file 'Second Standup.wav' with ID: 13nIurFZLkORYraM6Jm0tF2ZN9sCI7Yi4
Generating shareable link for file ID: 13nIurFZLkORYraM6Jm0tF2ZN9sCI7Yi4...
Shareable link generated successfully: https://drive.google.com/file/d/13nIurFZLkORYraM6Jm0tF2ZN9sCI7Yi4/view
Starting transcription for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/Second Standup.wav
Transcription completed for /content/drive/MyDrive/Clients/WCBradley/Videos/WhisperVideo/AudioFiles/Second Standup.wav
Saving transcription to temporary local file /tmp/Second Standup.txt.tmp
Generating shareable link for processed video Second Standup.mov...
Generating shareable link for file ID: 1RTepCoYo0atj6iFsJKQxQcJhWmpf5nix...
Shareable link generated successfully: https://drive.google.com/file/d/1RTepCoYo0atj6iFsJKQxQcJhWmpf5nix/view
Saving final transcription file to /content/drive/MyDrive/Client

##3. Upload any video files you want transcribed in the "WhisperVideo" folder in your Google Drive.

## 4. Extract audio from the video files and create a transcription

This step processes video files in the `WhisperVideo` folder by extracting audio, transcribing it, and saving the transcription in the `TextFiles` folder. The original video file is moved to the `ProcessedVideo` folder upon successful transcription.

### Shareable Links
The shareable link for the processed video is generated based on its Google Drive file path. This method avoids additional API calls and assumes that files are already shared within your team. The constructed link can be found at the beginning of the transcription file.

Example of a shareable link format:
```
https://drive.google.com/file/d/<file_id>/view
```



In [69]:
# ### Final Note for Synchronization
# For Colab: Sync changes manually after downloading the notebook.
# For Local: Use the Jupytext command:
#    jupytext --sync LHI_WhisperVideoDrive.ipynb

print("Final Note: Synchronize your files locally using Jupytext.")
print("Colab users: Save your notebook and download it to sync manually.")

Final Note: Synchronize your files locally using Jupytext.
Colab users: Save your notebook and download it to sync manually.
