## PROJECT: Business Case - Building a Multimodal AI Chatbot for YouTube Videos
**Data Preprocessing**
* Creating a CSV file with YouTube video links (22 records)
* Extracting metadata from YouTube using the CSV file
* Generating audio files from YouTube using the metadata
* Transcribing the audio files with the Whisper model

## Step 1: Load CSV File

In [1]:
%pip install yt-dlp
%pip install pandas numpy --quiet

Note: you may need to restart the kernel to use updated packages.




In [1]:
import os

# Point yt_dlp to the FFmpeg path
os.environ["PATH"] += os.pathsep + r"C:\ffmpeg-7.1.1-essentials_build\bin"

In [2]:
import pandas as pd
import numpy as np

In [3]:
#import pandas as pd
df = pd.read_csv("data/SNOW_YT_Videos.csv", sep=";")
print(df.head())

   Number                                 Youtube_link  \
0       1  https://www.youtube.com/watch?v=tOaMRG8DX3U   
1       2  https://www.youtube.com/watch?v=vteLoWpNw8Q   
2       3  https://www.youtube.com/watch?v=7WJ6lmxa1WQ   
3       4  https://www.youtube.com/watch?v=fqB-NcZmqXo   
4       5  https://www.youtube.com/watch?v=ZYJqkxGrNiI   

                                             Subject  
0  An AI Agent that knows everything about your P...  
1          What Is Agentic AI and Why Should I Care?  
2                     Agentic AI workflows for AIOps  
3  ServiceNow's agentic AI framework explained: W...  
4  AI and Business Agility: Enhancing Human Intel...  
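Some links in this CSV carry extra query parameters (for example `&t=27s` or `&list=...`), and a few videos appear more than once. A minimal sketch of how the video ID could be pulled out with the standard library and used to spot repeats is shown here. The helper name `extract_video_id` is hypothetical and not part of the original notebook; the sample URLs are taken from the link column.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


# Sample links copied from the CSV used in this notebook
links = [
    "https://www.youtube.com/watch?v=it1hcs5S1ks",
    "https://www.youtube.com/watch?v=it1hcs5S1ks&t=27s",
    "https://www.youtube.com/watch?v=XUKTyE2YtHc&list=PLh-u-epknspBswAAKG0EfPHyV6gcVVOhK",
]

ids = [extract_video_id(u) for u in links]

# IDs that occur more than once point at the same video
duplicates = [vid for vid, count in Counter(ids).items() if count > 1]

print(ids)
print(duplicates)
```

With a DataFrame, the same helper could be applied as `df["Youtube_link"].map(extract_video_id)` to add an ID column before any download step.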


## Extract Metadata from Videos

In [6]:
import yt_dlp

# Ensure the directory exists
os.makedirs("Data", exist_ok=True)

def get_metadata_yt_dlp(video_url):
    ydl_opts = {'quiet': True, 'skip_download': True}
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        try:
            info = ydl.extract_info(video_url, download=False)
            return {
                "title": info.get("title"),
                "channel": info.get("uploader"),
                "description": (info.get("description") or "")[:200],  # description can be None
                "length": info.get("duration"),
                "publish_date": info.get("upload_date"),
                "views": info.get("view_count")
            }
        except Exception as e:
            return {"error": str(e)}

metadata_list = [get_metadata_yt_dlp(link) for link in df["Youtube_link"]]
metadata_df = pd.DataFrame(metadata_list)

# Merge and export
final_df = pd.concat([df, metadata_df], axis=1)
final_df.to_csv("Data/ServiceNow_Youtube_Metadata_Clean.csv", index=False)

ERROR: [youtube] VFGAvNxaK4Q: Private video. Sign in if you've been granted access to this video. Use --cookies-from-browser or --cookies for the authentication. See  https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp  for how to manually pass cookies. Also see  https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies  for tips on effectively exporting YouTube cookies
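yt-dlp reports `upload_date` as a `YYYYMMDD` string and `duration` in seconds, so the `publish_date` and `length` columns are not very readable as stored. A rough sketch of two formatting helpers follows. The names `parse_upload_date` and `format_duration` are hypothetical, and the sample values in the test calls are made up for illustration. Floats are handled because pandas reads a numeric column back as float when some rows are empty, as happens for a video whose metadata could not be fetched.

```python
import math
from datetime import datetime


def _is_missing(value):
    """True for None, empty strings, and NaN floats."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def parse_upload_date(value):
    """Convert a YYYYMMDD value (str, int, or float) to a date, or None if missing."""
    if _is_missing(value):
        return None
    if isinstance(value, float):
        value = int(value)  # e.g. 20240115.0 after a CSV round trip
    return datetime.strptime(str(value), "%Y%m%d").date()


def format_duration(seconds):
    """Format a duration in seconds as H:MM:SS, or None if missing."""
    if _is_missing(seconds):
        return None
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Illustrative values only, not taken from the actual metadata
print(parse_upload_date("20240115"))
print(format_duration(3725))
```

Applied to the metadata frame, this could look like `metadata_df["publish_date"].map(parse_upload_date)`.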


## Transcription with Whisper

In [7]:
%pip install openai-whisper --quiet



In [10]:
import whisper

# Load the metadata CSV (it was written with the default comma separator)
df = pd.read_csv("Data/ServiceNow_Youtube_Metadata_Clean.csv")

# Load Whisper model (choose "base" for speed, "medium" or "large" for quality)
model = whisper.load_model("base")
os.makedirs("audio_files", exist_ok=True)

def download_audio(url, video_id):
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': f'audio_files/{video_id}.%(ext)s',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'quiet': True
    }
    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])
        return f'audio_files/{video_id}.mp3'
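    except Exception as e:
        # Report the failure instead of discarding it silently
        print(f"Audio download failed for {video_id}: {e}")
        return None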

100%|███████████████████████████████████████| 139M/139M [00:11<00:00, 12.1MiB/s]


In [None]:
# Transcribe all videos
transcripts = []

for idx, row in final_df.iterrows():
    url = row['Youtube_link']
    # Keep only the video ID; drop trailing parameters such as &list= or &t=
    video_id = url.split("v=")[-1].split("&")[0]
    print(f"Processing video {idx+1}: {url}")
    
    audio_path = download_audio(url, video_id)
    if audio_path and os.path.exists(audio_path):
        try:
            result = model.transcribe(audio_path)
            transcripts.append(result['text'])
        except Exception as e:
            transcripts.append(f"Error during transcription: {str(e)}")
    else:
        transcripts.append("Error: Audio download failed or video may be protected")

# Append to DataFrame
final_df["transcript"] = transcripts

# Save to new CSV
final_df.to_csv("Data/video_metadata_with_transcripts.csv", index=False)
print("✅ Transcripts saved to video_metadata_with_transcripts.csv")

Processing video 1: https://www.youtube.com/watch?v=tOaMRG8DX3U
Processing video 2: https://www.youtube.com/watch?v=vteLoWpNw8Q
Processing video 3: https://www.youtube.com/watch?v=7WJ6lmxa1WQ
Processing video 4: https://www.youtube.com/watch?v=fqB-NcZmqXo
Processing video 5: https://www.youtube.com/watch?v=ZYJqkxGrNiI
Processing video 6: https://www.youtube.com/watch?v=kQV6g8Vbbfc
Processing video 7: https://www.youtube.com/watch?v=ThW6lPyYgYk
Processing video 8: https://www.youtube.com/watch?v=XUKTyE2YtHc&list=PLh-u-epknspBswAAKG0EfPHyV6gcVVOhK
Processing video 9: https://www.youtube.com/watch?v=K6z4c256gzI
Processing video 10: https://www.youtube.com/watch?v=it1hcs5S1ks
Processing video 11: https://www.youtube.com/watch?v=x4ZvT7ZmxaI
Processing video 12: https://www.youtube.com/watch?v=Rx65d0ofz8I
Processing video 13: https://www.youtube.com/watch?v=kQV6g8Vbbfc
Processing video 14: https://www.youtube.com/watch?v=it1hcs5S1ks&t=27s
Processing video 15: https://www.youtube.com/watch?v=j_PVU9hJTh8
Processing video 16: https://www.youtube.com/watch?v=mSYdZW_D67o
Processing video 17: https://www.youtube.com/watch?v=eFMeZto6yMg
Processing video 18: https://www.youtube.com/watch?v=KSWNDuKn9t0
Processing video 19: https://www.youtube.com/watch?v=WyQTP0AA1VU
Processing video 20: https://www.youtube.com/watch?v=E6m8UuVhIzw
Processing video 21: https://www.youtube.com/watch?v=_2IG1CX2y6g
Processing video 22: https://www.youtube.com/watch?v=a0fllfx_fmg&list=PLkGSnjw5y2U407_1UQQaVVrD13-MFi5ia&index=22

ERROR: [youtube] VFGAvNxaK4Q: Private video. Sign in if you've been granted access to this video. Use --cookies-from-browser or --cookies for the authentication. See  https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp  for how to manually pass cookies. Also see  https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies  for tips on effectively exporting YouTube cookies


✅ Transcripts saved to video_metadata_with_transcripts.csv
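Transcribing 22 videos in one loop means an interruption loses all progress, and videos that appear twice in the list are transcribed twice. One possible mitigation, sketched here under assumptions, is to cache each transcript in a text file keyed by video ID. The function `cached_transcript`, its `transcribe_fn` parameter, and the `transcripts` directory are hypothetical names introduced for illustration; the demo uses a stand-in transcriber so it runs without Whisper.

```python
import os
import tempfile


def cached_transcript(video_id, audio_path, transcribe_fn, cache_dir="transcripts"):
    """Return the transcript for video_id, calling transcribe_fn only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, f"{video_id}.txt")

    # Cache hit: reuse the transcript saved by an earlier run or iteration
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()

    # Cache miss: transcribe, then persist the text before returning it
    text = transcribe_fn(audio_path)
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


# Demo with a stand-in transcriber that records how often it is called
calls = []


def fake_transcribe(path):
    calls.append(path)
    return f"transcript of {path}"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_transcript("abc123", "audio_files/abc123.mp3", fake_transcribe, cache_dir=tmp)
    second = cached_transcript("abc123", "audio_files/abc123.mp3", fake_transcribe, cache_dir=tmp)

print(first == second, len(calls))
```

In the loop above, the Whisper call could be passed in as `lambda path: model.transcribe(path)["text"]`, so the second pass over a repeated video, or a rerun after a crash, reads from disk instead of transcribing again.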
