## Code for Retrieving Transcripts

This notebook shows how I retrieved the transcripts of music videos on YouTube.

The following code imports the `os` module to interact with the operating system, then retrieves the current working directory with `os.getcwd()` and prints it. Relative file paths, such as the CSV files used below, are resolved against this directory, so printing it confirms where files will be read from and written to.

In [2]:
# Import the os module which provides a way to interact with the operating system, including file management and environment variables.
import os  

# Get the current working directory
current_directory = os.getcwd()

# Print the current working directory
print("Current working directory:", current_directory)

Current working directory: /Users/helgegeurtjacobusmoes/Desktop/Digital Analytics/AB_Test
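Because relative file names such as `"YouTube_Music_Transcript.csv"` are resolved against this working directory, the output file of the later cells ends up there. A minimal sketch with `pathlib` that shows the absolute path a relative file name resolves to; the helper `resolve_output_path` is hypothetical and not part of the original notebook:

```python
from pathlib import Path


def resolve_output_path(filename: str) -> Path:
    """Return the absolute path that a relative file name resolves to.

    Hypothetical helper: open() resolves relative paths against the
    current working directory, which Path.cwd() returns.
    """
    return Path.cwd() / filename


output_path = resolve_output_path("YouTube_Music_Transcript.csv")
print("Transcripts will be written to:", output_path)
print("File already exists:", output_path.exists())
```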


The next code cell is a Python script that collects the transcripts of multiple YouTube videos and organizes them into a structured CSV file for further analysis. Here's how it works:

The script begins by importing the `csv` module for writing CSV files and the `YouTubeTranscriptApi` class from the `youtube_transcript_api` package (Depoix, n.d.) for fetching video transcripts. Next, it defines a list named `video_ids` containing the IDs of the YouTube videos listed in `YouTube_Music.csv`. Then it opens a CSV file named `YouTube_Music_Transcript.csv` in write mode, overwriting any existing file with that name; this file stores the collected transcript data.
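The notebook hard-codes the IDs copied from `YouTube_Music.csv`. A minimal sketch of reading them from the file instead, assuming the export has a header row with a column named `videoId` (an assumption suggested by the comment in the code cell, not confirmed by the source). The function `load_video_ids` and the sample file are hypothetical:

```python
import csv
import tempfile
from pathlib import Path


def load_video_ids(csv_path, column="videoId"):
    """Read video IDs from one column of a CSV file, skipping blank cells.

    Assumes the file has a header row containing `column`
    (the default name "videoId" is an assumption).
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"Column {column!r} not found in {csv_path}")
        ids = []
        for row in reader:
            # Short rows yield None for missing cells, so fall back to ""
            value = (row.get(column) or "").strip()
            if value:
                ids.append(value)
        return ids


# Usage with a small, made-up sample file (not the real YouTube_Music.csv)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "videoId,videoTitle\nsuAR1PYFNYA,First\n4xDzrJKXOOY,Second\n",
        encoding="utf-8",
    )
    print(load_video_ids(sample))
```

With a loader like this, `video_ids = load_video_ids("YouTube_Music.csv")` would replace the hard-coded list, provided the column name matches the actual export.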

After opening the file, the script writes a header row with the column names "Video ID" and "Text". It then iterates over each ID in `video_ids` and attempts to retrieve the transcript with `YouTubeTranscriptApi.get_transcript(video_id)`.
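Because the loop fetches one transcript per list entry, an ID that appears twice in `video_ids` would be fetched and written twice. A small, order-preserving guard, added here only as an illustration; the helper `unique_in_order` is hypothetical and not part of the original script:

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first occurrences.

    dict keys preserve insertion order (Python 3.7+), so dict.fromkeys
    drops repeats without reordering the list.
    """
    return list(dict.fromkeys(items))


ids = ["suAR1PYFNYA", "4xDzrJKXOOY", "suAR1PYFNYA"]
unique_ids = unique_in_order(ids)
print(f"{len(ids) - len(unique_ids)} duplicate(s) removed:", unique_ids)
# prints: 1 duplicate(s) removed: ['suAR1PYFNYA', '4xDzrJKXOOY']
```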

If retrieval succeeds, `get_transcript` returns a list of caption snippets (dictionaries with `text`, `start` and `duration` keys), and the script writes one row per snippet containing the video ID and the snippet's text, so each video spans many rows. If retrieval fails (for example, because subtitles are disabled for the video, as in the output below, or because the ID is invalid), the script prints an error message and writes a single row with the video ID and an empty text field. For every video, the script prints whether its transcript was saved or an error occurred.
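Since each caption snippet becomes its own row and a failed video appears as one row with empty text, later analysis usually needs one text per video. A minimal sketch of reading the output back and joining the snippets; the function `transcripts_by_video` is hypothetical, while the column names match the header written by the script:

```python
import csv
import io
from collections import defaultdict


def transcripts_by_video(csv_file):
    """Join the snippet rows of the transcript CSV into one text per video.

    Expects the header ["Video ID", "Text"] written by the script.
    Videos whose only row has empty text map to an empty string.
    """
    snippets = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Collapse internal whitespace, including newlines inside captions
        text = " ".join(row["Text"].split())
        parts = snippets[row["Video ID"]]  # creates an empty list on first sight
        if text:
            parts.append(text)
    return {video_id: " ".join(parts) for video_id, parts in snippets.items()}


# Usage with an in-memory sample; for the real file use:
# with open("YouTube_Music_Transcript.csv", newline="", encoding="utf-8") as f:
#     transcripts = transcripts_by_video(f)
sample = io.StringIO(
    "Video ID,Text\n"
    "suAR1PYFNYA,first line\n"
    "suAR1PYFNYA,second line\n"
    "4xDzrJKXOOY,\n"
)
print(transcripts_by_video(sample))
# prints: {'suAR1PYFNYA': 'first line second line', '4xDzrJKXOOY': ''}
```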

**References:**

Depoix, J. (n.d.). youtube-transcript-api: This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and it does not require a headless browser, like other selenium based solutions do! (Version 0.6.2) [Python; OS Independent]. Retrieved February 26, 2024, from https://github.com/jdepoix/youtube-transcript-api

YouTube Data Tools. (n.d.). Retrieved February 26, 2024, from https://ytdt.digitalmethods.net/

In [6]:
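# Install the YouTube Transcript API, pinned to 0.6.2 (the release cited in the references),
# because the code below uses the static get_transcript() method of the 0.6.x interface.
# %pip installs into the environment of the running kernel.
%pip install youtube-transcript-api==0.6.2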
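The script below was written against the 0.6.x interface (the references cite version 0.6.2), so it can help to confirm which version the kernel actually imports. A minimal sketch using the standard library's `importlib.metadata`; the helper `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("youtube-transcript-api:", installed_version("youtube-transcript-api"))
```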



In [3]:
# Importing the CSV module
import csv

# Import the YouTubeTranscriptApi module
from youtube_transcript_api import YouTubeTranscriptApi

# List of YouTube video IDs taken from YouTube_Music.csv
video_ids = [
"suAR1PYFNYA",
"4xDzrJKXOOY",
"EwBc8ChPhic",
"u0BhX3tCzAY",
"eL2vQyP6DQE",
"Q7N99EzNQrw",
"dNt1QR1ecuM",
"fW1QcEAy4rg",
"Q3KNV63Viw8",
"YudHcBIxlYw",
"liwI54T39_Y",
"Bk8IQIv7MuQ",
"bui3q02NK8Y",
"3tDPeqLBbbc",
"T1GnpYq7tB0",
"ZiahkY4snMw",
"kixUtwahyHs",
"XhktfrAFPu4",
"xl0NMRAnqbA",
"xnvkStb2jT0",
"4oDE-ZAp114",
"ZKZPjNq3wDw",
"jE9vIWGTB6c",
"gpsqz2kseiY",
"eHQZIyvpfD0",
"_IqI2bf9CCQ",
"OSBan_sH_b8",
"IGm4ax2KVFE",
"YmFasIFdQ2c",
"35kwlY_RR08",
"w1G_6A7xxUA",
"cPj9r2ePJbE",
"H3m_zSBo41Q",
"ZyzYo7UpIUE",
"rJWdfDPZ9Ck",
"tmMOMK5c4EY",
"hAQcodpwsIA",
"AX6OrbgS8lI",
"u-A3nCIvUGs",
"u-HTDnmofqA",
"vEI1ojFxpgg",
"lv5R6C3hz54",
"QEZiuxLsLgk",
"D27Cvj1h_qk",
"QSWYyoF79oE",
"RvE3mp52XQA",
"i_Opkdi1haM",
"qmfr0A3vZQg",
"vy0U8eysF3E",
"c1cRRSozPMw",
"9VGGs5KlW8s",
"B9Bl30pAOws",
"RvhMrB11emk",
"eJf4_zktsCo",
"_tbCnZvU_0Y",
"fuuGyyKCC28",
"ZLD7GjajrVI",
"i1kyii0IYOI",
"7_eXiEbx1e0",
"UoPZ85LCTqE",
"LD5LxJdCiRk",
"QCXxvGgs0fo",
"3asdV5ZQ5IU",
"7Q6ae1TEn3k",
"A8hbw3-D610",
"4kLviL8XwAI",
"-wp98L1wAPY",
"V8SRALQxN3g",
"eGpdyP9r9sU",
"uWoYIOcOpwU",
"CvA8dbVXZ-I",
"eaIazn9z0NA",
"jD97hNDiyvI",
"kdByoaqpL0Q",
"DKrelwjugQA",
"-RQTxqPc5T0",
"-fxZOzAjKHo",
"g7cu2eVV8kA",
"ma3L7xWB_1U",
"ZAp3xJ7GsY8",
"Vgcp2ZGkyvo",
"rWF42VRvsLg",
"u6wOyMUs74I",
"nKqWTKV36FI",
"3dEfax7mkbE",
"WCJMayi-acw",
"qcbMISqPSRc",
"I5pgV81VI6A",
"DZhNgVyIrHw",
"1BGmwYkPpIQ",
"0K8tifZHBsg",
"ID79csvcO0U",
"KrsqPE9SMxo",
"zkk2A9sn0pM",
"flzQWsof8ac",
"IaQEmepUfwQ",
"j76JtR_-bKc",
"Q2O0BePaLSw",
"rB-0bgwZ_3E",
"dglBgJSMr-E",
"-k-UwO4nW2Y",
"oAlXZ6aVf2U",
"dHUq9xJcaZs",
"vmytMK1ZjcY",
"2yeRRm9AIxI",
"_1HGZ_9aRhI",
"3S4oaJZgIoo",
"I-vcBLpYwqs",
"goAcTmjUQ0Y",
"4HVqC4zEPDc",
"gSPoY_vdjaE",
"R7ew_apl2Cc",
"TtA8PPtzDP4",
"kasNvP_zREQ",
"ZUzRfBSp5MI",
"zStYh2eHOWk",
"kZW6g-meigA",
"cTh9rFqCtDo",
"6sveprwSOGE",
"YsdyE9UQvso",
"yOuYY4AL_1U",
"zO4ogqm-WeE",
"ntp9_iznQ-0",
"b60ROURgmRs",
"L8JJernNrS8",
"GDr4Gge6hDM",
"VAnZ8lJtBio",
"Cu7UZWeRccc",
"U-vO38hfbT4",
"NRZIGj3xHkI",
"JU8qWkTkLLE",
"jtRIqBAqMMQ",
"4OUFYyiMgM0",
"BxAdb8K2KDM",
"mJ-tOBn3kz8",
"ws2JLkxiuDc",
"ca8RoyQ1tNo",
"vulK0wDSc24",
"HzOyBRhAQeI",
"TlfiN9EGcYE",
"p6Cnazi_Fi0",
"mW8ATpWd5Wg",
"IiAPEjKsxmw",
"_g1_mG6Ru3M",
"o9KHV7Aa2Zs",
"-yIz7TFqgR4",
"G06aqtUnWt8",
"e_12Apy8LR8",
"Mepu3Brq5Cc",
"Bd8bpV2pxak",
"BlnVP2_dIb4",
"Lae1Moi-h20",
"X9iz1drtOGw",
"MrSgRB01ZpI",
"EGKkju8Ed8E",
"KSXP1DRnTKc",
"Uxc-0yUiK_c",
"7aUZtDaxS60",
"VnjdikpgR0E",
"Mc-wFcieEmU",
"3TWU-Ojz41g",
"uGGQGoht6ic",
"DLA54iSPYA4",
"4FP8yh_ZVO0",
"uZUadAu3bnI",
"jgCVkQhlScc",
"J2_MBahO-z0",
"jXvU5HiapSI",
"sVTy_wmn5SU",
"unEbmAmXhSA",
"ncsYiztio94",
"Nq4Mh_jTubA",
"acfYQmCBsz8",
"5YbLZjoiRho",
"PrqwxkBB0DA",
"9WXsdApQIY4",
"atC_1Bz7Xp0",
"Ky0kzj9sHHk",
"ARAGz6kfEAw",
"u0JS26tiRCA",
"B30TXmgNeS4",
"DsAd2Brhr-M",
"Vijx5tRBCeg",
"BoNj6552p-A",
"aOAjuAI0nJA",
"qUqwbRfSUgM",
"-NECnTb07dE",
"frBT2Jx2cvA",
"BiRvg_zuNhQ",
"aWKtAqIUEl4",
"4bIX8HsLm9c",
"fs6qowdBBhI",
"Pt7DYT2BxyE",
"Z9D16KYgUJE",
"w6KjcavHFrA",
"NzAcKKRES5s",
"msf4LGLOL_k",
"dwY7w0k3j2Y",
"OTMrnPrehh0",
"p71MPBhMfvA",
"kUFJkQ90zV4",
"BbyUNvTz1U4",
"P2_j1P51dNI",
"oze0ZKxGGgk",
"JD-PH_L526Y",
"S7i3ugniyjg",
"76_rqg_eNZs",
"b648g1cDOmg",
"YabbXKzZmiM",
"CpY5x5iCs_o",
"TcA3OG_WAE4",
"hVIw30Az4XM",
"NR1u09i5E6I",
"0SA71vfvog0",
"NYrfcNsnmCQ",
"jZdxiNTL3uk",
"Sx4xVyXHl60",
"rKJFvxORw8Y",
"AIZ0wTd-mlQ",
"teMpjWQHtWQ",
"N3KdmZSJrUA",
"PFFW82dHjyc",
"qWJU_eANW4M",
"2TVXi_9Bvlg",
"_7SUxuxzT0w",
"kqU4iVaZKPE",
"w1rgx0SFx_k",
"L2utKGvxR5U",
"IUbYF0iZfJU",
"VVH002tzdm4",
"AwLy-cMrtzo",
"_LC1gKZ7fws",
"4B4pgDet6jo",
"sTnSd9ZH9dM",
"R4GC5n8A0TE",
"RVRmY9g4fjI",
"s1NPXKoTGy0",
"gBWYm1DZ650",
"PpCZPdLC6Cc",
"x5Oag4hISgU",
"LXdg7qS_jew",
"WJlQ4jt5Fz4",
"krJJibiUPWY",
"-QCVMtrDz1w",
"ApIqVddaw-c",
"eZ4TR7QvFgk",
"xr4S7sL0Ul8",
"RdhwSmAuP5Q",
"AvrHSq2iJlQ",
"Ktu843qdrWE",
"PrZilcsE7HI",
"iLs8t1N8Xkw",
"ZcQVpSf8VD8",
"BLtBiAXVsaM",
"iKdXdeQTMj8",
"WhA9AdNOOfA",
"aOm9n0DljSE",
"QUGe9K99Tdo",
"oUOdKwzuvX0",
"0dT9siTP70Y",
"UiLbRkULcFE",
"fukGbiPuBjU",
"qrjUQQN5ats",
"PqXp8ftYGd8",
"OL7hUAVKlOc",
"KHLchEtMKQg",
"GY12jXHhsnk",
"Cwmy8bpF5X8",
"3zJznDjvPkU",
"IK7T5dbGvKs",
"IbfbnjSHs3A",
"UxdOR8cvhrY",
"lEr8Gfa-hsk",
"1f4oSoDnuzQ",
"rJApc2T54_o",
"B6y5xwNB0NU",
"UOTr_hxctXE",
"IEuHpriOVzE",
"cMFD61PnZrU",
"ID12cv7_YjQ",
"JojwHc1MKag",
"ALZzTBqDvL4",
"QbYu0GQSD7s",
"D5d5xinZI3E",
"keSQGQ2QFVI",
"5nS2I4q_D6U",
"vZpEM9Jn9LA",
"Bz6kI710_GE",
"NVB1A2kUGKs",
"R7M3_9NOA7o",
"HkLEM6hvQlo",
"-_MuvBNSi8I",
"tu4HfcmMn1E",
"41KWKP81Q1k",
"xa07OaZcWGM",
"3Och32-IRG8",
"nAuiHB6azV0",
"QMbg-1kzvCI",
"ZfIZNKRp2-o",
"OJUuut2owlI",
"VZ32rE9iCls",
"1JinnB5Ydtc",
"Z9aMA1XhViw",
"OFYjThwgibc",
"jYK3wNCSmwg",
"Ic4butBku4E",
"8gk5acdsUQA",
"AXlkzZmVJQs",
"rOjbXSry87w",
"0eajPLUwfTk",
"Ryn6ftx7pwU",
"2d5kgQVSa1s",
"QAY6brOwGrA",
"hzXi1jcJH_w",
"EDoCqXNMNt0",
"NUWmCfFHNro",
"mQgIn4eE2Zk",
"la4t4nkRzHI",
"7CVb4U1uONM",
"jDekakK2CR0",
"8qlSDwbdFk4",
"c-fmDQ4TgPQ",
"YZrqS4w0srk",
"Z1_v1XJ9kPw",
"wfqE4fEsEXE",
"JQ0-yO3r0jA",
"fMtnktpF_t0",
"UzhvQZ2sDfo",
"dSh0jJXCd_E",
"AvK43zhcL1Y",
"Yx2zYmmGUNU",
"znwFCeEkRHI",
"T6YVgEpRU6Q",
"vSfffjHjdTk",
"RIJbCb1KrOk",
"UYCWaK6Sbik",
"aW7bzd8uwyQ",
"dD67CA87xno",
"tjChsCs2Rpk",
"p7zEc7FkQpY",
"dxq9iiD0Dt0",
"uYUj39nMPDk",
"UdGMRQg5szM",
"WPG6TWqrdyk",
"VCzoybr6ysI",
"DGmjwTqg_C0",
"YBFc72S1WJc",
"ziVHso7m7ZA",
"KsPZhrvKlYM",
"lfV-EjH8DPE",
"0A8OC6H-_9U",
"cDctZe7pJmA",
"enn8sluAdPg",
"xWLB2fuPFpA",
"icPnn6W_kbM",
"oIFjEUcrOX8",
"2FYoRX5BASM",
"EYygYkYldHw",
"-65sXadCeY0",
"Nlw6a-qhKEI",
"YJ9VzizzB9k",
"hLG5t4EZ_VE",
"8wUNk8YQhh8",
"y6Cz5zcvtZg",
"SGt1RCYkCV8",
"DI5c2b06s_0",
"aEH80t-Fud0",
"Ngx8ZrFcrUg",
"747GnGJvKtI",
"fR8Gp9j3NYc",
"8n9T31oNV6s",
"uObPfXZcr8M",
"bxAjPD6dSUk",
"I0_LIpILMn0",
"pd35v2xe--w",
"JGolv5Pi9Yg",
"h9FjXlsFqzw",
"LieVkjt5lgw",
"rqDRMquMb8g",
"bIX_ouNJsTs",
"WpvN2fDsqsg",
"MEiIJnFZBVc",
"EN1bBi4ZL3Y",
"TOb2NXx_GQo",
"OCxd5G7wtPM",
"EfWmWlW2PvM",
"A4RExZfFcG8",
"GNcX0qqnE2Y",
"FSi2xoKENHk",
"fIR4YF0_B6M",
"Ce3DI_fQGuc",
"H-fetNUKqDM",
"0L6EvgWotfI",
"dB7NsWHCQv4",
"HUo7oiZH1_M",
"9zPl8065b5A",
"cg0yQp7VN28",
"N8cfPrCqvf0",
"NDnyZdpP_EI",
"qlirAdBH8ww",
"BwWBYMaO0UU",
"JIoNOK8e0qc",
"kON9fn01rUQ",
"0thySV8uyL8",
"N2ezdCSB22Q",
"vjbPjLwSXw0",
"hTc2QolErGI",
"gTGsxSfy_Ck",
"l1CQn-7p0Jc",
"KWyvYbfsWMA",
"xFgnnPBlM7E",
"yd8SC9pSId0",
"SuEkOAayWZo",
"glNwa2rM-tI",
"6hNA0SyNIuU",
"ysNDDrG9PtI",
"9WGIHjoz4b8",
"TNmxJ8nL_Wg",
"7IUxhbkYmrM",
"HBluMFFdoPk",
"7LwfID7If0k",
"9L_yE1GZIZE",
"0dxOaoqaz_c",
"a87M40YSUeE",
"_Inis0iagv4",
"FVVl8cfCJws",
"PAKFzFqJa58",
"VV5VKHVuyPY",
"J4z5sVpmnzU",
"afdvb0NnHdE",
"IS41HLeW8-E",
"s0UmdrNwEy8",
"H5S9dvENcho",
"xo2wlMGfgn0",
"qnZ5sMeeONI",
"Fb5jGmMvhtM",
"4NfVz8_Xeq4",
"CvHzWF_iTLo",
"f9zQSZqh7qY",
"H7ng02U0-j0",
"flkdQqfXOq0",
"y64m5o1rMG8",
"yy3bbXuHr9w",
"V1UFl6Vl3Lg",
"f--UUHGg_Mc",
"dviOJyEqO24"
]

# Open the CSV file in write mode
with open("YouTube_Music_Transcript.csv", "w", newline="", encoding="utf-8") as csvfile:
    csvwriter = csv.writer(csvfile)
    # Write the header row
    csvwriter.writerow(["Video ID", "Text"])
    
    # Loop through each video ID
    for video_id in video_ids:
        try:
            # Retrieve the transcript: a list of snippets with "text", "start" and "duration" keys
            transcript = YouTubeTranscriptApi.get_transcript(video_id)
            # Write one row per transcript snippet
            for line in transcript:
                csvwriter.writerow([video_id, line["text"]])
            # Report that the transcript has been saved
            print(f"Transcript for video {video_id} saved.")
        except Exception as e:
            # Report the failure (e.g. subtitles disabled or video unavailable)
            print(f"Failed to retrieve transcript for video {video_id}: {e}")
            # Still record the video, with an empty text field
            csvwriter.writerow([video_id, ""])

Transcript for video suAR1PYFNYA saved.
Failed to retrieve transcript for video 4xDzrJKXOOY: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=4xDzrJKXOOY! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
Failed to retrieve transcript for video EwBc8ChPhic: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=EwBc8ChPhic! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable