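This notebook takes a YouTube playlist URL and downloads each video's audio as a raw .wav file in the format required for Azure. The download cell configures that format as 16 kHz, mono, 16-bit PCM.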
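You can confirm that a downloaded file matches this target by reading its header with Python's standard `wave` module. The sketch below assumes the target is 16 kHz, mono, 16-bit PCM, which are the values passed to FFmpeg in the download cell. Check Azure's current documentation for your specific service before relying on these values. `check_wav_format` is a hypothetical helper name.

```python
import wave

# Assumed target format, mirroring the FFmpeg arguments in the download cell:
# 16 kHz sample rate, 1 channel (mono), 2 bytes per sample (16-bit PCM).
TARGET_RATE = 16000
TARGET_CHANNELS = 1
TARGET_SAMPLE_WIDTH = 2  # bytes per sample

def check_wav_format(path):
    """Return a list of mismatches against the target format (empty list if the file matches)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != TARGET_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz != {TARGET_RATE} Hz")
        if wf.getnchannels() != TARGET_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels != {TARGET_CHANNELS}")
        if wf.getsampwidth() != TARGET_SAMPLE_WIDTH:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples != {8 * TARGET_SAMPLE_WIDTH}-bit")
    return problems
```

To find any file whose header deviates from the target, run `{f: check_wav_format(os.path.join("HoSehBo2", f)) for f in os.listdir("HoSehBo2") if f.endswith(".wav")}` after the download finishes.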

In [None]:
!pip install yt-dlp
!apt-get install -y ffmpeg

Collecting yt-dlp
  Downloading yt_dlp-2024.12.23-py3-none-any.whl.metadata (172 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m172.1/172.1 kB[0m [31m5.1 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading yt_dlp-2024.12.23-py3-none-any.whl (3.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.2/3.2 MB[0m [31m45.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: yt-dlp
Successfully installed yt-dlp-2024.12.23
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
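Before you run the download cell, it helps to confirm that both dependencies are available. yt-dlp's `FFmpegExtractAudio` post-processor calls the external ffmpeg program, so the package alone is not enough. This is a minimal sketch. `missing_tools` is a hypothetical helper and is not part of yt-dlp.

```python
import importlib.util
import shutil

def missing_tools():
    """Return the names of required tools that cannot be found (empty list if all are present)."""
    missing = []
    # Audio extraction shells out to the ffmpeg binary, so it must be on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # The pip package "yt-dlp" is imported as the module "yt_dlp".
    if importlib.util.find_spec("yt_dlp") is None:
        missing.append("yt_dlp")
    return missing

print(missing_tools() or "All tools found")
```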


In [None]:
from yt_dlp import YoutubeDL
from yt_dlp.utils import DownloadError
import os
from tqdm import tqdm

def download_audio_as_wav(playlist_url, output_folder="wav_downloads"):
    # Create the output folder if it does not exist
    os.makedirs(output_folder, exist_ok=True)

    # yt-dlp options: download the best audio stream, then convert it to WAV with FFmpeg
    ydl_opts = {
        'format': 'bestaudio/best',               # Download best available audio
        'postprocessors': [
            {   # Convert audio to WAV using FFmpeg
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'wav',
            }
        ],
        'postprocessor_args': [
            '-ar', '16000',       # Resample to 16 kHz
            '-ac', '1',           # Downmix to mono
            '-sample_fmt', 's16'  # Use 16-bit PCM samples
        ],
        'outtmpl': os.path.join(output_folder, '%(title)s.%(ext)s'),  # Name each file after its video title
        'quiet': False,                           # Show yt-dlp logs
    }

    try:
        # List the playlist's video URLs only (extract_flat skips per-video metadata)
        with YoutubeDL({'quiet': True, 'extract_flat': True, 'no_warnings': True}) as ydl:
            playlist_info = ydl.extract_info(playlist_url, download=False)

        entries = playlist_info.get('entries') or []
        video_links = [entry['url'] for entry in entries if entry]
        if not video_links:
            print("No videos found in the playlist.")
            return

        # Download each video, advancing the progress bar once per video
        print("\nProcessing Playlist:")
        with YoutubeDL(ydl_opts) as ydl:
            with tqdm(total=len(video_links), desc="Playlist Progress", unit="video") as playlist_pbar:
                for youtube_url in video_links:
                    try:
                        ydl.extract_info(youtube_url, download=True)
                    except DownloadError as e:
                        # For example, region-locked or restricted videos
                        print(f"Skipping {youtube_url}: {e}")
                    finally:
                        playlist_pbar.update(1)

    except Exception as e:
        print(f"Error: {e}")

playlist_url = "https://youtube.com/playlist?list=PLvWSXKZvY3P4X1UJAH886Tjeu-wlAB_4z&si=8raawKarXGd8CAww"
download_audio_as_wav(playlist_url, output_folder="HoSehBo2")


Processing Playlist:


Playlist Progress:   0%|          | 0/16 [00:00<?, ?video/s]

[youtube] Extracting URL: https://www.youtube.com/watch?v=WWAGf8wRDjU
[youtube] WWAGf8wRDjU: Downloading webpage
[youtube] WWAGf8wRDjU: Downloading ios player API JSON
[youtube] WWAGf8wRDjU: Downloading mweb player API JSON
[youtube] WWAGf8wRDjU: Downloading m3u8 information
[info] WWAGf8wRDjU: Downloading 1 format(s): 251
[download] Destination: HoSehBo2/《好世谋2》第一集– “Ho Seh Bo 2” Episode 1.webm
[download] 100% of   38.23MiB in 00:00:05 at 7.36MiB/s   
[ExtractAudio] Destination: HoSehBo2/《好世谋2》第一集– “Ho Seh Bo 2” Episode 1.wav
Deleting original file HoSehBo2/《好世谋2》第一集– “Ho Seh Bo 2” Episode 1.webm (pass -k to keep)
[youtube] Extracting URL: https://www.youtube.com/watch?v=el8SKU9fRpQ
[youtube] el8SKU9fRpQ: Downloading webpage
[youtube] el8SKU9fRpQ: Downloading ios player API JSON
[youtube] el8SKU9fRpQ: Downloading mweb player API JSON
[youtube] el8SKU9fRpQ: Downloading m3u8 information
[info] el8SKU9fRpQ: Downloading 1 format(s): 251
[download] Destination: HoSehBo2/《好世谋2》第二集– “Ho Seh B

Playlist Progress:   0%|          | 0/16 [05:29<?, ?video/s]
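The playlist step relies on `extract_flat`, which returns lightweight entries whose `url` field points at each video. Playlists can contain deleted or private videos, and their entries may come back as `None` or without a URL. A more defensive link extraction might look like the sketch below. `playlist_video_urls` is a hypothetical helper. The dict in the test is hand-built and only stands in for real yt-dlp output.

```python
def playlist_video_urls(playlist_info):
    """Collect video URLs from a flat-extracted playlist info dict.

    Entries that are None or have no 'url' are skipped
    instead of raising TypeError/KeyError.
    """
    urls = []
    for entry in playlist_info.get("entries") or []:
        if entry and entry.get("url"):
            urls.append(entry["url"])
    return urls
```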


In [None]:
# Download the zipped folder to the local machine (uncomment to run)

# !zip -r HoSehBo.zip HoSehBo
# from google.colab import files
# files.download("HoSehBo.zip")

  adding: EatAlready/ (stored 0%)
  adding: EatAlready/《吃饱没？4》 第十集 ＂Eat Already？ 4＂ Episode 10.wav (deflated 15%)
  adding: EatAlready/《吃饱没？4》 第五集 ＂Eat Already？ 4＂ Episode 5.wav (deflated 13%)
  adding: EatAlready/《吃饱没？》(Eat Already？) Episode 3.wav (deflated 15%)
  adding: EatAlready/《吃饱没？4》 第九集 ＂Eat Already？ 4＂ Episode 9.wav (deflated 13%)
  adding: EatAlready/“Eat Already？ 2” Episode 1 -《吃饱没？2》第一集.wav (deflated 14%)
  adding: EatAlready/《吃饱没？3》第七集 - “Eat Already？ 3” Episode 7.wav (deflated 14%)
  adding: EatAlready/“Eat Already？ 2” Episode 7 《吃饱没？2》第七集.wav (deflated 16%)
  adding: EatAlready/＂Eat Already？ 2＂ Episode 6 《吃饱没？ 2》 第六集.wav (deflated 16%)
  adding: EatAlready/《吃饱没？3》第三集  – “Eat Already？ 3” Episode 3.wav (deflated 13%)
  adding: EatAlready/＂Eat Already？ 2＂ Episode 4《吃饱没？2》第四集.wav (deflated 16%)
  adding: EatAlready/《吃饱没？》(Eat Already？) Episode 1.wav (deflated 10%)
  adding: EatAlready/《吃饱没？》(Eat Already？) Episode 6.wav (deflated 15%)
  adding: EatAlready/《吃饱没？4》第六集 ＂Eat Alr

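If the `!zip` shell command is unavailable (for example, outside Colab), Python's `shutil.make_archive` can produce an equivalent archive. This sketch assumes you want the folder name kept as the top-level directory inside the zip, which is what `zip -r HoSehBo.zip HoSehBo` does. `zip_folder` is a hypothetical helper.

```python
import os
import shutil

def zip_folder(folder, archive_base=None):
    """Zip `folder` (kept as the top-level entry) and return the path of the .zip file."""
    folder = os.path.abspath(folder)
    parent, name = os.path.split(folder)
    # By default the archive sits next to the folder: HoSehBo2 -> HoSehBo2.zip
    archive_base = archive_base or folder
    # root_dir=parent plus base_dir=name stores entries as "HoSehBo2/<file>.wav"
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)
```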

In [None]:
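# Zip the output folder and move the archive into Google Drive

!zip -r HoSehBo2.zip HoSehBo2
from google.colab import drive
drive.mount('/content/drive')  # Asks for authorization on the first run

import os
import shutil
source = '/content/HoSehBo2.zip'
destination = '/content/drive/MyDrive/NYP/Year2/AI Engineering Project/AIEP [shared]/wav/HoSehBo2.zip'
os.makedirs(os.path.dirname(destination), exist_ok=True)  # Create the Drive folder if it is missing
shutil.move(source, destination)
print(f"File moved to {destination}")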

  adding: HoSehBo2/ (stored 0%)
  adding: HoSehBo2/《好世谋2》第三集– “Ho Seh Bo 2” Episode 3.wav (deflated 18%)
  adding: HoSehBo2/《好世谋2》第十三集 - “Ho Seh Bo 2” Episode 13.wav (deflated 16%)
  adding: HoSehBo2/《好世谋2》第七集– “Ho Seh Bo 2” Episode 7.wav (deflated 15%)
  adding: HoSehBo2/好世谋2第九集 - Ho Seh Bo 2 Episode 9.wav (deflated 15%)
  adding: HoSehBo2/《好世谋2》第十六集 - “Ho Seh Bo 2” Episode 16.wav (deflated 16%)
  adding: HoSehBo2/《好世谋2》第一集– “Ho Seh Bo 2” Episode 1.wav (deflated 17%)
  adding: HoSehBo2/《好世谋2》第二集– “Ho Seh Bo 2” Episode 2.wav (deflated 18%)
  adding: HoSehBo2/《好世谋2》第四集– “Ho Seh Bo 2” Episode 4.wav (deflated 17%)
  adding: HoSehBo2/《好世谋2》第十集– “Ho Seh Bo 2” Episode 10.wav (deflated 15%)
  adding: HoSehBo2/《好世谋2》第五集– “Ho Seh Bo 2” Episode 5.wav (deflated 18%)
  adding: HoSehBo2/《好世谋2》第十四集 - “Ho Seh Bo 2” Episode 14.wav (deflated 17%)
  adding: HoSehBo2/《好世谋2》第八集– “Ho Seh Bo 2” Episode 8.wav (deflated 16%)
  adding: HoSehBo2/《好世谋2》第六集– “Ho Seh Bo 2” Episode 6.wav (deflated 15%)
  adding: Ho
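The progress bar reported 16 videos in this playlist. Before relying on the copy in Drive, you can list the `.wav` entries in the archive to spot episodes that failed to download. This is a minimal sketch using the standard `zipfile` module. `list_wavs_in_zip` is a hypothetical helper.

```python
import zipfile

def list_wavs_in_zip(zip_path):
    """Return the sorted .wav entry names in a zip archive (directories and other files are ignored)."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".wav"))
```

For example, `len(list_wavs_in_zip(destination))` can be compared with the playlist's video count. A smaller number means some episodes were skipped.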