# Dazbo's YouTube Demos

## Overview

Examples of how to work with YouTube using Python.

A few useful notes:

- The source for this notebook lives in my GitHub repo, <a href="https://github.com/derailed-dash/dazbo-python-demos" target="_blank">Dazbo-Python-Demos</a>.
- For further guidance, including tips on how to run the notebook, see the project's `README.md`.
- For example, you could...
  - Run the notebook locally, in your own Jupyter environment.
  - Run the notebook in a cloud-based Jupyter environment, with no setup required on your part!  For example, <a href="https://colab.research.google.com/github/derailed-dash/dazbo-python-demos/blob/main/notebooks/youtube-demos.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Google Colab"/></a>
- **To run the notebook, first execute the cells in the [Setup](#Setup) section, as described below.** Then you can experiment with any of the subsequent cells.

## Setup

In [None]:
%pip install --upgrade --no-cache-dir dazbo-commons pytubefix yt_dlp moviepy

The cell above installs the packages this notebook depends on.

Notes:
- [dazbo-commons](https://pypi.org/project/dazbo-commons/): my own utils library, which includes coloured logging, and a Locations class for handling input and output paths.
- [pytubefix](https://github.com/JuanBindez/pytubefix): for downloading YouTube videos, extracting audio, and more. This is a community-maintained fork of `pytube`, created to ship quick fixes when changes on YouTube's side break `pytube`.
- `yt_dlp`: the yt-dlp downloader, a fork of `youtube-dl`.
- `moviepy`: a video editing library, used here to extract the audio track from a downloaded video.

In [2]:
import logging
import re
from pathlib import Path
from dataclasses import dataclass
import dazbo_commons as dc

Now we'll set up logging, using the coloured logger from `dazbo-commons`. Feel free to change the logging level.

In [None]:
# Set up logging
APP_NAME = "dazbo_yt-demos"
logger = dc.retrieve_console_logger(APP_NAME)
logger.setLevel(logging.DEBUG)
logger.info("Logger initialised.")
logger.debug("DEBUG level logging enabled.")
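
If `dazbo-commons` isn't available, a plain standard-library logger is a workable substitute. The following is a minimal sketch, not how `dc.retrieve_console_logger()` is implemented: the helper name `get_plain_console_logger` is hypothetical, there is no colour, and the format string only approximates the timestamped output shown in this notebook.

```python
import logging

def get_plain_console_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """ Hypothetical stand-in for a console logger, built on the standard library only. """
    plain_logger = logging.getLogger(name)
    plain_logger.setLevel(level)
    plain_logger.propagate = False  # avoid duplicate lines via the root logger

    # Only attach a handler once, so re-running the cell doesn't duplicate output
    if not plain_logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            fmt="%(asctime)s.%(msecs)03d:%(name)s - %(levelname)s: %(message)s",
            datefmt="%H:%M:%S",
        ))
        plain_logger.addHandler(handler)

    return plain_logger

demo_logger = get_plain_console_logger("demo_plain_logger", logging.DEBUG)
demo_logger.info("Logger initialised.")
demo_logger.debug("DEBUG level logging enabled.")
```

Guarding on `plain_logger.handlers` matters in a notebook, where the same cell is often run several times.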

Here we initialise some file path locations, e.g. an output folder.

In [None]:
locations = dc.get_locations(APP_NAME)
for attribute, value in vars(locations).items():
    logger.debug(f"{attribute}: {value}")
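
To make the idea concrete, here is a small sketch of a locations object with an output folder that is created on demand. It is an illustration only: `DemoLocations` and `make_demo_locations` are hypothetical names, and the `<base>/<app name>/output` layout is an assumption based on the paths that appear in this notebook's output, not a description of how `dc.get_locations()` works internally.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

@dataclass
class DemoLocations:
    """ Hypothetical stand-in for the object returned by dc.get_locations(). """
    output_dir: Path

def make_demo_locations(base_dir: Path, app_name: str) -> DemoLocations:
    """ Build <base_dir>/<app_name>/output and make sure it exists. """
    output_dir = base_dir / app_name / "output"
    output_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return DemoLocations(output_dir=output_dir)

# Use a temporary directory so the sketch doesn't touch your real folders
demo_locations = make_demo_locations(Path(tempfile.mkdtemp()), "dazbo_yt-demos")
for attribute, value in vars(demo_locations).items():
    print(f"{attribute}: {value}")
```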

Now some utility functions.

In [5]:
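def clean_filename(filename):
    """ Create a clean filename by replacing disallowed characters with underscores. """
    # Keep letters, digits, dots, underscores, whitespace and hyphens; replace everything else
    pattern = r'[^a-zA-Z0-9._\s-]'
    return re.sub(pattern, '_', filename)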
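
As a quick check of what `clean_filename()` does, the sketch below repeats the function so it runs on its own, then applies it to one of the video titles used later in this notebook and to a made-up title (`"AC/DC: Live?"` is a hypothetical example, not one of the videos here).

```python
import re

def clean_filename(filename):
    """ Replace any character that isn't a letter, digit, dot, underscore, whitespace or hyphen. """
    pattern = r'[^a-zA-Z0-9._\s-]'
    return re.sub(pattern, '_', filename)

# Brackets and commas become underscores; spaces and hyphens are kept
print(clean_filename("Sigrid - Burning Bridges (up close, acoustic)"))
# Sigrid - Burning Bridges _up close_ acoustic_

# Path separators and other characters that are awkward in filenames are replaced too
print(clean_filename("AC/DC: Live?"))
# AC_DC_ Live_
```

Note that each disallowed character is replaced one-for-one, so the cleaned name is always the same length as the original title.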

## Downloading Videos and Extracting Audio

In [14]:
# YouTube videos to download
urls = [
    "https://www.youtube.com/watch?v=ZL29msEpqOQ",  # Sigrid
    "bla", # Test a bad URL
    "https://www.youtube.com/watch?v=CiTn4j7gVvY",  # Melissa - I Believe
    # "https://www.youtube.com/watch?v=kcgooI7aq3c",  # Jerry and Julia
]

### With PyTubeFix and MoviePy

In [15]:
from pytubefix import YouTube
from pytubefix.cli import on_progress
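# Import path for MoviePy 1.x. MoviePy 2.x removed `moviepy.editor`; there, use `from moviepy import VideoFileClip`.
from moviepy.editor import VideoFileClip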

for i, url in enumerate(urls):
    logger.info(f"Downloads progress: {i+1}/{len(urls)}")

    try:
        yt = YouTube(url, on_progress_callback=on_progress)
        logger.info(f"Getting: {yt.title}")
        stream = yt.streams.get_highest_resolution()
        if not stream:
            raise Exception("Stream not available.")
           
        cleaned = clean_filename(yt.title)
        video_output = f"{locations.output_dir}/{cleaned}.mp4"
        logger.info(f"Downloading video {cleaned}.mp4 ...")
        stream.download(output_path=locations.output_dir, filename=f"{cleaned}.mp4")
        logger.debug("Downloaded")
    
        logger.info(f"Creating audio from {cleaned}.mp4 ...")
        video = VideoFileClip(video_output) # open the video so we can access its audio track
        video.audio.write_audiofile(f"{locations.output_dir}/{cleaned}.mp3")
        video.close()
        logger.info("Done")
    except Exception as e:        
        logger.error(f"Error processing URL '{url}'.")
        logger.error(f"The cause was: {e}") 
        
logger.info(f"Downloads finished. Check out files at {locations.output_dir}.")

16:18:52.410:dazbo_yt-demos - INF: Downloads progress: 1/3
16:18:52.611:dazbo_yt-demos - INF: Getting: Sigrid - Burning Bridges (up close, acoustic)
16:18:52.612:dazbo_yt-demos - INF: Downloading video Sigrid - Burning Bridges _up close_ acoustic_.mp4 ...
16:18:52.685:dazbo_yt-demos - DBG: Downloaded
16:18:52.686:dazbo_yt-demos - INF: Creating audio from Sigrid - Burning Bridges _up close_ acoustic_.mp4 ...

MoviePy - Writing audio in /home/jovyan/dazbo_yt-demos/output/Sigrid - Burning Bridges _up close_ acoustic_.mp3

16:18:55.175:dazbo_yt-demos - INF: Done
16:18:55.175:dazbo_yt-demos - INF: Downloads progress: 2/3
16:18:55.176:dazbo_yt-demos - ERR: Error processing URL 'bla'.
16:18:55.176:dazbo_yt-demos - ERR: The cause was: regex_search: could not find match for (?:v=|\/)([0-9A-Za-z_-]{11}).*
16:18:55.176:dazbo_yt-demos - INF: Downloads progress: 3/3
16:18:55.320:dazbo_yt-demos - INF: Getting: Wolfenstein: The New Order - I Believe - Melissa Hollick (Official Ending Song)
16:18:55.321:dazbo_yt-demos - INF: Downloading video Wolfenstein_ The New Order - I Believe - Melissa Hollick _Official Ending Song_.mp4 ...
16:18:55.326:dazbo_yt-demos - DBG: Downloaded
16:18:55.327:dazbo_yt-demos - INF: Creating audio from Wolfenstein_ The New Order - I Believe - Melissa Hollick _Official Ending Song_.mp4 ...

MoviePy - Done.
MoviePy - Writing audio in /home/jovyan/dazbo_yt-demos/output/Wolfenstein_ The New Order - I Believe - Melissa Hollick _Official Ending Song_.mp3

16:19:00.150:dazbo_yt-demos - INF: Done
16:19:00.151:dazbo_yt-demos - INF: Downloads finished. Check out files at /home/jovyan/dazbo_yt-demos/output.

MoviePy - Done.
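
The `bla` entry fails with a `regex_search` error because no 11-character video ID can be found in it. To skip such entries before attempting a download, a pre-check along these lines might help. This is a sketch under assumptions: the pattern is copied from the error message in the output above, the helper name `extract_video_id` is hypothetical, and it is not a full URL validator (any string containing `/` or `v=` followed by 11 ID-like characters will pass).

```python
import re
from typing import Optional

# Pattern taken from the error message logged for the bad URL
VIDEO_ID_PATTERN = re.compile(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*")

def extract_video_id(url: str) -> Optional[str]:
    """ Return the 11-character video ID if the pattern matches, otherwise None. """
    match = VIDEO_ID_PATTERN.search(url)
    return match.group(1) if match else None

candidates = [
    "https://www.youtube.com/watch?v=ZL29msEpqOQ",
    "bla",
    "https://www.youtube.com/watch?v=CiTn4j7gVvY",
]

for candidate in candidates:
    video_id = extract_video_id(candidate)
    if video_id is None:
        print(f"Skipping {candidate!r}: no video ID found")
    else:
        print(f"{candidate!r} has video ID {video_id}")
```

A check like this only filters out obviously malformed input. A well-formed URL can still fail later, for example if the video is unavailable, so the `try`/`except` around the download is still needed.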


### With PyTubeFix Alone

In [None]:

from pytubefix import YouTube
from pytubefix.cli import on_progress

for i, url in enumerate(urls):
    logger.info(f"Downloads progress: {i+1}/{len(urls)}")

    try:
        yt = YouTube(url, on_progress_callback=on_progress)
        logger.info(f"Getting: {yt.title}")
        stream = yt.streams.get_highest_resolution()
        if not stream:
            raise Exception("Stream not available.")
           
        cleaned = clean_filename(yt.title)
        logger.info(f"Downloading video {cleaned}.mp4 ...")
        stream.download(output_path=locations.output_dir, filename=f"{cleaned}.mp4")
        logger.debug("Downloaded")

        # Fetch the audio directly with pytubefix, rather than extracting it with MoviePy.
        # Assumption: the audio-only stream uses the default MP4 container, so we save it as .m4a, not .mp3.
        logger.info(f"Downloading audio {cleaned}.m4a ...")
        audio_stream = yt.streams.get_audio_only()
        if not audio_stream:
            raise Exception("Audio stream not available.")
        audio_stream.download(output_path=locations.output_dir, filename=f"{cleaned}.m4a")
        logger.info("Done")
    except Exception as e:        
        logger.error(f"Error processing URL '{url}'.")
        logger.error(f"The cause was: {e}") 
        
logger.info(f"Downloads finished. Check out files at {locations.output_dir}.")