# Dazbo's YouTube Demos

## Overview

Examples of how to work with YouTube using Python.

A few useful notes:

- The source for this notebook lives in my GitHub repo, <a href="https://github.com/derailed-dash/dazbo-python-demos" target="_blank">Dazbo-Python-Demos</a>.
- For further guidance, including tips on how to run the notebook, see the project's `README.md`.
- For example, you could...
  - Run the notebook locally, in your own Jupyter environment.
  - Run the notebook in a cloud-based Jupyter environment, with no setup required on your part!  For example, <a href="https://colab.research.google.com/github/derailed-dash/dazbo-python-demos/blob/main/notebooks/youtube-demos.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Google Colab"/></a>
- **To run the notebook, first execute the cells in the [Setup](#Setup) section, as described below.** Then you can experiment with any of the subsequent cells.

## Setup

First, let's install any dependent packages:

In [None]:
%pip install --upgrade --no-cache-dir dazbo-commons pytubefix yt_dlp

In [23]:
import logging
import re
from pathlib import Path
from dataclasses import dataclass
import dazbo_commons as dc

Now we'll set up logging. Here I'm using coloured logging from my [dazbo-commons](https://pypi.org/project/dazbo-commons/) package. Feel free to change the logging level.

In [None]:
# Setup logging
APP_NAME="dazbo-yt-demos"
logger = dc.retrieve_console_logger(APP_NAME)
logger.setLevel(logging.DEBUG)
logger.info("Logger initialised.")
logger.debug("DEBUG level logging enabled.")

Here we initialise some file path locations, e.g. an output folder.

In [None]:
locations = dc.get_locations(APP_NAME)
for attribute, value in vars(locations).items():
    logger.debug(f"{attribute}: {value}")

Now some utility functions.

In [26]:
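def clean_filename(filename):
    """ Create a clean filename by replacing disallowed characters with underscores. """
    # Keep ASCII letters, digits, '.', '_', '-' and whitespace; replace everything else
    pattern = r'[^a-zA-Z0-9._\s-]'
    return re.sub(pattern, '_', filename)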
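
To see what this function does to a typical title, here is a small illustration. The sample titles are made up for the example, and the function is repeated so that the snippet runs on its own.

```python
import re

def clean_filename(filename):
    """ Replace any character other than ASCII letters, digits, '.', '_', '-' and whitespace with '_'. """
    pattern = r'[^a-zA-Z0-9._\s-]'
    return re.sub(pattern, '_', filename)

# Hypothetical titles, chosen only to show the substitution
samples = ["AC/DC: Live?", "Teresa & Maria", "plain-name_01.mp4"]
cleaned_samples = [clean_filename(title) for title in samples]

for original, cleaned in zip(samples, cleaned_samples):
    print(f"{original!r} -> {cleaned!r}")
```

Note that spaces are kept, and that accented or non-Latin letters are also replaced, because the pattern only allows ASCII letters.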

## Downloading Videos and Extracting Audio

In [36]:
# YouTube videos to download
urls = [
    "https://www.youtube.com/watch?v=udRAIF6MOm8",  # Sigrid - Burning Bridges
    "bla", # Test a bad URL
    "https://www.youtube.com/watch?v=CiTn4j7gVvY",  # Melissa Hollick - I Believe
    "https://www.youtube.com/watch?v=d4N82wPpdg8",  # Jerry Heil & Alyona Alyona - Teresa & Maria
]
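
The list deliberately includes a bad URL, so that we can see how each library reports the failure. If you would rather filter such entries out before calling a library, a rough pre-check could look like the sketch below. The helper name `looks_like_youtube_watch_url` is my own invention for illustration, and it only recognises `youtube.com` watch URLs with a `v` parameter. It does not handle short `youtu.be` links, playlists, or channels.

```python
from urllib.parse import urlparse, parse_qs

def looks_like_youtube_watch_url(url: str) -> bool:
    """ Hypothetical helper: True if url is http(s), on a youtube.com host, with a 'v' query parameter. """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return False
    # parse_qs returns a dict of lists; an absent or empty 'v' fails the check
    return bool(parse_qs(parsed.query).get("v"))

candidate_urls = [
    "https://www.youtube.com/watch?v=udRAIF6MOm8",
    "bla",
]
checks = {url: looks_like_youtube_watch_url(url) for url in candidate_urls}
for url, ok in checks.items():
    print(f"{url}: {'looks fine' if ok else 'skipping'}")
```

This is only a syntactic check. A URL that passes can still fail to download, so the `try`/`except` blocks in the cells below are still needed.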

### With PyTubeFix

Here I'll use the [pytubefix](https://github.com/JuanBindez/pytubefix) library to download YouTube videos, and then to download mp3 audio-only streams as files.

This library is a community-maintained fork of `pytube`. It was created to provide quick fixes for issues in the original `pytube` library, particularly when YouTube updates break it.

Pros:

- The library is very easy to use.
- We can work with video, audio, channels, playlists, and even search and filter.
- It is [well documented](https://pytubefix.readthedocs.io/en/latest/).
- It can be used from the command line, with its simple CLI.
- It is VERY FAST!

Cons:

- It does not offer some of the more sophisticated capabilities of `yt_dlp`.

In [None]:

from pytubefix import YouTube
from pytubefix.cli import on_progress

videos = []
audios = []

for i, url in enumerate(urls):
    logger.info(f"Downloads progress: {i+1}/{len(urls)}")

    try:
        yt = YouTube(url, on_progress_callback=on_progress)
        logger.info(f"Getting: {yt.title}")
        video_stream = yt.streams.get_highest_resolution()
        if not video_stream:
            raise Exception("Stream not available.")
        
        # YouTube resource titles may contain special characters which 
        # can't be used when saving the file. So we need to clean the filename.
        cleaned = clean_filename(yt.title)
        output_locn = f"{locations.output_dir}/pytubefix"
        
        video_output = f"{output_locn}/{cleaned}.mp4"
        logger.info(f"Downloading video {cleaned}.mp4 ...")
        video_stream.download(output_path=output_locn, filename=f"{cleaned}.mp4")
        videos.append(video_output)

        logger.info("Creating audio...")
        audio_stream = yt.streams.get_audio_only()
        if not audio_stream:
            raise Exception("Audio stream not available.")
        audio_stream.download(output_path=output_locn, filename=cleaned, mp3=True)
        audios.append(f"{output_locn}/{cleaned}.mp3")
        
        logger.info("Done")
        
    except Exception as e:        
        logger.error(f"Error processing URL '{url}'.")
        logger.error(f"The cause was: {e}") 
        
logger.info("Downloads finished.")
for video in videos:
    logger.info(video)
for audio in audios:
    logger.info(audio)
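
The cell above builds its output paths with f-strings. As an alternative, here is a minimal sketch that does the same with `pathlib`, which is already imported in Setup. The helper name `build_output_paths` is hypothetical, and a temporary folder stands in for `locations.output_dir` so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

def build_output_paths(output_dir, cleaned_title):
    """ Hypothetical helper: create the pytubefix subfolder and return (video_path, audio_path). """
    output_locn = Path(output_dir) / "pytubefix"
    output_locn.mkdir(parents=True, exist_ok=True)  # No error if the folder already exists
    # Append the extension to the name, rather than using with_suffix(),
    # because a cleaned title may itself contain dots
    return output_locn / f"{cleaned_title}.mp4", output_locn / f"{cleaned_title}.mp3"

with tempfile.TemporaryDirectory() as tmp_dir:
    video_path, audio_path = build_output_paths(tmp_dir, "Some_Title")
    folder_created = video_path.parent.is_dir()
    print(video_path.name, audio_path.name, folder_created)
```

The resulting `Path` objects can be converted with `str()` wherever a library expects a plain string.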


### With YT_DLP

I also wanted to try the other popular YouTube package: [yt-dlp](https://pypi.org/project/yt-dlp/). Its [repo](https://github.com/yt-dlp/yt-dlp) is a fork of the now unmaintained `youtube-dl`.

Pros:

- It is very powerful, with far more options and features than `pytubefix`.
- It can be installed as a standalone command-line executable, or as a pip-installable Python package.

Cons:

- It is more complicated to use.
- The documentation is complex, and there's no real Python-specific documentation.
- It depends on having ffmpeg installed for many use cases.
- It is significantly slower than `pytubefix` for performing video download and audio extraction.
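
Since audio extraction relies on ffmpeg, it can be worth checking that it is installed before running the download cell. Here is a minimal sketch using only the standard library. The helper name `ffmpeg_available` is my own, and the check assumes ffmpeg is expected to be on the `PATH`.

```python
import shutil

def ffmpeg_available() -> bool:
    """ Return True if an 'ffmpeg' executable can be found on the PATH. """
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("ffmpeg found; audio extraction should be possible.")
else:
    print("ffmpeg not found on PATH; install it before extracting audio.")
```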


In [None]:
import yt_dlp

for i, url in enumerate(urls):
    logger.info(f"Downloads progress: {i+1}/{len(urls)}")

    try:
        # Options for downloading the video
        output_locn = f"{locations.output_dir}/yt_dlp"
        
        video_opts = {
            'format': 'best',  # Download the best quality video
            'outtmpl': f'{output_locn}/%(title)s.%(ext)s',  # Save video in output directory
        }
        
        # Download the video
        with yt_dlp.YoutubeDL(video_opts) as ydl:
            logger.info("Downloading video...")
            ydl.download([url])
        
        # Options for extracting audio and saving as MP3
        audio_opts = {
            'format': 'bestaudio',  # Download the best quality audio
            'outtmpl': f'{output_locn}/%(title)s.%(ext)s',  # Save audio in output directory
            'postprocessors': [{
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'mp3',
            }],
        }
        
        # Download and extract audio
        with yt_dlp.YoutubeDL(audio_opts) as ydl:
            logger.info("Extracting and saving audio as MP3...")
            ydl.download([url])
        
    except Exception as e:        
        logger.error(f"Error processing URL '{url}'.")
        logger.error(f"The cause was: {e}") 
        
logger.info(f"Downloads finished. Check out files at {output_locn}.")
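
The two options dictionaries in the cell above differ only in their `format` value and in the audio postprocessor. One way to avoid the repetition is a small helper, sketched below. The name `build_yt_dlp_opts` and the `"output/yt_dlp"` folder are hypothetical; the option keys and values are the same ones used above.

```python
def build_yt_dlp_opts(output_locn, audio_only=False):
    """ Hypothetical helper: return a yt_dlp options dict for video, or for MP3 audio. """
    opts = {
        'format': 'bestaudio' if audio_only else 'best',
        'outtmpl': f'{output_locn}/%(title)s.%(ext)s',  # Filename template, filled in by yt_dlp
    }
    if audio_only:
        # Ask yt_dlp to convert the downloaded audio to MP3 (this step needs ffmpeg)
        opts['postprocessors'] = [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
        }]
    return opts

video_opts = build_yt_dlp_opts("output/yt_dlp")
audio_opts = build_yt_dlp_opts("output/yt_dlp", audio_only=True)
print(video_opts)
print(audio_opts)
```

Each dictionary can then be passed to `yt_dlp.YoutubeDL(...)` exactly as in the cell above.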