# Downloading Clips from the MusicCaps Dataset

In this notebook, we use `yt-dlp` to download clips from Google's [MusicCaps](https://huggingface.co/datasets/google/MusicCaps) dataset. MusicCaps pairs music clips with text captions. You could use a dataset like this to train a text-to-audio generation model 😉.
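
To make the download step concrete, here is a minimal sketch of how one dataset row could be turned into a `yt-dlp` invocation. The row values are made up for illustration, and `build_command` is a hypothetical helper that is not part of this notebook. The flags are the same ones `download_clip` uses further down.

```python
def build_command(ytid, start_s, end_s, output_filename,
                  url_base="https://www.youtube.com/watch?v="):
    """Return the yt-dlp argument list for one clip (hypothetical helper)."""
    return [
        "yt-dlp", "--quiet", "--no-warnings",
        "-x", "--audio-format", "wav",      # extract audio and convert to WAV
        "-f", "bestaudio",                  # pick the best audio-only stream
        "-o", output_filename,              # where to write the clip
        "--download-sections", f"*{start_s}-{end_s}",  # only this time range
        f"{url_base}{ytid}",
    ]


# Made-up row that uses the column names this notebook reads from MusicCaps.
row = {"ytid": "abc123XYZ_0", "start_s": 30, "end_s": 40, "caption": "placeholder caption"}
cmd = build_command(row["ytid"], row["start_s"], row["end_s"],
                    f"/tmp/musiccaps/{row['ytid']}.wav")
print(" ".join(cmd))
```

Building the command as a list means it can be handed to `subprocess.run(cmd)` without `shell=True`, which sidesteps shell quoting of the output path.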

Once we've downloaded the clips, we'll explore them using a [Gradio](https://gradio.app/) interface.

If you like this notebook:

  - consider giving the [repo](https://github.com/nateraw/download-musiccaps-dataset) a star ⭐️
  - consider following me on GitHub [@nateraw](https://github.com/nateraw)

In [None]:
%%capture
! pip install datasets[audio] yt-dlp

# For the interactive interface we'll need gradio
! pip install gradio

In [None]:

from google.colab import drive, files
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
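import os
import shutil
import subprocess
from pathlib import Path

import librosa
import librosa.display  # needed for librosa.display.specshow
import matplotlib.pyplot as plt  # needed for the plt.* calls in process()
import numpy as np
from datasets import load_dataset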


def download_clip(
    video_identifier,
    output_filename,
    start_time,
    end_time,
    tmp_dir='/tmp/musiccaps',
    num_attempts=5,
    url_base='https://www.youtube.com/watch?v='
):
    # Build the yt-dlp command: extract audio only, convert it to WAV, and
    # download just the [start_time, end_time] section of the video.
    command = f"""
        yt-dlp --quiet --no-warnings -x --audio-format wav -f bestaudio -o "{output_filename}" --download-sections "*{start_time}-{end_time}" {url_base}{video_identifier}
    """.strip()

    # Retry up to num_attempts times before giving up.
    for attempt in range(num_attempts):
        try:
            subprocess.check_output(command, shell=True,
                                    stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as err:
            if attempt == num_attempts - 1:
                return False, err.output
        else:
            break

    # Check if the clip was successfully saved.
    status = os.path.exists(output_filename)
    return status, 'Downloaded'


def main(
    spectrogram_dir: str,
    limit: int = None,
    num_proc: int = 1,
    writer_batch_size: int = 1000,
    keep_in_memory: bool = False
):
    """
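    Download the clips within the MusicCaps dataset from YouTube and store their spectrograms.
    Args:
        spectrogram_dir: Directory to save the spectrograms to.
        limit: Limit the number of examples to download.
        num_proc: Number of worker processes passed to `ds.map`.
        writer_batch_size: Number of examples each worker holds in memory before writing.
        keep_in_memory: Whether `ds.map` keeps the result in memory instead of writing a cache file.
    """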

    ds = load_dataset('google/MusicCaps', split='train')
    if limit is not None:
        print(f"Limiting to {limit} examples")
        ds = ds.select(range(limit))

    spectrogram_dir = Path(spectrogram_dir)
    spectrogram_dir.mkdir(exist_ok=True, parents=True)

    def process(example):
        spectrogram_file = spectrogram_dir / f"{example['ytid']}.png"
        if not spectrogram_file.exists():
            audio_file = f"/tmp/musiccaps/{example['ytid']}.wav"
            try:
                if not os.path.exists(audio_file):
                    download_clip(
                        example['ytid'],
                        audio_file,
                        example['start_s'],
                        example['end_s'],
                    )
                y, sr = librosa.load(audio_file, sr=None, mono=True)
                S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=1024, n_mels=128)
                S_dB = librosa.power_to_db(S, ref=np.max)
                plt.figure(figsize=(10, 5))
                librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr, fmax=8000)
                plt.axis('off')
                plt.savefig(spectrogram_file, bbox_inches='tight', pad_inches=0, dpi=100)
                plt.close()
            except Exception as err:
                # Print the underlying error rather than hiding it behind a bare `except`.
                print(f"Failed to process {audio_file}: {err!r}")
                # Every example must return the same columns, so mark failures
                # with an empty path and filter them out after the map.
                example['spectrogram'] = ''
                return example
        example['spectrogram'] = str(spectrogram_file)
        return example

    ds = ds.map(
        process,
        num_proc=num_proc,
        keep_in_memory=keep_in_memory,
        writer_batch_size=writer_batch_size
    )

    # Remove examples that failed to process
    ds = ds.filter(lambda x: x['spectrogram'] != '')

    # Zip and download the data directory
    shutil.make_archive('spectrogram_all_images', 'zip', spectrogram_dir)
    files.download('spectrogram_all_images.zip')

    return ds

## Load the Dataset

The call below leaves `limit` at its default of `None`, so it downloads and processes the full dataset. Since Colab is constrained to 2 cores, that takes hours. Pass a small value such as `limit=32` to try the pipeline on the first few examples.

When running this on your own machine:
  - you can keep `limit=None` to download + process the full dataset. Feel free to do that here in Colab, it'll just take a long time.
  - you should increase `num_proc`, which will speed things up substantially
  - if you run out of memory, try reducing `writer_batch_size`; by default, it keeps 1000 examples in memory *per worker*.
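
As a rough starting point for `num_proc`, you could derive it from the number of CPU cores on the machine. This is a minimal sketch: `suggest_num_proc` is a hypothetical helper, and leaving one core free is an assumption rather than a recommendation from the `datasets` library.

```python
import os


def suggest_num_proc(reserve: int = 1) -> int:
    """Suggest a worker count: all detected cores minus `reserve`, never below 1."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores - reserve)


num_proc = suggest_num_proc()
print(f"Using num_proc={num_proc}")
```

The result can be passed straight through, e.g. `main('./spectrograms', num_proc=num_proc)`. Since `writer_batch_size` applies per worker, raising `num_proc` also raises peak memory use.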

In [None]:
ds = main('./spectrogram_data_everything_5', num_proc=1, writer_batch_size=100, keep_in_memory=False)



Map:   0%|          | 0/5521 [00:00<?, ? examples/s]

Failed to process /tmp/musiccaps/-0Gj8-vB1q4.wav
Failed to process /tmp/musiccaps/-0SdAVK79lg.wav
Failed to process /tmp/musiccaps/-0vPFx-wRRI.wav
Failed to process /tmp/musiccaps/-0xzrMun0Rs.wav
Failed to process /tmp/musiccaps/-1LrH01Ei1w.wav
Failed to process /tmp/musiccaps/-1OlgJWehn8.wav
Failed to process /tmp/musiccaps/-1UWSisR2zo.wav
Failed to process /tmp/musiccaps/-3Kv4fdm7Uk.wav
Failed to process /tmp/musiccaps/-4NLarMj4xU.wav
Failed to process /tmp/musiccaps/-4SYC2YgzL8.wav
Failed to process /tmp/musiccaps/-5FoeegAgvU.wav
Failed to process /tmp/musiccaps/-5f6hjZf9Yw.wav
Failed to process /tmp/musiccaps/-5xOcMJpTUk.wav
Failed to process /tmp/musiccaps/-6HBGg1cAI0.wav
Failed to process /tmp/musiccaps/-6QGvxvaTkI.wav
Failed to process /tmp/musiccaps/-6pcgdLfb_A.wav
Failed to process /tmp/musiccaps/-7B9tPuIP-w.wav
Failed to process /tmp/musiccaps/-7wUQP6G5EQ.wav
Failed to process /tmp/musiccaps/-88me9bBzrk.wav
Failed to process /tmp/musiccaps/-8C-gydUbR8.wav
Failed to process /t

  y, sr = librosa.load(audio_file, sr=None, mono=True)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Failed to process /tmp/musiccaps/0J_2K1Gvruk.wav
Failed to process /tmp/musiccaps/0J_TdiZ3TKA.wav
Failed to process /tmp/musiccaps/0JbGxIR8JTk.wav
Failed to process /tmp/musiccaps/0K-zyeLuKho.wav
Failed to process /tmp/musiccaps/0KCVgexi4yU.wav
Failed to process /tmp/musiccaps/0L2ndtt60Q8.wav
Failed to process /tmp/musiccaps/0L3vcdzQPPU.wav
Failed to process /tmp/musiccaps/0LE6Ll1rVlg.wav
Failed to process /tmp/musiccaps/0LLlcPiatiU.wav
Failed to process /tmp/musiccaps/0M7nETLOsKQ.wav
Failed to process /tmp/musiccaps/0MzrXd8CUCg.wav
Failed to process /tmp/musiccaps/0NTzOtVmoiU.wav
Failed to process /tmp/musiccaps/0NZY0GHQBP0.wav
Failed to process /tmp/musiccaps/0ONdm4sW47c.wav
Failed to process /tmp/musiccaps/0OY8XXZ98rw.wav
Failed to process /tmp/musiccaps/0OYlHvyfNk4.wav
Failed to process /tmp/musiccaps/0OhtODbKajw.wav
Failed to process /tmp/musiccaps/0Olm321vgk8.wav
Failed to process /tmp/musiccaps/0PMFAO4TIU4.wav
Failed to process /tmp/musiccaps/0Q1JLNfm8oU.wav
Failed to process /t

Let's explore the samples using a quick Gradio Interface 🤗

In [None]:
import gradio as gr

def get_example(idx):
    ex = ds[int(idx)]
    # The dataset stores a spectrogram image path; it has no 'audio' column.
    return ex['spectrogram'], ex['caption']

gr.Interface(
    get_example,
    inputs=gr.Slider(0, len(ds) - 1, value=0, step=1),
    outputs=['image', 'textarea'],
    live=True
).launch()

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.





In [None]:
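# Extract and plot a mel spectrogram directly from a downloaded audio file
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def get_spectrogram(audio_path, sr=44100):
    # Resample to `sr` on load and return it so plotting uses the same rate.
    y, sr = librosa.load(audio_path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
    return librosa.power_to_db(S, ref=np.max), sr

def plot_spectrogram(audio_path):
    S, sr = get_spectrogram(audio_path)
    plt.figure(figsize=(10, 4))
    # Pass sr so the time axis matches the rate the audio was loaded at.
    librosa.display.specshow(S, sr=sr, y_axis='mel', fmax=8000, x_axis='time')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel spectrogram')
    plt.tight_layout()
    plt.show()

# The dataset has no 'audio' column; main() saves each clip to /tmp/musiccaps/<ytid>.wav.
# The file only exists if the clip was downloaded in this session.
plot_spectrogram(f"/tmp/musiccaps/{ds[0]['ytid']}.wav")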