## Audio Downloading with yt-dlp

### Overview
`yt-dlp` is a command-line program for downloading video and audio from YouTube and other video hosting sites. It is a fork of `youtube-dlc`, which in turn is a fork of `youtube-dl`.

### Installation
You can install `yt-dlp` with pip:

```bash
pip install yt-dlp
```

**Note for Windows Users:** After installing, make sure the Python `Scripts` folder (where pip places the `yt-dlp` executable) is on your system's PATH. This allows you to run `yt-dlp` from any directory in the command prompt.
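
One way to confirm the PATH setup is to ask Python where it finds the executable. The sketch below uses only the standard library; the helper name `find_tool` is made up for illustration.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


# Report whether yt-dlp can be started from the current environment
path = find_tool("yt-dlp")
if path is None:
    print("yt-dlp was not found on PATH")
else:
    print(f"yt-dlp found at {path}")
```

If the first message is printed, the notebook cells that call `yt-dlp` through `subprocess` will most likely fail as well.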

### Downloading Audio from YouTube Shorts
To download just the audio from a YouTube Short, use the following command:

```bash
yt-dlp -x --audio-format wav [URL]
```

Replace `[URL]` with the actual URL of the YouTube Short. The `-x` option extracts the audio track and `--audio-format wav` converts it to `.wav`; the conversion requires `ffmpeg` to be installed.

### Load Gregor's File and Extract the IDs

In [19]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('youtube_shorts_description.csv') 

# Extract the video IDs
video_ids = df['Video ID'].tolist()

# Print the video IDs (optional, for verification)
print(video_ids)

['l9_8_pDTmis', 'QYEfTly0pTE', 'jYJTPqU66IY', 'dBsomKKHhtk', 'dTLYweJ08Tg', 'k9v_bsZUQRg', 'Js6ZUBSW6s0', '1AY9Sqt7yCg', 'f8a2tiHatCc', 'bnem7I5UkaA', 'aFJ1ThX8XHU', 'n7x4Jj9pdH8', 'LdoJnz_ZQyU', 'm5uJjHV_eVs', 'xN5OsH0UCmo', 'KiEErvcX_qo', 'NLvfrxL3YGA', 'nK-Hy0TxIik', 'yWJVX9MKrUM', 'd2EPEgWPn8Y']
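
Before downloading, it can help to drop malformed or duplicate IDs, for example empty CSV cells that pandas reads as `nan`. The following is a minimal sketch; the 11-character pattern is an assumption based on the IDs shown above, and `clean_video_ids` is a hypothetical helper.

```python
import re

# Assumption: video IDs are 11 characters from A-Z, a-z, 0-9, "-" and "_"
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def clean_video_ids(raw_ids):
    """Strip whitespace, drop malformed IDs and duplicates, keep first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        candidate = str(raw).strip()
        if ID_PATTERN.fullmatch(candidate) and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


# Three valid IDs, one duplicate and one missing value
sample = ["l9_8_pDTmis", " QYEfTly0pTE ", "nK-Hy0TxIik", "l9_8_pDTmis", "nan"]
print(clean_video_ids(sample))
```

The call keeps the three distinct IDs in their original order and discards the duplicate and the `nan` entry.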


### Download the Audio Using yt-dlp

In [20]:
import subprocess
import os

# Create the "audio" directory if it doesn't exist
audio_dir = './audio'
os.makedirs(audio_dir, exist_ok=True)

# Base URL for YouTube Shorts
base_url = 'https://www.youtube.com/shorts/'

# Loop through each video ID
for video_id in video_ids:
    # Construct the full URL
    video_url = base_url + video_id

    # Build the yt-dlp command as an argument list (no shell, so no quoting issues).
    # --no-check-certificate skips HTTPS certificate validation; drop it if not needed.
    command = [
        'yt-dlp', '-x', '--audio-format', 'wav', '--no-check-certificate',
        '-o', f'{audio_dir}/%(title)s.%(ext)s', video_url,
    ]

    # Run the command and report failed downloads instead of ignoring them
    result = subprocess.run(command)
    if result.returncode != 0:
        print(f'Download failed for {video_id}')
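
The loop above names each file after the video title, which makes it harder to match the audio back to the rows of the CSV later. A possible variant, sketched below, names each file after its video ID by using the `%(id)s` field of the yt-dlp output template. The function name `build_download_command` is hypothetical.

```python
def build_download_command(video_id, audio_dir="./audio"):
    """Return the yt-dlp argument list for one Short, saved as <video_id>.wav."""
    return [
        "yt-dlp",
        "-x",
        "--audio-format", "wav",
        # %(id)s and %(ext)s are filled in by yt-dlp, not by Python
        "-o", f"{audio_dir}/%(id)s.%(ext)s",
        f"https://www.youtube.com/shorts/{video_id}",
    ]


command = build_download_command("l9_8_pDTmis")
print(command)
# To actually download: subprocess.run(command, check=True)
```

With ID-based file names, the file stem can serve directly as a join key against the `Video ID` column.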


### Set Up the Pretrained Model

Download the model weights from https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition/resolve/main/pytorch_model.bin?download=true
The file is needed to set the classifier weights and biases manually, because they do not seem to be initialized correctly when the model is loaded with `from_pretrained` alone.

In [22]:
import torch
from transformers import AutoProcessor, AutoModelForAudioClassification, Wav2Vec2FeatureExtractor
import numpy as np
from pydub import AudioSegment
import torch.nn as nn


# https://github.com/ehcalabres/EMOVoice
# the preprocessor was derived from https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english
# processor1 = AutoProcessor.from_pretrained("ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")
# ^^^ no preload model available for this model (above), but the `feature_extractor` works in place
model1 = AutoModelForAudioClassification.from_pretrained("ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")


model1.projector = nn.Linear(1024, 1024, bias=True)
model1.classifier = nn.Linear(1024, 8, bias=True)
#
##https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition/resolve/main/pytorch_model.bin?download=true
torch_state_dict = torch.load('pytorch_model.bin', map_location=torch.device('cpu'))
#
model1.projector.weight.data = torch_state_dict['classifier.dense.weight']
model1.projector.bias.data = torch_state_dict['classifier.dense.bias']
#
model1.classifier.weight.data = torch_state_dict['classifier.output.weight']
model1.classifier.bias.data = torch_state_dict['classifier.output.bias']

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")

def predict_emotion(audio_file):
    if not audio_file:
        # I fetched some samples with known emotions from here: https://www.fesliyanstudios.com/royalty-free-sound-effects-download/poeple-crying-252
        audio_file = 'MrBeast.wav'
    sound = AudioSegment.from_file(audio_file)
    sound = sound.set_frame_rate(16000)
    # Note: get_array_of_samples() interleaves channels, so stereo files should be
    # downmixed first with sound.set_channels(1); this step is not applied here.
    sound_array = np.array(sound.get_array_of_samples())
    # This model is VERY SLOW, so it is best to pass in short sections (about 10 s
    # or less) that contain emotional words from the transcript.
    # To cut a sub-chunk (this was necessary even with very short audio files):
    # test = torch.tensor(inputs.input_values.float()[:, :100000])

    # Named `inputs` to avoid shadowing the built-in input()
    inputs = feature_extractor(
        raw_speech=sound_array,
        sampling_rate=16000,
        padding=True,
        return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        result = model1(inputs.input_values.float())
    # Making sense of the result: map each class index to its label
    id2label = {
        "0": "angry",
        "1": "calm",
        "2": "disgust",
        "3": "fearful",
        "4": "happy",
        "5": "neutral",
        "6": "sad",
        "7": "surprised"
    }
    # result[0][0] holds the raw scores (logits) for the single input clip
    interp = dict(zip(id2label.values(), (round(float(i), 4) for i in result[0][0])))
    return interp

Some weights of the model checkpoint at ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition were not used when initializing Wav2Vec2ForSequenceClassification: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v', 'classifier.output.weight', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.output.bias']
- This IS expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at ehcalabres/wav2vec2-lg-xlsr-e
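
The comments in `predict_emotion` recommend passing clips of about 10 s or less. One way to do that is to split the sample array into fixed-length windows before feature extraction. The sketch below assumes mono samples at 16 kHz; `split_into_chunks` is a hypothetical helper, not part of the notebook.

```python
def split_into_chunks(samples, sampling_rate=16000, max_seconds=10):
    """Split a 1-D sequence of samples into consecutive chunks of at most max_seconds."""
    chunk_len = int(sampling_rate * max_seconds)
    if chunk_len <= 0:
        raise ValueError("sampling_rate * max_seconds must be positive")
    return [samples[start:start + chunk_len]
            for start in range(0, len(samples), chunk_len)]


# 25 seconds of silence at 16 kHz gives chunks of 10 s, 10 s and 5 s
dummy = [0] * (16000 * 25)
chunks = split_into_chunks(dummy)
print([len(c) for c in chunks])
```

Because the function only relies on `len` and slicing, it should work the same way on a NumPy array such as `sound_array`.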

In [23]:
predict_emotion("cry.wav") # Audio of crying man

{'angry': -2.4137,
 'calm': 4.3917,
 'disgust': 1.8303,
 'fearful': -1.7471,
 'happy': -1.725,
 'neutral': 0.0457,
 'sad': 4.0059,
 'surprised': -3.7045}
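
The values returned above are raw model scores (logits), which is why some of them are negative. If probabilities are easier to read, a softmax can be applied to the dictionary. This is a minimal standard-library sketch; the `softmax` helper is not part of the notebook.

```python
import math


def softmax(scores):
    """Convert a {label: logit} dict into {label: probability}."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Scores copied from the output above
logits = {'angry': -2.4137, 'calm': 4.3917, 'disgust': 1.8303, 'fearful': -1.7471,
          'happy': -1.725, 'neutral': 0.0457, 'sad': 4.0059, 'surprised': -3.7045}
probs = softmax(logits)
print(max(probs, key=probs.get))
```

For this clip the highest-ranked label is `calm`, with `sad` close behind, which matches the ordering of the raw scores.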

### Process Audio Files

In [25]:
import os
import pandas as pd
from tqdm import tqdm

# Assuming the 'predict_emotion' function is already defined

# Path to the 'audio' directory
audio_dir = './audio'

# Initialize an empty list to store the results
results = []

# Loop over each file in the 'audio' folder with a progress bar
for filename in tqdm(os.listdir(audio_dir), desc="Processing audio files"):
    if filename.endswith('.wav'):
        # Full path of the audio file
        file_path = os.path.join(audio_dir, filename)

        # Call the predict_emotion function
        emotion_prediction = predict_emotion(file_path)

        # Append the results along with the filename
        results.append({'filename': filename, 'prediction': emotion_prediction})

# Convert the results to a DataFrame
df_results = pd.DataFrame(results)

# Display the DataFrame
print(df_results)


Processing audio files: 100%|██████████| 20/20 [39:45<00:00, 119.29s/it] 


                                             filename  \
0                        A Real Authentic Italian.wav   
1   Chinese Spacecraft Rolls Out Of Control During...   
2   Corrupt Cops Caught & OWNED! Dirty Tyrant Stat...   
3   Courtside Kicks CASHES OUT on WHOLE TABLE of D...   
4                                 enjoy the light.wav   
5   FINDING TOM HOLLAND 😂😂 #marvel #mcu #spiderman...   
6              Furthest Away From Me Wins $10,000.wav   
7    Hockey Cameramen are Insane (@lsantanaphoto).wav   
8   How much do #Teachers make？ #salarycompilation...   
9                How to renovate your private jet.wav   
10                how to set the perfect password.wav   
11                                 Kid Fried Rice.wav   
12               Spending $100 in North Macedonia.wav   
13  Typing SO FAST that monkeytype would INVALIDAT...   
14                       Ultra Efficient Pit Stop.wav   
15                      WAKE UP： College Is A Lie.wav   
16  We lost contact with ATC ov

In [26]:
df_results.head()

Unnamed: 0,filename,prediction
0,A Real Authentic Italian.wav,"{'angry': -2.078, 'calm': 2.7263, 'disgust': 2..."
1,Chinese Spacecraft Rolls Out Of Control During...,"{'angry': -1.7527, 'calm': 2.133, 'disgust': 2..."
2,Corrupt Cops Caught & OWNED! Dirty Tyrant Stat...,"{'angry': -2.2896, 'calm': 2.9672, 'disgust': ..."
3,Courtside Kicks CASHES OUT on WHOLE TABLE of D...,"{'angry': -2.0477, 'calm': 2.9957, 'disgust': ..."
4,enjoy the light.wav,"{'angry': -2.114, 'calm': 2.7554, 'disgust': 2..."


In [29]:
# Scale the results, since they all seem to fall in a similar range of values
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Convert the dictionary column into a DataFrame of its own
emotion_df = df_results['prediction'].apply(pd.Series)

# Display the new DataFrame structure
print(emotion_df.head())


# Initialize the scaler
scaler = MinMaxScaler()

# Normalize each column
for column in emotion_df.columns:
    emotion_df[column] = scaler.fit_transform(emotion_df[[column]])

# Now emotion_df has normalized values for each emotion


    angry    calm  disgust  fearful   happy  neutral     sad  surprised
0 -2.0780  2.7263   2.2574  -1.1500 -1.5450  -0.3969  4.6464    -3.8677
1 -1.7527  2.1330   2.9519  -1.2306 -1.5955  -0.8280  4.6259    -3.6894
2 -2.2896  2.9672   1.9130  -1.0142 -1.5379  -0.3028  4.8103    -3.9737
3 -2.0477  2.9957   2.0203  -1.2569 -1.4550  -0.1416  4.4973    -4.0577
4 -2.1140  2.7554   2.4861  -1.1827 -1.7141  -0.5673  4.7852    -3.7833
      angry      calm   disgust   fearful     happy   neutral       sad  \
0  0.430675  0.565685  0.387902  0.373323  0.652644  0.704034  0.318658   
1  0.843702  0.166801  0.834813  0.266242  0.457738  0.204266  0.274845   
2  0.162011  0.727646  0.166281  0.553740  0.680046  0.813123  0.668946   
3  0.469147  0.746807  0.235328  0.231301  1.000000  1.000000  0.000000   
4  0.384967  0.585249  0.535071  0.329879  0.000000  0.506492  0.615302   

   surprised  
0   0.392238  
1   0.760322  
2   0.173410  
3   0.000000  
4   0.566474  
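
Min-max scaling maps each column to the range [0, 1] with (x - min) / (max - min), so the scaled values express how a video ranks relative to the other videos rather than an absolute score. The sketch below applies the formula in plain Python to the five `angry` scores printed above. Its results differ from the notebook output, which was scaled over all 20 rows, and `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column has no spread; map everything to 0.0
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# 'angry' scores of the first five rows printed above
angry = [-2.0780, -1.7527, -2.2896, -2.0477, -2.1140]
scaled = min_max_scale(angry)
print([round(v, 3) for v in scaled])
```

The largest raw score maps to 1.0 and the smallest to 0.0, even though the raw values differ by only about 0.5.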


In [30]:
emotion_df.head()

Unnamed: 0,angry,calm,disgust,fearful,happy,neutral,sad,surprised
0,0.430675,0.565685,0.387902,0.373323,0.652644,0.704034,0.318658,0.392238
1,0.843702,0.166801,0.834813,0.266242,0.457738,0.204266,0.274845,0.760322
2,0.162011,0.727646,0.166281,0.55374,0.680046,0.813123,0.668946,0.17341
3,0.469147,0.746807,0.235328,0.231301,1.0,1.0,0.0,0.0
4,0.384967,0.585249,0.535071,0.329879,0.0,0.506492,0.615302,0.566474


In [33]:


# Sort the DataFrame by the 'Video Title' column in ascending order (alphabetically; uppercase sorts before lowercase)
sorted_df = df.sort_values(by='Video Title', ascending=True)

# Display the sorted DataFrame
sorted_df.head()

Unnamed: 0,Video ID,Video Title,Channel Title,Transcript,Duration,Words per Second
19,d2EPEgWPn8Y,A Real Authentic Italian,Adriano Valentini,how am I the only one who's offended I'm offen...,121.36,1.821028
16,NLvfrxL3YGA,Chinese Spacecraft Rolls Out Of Control During...,Scott Manley,yesterday the three astronauts from China's sh...,121.76,1.544021
8,f8a2tiHatCc,Corrupt Cops Caught & OWNED! Dirty Tyrant Stat...,People's Court Audit,y'all decked out in police gear for this yes s...,119.858,1.777103
4,dTLYweJ08Tg,Courtside Kicks CASHES OUT on WHOLE TABLE of D...,Courtside Kicks,yo what's good bro so you have a ton of dunks ...,59.679,1.524824
17,nK-Hy0TxIik,FINDING TOM HOLLAND 😂😂 #marvel #mcu #spiderman...,SYNCSHOW,so I know one of you is Tom Holland are you To...,65.32,1.500306


In [34]:
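# Merge the dataframes column-wise. Note that pd.concat aligns rows on index labels,
# not on position: sorted_df keeps its original labels (19, 16, 8, ...) while
# emotion_df is labeled 0, 1, 2, ... in file order, so rows are paired by label.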
merged_df = pd.concat([sorted_df, emotion_df], axis=1)

# Display the merged DataFrame
merged_df.head()

Unnamed: 0,Video ID,Video Title,Channel Title,Transcript,Duration,Words per Second,angry,calm,disgust,fearful,happy,neutral,sad,surprised
19,d2EPEgWPn8Y,A Real Authentic Italian,Adriano Valentini,how am I the only one who's offended I'm offen...,121.36,1.821028,0.402869,0.78217,0.562741,0.0,0.009649,0.622305,0.123531,0.602395
16,NLvfrxL3YGA,Chinese Spacecraft Rolls Out Of Control During...,Scott Manley,yesterday the three astronauts from China's sh...,121.76,1.544021,0.101701,1.0,0.173616,0.241265,0.469317,0.965221,0.313529,0.193642
8,f8a2tiHatCc,Corrupt Cops Caught & OWNED! Dirty Tyrant Stat...,People's Court Audit,y'all decked out in police gear for this yes s...,119.858,1.777103,0.573514,0.241495,0.455598,0.715292,0.669626,0.421748,0.67087,0.443435
4,dTLYweJ08Tg,Courtside Kicks CASHES OUT on WHOLE TABLE of D...,Courtside Kicks,yo what's good bro so you have a ton of dunks ...,59.679,1.524824,0.384967,0.585249,0.535071,0.329879,0.0,0.506492,0.615302,0.566474
17,nK-Hy0TxIik,FINDING TOM HOLLAND 😂😂 #marvel #mcu #spiderman...,SYNCSHOW,so I know one of you is Tom Holland are you To...,65.32,1.500306,0.176232,0.755278,0.237001,0.453301,0.424161,0.800023,0.522334,0.3654
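
`pd.concat(..., axis=1)` pairs rows by index label rather than by position. In the output above, the Courtside Kicks row (label 4) carries the values of `emotion_df` row 4, which was computed from `enjoy the light.wav`. A more robust option is to join on a key derived from the title. The sketch below shows the idea in plain Python, assuming each file name is the video title with only punctuation changed plus `.wav`. The helpers `title_key` and `match_rows` are hypothetical, and the title with an ASCII colon is an assumed example.

```python
import os
import re


def title_key(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", text.lower())


def match_rows(titles, filenames):
    """Map each title to the position of the matching file, or None if absent."""
    position_by_key = {title_key(os.path.splitext(name)[0]): pos
                       for pos, name in enumerate(filenames)}
    return {title: position_by_key.get(title_key(title)) for title in titles}


# File names as listed in the audio folder (note the full-width colon)
filenames = ["A Real Authentic Italian.wav", "enjoy the light.wav",
             "WAKE UP： College Is A Lie.wav"]
# Titles in a different order; the ASCII colon is an assumed original spelling
titles = ["WAKE UP: College Is A Lie", "A Real Authentic Italian", "enjoy the light"]
print(match_rows(titles, filenames))
```

With pandas, the same key could be added as a column to both DataFrames and passed to `DataFrame.merge(on=...)`; any title that maps to `None` should be checked by hand.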


In [35]:
merged_df.to_csv('youtube_shorts_description_emotion.csv', index=False)