# Downloading Audio
## Anna Bernbaum
## April 2019

Problems with using the Google AudioSet TFRecords:
- It is unclear what '128-dimensional audio features extracted at 1 Hz' actually means
- The difference between frame-level features and video-level features is unclear
- The features are embeddings: can an embedding be "un-embedded" back into audio?

New tactic:
- Download the actual audio clips for all speech and cough samples from YouTube
- Create a spectrogram (or mel spectrogram) for each clip
- Train the model on these spectrogram images
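The spectrogram step is not shown in this notebook. As a minimal sketch of what it involves, the function below computes a magnitude spectrogram with plain NumPy: the signal is cut into overlapping frames, each frame is multiplied by a Hann window, and a real FFT is taken. The frame length n_fft=1024 and hop_length=512 are assumed values, not settings from this project. In practice librosa.stft, librosa.feature.melspectrogram and librosa.amplitude_to_db cover the same ground, and a mel spectrogram additionally maps these frequency bins onto the mel scale.

```python
import numpy as np


def magnitude_spectrogram(y, n_fft=1024, hop_length=512):
    """Return an (n_fft // 2 + 1, n_frames) magnitude spectrogram of a mono signal.

    Frames of length n_fft are taken every hop_length samples, multiplied by a
    Hann window, and passed through a real FFT.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        # zero-pad very short clips so at least one frame exists
        y = np.pad(y, (0, n_fft - len(y)), mode='constant')
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    # stack the windowed frames as columns: shape (n_fft, n_frames)
    frames = np.stack(
        [y[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # rfft keeps only the non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=0))


def to_decibels(S, ref=1.0, amin=1e-10):
    """Convert magnitudes to decibels, 20 * log10(S / ref), with a floor at amin."""
    return 20.0 * np.log10(np.maximum(S, amin) / ref)


# usage: a 1 s, 1 kHz tone sampled at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
S_db = to_decibels(magnitude_spectrogram(tone))
print(S_db.shape)  # (513, 42)
```

Each column of S_db is one frame, so an image of it has time on the x-axis and frequency on the y-axis; the 1 kHz tone appears as a bright row near bin 46 (1000 Hz divided by the 22050 / 1024 ≈ 21.5 Hz bin spacing).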

In [2]:
import os
import csv
from random import shuffle

import pandas as pd
import youtube_dl
import librosa
from pydub import AudioSegment
from pydub.utils import which, mediainfo

import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [20, 5]

# Get all desired YouTube IDs

Let's find the index and label ID associated with 'Cough' and 'Speech' in AudioSet/class_labels_indices.csv:

In [3]:
cough_class_label_index = !grep Cough AudioSet/class_labels_indices.csv
print(cough_class_label_index)

speech_class_label_index = !grep Speech AudioSet/class_labels_indices.csv
print(speech_class_label_index)


['47,/m/01b_21,"Cough"']
['0,/m/09x0r,"Speech"', '7,/m/0brhx,"Speech synthesizer"']


Retrieving the label ID (the second comma-separated field) for each class:

In [4]:
print("Cough:", cough_class_label_index[0].split(",")[1])
print("Speech:", speech_class_label_index[0].split(",")[1])

Cough: /m/01b_21
Speech: /m/09x0r
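Splitting the grep output on commas works here, but it would break for any display name that contains a comma, and grep Speech also matches 'Speech synthesizer'. A sketch of the same lookup with Python's csv module, assuming the file's header row is index,mid,display_name, matches the display name exactly instead. The function name find_label_mid is hypothetical.

```python
import csv
import io


def find_label_mid(csv_text, display_name):
    """Return the machine ID (mid) whose display_name matches exactly.

    Parsing with csv handles quoted names that contain commas, and an exact
    match avoids picking up 'Speech synthesizer' when looking for 'Speech'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row['display_name'] == display_name:
            return row['mid']
    raise KeyError(display_name)


# rows copied from the grep output above, plus the assumed header line
sample = (
    'index,mid,display_name\n'
    '0,/m/09x0r,"Speech"\n'
    '7,/m/0brhx,"Speech synthesizer"\n'
    '47,/m/01b_21,"Cough"\n'
)
print(find_label_mid(sample, 'Speech'))  # /m/09x0r
print(find_label_mid(sample, 'Cough'))   # /m/01b_21
```

To use it on the real file, read AudioSet/class_labels_indices.csv into a string first and pass that in place of sample.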


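Finding all segments with the selected labels in the balanced train, evaluation and unbalanced train lists. For each matching line, cut -c -11 keeps the YouTube ID (the first 11 characters) and cut -d ',' -f 2 keeps the segment's start time.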

In [6]:
# coughs = !grep /m/01b_21 AudioSet/balanced_train_segments.csv |head  # ID manually inserted
# print(type(coughs))
# print(coughs)

# TODO: maybe select speech clips that carry only the Speech tag,
# and ignore clips tagged with both Speech and Cough

coughs_bal = !grep /m/01b_21 AudioSet/balanced_train_segments.csv | cut -c -11
coughs_bal_starts = !grep /m/01b_21 AudioSet/balanced_train_segments.csv | cut -d ',' -f 2
print("Cough samples in Balanced Train:", len(coughs_bal))

speech_bal = !grep /m/09x0r AudioSet/balanced_train_segments.csv | cut -c -11
speech_bal_starts = !grep /m/09x0r AudioSet/balanced_train_segments.csv | cut -d ',' -f 2
print("Speech samples in Balanced Train:", len(speech_bal))


coughs_eval = !grep /m/01b_21 AudioSet/eval_segments.csv | cut -c -11
coughs_eval_starts = !grep /m/01b_21 AudioSet/eval_segments.csv | cut -d ',' -f 2
print("\nCough samples in Evaluation:", len(coughs_eval))

speech_eval = !grep /m/09x0r AudioSet/eval_segments.csv | cut -c -11
speech_eval_starts = !grep /m/09x0r AudioSet/eval_segments.csv | cut -d ',' -f 2
print("Speech samples in Evaluation:", len(speech_eval))


coughs_unbal = !grep /m/01b_21 AudioSet/unbalanced_train_segments.csv | cut -c -11
coughs_unbal_starts = !grep /m/01b_21 AudioSet/unbalanced_train_segments.csv | cut -d ',' -f 2
print("\nCough samples in Unbalanced Train:", len(coughs_unbal))

speech_unbal = !grep /m/09x0r AudioSet/unbalanced_train_segments.csv | cut -c -11
speech_unbal_starts = !grep /m/09x0r AudioSet/unbalanced_train_segments.csv | cut -d ',' -f 2
print("Speech samples in Unbalanced Train:", len(speech_unbal))
      
print("\nTotal Number of Cough Clips:", (len(coughs_bal)+len(coughs_eval)+len(coughs_unbal)))
print("\nTotal Number of Speech Clips:", (len(speech_bal)+len(speech_eval)+len(speech_unbal)))

Cough samples in Balanced Train: 60
Speech samples in Balanced Train: 5735

Cough samples in Evaluation: 60
Speech samples in Evaluation: 5324

Cough samples in Unbalanced Train: 751
Speech samples in Unbalanced Train: 999421

Total Number of Cough Clips: 871

Total Number of Speech Clips: 1010480
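The grep/cut pipeline assumes every ID is exactly 11 characters and only checks that the label string appears somewhere on the line. A sketch of the same selection in Python, assuming the segment files' layout of '#' comment lines followed by rows of YTID, start_seconds, end_seconds, "positive_labels", also makes it easy to act on the note in the cell above about keeping speech clips that are not also tagged as coughs. The function name select_segments is hypothetical, and the rows in the usage example are made-up placeholders, not real AudioSet entries.

```python
import csv
import io


def select_segments(csv_text, label, exclude=()):
    """Return (youtube_id, start_seconds) pairs for segments tagged with `label`.

    Segments carrying any label in `exclude` are skipped, e.g. to keep speech
    clips that are not also tagged as coughs. Lines starting with '#' are ignored.
    """
    rows = csv.reader(
        (line for line in io.StringIO(csv_text) if not line.startswith('#')),
        skipinitialspace=True,  # fields are separated by ", "
    )
    selected = []
    for row in rows:
        if not row:
            continue
        ytid, start, labels = row[0], row[1], row[3].split(',')
        if label in labels and not any(x in labels for x in exclude):
            selected.append((ytid, float(start)))
    return selected


sample = (
    '# YTID, start_seconds, end_seconds, positive_labels\n'
    'EXAMPLE0001, 30.000, 40.000, "/m/09x0r"\n'
    'EXAMPLE0002, 0.000, 10.000, "/m/09x0r,/m/01b_21"\n'
    'EXAMPLE0003, 12.000, 22.000, "/m/01b_21"\n'
)
print(select_segments(sample, '/m/09x0r', exclude=('/m/01b_21',)))
# [('EXAMPLE0001', 30.0)]
print(select_segments(sample, '/m/01b_21'))
# [('EXAMPLE0002', 0.0), ('EXAMPLE0003', 12.0)]
```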


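Rather than keeping AudioSet's own splits, let's create our own Train : Evaluation : Validation split to maximise the number of clips used. The split itself is done in a separate file; here we pool all available clips, shuffle them, and trim the speech list to the same size as the cough list (871 clips) so that the two classes are balanced.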

In [7]:
# combine the lists of YouTube IDs into one master list per class
all_coughs = coughs_bal + coughs_eval + coughs_unbal
all_speech = speech_bal + speech_eval + speech_unbal

all_coughs_starts = coughs_bal_starts + coughs_eval_starts + coughs_unbal_starts
all_speech_starts = speech_bal_starts + speech_eval_starts + speech_unbal_starts

# create pandas dataframes of YouTube IDs and segment start times
speech_df = pd.DataFrame({'Speech YouTubeIDs': all_speech, 'Speech Starts': all_speech_starts})
cough_df = pd.DataFrame({'Coughs YouTubeIDs': all_coughs, 'Coughs Starts': all_coughs_starts})

# Shuffle the dataframes
speech_df = speech_df.sample(frac=1).reset_index(drop=True)
cough_df = cough_df.sample(frac=1).reset_index(drop=True)

# Trim speech_df to the same number of clips as cough_df (871)
speech_df = speech_df[:len(cough_df)]

print(speech_df)
print(cough_df)

    Speech YouTubeIDs Speech Starts
0         03r3PwMZUso         0.000
1         4HdRojmCQxA         0.000
2         cglFhkbrh8g        60.000
3         FqhNYFQWuaM        30.000
4         3PRD7ld0AUc        30.000
5         H2KBpauoDNs        21.000
6         RFUXSGZgbRg        30.000
7         tTp43eLqytA        30.000
8         9e4-9LuJ89I        30.000
9         _qw1nngzu8I        30.000
10        AZmImkrr9fc        30.000
11        9VK1ge2LKZ4        30.000
12        zA4QCodZq9o         0.000
13        5ze2kRrm9r4        30.000
14        5M0wBBUi8-g       250.000
15        R0yvDmBxZbM        30.000
16        sbI-6Trfyq8        22.000
17        CC8b-u97I-k        30.000
18        -T3OZDxQRx8        30.000
19        jE03-vsLQVo       460.000
20        D6DLOKfRka8        30.000
21        iHF9SaeTN7Q        30.000
22        UUOforOMk04        30.000
23        J0SM1roAHMc        30.000
24        2FRt27pXoBM        30.000
25        MLqJ_njQv_U        30.000
26        cRlceSMWzE0       
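The split itself is done in a separate file. As a hypothetical sketch of one way to do it, split_ids below shuffles a list of IDs with a fixed seed and cuts it into train, evaluation and validation lists; the 70 / 15 / 15 proportions and the seed are assumptions, not the project's settings.

```python
import random


def split_ids(ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle `ids` and cut them into train / evaluation / validation lists.

    `fractions` must sum to 1; any remainder from rounding goes to the last split.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError('fractions must sum to 1')
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # fixed seed gives a reproducible split
    n_train = int(len(ids) * fractions[0])
    n_eval = int(len(ids) * fractions[1])
    train = ids[:n_train]
    evaluation = ids[n_train:n_train + n_eval]
    validation = ids[n_train + n_eval:]
    return train, evaluation, validation


train, evaluation, validation = split_ids(range(871))
print(len(train), len(evaluation), len(validation))  # 609 130 132
```

Running it separately on the cough IDs and on the trimmed speech IDs would keep both classes equally represented in every split.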

# Find the associated YouTube clip for each YouTube ID

https://stackoverflow.com/questions/27473526/download-only-audio-from-youtube-video-using-youtube-dl-in-python-script

https://stackoverflow.com/questions/28423501/download-part-of-the-youtube-video-using-python


## Getting all Speech Clips

In [9]:
import soundfile as sf  # librosa.output.write_wav was removed in librosa 0.8

speech_test_df = speech_df[:50]  # test run on the first 50 clips

speech_fail_log = []

for i in range(len(speech_test_df.index)):
    youtube_id = speech_test_df['Speech YouTubeIDs'][i]
    title = youtube_id + '.wav'  # filename of the trimmed clip
    url = "https://www.youtube.com/watch?v=" + youtube_id  # the video ID only, without '.wav'

    ydl_opts = {
        'format': 'bestaudio/best',
        # keep the original extension so FFmpegExtractAudio writes a real <id>.wav
        'outtmpl': 'Untrimmed_AudioSet_WAV_files/%(id)s.%(ext)s',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',  # WAV file
            'preferredquality': '192',
        }],
    }

    try:
        with youtube_dl.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])

        # Find the associated start_time
        start_time = float(speech_test_df['Speech Starts'][i])

        # Trim the clip: load a 10 second segment beginning at start_time
        y, sr = librosa.load('Untrimmed_AudioSet_WAV_files/' + title, offset=start_time, duration=10.0)

        # # plot the clip - used for debugging
        # plt.title(title)
        # plt.xlabel('sample number')
        # plt.ylabel('Amplitude')
        # plt.plot(y)
        # plt.show()

        # write a peak-normalised WAV file of the trimmed clip
        filepath = 'Audioset_WAV_files/Speech/' + title
        sf.write(filepath, librosa.util.normalize(y), sr)

    # if the clip is unavailable or cannot be processed
    except Exception:
        speech_fail_log.append(youtube_id)

print(speech_fail_log)

[youtube] 03r3PwMZUso: Downloading webpage
[youtube] 03r3PwMZUso: Downloading video info webpage


ERROR: 03r3PwMZUso: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 4HdRojmCQxA: Downloading webpage
[youtube] 4HdRojmCQxA: Downloading video info webpage


ERROR: 4HdRojmCQxA: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] cglFhkbrh8g: Downloading webpage
[youtube] cglFhkbrh8g: Downloading video info webpage
[youtube] FqhNYFQWuaM: Downloading webpage
[youtube] FqhNYFQWuaM: Downloading video info webpage


ERROR: FqhNYFQWuaM: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 3PRD7ld0AUc: Downloading webpage
[youtube] 3PRD7ld0AUc: Downloading video info webpage


ERROR: 3PRD7ld0AUc: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] H2KBpauoDNs: Downloading webpage
[youtube] H2KBpauoDNs: Downloading video info webpage


ERROR: H2KBpauoDNs: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] RFUXSGZgbRg: Downloading webpage
[youtube] RFUXSGZgbRg: Downloading video info webpage
[youtube] tTp43eLqytA: Downloading webpage
[youtube] tTp43eLqytA: Downloading video info webpage


ERROR: tTp43eLqytA: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 9e4-9LuJ89I: Downloading webpage
[youtube] 9e4-9LuJ89I: Downloading video info webpage


ERROR: 9e4-9LuJ89I: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] _qw1nngzu8I: Downloading webpage
[youtube] _qw1nngzu8I: Downloading video info webpage


ERROR: _qw1nngzu8I: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] AZmImkrr9fc: Downloading webpage
[youtube] AZmImkrr9fc: Downloading video info webpage


ERROR: AZmImkrr9fc: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 9VK1ge2LKZ4: Downloading webpage
[youtube] 9VK1ge2LKZ4: Downloading video info webpage


ERROR: 9VK1ge2LKZ4: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] zA4QCodZq9o: Downloading webpage
[youtube] zA4QCodZq9o: Downloading video info webpage


ERROR: zA4QCodZq9o: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 5ze2kRrm9r4: Downloading webpage
[youtube] 5ze2kRrm9r4: Downloading video info webpage


ERROR: 5ze2kRrm9r4: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 5M0wBBUi8-g: Downloading webpage
[youtube] 5M0wBBUi8-g: Downloading video info webpage


ERROR: 5M0wBBUi8-g: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] R0yvDmBxZbM: Downloading webpage
[youtube] R0yvDmBxZbM: Downloading video info webpage


ERROR: R0yvDmBxZbM: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] sbI-6Trfyq8: Downloading webpage
[youtube] sbI-6Trfyq8: Downloading video info webpage


ERROR: sbI-6Trfyq8: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] CC8b-u97I-k: Downloading webpage
[youtube] CC8b-u97I-k: Downloading video info webpage


ERROR: CC8b-u97I-k: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] -T3OZDxQRx8: Downloading webpage
[youtube] -T3OZDxQRx8: Downloading video info webpage


ERROR: -T3OZDxQRx8: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] jE03-vsLQVo: Downloading webpage
[youtube] jE03-vsLQVo: Downloading video info webpage


ERROR: jE03-vsLQVo: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] D6DLOKfRka8: Downloading webpage
[youtube] D6DLOKfRka8: Downloading video info webpage


ERROR: D6DLOKfRka8: YouTube said: This video is unavailable.


[youtube] iHF9SaeTN7Q: Downloading webpage
[youtube] iHF9SaeTN7Q: Downloading video info webpage


ERROR: iHF9SaeTN7Q: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] UUOforOMk04: Downloading webpage
[youtube] UUOforOMk04: Downloading video info webpage


ERROR: UUOforOMk04: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] J0SM1roAHMc: Downloading webpage
[youtube] J0SM1roAHMc: Downloading video info webpage


ERROR: J0SM1roAHMc: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 2FRt27pXoBM: Downloading webpage
[youtube] 2FRt27pXoBM: Downloading video info webpage


ERROR: 2FRt27pXoBM: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] MLqJ_njQv_U: Downloading webpage
[youtube] MLqJ_njQv_U: Downloading video info webpage


ERROR: MLqJ_njQv_U: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] cRlceSMWzE0: Downloading webpage
[youtube] cRlceSMWzE0: Downloading video info webpage


ERROR: cRlceSMWzE0: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 2CMdPVBfm_I: Downloading webpage
[youtube] 2CMdPVBfm_I: Downloading video info webpage


ERROR: 2CMdPVBfm_I: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 1u83qqmOHss: Downloading webpage
[youtube] 1u83qqmOHss: Downloading video info webpage


ERROR: 1u83qqmOHss: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 8nr5jelebhA: Downloading webpage
[youtube] 8nr5jelebhA: Downloading video info webpage


ERROR: 8nr5jelebhA: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] JSAdRJ6BDHc: Downloading webpage
[youtube] JSAdRJ6BDHc: Downloading video info webpage


ERROR: JSAdRJ6BDHc: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 6nYX-V5W8VQ: Downloading webpage
[youtube] 6nYX-V5W8VQ: Downloading video info webpage


ERROR: 6nYX-V5W8VQ: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] z-qbzxrFBaw: Downloading webpage
[youtube] z-qbzxrFBaw: Downloading video info webpage


ERROR: z-qbzxrFBaw: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] wESLqY2AgO4: Downloading webpage
[youtube] wESLqY2AgO4: Downloading video info webpage


ERROR: wESLqY2AgO4: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] -16ZFtFAu8I: Downloading webpage
[youtube] -16ZFtFAu8I: Downloading video info webpage


ERROR: -16ZFtFAu8I: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] VGB8u39MDOI: Downloading webpage
[youtube] VGB8u39MDOI: Downloading video info webpage


ERROR: VGB8u39MDOI: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] sesHRFK3Dgg: Downloading webpage
[youtube] sesHRFK3Dgg: Downloading video info webpage


ERROR: sesHRFK3Dgg: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] ho7kNQ3n9JI: Downloading webpage
[youtube] ho7kNQ3n9JI: Downloading video info webpage


ERROR: ho7kNQ3n9JI: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] DNiAyPJblPc: Downloading webpage
[youtube] DNiAyPJblPc: Downloading video info webpage


ERROR: DNiAyPJblPc: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] b6-tMp4cHDI: Downloading webpage
[youtube] b6-tMp4cHDI: Downloading video info webpage


ERROR: b6-tMp4cHDI: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] Cuqu07pnPWs: Downloading webpage
[youtube] Cuqu07pnPWs: Downloading video info webpage


ERROR: Cuqu07pnPWs: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] td_7azDeHCU: Downloading webpage
[youtube] td_7azDeHCU: Downloading video info webpage


ERROR: td_7azDeHCU: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] D02OgdXB6vw: Downloading webpage
[youtube] D02OgdXB6vw: Downloading video info webpage


ERROR: D02OgdXB6vw: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] -sWSr8hrXp4: Downloading webpage
[youtube] -sWSr8hrXp4: Downloading video info webpage


ERROR: -sWSr8hrXp4: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] myLvxCjpGAE: Downloading webpage
[youtube] myLvxCjpGAE: Downloading video info webpage


ERROR: myLvxCjpGAE: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 5U_37CiEHww: Downloading webpage
[youtube] 5U_37CiEHww: Downloading video info webpage


ERROR: 5U_37CiEHww: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] SX39w9K2wHY: Downloading webpage
[youtube] SX39w9K2wHY: Downloading video info webpage


ERROR: SX39w9K2wHY: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 47KKNmrcxEY: Downloading webpage
[youtube] 47KKNmrcxEY: Downloading video info webpage


ERROR: 47KKNmrcxEY: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] 1DPvVeihko0: Downloading webpage
[youtube] 1DPvVeihko0: Downloading video info webpage


ERROR: 1DPvVeihko0: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


[youtube] Q9fWl7L1m-0: Downloading webpage
[youtube] Q9fWl7L1m-0: Downloading video info webpage


ERROR: Q9fWl7L1m-0: "token" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


['03r3PwMZUso', '4HdRojmCQxA', 'cglFhkbrh8g', 'FqhNYFQWuaM', '3PRD7ld0AUc', 'H2KBpauoDNs', 'RFUXSGZgbRg', 'tTp43eLqytA', '9e4-9LuJ89I', '_qw1nngzu8I', 'AZmImkrr9fc', '9VK1ge2LKZ4', 'zA4QCodZq9o', '5ze2kRrm9r4', '5M0wBBUi8-g', 'R0yvDmBxZbM', 'sbI-6Trfyq8', 'CC8b-u97I-k', '-T3OZDxQRx8', 'jE03-vsLQVo', 'D6DLOKfRka8', 'iHF9SaeTN7Q', 'UUOforOMk04', 'J0SM1roAHMc', '2FRt27pXoBM', 'MLqJ_njQv_U', 'cRlceSMWzE0', '2CMdPVBfm_I', '1u83qqmOHss', '8nr5jelebhA', 'JSAdRJ6BDHc', '6nYX-V5W8VQ', 'z-qbzxrFBaw', 'wESLqY2AgO4', '-16ZFtFAu8I', 'VGB8u39MDOI', 'sesHRFK3Dgg', 'ho7kNQ3n9JI', 'DNiAyPJblPc', 'b6-tMp4cHDI', 'Cuqu07pnPWs', 'td_7azDeHCU', 'D02OgdXB6vw', '-sWSr8hrXp4', 'myLvxCjpGAE', '5U_37CiEHww', 'SX39w9K2wHY', '47KKNmrcxEY', '1DPvVeihko0', 'Q9fWl7L1m-0']
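All 50 speech IDs ended up in speech_fail_log, including cglFhkbrh8g and RFUXSGZgbRg, for which youtube-dl printed no ERROR line, so those two presumably failed later in the loading or writing step. Because the except clause only stores the ID, the log cannot tell these cases apart. Most messages say the "token" parameter was missing and ask to check for the latest version, which suggests the installed youtube-dl was out of date. A minimal sketch of a loop that keeps the error message with each failed ID; download_all and download_one are hypothetical names, with download_one standing in for the download-and-trim steps.

```python
def download_all(ids, download_one):
    """Call download_one(youtube_id) for each ID and record why any call failed.

    Returns a dict mapping each failed ID to its exception type and message, so
    an unavailable video can be told apart from a tool or processing error.
    """
    failures = {}
    for youtube_id in ids:
        try:
            download_one(youtube_id)
        except Exception as exc:  # keep going, but remember the reason
            failures[youtube_id] = '{}: {}'.format(type(exc).__name__, exc)
    return failures


# usage with a hypothetical stand-in downloader that fails for one ID
def fake_download(youtube_id):
    if youtube_id == 'EXAMPLE0002':
        raise RuntimeError('This video is unavailable.')


print(download_all(['EXAMPLE0001', 'EXAMPLE0002'], fake_download))
# {'EXAMPLE0002': 'RuntimeError: This video is unavailable.'}
```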


## Getting all Cough Clips

In [7]:
import soundfile as sf  # librosa.output.write_wav was removed in librosa 0.8

coughs_test_df = cough_df[:50]  # test run on the first 50 clips

coughs_fail_log = []

for i in range(len(coughs_test_df.index)):
    youtube_id = coughs_test_df['Coughs YouTubeIDs'][i]
    title = youtube_id + '.wav'  # filename of the trimmed clip
    url = "https://www.youtube.com/watch?v=" + youtube_id  # the video ID only, without '.wav'

    ydl_opts = {
        'format': 'bestaudio/best',
        # keep the original extension so FFmpegExtractAudio writes a real <id>.wav
        'outtmpl': 'Untrimmed_AudioSet_WAV_files/%(id)s.%(ext)s',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',  # WAV file
            'preferredquality': '192',
        }],
    }

    try:
        with youtube_dl.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])

        # Find the associated start_time
        start_time = float(coughs_test_df['Coughs Starts'][i])

        # Trim the clip: load a 10 second segment beginning at start_time
        y, sr = librosa.load('Untrimmed_AudioSet_WAV_files/' + title, offset=start_time, duration=10.0)

        # # plot the clip - used for debugging
        # plt.title(title)
        # plt.xlabel('sample number')
        # plt.ylabel('Amplitude')
        # plt.plot(y)
        # plt.show()

        # only keep clips that contain a full 10 seconds of audio;
        # len(y) counts samples, so convert to seconds (small tolerance for resampling rounding)
        if len(y) / sr >= 9.99:
            # write a peak-normalised WAV file of the trimmed clip
            filepath = 'Audioset_WAV_files/Coughs/' + title
            sf.write(filepath, librosa.util.normalize(y), sr)

    # if the clip is unavailable or cannot be processed
    except Exception:
        coughs_fail_log.append(youtube_id)

print(coughs_fail_log)

[youtube] O61IKTdt2I0: Downloading webpage
[youtube] O61IKTdt2I0: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/O61IKTdt2I0.wav
[download] 100% of 483.19KiB in 00:00
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/O61IKTdt2I0.wav exists, skipping
[youtube] RYppdW56lfo: Downloading webpage
[youtube] RYppdW56lfo: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/RYppdW56lfo.wav
[download] 100% of 1.73MiB in 00:00
[ffmpeg] Correcting container in "Untrimmed_AudioSet_WAV_files/RYppdW56lfo.wav"
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/RYppdW56lfo.wav exists, skipping
[youtube] quwLxxbH2Pg: Downloading webpage
[youtube] quwLxxbH2Pg: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/quwLxxbH2Pg.wav
[download] 100% of 545.97KiB in 00:00
[ffmpeg] Correcting container in "Untrimmed_AudioSet_WAV_files/quwLxxbH2Pg.wav

[youtube] xCRumdDLFj0: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/xCRumdDLFj0.wav
[download] 100% of 1.70MiB in 00:01
[ffmpeg] Correcting container in "Untrimmed_AudioSet_WAV_files/xCRumdDLFj0.wav"
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/xCRumdDLFj0.wav exists, skipping
[youtube] 3id3zRRZBVM: Downloading webpage
[youtube] 3id3zRRZBVM: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/3id3zRRZBVM.wav
[download] 100% of 2.07MiB in 00:02
[ffmpeg] Correcting container in "Untrimmed_AudioSet_WAV_files/3id3zRRZBVM.wav"
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/3id3zRRZBVM.wav exists, skipping
[youtube] VzVjieAgz7Y: Downloading webpage
[youtube] VzVjieAgz7Y: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/VzVjieAgz7Y.wav
[download] 100% of 1.74MiB in 00:00
[ffmpeg] Post-process file Untrimmed_AudioSet

ERROR: This video is unavailable.


[youtube] cl5Bt-rqtZ4: Downloading webpage
[youtube] cl5Bt-rqtZ4: Downloading video info webpage
[youtube] cl5Bt-rqtZ4: Downloading MPD manifest
[dashsegments] Total fragments: 8
[download] Destination: Untrimmed_AudioSet_WAV_files/cl5Bt-rqtZ4.wav
[download] 100% of 799.38KiB in 00:01
[ffmpeg] Correcting container in "Untrimmed_AudioSet_WAV_files/cl5Bt-rqtZ4.wav"
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/cl5Bt-rqtZ4.wav exists, skipping
[youtube] jMyY06bjVRA: Downloading webpage
[youtube] jMyY06bjVRA: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/jMyY06bjVRA.wav
[download] 100% of 2.09MiB in 00:00
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/jMyY06bjVRA.wav exists, skipping
[youtube] 89ZJ46zuxRY: Downloading webpage
[youtube] 89ZJ46zuxRY: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/89ZJ46zuxRY.wav
[download] 100% of 1.10MiB in 00:00

ERROR: This video has been removed for violating YouTube's Terms of Service.


[youtube] s77vdz6Xnj8: Downloading webpage
[youtube] s77vdz6Xnj8: Downloading video info webpage


ERROR: This video is unavailable.


[youtube] 6iz8Jk8Hvwg: Downloading webpage
[youtube] 6iz8Jk8Hvwg: Downloading video info webpage


ERROR: This video has been removed by the user


[youtube] 6nTcsNoIGDw: Downloading webpage
[youtube] 6nTcsNoIGDw: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/6nTcsNoIGDw.wav
[download] 100% of 1.33MiB in 00:01
[ffmpeg] Correcting container in "Untrimmed_AudioSet_WAV_files/6nTcsNoIGDw.wav"
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/6nTcsNoIGDw.wav exists, skipping
[youtube] RWCSzp1zU8A: Downloading webpage
[youtube] RWCSzp1zU8A: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/RWCSzp1zU8A.wav
[download] 100% of 1.68MiB in 00:01
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/RWCSzp1zU8A.wav exists, skipping
[youtube] qOSdHmfwLF4: Downloading webpage
[youtube] qOSdHmfwLF4: Downloading video info webpage
[download] Destination: Untrimmed_AudioSet_WAV_files/qOSdHmfwLF4.wav
[download] 100% of 2.14MiB in 00:01
[ffmpeg] Post-process file Untrimmed_AudioSet_WAV_files/qOSdHmfwLF4.wav exists, sk

## Check number of clips after downloading
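Some clips were lost because their downloads failed (for example, videos that had been removed or made unavailable). The two groups therefore need to be evened out so that both contain the same number of clips.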
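To see exactly which clips were lost, one option is to compare the YouTube IDs that were requested with the `.wav` files that actually ended up on disk. The sketch below assumes each downloaded clip is saved as `<YouTubeID>.wav`, as in the download log above; `find_missing_ids` is a hypothetical helper, not part of the original notebook, and the example folder listing is made up.

```python
import os

def find_missing_ids(requested_ids, filenames):
    """Return the requested YouTube IDs that have no matching .wav file.

    Assumes each downloaded clip is saved as '<YouTubeID>.wav'.
    """
    downloaded = {os.path.splitext(name)[0]
                  for name in filenames if name.endswith('.wav')}
    # Keep the order of the requested IDs in the result
    return [yt_id for yt_id in requested_ids if yt_id not in downloaded]

# s77vdz6Xnj8 and 6iz8Jk8Hvwg both failed in the download log above;
# the on-disk listing here is illustrative only
requested = ['6nTcsNoIGDw', 's77vdz6Xnj8', '6iz8Jk8Hvwg', 'RWCSzp1zU8A']
on_disk = ['6nTcsNoIGDw.wav', 'RWCSzp1zU8A.wav']
print(find_missing_ids(requested, on_disk))  # ['s77vdz6Xnj8', '6iz8Jk8Hvwg']
```

In the notebook, the ID list produced by the earlier `grep` (for example `coughs_bal`) could be passed as `requested_ids`, with `os.listdir(...)` of the matching class folder as `filenames`, assuming the files keep the `<YouTubeID>.wav` naming.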

In [8]:
# Return the names of the .wav files in a class folder
def list_wavs(folder):
    return [name for name in os.listdir(folder) if name.endswith('.wav')]

# Check number of clips in each folder
coughs_count = len(list_wavs("Audioset_WAV_files/Coughs/"))
print("Unedited number of cough clips:", coughs_count)

speech_count = len(list_wavs("Audioset_WAV_files/Speech/"))
print("Unedited number of speech clips:", speech_count)

# Remove additional clips from the larger group to even out the size
values = {"Coughs": coughs_count, "Speech": speech_count}
largest = max(values, key=values.get)  # find which group has more clips
difference = abs(coughs_count - speech_count)  # how many clips should be removed

# Pick the clips to remove at random. Only .wav files are candidates, so
# stray non-audio files in the folder are never counted or deleted.
f = list_wavs("Audioset_WAV_files/" + largest + '/')
shuffle(f)

clips_to_remove = f[:difference]
print(clips_to_remove)

for clip in clips_to_remove:
    os.remove(os.path.join("Audioset_WAV_files", largest, clip))

# Check it has worked
print("Edited number of cough clips:", len(list_wavs("Audioset_WAV_files/Coughs/")))
print("Edited number of speech clips:", len(list_wavs("Audioset_WAV_files/Speech/")))

Unedited number of cough clips: 46
Unedited number of speech clips: 48
['ATkiruac-y4.wav', 'o80gIVpvsPM.wav']
Edited number of cough clips: 46
Edited number of speech clips: 46
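The cell above is a form of random undersampling: surplus clips are dropped at random from the larger class. Because `shuffle` is unseeded, a different pair of clips would be deleted on each run. A possible refinement, sketched below under the assumption that reproducibility matters, separates choosing the clips from deleting them and seeds the random generator. `clips_to_drop` is a hypothetical helper (it also works for more than two classes), and the filenames in the example are made up.

```python
import random

def clips_to_drop(groups, seed=None):
    """Choose clips to delete so every class shrinks to the size of the smallest.

    `groups` maps a class name to a list of clip filenames. Returns a dict
    mapping each class name to the list of clips to remove (empty for the
    smallest class).
    """
    rng = random.Random(seed)  # a seeded generator makes the choice reproducible
    target = min(len(clips) for clips in groups.values())
    to_drop = {}
    for name, clips in groups.items():
        # Sort first so the result depends only on the seed, not on the
        # arbitrary order in which os.listdir returned the files
        candidates = sorted(clips)
        rng.shuffle(candidates)
        to_drop[name] = candidates[:len(clips) - target]
    return to_drop

# Example with made-up filenames
groups = {"Coughs": ["a.wav", "b.wav"],
          "Speech": ["c.wav", "d.wav", "e.wav", "f.wav"]}
drop = clips_to_drop(groups, seed=0)
print({name: len(clips) for name, clips in drop.items()})  # {'Coughs': 0, 'Speech': 2}
```

The returned lists could then be passed to `os.remove` exactly as in the cell above; keeping the selection as a pure function makes it easy to inspect before anything is deleted.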


# Save YouTube IDs of final clip selections

In [9]:
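# Get the filename of every .wav file in both directories
# and write them to selected_YouTubeIDs.csv (one column per class).
# Each filename is the clip's YouTube ID followed by '.wav'.

coughs_clips = ["Coughs"]  # first entry becomes the column header
for files in os.listdir("Audioset_WAV_files/Coughs/"):
    if files.endswith('.wav'):
        coughs_clips.append(files)
        print(files)

speech_clips = ["Speech"]  # first entry becomes the column header
for files in os.listdir("Audioset_WAV_files/Speech/"):
    if files.endswith('.wav'):
        speech_clips.append(files)
        print(files)

# zip() pairs the lists row by row and stops at the shorter one,
# so both groups must already be the same size
csvData = zip(coughs_clips, speech_clips)
# newline='' stops the csv module from writing blank rows on Windows
with open('selected_YouTubeIDs.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerows(csvData)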

3Liy9uBgsQM.wav
xk57B6zi6hA.wav
txVBnPl9KPM.wav
GXLEKU6K1uI.wav
GzdmC_MiIyg.wav
qHxgEpRG1Vs.wav
BR6UqPUqYsQ.wav
_0WKVY0n8aE.wav
k_0J26cnYpw.wav
bv35U83Ob1o.wav
8ieJbzu7ql8.wav
RYppdW56lfo.wav
VlFNQv6fLqQ.wav
qj-s6ZcgytA.wav
HEKKh5yZ1s8.wav
E5guDHgn7XQ.wav
Z459PzdNuCU.wav
tUUkucw-BOY.wav
jMyY06bjVRA.wav
89ZJ46zuxRY.wav
vA-eGyCdVBE.wav
quwLxxbH2Pg.wav
qOSdHmfwLF4.wav
dFmT9hZ5sbA.wav
oGf-eDCiQfg.wav
-vu4jJkffMw.wav
nL3LUxrWEpA.wav
4txOUgXllWE.wav
EksYlo1IXU0.wav
3id3zRRZBVM.wav
1MSYO4wgiag.wav
aJdyPN00-bM.wav
X8yUSV4oqoU.wav
RwE9JAktTvU.wav
O61IKTdt2I0.wav
RWCSzp1zU8A.wav
TA-iHSeEUYk.wav
6nTcsNoIGDw.wav
Ao0n1cqfFDw.wav
sNCKIJFUj5I.wav
VzVjieAgz7Y.wav
sUFGh7zp9D0.wav
9KNqsONT3-Y.wav
cl5Bt-rqtZ4.wav
o-TJISpYLFc.wav
xCRumdDLFj0.wav
5l8Y4twyb6c.wav
GwXzQvqmgQg.wav
OG_aj0e2gCg.wav
ogDl7PHyyhE.wav
0sLzEXwCV50.wav
h1QxG72W3g0.wav
brsMKP0yP8I.wav
uFMH9z6DMIE.wav
Cf02eu2D_Hs.wav
1w979JoVyXM.wav
4NtspP5IbfI.wav
DVhN2nJSoi0.wav
V44Qc-TpuIE.wav
6KL1ZzMv7fw.wav
Icv0XlAuBGo.wav
NkwMxUW-Kds.wav
Cma9PIQj
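Later steps will need these selections back as bare YouTube IDs. A minimal sketch for reading `selected_YouTubeIDs.csv` is shown below; it assumes the layout written above (a `Coughs,Speech` header row followed by one `<YouTubeID>.wav` filename per class in each row). `read_selected_ids` is a hypothetical helper, and the example writes a small throwaway CSV rather than touching the real file.

```python
import csv
import os
import tempfile

def read_selected_ids(path):
    """Read the selection CSV back into {class name: [YouTube IDs]}.

    Assumes a header row of class names and '<YouTubeID>.wav' cells.
    """
    ids = {}
    with open(path, newline='') as f:
        for row in csv.DictReader(f):  # DictReader skips fully blank rows
            for label, filename in row.items():
                # Strip the '.wav' extension to recover the bare YouTube ID
                ids.setdefault(label, []).append(os.path.splitext(filename)[0])
    return ids

# Example: write a one-row CSV in a temporary folder and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'selected_YouTubeIDs.csv')
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows([["Coughs", "Speech"],
                                 ["3Liy9uBgsQM.wav", "0sLzEXwCV50.wav"]])
    result = read_selected_ids(path)

print(result)  # {'Coughs': ['3Liy9uBgsQM'], 'Speech': ['0sLzEXwCV50']}
```

Using `csv.DictReader` keys each column by its header, so the class names written as the first list entries carry straight through to the returned dictionary.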
