# Downloading Audio
## Anna Bernbaum
## April 2019

Problems with using the Google AudioSet TFRecords:
- It is unclear what '128-dimension audio features extracted at 1 Hz' actually means
- The difference between frame-level features and video-level features is unclear
- The features are an embedding: can an embedding be reversed ("un-embedded") back into audio?

New tactic:
- Download the actual audio clips for all speech and cough samples from YouTube
- Create a spectrogram (or mel spectrogram) for each clip
- Train the model on these images
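Step 2 would normally be done with an audio library such as librosa. As a minimal sketch of what a log-mel spectrogram involves, the NumPy version below frames the signal with a Hann window, takes the power spectrum, applies triangular mel filters, converts to dB and finally scales the result to an 8-bit image for step 3. The HTK mel formula, `n_fft=1024`, `hop=512`, `n_mels=64` and the 80 dB display range are assumptions, not settings chosen in this project.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge points, evenly spaced on the mel scale, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram_db(y, sr, n_fft=1024, hop=512, n_mels=64):
    """Log-mel spectrogram in dB with shape (n_mels, n_frames)."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))  # zero-pad very short clips
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))     # floor avoids log(0)


def to_image(spec_db, top_db=80.0):
    """Keep the top_db dB below the peak and scale to uint8 0..255."""
    peak = spec_db.max()
    clipped = np.clip(spec_db, peak - top_db, peak)
    return np.round((clipped - (peak - top_db)) / top_db * 255).astype(np.uint8)


# Usage: one second of a 440 Hz tone at 16 kHz
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
img = to_image(mel_spectrogram_db(tone, sr))
print(img.shape)  # (64, 30)
```

Each column of the image is one 1024-sample frame and each row one mel band, so a 10-second AudioSet clip would give a fixed-height image whose width depends only on the clip length and hop size.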

In [48]:
import pandas as pd
from random import shuffle

# Get all desired YouTubeIDs

Let's find the index and label code (the "mid") associated with 'Cough' and 'Speech':

In [16]:
cough_class_label_index = !grep Cough AudioSet/class_labels_indices.csv
print(cough_class_label_index)

speech_class_label_index = !grep Speech AudioSet/class_labels_indices.csv
print(speech_class_label_index)


['47,/m/01b_21,"Cough"']
['0,/m/09x0r,"Speech"', '7,/m/0brhx,"Speech synthesizer"']


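Retrieving the class label, i.e. the mid in the second comma-separated field. Because grep Speech also matches "Speech synthesizer", the first match is used: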

In [18]:
print("Cough:", cough_class_label_index[0].split(",")[1])
print("Speech:", speech_class_label_index[0].split(",")[1])

Cough: /m/01b_21
Speech: /m/09x0r
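grep matches substrings, which is why "Speech synthesizer" turned up as well. A minimal sketch of an exact lookup using Python's csv module, assuming the file's header row is index,mid,display_name; csv also copes with quoted display names that themselves contain commas. The sample rows are the grep output above.

```python
import csv


def load_mids(lines):
    """Map display_name -> mid from the lines of class_labels_indices.csv.

    Dictionary lookup is an exact match, so 'Speech' no longer
    also picks up 'Speech synthesizer'.
    """
    return {row["display_name"]: row["mid"] for row in csv.DictReader(lines)}


# With the real file:
# with open("AudioSet/class_labels_indices.csv", newline="") as f:
#     mids = load_mids(f)
sample = [
    "index,mid,display_name\n",
    '0,/m/09x0r,"Speech"\n',
    '7,/m/0brhx,"Speech synthesizer"\n',
    '47,/m/01b_21,"Cough"\n',
]
mids = load_mids(sample)
print(mids["Cough"], mids["Speech"])  # /m/01b_21 /m/09x0r
```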


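Finding all samples with the selected label. The cut -c -11 step keeps the first 11 characters of each matching row, which is the YouTube ID, and the number of clips found in each split is printed.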

In [33]:
# coughs = !grep /m/01b_21 AudioSet/balanced_train_segments.csv |head  # ID manually inserted
# print(type(coughs))
# print(coughs)

coughs_bal = !grep /m/01b_21 AudioSet/balanced_train_segments.csv | cut -c -11
print("Cough samples in Balanced Train:", len(coughs_bal))

speech_bal = !grep /m/09x0r AudioSet/balanced_train_segments.csv | cut -c -11
print("Speech samples in Balanced Train:", len(speech_bal))


coughs_eval = !grep /m/01b_21 AudioSet/eval_segments.csv | cut -c -11
print("\nCough samples in Evaluation:", len(coughs_eval))

speech_eval = !grep /m/09x0r AudioSet/eval_segments.csv | cut -c -11
print("Speech samples in Evaluation:", len(speech_eval))


coughs_unbal = !grep /m/01b_21 AudioSet/unbalanced_train_segments.csv | cut -c -11
print("\nCough samples in Unbalanced Train:", len(coughs_unbal))

speech_unbal = !grep /m/09x0r AudioSet/unbalanced_train_segments.csv | cut -c -11
print("Speech samples in Unbalanced Train:", len(speech_unbal))
      
print("\nTotal Number of Cough Clips:", (len(coughs_bal)+len(coughs_eval)+len(coughs_unbal)))
print("\nTotal Number of Speech Clips:", (len(speech_bal)+len(speech_eval)+len(speech_unbal)))

Cough samples in Balanced Train: 60
Speech samples in Balanced Train: 5735

Cough samples in Evaluation: 60
Speech samples in Evaluation: 5324

Cough samples in Unbalanced Train: 751
Speech samples in Unbalanced Train: 999421

Total Number of Cough Clips: 871

Total Number of Speech Clips: 1010480
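The same counts can be produced in Python while checking the label list field by field rather than matching anywhere in the line. A minimal sketch, assuming the segments CSV layout is comment lines starting with '#' followed by rows of YTID, start_seconds, end_seconds, "mid1,mid2,...". The sample rows and the IDs example0001/example0002 are made up for illustration. It also shows that one clip can carry both labels, which matters once the two lists are combined.

```python
import csv


def read_segments(lines):
    """Yield (ytid, start, end, labels) from an AudioSet segments CSV.

    Assumed layout: '#' comment lines, then rows of
    YTID, start_seconds, end_seconds, "mid1,mid2,...".
    """
    data = (line for line in lines if not line.startswith("#"))
    # skipinitialspace lets csv see the quote after ", " as a quoted field
    for row in csv.reader(data, skipinitialspace=True):
        if not row:
            continue
        ytid, start, end, labels = row
        yield ytid, float(start), float(end), labels.split(",")


def ids_with_label(lines, mid):
    """YouTube IDs whose positive_labels contain `mid` as a whole entry."""
    return [ytid for ytid, _, _, labels in read_segments(lines) if mid in labels]


# With the real file:
# with open("AudioSet/eval_segments.csv", newline="") as f:
#     coughs_eval = ids_with_label(f, "/m/01b_21")
sample = [
    "# YTID, start_seconds, end_seconds, positive_labels\n",
    'example0001, 30.000, 40.000, "/m/09x0r"\n',
    'example0002, 0.000, 10.000, "/m/01b_21,/m/09x0r"\n',
]
coughs = ids_with_label(sample, "/m/01b_21")
speech = ids_with_label(sample, "/m/09x0r")
print(set(coughs) & set(speech))  # {'example0002'}
```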


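Rather than keep AudioSet's predefined splits, pool the clips from all three files and shuffle them, so that we can create our own Train : Evaluation : Validation split and maximise the number of clips used.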

In [54]:
# combine the lists of youtube ids into one master list
all_coughs = (coughs_bal + coughs_eval + coughs_unbal)
all_speech = (speech_bal + speech_eval + speech_unbal)

# Randomise the YouTubeIDs
shuffle(all_coughs)
shuffle(all_speech)

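# create pandas dataframe of YouTubeIDs
df = pd.DataFrame()  # create empty dataframe
df['Coughs'] = pd.Series(all_coughs)
# Assignment aligns on the index, so 'Speech' is cut to the first
# len(all_coughs) shuffled speech IDs; the remaining speech IDs are dropped
df['Speech'] = pd.Series(all_speech)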


print(df)

          Coughs       Speech
0    1JxnLOOUHaw  dHKD6FXDxg0
1    BG9oq6EH8xE  osn7Hee24kA
2    nkwMqCHldFo  3dFUns31Mkk
3    nBuw_KZXT_k  505i0nOvAPs
4    x2ZyKE89nzw  AE1FBljSwNQ
5    _zrAnhgYzSo  lMNzvTZ7WtE
6    o8zJ_AjJ388  DMnc6AZ0BuI
7    TjP-9AlPShg  UXKvTTa_p60
8    HIqJwH6AD9k  M7rZXTGds9Q
9    -yJtuj9EuMg  r_m5Udy3D5U
10   6OatUcXF4nk  F1LmbGIbZWw
11   FTePTiRR_tA  w-Fss5LOvY8
12   FvvtH4qSZKg  0UmU9qm8pAw
13   RFeU64gTvGQ  cRIVfvnJtyY
14   Wqvuk_-8l9c  JHzSp3VfB6Y
15   _dfcXBTcmqU  bE2C6PpSabI
16   OSlYn9hTRFA  2jEF6mvJEW8
17   eAS16z7mRdk  phhnHR_YXEk
18   4_0uUL2HPe0  yZc8sp4qYCU
19   B-aFrS47bY8  1EQl3i2hAcY
20   dFmT9hZ5sbA  ThtbIepQZQo
21   ZBzNxwQk9XE  cGL4jOVL09A
22   RWlyM4veldE  C4SVOn7EEUc
23   U53w21v9s8o  KDInBcnKowE
24   cL5zgrlRceI  SmrhZsRRNPA
25   Eq_VLzVJSIo  QQfcF9Q064w
26   YsUCzO0gfro  ETtU4pSE1H8
27   a0WbXTZ5xXE  xbvn55UWVaQ
28   YR5rkWyzUkQ  h8nNRqUce8g
29   BXLIBedwtKE  Yb9y3P4ZItA
..           ...          ...
841  4txOUgXllWE  MnQMzH2JLJk
842  gBAhZ
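The split itself is not made yet. A minimal sketch of one way to do it, with a seeded shuffle so the split is reproducible; the 80 : 10 : 10 proportions, the seed and the function name split_ids are assumptions for illustration. Duplicates are dropped first, and IDs that appear under both labels would ideally be removed from both classes before splitting.

```python
import random


def split_ids(ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle unique IDs reproducibly and cut them into train/validation/evaluation.

    The evaluation set takes the remainder, so no ID is lost to rounding.
    """
    unique = sorted(set(ids))  # drop duplicates; sorting makes the seed fully fix the order
    random.Random(seed).shuffle(unique)
    n_train = int(len(unique) * train_frac)
    n_val = int(len(unique) * val_frac)
    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    evaluation = unique[n_train + n_val:]
    return train, val, evaluation


# Usage with the notebook's lists (clips labelled both cough and speech removed first):
# overlap = set(all_coughs) & set(all_speech)
# cough_train, cough_val, cough_eval = split_ids([i for i in all_coughs if i not in overlap])
```

Splitting each class separately keeps the cough/speech ratio the same in all three sets; since speech outnumbers cough by roughly a thousand to one, the speech IDs may also need subsampling before training.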

# Find Associated YouTube clip for each YouTubeID

Downloading only the audio track with youtube-dl from a Python script is covered here: https://stackoverflow.com/questions/27473526/download-only-audio-from-youtube-video-using-youtube-dl-in-python-script
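A minimal sketch along the lines of that answer, assuming the youtube_dl package and ffmpeg are installed; the output folder name, the WAV codec and the helper names are assumptions. Many AudioSet videos have since been removed or made private, so failures are collected rather than stopping the loop.

```python
import os


def youtube_url(ytid):
    return "https://www.youtube.com/watch?v=" + ytid


def ydl_options(out_dir):
    """youtube-dl options: best available audio, converted to WAV by ffmpeg."""
    return {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),  # file named after the YouTube ID
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
        "quiet": True,
    }


def download_all(ytids, out_dir="audio"):
    """Download each video's audio track; return the IDs that failed."""
    import youtube_dl  # imported here so the helpers above work without it
    from youtube_dl.utils import DownloadError

    failed = []
    with youtube_dl.YoutubeDL(ydl_options(out_dir)) as ydl:
        for ytid in ytids:
            try:
                ydl.download([youtube_url(ytid)])
            except DownloadError:  # e.g. removed or private videos
                failed.append(ytid)
    return failed


# failed = download_all(df['Coughs'])
```

This downloads the whole video's audio, so each file still has to be cut down to the 10-second segment AudioSet actually labels.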

# Find Start and End Time for each YouTubeID

In [None]:
# Match up ID to row in the CSV
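One possible way to fill in this cell: a minimal sketch that maps each YTID to its start and end seconds from a segments CSV, then builds an ffmpeg command that cuts that segment out of the downloaded file. It assumes the same CSV layout as the files grepped above, that each YTID appears once per file, and that ffmpeg is on the PATH; the helper names, file paths and example0001 ID are hypothetical.

```python
import csv
import subprocess


def segment_times(lines):
    """Map YTID -> (start_seconds, end_seconds) from an AudioSet segments CSV.

    Assumed layout: '#' comment lines, then rows of
    YTID, start_seconds, end_seconds, "labels".
    """
    times = {}
    data = (line for line in lines if not line.startswith("#"))
    for row in csv.reader(data, skipinitialspace=True):
        if row:
            times[row[0]] = (float(row[1]), float(row[2]))
    return times


def trim_command(in_path, out_path, start, end):
    """ffmpeg arguments that keep `end - start` seconds beginning at `start`."""
    return ["ffmpeg", "-y", "-i", in_path,
            "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}", out_path]


def trim(in_path, out_path, start, end):
    subprocess.run(trim_command(in_path, out_path, start, end), check=True)


# with open("AudioSet/eval_segments.csv", newline="") as f:
#     times = segment_times(f)
# start, end = times[ytid]
# trim(f"audio/{ytid}.wav", f"clips/{ytid}.wav", start, end)
```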