# Explore audio sampling rate

**Author:** Fábio Paraíso

**Achievement:** Evaluated the sampling rates of all the audio files to check whether they were equal, and developed a loop to standardize the sampling rates.


## Introduction

In this notebook, I evaluate the sampling rates of all the audio files. This lets me verify whether the sampling rate needs to be standardized to make the data uniform.

Most of this evaluation is based on this [post](https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5).

In [1]:
%load_ext watermark
%watermark

Last updated: 2021-07-12T09:42:09.016326+01:00

Python implementation: CPython
Python version       : 3.9.5
IPython version      : 7.25.0

Compiler    : MSC v.1928 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
CPU cores   : 4
Architecture: 64bit



### Import modules

In [2]:
import os

import pandas as pd
from torchaudio import transforms
import torch

from speech import data

### Load the metadata

In [3]:
metadata_file = '../../data/UrbanSound8K/metadata/UrbanSound8K.csv'
metadata_df = pd.read_csv(metadata_file)

In [4]:
metadata_df.head()

|   | slice_file_name | fsID | start | end | salience | fold | classID | class |
|---|---|---|---|---|---|---|---|---|
| 0 | 100032-3-0-0.wav | 100032 | 0.0 | 0.317551 | 1 | 5 | 3 | dog_bark |
| 1 | 100263-2-0-117.wav | 100263 | 58.5 | 62.5 | 1 | 5 | 2 | children_playing |
| 2 | 100263-2-0-121.wav | 100263 | 60.5 | 64.5 | 1 | 5 | 2 | children_playing |
| 3 | 100263-2-0-126.wav | 100263 | 63.0 | 67.0 | 1 | 5 | 2 | children_playing |
| 4 | 100263-2-0-137.wav | 100263 | 68.5 | 72.5 | 1 | 5 | 2 | children_playing |


Since in this case I'm only studying the sampling rate of the audio files, I need to build each audio file path from the fold number and the file name.

In [5]:
metadata_df['path'] = 'fold' + metadata_df['fold'].astype(str) + '/' + metadata_df['slice_file_name']

After creating the path, I keep only the necessary data, which in this case is the audio path and the class ID of the sound in the file.

In [6]:
audio = metadata_df[['path', 'classID']]

I then read all the audio files and get the sampling rate of each one.

In [21]:
audio_folder = '../../data/UrbanSound8K/audio/'
sampling_rate = []

for audio_path in audio.path:
    audio_full_path = os.path.join(audio_folder, audio_path)
    _, sr = data.load_audio(audio_full_path)
    sampling_rate.append(sr)
    
audio = audio.assign(sampling_rate = sampling_rate)
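
As a side note, the sampling rate of an uncompressed WAV file is stored in its header, so it can be read without decoding the samples. The sketch below uses only Python's standard `wave` module. The helper name `read_wav_sampling_rate` is hypothetical and is not the `data.load_audio` function used above. The `wave` module only handles plain PCM files, so it may fail on some files in the dataset.

```python
import os
import tempfile
import wave


def read_wav_sampling_rate(path):
    """Return the sampling rate stored in a PCM WAV header (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Self-contained demo: write one second of 16-bit mono silence at 22050 Hz,
# then read the sampling rate back from the header.
demo_rate = 22050
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.wav")
    with wave.open(demo_path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(demo_rate)
        wav_file.writeframes(b"\x00\x00" * demo_rate)
    detected_rate = read_wav_sampling_rate(demo_path)

print(detected_rate)
```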

In [23]:
audio.sampling_rate.value_counts()

44100     5370
48000     2502
96000      610
24000       82
16000       45
22050       44
11025       39
192000      17
8000        12
11024        7
32000        4
Name: sampling_rate, dtype: int64
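
To put these counts in perspective, the short sketch below computes how many files would need resampling if 44100 Hz were chosen as the common rate. The counts are copied from the output above. The 44100 Hz target is an assumption taken from the `base_sampling_rate` used later in this notebook.

```python
# Counts copied from the value_counts() output above.
sampling_rate_counts = {
    44100: 5370,
    48000: 2502,
    96000: 610,
    24000: 82,
    16000: 45,
    22050: 44,
    11025: 39,
    192000: 17,
    8000: 12,
    11024: 7,
    32000: 4,
}

# Assumption: the most common rate is used as the target.
target_rate = 44100

total_files = sum(sampling_rate_counts.values())
files_to_resample = total_files - sampling_rate_counts[target_rate]
share_to_resample = files_to_resample / total_files

print(total_files, files_to_resample, f"{share_to_resample:.1%}")
```

With these counts, 3362 of the 8732 files (about 38.5%) are not at 44100 Hz.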

As expected, not all the files have the same sampling rate. However, to feed these audio files into a model, the sampling rate must be standardized.

To do this, I test the resampling on the audio files with the highest sampling rate (192000).

In [50]:
audio_sample = audio.query('sampling_rate == 192000')
base_sampling_rate = 44100

for audio_path in audio_sample.path:
    audio_full_path = os.path.join(audio_folder, audio_path)
    sig, sr = data.load_audio(audio_full_path)

    # Resample the first channel
    resample = transforms.Resample(sr, base_sampling_rate)
    new_sig = resample(sig[:1, :])
    if sig.shape[0] > 1:
        # Resample the remaining channel(s) and stack them with the first
        sig_two = resample(sig[1:, :])
        new_sig = torch.cat((new_sig, sig_two))

# Show the last resampled signal together with its new sampling rate
(new_sig, base_sampling_rate)

(tensor([[-6.7109e-06, -4.3835e-03, -8.4153e-03,  ..., -2.4607e-02,
         -2.4959e-02, -2.5384e-03],
        [-6.7109e-06, -4.3835e-03, -8.4153e-03,  ..., -2.4607e-02,
         -2.4959e-02, -2.5384e-03]]), 44100)


The audio now has a standardized sampling rate and can be used in the next phases.
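
As a quick sanity check on the resampling step, the number of samples should shrink by the ratio of the two rates while the duration stays the same. The sketch below is plain arithmetic, not torchaudio's implementation. The helper name `resampled_length` is hypothetical, and the exact length produced by a real resampler may differ by a sample depending on how it rounds.

```python
import math
from fractions import Fraction


def resampled_length(num_samples, orig_rate, new_rate):
    """Approximate sample count after resampling: ceil(num_samples * new_rate / orig_rate)."""
    return math.ceil(num_samples * Fraction(new_rate, orig_rate))


orig_rate = 192000
new_rate = 44100

# The rate ratio reduces to a small fraction.
ratio = Fraction(new_rate, orig_rate)

# Illustrative clip: 4 seconds recorded at 192000 Hz.
num_samples = 4 * orig_rate
new_num_samples = resampled_length(num_samples, orig_rate, new_rate)

# Duration before and after resampling, in seconds.
duration_before = num_samples / orig_rate
duration_after = new_num_samples / new_rate

print(ratio, num_samples, new_num_samples, duration_before, duration_after)
```

Here the ratio reduces to 147/640, so a 4-second clip goes from 768000 samples to 176400 samples and still lasts 4 seconds.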