# "Getting Started with Audio data in Python"
> "In this article, I will explain some of the modules that can be used to load and manipulate audio files in Python."

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [data, audio, file, sound, functions, wave, dockship, article, speech, recognition]
- hide: false

Audio data is becoming more and more common in the field of data science.

In this article, I am going to explain how to load, manipulate and store audio data in Python.

There are different kinds of audio formats:

1. mp3
2. wav
3. flac
4. m4a

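Digital audio is stored as a sequence of samples, and its sampling rate is measured in hertz (Hz).

A sampling rate of 1 kHz means 1,000 samples are stored per second; how many bytes that takes depends on the sample width and the number of channels.

Even a short audio clip therefore contains thousands of samples.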
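The link between sampling rate and storage size can be sketched with a small helper. It assumes uncompressed PCM audio, as stored in WAV files; compressed formats such as mp3 or flac need fewer bytes. The name `bytes_per_second` is just for illustration.

```python
def bytes_per_second(sample_rate_hz, sample_width_bytes, n_channels):
    """Bytes needed to store one second of uncompressed (PCM) audio."""
    return sample_rate_hz * sample_width_bytes * n_channels


# 8 kHz, 16-bit (2-byte) samples, mono
print(bytes_per_second(8000, 2, 1))   # 16000
# 44.1 kHz, 16-bit samples, stereo (CD quality)
print(bytes_per_second(44100, 2, 2))  # 176400
```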

Let's dive into some code.

```python
# Required modules
import wave

import numpy as np
from matplotlib import pyplot as plt
```

`filename` is the path of the audio file to load. Note that the `wave` module only reads uncompressed (PCM) WAV files, not formats such as mp3, flac or m4a.

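```python
# Path to the WAV file to load (hypothetical name, replace with your own file)
filename = "audio.wav"

# Loading audio file as wave object
gm_wave = wave.open(filename, 'r')
```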

```python
# Read all the frames of the wave object as raw bytes
gm_bytes = gm_wave.readframes(gm_wave.getnframes())
```

```python
# Output of the variable
gm_bytes

# Output
# b'\x04\xbb\x05\x86\t\x10\x06\x82\r\xe4\x06\xda\x0e...'
```

The variable `gm_bytes` holds raw bytes, which are not human readable. So let's convert them into numbers.

For that we can use NumPy's `frombuffer()` function.

```python
# Convert the raw bytes into an array of 16-bit integer samples
signal_gm = np.frombuffer(gm_bytes, dtype='int16')
signal_gm[:10]

# Output
# array([ -3, -5, -8, -8, -9, -13, -8, -10, -9, -11], dtype=int16)
```
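`dtype='int16'` is only correct when the file stores 16-bit samples (`getsampwidth()` returns 2), and in a stereo file the left and right samples are interleaved in the byte string. Below is a minimal sketch of a more general decoder, assuming little-endian PCM WAV data with 8-, 16- or 32-bit samples; the helper name `bytes_to_samples` is hypothetical.

```python
import numpy as np

# WAV sample width in bytes -> NumPy dtype. 8-bit WAV samples are unsigned;
# 16- and 32-bit samples are signed little-endian integers. 24-bit is not handled.
WAV_DTYPES = {1: np.uint8, 2: "<i2", 4: "<i4"}


def bytes_to_samples(raw, sample_width, n_channels):
    """Decode raw PCM frames into an array of shape (n_frames, n_channels)."""
    if sample_width not in WAV_DTYPES:
        raise ValueError(f"unsupported sample width: {sample_width}")
    samples = np.frombuffer(raw, dtype=WAV_DTYPES[sample_width])
    # Frames are stored as L, R, L, R, ... so reshape to one column per channel
    return samples.reshape(-1, n_channels)


# Two stereo frames of 16-bit audio: (left=1, right=-1) and (left=2, right=-2)
raw = np.array([1, -1, 2, -2], dtype="<i2").tobytes()
print(bytes_to_samples(raw, sample_width=2, n_channels=2).tolist())  # [[1, -1], [2, -2]]
```

For the clip in this article you could call `bytes_to_samples(gm_bytes, gm_wave.getsampwidth(), gm_wave.getnchannels())`.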

The wave object also has several methods for getting the characteristics of the audio file, such as `getframerate()`, `getnchannels()`, `getsampwidth()` and `getnframes()`.

The frame rate is the number of frames per second, i.e. the sampling rate in Hz. Each frame holds one sample per channel.

```python
# Get the frame rate
framerate_gm = gm_wave.getframerate()

# Show the frame rate
framerate_gm

# Output
# 8000
```
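All of these properties can also be read at once with `getparams()`, and the duration follows from them as frames divided by frame rate. This sketch builds a made-up 1-second, 8 kHz, 16-bit mono tone in memory (an assumption, standing in for a real file) so it runs without anything on disk.

```python
import io
import math
import struct
import wave

# Made-up 1-second, 8 kHz, 16-bit mono 440 Hz tone, written to an in-memory file
RATE = 8000
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE)]
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes = 16 bits per sample
    w.setframerate(RATE)
    w.writeframes(struct.pack(f"<{len(tone)}h", *tone))

buf.seek(0)  # rewind before reading the file back
with wave.open(buf, "rb") as wav:
    params = wav.getparams()  # named tuple with every header field
    duration = params.nframes / params.framerate

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)  # 1 2 8000 8000
print(f"{duration} s")  # 1.0 s
```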

**Now it's time to visualize the sound wave.**

First we need a timestamp for each sample. For that we use the `np.linspace()` function, which creates an evenly spaced NumPy array.

Syntax: `np.linspace(start, stop, num)` returns `num` evenly spaced floating-point values from `start` to `stop`, with both endpoints included by default.
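A quick check of how `num` and the optional `endpoint` argument behave, using small numbers chosen for illustration:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1; the stop value is included by default
print(np.linspace(start=0, stop=1, num=5).tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# With endpoint=False the stop value is excluded, so the spacing is (stop - start) / num
print(np.linspace(start=0, stop=1, num=4, endpoint=False).tolist())  # [0.0, 0.25, 0.5, 0.75]
```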

```python
# Creating timestamp values, one per sample
time_gm = np.linspace(start=0, stop=len(signal_gm)/framerate_gm, num=len(signal_gm))
time_gm

# Output
# array([0.00000000e+00, 1.25000117e-04, 2.50000233e-04, ...,
#        1.34118250e+02, 1.34118375e+02, 1.34118500e+02])
```

The code above creates an array of timestamps, one per sample. The last value of the array is the duration of the audio clip in seconds.
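Because `np.linspace()` includes both endpoints, its spacing is `duration / (n - 1)`, slightly more than one sample period (hence `1.25000117e-04` rather than 1/8000 = `1.25e-04` in the output above). A sketch of an alternative, assuming you want exactly one sample period between timestamps: divide `np.arange()` by the frame rate. For multi-channel audio, pass the number of frames (`getnframes()`) rather than `len(signal_gm)`, because the signal then holds one sample per channel per frame. The helper name `sample_times` is hypothetical.

```python
import numpy as np


def sample_times(n_frames, framerate):
    """Start time in seconds of each frame: n / framerate for n = 0 .. n_frames - 1."""
    return np.arange(n_frames) / framerate


times = sample_times(8000, 8000)  # one second of audio at 8 kHz
print(times[1] == 1 / 8000)       # True: samples are exactly 1/framerate apart
print(times[-1])                  # 0.999875, the start of the last sample
```

With this approach the clip's duration is `n_frames / framerate`, one sample period after the last timestamp.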

Plotting the audio file

```python
plt.title("Audio Clip")

plt.plot(time_gm, signal_gm)

# x and y axis labels
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")

# show our plot
plt.show()
```

Apart from `wave`, there are also third-party libraries and toolkits for processing speech audio:

1. CMU Sphinx
2. SpeechRecognition
3. Kaldi
4. Wav2letter++

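```python
# Import the module (installed with: pip install SpeechRecognition)
import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Minimum audio energy to consider as speech (the library default is 300)
recognizer.energy_threshold = 350
```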

The `Recognizer` class has several built-in methods for converting speech audio into text, each using a different speech recognition service:

1. recognize_bing()
2. recognize_google()
3. recognize_google_cloud()
4. recognize_ibm()
5. recognize_wit()
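6. recognize_houndify()

Input: audio data (an `AudioData` object)

Output: transcribed text

**Note:** Some of these API calls require credentials. `recognize_google()` works without them because it falls back to a generic API key intended for personal or testing use.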

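```python
# Transcribe using the Google Speech Recognition API (needs an internet connection).
# audio_data must be an sr.AudioData object, created with recognizer.record() as shown below
text = recognizer.recognize_google(audio_data=audio_data, language='en-US')
```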

If the audio is in a different language, pass the corresponding language tag (for example `'fr-FR'` for French) as the `language` argument.

Otherwise the default, `'en-US'`, is used and the audio is transcribed as US English.

Creating an `AudioFile` object with the `sr` module

```python
# Read in the audio file (AudioFile needs a path or a file object)
audio = sr.AudioFile(filename)

# Check type of audio
type(audio)

# Output
# <class 'speech_recognition.AudioFile'>
```

If we pass the `audio` variable to any of the recognize functions, it will raise an error, because they only accept `AudioData` objects.

So first we need to convert the `AudioFile` to `AudioData`.

```python
# Convert from AudioFile to AudioData
with audio as src:
    audio_data = recognizer.record(src)

# Check the type
type(audio_data)

# Output
# <class 'speech_recognition.AudioData'>
```

In the snippet above, `record()` reads the whole audio file into an `AudioData` object.

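```python
# Load the clean support call recording (hypothetical file name)
clean_support_call = sr.AudioFile("clean_support_call.wav")

# Leave duration and offset as default
with clean_support_call as source:
    clean_support_call_audio = recognizer.record(source, duration=None, offset=None)
```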

`record()` also takes two optional arguments, `duration` and `offset`.

`duration` is the number of seconds of audio to record, and `offset` is the number of seconds into the file at which recording starts. Both default to `None`, which records the whole file from the beginning.

An audio file can also contain non-speech sounds, e.g. a lion's roar or a dog barking.

If you pass such audio to `recognize_google()`, it raises an **UnknownValueError**, because there is no speech to transcribe.

```python
# Load the leopard roar audio file
leopard_roar = sr.AudioFile("leopard_roar.wav")

# Convert the AudioFile to AudioData
with leopard_roar as source:
    leopard_roar_audio = recognizer.record(source)

# Recognize the AudioData (raises sr.UnknownValueError, as there is no speech)
recognizer.recognize_google(leopard_roar_audio)
```

Background noise causes similar problems: if you have trouble hearing what is said in an audio file, the API will probably have trouble too.

```python
# Load an audio file with background noise
noisy_support_call = sr.AudioFile("noisy_support_call.wav")

with noisy_support_call as source:
    # Calibrate for ambient noise using the first 0.5 s, then record the rest
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    noisy_support_call_audio = recognizer.record(source)

# Recognize the audio
recognizer.recognize_google(noisy_support_call_audio)
```

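The code above calls `adjust_for_ambient_noise()`, which listens to the first 0.5 seconds of the clip and adjusts the recognizer's `energy_threshold` based on that background noise level. It does not filter the noise out of the audio itself.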