# Working with frequency data - An introduction

Most natural and social phenomena have an element of periodicity: a tendency to repeat in semi-regular ways. Examples range from human daily activity levels, to climate behaviour, to the information contained in audio data.

Before you can analyse this kind of data, you need to understand how to manipulate and visualise it. Today's short lesson looks at how we can represent whale song visually, and whether there are any trends that we can use machine learning to extract.

This lesson is a dive into the deeper end of 

First, Python on its own cannot easily handle audio data or the more complex maths we want to carry out, so we need to import some libraries to extend the functionality of the language.

In [17]:
import pandas as pd  # Holds our data in easy-to-handle DataFrames
import numpy as np  # Provides additional mathematical support for handling our data
import matplotlib.pyplot as plt  # Plotting, for visualising the audio
from scipy.io import wavfile  # Lets us read .wav audio files

%matplotlib notebook

AttributeError: module 'matplotlib' has no attribute 'cbook'

Now that we've imported the libraries we need, let's pull in our audio file. Once we give it a path to the file, `wavfile.read` returns two things: the recording rate of the file (in samples per second) and the audio data itself.
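If you don't have the recording to hand, the same call can be tried on a small synthetic file first. The sketch below is only an illustration: the 8000 Hz rate, the 440 Hz tone and the `demo_tone.wav` file name are all made-up values, not properties of the whale recording.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Hypothetical values for illustration only; not taken from the whale recording.
demo_rate = 8000                      # samples per second
t = np.arange(demo_rate) / demo_rate  # one second of time stamps

# Two channels: a 440 Hz tone on the left, silence on the right.
left = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
right = np.zeros_like(left)
stereo = np.column_stack([left, right])  # shape (n_samples, n_channels)

# Write the demo file to a temporary folder, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_tone.wav")
wavfile.write(demo_path, demo_rate, stereo)

read_rate, read_data = wavfile.read(demo_path)
print(read_rate)        # the rate we wrote
print(read_data.shape)  # (samples, channels)
print(read_data.dtype)  # integer samples, as stored in the file
```

The data comes back with one row per sample and one column per channel, which is the same layout we will see for the whale recording.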

In [10]:
recording_rate, audio_data = wavfile.read("./Data/Humpback whale song from Monterey Bay.wav")
audio_data.shape

(5297152, 2)

We can check our data with `.shape`, which tells us that we have 5297152 data points. The second dimension is 2, though, which means we have two channels: stereo data. As we don't need to worry about handling both streams independently, let's average them into a single mono signal instead.

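We need to specify that we want the average across the two channels (`axis=1`). With `axis=0` the mean would be taken down each column instead, leaving us with the average of the entire clip per channel, and with no `axis` at all `np.mean` returns a single number for the whole array. Neither is very useful here.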

We'll then check the shape again to see if it worked.

In [11]:
audio_data = np.mean(audio_data, axis=1)  # Average the two channels into one mono signal
audio_data.shape

(5297152,)
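To make the effect of `axis` concrete, here is a minimal sketch on a made-up array of three samples and two channels. The numbers are invented purely so that the averages are easy to check by hand.

```python
import numpy as np

# A made-up stereo clip: 3 samples (rows) by 2 channels (columns).
toy = np.array([[1.0, 3.0],
                [2.0, 6.0],
                [3.0, 9.0]])

mono = np.mean(toy, axis=1)         # across channels: one value per sample
per_channel = np.mean(toy, axis=0)  # down each column: one value per channel
overall = np.mean(toy)              # no axis: one number for the whole array

print(mono)         # [2. 4. 6.]
print(per_channel)  # [2. 6.]
print(overall)      # 4.0
```

Only `axis=1` keeps one value per sample, which is why the shape of our clip goes from `(5297152, 2)` to `(5297152,)`. Note that `np.mean` returns floating-point values even when the samples are stored as integers.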

In [12]:
audio_data

array([0., 0., 0., ..., 0., 0., 0.])
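NumPy abbreviates long arrays with `...`, so the printout above only tells us that the first and last few samples are zero, not that the whole clip is silent. A few summary values give a better picture. The sketch below uses a stand-in array (silence, a short tone, then silence again) with an assumed rate of 8000 samples per second, since the actual recording rate isn't shown here.

```python
import numpy as np

# Stand-in for the mono clip: silence, a short tone, then silence again.
# The rate and values are assumptions for illustration, not the real recording.
demo_rate = 8000
tone = np.sin(2 * np.pi * 200 * np.arange(demo_rate) / demo_rate)
demo_audio = np.concatenate([np.zeros(100), tone, np.zeros(100)])

print(demo_audio)                          # abbreviated: only the edge values are shown
print(np.count_nonzero(demo_audio))        # how many samples are non-zero
print(demo_audio.min(), demo_audio.max())  # range of the signal
print(len(demo_audio) / demo_rate)         # clip length in seconds
```

Running the same four lines on `audio_data` and `recording_rate` would show how much of the whale recording is non-silent and how long it lasts.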