# Rainforest Connection Data Exploration 🦜
Hey, welcome to my notebook :)

I'll be trying to extract useful information from the dataset and describing my thought process along the way.
Let's get into it!

*Note:* A lot of this information is new to me, and I'm always open to suggestions! I'll leave space at the top here to list where (and from whom) I've gathered some of my knowledge. Feel free to expand the section below if interested! Remember that open-source is a beautiful thing, so long as people share their knowledge AND cite their inspiration :)

### Resources/Citations:
* Valerio Velardo has an amazing series on YouTube regarding deep learning and audio analysis. Much of my inspiration comes from his tutorial videos, and he deserves the attention way more than me! Please check out his videos if you're interested in learning more: https://www.youtube.com/playlist?list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf

## Importing Data and Loading Modules

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import librosa as lb # audio processing
import librosa.display # cool audio visuals
import matplotlib.pyplot as plt # to support librosa display
import IPython.display as ipd # for playing audio

import os

# View files if interested
# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

## Displaying a Waveform

In [None]:
file = '/kaggle/input/rfcx-species-audio-detection/train/003b04435.flac'
signal, sr = lb.load(file, sr=22050) # load file into librosa with sample rate
lb.display.waveshow(signal, sr=sr) # waveshow replaces waveplot, which was removed in librosa 0.10
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

ipd.Audio(file) # to listen to the audio

Isn't that neat?! This is the waveform of the audio in the time-domain. In its current state it shows us the amplitude of the wave as it progresses over time, but not much more. In order to get more information on this audio, we have to apply some transformations.

To switch from the time-domain to the frequency-domain, we can apply a Fast Fourier Transform (FFT). For that, we'll use numpy.

## Fast Fourier Transform (FFT)

In [None]:
fft = np.fft.fft(signal)

magnitude = np.abs(fft)
frequency = np.linspace(0, sr, len(magnitude), endpoint=False) # FFT bin k sits at k * sr / N Hz
plt.plot(frequency, magnitude)
plt.xlabel('Frequency')
plt.ylabel('Magnitude')
plt.show()

So cool :)

This gives us the magnitude of all the frequencies in our audio, over the duration of the whole file. 

If you have a keen eye, you might've noticed that this graph is symmetrical! This happens because our audio signal is made of real numbers, which makes the FFT output mirror itself around sr/2. I won't get into the details, but it means the second half of the graph is a duplicate, so we can reduce the graph to only the needed information.
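If you'd like to convince yourself of the symmetry, here's a minimal sketch on a synthetic signal. The 2000 Hz test tone and the `demo_`/`tone_` names are my own assumptions for illustration, not part of the dataset. It also shows `np.fft.rfft`, which returns only the non-duplicated half for real-valued input.

```python
import numpy as np

demo_sr = 22050                          # same sample rate used in this notebook
demo_t = np.arange(demo_sr) / demo_sr    # one second of sample times

# Assumption: a pure 2000 Hz sine stands in for real audio
tone = np.sin(2 * np.pi * 2000 * demo_t)

tone_magnitude = np.abs(np.fft.fft(tone))

# For real input, bin k and bin N-k have equal magnitude,
# so everything after bin 0 reads the same forwards and backwards
is_symmetric = np.allclose(tone_magnitude[1:], tone_magnitude[1:][::-1])

# rfft keeps only the first N//2 + 1 bins (the non-redundant half)
half_spectrum = np.fft.rfft(tone)
half_matches = np.allclose(np.abs(half_spectrum),
                           tone_magnitude[: len(tone) // 2 + 1])

print(is_symmetric, half_matches, len(half_spectrum))
```

Slicing the full FFT in half by hand, as done in the next cell, gives essentially the same picture as calling `np.fft.rfft` directly.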

In [None]:
half = len(frequency) // 2 # keep only the first (non-duplicated) half
left_frequency = frequency[:half]
left_magnitude = magnitude[:half]
plt.plot(left_frequency, left_magnitude)
plt.xlabel('Frequency')
plt.ylabel('Magnitude')
plt.show()  

There we go! You can see that the loudest sounds in this file are at ~2000Hz and ~4200Hz. Maybe this can help give us a better idea of what species are present? 🤔
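Instead of reading the peaks off the graph by eye, you can ask numpy for them. This is only a sketch on a made-up signal: the two tones at 2000 Hz and 4200 Hz (plus a bit of noise) are assumptions chosen to mimic the plot above, and all `demo_` names are hypothetical.

```python
import numpy as np

demo_sr = 22050
demo_t = np.arange(2 * demo_sr) / demo_sr   # two seconds of sample times

# Assumption: two sine tones plus light noise stand in for the recording
rng = np.random.default_rng(0)
demo_signal = (np.sin(2 * np.pi * 2000 * demo_t)
               + 0.5 * np.sin(2 * np.pi * 4200 * demo_t)
               + 0.01 * rng.standard_normal(demo_t.size))

# rfft / rfftfreq give the one-sided spectrum and its matching frequency axis in Hz
demo_magnitude = np.abs(np.fft.rfft(demo_signal))
demo_frequency = np.fft.rfftfreq(demo_signal.size, d=1 / demo_sr)

# Take the two bins with the largest magnitude, loudest first
top_bins = np.argsort(demo_magnitude)[-2:][::-1]
peak_frequencies = demo_frequency[top_bins]

print(peak_frequencies)
```

On real recordings the largest bins often sit right next to each other inside one broad peak, so something like `scipy.signal.find_peaks` is probably a more robust choice than a plain sort.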

## Short Time Fourier Transforms (STFT)
Remember that this is the magnitude of the frequencies throughout the **whole duration of the audio**. A more useful graph would show us what frequencies are present on a time axis. One idea to implement this would be to make a bunch of these frequency-domain graphs for short durations in the audio, and then combine them together to form a time axis. This is what Short Time Fourier Transforms (STFT) do, and the visuals we can produce with this are known as **spectrograms**.
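To make the "bunch of short FFTs" idea concrete, here's a rough numpy-only sketch. It is not how librosa implements its STFT (librosa also pads the signal and uses a slightly different window by default); `simple_stft` and the test signal that jumps from 1000 Hz to 3000 Hz are hypothetical names and values of my own.

```python
import numpy as np

def simple_stft(x, n_fft=2048, hop_length=512):
    """Slice x into overlapping windowed frames and FFT each frame."""
    window = np.hanning(n_fft)                       # taper the edges of each slice
    n_frames = 1 + (len(x) - n_fft) // hop_length    # only full frames, no padding
    frames = np.stack([
        x[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # FFT along each frame, then transpose to (frequency bins, time frames)
    return np.fft.rfft(frames, axis=1).T

demo_sr = 22050
demo_t = np.arange(demo_sr) / demo_sr

# Assumption: 1000 Hz for the first half second, 3000 Hz afterwards
demo_signal = np.where(demo_t < 0.5,
                       np.sin(2 * np.pi * 1000 * demo_t),
                       np.sin(2 * np.pi * 3000 * demo_t))

demo_spectrogram = np.abs(simple_stft(demo_signal))
bin_hz = demo_sr / 2048                              # width of one frequency bin

# Loudest frequency in the first and last time slice
first_peak_hz = demo_spectrogram[:, 0].argmax() * bin_hz
last_peak_hz = demo_spectrogram[:, -1].argmax() * bin_hz

print(demo_spectrogram.shape, first_peak_hz, last_peak_hz)
```

The single whole-file FFT would show both tones at once; here each column only knows about its own slice of time, which is what lets the jump between tones show up.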

In [None]:
n_fft = 2048 # number of samples per FFT (the duration of each slice)
hop_length = 512 # shift (in samples) between consecutive slices

stft = lb.core.stft(signal, hop_length=hop_length, n_fft=n_fft)

spectrogram = np.abs(stft)

lb.display.specshow(spectrogram, sr=sr, hop_length=hop_length)
plt.xlabel('Time')
plt.ylabel('Frequency')
clb = plt.colorbar()
clb.set_label('Amplitude')
plt.show()  

Now we see that frequency is mapped over time, with the color representing the amplitude of the frequency at that time. That's sorta cool... but I can't see much!

Let's apply a logarithm to change our amplitude to decibels. Hopefully we can see better afterwards...
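For reference, the conversion boils down to 20 * log10(amplitude / reference). Here's a simplified sketch of that formula; `to_decibels` is a hypothetical helper of mine, and as far as I know librosa's version additionally clips the dynamic range, which I leave out here.

```python
import numpy as np

def to_decibels(amplitude, ref=1.0, floor=1e-5):
    """Convert amplitudes to dB relative to ref; floor avoids log10(0)."""
    amplitude = np.maximum(np.asarray(amplitude, dtype=float), floor)
    return 20.0 * np.log10(amplitude / ref)

# Each 10x drop in amplitude is a 20 dB drop
demo_amplitudes = np.array([1.0, 0.1, 0.01, 0.001])
demo_decibels = to_decibels(demo_amplitudes)

print(demo_decibels)
```

Because quiet sounds get spread out instead of being squashed near zero, faint details become visible in the plot.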

In [None]:
log_spectrogram = lb.amplitude_to_db(spectrogram)

lb.display.specshow(log_spectrogram, sr=sr, hop_length=hop_length)
plt.xlabel('Time')
plt.ylabel('Frequency')
clb = plt.colorbar()
clb.set_label('Amplitude (dB)')
plt.show()  

Much better! As we deduced from our frequency-domain graph earlier, we can see again where the loudest frequencies are present. But now we can also see how they fluctuate over time, and how different frequencies pop in and out.
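To read the spectrogram's axes, it helps to translate `n_fft` and `hop_length` into Hz and seconds. This is just arithmetic on the values used in this notebook; the variable names on the left are my own.

```python
sr = 22050         # sample rate used when loading the file
n_fft = 2048       # samples per FFT slice
hop_length = 512   # samples between the starts of consecutive slices

frequency_bins = n_fft // 2 + 1      # rows in the spectrogram
bin_width_hz = sr / n_fft            # frequency covered by one row
max_frequency_hz = sr / 2            # highest frequency we can represent (Nyquist)
window_seconds = n_fft / sr          # audio duration inside one slice
hop_seconds = hop_length / sr        # time step between columns

print(frequency_bins, round(bin_width_hz, 2), max_frequency_hz)
print(round(window_seconds, 4), round(hop_seconds, 4))
```

So each row is about 10.77 Hz wide and each column advances by about 23 ms. A larger `n_fft` gives finer frequency detail but blurs things in time, and vice versa.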

## Mel Frequency Cepstral Coefficients (MFCCs)
The last useful features to extract are the Mel Frequency Cepstral Coefficients (MFCCs). They give you information about the timbral/textural aspects of the audio and approximate how the human auditory system perceives sound. They are especially useful in speech recognition, but could prove very important for this competition as well!
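Without going into the full MFCC pipeline, the "Mel" part refers to a frequency scale that is stretched to match how we hear pitch. Here's a small sketch using one common formula for it (the HTK-style one). That choice is an assumption on my part: I believe librosa's default mel scale uses a slightly different formula, and `hz_to_mel`/`mel_to_hz` below are my own helpers, not librosa calls.

```python
import numpy as np

def hz_to_mel(hz):
    """HTK-style mel scale: roughly linear at low Hz, logarithmic higher up."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Six points equally spaced on the mel scale between 0 Hz and 11025 Hz
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(11025.0), 6)
hz_points = mel_to_hz(mel_points)

# Equal steps in mel become wider and wider steps in Hz
band_widths_hz = np.diff(hz_points)

print(np.round(hz_points))
print(np.round(band_widths_hz))
```

The growing band widths reflect that our ears separate low frequencies much better than high ones, which is the "human auditory system" part of the description above.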

In [None]:
MFCCs = lb.feature.mfcc(y=signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mfcc=13) # 13 coefficients; newer librosa versions require keyword arguments

lb.display.specshow(MFCCs, sr=sr, hop_length=hop_length)
plt.xlabel('Time')
plt.ylabel('MFCC')
clb = plt.colorbar()
clb.set_label('Coefficient value')
plt.show()  

Groovy!

## Conclusion 
For now, I won't get into how this transformation is performed or how it can be used in machine learning. If enough people are interested, I'll continue this series. Let me know!

See what you can find, and share your new-found knowledge with the community! Happy coding :)