In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# /kaggle/input/birdclef-2022/sample_submission.csv
# /kaggle/input/birdclef-2022/scored_birds.json
# /kaggle/input/birdclef-2022/eBird_Taxonomy_v2021.csv
# /kaggle/input/birdclef-2022/test.csv
# /kaggle/input/birdclef-2022/train_metadata.csv
# /kaggle/input/birdclef-2022/train_audio/bongul/XC516224.ogg

## ⬆️⬆️⬆️ 💁🏻‍♀️💁🏻‍♀️💁🏻‍♀️👩🏻‍🔬

## **If you find this notebook helpful, please upvote!**



# BirdCLEF 2022 Audio Detection


## About the competition:

As the “extinction capital of the world,” Hawai'i has lost 68% of its bird species, the consequences of which can harm entire food chains. Researchers use population monitoring to understand how native birds react to changes in the environment and conservation efforts. But many of the remaining birds across the islands are isolated in difficult-to-access, high-elevation habitats. With physical monitoring difficult, scientists have turned to sound recordings. Known as bioacoustic monitoring, this approach could provide a passive, low labor, and cost-effective strategy for studying endangered bird populations.

![Image](https://storage.googleapis.com/kaggle-competitions/kaggle/33246/logos/header.png?t=2022-02-08-17-06-27)

Current methods for processing large bioacoustic datasets involve manual annotation of each recording. This requires specialized training and prohibitively large amounts of time. Thankfully, recent advances in machine learning have made it possible to automatically identify bird songs for common species with ample training data. However, it remains challenging to develop such tools for rare and endangered species, such as those in Hawai'i.

The Cornell Lab of Ornithology's K. Lisa Yang Center for Conservation Bioacoustics (KLY-CCB) develops and applies innovative conservation technologies across multiple ecological scales to inspire and inform the conservation of wildlife and habitats. KLY-CCB does this by collecting and interpreting sounds in nature and they've joined forces with Google Bioacoustics Group, LifeCLEF, Listening Observatory for Hawaiian Ecosystems (LOHE) Bioacoustics Lab at the University of Hawai'i at Hilo, and Xeno-Canto for this competition.

In this competition, you’ll use your machine learning skills to identify bird species by sound. Specifically, you'll develop a model that can process continuous audio data and then acoustically recognize the species. The best entries will be able to train reliable classifiers with limited training data.

If successful, you'll help advance the science of bioacoustics and support ongoing research to protect endangered Hawaiian birds. Thanks to your innovations, it will be easier for researchers and conservation practitioners to accurately survey population trends. They'll be able to regularly and more effectively evaluate threats and adjust their conservation actions.





## Data Description

Your challenge in this competition is to identify which birds are calling in long recordings given quite limited training data. This is the exact challenge faced by scientists trying to monitor rare birds in Hawaii. For example, there are only a few thousand individual Nene geese left in the world, which makes it difficult to acquire recordings of their calls.

This competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a sample submission) will be available to your notebook.

![Bird](https://storage.googleapis.com/kaggle-media/competitions/Birdsong/Screen%20Shot%202022-02-08%20at%202.04.09%20PM.png)

## Files

* train_metadata.csv - A wide range of metadata is provided for the training data. The most directly relevant fields are:

        - primary_label - a code for the bird species. You can review detailed information about the bird codes by appending the code to https://ebird.org/species/, such as https://ebird.org/species/amecro for the American Crow.
        - secondary_labels - background species as annotated by the recordist. An empty list does not mean that no background birds are audible.
        - author - the eBird user who provided the recording.
        - filename - the associated audio file.
        - rating - float value between 0.0 and 5.0 as an indicator of the quality rating on Xeno-canto and the number of background species, where 5.0 is the highest and 1.0 is the lowest. 0.0 means that this recording has no user rating yet.
        
 
* train_audio/ - The bulk of the training data consists of short recordings of individual bird calls generously uploaded by users of xenocanto.org. These files have been downsampled to 32 kHz where applicable to match the test set audio and converted to the ogg format.

* test_soundscapes/ - When you submit a notebook, the test_soundscapes directory will be populated with approximately 5,500 recordings to be used for scoring. These are each within a few milliseconds of 1 minute long and in the ogg audio format. Only one soundscape is available for download.

* test.csv - Metadata for the test set. Only the first three rows are available for download; the full test.csv is provided in the hidden test set.

        - row_id - A unique identifier for the row.
        - file_id - A unique identifier for the audio file.
        - bird - The ebird code for the row. There is one row for each of the scored species per 5 second window per audio file.
        - end_time - The last second of the 5 second time window (5, 10, 15, etc).
        

* sample_submission.csv - A valid sample submission. Only the first three rows are available for download; the full submission.csv is provided in the hidden test set.

        - row_id - A unique identifier for the row.
        - target - True/False for whether or not the bird in question called during the 5 second window.
        
        
* scored_birds.json - The subset of the species in the dataset that are scored.

* eBird_Taxonomy_v2021.csv - Data on the relationships between different species.
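
The test.csv description above says there is one row per scored species per 5 second window per audio file. As a minimal sketch of what that implies for a recording of roughly one minute, the snippet below enumerates those rows. The file id and species codes are hypothetical placeholders, and the helper `enumerate_windows` is not part of the competition code.

```python
from itertools import product


def enumerate_windows(file_id, birds, duration_s=60, window_s=5):
    """Return one (file_id, bird, end_time) tuple per bird per window."""
    # end_time marks the last second of each window: 5, 10, ..., duration_s
    end_times = range(window_s, duration_s + 1, window_s)
    return [(file_id, bird, end) for end, bird in product(end_times, birds)]


# Hypothetical file id and species codes, used only for illustration.
rows = enumerate_windows("soundscape_demo", ["birdA", "birdB"])
print(len(rows))   # 12 windows x 2 birds
print(rows[:3])
```

With about 5,500 soundscapes and every scored species repeated for each window, the full hidden test.csv is far larger than the three rows available for download.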

## Load the data 

In [None]:
main_dir = '../input/birdclef-2022/'
train = pd.read_csv(main_dir + 'train_metadata.csv')
train.head(5)

In [None]:
def load_data(f):
    """Read a CSV file from the competition directory into a DataFrame."""
    t = pd.read_csv(main_dir+str(f))
    return t

In [None]:
train_df = load_data('train_metadata.csv')
train_df.head(5)

In [None]:
train.info()

In [None]:
ss = load_data('sample_submission.csv')
ss.head()

In [None]:
test = load_data('test.csv')
test.head()

## Librosa : Audio data handling 

librosa is a Python package for music and audio analysis. It is commonly used when working with audio data, for example in music generation (using LSTMs) or automatic speech recognition, and it provides the building blocks needed to create music information retrieval systems. librosa uses soundfile and audioread to load audio files. Note that, depending on the installed version, soundfile may not support MP3, in which case librosa falls back on the audioread library.

More about audio data hands-on: 

* [Hands-On Guide To Librosa For Handling Audio Files](https://analyticsindiamag.com/hands-on-guide-to-librosa-for-handling-audio-files/)
* [Visualizing Sounds Using Librosa Machine Learning Library!](https://analyticsvidhya.com/blog/2021/06/visualizing-sounds-librosa/)

*Let's load data from audio folders*

In [None]:
train_audio_path = main_dir+'train_audio/'
test_audio_path = main_dir+'test_soundscapes/'
sample_audio_path = '/kaggle/input/birdclef-2022/train_audio/bongul/XC516224.ogg'

In [None]:
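import librosa
# y is the audio time series, sr is the sample rate.
# Note: librosa.load resamples to 22050 Hz by default; pass sr=None to keep the file's native rate.
y, sr = librosa.load(sample_audio_path)
print('audio_data', y.shape, 'sample_rate', sr)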

## What does “stft” mean in the code?
STFT is short for Short-Time Fourier Transform. As the official librosa documentation puts it, “The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows” (Librosa Development Team, 2021).
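
To make the definition concrete, here is a minimal NumPy sketch of the same idea: cut the signal into overlapping frames, apply a window, and take a DFT of each frame. It is a simplified illustration, not librosa's implementation (it does no padding, for instance), and the names `naive_stft`, `sr_demo`, and `tone` are made up for this example.

```python
import numpy as np


def naive_stft(signal, n_fft=2048, hop_length=512):
    """Windowed DFT of overlapping frames, shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft]
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of each real-valued frame
    return np.fft.rfft(frames * window, axis=1).T


sr_demo = 22050
t = np.arange(sr_demo) / sr_demo               # one second of samples
tone = np.sin(2 * np.pi * 1000.0 * t)          # synthetic 1 kHz tone
D_demo = naive_stft(tone)

# Each row is a frequency bin spaced sr / n_fft apart; find the strongest one.
peak_bin = int(np.abs(D_demo).mean(axis=1).argmax())
peak_hz = peak_bin * sr_demo / 2048
print(D_demo.shape, peak_hz)
```

The strongest bin lands within one bin width of 1000 Hz, which is the time-frequency picture the quote describes: rows are frequencies, columns are short time windows.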

In [None]:
D = librosa.stft(y)
D

## What does “Chroma” mean?

Chroma is a way of turning sound into numerical values. Chroma features fold the energy of a spectrum onto the 12 pitch classes of the musical scale (C, C#, D, ..., B), so each time frame is usually represented as a 12-dimensional vector. Computing chroma is a form of feature extraction, which can be a vital part of data engineering.
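
As a rough illustration of how a frequency ends up in one of 12 chroma bins, the sketch below converts a frequency to its nearest MIDI note and takes that note modulo 12. This hard assignment is an assumption made for clarity; librosa builds a smoother filter bank, and `pitch_class` and `naive_chroma` are hypothetical helper names.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz):
    """Index (0 = C, ..., 11 = B) of the pitch class nearest to freq_hz."""
    midi = 69 + 12 * np.log2(freq_hz / 440.0)   # A4 = 440 Hz is MIDI note 69
    return int(np.round(midi)) % 12


def naive_chroma(magnitudes, freqs_hz):
    """Sum spectral magnitudes into 12 pitch-class bins, normalised to sum to 1."""
    chroma = np.zeros(12)
    for mag, freq in zip(magnitudes, freqs_hz):
        if freq > 0:                             # skip the DC bin
            chroma[pitch_class(freq)] += mag
    total = chroma.sum()
    return chroma / total if total > 0 else chroma


print(PITCH_CLASSES[pitch_class(440.0)])         # A4
print(PITCH_CLASSES[pitch_class(880.0)])         # A5 folds onto the same class
vec = naive_chroma([1.0, 1.0, 2.0], [220.0, 440.0, 261.63])
print(vec)
```

Octaves collapse onto the same bin, which is why chroma describes which notes are present but not how high or low they sound.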

In [None]:
s = np.abs(librosa.stft(y)) ** 2  # power spectrogram (squared magnitude of the STFT)
chroma = librosa.feature.chroma_stft(S = s, sr = sr)
print(chroma)

In [None]:
# np.cumsum without an axis argument flattens the chroma matrix and returns its running total
chroma = np.cumsum(chroma)

import matplotlib.pyplot as plt

x = np.linspace(-chroma, chroma)
plt.plot(x , np.sin(x))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.show()

If the audio file in this code were replaced with another file, the shapes and movements in the graph would differ, because the features of each sound vary.

In [None]:
import librosa.display
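chroma_orig = librosa.feature.chroma_cqt(y=y, sr=sr)  # keyword arguments are required in recent librosa releases

# Select all chroma bins, but only the frames between 45 s and 60 s of the recording
idx = tuple([slice(None), slice(*list(librosa.time_to_frames([45, 60], sr=sr)))])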
C = np.abs(librosa.cqt(y=y, sr=sr, bins_per_octave=12*3, n_bins=7*12*3))
fig, ax = plt.subplots(nrows=2, sharex=True)
img1 = librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max)[idx],
                                y_axis='cqt_note', x_axis='time', bins_per_octave=12*3,
                                ax=ax[0])
fig.colorbar(img1, ax=[ax[0]], format="%+2.f dB")
ax[0].label_outer()

img2 = librosa.display.specshow(chroma_orig[idx], y_axis='chroma', x_axis='time', ax=ax[1])
fig.colorbar(img2, ax=[ax[1]])
ax[1].set(ylabel='Default chroma')

We can do better by isolating the harmonic component of the audio signal. We’ll use a large margin for separating harmonics from percussives:
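
One common way to separate harmonics from percussives is median filtering of a spectrogram: sustained tones form horizontal lines, so they survive a median taken along time, while clicks form vertical lines and survive a median taken along frequency. The sketch below is a simplified hard-mask version of that idea under those assumptions; librosa's own implementation differs in details, and `running_median` and `harmonic_mask` are hypothetical names.

```python
import numpy as np


def running_median(S, size, axis):
    """Median over a sliding window of odd length `size` along one axis."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def harmonic_mask(S, size=9, margin=2.0):
    """True where the time-smoothed energy beats the frequency-smoothed one."""
    harm = running_median(S, size, axis=1)   # smooth along time
    perc = running_median(S, size, axis=0)   # smooth along frequency
    return harm > margin * perc


# Synthetic spectrogram: quiet background, one sustained tone, one click.
S_demo = np.full((20, 30), 0.01)
S_demo[5, :] = 1.0     # horizontal line: harmonic
S_demo[:, 10] = 1.0    # vertical line: percussive
mask = harmonic_mask(S_demo)
print(mask[5, 0], mask[0, 10])
```

A larger margin demands that the harmonic estimate dominate more strongly before a bin is kept, which is why a large margin gives a cleaner harmonic component at the cost of discarding more energy.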



In [None]:
y_harm = librosa.effects.harmonic(y=y, margin=8)
chroma_harm = librosa.feature.chroma_cqt(y=y_harm, sr=sr)


fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
librosa.display.specshow(chroma_orig[idx], y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(ylabel='Default chroma')
ax[0].label_outer()

librosa.display.specshow(chroma_harm[idx], y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(ylabel='Harmonic')

There’s still some noise in there though. We can clean it up using non-local filtering. This effectively removes any sparse additive noise from the features.
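
The idea behind non-local filtering can be sketched in a few lines: replace each frame by the median of the frames most similar to it, then keep the element-wise minimum of the original and the filtered version so that nothing is ever added. This is a simplified illustration under those assumptions and not a reimplementation of `librosa.decompose.nn_filter`; the helper `nn_median` and the toy data are made up.

```python
import numpy as np


def nn_median(X, k=3):
    """Median of the k most cosine-similar other frames, for every frame."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)
    sim = unit.T @ unit                    # frame-by-frame cosine similarity
    np.fill_diagonal(sim, -np.inf)         # a frame is not its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    return np.median(X[:, neighbours], axis=-1)


# Toy chromagram: the same note in every frame plus one spurious spike.
clean = np.zeros((12, 8))
clean[0, :] = 1.0
noisy = clean.copy()
noisy[7, 3] = 0.5

filtered = np.minimum(noisy, nn_median(noisy, k=3))
print(filtered[7, 3])
```

Because the spike appears in only one frame, the median over similar frames does not contain it, and taking the minimum removes it while leaving the repeated note untouched.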

In [None]:
chroma_filter = np.minimum(chroma_harm,
                           librosa.decompose.nn_filter(chroma_harm,
                                                       aggregate=np.median,
                                                       metric='cosine'))


fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
librosa.display.specshow(chroma_harm[idx], y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(ylabel='Harmonic')
ax[0].label_outer()

librosa.display.specshow(chroma_filter[idx], y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(ylabel='Non-local')

Local discontinuities and transients can be suppressed by using a horizontal median filter.

In [None]:
import scipy.ndimage  # import the submodule explicitly; a bare "import scipy" may not load it
chroma_smooth = scipy.ndimage.median_filter(chroma_filter, size=(1, 9))


fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
librosa.display.specshow(chroma_filter[idx], y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(ylabel='Non-local')
ax[0].label_outer()

librosa.display.specshow(chroma_smooth[idx], y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(ylabel='Median-filtered')

A final comparison between the CQT, original chromagram and the result of our filtering.

In [None]:
fig, ax = plt.subplots(nrows=3, sharex=True)
librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max)[idx],
                         y_axis='cqt_note', x_axis='time',
                         bins_per_octave=12*3, ax=ax[0])
ax[0].set(ylabel='CQT')
ax[0].label_outer()
librosa.display.specshow(chroma_orig[idx], y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(ylabel='Default chroma')
ax[1].label_outer()
librosa.display.specshow(chroma_smooth[idx], y_axis='chroma', x_axis='time', ax=ax[2])
ax[2].set(ylabel='Processed')

# Chroma variants

There are three chroma variants implemented in librosa: chroma_stft, chroma_cqt, and chroma_cens. chroma_stft and chroma_cqt are two alternative ways of computing chroma. chroma_stft performs a short-time Fourier transform of the audio input and maps each STFT bin to chroma, while chroma_cqt uses a constant-Q transform and maps each CQT bin to chroma.
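
The practical difference between the two transforms is how their frequency bins are spaced: STFT bins are linearly spaced, while constant-Q bins are geometrically spaced with a fixed number per octave. The sketch below compares the two grids. The FFT size, the starting frequency (roughly the note C1), and the bin counts are illustrative assumptions.

```python
import numpy as np

sr_demo, n_fft = 22050, 2048
# STFT: bin k sits at k * sr / n_fft, so neighbours are a constant number of Hz apart
stft_freqs = np.arange(1 + n_fft // 2) * sr_demo / n_fft

fmin = 32.70              # roughly C1, assumed starting frequency
bins_per_octave = 12
# CQT: bin k sits at fmin * 2^(k / bins_per_octave), a constant ratio apart
cqt_freqs = fmin * 2.0 ** (np.arange(84) / bins_per_octave)


def bins_in_octave(freqs, low_hz):
    """Count bins whose centre frequency lies in [low_hz, 2 * low_hz)."""
    return int(np.sum((freqs >= low_hz) & (freqs < 2 * low_hz)))


for low in (fmin * 2, fmin * 32):
    print(round(low, 1), bins_in_octave(stft_freqs, low), bins_in_octave(cqt_freqs, low))
```

The STFT has only a handful of bins in a low octave and many in a high one, whereas the CQT has the same count in every octave, which lines up naturally with the 12 pitch classes.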

A comparison between the STFT and the CQT methods for chromagram.



In [None]:
chromagram_stft = librosa.feature.chroma_stft(y=y, sr=sr)
chromagram_cqt = librosa.feature.chroma_cqt(y=y, sr=sr)


fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
librosa.display.specshow(chromagram_stft[idx], y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(ylabel='STFT')
ax[0].label_outer()

librosa.display.specshow(chromagram_cqt[idx], y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(ylabel='CQT')

CENS features (chroma_cens) are variants of chroma features introduced in Müller and Ewert, 2011, in which additional post-processing steps are performed on the constant-Q chromagram to obtain features that are invariant to dynamics and timbre.

Thus, the CENS features are useful for applications, such as audio matching and retrieval.

**The following additional processing steps are applied to the chromagram and are implemented in chroma_cens:**

    1. L1-Normalization across each chroma vector

    2. Quantization of the amplitudes based on “log-like” amplitude thresholds

    3. Smoothing with sliding window (optional parameter)

    4. Downsampling (not implemented)
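
The first three steps above can be sketched with plain NumPy. This is an illustrative approximation, not librosa's code: the quantization thresholds and weights, the short smoothing window, and the final L2 normalisation are assumptions made for this example, and `quantize` and `cens_sketch` are hypothetical names.

```python
import numpy as np

QUANT_STEPS = (0.4, 0.2, 0.1, 0.05)   # assumed "log-like" amplitude thresholds
QUANT_WEIGHT = 0.25                   # assumed weight added per threshold passed


def quantize(normed):
    """Step 2: map L1-normalised amplitudes onto a few coarse levels."""
    return sum(QUANT_WEIGHT * (normed > step) for step in QUANT_STEPS)


def cens_sketch(chroma, win_len=5):
    """Apply steps 1-3 to a (12, n_frames) chromagram with n_frames >= win_len."""
    # Step 1: L1-normalise each chroma vector (each column)
    l1 = np.abs(chroma).sum(axis=0, keepdims=True)
    quant = quantize(chroma / np.maximum(l1, 1e-12))
    # Step 3: optional smoothing along time with a Hann-shaped window
    if win_len > 1:
        win = np.hanning(win_len + 2)[1:-1]     # drop the zero endpoints
        win = win / win.sum()
        quant = np.stack([np.convolve(row, win, mode="same") for row in quant])
    # Assumed extra step: rescale every frame to unit L2 norm
    l2 = np.linalg.norm(quant, axis=0, keepdims=True)
    return quant / np.maximum(l2, 1e-12)


# Toy input: four frames with a single active pitch class, four uniform frames.
demo = np.zeros((12, 8))
demo[0, :4] = 1.0
demo[:, 4:] = 1.0
cens_demo = cens_sketch(demo)
print(cens_demo.shape)
```

Quantization discards fine amplitude differences, and smoothing blurs frame-to-frame changes, which is what makes the resulting features less sensitive to dynamics and timbre.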

A comparison between the original constant-Q chromagram and the CENS features.



In [None]:
chromagram_cens = librosa.feature.chroma_cens(y=y, sr=sr)


fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
librosa.display.specshow(chromagram_cqt[idx], y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(ylabel='Orig')

librosa.display.specshow(chromagram_cens[idx], y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(ylabel='CENS')

# Takeaways:

    * Audio files can be turned into visuals without first building data tables.
    
    * Librosa can generate many views of an audio file, each of which can be interpreted in its own way.
    
    * Trigonometry and general math are appropriate for sound analytics.
    
    * Interpretations of recordings like these can vary from listener to listener, depending on who is listening and whether they are doing so intentionally.

![Chirping birds](https://tf-cmsv2-smithsonianmag-media.s3.amazonaws.com/filer/52/a7/52a72deb-8158-40cf-ac7e-75f794124a78/sparrowsongcovidstoryimage.jpg)


### Work in Progress!