# Bird Classifier


The goal of this notebook is to perform **data exploration**.

Questions to check:
* Are bird calls recorded in different regions?
* Do bird calls recorded at different times differ?
* Are the calls of two birds correlated, and if not, how do they differ?
* Does the call differ when the bird type or mood is different?




In [None]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import soundfile as sf
import librosa
import librosa.display
import IPython.display as display
from sklearn.model_selection import train_test_split
from keras.utils import Sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv1D, MaxPool1D, BatchNormalization
from keras.optimizers import RMSprop,Adam
from keras.applications import VGG19, VGG16, ResNet50
from tensorflow.keras import layers
import warnings
import tensorflow as tf
warnings.filterwarnings("ignore")
from sklearn.preprocessing import normalize

In [None]:
path = '/kaggle/input/birdclef-2021/'
os.listdir(path)

Reference: https://www.kaggle.com/stefankahl/birdclef2021-exploring-the-data

# Train meta data
* The primary label is the main species in a recording.
* The number of recordings is capped at 500 per species.
* Despite this cap, there is still class imbalance in the dataset.
* Secondary labels are based on background sounds and are not completely reliable.
* Other information, such as time and location, is useful for analysing when and where the birds migrated.
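
The class imbalance mentioned above can be checked with a simple count per species. The following is a minimal sketch on a made-up table; `toy_meta` and its counts are invented for illustration, and only the `primary_label` column name comes from the metadata described here.

```python
import pandas as pd

# Toy stand-in for train_metadata.csv; species counts are invented
toy_meta = pd.DataFrame({
    "primary_label": ["houspa"] * 6 + ["norcar"] * 3 + ["banana"] * 1
})

# Recordings per species, largest first
counts = toy_meta["primary_label"].value_counts()

# Ratio between the most and least frequent species as a rough imbalance measure
imbalance_ratio = counts.max() / counts.min()

print(counts)
print("imbalance ratio:", imbalance_ratio)
```

On the real data, the same two lines could be applied to `train_meta` once it has been loaded.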

# Train soundscape data
* The train soundscapes are a manually labelled dataset that shows what we can expect in the submission file for each of the 2 recordings.
* It's possible that some species in the hidden dataset do not appear in the train soundscapes.
* Five-second segments are provided. A segment can contain more than one bird; multiple birds are separated by spaces.
* The `nocall` label means that no bird was identified in the segment.
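
Because several birds can share one segment, counting labels requires splitting the space-separated string first. Here is a small sketch of one way to do that with pandas; the rows in `toy_labels` are invented, and the column names `row_id` and `birds` are assumed to match the labels file.

```python
import pandas as pd

# Toy stand-in for train_soundscape_labels.csv; rows are invented
toy_labels = pd.DataFrame({
    "row_id": ["1234_SSW_5", "1234_SSW_10", "1234_SSW_15"],
    "birds": ["nocall", "bluwa1 redwa2", "bluwa1"],
})

# Split the space-separated string into a list, then give each bird its own row
per_bird = toy_labels["birds"].str.split(" ").explode()

# How often each label occurs, and the share of segments without any call
label_counts = per_bird.value_counts()
nocall_share = (toy_labels["birds"] == "nocall").mean()

print(label_counts)
print("share of nocall segments:", nocall_share)
```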

# Test soundscape metadata
* Has more info on the location for each species

# Test data
* Similar to the train data; contains 10-minute audio recordings.
* Results have to be given per 5-second chunk, with one row per chunk, for example:
*1234_SSW_20 bluwa1 redwa2*
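
The row identifier in the example above appears to combine the audio id, the site and the end second of the chunk. The sketch below generates such identifiers under that assumption; `make_row_ids` is a hypothetical helper, not part of the competition code.

```python
def make_row_ids(audio_id, site, duration_s=600, chunk_s=5):
    """Return one row_id per chunk, using the end second of each chunk."""
    return [f"{audio_id}_{site}_{end}"
            for end in range(chunk_s, duration_s + 1, chunk_s)]

row_ids = make_row_ids(1234, "SSW")
print(len(row_ids))   # 120 chunks for a 10-minute recording
print(row_ids[3])     # 1234_SSW_20
```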

Reference:
 Usually, you would take a dataset and make 80/10/10 split for train/val/test. In our case, we have two different types of data: Focal recordings (i.e., short audio) for training and soundscapes for testing. Yet, we also have training soundscapes which can be used either for training or validation. The hidden test set however, only contains soundscapes. So, some terminology is interchangeable. However, "train_short_audio" and "train_soundscapes" can be used during training, and you're free to use any subsets that you like to validate or train or have a hold-out local test set and so on. The "test_soundscapes" are hidden at the moment and will only appear during submission.
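
Since the quoted reference leaves the choice of validation subset open, one option is to hold out whole recordings so that segments of the same soundscape never end up on both sides of the split. This is a sketch of that idea using scikit-learn's `GroupShuffleSplit`; it is an alternative to a plain row-wise split, and the audio ids are invented.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 3 recordings with 4 segments each (audio ids are invented)
audio_ids = np.repeat([101, 202, 303], 4)
segments = np.arange(len(audio_ids))

# Hold out exactly one recording (group) for validation
splitter = GroupShuffleSplit(n_splits=1, test_size=1, random_state=2021)
train_idx, val_idx = next(splitter.split(segments, groups=audio_ids))

train_groups = set(audio_ids[train_idx])
val_groups = set(audio_ids[val_idx])
print("train recordings:", train_groups)
print("validation recordings:", val_groups)
```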

In [None]:
train_labels = pd.read_csv(path+'train_soundscape_labels.csv')
train_meta = pd.read_csv(path+'train_metadata.csv')
test_data = pd.read_csv(path+'test.csv')
samp_subm = pd.read_csv(path+'sample_submission.csv')

In [None]:
train_labels.head()

In [None]:
train_meta.head(5)

In [None]:
test_data

In [None]:
labels = []
for row in train_labels.index:
    labels.extend(train_labels.loc[row, 'birds'].split(' '))
# Sort the unique labels so the column order is the same on every run
labels = sorted(set(labels))

print('Number of unique bird labels:', len(labels))

In [None]:
# import plotly.express as px
# import pandas as pd
# fig = px.scatter_geo(train_meta,lat='latitude',lon='longitude', hover_name="primary_label")
# fig.update_layout(title = 'Bird classifier distribution', title_x=0.5)
# fig.show()

In [None]:
# import matplotlib.pyplot as plt
# import seaborn as sns
# import descartes
# import geopandas as gpd
# from shapely.geometry import Point, Polygon

# # SHP file
# world_map = gpd.read_file('../input/map-data/map_data/99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk.shp')

# # Coordinate reference system
# crs = {"init" : "epsg:4326"}

# # Lat and Long need to be of type float, not object
# species_list = ['norcar', 'houspa', 'wesblu', 'banana']
# data = train[train['primary_label'].isin(species_list)]
# data["latitude"] = data["latitude"].astype(float)
# data["longitude"] = data["longitude"].astype(float)

# # Create geometry
# geometry = [Point(xy) for xy in zip(data["longitude"], data["latitude"])]

# # Geo Dataframe
# geo_df = gpd.GeoDataFrame(data, crs=crs, geometry=geometry)

# print(geo_df.head())
# # Create ID for species
# species_id = geo_df["primary_label"].value_counts().reset_index()
# species_id.insert(0, 'ID', range(0, 0 + len(species_id)))

# species_id.columns = ["ID", "primary_label", "count"]

# # Add ID to geo_df
# geo_df = pd.merge(geo_df, species_id, how="left", on="primary_label")

# # === PLOT ===
# fig, ax = plt.subplots(figsize = (16, 10))
# world_map.plot(ax=ax, alpha=0.4, color="grey")

# palette = iter(sns.hls_palette(len(species_id)))
# for i in range(len(species_list)):
#     geo_df[geo_df["ID"] == i].plot(ax=ax, 
#                                    markersize=20, 
#                                    color=next(palette), 
#                                    marker="o", 
#                                    label = species_id['primary_label'].values[i]);
    
# ax.legend()

In [None]:
# print(geo_df.head())

Referenced from https://www.kaggle.com/drcapa/birdclef-2021-starter

In [None]:
# Pick one recording from the metadata to inspect (row 0 is an arbitrary choice)
row = 0
label = train_meta.loc[row, 'primary_label']
filename = train_meta.loc[row, 'filename']

In [None]:
train_meta

In [None]:
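# Read one short training recording and plot its waveform
data, samplerate = sf.read(path+'train_short_audio/'+label+'/'+filename)
print(data[:8])
print(samplerate)
fig = plt.figure(figsize=(8, 4))
x = range(len(data))
y = data
# Plot once and give the line a label so the legend has something to show
plt.plot(x, y, color='red', label=filename)
plt.legend(loc='upper center')
plt.grid()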

In [None]:
# Each 10-minute soundscape is split into 120 segments of 5 seconds each.
train_labels.nunique()

Each training row is labelled with the birds it contains. Create one column per bird name and encode it as 1 if that bird is present and 0 otherwise.
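
The same multi-hot encoding can also be obtained without explicit loops. Below is a minimal sketch using pandas' `Series.str.get_dummies`; `toy_labels` is an invented stand-in for the labels table.

```python
import pandas as pd

# Toy stand-in for the `birds` column; rows are invented
toy_labels = pd.DataFrame({"birds": ["nocall", "bluwa1 redwa2", "bluwa1"]})

# One column per bird: 1 where the bird appears in the row, 0 otherwise
multi_hot = toy_labels["birds"].str.get_dummies(sep=" ")

print(multi_hot)
```

The resulting columns come out in sorted order, which also keeps the label order stable between runs.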

In [None]:
df_labels_train = pd.DataFrame(index=train_labels.index, columns=labels)
for row in train_labels.index:
    birds = train_labels.loc[row, 'birds'].split(' ')
    for bird in birds:
        df_labels_train.loc[row, bird] = 1
df_labels_train.fillna(0, inplace=True)
df_labels_test = pd.DataFrame(index=test_data.index, columns=labels)
test_data['birds'] = 'nocall'
for row in test_data.index:
    birds = test_data.loc[row, 'birds'].split(' ')
    for bird in birds:
        df_labels_test.loc[row, bird] = 1
df_labels_test.fillna(0, inplace=True)
train_labels = pd.concat([train_labels, df_labels_train], axis=1)
test_data = pd.concat([test_data, df_labels_test], axis=1)

In [None]:
test_data

In [None]:
data_lenght = 160000  # samples per segment (160000 / 5 s implies 32000 samples per second)
audio_lenght = 5  # segment length in seconds
num_labels = len(labels)
batch_size = 16
list_IDs_train, list_IDs_val = train_test_split(list(train_labels.index), test_size=0.33, random_state=2021)
list_IDs_test = list(samp_subm.index)
print(list_IDs_test)
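
The constants above determine how a soundscape is cut into segments: a segment that ends at second `s` covers the 160000 samples before sample index `(s / 5) * 160000`. The following sketch spells out that arithmetic; `segment_bounds` and the upper-case constant names are hypothetical, and the audio is a silent stand-in.

```python
import numpy as np

SAMPLE_RATE = 32000                                # implied by 160000 samples per 5 s
SEGMENT_SECONDS = 5
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS    # 160000

def segment_bounds(end_second):
    """Return (start, stop) sample indices of the segment ending at end_second."""
    stop = (end_second // SEGMENT_SECONDS) * SEGMENT_SAMPLES
    return stop - SEGMENT_SAMPLES, stop

# Silent stand-in for one minute of audio
audio = np.zeros(SAMPLE_RATE * 60, dtype=np.float32)

start, stop = segment_bounds(20)
chunk = audio[start:stop]
print(start, stop, len(chunk))
```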

In [None]:
samp_subm

In [None]:
def padding(array, xx, yy):
    """
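    Zero-pad a 2-D array to at least (xx, yy), keeping the original roughly centred.

    :param array: 2-D numpy array
    :param xx: desired height
    :param yy: desired width
    :return: padded array (an axis that is already large enough is left unchanged)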
    """
    h = array.shape[0]
    w = array.shape[1]
    a = max((xx - h) // 2,0)
    aa = max(0,xx - a - h)
    b = max(0,(yy - w) // 2)
    bb = max(yy - b - w,0)
    return np.pad(array, pad_width=((a, aa), (b, bb)), mode='constant')
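
To see what this kind of centred zero-padding does, it helps to run it on a tiny array. The block below is a self-contained sketch with the same logic under a hypothetical name, `pad_center`, so it can be run on its own.

```python
import numpy as np

def pad_center(array, height, width):
    """Zero-pad a 2-D array to at least (height, width), roughly centred."""
    h, w = array.shape
    top = max((height - h) // 2, 0)
    bottom = max(height - top - h, 0)
    left = max((width - w) // 2, 0)
    right = max(width - left - w, 0)
    return np.pad(array, ((top, bottom), (left, right)), mode="constant")

small = np.ones((2, 3))
padded = pad_center(small, 4, 6)
print(padded.shape)
print(padded)

# An axis that is already larger than requested is not cropped
tall = pad_center(np.ones((5, 3)), 4, 6)
print(tall.shape)
```

Note that the function only pads: an input longer than the target keeps its size, so features wider than the target width would need to be truncated separately.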

In [None]:
class DataGenerator(Sequence):
    # Inherits from keras.utils.Sequence so that Keras can request batches by index
    def __init__(self, path, list_IDs, data, batch_size):
        self.path = path
        self.list_IDs = list_IDs
        self.data = data
        self.batch_size = batch_size
        self.indexes = np.arange(len(self.list_IDs))
     
    # Keras requests batch indexes between 0 and the total number of batches, which __len__ returns
    def __len__(self):
        len_ = int(len(self.list_IDs)/self.batch_size) # sample/batch size
        if len_*self.batch_size < len(self.list_IDs):
            len_ += 1
        return len_ 
    
    # __getitem__ generates and returns the batch for the given index
    def __getitem__(self, index):
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        X, y = self.__data_generation(list_IDs_temp)
#         X = X.reshape((self.batch_size, 100, 1600//2))
        return X, y
    
    # Argument is list of IDs of the target batch
    def __data_generation(self, list_IDs_temp):
        X = np.zeros((self.batch_size, 128,1000,3))
        y = np.zeros((self.batch_size, num_labels))
        for i, ID in enumerate(list_IDs_temp):
            prefix = str(self.data.loc[ID, 'audio_id'])+'_'+self.data.loc[ID, 'site']
            file_list = [s for s in os.listdir(self.path) if prefix in s]
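            if len(file_list) == 0:
                # Dummy for missing test audio files: a silent segment at the implied sample rate,
                # so that the feature extraction below still has an audio_file to work on
                audio_file = np.zeros(data_lenght)
                audio_sr = data_lenght // audio_lenght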
            else:
                file = file_list[0]#[s for s in os.listdir(self.path) if prefix in s][0]
                audio_file, audio_sr = read_ogg_file(self.path, file)
                audio_file = audio_file[int((self.data.loc[ID, 'seconds']-5)/audio_lenght)*data_lenght:int(self.data.loc[ID, 'seconds']/audio_lenght)*data_lenght]
#                 audio_file_fft = np.abs(np.fft.fft(audio_file)[: len(audio_file)//2])
#                 # scale data
#                 audio_file_fft = (audio_file_fft-audio_file_fft.mean())/audio_file_fft.std()
#             X[i, ] = audio_file_fft
            y[i, ] = self.data.loc[ID, self.data.columns[5:]].values
            
            max_size=1000 #my max audio file feature width
            n_fft = 255 # window in num. of samples
            stft = padding(np.abs(librosa.stft(audio_file, n_fft=n_fft, hop_length=512)), 128, max_size)
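            # Pass the signal by keyword (required by newer librosa) and use the file's sample rate
            MFCCs = padding(librosa.feature.mfcc(y=audio_file, sr=audio_sr, n_fft=n_fft, hop_length=512,n_mfcc=128),128,max_size)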
            spec_centroid = librosa.feature.spectral_centroid(y=audio_file, sr=audio_sr)
            chroma_stft = librosa.feature.chroma_stft(y=audio_file, sr=audio_sr)
            spec_bw = librosa.feature.spectral_bandwidth(y=audio_file, sr=audio_sr)
            #Now the padding part
            image = np.array([padding(normalize(spec_bw),1, max_size)]).reshape(1,max_size)
            image = np.append(image,padding(normalize(spec_centroid),1, max_size), axis=0) 
            # Repeat the padded spec_bw, spec_centroid and chroma_stft until the stack has 128 rows like stft and MFCCs
            # (loop variable is `_` so the batch index `i` of the outer loop is not overwritten)
            for _ in range(0,9):
                image = np.append(image,padding(normalize(spec_bw),1, max_size), axis=0)
                image = np.append(image, padding(normalize(spec_centroid),1, max_size), axis=0)
                image = np.append(image, padding(normalize(chroma_stft),12, max_size), axis=0)
            image=np.dstack((image,np.abs(stft)))
            image=np.dstack((image,MFCCs))
            X[i,]=image
#         X = np.array((X-np.min(X))/(np.max(X)-np.min(X)))
#         X = X/np.std(X)
#         y = np.array(y)
        return X, y
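
The three-channel "image" built in the generator relies on a bit of row arithmetic: 2 leading rows plus 9 repeats of (1 + 1 + 12) rows give 128 rows, which matches the STFT and MFCC planes. The sketch below reproduces only the shapes, using random arrays in place of the librosa features; all values are meaningless stand-ins.

```python
import numpy as np

n_rows, n_frames = 128, 1000
rng = np.random.default_rng(0)

# Random stand-ins for the padded features (real ones come from librosa)
spec_bw = rng.random((1, n_frames))
spec_centroid = rng.random((1, n_frames))
chroma = rng.random((12, n_frames))
stft = rng.random((n_rows, n_frames))
mfccs = rng.random((n_rows, n_frames))

# 2 leading rows + 9 repeats of (1 + 1 + 12) rows = 128 rows
block = np.vstack([spec_bw, spec_centroid, chroma])
channel0 = np.vstack([spec_bw, spec_centroid] + [block] * 9)

# Stack the three 128 x 1000 planes along a new last axis
image = np.dstack([channel0, stft, mfccs])
print(channel0.shape, image.shape)
```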

In [None]:
# a=[[1,2],[2,3]]
# print(np.array(a).shape)

Test of CONV2D

In [None]:
def read_ogg_file(path, file):
    """Read an ogg audio file and return a NumPy array and the sample rate."""
    
    data, samplerate = sf.read(path+file)
    return data, samplerate

In [None]:
# Test of how a short audio looks like
# audio_file, audio_sr = read_ogg_file(path+'train_short_audio/'+label+'/',filename)
# print(filename)
# plt.figure(figsize=(14, 5))
# librosa.display.waveplot(audio_file, sr=audio_sr)
# plt.grid()
# plt.show()

# # 1st file read from train soundscapes
# file = os.listdir(path+'train_soundscapes')[0]
# data, samplerate = read_ogg_file(path+'train_soundscapes/', file)
# sub_data = data[int(455/5)*160000:int(460/5)*160000]
# plt.figure(figsize=(14, 5))
# librosa.display.waveplot(sub_data, sr=samplerate)
# plt.grid()
# plt.show()
# audio_file = audio_file[int((self.data.loc[ID, 'seconds']-5)/audio_lenght)*data_lenght:int(self.data.loc[ID, 'seconds']/audio_lenght)*data_lenght]

In [None]:
# audio_file = sub_data
# sr = samplerate
# max_size=1000 #my max audio file feature width
# n_fft = 255 # window in num. of samples
# stft = padding(np.abs(librosa.stft(audio_file, n_fft=n_fft, hop_length=512)), 128, max_size)
# MFCCs = padding(librosa.feature.mfcc(audio_file, n_fft=n_fft, hop_length=512,n_mfcc=128),128,max_size)
# spec_centroid = librosa.feature.spectral_centroid(y=audio_file, sr=sr)
# chroma_stft = librosa.feature.chroma_stft(y=audio_file, sr=sr)
# spec_bw = librosa.feature.spectral_bandwidth(y=audio_file, sr=sr)
# #Now the padding part
# image = np.array([padding(normalize(spec_bw),1, max_size)]).reshape(1,max_size)
# image = np.append(image,padding(normalize(spec_centroid),1, max_size), axis=0) 
# #repeat the padded spec_bw,spec_centroid and chroma stft until they are stft and MFCC-sized
# for i in range(0,9):
#     image = np.append(image,padding(normalize(spec_bw),1, max_size), axis=0)
#     image = np.append(image, padding(normalize(spec_centroid),1, max_size), axis=0)
#     image = np.append(image, padding(normalize(chroma_stft),12, max_size), axis=0)
# image=np.dstack((image,np.abs(stft)))
# image=np.dstack((image,MFCCs))

In [None]:
train_generator = DataGenerator(path+'train_soundscapes/', list_IDs_train, train_labels, batch_size)
val_generator = DataGenerator(path+'train_soundscapes/', list_IDs_val, train_labels, batch_size)
test_generator = DataGenerator(path+'test_soundscapes/', list_IDs_test, test_data, batch_size)

In [None]:
input_shape=(128,1000,3)
CNNmodel = Sequential()
CNNmodel.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
CNNmodel.add(layers.MaxPooling2D((2, 2)))
CNNmodel.add(layers.Flatten())
CNNmodel.add(layers.Dense(32, activation='relu'))
# One sigmoid output per label: a segment can contain several birds at once (multi-hot targets)
CNNmodel.add(layers.Dense(num_labels, activation='sigmoid'))
CNNmodel.summary()

In [None]:
# Binary cross-entropy matches the independent per-label sigmoid outputs
CNNmodel.compile(optimizer='adam',loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),metrics=['accuracy'])
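
The layer sizes of this small CNN follow from the input shape (128, 1000, 3). As a rough check, the sketch below recomputes them by hand, assuming the Keras defaults of `'valid'` padding for `Conv2D` and a pooling stride equal to the pool size; the helper names are hypothetical.

```python
def conv2d_valid(h, w, k):
    """Output height and width of a k x k convolution without padding."""
    return h - k + 1, w - k + 1

def max_pool(h, w, p):
    """Output height and width of p x p max pooling with stride p."""
    return h // p, w // p

h, w, c = 128, 1000, 3
filters, kernel = 32, 3

conv_params = kernel * kernel * c * filters + filters   # weights + biases
h, w = conv2d_valid(h, w, kernel)                       # 126 x 998
h, w = max_pool(h, w, 2)                                # 63 x 499
flat = h * w * filters                                  # flattened feature count
dense_params = flat * 32 + 32                           # first Dense layer with 32 units

print(conv_params, (h, w), flat, dense_params)
```

Almost all parameters sit in the first dense layer, because the flattened feature map is very large.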

In [None]:
epochs = 2
# fit_generator is deprecated; Model.fit accepts Sequence generators directly
history = CNNmodel.fit(train_generator, validation_data=val_generator, epochs=epochs)

In [None]:
# predict_generator is deprecated; Model.predict accepts Sequence generators directly
y_pred = CNNmodel.predict(test_generator, verbose=1)

In [None]:
history_dict=history.history
loss_values=history_dict['loss']
acc_values=history_dict['accuracy']
val_loss_values = history_dict['val_loss']
val_acc_values=history_dict['val_accuracy']
epochs=range(1,len(loss_values)+1)  # one x-value per epoch actually recorded in the history
fig,(ax1,ax2)=plt.subplots(1,2,figsize=(15,5))
ax1.plot(epochs,loss_values,'bo',label='Training Loss')
ax1.plot(epochs,val_loss_values,'orange', label='Validation Loss')
ax1.set_title('Training and validation loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.legend()
ax2.plot(epochs,acc_values,'bo', label='Training accuracy')
ax2.plot(epochs,val_acc_values,'orange',label='Validation accuracy')
ax2.set_title('Training and validation accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.legend()
plt.show()

In [None]:
# epochs = 2
# lernrate = 2e-3


# model = Sequential()
# model.add(Conv1D(64, input_shape=(100, 1600//2,), kernel_size=5, strides=4, activation='relu'))
# model.add(BatchNormalization())
# model.add(MaxPool1D(pool_size=(4)))
# model.add(Conv1D(64, kernel_size=3, activation='relu'))
# model.add(BatchNormalization())
# model.add(Flatten())
# model.add(Dense(256, activation='relu'))
# model.add(Dense(num_labels, activation='sigmoid'))



In [None]:
# model.compile(optimizer = Adam(lr=lernrate),
#               loss='binary_crossentropy',
#               metrics=['binary_accuracy'])

In [None]:
# model.summary()

In [None]:
# history = model.fit_generator(generator=train_generator, validation_data=val_generator, epochs = epochs, workers=4)


In [None]:
# y_pred = model.predict_generator(test_generator, verbose=1)

In [None]:
y_test = np.where(y_pred > 0.5, 1, 0)
for row in samp_subm.index:
    string = ''
    for col in range(len(y_test[row])):
        if y_test[row][col] == 1:
            if string == '':
                string += labels[col]
            else:
                string += ' ' + labels[col]
    if string == '':
        string = 'nocall'
    samp_subm.loc[row, 'birds'] = string
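
The thresholding step above turns each row of scores into a space-separated label string, falling back to `nocall` when nothing exceeds the threshold. A compact sketch of the same idea on invented scores is shown here; `to_submission_string`, `toy_names` and `toy_pred` are hypothetical names.

```python
import numpy as np

toy_names = ["bluwa1", "nocall", "redwa2"]    # invented label order
toy_pred = np.array([[0.9, 0.1, 0.7],
                     [0.2, 0.3, 0.1]])

def to_submission_string(scores, names, threshold=0.5):
    """Join all labels whose score exceeds the threshold, or return 'nocall'."""
    picked = [name for name, score in zip(names, scores) if score > threshold]
    return " ".join(picked) if picked else "nocall"

rows = [to_submission_string(r, toy_names) for r in toy_pred]
print(rows)
```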

In [None]:
output = samp_subm
output.to_csv('submission.csv', index=False)

In [None]:
output