# **<span style='color:#A80808'>🎯 Goal</span>**

Identify bird species by sound: process continuous audio data and then acoustically recognize the species.

# **<span style='color:#A80808'>🔑 Metric</span>**
Submissions are evaluated on a metric that is most similar to the macro F1 score. Given the amount of audio data used in this competition it wasn't feasible to label every single species found in every soundscape. Instead only a subset of species are actually scored for any given audio file. After dropping all of the un-scored rows we technically run a weighted classification accuracy with the weights set such that all of the species are assigned the same total weight and the true negatives and true positives for each species have the same weight. The extra complexity exists purely to allow us to have a great deal of control over which birds are scored for a given soundscape. For offline cross validation purposes, the macro F1 is the closest analogue to the actual metric.
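
For offline cross validation, the macro F1 mentioned above can be sketched in plain Python. This is a minimal illustration, not the competition's scorer: it computes F1 on the positive class for each species and takes the unweighted mean. The function name `macro_f1` and the toy labels are assumptions made for the example; the species codes are borrowed from the training data shown later in this notebook.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-species F1 scores.

    y_true / y_pred map a species code to a list of booleans,
    one entry per 5 second window.
    """
    scores = []
    for label in labels:
        truth, pred = y_true[label], y_pred[label]
        tp = sum(t and p for t, p in zip(truth, pred))
        fp = sum((not t) and p for t, p in zip(truth, pred))
        fn = sum(t and (not p) for t, p in zip(truth, pred))
        denom = 2 * tp + fp + fn
        # A species with no positives and no predictions scores 0 here (a convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: two species, four windows each.
truth = {"akekee": [True, False, True, False], "akepa1": [False, False, True, True]}
preds = {"akekee": [True, False, False, False], "akepa1": [False, True, True, True]}
score = macro_f1(truth, preds, ["akekee", "akepa1"])
print(f"macro F1: {score:.4f}")
```

Because every species contributes equally to the mean, rare species matter as much as common ones, which is the property the competition metric is described as sharing.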




![](https://www.allaboutbirds.org/news/wp-content/uploads/2020/04/RBGull-Vyn-FI-1280x720.jpg)
[Source](https://www.allaboutbirds.org/news/how-to-photograph-birds-in-flight/)

# **<span style='color:#A80808'>💾 Data</span>**

## train_metadata.csv

A wide range of metadata is provided for the training data. The most directly relevant fields are:
- primary_label - a code for the bird species. You can review detailed information about the bird codes by appending the code to https://ebird.org/species/, such as https://ebird.org/species/amecro for the American Crow.
- secondary_labels - background species as annotated by the recordist. An empty list does not mean that no background birds are audible.
- author - the eBird user who provided the recording.
- filename - the associated audio file.
- rating - float value between 0.0 and 5.0 as an indicator of the quality rating on Xeno-canto and the number of background species, where 5.0 is the highest and 1.0 is the lowest. 0.0 means that this recording has no user rating yet.
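
The secondary labels are easier to work with once each entry is an actual Python list. The sketch below assumes the column is stored as the string form of a list (for example `"['akekee', 'akepa1']"`), which is worth checking on the real file before relying on it. The helper name `parse_secondary_labels` and the sample rows are hypothetical.

```python
import ast


def parse_secondary_labels(raw):
    """Turn a stringified list such as "['akekee']" into a real list."""
    # Assumption: values are strings; anything already list-like is passed through.
    return ast.literal_eval(raw) if isinstance(raw, str) else list(raw)


# Hypothetical sample values standing in for train_metadata.secondary_labels.
rows = ["[]", "['akekee', 'akepa1']", "['akekee']"]
parsed = [parse_secondary_labels(r) for r in rows]

# Distinct background species across all recordings.
background_species = sorted({code for labels in parsed for code in labels})
print(background_species)
```

With a DataFrame, the same helper could be applied with `train_metadata.secondary_labels.apply(parse_secondary_labels)`.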

## train_audio/ 
- The bulk of the training data consists of short recordings of individual bird calls generously uploaded by users of xeno-canto.org. These files have been downsampled to 32 kHz where applicable to match the test set audio and converted to the ogg format.

## test_soundscapes/ 
- When you submit a notebook, the test_soundscapes directory will be populated with approximately 5,500 recordings to be used for scoring. These are each within a few milliseconds of 1 minute long and in the ogg audio format. Only one soundscape is available for download.
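
Since scoring uses 5 second windows (see test.csv) and the audio is sampled at 32 kHz, each soundscape has to be cut into fixed-size slices. The following is a rough sketch of the sample arithmetic only. The function name `window_bounds` is hypothetical, and rounding the window count is one possible way to absorb the few milliseconds of drift in file length.

```python
SAMPLE_RATE = 32_000   # Hz, as stated for the competition audio
WINDOW_SECONDS = 5     # length of one scored window


def window_bounds(n_samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Return (end_time, start_sample, stop_sample) for each window."""
    step = sample_rate * window_seconds
    # Round so that a file a few ms off 60 s still yields 12 windows.
    n_windows = max(1, round(n_samples / step))
    bounds = []
    for i in range(n_windows):
        start = i * step
        stop = min(start + step, n_samples)  # last window may be slightly short
        bounds.append(((i + 1) * window_seconds, start, stop))
    return bounds


# A clip 100 samples longer than one minute, and one 100 samples shorter.
long_clip = window_bounds(60 * SAMPLE_RATE + 100)
short_clip = window_bounds(60 * SAMPLE_RATE - 100)
print(len(long_clip), long_clip[0], short_clip[-1])
```

Each `(start, stop)` pair can then be used to slice the array returned by an audio loader, provided the file was loaded at its native 32 kHz rate.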

## test.csv 

Metadata for the test set. Only the first three rows are available for download; the full test.csv is provided in the hidden test set.

- row_id - A unique identifier for the row.
- file_id - A unique identifier for the audio file.
- bird - The ebird code for the row. There is one row for each of the scored species per 5 second window per audio file.
- end_time - The last second of the 5 second time window (5, 10, 15, etc).
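
To make the row layout concrete, the sketch below enumerates one row per scored species per 5 second window for a one-minute file. The `row_id` pattern `{file_id}_{bird}_{end_time}` is an assumption that should be checked against the downloadable rows of test.csv, and `soundscape_0` is a made-up file id.

```python
from itertools import product


def build_rows(file_ids, scored_birds, clip_seconds=60, window_seconds=5):
    """List one row per (file, species, window), mirroring the test.csv columns."""
    rows = []
    for file_id, bird in product(file_ids, scored_birds):
        for end_time in range(window_seconds, clip_seconds + 1, window_seconds):
            rows.append({
                # Assumed row_id format; verify against the real test.csv.
                "row_id": f"{file_id}_{bird}_{end_time}",
                "file_id": file_id,
                "bird": bird,
                "end_time": end_time,
            })
    return rows


# Hypothetical file id; species codes taken from the training data.
rows = build_rows(["soundscape_0"], ["akekee", "akepa1"])
print(len(rows), rows[0]["row_id"], rows[-1]["row_id"])
```

For one file and two species this gives 24 rows (12 windows per species), which shows how quickly the row count grows with the number of scored species.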

## sample_submission.csv 
A valid sample submission. Only the first three rows are available for download; the full submission.csv is provided in the hidden test set.

- row_id - A unique identifier for the row.
- target - True/False for whether or not the bird in question called during the 5 second window.

## scored_birds.json
The subset of the species in the dataset that are scored.

## eBird_Taxonomy_v2021.csv 
Data on the relationships between different species.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import random

from IPython.display import Image
import librosa
import librosa.display
import IPython.display as ipd

## **<span style='color:#A80808'>Train metadata</span>**

In [None]:
train_metadata = pd.read_csv('../input/birdclef-2022/train_metadata.csv')
print(f'Shape of train_metadata: {train_metadata.shape}')
train_metadata.head()

### Location

In [None]:
#https://www.kaggle.com/prokaggler/birdclef-2022-eda
fig = px.scatter_geo(
    train_metadata,
    lat="latitude",
    lon="longitude",
    color="common_name",
    width=1000,
    height=500,
    title="BirdCLEF 2022 Training Data Location Plot",
)
fig.show()

### Species

In [None]:

print(f'There are {train_metadata.primary_label.nunique()} bird species:\n{train_metadata.primary_label.unique()}')

In [None]:
pie, ax = plt.subplots(figsize=[20,12])
train_metadata.groupby('primary_label').size().plot(kind='pie',
                                                    #autopct='%.2f',
                                                    ax=ax,
                                                    title='Species distribution',
                                                    rotatelabels =True,
                                                    cmap = 'hot')
plt.show()


In [None]:
plt.figure(figsize=(20,10))
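# Take labels and counts from the same grouped Series so each bar matches its species
species_counts = train_metadata.groupby('primary_label').size()
plt.bar(species_counts.index, height=species_counts.values)
plt.xticks(rotation=90)
plt.title('Species distribution')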
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train_metadata.groupby('primary_label').size().hist(bins=100)
plt.xlabel('Recordings per species')
plt.ylabel('Frequency')
plt.show()

### Background species

In [None]:
print(f'There are {train_metadata.secondary_labels.nunique()-1} background species groups:\n{train_metadata.secondary_labels.unique()}')

### Authors

In [None]:
print(f'There are {train_metadata.author.nunique()} authors:\n{train_metadata.author.unique()}')

In [None]:
plt.figure(figsize=(10,7))
train_metadata.groupby('author').size().hist(bins=1000)
plt.xlabel('Recordings per author')
plt.ylabel('Frequency')
plt.yscale('log')
plt.show()

In [None]:
top_author = train_metadata.groupby('author').size().max()
print(f'Number of samples by the top author: {top_author}')

### Audio files

In [None]:
print(f'There are {train_metadata.filename.nunique()} audio files:\n{train_metadata.filename.unique()}')

![](https://cdn.download.ams.birds.cornell.edu/api/v1/asset/97647371/1200)

[afrsil1](https://ebird.org/species/afrsil1?siteLanguage=fr)

In [None]:
#https://www.kaggle.com/amandanavine/hawaiian-bird-species
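%matplotlib inline
# Load the first 20 seconds (librosa resamples to its default rate unless sr is given)
x, sr = librosa.load('../input/birdclef-2022/train_audio/afrsil1/XC125458.ogg', duration=20)
# Short-time Fourier transform, converted to a decibel scale for plotting
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Xdb, cmap='hot', sr=sr, x_axis='s', y_axis='hz')
# Audio player for the same recording
ipd.Audio('../input/birdclef-2022/train_audio/afrsil1/XC125458.ogg')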

![](https://abcbirds.org/wp-content/uploads/2017/02/Akekee-_Jim-Denny.jpg)

[akekee](https://abcbirds.org/bird/akekee/)

In [None]:
x, sr = librosa.load('../input/birdclef-2022/train_audio/akekee/XC174953.ogg', duration=20)
%matplotlib inline
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Xdb, cmap='hot', sr=sr, x_axis='s', y_axis='hz')
ipd.Audio('../input/birdclef-2022/train_audio/akekee/XC174953.ogg')

![](https://cdn.download.ams.birds.cornell.edu/api/v1/asset/96737001/900)

[akepa1](https://ebird.org/species/akepa1/US)

In [None]:
x, sr = librosa.load('../input/birdclef-2022/train_audio/akepa1/XC122473.ogg', duration=20)
%matplotlib inline
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Xdb, cmap='hot', sr=sr, x_axis='s', y_axis='hz')
ipd.Audio('../input/birdclef-2022/train_audio/akepa1/XC122473.ogg')

### Rating

In [None]:
print(f'There are {train_metadata.rating.nunique()} rating scores from 0 (no voting yet) to 5 (highest score):\n{train_metadata.rating.unique()}')

In [None]:
pie, ax = plt.subplots(figsize=[20,12])
train_metadata.groupby('rating').size().plot(kind='pie',
                                                    #autopct='%.2f',
                                                    ax=ax,
                                                    title='Rating distribution',
                                                    rotatelabels =True,
                                                    cmap = 'hot')
plt.show()

# **<span style='color:#A80808'>🏆 Submission</span>**

In [None]:
submission = pd.read_csv('../input/birdclef-2022/sample_submission.csv')
# Random baseline: draw an independent True/False for every row
submission['target'] = [bool(random.getrandbits(1)) for _ in range(len(submission))]
submission.to_csv('submission.csv', index=False)
submission.head()

# This notebook is under construction 🏗