# Train a CNN

Convolutional neural networks (CNNs) are popular tools for creating automated machine learning classifiers on images or image-like samples. By converting audio into a two-dimensional frequency vs. time representation such as a spectrogram, we can generate image-like samples that can be used to train CNNs. 

This tutorial demonstrates the basic use of OpenSoundscape's `preprocessors` and `cnn` modules for training CNNs and making predictions using CNNs.

Under the hood, OpenSoundscape uses PyTorch for machine learning tasks. By using the class `opensoundscape.ml.cnn.CNN`, you can train and predict with PyTorch's powerful CNN architectures in just a few lines of code.

## Run this tutorial

This tutorial is more than a reference! It's a Jupyter Notebook which you can run and modify on Google Colab or your own computer.

|Link to tutorial|How to run tutorial|
| :- | :- |
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kitzeslab/opensoundscape/blob/master/docs/tutorials/train_cnn.ipynb) | The link opens the tutorial in Google Colab. Uncomment the "installation" line in the first cell to install OpenSoundscape. |
| [![Download via DownGit](https://img.shields.io/badge/GitHub-Download-teal?logo=github)](https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/kitzeslab/opensoundscape/blob/master/docs/tutorials/train_cnn.ipynb) | The link downloads the tutorial file to your computer. Follow the [Jupyter installation instructions](https://opensoundscape.org/en/latest/installation/jupyter.html), then open the tutorial file in Jupyter. |

In [None]:
# if this is a Google Colab notebook, install opensoundscape in the runtime environment
if 'google.colab' in str(get_ipython()):
  %pip install "opensoundscape==0.12.1" "jupyter-client<8,>=5.3.4" "ipykernel==6.17.1"
  num_workers=0
else:
  num_workers=4

## Setup

### Import needed packages

In [2]:
# the cnn module provides classes for training/predicting with various types of CNNs
from opensoundscape import CNN

# other utilities and packages
import os
import random
import subprocess
from glob import glob
from pathlib import Path

import numpy as np
import pandas as pd
import sklearn
import torch

# set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [15, 5]  # for large visuals
%config InlineBackend.figure_format = 'retina'

### Set random seeds

Set manual seeds for PyTorch, NumPy, and Python's `random` module. These essentially "fix" the results of any stochastic steps in model training, ensuring that training results are reproducible. You probably don't want to do this when you actually train your model, but it's useful for debugging.

In [3]:
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
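
To see what seeding does, here is a minimal sketch (not part of the original notebook) that reseeds Python's `random` module and NumPy and checks that the same draws come back. The helper name `draw` is hypothetical; `torch.manual_seed` plays the same role for PyTorch's random number generator.

```python
import random
import numpy as np

def draw(seed):
    """Reseed both generators, then return a few random draws."""
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), np.random.rand(3).tolist()

first = draw(0)
second = draw(0)  # same seed, so the draws repeat
other = draw(1)   # different seed, so the draws differ

print(first == second)
print(first == other)
```

Reseeding with the same value replays the same sequence, which is why fixed seeds make a debugging run repeatable.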

## Prepare audio data

In [4]:
from opensoundscape import BoxedAnnotations
from glob import glob
# audio_list = os.listdir("Audio/Macaulay Focal Recordings") + os.listdir("Audio/Xeno-canto/mp3s")
# len(audio_list)
audio_list = glob("/home/dah238/Kauai-Amakihi/Audio/Macaulay Focal Recordings/*") + glob("/home/dah238/Kauai-Amakihi/Audio/Xeno-canto/mp3s/*")
akek_audio_list = pd.read_csv("/home/dah238/Kauai-Amakihi/Annotations/akek_sam/sam_lapp_annotations.csv")["audio_file"].unique().tolist()

audio_list = audio_list+akek_audio_list

In [5]:
annotations_df = pd.concat([
    pd.read_csv("/home/dah238/Kauai-Amakihi/Annotations/macaulay/combined_output_macaulay_v2.csv"),
    pd.read_csv("/home/dah238/Kauai-Amakihi/Annotations/xeno_canto/combined_output_xeno_canto.csv"),
    pd.read_csv("/home/dah238/Kauai-Amakihi/Annotations/akek_sam/sam_lapp_annotations.csv")
], ignore_index=True)
annotationsfull = BoxedAnnotations(df= annotations_df,audio_files = audio_list)
annotationsfull

| Unnamed: 0 | audio_file | annotation_file | annotation | start_time | end_time | low_f | high_f | AKEK_song | point | notes | folder |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 0 | /home/dah238/Kauai-Amakihi/Audio/Macaulay Foca... | | KAAM_song | 1.170121 | 3.047612 | | | | | | |
| 1 | /home/dah238/Kauai-Amakihi/Audio/Macaulay Foca... | | KAAM_song | 11.091803 | 13.318115 | | | | | | |
| 2 | /home/dah238/Kauai-Amakihi/Audio/Macaulay Foca... | | KAAM_song | 23.051374 | 24.836450 | | | | | | |
| 3 | /home/dah238/Kauai-Amakihi/Audio/Macaulay Foca... | | KAAM_song | 29.264000 | 31.231800 | | | | | | |
| 4 | /home/dah238/Kauai-Amakihi/Audio/Macaulay Foca... | | KAAM_song | 40.564796 | 42.124980 | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 947 | /media/kiwi/datasets/finalized/kaua2024a/PUAI_... | | u | 150.000000 | 153.000000 | | | 5.530422 | OS-UUK-20 | | OS-UUK-20_11919 |
| 948 | /media/kiwi/datasets/finalized/kaua2024a/PUAI_... | | u | 165.000000 | 168.000000 | | | 3.910681 | OS-UUK-20 | | OS-UUK-20_11919 |
| 949 | /media/kiwi/datasets/finalized/kaua2024a/PUAI_... | | u | 171.000000 | 174.000000 | | | 4.832873 | OS-UUK-20 | | OS-UUK-20_11919 |
| 950 | /media/kiwi/datasets/finalized/kaua2024a/PUAI_... | | u | 9.000000 | 12.000000 | | | 4.613531 | OS-UUK-20 | | OS-UUK-20_11919 |


In [26]:

%%capture
# Parameters to use for label creation
clip_duration = 3
clip_overlap = 1.5
min_label_overlap = 0.25
species_of_interest = ["KAAM_song"]

# Create dataframe of one-hot labels
clip_labels = annotationsfull.clip_labels(
    clip_duration = clip_duration, 
    clip_overlap = clip_overlap,
    min_label_overlap = min_label_overlap,
    class_subset = species_of_interest # You can comment this line out if you want to include all species.
)
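
To make the three label parameters concrete, the sketch below shows one simplified way fixed-length clips and their labels could be derived from a single annotation. It is an illustration only, not OpenSoundscape's implementation: the helper names `clip_windows` and `is_labeled` and the 7.5 s audio length are hypothetical, and the library may treat a final partial clip differently. The annotation times come from the first `KAAM_song` row shown above.

```python
def clip_windows(audio_duration, clip_duration, clip_overlap):
    """Return (start, end) times of fixed-length clips that fit in the audio."""
    step = clip_duration - clip_overlap
    windows = []
    start = 0.0
    while start + clip_duration <= audio_duration:
        windows.append((start, start + clip_duration))
        start += step
    return windows

def is_labeled(clip, annotation, min_label_overlap):
    """True if the annotation overlaps the clip by at least min_label_overlap seconds."""
    overlap = min(clip[1], annotation[1]) - max(clip[0], annotation[0])
    return overlap >= min_label_overlap

annotation = (1.170121, 3.047612)  # start and end time of one annotation, in seconds
windows = clip_windows(audio_duration=7.5, clip_duration=3, clip_overlap=1.5)
labels = [is_labeled(w, annotation, min_label_overlap=0.25) for w in windows]

for window, label in zip(windows, labels):
    print(window, label)
```

With a 1.5 s overlap, consecutive clips start 1.5 s apart, so the same annotation can label more than one clip. A clip that catches less than 0.25 s of the annotation stays negative.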




In [27]:
clip_labels.reset_index()["file"].nunique()

449

In [28]:
clip_labels.shape

(70715, 1)

This is the way to do it without including blank audio files.

**NOTE:** Update the output path in the cell below before each experiment phase.

In [29]:
clip_labels.to_csv("/home/dah238/Kauai-Amakihi/Experiment3/labels_3sec_experiment3.csv")

## Create train, validation, and test datasets

To train and test a model, we use three datasets:

* The **training dataset** is used to fit your machine learning model to the audio data. 
* The **validation dataset** is a held-out dataset that is used to select hyperparameters (e.g. how many epochs to train for) during training
* The **test dataset** is another held-out dataset that we use to check how the model performs on data that were not available at all during training.

While both the training and validation datasets are used while training the model, the test dataset is never touched until the model is fully trained and completed.

The training and validation datasets may be gathered from the same source as each other. In contrast, the test dataset is often gathered from a different source to assess whether the model's performance generalizes to a real-world problem. For example, training and validation data might be drawn from an online database like Xeno-Canto, whereas the testing data is from your own field data. 

### Create a test dataset

We'll separate the test dataset first. For a good assessment of the model's generalization, we want the test set to be independent of the training and validation datasets. For example, we don't want to use clips from the same source recording in the training dataset and the test dataset.

In this notebook, the cell below does not hold out a separate test set. Instead, it randomly splits the labeled clips in `clip_labels` into a training set (80%) and a validation set (20%). A test set from an independent source is still needed to assess the final model.

In [30]:
from sklearn.model_selection import train_test_split

train_df, val_df = train_test_split(clip_labels,test_size=0.2)
train_df.shape,val_df.shape


((56572, 1), (14143, 1))
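
Because the clips were created with `clip_overlap = 1.5`, a random split at the clip level can place overlapping clips from the same recording in both datasets. One way to avoid this, sketched below on toy data, is to split by recording instead of by clip. This assumes the label dataframe is indexed by `(file, start_time, end_time)`; the names `toy_labels`, `toy_train_df`, and `toy_val_df` are hypothetical stand-ins, not objects from this notebook.

```python
import random
import pandas as pd

# Toy stand-in for clip_labels: index levels (file, start_time, end_time), one class column
index = pd.MultiIndex.from_tuples(
    [(f"rec_{i}.wav", start, start + 3.0) for i in range(10) for start in (0.0, 1.5, 3.0)],
    names=["file", "start_time", "end_time"],
)
toy_labels = pd.DataFrame({"KAAM_song": [i % 4 == 0 for i in range(len(index))]}, index=index)

# Shuffle the unique recordings and hold out 20% of them for validation
files = sorted(toy_labels.index.get_level_values("file").unique())
rng = random.Random(0)
rng.shuffle(files)
n_val = max(1, round(0.2 * len(files)))
val_files = set(files[:n_val])

# Every clip from a held-out recording goes to validation; the rest go to training
is_val = toy_labels.index.get_level_values("file").isin(val_files)
toy_train_df = toy_labels[~is_val]
toy_val_df = toy_labels[is_val]
print(toy_train_df.shape, toy_val_df.shape)
```

Splitting by recording keeps all clips from one file on the same side of the split, so validation scores are less likely to be inflated by near-duplicate clips.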

### Split training and validation datasets

Now, separate the remaining non-test data into training and validation datasets.

The idea of keeping a separate validation dataset is that, throughout training, we can 'peek' at the performance on the validation set to choose hyperparameters. (This is in contrast to the test dataset, which we will not look at until we've finished training our model.)

One important hyperparameter is the number of **epochs** to train to, in order to prevent overfitting. Each epoch includes one round of fitting on each training sample. 

If a model's performance on a training dataset continues to improve as it trains, but its performance on the validation dataset plateaus, this could indicate the model is **overfitting** on the training dataset, learning information specific to those particular samples instead of gaining the ability to generalize to new data.
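
As a small illustration of this idea, the sketch below picks the number of epochs from validation scores and shows the growing gap between training and validation performance. The scores are hypothetical values made up for the example, not results from this notebook.

```python
# Hypothetical per-epoch scores (higher is better); not results from this notebook
train_scores = [0.60, 0.72, 0.81, 0.88, 0.93, 0.96, 0.98]
val_scores = [0.58, 0.69, 0.75, 0.78, 0.78, 0.77, 0.76]

# Pick the first epoch with the best validation score (epochs counted from 1)
best_epoch = max(range(len(val_scores)), key=lambda i: val_scores[i]) + 1

# Gap between training and validation performance at each epoch
gaps = [round(t - v, 2) for t, v in zip(train_scores, val_scores)]

print(f"best epoch by validation score: {best_epoch}")
print(f"train-validation gap per epoch: {gaps}")
```

Here the training score keeps rising while the validation score stops improving, so training past the best validation epoch mainly widens the gap.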

In [None]:
# # Split our training data into training and validation sets
# train_df, valid_df = sklearn.model_selection.train_test_split(
#     train_and_val_set, test_size=0.1, random_state=0
# )

In [31]:
train_df.to_csv("Experiment3/train_set.csv")
val_df.to_csv("Experiment3/valid_set.csv")
#look back to reorganize file save locations, EDIT THIS EVERY EXPERIMENT
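
When these CSV files are read back later, the clip index needs to be restored. A minimal sketch, assuming the dataframes are indexed by `(file, start_time, end_time)` as in the label dataframe above; `toy_df` and the temporary path are hypothetical stand-ins:

```python
import tempfile
from pathlib import Path
import pandas as pd

# Toy stand-in for train_df with the (file, start_time, end_time) index
index = pd.MultiIndex.from_tuples(
    [("a.wav", 0.0, 3.0), ("a.wav", 1.5, 4.5), ("b.wav", 0.0, 3.0)],
    names=["file", "start_time", "end_time"],
)
toy_df = pd.DataFrame({"KAAM_song": [True, False, False]}, index=index)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "train_set.csv"
    toy_df.to_csv(csv_path)
    # index_col=[0, 1, 2] rebuilds the three-level index from the first three columns
    reloaded = pd.read_csv(csv_path, index_col=[0, 1, 2])

print(reloaded.index.names)
print(reloaded["KAAM_song"].tolist())
```

Without `index_col`, the file and time columns would come back as ordinary data columns next to the class labels.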

### Resample data for even class representation

Before training, we will balance the number of samples of each class in the training set. This helps the model learn all of the classes, rather than paying too much attention to the classes with the most labeled annotations. 

In [32]:
train_df.KAAM_song.value_counts()

KAAM_song
False    55643
True       929
Name: count, dtype: int64

In [33]:
from opensoundscape.data_selection import resample

# resample so that each class has 636 samples
balanced_train_df = resample(train_df, n_samples_per_class=636, random_state=0)
balanced_train_df.KAAM_song.value_counts()

KAAM_song
False    55643
True       636
Name: count, dtype: int64
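
For intuition about what balancing involves, the sketch below upsamples a rare group and downsamples a common group with plain pandas. It illustrates the general idea only and is not a description of how `resample` works internally; `toy_df` and `n_per_group` are hypothetical.

```python
import pandas as pd

# Toy labels: 3 positive clips and 9 negative clips for one class
toy_df = pd.DataFrame({"KAAM_song": [True] * 3 + [False] * 9})

n_per_group = 6  # hypothetical target size for each group
positives = toy_df[toy_df["KAAM_song"]]
negatives = toy_df[~toy_df["KAAM_song"]]

# Fewer positives than the target: sample with replacement (upsample)
positives_up = positives.sample(n=n_per_group, replace=True, random_state=0)
# More negatives than the target: sample without replacement (downsample)
negatives_down = negatives.sample(n=n_per_group, replace=False, random_state=0)

balanced = pd.concat([positives_up, negatives_down])
print(balanced["KAAM_song"].value_counts())
```

Upsampling repeats existing clips rather than adding new information, so it changes how often the model sees the rare class, not how many distinct examples exist.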

## Set up model

Now we create a model object. We have to select several parameters when creating this object: its `architecture`, `classes`, and `sample_duration`. 

Some additional parameters can also be changed at this step, such as the preprocessor used to create spectrograms and the shape of the spectrograms. 

For more detail on this step, see the ["Customize CNN training"](tutorials/CNN.html) tutorial.


### Create CNN object

Now, create a CNN object with this architecture, the classes we put into the dataframe above, and the same sample duration as we selected above.

The first time you run this script for a particular architecture, OpenSoundscape will download the desired architecture.

In [None]:
# Create a CNN object designed to recognize 3-second samples
from opensoundscape import CNN

# Use resnet18 architecture
architecture = "resnet18"

# Can use this code to get your classes, if needed
class_list = list(train_df.columns)
clip_duration = 3

model = CNN(
    architecture=architecture,
    classes=class_list,
    sample_duration=clip_duration,  # 3s, selected above
)

### Check model device

If a GPU is available on your computer, the CNN object automatically selects it for accelerating performance. You can override `.device` to use a specific device such as `cpu` or `cuda:3`.

In [None]:
print(f"model.device is: {model.device}")
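
When overriding the device by hand, it can help to fall back gracefully if the requested GPU does not exist on the current machine. A minimal sketch follows; the helper `choose_device` is hypothetical and not part of OpenSoundscape, while `torch.cuda.device_count()` is PyTorch's call for counting visible GPUs.

```python
def choose_device(preferred_index, n_gpus):
    """Return a device string, falling back when the preferred GPU is absent."""
    if n_gpus == 0:
        return "cpu"
    if preferred_index < n_gpus:
        return f"cuda:{preferred_index}"
    return "cuda:0"

try:
    import torch
    n_gpus = torch.cuda.device_count()
except ImportError:  # lets the sketch run even where PyTorch is not installed
    n_gpus = 0

device = choose_device(preferred_index=1, n_gpus=n_gpus)
print(device)
# model.device = device  # `model` is the CNN object created above
```

This avoids hard-coding a device name that exists on one machine but not on another.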


### Set up WandB model logging

While this step is optional, it is very helpful for model training. In this step, we set up model logging on a service called **Weights & Biases** (AKA WandB). 

Weights & Biases is a free website you can use to monitor model training. It is integrated with OpenSoundscape to include helpful functions such as checking on your model's training progress in real time, visualizing the spectrograms created for training your model, comparing multiple tries at training the same model, and more. For more information, check out this [blog post](https://wandb.ai/wandb_fc/repo-spotlight/reports/Community-Spotlight-OpenSoundscape--Vmlldzo0MDcwMTI4). 

The instructions below will help you set up `wandb` logging:

* Create an account on the [Weights and Biases website](https://wandb.ai/). 
* The first time you use `wandb`, you'll need to run `wandb.login()` in Python or `wandb login` on the command line, then enter the API key from your [settings](https://wandb.ai/settings) page
* In a Python script where you want to log model training, use `wandb.init()` as demonstrated below. The "Entity" or team option allows runs and projects to be shared across members in a group, making it easy to collaborate and see progress of other team members' runs.


As training progresses, performance metrics will be plotted to the wandb logging platform and visible on this run's web page. For example, this [wandb web page](https://wandb.ai/kitzeslab/opensoundscape%20training%20demo/runs/w1xyk7zr/workspace?workspace=user-samlapp) shows the content logged to wandb when this notebook was run by the Kitzes Lab. By default, OpenSoundscape + WandB integration creates several pages with information about the model:

- Overview: hyperparameters, run description, and hardware available during the run
- Charts: "Samples" panel with audio and images of preprocessed samples (useful for checking that your preprocessing performs as expected and your labels are correct)
- Charts: graphs of each class's performance metrics over training time
- Model: summary of model architecture
- Logs: standard output of training script
- System: computational performance metrics including memory, CPU use, etc

When training several models and comparing performance, the "Project" page of WandB provides comparisons of metrics and hyperparameters across training runs.

In [None]:
import wandb

try:
    wandb.login()
    wandb_session = wandb.init(
        entity="kitzeslab",  # replace with your entity/group name
        project="KAAM ",
        name=None,
    )
except Exception:  # if wandb.login or wandb.init fails, don't use wandb logging
    # raise
    print("failed to create wandb session. wandb session will be None")
    wandb_session = None

In [None]:
# wandb_session = None

## Train the CNN

Finally, train the CNN. The cell below is configured for 100 epochs. Training for this long is slow and is much better done outside of a Jupyter Notebook, so lower `epochs` (for example, to 2) if you only want a short demonstration of training.

Each **epoch** is one pass-through of all of the samples in the training dataset, plus running predictions on the validation dataset. 

Each epoch is composed of smaller groups of samples called **batches**. The machine learning model predicts on every sample in the batch, then the model weights are updated based on those samples. Larger batches can increase training speed, but require more memory. If you get a memory error, try reducing the batch size.
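
As a quick worked example of how batch size relates to epoch length, the arithmetic below uses the balanced training set size from the value counts above together with the `batch_size` and `log_interval` values used in this notebook's training cell. It assumes the final, smaller batch is kept rather than dropped.

```python
import math

n_train_samples = 55643 + 636  # rows in balanced_train_df, from the value counts above
batch_size = 64
log_interval = 100

# Assumes the final, smaller batch is kept rather than dropped
batches_per_epoch = math.ceil(n_train_samples / batch_size)
last_batch_size = n_train_samples - (batches_per_epoch - 1) * batch_size
logs_per_epoch = batches_per_epoch // log_interval

print(batches_per_epoch, last_batch_size, logs_per_epoch)
```

Halving the batch size roughly doubles the number of batches per epoch while lowering the memory needed for each one.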

We use default training parameters, but many aspects of CNN training can be customized (see the "Customize CNN training" tutorial for examples).

In [None]:
# CHANGE CHECKPOINT FOLDER PATH for each experiment
checkpoint_folder = Path("Experiment3/checkpoints")
checkpoint_folder.mkdir(parents=True, exist_ok=True)  # also create missing parent folders

In [None]:
balanced_train_df.head()

In [None]:
model.device = "cuda:1"
#changing to other snowy gpu *temp

In [None]:
%%capture --no-stdout --no-display
# Uncomment the line above to silence outputs from this cell

num_workers = 10
model.train(
    balanced_train_df,
    val_df,
    epochs=100, #change to more later
    batch_size=64,
    log_interval=100,  # log progress every 100 batches
    num_workers=num_workers,  # parallelized cpu tasks for preprocessing
    wandb_session=wandb_session,
    save_interval=1,  # save checkpoint every epoch
    save_path=checkpoint_folder,  # location to save checkpoints
)

Once this is finished running, you have trained the CNN. 

To generate predictions on audio files using the CNN, use the `.predict()` method of the CNN object. Here, we apply a sigmoid activation layer which maps the CNN's outputs (all real numbers) to a 0-1 range. 

In [None]:
scores_df = model.predict(val_df, 
                          activation_layer="sigmoid",
                          batch_size=128
                          )
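
The sigmoid mapping described above can be sketched in a few lines. The raw outputs and the 0.5 threshold below are hypothetical values chosen for illustration; a suitable threshold depends on the model and the task.

```python
import math

def sigmoid(x):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

raw_outputs = [-4.0, -1.0, 0.0, 1.0, 4.0]  # hypothetical raw CNN outputs (logits)
scores = [sigmoid(x) for x in raw_outputs]
threshold = 0.5  # hypothetical decision threshold
detections = [s >= threshold for s in scores]

for x, s, d in zip(raw_outputs, scores, detections):
    print(f"{x:+.1f} -> {s:.3f} -> {d}")
```

Because the sigmoid is monotonic, it changes the scale of the scores but not their ranking across clips.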

In [1]:
val_df.head()
scores_df.values


In [None]:
model.eval(val_df.values,scores_df.values)
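
With a rare positive class (929 of the 56572 training clips above are labeled `KAAM_song`), a model that predicts "absent" for every clip would still be right most of the time, so precision and recall are often more informative than accuracy. The sketch below computes both by hand at one threshold. The function `precision_recall` and the labels and scores are hypothetical, and this is not a description of what `model.eval` computes.

```python
def precision_recall(labels, scores, threshold):
    """Compute precision and recall for binary labels at a score threshold."""
    predictions = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(l and not p for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and sigmoid scores for eight clips
labels = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.7, 0.2, 0.6, 0.1, 0.05, 0.3, 0.02]

precision, recall = precision_recall(labels, scores, threshold=0.5)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Raising the threshold usually trades recall for precision, so it is worth checking several thresholds on the validation set.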

If you train with only a few examples or for only a couple of epochs, don't expect the CNN to be good at classifying sounds. As a starting point for training a useful model, we'd want to train with hundreds of examples per class for 10-100 epochs.

For guidance on how to use machine learning classifiers, see the Classifiers 101 Guide on opensoundscape.org and the tutorial on predicting with pre-trained CNNs.


**Clean up:** Run the following cell to delete the files created in this tutorial. However, these files are used in other tutorials, so you may wish not to delete them just yet.