# Creating a training database

In this tutorial, we will use Ketos to create a database that can be used to train a deep learning classifier. 

We will use a subset of the data described in [Kirsebom et al. 2020](https://asa.scitation.org/doi/10.1121/10.0001132). These data consist of 3-s long clips, some containing right whale upcalls and others containing only background noise. The clips are wave files extracted from recordings produced by bottom-mounted hydrophones in the Gulf of Saint Lawrence, Canada.


Our starting point will be a collection of .wav files accompanied by annotations. You can find them in the `data` folder within the .zip file linked at the top of this page. In the `train` folder, there are 2,000 files, half of them containing upcalls and the other half containing background noise (which, for our purpose, is any sound that is not an upcall. This includes sounds produced by other animals and the overall ambient noise). The `annotations_train.csv` file contains the label attributed to each file: 1 for upcall, 0 for background. Similarly, the `val` (validation) folder contains 200 .wav files (50% with upcalls) and is accompanied by the `annotations_val.csv` file.

We will use Ketos to produce a database with spectrogram representations of the training and validation clips, so that we later can train a deep learning classifier to distinguish the upcalls from the other sounds. Eventually, we will use that classifier to build a detector.

A different scenario would be where you have audio recordings and annotations indicating where in these recording the signals of interest are, but you don't have clips of uniform length with examples of the target signal(s) and background. That case is covered in [this tutorial](https://docs.meridian.cs.dal.ca/ketos/tutorials/create_database/index.html).

We also encourage you to explore the [documentation](https://docs.meridian.cs.dal.ca/ketos/index.html), since Ketos has a variety of tools that might help you to build training databases in different scenarios.

## Contents:

[1. Importing the packages](#section1)  
[2. Loading the annotations](#section2)  
[3. Putting the annotations in the Ketos format](#section3)  
[4. Choosing the spectrogram settings](#section4)  
[5. Creating the database](#section5)  



<a id=section1></a>

### 1. Importing the packages
For this tutorial we will use several modules within ketos. We will also the pandas to read our annotations files.



In [1]:
import pandas as pd
from ketos.data_handling import selection_table as sl
import ketos.data_handling.database_interface as dbi
from ketos.data_handling.parsing import load_audio_representation
from ketos.audio.spectrogram import MagSpectrogram
from ketos.data_handling.parsing import load_audio_representation
import os

# Change the working directory
os.chdir('C:\\Users\\kaitlin.palmer\\Desktop\\KetosMinke\\Training Data\\CompletedModels\\20230524_01')

# test to see ketos loaded properly
#help(load_audio_representation)

  "class": algorithms.Blowfish,


<a id=section2></a>

### 2. Loading the annotations
Our annotations are saved in two `.csv` files: `annotations_train.csv` and `annotations_val.csv`, which we will use to create the training and validation datasets respectively. 

In [2]:
annot_train = pd.read_csv("C:\\Users\\kaitlin.palmer\\Desktop\\KetosMinke\\Training Data\\TP12khz\\TrainMinke.csv")
annot_val = pd.read_csv("C:\\Users\\kaitlin.palmer\\Desktop\\KetosMinke\\Training Data\\TP12khz\\valMinke.csv")

Let's inspect our annotations

In [3]:
annot_train

Unnamed: 0,sound_file,label
0,Detection_6673.220622202225.wav.103000018CH1_2...,0
1,Detection_6673.220622202225.wav.103000019CH1_2...,0
2,Detection_6673.220622202225.wav.103000020CH1_2...,0
3,Detection_6673.220622202225.wav.103000021CH1_2...,0
4,Detection_6673.220622202225.wav.103000023CH1_2...,0
...,...,...
15627,Augment_2812_SWAPS-042597-NS62a.wav,1
15628,Augment_2813_SWAPS-042597-NS62b.wav,1
15629,Augment_2814_SWAPS-042597-NS63.wav,1
15630,Augment_2816_SWAPS-042597-NS65.wav,1


In [4]:
annot_val

Unnamed: 0,sound_file,label
0,Detection_6673.220622202225.wav.103000022CH1_2...,0
1,Detection_6673.220622202225.wav.103000027CH1_2...,0
2,Detection_6673.220622202225.wav.103000032CH1_2...,0
3,Detection_6673.220622202225.wav.103000037CH1_2...,0
4,Detection_6673.220622202225.wav.103000042CH1_2...,0
...,...,...
3901,Augment_2795_SWAPS-042197-NS48.wav,1
3902,Augment_2800_SWAPS-042197-NS52.wav,1
3903,Augment_2805_SWAPS-042297-NS56b.wav,1
3904,Augment_2810_SWAPS-042297-NS60b.wav,1


The **annot_train** dataframe contains 2000 rows and the **annot_val** 200.
The columns indicate:

**sound_file:** name of the audio file  
**label:** label for the annotation (1 for upcall, 0 for background))  



<a id=section3></a>

### 3. Putting the annotations in the Ketos format
Let's check if our annotations follow the Ketos standard.

If that's the case, the function ```sl.is_standardized``` will return ```True```. 


In [5]:
sl.is_standardized(annot_train)

 Your table is not in the Ketos format.

            It should have two levels of indices: filename and annot_id.
            It should also contain at least the 'label' column.
            If your annotations have time information, these should appear in the 'start' and 'end' columns

            extra columns are allowed.

            Here is a minimum example:

                                 label
            filename  annot_id                    
            file1.wav 0          2
                      1          1
                      2          2
            file2.wav 0          2
                      1          2
                      2          1


            And here is a table with time information and a few extra columns ('min_freq', 'max_freq' and 'file_time_stamp')

                                 start   end  label  min_freq  max_freq  file_time_stamp
            filename  annot_id                    
            file1.wav 0           7.0   8.1      2    180.6     2

False

In [6]:
sl.is_standardized(annot_val) 

 Your table is not in the Ketos format.

            It should have two levels of indices: filename and annot_id.
            It should also contain at least the 'label' column.
            If your annotations have time information, these should appear in the 'start' and 'end' columns

            extra columns are allowed.

            Here is a minimum example:

                                 label
            filename  annot_id                    
            file1.wav 0          2
                      1          1
                      2          2
            file2.wav 0          2
                      1          2
                      2          1


            And here is a table with time information and a few extra columns ('min_freq', 'max_freq' and 'file_time_stamp')

                                 start   end  label  min_freq  max_freq  file_time_stamp
            filename  annot_id                    
            file1.wav 0           7.0   8.1      2    180.6     2

False

Neither of our annotations are in the format ketos expects. But we can use the ```sl.standardize``` function to convert to the specified format.

The *annot_id* column is created automatically by the ```sl.standardize``` function. From the remaining required columns indicated in the example above, we already have *start*, *end* and *label*. Our *sound_file* column needs to be renamed to *filename*, so we will need to provide a dictionary to specify that. 

We have one extra column, *datetime*, that we don't really need to keep, so we'll set ```trim_table=True```, which will exclude any columns that are not required by the standardized tables.

If we wanted to keep the datetime (or any other columns), we would just set ```trim_table=False```. One situation in which you might to do that is if you need this information to split a dataset in train/test or train/validation/test, because then you can sort all your annotations by time and make sure the training set does not overlap with the validation/test. But in our case, the annotations are already split.

In [7]:
map_to_ketos_annot_std ={'filename': 'sound_file'} 
std_annot_train = sl.standardize(table=annot_train, mapper=map_to_ketos_annot_std,trim_table=True)
std_annot_val = sl.standardize(table=annot_val, mapper=map_to_ketos_annot_std, trim_table=True)


Let's have a look at our standardized tables

In [8]:
std_annot_train


Unnamed: 0_level_0,Unnamed: 1_level_0,label
filename,annot_id,Unnamed: 2_level_1
Augment_1001_sel.14.ch01.230603.022928.50..wav,0,1
Augment_1002_sel.140.ch01.230529.014235.17..wav,0,1
Augment_1003_sel.141.ch01.230529.014629.32..wav,0,1
Augment_1004_sel.15.ch01.230520.081352.56..wav,0,1
Augment_1006_sel.15.ch01.230520.214758.68..wav,0,1
...,...,...
sel.85.ch01.230520.143234.70..wav,0,1
sel.87.ch01.230520.143308.25..wav,0,1
sel.88.ch01.230520.143309.25..wav,0,1
sel.89.ch01.230520.143338.00..wav,0,1


<a id=section8></a>

In [9]:
std_annot_val


Unnamed: 0_level_0,Unnamed: 1_level_0,label
filename,annot_id,Unnamed: 2_level_1
Augment_1000_sel.14.ch01.230522.171913.45..wav,0,1
Augment_1005_sel.15.ch01.230520.141655.26..wav,0,1
Augment_100_sel.181.ch01.230529.015612.62..wav,0,1
Augment_1010_sel.16.ch01.230520.081353.56..wav,0,1
Augment_1015_sel.161.ch01.230529.015203.11..wav,0,1
...,...,...
sel.64.ch01.230520.215722.19..wav,0,1
sel.70.ch01.230520.142901.19..wav,0,1
sel.75.ch01.230520.143015.15..wav,0,1
sel.81.ch01.230520.143123.85..wav,0,1


<a id=section4></a>

In [10]:
sl.is_standardized(std_annot_train) 

True

###  4. Choosing the spectrogram settings

As mentioned earlier, we'll represent the segments as spectrograms.
In the .zip file where you found the data, there's also a spectrogram configuration file (```spec_config.json```) which contains the settings we want to use.

This configuration file is simply a text file in the ```.json``` format, so you could make a copy of it, change a few parameters and save several settings to use later or to share the with someone else.


In [14]:
spec_cfg = load_audio_representation('spec_configMinkeSpec.json', name="spectrogram")

In [15]:
spec_cfg

{'rate': 12000,
 'window': 0.0853,
 'step': 0.00853,
 'freq_min': 750,
 'freq_max': 2500,
 'window_func': 'hamming',
 'type': ketos.audio.spectrogram.MagSpectrogram,
 'duration': 4}

The result is a python dictionary. We could change some value, like the step size:

In [16]:
#spec_cfg['step'] = 0.064

But we will stick to the original here.

5. Creating the database¶
Now, we have to c

<a id=section8></a>

### 5. Creating the database

Now, we have to compute the spectrograms following the settings above for each selection in our selection tables (i.e.: each 3s clip) and then save them in a database.

All of this can be done with the ```dbi.create_database``` function in Ketos.

We will start with the training dataset. We need to indicate the name for the database we want to create, where the audio files are, a name for the dataset, the selections table and the audio representation. As specified in our ``spec_cfg``, this is a Magnitude spectrogram, but ketos can also create databases with Power, Mel and CQT spectrograms, as well as time-domain data (waveforms).


In [17]:
dbi.create_database(output_file='databaseMinke.h5', 
                    data_dir='C:\\Users\\kaitlin.palmer\\Desktop\\KetosMinke\\Training Data\\TP12khz\\Train',
                               dataset_name='train',
                    selections=std_annot_train,
                               audio_repres=spec_cfg)
                              













































































































































































































































































































































































































100%|██████████| 15632/15632 [05:38<00:00, 46.14it/s]

2254 items saved to databaseMinke.h5





And we do the same thing for the validation set. Note that, by specifying the same database name, we are telling ketos that we want to add the validation set to the existing database.

In [18]:
dbi.create_database(output_file='databaseMinke.h5', 
                    data_dir='C:\\Users\\kaitlin.palmer\\Desktop\\KetosMinke\\Training Data\\TP12khz\\Validate',
                               dataset_name='val',
                    selections=std_annot_val,
                               audio_repres=spec_cfg)
                              



















































100%|██████████| 3906/3906 [01:24<00:00, 46.49it/s]

563 items saved to databaseMinke.h5





Now we have our database with spectrograms representing audio segments with and without the North Atlantic Right Whale upcall. The data is divided into 'train' and 'validation'. 



In [19]:
db = dbi.open_file("databaseMinke.h5", 'r')

In [20]:
db

File(filename=databaseMinke.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/train (Group) ''
/train/data (Table(2254,)fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(669, 150), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "label": UInt8Col(shape=(), dflt=0, pos=3),
  "offset": Float64Col(shape=(), dflt=0.0, pos=4)}
  byteorder := 'little'
  chunkshape := (1,)
/val (Group) ''
/val/data (Table(563,)fletcher32, shuffle, zlib(1)) ''
  description := {
  "data": Float32Col(shape=(669, 150), dflt=0.0, pos=0),
  "filename": StringCol(itemsize=100, shape=(), dflt=b'', pos=1),
  "id": UInt32Col(shape=(), dflt=0, pos=2),
  "label": UInt8Col(shape=(), dflt=0, pos=3),
  "offset": Float64Col(shape=(), dflt=0.0, pos=4)}
  byteorder := 'little'
  chunkshape := (1,)

In [21]:
db.close()  #Close the database connection

Here we can see the data divided into 'train' and 'validation'. These are called 'groups' in HDF5 terms. Within each of them there is a dataset called 'data', which contains the spectrograms and respective labels.

You will likely not need to directly interact with the database. In a following tutorial, we will use Ketos to build a deep neural network and train it to recognize upcalls. Ketos handles the database interactions, so we won't really have to go into the details of it, but if you would like to learn more about how to get data from this database, take a look at the [database_interface](https://docs.meridian.cs.dal.ca/ketos/modules/data_handling/database_interface.html) module in ketos and the [pyTables](https://www.pytables.org/index.html) documentation.