# Annotating new data with canapy - tutorial

With Canapy, you can train an AI model to annotate bird songs. This tutorial shows how to use a trained model to annotate new data.

## Import libraries

In [None]:
from canapy import Dataset, Annotator
from canapy.sequence import to_seconds

## Import data and models

First, create an Annotator object:

In [None]:
annotator = Annotator("./tuto_output2/models")

The annotator provides every function you need to produce annotations from a dataset, using the models trained in the dashboard. Make sure the reservoirpy version installed in your environment matches the one used to train the models; otherwise, the models may fail to load or run.
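
A quick way to see which reservoirpy version is installed is sketched below. It only uses the standard library's `importlib.metadata`. The helper name `installed_version` and the expected version string are hypothetical; replace the placeholder with the version that was used during training.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical placeholder: replace with the version used to train the models.
EXPECTED_RESERVOIRPY = "0.0.0"

found = installed_version("reservoirpy")
if found is None:
    print("reservoirpy is not installed")
elif found != EXPECTED_RESERVOIRPY:
    print(f"reservoirpy {found} is installed, expected {EXPECTED_RESERVOIRPY}")
else:
    print(f"reservoirpy {found} matches the training version")
```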

In this example, the `./tuto_non_annotated_songs` directory contains only .wav audio files, one per song, ready to be annotated.
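
Before building the dataset, it can help to confirm that the directory really contains only `.wav` files. The sketch below is one way to do that with `pathlib`; the helper `split_wav_files` is a hypothetical name and is not part of canapy.

```python
from pathlib import Path


def split_wav_files(directory):
    """Return (wav_files, other_files) found directly inside `directory`."""
    wav_files, other_files = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # ignore sub-directories
        if path.suffix.lower() == ".wav":
            wav_files.append(path)
        else:
            other_files.append(path)
    return wav_files, other_files


songs_dir = Path("./tuto_non_annotated_songs")
if songs_dir.is_dir():
    wavs, others = split_wav_files(songs_dir)
    print(f"{len(wavs)} .wav file(s) found")
    if others:
        print("Unexpected files:", [p.name for p in others])
```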

The Dataset object stores the dataset as a pandas DataFrame (in this case, only paths to audio files). It can also store annotations, corrections, and configuration files, apply the corrections to the audio and labels, and extract the features the models need to annotate the songs. For now, we only need it to store the audio files and the configuration file, which is the default one here. It also automatically creates the class "SIL", which represents all the non-annotated (and thus silent) parts of the songs.

In [None]:
dataset = Dataset("./tuto_non_annotated_songs", vocab=annotator.vocab)

## Annotate

To run the annotator, simply call its `run` method. This may take a while, depending on how many songs you have to annotate:

In [None]:
annotations, vectors = annotator.run(dataset=dataset)

That's it! The annotations variable now looks like a dictionary:

In [None]:
print("annotations: ", annotations)  # syntactic model only (no ensemble or nsyn annotations)

This dictionary stores all the annotations of the syntactic model, with the audio file name attached. If you want to annotate with the other models (non-syntactic and ensemble), set the `model` parameter to `nsyn`, `ensemble`, or `all`.

The `vectors` variable looks the same, but stores the raw responses of the models (the output vectors representing the decision of the neural network).
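
To make the link between output vectors and labels concrete, here is a minimal sketch. It assumes that the label of each timestep is the class with the highest score in its output vector (an argmax decision). The vocabulary and the scores are invented for illustration and do not come from a canapy model; canapy's actual decoding may differ.

```python
def decode(vectors, vocab):
    """Pick, for each timestep, the label whose score is the highest."""
    labels = []
    for scores in vectors:
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(vocab[best])
    return labels


# Hypothetical vocabulary and output vectors (one vector per timestep).
vocab = ["SIL", "A", "B"]
vectors = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
]

labels = decode(vectors, vocab)
print(labels)  # ['SIL', 'A', 'A', 'B']
```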

Notice that the same annotation is repeated over many consecutive timesteps. To export only the sequence of annotations over time, rather than one annotation per timestep, set the `to_group` argument to `True` when calling the annotator:

In [None]:
annotations_grouped, _ = annotator.run(dataset=dataset, to_group=True)

In [None]:
print("grouped annotations: ", annotations_grouped)
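
Conceptually, grouping amounts to a run-length encoding of the per-timestep labels. The sketch below illustrates the idea with `itertools.groupby` on an invented label sequence; `group_labels` is a hypothetical helper, and the exact structure returned by canapy may differ.

```python
from itertools import groupby


def group_labels(labels):
    """Collapse consecutive identical labels into (label, count) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


# Hypothetical per-timestep labels for one song.
frame_labels = ["SIL", "SIL", "A", "A", "A", "B", "SIL"]

print(group_labels(frame_labels))
# [('SIL', 2), ('A', 3), ('B', 1), ('SIL', 1)]
```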

The annotations have been grouped. The number that comes with each annotation label is the number of timesteps covered by the annotation, i.e. the duration of the bird phrase expressed in spectral analysis windows. This number can be converted to seconds from the sampling rate and the hop between consecutive analysis windows, but keep in mind that the result can be a rough approximation.
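
The conversion itself is simple arithmetic: a duration in seconds is the number of windows multiplied by the hop between windows (in samples), divided by the sampling rate. The sketch below shows this under assumed values; the sampling rate, the hop length, and the grouped annotations are hypothetical, and the real values should come from your configuration.

```python
def frames_to_seconds(n_frames, hop_length, sampling_rate):
    """Convert a number of analysis windows to a duration in seconds."""
    return n_frames * hop_length / sampling_rate


# Hypothetical values: use the ones from your own configuration.
SAMPLING_RATE = 44100  # samples per second
HOP_LENGTH = 512       # samples between two consecutive analysis windows

# Hypothetical grouped annotations: (label, number of windows).
grouped = [("SIL", 20), ("A", 43), ("B", 12)]

for label, n_frames in grouped:
    seconds = frames_to_seconds(n_frames, HOP_LENGTH, SAMPLING_RATE)
    print(f"{label}: {n_frames} windows ~ {seconds:.3f} s")
```

Because durations are counted in whole windows, each one is only known to within one hop, here `HOP_LENGTH / SAMPLING_RATE` seconds.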

If you really need to display this time in seconds, simply use:

In [None]:
new_annotations = to_seconds(annotations_grouped, dataset.config)

Be careful: this function expects the grouped annotations from a single model. You should therefore not pass it the annotations produced by `annotator.run(model='all', dataset=dataset, to_group=True)`.

In [None]:
print("annotations in seconds: ", new_annotations)

Finally, you can save these annotations directly to CSV files with the `csv_directory` parameter, which takes the path where you want to save the new CSV files. This parameter also activates grouping automatically, so you don't have to set `to_group` to get a concise CSV.


In [None]:
annotations, _ = annotator.run(dataset=dataset, csv_directory="./tuto_non_annotated_songs_annotated")

That's it! Your new files are now annotated!