# 1. hybrid-vocal-classifier autolabel workflow

Here are the steps in the workflow for autolabeling vocalizations.

First we import the library, because Python requires an `import` before a library can be used.

In [1]:
import hvc  # in Python we have to import a library before we can use it

### 0. Label a small set of songs to provide **training data** for the models, typically ~20 songs.
Here we download the data from a repository.  
**You don't need to run this if you've already downloaded the data.**

In [None]:
hvc.utils.fetch('gy6or6.032212')
hvc.utils.fetch('gy6or6.032612')

### 1. Pick a machine learning algorithm/**model** and the **features** used to train the model. 

In this case we'll use the k-Nearest Neighbors (k-NN) algorithm because it's fast to apply to our data. We'll use the features built into the library that have been tested with k-NN.
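
To make the idea behind k-NN concrete, here is a minimal sketch in plain Python. It is not the implementation used by `hvc`; the function name `knn_predict` and the toy two-dimensional features are assumptions made purely for illustration. A new sample receives the label held by the majority of its k closest training samples.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample
    distances = [math.dist(x, query) for x in train_X]
    # Indices of the k closest training samples
    nearest = sorted(range(len(train_X)), key=lambda i: distances[i])[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy (hypothetical) features for two syllable classes, 'a' and 'b'
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.0), (1.2, 0.8)]
train_y = ['a', 'a', 'a', 'b', 'b', 'b']

print(knn_predict(train_X, train_y, query=(0.15, 0.1), k=3))
```

There is no training step beyond storing the labeled samples, which is one reason k-NN is quick to set up.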

Picking a model and the features that go with it is simple:  
1. In a text editor, open `gy6or6_autolabel.example.extract.knn.config.yml`
2. Below the line that says `feature_group:` add `knn` after the dash.
3. Below the line that says `data_dirs:` add the path to the data you downloaded after the dash.

### 2. Extract features for that model from song files that will be used to train the model.  

We call the `extract` function, passing it the name of the YAML config file as an argument.

```Python
# 1. pick a model and 2. extract features for that model
# Model and features are defined in extract.config.yml file.
hvc.extract('gy6or6_autolabel.example.extract.knn.config.yml')
```

In [5]:
hvc.extract('gy6or6_autolabel.example.extract.knn.config.yml')

Parsed extract config.
Completing item 1 of 1 in to-do list
Changing to data directory: /home/ildefonso/Documents/data/gy6or6/032312/
Processing audio file 1 of 162.
Processing audio file 2 of 162.
Processing audio file 3 of 162.
Processing audio file 4 of 162.
Processing audio file 5 of 162.
Processing audio file 6 of 162.
Processing audio file 7 of 162.
Processing audio file 8 of 162.
Processing audio file 9 of 162.
Processing audio file 10 of 162.
Processing audio file 11 of 162.
Processing audio file 12 of 162.
Processing audio file 13 of 162.
Processing audio file 14 of 162.
Processing audio file 15 of 162.
Processing audio file 16 of 162.
Processing audio file 17 of 162.
Processing audio file 18 of 162.
Processing audio file 19 of 162.
Processing audio file 20 of 162.
Processing audio file 21 of 162.
Processing audio file 22 of 162.
Processing audio file 23 of 162.
Processing audio file 24 of 162.
Processing audio file 25 of 162.
Processing audio file 26 of 162.
Processing audio 

Did not extract features from file.
  .format(filename, labels_to_use))


Processing audio file 82 of 162.
Processing audio file 83 of 162.
Processing audio file 84 of 162.
Processing audio file 85 of 162.
Processing audio file 86 of 162.
Processing audio file 87 of 162.
Processing audio file 88 of 162.
Processing audio file 89 of 162.
Processing audio file 90 of 162.
Processing audio file 91 of 162.
Processing audio file 92 of 162.
Processing audio file 93 of 162.
Processing audio file 94 of 162.
Processing audio file 95 of 162.
Processing audio file 96 of 162.
Processing audio file 97 of 162.
Processing audio file 98 of 162.
Processing audio file 99 of 162.
Processing audio file 100 of 162.
Processing audio file 101 of 162.
Processing audio file 102 of 162.
Processing audio file 103 of 162.
Processing audio file 104 of 162.
Processing audio file 105 of 162.
Processing audio file 106 of 162.
Processing audio file 107 of 162.
Processing audio file 108 of 162.
Processing audio file 109 of 162.
Processing audio file 110 of 162.
Processing audio file 111 of 162

Did not extract features from file.
  .format(filename, labels_to_use))


Processing audio file 122 of 162.
Processing audio file 123 of 162.
Processing audio file 124 of 162.
Processing audio file 125 of 162.
Processing audio file 126 of 162.
Processing audio file 127 of 162.
Processing audio file 128 of 162.
Processing audio file 129 of 162.
Processing audio file 130 of 162.
Processing audio file 131 of 162.
Processing audio file 132 of 162.
Processing audio file 133 of 162.
Processing audio file 134 of 162.
Processing audio file 135 of 162.
Processing audio file 136 of 162.
Processing audio file 137 of 162.
Processing audio file 138 of 162.
Processing audio file 139 of 162.
Processing audio file 140 of 162.
Processing audio file 141 of 162.
Processing audio file 142 of 162.
Processing audio file 143 of 162.
Processing audio file 144 of 162.
Processing audio file 145 of 162.
Processing audio file 146 of 162.
Processing audio file 147 of 162.
Processing audio file 148 of 162.
Processing audio file 149 of 162.
Processing audio file 150 of 162.
Processing aud

### 3. Pick the **hyperparameters** used by the algorithm as it trains the model on the data.
Next we use convenience functions to find which hyperparameters give the best accuracy when we train our machine learning models. For k-NN, the hyperparameter is k, the number of neighbors.
```Python
# 3. pick hyperparameters for model
# Load summary feature file to use with helper functions for
# finding best hyperparameters.
from glob import glob
# glob returns a list of matching paths, so take the first match
summary_file = glob('./extract_output*/summary*')[0]
summary_data = hvc.load_feature_file(summary_file)
# In this case, we picked a k-nearest neighbors model
# and we want to find what value of k will give us the highest accuracy
X = summary_data['features']
y = summary_data['labels']
cv_scores, best_k = hvc.utils.find_best_k(X, y, k_range=range(1, 11))
```
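
As a rough illustration of what searching for the best k involves, the sketch below scores each candidate k with leave-one-out cross-validation on toy data. This is an assumption about the general technique, not a description of how `hvc.utils.find_best_k` works internally; the names `leave_one_out_accuracy` and `pick_best_k` and the data are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training samples closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k):
    """Hold out each sample in turn, predict it from the rest, and score."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return correct / len(X)


def pick_best_k(X, y, k_range):
    """Return accuracy per k and the best k (the smallest k wins ties)."""
    scores = {k: leave_one_out_accuracy(X, y, k) for k in k_range}
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return scores, best_k


# Toy (hypothetical) features: two well-separated classes
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (2.0, 2.0), (2.1, 2.2), (2.2, 2.1), (2.3, 2.3)]
y = ['a'] * 4 + ['b'] * 4

cv_scores, best_k = pick_best_k(X, y, k_range=range(1, 6))
print(cv_scores, best_k)
```

Preferring the smallest k on ties is a design choice of this sketch; a library may break ties differently.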

### 4. Train, i.e., fit the **model** to the data  
### 5. Select the **best** model based on some measure of accuracy. 

1. In a text editor, open `gy6or6_autolabel.example.select.knn.config.yml`
2. On the line that says `feature_file:` paste the name of the feature file after the colon. The name will have a format like `summary_file_bird_ID_date`.

Then run the following code:
```Python
# 4. Fit the **model** to the data and 5. Select the **best** model
hvc.select('gy6or6_autolabel.example.select.knn.config.yml')
```

In [15]:
!gedit gy6or6_autolabel.example.select.knn.config.yml

In [11]:
cd hybrid-vocal-classifier-tutorial/

/home/ildefonso/Documents/repositories/talks_and_teaching/hybrid-vocal-classifier-tutorial


In [16]:
hvc.select('gy6or6_autolabel.example.select.knn.config.yml')

Parsed select config.
Completing item 1 of 1 in to-do list
Training models with 50 samples, replicate #0
training knn. fitting model. score on test set: 0.9080 , average accuracy on test set: 0.9011
Training models with 50 samples, replicate #1
training knn. fitting model. score on test set: 0.8780 , average accuracy on test set: 0.8689
Training models with 50 samples, replicate #2
training knn. fitting model. score on test set: 0.9120 , average accuracy on test set: 0.8995
Training models with 50 samples, replicate #3
training knn. fitting model. score on test set: 0.9760 , average accuracy on test set: 0.9747
Training models with 50 samples, replicate #4
training knn. fitting model. score on test set: 0.8620 , average accuracy on test set: 0.8512
Training models with 50 samples, replicate #5
training knn. fitting model. score on test set: 0.9300 , average accuracy on test set: 0.9284
Training models with 50 samples, replicate #6
training knn. fitting model. score on test set: 0.9140 
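
If you capture the console output of `hvc.select` as text, the per-replicate scores can be compared programmatically. The sketch below parses the complete log lines from the run in this tutorial with a regular expression. The log format is inferred only from those lines, and `parse_select_log` is a hypothetical helper, not part of `hvc`.

```python
import re

# Lines copied from the console output of the `hvc.select` run
LOG_TEXT = """\
Training models with 50 samples, replicate #0
training knn. fitting model. score on test set: 0.9080 , average accuracy on test set: 0.9011
Training models with 50 samples, replicate #1
training knn. fitting model. score on test set: 0.8780 , average accuracy on test set: 0.8689
Training models with 50 samples, replicate #2
training knn. fitting model. score on test set: 0.9120 , average accuracy on test set: 0.8995
Training models with 50 samples, replicate #3
training knn. fitting model. score on test set: 0.9760 , average accuracy on test set: 0.9747
Training models with 50 samples, replicate #4
training knn. fitting model. score on test set: 0.8620 , average accuracy on test set: 0.8512
Training models with 50 samples, replicate #5
training knn. fitting model. score on test set: 0.9300 , average accuracy on test set: 0.9284
"""

# Assumed log format, inferred only from the lines above
ENTRY = re.compile(
    r"Training models with (\d+) samples, replicate #(\d+)\s+"
    r"training (\w+)\. fitting model\. score on test set: (\d+\.\d+)"
)


def parse_select_log(text):
    """Return one dict per model that appears in the log text."""
    return [
        {"samples": int(n), "replicate": int(rep), "model": model, "score": float(score)}
        for n, rep, model, score in ENTRY.findall(text)
    ]


results = parse_select_log(LOG_TEXT)
# Pick the entry with the highest score on the test set
best = max(results, key=lambda r: r["score"])
print(best)
```

For these six replicates, the highest test-set score belongs to replicate #3.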

### 6. Using the fitted model, **predict** labels for unlabeled data.

1. In a text editor, open `gy6or6_autolabel.example.predict.knn.config.yml`
2. On the line that says `model_meta_file:`, after the colon, paste the name of a meta file from the `select` output. The name will have a format like `knn_100samples_replicate0.meta`.
3. Below the line that says `data_dirs:`, after the dash, add the path to the other folder of data that you downloaded.

Then run the following code:
```Python
# 6. **Predict** labels for unlabeled data using the fit model.
hvc.predict('gy6or6_autolabel.example.predict.knn.config.yml')
```

In [18]:
cd select_output_171205_193932/knn_k4/

/home/ildefonso/Documents/repositories/talks_and_teaching/hybrid-vocal-classifier-tutorial/select_output_171205_193932/knn_k4


In [19]:
ls

knn_100samples_replicate0.meta   knn_200samples_replicate0.meta
knn_100samples_replicate0.model  knn_200samples_replicate0.model
knn_100samples_replicate1.meta   knn_200samples_replicate1.meta
knn_100samples_replicate1.model  knn_200samples_replicate1.model
knn_100samples_replicate2.meta   knn_200samples_replicate2.meta
knn_100samples_replicate2.model  knn_200samples_replicate2.model
knn_100samples_replicate3.meta   knn_200samples_replicate3.meta
knn_100samples_replicate3.model  knn_200samples_replicate3.model
knn_100samples_replicate4.meta   knn_200samples_replicate4.meta
knn_100samples_replicate4.model  knn_200samples_replicate4.model
knn_100samples_replicate5.meta   knn_200samples_replicate5.meta
knn_100samples_replicate5.model  knn_200samples_replicate5.model
knn_100samples_replicate6.meta   knn_200samples_replicate6.meta
knn_100samples_replicate6.model  knn_200samples_replicate6.model
knn_100samples_replicate7.meta   knn_200samples_replicate7.meta
knn_100samples_rep
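
Rather than reading the directory listing by eye, you could collect the `.meta` files with a short script. This is a sketch under assumptions: the file name pattern is inferred from the listing above (for example `knn_100samples_replicate0.meta`), and `list_meta_files` is a hypothetical helper, not part of `hvc`.

```python
import re
from pathlib import Path

# Pattern inferred from file names such as knn_100samples_replicate0.meta
META_NAME = re.compile(
    r"^(?P<model>\w+?)_(?P<samples>\d+)samples_replicate(?P<replicate>\d+)\.meta$"
)


def list_meta_files(model_dir):
    """Return (n_samples, replicate, path) for each matching .meta file, sorted."""
    found = []
    for path in Path(model_dir).glob("*.meta"):
        match = META_NAME.match(path.name)
        if match:  # skip files that do not follow the assumed pattern
            found.append((int(match["samples"]), int(match["replicate"]), path))
    return sorted(found)


if __name__ == "__main__":
    # Directory name taken from the `cd` cell above; yours will differ
    for n_samples, replicate, path in list_meta_files("select_output_171205_193932/knn_k4"):
        print(n_samples, replicate, path)
```

Any path printed this way can be pasted after `model_meta_file:` in the predict config.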

In [6]:
cd hybrid-vocal-classifier-tutorial/

/home/ildefonso/Documents/repositories/talks_and_teaching/hybrid-vocal-classifier-tutorial


In [7]:
hvc.predict('gy6or6_autolabel.example.predict.knn.config.yml')

parsed predict config
Changing to data directory: /home/ildefonso/Documents/data/gy6or6/032612
Processing audio file 1 of 39.
Processing audio file 2 of 39.
Processing audio file 3 of 39.
Processing audio file 4 of 39.
Processing audio file 5 of 39.
Processing audio file 6 of 39.
Processing audio file 7 of 39.
Processing audio file 8 of 39.
Processing audio file 9 of 39.
Processing audio file 10 of 39.
Processing audio file 11 of 39.
Processing audio file 12 of 39.
Processing audio file 13 of 39.
Processing audio file 14 of 39.
Processing audio file 15 of 39.
Processing audio file 16 of 39.
Processing audio file 17 of 39.
Processing audio file 18 of 39.
Processing audio file 19 of 39.
Processing audio file 20 of 39.
Processing audio file 21 of 39.
Processing audio file 22 of 39.
Processing audio file 23 of 39.
Processing audio file 24 of 39.
Processing audio file 25 of 39.
Processing audio file 26 of 39.
Processing audio file 27 of 39.
Processing audio file 28 of 39.
Processing audio f

Congratulations! You have auto-labeled an entire day's worth of data.