# label prediction

Finally, we use the `hvc.predict` function with the models we trained to predict labels for new data. The arguments to `hvc.predict` are similar to those of `hvc.extract`, because we will extract the same features from the new data and use those features to classify syllables.

The main difference between `hvc.extract` and `hvc.predict` is that we specify the **model_meta_file** we will use for prediction. Each model that you train with `hvc.select` is saved in a `.model` file, and has one of these `.meta` files associated with it. The `.meta` file points `hvc` to the `.model` file, and keeps track of related "metadata", like which features `hvc` needs to extract for the model when you apply it to new data.

In [None]:
from glob import glob

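# collect the files that hvc.select saved for the k-nearest neighbors models
select_output = glob('../output/select*/knn*/*')
print(select_output[:6])

# use the last path in the list; check the printed value to confirm it is a .meta file
model_meta_file = select_output[-1]
print('model meta file: {}'.format(model_meta_file))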
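Because `glob` returns paths in no guaranteed order, and the pattern above matches every file in the directory, it can help to filter explicitly for `.meta` files. The following is a minimal sketch, not part of `hvc`: the helper `find_meta_files` and the file names in the demonstration are hypothetical, and it assumes only that meta files end in `.meta`, as described above.

```python
import os
import tempfile
from glob import glob


def find_meta_files(results_dir):
    """Return a sorted list of paths under results_dir that end in '.meta'."""
    pattern = os.path.join(results_dir, '**', '*.meta')
    # recursive=True lets '**' match any number of nested directories
    return sorted(glob(pattern, recursive=True))


# Demonstration with a throwaway directory tree standing in for real output.
# The directory and file names below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, 'select_output', 'knn_k4')
    os.makedirs(model_dir)
    for name in ('knn_001.model', 'knn_001.meta',
                 'knn_002.model', 'knn_002.meta'):
        open(os.path.join(model_dir, name), 'w').close()

    meta_files = find_meta_files(tmp)
    meta_names = [os.path.basename(path) for path in meta_files]
    print(meta_names)
```

Sorting the result makes the choice of `meta_files[-1]` repeatable from one run to the next.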

Most of the rest of the arguments will look familiar from when we used `hvc.extract`. We again specify a list of `data_dirs`, a `file_format`, and an `output_dir`, and we tell `hvc.predict` that we want it to return the predictions in a variable.

In [None]:
data_dirs = [
    'cbins/gy6or6/032312',
    'cbins/gy6or6/032412']
file_format = 'cbin'
output_dir = '../output/'
return_predictions = True

There are a couple of other arguments specific to `hvc.predict`, though. We tell the function whether we want it to segment the song by setting

```Python
segment = True
```

If this were `False`, then the function would look for annotation files and use the onsets and offsets of segments saved in those files. (Instead of calling the function with `data_dirs`, you can alternatively supply a `.csv` file with annotations for a list of files, using the `annotation_file` argument.)
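To make "onsets and offsets" concrete, here is a toy sketch of segmenting by amplitude threshold. This is not the algorithm `hvc` uses; the function `find_segments` and the sample values are hypothetical, and the example is only meant to show what a list of segment boundaries looks like.

```python
def find_segments(amplitude, threshold):
    """Return (onset, offset) index pairs for runs where amplitude > threshold.

    onset is the first index above threshold; offset is the first index
    after the run, so amplitude[onset:offset] is the segment.
    """
    segments = []
    onset = None
    for index, value in enumerate(amplitude):
        above = value > threshold
        if above and onset is None:
            # start of a new segment
            onset = index
        elif not above and onset is not None:
            # amplitude dropped back below threshold: close the segment
            segments.append((onset, index))
            onset = None
    if onset is not None:
        # the signal ended while still above threshold
        segments.append((onset, len(amplitude)))
    return segments


# made-up amplitude envelope containing two "syllables"
amplitude = [0, 0, 5, 6, 7, 0, 0, 4, 4, 0]
segments = find_segments(amplitude, threshold=1)
print(segments)
```

Each pair marks where one candidate syllable begins and ends; a classifier then assigns a label to each such segment.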

We also stipulate that we don't want `hvc.predict` to estimate the probability of the labels that it assigns, by saying `predict_proba = False`, and we provide a file format to which it converts the predictions: `convert_to = 'notmat'`.

In [None]:
segment = True
predict_proba = False
convert_to = 'notmat'

Now we are ready to predict all the labels!

In [None]:
predict = hvc.predict(data_dirs=data_dirs,
                      file_format=file_format,
                      model_meta_file=model_meta_file,
                      segment=segment,
                      output_dir=output_dir,
                      predict_proba=predict_proba,
                      convert_to=convert_to,
                      return_predictions=return_predictions)
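With this many keyword arguments, one option is to collect the settings in a dictionary and unpack it with `**` when making the call. The sketch below shows only the Python idiom: `run_prediction` is a hypothetical stand-in that echoes its arguments, so the example runs without `hvc` installed, and the `model_meta_file` value is a placeholder.

```python
def run_prediction(**kwargs):
    """Hypothetical stand-in for the real call; it just returns its arguments."""
    return kwargs


# the same settings used above, gathered in one place
predict_kwargs = {
    'data_dirs': ['cbins/gy6or6/032312', 'cbins/gy6or6/032412'],
    'file_format': 'cbin',
    'model_meta_file': 'path/to/model.meta',  # placeholder, not a real path
    'segment': True,
    'predict_proba': False,
    'convert_to': 'notmat',
    'return_predictions': True,
}

# ** unpacks the dictionary into keyword arguments
received = run_prediction(**predict_kwargs)
print(sorted(received))
```

Keeping the settings in a dictionary makes it easy to print or save them alongside the predictions, so you can later tell which options produced which output.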