![brainome logo](./images/brainome_logo.png)
# 104 Using Brainome's Predictor CLI
A predictor generated by Brainome can be run directly from the command line interface (CLI). This notebook covers:


1. Predictor --help
2. Validate test csv dataset
3. Classify unlabeled csv dataset
4. Feature engineering predictions

## Prerequisites
This notebook assumes brainome is installed as described in notebook [brainome_101_Quick_Start](brainome_101_Quick_Start.ipynb).

The data sets are:
* [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv) for training data
* [titanic_validate.csv](https://download.brainome.ai/data/public/titanic_validate.csv) for validation
* [titanic_predict.csv](https://download.brainome.ai/data/public/titanic_predict.csv) for predictions

In [1]:
!python3 -m pip install brainome  --quiet
!brainome -version

import urllib.request as request
response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')
response3 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_train.csv titanic_validate.csv titanic_predict.csv


-rw-r--r-- 1 jovyan users  858 Sep 17 23:26 titanic_predict.csv
-rw-r--r-- 1 jovyan users  57K Sep 17 23:26 titanic_train.csv
-rw-r--r-- 1 jovyan users 5.8K Sep 17 23:26 titanic_validate.csv


## Generate a predictor
The predictor filename is `predictor_104.py`

In [2]:
# Assuming brainome is installed per brainome_101_Quick_Start.ipynb
!brainome titanic_train.csv -y -rank -f DT -split 90 -o predictor_104.py -modelonly -q
# Preview predictor
%pycat predictor_104.py




## 1. Predictor --help
Brainome predictors are short and focused: they validate and classify data, and nothing else.

While the predictor source code is portable, it requires numpy to run and, optionally, scipy to generate the confusion matrices.
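
Before moving a predictor to another machine, it can help to confirm that these two modules are importable there. The snippet below is a minimal sketch using only the standard library; `predictor_dependencies` is a hypothetical helper name, not part of Brainome.

```python
from importlib.util import find_spec


def predictor_dependencies():
    """Report whether numpy (required) and scipy (optional) can be imported.

    Hypothetical helper for illustration; it only checks that the modules
    are discoverable and does not import them.
    """
    return {
        "numpy": find_spec("numpy") is not None,
        "scipy": find_spec("scipy") is not None,
    }


for module_name, available in predictor_dependencies().items():
    print(f"{module_name}: {'found' if available else 'missing'}")
```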

In [3]:
!python3 predictor_104.py --help

usage: predictor_104.py [-h] [-validate] [-headerless] [-json] [-trim] csvfile

Predictor trained on ['titanic_train.csv']

positional arguments:
  csvfile      CSV file containing test set (unlabeled).

optional arguments:
  -h, --help   show this help message and exit
  -validate    Validation mode. csvfile must be labeled. Output is
               classification statistics rather than predictions.
  -headerless  Do not treat the first line of csvfile as a header.
  -json        report measurements as json
  -trim        If true, the prediction will not output ignored columns.
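
The flags in the help text above can also be assembled programmatically, for example when calling the predictor from a script. This is a rough sketch: `predictor_command` is a hypothetical helper, and only the flags listed in the help output are used.

```python
import sys


def predictor_command(predictor, csvfile, validate=False, headerless=False,
                      as_json=False, trim=False):
    """Build the argument list for one predictor run (hypothetical helper).

    Only the flags shown by `predictor_104.py --help` are emitted.
    """
    cmd = [sys.executable, predictor]
    if validate:
        cmd.append("-validate")
    if headerless:
        cmd.append("-headerless")
    if as_json:
        cmd.append("-json")
    if trim:
        cmd.append("-trim")
    cmd.append(csvfile)
    return cmd


print(predictor_command("predictor_104.py", "titanic_validate.csv", validate=True))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to capture the predictor's output as a string.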


## 2. Validate test csv dataset
With the **-validate** parameter, the predictor takes a labeled csv data set in the same format as the training data set (target column included) and compares its predictions against the actual outcomes.

In [4]:
!python3 predictor_104.py -validate titanic_validate.csv

Classifier Type:                    Decision Tree
System Type:                        2-way classifier

Accuracy:
    Best-guess accuracy:            61.25%
    Model accuracy:                 81.25% (65/80 correct)
    Improvement over best guess:    20.00% (of possible 38.75%)

Model capacity (MEC):               1 bits
Generalization ratio:               62.61 bits/bit

Confusion Matrix:

      Actual |   Predicted    
    ----------------------------
        died |      45        4
    survived |      11       20

Accuracy by Class:

      target | TP FP TN FN     TPR     TNR     PPV     NPV      F1      TS
    -------- | -- -- -- -- ------- ------- ------- ------- ------- -------
        died | 45 11 20  4  91.84%  64.52%  80.36%  83.33%  85.71%  75.00%
    survived | 20  4 45 11  64.52%  91.84%  83.33%  80.36%  72.73%  57.14%
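
The figures in the report can be reproduced from the confusion matrix counts. The sketch below uses the standard definitions of each metric and assumes that "best-guess accuracy" is the share of the most common class in the validation set, which matches the 61.25% reported above; `class_metrics` is a hypothetical helper name.

```python
def class_metrics(tp, fp, tn, fn):
    """Per-class rates from confusion matrix counts (standard definitions)."""
    return {
        "TPR": tp / (tp + fn),             # true positive rate (recall)
        "TNR": tn / (tn + fp),             # true negative rate
        "PPV": tp / (tp + fp),             # positive predictive value (precision)
        "NPV": tn / (tn + fn),             # negative predictive value
        "F1": 2 * tp / (2 * tp + fp + fn),
        "TS": tp / (tp + fp + fn),         # threat score
    }


# Counts copied from the "Accuracy by Class" table above.
died = class_metrics(tp=45, fp=11, tn=20, fn=4)
survived = class_metrics(tp=20, fp=4, tn=45, fn=11)

total = 45 + 4 + 11 + 20
model_accuracy = (45 + 20) / total
# Assumption: best guess means always predicting the most common class.
best_guess = max(45 + 4, 11 + 20) / total

print(f"Model accuracy: {model_accuracy:.2%}")
print(f"Best-guess accuracy: {best_guess:.2%}")
print(f"Improvement: {model_accuracy - best_guess:.2%} of possible {1 - best_guess:.2%}")
for name, metrics in (("died", died), ("survived", survived)):
    print(name, {key: f"{value:.2%}" for key, value in metrics.items()})
```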


## 3. Classify unlabeled csv dataset
The predictor can classify a data set that has the same columns as the training/validation data but no target column.

It will generate a complete data set with the "Prediction" column appended.

In [None]:
!python3 predictor_104.py titanic_predict.csv > classifications_104.csv
print('Viewing classification predictions.')
import pandas as pd
classifications_output = pd.read_csv('classifications_104.csv')
classifications_output.head()
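
Once the classifications are written to a csv file, a quick tally of the "Prediction" column shows how the predicted classes are distributed. The sketch below uses only the standard library; `count_predictions` is a hypothetical helper, and the sample rows are made up for illustration and are not taken from `titanic_predict.csv`.

```python
import csv
import io
from collections import Counter


def count_predictions(csv_file, column="Prediction"):
    """Tally the values in the prediction column of an open csv file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Made-up rows standing in for the predictor output (illustration only).
sample = io.StringIO(
    "PassengerId,Sex,Prediction\n"
    "1,male,died\n"
    "2,female,survived\n"
    "3,male,died\n"
)
print(count_predictions(sample))
```

To tally the real output, open the file instead: `with open('classifications_104.csv', newline='') as f: print(count_predictions(f))`.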

## 4. Feature engineering predictions
During feature engineering, it is often useful to view only the features that contributed to the prediction.

With the `-trim` parameter, the output will only show the features deemed important by the model.

In [None]:
!python3 predictor_104.py titanic_predict.csv -trim > trimmed_classifications_104.csv
print('Viewing important features classification predictions.')
import pandas as pd
trimmed_classifications_output = pd.read_csv('trimmed_classifications_104.csv')
trimmed_classifications_output.head()
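
To see which columns `-trim` left out, the headers of the full and trimmed outputs can be compared. This is a minimal sketch: `header_of` and `dropped_columns` are hypothetical helpers, and the header strings below are invented placeholders rather than the real titanic column names.

```python
import csv
import io


def header_of(csv_file):
    """Return the first row of an open csv file as a list of column names."""
    return next(csv.reader(csv_file))


def dropped_columns(full_header, trimmed_header):
    """Columns present in the full output but absent from the trimmed one."""
    kept = set(trimmed_header)
    return [column for column in full_header if column not in kept]


# Invented headers for illustration only.
full = header_of(io.StringIO("Id,Name,Sex,Fare,Prediction\n"))
trimmed = header_of(io.StringIO("Sex,Prediction\n"))
print(dropped_columns(full, trimmed))
```

With the files from this notebook, the same comparison would read the headers of `classifications_104.csv` and `trimmed_classifications_104.csv`.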

## Advanced Predictor Usage
See notebook [300 Put your model to work](./brainome_300_Put_your_model_to_work.ipynb) for integrating the predictor into your own Python program.

## Next Steps
- Check out [106 Describe Your CSV](./brainome_106_Describe_Your_CSV.ipynb)
- Check out [200 Using measurements to improve your model](./brainome_200_Using_measurements_to_improve_your_model.ipynb)