# High-Level API

These are the main objects used in the high-level API to create models, intelligently feed samples into them, and experiment with hyperparameters.

| High-level object |                    Groups together the following objects                   |
|:-----------------:|:--------------------------------------------------------------------------:|
|    `Algorithm`    | Functions to build, train, predict, and evaluate a machine learning model. |
|   `DataPipeline`  |      Dataset, Label, Featureset, Splitset, Foldset, Folds, Preprocess.     |
|    `Experiment`   |    Algorithm, Hyperparamset, Hyperparamcombos, DataPipeline, Batch, Job.   |

In this way, the high-level API abstracts the lower-level API to make things easier for the user.

## Prerequisites
If you've already completed the instructions on the **Installation** page, then let's get started.

In [1]:
import aiqc
from aiqc import examples



## Usage

### 1. Algorithm

An `Algorithm` is the ORM's codename for a machine learning model. The name *Model* is avoided because it is the most important *reserved word* in ORMs.

In [2]:
import keras
from keras import metrics
from keras.models import Sequential
from keras.callbacks import History
from keras.layers import Dense, Dropout

You can name the functions below whatever you want, but do not change their predetermined arguments (e.g. `**hyperparameters`, `model`, etc.).

Put a placeholder anywhere you want to try out different hyperparameters: `hyperparameters['<some_variable_name>']`. You'll get a chance to define the hyperparameters in a minute.

In [3]:
def function_model_build(**hyperparameters):
    model = Sequential()
    model.add(Dense(hyperparameters['neuron_count'], input_shape=(4,), activation='relu', kernel_initializer='he_uniform'))
    model.add(Dropout(hyperparameters['dropout_size']))
    model.add(Dense(hyperparameters['neuron_count'], activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(3, activation='softmax', name='output'))

    model.compile(optimizer='adamax', loss='categorical_crossentropy', metrics=['accuracy'])
    
    return model
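
To see how the placeholders resolve without installing Keras, here is a minimal plain-Python sketch. The function `build_description` and the values in `combo` are hypothetical and exist only for illustration; the point is that each `hyperparameters['...']` placeholder is an ordinary dictionary lookup on the keyword arguments.

```python
# Hypothetical stand-in for a model-building function (no Keras required).
def build_description(**hyperparameters):
    # Each placeholder is a plain dictionary lookup on the keyword arguments.
    return (
        f"Dense({hyperparameters['neuron_count']}) -> "
        f"Dropout({hyperparameters['dropout_size']}) -> "
        f"Dense({hyperparameters['neuron_count']})"
    )

# One illustrative combination of values, unpacked into keyword arguments.
# Keys the function does not read (here, epoch_count) are simply ignored.
combo = {"neuron_count": 9, "dropout_size": 0.1, "epoch_count": 25}
description = build_description(**combo)
print(description)
```

Because the function accepts `**hyperparameters`, the same dictionary can be passed to both the build and the train function, and each one reads only the keys it needs.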

In [4]:
def function_model_train(model, samples_train, samples_evaluate, **hyperparameters):
    model.fit(
        samples_train["features"]
        , samples_train["labels"]
        , validation_data = (
            samples_evaluate["features"]
            , samples_evaluate["labels"]
        )
        , verbose = 0
        , batch_size = 3
        , epochs = hyperparameters['epoch_count']
        , callbacks=[History()]
    )
    return model

Then pass these functions into the `Algorithm`.

The `library` and `analysis_type` help handle the model and its output behind the scenes. Current analysis types include: 'classification_multi', 'classification_binary', and 'regression'.

In [5]:
algorithm = aiqc.Algorithm.make(
	library = "keras"
	, analysis_type = "classification_multi"
	, function_model_build = function_model_build
	, function_model_train = function_model_train
)

#### Hyperparameters

The `hyperparameters` below will be automatically fed into the functions above as `**kwargs` via the `**hyperparameters` argument we saw earlier.

For example, wherever you see `hyperparameters['neuron_count']`, it will pull from the *key:value* pair `"neuron_count": [9, 12]` seen below, so model A will have 9 neurons and model B will have 12 neurons.

In [6]:
hyperparameters = {
	"neuron_count": [9, 12]
	, "dropout_size": [0.10, 0.20]
	, "epoch_count": [25, 50]
}
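
The sketch below shows how a dictionary of value lists can expand into individual combinations. It assumes every combination is tried (a full grid), which is consistent with the 8 jobs in the training progress bar further down this page. It is not the library's internal code, and the names `keys` and `combos` are only illustrative.

```python
from itertools import product

hyperparameters = {
    "neuron_count": [9, 12],
    "dropout_size": [0.10, 0.20],
    "epoch_count": [25, 50],
}

# Cartesian product of the value lists: one dict per combination.
keys = list(hyperparameters)
combos = [
    dict(zip(keys, values))
    for values in product(*hyperparameters.values())
]

print(len(combos))  # 2 * 2 * 2 combinations
print(combos[0])
```

Each resulting dictionary has the shape that a function expecting `**hyperparameters` can receive directly.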

### 2. DataPipeline

In [7]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder

In [8]:
file_path = examples.demo_file_to_pandas('iris.tsv')

In [9]:
datapipeline = aiqc.DataPipeline.make(
	dataFrame_or_filePath = file_path
	, label_column = 'species'
	, size_test = 0.24
	, size_validation = 0.16
	, fold_count = None
	, encoder_features = StandardScaler()
	, encoder_labels = OneHotEncoder(sparse=False)
)
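
As a rough illustration of what the two encoders do to the data, here is a plain-Python sketch. It is not scikit-learn's implementation, and the helper names `standardize` and `one_hot` as well as the toy values are made up for this example. Standardizing rescales a feature column to zero mean and unit variance. One-hot encoding turns each label into a vector with one position per class, which is the shape that a 3-unit `softmax` output trained with `categorical_crossentropy` expects.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

def one_hot(labels):
    """Map each label to a vector with a single 1.0 at its class position."""
    categories = sorted(set(labels))
    return [[1.0 if label == c else 0.0 for c in categories] for label in labels]

# Toy values, made up for illustration only.
scaled = standardize([1.0, 2.0, 3.0])
encoded = one_hot(["setosa", "versicolor", "virginica", "setosa"])

print(scaled)
print(encoded)
```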

> Don't use `fold_count` unless each fold (total sample count / `fold_count`) is still large enough to accurately represent your sample population. You can try it with the 'iris_10x.tsv' demo file.
>
> In the future, we'll turn the `encoder_*` attributes into customizable functions.
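
The arithmetic behind the split sizes and the `fold_count` caution can be sketched as follows. This assumes the demo file holds the classic 150-row iris dataset. The helpers `nominal_split_sizes` and `samples_per_fold` are hypothetical, and `fold_count=5` is just an example value. The sizes are nominal: the library's actual split may differ by a sample or so because of rounding and stratification.

```python
def nominal_split_sizes(sample_count, size_test, size_validation):
    """Approximate how many samples land in each split."""
    test = round(sample_count * size_test)
    validation = round(sample_count * size_validation)
    train = sample_count - test - validation
    return {"train": train, "validation": validation, "test": test}

def samples_per_fold(sample_count, fold_count):
    """Total sample count divided by fold_count, as in the note above."""
    return sample_count // fold_count

sizes = nominal_split_sizes(150, size_test=0.24, size_validation=0.16)
print(sizes)
print(samples_per_fold(150, fold_count=5))
```

With only a few dozen samples per fold spread across three classes, each fold holds very few examples of each class, which is why a larger dataset is the safer place to experiment with folds.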

### 3. Experiment

Now it's time to bring together the data and logic into an `Experiment`.

In [10]:
experiment = algorithm.make_experiment(
	datapipeline_id = datapipeline.id
	, hyperparameters = hyperparameters
)

In [11]:
batch = experiment.batch

In [12]:
batch.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 8/8 [00:18<00:00,  2.25s/it]


In [15]:
batch.metrics_to_pandas()

|   | job_id |    split   |  roc_auc | accuracy | precision |  recall  |    f1    |   loss   |
|:-:|:------:|:----------:|:--------:|:--------:|:---------:|:--------:|:--------:|:--------:|
| 0 |   41   |    test    | 0.991898 | 0.888889 |  0.907692 | 0.888889 | 0.882963 | 0.395283 |
| 1 |   41   | validation |    1.0   |   0.84   |  0.889231 |   0.84   | 0.827879 | 0.390014 |
| 2 |   41   |    train   | 0.975395 | 0.831461 |  0.853385 | 0.831461 | 0.824202 | 0.445891 |
| 3 |   42   |    test    | 0.892361 | 0.805556 |  0.806527 | 0.805556 | 0.805217 | 0.486446 |
| 4 |   42   | validation | 0.944559 |   0.84   |    0.84   |   0.84   |   0.84   | 0.431892 |
| 5 |   42   |    train   | 0.911877 | 0.808989 |  0.810783 | 0.808989 | 0.808659 | 0.472854 |
| 6 |   43   |    test    | 0.921296 |   0.75   |  0.781955 |   0.75   | 0.736533 | 0.654026 |
| 7 |   43   | validation | 0.971765 |   0.76   |  0.791429 |   0.76   | 0.740406 | 0.559443 |
| 8 |   43   |    train   | 0.903771 | 0.674157 |  0.673975 | 0.674157 | 0.635896 | 0.684904 |
| 9 |   44   |    test    |    1.0   | 0.916667 |  0.933333 | 0.916667 | 0.915344 | 0.365426 |
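
One common next step is to compare jobs on their test split. The sketch below does this with the standard library only, using the four test rows copied from the output above (jobs 41 to 44, the only ones displayed here). The variable names are illustrative. With the real DataFrame, an equivalent filter and sort in pandas would serve the same purpose.

```python
import csv
import io

# Test-split rows copied from the metrics shown above (accuracy and loss only).
metrics_csv = """job_id,split,accuracy,loss
41,test,0.888889,0.395283
42,test,0.805556,0.486446
43,test,0.75,0.654026
44,test,0.916667,0.365426
"""

rows = list(csv.DictReader(io.StringIO(metrics_csv)))

# Pick the job with the highest test accuracy among the displayed rows.
best = max(rows, key=lambda row: float(row["accuracy"]))
print(best["job_id"], best["accuracy"], best["loss"])
```

Among the rows shown, the job with the highest test accuracy also has the lowest test loss, so either metric would select the same job here.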


### Metrics & Visualization

For more information on visualizing performance metrics, see the **Visualization & Metrics** documentation.