# TensorFlow: Tabular Classify Multi-Label

*Categorizing Plant Species with Multi-Label Classification of Phenotypes.*

![farming](../../../_static/images/banner/plants.png)

## Example Data

Reference [Example Datasets](../../datasets.html) for more information.

This dataset consists of:

* *Labels* = the species of the plant.
* *Features* = phenotypes of the plant sample.

In [2]:
from aiqc import datum
from aiqc.orm import Dataset

In [3]:
df = datum.to_pandas('iris.tsv')
shared_dataset = Dataset.Tabular.from_pandas(df)
df.head(3)

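|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         | setosa  |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         | setosa  |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         | setosa  |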


---

## Pipeline

Reference [High-Level API Docs](../../api_high_level.ipynb) for more information.

In [4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler
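
For orientation, the arithmetic behind the two scikit-learn transforms imported above can be approximated in plain Python. This is a minimal sketch, not AIQC's or scikit-learn's implementation: `standard_scale` and `one_hot` are hypothetical helper names, and the species names other than `setosa` are assumed from the classic iris dataset because only `setosa` appears in the preview.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Center a numeric column on 0 and scale it to unit variance.

    Uses the population standard deviation (ddof=0), as StandardScaler does.
    """
    mean = fmean(column)
    std = pstdev(column)
    # Guard against constant columns to avoid dividing by zero.
    if std == 0:
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


def one_hot(labels):
    """Map each label to a 0/1 vector with one position per sorted category."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = [
        [1.0 if index[label] == i else 0.0 for i in range(len(categories))]
        for label in labels
    ]
    return categories, vectors


# sepal_length values from the three rows shown in the preview.
sepal_length = [5.1, 4.9, 4.7]
scaled = standard_scale(sepal_length)

# Species names assumed from the classic iris dataset.
categories, encoded = one_hot(["setosa", "virginica", "versicolor", "setosa"])
```

Each encoded label has exactly one `1.0`, so the number of positions equals the number of distinct species.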

In [5]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),
        
    Target(
        dataset   = shared_dataset
        , column  = 'species'
        , encoder = Target.Encoder(OneHotEncoder())
    ),

    Stratifier(
        size_test    = 0.22
        , fold_count = 5
    )
)


└── Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
	This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.


└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
	This saves memory when concatenating the output of many encoders.

is not evenly divisible by the `fold_count` <5> you specified.
This can result in misleading performance metrics for the last Fold.
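
The divisibility warning above comes down to remainder arithmetic. The following is a rough sketch only: it assumes the classic 150-row iris dataset and assumes the folds are cut from the samples left over after the test split. AIQC's actual stratified splitting may round differently, and `fold_sizes` is a hypothetical helper.

```python
def fold_sizes(sample_count, fold_count):
    """Split sample_count into fold_count sizes that differ by at most one."""
    base, remainder = divmod(sample_count, fold_count)
    # The first `remainder` folds each absorb one extra sample.
    return [base + 1 if i < remainder else base for i in range(fold_count)]


total_samples = 150  # assumption: row count of the classic iris dataset
size_test = 0.22     # value passed to the Stratifier
fold_count = 5       # value passed to the Stratifier

test_samples = round(total_samples * size_test)
remaining = total_samples - test_samples
sizes = fold_sizes(remaining, fold_count)
evenly_divisible = remaining % fold_count == 0
```

Under these assumptions 117 samples remain after the test split, and 117 is not a multiple of 5, so the folds cannot all be the same size.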



---

## Modeling

Reference [High-Level API Docs](../../api_high_level.ipynb) for more information.

In [6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import tensorflow as tf
from tensorflow.keras import layers as l

In [7]:
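def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()
    m.add(l.Input(shape=features_shape))
    # Hidden layer; its width comes from the `neuron_count` hyperparameter.
    m.add(l.Dense(units=hp['neuron_count'], activation='relu', kernel_initializer='he_uniform'))
    # Output layer: one unit per encoded label column, softmax for class probabilities.
    m.add(l.Dense(units=label_shape[0], activation='softmax'))
    return m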
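
The final `Dense` layer uses `softmax`, which turns one raw score per class into probabilities that sum to 1. Below is a plain-Python sketch of that function with made-up scores. The `softmax` helper is hypothetical and is not the Keras implementation.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three classes; not output from the model above.
probabilities = softmax([2.0, 0.5, -1.0])
predicted_class = probabilities.index(max(probabilities))
```

Because the target is one-hot encoded, the output layer needs one unit per category, which is why `fn_build` sets `units=label_shape[0]`.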

In [8]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    # `loser` is the loss object passed in; it is used as the Keras loss.
    model.compile(
        loss        = loser
        , optimizer = optimizer
        , metrics   = ['accuracy']
    )
    # Batch size and epoch count come from the hyperparameter combination.
    model.fit(
        train_features, train_label
        , validation_data = (eval_features, eval_label)
        , verbose         = 0
        , batch_size      = hp['batch_size']
        , epochs          = hp['epoch_count']
        # Record per-epoch loss and metric values.
        , callbacks       = [tf.keras.callbacks.History()]
    )
    return model

In [9]:
hyperparameters = dict(
    neuron_count    = [9, 12]
    , batch_size    = [3]
    , learning_rate = [0.03, 0.05]
    , epoch_count   = [30, 60]
)
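
Each list holds the candidate values for one hyperparameter. Assuming every combination is tried as a grid search (an assumption about AIQC's behavior, although it agrees with the `8/8` jobs shown per fold in the training progress bar), the grid can be enumerated with `itertools.product`:

```python
from itertools import product

hyperparameters = dict(
    neuron_count=[9, 12],
    batch_size=[3],
    learning_rate=[0.03, 0.05],
    epoch_count=[30, 60],
)

# Cartesian product of every value list, one dict per combination.
names = list(hyperparameters)
combinations = [
    dict(zip(names, values))
    for values in product(*hyperparameters.values())
]

for combination in combinations:
    print(combination)
```

The grid grows multiplicatively (2 × 1 × 2 × 2 here), so adding a value to any list multiplies the number of models to train.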

In [10]:
experiment = Experiment(
    Architecture(
        library           = "keras"
        , analysis_type   = "classification_multi"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , hyperparameters = hyperparameters
    ),
    
    Trainer(
        pipeline       = pipeline
        , repeat_count = 1
    )
)

In [11]:
experiment.run_jobs()

📦 Caching Splits - Fold #1 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 303.85it/s]
📦 Caching Splits - Fold #2 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 433.21it/s]
📦 Caching Splits - Fold #3 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 415.58it/s]
📦 Caching Splits - Fold #4 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 356.13it/s]
📦 Caching Splits - Fold #5 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 447.23it/s]
🔮 Training Models - Fold #1 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:0

---

## Visualization & Interpretation

For more information on visualization of performance metrics, reference the [Dashboard](../../dashboard.html) documentation.