# Keras: Tabular Classify Multi-Class

*Categorizing Plant Species with Multi-Class Classification of Phenotypes.*

![farming](../images/vertical_farming.png)

```python
import aiqc
from aiqc import datum
```

---

## Example Data

This dataset consists of:

* *Labels* = the species of the plant.
* *Features* = phenotypes of the plant sample.

Reference [Example Datasets](example_datasets.ipynb) for more information.
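As a minimal sketch of the label/feature distinction, assuming only pandas, the snippet below rebuilds the first five rows that `df.head()` displays for this dataset and separates the `species` label column from the four phenotype feature columns. This mirrors what `label_column = 'species'` and `feature_cols_excluded = 'species'` express in the pipeline; it is an illustration, not AIQC code.

```python
import pandas as pd

# The first five rows of iris.tsv, as displayed by df.head() in this notebook.
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
    (5.0, 3.6, 1.4, 0.2, "setosa"),
]
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
sample = pd.DataFrame(rows, columns=columns)

# Label: the single categorical column to predict.
labels = sample["species"]
# Features: every remaining numeric phenotype column.
features = sample.drop(columns=["species"])

print(features.shape)               # (5, 4)
print(list(labels.unique()))        # ['setosa']
```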

```python
df = datum.to_pandas('iris.tsv')
```

```python
df.head()
```

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


---

## Pipeline

Reference [High-Level API Docs](api_high_level.ipynb) for more information including how to work with non-tabular data.

```python
from sklearn.preprocessing import OneHotEncoder, StandardScaler
```

```python
splitset = aiqc.Pipeline.Tabular(
    # --- Data source ---
    df_or_path = df
    , dtype = None

    # --- Label preprocessing ---
    , label_column = 'species'
    , label_interpolater = None
    , label_encoder = dict(sklearn_preprocess = OneHotEncoder())

    # --- Feature preprocessing ---
    , feature_cols_excluded = 'species'
    , feature_interpolaters = None
    , feature_window = None
    , feature_encoders = dict(
        sklearn_preprocess = StandardScaler()
        , dtypes = ['float64']
    )
    , feature_reshape_indices = None

    # --- Stratification ---
    , size_test = 0.22
    , size_validation = 0.12
    , fold_count = None
    , bin_count = None
)
```


```
=> Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
   This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.

=> Info - System overriding user input to set `sklearn_preprocess.copy=False`.
   This saves memory when concatenating the output of many encoders.

___/ featurecoder_index: 0 \_________

=> The column(s) below matched your filter(s) featurecoder filters.

['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

=> Done. All feature column(s) have featurecoder(s) associated with them.
No more Featurecoders can be added to this Encoderset.
```
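The math behind these encoders can be approximated in plain NumPy. This is a sketch of the underlying transformations, not AIQC's implementation: one-hot encoding turns each species string into a dense 0/1 vector (categories sorted, as `OneHotEncoder` does by default), and standard scaling maps each feature column to zero mean and unit variance using the population standard deviation, as `StandardScaler` does. The `split_sizes` helper is hypothetical and assumes the standard 150-row Iris dataset and that `size_test` and `size_validation` are both fractions of the full dataset; AIQC's actual rounding and stratification may differ.

```python
import numpy as np

def one_hot(labels):
    """Map each category to a dense 0/1 row vector (categories sorted, like OneHotEncoder)."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return categories, encoded

def standardize(x):
    """Scale each column to zero mean and unit variance (population std, as StandardScaler)."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def split_sizes(n_rows, size_test, size_validation):
    """Hypothetical helper: rows per split, assuming both sizes are fractions of all rows."""
    n_test = round(n_rows * size_test)
    n_validation = round(n_rows * size_validation)
    return n_rows - n_test - n_validation, n_validation, n_test

categories, y = one_hot(["setosa", "versicolor", "virginica", "setosa"])
print(categories)   # ['setosa', 'versicolor', 'virginica']
print(y[0])         # [1. 0. 0.]

x = standardize([[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]])
print(np.allclose(x.mean(axis=0), 0.0), np.allclose(x.std(axis=0), 1.0))  # True True

# (train, validation, test) for 150 rows with size_test=0.22, size_validation=0.12
print(split_sizes(150, 0.22, 0.12))  # (99, 18, 33)
```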



---

## Modeling

Reference this blog post for related deep learning recipes: [MachineLearningMastery.com "Multi-Label Classification"](https://machinelearningmastery.com/multi-label-classification-with-deep-learning/).

```python
import tensorflow as tf
from tensorflow.keras import layers as l
```

```python
def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()
    m.add(l.Input(shape=features_shape))
    # Hidden layer sized by the `neuron_count` hyperparameter.
    m.add(l.Dense(units=hp['neuron_count'], activation='relu', kernel_initializer='he_uniform'))
    # One softmax unit per one-hot label column (one per species).
    m.add(l.Dense(units=label_shape[0], activation='softmax'))
    return m
```
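The output layer uses `softmax`, which turns the final `label_shape[0]` scores into probabilities that sum to 1, so each sample is assigned exactly one species (a multi-class setup). A model that allowed several labels per sample would typically use independent `sigmoid` outputs instead. A minimal NumPy sketch of the difference; the logits are made-up illustrative values, not output from this model:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    """Element-wise sigmoid: each output is an independent yes/no probability."""
    return 1.0 / (1.0 + np.exp(-logits))

species = ["setosa", "versicolor", "virginica"]
# Illustrative scores for one sample from a 3-unit output layer.
logits = np.array([[2.0, 0.5, -1.0]])

probs = softmax(logits)
print(np.isclose(probs.sum(), 1.0))       # True: probabilities compete and sum to 1
print(species[int(probs.argmax())])       # setosa: exactly one predicted class

independent = sigmoid(logits)
print(independent.sum() > 1.0)            # True: sigmoid outputs need not sum to 1
```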

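```python
def fn_train(model, loser, optimizer, samples_train, samples_evaluate, **hp):
    # `loser` (the loss) and `optimizer` are passed in by AIQC; in this notebook
    # they come from the automatic defaults (`fn_lose = None`, `fn_optimize = None`).
    model.compile(
        loss = loser
        , optimizer = optimizer
        , metrics = ['accuracy']
    )
    # Train on the training split, scoring the evaluation split after each epoch.
    model.fit(
        samples_train["features"]
        , samples_train["labels"]
        , validation_data = (
            samples_evaluate["features"]
            , samples_evaluate["labels"]
        )
        , verbose = 0
        , batch_size = hp['batch_size']
        , epochs = hp['epoch_count']
        , callbacks=[tf.keras.callbacks.History()]
    )
    return model
```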
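`fn_train` receives its loss function as `loser`, and with `fn_lose = None` AIQC picks one automatically. For one-hot labels and a softmax output the usual choice is categorical cross-entropy; that AIQC selects exactly this loss is an assumption, not something this notebook shows. The NumPy sketch below computes that loss alongside the argmax-based accuracy that `metrics = ['accuracy']` reports for one-hot targets, using illustrative predictions:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); eps guards against log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class matches the one-hot label."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Two one-hot labels (setosa, virginica) and illustrative softmax outputs.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.5, 0.3, 0.2]])

print(categorical_accuracy(y_true, y_pred))            # 0.5: second sample is misclassified
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.9163 = -(ln 0.8 + ln 0.2) / 2
```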

```python
hyperparameters = dict(
    neuron_count    = [9, 12]
    , batch_size    = [3]
    , learning_rate = [0.03, 0.05]
    , epoch_count   = [30, 60]
)
```
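Each list above holds candidate values, and with `search_percent = None` the experiment appears to train every combination, once per combination when `repeat_count = 1` (the assumed meaning of `repeat_count` is that each repeat retrains the full grid). Enumerating the grid with `itertools.product` shows where the 8 jobs in the training progress bar come from; `expand_grid` is a hypothetical helper, not an AIQC function.

```python
from itertools import product

hyperparameters = dict(
    neuron_count    = [9, 12]
    , batch_size    = [3]
    , learning_rate = [0.03, 0.05]
    , epoch_count   = [30, 60]
)

def expand_grid(space):
    """Hypothetical helper: every combination of the candidate values, as dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]

grid = expand_grid(hyperparameters)
repeat_count = 1  # assumed: each repeat retrains every combination

print(len(grid) * repeat_count)  # 8 = 2 * 1 * 2 * 2
print(grid[0])  # {'neuron_count': 9, 'batch_size': 3, 'learning_rate': 0.03, 'epoch_count': 30}
```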

```python
queue = aiqc.Experiment(
    # --- Analysis type ---
    library = "keras"
    , analysis_type = "classification_multi"

    # --- Model functions ---
    , fn_build = fn_build
    , fn_train = fn_train
    , fn_lose = None #auto
    , fn_optimize = None #auto
    , fn_predict = None #auto

    # --- Training options ---
    , repeat_count = 1
    , hyperparameters = hyperparameters
    , search_percent = None

    # --- Data source ---
    , splitset_id = splitset.id
    , foldset_id = None
    , hide_test = False
)
```

```python
queue.run_jobs()
```

```
🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 8/8 [00:49<00:00,  6.21s/it]
```


For more information on visualizing performance metrics, reference the [Visualization & Metrics](visualization.html) documentation.