# Your First Image Classifier: Using MLP to Classify Images
# Train

The goal of this dataset is to classify each image as containing a dog, a cat, or a panda.
With only 3,000 images, the Animals dataset is another **introductory** dataset
on which we can quickly train a Multilayer Perceptron (MLP) and obtain initial results. The accuracy will be modest, but the model can serve as a baseline.

Let's take the following steps:

1. Encode the target variable
2. Train the MLP model
3. Export the model

<center><img width="800" src="https://drive.google.com/uc?export=view&id=1fKGuR5U5ECf7On6Zo1UWzAIWZrMmZnGc"></center>

## Step 01: Setup

Start out by installing the experiment tracking library and setting up your free W&B account:


*   **pip install wandb** – Install the W&B library
*   **import wandb** – Import the wandb library
*   **wandb login** – Log in to your W&B account so you can log all your metrics in one place

In [1]:
!pip install wandb -qU


In [2]:
import wandb
wandb.login()

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

### Import Packages

In [3]:
# import the necessary packages
from imutils import paths
import logging
import os
import cv2
import numpy as np
import joblib
from sklearn.preprocessing import LabelEncoder
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import fbeta_score, precision_score, recall_score, accuracy_score

In [4]:
# configure logging
# reference for a logging obj
logger = logging.getLogger()

# set level of logging
logger.setLevel(logging.INFO)

# create handlers
c_handler = logging.StreamHandler()
c_format = logging.Formatter(fmt="%(asctime)s %(message)s",datefmt='%d-%m-%Y %H:%M:%S')
c_handler.setFormatter(c_format)

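# replace any existing handlers (e.g. Colab's default one) with ours;
# unlike logger.handlers[0] = ..., this also works when the list is empty
logger.handlers = [c_handler]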

## Step 02: Data Segregation

In [5]:
# since we are using Jupyter Notebooks we can replace our argument
# parsing code with *hard coded* arguments and values
args = {
  "project_name": "mlp_classifier",
  "train_feature_artifact": "train_x:latest",
  "train_target_artifact": "train_y:latest",
  "val_feature_artifact": "val_x:latest",
  "val_target_artifact": "val_y:latest",
  "neighbors": 1,
  "jobs": -1,
  "encoder": "target_encoder",
  "inference_model": "model"
}

In [6]:
# open the W&B project created in the Fetch step
run = wandb.init(entity="morsinaldo",project=args["project_name"], job_type="Train")

logger.info("Downloading the train and validation data")
# train x
train_x_artifact = run.use_artifact(args["train_feature_artifact"])
train_x_path = train_x_artifact.file()

# train y
train_y_artifact = run.use_artifact(args["train_target_artifact"])
train_y_path = train_y_artifact.file()

# validation x
val_x_artifact = run.use_artifact(args["val_feature_artifact"])
val_x_path = val_x_artifact.file()

# validation y
val_y_artifact = run.use_artifact(args["val_target_artifact"])
val_y_path = val_y_artifact.file()

# unpacking the artifacts
train_x = joblib.load(train_x_path)
train_y = joblib.load(train_y_path)
val_x = joblib.load(val_x_path)
val_y = joblib.load(val_y_path)

wandb: Currently logged in as: morsinaldo. Use `wandb login --relogin` to force relogin


15-10-2022 12:35:57 Downloading the train and validation data


In [7]:
# encode the labels as integers
le = LabelEncoder()
train_y = le.fit_transform(train_y)
val_y = le.transform(val_y)
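
To make the encoding step concrete, here is a minimal sketch on a handful of made-up labels. The `toy_*` names are hypothetical stand-ins for the real `train_y` and `val_y` arrays. It shows that the encoder is fitted on the training labels only and that the same mapping is then reused for validation.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the real train_y / val_y arrays
toy_train_y = ["dog", "cat", "panda", "cat", "dog"]
toy_val_y = ["panda", "dog"]

toy_le = LabelEncoder()
toy_train_encoded = toy_le.fit_transform(toy_train_y)  # learn the mapping on training labels
toy_val_encoded = toy_le.transform(toy_val_y)          # reuse the same mapping, no refit

print(list(toy_le.classes_))       # classes are stored in sorted order
print(list(toy_train_encoded))     # integer codes for the training labels
print(list(toy_le.inverse_transform(toy_val_encoded)))  # back to the string labels
```

Because the classes are sorted alphabetically, `cat`, `dog`, and `panda` map to 0, 1, and 2. Keeping the fitted encoder around is what later lets an inference step turn predicted integers back into class names.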

## Step 03: Training the model

In [8]:
# train a MLP classifier on the raw pixel intensities
logger.info("[INFO] training MLP classifier...")
model = MLPClassifier(hidden_layer_sizes=(128, 128), activation='relu', solver='adam')
model.fit(train_x, train_y)

15-10-2022 12:36:04 [INFO] training MLP classifier...


MLPClassifier(hidden_layer_sizes=(128, 128))
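
The sketch below illustrates what this architecture looks like in terms of array shapes. It runs on synthetic data, not on the Animals dataset. The 32x32 RGB image size is an assumption made only for illustration, since the actual feature size is determined by the earlier preprocessing step, and the `toy_*` names are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Assumption: each image is a flattened 32x32 RGB array, i.e. 32 * 32 * 3 = 3072 features
n_samples, n_features = 60, 32 * 32 * 3
toy_x = rng.random((n_samples, n_features))  # fake pixel intensities in [0, 1)
toy_y = np.arange(n_samples) % 3             # three balanced fake classes

# Same layer sizes as above; max_iter is kept small so the sketch runs quickly
toy_model = MLPClassifier(hidden_layer_sizes=(128, 128), activation="relu",
                          solver="adam", max_iter=20, random_state=0)
toy_model.fit(toy_x, toy_y)

toy_proba = toy_model.predict_proba(toy_x[:5])
toy_weight_shapes = [w.shape for w in toy_model.coefs_]

print(toy_proba.shape)     # one row per image, one column per class
print(toy_weight_shapes)   # input->hidden, hidden->hidden, hidden->output
```

With raw pixels as input, almost all parameters sit in the first weight matrix, which is one reason an MLP on pixel intensities tends to give only baseline-level accuracy.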

In [9]:
logger.info("Dumping the model and encoder artifacts to the disk")

# Save the artifacts using joblib
joblib.dump(le, args["encoder"])
joblib.dump(model, args["inference_model"])

15-10-2022 12:36:15 Dumping the model and encoder artifacts to the disk


['model']
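
As a sanity check on the export, the following sketch does a `joblib` round trip on a tiny synthetic model and confirms that the restored object predicts the same labels. The data and the `toy_*` names are hypothetical, and a temporary directory is used so nothing is written next to the notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier

# Tiny synthetic stand-in for the real training data
toy_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] * 5)
toy_y = np.array([0, 1, 1, 0] * 5)

toy_model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=50, random_state=0)
toy_model.fit(toy_x, toy_y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Extension-less file name, in the same style as args["inference_model"]
    model_path = os.path.join(tmp_dir, "model")
    joblib.dump(toy_model, model_path)
    restored_model = joblib.load(model_path)

# The restored model should behave exactly like the original one
same_predictions = np.array_equal(toy_model.predict(toy_x),
                                  restored_model.predict(toy_x))
print(same_predictions)
```

Note that `joblib.dump` returns the list of file names it wrote, which is why the cell above prints `['model']`.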

In [10]:
# encoder artifact
artifact = wandb.Artifact(args["encoder"],
                          type="INFERENCE_MODEL",
                          description="A joblib file containing the fitted target encoder"
                          )

logger.info("Logging the target encoder artifact")
artifact.add_file(args["encoder"])
run.log_artifact(artifact)

15-10-2022 12:36:15 Logging the target encoder artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7fa923412ad0>

In [11]:
# inference model artifact
artifact = wandb.Artifact(args["inference_model"],
                          type="INFERENCE_MODEL",
                          description="A joblib file containing the inference model"
                          )

logger.info("Logging the inference model artifact")
artifact.add_file(args["inference_model"])
run.log_artifact(artifact)

15-10-2022 12:36:16 Logging the inference model artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7fa924cf3c50>

In [12]:
run.finish()

## Step 04: Hyperparameter tuning with sweep

In [15]:
sweep_config = {
    "name": "my-mlp-sweep",
    # the metric name must match the key logged by the training function ("Acc")
    "metric": {"name": "Acc", "goal": "maximize"},
    "method": "grid",
    "parameters": {
        "hidden_layer_sizes": {
            "values":[(100,100), (200,200), (200,200,200)],
        },
        "activation": {
            "values": ['relu']
        },
        "solver": {
            "values": ['adam']
        },
        "learning_rate" : {
            "values": ['constant','adaptive']
        }
    }
}

sweep_id = wandb.sweep(sweep_config, project=args['project_name'])

Create sweep with ID: ttd4lsof
Sweep URL: https://wandb.ai/morsinaldo/mlp_classifier/sweeps/ttd4lsof
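
A grid sweep tries every combination of the listed values. The sketch below enumerates those combinations in plain Python to show how many runs to expect. It only mirrors the parameter values from `sweep_config`; the order in which W&B actually schedules the runs may differ, and the `grid` and `combinations` names are hypothetical.

```python
from itertools import product

# Same parameter values as in sweep_config["parameters"]
grid = {
    "hidden_layer_sizes": [(100, 100), (200, 200), (200, 200, 200)],
    "activation": ["relu"],
    "solver": ["adam"],
    "learning_rate": ["constant", "adaptive"],
}

# Cartesian product of all value lists, one dict per run configuration
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))
for combo in combinations:
    print(combo)
```

Three layer layouts times two learning-rate settings give six runs. One caveat worth keeping in mind: the scikit-learn documentation states that `learning_rate` is only used when `solver='sgd'`, so with `adam` the `constant` and `adaptive` runs should differ only through random initialization.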


In [16]:
def train():
  with wandb.init() as run:

    model = MLPClassifier(hidden_layer_sizes=run.config.hidden_layer_sizes,
                          activation=run.config.activation,
                          solver=run.config.solver,
                          learning_rate=run.config.learning_rate)

    # training
    logger.info("Training")
    model.fit(train_x,train_y)

    # inference on the validation set
    logger.info("Infering")
    predict = model.predict(val_x)

    # Evaluation Metrics
    logger.info("Test Evaluation metrics")
    fbeta = fbeta_score(val_y, predict, beta=1, zero_division=1,average='weighted')
    precision = precision_score(val_y, predict, zero_division=1,average='weighted')
    recall = recall_score(val_y, predict, zero_division=1,average='weighted')
    acc = accuracy_score(val_y, predict)

    logger.info("Test Accuracy: {}".format(acc))
    logger.info("Test Precision: {}".format(precision))
    logger.info("Test Recall: {}".format(recall))
    logger.info("Test F1: {}".format(fbeta))

    run.summary["Acc"] = acc
    run.summary["Precision"] = precision
    run.summary["Recall"] = recall
    run.summary["F1"] = fbeta

    run.finish()
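
The metrics above all use `average='weighted'`, which averages the per-class scores weighted by how many true samples each class has. The sketch below computes them on a few made-up predictions (all `toy_*` names and values are hypothetical) and shows one consequence: weighted recall is always equal to accuracy.

```python
from sklearn.metrics import accuracy_score, fbeta_score, precision_score, recall_score

# Hypothetical encoded labels (0 = cat, 1 = dog, 2 = panda)
toy_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
toy_pred = [0, 1, 0, 1, 2, 2, 2, 0, 2, 2]

toy_acc = accuracy_score(toy_true, toy_pred)
toy_precision = precision_score(toy_true, toy_pred, average="weighted", zero_division=1)
toy_recall = recall_score(toy_true, toy_pred, average="weighted", zero_division=1)
toy_f1 = fbeta_score(toy_true, toy_pred, beta=1, average="weighted", zero_division=1)

print("accuracy :", toy_acc)
print("precision:", toy_precision)
print("recall   :", toy_recall)
print("f1       :", toy_f1)
```

This is why the `Recall` and `Acc` values are identical in every run of the sweep. With `beta=1`, `fbeta_score` is the usual F1 score.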

In [17]:
wandb.agent(sweep_id, function=train)

wandb: Agent Starting Run: y5lkefw0 with config:
wandb:     activation: relu
wandb:     hidden_layer_sizes: [100, 100]
wandb:     learning_rate: constant
wandb:     solver: adam


15-10-2022 12:09:52 Training
15-10-2022 12:10:07 Infering
15-10-2022 12:10:07 Test Evaluation metrics
15-10-2022 12:10:07 Test Accuracy: 0.5328596802841918
15-10-2022 12:10:07 Test Precision: 0.5409662142148562
15-10-2022 12:10:07 Test Recall: 0.5328596802841918
15-10-2022 12:10:07 Test F1: 0.5349325911293328


| Metric | Value |
|---|---|
| Acc | 0.53286 |
| F1 | 0.53493 |
| Precision | 0.54097 |
| Recall | 0.53286 |


wandb: Agent Starting Run: dp04ojr7 with config:
wandb:     activation: relu
wandb:     hidden_layer_sizes: [100, 100]
wandb:     learning_rate: adaptive
wandb:     solver: adam


15-10-2022 12:10:18 Training
15-10-2022 12:10:25 Infering
15-10-2022 12:10:25 Test Evaluation metrics
15-10-2022 12:10:25 Test Accuracy: 0.5079928952042628
15-10-2022 12:10:25 Test Precision: 0.5781227992878506
15-10-2022 12:10:25 Test Recall: 0.5079928952042628
15-10-2022 12:10:25 Test F1: 0.4516742953009598


| Metric | Value |
|---|---|
| Acc | 0.50799 |
| F1 | 0.45167 |
| Precision | 0.57812 |
| Recall | 0.50799 |


wandb: Agent Starting Run: ahle8neh with config:
wandb:     activation: relu
wandb:     hidden_layer_sizes: [200, 200]
wandb:     learning_rate: constant
wandb:     solver: adam


15-10-2022 12:10:33 Training
15-10-2022 12:11:08 Infering
15-10-2022 12:11:09 Test Evaluation metrics
15-10-2022 12:11:09 Test Accuracy: 0.5701598579040853
15-10-2022 12:11:09 Test Precision: 0.5965721667320246
15-10-2022 12:11:09 Test Recall: 0.5701598579040853
15-10-2022 12:11:09 Test F1: 0.5626676519742346


| Metric | Value |
|---|---|
| Acc | 0.57016 |
| F1 | 0.56267 |
| Precision | 0.59657 |
| Recall | 0.57016 |


wandb: Agent Starting Run: zjtj82st with config:
wandb:     activation: relu
wandb:     hidden_layer_sizes: [200, 200]
wandb:     learning_rate: adaptive
wandb:     solver: adam


15-10-2022 12:11:19 Training
15-10-2022 12:12:14 Infering
15-10-2022 12:12:14 Test Evaluation metrics
15-10-2022 12:12:14 Test Accuracy: 0.4991119005328597
15-10-2022 12:12:14 Test Precision: 0.48567072594623667
15-10-2022 12:12:14 Test Recall: 0.4991119005328597
15-10-2022 12:12:14 Test F1: 0.4282143595572133


| Metric | Value |
|---|---|
| Acc | 0.49911 |
| F1 | 0.42821 |
| Precision | 0.48567 |
| Recall | 0.49911 |


wandb: Agent Starting Run: lces4gz0 with config:
wandb:     activation: relu
wandb:     hidden_layer_sizes: [200, 200, 200]
wandb:     learning_rate: constant
wandb:     solver: adam


15-10-2022 12:12:26 Training
15-10-2022 12:13:22 Infering
15-10-2022 12:13:22 Test Evaluation metrics
15-10-2022 12:13:22 Test Accuracy: 0.5150976909413855
15-10-2022 12:13:22 Test Precision: 0.5508007769776618
15-10-2022 12:13:22 Test Recall: 0.5150976909413855
15-10-2022 12:13:22 Test F1: 0.5274862853989083


| Metric | Value |
|---|---|
| Acc | 0.5151 |
| F1 | 0.52749 |
| Precision | 0.5508 |
| Recall | 0.5151 |


wandb: Agent Starting Run: rgv0l3ao with config:
wandb:     activation: relu
wandb:     hidden_layer_sizes: [200, 200, 200]
wandb:     learning_rate: adaptive
wandb:     solver: adam


15-10-2022 12:13:32 Training
15-10-2022 12:14:17 Infering
15-10-2022 12:14:17 Test Evaluation metrics
15-10-2022 12:14:17 Test Accuracy: 0.5435168738898757
15-10-2022 12:14:17 Test Precision: 0.5653947243282496
15-10-2022 12:14:17 Test Recall: 0.5435168738898757
15-10-2022 12:14:17 Test F1: 0.5487596396916782


| Metric | Value |
|---|---|
| Acc | 0.54352 |
| F1 | 0.54876 |
| Precision | 0.56539 |
| Recall | 0.54352 |


wandb: Sweep Agent: Waiting for job.
wandb: Sweep Agent: Exiting.


## Step 05: Train and export the best model
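
Before retraining, it helps to confirm which configuration won the sweep. The sketch below copies the accuracy and F1 values from the Step 04 logs into a plain list and picks the run with the highest accuracy. The `sweep_results` and `best_run` names are hypothetical; the W&B dashboard offers the same comparison interactively.

```python
# Validation metrics copied from the sweep logs in Step 04
sweep_results = [
    {"run": "y5lkefw0", "hidden_layer_sizes": (100, 100), "learning_rate": "constant", "acc": 0.53286, "f1": 0.53493},
    {"run": "dp04ojr7", "hidden_layer_sizes": (100, 100), "learning_rate": "adaptive", "acc": 0.50799, "f1": 0.45167},
    {"run": "ahle8neh", "hidden_layer_sizes": (200, 200), "learning_rate": "constant", "acc": 0.57016, "f1": 0.56267},
    {"run": "zjtj82st", "hidden_layer_sizes": (200, 200), "learning_rate": "adaptive", "acc": 0.49911, "f1": 0.42821},
    {"run": "lces4gz0", "hidden_layer_sizes": (200, 200, 200), "learning_rate": "constant", "acc": 0.5151, "f1": 0.52749},
    {"run": "rgv0l3ao", "hidden_layer_sizes": (200, 200, 200), "learning_rate": "adaptive", "acc": 0.54352, "f1": 0.54876},
]

# Pick the configuration with the highest validation accuracy
best_run = max(sweep_results, key=lambda result: result["acc"])
print(best_run)
```

The best run uses two hidden layers of 200 units with a constant learning rate, which is the configuration retrained in this step.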

<font color="red">Important:</font> restart the Colab runtime first, so that the new experiment (run) is not linked to the last `sweep` experiment.

```
Runtime >> Disconnect and delete runtime
```
> Then re-run all cells in Step 01 and Step 02.

In [14]:
model = MLPClassifier(hidden_layer_sizes=(200, 200),
                      activation='relu',
                      solver='adam')

# training
logger.info("Training")
model.fit(train_x,train_y)

# inference on the validation set
logger.info("Infering")
predict = model.predict(val_x)

# Evaluation Metrics
logger.info("Test Evaluation metrics")
fbeta = fbeta_score(val_y, predict, beta=1, zero_division=1,average='weighted')
precision = precision_score(val_y, predict, zero_division=1,average='weighted')
recall = recall_score(val_y, predict, zero_division=1,average='weighted')
acc = accuracy_score(val_y, predict)

logger.info("Test Accuracy: {}".format(acc))
logger.info("Test Precision: {}".format(precision))
logger.info("Test Recall: {}".format(recall))
logger.info("Test F1: {}".format(fbeta))

run.summary["Acc"] = acc
run.summary["Precision"] = precision
run.summary["Recall"] = recall
run.summary["F1"] = fbeta

15-10-2022 12:28:32 Training
15-10-2022 12:29:29 Infering
15-10-2022 12:29:29 Test Evaluation metrics
15-10-2022 12:29:29 Test Accuracy: 0.522202486678508
15-10-2022 12:29:29 Test Precision: 0.5311050952233651
15-10-2022 12:29:29 Test Recall: 0.522202486678508
15-10-2022 12:29:29 Test F1: 0.5257375414336429
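
The retrained model reaches about 0.52 accuracy, while the same architecture scored about 0.57 during the sweep. A likely explanation (an assumption, since no seed is set anywhere in this notebook) is the random weight initialization and data shuffling inside `MLPClassifier`. The sketch below uses synthetic data and hypothetical names to show that fixing `random_state` makes two training runs agree.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
toy_x = rng.random((90, 20))   # synthetic features, not the Animals data
toy_y = np.arange(90) % 3      # three balanced synthetic classes


def fit_and_predict(seed):
    """Train a small MLP with a fixed seed and return its predictions."""
    clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=30, random_state=seed)
    clf.fit(toy_x, toy_y)
    return clf.predict(toy_x)


first_predictions = fit_and_predict(seed=0)
second_predictions = fit_and_predict(seed=0)

# With the same seed, both runs start from the same weights and shuffle identically
repeatable = np.array_equal(first_predictions, second_predictions)
print(repeatable)
```

Passing a fixed `random_state` both in the sweep and in the final training would make the reported numbers easier to compare across runs.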


In [15]:
logger.info("Dumping the model and encoder artifacts to the disk")

# Save the artifacts using joblib
joblib.dump(le, args["encoder"])
joblib.dump(model, args["inference_model"])

15-10-2022 12:29:53 Dumping the model and encoder artifacts to the disk


['model']

In [16]:
# encoder artifact
artifact = wandb.Artifact(args["encoder"],
                          type="INFERENCE_MODEL",
                          description="A joblib file containing the fitted target encoder"
                          )

logger.info("Logging the target encoder artifact")
artifact.add_file(args["encoder"])
run.log_artifact(artifact)

15-10-2022 12:29:53 Logging the target encoder artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7fe77043ff50>

In [17]:
# inference model artifact
artifact = wandb.Artifact(args["inference_model"],
                          type="INFERENCE_MODEL",
                          description="A joblib file containing the inference model"
                          )

logger.info("Logging the inference model artifact")
artifact.add_file(args["inference_model"])
run.log_artifact(artifact)

15-10-2022 12:29:54 Logging the inference model artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7fe770711090>

In [18]:
run.finish()