# Training an Iris classifier using scikit-learn
In this notebook, we'll train a classifier on the [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris) using scikit-learn.

## Install Dependencies
First, we'll install our dependencies:

In [0]:
pip install scikit-learn==0.21.* onnxmltools==1.5.* boto3==1.*

## Load the data
We can use scikit-learn to load the Iris dataset:

In [0]:
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

## Train the model

We'd like to normalize the features before training the model, so that each one has zero mean and unit variance. We can use scikit-learn's [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html):

In [0]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)
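
To make the normalization concrete, here is a minimal standard-library sketch of what standardizing a single feature column looks like: subtract the mean, then divide by the population standard deviation (which, to the best of our knowledge, is what `StandardScaler` does by default). The `standardize` helper and the sample values are hypothetical and only for illustration.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) variance."""
    mean = statistics.mean(column)
    stddev = statistics.pstdev(column)  # population standard deviation (divides by N)
    return [(value - mean) / stddev for value in column]


# Hypothetical sepal-length values in cm (not the real dataset).
sepal_lengths = [5.1, 4.9, 6.3, 5.8, 7.1]
scaled = standardize(sepal_lengths)

print([round(v, 3) for v in scaled])
```

After scaling, the column's mean is approximately 0 and its population standard deviation is approximately 1.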

Now we can split the dataset for training/evaluation:

In [0]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)

We'll use scikit-learn's [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) to train the model:

In [0]:
from sklearn.linear_model import LogisticRegression

logreg_model = LogisticRegression(solver="lbfgs", multi_class="multinomial")
logreg_model.fit(X_train, y_train)

print("Test data accuracy: {:.2f}".format(logreg_model.score(X_test, y_test)))  # Accuracy is printed as a fraction and should be > 0.90
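
For intuition about the `multi_class="multinomial"` setting, here is a rough sketch of how a multinomial logistic regression model turns one linear score per class into class probabilities using the softmax function. The scores below are made up; in the trained model they would come from the learned coefficients and intercepts.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() from overflowing; it does not change the result.
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for the three Iris classes (illustration only).
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs], predicted_class)
```

The predicted class is simply the one with the highest probability, which here is class 0.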

## Export the model
Now we can export the model in the ONNX format:

In [0]:
from onnxmltools import convert_sklearn
from onnxconverter_common.data_types import FloatTensorType

onnx_model = convert_sklearn(logreg_model, initial_types=[("input", FloatTensorType([1, 4]))])

with open("sklearn.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## Upload the model to AWS

Cortex loads models from Amazon S3, so we need to upload the exported model to an S3 bucket.

Set these variables to configure your AWS credentials and model upload path:

In [0]:
AWS_ACCESS_KEY_ID = "" #@param {type:"string"}
AWS_SECRET_ACCESS_KEY = "" #@param {type:"string"}
S3_UPLOAD_PATH = "s3://my-bucket/iris/sklearn.onnx" #@param {type:"string"}

import sys
import re

if AWS_ACCESS_KEY_ID == "":
    print("\033[91m{}\033[00m".format("ERROR: Please set AWS_ACCESS_KEY_ID"), file=sys.stderr)

elif AWS_SECRET_ACCESS_KEY == "":
    print("\033[91m{}\033[00m".format("ERROR: Please set AWS_SECRET_ACCESS_KEY"), file=sys.stderr)

else:
    try:
        bucket = re.search("s3://(.+?)/", S3_UPLOAD_PATH).group(1)
        key = re.search("s3://.+?/(.+)", S3_UPLOAD_PATH).group(1)
    except AttributeError:  # re.search() returned None because the path did not match
        print("\033[91m{}\033[00m".format("ERROR: Invalid s3 path (should be of the form s3://my-bucket/path/to/file)"), file=sys.stderr)
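
As an alternative to the regular expressions above, the bucket and key can also be split out with the standard library's `urllib.parse.urlparse`. This is only a sketch: the `split_s3_path` helper is hypothetical and is not used elsewhere in this notebook.

```python
from urllib.parse import urlparse


def split_s3_path(s3_path):
    """Split "s3://bucket/path/to/file" into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError("Invalid S3 path: {!r}".format(s3_path))
    return bucket, key


print(split_s3_path("s3://my-bucket/iris/sklearn.onnx"))  # ('my-bucket', 'iris/sklearn.onnx')
```

Raising an exception stops the cell on a bad path instead of only printing a message. Note that keys containing `?` or `#` would need extra handling, since `urlparse` treats those characters as query and fragment separators.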

Upload the model to S3:

In [0]:
import boto3

s3 = boto3.client("s3", aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
print("Uploading {} ...".format(S3_UPLOAD_PATH), end="")
s3.upload_file("sklearn.onnx", bucket, key)
print(" ✓")

<!-- CORTEX_VERSION_MINOR -->
We also need to upload the per-feature mean and standard deviation computed by the scaler, so that the [pre-inference request handler](https://github.com/cortexlabs/cortex/blob/master/examples/iris-classifier/handlers/sklearn.py) can normalize incoming data the same way before making real-time predictions.

In [0]:
METADATA_S3_UPLOAD_PATH = "s3://my-bucket/iris/scalars.json" #@param {type:"string"}

try:
    metadata_bucket = re.search('s3://(.+?)/', METADATA_S3_UPLOAD_PATH).group(1)
    metadata_key = re.search('s3://.+?/(.+)', METADATA_S3_UPLOAD_PATH).group(1)
except AttributeError:  # re.search() returned None because the path did not match
    print("\033[91m{}\033[00m".format("ERROR: Invalid s3 path (should be of the form s3://my-bucket/path/to/file)"), file=sys.stderr)

In [0]:
import math
import json

metadata = {
    "mean": scaler.mean_.tolist(),
    "stddev": [math.sqrt(x) for x in scaler.var_],
}

print("Uploading {} ...".format(METADATA_S3_UPLOAD_PATH), end="")
s3.put_object(Body=json.dumps(metadata), Bucket=metadata_bucket, Key=metadata_key)
print(" ✓")
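
The request handler itself is not shown in this notebook. As a rough sketch of how the uploaded metadata might be applied at inference time, the snippet below normalizes one sample with the same `(x - mean) / stddev` formula. The `normalize` function and the numbers in `example_metadata` are hypothetical; only the `mean` and `stddev` keys match the JSON uploaded above.

```python
import json

# Hypothetical metadata in the same shape as the uploaded JSON
# (made-up numbers, not the real Iris statistics).
example_metadata = json.loads(
    '{"mean": [5.8, 3.0, 3.7, 1.2], "stddev": [0.8, 0.4, 1.7, 0.7]}'
)


def normalize(features, metadata):
    """Apply (x - mean) / stddev to each feature, in order."""
    return [
        (x - mean) / stddev
        for x, mean, stddev in zip(features, metadata["mean"], metadata["stddev"])
    ]


# Feature order: sepal length, sepal width, petal length, petal width.
sample = [5.8, 3.4, 3.7, 1.9]
normalized_sample = normalize(sample, example_metadata)

print(normalized_sample)
```

The important point is that the handler must use the statistics from training, not statistics recomputed from incoming requests.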

<!-- CORTEX_VERSION_MINOR -->
That's it! See the [example on GitHub](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier) for how to deploy the model as an API.