# Build a recommender

## Instructions

Given the exercises in this lesson, you now know how to build a JavaScript-based web app using Onnx Runtime and a model converted to Onnx. Experiment with building a new recommender using data from these lessons or sourced elsewhere (please give credit). You might create a pet recommender based on various personality attributes, or a music genre recommender based on a person's mood. Be creative!

## Rubric

| Criteria | Exemplary                                                              | Adequate                              | Needs Improvement                 |
| -------- | ---------------------------------------------------------------------- | ------------------------------------- | --------------------------------- |
|          | A web app and notebook are presented, both well documented and running | One of those two is missing or flawed | Both are either missing or flawed |


## Solution: Breast Cancer Classification Recommender

Breast cancer patients undergo tests that produce measurements (the features). The goal of this exercise is to classify tumors as malignant or benign (the target), and then recommend a next step: urgent follow-up screening for a malignant prediction, or routine monitoring for a benign one.
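The recommendation step itself is a plain mapping from the predicted class to an action. A minimal sketch, where `recommend` and `RECOMMENDATIONS` are hypothetical names chosen for illustration and the wording is not clinical guidance:

```python
# Hypothetical lookup table: predicted class label -> suggested next step.
# The wording is illustrative only, not clinical guidance.
RECOMMENDATIONS = {
    "malignant": "urgent follow-up screening",
    "benign": "routine monitoring",
}


def recommend(label: str) -> str:
    """Return the next step for a predicted label ('malignant' or 'benign')."""
    try:
        return RECOMMENDATIONS[label]
    except KeyError:
        # Fail loudly instead of silently recommending nothing
        raise ValueError(f"unexpected label: {label!r}") from None


print(recommend("malignant"))
```

Raising on an unknown label keeps a mismatch between the model's output and the app's expectations from going unnoticed.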

### The Breast Cancer dataset

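The built-in [breast cancer dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) includes 569 samples of breast tumor measurements. Each sample has 30 numeric features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass: for each of 10 cell-nucleus properties, the dataset records the mean, the standard error, and the "worst" (largest) value. The properties include:

- radius: mean of distances from center to points on the perimeter.
- texture: standard deviation of gray-scale values.
- smoothness: local variation in radius lengths.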
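One way to see this naming scheme is to load the dataset and list every column that mentions a single property. A short sketch using Scikit-learn's loader; per the dataset description, "worst" is the mean of the three largest values:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
print(data.data.shape)     # rows = samples, columns = features
print(data.target_names)   # index 0 is malignant, index 1 is benign

# Each property appears three times: as a mean, a standard error ("error")
# and a "worst" value
radius_columns = [name for name in data.feature_names if "radius" in name]
print(radius_columns)
```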

### Train the Classification Model

Start by installing and importing the libraries you need:

```python
# !pip install skl2onnx
```

You need [skl2onnx](https://onnx.ai/sklearn-onnx/) to convert your Scikit-learn model to Onnx format.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
```

Then, load your data into a DataFrame and map the numeric target to readable labels (0 is malignant, 1 is benign):

```python
data = load_breast_cancer()
breast_cancer = pd.DataFrame(data.data, columns=data.feature_names)

# The raw target is 0 = malignant, 1 = benign; replace it with readable labels
breast_cancer['target'] = data.target
breast_cancer['target'] = breast_cancer['target'].map({0: 'malignant',
                                                       1: 'benign'})

breast_cancer.head()
```

|   | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | malignant |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | malignant |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | malignant |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | malignant |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | malignant |
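Before training, it helps to check the class balance after mapping the labels. A sketch that rebuilds the labels from `data.target_names` (which lists `malignant` at index 0, matching the mapping above) and counts them:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# dict(enumerate(...)) builds {0: 'malignant', 1: 'benign'} from the dataset itself
labels = pd.Series(data.target).map(dict(enumerate(data.target_names)))
counts = labels.value_counts()
print(counts)
print((counts / counts.sum()).round(3))
```

About 37% of the samples are malignant, an imbalance worth preserving when you split the data.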


```python
breast_cancer.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         ...
```

Drop the target (the last column) and save the remaining 30 features as `X`:

```python
X = breast_cancer.iloc[:, :-1]
X.head()
```

|   | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |
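Selecting with `iloc[:, :-1]` works because `target` happens to be the last column. A sketch of a name-based alternative that does not depend on column order:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
breast_cancer = pd.DataFrame(data.data, columns=data.feature_names)
breast_cancer["target"] = data.target

# Drop the label by name, so reordering or adding columns later
# cannot silently leak the target into the features
X = breast_cancer.drop(columns="target")
print(X.shape)
```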


Save the labels as `y`:

```python
y = breast_cancer[['target']]
y.head()
```

|   | target |
|---|---|
| 0 | malignant |
| 1 | malignant |
| 2 | malignant |
| 3 | malignant |
| 4 | malignant |
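Note that `y` is a one-column DataFrame, which is why the training step calls `y_train.values.ravel()`: Scikit-learn classifiers expect a 1-D label array and emit a `DataConversionWarning` when given a column vector. A toy illustration with three made-up labels:

```python
import pandas as pd

# A one-column DataFrame, shaped like y = breast_cancer[['target']]
y = pd.DataFrame({"target": ["malignant", "benign", "benign"]})

print(y.values.shape)          # 2-D column vector: (rows, 1)
print(y.values.ravel().shape)  # flattened 1-D array: (rows,)
```

Selecting the column as a Series, `breast_cancer['target']`, gives a 1-D label vector directly.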


#### Commence the training

We will use Scikit-learn's Support Vector Classifier (`SVC`), wrapped in a pipeline with a `StandardScaler`.

Import the appropriate libraries from Scikit-learn:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report
```

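Split the data into training and test sets, holding out 20% for testing. `stratify=y` keeps the malignant/benign ratio the same in both sets: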

```python
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=0,
                                                    stratify=y)
```
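You can confirm that stratification worked by comparing class counts. A self-contained sketch that repeats the split; the test-set counts should match the `support` column of the classification report further down (72 benign, 42 malignant):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target).map({0: "malignant", 1: "benign"})

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Class proportions should be nearly identical in both splits
print(y_train.value_counts(normalize=True).round(3))
print(y_test.value_counts())
```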

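Build an SVC classification model. The pipeline standardizes the features before fitting a linear-kernel SVC, and `probability=True` enables class probability estimates through `predict_proba()`: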

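```python
# Standardize the features, then fit a linear-kernel support vector classifier
model = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC(kernel='linear',
                C=10,
                probability=True,
                random_state=0))
])
# ravel() flattens the one-column label DataFrame into the 1-D array scikit-learn expects
model.fit(X_train, y_train.values.ravel())
```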
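Scaling matters for an SVC because the raw features live on very different scales (compare `mean area`, around 1000, with `mean smoothness`, around 0.1, in the table above). `StandardScaler` rescales each column to zero mean and unit variance, z = (x - mean) / std. Because it sits inside the pipeline, its statistics are learned from the training data only and are exported to Onnx together with the classifier. A small check of the formula on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two columns on very different scales
X = np.array([[10.0, 0.1],
              [20.0, 0.3],
              [30.0, 0.2]])

# Per-column standardization with the population standard deviation (ddof=0),
# which is what StandardScaler uses
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
```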

Now test your model by calling `predict()` on the test set:

```python
y_pred = model.predict(X_test)
```
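Because the SVC was built with `probability=True`, the pipeline can also return class probabilities, which a recommender could show next to the label. A self-contained sketch using the integer targets (0 = malignant, 1 = benign) for brevity. Scikit-learn's documentation notes that these probabilities come from an internal cross-validated calibration, so their argmax can occasionally disagree with `predict()`:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="linear", C=10, probability=True, random_state=0)),
])
model.fit(X_train, y_train)

labels = model.predict(X_test)       # hard labels from the SVM decision function
proba = model.predict_proba(X_test)  # one probability column per class
print(model.classes_)                # column order of proba

# Count samples where the most probable class differs from predict()
disagreements = int(np.sum(model.classes_[proba.argmax(axis=1)] != labels))
print("disagreements:", disagreements)
```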

Print out a classification report to check the model's quality:

```python
print(classification_report(y_test, y_pred))
```

```
              precision    recall  f1-score   support

      benign       0.95      1.00      0.97        72
   malignant       1.00      0.90      0.95        42

    accuracy                           0.96       114
   macro avg       0.97      0.95      0.96       114
weighted avg       0.97      0.96      0.96       114
```



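The model reaches 96% accuracy on the 114 held-out samples, and every tumor it flags as malignant is in fact malignant (precision 1.00). Malignant recall is 0.90, however: about 4 of the 42 malignant tumors in the test set were classified as benign. For a recommender that chooses between urgent follow-up and routine monitoring, these false negatives matter more than overall accuracy.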
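The report's metrics can be traced back to raw counts. The counts below are inferred from the report (support and recall per class), not read from the notebook; in your own notebook, `sklearn.metrics.confusion_matrix(y_test, y_pred, labels=['malignant', 'benign'])` returns them directly. This sketch treats malignant as the positive class:

```python
# Counts inferred from the classification report (malignant = positive class)
tp = 38  # malignant tumors predicted malignant
fn = 4   # malignant tumors predicted benign (missed)
fp = 0   # benign tumors predicted malignant
tn = 72  # benign tumors predicted benign

precision = tp / (tp + fp)  # of the tumors flagged malignant, how many were
recall = tp / (tp + fn)     # of the malignant tumors, how many were flagged
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"malignant precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```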

### Convert your model to Onnx

Make sure the conversion declares the correct input shape. This dataset has 30 features, so the input is a `FloatTensorType` of shape `[None, 30]`, where `None` lets the model accept any number of rows per call.

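```python
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# One float input with 30 columns; None allows any number of rows
initial_type = [('float_input', FloatTensorType([None, 30]))]
# Key the options on the SVC step, the classifier that produces the outputs:
# 'nocl' outputs class indices instead of string labels,
# 'zipmap': False returns probabilities as a plain tensor instead of a list of dicts
options = {id(model.named_steps['svc']): {'nocl': True, 'zipmap': False}}
```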

Convert the pipeline to an Onnx model (`onx`) and save it as `model.onnx`:

```python
onx = convert_sklearn(model, initial_types=initial_type, options=options)
with open("./model.onnx", "wb") as f:
    f.write(onx.SerializeToString())
```

#### View your model

Use [Netron](https://github.com/lutzroeder/Netron) to open your `model.onnx` file. You can see your model visualized, with its 30-feature input and the classifier listed:

![Netron visual](model.png)

Now you are ready to use this model in a web app. Let's build an app that runs your model on a set of tumor measurements and shows whether they are classified as malignant or benign, along with the recommended next step.