#### Building a Cuisine Recommender Web App

##### Exercise - Train classification model

##### First, train a classification model using the cleaned cuisines dataset from the previous lessons.

In [1]:
# You need 'skl2onnx' to convert your Scikit-learn model to ONNX format.

!pip install skl2onnx
import pandas as pd

Collecting skl2onnx
  Downloading skl2onnx-1.14.1-py2.py3-none-any.whl (292 kB)

##### Then, load your data as you did in previous lessons, by reading the CSV file with read_csv():

In [2]:
data = pd.read_csv('cleaned_cuisines.csv')
data.head()

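| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |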


##### Drop the first two columns (the leftover index column 'Unnamed: 0' and the 'cuisine' label) and save the remaining ingredient columns as 'X'

In [3]:
X = data.iloc[:,2:]
X.head()

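| | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |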
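
Positional slicing with iloc depends on the column order staying fixed. As an alternative sketch, the same selection can be made by column name. The tiny DataFrame below is made-up stand-in data (not the real cleaned_cuisines.csv); only the column names are taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of cleaned_cuisines.csv.
data = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "cuisine": ["indian", "indian", "indian"],
    "almond": [0, 1, 0],
    "yogurt": [0, 0, 1],
})

# Drop the leftover index column and the label column by name.
X = data.drop(columns=["Unnamed: 0", "cuisine"])

# The positional slice used in this lesson selects the same columns.
assert X.equals(data.iloc[:, 2:])
print(list(X.columns))
```

Dropping by name fails loudly with a KeyError if a column is missing, whereas a positional slice would silently select the wrong columns.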


##### Save the labels as 'y'

In [4]:
y = data[['cuisine']]
y.head()

| | cuisine |
| --- | --- |
| 0 | indian |
| 1 | indian |
| 2 | indian |
| 3 | indian |
| 4 | indian |


##### Start the training routine.

##### We will use the 'SVC' (Support Vector Classifier) class from Scikit-learn, which gives good accuracy on this dataset.

##### Import the required classes and functions from Scikit-learn:

In [5]:
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report

##### Separate training and test sets

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
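
Because the split above is random, each run produces different training and test sets, and so slightly different scores. A minimal sketch of a reproducible, class-balanced split follows. The random_state value, the stratify argument, and the synthetic data are assumptions made for illustration and are not part of the lesson.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 binary "ingredient" columns, 5 cuisines.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(100, 4)), columns=["a", "b", "c", "d"])
y = pd.DataFrame(
    {"cuisine": np.repeat(["chinese", "indian", "japanese", "korean", "thai"], 20)}
)

# random_state fixes the shuffle; stratify keeps class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y["cuisine"]
)

print(X_train.shape, X_test.shape)
print(y_test["cuisine"].value_counts().to_dict())
```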

##### Build an SVC classification model.

In [7]:
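# probability=True enables predict_proba(); random_state makes its internal calibration repeatable.
model = SVC(kernel='linear', C=10, probability=True, random_state=0)
# fit() expects a 1-D label array, so flatten the single-column DataFrame 'y_train'.
model.fit(X_train, y_train.values.ravel())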

##### Now test your model by calling predict()

In [8]:
y_pred = model.predict(X_test)
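
predict() returns only the single most likely cuisine. Because the model was created with probability=True, predict_proba() is also available, which is what a recommender could use to rank several cuisines. The sketch below uses small synthetic data rather than the cuisines dataset, and showing the top three classes is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data: 60 rows, 10 binary features, 3 balanced classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 10)).astype(float)
y = np.repeat(["indian", "korean", "thai"], 20)

model = SVC(kernel="linear", C=10, probability=True, random_state=0)
model.fit(X, y)

# One probability per class, in the order given by model.classes_.
proba = model.predict_proba(X[:1])[0]

# Rank the classes from most to least likely.
ranked = sorted(zip(model.classes_, proba), key=lambda pair: pair[1], reverse=True)
for cuisine, p in ranked[:3]:
    print(f"{cuisine}: {p:.2f}")
```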

##### Print out a classification report to check the model's quality

In [9]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

     chinese       0.72      0.71      0.72       242
      indian       0.91      0.88      0.89       239
    japanese       0.75      0.76      0.75       253
      korean       0.83      0.76      0.80       234
        thai       0.77      0.84      0.80       231

    accuracy                           0.79      1199
   macro avg       0.79      0.79      0.79      1199
weighted avg       0.79      0.79      0.79      1199
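
To make the two summary rows concrete: the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support. The sketch below recomputes both for recall from the rounded values printed above, so the results may differ slightly from what Scikit-learn computes from unrounded scores.

```python
# Per-class recall and support, copied from the report above.
recall = {"chinese": 0.71, "indian": 0.88, "japanese": 0.76, "korean": 0.76, "thai": 0.84}
support = {"chinese": 242, "indian": 239, "japanese": 253, "korean": 234, "thai": 231}

total = sum(support.values())

# Macro average: every class counts equally.
macro_recall = sum(recall.values()) / len(recall)

# Weighted average: each class counts in proportion to its number of test rows.
weighted_recall = sum(recall[c] * support[c] for c in recall) / total

print(total, round(macro_recall, 2), round(weighted_recall, 2))
```

The two averages are close here because the classes are nearly balanced. Support-weighted recall is the same quantity as overall accuracy, which is why those two rows agree in the report.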



#### Convert your model to ONNX

##### Make sure the conversion declares the correct input tensor shape. This dataset has 380 ingredient columns, so you need to specify that number in FloatTensorType:

##### Convert using a tensor size of 380.