# Build Classification Models

In this lesson, you will use the balanced, clean cuisines dataset that you saved in the last lesson.

You will use this dataset with a variety of classifiers to _predict a given national cuisine based on a group of ingredients_. While doing so, you'll learn more about some of the ways that algorithms can be leveraged for classification tasks.

## Exercise - predict a national cuisine

```python
import pandas as pd

cuisines_df = pd.read_csv("cleaned_cuisines.csv")
cuisines_df.head()
```

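| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |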


```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report, precision_recall_curve
from sklearn.svm import SVC
import numpy as np
```

Split the data into features (X) and labels (y), stored in two separate dataframes for training. The `cuisine` column becomes the labels dataframe:

```python
cuisines_label_df = cuisines_df['cuisine']
cuisines_label_df.head()
```

```
0    indian
1    indian
2    indian
3    indian
4    indian
Name: cuisine, dtype: object
```

Drop the `Unnamed: 0` column (a leftover row index) and the `cuisine` column by calling `drop()`, and save the remaining columns as the trainable features:

```python
cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
cuisines_feature_df.head()
```

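| | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |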
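
A column like `Unnamed: 0` typically appears when a DataFrame is saved with `to_csv()` while keeping its index. The minimal sketch below uses a small made-up DataFrame in place of the real cuisines data. It shows that passing `index_col=0` to `read_csv()` reads that column back as the index rather than as a feature, which leaves only the label column to drop:

```python
import io

import pandas as pd

# Hypothetical toy data standing in for the cleaned cuisines DataFrame
toy = pd.DataFrame({
    "cuisine": ["indian", "thai", "korean"],
    "almond": [1, 0, 0],
    "lime_juice": [0, 1, 0],
})

# Saving with the default index=True writes the index as an unnamed first column
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading it back without index_col turns that column into "Unnamed: 0"
buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))  # ['Unnamed: 0', 'cuisine', 'almond', 'lime_juice']

# index_col=0 restores the first column as the index instead
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Split into labels and features, as in the lesson
labels = restored["cuisine"]
features = restored.drop(columns=["cuisine"])
print(list(features.columns))  # ['almond', 'lime_juice']
```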


Now you are ready to train your model!

## Choosing your classifier

Now that your data is clean and ready for training, you have to decide which algorithm to use for the job. 

Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. [The variety](https://scikit-learn.org/stable/supervised_learning.html) is quite bewildering at first sight. The following methods all include classification techniques:

- Linear Models
- Support Vector Machines
- Stochastic Gradient Descent
- Nearest Neighbors
- Gaussian Processes
- Decision Trees
- Ensemble methods (voting classifier)
- Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)
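
To see how these families compare in practice, the sketch below scores one estimator from several of them with 5-fold `cross_val_score()`. It uses a synthetic dataset from `make_classification()` instead of the cuisines data, so the numbers say nothing about cuisines. The choice of estimators and their settings are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class problem (a stand-in for the cuisines data)
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10,
    n_classes=5, random_state=0,
)

# One representative estimator per family from the list above
candidates = {
    "Linear model (LogisticRegression)": LogisticRegression(max_iter=1000),
    "Support vector machine (SVC)": SVC(),
    "Nearest neighbors (KNeighborsClassifier)": KNeighborsClassifier(),
    "Decision tree (DecisionTreeClassifier)": DecisionTreeClassifier(random_state=0),
    "Ensemble (RandomForestClassifier)": RandomForestClassifier(random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(estimator, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```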

## Exercise - split the data

We can focus on logistic regression for our first training trial, since you learned about it in a previous lesson.
Split your data into training and testing groups by calling `train_test_split()`:


```python
# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
```
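
`train_test_split()` shuffles rows randomly, so each run produces a slightly different split and a slightly different accuracy. For reproducible results with the same class balance in both groups, you can also pass `random_state` and `stratify`. Here is a minimal sketch on a hypothetical balanced label set (five cuisines with 200 rows each and one dummy feature):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical balanced labels: 5 cuisines x 200 rows each, plus one dummy feature
labels = pd.Series(["chinese", "indian", "japanese", "korean", "thai"] * 200)
features = pd.DataFrame({"row_id": range(len(labels))})

X_train, X_test, y_train, y_test = train_test_split(
    features, labels,
    test_size=0.3,
    random_state=42,   # fixes the shuffle so the split is reproducible
    stratify=labels,   # keeps class proportions the same in train and test
)

print(len(X_train), len(X_test))
print(y_test.value_counts())
```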

## Exercise - apply logistic regression

Because this is a multiclass problem, you need to choose which _scheme_ to use and which _solver_ to set. Use `LogisticRegression` with a one-vs-rest (OvR) multiclass scheme and the **liblinear** solver to train.

1. Create a logistic regression with `multi_class` set to `ovr` and `solver` set to `liblinear`:


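```python
# One-vs-rest logistic regression trained with the liblinear solver.
# Note: `multi_class` is deprecated in recent scikit-learn versions; there,
# wrap LogisticRegression(solver='liblinear') in OneVsRestClassifier instead.
lr = LogisticRegression(multi_class='ovr', solver='liblinear')
model = lr.fit(X_train, np.ravel(y_train))

accuracy = model.score(X_test, y_test)
print(f"Accuracy is {accuracy}")
```

```
Accuracy is 0.804837364470392
```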


The accuracy is good at over **80%**!
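
Under the one-vs-rest scheme, the model trains one binary classifier per cuisine (that cuisine versus all others) and predicts the class whose classifier is most confident. The minimal sketch below makes this explicit with scikit-learn's `OneVsRestClassifier` wrapper, which also keeps the liblinear solver usable for multiclass problems in versions where `multi_class` is deprecated. It uses synthetic data from `make_classification()` as a stand-in for the cuisines features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 5-class data standing in for the cuisines features
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One binary liblinear logistic regression per class
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
ovr.fit(X_train, y_train)

print(len(ovr.estimators_))  # number of fitted binary models, one per class
print(f"Accuracy is {ovr.score(X_test, y_test):.3f}")

# The per-class scores are normalized so each row's probabilities sum to 1
proba = ovr.predict_proba(X_test[:3])
print(proba.sum(axis=1))
```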

2. See this model in action by testing one row of the test set (the row at position 50):

```python
# List the ingredients present (non-zero) in row 50, then its true label
print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
print(f'cuisine: {y_test.iloc[50]}')
```

```
ingredients: Index(['fish', 'lime_juice', 'shrimp'], dtype='object')
cuisine: thai
```


```python
# Select row 50 as a one-row DataFrame so the feature names are kept
test = X_test.iloc[[50]]
proba = model.predict_proba(test)
classes = model.classes_
resultdf = pd.DataFrame(data=proba, columns=classes)

# Transpose so each cuisine is a row, then sort by predicted probability
topPrediction = resultdf.T.sort_values(by=[0], ascending=[False])
topPrediction.head()
```

| | 0 |
|---|---|
| thai | 0.820303 |
| japanese | 0.144534 |
| chinese | 0.019625 |
| korean | 0.011454 |
| indian | 0.004085 |
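
`predict_proba()` returns one probability per class, in the order given by `model.classes_`, so ranking the top predictions only requires sorting that row. The sketch below uses NumPy's `argsort()` instead of transposing a DataFrame. It runs on a tiny hypothetical ingredient matrix, and the ingredient columns and cuisine labels are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical one-hot ingredient data for three cuisines
X = pd.DataFrame(
    [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1],
     [0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]],
    columns=["fish", "lime_juice", "soy_sauce", "garlic"],
)
y = ["thai", "thai", "korean", "korean", "japanese", "japanese"]

model = LogisticRegression().fit(X, y)

row = X.iloc[[0]]                      # keep a 2-D shape and the feature names
proba = model.predict_proba(row)[0]    # one probability per class in model.classes_

top_k = 2
order = np.argsort(proba)[::-1][:top_k]  # indices of the highest probabilities
for idx in order:
    print(f"{model.classes_[idx]}: {proba[idx]:.3f}")
```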


Get more detail by printing a classification report, as you did in the regression lessons:

```python
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```

```
              precision    recall  f1-score   support

     chinese       0.72      0.73      0.72       228
      indian       0.93      0.87      0.90       256
    japanese       0.75      0.77      0.76       223
      korean       0.84      0.79      0.81       237
        thai       0.78      0.86      0.82       255

    accuracy                           0.80      1199
   macro avg       0.81      0.80      0.80      1199
weighted avg       0.81      0.80      0.81      1199
```
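
The precision and recall columns in the report can be derived from a confusion matrix (`confusion_matrix` is imported above but not otherwise used). In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes. Precision for a class is its diagonal count divided by its column total (everything predicted as that class). Recall is the diagonal count divided by its row total (everything that truly belongs to that class). Here is a minimal sketch on small made-up label lists:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels
y_true = ["thai", "thai", "indian", "korean", "korean", "indian", "thai", "korean"]
y_pred = ["thai", "korean", "indian", "korean", "thai", "indian", "thai", "korean"]
labels = ["indian", "korean", "thai"]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=labels)

diagonal = np.diag(cm)
precision = diagonal / cm.sum(axis=0)   # correct / all predicted as this class
recall = diagonal / cm.sum(axis=1)      # correct / all truly in this class

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>8}  precision={p:.2f}  recall={r:.2f}")

# Compare with scikit-learn's per-class scores
print(np.allclose(precision, precision_score(y_true, y_pred, labels=labels, average=None)))
print(np.allclose(recall, recall_score(y_true, y_pred, labels=labels, average=None)))
```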



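In the lesson, we discussed various solvers: the optimization procedures that pair an algorithm with a learning process to fit an accurate model. Let's compare and contrast two widely used ones, Stochastic Gradient Descent (SGD) and Adam. Neither is one of the `LogisticRegression` solvers used above. Both are general-purpose optimizers, most often associated with training neural networks.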

## Stochastic Gradient Descent (SGD)

**Problem addressed:** SGD is commonly used for solving optimization problems, particularly in training deep learning models. It minimizes the loss function by iteratively adjusting the model's parameters in the direction opposite to the gradient.

**Working with data structures:** SGD works with both sparse and dense data. It processes training instances one at a time or in small mini-batches, so it never needs the whole dataset for a single update, which makes it suitable for large datasets. It samples training instances in random order and performs a gradient update using the error computed on each instance or batch.

**Selection considerations:** SGD is a simple and efficient optimization algorithm that scales well to problems with many training instances. However, because each update is based on a random sample, its progress is noisy. The result can be sensitive to the learning rate and to the initial configuration, and on non-convex losses it may converge to a suboptimal solution.
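
The update loop described above can be sketched in a few lines of NumPy. This minimal example assumes a linear model with a mean-squared-error loss and synthetic data. Each epoch, it shuffles the rows, takes one gradient step per mini-batch, and reports the drop in loss. The learning rate, batch size, and epoch count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ true_w + noise
n_samples, n_features = 1000, 3
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=n_samples)


def mse(w):
    """Mean squared error of the linear model with weights w on the full data."""
    return np.mean((X @ w - y) ** 2)


w = np.zeros(n_features)
learning_rate, batch_size, epochs = 0.05, 32, 20

initial_loss = mse(w)
for epoch in range(epochs):
    order = rng.permutation(n_samples)  # random sampling order for this epoch
    for start in range(0, n_samples, batch_size):
        batch = order[start:start + batch_size]
        error = X[batch] @ w - y[batch]
        grad = 2 * X[batch].T @ error / len(batch)  # gradient of the batch MSE
        w -= learning_rate * grad                   # step against the gradient
final_loss = mse(w)

print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print("learned weights:", np.round(w, 2))
```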

## Adam (Adaptive Moment Estimation)

**Problem addressed:** Adam is an adaptive optimization algorithm that is particularly effective for training deep neural networks. It combines ideas from RMSprop (scaling each parameter's step by a running average of squared gradients) and momentum (a running average of the gradients themselves), which often makes it robust and quick to converge.

**Working with data structures:** Adam handles both sparse and dense data. Because it maintains an adaptive learning rate for each parameter, it is well suited to non-stationary objectives and problems with noisy gradients.

**Selection considerations:** Adam is widely used and often preferred for deep learning tasks because it tends to converge quickly with little tuning. It adapts each parameter's step size based on past gradients, which can reduce its sensitivity to the choice of a single global learning rate. However, Adam needs more memory than plain SGD, because it stores two extra running averages (the first- and second-moment estimates) for every parameter.

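
The Adam update can be written out directly. This minimal NumPy sketch follows the standard rule: first- and second-moment running averages with bias correction, using the commonly cited defaults β1 = 0.9, β2 = 0.999, and ε = 1e-8. The toy quadratic objective and the learning rate of 0.1 are assumptions chosen for illustration. The two arrays `m` and `v` are kept for every parameter, and they account for Adam's extra memory compared with plain SGD:

```python
import numpy as np

# Toy objective: f(w) = sum((w - target)**2), minimized at w = target
target = np.array([3.0, -2.0])


def grad(w):
    """Gradient of the toy quadratic objective."""
    return 2 * (w - target)


w = np.zeros(2)
m = np.zeros_like(w)  # running average of gradients (momentum term)
v = np.zeros_like(w)  # running average of squared gradients (RMSprop term)
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)  # bias correction for the zero initialization
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step

print(np.round(w, 3))
```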
When choosing between SGD and Adam, consider the following factors:

- Dataset Size: Both optimizers process data one instance or one mini-batch at a time, which suits large-scale datasets. SGD's simpler update makes each step slightly cheaper to compute.
- Convergence Speed: Adam's adaptive learning rates and momentum can lead to faster convergence, particularly in deep learning scenarios with potentially noisy gradients.
- Memory Constraints: If memory usage is a concern, SGD might be a better choice due to its simplicity and lower memory requirements.

Ultimately, the choice between SGD and Adam depends on the specific problem, dataset size, convergence speed requirements, and available computational resources. Both solvers have their strengths and weaknesses, and it's essential to experiment and evaluate their performance on your specific task to determine the optimal choice.
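
One way to run that experiment in scikit-learn is `MLPClassifier`, whose `solver` parameter accepts `'sgd'` and `'adam'`. The minimal sketch below uses a synthetic dataset. The network size, learning rate, and iteration limit are illustrative assumptions, and results on the cuisines data or any real task may differ:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 5-class data as a stand-in for a real task
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_classes=5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for solver in ["sgd", "adam"]:
    clf = MLPClassifier(
        hidden_layer_sizes=(32,),
        solver=solver,
        learning_rate_init=0.01,
        max_iter=200,
        random_state=0,
    )
    with warnings.catch_warnings():
        # Either solver may stop at max_iter before fully converging
        warnings.simplefilter("ignore", ConvergenceWarning)
        clf.fit(X_train, y_train)
    results[solver] = (clf.score(X_test, y_test), clf.n_iter_)
    print(f"{solver}: test accuracy={results[solver][0]:.3f}, epochs run={clf.n_iter_}")
```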