### Binary classifiers distinguish between two classes; multiclass classifiers (also called *multinomial classifiers*) can distinguish between more than two classes.

## You can perform multiclass classification using multiple binary classifiers:
    - You can classify the digit images from 0 to 9 (10 classes) using 10 binary classifiers, one for each digit (1 vs. not-1, 2 vs. not-2, etc.).
    - When you want to classify an image, you select the class whose classifier outputs the highest score.
    - This strategy is called ONE-VERSUS-ALL (OvA), also called ONE-VERSUS-THE-REST.
    
    - Alternatively, you can train a binary classifier for every pair of digits: one to distinguish 0s and 1s, another for 0s and 2s, and so on.
    - This strategy is called ONE-VERSUS-ONE (OvO). For N classes you need N × (N − 1) / 2 classifiers (45 for the 10 digits).
    
    - For most binary classification algorithms, OvA is preferred.
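
To make the classifier counts concrete, here is a small sketch that enumerates the models each strategy would train for the 10 digit classes. It only uses the standard library, and the variable names are illustrative rather than part of the notebook.

```python
from itertools import combinations

digit_classes = list(range(10))  # the 10 digit classes, 0 to 9

# OvA: one "this digit vs. everything else" classifier per class
ova_tasks = [(digit, "rest") for digit in digit_classes]

# OvO: one classifier per unordered pair of classes
ovo_tasks = list(combinations(digit_classes, 2))

print(len(ova_tasks))  # 10
print(len(ovo_tasks))  # 45, i.e. N * (N - 1) / 2 with N = 10
print(ovo_tasks[:3])   # [(0, 1), (0, 2), (0, 3)]
```

OvO trains many more models, but each one only sees the training instances of the two classes it separates.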

### Scikit-Learn detects when you try to use a binary classification algorithm for a multiclass classification task and automatically runs OvA.

In [1]:
import numpy as np
from sklearn.datasets import fetch_openml

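# as_frame=False returns NumPy arrays instead of a DataFrame, so X[0] selects the first row
mnist = fetch_openml("mnist_784", version=1, as_frame=False)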

In [2]:
X = mnist.data
y = mnist.target
y = y.astype(np.uint8)  # the target column is loaded as strings, so cast it to integers

some_digit = X[0] # first row of features in the dataset

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# The binary targets (5 vs. not-5) are no longer needed for multiclass classification:
# y_train_5 = (y_train == 5)
# y_test_5 = (y_test == 5)



In [5]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=42)
sgd_clf.fit(X_train, y_train)  # we now train on the original target classes (0 to 9)

SGDClassifier(random_state=42)

In [8]:
sgd_clf.predict([some_digit])

array([3], dtype=uint8)

### It predicts a 3 although the digit is a 5, but we can see that it automatically handles multiclass classification.

!!! In the GitHub repository they also use SVC:
https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html which actually uses an OvO strategy.

In [10]:
# this is the code from the GitHub repository

from sklearn.svm import SVC

svm_clf = SVC(gamma="auto", random_state=42)
svm_clf.fit(X_train[:1000], y_train[:1000])  # original targets again, but only the first 1,000 instances
svm_clf.predict([some_digit])

array([5], dtype=uint8)
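
If you want to force a particular strategy, Scikit-Learn provides the `OneVsOneClassifier` and `OneVsRestClassifier` wrappers in `sklearn.multiclass`. The sketch below is not from the notebook: it assumes the small built-in `load_digits` dataset (8x8 images) instead of MNIST so that it runs quickly, and the variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Assumption: a small stand-in dataset with the same 10 digit classes
X_small, y_small = load_digits(return_X_y=True)

# Wrap the same binary-capable estimator in each strategy
ovo_clf = OneVsOneClassifier(SGDClassifier(max_iter=1000, tol=1e-3, random_state=42))
ovo_clf.fit(X_small, y_small)

ovr_clf = OneVsRestClassifier(SGDClassifier(max_iter=1000, tol=1e-3, random_state=42))
ovr_clf.fit(X_small, y_small)

# Each wrapper stores its underlying binary classifiers in estimators_
n_ovo = len(ovo_clf.estimators_)
n_ovr = len(ovr_clf.estimators_)
print(n_ovo, n_ovr)  # 45 10
```

The counts match the two strategies: one classifier per pair of classes for OvO, one per class for OvA.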


#### Let's return to our SGDClassifier:
    - In the background, Scikit-Learn actually trained 10 binary classifiers, got their decision scores for the image, and selected the class with the highest score.
    - We can use the decision_function() method to check.
    - We can see that the highest score (about 1823.7) corresponds to class 3, which is the class it predicted.

In [13]:
some_digit_scores = sgd_clf.decision_function([some_digit])  # one decision score per class
classes = sgd_clf.classes_  # class labels, in the same order as the scores

In [32]:
some_digit_scores, classes

(array([[-31893.03095419, -34419.69069632,  -9530.63950739,
           1823.73154031, -22320.14822878,  -1385.80478895,
         -26188.91070951, -16147.51323997,  -4604.35491274,
         -12050.767298  ]]),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8))
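
The selection step can be reproduced by hand with `np.argmax`. This sketch copies the scores and class labels from the output above into new arrays (the names `scores` and `labels` are illustrative) so that it runs without retraining the model.

```python
import numpy as np

# Decision scores and class labels copied from the output above
scores = np.array([[-31893.03095419, -34419.69069632,  -9530.63950739,
                      1823.73154031, -22320.14822878,  -1385.80478895,
                    -26188.91070951, -16147.51323997,  -4604.35491274,
                    -12050.767298  ]])
labels = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=np.uint8)

best_index = int(np.argmax(scores))  # position of the highest score
predicted = labels[best_index]       # map the position back to a class label

print(best_index, predicted)  # 3 3
```

Here the index and the label happen to coincide because the classes are the digits 0 to 9 in order; in general you should always look the label up in `classes_`.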

### Let's use a RandomForestClassifier

In [33]:
from sklearn.ensemble import RandomForestClassifier

forest_clf = RandomForestClassifier(random_state=42)
forest_clf.fit(X_train,y_train)

forest_clf.predict([some_digit])

array([5], dtype=uint8)

RandomForestClassifier can directly classify instances into multiple classes, so Scikit-Learn does not need to run OvA or OvO. Let's use the predict_proba() method to see the list of probabilities for each class.

In [37]:
forest_clf.predict_proba([some_digit])  # you can see that it gives a 90% probability
# that the image represents the digit 5

array([[0.  , 0.  , 0.01, 0.08, 0.  , 0.9 , 0.  , 0.  , 0.  , 0.01]])
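
It can be useful to rank the classes by probability, not just read off the top one. A small sketch, using the probabilities copied from the output above (the variable names are illustrative):

```python
import numpy as np

# Class probabilities copied from the predict_proba() output above
proba = np.array([[0., 0., 0.01, 0.08, 0., 0.9, 0., 0., 0., 0.01]])

ranked = np.argsort(proba[0])[::-1]  # class indices, most probable first
top_two = ranked[:2]

print(top_two)      # [5 3]
print(proba.sum())  # the probabilities add up to (approximately) 1
```

The second most likely class is 3, with 8%, which is the digit the SGDClassifier predicted for this image.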

## Warning: the code below takes a long time to run

In [40]:
# evaluate the SGD classifier with 3-fold cross-validation
from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train, cv=3, scoring="accuracy")

array([0.87365, 0.85835, 0.8689 ])

In [41]:
# scale the features like we did in part 1 to see if the accuracy increases

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")

array([0.8983, 0.891 , 0.9018])
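
StandardScaler standardizes each feature column by subtracting its mean and dividing by its standard deviation. A minimal sketch on a made-up 3x2 array (the values are illustrative, not MNIST pixels) suggests that this matches the manual computation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 3 instances, 2 features on very different scales
toy = np.array([[0.0, 10.0],
                [128.0, 20.0],
                [255.0, 30.0]])

scaled = StandardScaler().fit_transform(toy)

# Same computation by hand: per-column mean and (population) standard deviation
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.mean(axis=0))          # approximately 0 for each column
```

After scaling, every feature has zero mean and unit variance, which generally helps gradient-based models such as SGDClassifier.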