# Iris Dataset Classification

In this tutorial, we will classify the iris dataset with a **Support Vector Machine** (SVM), a supervised learning model. scikit-learn's SVMs support several kernels (linear, polynomial, RBF and sigmoid); we will try two approaches: a linear model (`LinearSVC`) and a polynomial-kernel model (`SVC(kernel="poly")`).

## Loading Iris Dataset

In this part, we will load the iris dataset and set up the base variables (data and target):
1. Import `datasets` from `sklearn`
2. Import `StandardScaler` from `sklearn.preprocessing`
3. Use `datasets.load_iris()` to load the iris dataset into a variable named `iris`
4. Standardize the features `iris.data` (zero mean and unit variance per feature) using the `fit_transform` method of `StandardScaler`
5. Store the standardized data in a variable named `X`
6. Store the class labels `iris.target` (these are not standardized) in a variable named `y`

In [10]:
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
print('Iris Dataset Has Been Loaded Successfully!')

scaler = StandardScaler()
# StandardScaler ignores the target, so only the features are passed
iris.data = scaler.fit_transform(iris.data)
print('Data Has Been Standardized Successfully!')

X = iris.data
y = iris.target
print('Done!')

Iris Dataset Has Been Loaded Successfully!
Data Has Been Standardized Successfully!
Done!
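The cell above fits `StandardScaler` on all 150 samples before the train/test split, so the test rows' mean and standard deviation leak into the preprocessing. A minimal sketch of a safer pattern (a suggested alternative, not the tutorial's own method): split first, fit the scaler on the training rows only, or wrap the scaler and the model in a scikit-learn `Pipeline`. Names such as `X_train_raw` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()

# Split the raw (unscaled) features first
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0, test_size=0.2
)

# Fit the scaler on the training split only, then reuse its mean/std on the test split
scaler = StandardScaler().fit(X_train_raw)
X_train_scaled = scaler.transform(X_train_raw)
X_test_scaled = scaler.transform(X_test_raw)
print("training means after scaling:", np.round(X_train_scaled.mean(axis=0), 6))

# Alternative: a Pipeline refits the scaler on whatever data `fit` receives,
# which also keeps cross-validation folds free of leakage
model = make_pipeline(StandardScaler(), LinearSVC(loss="hinge", random_state=0))
model.fit(X_train_raw, y_train)
test_accuracy = model.score(X_test_raw, y_test)
print("test accuracy:", test_accuracy)
```

Passing the pipeline (instead of the bare `LinearSVC`) to `GridSearchCV` would give each validation split its own scaler.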


## Setting Up The Data Frame

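This part is optional: the DataFrame below is only a tabular view of the standardized features plus a `class` column, and it is not used in the rest of the tutorial.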

In [11]:
import pandas as pd

# One column per feature, named after iris.feature_names, plus the class label
df_iris = pd.DataFrame(X, columns=iris.feature_names)
df_iris['class'] = y
print('Done!')

Done!
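As an alternative sketch, `load_iris(as_frame=True)` (available in recent scikit-learn versions) returns the data already as a pandas DataFrame in `iris.frame`, with a `target` column. Note that this frame holds the raw, unstandardized measurements, unlike the cell above, which uses `X`.

```python
from sklearn import datasets

iris = datasets.load_iris(as_frame=True)
# iris.frame has the four feature columns followed by a "target" column
df_iris = iris.frame.rename(columns={"target": "class"})

# Per-class feature means, a quick sanity check of the data
print(df_iris.groupby("class").mean())
```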


## Separating Train & Test

In this part, we will separate the training and testing sets:
1. Import `train_test_split` from `sklearn.model_selection`
2. Use `train_test_split` to split the data and target. Set `random_state` to a constant so that every run produces the same split, and set `test_size` to 0.2 so that 80% of the samples are used for training and 20% for testing

In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
print('Done!')

Done!
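With 150 samples and `test_size=0.2`, the split yields 120 training and 30 test rows. The sketch below additionally passes `stratify=y` (not used in this tutorial, so its split differs from the one above), which keeps the three classes equally represented in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# stratify=y preserves the 50/50/50 class balance in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.2, stratify=y
)

print(X_train.shape, X_test.shape)  # 120 training rows, 30 test rows
print(np.bincount(y_train), np.bincount(y_test))
```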


## Defining C Hyperparameter Test Function (Linear)

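In this part, we will write a function that uses grid search to find the value of C that gives the highest cross-validated accuracy. Since this is a linear model, C is the only hyperparameter we tune. C controls regularization: a small C tolerates more margin violations in exchange for a wider margin (stronger regularization), while a large C penalizes training errors more heavily and fits the training data more closely.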

1. Import `ShuffleSplit` from `sklearn.model_selection`
2. Import `GridSearchCV` from `sklearn.model_selection`
3. Import `LinearSVC` from `sklearn.svm`
4. Define a function with the signature `grid_linear_svc_test(X, y, c)`, where `X` is the data, `y` is the target and `c` is a list or NumPy array of candidate values of C (note: the more values in `c`, the longer the search takes)
5. Define a `ShuffleSplit` object (set `random_state` and set `test_size` to 0.2)
6. Define a `LinearSVC` object with `loss` set to `'hinge'` (and, again, a fixed `random_state`)
7. Make a dictionary with the key `'C'` whose value is the function parameter `c`
8. Define a `GridSearchCV` object, passing the `LinearSVC` as `estimator`, the dictionary as `param_grid` and the `ShuffleSplit` as `cv`
9. Fit the `GridSearchCV` object on the passed `X` and `y` using its `fit` method
10. Return the fitted `GridSearchCV` object

In [14]:
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def grid_linear_svc_test(X, y, c):
    print('Initiating Grid Search of Linear SVC To Find Most Suitable C')
    
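    # 10 random 80/20 splits of the data (n_splits defaults to 10)
    cv_sets = ShuffleSplit(test_size=0.20, random_state=0)
    # loss='hinge' is the standard SVM loss (LinearSVC defaults to 'squared_hinge')
    svc = LinearSVC(loss='hinge', random_state=0)
    params = {'C': c}
    grid = GridSearchCV(svc, params, cv=cv_sets)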
    print('Testing C:', c)
    
    grid = grid.fit(X, y)
    print('Grid Search of Linear SVC Is Done!')
    
    return grid
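To make explicit what `GridSearchCV` does with these objects, here is a minimal sketch of the same search written as plain loops: for each candidate C, fit a fresh copy of the model on every `ShuffleSplit` training part, score it on the matching validation part, and average. The small `c_values` list is illustrative only. The manual mean scores should match `cv_results_["mean_test_score"]`.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, ShuffleSplit, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

c_values = [0.01, 0.1, 1.0, 10.0]  # illustrative candidates
cv_sets = ShuffleSplit(test_size=0.20, random_state=0)
svc = LinearSVC(loss="hinge", random_state=0)

# Manual grid search: mean validation accuracy over all splits, per C
mean_scores = []
for c in c_values:
    split_scores = []
    for train_idx, val_idx in cv_sets.split(X_train):
        model = clone(svc).set_params(C=c)
        model.fit(X_train[train_idx], y_train[train_idx])
        split_scores.append(model.score(X_train[val_idx], y_train[val_idx]))
    mean_scores.append(np.mean(split_scores))

best_c = c_values[int(np.argmax(mean_scores))]

# The same search through GridSearchCV, for comparison
grid = GridSearchCV(svc, {"C": c_values}, cv=cv_sets).fit(X_train, y_train)
print("manual best C:", best_c, "| GridSearchCV best C:", grid.best_params_["C"])
```

Because `random_state` is a fixed integer, `cv_sets.split` yields the same index sets every time it is called, which is why the two searches see identical splits.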

## Generating C Values & Testing Them On The Training Set (Linear)

In this part, we will generate candidate values of C and grid-search them in two stages: a coarse pass over powers of 2, then a finer pass around the best coarse value.
1. Pass the training data, the training target and the array `c` to the function above

In [15]:
import math
import numpy as np

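def generate_polynomials(base, min_exp, max_exp):
    # Despite its name, this returns powers of `base`:
    # [base**min_exp, ..., base**max_exp], skipping exponents <= 0
    output = []
    for i in np.arange(min_exp, max_exp+1):
        if i > 0:
            output += [base**i]
    return output

# Coarse search over C = 2, 4, ..., 2**20
c = generate_polynomials(2, 1, 20)
grid = grid_linear_svc_test(X_train, y_train, c)

# Fine search: every integer from half the best coarse C up to (but excluding) double it
left_boundry_exp = math.log(grid.best_estimator_.C, 2) - 1
right_boundry_exp = math.log(grid.best_estimator_.C, 2) + 1

c = np.arange(math.pow(2, left_boundry_exp), math.pow(2, right_boundry_exp), 1)
grid = grid_linear_svc_test(X_train, y_train, c)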

Initiating Grid Search of Linear SVC To Find Most Suitable C
Testing C: [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576]
Grid Search of Linear SVC Is Done!
Initiating Grid Search of Linear SVC To Find Most Suitable C
Testing C: [128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141.
 142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155.
 156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167. 168. 169.
 170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181. 182. 183.
 184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195. 196. 197.
 198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209. 210. 211.
 212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224. 225.
 226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 238. 239.
 240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253.
 254. 255. 256. 257. 258. 259. 260. 261. 262. 263. 264. 265. 266.
[output truncated]
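The printed fine grid starts at 128, which implies the coarse search picked C = 256, so the refinement tests all 384 integers in [128, 512). A sketch of a cheaper alternative (an assumption, not the tutorial's approach): refine on a geometric grid, which matches the powers-of-2 spacing of the coarse search. `refine_c_grid` is a hypothetical helper.

```python
import numpy as np

# Coarse grid: 2**1 ... 2**20, the same values generate_polynomials(2, 1, 20) returns
coarse_c = 2.0 ** np.arange(1, 21)

def refine_c_grid(best_c, num=9):
    """Hypothetical helper: geometric grid from best_c / 2 to best_c * 2 (both inclusive)."""
    exp = np.log2(best_c)
    return np.logspace(exp - 1, exp + 1, num=num, base=2)

fine_c = refine_c_grid(256.0)
print(fine_c)

# The tutorial's linear refinement around C = 256: every integer in [128, 512)
linear_c = np.arange(2 ** 7, 2 ** 9, 1)
print(len(linear_c), "linear candidates vs", len(fine_c), "geometric candidates")
```

A geometric grid spends its candidates evenly on a log scale; whether 9 points are enough resolution is a judgment call for the reader.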

## Displaying Best C & Best Score

In [7]:
print("Best C    :", grid.best_estimator_.C)
print("Best Score:", grid.best_score_ * 100, "%")

Best C    : 507.0
Best Score: 95.41666666666667 %
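`best_score_` is the mean validation accuracy of the best candidate over the 10 `ShuffleSplit` splits. A minimal sketch for inspecting all candidates through `cv_results_` with pandas; the small grid here is illustrative, and in the notebook the already-fitted `grid` could be used directly.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, ShuffleSplit, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

# Illustrative grid; stands in for the tutorial's fitted `grid`
grid = GridSearchCV(
    LinearSVC(loss="hinge", random_state=0),
    {"C": [0.1, 1.0, 10.0, 100.0]},
    cv=ShuffleSplit(test_size=0.20, random_state=0),
).fit(X_train, y_train)

# One row per candidate; rank_test_score == 1 marks the highest mean validation accuracy
results = pd.DataFrame(grid.cv_results_)
cols = ["param_C", "mean_test_score", "std_test_score", "rank_test_score"]
top = results[cols].sort_values("rank_test_score")
print(top.head())
```

The `std_test_score` column shows how much the accuracy varies across splits, which helps judge whether neighbouring C values are genuinely different.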


## Test Set Score (Linear)

In [8]:
print("Test Set Score", grid.best_estimator_.score(X_test, y_test) * 100, "%")

Test Set Score 96.66666666666667 %
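Accuracy alone does not show which classes are confused. A sketch that breaks the test predictions down with a confusion matrix and per-class report; it refits a `LinearSVC` with the best C reported above (507.0), but any fitted estimator such as `grid.best_estimator_` could be used instead.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(
    X, iris.target, random_state=0, test_size=0.2
)

# C = 507.0 is the best value reported by the linear grid search above
model = LinearSVC(loss="hinge", C=507.0, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The diagonal of `cm` counts correct predictions, so its sum divided by the 30 test samples equals the accuracy.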


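## Using Polynomial SVC

Next, we grid-search an `SVC` with a polynomial kernel. Besides `C`, this kernel has two more hyperparameters to tune: `degree` (the degree of the polynomial) and `coef0` (the constant term added inside the kernel).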

In [19]:
from sklearn.svm import SVC

def grid_poly_svc_test(X, y, c, degree, coef0):
    # Same 10 random 80/20 splits as the linear search
    cv_sets = ShuffleSplit(test_size=0.20, random_state=0)
    svc = SVC(kernel="poly", random_state=0)
    # Every combination of C, degree and coef0 is tried
    params = {'C': c, "degree": degree, "coef0": coef0}
    grid = GridSearchCV(svc, params, cv=cv_sets)
    grid = grid.fit(X, y)
    return grid
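For reference, scikit-learn's polynomial kernel is K(x, z) = (gamma * <x, z> + coef0) ** degree, and `SVC`'s default `gamma="scale"` is 1 / (n_features * X.var()). The sketch below computes the kernel by hand and compares it with `sklearn.metrics.pairwise.polynomial_kernel`, using `degree=3` and `coef0=1` as example values. On fully standardized iris, `gamma` comes out as 1/4; the tutorial fits on the 120 training rows, whose variance is close to, but not exactly, 1.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(datasets.load_iris().data)

degree, coef0 = 3, 1.0
# Value SVC uses for gamma="scale"
gamma = 1.0 / (X.shape[1] * X.var())

# Polynomial kernel by hand: K(x, z) = (gamma * <x, z> + coef0) ** degree
K_manual = (gamma * (X @ X.T) + coef0) ** degree
K_sklearn = polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)
print(np.allclose(K_manual, K_sklearn))
```

A higher `degree` allows a more flexible decision boundary, and `coef0` shifts the weight between higher-order and lower-order terms of the polynomial.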

## Generating C Values & Testing Them On The Training Set (Polynomial)

In [20]:
c = np.logspace(-10, 11, base=2)
degree = np.arange(2, 10, 1)
coef0 = np.arange(-10, 11, 1)

grid = grid_poly_svc_test(X_train, y_train, c, degree, coef0)
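This search is much larger than the linear one: `np.logspace` defaults to 50 points, so the grid has 50 x 8 x 21 candidates, each fitted on 10 `ShuffleSplit` splits. A small sketch that counts them with `ParameterGrid`; it also shows that the best C this search reports (0.3714...) is grid point `c[20]`, i.e. 2 ** (-10 + 20 * 21 / 49). Setting `n_jobs=-1` on `GridSearchCV` is one way to parallelize such a search.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

c = np.logspace(-10, 11, base=2)  # num defaults to 50 points, from 2**-10 to 2**11
degree = np.arange(2, 10, 1)      # degrees 2..9
coef0 = np.arange(-10, 11, 1)     # -10..10

n_candidates = len(ParameterGrid({"C": c, "degree": degree, "coef0": coef0}))
n_splits = 10  # ShuffleSplit's default n_splits
print(n_candidates, "candidates x", n_splits, "splits =", n_candidates * n_splits, "fits")
print("c[20] =", c[20])
```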

## Displaying Best C & Best Degree & Best Coef0 & Best Score

In [21]:
print("Best C     :", grid.best_estimator_.C)
print("Best Degree:", grid.best_estimator_.degree)
print("Best Coef0 :", grid.best_estimator_.coef0)
print("Best Score :", grid.best_score_ * 100, "%")

Best C     : 0.37149857228423705
Best Degree: 3
Best Coef0 : 1
Best Score : 96.66666666666667 %


## Test Set Score (Polynomial)

In [23]:
print("Test Set Score", grid.best_estimator_.score(X_test, y_test) * 100, "%")

Test Set Score 100.0 %
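The test set has only 30 samples, so a single misclassification would lower the score from 100% to about 96.67% (29/30). A sketch of a less noisy estimate: 5-fold stratified cross-validation over all 150 samples with the hyperparameters reported above, scaling inside a `Pipeline`. Because those hyperparameters were chosen on data that overlaps these folds, the result is still somewhat optimistic.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Hyperparameters reported by the polynomial grid search above
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", C=0.37149857228423705, degree=3, coef0=1),
)

# The scaler is refit on the training part of each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", np.round(scores, 3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```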
