<div style="text-align:center">
    <img src="../files/monolearn-logo.png" height="150px">
    <h1>ML course</h1>
    <h3>Session 08: SVM, SVC, SVR</h3>
    <h4><a href="https://amzenterprise.ir/">Ali Momenzadeh</a></h4>
</div>

### SVM

<img src = "../files/8/0_9jEWNXTAao7phK-5.png" width=25%>

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that separates the two classes with the largest margin.

<img src = "../files/8/0_0o8xIA4k3gXUDCFU.png" width=25%>

Support vectors are the individual observations that lie closest to the separating boundary; they alone determine where it sits. The SVM classifier is the frontier (hyperplane/line) that best segregates the two classes.
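
As a minimal sketch of this idea, the cell below fits a linear SVM on a hypothetical six-point toy dataset (the points and labels are made up for illustration) and prints the observations that scikit-learn keeps as support vectors through the `support_vectors_` attribute. Only a subset of the training points should appear.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated groups in 2-D
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_clf = SVC(kernel="linear", C=1.0).fit(X_toy, y_toy)

# Support vectors are rows of the training data that define the margin
print(toy_clf.support_vectors_)
# Number of support vectors per class
print(toy_clf.n_support_)
```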

<img src = "../files/8/1_ZpkLQf2FNfzfH4HXeMw4MQ.png" width=50%>

* Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
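
For a linear kernel the hyperplane can be written as w·x + b = 0, and the side a point falls on is given by the sign of w·x + b. The sketch below, using hypothetical two-feature toy data (so the hyperplane is a line), reads w and b from a fitted `SVC` and checks that this sign rule agrees with `predict`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-feature toy data, so the hyperplane is a line
X_demo = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                   [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

line_clf = SVC(kernel="linear").fit(X_demo, y_demo)

w = line_clf.coef_[0]       # normal vector of the hyperplane (one weight per feature)
b = line_clf.intercept_[0]  # offset of the hyperplane

# Points with w.x + b > 0 fall on the class-1 side, the others on the class-0 side
scores = X_demo @ w + b
manual_pred = (scores > 0).astype(int)

print("w =", w, "b =", b)
print(manual_pred, line_clf.predict(X_demo))
```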

#### About Dataset - Mobile Price Classification

* Mobile phones are the best-selling electronic devices, as people keep upgrading whenever a new device offers new features. Thousands of mobiles are sold daily, so deciding what a phone should cost is a difficult task for anyone planning to set up their own mobile phone business.

* Mr Jason wants to start his own mobile phone company and he wants to wage an uphill battle with big smartphone brands like Samsung and Apple. But he doesn’t know how to estimate the price of a mobile that can cover both marketing and manufacturing costs. So in this task, you don’t have to predict the actual prices of the mobiles but you have to predict the price range of the mobiles. 

#### Import libraries

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

import warnings
warnings.filterwarnings('ignore')

Since our task is to classify the price range of mobile phones rather than to predict their actual prices, we will train a classification model that assigns each phone to one of the following price ranges:

    0 (low cost)
    1 (medium cost)
    2 (high cost)
    3 (very high cost)

#### Load and prepare data

In [None]:
data = pd.read_csv("mobile_prices.csv")

In [None]:
data.head()

#### More on Dataset

The dataset has 21 columns (20 features plus the target) and 2000 entries. The meanings of the columns are given below.

battery_power: Total energy a battery can store in one time measured in mAh

blue: Has bluetooth or not

clock_speed: speed at which microprocessor executes instructions

dual_sim: Has dual sim support or not

fc: Front Camera mega pixels

four_g: Has 4G or not

int_memory: Internal Memory in Gigabytes

m_dep: Mobile Depth in cm

mobile_wt: Weight of mobile phone

n_cores: Number of cores of processor

pc: Primary Camera mega pixels

px_height: Pixel Resolution Height

px_width: Pixel Resolution Width

ram: Random Access Memory in Mega Bytes

sc_h: Screen Height of mobile in cm

sc_w: Screen Width of mobile in cm

talk_time: longest time that a single battery charge will last when you are

three_g: Has 3G or not

touch_screen: Has touch screen or not

wifi: Has wifi or not

price_range: This is the target variable with value of 0(low cost), 1(medium cost), 2(high cost) and 3(very high cost).


#### EDA

In [None]:
data.shape

In [None]:
data.columns

In [None]:
data.info()

In [None]:
data["price_range"].value_counts()

In [None]:
data.isnull().sum()

#### Storytelling - Visualization

In [None]:
corr=data.corr()
fig = plt.figure(figsize=(15,12))
r = sns.heatmap(corr, cmap="Purples", annot=True)
r.set_title("Correlation ")

In [None]:
# Correlation of every column with the target, strongest first
corr["price_range"].sort_values(ascending=False)

#### Data PreProcessing 

In [None]:
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

In [None]:
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(X)

In [None]:
X

In [None]:
y

#### Train and test 

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

In [None]:
from sklearn.svm import SVC

svclassifier = SVC(kernel="linear")
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)
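
Note that the cells above standardise `X` before splitting, so statistics from the test rows influence the scaler. One possible alternative, sketched here on synthetic stand-in data (the names `X_raw`, `y_raw` and `pipe` are assumptions, and `make_classification` replaces the real CSV), is to split first and let a pipeline fit the scaler on the training split only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the mobile price data (4 classes, 20 features)
X_raw, y_raw = make_classification(n_samples=400, n_features=20, n_informative=6,
                                   n_classes=4, random_state=0)

Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=0)

# The scaler is fitted on the training split only, then reused on the test split
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(Xr_train, yr_train)

print("Test accuracy:", pipe.score(Xr_test, yr_test))
```

With this layout, `pipe.predict` applies the training-set scaling automatically, so raw feature values can be passed in directly.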

In [None]:
y_pred

In [None]:
y_test

In [None]:
X_test[0]

#### Evaluation

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
accuracy = accuracy_score(y_test, y_pred) * 100
print("Accuracy of the SVC model: ", accuracy)

In [None]:
pd.crosstab(y_test, y_pred)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
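
The confusion matrix and the accuracy score are closely related: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total count. A small sketch with hypothetical labels for ten phones (not taken from the dataset) illustrates this.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted price ranges for ten phones
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 3, 3, 3, 1])
y_pred_demo = np.array([0, 1, 1, 1, 2, 3, 3, 3, 2, 1])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)

# Correct predictions sit on the diagonal
acc_from_cm = np.trace(cm) / cm.sum()

print(cm)
print(acc_from_cm, accuracy_score(y_true_demo, y_pred_demo))
```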

<hr/>

### SVR

<img src = "../files/8/1_25Kk53QBOpBie4_qMSTnAA.png" width=55%>

Support Vector Regression (SVR) is a type of support vector machine that supports linear and non-linear regression. As the figure above shows, the goal is to fit as many instances as possible between the two margin lines while limiting margin violations. The width of that margin is controlled by the hyperparameter ε (epsilon).
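
To get a feel for ε, the sketch below fits an RBF `SVR` with three values of the `epsilon` parameter on hypothetical noisy sine data (the data, `C` and `gamma` values are assumptions chosen for illustration). Points lying strictly inside the tube incur no loss and are not kept as support vectors, so a wider tube typically leaves fewer of them.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical noisy sine data
rng = np.random.RandomState(0)
X_eps = np.sort(5 * rng.rand(40, 1), axis=0)
y_eps = np.sin(X_eps).ravel() + 0.1 * rng.randn(40)

n_sv = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=100, gamma=0.5, epsilon=eps).fit(X_eps, y_eps)
    # model.support_ holds the indices of the training points kept as support vectors
    n_sv[eps] = len(model.support_)
    print(f"epsilon={eps}: {n_sv[eps]} support vectors out of {len(X_eps)}")
```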

#### Generate sample data

In [None]:
import numpy as np

X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()

In [None]:
X

In [None]:
y

#### Add some Noise

In [None]:
y[::5] += 3 * (0.5 - np.random.rand(8))

In [None]:
y

#### Fit Regression Model

In [None]:
from sklearn.svm import SVR

# 1e4 = 10,000 
svr_rbf = SVR(kernel="rbf", C=1e4, gamma=0.1)
svr_lin = SVR(kernel="linear", C=1e4)
svr_poly = SVR(kernel="poly", C=1e4, degree=2)
y_rbf = svr_rbf.fit(X, y).predict(X)
y_lin = svr_lin.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)

#### Check the Result

In [None]:
import matplotlib.pyplot as plt

# Plot the noisy samples and the three fitted SVR curves
plt.scatter(X, y, c='k', label='data')
plt.plot(X, y_rbf, c='g', label='RBF model')
plt.plot(X, y_lin, c='r', label='Linear model')
plt.plot(X, y_poly, c='b', label='Polynomial model')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()

In [None]:
from sklearn.metrics import r2_score

print("RBF R2 Score: ", r2_score(y, y_rbf))
print("Linear R2 Score: ", r2_score(y, y_lin))
print("Polynomial R2 Score: ", r2_score(y, y_poly))

### Apply SVR on the Salaries Dataset

#### Import libraries

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

import warnings
warnings.filterwarnings('ignore')

#### Load and prepare data

In [None]:
df = pd.read_csv("Position_Salaries.csv")
X = df.iloc[:,1:2].values
y = df.iloc[:,2:3].values

In [None]:
df

#### Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

#### Train and test

The most important SVR parameter is the kernel type, which can be linear, polynomial, or Gaussian.
The relationship here is non-linear, so either a polynomial or a Gaussian kernel would be suitable; we select the RBF kernel (a Gaussian type) with SVR(kernel='rbf').

In [None]:
from sklearn.svm import SVR

regressor = SVR(kernel="rbf")
# y is a column vector after scaling; SVR expects a 1-D target
regressor.fit(X, y.ravel())
y_pred = regressor.predict(X)

#### Evaluation

##### Visualising the Support Vector Regression 

In [None]:
plt.scatter(X, y, color = 'magenta')
plt.plot(X, y_pred, color = 'green')
plt.title('Truth or Bluff (Support Vector Regression Model)')
plt.xlabel('Position level (standardized)')
plt.ylabel('Salary (standardized)')
plt.show()

In [None]:
from sklearn.metrics import r2_score

print("RBF R2 Score: ", r2_score(y, y_pred))
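
Because both `X` and `y` were standardised, the model's predictions come out in scaled units. The sketch below shows one way to map a prediction back to salary units with `inverse_transform`. It uses a hypothetical stand-in table (`levels`, `salaries`, the 1.4 growth factor and the query level 6.5 are all assumptions, not values from Position_Salaries.csv).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in for Position_Salaries.csv: level vs. salary
levels = np.arange(1, 11, dtype=float).reshape(-1, 1)
salaries = (40000 * 1.4 ** levels).reshape(-1, 1)

sc_levels, sc_salary = StandardScaler(), StandardScaler()
levels_scaled = sc_levels.fit_transform(levels)
salary_scaled = sc_salary.fit_transform(salaries).ravel()  # SVR expects a 1-D target

svr = SVR(kernel="rbf").fit(levels_scaled, salary_scaled)

# Scale the query with the same scaler, predict, then map back to salary units
query = sc_levels.transform([[6.5]])
pred_scaled = svr.predict(query).reshape(-1, 1)
pred_salary = sc_salary.inverse_transform(pred_scaled)[0, 0]

print(f"Predicted salary for level 6.5: {pred_salary:.0f}")
```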