# Support Vector Regression (SVR)
* It is one of the more powerful and flexible modeling techniques.
* It can be used for both classification and regression.
* It is a robust regression technique, meaning it is resistant to outlying observations.
* The goal is to find the line or curve that places as many points as possible inside a margin while keeping the error as small as possible. (Smola 1996 and Drucker 1997)
--------------------------------
#### $ y = wx + \beta + \epsilon$

##### Minimization problem:
#### $\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$
##### Constraints:
* $y_i - (w \cdot x_i) - \beta \leq \epsilon + \xi_i$
* $(w \cdot x_i) + \beta - y_i \leq \epsilon + \xi_i^*$
* $\xi_i, \xi_i^* \geq 0$
* $i = 1, 2, \dots, m$
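
The objective and constraints above can be evaluated by hand for a given linear model. The following is a minimal sketch, not scikit-learn's implementation: the function name `svr_objective`, the toy data, and the chosen values of `C` and `epsilon` are assumptions made for illustration, and `b` plays the role of $\beta$.

```python
import numpy as np

def svr_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i + xi_i^*) for a linear model."""
    residual = y - (X @ w + b)
    # xi_i: how far y_i lies above the epsilon margin (0 if inside it)
    xi = np.maximum(0.0, residual - epsilon)
    # xi_i^*: how far y_i lies below the epsilon margin (0 if inside it)
    xi_star = np.maximum(0.0, -residual - epsilon)
    return 0.5 * np.dot(w, w) + C * np.sum(xi + xi_star)

# Toy data (hypothetical): two points inside the margin, two outside
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0.05, 1.0, 2.5, 2.0])
w_toy = np.array([1.0])
b_toy = 0.0

value = svr_objective(w_toy, b_toy, X_toy, y_toy, C=1.0, epsilon=0.1)
print(value)
```

Points whose residual is smaller than `epsilon` in absolute value contribute nothing to the sum, which is why only observations outside the margin are penalized.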

In [2]:
## Libraries

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale, StandardScaler  # standardization
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import neighbors
from sklearn.svm import SVR

# Model and prediction

In [3]:
df = pd.read_csv('Hitters.csv')
df = df.dropna()
y= df['Salary']
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']]) ## one hot encoding
X_= df.drop(['Salary', 'League', 'Division','NewLeague'], axis =1).astype('float64')
X= pd.concat([X_,dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state=42)

In [4]:
svr_model = SVR(kernel='linear').fit(X_train,y_train)

In [5]:
svr_model

In [6]:
svr_model.predict(X_test)

array([ 679.14754919,  633.72883529,  925.68639938,  270.28464317,
        530.26659421,  272.22606175,  549.4423224 ,  446.55264396,
        892.8309562 ,  677.96856008,  677.16149322,  868.18485002,
        451.00610659,  382.35543803,  308.26205728,  609.16129527,
        744.69688828,  152.60132455, 1012.03931325,  376.77473896,
        481.42042557,  771.67234151,  521.05069491,  588.93024829,
        615.06797076,  132.39925922,  958.8936008 ,  355.23613415,
        579.89689139,  124.67778281,  153.86174938,   14.75058547,
        358.06037823,  282.58793005,  281.45885316,  533.38953946,
       1206.24281291,  170.07373765,   40.51550058,  258.34330019,
         66.15838195,  216.51484567,  692.01541809,  449.24701051,
        856.67888329,  753.2161061 ,  442.51268754,  288.71673557,
        309.2636209 ,  556.53116993,  867.46283046,  353.4386512 ,
        656.14839681,  362.44007402,  201.08404007,  525.70822384,
        584.05155335,  910.92606662,  178.24959893, 1247.87338

In [7]:
svr_model.intercept_  # intercept

array([-80.15196063])

In [8]:
svr_model.coef_  # coefficients

array([[ -1.2183904 ,   6.09602978,  -3.67574533,   0.14217072,
          0.51435925,   1.28388992,  12.55922527,  -0.08693754,
          0.46597185,   2.98259931,   0.52944513,  -0.79820793,
         -0.16015531,   0.30872795,   0.28842348,  -1.79560066,
          6.41868986, -10.74313785,   1.33374319]])
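
With a linear kernel, the intercept and coefficients are enough to reproduce the predictions as $wx + \beta$. The sketch below checks this on synthetic data from `make_regression` rather than on the Hitters data; the names `X_demo`, `y_demo`, and `demo_model` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

# Small synthetic regression problem (an assumption, not the Hitters data)
X_demo, y_demo = make_regression(n_samples=60, n_features=3, noise=5.0, random_state=0)

demo_model = SVR(kernel="linear").fit(X_demo, y_demo)

# coef_ has shape (1, n_features) and intercept_ has shape (1,)
manual_pred = X_demo @ demo_model.coef_.ravel() + demo_model.intercept_[0]

# Compare the manual linear combination with the model's own predictions
matches = np.allclose(manual_pred, demo_model.predict(X_demo), atol=1e-6)
print(matches)
```

Note that `coef_` is only available when `kernel='linear'`; other kernels do not expose a coefficient vector in the original feature space.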

In [9]:
# test

y_pred = svr_model.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))

370.0408415795005
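
The value above is the root mean squared error (RMSE) on the test set. As a small sketch of what `np.sqrt(mean_squared_error(...))` computes, the same quantity can be written out directly; the helper name `rmse` and the example numbers are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean squared difference between truth and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical values: errors of -10, 10 and -30
example = rmse([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(example)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is in the same units as the target (here, salary).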

In [10]:
svr_model = SVR(kernel='rbf').fit(X_train,y_train)  # a nonlinear radial basis function (RBF) kernel can also be used

In [11]:
# test

y_pred = svr_model.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))

460.0032657244849
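
The RBF kernel depends on distances between observations, so features measured on large scales can dominate it. The notebook imports `StandardScaler` but does not apply it; whether scaling would change the RMSE above is not established here. The following is only a sketch of how standardization could be combined with an RBF SVR, using synthetic data and hypothetical names (`X_demo`, `scaled_rbf`).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (an assumption), with columns deliberately put on different scales
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_demo = X_demo * np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# The scaler is fitted on the training split only, then applied before the SVR
scaled_rbf = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X_tr, y_tr)

demo_pred = scaled_rbf.predict(X_te)
rmse_scaled = np.sqrt(mean_squared_error(y_te, demo_pred))
print(rmse_scaled)
```

Wrapping the scaler and the model in one pipeline keeps the test data out of the scaling statistics, which also matters later when the model is cross-validated.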

## SVR Model Tuning
* We will optimize the penalty parameter `C`.

In [12]:
svr_model = SVR(kernel='linear')

In [13]:
svr_model

In [14]:
svr_params= {'C': [0.1,0.5,1,3]}

In [17]:
svr_cv_model= GridSearchCV(svr_model,svr_params, cv=10, verbose=2, n_jobs=-1).fit(X_train,y_train)

Fitting 10 folds for each of 4 candidates, totalling 40 fits


In [18]:
svr_cv_model.best_params_

{'C': 0.1}
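
Rather than typing the selected value by hand, the fitted search object can supply it. With the default `refit=True`, `GridSearchCV` refits the estimator on the whole training set using the best parameters and exposes it as `best_estimator_`. The sketch below shows this on synthetic data; `X_demo`, `search`, and `tuned_model` are hypothetical names, and `cv=5` is chosen only to keep the example quick.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic data (an assumption, not the Hitters data)
X_demo, y_demo = make_regression(n_samples=120, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

param_grid = {"C": [0.1, 0.5, 1, 3]}

# Default scoring for a regressor is its score method, i.e. R^2
search = GridSearchCV(SVR(kernel="linear"), param_grid, cv=5).fit(X_tr, y_tr)

best_c = search.best_params_["C"]
tuned_model = search.best_estimator_  # already refitted with the best C

demo_pred = tuned_model.predict(X_te)
print(best_c)
```

Reading the parameter from `best_params_` or using `best_estimator_` directly avoids a mismatch between the value the search selected and the value used in the final model.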

In [19]:
svr_cv_model

In [20]:
svr_tuned = SVR(kernel='linear', C=0.5).fit(X_train,y_train)  # note: best_params_ above reported C=0.1, while this cell fits with C=0.5

In [21]:
y_pred = svr_tuned.predict(X_test)

In [23]:
np.sqrt(mean_squared_error(y_test,y_pred))  # the untuned test RMSE was about 370

367.98747616655294