# Support Vector Machine (SVM) Algorithm

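Support Vector Machine (SVM) algorithms can be used to solve both classification and regression problems. For classification, an SVM finds the hyperplane that separates the classes with the largest possible margin. For regression (SVR), it fits a function that is as flat as possible while ignoring errors smaller than a tolerance epsilon and penalizing larger ones. When the data are not linearly separable, or the relationship between features and target is non-linear, SVM uses kernel functions such as polynomial, radial basis function (RBF), and sigmoid to implicitly map the data into a higher-dimensional feature space, where a separating hyperplane (or a linear regression function) can be found. This notebook uses SVR with an RBF kernel to predict house sale prices.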
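As a minimal sketch of what the RBF kernel computes, the snippet below scores the similarity of two rows as exp(-gamma * ||x - z||^2), the formula scikit-learn documents for `kernel='rbf'`, and checks a hand-written version against `sklearn.metrics.pairwise.rbf_kernel`. The vectors are made-up examples. With a very small gamma, such as the 1e-8 used later in this notebook, the kernel value stays close to 1 unless squared distances between rows are on the order of 1/gamma.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2) for two 1-D vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


# Hypothetical rows; their squared Euclidean distance is 1 + 4 + 0 = 5
x = np.array([1.0, 2.0, 3.0])
z = np.array([2.0, 0.0, 3.0])

results = {}
for gamma in (1e-8, 0.1, 1.0):
    ours = rbf(x, z, gamma)
    # scikit-learn expects 2-D arrays of shape (n_samples, n_features)
    ref = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]
    results[gamma] = (ours, ref)
    print(f"gamma={gamma:g}: k(x, z)={ours:.8f} (sklearn: {ref:.8f})")
```

Small gamma makes the kernel wide, so distant rows still look similar and the fitted function is smooth; large gamma makes it narrow and the fit more local.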

In [6]:
import os
import numpy as np 
import pandas as pd
from sklearn.svm import SVR
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

In [11]:
print(os.getcwd())
print("")

train = pd.read_csv("../data/train_after_feature_engineering.csv")
test = pd.read_csv("../data/test_after_feature_engineering.csv")

print('The train data has {0} rows and {1} columns'.format(train.shape[0], train.shape[1]))
print('The test data has {0} rows and {1} columns'.format(test.shape[0], test.shape[1]))


/home/mcheruvu/notebook/code

The train data has 1460 rows and 307 columns
The test data has 1459 rows and 306 columns


In [13]:
np.random.seed(1234)

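# RBF-kernel SVR: C weights the penalty on errors larger than epsilon,
# gamma sets the RBF kernel width (smaller gamma = smoother fit)
_svm_algo = SVR(kernel='rbf', C=1e3, gamma=1e-8)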
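The values C=1e3 and gamma=1e-8 are fixed by hand. One common way to choose them is a cross-validated grid search. The sketch below is hedged: it runs on synthetic data from `make_regression` rather than the house-price features, the candidate grid is illustrative, and it standardizes features first on the assumption that the engineered columns may not share a common scale (the notebook does not say).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the engineered features (hypothetical data)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1234)

# Standardize so that a single gamma is meaningful across all columns
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Illustrative candidates; make_pipeline names the SVR step "svr"
param_grid = {
    "svr__C": [1.0, 10.0, 100.0, 1000.0],
    "svr__gamma": [1e-3, 1e-2, 1e-1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

# best_score_ is the mean negative MSE across folds; convert it to an RMSE
best_rmse = np.sqrt(-search.best_score_)
print("best params:", search.best_params_)
print("cross-validated RMSE:", best_rmse)
```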

# Fit the Model

In [14]:
target_vector = train["SalePrice"]
target_vector = np.log1p(target_vector)  # log(1 + SalePrice)

train.drop(['SalePrice'], axis=1, inplace=True)
            
_svm_algo.fit(train, target_vector)    

SVR(C=1000.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=1e-08,
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
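The fitted model reports `epsilon=0.1`. SVR minimizes 1/2 ||w||^2 + C * sum(max(0, |y - f(x)| - epsilon)), so residuals smaller than epsilon cost nothing. Because the target here is log1p(SalePrice), an epsilon of 0.1 means errors within roughly 10% of the price are not penalized. A small sketch of that per-sample loss, using made-up log-price values:

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: max(0, |y - f(x)| - epsilon)."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


# Hypothetical log-scale targets and predictions
y_true = np.array([12.00, 12.20, 11.90, 12.50])
y_pred = np.array([12.05, 12.45, 11.90, 12.30])

# Residuals of 0.05 and 0.0 fall inside the epsilon tube and cost nothing
print(epsilon_insensitive_loss(y_true, y_pred))
```

Increasing C makes violations of the tube more expensive relative to the flatness term, which tends to fit the training data more closely.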

# Evaluate on the Training Set and Predict Test Sale Prices

In [15]:
y_train = target_vector
y_train_pred = _svm_algo.predict(train)
    
rmse_train = np.sqrt(mean_squared_error(y_train,y_train_pred))

print("SVM score on training set: ", rmse_train)

y_test_pred = _svm_algo.predict(test)

print(y_test_pred[5:])

('SVM score on training set: ', 0.20692414643056628)
[ 12.15006343  12.23559836  12.17325143 ...,  12.14982349  12.3523352
  12.32217342]
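The RMSE above is measured on the same rows the model was fitted on, so it is likely to be optimistic. A hedged sketch of comparing in-sample and held-out RMSE with `train_test_split` follows; the data are synthetic and the C and gamma values are illustrative, not the ones used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Hypothetical data standing in for the engineered training set
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=1234)

# Hold out 25% of the rows; the model never sees them during fitting
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    X, y, test_size=0.25, random_state=1234
)

model = SVR(kernel="rbf", C=100.0, gamma=0.1)
model.fit(X_fit, y_fit)

rmse_fit = np.sqrt(mean_squared_error(y_fit, model.predict(X_fit)))
rmse_holdout = np.sqrt(mean_squared_error(y_holdout, model.predict(X_holdout)))
print("RMSE on rows used for fitting:", rmse_fit)
print("RMSE on held-out rows:", rmse_holdout)
```

The held-out number is the better estimate of how the model will do on the test file.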


# Save Predictions

In [16]:
# Invert the log1p transform to get sale prices on the original scale
df_predict = pd.DataFrame({'Id': test["Id"], 'SalePrice': np.expm1(y_test_pred)})

print(df_predict.head())

df_predict.to_csv('../data/kaggle_python_svm.csv', header=True, index=False)

print('...file is saved')

     Id      SalePrice
0  1461  128717.626380
1  1462  435686.036398
2  1463  193445.586162
3  1464  205788.921232
4  1465  167170.662544
...file is saved
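The model predicts log1p(SalePrice), so predictions have to be mapped back to the original price scale before saving. A minimal sketch with made-up prices, showing that `np.expm1(x)` and `np.exp(x) - 1.0` both invert `np.log1p`:

```python
import numpy as np

# Hypothetical sale prices
prices = np.array([129000.0, 435000.0, 193000.0])

log_target = np.log1p(prices)               # log(1 + price), as in the training cell
recovered = np.expm1(log_target)            # exp(value) - 1, the exact inverse
also_recovered = np.exp(log_target) - 1.0   # equivalent at this magnitude

print(np.allclose(recovered, prices), np.allclose(also_recovered, prices))
```

`np.expm1` is the numerically safer choice when values are close to zero; for log-prices around 12 the two forms agree to floating-point precision.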
