#Objective: To understand support vector machines for multi-class classification and regression problems.

##Multiclass classification dataset:
###This is the Glass Identification Data Set from UCI. It contains 10 attributes, including the Id number. The response is the glass type (a discrete variable with 7 possible values).
###Attribute Information:
1.	Id number: 1 to 214 (removed from CSV file)
2.	RI: refractive index
3.	Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10)
4.	Mg: Magnesium
5.	Al: Aluminum
6.	Si: Silicon
7.	K: Potassium
8.	Ca: Calcium
9.	Ba: Barium
10.	Fe: Iron
### Target class
Type of glass: (class attribute)
-- 1 building_windows_float_processed
-- 2 building_windows_non_float_processed
-- 3 vehicle_windows_float_processed
-- 4 vehicle_windows_non_float_processed (none in this database)
-- 5 containers
-- 6 tableware
-- 7 headlamps
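
For readable reports, the integer class codes can be mapped to their names. The following is a minimal sketch, assuming the `Type` column holds the codes listed above; the `GLASS_TYPES` dictionary and the toy `types` Series are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Hypothetical lookup built from the class list above.
GLASS_TYPES = {
    1: "building_windows_float_processed",
    2: "building_windows_non_float_processed",
    3: "vehicle_windows_float_processed",
    4: "vehicle_windows_non_float_processed",
    5: "containers",
    6: "tableware",
    7: "headlamps",
}

# Toy stand-in for df['Type']; the real column comes from glass.csv.
types = pd.Series([1, 1, 2, 7, 5, 2, 2], name="Type")

# Count samples per class code, then label the index with class names.
counts = types.value_counts().sort_index().rename(index=GLASS_TYPES)
print(counts)
```

Checking the per-class counts early is useful here because some glass types have very few samples, which affects precision and recall later on.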

##Regression dataset:
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.


###Data fields
1.	MSSubClass: The building class
2.	MSZoning: The general zoning classification
3.	LotFrontage: Linear feet of street connected to property
4.	LotArea: Lot size in square feet
5.	Street: Type of road access
6.	Alley: Type of alley access
7.	LotShape: General shape of property
8.	LandContour: Flatness of the property
9.	Utilities: Type of utilities available
10.	LotConfig: Lot configuration
11.	LandSlope: Slope of property
12.	Neighborhood: Physical locations within Ames city limits
13.	Condition1: Proximity to main road or railroad
14.	Condition2: Proximity to main road or railroad (if a second is present)
15.	BldgType: Type of dwelling
16.	HouseStyle: Style of dwelling
17.	OverallQual: Overall material and finish quality
18.	OverallCond: Overall condition rating
19.	YearBuilt: Original construction date
20.	YearRemodAdd: Remodel date
21.	RoofStyle: Type of roof
22.	RoofMatl: Roof material
23.	Exterior1st: Exterior covering on house
24.	Exterior2nd: Exterior covering on house (if more than one material)
25.	MasVnrType: Masonry veneer type
26.	MasVnrArea: Masonry veneer area in square feet
27.	ExterQual: Exterior material quality
28.	ExterCond: Present condition of the material on the exterior
29.	Foundation: Type of foundation
30.	BsmtQual: Height of the basement
31.	BsmtCond: General condition of the basement
32.	BsmtExposure: Walkout or garden level basement walls
33.	BsmtFinType1: Quality of basement finished area
34.	BsmtFinSF1: Type 1 finished square feet
35.	BsmtFinType2: Quality of second finished area (if present)
36.	BsmtFinSF2: Type 2 finished square feet
37.	BsmtUnfSF: Unfinished square feet of basement area
38.	TotalBsmtSF: Total square feet of basement area
39.	Heating: Type of heating
40.	HeatingQC: Heating quality and condition
41.	CentralAir: Central air conditioning
42.	Electrical: Electrical system
43.	1stFlrSF: First Floor square feet
44.	2ndFlrSF: Second floor square feet
45.	LowQualFinSF: Low quality finished square feet (all floors)
46.	GrLivArea: Above grade (ground) living area square feet
47.	BsmtFullBath: Basement full bathrooms
48.	BsmtHalfBath: Basement half bathrooms
49.	FullBath: Full bathrooms above grade
50.	HalfBath: Half baths above grade
51.	Bedroom: Number of bedrooms above basement level
52.	Kitchen: Number of kitchens
53.	KitchenQual: Kitchen quality
54.	TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
55.	Functional: Home functionality rating
56.	Fireplaces: Number of fireplaces
57.	FireplaceQu: Fireplace quality
58.	GarageType: Garage location
59.	GarageYrBlt: Year garage was built
60.	GarageFinish: Interior finish of the garage
61.	GarageCars: Size of garage in car capacity
62.	GarageArea: Size of garage in square feet
63.	GarageQual: Garage quality
64.	GarageCond: Garage condition
65.	PavedDrive: Paved driveway
66.	WoodDeckSF: Wood deck area in square feet
67.	OpenPorchSF: Open porch area in square feet
68.	EnclosedPorch: Enclosed porch area in square feet
69.	3SsnPorch: Three season porch area in square feet
70.	ScreenPorch: Screen porch area in square feet
71.	PoolArea: Pool area in square feet
72.	PoolQC: Pool quality
73.	Fence: Fence quality
74.	MiscFeature: Miscellaneous feature not covered in other categories
75.	MiscVal: $Value of miscellaneous feature
76.	MoSold: Month Sold
77.	YrSold: Year Sold
78.	SaleType: Type of sale
79.	SaleCondition: Condition of sale

###Target:
SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.


Source: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data


# Task 1: Multi-class Support vector machine (SVM)
1.	Load multi-class dataset
2.	Apply pre-processing techniques
3.	Divide dataset into training and testing sets (fraction of your choice)
4.	Build multi-class SVM model (use sklearn)
5.	Evaluate precision and recall
6.	Play with hyper-parameters and find best combination


# Task 2: Support vector regression (SVR)
1.	Load regression dataset
2.	Apply pre-processing techniques
3.	Divide dataset into training and testing sets (fraction of your choice)
4.	Build SVR model (use sklearn)
5.	Evaluate root mean square error
6.	Play with hyper-parameters and find best combination


# Task 3: Play with various SVM kernels such as polynomial, rbf, sigmoid (tanh), etc.

###For more details: 
https://scikit-learn.org/stable/modules/svm.html
https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html
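
For reference, the kernels named in Task 3 can be written as follows. This summary follows the definitions in the scikit-learn SVM guide linked above, where $\gamma$ corresponds to the `gamma` parameter, $r$ to `coef0`, and $d$ to `degree`:

$$
\begin{aligned}
\text{linear:} \quad & k(x, x') = \langle x, x' \rangle \\
\text{polynomial:} \quad & k(x, x') = (\gamma \langle x, x' \rangle + r)^{d} \\
\text{rbf:} \quad & k(x, x') = \exp(-\gamma \lVert x - x' \rVert^{2}) \\
\text{sigmoid:} \quad & k(x, x') = \tanh(\gamma \langle x, x' \rangle + r)
\end{aligned}
$$

The sigmoid kernel is the one sometimes referred to as the tanh kernel.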




## Task 1: Multi-class Support vector machine (SVM) 

In [1]:
# Load the libraries
from sklearn.svm import SVC,SVR
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score,classification_report,mean_squared_error
from sklearn.preprocessing import LabelEncoder,MinMaxScaler
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

In [2]:
# Load the dataset 
df=pd.read_csv('glass.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   RI      214 non-null    float64
 1   Na      214 non-null    float64
 2   Mg      214 non-null    float64
 3   Al      214 non-null    float64
 4   Si      214 non-null    float64
 5   K       214 non-null    float64
 6   Ca      214 non-null    float64
 7   Ba      214 non-null    float64
 8   Fe      214 non-null    float64
 9   Type    214 non-null    int64  
dtypes: float64(9), int64(1)
memory usage: 16.8 KB


In [4]:
import dtale
dtale.show(df)





In [7]:
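# Preprocessing
# Encoding categorical variables (if any): none needed, all features are numeric
# Filling missing values (if any): none needed, every column is non-null
# Feature scaling: rescale features to the range [0, 1]
# Note: 0:8 selects RI through Ba, so Fe (column 8) stays unscaled; use 0:9 to include it
scaler=MinMaxScaler()
df.iloc[:,0:8]=scaler.fit_transform(df.iloc[:,0:8])
df.head()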

|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|----|----|----|----|----|---|----|----|----|------|
| 0 | 0.432836 | 0.437594 | 1.0 | 0.252336 | 0.351786 | 0.009662 | 0.30855 | 0.0 | 0.0 | 1 |
| 1 | 0.283582 | 0.475188 | 0.801782 | 0.333333 | 0.521429 | 0.077295 | 0.223048 | 0.0 | 0.0 | 1 |
| 2 | 0.220808 | 0.421053 | 0.790646 | 0.389408 | 0.567857 | 0.062802 | 0.218401 | 0.0 | 0.0 | 1 |
| 3 | 0.285777 | 0.372932 | 0.821826 | 0.311526 | 0.5 | 0.091787 | 0.259294 | 0.0 | 0.0 | 1 |
| 4 | 0.275241 | 0.381955 | 0.806236 | 0.29595 | 0.583929 | 0.088567 | 0.245353 | 0.0 | 0.0 | 1 |


In [10]:
X_train, X_test, y_train, y_test = train_test_split(df.drop('Type',axis=1), df['Type'], test_size=0.3, random_state=42)
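
The split above is applied to data that was already scaled using all 214 rows, so the scaler has seen the test rows. One possible alternative, sketched below under stated assumptions, is to split first and let a `Pipeline` fit the scaler on the training rows only. The data here is a synthetic stand-in generated with `make_classification` (not the real glass.csv), and `stratify=y` is an optional addition that keeps class proportions similar in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the glass data: 214 rows, 9 features, 3 classes.
X, y = make_classification(n_samples=214, n_features=9, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)

# Split before any scaling; stratify=y preserves class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# The pipeline fits MinMaxScaler on the training rows only, then fits the SVC.
clf = make_pipeline(MinMaxScaler(), SVC())
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```

With this arrangement, `clf.predict` and `clf.score` reuse the training-set minimum and maximum when transforming the test rows.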

In [11]:
# Build SVM model 
model=SVC()
model.fit(X_train,y_train)

SVC()

In [12]:
# Evaluate the built model on the training and test datasets
print('Training Accuracy -',accuracy_score(y_train,model.predict(X_train)))
print('Testing Accuracy -',accuracy_score(y_test,model.predict(X_test)))

Training Accuracy - 0.7114093959731543
Testing Accuracy - 0.6615384615384615


In [13]:
# Evaluate precision and recall on the test dataset
print(classification_report(y_test,model.predict(X_test)))

              precision    recall  f1-score   support

           1       0.67      0.84      0.74        19
           2       0.57      0.74      0.64        23
           3       0.00      0.00      0.00         4
           5       1.00      0.17      0.29         6
           6       0.00      0.00      0.00         3
           7       0.90      0.90      0.90        10

    accuracy                           0.66        65
   macro avg       0.52      0.44      0.43        65
weighted avg       0.63      0.66      0.61        65



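
Step 6 of Task 1 asks for a hyper-parameter search, and the report above shows a precision of 0.00 for classes 3 and 6, which the default model never predicts. The sketch below shows one way to search over `C`, `gamma`, and `kernel` with `GridSearchCV`. It runs on synthetic data from `make_classification` as a stand-in for glass.csv, and the grid values are arbitrary starting points rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the glass data (assumption, not the real CSV).
X, y = make_classification(n_samples=214, n_features=9, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

pipe = Pipeline([("scale", MinMaxScaler()), ("svc", SVC())])

# Illustrative grid; the values are arbitrary starting points.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.1, 1],
    "svc__kernel": ["rbf", "poly"],
}

# Macro-averaged F1 weights every class equally, including rare ones.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)

# zero_division=0 reports 0 for classes that are never predicted
# instead of emitting an UndefinedMetricWarning.
print(classification_report(y_test, search.predict(X_test), zero_division=0))
```

Scoring with `f1_macro` instead of accuracy is a design choice for imbalanced classes: a model that ignores the small classes is penalized during the search.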


##Task 2: Implement support vector regression (SVR)


In [4]:
# Load training and testing datasets
train=pd.read_csv('train.csv')
test=pd.read_csv('test.csv')
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
full=pd.concat([train,test])
full.drop('Id',axis=1,inplace=True)

In [5]:
# Apply pre-processing techniques
# Apply feature selection techniques of your choice to reduce the feature set
# Label-encode categorical columns (missing values become the string 'nan')
objcols=full.select_dtypes(include='object').columns
full[objcols]=full[objcols].astype('str')
enc=LabelEncoder()
for i in objcols:
    full[i]=enc.fit_transform(full[i])
# Mean-impute the remaining columns that contain missing values
# Caution: this includes SalePrice, which is missing for every row from test.csv
nullcols=full.columns[full.isnull().any()].tolist()
imp=SimpleImputer()
full[nullcols]=imp.fit_transform(full[nullcols])
scaler=MinMaxScaler()
X=full.drop('SalePrice',axis=1)
# Caution: the scaled array X is not written back, so `full` stays unscaled
X=scaler.fit_transform(X)
full.head()


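|   | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | 3 | 65.0 | 8450 | 1 | 2 | 3 | 3 | 0 | 4 | ... | 0 | 3 | 4 | 4 | 0 | 2 | 2008 | 8 | 4 | 208500.0 |
| 1 | 20 | 3 | 80.0 | 9600 | 1 | 2 | 3 | 3 | 0 | 2 | ... | 0 | 3 | 4 | 4 | 0 | 5 | 2007 | 8 | 4 | 181500.0 |
| 2 | 60 | 3 | 68.0 | 11250 | 1 | 2 | 0 | 3 | 0 | 4 | ... | 0 | 3 | 4 | 4 | 0 | 9 | 2008 | 8 | 4 | 223500.0 |
| 3 | 70 | 3 | 60.0 | 9550 | 1 | 2 | 0 | 3 | 0 | 0 | ... | 0 | 3 | 4 | 4 | 0 | 2 | 2006 | 8 | 0 | 140000.0 |
| 4 | 60 | 3 | 84.0 | 14260 | 1 | 2 | 0 | 3 | 0 | 2 | ... | 0 | 3 | 4 | 4 | 0 | 12 | 2008 | 8 | 4 | 250000.0 |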


In [6]:
X_train, X_test, y_train, y_test = train_test_split(full.drop('SalePrice',axis=1), full['SalePrice'], test_size=0.3, random_state=42)

In [7]:
# Train SVR model
model=SVR()
model.fit(X_train,y_train)
print("Trained")



Trained


In [8]:
# Evaluate training and testing root mean square error
print('Training RMSE -',(mean_squared_error(y_train,model.predict(X_train)))**(0.5))
print('Testing RMSE -',(mean_squared_error(y_test,model.predict(X_test)))**(0.5))

Training RMSE - 54224.110865502094
Testing RMSE - 60448.57044189207
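
The errors above are in dollars. With the default `C=1.0` and `epsilon=0.1`, an `SVR` fitted to a target in the hundreds of thousands tends to produce nearly constant predictions, so one thing worth trying is to standardize the target during fitting. The sketch below compares the two setups on synthetic data from `make_regression` (a stand-in for the house data, shifted to a price-like scale); `TransformedTargetRegressor` is used here as one possible approach and is not part of the original notebook.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the house data, shifted to a price-like scale.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
y = 180_000 + 500 * y

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

def rmse(model):
    """Fit on the training split and return the test RMSE."""
    model.fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Features scaled, target left in raw units.
plain = make_pipeline(MinMaxScaler(), SVR())

# Same model, but the target is standardized for fitting and
# predictions are mapped back to the original units.
scaled_target = TransformedTargetRegressor(
    regressor=make_pipeline(MinMaxScaler(), SVR()),
    transformer=StandardScaler(),
)

rmse_plain = rmse(plain)
rmse_scaled = rmse(scaled_target)
print("RMSE, raw target:   ", rmse_plain)
print("RMSE, scaled target:", rmse_scaled)
```

Raising `C` by several orders of magnitude is another way to get a similar effect; either option can be included in the hyper-parameter search from step 6.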



##Task 3: Play with various SVM kernels such as polynomial, rbf, sigmoid (tanh), etc.


In [10]:
model=SVR(kernel='poly')
model.fit(X_train,y_train)
print('Training RMSE -',(mean_squared_error(y_train,model.predict(X_train)))**(0.5))
print('Testing RMSE -',(mean_squared_error(y_test,model.predict(X_test)))**(0.5))

Training RMSE - 54120.00486564621
Testing RMSE - 60104.22059250853


In [11]:
model=SVR(kernel='rbf')
model.fit(X_train,y_train)
print('Training RMSE -',(mean_squared_error(y_train,model.predict(X_train)))**(0.5))
print('Testing RMSE -',(mean_squared_error(y_test,model.predict(X_test)))**(0.5))

Training RMSE - 54224.110865502094
Testing RMSE - 60448.57044189207


In [12]:
model=SVR(kernel='sigmoid')
model.fit(X_train,y_train)
print('Training RMSE -',(mean_squared_error(y_train,model.predict(X_train)))**(0.5))
print('Testing RMSE -',(mean_squared_error(y_test,model.predict(X_test)))**(0.5))

Training RMSE - 54222.51545124184
Testing RMSE - 60447.11169364456
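
The three kernels above give almost identical errors on a single train/test split. A loop with cross-validation makes such comparisons less dependent on one split. The following is a minimal sketch on synthetic data from `make_regression` (an assumption, not the house data), with features standardized inside a pipeline; the `results` dictionary is a hypothetical name used only here.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data as a stand-in for the house features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    # scikit-learn returns negated RMSE so that higher is always better.
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    results[kernel] = -scores.mean()

# Print kernels from lowest to highest mean cross-validated RMSE.
for kernel, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{kernel:8s} RMSE = {rmse:.2f}")
```

Kernel-specific parameters such as `degree`, `gamma`, and `coef0` are left at their defaults here; a fair comparison would tune them per kernel.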
