### **AQI: Support Vector Regressor Method**

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


### **Import necessary libraries**

In [3]:
import pickle
import numpy as np
import pandas as pd


### **Loading cleaned data**

Before applying linear regression, I did some feature engineering, such as handling outliers and null values and running a correlation analysis. The cleaned data was saved and is loaded here.

In [4]:
with open('/content/drive/MyDrive/Google_colab_project/df.pkl','rb') as file:
    df= pickle.load(file)

In [5]:
#Read value
df.head()

Unnamed: 0,T,TM,Tm,SLP,H,VV,V,VM,PM 2.5
0,10.8,16.3,5.2,1017.6,93.0,0.5,4.3,9.4,219.720833
1,10.8,16.3,5.2,1018.5,87.0,0.6,4.4,11.1,182.1875
2,10.8,16.3,5.2,1019.4,82.0,0.6,4.8,11.1,154.0375
3,10.8,16.3,5.2,1018.7,72.0,0.8,8.1,20.6,223.208333
4,12.4,20.9,5.2,1017.3,61.0,1.3,8.7,22.2,200.645833


In [6]:
# Separating dependent and independent variables
x=df.iloc[:,:-1]
y=df.iloc[:,-1]

In [7]:
#Train-Test Split
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=100)

### **SVR  Implementation without Hyperparameter Tuning**

In [8]:
from sklearn import svm
from sklearn import metrics
svr= svm.SVR(kernel='rbf')
svr.fit(x_train, y_train)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [9]:
pred_svr=svr.predict(x_test)

In [10]:
print('MAE:', metrics.mean_absolute_error(y_test,pred_svr))
print('MSE:', metrics.mean_squared_error(y_test, pred_svr))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test,pred_svr)))

MAE: 67.78222613275365
MSE: 8801.746850919259
RMSE: 93.81762548113899


Without any hyperparameter tuning, the support vector regressor gives an RMSE of around 94, whereas the linear regressor gave around 57. So the next step is to do some hyperparameter tuning to find better parameters.
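
One thing that may be worth checking alongside tuning: the RBF kernel is distance-based, so features on very different scales (here, for example, `SLP` is near 1018 while `VV` is near 1) can dominate the kernel. The sketch below is not part of the original notebook. It uses synthetic stand-in data (an assumption) and shows how a `StandardScaler` could be chained with `SVR` in a pipeline so that the scaling is learned from the training split only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the AQI dataset).
# The three columns deliberately have very different scales.
rng = np.random.default_rng(100)
X_demo = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])
y_demo = 3.0 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=100
)

# The scaler is fitted on the training split, then applied to the test split.
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
scaled_svr.fit(X_tr, y_tr)

pred_demo = scaled_svr.predict(X_te)
rmse_scaled = np.sqrt(mean_squared_error(y_te, pred_demo))
print('RMSE (scaled RBF SVR, synthetic data):', rmse_scaled)
```

Whether scaling helps on the AQI data would need to be checked by running the same pipeline on `x_train` and `x_test`.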

### **SVR-Linear Kernel Implementation with Hyperparameter Tuning**

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
svr_tune = svm.SVR(kernel='linear')
param_grid_linear= {'C': [4,5,6,7,8,9,10,11,12,13,14,15]} 
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=100)
svr_linear= GridSearchCV(svr_tune, param_grid_linear,cv=cv, refit = True, verbose = 3)
svr_linear.fit(x_train, y_train)

### **Model Evaluation**

In [15]:
from sklearn import metrics
# Predict on the held-out test set with the best linear-kernel model
linear_prediction=svr_linear.predict(x_test)
print(svr_linear.best_params_)
print(svr_linear.best_score_)
print('MAE:', metrics.mean_absolute_error(y_test, linear_prediction))
print('MSE:', metrics.mean_squared_error(y_test, linear_prediction))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, linear_prediction)))

{'C': 6}
0.5312529379125495
MAE: 44.99114683033755
MSE: 3709.9802172266427
RMSE: 60.909606937055855
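
A note on reading this output: `best_score_` is the mean cross-validated score of the best parameter setting. Because no `scoring` argument was passed, `GridSearchCV` falls back to the estimator's own `score` method, which for `SVR` is R². So the value of about 0.53 is an R² and not an error. If a cross-validated RMSE is wanted for comparison with the test RMSE, one option is to pass `scoring='neg_root_mean_squared_error'`. The sketch below uses synthetic stand-in data and a smaller grid and fold count than the notebook (both are assumptions, chosen to keep it fast).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the AQI dataset).
rng = np.random.default_rng(100)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=120)

cv_demo = RepeatedKFold(n_splits=5, n_repeats=2, random_state=100)
search = GridSearchCV(
    SVR(kernel='linear'),
    param_grid={'C': [1, 5, 10]},
    cv=cv_demo,
    scoring='neg_root_mean_squared_error',  # scores are negated RMSE values
    refit=True,
)
search.fit(X_demo, y_demo)

# best_score_ is negative with this scorer, so flip the sign to read it as an RMSE.
cv_rmse = -search.best_score_
print(search.best_params_, 'CV RMSE:', cv_rmse)
```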


In [21]:
# Save trained model
import pickle
# Open the target file in binary write mode; the with-block closes it afterwards
with open('AQI_SVR_liner.pkl', 'wb') as file:
    # Dump the fitted grid-search object to that file
    pickle.dump(svr_linear, file)
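
To reuse a saved model later, it can be read back with `pickle.load`. The sketch below is a self-contained round trip on a small stand-in model written to a temporary directory (the data, the model, and the file name are assumptions for illustration). In the notebook the same pattern would apply to `AQI_SVR_liner.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVR

# Small stand-in model (an assumption; the notebook pickles the fitted GridSearchCV object).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo.sum(axis=1)
model = SVR(kernel='linear').fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'demo_svr.pkl')  # hypothetical file name

    # Write the model; the context manager closes the file afterwards.
    with open(model_path, 'wb') as file:
        pickle.dump(model, file)

    # Read it back from disk.
    with open(model_path, 'rb') as file:
        restored = pickle.load(file)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print('Predictions match after reload:', same_predictions)
```

Pickled scikit-learn models are best loaded with the same library version that wrote them, and pickle files should only be loaded from trusted sources.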

So we now get an RMSE of around 61 with hyperparameter tuning, compared with around 94 without it. Note that the kernel also changed from RBF to linear, so the improvement does not come from tuning `C` alone. Still, the choice of hyperparameters clearly affects the RMSE. Next, we look at how the SVR polynomial kernel performs on this data.

### **SVR-Polynomial Kernel Implementation**

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
svr_tune = svm.SVR(kernel='poly')
param_grid_poly=  {'degree': [2, 5,8,10,12], 'C': [10000,20000,30000,40000], 'coef0': [0,0.5,0.75,1]} 
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=100)
svr_poly= GridSearchCV(svr_tune, param_grid_poly,cv=cv, refit = True, verbose = 3)
svr_poly.fit(x_train, y_train)

In [17]:
from sklearn import metrics
poly_prediction=svr_poly.predict(x_test)
print(svr_poly.best_params_)
print(svr_poly.best_score_)
print('MAE:', metrics.mean_absolute_error(y_test, poly_prediction))
print('MSE:', metrics.mean_squared_error(y_test, poly_prediction))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, poly_prediction)))

{'C': 30000, 'coef0': 0.5, 'degree': 12}
0.5709729449225315
MAE: 40.77491768583136
MSE: 3277.2711086212435
RMSE: 57.24745504056266


In [20]:
# Save trained model
import pickle
# Open the target file in binary write mode; the with-block closes it afterwards
with open('AQI_SVR_ploy.pkl', 'wb') as file:
    # Dump the fitted grid-search object to that file
    pickle.dump(svr_poly, file)

### **Conclusion**

For the SVR polynomial kernel we got an RMSE of around 57.2, while the SVR linear kernel gave around 61, both with hyperparameter tuning. So the right hyperparameter tuning does affect the RMSE. In the next step, we will see how the Extra Tree Regressor works with this data.
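
For reference, the test-set RMSE values reported in this notebook can be put side by side. The short sketch below only does arithmetic on the numbers printed in the cell outputs; the dictionary keys are labels introduced here for illustration.

```python
# Test-set RMSE values copied from the cell outputs in this notebook.
rmse_by_model = {
    'SVR rbf (default parameters)': 93.81762548113899,
    'SVR linear (tuned C)': 60.909606937055855,
    'SVR poly (tuned degree, C, coef0)': 57.24745504056266,
}

baseline = rmse_by_model['SVR rbf (default parameters)']
for name, rmse in rmse_by_model.items():
    # Relative reduction in RMSE compared with the untuned RBF model.
    reduction_pct = 100.0 * (baseline - rmse) / baseline
    print(f'{name:35s} RMSE={rmse:6.2f}  reduction vs. baseline={reduction_pct:5.1f}%')

# Model label with the lowest test-set RMSE.
best_model = min(rmse_by_model, key=rmse_by_model.get)
print('Lowest RMSE:', best_model)
```

Keep in mind that the RBF figure comes from default parameters, so this table compares two tuned models against one untuned model.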