## Programming Assignment

# II. Data

You may choose one of the following three datasets to work on. Introduce your data and visualize them. Describe your observations about the data. 
1. https://archive.ics.uci.edu/ml/datasets/covertype
2. https://archive.ics.uci.edu/ml/datasets/cnae-9
3. https://archive.ics.uci.edu/ml/datasets/Activity+recognition+using+wearable+physiological+measurements (DL Link - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6960825/bin/sensors-19-05524-s001.zip )


# III. Method

In the earlier assignment you had to implement Least Mean Square Classifier, Fisher Linear Discriminant, Perceptron,logistic regression, and Neural Network. In this assignment your tasks are the followings:

1. Implement both SVM and Kernel SVM and report the classification performance of the classifiers on the original dataset.
2. Use PCA to reduce the feature representation to a more compact version that may be of size: 10%, 15%, 20%, 25%, and 30% of the originial dataset dimension.
3. Compare the performance of the classifier using the PCA reduced descriptor and the original feature descriptor.

Do not forget to explain your implementation. 

The explanation of your codes should not be the comments in a code cell. 

Each implementation will be followed by a separate markdown cell that should include
 - your implementation description
 - Review of the classification model implemented.
 - Plots or metrics to show the performance of the algorithm

### Method

In [24]:
##your method implementation goes here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing

## Loading, preprocessing, splitting and visualization of dataset

In [25]:
covertype = pd.read_csv('covtype.data') #Loading the data using the pandas dataframe.
covertype.head()

Unnamed: 0,2596,51,3,258,0,510,221,232,148,6279,...,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42,5
0,2590,56,2,212,-6,390,220,235,151,6225,...,0,0,0,0,0,0,0,0,0,5
1,2804,139,9,268,65,3180,234,238,135,6121,...,0,0,0,0,0,0,0,0,0,2
2,2785,155,18,242,118,3090,238,238,122,6211,...,0,0,0,0,0,0,0,0,0,2
3,2595,45,2,153,-1,391,220,234,150,6172,...,0,0,0,0,0,0,0,0,0,5
4,2579,132,6,300,-15,67,230,237,140,6031,...,0,0,0,0,0,0,0,0,0,2


In [26]:
#feature scaling the data using sklearn library
normalize = covertype.columns[:54]
scaler = preprocessing.MinMaxScaler()
covertype_scaled = scaler.fit_transform(covertype.iloc[:,:54])
covertype_scaled = pd.DataFrame(covertype_scaled, columns=normalize)

covertype_scaled.head()

Unnamed: 0,2596,51,3,258,0,510,221,232,148,6279,...,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42
0,0.365683,0.155556,0.030303,0.151754,0.215762,0.054798,0.866142,0.925197,0.594488,0.867838,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.472736,0.386111,0.136364,0.19184,0.307494,0.446817,0.92126,0.937008,0.531496,0.853339,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.463232,0.430556,0.272727,0.173228,0.375969,0.434172,0.937008,0.937008,0.480315,0.865886,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.368184,0.125,0.030303,0.10952,0.222222,0.054939,0.866142,0.92126,0.590551,0.860449,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.36018,0.366667,0.090909,0.214746,0.204134,0.009414,0.905512,0.933071,0.551181,0.840792,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [27]:
#replacing old data with scaled data
covertype.iloc[:,:54] = covertype_scaled 
covertype.head()

Unnamed: 0,2596,51,3,258,0,510,221,232,148,6279,...,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42,5
0,0.365683,0.155556,0.030303,0.151754,0.215762,0.054798,0.866142,0.925197,0.594488,0.867838,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5
1,0.472736,0.386111,0.136364,0.19184,0.307494,0.446817,0.92126,0.937008,0.531496,0.853339,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
2,0.463232,0.430556,0.272727,0.173228,0.375969,0.434172,0.937008,0.937008,0.480315,0.865886,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
3,0.368184,0.125,0.030303,0.10952,0.222222,0.054939,0.866142,0.92126,0.590551,0.860449,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5
4,0.36018,0.366667,0.090909,0.214746,0.204134,0.009414,0.905512,0.933071,0.551181,0.840792,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2


In [28]:
#splitting dataset into 80% training and 20% testing sets choosing randomly
## splitting the data into independent x and target values y as well as transforming them to the array

from sklearn.model_selection import train_test_split

x = covertype.iloc[:50000,:54].values  #cutting down the data as training was taking too long
y = covertype.iloc[:50000,-1].values   ##getting the target values attached to the

In [6]:
#Splitting the dataset

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state = 0)

print("x_train",type(x_train))
print("x_test", x_test.shape)
print("y_train",y_train.shape)
print("y_test", y_test.shape)

x_train <class 'numpy.ndarray'>
x_test (10000, 54)
y_train (40000,)
y_test (10000,)


## Building a model for SVM without kernel

In [7]:
from sklearn.svm import SVC
model = SVC()

In [8]:
model.fit(x_train, y_train) #fitting the data

SVC()

In [9]:
y_predict = model.predict(x_test) #making model predict the values 

In [10]:
from sklearn.metrics import classification_report 
print(classification_report(y_test,y_predict)) #printing the classification report

              precision    recall  f1-score   support

           1       0.80      0.57      0.67      2059
           2       0.85      0.95      0.89      5765
           3       0.68      0.53      0.60       427
           4       0.76      0.97      0.85       432
           5       0.74      0.57      0.65       476
           6       0.62      0.62      0.62       407
           7       0.92      0.85      0.89       434

    accuracy                           0.82     10000
   macro avg       0.77      0.72      0.74     10000
weighted avg       0.81      0.82      0.81     10000



## Building a model for SVC with Kernel

In [11]:
model_kernel = SVC(kernel='linear')
model_kernel.fit(x_train, y_train)

SVC(kernel='linear')

In [12]:
y_predict_kernel = model.predict(x_test)

In [13]:
from sklearn.metrics import classification_report 
print(classification_report(y_test,y_predict_kernel)) #printing the classification report of model with Kernel

              precision    recall  f1-score   support

           1       0.80      0.57      0.67      2059
           2       0.85      0.95      0.89      5765
           3       0.68      0.53      0.60       427
           4       0.76      0.97      0.85       432
           5       0.74      0.57      0.65       476
           6       0.62      0.62      0.62       407
           7       0.92      0.85      0.89       434

    accuracy                           0.82     10000
   macro avg       0.77      0.72      0.74     10000
weighted avg       0.81      0.82      0.81     10000



## Comparison of SVM model with Kernel and without kernel

I found that the for the smaller dataset like 10000 the Model without kernel showed an efficiency of 76% and with kernel showed an efficiency of around 75%, however when the dataset was increased to 50000 then both showed the better efficiency and accuracy which is around 82%

## Using PCA to reduce the feature representation to a more compact version 

## Applying PCA for 30% of the data

In [14]:
from sklearn.decomposition import PCA #importing library for PCA
pca = PCA(n_components=16) #for 30% of the data

In [15]:
x.shape #intital shape of the training data

(50000, 54)

In [16]:
x_reduced = pca.fit_transform(x)  #applying PCA on the data
x_reduced.shape

(50000, 16)

In [17]:
#now on this feature reduced data applying SVM without Kernel
x_train, x_test, y_train, y_test = train_test_split(x_reduced, y, test_size = 0.20, random_state = 0)
model.fit(x_train, y_train) 
y_predict_30 = model.predict(x_test)
print(classification_report(y_test,y_predict_30)) #printing the classification report

              precision    recall  f1-score   support

           1       0.80      0.57      0.66      2059
           2       0.85      0.94      0.90      5765
           3       0.66      0.55      0.60       427
           4       0.76      0.95      0.85       432
           5       0.69      0.65      0.67       476
           6       0.63      0.61      0.62       407
           7       0.91      0.84      0.87       434

    accuracy                           0.82     10000
   macro avg       0.76      0.73      0.74     10000
weighted avg       0.81      0.82      0.81     10000



In [18]:
#on the feature reduced data applying SVM with kernel
model_kernel.fit(x_train, y_train) 
y_predict_30_kernel = model.predict(x_test)
print(classification_report(y_test,y_predict_30_kernel)) #printing the classification report

              precision    recall  f1-score   support

           1       0.80      0.57      0.66      2059
           2       0.85      0.94      0.90      5765
           3       0.66      0.55      0.60       427
           4       0.76      0.95      0.85       432
           5       0.69      0.65      0.67       476
           6       0.63      0.61      0.62       407
           7       0.91      0.84      0.87       434

    accuracy                           0.82     10000
   macro avg       0.76      0.73      0.74     10000
weighted avg       0.81      0.82      0.81     10000



## Comparing SVC with and without Kernel on 30% data.

I found that the training time reduced a lot after applying the PCA and also since the important features are kept intact thats why the accuracy remains intact which helps the system to become more efficient and use less computation power.

## Applying PCA for 20% of the data

In [29]:
pca_20 = PCA(n_components=11) #for 20% of the data
x.shape #intital shape of the training data

(50000, 54)

In [31]:
x_reduced_20 = pca_20.fit_transform(x)  #applying PCA on the data
x_reduced_20.shape

(50000, 11)

In [32]:
#now on this feature reduced data applying SVM without Kernel
x_train, x_test, y_train, y_test = train_test_split(x_reduced_20, y, test_size = 0.20, random_state = 0)
model.fit(x_train, y_train) 
y_predict_20 = model.predict(x_test)
print(classification_report(y_test,y_predict_20)) #printing the classification report

              precision    recall  f1-score   support

           1       0.78      0.54      0.63      2059
           2       0.85      0.93      0.89      5765
           3       0.68      0.34      0.45       427
           4       0.67      0.93      0.78       432
           5       0.59      0.60      0.59       476
           6       0.55      0.62      0.58       407
           7       0.76      0.84      0.80       434

    accuracy                           0.79     10000
   macro avg       0.70      0.68      0.67     10000
weighted avg       0.79      0.79      0.78     10000



In [33]:
#on the feature reduced data applying SVM with kernel
model_kernel.fit(x_train, y_train) 
y_predict_20_kernel = model.predict(x_test)
print(classification_report(y_test,y_predict_20_kernel)) #printing the classification report

              precision    recall  f1-score   support

           1       0.78      0.54      0.63      2059
           2       0.85      0.93      0.89      5765
           3       0.68      0.34      0.45       427
           4       0.67      0.93      0.78       432
           5       0.59      0.60      0.59       476
           6       0.55      0.62      0.58       407
           7       0.76      0.84      0.80       434

    accuracy                           0.79     10000
   macro avg       0.70      0.68      0.67     10000
weighted avg       0.79      0.79      0.78     10000



For the PCA which reduced the features to 20% the accuracy of the data reduced though the computation power required by the system decreased as it trained the data in a very less time.

## Applying PCA for 10% of the data

In [34]:
pca_10 = PCA(n_components=6) #for 20% of the data
x.shape #intital shape of the training data

(50000, 54)

In [35]:
x_reduced_10 = pca_10.fit_transform(x)  #applying PCA on the data
x_reduced_10.shape

(50000, 6)

In [36]:
#now on this feature reduced data applying SVM without Kernel
x_train, x_test, y_train, y_test = train_test_split(x_reduced_10, y, test_size = 0.20, random_state = 0)
model.fit(x_train, y_train) 
y_predict_10 = model.predict(x_test)
print(classification_report(y_test,y_predict_10)) #printing the classification report

              precision    recall  f1-score   support

           1       0.76      0.53      0.62      2059
           2       0.85      0.93      0.89      5765
           3       0.52      0.34      0.41       427
           4       0.64      0.96      0.77       432
           5       0.52      0.58      0.55       476
           6       0.59      0.48      0.53       407
           7       0.68      0.71      0.69       434

    accuracy                           0.78     10000
   macro avg       0.65      0.65      0.64     10000
weighted avg       0.77      0.78      0.77     10000



In [37]:
#on the feature reduced data applying SVM with kernel
model_kernel.fit(x_train, y_train) 
y_predict_10_kernel = model.predict(x_test)
print(classification_report(y_test,y_predict_10_kernel)) #printing the classification repor

              precision    recall  f1-score   support

           1       0.76      0.53      0.62      2059
           2       0.85      0.93      0.89      5765
           3       0.52      0.34      0.41       427
           4       0.64      0.96      0.77       432
           5       0.52      0.58      0.55       476
           6       0.59      0.48      0.53       407
           7       0.68      0.71      0.69       434

    accuracy                           0.78     10000
   macro avg       0.65      0.65      0.64     10000
weighted avg       0.77      0.78      0.77     10000



For the PCA which reduced the features to 10% the accuracy of the data further reduced though the computation power required by the system decreased even further as it trained the data in a very less time.

# IV. Experiments

Apply the classfiers on the data and discuss the results.
Please describe your codes for experiments. You may have subsections of results and discussions here.
Here follows the list that you consider to include:
- the classification results
- plots of classification results 
- model comparision 
- choice of evaluation metrics
- **Must partition data into training and testing**

# Conclusions

Both the classifier worked really well on the data but the maximum accuracy I can get and that was the saturation point of the accuracy according to me was 82%, before that with the smaller dataset SVC without kernel was performing better that with kernel.

Also after applying PCA and reducing features to 30% the efficiency of the system improved to almost 50% as the system took less time to train, but it kept intact the important features of the data that helped the system to work with the same accuracy.

But as I kept on reducing the features of the data more and more to 20% and 10%, the model's accuracy started decreasing as may be the important features were truncated. 

So, what I observed is that reducing the data to 30% maintained a perfect balance with keeping the accuracy intact and also kept the computation power usage to medium level as well.
