## Predicting Speedup - Classification

We label each collected data point as one of:
<ul>
    <li>'Slowdown' (speedup < 1)</li>
    <li>'Minimal' (1 <= speedup < 10)</li>
    <li>'Moderate' (10 <= speedup < 20)</li>
    <li>'High' (speedup >= 20)</li>
</ul>
We then train classifiers to predict which speedup range an application will fall into.
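The labeling rule above could be sketched with `pandas.cut`. This is a minimal sketch, not the code used to build the dataset: it assumes each range is half-open (lower bound included, upper bound excluded), and the helper name `label_speedup` and the sample values are purely illustrative.

```python
import numpy as np
import pandas as pd

def label_speedup(speedup: pd.Series) -> pd.Series:
    """Map raw speedup values to the four classes (hypothetical helper)."""
    bins = [-np.inf, 1, 10, 20, np.inf]
    labels = ['Slowdown', 'Minimal', 'Moderate', 'High']
    # right=False makes every bin [lower, upper), so speedup == 20 is 'High'
    return pd.cut(speedup, bins=bins, labels=labels, right=False)

# Illustrative values only, chosen to hit each boundary
sample = pd.Series([0.004842, 1.0, 9.99, 10.0, 19.5, 20.0, 35.2])
sample_labels = label_speedup(sample).astype(str).tolist()
print(sample_labels)
```

Using explicit bin edges keeps the thresholds in one place, so changing a range only requires editing the `bins` list.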

Importing required libraries

In [1]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from matplotlib import pyplot as plt
from matplotlib import cm as cm
from sklearn.svm import SVC
from sklearn import metrics
import seaborn as sns
import pandas as pd 
import numpy as np

### Data Pre-Processing

Read CSV

In [2]:
np.random.seed(42)
df = pd.read_csv('./data/final_data_sort.csv')
df.head()

| Unnamed: 0 | SP/SM | Num SM | Cluster ID | Data (Transfer) Size (in Bytes) | Ratio of Global Access | Ratio of FP Instructions | Ratio of Branch Inst | Speedup | Num Blocks | Num Threads | Application Name | PCIe Bandwidth | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 64 | 80 | 3 | 40000 | 0.11 | 0.0 | 0.0 | 0.004842 | 1020 | 1 | QuickSort | 4.0 | Slowdown |
| 1 | 64 | 80 | 3 | 40000 | 0.11 | 0.0 | 0.0 | 0.004842 | 1020 | 1 | QuickSort | 4.0 | Slowdown |
| 2 | 64 | 80 | 3 | 4000 | 0.11 | 0.0 | 0.0 | 0.0049 | 118 | 1 | QuickSort | 4.0 | Slowdown |
| 3 | 64 | 80 | 3 | 4000 | 0.11 | 0.0 | 0.0 | 0.0049 | 118 | 1 | QuickSort | 4.0 | Slowdown |
| 4 | 64 | 80 | 3 | 100001 | 0.82 | 0.0 | 0.032 | 0.008255 | 1563 | 64 | PrimeGen | 4.0 | Slowdown |


In [3]:
df.dtypes

SP/SM                                int64
Num SM                               int64
Cluster ID                           int64
Data (Transfer) Size (in Bytes)      int64
Ratio of Global Access             float64
Ratio of FP Instructions           float64
Ratio of Branch Inst               float64
Speedup                            float64
Num Blocks                           int64
Num Threads                          int64
Application Name                    object
PCIe Bandwidth                     float64
Class                               object
dtype: object

Drop columns that do not add to the analysis (`Speedup` is dropped because `Class` is derived directly from it)

In [4]:
#drop irrelevant columns (and Speedup, which the Class label is derived from)
df = df.drop(columns=['Cluster ID', 'Application Name', 'Ratio of FP Instructions', 'Speedup'])

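Split features from the target, standardize the features, then split into train and test sets. Note that the scaler is fit on the full dataset before the split, so test-set statistics influence the scaling.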

In [5]:
#features-target split
y = df['Class'].values
df = df.drop(['Class'], axis=1)
X = df.values

#standardize
scaler = StandardScaler()
X=scaler.fit_transform(X)

#train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
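An alternative ordering that avoids fitting the scaler on test rows could look like the sketch below. It is not what the cell above does: it runs on synthetic stand-in data, the names `X_demo`, `y_demo` and `scaler_demo` are illustrative, and the `stratify` argument is an added assumption that may help with the small, imbalanced classes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels (not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(120, 8))
y_demo = np.repeat(['Slowdown', 'Minimal', 'Moderate', 'High'], 30)

# Split first; stratify keeps class proportions similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.15, random_state=0, stratify=y_demo)

# Fit the scaler on training rows only, then reuse it on the test rows
scaler_demo = StandardScaler().fit(X_tr)
X_tr_std = scaler_demo.transform(X_tr)
X_te_std = scaler_demo.transform(X_te)

# Training columns are centered at zero; test columns are only approximately so
print(X_tr_std.mean(axis=0).round(6))
print(X_te_std.mean(axis=0).round(3))
```

With this ordering the reported test metrics would likely differ slightly from the ones in this notebook, since the scaling no longer sees the held-out rows.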

### Random Forest

In [6]:
#Random Forest Classifier
classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

#evaluate: accuracy, then F1/precision/recall with weighted, macro and micro averaging
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
print('F1 score:', metrics.f1_score(y_test, y_pred,average='weighted'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='weighted'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='weighted'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='macro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='macro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='macro'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='micro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='micro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='micro'))

Accuracy: 0.95
F1 score: 0.9460000000000001
Precision: 0.9538461538461538
Recall: 0.95
F1 score: 0.94
Precision: 0.9807692307692308
Recall: 0.9166666666666666
F1 score: 0.9500000000000001
Precision: 0.95
Recall: 0.95


Confusion Matrix (rows are true classes, columns are predicted classes) -> 1 Minimal sample is wrongly predicted as Slowdown -> Acceptable

In [7]:
metrics.confusion_matrix(y_test, y_pred, labels=["Slowdown", "Minimal", "Moderate","High"])

array([[12,  0,  0,  0],
       [ 1,  2,  0,  0],
       [ 0,  0,  2,  0],
       [ 0,  0,  0,  3]], dtype=int64)
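The three unlabeled groups of scores printed earlier correspond to weighted, macro and micro averaging, in that order. As a rough check, the sketch below rebuilds label vectors from the Random Forest confusion matrix above and recomputes precision under each averaging mode; the variable names are illustrative and the rebuilt vectors only match the real predictions in their counts, not in their order.

```python
import numpy as np
from sklearn import metrics

labels = ['Slowdown', 'Minimal', 'Moderate', 'High']
# Rows are true classes, columns are predicted classes
cm_rf = np.array([[12, 0, 0, 0],
                  [ 1, 2, 0, 0],
                  [ 0, 0, 2, 0],
                  [ 0, 0, 0, 3]])

# Rebuild label vectors that produce exactly this confusion matrix
y_true_demo, y_pred_demo = [], []
for i, true_label in enumerate(labels):
    for j, pred_label in enumerate(labels):
        y_true_demo += [true_label] * int(cm_rf[i, j])
        y_pred_demo += [pred_label] * int(cm_rf[i, j])

# weighted: per-class scores weighted by class support
# macro:    plain mean of per-class scores
# micro:    computed from global counts (equals accuracy here)
precision_by_avg = {
    avg: metrics.precision_score(y_true_demo, y_pred_demo, average=avg)
    for avg in ('weighted', 'macro', 'micro')
}
print(precision_by_avg)
```

Macro precision comes out higher than weighted precision because the only imperfect class, Slowdown (12 correct out of 13 predicted), is also the largest one and therefore counts more in the weighted mean.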

### Gradient Boosted Trees

In [8]:
#GBT
classifier = GradientBoostingClassifier(n_estimators=20)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

#evaluate: accuracy, then F1/precision/recall with weighted, macro and micro averaging
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
print('F1 score:', metrics.f1_score(y_test, y_pred,average='weighted'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='weighted'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='weighted'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='macro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='macro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='macro'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='micro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='micro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='micro'))

Accuracy: 0.95
F1 score: 0.9460000000000001
Precision: 0.9538461538461538
Recall: 0.95
F1 score: 0.94
Precision: 0.9807692307692308
Recall: 0.9166666666666666
F1 score: 0.9500000000000001
Precision: 0.95
Recall: 0.95


Confusion Matrix -> Same single error as Random Forest (1 Minimal predicted as Slowdown)

In [9]:
metrics.confusion_matrix(y_test, y_pred, labels=["Slowdown", "Minimal", "Moderate","High"])

array([[12,  0,  0,  0],
       [ 1,  2,  0,  0],
       [ 0,  0,  2,  0],
       [ 0,  0,  0,  3]], dtype=int64)

### Multi-Layer Perceptron

In [10]:
#Multi-Layer Perceptron
classifier = MLPClassifier(hidden_layer_sizes=(8,8,8), activation='relu', solver='adam', max_iter=4500, random_state=10)
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)

#evaluate: accuracy, then F1/precision/recall with weighted, macro and micro averaging
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
print('F1 score:', metrics.f1_score(y_test, y_pred,average='weighted'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='weighted'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='weighted'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='macro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='macro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='macro'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='micro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='micro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='micro'))

Accuracy: 0.85
F1 score: 0.8621212121212121
Precision: 0.9
Recall: 0.85
F1 score: 0.8106060606060606
Precision: 0.7916666666666666
Recall: 0.875
F1 score: 0.85
Precision: 0.85
Recall: 0.85


Confusion Matrix -> 2 Slowdown samples are misclassified (1 as Minimal, 1 as Moderate) and 1 Minimal as Moderate -> Not preferable

In [11]:
metrics.confusion_matrix(y_test, y_pred, labels=["Slowdown", "Minimal", "Moderate","High"])

array([[10,  1,  1,  0],
       [ 0,  2,  1,  0],
       [ 0,  0,  2,  0],
       [ 0,  0,  0,  3]], dtype=int64)

### Support Vector Machine

In [12]:
#Support Vector Machine 
classifier = SVC(kernel='rbf') #try different kernels
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)

#evaluate: accuracy, then F1/precision/recall with weighted, macro and micro averaging
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
print('F1 score:', metrics.f1_score(y_test, y_pred,average='weighted'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='weighted'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='weighted'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='macro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='macro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='macro'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='micro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='micro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='micro'))

Accuracy: 0.65
F1 score: 0.6417391304347826
Precision: 0.6633116883116883
Recall: 0.65
F1 score: 0.4173913043478261
Precision: 0.4237012987012987
Recall: 0.45833333333333337
F1 score: 0.65
Precision: 0.65
Recall: 0.65


(Truncated scikit-learn warning output: a metric is ill-defined for a class with no predicted samples, so it is set to 0.)


Confusion Matrix -> 2 Slowdown as Minimal; 1 Minimal as High; 1 Moderate as Slowdown and 1 as Minimal (not preferable); 2 High as Minimal (ok)

In [13]:
metrics.confusion_matrix(y_test, y_pred, labels=["Slowdown", "Minimal", "Moderate","High"])

array([[10,  2,  0,  0],
       [ 0,  2,  0,  1],
       [ 1,  1,  0,  0],
       [ 0,  2,  0,  1]], dtype=int64)
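The "try different kernels" note in the SVM cell could be followed up with a cross-validated comparison along the lines below. This is only a sketch on synthetic data generated with `make_classification`, so the scores say nothing about the real dataset; `X_syn`, `y_syn` and `kernel_scores` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class data standing in for the real features (assumption)
X_syn, y_syn = make_classification(n_samples=200, n_features=8, n_informative=5,
                                   n_classes=4, random_state=0)

kernel_scores = {}
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    # The pipeline refits the scaler inside every fold, so no fold leaks into another
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X_syn, y_syn, cv=5)
    kernel_scores[kernel] = scores.mean()

for kernel, score in kernel_scores.items():
    print(f'{kernel}: mean CV accuracy {score:.3f}')
```

Cross-validation is suggested here because a single 20-sample test split makes differences between kernels hard to distinguish from noise.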

### Logistic Regression

In [14]:
#Logistic Regression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

#evaluate: accuracy, then F1/precision/recall with weighted, macro and micro averaging
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
print('F1 score:', metrics.f1_score(y_test, y_pred,average='weighted'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='weighted'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='weighted'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='macro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='macro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='macro'))

print('F1 score:', metrics.f1_score(y_test, y_pred,average='micro'))
print('Precision:', metrics.precision_score(y_test, y_pred,average='micro'))
print('Recall:', metrics.recall_score(y_test, y_pred,average='micro'))

Accuracy: 0.65
F1 score: 0.603
Precision: 0.5676923076923076
Recall: 0.65
F1 score: 0.345
Precision: 0.31153846153846154
Recall: 0.3958333333333333
F1 score: 0.65
Precision: 0.65
Recall: 0.65


(Truncated scikit-learn warning output: a metric is ill-defined for a class with no predicted samples, so it is set to 0.)


Confusion Matrix -> 1 Slowdown as High; 1 Minimal as Moderate and 2 as High; 2 Moderate as Slowdown; 1 High as Moderate -> Not at all preferable -> Worst performance

In [15]:
metrics.confusion_matrix(y_test, y_pred, labels=["Slowdown", "Minimal", "Moderate","High"])

array([[11,  0,  0,  1],
       [ 0,  0,  1,  2],
       [ 2,  0,  0,  0],
       [ 0,  0,  1,  2]], dtype=int64)
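To compare the five models side by side, accuracy and macro F1 can be recomputed directly from the confusion matrices reported in this notebook. The sketch below does that with plain NumPy; the helper `accuracy_and_macro_f1` is illustrative, and it treats the F1 of a class with no true and no predicted samples as 0.

```python
import numpy as np

# Confusion matrices as reported above (rows: true, columns: predicted)
reported = {
    'Random Forest':          [[12, 0, 0, 0], [1, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]],
    'Gradient Boosted Trees': [[12, 0, 0, 0], [1, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]],
    'Multi-Layer Perceptron': [[10, 1, 1, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 3]],
    'Support Vector Machine': [[10, 2, 0, 0], [0, 2, 0, 1], [1, 1, 0, 0], [0, 2, 0, 1]],
    'Logistic Regression':    [[11, 0, 0, 1], [0, 0, 1, 2], [2, 0, 0, 0], [0, 0, 1, 2]],
}

def accuracy_and_macro_f1(cm):
    """Return (accuracy, macro F1) for a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1)  # predicted count + actual count
    # Per-class F1 = 2*TP / (predicted + actual); 0 when the denominator is 0
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return tp.sum() / cm.sum(), f1.mean()

summary = {name: accuracy_and_macro_f1(cm) for name, cm in reported.items()}
for name, (acc, f1) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f'{name:<24} accuracy={acc:.2f}  macro F1={f1:.3f}')
```

SVM and Logistic Regression tie on accuracy (0.65), but macro F1 separates them, which is consistent with Logistic Regression being called the worst performer above.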