# Task III - Estimating SLA Conformance and Violation from Device Statistics

The objective of this task is to build a binary classifier that, given device statistics X, estimates whether the VoD service conforms to or violates the given SLA.
You build the classifier with logistic regression. As in the previous task, split the trace data into a training set and a test set containing 70% and 30% of the observations, respectively.
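The 70/30 split maps directly onto `train_test_split(..., test_size=0.3)`. The sketch below runs it on a hypothetical stand-in for the merged trace (the columns `feature_a` and `feature_b` are made up; only `TimeStamp` and `DispFrames` come from this notebook). It also passes `random_state`, which the notebook cells do not, so that repeated runs produce the same split.

```python
import numpy
import pandas
import sklearn.model_selection

# Hypothetical stand-in for the merged trace: 100 observations, two made-up features.
rng = numpy.random.default_rng(0)
trace = pandas.DataFrame({
    'TimeStamp': numpy.arange(100),
    'feature_a': rng.normal(size=100),
    'feature_b': rng.normal(size=100),
    'DispFrames': rng.uniform(10, 25, size=100),
})

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle.
train, test = sklearn.model_selection.train_test_split(trace, test_size=0.3, random_state=42)
print(len(train), len(test))
```

Without `random_state`, every execution draws a new split, so the errors reported in the cells below can change from run to run.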

### (1) Model Training - Use logistic regression to train a classifier C with the training set. Provide the coefficients (Θ0, ..., Θ9) of your model C. (Θ0 is the offset.)


In [1]:
import pandas
import numpy
import matplotlib
import pylab
import sklearn.model_selection
import sklearn.linear_model
import sklearn.metrics  # needed for confusion_matrix below
import random

In [16]:
# load the device statistics X and the target metrics Y
data_set_x = pandas.read_csv('X.csv')
data_set_y = pandas.read_csv('Y.csv')

# inner-join the two tables on the 'TimeStamp' column, as in a relational database
data_set = pandas.merge(data_set_x, data_set_y, on='TimeStamp')

In [17]:
(data_set_train, data_set_test) = sklearn.model_selection.train_test_split(data_set, test_size=0.3)

In [19]:
data_set_train.loc[ data_set_train.DispFrames < 18.00, 'DispFrames'] = -1

In [20]:
data_set_train.loc[ data_set_train.DispFrames >= 18.00, 'DispFrames'] = 1

In [21]:
data_set_test.loc[ data_set_test.DispFrames < 18.00, 'DispFrames'] = -1

In [22]:
data_set_test.loc[ data_set_test.DispFrames >= 18.00, 'DispFrames'] = 1
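The four cells above binarize `DispFrames` in place with two masked assignments per set. This works only because the first assignment writes -1, which is below the threshold, so the second mask does not touch it. A minimal alternative sketch, on hypothetical frame rates, applies the threshold in one vectorized step with `numpy.where` and writes the label to a new column (`SLA` is a made-up name). Keeping the continuous frame rate alongside the label would also let the same split serve the linear regressor in part (4).

```python
import numpy
import pandas

# Hypothetical frame rates; 18.0 is the SLA threshold used in this notebook.
frames = pandas.DataFrame({'DispFrames': [12.0, 17.99, 18.0, 24.5]})

# Map conformance (>= 18) to +1 and violation (< 18) to -1 in a single pass,
# writing to a new column so the continuous frame rate is preserved.
frames['SLA'] = numpy.where(frames['DispFrames'] >= 18.0, 1, -1)
print(frames['SLA'].tolist())  # [-1, -1, 1, 1]
```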

In [27]:
x = data_set_train.iloc[:, 1:-1] # all rows; all columns except TimeStamp (first) and the label (last)
y = data_set_train.iloc[:, -1] # all rows; label column only
logistic_regression = sklearn.linear_model.LogisticRegression(max_iter=200)
logistic_regression.fit(x, y)
print('The coefficients of classifier C are', logistic_regression.coef_)

The coefficients of classifier C are [[-6.88573615e-02 -4.41600202e-02  2.30192653e-03 -1.24388500e-05
   3.94246888e-03  3.10770418e-04 -8.69348877e-02 -7.10099340e-02
  -7.25927368e-06]]
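In scikit-learn, `coef_` holds only the per-feature weights Θ1, ..., Θn. The offset Θ0 is stored separately in `intercept_`, so the cell above does not print it. The sketch below, on hypothetical toy data with three features and labels in {-1, +1}, shows where each coefficient lives and checks that the model's score is Θ0 + Θ·x.

```python
import numpy
import sklearn.linear_model

# Hypothetical toy data: 200 observations, 3 features, labels in {-1, +1}.
rng = numpy.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = numpy.where(X[:, 0] - 0.5 * X[:, 1] > 0, 1, -1)

model = sklearn.linear_model.LogisticRegression(max_iter=200).fit(X, y)

theta_0 = model.intercept_[0]   # the offset Θ0
theta = model.coef_[0]          # Θ1 ... Θn, one weight per feature column
print('Θ0 =', theta_0)
print('Θ1..Θn =', theta)

# Linear score Θ0 + Θ·x; its sigmoid is the model's estimate of P(y = +1).
score = theta_0 + X @ theta
p_conform = 1.0 / (1.0 + numpy.exp(-score))
# A positive score is classified as +1 (conform), otherwise -1 (violate).
labels = numpy.where(score > 0, 1, -1)
```

Applied to the notebook, `logistic_regression.intercept_` gives Θ0 of model C.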


### (2) Accuracy of the Classifier C - Compute the classification error (ERR) of C on the test set. To do this, first compute the confusion matrix, which contains four numbers: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). We define the classification error as $ERR = 1 - \frac{TP + TN}{m}$, where $m$ is the number of observations in the test set. A true positive is an observation that the classifier correctly classifies as conforming to the SLA; a true negative is an observation that the classifier correctly classifies as violating the SLA.
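A small worked example of the ERR formula, on hypothetical labels (+1 = conforms, -1 = violates). With `labels=[-1, 1]`, scikit-learn's confusion matrix has true classes as rows and predicted classes as columns, so `ravel()` returns TN, FP, FN, TP in that order. This is the same layout the notebook indexes by position. ERR is then simply 1 minus the accuracy.

```python
import sklearn.metrics

# Hypothetical labels: +1 = conforms to the SLA, -1 = violates it.
y_true = [1, 1, 1, -1, -1, 1, -1, 1]
y_pred = [1, -1, 1, -1, 1, 1, -1, 1]

# Rows are true classes (-1, +1), columns predicted classes (-1, +1).
tn, fp, fn, tp = sklearn.metrics.confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()
m = len(y_true)
err = 1 - (tp + tn) / m
print(tn, fp, fn, tp, err)  # TN=2, FP=1, FN=1, TP=4, ERR=0.25
```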

In [28]:
x = data_set_test.iloc[:, 1:-1] # all rows; all columns except TimeStamp (first) and the label (last)
y = data_set_test.iloc[:, -1] # all rows; label column only

In [31]:
y_classification = logistic_regression.predict(x)
confusion_matrix = sklearn.metrics.confusion_matrix(y, y_classification)
true_negatives = confusion_matrix[0,0]
false_negatives = confusion_matrix[1,0]
true_positives = confusion_matrix[1,1]
false_positives = confusion_matrix[0,1]
number_observations = x.shape[0]
classification_error = 1 - ( (true_positives + true_negatives) / number_observations )
print('The classification error(ERR) on the test set is', classification_error)

The classification error(ERR) on the test set is 0.11851851851851847


### (3) As a baseline for C, use a naïve method that relies on the Y values only. For each x ∈ X, the naïve classifier predicts True (conformance) with probability p and False (violation) with probability 1 − p, where p is the fraction of Y values that conform to the SLA. Compute p over the training set and the classification error of the naïve classifier over the test set.
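Because the naïve classifier ignores X, its expected error depends only on p and on the fraction q of test observations that conform. It errs when it predicts conformance on a violation, or the reverse, so E[ERR] = p(1 − q) + (1 − p)q, which is 2p(1 − p) when q ≈ p. The sketch below compares this formula with a simulation. The function names and the 90% conformance rate are hypothetical. It seeds its own random generator, which the notebook's `random.random()` call does not do, so the result is reproducible.

```python
import random

def naive_error_expected(p, q):
    # Predict "conform" with probability p; a fraction q of the test labels conform.
    # An error occurs when predicting conform on a violation, or vice versa.
    return p * (1 - q) + (1 - p) * q

def naive_error_simulated(p, labels, seed=0):
    rng = random.Random(seed)  # seeded so the draw is reproducible
    predictions = [1 if rng.random() < p else -1 for _ in labels]
    wrong = sum(pred != true for pred, true in zip(predictions, labels))
    return wrong / len(labels)

# Hypothetical test labels: 90% conform (+1), 10% violate (-1).
labels = [1] * 9000 + [-1] * 1000
print(naive_error_expected(0.9, 0.9))      # about 0.18, i.e. 2p(1 - p)
print(naive_error_simulated(0.9, labels))  # close to the expected value
```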

In [34]:
data_set_train_length = data_set_train.shape[0]
number_conform_sla = data_set_train[ data_set_train['DispFrames'] == 1 ].shape[0]  # labels are +1 / -1
fraction_conform_sla = number_conform_sla / data_set_train_length  # p

In [27]:
x = data_set_test.iloc[:, :-1] # all rows, all columns except the last
y = data_set_test.iloc[:, -1] # all rows, label column only
# predict +1 (conform) with probability p, otherwise -1 (violate); the features are ignored
y_classification = [ 1.0 if random.random() < fraction_conform_sla else -1.0 for _ in range(data_set_test.shape[0]) ]

In [35]:
confusion_matrix = sklearn.metrics.confusion_matrix(y, y_classification)
true_negatives = confusion_matrix[0,0]
false_negatives = confusion_matrix[1,0]
true_positives = confusion_matrix[1,1]
false_positives = confusion_matrix[0,1]
number_observations = x.shape[0]
classification_error = 1 - ( (true_positives + true_negatives) / number_observations )
print('The classification error(ERR) on the test set is', classification_error)

The classification error(ERR) on the test set is 0.11851851851851847


### (4) Build a new classifier by extending the linear regression function developed in Task II with a check on its output, i.e., the video frame rate. If the predicted frame rate for a given X is at or above the SLA threshold, the classifier sets the Y label to conformance; otherwise, it sets it to violation. Train the new classifier on the training set and compute its classification error on the test set.
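The idea reduces to "regress, then threshold". Unlike C, the regressor is trained on the continuous frame rate rather than on the ±1 labels. A minimal functional sketch follows, assuming a fitted scikit-learn regressor and the threshold of 18 used in this notebook. The helper `classify_by_threshold` and the toy data, where the frame rate equals the single feature, are hypothetical.

```python
import numpy
import sklearn.linear_model

SLA_THRESHOLD = 18.0  # threshold used throughout this notebook

def classify_by_threshold(regressor, X, threshold=SLA_THRESHOLD):
    # Predict the continuous frame rate, then map it to +1 (conform) / -1 (violate).
    predicted_rate = regressor.predict(X)
    return numpy.where(predicted_rate >= threshold, 1, -1)

# Hypothetical data where the frame rate is an exact linear function of one feature.
X_train = numpy.arange(10, 30, dtype=float).reshape(-1, 1)
rate_train = X_train[:, 0]  # the rate equals the feature value
regressor = sklearn.linear_model.LinearRegression().fit(X_train, rate_train)

X_new = numpy.array([[12.0], [17.5], [18.5], [25.0]])
print(classify_by_threshold(regressor, X_new))  # [-1 -1  1  1]
```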

In [43]:
class LinearRegressionExtend(object):
    def __init__(self, linear_regression):
        self.__linear_regression = linear_regression
    
    def fit(self, x, y):
        self.__linear_regression.fit(x, y)
    
    def predict(self, x):
        # map each predicted frame rate to +1 (conforms to the SLA) or -1 (violates it)
        y = self.__linear_regression.predict(x)
        y = [ 1.0 if value >= 18.00 else -1.0 for value in y ]
        return y

In [36]:
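# reload the raw data: the training labels of the first split were binarized in place,
# but the linear regressor must be trained on the continuous frame rate
data_set_x = pandas.read_csv('X.csv')
data_set_y = pandas.read_csv('Y.csv')

# inner-join the two tables on the 'TimeStamp' column, as in a relational database
data_set = pandas.merge(data_set_x, data_set_y, on='TimeStamp')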

In [39]:
(data_set_train, data_set_test) = sklearn.model_selection.train_test_split(data_set, test_size=0.3)

In [40]:
data_set_test.loc[ data_set_test.DispFrames < 18.00, 'DispFrames'] = -1

In [41]:
data_set_test.loc[ data_set_test.DispFrames >= 18.00, 'DispFrames'] = 1

In [44]:
linear_regressor = LinearRegressionExtend(sklearn.linear_model.LinearRegression())
x = data_set_train.iloc[:, 1:-1] # all rows; all columns except TimeStamp (first) and the frame rate (last)
y = data_set_train.iloc[:, -1] # all rows; continuous frame rate only
linear_regressor.fit(x, y)

In [45]:
x = data_set_test.iloc[:, 1:-1] # all rows; all columns except TimeStamp (first) and the label (last)
y = data_set_test.iloc[:, -1] # all rows; binarized label only
y_classification = linear_regressor.predict(x)

In [46]:
confusion_matrix = sklearn.metrics.confusion_matrix(y, y_classification)
true_negatives = confusion_matrix[0,0]
false_negatives = confusion_matrix[1,0]
true_positives = confusion_matrix[1,1]
false_positives = confusion_matrix[0,1]
number_observations = x.shape[0]
classification_error = 1 - ( (true_positives + true_negatives) / number_observations )

print('The classification error (ERR) for the extended linear regression is', classification_error)

The classification error (ERR) for the extended linear regression is 0.12314814814814812


### (5) Formulate your observations and conclusions based on the above work.