# Unit Tests & Logging with Logistic Regression

In this project we work with fake advertising data that records whether a user clicked on an ad on a company's website. We will try to build a model that predicts, from user characteristics, whether a user will click on the ad.

## Installing & importing libraries

In [5]:
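!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install seaborn
!pip install scikit-learn
# unittest ships with the Python standard library and needs no installation;
# the PyPI package `sklearn` is a deprecated alias of scikit-learn and is not needed.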

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import unittest
from functools import wraps
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
%matplotlib inline

## The data

In [61]:
ad_data = pd.read_csv('Advertising.csv')

## The my_logger and my_timer decorators

In [62]:
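def my_logger(orig_func):
    """Log every call of orig_func, with its arguments, to '<function name>.log'."""
    import logging
    # Note: basicConfig only configures the root logger on its first call in a session;
    # later calls (for other decorated functions) do not switch the log file.
    logging.basicConfig(filename='{}.log'.format(orig_func.__name__), level=logging.INFO)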

    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        logging.info('Ran with args: {}, and kwargs: {}'.format(args, kwargs))
        return orig_func(*args, **kwargs)

    return wrapper
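
Since `logging.basicConfig` only takes effect the first time it is called, every function decorated with `my_logger` in the same session ends up writing to the log file of the first one. One possible way around this is a dedicated logger per function. The following is a minimal sketch under that assumption, not the notebook's implementation; the names `my_logger_per_function` and `square` are hypothetical.

```python
import logging
from functools import wraps

def my_logger_per_function(orig_func):
    """Hypothetical variant of my_logger: one logger and one log file per function."""
    logger = logging.getLogger(orig_func.__name__)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers when the cell is re-run
        logger.addHandler(logging.FileHandler('{}.log'.format(orig_func.__name__)))

    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        logger.info('Ran with args: %s, and kwargs: %s', args, kwargs)
        return orig_func(*args, **kwargs)

    return wrapper

@my_logger_per_function
def square(x):
    return x * x

result = square(3)  # appends one line to square.log
```

With this variant each decorated function writes to its own file, regardless of the order in which the functions are defined.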

In [63]:
def my_timer(orig_func):
    """Time orig_func; the wrapped function returns a tuple (result, elapsed seconds)."""
    import time

    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        t1 = time.time()
        result = orig_func(*args, **kwargs)
        t2 = time.time() - t1
        print('{} ran in: {} sec'.format(orig_func, t2))
        return result, t2

    return wrapper
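
Note that a function wrapped by `my_timer` no longer returns its plain result but a `(result, elapsed)` tuple, so callers have to unpack it. The sketch below illustrates this with a stand-alone timer; `timer_sketch` and `square` are hypothetical names, and `time.perf_counter` is used here as an assumption because it is intended for measuring short durations, whereas the notebook uses `time.time`.

```python
import time
from functools import wraps

def timer_sketch(orig_func):
    """Hypothetical timer decorator that returns (result, elapsed seconds)."""
    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = orig_func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed

    return wrapper

@timer_sketch
def square(x):
    return x * x

value, seconds = square(4)  # unpack the tuple returned by the wrapper
print(value)
```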

## Logistic regression

**Split the data into a training set and a test set.**

In [64]:
from sklearn.model_selection import train_test_split

In [65]:
X = ad_data[['Daily Time Spent on Site', 'Age', 'Area Income','Daily Internet Usage', 'Male']]
y = ad_data['Clicked on Ad']

In [66]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [67]:
logmodel = LogisticRegression()

**Fit (train) the model on the training data, applying my_logger and my_timer**

In [68]:
@my_logger
@my_timer
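def fit(X_train, y_train):
    """Fit the global logmodel and attach the training metrics to it as attributes."""
    fitted = logmodel.fit(X_train, y_train)
    fitted.train_y_predicted = fitted.predict(X_train)
    # Accuracy in percent; np.asarray avoids the deprecated pandas Series.ravel()
    fitted.train_accuracy = np.mean(fitted.train_y_predicted == np.asarray(y_train)) * 100
    fitted.train_confusion_matrix = confusion_matrix(y_train, fitted.train_y_predicted)

    return fitted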

**Predict on the test data, applying my_logger and my_timer**

In [69]:
@my_logger
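@my_timer
def predict():
    """Predict on the global X_test with the fitted logmodel and attach the test metrics to it."""
    predictions = logmodel  # the same object that fit() returns
    predictions.test_y_predicted = predictions.predict(X_test)
    # Accuracy in percent; np.asarray avoids the deprecated pandas Series.ravel()
    predictions.test_accuracy = np.mean(predictions.test_y_predicted == np.asarray(y_test)) * 100
    predictions.test_confusion_matrix = confusion_matrix(y_test, predictions.test_y_predicted)
    predictions.report = classification_report(y_test, predictions.test_y_predicted)

    return predictions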

**The result**

In [70]:
fitted, fitted_time = fit(X_train, y_train)
print()
print('Train Time: ', fitted_time,'\n')
print('Train Accuracy : ', fitted.train_accuracy,'\n')
print('Train Confusion Matrix :\n %s\n' % (fitted.train_confusion_matrix))
print()
    
predictions, prediction_time = predict()
print()
print('Test Accuracy : ', predictions.test_accuracy,'\n')
print('Test Confusion Matrix :\n %s\n' % (predictions.test_confusion_matrix))
print('Classification Report :\n %s\n' % (predictions.report))
print()

<function fit at 0x000001EC5C100940> ran in: 0.028923988342285156 sec

Train Time:  0.028923988342285156 

Train Accuracy :  89.70149253731343 

Train Confusion Matrix :
 [[312  26]
 [ 43 289]]


<function predict at 0x000001EC5C081F70> ran in: 0.007013082504272461 sec

Test Accuracy :  90.6060606060606 

Test Confusion Matrix :
 [[156   6]
 [ 25 143]]

Classification Report :
               precision    recall  f1-score   support

           0       0.86      0.96      0.91       162
           1       0.96      0.85      0.90       168

    accuracy                           0.91       330
   macro avg       0.91      0.91      0.91       330
weighted avg       0.91      0.91      0.91       330
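
The figures in the classification report can be reproduced by hand from the test confusion matrix, which helps when checking the expected values used in the unit tests. The following sketch assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the helper names `precision` and `recall` are introduced here for illustration only.

```python
# Test confusion matrix from the output above:
# rows are true classes, columns are predicted classes.
cm = [[156, 6],
      [25, 143]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

def precision(cm, k):
    """Correct predictions of class k divided by all predictions of class k."""
    return cm[k][k] / sum(row[k] for row in cm)

def recall(cm, k):
    """Correct predictions of class k divided by all true members of class k."""
    return cm[k][k] / sum(cm[k])

for k in (0, 1):
    print(k, round(precision(cm, k), 2), round(recall(cm, k), 2))
# prints:
# 0 0.86 0.96
# 1 0.96 0.85
print(round(accuracy * 100, 2))  # prints 90.61
```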





## Applying the test class
### test_fit():
**Test method for the training values**
### test_predict():
**Test method for the prediction values**

In [71]:
class TestInput(unittest.TestCase):
  
    @classmethod
    def setUpClass(cls):
        # print('setupClass')   
        pass

    @classmethod
    def tearDownClass(cls): 
        # print('teardownClass')
        pass

    def setUp(self):
        print('setUp')
        
        # Expected values from the reference run above. The tests use the
        # module-level X_train, X_test, y_train and y_test, so no split is repeated here.
        self.train_accuracy = 89.70149253731343
        self.train_confusion_matrix = np.array([[312, 26],
                                                [43, 289]])
        self.train_runningtime = 0.026003599166870117

        self.test_accuracy = 90.6060606060606
        self.test_confusion_matrix = np.array([[156, 6],
                                               [25, 143]])

    def tearDown(self):
        # print('tearDown')
        pass
    
    
    def test_fit(self):     
        self.fitted, self.time = fit(X_train, y_train)
        self.assertEqual(self.fitted.train_accuracy, self.train_accuracy)
        self.assertEqual(self.fitted.train_confusion_matrix.tolist(), self.train_confusion_matrix.tolist())
        
        self.modified_train_runningtime = self.train_runningtime * 1.2
        self.assertLessEqual(self.time, self.modified_train_runningtime)
        
        print('\n' + 'Train Time: ', self.train_runningtime,'\n')
        print('Modified Train Time: ', self.modified_train_runningtime,'\n')
        print('Unittest Train Time: ', self.time,'\n')
        
        with open("Testdatenfile_Fit.txt", "w") as text_file:
            print('Train Time: ', self.train_runningtime,'\n' + 
                  'Modified Train Time: ', self.modified_train_runningtime,'\n' +
                  'Unittest Train Time: ', self.time,'\n',
                  file=text_file)

            
    def test_predict(self):
        self.fitted, self.time = fit(X_train, y_train)
        self.predicted, self.prediction_time = predict()
        self.assertEqual(self.predicted.test_accuracy, self.test_accuracy)
        self.assertEqual(self.predicted.test_confusion_matrix.tolist(), self.test_confusion_matrix.tolist())
        
        print('\n' + 'Test Accuracy : ', self.predicted.test_accuracy,'\n')
        print('Unit Test Accuracy : ', self.test_accuracy,'\n')
        print('Test Confusion Matrix :\n %s\n' % (self.predicted.test_confusion_matrix))
        print('Unit Test Confusion Matrix :\n %s\n' % (self.test_confusion_matrix))
        
        with open("Testdatenfile_Predict.txt", "w") as text_file:
            print('Test Accuracy : ', self.predicted.test_accuracy,'\n' + 
                  'Unit Test Accuracy : ', self.test_accuracy,'\n' + 
                  'Test Confusion Matrix :\n %s\n' % (self.predicted.test_confusion_matrix) +
                  'Unit Test Confusion Matrix :\n %s\n' % (self.test_confusion_matrix),
                  file=text_file)

            
if __name__ == '__main__':
  
    #run tests 
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

.

setUp
<function fit at 0x000001EC5C100940> ran in: 0.02892899513244629 sec

Train Time:  0.026003599166870117 

Modified Train Time:  0.03120431900024414 

Unittest Train Time:  0.02892899513244629 

setUp
<function fit at 0x000001EC5C100940> ran in: 0.023935794830322266 sec
<function predict at 0x000001EC5C081F70> ran in: 0.006979703903198242 sec

Test Accuracy :  90.6060606060606 

Unit Test Accuracy :  90.6060606060606 

Test Confusion Matrix :
 [[156   6]
 [ 25 143]]

Unit Test Confusion Matrix :
 [[156   6]
 [ 25 143]]




----------------------------------------------------------------------
Ran 2 tests in 0.097s

OK
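
The tests above compare the accuracy with exact float equality and bound the training time by 1.2 times a reference measurement; both can fail on another machine or library version even when the model is fine. A more tolerant style is sketched below, using only values from the run above. The class name `TestMetricsSketch` is hypothetical, and the hard-coded `observed` matrix stands in for `confusion_matrix(...).tolist()`.

```python
import unittest

class TestMetricsSketch(unittest.TestCase):
    """Hypothetical example; the numbers are taken from the test run above."""

    def test_accuracy_with_tolerance(self):
        correct, total = 156 + 143, 330
        accuracy = correct / total * 100
        # Compare floats up to 6 decimal places instead of exactly
        self.assertAlmostEqual(accuracy, 90.6060606060606, places=6)

    def test_confusion_matrix_as_lists(self):
        expected = [[156, 6], [25, 143]]
        observed = [[156, 6], [25, 143]]  # stand-in for confusion_matrix(...).tolist()
        self.assertEqual(observed, expected)

# Run the tests programmatically, which also works inside a notebook
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMetricsSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Running the suite through `TextTestRunner` returns a result object, so the outcome can be inspected in code rather than only read from the printed report.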
