# Decision Tree WIFI Signal Classification

## Description:

This notebook takes a deeper dive into the additional functionality of the decision tree implementation. To run the code on the clean and noisy datasets, refer to the main method in the parent directory. The code in this Jupyter notebook can be used to:

- Build individual Trees
- Evaluate individual Trees
- Visualize individual Trees
- Prune individual Trees
- Run K-fold cross-validation, training one tree per test/validation fold pair and returning average statistics



## The algorithm:
We begin by importing the necessary libraries and modules:

In [1]:
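# Reload edited project modules automatically before each cell runs
%load_ext autoreload
%autoreload 2

import sys
import os
import numpy as np

# Make the parent directory importable so the src package can be found
sys.path.append('../')

from src.utils import *
from src.validation import *
from src.DecisionTree import *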

Next we load the data that we want to analyse. Use the get_data function, which shuffles the data automatically. To run the algorithm on different data, place the file in the data/wifi_db folder and pass its name to get_data. We split the data into training, validation, and test sets in an 80% / 10% / 10% ratio (1600, 200, and 200 rows of the 2000-row clean dataset).

In [2]:
data = get_data('clean_dataset.txt')
train = data[:1600,:]
val = data[1600:1800,:]
test = data[1800:,:]
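
As a rough illustration of the split above, the following sketch shuffles a stand-in array and slices it 80/10/10. The helper name `split_80_10_10`, its `seed` parameter, and the toy array are assumptions made for this example only; the shuffling done inside get_data may work differently.

```python
import numpy as np

def split_80_10_10(data, seed=0):
    """Shuffle the rows, then slice into train / validation / test (80/10/10)."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]
    n_train = len(data) * 8 // 10   # integer arithmetic avoids float rounding
    n_val = len(data) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Toy stand-in for the dataset (assumed layout: 7 signal columns + 1 label column).
demo = np.arange(2000 * 8).reshape(2000, 8)
train_demo, val_demo, test_demo = split_80_10_10(demo)
print(train_demo.shape, val_demo.shape, test_demo.shape)
```

Shuffling before slicing matters because the three sets are taken from fixed row ranges; without it, any ordering in the file would leak into the split.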

Next we build our decision tree:

In [3]:
decisiontree = DecisionTree()
tree = decisiontree.buildTree(train)
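
For readers curious about what happens at each node, here is a minimal sketch of one common way to choose a split: try the midpoints between neighbouring attribute values and keep the one with the highest information gain. This is an assumption about the general technique, not a copy of buildTree; the function names `entropy` and `best_split`, the toy data, and the convention that the last column holds the label are all hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(data):
    """Return (attribute, threshold, gain) with the highest information gain.

    Assumes the last column is the label and all other columns are numeric.
    """
    labels = data[:, -1]
    base = entropy(labels)
    best = (None, None, 0.0)
    for attr in range(data.shape[1] - 1):
        values = np.unique(data[:, attr])            # sorted distinct values
        for thr in (values[:-1] + values[1:]) / 2:   # midpoints between neighbours
            left = labels[data[:, attr] < thr]
            right = labels[data[:, attr] >= thr]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[2]:
                best = (attr, float(thr), gain)
    return best

# Toy data: two signal columns and a label column.
toy = np.array([
    [-60.0, -40.0, 1],
    [-58.0, -70.0, 1],
    [-50.0, -45.0, 2],
    [-48.0, -72.0, 2],
])
split_attr, split_thr, split_gain = best_split(toy)
print(split_attr, split_thr, split_gain)
```

On the toy data, the first column separates the two classes perfectly at the midpoint between -58 and -50, so that split is selected.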

Now we can visualize our tree with the following function:

In [4]:
decisiontree.visualize()

Depth of the tree: 13

signal:0 val:-53.5
├─ signal:4 val:-58.5
|  ├─ signal:3 val:-55.5
|  |  ├─ signal:2 val:-51.5
|  |  |  ├─ signal:2 val:-54.5
|  |  |  |  ├─ signal:4 val:-60.5
|  |  |  |  |  ├─ signal:4 val:-61.0
|  |  |  |  |  |  ├─ leaf val:1.0
|  |  |  |  |  |  └─ leaf val:4.0
|  |  |  |  |  └─ leaf val:1.0
|  |  |  |  └─ signal:5 val:-86.5
|  |  |  |     ├─ signal:1 val:-53.0
|  |  |  |     |  ├─ leaf val:4.0
|  |  |  |     |  └─ leaf val:1.0
|  |  |  |     └─ signal:0 val:-54.0
|  |  |  |        ├─ leaf val:1.0
|  |  |  |        └─ leaf val:3.0
|  |  |  └─ signal:4 val:-63.0
|  |  |     ├─ signal:0 val:-56.0
|  |  |     |  ├─ leaf val:1.0
|  |  |     |  └─ leaf val:3.0
|  |  |     └─ signal:0 val:-56.0
|  |  |        ├─ leaf val:3.0
|  |  |        └─ leaf val:4.0
|  |  └─ signal:1 val:-50.0
|  |     ├─ leaf val:3.0
|  |     └─ signal:0 val:-58.0
|  |        ├─ leaf val:3.0
|  |        └─ leaf val:1.0
|  └─ signal:4 val:-55.5
|     ├─ signal:0 val:-55.5
|     |  ├─ signal:0 v

To evaluate our tree, we run the evaluate function on the test data. The attributes and labels do not need to be separated beforehand; the full test array is passed in:

In [5]:

decisiontree.evaluate(test)

{'confusionmatrix': array([[42.,  0.,  0.,  0.],
        [ 0., 49.,  3.,  0.],
        [ 0.,  3., 52.,  2.],
        [ 1.,  0.,  0., 48.]]),
 'precision': array([1.        , 0.94230769, 0.9122807 , 0.97959184]),
 'recall': array([0.97674419, 0.94230769, 0.94545455, 0.96      ]),
 'F1score': array([0.98823529, 0.94230769, 0.92857143, 0.96969697]),
 'posClassRate': 0.955}
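
The reported metrics can be reproduced from the confusion matrix alone. The sketch below does so under one assumption inferred from the numbers rather than from the source code: rows are predicted classes and columns are actual classes. The variable names are chosen for this example.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Assumption: rows = predicted class, columns = actual class.
cm = np.array([
    [42.,  0.,  0.,  0.],
    [ 0., 49.,  3.,  0.],
    [ 0.,  3., 52.,  2.],
    [ 1.,  0.,  0., 48.],
])

correct = np.diag(cm)
precision = correct / cm.sum(axis=1)   # correct / everything predicted as the class
recall = correct / cm.sum(axis=0)      # correct / everything actually in the class
f1 = 2 * precision * recall / (precision + recall)
class_rate = correct.sum() / cm.sum()  # overall fraction classified correctly

print(precision, recall, f1, class_rate)
```

Under that orientation, the values agree with the precision, recall, F1score, and posClassRate entries shown in the output.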

In some cases, pruning the tree helps to avoid overfitting and improves the evaluation results. To prune the tree we have built, we run the prune function with the training and validation data:

In [6]:
decisiontree.prune(train,val)
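
To make the idea concrete, here is a minimal sketch of reduced-error pruning, one common pruning strategy: working bottom-up, a node whose children are both leaves is replaced by a single leaf (the majority training label) unless that lowers accuracy on the validation set. Whether decisiontree.prune works exactly this way is an assumption; the nested-dict tree format and all function names below are hypothetical.

```python
import numpy as np

def predict(node, x):
    """Walk from the given node down to a leaf and return its label."""
    while 'leaf' not in node:
        node = node['left'] if x[node['attr']] < node['thr'] else node['right']
    return node['leaf']

def accuracy(tree, data):
    """Fraction of rows whose last column (the label) is predicted correctly."""
    return float(np.mean([predict(tree, row[:-1]) == row[-1] for row in data]))

def majority_label(data):
    labels, counts = np.unique(data[:, -1], return_counts=True)
    return float(labels[np.argmax(counts)])

def prune(node, root, train, val):
    """Bottom-up reduced-error pruning; modifies the tree in place."""
    if 'leaf' in node:
        return
    goes_left = train[:, node['attr']] < node['thr']
    prune(node['left'], root, train[goes_left], val)
    prune(node['right'], root, train[~goes_left], val)
    if 'leaf' in node['left'] and 'leaf' in node['right'] and len(train) > 0:
        before = accuracy(root, val)
        backup = dict(node)
        node.clear()
        node['leaf'] = majority_label(train)   # tentatively collapse to a leaf
        if accuracy(root, val) < before:       # worse on validation: undo
            node.clear()
            node.update(backup)

# Toy tree whose left subtree has overfitted one noisy training row.
toy_tree = {'attr': 0, 'thr': -50.0,
            'left': {'attr': 1, 'thr': -57.5,
                     'left': {'leaf': 1.0}, 'right': {'leaf': 2.0}},
            'right': {'leaf': 2.0}}
toy_train = np.array([[-60., -70., 1.], [-58., -65., 1.], [-57., -50., 2.],
                      [-45., -50., 2.], [-40., -60., 2.]])
toy_val = np.array([[-59., -45., 1.], [-56., -68., 1.], [-42., -55., 2.]])

acc_before = accuracy(toy_tree, toy_val)
prune(toy_tree, toy_tree, toy_train, toy_val)
acc_after = accuracy(toy_tree, toy_val)
print(acc_before, acc_after, toy_tree)
```

In the toy example the noisy inner split is collapsed into a leaf, while the root split is kept because removing it would reduce validation accuracy.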

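If we now evaluate the tree again, the overall classification rate on this clean test set is unchanged at 0.955, while the per-class precision and recall shift slightly. The improvement from pruning is more pronounced for a tree built on noisy data.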

In [7]:
decisiontree.evaluate(test)

{'confusionmatrix': array([[43.,  0.,  0.,  1.],
        [ 0., 47.,  3.,  0.],
        [ 0.,  5., 52.,  0.],
        [ 0.,  0.,  0., 49.]]),
 'precision': array([0.97727273, 0.94      , 0.9122807 , 1.        ]),
 'recall': array([1.        , 0.90384615, 0.94545455, 0.98      ]),
 'F1score': array([0.98850575, 0.92156863, 0.92857143, 0.98989899]),
 'posClassRate': 0.955}

The tree also becomes considerably simpler, as we can see if we visualize it again (the depth drops from 13 to 9):

In [8]:
decisiontree.visualize()

Depth of the tree: 9

signal:0 val:-53.5
├─ signal:4 val:-58.5
|  ├─ signal:3 val:-55.5
|  |  ├─ leaf val:1.0
|  |  └─ leaf val:3.0
|  └─ leaf val:4.0
└─ signal:0 val:-43.5
   ├─ signal:3 val:-45.5
   |  ├─ signal:2 val:-51.5
   |  |  ├─ signal:3 val:-47.5
   |  |  |  ├─ leaf val:3.0
   |  |  |  └─ signal:6 val:-78.5
   |  |  |     ├─ leaf val:3.0
   |  |  |     └─ signal:0 val:-44.5
   |  |  |        ├─ signal:1 val:-52.0
   |  |  |        |  ├─ leaf val:2.0
   |  |  |        |  └─ leaf val:3.0
   |  |  |        └─ leaf val:2.0
   |  |  └─ leaf val:3.0
   |  └─ signal:4 val:-68.5
   |     ├─ leaf val:2.0
   |     └─ signal:3 val:-37.5
   |        ├─ signal:6 val:-75.5
   |        |  ├─ leaf val:3.0
   |        |  └─ leaf val:2.0
   |        └─ leaf val:2.0
   └─ signal:3 val:-49.5
      ├─ signal:6 val:-79.0
      |  ├─ leaf val:3.0
      |  └─ leaf val:2.0
      └─ leaf val:2.0


To run K-fold cross-validation, call the cross_val function as follows:

In [9]:
folds = 10
cross_val(data,folds)

Tree Complete! Test Set: 0 Validation Set: 1
Tree Complete! Test Set: 0 Validation Set: 2
Tree Complete! Test Set: 0 Validation Set: 3
Tree Complete! Test Set: 0 Validation Set: 4
Tree Complete! Test Set: 0 Validation Set: 5
Tree Complete! Test Set: 0 Validation Set: 6
Tree Complete! Test Set: 0 Validation Set: 7
Tree Complete! Test Set: 0 Validation Set: 8
Tree Complete! Test Set: 0 Validation Set: 9
Tree Complete! Test Set: 1 Validation Set: 0
Tree Complete! Test Set: 1 Validation Set: 2
Tree Complete! Test Set: 1 Validation Set: 3
Tree Complete! Test Set: 1 Validation Set: 4
Tree Complete! Test Set: 1 Validation Set: 5
Tree Complete! Test Set: 1 Validation Set: 6
Tree Complete! Test Set: 1 Validation Set: 7
Tree Complete! Test Set: 1 Validation Set: 8
Tree Complete! Test Set: 1 Validation Set: 9
Tree Complete! Test Set: 2 Validation Set: 0
Tree Complete! Test Set: 2 Validation Set: 1
Tree Complete! Test Set: 2 Validation Set: 3
Tree Complete! Test Set: 2 Validation Set: 4
Tree Compl

({'confusionmatrix': array([[49.5       ,  0.        ,  0.24444444,  0.35555556],
         [ 0.        , 47.45555556,  2.14444444,  0.        ],
         [ 0.13333333,  2.54444444, 47.16666667,  0.43333333],
         [ 0.36666667,  0.        ,  0.44444444, 49.21111111]]),
  'precision': array([0.98875266, 0.9573626 , 0.93988263, 0.98344185]),
  'recall': array([0.99046499, 0.95066755, 0.94226045, 0.98383959]),
  'F1score': array([0.98951599, 0.95343679, 0.94053133, 0.98343281]),
  'posClassRate': 0.9666666666666666},
 {'confusionmatrix': array([[49.82222222,  0.        ,  0.58888889,  0.52222222],
         [ 0.        , 46.96666667,  1.28888889,  0.        ],
         [ 0.12222222,  3.03333333, 47.74444444,  0.23333333],
         [ 0.05555556,  0.        ,  0.37777778, 49.24444444]]),
  'precision': array([0.97956005, 0.97415692, 0.93509792, 0.99150958]),
  'recall': array([0.99668831, 0.94173112, 0.9535864 , 0.98418943]),
  'F1score': array([0.98787333, 0.95714755, 0.94360656, 0.98772
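
The log above pairs each test fold with every other fold as a validation set, which for 10 folds would give 10 × 9 = 90 trees. The following sketch shows one way such fold pairs could be generated; it is an illustration based on the log only, not the body of cross_val, and the function name `nested_fold_splits` is hypothetical.

```python
import numpy as np

def nested_fold_splits(n_rows, folds):
    """Yield (test_fold, val_fold, train_idx, val_idx, test_idx) for each fold pair."""
    if folds < 3:
        raise ValueError("need at least 3 folds to leave rows for training")
    chunks = np.array_split(np.arange(n_rows), folds)
    for t in range(folds):
        for v in range(folds):
            if v == t:
                continue   # a fold cannot be test and validation at once
            train_idx = np.concatenate(
                [chunks[i] for i in range(folds) if i not in (t, v)])
            yield t, v, train_idx, chunks[v], chunks[t]

splits = list(nested_fold_splits(2000, 10))
first_t, first_v, tr_idx, va_idx, te_idx = splits[0]
print(len(splits), first_t, first_v, len(tr_idx), len(va_idx), len(te_idx))
```

Each pair leaves eight folds for training, one for validation (used for pruning), and one for testing, matching the 1600 / 200 / 200 split used earlier in this notebook.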