<div style="text-align: right">INFO 6105 Data Science Eng Methods and Tools, Lecture 9 Day 1, 18 November 2019</div>
<div style="text-align: right">Dino Konstantopoulos</div>


# Lab: The winnow

Ok, that was a tough notebook, right? Building a simple neural network is not that tough, however. All you need is some knowledge of object-oriented python and linear algebra, matrices, and matrix multiplication. For fun, we're going to program an **Artificial Neural Network** that is of the simplest kind.

<br />
<center>
<img src="images/winnow.png" width=400 />
</center>

### Winnowing 
[Winnowing](https://en.wikipedia.org/wiki/Winnowing) means **removing unwanted items**. Its purpose as an algorthm is to train a binary classifier based on binary features, using a *linear* decision boundary.

In other words, the goal is to predict one of two states, using a collection of features which are all binary.

Unlike classical ANN training procedures, Winnow uses *multiplicative* rather than *additive* updates on the weights between nodes.

The prediction model assigns weights to each feature. To predict the state of an observation, it checks all the features that are **active** (true, or detected in an observation) and sums up the weights assigned to these features. If the total is ***above*** a certain threshold, the result is true, otherwise it’s false. 

I like the Winnow algorithm a lot because it shows you how even a General Linear Regression (GLM) with a **single** layer and a **linear** activation function can model (and thus *learn from*) a dataset. Now imagine a number of these layers put together in a chain, with non-linear activation functions, and you have yourself an artificial neural network (ANN). Imagine what it can do!

### The algorithm: 

We create a fully-connected network, and initialize weights $w_1 = w_2 = \cdots = w_n = 1$.

Then we iterate on each observation consisting of a vector of dimension $n$: $ = [x_1, x_2, \cdots, x_n]$, representing $n$ features.

We predict (for each iteration **epoch**): Output is 1 if *{some condition}*. Output is 0 otherwise.

Then, we get the **true** (binary) label corresponding to that observation, and we update the weights **only if we make a mistake**:
- **False-positive** error (we predict 0 wheras the label is really 1): Then for each $x_i == 1$, we set $w_i = 2*w_i$.
- **False-negative** error (we predict 1 wheras the label is really 0): Then for each $x_i == 1$, we set $wi = wi/2$.

### The english:
Here is the *english* behind the algorithm:
If our network predicts true but should predict false, it is **over-shooting**, so weights that were used in the prediction (i.e. the weights attached to active features) should be ***reduced***.
Conversely, if the prediction is false but the correct result should be true, it is **under-shooting** and the active features are not used enough to reach the threshold, so weights should be bumped ***up***.

Our goal is to minimze the number of mistakes. When we're down to the minimum we can achieve, we say we have **converged**.


# The dataset: Single Proton Emission Computed Tomography data

The Machine learning repository at the University of California, Irvine, has some great data sets. [Here](https://archive.ics.uci.edu/ml/datasets/SPECT+Heart) are data on cardiac Single Proton Emission Computed Tomography (SPECT) images. Each patient classified into two categories: **normal** and **abnormal**.

- *The dataset describes diagnosing of cardiac Single Proton Emission Computed Tomography (SPECT) images. Each of the patients is classified into two categories: normal and abnormal. The database of 267 SPECT image sets (patients) was processed to extract features that summarize the original SPECT images. As a result, 44 continuous feature pattern was created for each patient. The pattern was further processed to obtain 22 binary feature patterns. The CLIP3 algorithm was used to generate classification rules from these patterns. The CLIP3 algorithm generated rules that were 84.0% accurate (as compared with cardilogists' diagnoses)*. 

SPECT is a good data set for testing ML algorithms; it has 267 instances that are descibed by 23 binary attribute

Our goal is to predict whether a person is categorized normal or abnormal based on 22 binary feature patterns. 

We should strive to ensure that our accuracy on test data is 70% or above (anything approaching 50% is junk: just a guess!). Rerun your training (which shuffles the data), or change your random number generator seed maybe? Add another layer? Those are all ANN hyperparameters!

### The Winnow algorithm:

# The Crazy professor

</br >
<center>
<img src="ipynb.images/crazy.jpg" width=400 />
</center>

Oh no! Crazy professor tried to create a new class to add the Winnow algorithm and he edited the .ipynb files manually, and he completely **$&!&#~ed up the cells**! Based on what I told you above, can you reconstruct the cells so that it works?

Start with:
```(python)
class Winnow:

```

and use the code below. Careful, though, the code below is ***not*** in the right order. You may want to create a copy of this notebook using the `File` | `Make a Copy...` menu above, so you can keep an original reference notebook.

In [None]:
# int[][] matrix
def ShowMatrix(matrix, decimals, numRows, indices):
    frmt = '%.' + str(decimals) + 'f'
    for i in range(numRows):
        if (indices):
            print("[" + '%02d' % i + "]   ", end='')
        for j in range(len(matrix[i])):
            print(frmt % matrix[i][j] + " ", end='')
        print("")
    lastIndex = len(matrix) - 1
    if (indices):
        print("[" + '%02d' % lastIndex + "]   ", end='')
    for j in range(len(matrix[lastIndex])):
        print(frmt % matrix[lastIndex][j] + " ", end='')
    print("")

In [None]:
trainAcc = w.Accuracy(trainData)
testAcc = w.Accuracy(testData)

print("Prediction accuracy on training data = " + str(trainAcc))
print("Prediction accuracy on test data = " + str(testAcc))

In [None]:
print("Final model weights are:")
ShowVector(weights, 4, 8, True)

In [None]:
    # returns double, int[][] trainData
    def Accuracy(self, trainData):
        numCorrect = 0
        numWrong = 0
        xValues = [0] *numInput
        
        for i in range(len(trainData)):
            xValues = np.copy(trainData[i])
            target = trainData[i][numInput] #last value is target
            computed = self.ComputeY(xValues)

            if computed == target:
                numCorrect += 1
            else:
                numWrong += 1
                
        return (numCorrect * 1.0) / (numCorrect + numWrong)

In [97]:
# double[] vector
def ShowVector(vector, decimals, valsPerRow, newLine):
    frmt = '%.' + str(decimals) + 'f'
    for i in range(len(vector)):
        if (i % valsPerRow == 0): print("", end='')
        print(frmt % vector[i] + " ", end='')
    if (newLine): print("")

In [None]:
    # Fisher-Yates shuffle algorithm int[][] trainData
    def ShuffleObservations(self, trainData):
        for i in range(len(trainData)):
            r = randint(i, len(trainData) - 1)
            tmp = []
            tmp = trainData[r]
            trainData[r] = trainData[i]
            trainData[i] = tmp

In [None]:
    # returns double[], int[][] trainData
    def TrainWeights(self, trainData):
        xValues = [] * numInput
        self.ShuffleObservations(trainData)
        for i in range(len(trainData)):
            #  get the inputs
            xValues = np.copy(trainData[i])
            
            #  last value is target
            target = trainData[i][numInput] 
            
            computed = self.ComputeY(xValues)

            if (computed == 1 and target == 0):
                # need to decrease weight:
                for j in range(numInput):
                    if (xValues[j] == 0): continue
                    self.weights[j] = self.weights[j] / self.alpha #demotion
            elif (computed == 0 and target == 1):
                # need to increase weight:
                for j in range(numInput):
                    if (xValues[j] == 0): continue
                    self.weights[j] = self.weights[j] * self.alpha #promotion

        result = [0.0] *numInput # = number weights
        result = self.weights
        return result

In [None]:
print("First few rows of training data are:")
ShowMatrix(trainData, 0, 3, True)

In [None]:
import pandas as pd
data = pd.read_csv("data/SPECT.test")
data.head(10)

In [None]:
    def __init__(self, numInput, rndSeed):
        self.numInput = numInput
        self.weights = [0] *numInput
        for i in range(len(self.weights)):
            self.weights[i] = numInput / 2.0
        self.threshold = 1.0 * numInput
        self.alpha = 2.0
        random.seed( rndSeed )

In [100]:
print("First few lines of all data are:")
ShowMatrix(data, 0, 4, True)

First few lines of all data are:
[00]   0 1 0 1 1 1 0 0 0 1 0 1 1 1 0 1 1 
[01]   0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 0 1 
[02]   0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 0 0 
[03]   0 1 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 
[99]   0 0 0 1 1 1 0 0 0 1 0 1 1 1 0 0 1 


In [None]:
import numpy as np

# int[][] data, seed, out int[][] trainData, out int[][] testData
def MakeTrainTest(data, pct, seed):
    totRows = data.shape[0] #compute number of rows in each result
    numTrainRows = int(totRows * pct)
    numTestRows = totRows - numTrainRows
    #trainData = new int[numTrainRows][]
    trainData = np.empty(data.shape)
    #testData = new int[numTestRows][]
    testData = np.empty(data.shape)
    copy = np.empty(data.shape)

    # int[][] copy = new int[data.Length][] #  make a copy of data
    for i in range(copy.shape[0]):
        # by reference to save space
        copy[i] = data[i]
    for i in range(copy.shape[0]):
        # scramble row order of copy
        r = randint(i, copy.shape[0] - 1)
        tmp = copy[r]
        copy[r] = copy[i]
        copy[i] = tmp
    for i in range(numTrainRows):
        # create training
        trainData[i] = copy[i]
    for i in range(numTestRows):
        # create test
        testData[i] = copy[i + numTrainRows]
        
    return trainData, testData

In [None]:
diagnosis = data['1']
diagnosis.head(10)

In [None]:
# drop first column from training data: It's our label
data2 = data.drop(labels='1', axis=1)
data2.head(10)

In [101]:
print("Splitting data into 80% train and 20% test matrices")
trainData, testData = MakeTrainTest(data2.values, 0.8, 17)

Splitting data into 80% train and 20% test matrices


In [None]:
    # int[] xValues
    def ComputeY(self, xValues):
        sum = 0.0
        for i in range(numInput):
            sum += self.weights[i] * xValues[i]
        if sum > self.threshold:
            return 1
        else:
            return 0

In [103]:
print("First few rows of testing data are:")
ShowMatrix(testData, 0, 3, True)

First few rows of testing data are:
[00]   0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 1 1 
[01]   1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 1 0 
[02]   0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1 
[99]   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 


In [104]:
%%time

print("Begin training using Winnow algorithm")
numInput = 22
w = Winnow(numInput, 0) #rndSeed = 0
weights = w.TrainWeights(trainData)
print("Training complete")

Begin training using Winnow algorithm
Training complete
Wall time: 2.99 ms


In [106]:
trainAcc = w.Accuracy(trainData)
testAcc = w.Accuracy(testData)

print("Prediction accuracy on training data = " + str(trainAcc))
print("Prediction accuracy on test data = " + str(testAcc))

Prediction accuracy on training data = 0.99
Prediction accuracy on test data = 0.61


In [None]:
print("First few lines of all data are:")
ShowMatrix(data, 0, 4, True)

In [None]:
import random
from random import randint
import numpy as np

class Winnow:
    numInput = 0
    weights = []
    threshold = 0.0
    alpha = 0.0

In [None]:
print("Predicting diagnosis of patient with all abnormal readings: ", end='')
yays = [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]
predicted = w.ComputeY(yays)
if predicted == 0:
    print("normal")
else:
    print("abnormal")

print("Predicting diagnosis of patient with all normal readings: ", end='')
nays = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
predicted2 = w.ComputeY(nays)
if predicted2 == 0:
    print("normal")
else:
    print("abnormal")

We assign weights based on each observation. Some abnormal features are going to be very discriminative about overall diagnosis, others much less so. The discriminative ones should acquire higher weights. The “frivolous” ones much less.

The algorithm uses only incorrect information. Correct information (correct guess of a label) yields a `nop`. So the algorithm essentially drives the weights of irrelevant predictors to 0, *winnowing them out*. This makes Winnow classification effective in situations where many of the predictor variables are irrelevant (frivolous). It's a good way to figure out the important (discriminative) factors. Much like how regression forests told us what doctors should be looking at in correctly diagnosing breast cancer.

It is *unclear* how long to train for. We iterate just once through the training data. An alternative would be to 
Iterate multiple times through the training data, stopping after a fixed number of iterations, or when some desired accuracy has been reached.

Believe it or not, **Machine Learning** algorithms are not much different in methodology. Their modeling power is of course much higher due to the artificial neuron’s inherent non-linearities. But the methodology is very similar:

- Train model with feature observations in order to refine weights (synapse strength)
- Partition data set into 80% observations and 20% test

Once your weights are determined, you can reuse them for any other observation with similar features. Most of the Machine Learning research in the last 20 years was about how to update the weights and what kind of non-linear function to use as a neural transfer function.

# Homework

Reshuffle crazy-professor-shuffled cells so that the Winnow works again. You may want to create a copy of this notebook using the `File` | `Make a Copy...` menu above, so you can keep an original reference notebook.

How many layers and how many neurons in this fully connected network?

Modify the Winnow algorithm so that instead of dividing or multiplying by 2, the increase or decrease in the weights is **proportional to the error**. Does this improve accuracy on the same dataset? Can you improve accuracy by using random forest classification instead?