<div style="text-align: right">INFO 6105 Data Science Eng Methods and Tools, Lecture 10</div>
<div style="text-align: right">Dino Konstantopoulos, 8 April 2019</div>

# Review: Regression Analysis

You had your humble beginnings in data science and machine learning in INFO 6105, with a professor who loves bikes :-) I need to make sure you know the *basics* before you move on to a co-op or a more advanced class, so that employers and professors are impressed by *how much you know*. So let's do a bit of review.

**Regression analysis** is a **statistical process** for estimating relationships among variables.

It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a **dependent variable** and one or more **independent variables** (or 'predictors').

More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.

**Linear Regression** is a simple approach for modeling the relationship between a scalar dependent variable $y$ and one or more explanatory (independent) variables $x$, where the data are modeled using linear predictor functions.
Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as maximum likelihood estimation, where we choose the parameters that maximize the probability of observing the data, or Bayesian modeling, where we infer a probability distribution over the parameters given the data.
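To make "least squares" concrete, here is a minimal sketch of the one-predictor case in plain Python: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept makes the fitted line pass through the point of means. The function name `fit_simple_ols` and the data points are made up for illustration.

```python
# Closed-form ordinary least squares for one predictor: y ≈ b0 + b1 * x
def fit_simple_ols(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept forces the fitted line through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # made-up points lying roughly on y = 1 + 2x
b0, b1 = fit_simple_ols(xs, ys)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # intercept = 1.09, slope = 1.97
```

The fitted line is close to the $y = 1 + 2x$ the points were made up around, which is exactly what minimizing the sum of squared residuals should give.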

The decision as to which variable in a data set is modeled as the dependent variable and which are modeled as the independent variables may be based on a presumption that the value of one of the variables is caused by, or directly influenced by, the other variables. That decision is the most important one in regression analysis. A decision tree or decision forest can help you judge which independent variables carry the most information about the dependent one, using information-theoretic metrics like **Information Gain**.
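Here is a small sketch of **Information Gain**, the quantity a decision tree uses to pick which variable to split on: the entropy of the labels minus the weighted entropy of the labels after splitting on a feature. The helper names and the four-row toy data set are hypothetical, chosen so that one feature separates the classes perfectly and the other tells us nothing.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    n = len(labels)
    split_entropy = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

labels = ["D", "D", "R", "R"]
informative = [1, 1, 0, 0]  # perfectly separates the two parties
useless = [1, 0, 1, 0]      # tells us nothing about the party
print(information_gain(informative, labels))  # 1.0
print(information_gain(useless, labels))      # 0.0
```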

There are two broad kinds of models: **generative** and **discriminative**. A generative model learns to model the entire dataset; a discriminative model only learns to separate the data into classes. In the picture below, we classify data into two color classes by shaping the right **decision boundary**. In a probabilistic regression model, we shape the parameters using a framework like PyMC3. In a neural network, we adjust the weights between the neurons to get the right decision boundary.

In the beginning of the semester, I told you that Machine Learning is a *geometric problem*. Now you see why. On Wednesday, you'll understand this *even better*. And you're never going to be scared by a machine again :-)

<br />
<center>
<img src="ipynb.images/decision-boundary.png" width=800 />
</center>

If the neural transfer function is linear, the ANN can only draw straight decision boundaries (lines, planes, hyperplanes), even if it has many layers of units, so it cannot separate classes that no straight boundary can split. It is the **non-linearity** of the neural transfer function that adds modeling power to your ANN. And yet, as we will see today, linear decision boundaries can still model some data sets well.
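Why can't stacking linear layers help? With a linear transfer function each layer is just a matrix multiplication, and $B(A\mathbf{x}) = (BA)\mathbf{x}$, so two layers collapse into a single layer with weights $BA$. The sketch below checks this numerically in plain Python; the matrices and input are arbitrary, and biases are left out for simplicity (with biases, the composition is still a single affine map).

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(P, Q):
    """Multiply matrices P and Q; zip(*Q) iterates over the columns of Q."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

A = [[1, 2], [3, 4]]   # first-layer weights (arbitrary)
B = [[0, 1], [-1, 2]]  # second-layer weights (arbitrary)
x = [5, -3]

two_layers = matvec(B, matvec(A, x))   # pass x through both linear layers
one_layer = matvec(matmul(B, A), x)    # a single layer with weights C = B A
print(two_layers, one_layer)           # [3, 7] [3, 7]
```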

# The winnow algorithm


<br />
<center>
<img src="ipynb.images/winnow.png" width=400 />
</center>

*Winnowing* means **removing unwanted items**. As an algorithm, its purpose is to train a binary classifier on binary features, using a *linear* decision boundary.

In other words, the goal is to predict one of two states, using a collection of features which are all binary.

Our networks so far have been equally weighted. In artificial neural networks, however, the edges between nodes acquire weights. Think of the Facebook friends graph: Facebook treats all our friends alike, but in reality we like some friends a lot more than others!

The prediction model assigns weights to each feature. To predict the state of an observation, it checks all the features that are “active” (true, or detected in an observation) and sums up the weights assigned to these features. If the total is *above* a certain threshold, the result is true, otherwise it’s false. 
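A minimal sketch of this prediction step in plain Python. The function name `winnow_predict` is hypothetical, and the threshold is left as a parameter because the notes do not fix a value for it.

```python
def winnow_predict(weights, x, threshold):
    """Return 1 if the summed weights of active (x_i == 1) features exceed threshold, else 0."""
    total = sum(w for w, xi in zip(weights, x) if xi == 1)
    return 1 if total > threshold else 0

weights = [1.0, 1.0, 1.0, 1.0]
print(winnow_predict(weights, [1, 1, 1, 0], threshold=2))  # total 3 > 2, so 1
print(winnow_predict(weights, [1, 0, 0, 0], threshold=2))  # total 1 is not > 2, so 0
```

Only active features contribute: a 0 in the observation simply drops that feature's weight from the sum.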

So we create a network (I know you can do this very well now), and initialize weights $w_1 = w_2 = \cdots = w_n = 1$.

Then we iterate over the observations, each a vector of dimension $n$: $\mathbf{x} = [x_1, x_2, \cdots, x_n]$.

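For each observation (one **iteration**; a full pass over all observations is an **epoch**), we predict: the output is 1 if $\sum_{i=1}^{n} w_i x_i > \theta$, where $\theta$ is the threshold, and 0 otherwise.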

Then we get the **true** (binary) label for that observation, and we update the weights **only if we made a mistake**:
- **False-negative** error (we predict 0 whereas the label is really 1): for each $x_i = 1$, set $w_i \leftarrow 2 w_i$.
- **False-positive** error (we predict 1 whereas the label is really 0): for each $x_i = 1$, set $w_i \leftarrow w_i / 2$.
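The two update rules can be written as one small function. This is a sketch, not the crazy professor's class: `winnow_update` is a hypothetical name, and `alpha=2.0` is the doubling/halving factor from the rules above.

```python
def winnow_update(weights, x, predicted, label, alpha=2.0):
    """Apply one Winnow update; weights change only when predicted != label."""
    if predicted == label:
        return list(weights)              # correct prediction: no change
    if predicted == 0 and label == 1:     # false negative: promote active weights
        factor = alpha
    else:                                 # false positive: demote active weights
        factor = 1 / alpha
    return [w * factor if xi == 1 else w for w, xi in zip(weights, x)]

weights = [1.0, 1.0, 1.0]
x = [1, 0, 1]
print(winnow_update(weights, x, predicted=0, label=1))  # [2.0, 1.0, 2.0]
print(winnow_update(weights, x, predicted=1, label=0))  # [0.5, 1.0, 0.5]
```

Note that the inactive feature ($x_2 = 0$) keeps its weight in both cases: it played no part in the mistake.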

Here is the math above in plain English:
If our network predicts true but should predict false, it is **over-shooting**, so weights that were used in the prediction (i.e. the weights attached to active features) should be reduced.
Conversely, if the prediction is false but the correct result should be true, the active features are not used enough to reach the threshold, so weights should be bumped up.

Our goal is to minimize the number of mistakes. When we are down to the minimum we can achieve, we say the algorithm has **converged**.
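Putting prediction and updates together gives a training loop that keeps making passes (epochs) over the data until a pass produces no mistakes, or until a maximum number of epochs is reached. A minimal sketch, assuming a threshold equal to the number of features $n$ (our choice; the notes do not specify one) and a made-up target in which the label is 1 exactly when feature 0 or feature 2 is active. `train_winnow` and `max_epochs` are hypothetical names.

```python
def train_winnow(X, y, threshold, alpha=2.0, max_epochs=100):
    """Train Winnow on binary vectors X with 0/1 labels y.

    Stops early once a full epoch (one pass over the data) produces no mistakes.
    Returns the weights and the number of mistakes in the last epoch.
    """
    n = len(X[0])
    weights = [1.0] * n                   # w_1 = ... = w_n = 1
    mistakes = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            total = sum(w for w, xi in zip(weights, x) if xi == 1)
            predicted = 1 if total > threshold else 0
            if predicted != label:
                mistakes += 1
                # label 1 means a false negative (promote), label 0 a false positive (demote)
                factor = alpha if label == 1 else 1 / alpha
                weights = [w * factor if xi == 1 else w for w, xi in zip(weights, x)]
        if mistakes == 0:                 # converged on the training data
            break
    return weights, mistakes

# Toy target (made up): label is 1 exactly when feature 0 OR feature 2 is active.
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
y = [1 if (x[0] or x[2]) else 0 for x in X]
weights, mistakes = train_winnow(X, y, threshold=len(X[0]))
print(weights, mistakes)  # [8.0, 2.0, 8.0, 2.0] 0
```

On this toy data the loop converges after three epochs, and the two relevant features end up with the largest weights. On real data the mistakes may never reach zero, which is why `max_epochs` caps the loop.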


# Dataset: 1984 Congress

The Machine Learning Repository at the University of California, Irvine (UCI), has some great data sets. [Here](https://archive.ics.uci.edu/ml/datasets/congressional+voting+records) are the congressional voting records of the House of Representatives for a select set of bills in the 1984 Congress.


<br />
<center>
<img src="ipynb.images/congress.jpg" width=600 />
</center>

Our goal is to predict the political party, Democrat or Republican, of a member of the U.S. House of Representatives, based on the Representative’s votes on 16 different bills. Examples of these bills include aid to the Nicaraguan contras, a freeze on physician fees, and immigration.

The House of Representatives has 435 members. A well-known benchmark data set contains 435 items stored in a simple text [file](https://archive.ics.uci.edu/ml/machine-learning-databases/voting-records/).
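Each line of the UCI data file lists the party first, followed by the 16 votes coded `y`, `n`, or `?`. Here is a sketch of encoding one line with the same scheme the notebook's output reports ('n' and '?' become 0, 'y' becomes 1, democrat is 0, republican is 1, party moved to the last column). The function name `encode_row` is hypothetical, and the sample line is typed in by hand rather than read from the file.

```python
def encode_row(line):
    """Encode one comma-separated voting record as 16 binary votes + party label.

    'y' -> 1, 'n' and '?' -> 0; democrat -> 0, republican -> 1 (last column).
    """
    fields = line.strip().split(",")
    party, votes = fields[0], fields[1:]
    encoded_votes = [1 if v == "y" else 0 for v in votes]
    label = 1 if party == "republican" else 0
    return encoded_votes + [label]

# A line in the file's format: party first, then 16 votes
sample = "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y"
print(encode_row(sample))
# [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
```

Treating `?` (no recorded vote) as 0 is a modeling choice: it lumps abstentions in with `n` votes.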

You told me last week that you wanted to do some coding. So this is the **simplest** possible artificial neural network I can come up with that has a chance of learning a dataset. It's so simple that it does *not* have a non-linear transfer function in its neuron: it lets the *entire* signal pass through! Do you think that can work as an artificial brain?

Let's see.

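We should strive for a test accuracy of 70% or above. Anything near 50% is junk: it is no better than a coin flip! Note also that democrats make up about 61% of this data set (267 of 435), so always guessing "democrat" already scores about 61%. If you fall short, rerun your training (which shuffles the data), or try a different random number generator seed. Could adding another layer help? Which of these knobs are hyperparameters?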
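A sketch of two things worth having while you tune: a seeded 80/20 shuffle-and-split (the 80/20 ratio matches the notebook's output), and the majority-class baseline, i.e. the accuracy of always guessing the most common party. `train_test_split` here is our own hypothetical helper, not scikit-learn's function of the same name, and the label list simply reproduces the data set's 267/168 party split.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows with a fixed seed, then split into train/test."""
    rng = random.Random(seed)       # a different seed gives a different shuffle
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def majority_baseline(labels):
    """Accuracy of always predicting the most common 0/1 label."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)

# Party labels for the full data set: 267 democrats (0), 168 republicans (1)
labels = [0] * 267 + [1] * 168
train, test = train_test_split(labels)
print(len(train), len(test))                # 348 87
print(round(majority_baseline(labels), 3))  # 0.614
```

A test accuracy near 0.61 therefore means the classifier has learned little beyond the class imbalance.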

### These are the questions we want to be able to answer:

Based on the Representatives' voting records on those bills, and knowledge of their party affiliation (the label), can we guess whether a person is a republican or a democrat from how they vote (`yea` or `nay`) on these bills? If a person voted `nay` on every bill, or `yea` on every bill, would they more likely be a democrat or a republican?

Is there a bill that is more important than others in determining whether a congressman is republican or democrat? Can you figure this out just by looking at the weights of your winnow network?
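One way to read the weights, sketched below with hypothetical numbers: sort the bills by weight. Keep in mind that Winnow weights are always positive and only `y` votes ($x_i = 1$) contribute to the sum, so a large weight means a `y` vote on that bill pushes the prediction toward label 1 (republican, under the notebook's encoding). A bill whose `y` votes signal *democrat* can only show up as a small weight. The bill names are UCI attribute names; the weights are invented for illustration.

```python
def rank_features(weights, names):
    """Return (name, weight) pairs sorted from largest to smallest weight."""
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)

# Hypothetical weights for three bills, for illustration only
names = ["el-salvador-aid", "immigration", "physician-fee-freeze"]
weights = [2.0, 0.5, 8.0]
for name, weight in rank_features(weights, names):
    print(f"{name:>22s}  {weight}")
```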

Finally, compare the performance of your winnow versus the performance of a random forest algorithm.

### winnow algorithm:

## Crazy professor

<br />
<center>
<img src="ipynb.images/crazy.jpg" width=400 />
</center>

Oh no! Crazy professor tried to add a new class for the Winnow algorithm by editing the .ipynb files manually, and he completely **$&!&~~wed up the cells**! Based on what I told you above, can you reconstruct the cells so that the notebook works?
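```python
class Winnow:
    # TODO: reconstruct the Winnow classifier here, including the methods
    # the cells below call (for example ComputeY and Accuracy).
    pass
```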

In [2]:
```python
trainAcc = w.Accuracy(trainData)
testAcc = w.Accuracy(testData)

print("Prediction accuracy on training data = " + str(trainAcc))
print("Prediction accuracy on test data = " + str(testAcc))
```

```
NameError: name 'w' is not defined
```

In [3]:
```python
print("Final model weights are:")
ShowVector(weights, 4, 8, True)
```

```
Final model weights are:

NameError: name 'ShowVector' is not defined
```

In [None]:
```python
print("Predicting party of Representative with all 'yes' votes: ", end='')
yays = [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]
predicted = w.ComputeY(yays)
if predicted == 0:
    print("democrat")
else:
    print("republican")

print("Predicting party of Representative with all 'no' votes: ", end='')
nays = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
predicted2 = w.ComputeY(nays)
if predicted2 == 0:
    print("democrat")
else:
    print("republican")
```

In [None]:
```python
print("Encoding 'n' and '?' = 0, 'y' = 1, 'democrat' = 0, 'republican' = 1")
print("Moving political party to last column")
print("First few rows of training data are:")
ShowMatrix(trainData, 0, 3, True)
```

```
First few lines of all data are:
[00]   0 1 0 1 1 1 0 0 0 1 0 1 1 1 0 1 1
[01]   0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 0 1
[02]   0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 0 0
[03]   0 1 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0
[99]   0 0 0 1 1 1 0 0 0 1 0 1 1 1 0 0 1

Splitting data into 80% train and 20% test matrices

First few rows of testing data are:
[00]   0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 1 1
[01]   1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 1 0
[02]   0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1
[99]   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Begin training using Winnow algorithm
Training complete
Wall time: 2.99 ms
```


In [106]:
```python
trainAcc = w.Accuracy(trainData)
testAcc = w.Accuracy(testData)

print("Prediction accuracy on training data = " + str(trainAcc))
print("Prediction accuracy on test data = " + str(testAcc))
```

```
Prediction accuracy on training data = 0.99
Prediction accuracy on test data = 0.61
```


In [None]:
```python
print("First few lines of all data are:")
ShowMatrix(data, 0, 4, True)
```