<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Machine-Learning" data-toc-modified-id="Machine-Learning-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Machine Learning</a></span></li><li><span><a href="#Introduction--" data-toc-modified-id="Introduction---2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Introduction <a id="intro"> </a></a></span></li><li><span><a href="#Supervised-Learning-" data-toc-modified-id="Supervised-Learning--3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Supervised Learning <a id="s_learn"></a></a></span><ul class="toc-item"><li><span><a href="#Linear-Regression-" data-toc-modified-id="Linear-Regression--3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Linear Regression <a id="linear"></a></a></span><ul class="toc-item"><li><span><a href="#Gathering-Data" data-toc-modified-id="Gathering-Data-3.1.1"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Gathering Data</a></span></li><li><span><a href="#Data-Preparation" data-toc-modified-id="Data-Preparation-3.1.2"><span class="toc-item-num">3.1.2&nbsp;&nbsp;</span>Data Preparation</a></span></li><li><span><a href="#Choosing-a-Model" data-toc-modified-id="Choosing-a-Model-3.1.3"><span class="toc-item-num">3.1.3&nbsp;&nbsp;</span>Choosing a Model</a></span></li><li><span><a href="#Training" data-toc-modified-id="Training-3.1.4"><span class="toc-item-num">3.1.4&nbsp;&nbsp;</span>Training</a></span></li><li><span><a href="#Evaluation" data-toc-modified-id="Evaluation-3.1.5"><span class="toc-item-num">3.1.5&nbsp;&nbsp;</span>Evaluation</a></span></li></ul></li></ul></li></ul></div>

# Machine Learning

A brief introduction to machine learning.

# Introduction <a id=intro> </a>

Algorithms are the basis of solving problems with a computer. We need a set of directions for the computer to run in order to turn input into output. For example, we can develop an algorithm to trim observations
with values of NA. We would ask the computer to take a dataset as input, look at the rows of a certain column, and remove every row in which that column is NA.
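
The NA-trimming algorithm described above can be sketched in a few lines of R. This is only an illustration: the helper name `trim_na` and the toy dataset are assumptions made for this example and do not come from the original text.

```r
# Hypothetical helper (an assumption for this sketch): keep only the rows
# whose value in `column` is not NA.
trim_na <- function(df, column) {
  df[!is.na(df[[column]]), , drop = FALSE]
}

# Toy dataset with two missing scores (made-up values)
toy <- data.frame(id = 1:4, score = c(10, NA, 7, NA))

# Input: the dataset and a column name. Output: the dataset without NA rows.
trimmed <- trim_na(toy, "score")
print(trimmed)
```

The input is a dataset plus a column name, and the output is the same dataset with the incomplete observations dropped, which is the input-to-output mapping the paragraph describes.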

For some problems an algorithm may not exist, or may be very difficult to create. In other words, we are unable to transform the input into an output easily. Fortunately, if we have a large dataset we can use the computer (machine) to extract the algorithm for us. We do not need to know the data generating process, but we can use patterns in the data to provide a useful approximation.

This approximation can later be used to make predictions, assuming that the pattern would continue on unobserved data.

# Supervised Learning <a id=s_learn></a>

Using inputs (independent variables) to predict the values of the outputs (dependent variable) is called **supervised learning**.

Our dependent variable can be categorical (qualitative) or numerical (continuous or discrete). In machine learning (ML) terminology, **regression** refers to predicting quantitative outputs and **classification** to predicting qualitative outputs.

## Linear Regression <a id=linear></a>
The linear model is the simplest form of predicting values. Here we are stating that the independent variables
affect the dependent variable linearly.
\begin{equation*}
\hat{Y} = \hat{\beta_0} + \sum^{J}_{j=1} x_j \hat{\beta_j}=X^T\hat{\beta}
\end{equation*}

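The intercept is known as the bias in ML, while the other betas are the coefficients of the corresponding independent variables. Each beta is the partial derivative of the prediction with respect to its independent variable: the change in the predicted Y for a one-unit change in that variable, holding the others fixed.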
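
As a small numerical check of the equation above, the sketch below computes the prediction both as a sum and as a matrix product. The coefficient values and the two observations are made up for illustration. Note that with one observation per row, the matrix form is written `X %*% beta_hat`, and the leading column of ones carries the intercept.

```r
# Hypothetical coefficients: intercept (bias), beta_1, beta_2
beta_hat <- c(0.1, 0.5, 0.4)

# Two made-up observations; the leading column of 1s multiplies the intercept
X <- cbind(1, c(0.2, 0.6), c(0.3, 0.9))

# Matrix form: one prediction per row of X
y_hat_matrix <- as.vector(X %*% beta_hat)

# Summation form: intercept plus each x_j times its coefficient
y_hat_sum <- beta_hat[1] + X[, 2] * beta_hat[2] + X[, 3] * beta_hat[3]

# Both forms should agree up to floating-point error
print(all.equal(y_hat_matrix, y_hat_sum))
```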

Let's begin our machine learning process:

### Gathering Data

We begin by setting a seed; the number inside the parentheses is arbitrary. We set a seed so that
we obtain the same random draws whenever we rerun the code, and thus the same results.
We will sample one thousand observations from uniform distributions with different ranges. Then we will
produce an output. Observe that the intercept of our linear model is set to zero, and that we have defined our betas.
In essence this is the population linear model, the actual data generating process.


In [1]:
#------------------------------------------------
# Gather the Data
#------------------------------------------------
set.seed(501)

n=1000
x1<-runif(n,min=0,max=.7)
x2<-runif(n,min=0,max=1)
y<-0.5*x1+0.4*x2 

data<-as.data.frame(cbind(y,x1,x2))
summary(data)

       y                  x1                  x2          
 Min.   :0.009879   Min.   :0.0009545   Min.   :0.002781  
 1st Qu.:0.272609   1st Qu.:0.1806411   1st Qu.:0.244862  
 Median :0.390394   Median :0.3727318   Median :0.508171  
 Mean   :0.383089   Mean   :0.3614634   Mean   :0.505893  
 3rd Qu.:0.495837   3rd Qu.:0.5339685   3rd Qu.:0.762016  
 Max.   :0.737228   Max.   :0.6996580   Max.   :0.998414  

The output values range from about 0.01 to 0.74, with a mean of about 0.38. One thing to notice is
that the independent variables are on similar scales by construction (x1 lies in [0, 0.7] and x2 in [0, 1]).
In other datasets we may have to normalize the data so that the magnitude of one variable does not
overpower the others and distort the predictions.
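
One common way to normalize is min-max rescaling, which maps every variable to the interval [0, 1]. The sketch below is an illustration only: the helper `min_max` and the `income` and `age` values are assumptions for this example, not part of the dataset used in this notebook.

```r
# Hypothetical helper: rescale a numeric vector to the interval [0, 1].
# Assumes the vector is not constant (otherwise max(v) - min(v) is zero).
min_max <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

# Made-up variables measured on very different scales
income <- c(20000, 35000, 50000, 120000)
age <- c(23, 31, 47, 65)

# After rescaling, both variables span the same range
scaled <- data.frame(income = min_max(income), age = min_max(age))
print(scaled)
```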

### Data Preparation

Let's create a rule for our output. From the summary of the data we saw that the mean of the output
was about 0.4, so we will classify the output as follows: values greater than 0.4 are red, otherwise they
are blue. Since we are using a linear model, we will code the output as a 0-1 dummy variable, with 1 = blue and 0 = red.

Using a threshold near the mean gives us a roughly even number of reds and blues, which again helps to avoid
having one of the classes outweigh the other.

In [4]:

#------------------------------------------------
# Prepare Data
#------------------------------------------------
Recode<- function(y)
{
  #values above 0.4 are red (0), everything else is blue (1)
  if(y>.4)
  {0}
  
  else{1}
}
#apply function above to each observation of the original output
data$y<-sapply(y,Recode)

#create a frequency table and rename the columns
Freqtable<-data.frame(table(data$y))
colnames(Freqtable)<-c("Class","Count")
#print (class 0 = red, class 1 = blue)
Freqtable

Class,Count
0,470
1,530
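
The same rule can also be written without an element-by-element function call. The sketch below regenerates the output with the seed and ranges from the Gathering Data step and compares an element-wise recode with a vectorized `ifelse`. The names `recode_loop` and `recode_vec` are introduced here for illustration only.

```r
# Regenerate the continuous output as in the Gathering Data step
set.seed(501)
x1 <- runif(1000, min = 0, max = 0.7)
x2 <- runif(1000, min = 0, max = 1)
y <- 0.5 * x1 + 0.4 * x2

# Element-wise version of the rule: above 0.4 -> 0 (red), otherwise 1 (blue)
recode_loop <- sapply(y, function(v) if (v > 0.4) 0 else 1)

# Vectorized version of the same rule
recode_vec <- ifelse(y > 0.4, 0, 1)

# Both versions should produce the same classes
print(all(recode_loop == recode_vec))
print(table(recode_vec))
```

`ifelse` evaluates the condition on the whole vector at once, which tends to be shorter to write and faster than calling a function once per observation.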


### Choosing a Model
Since we know what the actual model is, we will choose a linear model. Our initial guess for our model will
be:

\begin{equation*}
Y=0.5X_1+0.4X_2
\end{equation*}

### Training 
<a id="train"></a>
Given the model specified above, we apply it to our training data to obtain the predicted output. This is
simple: multiply the column of each independent variable by its coefficient, add the results, and then recode the output values as
we did earlier. Try it out for yourself before seeing the code in the next section.

### Evaluation

This is where the bulk of the ML starts. We need to evaluate how well our initial model did against the
training data. We are able to evaluate the performance because we know the actual values of Y.
In order to create an ML process, we need to update our parameters given some loss value. Here the loss
is the number of outputs we predicted incorrectly. Below I will write some ad hoc
pseudocode:

<div class="alert alert-block alert-warning">
<b>Example:</b> 
    function(InitialValues){ <br>
parameters = InitialValues <br>
repeat { <br>
LossValue = RunModel(parameters) <br>
if (LossValue lt epsilon) {return Y} <br>
else parameters = updateValues(parameters) <br>
} }
</div>
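
The pseudocode above can be turned into a small runnable R sketch. Several things here are assumptions made only for illustration: the update rule (a random perturbation that is kept only when it lowers the loss), the iteration cap `max_iter`, the step size, and the smaller simulated sample. It is not the update rule developed later in this notebook.

```r
set.seed(1)                      # arbitrary seed for this sketch
x1 <- runif(200, min = 0, max = 0.7)
x2 <- runif(200, min = 0, max = 1)
y_true <- ifelse(0.5 * x1 + 0.4 * x2 > 0.4, 0, 1)

# Loss: the number of observations classified incorrectly
loss_fn <- function(b) {
  y_hat <- ifelse(b[1] * x1 + b[2] * x2 > 0.4, 0, 1)
  sum(y_hat != y_true)
}

# Assumed update rule: try a random nearby candidate, keep it if the loss drops
fit <- function(initial, epsilon = 1, max_iter = 500, step = 0.1) {
  params <- initial
  loss <- loss_fn(params)
  for (i in seq_len(max_iter)) {
    if (loss < epsilon) break    # stop once the loss is below the threshold
    candidate <- params + runif(2, min = -step, max = step)
    cand_loss <- loss_fn(candidate)
    if (cand_loss < loss) {
      params <- candidate
      loss <- cand_loss
    }
  }
  list(params = params, loss = loss)
}

start <- c(0.0, 0.8)
result <- fit(start)
print(result)
```

The cap on iterations guarantees that the function returns even if the loss never drops below `epsilon`.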

If we were to run this code it would give us the best predictive values, but it may take a very long time
to get the LossValue below some threshold. Instead we will fix in advance the number of times we want
the iteration to run.

First we will see how well our first guess, the initial values `intb1` and `intb2` in the code, does in our linear model. With our initial guess
we have predicted the correct values 77% of the time. Now let's build our ML model to iterate over different
values of the coefficients and find the one with the best accuracy.

You will note that I have defined the size of the Accuracy and ypred objects prior to the loop. This
saves time because the objects do not have to change size as the loop continues.

In [6]:
#-------------------------
#Training
#-------------------------
n=4
intb1<-0.0
intb2<-0.8
Accuracy<-rep(0,n)
ypred<-as.data.frame(matrix(0,nrow=nrow(data),ncol=n))

b1<-intb1
b2<-intb2

#linear prediction from the initial guess
ypred[,1]<-b1*data$x1+b2*data$x2
#recode the predictions to a binary classification
ypred[,1]<-sapply(ypred[,1],Recode)
data$ypred<-ypred[,1]
#share of observations classified correctly
Accuracy[1]<-mean(data$y==ypred[,1])
sprintf("The Accuracy is: %g percent",Accuracy[1]*100)

Next we will build our ML model. Let's create a vector of the betas, called B, and two vectors of values
that we will call A and C. The vectors A and C will be used to update the values of B. The values in A and
C were chosen arbitrarily. Since we know that the values of the original Y's before the transformation are
below 1, the change in our betas should not be large and should also be below 1. We therefore work with two
assumptions: to increase the accuracy, either both betas increase, or one increases while the other
decreases.

Now for our function: it takes our initial guesses as inputs and then transforms them, starting with assumption
A (again an arbitrary choice). We begin our loop and calculate our new prediction and its accuracy<a href="#ref1">[1]</a>. If
the new accuracy is larger than the previous accuracy from our training section, we
update using assumption A. Once the loop is done, we check which entry in our accuracy vector is the
highest, and we return that value as well as the corresponding ypred vector. Finally we run the
code and view the data side by side, with our true Y values and our first prediction.
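
A minimal sketch of such a function is shown below, under loudly stated assumptions. The step values in `A` and `C`, the choice to fall back to `C` when the accuracy does not improve, and the names `search_betas` and `recode` are all hypothetical and chosen only to illustrate the procedure described above. The data are regenerated so that the block is self-contained.

```r
set.seed(501)
n_obs <- 1000
x1 <- runif(n_obs, min = 0, max = 0.7)
x2 <- runif(n_obs, min = 0, max = 1)
recode <- function(v) ifelse(v > 0.4, 0, 1)   # 0 = red, 1 = blue
y <- recode(0.5 * x1 + 0.4 * x2)              # true classes

# Hypothetical search: B holds the starting betas, A and C are the two steps
search_betas <- function(B, A, C, n_iter) {
  # Preallocate so nothing grows inside the loop
  accuracy <- numeric(n_iter)
  betas <- matrix(0, nrow = n_iter, ncol = 2)
  ypred <- matrix(0, nrow = n_obs, ncol = n_iter)

  # Column 1 holds the prediction from the initial guess
  betas[1, ] <- B
  ypred[, 1] <- recode(B[1] * x1 + B[2] * x2)
  accuracy[1] <- mean(ypred[, 1] == y)

  step <- A                                   # start with assumption A
  for (i in 2:n_iter) {
    betas[i, ] <- betas[i - 1, ] + step
    ypred[, i] <- recode(betas[i, 1] * x1 + betas[i, 2] * x2)
    accuracy[i] <- mean(ypred[, i] == y)
    # Assumed rule: keep A after an improvement, otherwise switch to C
    step <- if (accuracy[i] > accuracy[i - 1]) A else C
  }

  best <- which.max(accuracy)
  list(accuracy = accuracy[best], betas = betas[best, ], ypred = ypred[, best])
}

result <- search_betas(B = c(0.0, 0.8), A = c(0.1, 0.1), C = c(0.1, -0.1), n_iter = 4)
print(result$accuracy)
print(result$betas)
```

Because the first column is part of the search, the returned accuracy can never be lower than that of the initial guess.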

<a id="ref1">[1]</a> Our for loop starts from i=2 because the first column of ypred is the prediction from the <a href="#train">Training</a> section.