# Machine Learning
Machine learning enables computers to learn from data and make data-driven decisions, rather than being explicitly programmed to carry out a specific task.

![Model.png](attachment:Model.png)
<img src="https://github.com/mattnedrich/GradientDescentExample/blob/master/gradient_descent_example.gif" width="580">

The field of machine learning is commonly divided into three main types: supervised learning, unsupervised learning, and reinforcement learning.

![machine%20Learning%20Types.png](attachment:machine%20Learning%20Types.png)


Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:

$$Y = f(X)$$

Supervised learning problems can be further grouped into regression and classification problems.

### Classification: 
A classification problem is one where the output variable is a category, such as "yes" or "no", or "spam" or "not spam".

![classification.png](attachment:classification.png)

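```python
from sklearn import tree

# Original string form: features = [[140, "red"], [150, "red"], [170, "yellow"], [180, "yellow"]]
# Each sample is [weight, colour], with colour encoded as red = 1, yellow = 0
features = [[140, 1], [150, 1], [170, 0], [180, 0]]
# Original string form: labels = ["apple", "apple", "orange", "orange"]
# Labels encoded as apple = 1, orange = 0
labels = [1, 1, 0, 0]

# Train a decision tree on the four labelled examples
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# Classify a new fruit with weight 160 and colour red
print(clf.predict([[160, 1]]))  # output: [1], i.e. apple
```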
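The commented-out lists use colour and fruit names, but `DecisionTreeClassifier` expects numeric inputs, which is why the colours were replaced by 1 and 0. A small helper can make that mapping explicit. This is a sketch: the helper `encode_colours` and the dictionary `COLOUR_CODES` are hypothetical, and the red = 1, yellow = 0 codes simply mirror the numeric list above. Lower-casing first guards against inputs such as "Yellow" and "yellow" being treated as different categories.

```python
# Hypothetical mapping from colour name to the numeric code used in the classifier example.
COLOUR_CODES = {"red": 1, "yellow": 0}


def encode_colours(rows):
    """Hypothetical helper: turn [weight, colour_name] rows into [weight, colour_code] rows."""
    encoded = []
    for weight, colour in rows:
        # Strip whitespace and normalise case so "Yellow" and "yellow" get the same code.
        encoded.append([weight, COLOUR_CODES[colour.strip().lower()]])
    return encoded


raw_features = [[140, "red"], [150, "red"], [170, "Yellow"], [180, "yellow"]]
features = encode_colours(raw_features)
print(features)  # [[140, 1], [150, 1], [170, 0], [180, 0]]
```

An unknown colour raises a `KeyError`, which is arguably preferable to silently inventing a code for it.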


### Regression:
Regression predicts a continuous target variable Y. It allows you to estimate a value, such as a price or a weight, based on input data X.

High-level idea: given many examples of inputs and their outputs, we find an equation (a function) that maps the inputs to the outputs.

![linear%20regression.gif](attachment:linear%20regression.gif)

Let’s suppose we want to model the above set of points with a line. To do this we’ll use the standard line equation $y = mx + b$, where m is the line’s slope and b is its y-intercept. To find the best line for our data, we need to find the best values of the slope m and the y-intercept b.
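Before turning to the search-based approach, note that for a single input the best m and b can also be computed directly. The sketch below uses NumPy's `np.polyfit`, which solves the least-squares problem in closed form (a different technique from gradient descent); the points are made up and lie exactly on $y = 2x + 1$.

```python
import numpy as np

# Made-up points that lie exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fit a degree-1 polynomial; coefficients come back highest power first: [m, b].
m, b = np.polyfit(x, y, 1)
print(round(float(m), 6), round(float(b), 6))  # 2.0 1.0
```

Given a suitable learning rate and enough iterations, gradient descent on the same data should approach the same m and b, so a closed-form fit like this is a handy sanity check.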
A standard approach to solving this type of problem is to define an error function (also called a cost function) that measures how “good” a given line is. This function takes in an (m, b) pair and returns an error value based on how well the line fits our data. To compute this error for a given line, we iterate through each (x, y) point in our data set and sum the squared distances between each point’s y value and the candidate line’s y value (computed at mx + b). It’s conventional to square this distance so that it is always non-negative (points above and below the line cannot cancel each other out) and so that the error function is differentiable.
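To make the error function concrete, here is the calculation for two candidate lines on three made-up points. This sketch takes the mean of the squared distances rather than the plain sum; dividing by the number of points only rescales the error and does not change which line is best. The helper name `line_error` is hypothetical.

```python
import numpy as np

# Made-up data points, one (x, y) pair per row.
points = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])


def line_error(m, b, points):
    """Hypothetical helper: mean squared vertical distance between the points and y = m*x + b."""
    x, y = points[:, 0], points[:, 1]
    return np.mean((y - (m * x + b)) ** 2)


print(line_error(2.0, 0.0, points))  # every point is 1 above the line -> 1.0
print(line_error(2.0, 1.0, points))  # the line passes through every point -> 0.0
```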


![gradient_descent_example.gif](attachment:gradient_descent_example.gif)

### Cost Function
![linear_regression_error1.png](attachment:linear_regression_error1.png) 
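Assuming the figure shows the mean squared error (the quantity `costFunction` in the code below computes), the cost of a line with slope m and intercept b over N data points $(x_i, y_i)$ can be written as:

$$E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - (m x_i + b)\big)^2$$

In the code, `theta1` plays the role of m and `theta0` the role of b.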

![gradient_descent_error_surface.png](attachment:gradient_descent_error_surface.png)

Each point in this two-dimensional space represents a line. The height of the function at each point is the error value for that line. You can see that some lines yield smaller error values than others (i.e., fit our data better). When we run gradient descent search, we will start from some location on this surface and move downhill to find the line with the lowest error.
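The error surface can be approximated by brute force: evaluate the error at every point of a grid of (m, b) values and keep the lowest one. This is a grid search, not gradient descent, and it becomes impractical as the number of parameters grows, which is one reason to follow the gradient instead. The made-up points and the grid ranges are assumptions for illustration.

```python
import numpy as np

# Made-up points lying on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Each (m, b) pair on the grid is one candidate line; step size 0.1 in both directions.
m_values = np.linspace(-5, 5, 101)
b_values = np.linspace(-5, 5, 101)
M, B = np.meshgrid(m_values, b_values)  # M[i, j] = m_values[j], B[i, j] = b_values[i]

# Height of the surface at each grid point = mean squared error of that line.
# A trailing axis of length len(x) is broadcast in, then averaged away.
errors = np.mean((y - (M[..., None] * x + B[..., None])) ** 2, axis=-1)

# Locate the lowest point of the sampled surface.
i, j = np.unravel_index(np.argmin(errors), errors.shape)
print(M[i, j], B[i, j])  # close to m = 2, b = 1
```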

To run gradient descent on this error function, we first need to compute its gradient. The gradient points in the direction of steepest increase, so its negative acts like a compass that always points us downhill. To compute it, we need to differentiate our error function. Since our function is defined by two parameters (m and b), we need to compute a partial derivative for each. These derivatives work out to be:

![linear_regression_gradient1.png](attachment:linear_regression_gradient1.png)
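Assuming the figure shows the derivatives of the mean squared error $E(m, b) = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - (m x_i + b)\big)^2$, they follow from the chain rule: the derivative of each squared term with respect to m is $-2x_i\big(y_i - (m x_i + b)\big)$, and with respect to b it is $-2\big(y_i - (m x_i + b)\big)$. Averaging over the N points gives:

$$\frac{\partial E}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \big(y_i - (m x_i + b)\big), \qquad \frac{\partial E}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \big(y_i - (m x_i + b)\big)$$

Each gradient descent step then moves both parameters against these derivatives, with step size $\alpha$ (the `learningRate` variable in the code):

$$m \leftarrow m - \alpha \frac{\partial E}{\partial m}, \qquad b \leftarrow b - \alpha \frac{\partial E}{\partial b}$$

These are the quantities accumulated as `derivativetheta1` and `derivativetheta0` in `gradientDescent`.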

Each iteration updates m and b to a line that (with a suitable learning rate) yields slightly lower error than the previous iteration. The direction to move in for each iteration is calculated using the two partial derivatives from above. The learningRate variable controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum; if we take very small steps, it will take many iterations to arrive at the minimum.
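The effect of the learning rate can be seen by running the same number of steps with different step sizes. In this sketch the data, the 100-step budget and the three learning rates are made-up assumptions, and `run_gd` is a hypothetical helper. With these points the largest rate overshoots and the error grows instead of shrinking, while the smallest rate is still noticeably far from the minimum after the same number of steps.

```python
import numpy as np

# Made-up points lying on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])


def mse(m, b):
    """Mean squared error of the line y = m*x + b on the points above."""
    return np.mean((y - (m * x + b)) ** 2)


def run_gd(learning_rate, steps=100):
    """Hypothetical helper: batch gradient descent on the MSE from m = b = 0; returns the final error."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        residual = y - (m * x + b)
        grad_m = -2 * np.mean(x * residual)  # dE/dm
        grad_b = -2 * np.mean(residual)      # dE/db
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return mse(m, b)


print("start error:", mse(0.0, 0.0))  # 41.0
for lr in (0.2, 0.05, 0.001):
    print(lr, run_gd(lr))  # 0.2 diverges, 0.05 gets close, 0.001 is still far off
```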


This walkthrough introduces the gradient descent algorithm through an example. Gradient descent is an algorithm that minimizes functions, and the example shows how it can be used to solve machine learning problems such as linear regression, which is why linear regression was covered first.

We can also observe how the error changes as we move toward the minimum. A good way to ensure that gradient descent is working correctly is to make sure that the error decreases on every iteration. Below is a plot of error values for the first 100 iterations of the above gradient search.

![gradient_descent_error_by_iteration.png](attachment:gradient_descent_error_by_iteration.png)
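The check that the error drops on every iteration can be automated by recording the error after each step and testing that successive values decrease. A minimal sketch, assuming made-up noisy data around $y = 3x + 4$ and a hypothetical helper `error_history`; the learning rate 0.01 is an assumption chosen small enough to be stable for this data.

```python
import numpy as np

# Made-up data: points scattered around y = 3x + 4.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * x + 4 + rng.normal(0, 1, size=x.size)


def error_history(x, y, learning_rate=0.01, steps=200):
    """Hypothetical helper: run batch gradient descent from m = b = 0 and record the MSE after each step."""
    m, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        residual = y - (m * x + b)
        grad_m = -2 * np.mean(x * residual)  # dE/dm
        grad_b = -2 * np.mean(residual)      # dE/db
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
        history.append(np.mean((y - (m * x + b)) ** 2))
    return np.array(history)


history = error_history(x, y)
print(history[0], history[-1])
print("decreases every step:", bool(np.all(np.diff(history) < 0)))
```

The `history` array is also exactly what a plot of error by iteration would be drawn from.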

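```python
import numpy as np


def costFunction(theta0, theta1, data):
    """Mean squared error of the line y = theta1 * x + theta0 over all points in data."""
    totalError = 0.0
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        totalError += (y - (theta1 * x + theta0)) ** 2
    return totalError / float(len(data))


def gradientDescent(theta0, theta1, data, learningRate):
    """Take one gradient descent step and return the updated [theta0, theta1]."""
    N = float(len(data))
    derivativetheta0 = 0.0
    derivativetheta1 = 0.0
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        # Partial derivatives of the mean squared error w.r.t. theta0 (intercept) and theta1 (slope)
        derivativetheta0 += -(2 / N) * (y - (theta1 * x + theta0))
        derivativetheta1 += -(2 / N) * x * (y - (theta1 * x + theta0))
    # Move both parameters a small step against the gradient
    theta0 = theta0 - learningRate * derivativetheta0
    theta1 = theta1 - learningRate * derivativetheta1
    return [theta0, theta1]


def gradientDescentConverger(theta0, theta1, data, learningRate, iterations):
    """Repeat gradient descent steps for a fixed number of iterations."""
    for i in range(iterations):
        theta0, theta1 = gradientDescent(theta0, theta1, data, learningRate)
        # print(i, "error", costFunction(theta0, theta1, data), "theta0", theta0, "theta1", theta1)
    return [theta0, theta1]


def run():
    # Lineardata.csv: comma-separated rows with x in the first column and y in the second
    data = np.genfromtxt("Lineardata.csv", delimiter=",")
    learningRate = 0.0001
    theta0 = 0
    theta1 = 0
    iterations = 5000
    print("Initial error:", costFunction(theta0, theta1, data))
    theta0, theta1 = gradientDescentConverger(theta0, theta1, data, learningRate, iterations)
    print("After", iterations, "iterations: theta0 =", theta0, "theta1 =", theta1,
          "error =", costFunction(theta0, theta1, data))


if __name__ == '__main__':
    run()
```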
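`run()` expects a file named `Lineardata.csv` containing two comma-separated numeric columns (x first, y second) and no header row, since `genfromtxt` is called with its default settings. If that file isn't at hand, a synthetic stand-in can be written with NumPy. This is a sketch: the function `make_synthetic_csv` is hypothetical, and its slope, intercept, noise level and x range are made-up values, not the original data set.

```python
import numpy as np


def make_synthetic_csv(path, n=100, slope=1.5, intercept=10.0, noise=5.0, seed=0):
    """Hypothetical helper: write n noisy points near y = slope * x + intercept as 'x,y' rows, no header."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 100, size=n)
    y = slope * x + intercept + rng.normal(0, noise, size=n)
    # One row per point, two columns, comma-separated: the layout run() reads back.
    np.savetxt(path, np.column_stack([x, y]), delimiter=",")


# Usage (writes into the current directory):
# make_synthetic_csv("Lineardata.csv")
# data = np.genfromtxt("Lineardata.csv", delimiter=",")
# data.shape -> (100, 2)
```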