# Machine Learning
Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed to carry out a certain task.

![title](images/model.png)

The field of machine learning consists of supervised learning, unsupervised learning, and reinforcement learning.

![title](images/IRL1.png)


Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:

Y = f(X)

Supervised learning problems can be further grouped into regression and classification problems.

### Classification
A classification problem is one where the output variable is a category, such as "yes" or "no", or "spam" or "not spam".

![title](images/download.png)

In [None]:
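from sklearn import tree

# Each sample is [numeric measurement, color code], where the color is
# encoded as 1 = red and 0 = yellow, e.g. [140, 1] stands for [140, "red"].
features = [[140, 1], [150, 1], [170, 0], [180, 0]]

# Labels are encoded as 1 = apple and 0 = orange.
labels = [1, 1, 0, 0]

# Fit a decision tree on the four samples, then classify a new one.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 1]]))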

### Regression
Regression predicts a continuous target variable Y. It allows you to estimate a value, such as a price or a weight, based on input data X.

High-level idea: given many examples of inputs and outputs, we find an equation (a function) that maps the inputs to the outputs.
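As a rough illustration of this idea, the sketch below fits a straight line to a handful of points with NumPy's `np.polyfit`. The data values and the `predict` helper are made up for the example and are not taken from this notebook's dataset.

```python
import numpy as np

# Hypothetical inputs and outputs that roughly follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A degree-1 polynomial fit returns the slope and intercept of the best-fit line.
slope, intercept = np.polyfit(x, y, 1)

def predict(x_new):
    """Estimate the output for a new input using the fitted line."""
    return slope * x_new + intercept

print("slope:", slope, "intercept:", intercept)
print("prediction at x = 5:", predict(5.0))
```

The fitted function can then be applied to inputs that were not part of the original examples, which is the point of learning the mapping.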

![title](images/linear regression.gif)

Let’s suppose we want to model the above set of points with a line. To do this we’ll use the standard y = mx + b line equation where m is the line’s slope and b is the line’s y-intercept. To find the best line for our data, we need to find the best set of slope m and y-intercept b values.
A standard approach to solving this type of problem is to define an error function (also called a cost function) that measures how “good” a given line is. This function will take in a (m,b) pair and return an error value based on how well the line fits our data. To compute this error for a given line, we’ll iterate through each (x,y) point in our data set and sum the squared distances between each point’s y value and the candidate line’s y value (computed as mx + b). The code in this notebook also divides this sum by the number of points, which gives the mean squared error. It’s conventional to square this distance to ensure that it is positive and to make our error function differentiable.


![title](images/gradient_descent_example.gif)

### Cost Function
![title](images/CostFunction.PNG)
![title](images/CostFunction1.PNG)
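Written out, the error function described above (sum the squared distances, then divide by the number of points N, as the `costFunction` code in this notebook does) would be the following. The symbols N, x_i, and y_i are notation assumed here for the number of points and the individual data points:

$$
E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2
$$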

![title](images/gradient_descent_error_surface.png)

Each point in this two-dimensional space represents a line. The height of the function at each point is the error value for that line. You can see that some lines yield smaller error values than others (i.e., fit our data better). When we run gradient descent search, we will start from some location on this surface and move downhill to find the line with the lowest error.
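To make the idea of an error surface concrete, here is a minimal sketch that evaluates the error on a grid of candidate (m, b) pairs and picks the lowest grid point. The data, the grid ranges, and the helper name `mean_squared_error` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

def mean_squared_error(m, b):
    """Average squared vertical distance between the points and the line y = m*x + b."""
    return np.mean((y - (m * x + b)) ** 2)

# Candidate slopes and intercepts; each (m, b) pair is one point on the surface.
m_values = np.linspace(0.0, 4.0, 41)   # step of 0.1
b_values = np.linspace(-1.0, 3.0, 41)  # step of 0.1

# Height of the error surface at every grid point.
surface = np.array([[mean_squared_error(m, b) for b in b_values] for m in m_values])

# The lowest point on the grid corresponds to the best candidate line.
i, j = np.unravel_index(np.argmin(surface), surface.shape)
best_m, best_b = m_values[i], b_values[j]
print("best slope:", best_m, "best intercept:", best_b)
```

A grid search like this only works for a couple of parameters; gradient descent reaches the low point without evaluating the whole surface.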

To run gradient descent on this error function, we first need to compute its gradient. The gradient acts like a compass: it points in the direction of steepest ascent, so stepping in the opposite direction always takes us downhill. To compute it, we will need to differentiate our error function. Since our function is defined by two parameters (m and b), we will need to compute a partial derivative for each. These derivatives work out to be:

![title](images/linear_regression_gradient1.png)
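For reference, assuming the error function is the mean squared error over N points (which is what the `gradientDescent` code in this notebook differentiates, with the intercept named `c` instead of b), the two partial derivatives can be written as:

$$
\frac{\partial E}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr)
\qquad
\frac{\partial E}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)
$$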

Each iteration will update m and b to a line that yields slightly lower error than the previous iteration. The direction to move in for each iteration is calculated using the two partial derivatives from above. The learningRate variable controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum. However, if we take very small steps, it will require many iterations to arrive at the minimum.
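The effect of the step size can be seen in a small experiment. The sketch below is only an illustration: the data set, the three learning rates, and the helper names `error`, `step`, and `final_error` are all assumptions chosen for this example, not values from this notebook.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

def error(m, b):
    """Mean squared error of the line y = m*x + b on the data."""
    return np.mean((y - (m * x + b)) ** 2)

def step(m, b, learning_rate):
    """One gradient descent update of (m, b)."""
    residual = y - (m * x + b)
    grad_m = -2.0 * np.mean(x * residual)
    grad_b = -2.0 * np.mean(residual)
    return m - learning_rate * grad_m, b - learning_rate * grad_b

def final_error(learning_rate, iterations=20):
    """Run gradient descent from m = b = 0 and report the error reached."""
    m, b = 0.0, 0.0
    for _ in range(iterations):
        m, b = step(m, b, learning_rate)
    return error(m, b)

start_error = error(0.0, 0.0)
print("starting error:", start_error)
print("learning rate 0.001:", final_error(0.001))  # tiny steps barely move in 20 iterations
print("learning rate 0.1:", final_error(0.1))      # moderate steps reduce the error
print("learning rate 0.5:", final_error(0.5))      # oversized steps overshoot and the error grows
```

Which learning rates count as too large or too small depends on the data, so the thresholds seen here should not be expected to carry over to other data sets.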


We can also observe how the error changes as we move toward the minimum. A good way to ensure that gradient descent is working correctly is to make sure that the error decreases for each iteration. The plot that follows the code output below shows the error values for the first 100 iterations of the gradient search.
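One way to automate this check is to record the error after every iteration and verify that the sequence never increases. The following is a minimal sketch under assumed conditions: the data is made up, and `error_history` is a hypothetical helper, not part of this notebook's code.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

def error_history(learning_rate, iterations):
    """Run gradient descent from m = b = 0 and return the error after each iteration."""
    m, b = 0.0, 0.0
    history = []
    for _ in range(iterations):
        # Residuals are computed once so that m and b are updated simultaneously.
        residual = y - (m * x + b)
        m += learning_rate * 2.0 * np.mean(x * residual)
        b += learning_rate * 2.0 * np.mean(residual)
        history.append(np.mean((y - (m * x + b)) ** 2))
    return history

history = error_history(learning_rate=0.05, iterations=100)

# The error should never go up from one iteration to the next.
is_decreasing = all(later <= earlier for earlier, later in zip(history, history[1:]))
print("first error:", history[0], "last error:", history[-1])
print("error decreased at every iteration:", is_decreasing)
```

If the check fails, a learning rate that is too large is a common cause.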

This section introduces the gradient descent algorithm and walks through an example that demonstrates how it can be used to solve machine learning problems such as linear regression.
Gradient descent is an algorithm that minimizes functions. The code below applies it to the linear regression problem discussed above.

In [5]:
import numpy as np


def costFunction(c, m, data):
    """Mean squared error of the line y = m*x + c over all (x, y) rows of data."""
    totalError = 0
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        totalError += (y - (m * x + c)) ** 2
    return totalError / float(len(data))


def gradientDescent(c, m, data, learningRate):
    """Take one gradient descent step and return the updated [c, m]."""
    derivative_c = 0
    derivative_m = 0
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        # Partial derivatives of the mean squared error with respect to c and m.
        derivative_c += -(2 * (y - (m * x + c))) / len(data)
        derivative_m += -2 * x * (y - (m * x + c)) / len(data)
    # Move against the gradient, scaled by the learning rate.
    step_c = c - (learningRate * derivative_c)
    step_m = m - (learningRate * derivative_m)
    return [step_c, step_m]


def gradientDescentConverger(c, m, data, learningRate, iterations):
    """Repeat the gradient descent step, printing the parameters and cost each time."""
    for i in range(iterations):
        [c, m] = gradientDescent(c, m, data, learningRate)
        print("c-value", c, "m-value", m, "Cost Function------", costFunction(c, m, data))
    return [c, m]


def run():
    # Column 0 of the CSV file is read as x and column 1 as y.
    data = np.genfromtxt("Lineardata.csv", delimiter=",")
    learningRate = 0.0001
    initial_c = 0
    initial_m = 0
    iterations = 200
    gradientDescentConverger(initial_c, initial_m, data, learningRate, iterations)


if __name__ == '__main__':
    run()

c-value 0.014547010110737297 m-value 0.7370702973591052 Cost Function------ 1484.586557408649
c-value 0.02187396295959641 m-value 1.1067954543515157 Cost Function------ 457.8542575737673
c-value 0.025579224321293136 m-value 1.2922546649131115 Cost Function------ 199.50998572553894
c-value 0.027467789559144355 m-value 1.385283255651245 Cost Function------ 134.50591058200533
c-value 0.028445071981738963 m-value 1.4319472323843205 Cost Function------ 118.14969342239947
c-value 0.02896524076647862 m-value 1.4553540088980408 Cost Function------ 114.0341490603815
c-value 0.0292561141260467 m-value 1.4670946177201354 Cost Function------ 112.99857731713661
c-value 0.02943196916380713 m-value 1.4729832982243762 Cost Function------ 112.7379818756847
c-value 0.029550129024383073 m-value 1.4759365618962286 Cost Function------ 112.67238435909097
c-value 0.02963934787473239 m-value 1.4774173755483797 Cost Function------ 112.65585181499746
c-value 0.029714049245227046 m-value 1.4781595857319891 Cost 

![title](images/gradient_descent_error_by_iteration.png)

In [6]:
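# Predict y at x = 48.1498 with two different (slope, intercept) pairs.
# The second pair matches the normal-equation result shown later in this notebook.

print((1.478684*48.1498)+0.0410)
print((1.3224310227553597*48.1498)+7.991020982270324)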

71.2393388632
71.66581024173634


### Normal equation
Gradient descent gives one way of minimizing the cost function J. Let’s discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In the "Normal Equation" method, we minimize J by explicitly taking its derivatives with respect to the parameters θj and setting them to zero. This allows us to find the optimal θ without iteration. The normal equation formula is given below:



![title](images/NormalEquation.jpg)
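In matrix notation, and assuming X is the design matrix with a column of ones followed by the x values, y is the vector of outputs, and θ = (θ0, θ1) holds the intercept and slope (which is how the code in this notebook builds them), the normal equation reads:

$$
\theta = (X^{T} X)^{-1} X^{T} y
$$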

In [7]:
import numpy as np


def costFunction(theta0, theta1, data):
    """Mean squared error of the line y = theta1*x + theta0 over all rows of data."""
    totalError = 0
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        totalError += (y - (theta1 * x + theta0)) ** 2
    return totalError / float(len(data))


def run():
    data = np.genfromtxt("Lineardata.csv", delimiter=",")
    # Design matrix: a column of ones (for the intercept) next to the x values.
    X = np.column_stack([np.ones(len(data)), data[:, 0]])
    Y = data[:, 1]
    Xt = np.transpose(X)
    # Normal equation: theta = (X^T X)^-1 X^T Y
    [Theta0, Theta1] = np.dot(np.dot(np.linalg.inv(np.dot(Xt, X)), Xt), Y)
    print('Theta0=', Theta0, 'Theta1=', Theta1)
    print('costfunction=', costFunction(Theta0, Theta1, data))
    # Predictions at x = 51.84518690563943 (computed but not printed):
    # y1 uses the normal-equation parameters, gradient uses a second
    # (slope, intercept) pair for comparison.
    y1 = 51.84518690563943 * 1.3224310227553597 + 7.991020982270324
    gradient = 51.84518690563943 * 1.4731250921538916 + 0.32394362446450875
    # print(y1, gradient)


if __name__ == '__main__':
    run()

Theta0= 7.991020982270324 Theta1= 1.3224310227553597
costfunction= 110.2573834662132
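As a sanity check on the normal equation, its result can be compared with NumPy's least-squares solver `np.linalg.lstsq`, which solves the same minimization without forming the matrix inverse explicitly. The sketch below uses synthetic data (an assumption for this example, since `Lineardata.csv` is not reproduced here), so the numbers will differ from the output above.

```python
import numpy as np

# Hypothetical data: noisy points around y = 2x + 1 (seeded for repeatability).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^-1 X^T y.
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver applied to the same problem.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equation:", theta_normal)
print("np.linalg.lstsq:", theta_lstsq)
print("solutions agree:", np.allclose(theta_normal, theta_lstsq))
```

For a small, well-conditioned problem like this one the two approaches should agree closely; a dedicated solver is usually preferred when the matrix being inverted is close to singular.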
