# Video Series: Introduction to Machine Learning with Scikit-Learn


## [6.1] - Linear Regression 

<br/>

_Let's understand Linear Regression, a supervised learning method that predicts continuous values by learning from prior data that has a linear relationship_


<br/>

<br/><br/>

__The Dataset__

_Let's generate our own limited dataset_

A farm product and its price at different locations (distance in miles)

- __Local Farm Market ( 4 Miles from farm) - 10 Dollars__
- __Village ( 12 Miles from farm ) - 28 Dollars__
- __Town ( 25 Miles from farm ) - 54 Dollars__
- __City ( 52 Miles from farm ) - 84 Dollars__
- __Downtown ( 60 Miles from farm ) - 90 Dollars__


___Distance from farm will be on 'X-axis' and price will be on 'y-axis'___

<br/><br/><br/>

In [None]:
import numpy as np

In [None]:
X_Distance = np.array([4, 12, 25, 52, 60])
y_price = np.array([10, 28, 54, 84, 90])

In [None]:
X_Distance, y_price

<br/><br/>

___Remember --> A single row of numbers has to be represented as a column vector, because scikit-learn expects the input as a 2-D array of shape (n_samples, n_features)___

In [None]:
X_Distance.reshape(5,1)  # returns a reshaped array; X_Distance itself is not modified

In [None]:
X_Distance

<br/><br/><br/>

In [None]:
X_Distance.shape = (5,1)  # assigning to .shape changes the array in place

In [None]:
X_Distance

In [None]:
y_price.shape = (5,1) 
y_price
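The difference between the two reshaping cells above can be sketched as follows. This is a minimal illustration using a separate, hypothetical variable name (`distances`) so it does not interfere with the notebook's own arrays; it uses `-1` to let NumPy infer the number of rows.

```python
import numpy as np

distances = np.array([4, 12, 25, 52, 60])

# reshape hands back a reshaped array and leaves the original shape untouched
column = distances.reshape(-1, 1)
print(distances.shape)  # still one-dimensional
print(column.shape)     # five rows, one column

# To keep the column shape, assign the result back to the name
distances = distances.reshape(-1, 1)
print(distances.shape)
```

Reassigning the result of `reshape` is an alternative to setting `.shape` directly and has the same effect on the data used in the rest of this chapter.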

<br/><br/><br/>

___Let's plot the 'X' and 'y' using Matplotlib.pyplot___

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.xlabel("Distance in Miles from Farmers Place")
plt.ylabel("Price at the location")
plt.plot(X_Distance, y_price, "r*--")

<br/><br/><br/>

First, we'll use the Linear Regression estimator from scikit-learn, then we'll break down how it works

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

<br/><br/><br/>

### The Unified ML Interface of Scikit-Learn

The best thing about scikit-learn is that it provides a unified interface for almost all Machine Learning algorithms, so you don't have to remember a different interface for each one. Here is what the interface of a scikit-learn algorithm looks like:


- ___Create the instance of an algorithm with appropriate parameters -> algo = Algorithm(...)___
- ___Call the .fit function with training data ->  algo.fit(...)___
- ___Call the .predict function using test data to get back the predictions -> prediction = algo.predict(...)___


- algo = Algorithm(...)
- algo.fit(...)
- algo.predict(...)
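As a rough illustration of the three steps listed above, the sketch below drives two different estimators through exactly the same calls. `Ridge` is not used anywhere else in this chapter; it is picked here only as an assumption to show that a second algorithm follows the same pattern, and `alpha=1.0` is simply its default value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Same toy data as above, shaped as column vectors
X = np.array([4, 12, 25, 52, 60]).reshape(-1, 1)
y = np.array([10, 28, 54, 84, 90]).reshape(-1, 1)
X_new = np.array([[45]])  # one sample, one feature

predictions = {}
# Two different algorithms, identical three steps: create, fit, predict
for algo in (LinearRegression(), Ridge(alpha=1.0)):
    algo.fit(X, y)
    predictions[type(algo).__name__] = algo.predict(X_new)

print(predictions)
```

Only the line that creates the instance differs between algorithms; the `fit` and `predict` calls stay the same.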

_To apply Linear Regression to the problem above, this is what we'll do_

<br/><br/>

In [None]:
l_reg = LinearRegression()

In [None]:
l_reg.fit(X_Distance, y_price)  # This is where the training of the model is done

<br/><br/><br/>

___Now that we have trained our ML model, let's predict the value for some data___

<br/><br/>

___For predicting the values, we need to provide a column vector as the input parameter___

<br/><br/><br/>

In [None]:
check_at_distance = np.array([45]).reshape(1,1)

In [None]:
predict = l_reg.predict(check_at_distance)

In [None]:
predict
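A prediction like the one above is just a point on the fitted straight line. The sketch below, which refits the same data under a hypothetical name (`model`) and uses a 1-D target for simpler indexing, reads the slope and intercept from the `coef_` and `intercept_` attributes and reproduces the prediction by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([4, 12, 25, 52, 60]).reshape(-1, 1)
y = np.array([10, 28, 54, 84, 90])  # 1-D target, an assumption made for simpler indexing

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # one coefficient per input feature
intercept = model.intercept_  # value of the line at distance 0

# Price at 45 miles, computed by hand and by the model
by_hand = slope * 45 + intercept
by_model = model.predict(np.array([[45]]))[0]
print(by_hand, by_model)
```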

<br/><br/><br/>

_Let's see the predictions by plotting against original points_

<br/><br/>

In [None]:
plt.xlabel("Distance in Miles from Farmers Place")
plt.ylabel("Price at the location")
plt.plot(X_Distance, y_price, "r*")
plt.plot(check_at_distance, predict, "g*")

<br/><br/>

___Predicting for multiple values___

<br/>

In [None]:
check_at_distance = np.array([10, 22, 25, 30, 32, 32, 55, 62, 80, 100]).reshape(-1,1)

In [None]:
check_at_distance

In [None]:
predict = l_reg.predict(check_at_distance)

In [None]:
predict

<br/><br/>

___Let's plot the predicted value against the training values___

<br/>

In [None]:
plt.xlabel("Distance in Miles from Farmers Place")
plt.ylabel("Price at the location")
plt.plot(X_Distance, y_price, "r*")
plt.plot(check_at_distance, predict, "g^")

<br/>

#### What Does Linear Regression Do?

It attempts to find the best-fit line (generally a straight line) for the training data set

_Best fit: defined as the line with the least_ ___Mean Squared Error___
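To make the "least error" idea concrete, here is a small NumPy-only sketch. It assumes `np.polyfit` with degree 1 as a stand-in for the least-squares straight-line fit, and the helper `mse_of_line` and the nudged line are hypothetical, chosen only for comparison.

```python
import numpy as np

x = np.array([4, 12, 25, 52, 60], dtype=float)
y = np.array([10, 28, 54, 84, 90], dtype=float)

def mse_of_line(slope, intercept):
    """Mean squared error of the line y = slope * x + intercept on the data."""
    return np.mean((y - (slope * x + intercept)) ** 2)

# Least-squares straight line (degree-1 polynomial fit)
best_slope, best_intercept = np.polyfit(x, y, 1)
best_mse = mse_of_line(best_slope, best_intercept)

# A slightly different line gives a larger error on the same points
nudged_mse = mse_of_line(best_slope + 0.1, best_intercept - 2.0)

print(best_mse, nudged_mse)
```

Whichever way the line is nudged, the error on the training points should not drop below that of the least-squares line.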

<br/><br/>

___Remember___

<br/>
Machine Learning algorithms don't memorize (mug up) the data; they learn from it
<br/><br/><br/>

#### Let's see the Mean Squared Error value for our LR method

To compute the Mean Squared Error, we need the correct values to compare against. Let's use our trained LR model to predict prices for the original distances

<br/><br/>

Let's use the Linear Regression algorithm to predict for ___Training Data___

<br/><br/>

In [None]:
predict = l_reg.predict(X_Distance)

In [None]:
predict

__Plot Original Values vs. Predictions on the Training Data__

In [None]:
plt.xlabel("Distance in Miles from Farmers Place")
plt.ylabel("Price at the location")
plt.plot(X_Distance, y_price, "r*")
plt.plot(X_Distance, predict, "g*--")

<br/><br/><br/>

___Let's calculate the Mean Squared Error___

In [None]:
mse = mean_squared_error(y_price, predict)

In [None]:
mse
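The arithmetic behind `mean_squared_error` can be sketched by hand: square each difference between actual and predicted value, then take the average. The numbers below are hypothetical and serve only to illustrate the calculation; they are not the model's real predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted prices, only to illustrate the arithmetic
actual = np.array([10.0, 28.0, 54.0])
predicted = np.array([12.0, 27.0, 50.0])

# Square each residual, then average: (4 + 1 + 16) / 3
manual_mse = np.mean((actual - predicted) ** 2)

print(manual_mse, mean_squared_error(actual, predicted))
```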

<br/><br/><br/>

### The Maths behind Linear Regression and How It Works

#### See it in Chapter 6.2

<br/><br/>