# Challenge: **Regression Formula**

If you really want a challenge, try to implement these formulas yourself! We have already covered all the code you need.

Y = bX + a

Here b is the slope and a is the intercept, which matches how the code below uses these names.

Use the input:
```
x = [1,2,3,4,5]
y = [1,2,3,4,5]
```

![Screen Shot 2021-05-04 at 4.27.37 PM.png](Unknown.png)
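
Since the screenshot is not reproduced in this text, here are the least-squares formulas that the code in the next cell computes. They are reconstructed from that code rather than from the image, and they assume b is the slope and a is the intercept of the line:

```latex
\begin{aligned}
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \\
a &= \frac{\sum y_i - b\sum x_i}{n}
\end{aligned}
```

For the input x = y = [1,2,3,4,5] we have n = 5, Σxy = 55, Σx = Σy = 15 and Σx² = 55, so b = (275 - 225) / (275 - 225) = 1 and a = (15 - 1·15) / 5 = 0.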


In [2]:
# NOTE: despite its name, this returns the slope (b), the coefficient of X
def getIntercept(x,y):
  n = len(x)
  # Calculating the summation of x*y
  x_times_y = [i*j for i,j in zip(x,y)]
  x_times_y_sum = sum(x_times_y)
  # Calculating the summation of x
  x_sum = sum(x)
  # Calculating the summation of y
  y_sum = sum(y)
  # Calculating the numerator
  numerator = (n*x_times_y_sum) - (x_sum*y_sum)

  # Calculating the summation of x^2
  x_sqrd = [i*i for i in x]
  x_sqrd_sum = sum(x_sqrd)
  # Squaring the summation of x
  x_sum_sqrd = x_sum ** 2
  # Calculating the denominator
  denominator = (n*x_sqrd_sum) - x_sum_sqrd

  # Calculating b (the slope) and returning it.
  b = numerator / denominator
  return b

# NOTE: despite its name, this returns the intercept (a), given the slope b
def getSlope(x,y,b):
    # Calculating the sum of x
    x_sum = sum(x)
    # Calculating the sum of y
    y_sum = sum(y)
    n = len(x)

    # Getting the numerator
    numerator = y_sum - (b*x_sum)

    # Getting the final a value (the intercept) and returning it.
    a = numerator / n
    return a

# The data
x = [1,2,3,4,5]
y = [1,2,3,4,5]

# Calculating parameters for regression line
# (getIntercept returns the slope b; getSlope returns the intercept a)
b = getIntercept(x,y)
a = getSlope(x,y,b)
print("Slope: " + str(b))
print("Intercept: " + str(a))

# Using the parameters to get a new Y value (b is the slope, a is the intercept).
def getNewY(X,b,a):
  Y = b*X + a
  return Y

# Passing a new input x value.
Y = getNewY(7,b,a)
print("New Y: " + str(Y))

Slope: 1.0
Intercept: 0.0
New Y: 7.0
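
As a cross-check on the functions above, here is a minimal sketch of the same fit written in the equivalent "deviations from the mean" form. The name `fit_line` is hypothetical and not part of the original code. For the input above it should give a slope of 1 and an intercept of 0, since the points lie exactly on Y = X.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line (hypothetical helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of products of deviations, and sum of squared x deviations
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print("Slope: " + str(slope))                  # 1.0
print("Intercept: " + str(intercept))          # 0.0
print("New Y: " + str(slope * 7 + intercept))  # 7.0
```

This form subtracts the means first, which tends to behave better numerically when the x values are large and close together, as stock prices are.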


# **Using Regression to Predict the Stock Market**

This is a fun application of the regression line. A regression line is about the most basic form of machine learning: you have some input data x and known outputs y, you "train" a model on them, and the model then produces a prediction y for a new input x.

First, the input data is the close price of the 'AAPL' stock:
```
close_prices = [127.90,130.36,133.00,131.24,134.43]
```

The labels need to be the next day's closing price:
```
labels = [130.36,133.00,131.24,134.43]
```

In the code below, we just offset close_prices by 1 to get these labels. I also needed to remove the last element of close_prices because it does not have a label:

```
close_prices = [127.90,130.36,133.00,131.24]
```
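
The offset described above can also be sketched with `zip`, which pairs each day's close with the next day's close. This is just one way to write it, not the approach used in the cell further down; the names `pairs` and `inputs` are illustrative.

```python
# Values copied from the example above
close_prices = [127.90, 130.36, 133.00, 131.24, 134.43]

# Pair each day's close with the next day's close.
# zip stops at the shorter sequence, so the last price (which has no label) drops out.
pairs = list(zip(close_prices, close_prices[1:]))

inputs = [today for today, _ in pairs]
labels = [tomorrow for _, tomorrow in pairs]

print(inputs)  # [127.9, 130.36, 133.0, 131.24]
print(labels)  # [130.36, 133.0, 131.24, 134.43]
```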


When performing a machine learning task, you typically divide the data into training and test data. To do this, I just removed the very last element of each list, and we will use it for testing. In this example, x and y are the training datasets, and test_x and test_y are the test data, which look as follows:

```
x = [127.90,130.36,133.00]
y = [130.36,133.00,131.24]
test_x = 131.24
test_y = 134.43
```
You can then run getIntercept and getSlope on the x and y data to "train" the model and get the a and b parameters. You can then use those parameters to make a prediction. To test how well the model performs, we apply it to the test data by passing test_x into the getNewY function. We can then compare the predicted y value from getNewY with test_y, the real value.
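
The hold-out step described above can be sketched as a small helper. The name `hold_out_last` is hypothetical; it simply keeps the final input/label pair aside for testing and returns the rest for training.

```python
def hold_out_last(x, y):
    """Split paired lists into training lists and one held-out test pair (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two items")
    return x[:-1], y[:-1], x[-1], y[-1]

# Inputs and next-day labels from the example above
inputs = [127.90, 130.36, 133.00, 131.24]
labels = [130.36, 133.00, 131.24, 134.43]

x, y, test_x, test_y = hold_out_last(inputs, labels)
print(x, y)            # [127.9, 130.36, 133.0] [130.36, 133.0, 131.24]
print(test_x, test_y)  # 131.24 134.43
```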

In this example the prediction (about 131.67) was quite far from the real value (134.43), but that is expected given the small amount of data and the simplicity of the regression model.
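
To put a number on how far off the prediction is, a minimal sketch like the following computes the absolute and percentage error. The two values are copied from the run in this section; the variable names are illustrative.

```python
# Values copied from the run in this section
predicted_y = 131.66644530529436
real_y = 134.43

# Distance between prediction and truth, in price units and as a percentage
abs_error = abs(real_y - predicted_y)
pct_error = 100 * abs_error / real_y

print("Absolute error: " + str(round(abs_error, 2)))  # 2.76
print("Percent error: " + str(round(pct_error, 2)))   # 2.06
```

With only one test point this is a rough indication at best; a fuller evaluation would average the error over many held-out days.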


In [4]:
# Assigning the label to be the next day's close price
close_prices = [127.90,130.36,133.00,131.24,134.43]
labels = close_prices[1:]
print("Close Prices and Labels: ")
print(close_prices)
print(labels)

# Removing the final value from the close price to match the labels.
x = close_prices[0:-1]
y = labels
print("\nX and Y: ")
print(x)
print(y)

# Holding out the last value of both x and y so that we can use it later to
# test the prediction.
test_x = x[-1]
test_y = y[-1]
print("\nTest X and Y: ")
print(test_x)
print(test_y)
x = x[0:-1]
y = y[0:-1]
print("\nNew X and Y: ")
print(x)
print(y)

# Using previously created functions to get the slope and intercept
# (getIntercept returns the slope b; getSlope returns the intercept a)
print("\n Finding the regression")
b = getIntercept(x,y)
a = getSlope(x,y,b)
print("Slope: " + str(b))
print("Intercept: " + str(a))
# Testing the regression on the test data
print("\nThe prediction")
Y = getNewY(test_x,b,a)
print("Predicted Y: " + str(Y))
print("Real Y: " + str(test_y))

Close Prices and Labels: 
[127.9, 130.36, 133.0, 131.24, 134.43]
[130.36, 133.0, 131.24, 134.43]

X and Y: 
[127.9, 130.36, 133.0, 131.24]
[130.36, 133.0, 131.24, 134.43]

Test X and Y: 
131.24
134.43

New X and Y: 
[127.9, 130.36, 133.0]
[130.36, 133.0, 131.24]

 Finding the regression
Slope: 0.16233167312319133
Intercept: 110.36203652460672

The prediction
Predicted Y: 131.66644530529436
Real Y: 134.43


# **Sklearn**

In [5]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

In [10]:
# Same data loading as before
close_prices = [127.90,130.36,133.00,131.24,134.43]
labels = close_prices[1:]
x = close_prices[0:-1]
y = labels
test_x = x[-1]
test_y = y[-1]
x = x[0:-1]
y = y[0:-1]

# scikit-learn expects 2D inputs of shape (n_samples, n_features),
# so each value is wrapped in its own single-element list.
x = [[i] for i in x]
test_x = [[test_x]]
# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(x, y)

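# Make predictions using the testing set
y_pred = regr.predict(test_x)

# Show the fitted parameters and compare the prediction with the real value
print("Slope: " + str(regr.coef_[0]))
print("Intercept: " + str(regr.intercept_))
print("Predicted Y: " + str(y_pred[0]))
print("Real Y: " + str(test_y))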