# Simple Linear Regression From Scratch

In this notebook I build a simple linear regression model from scratch to show how the model works.<br>
(Do not take this as an exact picture of how linear regression models are implemented. This is just my way of representing them.)

## Importing Libraries

In [1]:
import random
import csv
from math import sqrt

## Importing the dataset

Here we define how a CSV file is loaded.

In [2]:
def load_csv(path):
    data = []
    # newline='' is what the csv module recommends when opening files
    with open(path, 'r', newline='') as f:
        read = csv.reader(f)
        for row in read:
            if row:    # skip blank lines
                data.append(row)
    return data

We have to be careful, as this data is in string format.<br>
So the values are converted to float wherever they are used later (in `metric`, `coeff` and `SLR`).
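
If you would rather convert everything once, right after loading, a small helper like the one below is one way to do it. This is only a sketch: the name `to_float_rows` is my own and is not used anywhere else in this notebook, and it assumes every cell is numeric (a header row would raise a `ValueError`).

```python
def to_float_rows(rows):
    # Hypothetical helper, not part of the notebook's pipeline.
    # Builds a new list of rows with every cell converted to float.
    return [[float(cell) for cell in row] for row in rows]


# Made-up rows in the shape that csv.reader returns (lists of strings)
raw = [["1.0", "10.0"], ["2.5", "20.0"]]
print(to_float_rows(raw))  # [[1.0, 10.0], [2.5, 20.0]]
```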

## Splitting the dataset

In [3]:
def train_test_split(data, split):
    train = []
    size = len(data) * split
    test = list(data)    # copy the list, so popping rows below does not change the original data
    while len(train) < size:
        index = random.randrange(len(test))
        train.append(test.pop(index))
    return train, test
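
Because the rows are picked at random, every call gives a different split. A quick sanity check, sketched here with made-up data, is to seed `random` and look at the sizes. The seed is my addition and is not used in the rest of the notebook; the function is repeated so the cell runs on its own.

```python
import random


def train_test_split(data, split):
    # Same logic as the cell above, repeated so this example is self-contained
    train = []
    size = len(data) * split
    test = list(data)
    while len(train) < size:
        index = random.randrange(len(test))
        train.append(test.pop(index))
    return train, test


random.seed(0)  # fixed seed so the split is repeatable (an assumption, not in the original)
rows = [[i, 2 * i] for i in range(10)]  # made-up data
train, test = train_test_split(rows, 0.8)
print(len(train), len(test))  # 8 2
print(len(rows))              # 10, the original list is untouched
```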

## Scoring function / Metric

In [4]:
def metric(real, pred):
    # Root mean squared error (RMSE) between the real and predicted values
    error = 0.0
    for i in range(len(real)):
        error += (pred[i] - float(real[i])) ** 2    # real values are still strings
    mean_error = error / len(real)
    return sqrt(mean_error)
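
To see what this metric does, here is a tiny worked example with made-up numbers. The function is repeated so the cell runs on its own.

```python
from math import sqrt


def metric(real, pred):
    # Square each error, average the squares, then take the square root
    error = 0.0
    for i in range(len(real)):
        error += (pred[i] - float(real[i])) ** 2
    return sqrt(error / len(real))


real = ["3", "5", "7"]   # strings, as they come out of load_csv
pred = [2.0, 5.0, 9.0]
# squared errors: 1, 0, 4 -> mean 5/3 -> square root is about 1.291
print(metric(real, pred))
```

The result is in the same units as the target, which is why the score further down reads like a salary amount.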

In [5]:
def evaluate(data, algo, split):
    train, test = train_test_split(data, split)
    test_set = []
    for row in test:
        row_copy = list(row)
        row_copy[-1] = None    # hide the real value so the algorithm cannot see it
        test_set.append(row_copy)
    pred = algo(train, test_set)
    real = [row[-1] for row in test]
    return metric(real, pred)
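
`evaluate` works with any function that takes `(train, test)` and returns a list of predictions. To illustrate, the sketch below plugs in a made-up baseline that always predicts the mean of the training targets. `mean_baseline` is hypothetical and not part of this notebook, the data is invented, and the helper functions are compact restatements of the cells above so the example runs on its own.

```python
import random
from math import sqrt


def train_test_split(data, split):
    train, test = [], list(data)
    while len(train) < len(data) * split:
        train.append(test.pop(random.randrange(len(test))))
    return train, test


def metric(real, pred):
    return sqrt(sum((p - float(r)) ** 2 for r, p in zip(real, pred)) / len(real))


def evaluate(data, algo, split):
    train, test = train_test_split(data, split)
    test_set = [row[:-1] + [None] for row in test]  # hide the target column
    return metric([row[-1] for row in test], algo(train, test_set))


def mean_baseline(train, test):
    # Hypothetical algorithm: ignore x and always predict the mean training target
    mean_y = sum(float(row[-1]) for row in train) / len(train)
    return [mean_y for _ in test]


random.seed(0)  # only to make the example repeatable
rows = [[str(i), str(10 * i)] for i in range(1, 11)]  # made-up data
score = evaluate(rows, mean_baseline, 0.8)
print(score)
```

A baseline like this gives a number to beat: a regression that scores no better than "always predict the mean" has not learned anything from x.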

## Covariance function

In [6]:
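def covar(x, mean_x, y, mean_y):
    # Sum of products of deviations. It is not divided by n, because the
    # same factor is missing from var() and cancels out in the slope.
    cov = 0.0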
    for i in range(len(x)):
        cov += (x[i] - mean_x) * (y[i] - mean_y)
    return cov

In [7]:
def var(val, mean):
    # Sum of squared deviations (not divided by n, to match covar)
    return sum((x - mean) ** 2 for x in val)

## Coefficient function

In [8]:
def coeff(data):
    x = [float(row[0]) for row in data]
    y = [float(row[1]) for row in data]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    a = covar(x, mean_x, y, mean_y) / var(x, mean_x)    # slope
    b = mean_y - a * mean_x                             # intercept
    return b, a
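
A quick way to check the coefficients is to feed in points that lie exactly on a known line. The sketch below uses made-up points on y = 2x + 1, so the intercept should come out as 1 and the slope as 2. The three functions are compact restatements of the cells above so the example runs on its own.

```python
def covar(x, mean_x, y, mean_y):
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


def var(val, mean):
    return sum((v - mean) ** 2 for v in val)


def coeff(data):
    x = [float(row[0]) for row in data]
    y = [float(row[1]) for row in data]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    a = covar(x, mean_x, y, mean_y) / var(x, mean_x)  # slope
    return mean_y - a * mean_x, a                      # (intercept, slope)


rows = [["1", "3"], ["2", "5"], ["3", "7"], ["4", "9"]]  # made-up points on y = 2x + 1
b, a = coeff(rows)
print(b, a)  # 1.0 2.0
```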

## Finally, Creating the Algo

In [9]:
def SLR(train, test):
    pred = []
    b, a = coeff(train)
    for row in test:
        y_hat = b + (a * float(row[0]))    # predicted value: intercept + slope * x
        pred.append(y_hat)
    return pred

## Running the algorithm

In [10]:
df = load_csv("Salary_Data.csv")
split = 0.8
score = evaluate(df, SLR, split)
print(score)

6815.104029905253


In [11]:
# Note: this is a new random split, not the one used inside evaluate() above
train, test = train_test_split(df, 0.8)
print(SLR(train, test))

[42289.99635894255, 48687.591375623524, 57827.01282802491, 65138.54998994602, 69708.2607161467, 100782.29365431143]


### You can compare the above values with the real values in `test` to see how the model performed
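
One way to make that comparison easier is to print each prediction next to the real value from the same test rows. The sketch below is only an illustration: it uses made-up rows in place of `Salary_Data.csv`, a compact stand-in for `coeff` that I have called `fit_line`, and a simple shuffle-and-slice split instead of `train_test_split`, so that it runs on its own.

```python
import random


def fit_line(rows):
    # Compact stand-in for coeff(): returns (intercept, slope)
    x = [float(r[0]) for r in rows]
    y = [float(r[1]) for r in rows]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


random.seed(1)  # only to make the example repeatable
# Made-up (x, y) rows as strings: roughly y = 30000 + 9000 * x plus some noise
rows = [[str(i), str(30000 + 9000 * i + random.randint(-2000, 2000))]
        for i in range(1, 11)]
random.shuffle(rows)
train, test = rows[:8], rows[8:]

b, a = fit_line(train)
for x_str, y_str in test:
    pred = b + a * float(x_str)
    real = float(y_str)
    print(f"x={x_str}  real={real:.0f}  predicted={pred:.0f}  error={pred - real:.0f}")
```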