# Project: Linear Regression

# Linear Regression Overview

Linear regression is a statistical method used to model the relationship between a dependent variable (what we want to predict) and one or more independent variables. The model assumes a linear relationship, aiming to find the best-fit line through the data.

## Key Points:

- **Variables:**
  - Dependent Variable (Y): What we're predicting.
  - Independent Variable(s) (X1, X2, ..., Xn): Factors influencing the dependent variable.

- **Model Equation:**
  The model is a sum of terms: an intercept β0, each independent variable multiplied by its coefficient (β1 through βn), and an error term ε that captures what the linear terms do not explain:
  Y = β0 + β1*X1 + β2*X2 + ... + βn*Xn + ε

- **Purpose:**
  - Prediction, understanding relationships, and making inferences.
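
As a rough illustration of the model equation, the sketch below evaluates the deterministic part of the sum (everything except ε) for one observation. The function name `predict` and the numbers passed to it are assumptions made for this example; they do not come from the project itself.

```python
def predict(intercept, coefficients, features):
    """Return β0 + β1*X1 + ... + βn*Xn for one observation (ε is omitted)."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    # Multiply each feature by its coefficient and add the intercept.
    return intercept + sum(beta * x for beta, x in zip(coefficients, features))

# Hypothetical values, chosen only so the arithmetic is easy to check by hand.
print(predict(1.0, [2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```

With a single feature this reduces to the familiar equation of a line, where the intercept is `b` and the one coefficient is `m`.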

With a single independent variable, the model reduces to the equation of a line:
```
y = m*x + b
```
`m` is the slope of the line and `b` is the intercept, where the line crosses the y-axis.

In [3]:
def get_y(m, b, x):
    # Evaluate the line y = m*x + b at x.
    y = (m*x) + b
    return y

print(get_y(1, 0, 7) == 7)
print(get_y(5, 10, 3) == 25)
# 5*4 + 10 is 30, not 25, so this check prints False:
print(get_y(5, 10, 4) == 25)

True
True
False


In [5]:
def calculate_error(m, b, point):
    # Vertical distance between the point and the line y = m*x + b.
    x_point, y_point = point
    y_loc = get_y(m, b, x_point)
    return abs(y_loc - y_point)

In [7]:
# The point (3, 3) lies on the line y = x, so the error should be 0:
print(calculate_error(1, 0, (3, 3)))
# The point (3, 4) should be 1 unit away from the line y = x:
print(calculate_error(1, 0, (3, 4)))
# The point (3, 3) should be 1 unit away from the line y = x - 1:
print(calculate_error(1, -1, (3, 3)))
# The point (3, 3) should be 5 units away from the line y = -x + 1:
print(calculate_error(-1, 1, (3, 3)))

0
1
1
5


In [9]:
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

In [10]:
def calculate_all_error(m, b, points):
    total_error = 0
    for point in points:
        total_error += calculate_error(m, b, point)
    return total_error

In [12]:
# Every point in this dataset lies on y = x, so the total error should be 0:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, 0, datapoints))

# Every point in this dataset is 1 unit away from y = x + 1, so the total error should be 4:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, 1, datapoints))

# Every point in this dataset is 1 unit away from y = x - 1, so the total error should be 4:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, -1, datapoints))

# The points in this dataset are 1, 5, 9, and 3 units away from y = -x + 1, respectively,
# so the total error should be 1 + 5 + 9 + 3 = 18:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(-1, 1, datapoints))

0
4
4
18


In [14]:
possible_ms = []
for i in range(-100, 101):
    possible_ms.append(i*0.1)

In [17]:
print(possible_ms)

[-10.0, -9.9, -9.8, -9.700000000000001, -9.600000000000001, -9.5, -9.4, -9.3, -9.200000000000001, -9.1, -9.0, -8.9, -8.8, -8.700000000000001, -8.6, -8.5, -8.4, -8.3, -8.200000000000001, -8.1, -8.0, -7.9, -7.800000000000001, -7.7, -7.6000000000000005, -7.5, -7.4, -7.300000000000001, -7.2, -7.1000000000000005, -7.0, -6.9, -6.800000000000001, -6.7, -6.6000000000000005, -6.5, -6.4, -6.300000000000001, -6.2, -6.1000000000000005, -6.0, -5.9, -5.800000000000001, -5.7, -5.6000000000000005, -5.5, -5.4, -5.300000000000001, -5.2, -5.1000000000000005, -5.0, -4.9, -4.800000000000001, -4.7, -4.6000000000000005, -4.5, -4.4, -4.3, -4.2, -4.1000000000000005, -4.0, -3.9000000000000004, -3.8000000000000003, -3.7, -3.6, -3.5, -3.4000000000000004, -3.3000000000000003, -3.2, -3.1, -3.0, -2.9000000000000004, -2.8000000000000003, -2.7, -2.6, -2.5, -2.4000000000000004, -2.3000000000000003, -2.2, -2.1, -2.0, -1.9000000000000001, -1.8, -1.7000000000000002, -1.6, -1.5, -1.4000000000000001, -1.3, -1.20000000000000
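
Values such as `-9.700000000000001` in the output above are a side effect of binary floating-point arithmetic: `0.1` has no exact binary representation, so `i*0.1` can land slightly off the intended decimal. The sketch below shows one way to get cleaner grid values, assuming one decimal place is all the precision the grid needs. The names `raw_ms` and `rounded_ms` are introduced here only for illustration.

```python
# Same construction as in the notebook: multiplying by 0.1 exposes rounding error.
raw_ms = [i * 0.1 for i in range(-100, 101)]

# Assumption: the grid only needs one decimal place, so round each value to it.
rounded_ms = [round(i * 0.1, 1) for i in range(-100, 101)]

print(raw_ms[3])      # -9.700000000000001
print(rounded_ms[3])  # -9.7
```

Rounding does not change which slopes are searched in any meaningful way; it only makes the printed values, and the reported best slope, easier to read.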

In [20]:
possible_bs = []
for i in range(-200, 201):
    possible_bs.append(i*0.1)

In [21]:
print(possible_bs)

[-20.0, -19.900000000000002, -19.8, -19.700000000000003, -19.6, -19.5, -19.400000000000002, -19.3, -19.200000000000003, -19.1, -19.0, -18.900000000000002, -18.8, -18.7, -18.6, -18.5, -18.400000000000002, -18.3, -18.2, -18.1, -18.0, -17.900000000000002, -17.8, -17.7, -17.6, -17.5, -17.400000000000002, -17.3, -17.2, -17.1, -17.0, -16.900000000000002, -16.8, -16.7, -16.6, -16.5, -16.400000000000002, -16.3, -16.2, -16.1, -16.0, -15.9, -15.8, -15.700000000000001, -15.600000000000001, -15.5, -15.4, -15.3, -15.200000000000001, -15.100000000000001, -15.0, -14.9, -14.8, -14.700000000000001, -14.600000000000001, -14.5, -14.4, -14.3, -14.200000000000001, -14.100000000000001, -14.0, -13.9, -13.8, -13.700000000000001, -13.600000000000001, -13.5, -13.4, -13.3, -13.200000000000001, -13.100000000000001, -13.0, -12.9, -12.8, -12.700000000000001, -12.600000000000001, -12.5, -12.4, -12.3, -12.200000000000001, -12.100000000000001, -12.0, -11.9, -11.8, -11.700000000000001, -11.600000000000001, -11.5, -11.4

In [23]:
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
smallest_error = float("inf")
best_m = 0
best_b = 0
# Try every (m, b) pair on the grid and keep the pair with the lowest total error.
for m in possible_ms:
    for b in possible_bs:
        error = calculate_all_error(m, b, datapoints)
        if error < smallest_error:
            best_m = m
            best_b = b
            smallest_error = error

print(best_m)
print(best_b)
print(smallest_error)

0.30000000000000004
1.7000000000000002
4.999999999999999
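
The grid search above minimizes the total absolute error. For comparison only, the sketch below uses a different and more common criterion, ordinary least squares, which minimizes the sum of squared errors and has a closed-form solution for a single variable. The function `least_squares_fit` is not part of the notebook; it is a hypothetical helper written for this illustration.

```python
def least_squares_fit(points):
    """Return (m, b) minimizing the sum of squared vertical errors."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
m, b = least_squares_fit(datapoints)
print(round(m, 3), round(b, 3))  # 0.6 0.8
```

The two approaches disagree (roughly `m = 0.3, b = 1.7` versus `m = 0.6, b = 0.8`) because they optimize different error measures. Measured by total absolute error, the least-squares line scores about 5.6 on this dataset, compared with about 5.0 for the grid-search line.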


## Testing Model Prediction

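Example: use the best-fit line found by the search, with slope and intercept rounded to one decimal place, to predict `y` at `x = 6`: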

```
y = 0.3x + 1.7
```

* m = 0.3
* b = 1.7
* x = 6

In [24]:
get_y(0.3, 1.7, 6)

3.5
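
Beyond predicting a new point, it can help to check how the fitted line does on the points it was fitted to. The sketch below restates `get_y` so that it runs on its own and lists the absolute error at each observed point; the `residuals` list is a name introduced here for illustration.

```python
def get_y(m, b, x):
    # Same line equation as earlier in the notebook: y = m*x + b.
    return m * x + b

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
m, b = 0.3, 1.7  # rounded best-fit values from the grid search

residuals = []
for x, y in datapoints:
    predicted = get_y(m, b, x)
    error = abs(predicted - y)
    residuals.append(error)
    print(x, y, round(predicted, 2), round(error, 2))

print(round(sum(residuals), 2))
```

The per-point errors come out at roughly 0, 2.3, 1.4, 1.1, and 0.2, which add up to the total error of about 5 reported by the search.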