chardur/SimpleLinearRegressionPython
Background info / Notes:
Find a line that models the relationship between a dependent variable and an independent variable.
Equation:
y = α + b*x
In English:
y is the dependent variable: what we are trying to predict
α is a constant: the intercept, i.e. the value of y at x = 0
b is a coefficient: the slope of the line
x is the independent variable: what we think predicts y
Equations (least-squares estimates):

α = [SUM(Y) * SUM(X^2) - SUM(X) * SUM(XY)] / (n * SUM(X^2) - [SUM(X)]^2)

b = [n * SUM(XY) - SUM(X) * SUM(Y)] / (n * SUM(X^2) - [SUM(X)]^2)
SUM = Summation
n = sample size
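As a sanity check, here is a hypothetical worked example (the data points are made up, chosen to lie exactly on y = 1 + 2x) that plugs a tiny dataset into the closed-form equations:

```python
import numpy as np

# Made-up 3-point dataset lying exactly on y = 1 + 2x
X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 7.0])

n = X.size                     # 3
sumX, sumY = X.sum(), Y.sum()  # 6, 15
sumXY = X.dot(Y)               # 1*3 + 2*5 + 3*7 = 34
sumX2 = X.dot(X)               # 1 + 4 + 9 = 14

denominator = n * sumX2 - sumX ** 2            # 3*14 - 36 = 6
alpha = (sumY * sumX2 - sumX * sumXY) / denominator
b = (n * sumXY - sumX * sumY) / denominator
print(alpha, b)  # 1.0 2.0 - recovers the intercept and slope exactly
```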
R-Squared: tells us how good the fit is; the closer to 1, the better. Equation:

R^2 = 1 - SUM[(Y - Yi)^2] / SUM[(Y - Yavg)^2]
Y = the actual data point
Yi = the predicted Y value
Yavg = the average Y value
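To make the R-Squared formula concrete, here is a small hypothetical example (both arrays are made-up values, with `predictedY` standing in for the Yi from a fit):

```python
import numpy as np

# Made-up actual values and predictions from some fit
Y = np.array([3.0, 5.0, 7.0, 9.0])
predictedY = np.array([3.1, 4.8, 7.2, 8.9])

res = Y - predictedY   # (Y - Yi) for each point
tot = Y - Y.mean()     # (Y - Yavg) for each point

# dot(v, v) gives SUM(v^2), so this is 1 - SSres / SStot
r2 = 1 - res.dot(res) / tot.dot(tot)
print(r2)  # 0.995 - the predictions track the data closely
```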
Convert the equations to code (we can leverage the numpy dot function for SUM(XY) and SUM(X^2)):
```python
import numpy as np

n = X.size
sumY = Y.sum()
sumX = X.sum()
sumXY = X.dot(Y)   # SUM(XY)
sumX2 = X.dot(X)   # SUM(X^2)

denominator = (n * sumX2) - (sumX ** 2)
a = ((sumY * sumX2) - (sumX * sumXY)) / denominator
b = ((n * sumXY) - (sumX * sumY)) / denominator

predictedY = a + b * X   # model predictions (Yi)
SSres = Y - predictedY   # residuals (Y - Yi)
SStot = Y - Y.mean()     # deviations from the mean (Y - Yavg)
rSquared = 1 - (SSres.dot(SSres) / SStot.dot(SStot))
```
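Putting it all together, a minimal end-to-end sketch (the data values are made up, and `np.polyfit` is used only as an independent cross-check of the closed-form estimates):

```python
import numpy as np

# Made-up, roughly linear data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

n = X.size
sumY, sumX = Y.sum(), X.sum()
sumXY, sumX2 = X.dot(Y), X.dot(X)

denominator = (n * sumX2) - (sumX ** 2)
a = ((sumY * sumX2) - (sumX * sumXY)) / denominator
b = ((n * sumXY) - (sumX * sumY)) / denominator

predictedY = a + b * X
SSres = Y - predictedY
SStot = Y - Y.mean()
rSquared = 1 - SSres.dot(SSres) / SStot.dot(SStot)

# Cross-check: numpy's least-squares fit should agree
slope, intercept = np.polyfit(X, Y, 1)
assert np.isclose(a, intercept) and np.isclose(b, slope)
print(a, b, rSquared)  # intercept 0.27, slope 1.93, R^2 near 1
```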
About
Simple linear regression with Python, Numpy, Matplotlib