# Linear Regression Module User Guide

<h3><a href = '#gettingstarted'>Getting Started</a></h3>
<h3><a href='#functions'>Functions</a></h3>
<h3><a href = '#graphs'>Graphs</a></h3>
<h3><a href='#equations'>Linear Regression Mathematical Equations</a></h3>

## Getting Started <a id='gettingstarted'></a>

Save the module file `linear_regression.py`, which can be downloaded [here](https://github.com/DRosenman/linear-regression-module/blob/master/linear_regression.py), in the directory you will be working in.
    
To import the module, type the following line of code at the start of your script:

```python
import linear_regression as lr
```


## Functions <a id='functions'></a>

**lr.results(x, y, through_origin=False)**

>**parameters**: x,y, (optional) through_origin <br>
x and y must be two sets of measurements of equal length. To force the regression line to go through the origin, include the third parameter as through_origin = True.

>**returns:** five values, in this order:
>
>>**slope** of the regression line
>>
>>**intercept** of the regression line
>>
>>**standard error of the slope**
>>
>>**standard error of the intercept**
>>
>>**correlation coefficient (a.k.a. r)**

**Examples:**

```python
import linear_regression as lr

x = [1.0, 5.0, 9.0, 15.0]
y = [10.0, 150.0, 290.5, 400.8]

# Regular least-squares fit: unpack the five returned values in order
slope, intercept, slope_error, intercept_error, r = lr.results(x, y)
print('slope             = ', slope)
print('intercept         = ', intercept)
print('stderr.,slope     = ', slope_error)
print('stderr.,intercept = ', intercept_error)
print('r                 = ', r)
print('\ny =(' + str(slope) + ")x" + " + " + str(intercept))

# Best-fit line forced through the origin
slope00, intercept00, slope_error00, intercept_error00, r00 = lr.results(x, y, through_origin=True)
print('\nBest Fit Line Through Origin, (0,0)')
print('slope             = ', slope00)
print('intercept         = ', intercept00)
print('stderr.,slope     = ', slope_error00)
print('stderr.,intercept = ', intercept_error00)
print('r                 = ', r00)
print('\ny =(' + str(slope00) + ")x")
```


Output of the first (regular) fit:

    slope             =  28.0537383178
    intercept         =  2.42196261682
    stderr.,slope     =  3.23723842784
    stderr.,intercept =  29.4926456767
    r                 =  0.986944379051

    y =(28.0537383178)x + 2.42196261682

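For an independent check of the slope and intercept, NumPy's `np.polyfit` with degree 1 fits the same ordinary least-squares line. This sketch does not use the module; it assumes only NumPy and the example data above. `np.polyfit` returns the coefficients highest power first, so the slope comes before the intercept.

```python
import numpy as np

x = np.array([1.0, 5.0, 9.0, 15.0])
y = np.array([10.0, 150.0, 290.5, 400.8])

# Degree-1 least-squares fit; coefficients come back as [slope, intercept]
slope_np, intercept_np = np.polyfit(x, y, 1)
print(slope_np, intercept_np)
```

The two numbers should agree with the slope and intercept printed by `lr.results` for the regular fit.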

**Try It Out!**

<iframe src="https://trinket.io/embed/python3/cf70c8c3a2" width="100%" height="300" frameborder="0" marginwidth="0" marginheight="0" allowfullscreen></iframe>

*Forcing the best-fit line to go through the origin (0,0).*

```python
import numpy as np
import matplotlib.pyplot as plt
import linear_regression as lr

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([-4.0, -2.5, -0.2, 2.0, 3.5, 4.2])

# Best-fit line forced through the origin
slope, intercept, slope_error, intercept_error, r = lr.results(x, y, through_origin=True)

# Regular best-fit line
slope2, intercept2, slope_error2, intercept_error2, r2 = lr.results(x, y)

# In a Jupyter notebook, run the magic `%matplotlib notebook` before plotting
# for an interactive figure; it is not valid syntax in a plain .py script.
plt.rcParams.update({'legend.fontsize': 8})

fig, ax = plt.subplots()
ax.scatter(x, y, s=10, label='Measurements')
ax.plot(x, slope * x + intercept, label="Regression Line Through (0,0)")
ax.plot(x, slope2 * x + intercept2, label="Regular Best Fit Line")

# Draw the x and y axes so the origin is visible
ax.hlines(0, -4, 4, color='black')
ax.vlines(0, -5, 5, color='black')

ax.legend(loc='upper left')
plt.show()
```
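In this dataset the x values are symmetric about zero, so $\bar{x} = 0$. With the formulas in the [equations section](#equations), the regular slope $\sum (x_i - \bar{x}) y_i / \sum (x_i - \bar{x})^2$ then reduces to the through-origin slope $\sum x_i y_i / \sum x_i^2$, so the two lines in the plot are parallel and differ only by the intercept $c = \bar{y}$. A short NumPy sketch (independent of the module) that checks this:

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([-4.0, -2.5, -0.2, 2.0, 3.5, 4.2])

x_bar, y_bar = x.mean(), y.mean()

# Regular least-squares slope and intercept
m = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
c = y_bar - m * x_bar

# Slope of the best-fit line forced through (0, 0)
m0 = np.sum(x * y) / np.sum(x ** 2)

print(x_bar, m, m0, c)
```

Both slopes come out as 38.8 / 28 (about 1.386), and the regular line sits 0.5 above the through-origin line.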


**lr.print_results(x, y, through_origin=False)**

>**parameters**: x,y, (optional) through_origin <br>
x and y must be two sets of measurements of equal length. To force the regression line to go through the origin, include the third parameter as through_origin = True.

>**prints:**
>
>>**slope** of the regression line
>>
>>**intercept** of the regression line
>>
>>**standard error of the slope**
>>
>>**standard error of the intercept** (not printed when through_origin = True, because the intercept is then fixed at 0)
>>
>>**correlation coefficient (a.k.a. r)**

**Example:**

```python
x = [1.0,5.0,9.0,15.0]
y = [10.0,150.0,290.5,400.8]
print('Regular Linear Regression:\n')
lr.print_results(x,y)
print('\nForcing the best fit line to go through (0,0):\n')
lr.print_results(x,y,through_origin = True)
```


Output:

    Regular Linear Regression:

          Slope  Intercept  Std. Error, Slope
      28.053738   2.421963           3.237238

      Std. Error, Intercept         r
                  29.492646  0.986944

    Forcing the best fit line to go through (0,0):

         Slope  Intercept  Std. Error, Slope
      28.27259        0.0           1.503083

             r
      0.986944


**Try It Out!**

<iframe src="https://trinket.io/embed/python3/e8a32ea6e3" width="100%" height="356" frameborder="0" marginwidth="0" marginheight="0" allowfullscreen></iframe>


### Individual Linear Regression Variables
**lr.slope(x, y, through_origin=False)**

>**parameters**: x,y, (optional) through_origin <br>
x and y must be two sets of measurements of equal length. For the slope of the best fit line through the origin, include the third parameter as through_origin = True.

>**returns:** the best fit slope

**lr.slope_error(x, y, through_origin=False)**

>**parameters**: x,y, (optional) through_origin <br>
x and y must be two sets of measurements of equal length. For the standard error of the slope of the best fit line through the origin, include the third parameter as through_origin = True.

>**returns:** The standard error of the slope.

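**lr.intercept(x, y, through_origin=False)**
>**parameters**: x,y, (optional) through_origin <br>
x and y must be two sets of measurements of equal length. With through_origin = True, the best fit line is forced through the origin, so the intercept is 0.

>**returns:** The intercept of the best fit line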

**lr.intercept_error(x, y, through_origin=False)**
>**parameters**: x,y, (optional) through_origin <br>
x and y must be two sets of measurements of equal length.

>**returns:** The standard error of the intercept

**lr.r(x,y)**
>**parameters**: x,y <br>
x and y must be two sets of measurements of equal length.

>**returns:** The correlation coefficient, a.k.a. r
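Assuming `lr.r` returns Pearson's correlation coefficient (the value printed for the `lr.results` example, 0.986944, matches it), NumPy's `np.corrcoef` gives an independent check. This sketch does not call the module.

```python
import numpy as np

x = np.array([1.0, 5.0, 9.0, 15.0])
y = np.array([10.0, 150.0, 290.5, 400.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(r)
```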

**lr.r_squared(x,y)**
>**parameters**: x,y <br>
x and y must be two sets of measurements of equal length.

>**returns:** The coefficient of determination, a.k.a. $r^2$
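For an ordinary least-squares line with an intercept, $r^2$ also equals the fraction of the variance in y explained by the fit, $1 - \sum (y_i - mx_i - c)^2 / \sum (y_i - \bar{y})^2$. A short NumPy sketch (independent of the module) showing that the two routes agree on the example data:

```python
import numpy as np

x = np.array([1.0, 5.0, 9.0, 15.0])
y = np.array([10.0, 150.0, 290.5, 400.8])

# Route 1: square the correlation coefficient
r_squared_from_r = np.corrcoef(x, y)[0, 1] ** 2

# Route 2: 1 - (residual sum of squares) / (total sum of squares)
m, c = np.polyfit(x, y, 1)
ss_res = np.sum((y - (m * x + c)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared_from_fit = 1 - ss_res / ss_tot

print(r_squared_from_r, r_squared_from_fit)
```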

**lr.p_value(x,y,through_origin=False)**

>**parameters**: x,y, (optional) through_origin<br>
x and y must be two sets of measurements of equal length. For the p-value when the best fit line is forced through the origin, include the third parameter as through_origin = True.

>**returns:** The two-sided p-value for the t-test of the null hypothesis that the slope is zero.
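A minimal sketch of the conventional calculation, for reproducing such a p-value outside the module. It assumes SciPy is installed and that the test is the usual one: $t = m / \Delta m$ compared against a Student t distribution with n - 2 degrees of freedom, or n - 1 for the through-origin fit (matching the n - 1 in the through-origin standard error). Whether `lr.p_value` uses exactly these degrees of freedom is an assumption, and `slope_p_value` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def slope_p_value(x, y, through_origin=False):
    """Two-sided p-value for H0: slope == 0 (hypothetical helper, not part of lr)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if through_origin:
        # Line y = m x: one fitted parameter, so n - 1 degrees of freedom (assumed)
        sxx0 = np.sum(x ** 2)
        m = np.sum(x * y) / sxx0
        dof = n - 1
        slope_err = np.sqrt(np.sum((y - m * x) ** 2) / dof / sxx0)
    else:
        # Line y = m x + c: two fitted parameters, so n - 2 degrees of freedom
        x_bar = x.mean()
        sxx = np.sum((x - x_bar) ** 2)
        m = np.sum((x - x_bar) * y) / sxx
        c = y.mean() - m * x_bar
        dof = n - 2
        slope_err = np.sqrt(np.sum((y - m * x - c) ** 2) / dof / sxx)
    t = m / slope_err
    # The survival function gives P(T > |t|); double it for a two-sided test
    return 2 * stats.t.sf(abs(t), dof)


x = [1.0, 5.0, 9.0, 15.0]
y = [10.0, 150.0, 290.5, 400.8]
print(slope_p_value(x, y))
print(slope_p_value(x, y, through_origin=True))
```

For the regular fit this is the same test that `scipy.stats.linregress` reports as `pvalue`.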

## Graphs <a id='graphs'></a>

### Example 1

```python
import numpy as np
import matplotlib.pyplot as plt
import linear_regression as lr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([2.0, 10.0, 10.3, 15.9, 19.3, 22.5])
slope = lr.slope(x, y)
intercept = lr.intercept(x, y)

# Font settings (also reused by Example 2)
axis_font = {'family': 'serif',
             'color': 'darkred',
             'weight': 'bold',
             'size': 12,
             }
font_big = {'family': 'serif',
            'color': 'darkred',
            'weight': 'bold',
            'size': 16,
            }

fig, ax = plt.subplots(1, figsize=(8, 6))
ax.scatter(x, y, label='Measurements', color='black')
ax.set_xlabel('x', fontdict=axis_font)
ax.set_ylabel('y', fontdict=axis_font)
ax.set_title('Linear Regression: y vs x', fontdict=font_big)
ax.set_yticks(np.arange(0, y.max(), 2))
ax.plot(x, x * slope + intercept, color='black')

# Vertical segments from the fitted line to each measurement show the residuals
ax.vlines(x, x * slope + intercept, y)

plt.show()
```


### Example 2
**Same dataset as Example 1, but with the best fit line through (0,0)**

```python
# Reuses the imports, x, y, axis_font and font_big from Example 1
slope = lr.slope(x, y, through_origin=True)
intercept = lr.intercept(x, y, through_origin=True)

fig, ax = plt.subplots(1, figsize=(8, 6))
ax.scatter(x, y, label='Measurements', color='black')
ax.set_xlabel('x', fontdict=axis_font)
ax.set_ylabel('y', fontdict=axis_font)
ax.set_title('Linear Regression Through (0,0): y vs x', fontdict=font_big)
ax.set_yticks(np.arange(0, y.max(), 2))

# Start the fitted line at x = 0 so it visibly passes through the origin
x_fit = np.arange(0, x.max() + 1)
ax.plot(x_fit, x_fit * slope + intercept, color='black')

# Residuals: vertical segments from the line to each measurement
ax.vlines(x, x * slope + intercept, y)

plt.show()
```



## Linear Regression Mathematical Equations <a id='equations'></a>

If you have taken $\mathrm{n}$ pairs of measurements $(x_1,y_1),(x_2,y_2),...,(x_n,y_n)$, the mean value of $\mathrm{x}$ is by definition:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^n{x_i}$$

and the mean value of $\mathrm{y}$ is 
$$\bar{y} = \frac{1}{n}\sum_{i=1}^n{y_i}$$


The slope of the best fit line, $\mathrm{m}$, is given by:

$$m = \frac{\sum_{i=1}^n{(x_i - \bar{x})y_i}}{\sum_{i=1}^n{(x_i - \bar{x})^2}}
$$
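The numerator is often written with $\bar{y}$ subtracted as well. The two forms agree because the deviations of $\mathrm{x}$ from its mean sum to zero:

$$\sum_{i=1}^n{(x_i - \bar{x})(y_i - \bar{y})} = \sum_{i=1}^n{(x_i - \bar{x})y_i} - \bar{y}\sum_{i=1}^n{(x_i - \bar{x})} = \sum_{i=1}^n{(x_i - \bar{x})y_i}$$

since

$$\sum_{i=1}^n{(x_i - \bar{x})} = \sum_{i=1}^n{x_i} - n\bar{x} = n\bar{x} - n\bar{x} = 0$$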


The $\mathrm{y}$-intercept, $\mathrm{c}$, is given by:
$$c = \bar{y} - m\bar{x}$$
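As a worked check using the data from the `lr.results` example ($x = 1, 5, 9, 15$ and $y = 10, 150, 290.5, 400.8$):

$$\bar{x} = 7.5,\quad \bar{y} = 212.825,\quad \sum_{i=1}^n{(x_i - \bar{x})^2} = 107,\quad \sum_{i=1}^n{(x_i - \bar{x})y_i} = 3001.75$$

$$m = \frac{3001.75}{107} \approx 28.05374, \qquad c = 212.825 - 7.5\,m \approx 2.42196$$

in agreement with the slope and intercept printed by `lr.results`.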


The standard error in the slope, $\Delta m$, is:
$$\Delta m = \sqrt{\frac{1}{\sum_{i=1}^n{(x_i - \bar{x})^2}}\frac{\sum_{i=1}^n{(y_i - mx_i - c )^2}}{n-2}}$$


The standard error in the y-intercept, $\Delta c$, is:
$$\Delta c = \sqrt{\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n{(x_i-\bar{x})^2}}\right)\frac{\sum_{i=1}^n{(y_i - mx_i - c )^2}}{n-2}}$$
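The four formulas above translate directly into NumPy. This is a sketch for checking results by hand, not the module's own source; the function name `fit_line` is hypothetical. With the data from the `lr.results` example it reproduces the slope, intercept and both standard errors printed there.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares fit y = m x + c (hypothetical helper, not part of lr).

    Returns (m, c, delta_m, delta_c) using the equations in this section.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)          # sum of (x_i - x_bar)^2

    m = np.sum((x - x_bar) * y) / sxx       # slope
    c = y_bar - m * x_bar                   # y-intercept

    # Residual sum of squares divided by n - 2 (two fitted parameters)
    residual_var = np.sum((y - m * x - c) ** 2) / (n - 2)
    delta_m = np.sqrt(residual_var / sxx)                          # std. error of slope
    delta_c = np.sqrt((1 / n + x_bar ** 2 / sxx) * residual_var)   # std. error of intercept
    return m, c, delta_m, delta_c


m, c, delta_m, delta_c = fit_line([1.0, 5.0, 9.0, 15.0], [10.0, 150.0, 290.5, 400.8])
print(m, c, delta_m, delta_c)
```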


**If the best fit is required to pass through the origin**, $(0,0)$, then $c = 0$, and 
$$m = \frac{\sum_{i=1}^n{x_iy_i}}{\sum_{i=1}^n{x_i^2}}$$


and the standard error of the slope, $\Delta m$, is:
$$\Delta m = \sqrt{\frac{1}{\sum_{i=1}^n{x_i^2}}\frac{\sum_{i=1}^n{(y_i - mx_i)^2}}{n-1}}$$
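The through-origin formulas can be sketched the same way. Again this is an illustration, not the module's source, and `fit_line_through_origin` is a hypothetical name. With the `lr.print_results` example data it gives the through-origin slope (28.27259) and its standard error (1.503083) shown in that output.

```python
import numpy as np


def fit_line_through_origin(x, y):
    """Least-squares fit of y = m x (hypothetical helper, not part of lr).

    Returns (m, delta_m) using the through-origin equations above.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx0 = np.sum(x ** 2)                   # sum of x_i^2
    m = np.sum(x * y) / sxx0                # slope with c fixed at 0
    # Only one fitted parameter, so the residuals are divided by n - 1
    delta_m = np.sqrt(np.sum((y - m * x) ** 2) / (n - 1) / sxx0)
    return m, delta_m


m0, delta_m0 = fit_line_through_origin([1.0, 5.0, 9.0, 15.0], [10.0, 150.0, 290.5, 400.8])
print(m0, delta_m0)
```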