# Simple Linear Regression in Python

In [147]:
import pandas as pd
import numpy as np
from scipy import stats

For simple linear regression and proving statistical significant we need three libraries. Pandas for data importing, numpy for data manipulation and scipy for the normal distribution.

In [130]:
df = pd.read_stata('Maternalmortality19902015.dta')

Here is an old project from my ECO475 class where the dependant variable was the Maternal Mortality Ratio and the independant variable was the percentage use of contraception in the country.

In [148]:
df.head()

Unnamed: 0,Countryid,Year,MMR,cpp,cppmod,GDPcap,Docs,yr1990,yr1995,yr2000,yr2005,yr2010,yr2015
0,Afghanistan,2000,1100,5.3,3.6,,12.4,0.0,0.0,1.0,0.0,0.0,0.0
1,Afghanistan,2005,821,13.6,12.5,20.177277,,0.0,0.0,0.0,1.0,0.0,0.0
2,Afghanistan,2010,584,21.8,19.9,36.233788,34.299999,0.0,0.0,0.0,0.0,1.0,0.0
3,Albania,2000,43,57.5,17.9,26.813463,99.099998,0.0,0.0,1.0,0.0,0.0,0.0
4,Albania,2005,30,60.1,25.0,58.729275,99.800003,0.0,0.0,0.0,1.0,0.0,0.0


In [149]:
def estimate_coef(x, y): 
    n = np.size(x) 
  
    m_x, m_y = np.mean(x), np.mean(y) 
  
    SS_xy = np.sum(y*x) - n*m_y*m_x 
    SS_xx = np.sum(x*x) - n*m_x*m_x 
  
    b_1 = SS_xy / SS_xx 
    b_0 = m_y - b_1*m_x 
    
    rss = np.sum(y - (b_1)**2)
    tss = np.sum((y - m_y)**2)
    ess = np.sum((x-m_x)**2)
    r_2 = 1 - rss/tss
    ar_2 = 1 -((n-1)/(n-2))*(1-r_2)
    
    stde_1 = np.sqrt(tss/(n-2))/np.sqrt(ess)
    tstat = b_1/stde_1
    df = n-1
    pvalue = stats.t.cdf(tstat,df=df)
    
    return(b_0, b_1, tss, rss, ess, r_2, ar_2, stde_1, tstat, pvalue) 

This function produces all our regression statistics

In [150]:
def main():
 b = estimate_coef(df.cppmod, df.MMR)
 print("Estimated coefficients:\nb_0 = {} \
\nb_1 = {} \nSE_1: {} \nT-statistic: {} \npvalue: {} \nTotal Sum of Squares:\nTSS = {} \
\nResidual Sum of Squares:\nRSS = {} \
\nError Sum of Squares:\nESS = {} \
\nRsquared:\nR2 = {} \
\nAdjustedRsquared:\nR2 = {} ".format(b[0], b[1], b[7], b[8], b[9], b[2], b[3], b[4], b[5], b[6]))

This function produces the output function

In [146]:
if __name__ == "__main__":
    main()

Estimated coefficients:
b_0 = 807.091740779 
b_1 = -13.1333964602 
SE_1: 1.46203888414 
T-statistic: -8.98293239849 
pvalue: 2.80310870826e-16 
Total Sum of Squares:
TSS = 28326615.2216 
Residual Sum of Squares:
RSS = 25580.820869 
Error Sum of Squares:
ESS = 80314.302622 
Rsquared:
R2 = 0.999096933373 
AdjustedRsquared:
R2 = 0.999096933373 


As we can see there is a negative relationship between the perecentage of contraception use and the maternal mortality ratio. We can see tis is statistically significant at a very high level due to the low p-value.