# OLS example

This is a brief example of using the statsmodels OLS function with patsy formulas.

In [14]:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
import scipy as sp

## Create Example Datasets
First I simulate a simple example dataset. You can probably ignore this since you have real data :)

In [15]:
# Create some example data
cs = [12, 14, 16, 18]  # classes of carbons
ds = [0, 1, 2, 3]      # classes of double bonds

df = pd.DataFrame({
    'RT': np.random.uniform(low=0.1, high=15, size=1000),
    'Carbon': np.random.choice(cs, size=1000),
    'DB': np.random.choice(ds, size=1000),
})
df.head()

| | Carbon | DB | RT |
|---|---|---|---|
| 0 | 16 | 0 | 14.25553 |
| 1 | 14 | 0 | 9.523048 |
| 2 | 12 | 0 | 14.139203 |
| 3 | 16 | 2 | 12.711984 |
| 4 | 18 | 3 | 1.184072 |


## Create OLS output

Now I use statsmodels to generate OLS output.

In [16]:
# Write out my R-style formula
formula = 'RT ~ C(Carbon) + C(DB)'

# Generate a model using the formula and dataframe. This step builds all of the matrices needed for OLS.
model = smf.ols(formula, df)

# Fit the model and get the results output
results = model.fit()

# Print an overall summary of the model
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | RT | R-squared: | 0.003 |
| Model: | OLS | Adj. R-squared: | -0.003 |
| Method: | Least Squares | F-statistic: | 0.4958 |
| Date: | Thu, 04 Jun 2015 | Prob (F-statistic): | 0.812 |
| Time: | 08:33:22 | Log-Likelihood: | -2869.8 |
| No. Observations: | 1000 | AIC: | 5754.0 |
| Df Residuals: | 993 | BIC: | 5788.0 |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| Intercept | 7.7054 | 0.346 | 22.250 | 0.000 | 7.026 8.385 |
| C(Carbon)[T.14] | -0.1923 | 0.391 | -0.491 | 0.623 | -0.960 0.576 |
| C(Carbon)[T.16] | -0.0354 | 0.374 | -0.095 | 0.925 | -0.769 0.699 |
| C(Carbon)[T.18] | -0.5568 | 0.378 | -1.475 | 0.141 | -1.298 0.184 |
| C(DB)[T.1] | 0.1626 | 0.378 | 0.430 | 0.667 | -0.579 0.904 |
| C(DB)[T.2] | 0.1198 | 0.379 | 0.316 | 0.752 | -0.623 0.863 |
| C(DB)[T.3] | 0.1249 | 0.389 | 0.321 | 0.748 | -0.639 0.888 |

| | | | |
|---|---|---|---|
| Omnibus: | 773.418 | Durbin-Watson: | 2.033 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 60.635 |
| Skew: | -0.004 | Prob(JB): | 6.81e-14 |
| Kurtosis: | 1.794 | Cond. No. | 5.4 |
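
The row labels in the coefficient table (`C(Carbon)[T.14]` and so on) come from how the formula is expanded into a design matrix. The sketch below shows that expansion by calling patsy directly. It assumes patsy is installed alongside statsmodels; the `df_demo` frame, its size, and the fixed seed are illustrative and not part of the original analysis.

```python
import numpy as np
import pandas as pd
import patsy

# Small simulated dataset (hypothetical; mirrors the columns used above)
rng = np.random.RandomState(0)
df_demo = pd.DataFrame({
    'RT': rng.uniform(low=0.1, high=15, size=200),
    'Carbon': rng.choice([12, 14, 16, 18], size=200),
    'DB': rng.choice([0, 1, 2, 3], size=200),
})

# Build the response vector y and the design matrix X from the formula
y, X = patsy.dmatrices('RT ~ C(Carbon) + C(DB)', df_demo,
                       return_type='dataframe')

# Each categorical variable becomes one dummy column per non-reference level
design_columns = list(X.columns)
print(design_columns)
```

With the default treatment coding, the first level of each categorical variable (Carbon 12 and DB 0 here) gets no column of its own. It is absorbed into the intercept, so every other coefficient is read as a difference from that reference level.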


## Access results attributes

The results object now holds all of the attributes related to the OLS output. For a description of each attribute, see the documentation for the regression results class:

http://statsmodels.sourceforge.net/0.5.0/generated/statsmodels.regression.linear_model.RegressionResults.html#statsmodels.regression.linear_model.RegressionResults
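
As a rough illustration of pulling individual pieces out of the results object, the sketch below rebuilds the coefficient table from the `params`, `bse`, `tvalues`, and `pvalues` attributes and the `conf_int()` method. The `df_demo` data, the seed, and the `coef_table` name are assumptions made for this example only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small simulated dataset (hypothetical; mirrors the columns used above)
rng = np.random.RandomState(0)
df_demo = pd.DataFrame({
    'RT': rng.uniform(low=0.1, high=15, size=200),
    'Carbon': rng.choice([12, 14, 16, 18], size=200),
    'DB': rng.choice([0, 1, 2, 3], size=200),
})
results_demo = smf.ols('RT ~ C(Carbon) + C(DB)', df_demo).fit()

# Each attribute is a pandas Series indexed by term name
coef_table = pd.DataFrame({
    'coef': results_demo.params,
    'std err': results_demo.bse,
    't': results_demo.tvalues,
    'P>|t|': results_demo.pvalues,
})

# conf_int() returns a DataFrame with the lower and upper bounds in columns 0 and 1
ci = results_demo.conf_int()
coef_table['ci_low'] = ci[0]
coef_table['ci_high'] = ci[1]
print(coef_table)
```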

In Python you can look at an object's attributes and methods using the `dir` function.

In [17]:
dir(results)

['HC0_se',
 'HC1_se',
 'HC2_se',
 'HC3_se',
 '_HCCM',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_cache',
 '_data_attr',
 '_get_robustcov_results',
 '_is_nested',
 '_wexog_singular_values',
 'aic',
 'bic',
 'bse',
 'centered_tss',
 'compare_f_test',
 'compare_lm_test',
 'compare_lr_test',
 'condition_number',
 'conf_int',
 'conf_int_el',
 'cov_HC0',
 'cov_HC1',
 'cov_HC2',
 'cov_HC3',
 'cov_kwds',
 'cov_params',
 'cov_type',
 'df_model',
 'df_resid',
 'diagn',
 'eigenvals',
 'el_test',
 'ess',
 'f_pvalue',
 'f_test',
 'fittedvalues',
 'fvalue',
 'get_influence',
 'get_robustcov_results',
 'initialize',
 'k_constant',
 'llf',
 'load',
 'model',
 'mse_model',
 'mse_resid',
 'mse_total',
 'nobs',
 'normalized_cov_params',
 'outlier_test',
 'params',
 'predict',
 '
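
The raw `dir` output mixes internal names (those starting with an underscore) with the attributes you are likely to want. One possible way to shorten the list is to filter on the leading underscore, as sketched here. The `df_demo` data and the `public_names` variable are hypothetical names used only for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small simulated dataset (hypothetical; mirrors the columns used above)
rng = np.random.RandomState(0)
df_demo = pd.DataFrame({
    'RT': rng.uniform(low=0.1, high=15, size=200),
    'Carbon': rng.choice([12, 14, 16, 18], size=200),
    'DB': rng.choice([0, 1, 2, 3], size=200),
})
results_demo = smf.ols('RT ~ C(Carbon) + C(DB)', df_demo).fit()

# Keep only names that do not start with an underscore
public_names = [name for name in dir(results_demo)
                if not name.startswith('_')]
print(public_names)
```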

There are two $R^2$ values that you can get from these results: `rsquared` and `rsquared_adj`. To get the `rsquared` value:

In [18]:
rSquared = results.rsquared
print(rSquared)

0.00298689722429
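
The adjusted value is read the same way through `rsquared_adj`. For a model with an intercept it relates to the plain $R^2$ by $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of model terms excluding the intercept. The sketch below checks that relationship on a small simulated dataset. The `df_demo` data and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small simulated dataset (hypothetical; mirrors the columns used above)
rng = np.random.RandomState(0)
df_demo = pd.DataFrame({
    'RT': rng.uniform(low=0.1, high=15, size=200),
    'Carbon': rng.choice([12, 14, 16, 18], size=200),
    'DB': rng.choice([0, 1, 2, 3], size=200),
})
results_demo = smf.ols('RT ~ C(Carbon) + C(DB)', df_demo).fit()

n = results_demo.nobs      # number of observations
p = results_demo.df_model  # number of regressors, intercept excluded

# Adjusted R-squared computed by hand from the plain R-squared
manual_adj = 1 - (1 - results_demo.rsquared) * (n - 1) / (n - p - 1)

print(results_demo.rsquared_adj)
print(manual_adj)
```

Plugging in the summary shown earlier ($R^2 \approx 0.00299$, $n = 1000$, $p = 6$) gives about $-0.003$, which matches the reported Adj. R-squared. A negative value like this is expected when the predictors are pure noise, as they are in the simulated data.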
