# Multivariate Analysis

* Load the [Lending Club Statistics](https://www.lendingclub.com/info/download-data.action) data.
* Use annual income (**annual_inc**) to model the interest rate (**int_rate**).
* Add home ownership (**home_ownership**) to the model.
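
Home ownership is a text label, so it has to be encoded numerically before it can enter a regression. The sketch below uses a handful of made-up values (an assumption for illustration, not the Lending Club data) to show what the integer-code encoding used in the cell below produces, next to a one-hot alternative built with `pd.get_dummies`. The variable names are hypothetical.

```python
import pandas as pd

# Made-up ownership labels, for illustration only
ownership = pd.Series(["RENT", "MORTGAGE", "OWN", "RENT", "OTHER"])

# Integer codes: categories are sorted alphabetically, then numbered from 0
cat = pd.Categorical(ownership)
codes = cat.codes
print(list(cat.categories))
print(codes.tolist())

# One-hot alternative: one indicator column per category,
# dropping the first (MORTGAGE) as the reference level
dummies = pd.get_dummies(ownership, drop_first=True)
print(list(dummies.columns))
```

Integer codes impose an ordering and equal spacing on the categories, and a linear model interprets both literally. Indicator columns avoid that assumption at the cost of one coefficient per non-reference category.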

In [6]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt 

# Load Data into Pandas DataFrame
loans = pd.read_csv("https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv")
loans2 = pd.DataFrame(columns=['int_rate', 'annual_inc', 'home_ownership_ord'])

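# Pre-process data
# Interest rate: strip the trailing '%' and convert to a fraction (e.g. '8.90%' -> 0.089)
intrate = loans['Interest.Rate'].map(lambda x: round(float(x.rstrip('%')) / 100, 4))
# Annual income: twelve times the reported monthly income
income = loans['Monthly.Income'] * 12
# Home ownership: integer codes, assigned in alphabetical order of the category labels
ownership = pd.Categorical(loans['Home.Ownership']).codes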

# Assign cleaned data to DataFrame columns
loans2['int_rate'] = intrate
loans2['annual_inc'] = income
loans2['home_ownership_ord'] = ownership

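# Add an intercept column ('const'), then an income x ownership interaction term
# (the interaction column is not used by the two models below)
loans2 = sm.add_constant(loans2)
loans2['Interaction'] = loans2['annual_inc'] * loans2['home_ownership_ord']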

# Model #1: ordinary least squares, interest rate on annual income
model = sm.OLS(loans2['int_rate'], loans2[['const', 'annual_inc']], missing='drop')
result = model.fit()
print(result.summary())

print ('''
-------------------------------Model #2-----------------------------------------
''')

# Model #2: add the home-ownership codes as a second predictor
model2 = sm.OLS(loans2['int_rate'], loans2[['const', 'annual_inc', 'home_ownership_ord']], missing='drop')
result2 = model2.fit()
print(result2.summary())

                            OLS Regression Results                            
Dep. Variable:               int_rate   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.3764
Date:                Sat, 07 May 2016   Prob (F-statistic):              0.540
Time:                        13:42:48   Log-Likelihood:                 4390.2
No. Observations:                2499   AIC:                            -8776.
Df Residuals:                    2497   BIC:                            -8765.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.1300      0.001     88.868      0.0