## Statistical Model 
- A mathematical equation that explains the relationship between the dependent variable (Y) and the independent variable (X).
          
          Y = f(X)
          
- Because of uncertainty in the result and noise, the equation includes an error term e:
            
          Y = f(X) + e
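
A minimal sketch of Y = f(X) + e, assuming a made-up linear `f` and standard normal noise (neither comes from these notes). The observed Y scatters around f(X), and that scatter is exactly the error term e.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

def f(x):
    """Hypothetical 'true' relationship, chosen only for illustration."""
    return 2.0 + 0.5 * x

x = np.linspace(0, 10, 200)                       # independent variable X
e = rng.normal(loc=0.0, scale=1.0, size=x.size)   # noise term e
y = f(x) + e                                      # observed dependent variable Y

# Removing the systematic part f(X) leaves only the noise.
print(np.allclose(y - f(x), e))  # True
```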
          
          


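## Linear Models
- "Linear" means linear in the coefficients θ: the predictors may be transformed (powers, sine, cosine) as long as Y is a weighted sum of those terms.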
- Y = θ0 + θ1 * X1 + θ2 * X2 + ... + θn * Xn
- Y = θ0 + θ1 * X + θ2 * X^2 + ... + θn * X^n
- Y = θ0 + θ1 * sin(X1) + θ2 * cos(X2)
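
To see why the polynomial and trigonometric models still count as linear, here is a sketch using NumPy least squares on noise-free data (the coefficient values are made up). Each transformed predictor becomes one column, and `np.linalg.lstsq` recovers the θ values exactly because Y is a linear combination of those columns.

```python
import numpy as np

# Polynomial model: Y = θ0 + θ1*X + θ2*X^2 (hypothetical coefficients)
x = np.linspace(-3, 3, 50)
theta_true = np.array([1.0, -2.0, 0.5])
y = theta_true[0] + theta_true[1] * x + theta_true[2] * x**2

# Columns are non-linear in x, but Y is linear in θ, so linear least squares applies.
X = np.column_stack([np.ones_like(x), x, x**2])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Trigonometric model: Y = θ0 + θ1*sin(X1) + θ2*cos(X2), same recipe
x1 = np.linspace(0, 6, 50)
x2 = np.linspace(1, 4, 50)
theta_trig_true = np.array([0.3, 1.2, -0.7])
y_trig = theta_trig_true[0] + theta_trig_true[1] * np.sin(x1) + theta_trig_true[2] * np.cos(x2)
X_trig = np.column_stack([np.ones_like(x1), np.sin(x1), np.cos(x2)])
theta_trig, *_ = np.linalg.lstsq(X_trig, y_trig, rcond=None)

print(theta_hat, theta_trig)
```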

## Linear Regression

    Y = θ0 + θ1 * X + e

    θ0, θ1 = coefficients (intercept and slope)
    e = normally distributed residual error 
    
- The linear regression model assumes that the residuals are independent and normally distributed.
- The model is fitted to the data with ordinary least squares (OLS), which chooses θ0 and θ1 to minimize the sum of squared residuals.
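
For a single predictor, OLS has a closed form: θ1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and θ0 = ȳ - θ1·x̄. The sketch below uses simulated data with hypothetical true values θ0 = 3 and θ1 = 1.5, and cross-checks the result against `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=100)  # hypothetical θ0 = 3, θ1 = 1.5

# Closed-form OLS estimates for simple linear regression
x_bar, y_bar = x.mean(), y.mean()
theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
theta0 = y_bar - theta1 * x_bar

# Residuals and the sum of squared residuals that OLS minimizes
residuals = y - (theta0 + theta1 * x)
sse = np.sum(residuals ** 2)

# Cross-check: a degree-1 least-squares polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print(theta0, theta1)
```

With an intercept in the model, the OLS residuals always sum to zero, and nudging either coefficient can only increase `sse`.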

## Non-Linear Models
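- In most cases, non-linear models are handled as generalized linear models (GLMs): a linear combination of the predictors is connected to the response through a non-linear link function.
- Examples: binomial regression, Poisson regression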
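
As a sketch of how such a GLM is fitted, here is Poisson regression with a log link, written by hand with iteratively reweighted least squares (IRLS) in NumPy. The function name `fit_poisson_irls`, its starting values and the simulated data are assumptions for illustration, not statsmodels' internals; in statsmodels the same model would be `sm.GLM(y, X, family=sm.families.Poisson())`.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: Poisson GLM with log link fitted by IRLS.

    For the canonical log link, IRLS coincides with Newton's method
    on the log-likelihood.
    """
    mu = y + 0.5                 # starting fitted values; +0.5 avoids log(0)
    eta = np.log(mu)             # linear predictor on the link scale
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * mu           # X^T W with W = diag(mu), via broadcasting
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)         # inverse link: E[Y] = exp(X beta)
        if converged:
            break
    return beta

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.8])          # hypothetical coefficients
y = rng.poisson(np.exp(X @ beta_true))    # simulated count response

beta_hat = fit_poisson_irls(X, y)
print(beta_hat)
```

At the maximum-likelihood estimate the score Xᵀ(y - μ) is zero, which is a convenient convergence check.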


## Design Matrices 
- Once the model is chosen, the design matrices are constructed:
    Y = XB + e

| Variable | Description | 
| -------- | ----------- |
| Y | vector/matrix of the dependent variable | 
| X | design matrix of the independent variables (one column per model term) | 
| B | vector/matrix of coefficients | 
| e | residual error | 
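
A sketch of Y = XB + e in NumPy, with simulated data and hypothetical coefficients. X gets a column of ones for the intercept, and the OLS estimate of B solves the normal equations (XᵀX)B = XᵀY. `np.linalg.lstsq` gives the same answer and is the safer choice when X is close to rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Design matrix X: a column of ones (intercept) plus one column per predictor
X = np.column_stack([np.ones(n), x1, x2])
B_true = np.array([[1.0], [2.0], [-0.5]])   # hypothetical coefficients, shape (3, 1)
e = rng.normal(scale=0.1, size=(n, 1))      # residual error
Y = X @ B_true + e                          # Y = XB + e, shape (n, 1)

# OLS estimate from the normal equations (X^T X) B = X^T Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same least-squares problem, solved without forming X^T X explicitly
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat.ravel())
```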


## Creating a Model 

### Using the statsmodels library
- OLS
- GLM
- ols
- glm 

        Uppercase names (OLS, GLM in statsmodels.api) take design matrices as arguments.
        Lowercase names (ols, glm in statsmodels.formula.api) take Patsy formulas and DataFrames as arguments.
        
## Fitting a Model 
- The fit method returns a results object that provides further methods and attributes, including the estimated coefficients, for analysis.
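
A sketch of fitting and reading results, assuming simulated data and a formula model. `fit()` returns a results object; `params`, `bse`, `rsquared`, `resid`, `fittedvalues` and `predict` are attributes or methods of statsmodels regression results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=80)})
df["y"] = 3.0 + 1.5 * df["x"] + rng.normal(scale=1.0, size=80)  # hypothetical θ0, θ1

results = smf.ols("y ~ x", data=df).fit()  # fit() returns a results object

print(results.params)    # estimated coefficients (Intercept, x)
print(results.bse)       # standard errors of the coefficients
print(results.rsquared)  # coefficient of determination

resid = results.resid                           # observed minus fitted values
new = pd.DataFrame({"x": [0.0, 5.0]})
pred = np.asarray(results.predict(new))         # predictions for new data
```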

## View Model Summary
- Describes the fitted model as text: coefficient estimates with standard errors and test statistics, plus goodness-of-fit measures.
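
A sketch on simulated data: `summary()` returns a Summary object whose text form (also available via `as_text()`) contains these tables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({"x": rng.normal(size=60)})
df["y"] = 0.5 + 2.0 * df["x"] + rng.normal(scale=0.3, size=60)

results = smf.ols("y ~ x", data=df).fit()
summary = results.summary()  # Summary object; printing it shows the text tables
print(summary)

text = summary.as_text()     # the same report as a plain string
```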

<hr>


## Construct Design Matrices 

Y = θ0 + θ1 * X1 + θ2 * X2 + θ3 * X1 * X2
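
In the matrix form Y = XB + e from the design-matrices section, with B = (θ0, θ1, θ2, θ3)ᵀ and X_{j,i} standing for the value of Xj in observation i (notation introduced here), the design matrix has one row per observation and four columns: intercept, X1, X2 and the product X1 * X2.

```latex
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{1,1} & X_{2,1} & X_{1,1} X_{2,1} \\
1 & X_{1,2} & X_{2,2} & X_{1,2} X_{2,2} \\
\vdots & \vdots & \vdots & \vdots \\
1 & X_{1,n} & X_{2,n} & X_{1,n} X_{2,n}
\end{bmatrix}
\begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
```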

## Design Matrix with NumPy

```python
import numpy as np

# Response vector Y as a column (5 x 1)
Y = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)

x1 = np.array([6, 7, 8, 9, 10])
x2 = np.array([11, 12, 13, 14, 15])

# Columns: intercept (ones), x1, x2 and the interaction term x1*x2
X = np.vstack([np.ones(5), x1, x2, x1 * x2]).T

print(Y)
print(X)
```

```
[[1]
 [2]
 [3]
 [4]
 [5]]
[[  1.   6.  11.  66.]
 [  1.   7.  12.  84.]
 [  1.   8.  13. 104.]
 [  1.   9.  14. 126.]
 [  1.  10.  15. 150.]]
```
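
This toy data shows the layout well, but a quick rank check (an addition, not part of the original notes) shows it could not identify all four coefficients: x2 equals x1 + 5, so the x2 column is a combination of the intercept and x1 columns.

```python
import numpy as np

x1 = np.array([6, 7, 8, 9, 10])
x2 = np.array([11, 12, 13, 14, 15])
X = np.vstack([np.ones(5), x1, x2, x1 * x2]).T

# The x2 column equals the x1 column plus 5 times the intercept column
print(np.allclose(X[:, 2], X[:, 1] + 5 * X[:, 0]))  # True

# Only 3 of the 4 columns are linearly independent
print(np.linalg.matrix_rank(X))  # 3
```

With real data, checking that the design matrix has full column rank before fitting avoids unidentifiable coefficients.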


## Design Matrix with patsy 

- allows defining a model easily with a formula string
- constructs the relevant design matrices (patsy.dmatrices)
- takes the formula as a string and a dictionary-like object holding the data arrays for the response and predictor variables



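```python
import numpy as np
import patsy

y = np.array([1, 2, 3, 4, 5])
x1 = np.array([6, 7, 8, 9, 10])
x2 = np.array([11, 12, 13, 14, 15])

# Map each name used in the formula to its data array
data = {
    'Y': y,
    'x1': x1,
    'x2': x2,
}

# Left of "~" is the response; right of "~" lists the model terms.
# "1" is the intercept and "x1:x2" is the interaction term.
equation = 'Y ~ 1 + x1 + x2 + x1:x2'

Y, X = patsy.dmatrices(equation, data)

print(Y)
print(X)
```

```
[[1.]
 [2.]
 [3.]
 [4.]
 [5.]]
[[  1.   6.  11.  66.]
 [  1.   7.  12.  84.]
 [  1.   8.  13. 104.]
 [  1.   9.  14. 126.]
 [  1.  10.  15. 150.]]
```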
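
In a Patsy formula, `a*b` is shorthand for `a + b + a:b`, and an intercept is included unless it is removed with `- 1`. A small check with `patsy.dmatrix` (right-hand side only), using the same toy arrays:

```python
import numpy as np
import patsy

data = {"x1": np.array([6, 7, 8, 9, 10]), "x2": np.array([11, 12, 13, 14, 15])}

X_star = patsy.dmatrix("x1*x2", data)            # shorthand form
X_colon = patsy.dmatrix("x1 + x2 + x1:x2", data) # explicit form
X_no_icpt = patsy.dmatrix("x1*x2 - 1", data)     # intercept removed

print(X_star.design_info.column_names)     # ['Intercept', 'x1', 'x2', 'x1:x2']
print(X_no_icpt.design_info.column_names)  # ['x1', 'x2', 'x1:x2']
```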
