# 1 Assignment 3

1. Generate a function named **beta_OLS**. This function must return the estimated beta using OLS. Its inputs should be `X`, `y`, and `intercept`: `X` (covariates) must be a **pd.DataFrame**, `y` (endog) must be a **pd.Series**, and `intercept` can be `True` or `False` (default `True`). When `intercept` is `False`, the estimated beta does not include the **intercept**. You must also specify the types of the function's parameters and output, and the function must raise an error if the inputs do not meet the requirements. The function's output must be a **pd.DataFrame** of shape (`n`, 1), where `n` is the total number of regressors (including the **Intercept** when `intercept` is `True`). The column should be named **Coef.** and the row index should use the original column names of the `X` input. Your results should look like `output_example`, with `X_input`, `y_input`, and `intercept_input` as your inputs. Apply your function to find $\widehat{\boldsymbol{\beta}}^{(OLS)}$ for the equation below. Use the greene data. **Hint: Use NumPy to generate the OLS beta and check [this link](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html) to change the index name of a pd.DataFrame, [link](https://notebooks.githubusercontent.com/view/ipynb?browser=chrome&color_mode=auto&commit=69c80e1f2c1c268f0480a32932262201785a576c&device=unknown_device&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f616c6578616e6465727175697370652f4469706c6f6d61646f5f505543502f363963383065316632633163323638663034383061333239333232363232303137383561353736632f4c6563747572655f352f4c6563747572655f352e6970796e62&logged_in=true&nwo=alexanderquispe%2FDiplomado_PUCP&path=Lecture_5%2FLecture_5.ipynb&platform=windows&repository_id=427747212&repository_type=Repository&version=96##5.1.7.).**

$$
\widehat{\boldsymbol{\beta}}^{(OLS)} = \left( \mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{Y}
$$


$$
\begin{aligned} 
lnCT &= \beta_{0}+\beta_{q}lnq+ \beta_{qq}(lnq)^2+\beta_{q1}lnqlnp_1+\beta_{q2}lnqlnp_2+ \beta_{q3}lnqlnp_{3} +\beta_{1}lnp_1+\beta_{2}lnp_2+ \beta_{3}lnp_3 \\
& + \beta_{11}(lnp_{1})^2+ \beta_{22}(lnp_{2})^2+ \beta_{33}(lnp_{3})^2 + \beta_{12}lnp_{1}lnp_{2}+ \beta_{13}lnp_{1}lnp_{3}+\beta_{23}lnp_{2}lnp_{3} 
\end{aligned}
$$
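As a quick sanity check of the estimator formula above, here is a minimal sketch on synthetic data. The sample size, seed, and true coefficients are assumptions made up for illustration, not values from the greene data. It solves the normal equations with `np.linalg.solve` and also evaluates the explicit-inverse form of the formula, so the two can be compared.

```python
import numpy as np

# Synthetic data: every value below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])          # prepend the intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimator written with the explicit inverse, as in the formula
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

print(beta_hat)
print(beta_inv)
```

Both lines should print estimates close to the assumed `beta_true`; solving the linear system is usually preferred numerically, but it is the same estimator.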


In [4]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

In [5]:
greene = pd.read_csv(r"..\..\_data/christensen_greene_f4.csv")
greene

Unnamed: 0,id,YEAR,COST,Q,PL,SL,PK,SK,PF,SF
0,1,1970,0.2130,8.0,6869.47,0.3291,64.945,0.4197,18.000,0.2512
1,4,1970,3.0427,869.0,8372.96,0.1030,68.227,0.2913,21.067,0.6057
2,5,1970,9.4059,1412.0,7960.90,0.0891,40.692,0.1567,41.530,0.7542
3,14,1970,0.7606,65.0,8971.89,0.2802,41.243,0.1282,28.539,0.5916
4,15,1970,2.2587,295.0,8218.40,0.1772,71.940,0.1623,39.200,0.6606
...,...,...,...,...,...,...,...,...,...,...
153,214,1970,6.8293,946.6,10642.16,0.0883,43.600,0.1914,51.463,0.7203
154,215,1970,3.7605,377.0,7432.24,0.2117,74.120,0.2274,33.436,0.5609
155,216,1970,3.9822,391.0,5826.04,0.1926,78.288,0.0924,44.633,0.7151
156,217,1970,30.1880,5317.0,9586.63,0.0845,78.008,0.2009,41.840,0.7147


In [6]:
# Defining the beta_ols function (beta_OLS in the task statement)

def beta_ols(X: pd.DataFrame, y: pd.Series, intercept: bool = True) -> pd.DataFrame:

    if not isinstance(X, pd.DataFrame):
        raise TypeError("X is not a pd.DataFrame type.")

    if not isinstance(y, pd.Series):
        raise TypeError("y is not a pd.Series type.")

    if not isinstance(intercept, bool):
        raise TypeError("intercept is not a boolean type.")

    X_new = X.copy()
    if intercept:
        X_new.insert(0, 'intercept', 1)   # prepend a column of ones

    # OLS estimator (X'X)^{-1} X'y, computed on plain NumPy arrays so the
    # result does not depend on the name or index labels of y
    X_mat = X_new.to_numpy(dtype=float)
    y_vec = y.to_numpy(dtype=float)
    beta = np.linalg.inv(X_mat.T @ X_mat) @ X_mat.T @ y_vec

    # (k, 1) DataFrame: one 'Coef.' column, rows labelled with the regressor names
    results = pd.DataFrame(beta, index=X_new.columns, columns=['Coef.'])
    results.index.name = 'Param.'

    return results
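
The type checks at the top of the function can be exercised on their own. The sketch below uses a hypothetical helper, `check_ols_inputs`, which is not part of the assignment and only mirrors those three checks, together with a tiny made-up DataFrame and Series.

```python
import pandas as pd

def check_ols_inputs(X, y, intercept=True) -> None:
    """Hypothetical helper: raise TypeError if an argument has the wrong type."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X is not a pd.DataFrame type.")
    if not isinstance(y, pd.Series):
        raise TypeError("y is not a pd.Series type.")
    if not isinstance(intercept, bool):
        raise TypeError("intercept is not a boolean type.")

# Made-up inputs, for illustration only
X_ok = pd.DataFrame({"x1": [1.0, 2.0, 3.0]})
y_ok = pd.Series([1.0, 2.0, 3.0])

check_ols_inputs(X_ok, y_ok, True)                # valid inputs: no exception

messages = []
for bad_args in [(X_ok.to_numpy(), y_ok, True),   # X is an ndarray
                 (X_ok, [1.0, 2.0, 3.0], True),   # y is a list
                 (X_ok, y_ok, 1)]:                # intercept is an int, not a bool
    try:
        check_ols_inputs(*bad_args)
    except TypeError as err:
        messages.append(str(err))

print(messages)
```

Note that `isinstance(1, bool)` is `False`, so passing `1` instead of `True` is rejected as well.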

In [7]:
# Defining variables for building Y vector and X covariate matrix

only_cost = greene.COST.values 
only_Q = greene.Q.values 
only_PL = greene.PL.values 
only_PK = greene.PK.values 
only_PF = greene.PF.values

lnct = np.log(only_cost)
lnq = np.log(only_Q)
lnp1 = np.log(only_PL)
lnp2 = np.log(only_PK)
lnp3 = np.log(only_PF)
lnq_2 = np.multiply(lnq,lnq)
lnqlnp1 = np.multiply(lnq,lnp1)
lnqlnp2 = np.multiply(lnq,lnp2)
lnqlnp3 = np.multiply(lnq,lnp3)
lnp1_2 = np.multiply(lnp1,lnp1) 
lnp2_2 = np.multiply(lnp2,lnp2) 
lnp3_2 = np.multiply(lnp3,lnp3) 
lnp1lnp2 = np.multiply (lnp1,lnp2)
lnp1lnp3 = np.multiply (lnp1,lnp3)
lnp2lnp3 = np.multiply (lnp2,lnp3)

In [8]:
# Building vector y_input of the dependent variable
y_input = pd.Series(lnct)
y_input

0     -1.546463
1      1.112745
2      2.241337
3     -0.273648
4      0.814789
         ...   
153    1.921222
154    1.324552
155    1.381834
156    3.407444
157    4.217391
Length: 158, dtype: float64

In [9]:
# Building covariates matrix (X_input)

X_input = pd.DataFrame({
    'lnq': lnq, 'lnq_2': lnq_2,
    'lnqlnp1': lnqlnp1, 'lnqlnp2': lnqlnp2, 'lnqlnp3': lnqlnp3,
    'lnp1': lnp1, 'lnp2': lnp2, 'lnp3': lnp3,
    'lnp1_2': lnp1_2, 'lnp2_2': lnp2_2, 'lnp3_2': lnp3_2,
    'lnp1lnp2': lnp1lnp2, 'lnp1lnp3': lnp1lnp3, 'lnp2lnp3': lnp2lnp3,
})
intercept_input = True
X_input

Unnamed: 0,lnq,lnq_2,lnqlnp1,lnqlnp2,lnqlnp3,lnp1,lnp2,lnp3,lnp1_2,lnp2_2,lnp3_2,lnp1lnp2,lnp1lnp3,lnp2lnp3
0,2.079442,4.324077,18.371538,8.678634,6.010359,8.834842,4.173541,2.890372,78.054437,17.418442,8.354249,36.872574,25.535978,12.063084
1,6.767343,45.796933,61.127805,28.577410,20.624885,9.032763,4.222840,3.047708,81.590803,17.832381,9.288523,38.143915,27.529222,12.869984
2,7.252762,52.602563,65.146469,26.878966,27.026810,8.982297,3.706032,3.726416,80.681665,13.734670,13.886177,33.288677,33.471777,13.810215
3,4.174387,17.425509,37.994654,15.526556,13.989505,9.101852,3.719481,3.351272,82.843703,13.834542,11.231021,33.854168,30.502777,12.464992
4,5.686975,32.341689,51.263140,24.316554,20.863674,9.014131,4.275832,3.668677,81.254554,18.282743,13.459189,38.542913,33.069932,15.686647
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
153,6.852877,46.961918,63.543838,25.870001,27.006249,9.272579,3.775057,3.940863,85.980717,14.251056,15.530402,35.004515,36.541963,14.876983
154,5.932245,35.191533,52.877557,25.542382,20.820004,8.913583,4.305685,3.509633,79.451954,18.538927,12.317525,38.379082,31.283405,15.111376
155,5.968708,35.625470,51.749248,26.025919,22.671977,8.670093,4.360394,3.798473,75.170509,19.013039,14.428401,37.805024,32.933118,16.562842
156,8.578665,73.593485,78.650266,37.375623,32.031471,9.168125,4.356811,3.733853,84.054510,18.981805,13.941657,39.943790,34.232428,16.267692


In [10]:
# Applying beta_OLS function 

beta_ols( X_input, y_input, intercept_input )

Param.,Coef.
intercept,-76.259258
lnq,-1.080425
lnq_2,0.026489
lnqlnp1,0.131041
lnqlnp2,0.040144
lnqlnp3,0.058652
lnp1,14.718292
lnp2,6.380797
lnp3,-0.894733
lnp1_2,-0.769264


In [11]:
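# Comparing results with the statsmodels OLS function.
# The formula variables (lnct, lnq, ...) are not columns of `greene`; patsy
# resolves them from the arrays defined above in the calling namespace.

results_r = smf.ols( 'lnct ~ lnq + lnq_2 + lnqlnp1 + lnqlnp2 + lnqlnp3 + lnp1 + lnp2 + lnp3 + lnp1_2 + lnp2_2 + lnp3_2 + lnp1lnp2 + lnp1lnp3 + lnp2lnp3', 
        data = greene)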
X_r = pd.DataFrame(results_r.exog, columns = results_r.exog_names).iloc[ : , 1: ]
y_r = pd.Series(results_r.endog , name = results_r.endog_names )
intercept_r = True
beta_OLS_output_r = results_r.fit().summary2().tables[1].iloc[ :, [0]]
beta_OLS_output_r

Unnamed: 0,Coef.
Intercept,-76.259261
lnq,-1.080425
lnq_2,0.026489
lnqlnp1,0.131041
lnqlnp2,0.040144
lnqlnp3,0.058652
lnp1,14.718293
lnp2,6.380797
lnp3,-0.894733
lnp1_2,-0.769264
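
The two tables above agree except for small differences in the last printed digits of some coefficients (for example the intercept). Rather than comparing by eye, the check can be automated with a tolerance. The sketch below is an illustration on synthetic data (seed, sizes, and coefficients are assumptions) and uses `np.linalg.lstsq` as the reference solver in place of statsmodels.

```python
import numpy as np

# Synthetic, well-conditioned design: all values are illustrative assumptions.
rng = np.random.default_rng(1)
n, k = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(scale=0.2, size=n)

# Estimate 1: explicit-inverse form of the OLS formula
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Estimate 2: least-squares solver used as an independent reference
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Raises AssertionError if the two estimates differ beyond the tolerance
np.testing.assert_allclose(beta_normal, beta_lstsq, rtol=1e-6, atol=1e-8)

max_abs_diff = np.max(np.abs(beta_normal - beta_lstsq))
print(f"max abs difference: {max_abs_diff:.2e}")
```

With nearly collinear regressors, such as the squared and cross-product log terms in the model above, the explicit inverse can lose precision, so a looser tolerance may be needed there.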


2. Generate a new function named **var_OLS**. This function must return the estimated variance for OLS. Its inputs should be `X`, `y`, and `intercept`: `X` (covariates) must be a **pd.DataFrame**, `y` (endog) must be a **pd.Series**, and `intercept` can be `True` or `False` (default `True`). When `intercept` is `False`, the `X` regressor matrix used for the estimated variance does not include the **intercept**. You must also specify the types of the function's parameters and output, and the function must raise an error if the inputs do not meet the requirements. The output of the function should look like `var_ols`, where the columns and the index use the names of the `X` columns. Apply your function to find $\mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}^{(OLS)})$ for the equation below. Use the greene data. **Hint: Use NumPy, a `def` function, and the `columns` attribute, [link](https://notebooks.githubusercontent.com/view/ipynb?browser=chrome&color_mode=auto&commit=69c80e1f2c1c268f0480a32932262201785a576c&device=unknown_device&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f616c6578616e6465727175697370652f4469706c6f6d61646f5f505543502f363963383065316632633163323638663034383061333239333232363232303137383561353736632f4c6563747572655f352f4c6563747572655f352e6970796e62&logged_in=true&nwo=alexanderquispe%2FDiplomado_PUCP&path=Lecture_5%2FLecture_5.ipynb&platform=windows&repository_id=427747212&repository_type=Repository&version=96##5.1.7.).**


$$
\begin{aligned}
\mathbb{V}{\rm ar} (\widehat{\boldsymbol{\beta}}^{(OLS)}) = 
\sigma^2 \left( \mathbf{X}^\top  \mathbf{X}\right)^{-1}
\end{aligned}
$$

$$
\begin{aligned} 
lnCT &= \beta_{0}+\beta_{q}lnq+ \beta_{qq}(lnq)^2+\beta_{q1}lnqlnp_1+\beta_{q2}lnqlnp_2+ \beta_{q3}lnqlnp_{3} +\beta_{1}lnp_1+\beta_{2}lnp_2+ \beta_{3}lnp_3 \\
& + \beta_{11}(lnp_{1})^2+ \beta_{22}(lnp_{2})^2+ \beta_{33}(lnp_{3})^2 + \beta_{12}lnp_{1}lnp_{2}+ \beta_{13}lnp_{1}lnp_{3}+\beta_{23}lnp_{2}lnp_{3} 
\end{aligned}
$$
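
In the variance formula above, $\sigma^2$ is not observed, so in practice it is replaced by the estimate $s^2 = \mathbf{e}^\top \mathbf{e} / (n - k)$, where $\mathbf{e}$ are the OLS residuals, $n$ is the number of observations, and $k$ is the number of estimated parameters. A minimal sketch on synthetic data (seed, sizes, and coefficients are assumptions for illustration):

```python
import numpy as np

# Synthetic data: all values below are illustrative assumptions.
rng = np.random.default_rng(2)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.5]) + rng.normal(scale=0.3, size=n)

k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

resid = y - X @ beta_hat              # OLS residuals e
s2 = resid @ resid / (n - k)          # estimate of sigma^2 with n - k degrees of freedom
var_beta = s2 * XtX_inv               # estimated variance-covariance matrix of beta_hat
std_err = np.sqrt(np.diag(var_beta))  # standard errors are the square roots of the diagonal

print(var_beta)
print(std_err)
```

The diagonal of `var_beta` holds the coefficient variances and the off-diagonal entries the covariances, so the matrix is symmetric.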

In [12]:
# Defining the var_ols function (var_OLS in the task statement)

def var_ols(X: pd.DataFrame, y: pd.Series, intercept: bool = True) -> pd.DataFrame:

    if not isinstance(X, pd.DataFrame):
        raise TypeError("X is not a pd.DataFrame type.")

    if not isinstance(y, pd.Series):
        raise TypeError("y is not a pd.Series type.")

    if not isinstance(intercept, bool):
        raise TypeError("intercept is not a boolean type.")

    X_new = X.copy()
    if intercept:
        X_new.insert(0, 'intercept', 1)       # same design matrix that beta_ols builds

    names = X_new.columns.tolist()
    beta_est = beta_ols(X, y, intercept)      # estimated parameters, shape (k, 1)

    # Residuals: observed minus predicted values
    X_mat = X_new.to_numpy(dtype=float)
    resid = y.to_numpy(dtype=float) - X_mat @ beta_est.to_numpy().ravel()

    # Variance of the regression with n - k degrees of freedom
    n_data, k_param = X_mat.shape
    sr2 = (resid @ resid) / (n_data - k_param)

    # Variance-covariance matrix: sr2 * (X'X)^{-1}, labelled with the regressor names
    var_beta = sr2 * np.linalg.inv(X_mat.T @ X_mat)
    results = pd.DataFrame(var_beta, index=names, columns=names)

    return results