# Run different specifications on the regression model

In [1]:
library(tidyverse)
library(haven)
library(stargazer)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

“package ‘ggplot2’ was built under R version 4.1.3”
“package ‘tidyr’ was built under R version 4.1.2”
“package ‘readr’ was built under R version 4.1.2”
“package ‘dplyr’ was built under R version 4.1.3”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

“package ‘haven’ was built under R version 4.1.3”
“package ‘stargazer’ was built under R version 4.1.2”

Please cite as: 


 Hlavac, Marek (2022). stargaz

## Data Cleaning

In [2]:
dat <- read_dta("data/CCHS_Annual_2017_2018_curated_trimmed_25%.dta") |> 
    select(GEN_010, SPS_040, dhhgage, DHH_SEX, dhhdglvg) |>
    na.omit()

In [3]:
dat_cleaned <- dat |>
    rename(satisfaction = GEN_010, emo_bond = SPS_040, age = dhhgage, sex = DHH_SEX, family = dhhdglvg) |>
    filter(satisfaction < 11 & emo_bond <= 4 & age <= 16 & sex <= 2 & family <= 8) |> # keep only valid response codes
    mutate(sex = as_factor(sex),
           emo_bond = as_factor(emo_bond),
           family = as_factor(family),
           age = as_factor(age))

In [4]:
# Recode each age bracket to its midpoint so that age enters the model as a numeric variable.
# The open-ended top bracket is coded at its lower bound (80).
dat_cleaned <- dat_cleaned |>
    mutate(age = case_when(
        age == "Age between 12 and 14" ~ 13,
        age == "Age between 15 and 17" ~ 16,
        age == "Age between 18 and 19" ~ 18.5,
        age == "Age between 20 and 24" ~ 22,
        age == "Age between 25 and 29" ~ 27,
        age == "Age between 30 and 34" ~ 32,
        age == "Age between 35 and 39" ~ 37,
        age == "Age between 40 and 44" ~ 42,
        age == "Age between 45 and 49" ~ 47,
        age == "Age between 50 and 54" ~ 52,
        age == "Age between 55 and 59" ~ 57,
        age == "Age between 60 and 64" ~ 62,
        age == "Age between 65 and 69" ~ 67,
        age == "Age between 70 and 74" ~ 72,
        age == "Age between 75 and 79" ~ 77,
        age == "Age 80 and older" ~ 80
    ))
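
Because `case_when()` returns `NA` for any label it does not match, a misspelled bracket label would silently create missing ages. One way to guard against this is a named lookup vector followed by an explicit check. The sketch below uses base R only and a three-element toy factor (hypothetical values, not the CCHS sample); the names `midpoints`, `labels`, and `age_num` are illustrative, and the labels are assumed to match those in the recode above.

```r
# Lookup table: bracket label -> numeric midpoint (same values as the recode above)
midpoints <- c(
    "Age between 12 and 14" = 13,
    "Age between 15 and 17" = 16,
    "Age between 18 and 19" = 18.5,
    "Age between 20 and 24" = 22,
    "Age between 25 and 29" = 27,
    "Age between 30 and 34" = 32,
    "Age between 35 and 39" = 37,
    "Age between 40 and 44" = 42,
    "Age between 45 and 49" = 47,
    "Age between 50 and 54" = 52,
    "Age between 55 and 59" = 57,
    "Age between 60 and 64" = 62,
    "Age between 65 and 69" = 67,
    "Age between 70 and 74" = 72,
    "Age between 75 and 79" = 77,
    "Age 80 and older" = 80
)

# Toy factor standing in for dat_cleaned$age (hypothetical values)
labels <- factor(c("Age between 12 and 14", "Age between 18 and 19", "Age 80 and older"))

# Index the lookup by label; any unmatched label becomes NA
age_num <- unname(midpoints[as.character(labels)])

# Stop early if a label failed to match
stopifnot(!anyNA(age_num))
```

The same `stopifnot(!anyNA(...))` check could be run on `dat_cleaned$age` directly after the recode.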

## Model
We estimate the following linear regression model:

$$
Y_i = \beta_0 + \sum_{b=1}^3 \beta_{1, b} E^b_{i} + \beta_2 A_i + \sum_{b=1}^3 \sigma_{b} (E^b_{i} \times A_i) + \alpha X_i + \epsilon_i
$$

Let $i$ index observations.
- $Y_i$ is satisfaction with life in general.
- $E_i$ is the degree of agreement with the statement that the respondent has a strong emotional bond with at least one person. In the sum $\sum_{b=1}^3 \beta_{1, b} E^b_{i}$, $E^b_{i}$ is an indicator variable equal to one if $E_i$ falls in level $b$ (e.g., “agree”) and zero otherwise; the remaining level serves as the reference category.
- $A_i$ is age.
- $E^b_{i} \times A_i$ is the interaction between emotional-bond category $b$ and age. We include this term because we hypothesize that the effect of emotional bond on life satisfaction may vary with age, as suggested by a previous study (Vandeleur et al., 2009).
- $X_i$ represents the control variables. As mentioned above, these are sex and living/family arrangement, each added in a separate specification.
- $\epsilon_i$ is the error term.
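
To see how the terms in the equation become regressors in R, it can help to inspect the design matrix for the interaction formula. The sketch below uses base R and a four-row toy data frame. The level labels, their ordering (with "Strongly agree" as the reference level), and the age values are assumptions for illustration, not values taken from the CCHS file.

```r
# Toy data: one row per emotional-bond level (hypothetical values)
bond_levels <- c("Strongly agree", "Agree", "Disagree", "Strongly disagree")
toy <- data.frame(
    emo_bond = factor(bond_levels, levels = bond_levels),
    age = c(22, 37, 52, 67)
)

# Same right-hand side as the full model without controls
X <- model.matrix(~ emo_bond + age + emo_bond:age, data = toy)

# One intercept, three level indicators, age, and three indicator-by-age products
colnames(X)
```

The three `emo_bond...` columns correspond to $E^b_i$, the `age` column to $A_i$, and the three `emo_bond...:age` columns to $E^b_i \times A_i$, so the age slope for level $b$ is $\beta_2 + \sigma_b$ and the slope for the reference level is $\beta_2$.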

### Specifications

I estimate four specifications.

Following the proposed model, the first specification includes only the terms of primary interest: emotional bond, age, and their interaction.

In the first specification, the coefficients on the interaction terms are not statistically significant, so the second specification is the same as the first but omits the interaction.

The last two specifications add control variables: the third adds sex, and the fourth adds family arrangement in place of sex.

#### Specification 1 - Without controls, with interaction

In [5]:
reg1 <- lm(satisfaction ~ emo_bond + age + emo_bond:age, data = dat_cleaned)

#### Specification 2 - Without controls, without interaction

In [6]:
reg2 <- lm(satisfaction ~ emo_bond + age, data = dat_cleaned)
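
Individual t-tests on the three interaction coefficients do not test the interaction as a whole. A common complement is a joint F-test that compares the two nested models. The sketch below shows the idea on simulated data with base R only; the data-generating values, the level labels, and the names `toy`, `fit_main`, `fit_inter`, and `joint_test` are all hypothetical. With the fitted models from this notebook, the analogous call would be `anova(reg2, reg1)`.

```r
set.seed(1)
n <- 400
bond_levels <- c("Strongly agree", "Agree", "Disagree", "Strongly disagree")

# Simulated stand-in for dat_cleaned (hypothetical values, no true interaction)
toy <- data.frame(
    emo_bond = factor(sample(bond_levels, n, replace = TRUE), levels = bond_levels),
    age = sample(c(13, 22, 37, 52, 67, 80), n, replace = TRUE)
)
toy$satisfaction <- 8 - 0.6 * as.integer(toy$emo_bond) + 0.01 * toy$age + rnorm(n)

# Restricted model (no interaction) and full model (with interaction)
fit_main <- lm(satisfaction ~ emo_bond + age, data = toy)
fit_inter <- lm(satisfaction ~ emo_bond + age + emo_bond:age, data = toy)

# F-test of the null that all three interaction coefficients are zero
joint_test <- anova(fit_main, fit_inter)
print(joint_test)
```

A large p-value in the second row would be consistent with dropping the interaction; a small one would suggest keeping it even if no single coefficient is individually significant.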

#### Specification 3 - Controlling for sex, without interaction

In [7]:
reg3 <- lm(satisfaction ~ emo_bond + age + sex, data = dat_cleaned)

#### Specification 4 - Controlling for family arrangement, without interaction

In [8]:
reg4 <- lm(satisfaction ~ emo_bond + age + family, data = dat_cleaned)

## Output Summary - Regression Table

In [9]:
stargazer(reg1, reg2, reg3, type = "text")


                                                          Dependent variable:                             
                              ----------------------------------------------------------------------------
                                                              satisfaction                                
                                        (1)                       (2)                       (3)           
----------------------------------------------------------------------------------------------------------
emo_bondAgree                        -0.503***                 -0.620***                 -0.620***        
                                      (0.102)                   (0.038)                   (0.038)         
                                                                                                          
emo_bondDisagree                     -1.373***                 -1.702***                 -1.702***        
                                    

In [10]:
stargazer(reg4, type = "text") # not enough room in the first table, so print a separate one


                                                                      Dependent variable:    
                                                                  ---------------------------
                                                                         satisfaction        
---------------------------------------------------------------------------------------------
emo_bondAgree                                                              -0.584***         
                                                                            (0.038)          
                                                                                             
emo_bondDisagree                                                           -1.511***         
                                                                            (0.102)          
                                                                                             
emo_bondStrongly disagree                                  