# Workbook 8
This week we will be learning a bit of structural equation modeling (aka SEM). This workbook borrows heavily from A Gentle Introduction to Stata by Alan Acock, Discovering Structural Equation Modeling Using Stata by Alan Acock, and Quantitative Data Analysis by Donald Treiman.

![image.png](attachment:image.png)

## Factor Analysis
As social scientists, we are usually trying to understand a concept. That concept is usually an abstraction, for example, alienation, socio-economic status, or racial discrimination. Yes, we have a definition for it, but assessing the concept means translating the abstract concept into a concrete measurement. This process is sometimes referred to as operationalization. There is a whole statistical field that focuses on estimating latent variables (abstract concepts) based on measured variables (manifest variables). This field includes both factor analysis and structural equation modeling. We will get a taste of it today.

We will cover:
* Principal component factor analysis (pcf)
* Confirmatory factor analysis (cfa)

Both are very similar, but they make slightly different assumptions about variances. The difference matters because factor analysis estimates the latent variable by examining the <i>unique factors</i> of each measured variable.

The unique factor can be partitioned into a specific factor and an error of measurement. The specific factor is specific to the measured variable. The error of measurement is expected to be random.

Every measured variable has the following:

observed variance = common variance + unique variance

where unique variance = specific variance + error variance

In factor analysis, we want a larger communality:

communality = common variance / observed variance

and

reliability = (common variance + specific variance) / observed variance
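To make the decomposition concrete, here is a minimal sketch in Stata using made-up variance components. The numbers 6, 3, and 1 are hypothetical and do not come from any dataset in this workbook; the scalar names are also just for illustration.

```stata
* Hypothetical variance components for one measured variable (illustrative only)
scalar v_common   = 6
scalar v_specific = 3
scalar v_error    = 1

* unique variance = specific variance + error variance
scalar v_unique   = v_specific + v_error
* observed variance = common variance + unique variance
scalar v_observed = v_common + v_unique

* communality and reliability as defined above
scalar communality = v_common / v_observed
scalar reliability = (v_common + v_specific) / v_observed

display "communality = " communality
display "reliability = " reliability
```

With these made-up numbers the communality is .6 and the reliability is .9. Reliability is never smaller than communality because it also counts the specific variance.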

## Principal component factor analysis (PCF)
PCF is one way to measure a latent factor. Remember, we do not have actual data for the latent factor. We are estimating the latent factor by using a set of measured variables. With PCF we assume there is no unique variance, and because of this it is sometimes not even considered factor analysis.

Let's go through an example of estimating a latent variable with PCF in Stata.

In [2]:
*Set up your working directory to "week 8" folder
cd "C:\Users\acade\Documents\teaching\SOC 211 spring 2023\week8"
*Open the NLSY97 data
use "http://www.stata-press.com/data/dsemusr/nlsy97cfa.dta", clear


C:\Users\acade\Documents\teaching\SOC 211 spring 2023\week8



These data come from the National Longitudinal Survey of Youth 1997 (NLSY97). The survey follows participants over time.

The survey asks a set of questions on attitudes about government responsibility for decent housing, college aid, reducing income differences, health care, jobs, etc.

In [3]:
*There are ten measures; let's take a look at them
codebook x1-x10


--------------------------------------------------------------------------------
x1                                       GOVT RESPONSIBILITY - PROVIDE JOBS 2006
--------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: vlS8646900

                 Range: [1,4]                         Units: 1
         Unique values: 4                         Missing .: 7,152/8,985

            Tabulation: Freq.   Numeric  Label
                          454         1  Definitely should be
                          617         2  Probably should be
                          462         3  Probably should not be
                          300         4  Definitely should not be
                        7,152         .  

--------------------------------------------------------------------------------
x2                                   GOVT RESPNSBLTY - KEEP PRICES UND CTRL 2006
--------------------------------------

<b>Let's construct a conservatism latent variable from x1-x10 using PCF estimation</b>

In [4]:
factor x1-x10, pcf

(obs=1,617)

Factor analysis/correlation                      Number of obs    =      1,617
    Method: principal-component factors          Retained factors =          2
    Rotation: (unrotated)                        Number of params =         19

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.91523      2.90094            0.3915       0.3915
        Factor2  |      1.01429      0.13285            0.1014       0.4930
        Factor3  |      0.88144      0.11496            0.0881       0.5811
        Factor4  |      0.76648      0.02404            0.0766       0.6577
        Factor5  |      0.74243      0.04889            0.0742       0.7320
        Factor6  |      0.69354      0.08649            0.0694       0.8013
        Factor7  |      0.60705      0.06820            0.0

<b>How to read this table</b>

Factor1-Factor10 are the different estimated latent variables. Each eigenvalue is the amount of total variance in the measured variables that the factor accounts for. Generally, you want to choose a factor/latent variable with an eigenvalue greater than 1. Moreover, you want to check the factor loadings. Researchers advise that measured variables should have a factor loading greater than .3. The eigenvalue of a factor is the sum of its squared factor loadings:

In [5]:
di .6064^2+.5810^2+0.7221^2+0.7174^2+0.5780^2+0.6091^2+0.6050^2+0.5994^2+0.7330^2+0.4543^2

3.9154428
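The same sum of squares can be written as a matrix product, which is a little easier to read than a long `di` line. This is a minimal sketch that retypes the ten Factor 1 loadings from the calculation above into a column vector; the matrix names `L` and `ev` are arbitrary.

```stata
* Factor 1 loadings, copied from the di command above
matrix L = (.6064 \ .5810 \ .7221 \ .7174 \ .5780 \ .6091 \ .6050 \ .5994 \ .7330 \ .4543)

* Sum of squared loadings = L'L (a 1 x 1 matrix)
matrix ev = L' * L

* Should agree with the eigenvalue of Factor1 up to rounding
display ev[1,1]
```

After running `factor`, the loadings should also be available in the stored result `e(L)`, so you would not need to retype them in practice.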


We are going to drop x10 because the environmental question seems a bit out of place.

In [6]:
factor x1-x9, pcf

(obs=1,625)

Factor analysis/correlation                      Number of obs    =      1,625
    Method: principal-component factors          Retained factors =          1
    Rotation: (unrotated)                        Number of params =          9

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.76124      2.80650            0.4179       0.4179
        Factor2  |      0.95473      0.10627            0.1061       0.5240
        Factor3  |      0.84847      0.10176            0.0943       0.6183
        Factor4  |      0.74671      0.05561            0.0830       0.7012
        Factor5  |      0.69110      0.07429            0.0768       0.7780
        Factor6  |      0.61681      0.07780            0.0685       0.8466
        Factor7  |      0.53900      0.09177            0.0

These results look better. There is a clear latent variable (only one factor has an eigenvalue > 1).

### Alpha reliability
Once you have a latent factor estimated, you want to check its reliability. Stata can estimate this for you.

In [7]:
alpha x1-x9, std item label asis


Test scale = mean(standardized items)

Items        | S  it-cor  ir-cor   ii-cor   alpha   Label
-------------+------------------------------------------------------------------
x1           | +   0.626   0.499    0.338   0.804   GOVT RESPONSIBILITY -
             |                                        PROVIDE JOBS 2006
x2           | +   0.596   0.462    0.345   0.808   GOVT RESPNSBLTY - KEEP
             |                                        PRICES UND CTRL 2006
x3           | +   0.703   0.592    0.323   0.792   GOVT RESPNSBLTY - HLTH CARE
             |                                        FOR SICK 2006
x4           | +   0.699   0.588    0.323   0.793   GOVT RESPNSBLTY -PROV ELD
             |                                        LIV STAND 2006
x5           | +   0.584   0.446    0.347   0.809   GOVT RESPNSBLTY -PROV IND
             |                                        HELP 2006
x6           | +   0.620   0.491    0.340   0.804   GOVT RESPNSBLTY -PROV UNEMP
        

This reports an alpha score of .81. A rule of thumb is that alpha should be >= .70.

### Generating the factor score variable
Use predict to generate the factor score. This creates a new variable that weights each measured variable according to its factor loading. For example, x4 has a loading of .71 and x5 has a loading of .58, so x4 gets a larger weight in the factor score than x5.

In [8]:
quietly factor x1-x9, pcf
predict conservf



(option regression assumed; regression scoring)

Scoring coefficients (method = regression)

    ------------------------
        Variable |  Factor1 
    -------------+----------
              x1 |  0.16598 
              x2 |  0.15641 
              x3 |  0.19200 
              x4 |  0.18958 
              x5 |  0.15468 
              x6 |  0.16475 
              x7 |  0.16179 
              x8 |  0.15866 
              x9 |  0.19654 
    ------------------------
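The scoring coefficients are tied to the loadings. As a rough check, and assuming that with a single PCF factor and regression scoring each scoring coefficient equals the loading divided by the eigenvalue, we can recover the loadings for x4 and x5 from the output above. The scalar names are only for illustration.

```stata
* Eigenvalue of Factor1 from the x1-x9 model above
scalar ev1 = 3.76124

* Scoring coefficients for x4 and x5 from the predict output above
scalar sc_x4 = .18958
scalar sc_x5 = .15468

* Assumption: loading = scoring coefficient * eigenvalue (single PCF factor)
display "implied loading x4 = " %5.3f sc_x4*ev1
display "implied loading x5 = " %5.3f sc_x5*ev1
```

The implied loadings are about .71 and .58, which agrees with the loadings quoted for x4 and x5.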



In [9]:
codebook conservf


--------------------------------------------------------------------------------
conservf                                                     Scores for factor 1
--------------------------------------------------------------------------------

                  Type: Numeric (float)

                 Range: [-1.3865741,4.6149096]        Units: 1.000e-11
         Unique values: 1,022                     Missing .: 7,360/8,985

                  Mean:  3.7e-10
             Std. dev.:        1

           Percentiles:      10%       25%       50%       75%       90%
                        -1.19724  -.769619  -.160152    .61392   1.30616


In [10]:
histogram conservf, norm freq name(B, replace) ///
    xtitle(Factor Score on Conservatism) ylabel(0(25)175)
graph export "conservf_histogram.png", replace width(3400)


(bin=32, start=-1.3865741, width=.18754637)

file C:/Users/acade/.stata_kernel_cache/graph0.svg saved as SVG format
file C:/Users/acade/.stata_kernel_cache/graph0.pdf saved as PDF format

(file conservf_histogram.png not found)
file conservf_histogram.png saved as PNG format


![conservf_histogram.png](attachment:conservf_histogram.png)

## Confirmatory Factor Analysis (CFA)
CFA allows each measured variable to have its own unique variance (or error variance). Unlike PCF, CFA is a true factor analysis. Since factor analysis is part of SEM, we can use the sem command in Stata.

In [11]:
*type sembuilder to open the sem builder window
sembuilder

Click "Add Measurement Component" and then click the blank canvas.
![image.png](attachment:image.png)

Type "Conservative" in the latent variable.

Type "x1-x9" for measured variables. Click OK

![image.png](attachment:image.png)

Click "Adjust canvas size", change the width to 7, and center the path model.
![image-2.png](attachment:image-2.png)

<i>Notice that every measured variable has its own error</i>

### Estimating a CFA model in Stata

In [12]:
sem (Conservative -> x1-x9), standardized

(7360 observations with missing values excluded)

Endogenous variables
  Measurement: x1 x2 x3 x4 x5 x6 x7 x8 x9

Exogenous variables
  Latent: Conservative

Fitting target model:
Iteration 0:   log likelihood = -15604.985  
Iteration 1:   log likelihood = -15594.134  
Iteration 2:   log likelihood =  -15593.73  
Iteration 3:   log likelihood = -15593.729  

Structural equation model                                Number of obs = 1,625
Estimation method: ml

Log likelihood = -15593.729

 ( 1)  [x1]Conservative = 1
-------------------------------------------------------------------------------
              |                 OIM
 Standardized | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
Measurement   |
  x1          |
  Conservat~e |    .549795   .0200518    27.42   0.000     .5104942    .5890958
        _cons |   2.279751   .0470588    48.44   0.000     2.187518    2.371985
  ----------

You can get these results in sembuilder in Stata:

Click "estimate"
![image-2.png](attachment:image-2.png)

Click on the "reporting" tab and check "Display standardized coefficients and values" and click Okay.
![image.png](attachment:image.png)

![image.png](attachment:image.png)

# Practice

* Open the NLSY97 data: http://www.stata-press.com/data/dsemusr/nlsy97cfa.dta
* Use codebook command to examine variables: x11-x13. Do all the variables follow the same direction? We want to make a "depression" latent variable.
* Use this command to reverse code it: gen x12_reversed=5-x12 
* Use the PCF approach to estimate a Depression latent variable using measured variables x11 x13 x12_reversed.
* Calculate the alpha reliability. Does the test statistic pass? Construct the factor score for Depression.


<i>answers are listed at the end of the workbook</i>

# SEM
Structural equation modeling (SEM) is another field in advanced statistics. SEM builds path models and uses a series of equations to estimate them. SEM has powerful estimation techniques that can account for missing data, so it is very popular with psychologists and survey researchers.

You can use SEM to estimate linear regression models.

In [13]:
use "http://www.stata-press.com/data/agis6/flourishing_bmi.dta", clear

In [14]:
regress bmi age children incomeln educ quickfood


      Source |       SS           df       MS      Number of obs   =       448
-------------+----------------------------------   F(5, 442)       =     16.92
       Model |  2921.16092         5  584.232183   Prob > F        =    0.0000
    Residual |  15264.4284       442  34.5349059   R-squared       =    0.1606
-------------+----------------------------------   Adj R-squared   =    0.1511
       Total |  18185.5893       447   40.683645   Root MSE        =    5.8766

------------------------------------------------------------------------------
         bmi | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0849882   .0471673     1.80   0.072    -.0077118    .1776881
    children |   .6167338   .2881417     2.14   0.033     .0504358    1.183032
    incomeln |  -1.750445   .4360436    -4.01   0.000    -2.607422   -.8934689
        educ |  -.6900809   .2208598    -3.12   0.

This uses OLS estimation. However, with SEM we can use different estimation methods, such as maximum likelihood.

In [15]:
sem bmi <- age children incomeln educ quickfood

(32 observations with missing values excluded)

Endogenous variables
  Observed: bmi

Exogenous variables
  Observed: age children incomeln educ quickfood

Fitting target model:
Iteration 0:   log likelihood = -5442.1579  
Iteration 1:   log likelihood = -5442.1579  

Structural equation model                                  Number of obs = 448
Estimation method: ml

Log likelihood = -5442.1579

------------------------------------------------------------------------------
             |                 OIM
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Structural   |
  bmi        |
         age |   .0849882   .0468503     1.81   0.070    -.0068368    .1768131
    children |   .6167338   .2862057     2.15   0.031      .055781    1.177687
    incomeln |  -1.750445   .4331138    -4.04   0.000    -2.599333   -.9015577
        educ |  -.6900809   .2193759    -3.15   0.002     -1.
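Notice that the coefficients match the regress output, but the standard errors are slightly smaller. A likely reason is that maximum likelihood divides the residual sum of squares by N, while OLS divides by the residual degrees of freedom. Here is a small check for the age coefficient, using numbers copied from the regress output above; the scalar names are arbitrary.

```stata
* OLS standard error for age, sample size, and residual df from regress
scalar se_ols = .0471673
scalar n_obs  = 448
scalar df_res = 442

* Rescale the OLS standard error to the ML version
scalar se_ml = se_ols * sqrt(df_res / n_obs)
display %9.7f se_ml
```

The result is about .04685, which matches the standard error sem reports for age up to rounding.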

Let's make this model in the sembuilder in Stata. Click "Add Regression Component".
![image.png](attachment:image.png)

Then, type "bmi" for dependent variable and "age children incomeln educ quickfood" in independent variables.
![image.png](attachment:image.png)

Click "Estimate" and click "Okay"
![image.png](attachment:image.png)

You can save the image by pressing the "save" button.

In [None]:
*You can save with command
graph export "sem_bmi_regress.png", as(png) name("sem_bmi_regress")

![sem_bmi_regress.png](attachment:sem_bmi_regress.png)

## Types of variables in SEM
In SEM, there are three types of variables:
* exogenous predictor -- these variables are not causally dependent on other variables in the model. They are external to the path model. They are independent variables, but not all independent variables are exogenous variables.
* endogenous outcome -- these are the variables the model is trying to explain. This is the dependent variable.
* endogenous mediator -- a variable that is an independent variable for some variables and a dependent variable for other variables.

### Example of path model
![SEM_path_example.png](attachment:SEM_path_example.png)

Generally, in SEM:
* circles indicate latent variables (their names have the first letter capitalized)
* rectangles indicate measured variables
* straight arrows (for example, x1 -> x4) indicate a path between variables
* curved arrows indicate correlation (this can happen between variables or errors)

In the example above:
* exogenous predictor -- x1, x2
* endogenous outcome -- x6
* endogenous mediator -- x3, x4, x5

Other notes:
* x1 and x2 are correlated

### Direct and indirect effects
x1 has a <b>direct effect</b> on x6 (estimated by beta_61)

x1 has <b>indirect effects</b> on x6 through the endogenous mediators x3, x4, and x5. If we have the betas estimated, we can calculate the indirect effects. For example, to calculate the indirect effect of x1 -> x6 mediated by x4, we multiply beta_41 (x1 -> x4) and beta_64 (x4 -> x6): (beta_41)*(beta_64)
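Here is a minimal sketch of that multiplication in Stata. The beta values are hypothetical (they are not estimates from any model in this workbook), and the scalar names are just for illustration.

```stata
* Hypothetical standardized path coefficients (illustrative only)
* beta_61: direct path x1 -> x6
scalar beta_61 = .30
* beta_41: path x1 -> x4
scalar beta_41 = .50
* beta_64: path x4 -> x6
scalar beta_64 = .40

* Indirect effect of x1 on x6 through x4
scalar indirect_x4 = beta_41 * beta_64

display "direct effect          = " beta_61
display "indirect effect via x4 = " indirect_x4
```

With these made-up values the indirect effect through x4 is .2. The other mediators would each contribute their own product in the same way.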

## Assessing mediation with SEM
Remember mediating variables? x -> z -> y

We can assess mediators with SEM.

For example, we have the following path model.

![sem_bmi_educ_incomeln.png](attachment:sem_bmi_educ_incomeln.png)

Suppose we suspect that mothers with higher income and higher educational attainment are less likely to eat fast food, because mothers with higher SES can afford healthy food options.
![sem_bmi_quickfood%20mediator%20w%20letters.png](attachment:sem_bmi_quickfood%20mediator%20w%20letters.png)

The path model evaluates whether quickfood mediates the influence educ and incomeln have on BMI.

For us to suspect mediation, usually paths a and b should be significant.
* quickfood fully mediates the influence educ has on bmi if a is significant AND a' is not significant AND c*e is significant (full mediation)
* quickfood partially mediates the influence educ has on bmi if a is significant AND a' is significant (but smaller than a) AND c*e is significant (partial mediation)

#### An example of endogenous mediator in Stata using SEM

We will be using the data from the Flourishing Families Study. The unit of analysis is women with children approaching adolescence. The variables include: number of children in the household, mother's age, mother's education, family income, mother's BMI, and how often the family eats fast food.

![sem_bmi_educ_incomeln.png](attachment:sem_bmi_educ_incomeln.png)

In [16]:
*read in data
use "http://www.stata-press.com/data/agis6/flourishing_bmi.dta", clear
desc




Contains data from http://www.stata-press.com/data/agis6/flourishing_bmi.dta
 Observations:           480                  
    Variables:             8                  9 Feb 2016 12:11
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
numberchildren  byte    %8.0g                 Number of children in family
quickfood       float   %9.0g                 
age             byte    %8.0g                 Parent 1 Age
educ            byte    %8.0g      P1SCHLEV   Parent 1 Highest Education Level
                                                Completed
incomeln        float   %9.0g                 
bmi             double  %10.0g                P1 BMI Value
children        byte    %8.0g                 Number of children in family
bmi2            float   %9.0g                 


In [17]:
*Let's estimate the path model above
sem (bmi <- educ incomeln), ///
    cov(educ*incomeln) standardized

(29 observations with missing values excluded)

Endogenous variables
  Observed: bmi

Exogenous variables
  Observed: educ incomeln

Fitting target model:
Iteration 0:   log likelihood = -2725.2995  
Iteration 1:   log likelihood = -2725.2995  

Structural equation model                                  Number of obs = 451
Estimation method: ml

Log likelihood = -2725.2995

-------------------------------------------------------------------------------
              |                 OIM
 Standardized | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
Structural    |
  bmi         |
         educ |  -.1784099    .049314    -3.62   0.000    -.2750635   -.0817562
     incomeln |  -.2345947   .0488561    -4.80   0.000    -.3303509   -.1388385
        _cons |   7.330909   .4749491    15.44   0.000     6.400026    8.261792
--------------+------------------------------------------------------------

You could have also done this in the sembuilder in Stata:

Type sembuilder in the command window.

Click "add a regression component".
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

Click add a covariance
![image-3.png](attachment:image-3.png)

Click educ and drag the line to incomeln
![image-4.png](attachment:image-4.png)

Click Estimation
![image-5.png](attachment:image-5.png)

Go to the Reporting tab and click "Display standardized coefficients and values"
![image-6.png](attachment:image-6.png)

![sem_bmi_educ_incomeln%20et%20std.png](attachment:sem_bmi_educ_incomeln%20et%20std.png)

Now, let's say we think the quickfood variable is a mediator that can explain part of the influence educ and incomeln have on bmi.

I recommend opening a new sembuilder to make the mediator path analysis.

![image.png](attachment:image.png)

Using the mouse icon, spread all the variables out:
![image-2.png](attachment:image-2.png)

Use the covariance and path tools to make the paths:
![image-3.png](attachment:image-3.png)

Your final mediator path model should look like this:
![sem_bmi_quickfood%20mediator.png](attachment:sem_bmi_quickfood%20mediator.png)

Click estimation to run model:
![sem_bmi_quickfood%20mediator%20est%20std.png](attachment:sem_bmi_quickfood%20mediator%20est%20std.png)

In [18]:
*or you can estimate the model using the following code:
sem (bmi <- educ quickfood incomeln) ///
    (quickfood <- educ) ///
    (quickfood <- incomeln), ///
    standardized cov( educ*incomeln)

(32 observations with missing values excluded)

Endogenous variables
  Observed: bmi quickfood

Exogenous variables
  Observed: educ incomeln

Fitting target model:
Iteration 0:   log likelihood = -3387.1928  
Iteration 1:   log likelihood = -3387.1928  

Structural equation model                                  Number of obs = 448
Estimation method: ml

Log likelihood = -3387.1928

-------------------------------------------------------------------------------
              |                 OIM
 Standardized | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
Structural    |
  bmi         |
    quickfood |   .1599279   .0455764     3.51   0.000     .0705998     .249256
         educ |  -.1667029   .0491219    -3.39   0.001    -.2629801   -.0704256
     incomeln |   -.188049    .050615    -3.72   0.000    -.2872526   -.0888454
        _cons |   6.495662   .5430899    11.96   0.000     5.4312

Let's combine the results into a table:

| relationship | First path model | p | Mediator path model | p |
| --- | --- | --- | --- | --- |
| Education on BMI | -.178 | 0.000 | -.167*** | 0.001 |
| Income on BMI | -.235 | 0.000 | -.188*** | 0.000 |
| Education -> quickfood -> BMI | | | (-.0657723)*(.1599279) = -.011 | 0.196 (educ -> quickfood), 0.000 (quickfood -> BMI) |
| Income -> quickfood -> BMI | | | (-.2854332)*(.1599279) = -.046*** | 0.000 (incomeln -> quickfood), 0.000 (quickfood -> BMI) |

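Here, we find that the number of times a family eats fast food does not mediate the effect of education on BMI, because the educ -> quickfood path is not significant (p = 0.196). We find that it does partially mediate the effect of income on BMI: both paths are significant, and the direct effect of income shrinks from -.235 to -.188 but remains significant.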
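The indirect effects in the table are just products of standardized path coefficients. A minimal sketch that reproduces them, using the coefficients copied from the table (the scalar names are arbitrary):

```stata
* Standardized path coefficients copied from the table above
* educ -> quickfood
scalar b_educ_qf   = -.0657723
* incomeln -> quickfood
scalar b_income_qf = -.2854332
* quickfood -> bmi
scalar b_qf_bmi    =  .1599279

* Indirect effect = (path into mediator) * (path out of mediator)
display "educ indirect     = " %6.3f b_educ_qf*b_qf_bmi
display "incomeln indirect = " %6.3f b_income_qf*b_qf_bmi
```

These round to -.011 and -.046. If you want Stata to report direct, indirect, and total effects with tests, `estat teffects` run after `sem` is the usual postestimation command to try.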

# PCF PRACTICE

In [19]:
*Answer question 1
use "http://www.stata-press.com/data/dsemusr/nlsy97cfa.dta", clear

codebook x11 x12 x13




--------------------------------------------------------------------------------
x11                                             HOW OFT R FELT DOWN OR BLUE 2008
--------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: vlT2782800

                 Range: [1,4]                         Units: 1
         Unique values: 4                         Missing .: 1,690/8,985

            Tabulation: Freq.   Numeric  Label
                          108         1  All of the time
                          654         2  Most of the time
                        4,031         3  Some of the time
                        2,502         4  None of the time
                        1,690         .  

--------------------------------------------------------------------------------
x12                                             HOW OFT R BEEN HAPPY PERSON 2008
---------------------------------------------------------

In [1]:
gen x12_reversed=5-x12

In [6]:
codebook x11-x13 x12_reversed


--------------------------------------------------------------------------------
x11                                             HOW OFT R FELT DOWN OR BLUE 2008
--------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: vlT2782800

                 Range: [1,4]                         Units: 1
         Unique values: 4                         Missing .: 1,690/8,985

            Tabulation: Freq.   Numeric  Label
                          108         1  All of the time
                          654         2  Most of the time
                        4,031         3  Some of the time
                        2,502         4  None of the time
                        1,690         .  

--------------------------------------------------------------------------------
x12                                             HOW OFT R BEEN HAPPY PERSON 2008
-----------------------------------------------------------

In [7]:
factor x11 x13 x12_reversed, pcf

(obs=7,183)

Factor analysis/correlation                      Number of obs    =      7,183
    Method: principal-component factors          Retained factors =          1
    Rotation: (unrotated)                        Number of params =          3

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.95291      1.34899            0.6510       0.6510
        Factor2  |      0.60393      0.16076            0.2013       0.8523
        Factor3  |      0.44316            .            0.1477       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(3)  = 4659.15 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |

In [9]:
alpha x11 x13 x12_reversed, std item label asis


Test scale = mean(standardized items)

Items        | S  it-cor  ir-cor   ii-cor   alpha   Label
-------------+------------------------------------------------------------------
x11          | +   0.838   0.609    0.398   0.570   HOW OFT R FELT DOWN OR BLUE
             |                                        2008
x13          | +   0.800   0.537    0.492   0.660   HOW OFT R DEPRESSED LAST
             |                                        MONTH 2008
x12_reversed | +   0.786   0.506    0.533   0.695   
-------------+------------------------------------------------------------------
Test scale   |                      0.474   0.730   mean(standardized items)
--------------------------------------------------------------------------------


The alpha score (.73) is above .70, so the scale passes the reliability check.
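Because we asked for the standardized version (the std option), alpha can be reproduced by hand from the number of items and the average inter-item correlation in the Test scale row. A minimal sketch, with arbitrary scalar names:

```stata
* Number of items and average inter-item correlation from the output above
scalar k    = 3
scalar rbar = .474

* Standardized alpha = k*rbar / (1 + (k - 1)*rbar)
scalar alpha_std = (k * rbar) / (1 + (k - 1) * rbar)
display %5.3f alpha_std
```

This gives 0.730, the same alpha that Stata reports for the test scale.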

In [10]:
quietly factor x11 x13 x12_reversed, pcf
predict Depress



(option regression assumed; regression scoring)

Scoring coefficients (method = regression)

    ------------------------
        Variable |  Factor1 
    -------------+----------
             x11 |  0.43398 
             x13 |  0.40827 
    x12_reversed |  0.39627 
    ------------------------



# Practice

* Open the path data (path.dta on catcourses). Here the unit of analysis is students. 
* Use the codebook command to examine the following variables: read21 (reading at age 21), read7 (reading at age 7), and momed (educational attainment of the mom who raised the child).
* Estimate a SEM model to see if the educational attainment of the child's mom mediates the effect of read7 on read21. Make sure to use standardized coefficients.
* Make a figure of the path model for (read21 <- read7) and the mediating model.

In [20]:
*Answer question 2
cd "C:\Users\acade\Documents\teaching\SOC 211 spring 2023\week8"
use "path.dta", clear


C:\Users\acade\Documents\teaching\SOC 211 spring 2023\week8



In [22]:
codebook read21 read7 momed


--------------------------------------------------------------------------------
read21                                                      Reading at age 21???
--------------------------------------------------------------------------------

                  Type: Numeric (byte)

                 Range: [35,84]                       Units: 1
         Unique values: 37                        Missing .: 70/430

                  Mean: 73.6722
             Std. dev.: 8.51532

           Percentiles:     10%       25%       50%       75%       90%
                             64        69        76        80        82

--------------------------------------------------------------------------------
read7                                                              Read at age 7
--------------------------------------------------------------------------------

                  Type: Numeric (byte)

                 Range: [18,61]                       Units: 1
         Unique values: 39

In [21]:
*First see if there is a significant relationship between read21 <- read7
sem (read21 <- read7), standardized

(93 observations with missing values excluded)

Endogenous variables
  Observed: read21

Exogenous variables
  Observed: read7

Fitting target model:
Iteration 0:   log likelihood = -2329.8008  
Iteration 1:   log likelihood = -2329.8008  

Structural equation model                                  Number of obs = 337
Estimation method: ml

Log likelihood = -2329.8008

------------------------------------------------------------------------------
             |                 OIM
Standardized | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Structural   |
  read21     |
       read7 |   .5296785   .0363378    14.58   0.000     .4584577    .6008993
       _cons |   6.701316   .3924968    17.07   0.000     5.932036    7.470595
-------------+----------------------------------------------------------------
var(e.read21)|   .7194407   .0384947                      .6478139    .7989871
----------

![read21%20and%20read7.png](attachment:read21%20and%20read7.png)

We find a significant positive relationship between reading level at age 7 and reading level at age 21.

In [23]:
*Now we check the mediating effect
sem (read21 <- read7) ///
    (read21 <- momed) ///
    (momed <- read7) , standardized

(102 observations with missing values excluded)

Endogenous variables
  Observed: read21 momed

Exogenous variables
  Observed: read7

Fitting target model:
Iteration 0:   log likelihood = -2949.4532  
Iteration 1:   log likelihood = -2949.4532  

Structural equation model                                  Number of obs = 328
Estimation method: ml

Log likelihood = -2949.4532

------------------------------------------------------------------------------
             |                 OIM
Standardized | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Structural   |
  read21     |
       momed |   .0494696   .0470411     1.05   0.293    -.0427293    .1416686
       read7 |   .5238079   .0374437    13.99   0.000     .4504196    .5971963
       _cons |   6.279877   .5211504    12.05   0.000     5.258441    7.301313
  -----------+----------------------------------------------------------------
  m

![read21%20and%20read7%20mediating.png](attachment:read21%20and%20read7%20mediating.png)

| relationship | First path model | p | Mediator path model | p |
| --- | --- | --- |  --- | --- | 
| read7 on read21 | .530 | 0.000 | .524 | 0.000 |
| read7 -> momed -> read21 | | | (.1164429)*(.0494696) = .00576038 | (0.032)*(0.293) |

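We find that mom's educational attainment does not mediate the effect of read7 on read21: the momed -> read21 path is not significant (p = 0.293), and the direct effect of read7 barely changes (.530 to .524).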