Notes on I‐GSCA Implementation

emstruong edited this page Jun 4, 2024 · 38 revisions

These are developer notes on how I-GSCA is implemented in the cSEM package. They are intended to serve as both a refresher and a reference for cSEM developers.

Implementation of cSEM::igsca()

Flow Chart

At a high level, and omitting both (1) some mathematical details and (2) accessory steps that prepare the data for computation, the core of the algorithm is split between helper_igsca.R and estimators_weights.R.

flowchart TD
    A["`Initialize algorithm using 
_cSEM::csem(..., .approach_weights = 'IGSCA')_`"]
    B["`IGSCA core commences in 
_calculateWeightsIGSCA()_`"]
    C["`Initial estimates for IGSCA and ALS are found using
_initializeAlsEstimates()_, 
which internally calls
_initializeIgscaEstimates()_`"] 
    D["`Alternating Least Squares Algorithm`"] 
    E["`Flip the signs of the **C**, **B** and **Gamma** using 
_flipGammaCBDSigns()_`"]
    Exit[(Exit)]
    A --> B --> C --> D --> E --> Exit
  • C stands for loadings matrix
  • B stands for structural-coefficients matrix
  • Gamma stands for the construct scores (i.e., factor scores and composite scores), computed as the matrix product of the indicators matrix (Z) and the weights matrix (W).
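
The final sign flip is possible because construct scores are only identified up to sign: flipping a construct together with its loadings and paths leaves the fitted values unchanged. The following is a minimal base R sketch of that idea, not the logic of flipGammaCBDSigns(). The flipping rule (flip when a construct's loadings sum to a negative value), the toy matrices, and the convention that the structural model is Gamma ≈ Gamma %*% B are all assumptions made for illustration.

```r
set.seed(1)
n <- 50
Gamma <- matrix(rnorm(n * 2), n, 2)   # toy construct scores (n x 2)

# Toy loadings (constructs x indicators); construct 2 has negative loadings
C <- matrix(c(0.8, 0.7,  0.0,  0.0,
              0.0, 0.0, -0.6, -0.9), nrow = 2, byrow = TRUE)

# Toy structural coefficients: construct 1 -> construct 2
B <- matrix(c(0, 0.5,
              0, 0.0), nrow = 2, byrow = TRUE)

# Hypothetical rule: flip a construct when its loadings sum to a negative value
flip <- ifelse(rowSums(C) < 0, -1, 1)
S <- diag(flip)

Gamma_flipped <- Gamma %*% S        # flip the score columns
C_flipped     <- S %*% C            # flip the matching loading rows
B_flipped     <- S %*% B %*% S      # flip the matching rows and columns of B

# The fitted indicator values are unchanged by the flip
all.equal(Gamma %*% C, Gamma_flipped %*% C_flipped)
```

Because S %*% S is the identity matrix, every product that pairs a flipped Gamma with a flipped C or B is the same as before, up to the sign of the flipped construct itself.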

initializeAlsEstimates()

flowchart TD
    init["`Provides initial estimates of W, C and B using **GSCA** as implemented in
_initializeIgscaEstimates()_`"]
    init2["`Initialize:
(1) **V**
(2) **Z** as standardized indicators matrix
(3) **Gamma** as matrix multiplication of **Z** and **W**
(4) **DU** through SVD`"]
    Exit[(Exit)]
    init --> init2 --> Exit
  • DU corresponds to the uniqueness terms. (U is capitalised because lower-case u corresponds to SVD output and not the error/uniqueness terms we are interested in.)
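
Steps (2) and (3) of the initialization can be sketched in a few lines of base R. The data, the indicator and construct names, and the equal weights below are made up for illustration. The sketch stops before the SVD step for DU, and it does not reproduce any additional rescaling that initializeAlsEstimates() may apply.

```r
set.seed(42)

# Toy raw data: 100 observations on four indicators (hypothetical names)
raw <- matrix(rnorm(100 * 4), nrow = 100,
              dimnames = list(NULL, c("x1", "x2", "y1", "y2")))

# (2) Z: standardized indicators (column means 0, standard deviations 1)
Z <- scale(raw)

# Toy weights: each indicator belongs to exactly one construct
W <- matrix(0, nrow = 4, ncol = 2,
            dimnames = list(colnames(Z), c("eta1", "eta2")))
W[c("x1", "x2"), "eta1"] <- 0.5
W[c("y1", "y2"), "eta2"] <- 0.5

# (3) Gamma: construct scores as the matrix product of Z and W
Gamma <- Z %*% W
dim(Gamma)
```

Each column of Gamma is a weighted sum of the indicators assigned to that construct, which is why the zero pattern of W matters as much as its non-zero values.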

Alternating Least Squares Algorithm

  • Weights are done in one function
  • Loadings are done in a for loop sequentially for each construct variable
flowchart TD

    ChangeEps["`Evaluate if both are true:
(1) The sum of the absolute changes in a sub-set of the parameter estimates since the last iteration is more than _ceps_
(2) The number of elapsed iterations is less than or equal to _itmax_`"]
    pseudoWeights["`Create pseudo-weights **X** and **WW** using 
_updateXAndWW()_`"]
    ConstructUpdate(("`Use a _for_ loop that iterates through each construct variable and updates their associated **W**, **Gamma** and **V**`"))
    UpdateComposite(("`Update using **X** and
_updateCompositeTheta()_`"))
UpdateCommonFactor(("`Update using **WW** and
_updateCommonFactorTheta()_`"))
    LoadingsFactors["`Update **C**, **B**, **D**, **uniqueD**, list of parameter estimates and **U** using
_updateCBDU()_`"]
    Exit[(Exit)]

    subgraph while-loop
    Start --> ChangeEps -->|TRUE|pseudoWeights --> ConstructUpdate --> LoadingsFactors --> ChangeEps
    subgraph for-loop
       ConstructUpdate -.->|Factor|UpdateCommonFactor -.-> ConstructUpdate-.->|Composite|UpdateComposite -.->ConstructUpdate 
    end
end
    ChangeEps -- FALSE ----> Exit
  • X and WW are what I call 'pseudo-weights': they are not weights themselves, but they are used to estimate the weights that compute the construct scores from the indicators. Which one is used depends on whether the indicator corresponds to a composite variable (X) or a latent factor (WW).
  • D is the matrix of error terms linking each latent variable to its indicators (technically, the error terms are DU).
  • V includes s, which includes Du (meaning, DU)
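
The control flow of the while-loop can be sketched with a toy alternating least squares problem. The example below fits a rank-one approximation of a small matrix rather than an I-GSCA model, so the updates are stand-ins for the cSEM functions. Only the stopping rule (continue while the summed absolute change exceeds ceps and the iteration count is at most itmax) follows the flowchart above.

```r
set.seed(123)

# Toy problem: approximate M by a %*% t(b) (rank one), alternating between a and b
M <- outer(1:5, c(2, 1, 0.5)) + matrix(rnorm(15, sd = 0.01), 5, 3)

ceps   <- 1e-4   # convergence threshold (name taken from the notes)
itmax  <- 100    # maximum number of iterations (name taken from the notes)
it     <- 0
change <- Inf
a <- rep(1, nrow(M))   # arbitrary starting values
b <- rep(1, ncol(M))

while (change > ceps && it <= itmax) {
  it  <- it + 1
  old <- c(a, b)

  # Least squares update of a given b, then normalise a to fix the scale
  a <- as.vector(M %*% b) / sum(b^2)
  a <- a / sqrt(sum(a^2))

  # Least squares update of b given a
  b <- as.vector(crossprod(M, a)) / sum(a^2)

  # Sum of absolute changes in the parameter estimates since the last iteration
  change <- sum(abs(c(a, b) - old))
}

c(iterations = it, change = change)
```

In the real algorithm the body of the loop is replaced by the pseudo-weight, construct-wise, and updateCBDU() steps, and the change is computed over a sub-set of the parameters rather than all of them.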

Current Limitations

Suspected

  • Unable to have cross-loadings between one indicator and multiple construct variables
  • Construct variables and indicators cannot be regressed onto each other in the structural model (outside of the measurement model)
  • Higher-order constructs cannot be modelled
  • Only nomological composites are supported, not canonical.

Confirmed

  • .instruments does not do anything
  • Computation of reliability is incorrect
  • R^2 is not correctly computed
  • Indicators can only correspond to either a composite or a common factor variable, but not both. One reason is how extract_parseModel() currently works.
  • Although the R implementation of IGSCA is computationally equivalent (per testthat::expect_equivalent) to the Matlab version, it is currently not equivalent to the GSCAPro output.
  • .GSCA_modes is currently a largely non-functional placeholder argument
  • The current IGSCA fitting algorithm in Matlab and R runs deterministically, whereas GSCAPro permits random initial weight estimates in the user-facing settings. The numerical differences between GSCAPro and the Matlab and R versions may therefore be due to which random starting values are used.
  • There is a numerical difference of approximately .0025 to .0026 between the current R and Matlab versions, caused by side-stepping the Kronecker product by using the correlation matrix, in favor of speed. Speed is not the only motivation for this trade-off: both the R and Matlab versions use matrix inversions/pseudo-inverses, which are numerically unstable. Future development versions will investigate bypassing both Kronecker products and matrix inversions altogether, for both accuracy and speed.
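
As general background on why a Kronecker product can often be avoided, the standard identity vec(A X B) = (t(B) ⊗ A) vec(X) lets the large Kronecker matrix be replaced by two ordinary matrix products. The base R sketch below only verifies that identity on random toy matrices. It is not the correlation-matrix shortcut used in cSEM, and it says nothing about the .0025 to .0026 difference reported above.

```r
set.seed(7)
A <- matrix(rnorm(6),  nrow = 2, ncol = 3)
X <- matrix(rnorm(12), nrow = 3, ncol = 4)
B <- matrix(rnorm(20), nrow = 4, ncol = 5)

# Route 1: build the (10 x 12) Kronecker product explicitly
with_kron <- as.vector(kronecker(t(B), A) %*% as.vector(X))

# Route 2: the same quantity from two small matrix products
without_kron <- as.vector(A %*% X %*% B)

all.equal(with_kron, without_kron)
```

The explicit Kronecker matrix grows with the product of the dimensions of A and B, so the second route is usually cheaper in both memory and time.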

Development Cycle

To ensure correctness, test-igsca.R can be sourced repeatedly during development to check that modifications have not made cSEM::igsca() diverge markedly from the other implementations.

Integration with library(cSEM)

When doing regular GSCA, the dis-attenuation of weights is set to FALSE.

csem()

flowchart TD
    A["`Begin using _cSEM::csem()_ in *00_csem.R*`"]
    B["`Loads arguments using _match.arg()_ and _handleArgs()_ to *args* object`"]
    C["`Error and argument checking`"]
    D["`Parses model using _parseModel()_ to *model_original* object`"]
    E["`Model modification if second-order constructs are detected`"]
    F["`Case selection and model-fitting via 
_out<-do.call(foreman, args_needed)_`"]
    G["`Output class selection for post-result method handling.`"]
    H["`Optional Resampling using 
_resamplecSEMResults()_`"]
A --> B --> C --> D --> E --> F --> G --> H
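
The argument-handling and dispatch pattern in this flow (match.arg() followed by do.call() on a reduced argument list) can be sketched with toy functions. Everything named toy_* below is hypothetical and stands in for the real cSEM functions. The list of approaches is illustrative, and the way args_needed is built here is an assumption about the general pattern, not a copy of handleArgs().

```r
# Hypothetical stand-in for foreman(): just reports what it received
toy_foreman <- function(.data, .approach_weights) {
  list(n = nrow(.data), approach = .approach_weights)
}

# Hypothetical stand-in for csem()
toy_csem <- function(.data,
                     .approach_weights = c("PLS-PM", "GSCA", "IGSCA"),
                     ...) {
  # Validate the choice against the allowed values
  .approach_weights <- match.arg(.approach_weights)

  # Collect all arguments, then keep only those the worker function accepts
  args <- list(.data = .data, .approach_weights = .approach_weights, ...)
  args_needed <- args[intersect(names(args), names(formals(toy_foreman)))]

  # Fit via do.call(), then set a class for later method dispatch
  out <- do.call(toy_foreman, args_needed)
  class(out) <- "toy_results"
  out
}

res <- toy_csem(data.frame(x = 1:3), .approach_weights = "IGSCA", .unused = TRUE)
res$approach
```

Filtering the argument list before do.call() means extra arguments meant for other steps (such as resampling) do not cause an "unused argument" error in the worker function.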

foreman()

Regarding foreman() in 00_foreman.R

The following graph needs work, but ATM I just needed something I could see without scrolling

flowchart TD

    A["`Begin using _cSEM:::foreman()_ in *00_foreman.R*`"]
    B["`Stores _parseModel()_ in *csem_model*`"]
    C["`Stores _processData()_ in *X_cleaned*`"]
    D["`Computes empirical correlation/covariance matrix using _calculateIndicatorCor()_ in *Cor*`"]
    E["`Standardizes data of *X_cleaned* to *X* using _scale(data.matrix(X_cleaned))_`"]
    F{{"`Select between different ways of calculating weights and use estimator-functions in *estimators_weights.R* and *helper_estimators_weights.R* and stores in *W*`"}}
    G["`Sets dominant indicators using _setDominantIndicator()_ to *W$W*`"]
    H["`Compute reliabilities using _calculateReliabilities()_ from *helper_foreman.R* and stores in *LambdaQ2W*`"]
    I["`Dis-attenuation argument modification`"]
    J["`Compute Theta`"]
    K["`Compute H, the construct scores`"]
    L["`Compute C the proxy-covariance matrix using
_calculateCompositeVCV()_`"]
    M["`Compute P the construct correlation matrix using 
_calculateConstructVCV()_`"]
    N["`Estimate structural coefficients using
_estimatePath()_`"]
    Exit[("`return(out)`")]
    
    subgraph subgraph1
        direction LR       
        A --> B --> C --> D
    end
    subgraph subgraph2
        direction TB
        subgraph1 --> E
    end
    subgraph subgraph3
        direction RL
        subgraph2 --> F --> G
    end

subgraph3 --> H --> I --> J --> K --> L --> M --> N --> Exit
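
The relationship between the standardized data X and the correlation matrix Cor can be checked in base R. The data frame below is made up, and stats::cor() is used as a stand-in for calculateIndicatorCor(), which may do more (for example, handle other correlation types).

```r
# Toy cleaned data (made-up values)
X_cleaned <- data.frame(x1 = c(1, 2, 3, 4, 5),
                        x2 = c(2, 1, 4, 3, 6),
                        x3 = c(5, 3, 4, 1, 2))

# Stand-in for the correlation step
Cor <- cor(X_cleaned)

# Standardization exactly as written in the flowchart
X <- scale(data.matrix(X_cleaned))

# For standardized data, the cross-product divided by (n - 1) is the correlation matrix
n <- nrow(X)
all.equal(crossprod(X) / (n - 1), Cor)
```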

Comparisons with Other Implementations

The implementation of IGSCA in cSEM was compared against GSCAPro Version 1.2.1 and a Matlab version kindly sent by Dr. Heungsun Hwang.

GSCAPro V1.2.1

The .csv files for the output of GSCAPro are stored in tests/comparisons/igsca_translation/GSCAPro_1_2_1Output.

These .csv files were formatted for comparison using cSEM::get_lavaan_table_igsca_gscapro()

The .RData file for comparing GSCAPro and cSEM::igsca() is found in tests/data/igsca_gscapro.RData/

The code for interacting with GSCAPro results can be found in get_lavaan_Table_igsca_gscapro.R and parse_GSCAPro_FullResults.R

Matlab: igsca_sim_test.m

igsca_sim_test.m is a modified version of the original igsca_sim.m sent by Dr. Hwang. The modifications were made to facilitate the repeated execution of the exemplary model.

To better align with GSCAPro, ceps was also changed to 0.0001 in both igsca_sim_test.m and, via the .tolerance argument, in cSEM::csem() when testing for equivalence.

cSEM::csem() can be made to match the Matlab version more closely through its .conv_criterion argument. After each ALS iteration, the original Matlab version checks the average absolute change in a sub-set of the estimated parameters (.conv_criterion = "mean_diff_absolute"), whereas here we check the sum of the absolute changes in a sub-set of the estimated parameters (.conv_criterion = "sum_diff_absolute").
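
The practical difference between the two criteria is easy to see on a small example. In the sketch below, conv_change() is a hypothetical helper written for illustration (it is not a cSEM function), and the parameter values and the threshold are made up.

```r
# Hypothetical helper: size of the change between two parameter vectors
conv_change <- function(new, old,
                        criterion = c("sum_diff_absolute", "mean_diff_absolute")) {
  criterion <- match.arg(criterion)
  d <- abs(new - old)
  switch(criterion,
         sum_diff_absolute  = sum(d),
         mean_diff_absolute = mean(d))
}

old <- c(0.50, 0.30, 0.80, 0.10)   # estimates from the previous iteration
new <- c(0.51, 0.29, 0.80, 0.12)   # estimates from the current iteration

sum_change  <- conv_change(new, old, "sum_diff_absolute")
mean_change <- conv_change(new, old, "mean_diff_absolute")

# With the same threshold, the two criteria can disagree about convergence
threshold <- 0.02
c(sum_converged = sum_change <= threshold, mean_converged = mean_change <= threshold)
```

Because the mean is the sum divided by the number of parameters, the same tolerance is effectively stricter under the sum criterion, and the gap widens as the number of tracked parameters grows.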

The .csv files for the input to igsca_sim.m are stored in tests/comparisons/igsca_translation/matlab_in. The .csv files were generated from write_for_matlab() in write_for_matlab.R.

The results of igsca_sim_test.m can be read into R using R.matlab::readMat("FILEDIRECTORY/FILENAME.MAT"). The individual matrices can then be converted into a summary table for test comparison using cSEM::get_lavaan_table_igsca_matrix(). This summary table was saved as a .RData file called igsca_matlab.RData. The code for this procedure can be found in Matlab_out2R.R.

The .RData file for comparing Matlab and cSEM::igsca() is found in tests/data/igsca_matlab.RData/
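
A comparison between two such summary tables can be sketched in base R. The tables below are entirely made up, the lavaan-style columns (lhs, op, rhs, est) are an assumption about the table layout, and the tolerance is arbitrary. The real tests use the stored .RData files instead.

```r
# Made-up estimates in an assumed lavaan-style layout
r_est <- data.frame(
  lhs = c("eta1", "eta1", "eta2"),
  op  = c("=~",   "=~",   "~"),
  rhs = c("x1",   "x2",   "eta1"),
  est = c(0.812,  0.774,  0.431)
)
matlab_est <- data.frame(
  lhs = c("eta2", "eta1", "eta1"),
  op  = c("~",    "=~",   "=~"),
  rhs = c("eta1", "x2",   "x1"),
  est = c(0.433,  0.775,  0.810)
)

# Match rows by parameter rather than by position, then compare
cmp <- merge(r_est, matlab_est, by = c("lhs", "op", "rhs"),
             suffixes = c("_r", "_matlab"))
cmp$abs_diff <- abs(cmp$est_r - cmp$est_matlab)

tol <- 0.003   # arbitrary tolerance for this illustration
all(cmp$abs_diff <= tol)
```

Merging on the parameter labels guards against the two implementations reporting parameters in a different order.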

Terminological Differences

Here, we adopt the terminology that:

  • Latent variables, common factors, factors, latent common factors are all the same concept
  • Component variables, composite variables, weighted sum scores are all the same concept
  • Latent and component variables are sub-sets of the more general class of construct variables.
