Notes on I‐GSCA Implementation
These are developer notes on how I-GSCA is implemented in cSEM. They are intended to serve as both a refresher and a reference for cSEM developers.
At a high level, and omitting both (1) some mathematical details and (2) accessory steps that prepare data for computation, the core of the algorithm is split between helper_igsca.R and estimators_weights.R.
flowchart TD
A["`Initialize algorithm using
_cSEM::csem(..., .approach_weights = 'IGSCA')_`"]
B["`IGSCA core commences in
_calculateWeightsIGSCA()_`"]
C["`Initial estimates for IGSCA and ALS are found using
_initializeAlsEstimates()_,
which internally calls
_initializeIgscaEstimates()_`"]
D["`Alternating Least Squares Algorithm`"]
E["`Flip the signs of the **C**, **B** and **Gamma** using
_flipGammaCBDSigns()_`"]
Exit[(Exit)]
A --> B --> C --> D --> E --> Exit
- C stands for the loadings matrix
- B stands for the structural-coefficients matrix
- Gamma stands for the construct scores (i.e., factor scores and composite scores), computed as the matrix product of the indicators matrix (Z) and the weights matrix (W).
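The relationship between Gamma, Z and W can be sketched in base R as follows. The data, the weights and the construct names (eta1, eta2) are invented for illustration; this is not the package's code.

```r
# Toy example: 5 observations on 4 indicators (made-up data).
set.seed(1)
Z_raw <- matrix(rnorm(20), nrow = 5, ncol = 4,
                dimnames = list(NULL, c("x1", "x2", "y1", "y2")))

# Standardize the indicators (mean 0, sd 1 per column).
Z <- scale(Z_raw)

# Hypothetical weights matrix: indicators in rows, constructs in columns.
# x1 and x2 belong to eta1; y1 and y2 belong to eta2 (no cross-loadings).
W <- matrix(c(0.6, 0.6, 0,   0,
              0,   0,   0.5, 0.7),
            nrow = 4, ncol = 2,
            dimnames = list(colnames(Z), c("eta1", "eta2")))

# Construct scores: one row per observation, one column per construct.
Gamma <- Z %*% W
dim(Gamma)
```

Because each indicator has a non-zero weight for only one construct, each column of Gamma is a weighted sum of that construct's own indicators.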
flowchart TD
init["`Provides initial estimates of W, C and B using **GSCA** as implemented in
_initializeIgscaEstimates()_`"]
init2["`Initialize:
(1) **V**
(2) **Z** as standardized indicators matrix
(3) **Gamma** as matrix multiplication of **Z** and **W**
(4) **DU** through SVD`"]
Exit[(Exit)]
init --> init2 --> Exit
- DU corresponds to the uniqueness terms. (U is capitalised because lower-case u corresponds to SVD output and not the error/uniqueness terms we are interested in.)
- Weights are done in one function
- Loadings are done in a for loop sequentially for each construct variable
flowchart TD
ChangeEps["`Evaluate if both are true:
(1) The sum of the absolute changes in a sub-set of the parameter estimates since the last iteration is more than _ceps_
(2) The number of elapsed iterations is less than or equal to _itmax_`"]
pseudoWeights["`Create pseudo-weights **X** and **WW** using
_updateXAndWW()_`"]
ConstructUpdate(("`Use a _for_ loop that iterates through each construct variable and updates their associated **W**, **Gamma** and **V**`"))
UpdateComposite(("`Update using **X** and
_updateCompositeTheta()_`"))
UpdateCommonFactor(("`Update using **WW** and
_updateCommonFactorTheta()_`"))
LoadingsFactors["`Update **C**, **B**, **D**, **uniqueD**, list of parameter estimates and **U** using
_updateCBDU()_`"]
Exit[(Exit)]
subgraph while-loop
Start --> ChangeEps -->|TRUE|pseudoWeights --> ConstructUpdate --> LoadingsFactors --> ChangeEps
subgraph for-loop
ConstructUpdate -.->|Factor|UpdateCommonFactor -.-> ConstructUpdate-.->|Composite|UpdateComposite -.->ConstructUpdate
end
end
ChangeEps -- FALSE ----> Exit
- X and WW are what I call 'pseudo-weights' because they are not actually weights, but they are used to estimate the weights that compute the construct scores from the indicators, depending on whether the indicator corresponds to a composite variable (X) or a latent factor (WW)
- D is the matrix of error terms associated with each latent variable onto its indicators (Technically Du is.)
- V includes s, which includes Du (meaning, DU)
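The control flow of the while-loop above can be sketched as below. The function `run_als_skeleton()` and the toy update are hypothetical stand-ins: the real loop updates W, Gamma, V, C, B, D and U, whereas this sketch only shows how `ceps` and `itmax` gate the iterations.

```r
# Skeleton of the outer loop; the update step is a stand-in, not I-GSCA.
run_als_skeleton <- function(est, update_fun, ceps = 1e-4, itmax = 100) {
  iter <- 0
  change <- Inf  # forces at least one pass through the loop
  # Mirrors the two conditions as described in the flowchart above.
  while (change > ceps && iter <= itmax) {
    iter <- iter + 1
    est_new <- update_fun(est)
    # Sum of absolute changes since the last iteration.
    change <- sum(abs(est_new - est))
    est <- est_new
  }
  list(estimates = est, iterations = iter, converged = change <= ceps)
}

# Hypothetical update: a contraction with fixed point 2.
toy_update <- function(x) 0.5 * x + 1
res <- run_als_skeleton(c(a = 0, b = 10), toy_update)
res$converged
```

The loop stops as soon as either condition fails, so the `converged` flag is needed to tell a genuine convergence apart from running out of iterations.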
- Unable to have cross-loadings between one indicator and multiple construct variables
- Construct variables and indicators cannot be regressed onto each other in the structural model (outside of the measurement model)
- Higher-order constructs cannot be modelled
- Only nomological composites are supported, not canonical.
- .instruments does not do anything
- Computation of reliability is incorrect
- R^2 is not correctly computed
- Indicators can only correspond to either a composite or a common factor variable, but not both. One reason is how extract_parseModel() currently works.
- Although the R implementation of IGSCA is computationally equivalent (testthat::expect_equivalent) to the Matlab version, it is currently not equivalent to the GSCAPro output.
- .GSCA_modes is currently a largely non-functional placeholder argument
- The current IGSCA fitting algorithm in Matlab and R runs deterministically, whereas GSCAPro permits random initial weight estimates in its user-facing settings. The numerical differences between GSCAPro and the Matlab and R versions may therefore be due to which random starting values are used.
- There is a numerical difference of roughly .0025 to .0026 between the current R and Matlab versions, due to side-stepping the Kronecker product by using the correlation matrix, in favor of speed. This trade-off is not made for speed alone: both the R and Matlab versions use matrix inversions/pseudo-inverses, which are numerically unstable. Future development versions will investigate bypassing both Kronecker products and matrix inversions wholesale, for both accuracy and speed.
To ensure correctness, test-igsca.R can be sourced repeatedly during development to check that modifications have not made cSEM::igsca() diverge markedly from the other implementations.
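A check of this kind can be sketched in base R as below. The helper `compare_estimates()`, the parameter names and all numbers are invented for illustration; the actual tests in test-igsca.R may be organised differently.

```r
# Compare two named vectors of estimates and report which ones differ
# by more than a tolerance. Hypothetical helper, not part of cSEM.
compare_estimates <- function(est_a, est_b, tol = 1e-3) {
  stopifnot(identical(names(est_a), names(est_b)))
  abs_diff <- abs(est_a - est_b)
  list(max_abs_diff = max(abs_diff),
       exceeding = names(abs_diff)[abs_diff > tol])
}

# Made-up estimates standing in for the R and Matlab outputs.
est_r      <- c(load_x1 = 0.8012, load_x2 = 0.7450, path_eta1_eta2 = 0.3301)
est_matlab <- c(load_x1 = 0.8010, load_x2 = 0.7452, path_eta1_eta2 = 0.3300)

cmp <- compare_estimates(est_r, est_matlab, tol = 1e-3)
length(cmp$exceeding) == 0
```

Reporting the names of the offending parameters, rather than a single TRUE/FALSE, makes it easier to see which part of the model drifted after a change.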
When doing regular GSCA, the dis-attenuation of weights is set to FALSE.
flowchart TD
A["`Begin using _cSEM::csem()_ in *00_csem.R*`"]
B["`Loads arguments using _match.arg()_ and _handleArgs()_ to *args* object`"]
C["`Error and argument checking`"]
D["`Parses model using _parseModel()_ to *model_original* object`"]
E["`Model modification if second-order constructs are detected`"]
F["`Case selection and model-fitting via
_out<-do.call(foreman, args_needed)_`"]
G["`Output class selection for post-result method handling.`"]
H["`Optional Resampling using
_resamplecSEMResults()_`"]
A --> B --> C --> D --> E --> F --> G --> H
Regarding foreman() in 00_foreman.R
The following graph needs work, but ATM I just needed something I could see without scrolling
flowchart TD
A["`Begin using _cSEM:::foreman()_ in *00_foreman.R*`"]
B["`Stores _parseModel()_ in *csem_model*`"]
C["`Stores _processData()_ in *X_cleaned*`"]
D["`Computes empirical correlation/covariance matrix using _calculateIndicatorCor()_ in *Cor*`"]
E["`Standardizes data of *X_cleaned* to *X* using _scale(data.matrix(X_cleaned))_`"]
F{{"`Selects the weighting approach, calls the matching estimator function from *estimators_weights.R* and *helper_estimators_weights.R*, and stores the result in *W*`"}}
G["`Sets dominant indicators using _setDominantIndicator()_ to *W$W*`"]
H["`Compute reliabilities using _calculateReliabilities()_ from *helper_foreman.R* and stores in *LambdaQ2W*`"]
I["`Dis-attenuation argument modification`"]
J["`Compute Theta`"]
K["`Compute H, the construct scores`"]
L["`Compute C the proxy-covariance matrix using
_calculateCompositeVCV()_`"]
M["`Compute P the construct correlation matrix using
_calculateConstructVCV()_`"]
N["`Estimate structural coefficients using
_estimatePath()_`"]
Exit[("`return(out)`")]
subgraph subgraph1
direction LR
A --> B --> C --> D
end
subgraph subgraph2
direction TB
subgraph1 --> E
end
subgraph subgraph3
direction RL
subgraph2 --> F --> G
end
subgraph3 --> H --> I --> J --> K --> L --> M --> N --> Exit
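Steps D and E of the foreman() flowchart can be illustrated with base R. This sketch uses made-up data and plain Pearson correlation as a stand-in for calculateIndicatorCor(), which may do more than this.

```r
# Made-up raw data standing in for X_cleaned.
set.seed(42)
X_cleaned <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))

# Step E: standardize each column to mean 0 and standard deviation 1.
X <- scale(data.matrix(X_cleaned))

# Step D (stand-in): Pearson correlation matrix of the indicators.
Cor <- cor(X_cleaned)

# For standardized data, the cross-product divided by (n - 1)
# reproduces the correlation matrix.
Cor_from_X <- crossprod(X) / (nrow(X) - 1)
all.equal(Cor, Cor_from_X, check.attributes = FALSE)
```

This equivalence is why later steps can work from either the standardized data *X* or the correlation matrix *Cor*, at least when *Cor* is a Pearson correlation.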
The implementation of IGSCA in cSEM was compared against GSCAPro Version 1.2.1 and a Matlab version kindly sent by Dr. Heungsun Hwang.
The .csv files for the output of GSCAPro are stored in tests/comparisons/igsca_translation/GSCAPro_1_2_1Output.
These .csv files were formatted for comparison using cSEM::get_lavaan_table_igsca_gscapro()
The .RData for comparing between GSCAPro and cSEM::igsca() are found in tests/data/igsca_gscapro.RData/
The code for interacting with GSCAPro results can be found in get_lavaan_Table_igsca_gscapro.R and parse_GSCAPro_FullResults.R
igsca_sim_test.m is a modified version of the original igsca_sim.m sent by Dr. Hwang. The modifications were made to facilitate the repeated execution of the exemplary model.
To better align with GSCAPro, ceps was also changed to 0.0001 in both igsca_sim_test.m and in cSEM (via the .tolerance argument of csem()) when testing for equivalence.
cSEM::csem() can be made to match the Matlab version more closely through the .conv_criterion argument, as in cSEM::csem(, .conv_criterion = ). After each ALS iteration, the original Matlab version checks the average change in a sub-set of the estimated parameters (.conv_criterion = "mean_diff_absolute"), whereas here we check the sum of the changes in a sub-set of the estimated parameters (.conv_criterion = "sum_diff_absolute").
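The difference between the two criteria can be sketched as follows. The function `conv_change()` and the example vectors are hypothetical; only the two criterion names come from the notes above.

```r
# Two ways of summarising the change between successive parameter vectors.
# The function name and the switch() layout are illustrative only.
conv_change <- function(est_new, est_old,
                        criterion = c("sum_diff_absolute",
                                      "mean_diff_absolute")) {
  criterion <- match.arg(criterion)
  abs_diff <- abs(est_new - est_old)
  switch(criterion,
         sum_diff_absolute  = sum(abs_diff),
         mean_diff_absolute = mean(abs_diff))
}

# Made-up estimates from two successive iterations.
old <- c(0.50, 0.30, 0.20, 0.10)
new <- c(0.51, 0.29, 0.20, 0.12)

change_sum  <- conv_change(new, old, "sum_diff_absolute")   # about 0.04
change_mean <- conv_change(new, old, "mean_diff_absolute")  # about 0.01
```

For the same tolerance, the sum criterion is the stricter of the two, because the sum equals the mean multiplied by the number of parameters compared.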
The .csv files for the input to igsca_sim.m are stored in tests/comparisons/igsca_translation/matlab_in. The .csv files were generated from write_for_matlab() in write_for_matlab.R.
The results of igsca_sim_test.m can be read into R using R.matlab::readMat("FILEDIRECTORY/FILENAME.MAT"). The individual matrices can then be converted into a summary table for test comparison using cSEM::get_lavaan_table_igsca_matrix(). This summary table was saved as a .RData file called igsca_matlab.RData. The code for this procedure can be found in Matlab_out2R.R.
The .RData for comparing between Matlab and cSEM::igsca() are found in tests/data/igsca_matlab.RData/
Here, we adopt the terminology that:
- Latent variables, common factors, factors, latent common factors are all the same concept
- Component variables, composite variables, weighted sum scores are all the same concept
- Latent and component variables are sub-sets of the more general class of construct variables.