/
cSEM.Rmd
742 lines (618 loc) · 34.1 KB
/
cSEM.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
---
title: "Introduction to cSEM"
date: "Last edited: `r Sys.Date()`"
author: Manuel Rademaker
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
toc: true
includes:
before_body: preamble.mathjax.tex
vignette: >
%\VignetteIndexEntry{csem-basic}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
bibliography: ../inst/REFERENCES.bib
---
<!-- used to print boldface greek symbols (\mathbf only works for latin symbols) -->
```{r child="preamble-tex.tex", echo=FALSE, include=FALSE}
```
# Preface
Structural equation modeling (SEM) has been used and developed for decades
across various research fields such as (among others) psychology,
sociology, and business research.
As an almost inevitable consequence, a different terminology system and, to some extent,
mathematical notation has evolved within each field over the years. This
"terminology mess" is one of the major obstacles in interdisciplinary
research and (scientific) debates, since it hampers a broader understanding of
methodological issues, and, even worse, promotes systematic misuse (e.g., the use
of Cronbach's alpha as an estimator for congeneric reliability). This is particularly
true for users new to SEM or practitioners who are overwhelmed
by terminology of their own field only to find that the term they
just thought to have finally completely grasped is defined differently in another field,
making the confusion complete. A prime example is the term "formative" (measurement)
which has been used to describe a causal-formative as well as [a composite model](#cmodel).
See e.g., @Henseler2017 for a clarification.
Ultimately, this is a matter of (mis)communication which, in our opinion, can
only be satisfactorily solved by establishing a clear, unambiguous definition
for each term and symbol that is used within the package. We emphasize that
we do not seek to impose "our" conventions, nor do we claim they are the
"correct" conventions, but merely seek to make communication
between us (the authors of the package) and you (the package user) as
unambiguous and error-free as possible.
Hence, we provide a [Terminology][Terminology] and a [Notation][Notation]
file that contains key terms and mathematical notation/symbols that we
consider crucial, alongside our definition. Users are encouraged to read these
files carefully to avoid potential misunderstandings.
1. The [Terminology][Terminology] file contains each term that we
think should be defined and explained in order to make sure
package users understand complementary help files and vignettes the way
the package authors intended them to be understood.
1. The [Notation][Notation] files contains some fundamental mathematical notation/
symbols used in the package documentation alongside a definition. Bearing some
exceptions, we mostly follow standard notation as laid out in e.g., @Bollen1989.
The package is designed based on a set of principles and a terminology that
is partly in contrast to commonly used open-source or commercial software packages
offering similar content (e.g., smartPLS).
Together with the [Terminology][Terminology] and [Notation][Notation] file this introduction seeks to explain
these principles.
Lastly, to use **cSEM** effectively, it is helpful to understand its design.
Hence, the package architecture and design as well as what we consider the "cSEM workflow" are discussed.
# Composite-based structural equation modeling
## What is structural equation modeling (SEM)
[Structural equation modeling (SEM)][Terminology] is about analyzing,
i.e., modeling, estimating, assessing, and testing, the (causal) relationships
between [concepts][Terminology] - an entity defined by a conceptual definition - with other
[concepts][Terminology] and/or observable quantities generally referred to as
[indicators, manifest variables or items][Terminology].
Broadly speaking, two modeling approaches for the [concepts][Terminology] and their relationship exist.
We refer to the first as [the latent variable or common factor model](#lvmodel) and to
the second as [the composite model](#cmodel). Each approach entails a set of methods, test,
and evaluation criteria as well as a specific terminology that may or may not
be adequate within the realm of the other approach.
### The classical latent variable or common factor model {#lvmodel}
Assuming a researcher identifies $J$ [concepts][Terminology] and $K$ indicators,
the fundamental feature
of the latent variable model is the assumption of the existence of a set of $J$
[latent variables (common factors)][Terminology] that each serve as a representation of
one of the $J$ [concepts][Terminology] to be studied in a sense that each
[latent variable][Terminology] is
causally responsible for the manifestations of a set of $K_j$ [indicators][Terminology] which
are supposed to measure the [concept][Terminology] in question.
The entirety of these measurement relations is captured by the **measurement model**
which relates [indicators][Terminology] to [latent variables][Terminology]
according to the researchers theory of how observables are related to the
[concepts][Terminology] in question. The entirety of the relationships between
[concepts][Terminology] (i.e., its representation in a statistical model, the [construct][Terminology]) is
captured by the **structural model** whose parameters are usually at the center
of the researchers interest. Caution is warranted though as the [common factor][Terminology]
and its respective [concept][Terminology] are not the same thing. Within the "classical"
[covariance-based or factor-based][Terminology] literature [concept][Terminology],
[construct][Terminology], [latent variable][Terminology]
and its representation the
[common factor][Terminology] have often been used interchangeably
[@Rigdon2012; @Rigdon2016; @Rigdon2019].
This will not be the case in **cSEM** and readers are explicitly made aware of the
fact that [concepts][Terminology] are the abstract entity which *may* be modeled by
a [common factor][Terminology], however, no assertion as to the correctness of this approach
in terms of "closeness of the [common factor][Terminology] and its related [concept][Terminology]" are made.
Parameters in latent variable models are usually retrieved by
maximum likelihood (ML). The basic idea of ML is to find parameters such that the
difference between the model-implied and the empirical indicator covariance matrix
is minimized. Such estimation methods are therefore often referred to as
[covariance-based methods][Terminology].
### The composite model {#cmodel}
The second approach is known as the composite model. As opposed to the latent
variable or common factor model, composites do not presuppose the existence of
a [latent variable][Terminology]. Hence, designed entities (artifacts) such as the
"OECD Better Life Index" that arguably have no latent counterpart may be
adequately described by a [composite][Terminology], i.e., the linear combination of observables defining
the composite. Composites may also be formed to represent [latent variables/common factors][Terminology]
(or more precisely [concepts][Terminology] modeled as [common factors][Terminology]) in which case
the [composite][Terminology] serves as a [proxy or stand-in][Terminology] for the [latent variable][Terminology].
However, in **cSEM**, the term "composite *__model__*" is only used to refer to a model
in a the former sense, i.e., a model in which the [composite][Terminology] is a direct representation
of [concept/construct][Terminology]!
Parameters in composite models are retrivied by a composite-based approach such as partial
least squares path modeling (PLS-PM), generalized structured component analysis (GSCA) or
dimension reduction techniques such as principal component analysis (PCA). The basic idea of any composite-based approach
is to build scores/composites for each concept and subsequently retrieve structural
model parameters by a series of (linear) regressions.
Such estimation methods are therefore often referred to as
[variance-based methods][Terminology] as regression maximizes the explained variance
of the dependent variable.
## What is composite-based SEM?
Composite-based SEM is the entirety of methods, approaches, procedures, algorithms
that in some way or another involve linear compounds ([composites/proxies/scores][Terminology]),
i.e., linear combinations of
observables when retrieving (estimating) quantities of interest such as the
coefficients of the structural model. It is crucial to clearly distinguish between
the **composite model** and **composite-based SEM**. They are not the same.
While the former is "only" *__a__* *statistical model* relating [concepts][Terminology] to observables,
the latter simply states that [composites][Terminology] - linear compounds, i.e., weighted linear combinations
of observables - are used to retrieve quantities of interest!
Hence, composite-based SEM as a way of obtaining/estimating parameters of interest
may thus be used for [the latent variable or common factor model](#lvmodel) as well
as [the composite model](#cmodel). However, interpretation of the parameter
estimates is fundamentally different since the underlying models differ!
As sketched above, common factor and composite models fundamentally differ in
how the relation between observables and [concepts][Terminology] is modeled.
Naturally, results and, most notably, their (correct/meaningful) interpretation
critically hinge on what type of model the user
specifies. Across the package we therefore strictly distinguish between
- [Concepts][Terminology]/[Constructs][Terminology] modeled as common factors (or alternatively latent variables)
- [Concepts][Terminology]/[Constructs][Terminology] modeled as composites
These phrases will repeatedly appear in help files and other complementary files. It
is therefore crucial to remember what they supposed to convey.
# Using cSEM
The idea of cSEM is twofold:
1. Provide a unified framework to all common composite-based SEM approaches,
including typical postestimation procedures. In principal, it should be
similar to what [lavaan](https://lavaan.ugent.be/) does for covariance-bases/factor-based SEM.
2. Make the user experience as hassle-free as possible.
The first point is an always ongoing task since approaches are constantly evolving
with new developments appearing at a pace that we, the package authors, will not
be able to keep up with. The second point, however, was particularly important
to us as we have been frustrated ourselves by how technical, unfriendly packages
in R can be. Hence, from the very start we envisioned a workflow that essentially
only comprises three steps:
1. **Get the essential**: no estimator or approach works without data and a description
of what parameters are to be estimated and how data is related to these parameters,
i.e. a model. Hence, we always need a data set and model.
Since, model specification in [lavaan model syntax](https://lavaan.ugent.be/tutorial/syntax1.html) is probably unbeatable in its
ease and well known to R users that have an interest in SEM, to us, [lavaan model syntax](https://lavaan.ugent.be/tutorial/syntax1.html)
is the obvious tool for users to specify their model. Experience tells,
that for R beginners the biggest obstacle has been to get the data into R.
However, largely thanks to [RStudio](https://posit.co/), data import and data transformation
are nowadays relatively easy to handle. See the [Preparing the data](#preparedata)
and the [Specifying a model](#specifyingamodel) sections below.
1. **Estimate**: no matter the model and type of data, estimation is always done using one
central function with the data as its first and the model as its second argument:
```{r eval = FALSE}
csem(.data = my_data, .model = my_model)
```
Naturally, the `csem()` function has a number of additional arguments to fine-tune the estimation,
however, since `csem()` automatically recognizes, for instance, whether a concept
was modeled as a common factor or composite and automatically applies appropriate
correction for attenuation, default arguments are often sufficient. See the
[Estimate using csem()](#estimate) section below.
1. **Postestimate**: Inspired by the [grammar of data manipulation](https://dplyr.tidyverse.org/)
underlying the [dplyr](https://dplyr.tidyverse.org/) package, cSEM provides
5 postestimation verbs that concisely cover all common postestimation tasks
as well as 4 additional test commands and 2 general do commands:
1. `assess()`
1. `infer()`
1. `predict()`
1. `summarize()`
1. `verify()`
1. `testOMF()`
1. `testMICOM()`
1. `testHausman()`
1. `testMGD()`
1. `doIPMA()`
1. `doNonlinearRedundancyAnalysis()`
1. `doRedundancyAnalysis()`
All verbs accept the result of a call to `csem()` as input which makes working
with these function extremely simple. You only need to remember the word, not any
specific syntax or arguments. Of course, all functions have a number
of additional arguments to fine-tune the postestimation. See the
[Apply postestimation functions](#postestimate) sections below. For details
on the arguments consult the individual help files.
The price we pay for an increase in flexibility is primarily a, mostly minor, loss
in computational speed, in particular, when intense resampling is involved
(i.e., 5000 bootstrap run for a complex model with, say, 1000 observations).
Users looking for the most efficient implementation of common resampling routines
may find faster implementations. That said, we believe, the time saved when using a
standardized estimate-postestimate workflow, no matter the model or data used,
well outweighs the potential loss in computational
efficiency.
The following sections describe the workflow in more detail.
## The cSEM-Workflow
As described in the previous section, working with **cSEM** consists of 3-4 steps:
1. Prepare/load the data to analyze
1. Specify the model to estimate
1. Estimate using the `csem()` function
1. Apply postestimation functions to the result of 3.
### Prepare the data{#preparedata}
Technically, preparing the data does not require the **cSEM** and is therefore
better considered a preparation task, i.e., a "pre-cSEM" task. The reason why
this step is nevertheless considered an explicit part of the cSEM-workflow is motivated by the experience that
applied/causal users tend to shy away from software like R because
"just getting the data in" and understanding how to show, manipulate and work with
data can be frustrating if one is not aware of R's rich and easy to learn
data import and data processing capabilities. While these topics may have been
overwhelming for newcomers several years ago, data import and data transformation
have become extremely simple and user-friendly if the right tools packages
are used. The best place to start is
the [Rstudio Cheat sheet webpage](https://posit.co/resources/cheatsheets/),
especially the *Data Import* and the *Data Transformation* cheat sheets.
**cSEM** is relatively flexible as to the type of data accepted. Currently the following
data types/structures are accepted:
1. A `data.frame` or `tibble` with column names matching the indicator names used
in the lavaan model description of the measurement or composite model. Possible column
types or classes of the data provided are: `"logical"` (`TRUE`/`FALSE`),
`"numeric"` (`"double"` or `"integer"`), `"factor"` (`"ordered"` and/or `"unordered"`)
or a mix of several types. Additionally, the data may also include **one**
character column whose column name must be given to `.id`.
Values of this column are interpreted as group identifiers and `csem()`
will split the data by levels of that column and run the estimation for each
level separately.
**Example:**
Assuming the following simple model is to be estimated:
```{r}
model <- "
# Structural model
EXPE ~ IMAG
# Reflective measurement model
EXPE =~ expe1 + expe2
IMAG =~ imag1 + imag2
"
```
To estimate the model a data frame with $N$ rows (the observations) and
$K = 4$ columns with column names `expe1`, `imag1`, `expe2`, `imag2` is required.
The order of the columns in the dataset is irrelevant. In **cSEM** the order
is defined by the order in which the names appear in the measurement or
composite model equations in the
model description. In this case any resulting matrix or vector whose (row/column)
names contain the indicator names would have the order `expe1`, `expe2`,
`imag1`, `imag2`. More one model specification below.
1. A `matrix` with column names matching the indicator names used
in the lavaan model description of the measurement model or composite model
description.
1. A list of data frames or matrices. In this case estimation is repeated for
each data frame or matrix separately.
The current version 0.5.0 available on CRAN does not
provide any tools to handle missing values. Future versions are likely to include
at least the basic approaches for handling missing values. Regularly check <https://github.com/FloSchuberth/cSEM/>
to get the latest updates.
### Specify a model {#specifyingamodel}
Models are defined using [lavaan model syntax](https://lavaan.ugent.be/tutorial/syntax1.html).
Currently, only the "standard" lavaan model syntax is supported. This comprises:
1. The definition of a latent variable/common factor (or more precisely: the definition of a concept
modeled as a common factor) by the "`=~`" operator.
2. The definition of a composite (or more precisely: the definition of a concept
modeled as a composite) by the "`<~`" operator.
3. The specification of regression equations by the "`~`" operator.
4. The definition of error (co)variances, indicator correlations, or correlations
between exogenous constructs using the "`~~`" operator.
**cSEM** handles linear, nonlinear and hierarchical models. Syntax for each
model is illustrated below using variables of the build-in `satisfaction` dataset.
For more information see the [lavaan syntax tutorial](https://lavaan.ugent.be/tutorial/syntax1.html).
#### Linear models
A typical linear model would look like this:
```{r eval=FALSE}
model <- "
# Structural model
EXPE ~ IMAG
QUAL ~ EXPE
VAL ~ EXPE + QUAL
SAT ~ IMAG + EXPE + QUAL + VAL
LOY ~ IMAG + SAT
# Composite model
IMAG <~ imag1 + imag2 + imag3 # composite
EXPE <~ expe1 + expe2 + expe3 # composite
QUAL <~ qual1 + qual2 + qual3 + qual4 + qual5 # composite
VAL <~ val1 + val2 + val3 # composite
# Reflective measurement model
SAT =~ sat1 + sat2 + sat3 + sat4 # common factor
LOY =~ loy1 + loy2 + loy3 + loy4 # common factor
# Measurement error correlation
sat1 ~~ sat2
"
```
Note that the operator `<~` tells **cSEM** that the concept to its left is modeled as a composite;
the operator `=~` tells **cSEM** that the concept to its left is modeled
as a common factor. `~~` tells **cSEM** that the measurement errors of `sat1` and `sat2` are assumed to correlate.
#### Nonlinear models
Nonlinear terms are specified as interactions using the dot operator `"."`.
Nonlinear terms include interactions and exponential terms. The latter is
described in model syntax as an "interaction with itself", e.g.,
`x_1^3 = x1.x1.x1`. Currently the following terms are allowed
- Single, e.g., `eta1`
- Quadratic, e.g., `eta1.eta1`
- Cubic, e.g., `eta1.eta1.eta1`
- Two-way interaction, e.g., `eta1.eta2`
- Three-way interaction, e.g., `eta1.eta2.eta3`
- Quadratic and two-way interaction, e.g., `eta1.eta1.eta3`
A simple example would look like this:
```{r}
model <- "
# Structural model
EXPE ~ IMAG + IMAG.IMAG
# Composite model
EXPE <~ expe1 + expe2
IMAG <~ imag1 + imag2
"
```
#### Hierarchical (second order) models
Currently only second-order models are supported. Specification of the
second-order construct takes place in the measurement/composite model.
```{r, eval=FALSE}
model <- "
# Structural model
SAT ~ QUAL
VAL ~ SAT + QUAL
# Reflective measurement model
SAT =~ sat1 + sat2
VAL =~ val1 + val2
# Composite model
IMAG <~ imag1 + imag2
EXPE <~ expe1 + expe2
# Second-order term
QUAL =~ IMAG + EXPE
"
```
In this case `QUAL` is modeled as a second-order common factor measuring `IMAG` and `EXPE`,
where `IMAG` is modeled as a composite and and `EXPE` is itself a common factor.
### Estimate using `csem()`{#estimate}
`csem()` is the central function of the package. Although it is possible to
estimate a model using individual functions called by `csem()` (such as
`parseModel()`, `processData()`, `calculateWeightsPLS()`, `estimatePath()` etc.)
using R's `:::`mechanism for non-exported functions, it is virtually
always easier, safer and quicker to use `csem()` instead (this is why these functions
are not exported).
`csem()` accepts all models and data types described above. The result of a call to
`csem()`is always an object of class `cSEMResults`. Technically, the resulting
object has an additional class attribute, namely `cSEMResults_default`, `cSEMResults_multi`
or `cSEMResults_2ndorder` that depends on the type of model and/or data provided,
however, users usually do not need to worry since postestimation
functions automatically work on all classes.
The simplest possible call to `csem()` involves a data set and a model:
```{r warning=FALSE, message=FALSE}
require(cSEM)
model <- "
# Path model / Regressions
eta2 ~ eta1
eta3 ~ eta1 + eta2
# Reflective measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"
a <- csem(.data = threecommonfactors, .model = model)
a
```
This is equivalent to:
```{r, eval=FALSE}
csem(
.data = threecommonfactors,
.model = model,
.approach_cor_robust = "none",
.approach_nl = "sequential",
.approach_paths = "OLS",
.approach_weights = "PLS-PM",
.conv_criterion = "diff_absolute",
.disattenuate = TRUE,
.dominant_indicators = NULL,
.estimate_structural = TRUE,
.id = NULL,
.iter_max = 100,
.normality = FALSE,
.PLS_approach_cf = "dist_squared_euclid",
.PLS_ignore_structural_model = FALSE,
.PLS_modes = NULL,
.PLS_weight_scheme_inner = "path",
.reliabilities = NULL,
.starting_values = NULL,
.tolerance = 1e-05,
.resample_method = "none",
.resample_method2 = "none",
.R = 499,
.R2 = 199,
.handle_inadmissibles = "drop",
.user_funs = NULL,
.eval_plan = "sequential",
.seed = NULL,
.sign_change_option = "no"
)
```
See the `csem()` documentation for details on the arguments.
#### Inference
By default, no inferential quantities are calculated since composite-based approaches,
generally, do not have closed-form solutions for standard errors. **cSEM** relies on the
`bootstrap` or `jackknife` to estimate standard errors, test statistics,
critical quantiles, and confidence intervals.
**cSEM** offers two ways to compute resamples:
1. Inference can be done by first setting argument `.resample_method` to
`"jackkinfe"` or `"bootstrap"` to perform resampling and subsequently use
`infer()` (or more conveniently `summarize()` which internally calls `infer()`)
to compute the actual inferential quantities of interest.
2. The same result is achieved by passing a `cSEMResults` object to `resamplecSEMResults()`
and subsequently using `summarize()` or `infer()`.
```{r echo=FALSE, include=FALSE}
x <- runif(1) # to intialize .Random.seed
```
```{r}
b1 <- csem(.data = threecommonfactors, .model = model, .resample_method = "bootstrap")
b2 <- resamplecSEMResults(a)
```
Several confidence intervals are implemented, see `?infer()`:
```{r }
summarize(b1)
```
Or directly via `infer()`
```{r}
ii <- infer(b1, .quantity = c("CI_standard_z", "CI_percentile"), .alpha = c(0.01, 0.05))
ii$Path_estimates
```
Both bootstrap and jackknife resampling support platform-independent multiprocessing
as well as random seeds via the [future framework](https://github.com/HenrikBengtsson/future/).
For multiprocessing simply set `.eval_plan = "multiprocess"` in which case the maximum number of available cores
is used if not on Windows. On Windows as many separate R instances are opened in the
background as there are cores available instead. Note that this naturally has some overhead.
Consequently, for a small number of resamples multiprocessing
will generally not be faster compared to sequential (single core) processing (the default).
Seeds are set via the `.seed` argument. A typical call would look like this:
```{r eval=FALSE}
b <- csem(
.data = satisfaction,
.model = model,
.resample_method = "bootstrap",
.R = 999,
.seed = 98234,
.eval_plan = "multiprocess")
# Output omitted
```
### Apply postestimation functions {#postestimate}
There are 5 major postestimation function and 4 test-family functions:
`assess()`
: Assess the quality of the estimated model *without conducting a statistical test*.
Quality in this case is taken to be a catch-all term for all common aspects
of model assessment. This mainly comprises fit indices, reliability
estimates, common validity assessment criteria and other related quality
measures/indices that do not rely on a formal test procedure. In **cSEM**
a generic (fit) index or quality/assessment measure is referred to as
a **quality criterion**.
`infer()`
: Calculate common inferential quantities. For users interested in the
estimated standard errors and/or confidences intervals `summarize()` will usually
be more helpful as it has a much more user-friendly print method.
`predict()`
: Predict indicator scores of endogenous constructs based on the procedure
introduced by @Shmueli2016.
`summarize()`
: Summarize a model. The function is mainly called for its side effect, the printing of
a structured summary of the estimates. It also provides most estimates in
user-friendly data frames. The data frame format is usually much more
convenient if users intend to present the results in e.g., a paper or a presentation.
`verify()`
: Verify admissibility of the estimated quantities for a given model.
Results based on an estimated model exhibiting one of the following defects
are deemed inadmissible: non-convergence, loadings and/or (congeneric) reliabilities
larger than 1, a construct VCV and/or a model-implied VCV matrix that
is not positive (semi-)definite.
#### The `test_*` family of postestimation functions
`testHausman()`
: The regression-based Hausman test for SEM.
`testOMF()`
: Test for overall model fit based on @Beran1985. See also @Dijkstra2015.
`testMGD()`
: Test for group differences using several different approaches such as e.g.,
the one described in @Klesel2019.
`testMICOM()`
: Test of measurement invariance of composites proposed by @Henseler2016
#### The `do_*` family of postestimation functions
`doIPMA()`
: Performs an importance-performance matrix analysis (IPMA).
`doNonlinearEffectsAnalysis()`
: Performs nonlinear effects analysis such as floodlight and surface analysis
as described in e.g., @Spiller2013.
`doRedundancyAnalysis()`
: Performs a redundancy analysis (RA) as proposed by @Hair2016 with reference to
@Chin1998.
Technically, postestimation functions are generic function with methods for objects of class
`cSEMResults_default`, `cSEMResults_multi`, `cSEMResults_2ndorder`. In **cSEM**
every `cSEMResults_*` object must also have class `cSEMResults` for internal reasons.
When using one of the major postestimation functions, method dispatch is therefore
technically done on one of the `cSEMResults_*` class attributes, ignoring the
`cSEMResults` class attribute. As long as a postestimation function is
used directly method dispatch is not of any practical concern to the end-user.
The difference, however, becomes important if a user seeks to directly invoke an
internal function which is called by one of the postestimation functions
(e.g., `calculateAVE()` or `calculateHTMT()` as called by `assess()`).
In this case, only objects of class `cSEMResults_default` are accepted as this
ensures a specific structure. Therefore, it is important to remember that
*internal functions are generally **not** generic.*
## Principles underlying cSEM
**cSEM** is based on a number of principles, that have shaped its design,
terminology and scope. These principles are discussed below
### Model vs. Estimation
The way different concepts and their relationship
are *modeled* is strictly distinct from how they are *estimated*. Hence we
strictly distinguish between concepts modeled as common factors (or composites)
and the actual estimation **for a given model**. In our opinion, these differences
are fundamental to understanding the scope and limits of a certain approach.
The most notable consequence is that approaches such as partial least squares
and everything related to it (e.g., the modes) or generalized structured component
analysis are "only" considered as estimators/estimation approaches **for a given model**.
### Composites in a composite model vs. composites in a common factor model and disattenuation
By virtue of the package, **cSEM** uses composite-based estimators/approaches only. Depending
on the postulated model, linear compounds may therefore either serve as a
composite as part of the composite model or as a proxy/stand-in for a common factor.
If a concept is modeled as a common factor, proxy correlations, proxy-indicator
correlations and path coefficients are inconsistent estimates for their supposed
construct level counterparts (construct correlations, loadings and
path coefficients) unless the proxy is a perfect representation of its
construct level counterpart. This is commonly referred to as **attenuation bias**.
Several approaches have been suggested to correct for these biases.
In **cSEM** estimates are correctly dissattenuated by default if any of the
concepts involved is modeled as a common factor! Disattentuation is controlled
by the `.disattenuate` argument of `csem()`.
**Example**
```{r eval=FALSE}
model <- "
## Structural model
eta2 ~ eta1
## Measurement model
eta1 <~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
"
# Identical
csem(threecommonfactors, model)
csem(threecommonfactors, model, .disattenuate = TRUE)
# To supress automatic disattenuation
csem(threecommonfactors, model, .disattenuate = FALSE)
```
Note that since `.approach_weights = "PLS-PM"` and `.disattentuate = TRUE`
by default (see for [The role of the weighting scheme and partial least squares (PLS)](#roleofpls) below)
and one of the concepts in the model above is modeled as a common factor,
composite (proxy) correlations, loadings and path coefficients are adequately
disattenuated using the correction approach commonly known as **consistent partial
least squares (PLSc)**.
If `.disattenuate = FALSE` or all concepts are modeled as composites "proper"
PLS values are returned.
### The role of the weighting scheme and partial least squares (PLS) {#roleofpls}
In principal, any weighted combination of
appropriately chosen observables can be used to estimate structural relationships
between these compounds. Hence, any conceptual or methodological issue
discussed based on a composite build by a given (weighting) approach may
equally well be discussed for any other potential weighting scheme. The
appropriateness or potential superiority of a specific weighting approach such as
"partial least squares path modeling" (PLS-PM) over another such as "unit weights" (sum scores)
or generalized structured component analysis (GSCA)
is therefore to some extent a question of *relative* appropriateness and
*relative* superiority.
As a notable consequence, we believe that well known approaches such partial least
squares path modeling (PLS-PM) and generalized structured component analysis (GSCA)
are - contrary to common belief - best exclusively understood as prescriptions for forming
linear compounds based on observables, i.e., as weighting approaches. Not more, not less.[^1]
In **cSEM** this is reflected by the fact that `"PLS"` and `"GSCA"` are choices of
the `.approach_weights` argument.
```{r eval=FALSE}
model <- "
## Structural model
eta2 ~ eta1
## Composite model
eta1 <~ y11 + y12 + y13
eta2 <~ y21 + y22 + y23
"
### Currently the following weight approaches are implemented
# Partial least squares path modeling (PLS)
csem(threecommonfactors, model, .approach_weights = "PLS-PM") # default
# Generalized canonical correlation analysis (Kettenring approaches)
csem(threecommonfactors, model, .approach_weights = "SUMCORR")
csem(threecommonfactors, model, .approach_weights = "MAXVAR")
csem(threecommonfactors, model, .approach_weights = "SSQCORR")
csem(threecommonfactors, model, .approach_weights = "MINVAR")
csem(threecommonfactors, model, .approach_weights = "GENVAR")
# Generalized structured component analysis (GSCA)
csem(threecommonfactors, model, .approach_weights = "GSCA")
# Principal component analysis (PCA)
csem(threecommonfactors, model, .approach_weights = "PCA")
# Factor score regression (FSR) using "unit", "bartlett" or "regression" weights
csem(threecommonfactors, model, .approach_weights = "unit")
csem(threecommonfactors, model, .approach_weights = "bartlett")
csem(threecommonfactors, model, .approach_weights = "regression")
```
# Literature
[^1]: In fact, labels such as PLS-PM and even more so PLS-SEM are misleading as they create
the impression that PLS(-PM) is somehow capable of more than other
composite-based approaches. While among composite-based approaches, methodological
research surrounding *composites formed using weights obtained by the PLS(-PM) algorithm*
is most advanced, the PLS algorithm remains a weighting scheme in its core.
[Assess]: https://floschuberth.github.io/cSEM/articles/Using-assess.html
[Lavaan]: https://lavaan.ugent.be/tutorial/index.html
[Notation]: https://floschuberth.github.io/cSEM/articles/Notation.html
[Testing]: https://floschuberth.github.io/cSEM/articles/Using-test.html
[Terminology]: https://floschuberth.github.io/cSEM/articles/Terminology.html