(model-revision-notebook)=
# Model revision

```r
# Shared settings for the book (assumed to also load the dplyr verbs and the pipe used below)
source("_common.R")
suppressPackageStartupMessages({
    library("lavaan")
    library("effectsize")
})

set.seed(42)
```

{cite:t}`brown2015confirmatory` discusses several possible causes of poor fit of an EFA or CFA model to the data. In particular, the following are examined:

- the researcher specified the wrong number of latent common factors;
- an item is specified to load on a single common factor when it actually loads on several factors;
- an item is specified to load on the wrong common factor;
- there may be residual correlations that the model does not account for.

{cite:t}`brown2015confirmatory` shows how the researcher can use *modification indices* (MI) to diagnose the causes of poor fit.
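A modification index approximates how much the model χ² would decrease if a single fixed parameter (for example, a cross-loading or a residual covariance) were freely estimated; it is therefore usually compared with the critical value of a χ² distribution with 1 degree of freedom (about 3.84 at α = .05). The minimal sketch below shows how to ask `lavaan` for the largest MIs only. It uses the `HolzingerSwineford1939` data bundled with `lavaan`, not the data discussed in this chapter; the three-factor model and the cut-off are illustrative choices.

```r
library(lavaan)

# Classic three-factor CFA on the Holzinger & Swineford data shipped with lavaan
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_hs <- cfa(hs_model, data = HolzingerSwineford1939)

# Critical value of a chi-square with 1 df at alpha = .05 (about 3.84)
mi_crit <- qchisq(0.95, df = 1)

# Keep only MIs above the critical value, sorted in decreasing order, at most 5 rows
mi_top <- modindices(
  fit_hs,
  sort. = TRUE,
  minimum.value = mi_crit,
  maximum.number = 5
)
print(mi_top[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")])
```

The `epc` column gives the expected change of the parameter if it were freed (`sepc.all` is its fully standardized version); a large MI paired with a trivial EPC is of little substantive interest, so the two are best read together.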

## Too few factors

Poor fit may stem from having specified too few latent common factors. {cite:t}`brown2015confirmatory` discusses a case in which the goodness-of-fit indices of a one-factor model are compared with those of a two-factor model. The example uses data discussed earlier: eight personality measures collected from a sample of 250 patients who had completed a psychotherapy program. The scales are:

- anxiety (N1), 
- hostility (N2), 
- depression (N3), 
- self-consciousness (N4), 
- warmth (E1), 
- gregariousness (E2), 
- assertiveness (E3), 
- positive emotions (E4). 

We read the data into $\mathsf{R}$.

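```r
# Names of the eight scales
varnames <- c("N1", "N2", "N3", "N4", "E1", "E2", "E3", "E4")

# Standard deviations (not used below: the models are fitted to the correlation matrix)
sds <- c(5.7,  5.6,  6.4,  5.7,  6.0,  6.2,  5.7,  5.6)

# Lower triangle of the correlation matrix
cors <- '
 1.000
 0.767  1.000 
 0.731  0.709  1.000 
 0.778  0.738  0.762  1.000 
-0.351  -0.302  -0.356  -0.318  1.000 
-0.316  -0.280  -0.300  -0.267  0.675  1.000 
-0.296  -0.289  -0.297  -0.296  0.634  0.651  1.000 
-0.282  -0.254  -0.292  -0.245  0.534  0.593  0.566  1.000'

# getCov() expands the lower triangle into a full symmetric matrix
psychot_cor_mat <- getCov(cors, names = varnames)

# Sample size
n <- 250
```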

Suppose we fit the "wrong" model, with a single common factor. Here we run an *exploratory* factor analysis with the experimental `efa()` blocks of the `lavaan` model syntax, which are then fitted with `cfa()`.

```r
# 1-factor model
f1 <- '
  efa("efa")*f1 =~ N1 + N2 + N3 + N4 + E1 + E2 + E3 + E4
'
```

We fit the model to the data.

```r
# With a single factor the rotation has no effect; it is kept for symmetry with the 2-factor model
efa_f1 <-
  cfa(
    model = f1,
    sample.cov = psychot_cor_mat,
    sample.nobs = n,
    rotation = "oblimin"
  )
```

Now consider a two-factor model.

```r
# 2-factor model (both factors belong to the same EFA block)
f2 <- '
  efa("efa")*f1 +
  efa("efa")*f2 =~ N1 + N2 + N3 + N4 + E1 + E2 + E3 + E4
'
```

We fit the model to the data.

```r
efa_f2 <-
  cfa(
    model = f2,
    sample.cov = psychot_cor_mat,
    sample.nobs = n,
    rotation = "oblimin"
  )
```

We examine the goodness-of-fit indices.

```r
# Fit measures to compare (standard ML versions)
fit_measures <- c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr")

# Collect them for each model and tidy the table
rbind(
  fitmeasures(efa_f1, fit_measures),
  fitmeasures(efa_f2, fit_measures)
) %>%
  data.frame() %>%
  mutate(
    chisq  = round(chisq, digits = 0),
    df     = as.integer(df),
    pvalue = ifelse(pvalue < .001, "< .001", sprintf("%.3f", pvalue)),
    across(cfi:srmr, ~ round(.x, digits = 3)),
    model  = c("1 factor", "2 factors"),
    .before = 1
  )
```

| model | chisq | df | pvalue | cfi | tli | rmsea | srmr |
|---|---|---|---|---|---|---|---|
| 1 factor | 375 | 20 | < .001 | 0.710 | 0.594 | 0.267 | 0.187 |
| 2 factors | 10 | 13 | 0.709 | 1.000 | 1.006 | 0.000 | 0.010 |


```r
effectsize::interpret(efa_f1)
```

| Name | Value | Threshold | Interpretation |
|---|---|---|---|
| GFI | 0.6713421 | 0.95 | poor |
| AGFI | 0.4084158 | 0.9 | poor |
| NFI | 0.700646 | 0.9 | poor |
| NNFI | 0.5941736 | 0.9 | poor |
| CFI | 0.710124 | 0.9 | poor |
| RMSEA | 0.2665811 | 0.05 | poor |
| SRMR | 0.1873289 | 0.08 | poor |
| RFI | 0.5809044 | 0.9 | poor |
| PNFI | 0.5004614 | 0.5 | satisfactory |
| IFI | 0.7120036 | 0.9 | poor |


```r
effectsize::interpret(efa_f2)
```

| Name | Value | Threshold | Interpretation |
|---|---|---|---|
| GFI | 0.990554109 | 0.95 | satisfactory |
| AGFI | 0.973842148 | 0.9 | satisfactory |
| NFI | 0.992174918 | 0.9 | satisfactory |
| NNFI | 1.005603388 | 0.9 | satisfactory |
| CFI | 1.0 | 0.9 | satisfactory |
| RMSEA | 0.0 | 0.05 | satisfactory |
| SRMR | 0.009907613 | 0.08 | satisfactory |
| RFI | 0.983145977 | 0.9 | satisfactory |
| PNFI | 0.46065264 | 0.5 | poor |
| IFI | 1.002570123 | 0.9 | satisfactory |


The results show that, in an EFA, the two-factor solution fits the data adequately (χ² = 10, df = 13, p = .709; RMSEA = 0; SRMR = .010), whereas the one-factor solution does not (χ² = 375, df = 20, p < .001; RMSEA = .267; SRMR = .187).
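Because the one-factor model is a special case of the two-factor model, the two can also be compared with the usual χ² difference test. The sketch below refits both models so that it is self-contained (it assumes a recent `lavaan` version that supports `efa()` blocks in the model syntax) and computes the difference from `fitMeasures()`.

```r
library(lavaan)

# Correlation matrix of the eight personality scales (N = 250), as above
varnames <- c("N1", "N2", "N3", "N4", "E1", "E2", "E3", "E4")
cors <- '
 1.000
 0.767  1.000
 0.731  0.709  1.000
 0.778  0.738  0.762  1.000
-0.351  -0.302  -0.356  -0.318  1.000
-0.316  -0.280  -0.300  -0.267  0.675  1.000
-0.296  -0.289  -0.297  -0.296  0.634  0.651  1.000
-0.282  -0.254  -0.292  -0.245  0.534  0.593  0.566  1.000'
psychot_cor_mat <- getCov(cors, names = varnames)

# One- and two-factor exploratory (ESEM) models
f1 <- 'efa("efa")*f1 =~ N1 + N2 + N3 + N4 + E1 + E2 + E3 + E4'
f2 <- '
  efa("efa")*f1 +
  efa("efa")*f2 =~ N1 + N2 + N3 + N4 + E1 + E2 + E3 + E4
'
efa_f1 <- cfa(f1, sample.cov = psychot_cor_mat, sample.nobs = 250, rotation = "oblimin")
efa_f2 <- cfa(f2, sample.cov = psychot_cor_mat, sample.nobs = 250, rotation = "oblimin")

# Chi-square difference: restricted (1-factor) minus general (2-factor) model
chisq_diff <- as.numeric(fitMeasures(efa_f1, "chisq") - fitMeasures(efa_f2, "chisq"))
df_diff    <- as.numeric(fitMeasures(efa_f1, "df") - fitMeasures(efa_f2, "df"))
p_diff     <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)

c(chisq_diff = chisq_diff, df_diff = df_diff, p = p_diff)
```

`lavTestLRT(efa_f1, efa_f2)` should report the same statistic in tabular form.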

## Misspecified relations between indicators and latent factors

Another potential source of misspecification in a CFA model is an incorrect specification of which indicators load on which latent factors.

In this example, a researcher developed a 12-item questionnaire (items rated on 0–8 scales) designed to assess young adults' motives for drinking alcoholic beverages (Cooper, 1994). The measure was intended to assess three aspects of this construct (4 items each): (1) coping motives (items 1–4), (2) social motives (items 5–8), and (3) enhancement motives (items 9–12). The data are as follows.

```r
# Standard deviations of the 12 items
sds <- c(2.06, 1.52, 1.92, 1.41, 1.73, 1.77, 2.49, 2.27, 2.68, 1.75, 2.57, 2.66)

# Lower triangle of the correlation matrix
cors <- '
  1.000 
  0.300  1.000 
  0.229  0.261  1.000 
  0.411  0.406  0.429  1.000 
  0.172  0.252  0.218  0.481  1.000 
  0.214  0.268  0.267  0.579  0.484  1.000 
  0.200  0.214  0.241  0.543  0.426  0.492  1.000 
  0.185  0.230  0.185  0.545  0.463  0.548  0.522  1.000 
  0.134  0.146  0.108  0.186  0.122  0.131  0.108  0.151  1.000 
  0.134  0.099  0.061  0.223  0.133  0.188  0.105  0.170  0.448  1.000 
  0.160  0.131  0.158  0.161  0.044  0.124  0.066  0.061  0.370  0.350  1.000 
  0.087  0.088  0.101  0.198  0.077  0.177  0.128  0.112  0.356  0.359  0.507  1.000'

# Rescale the correlations into a covariance matrix using the standard deviations
covs <- getCov(cors, sds = sds, names = paste0("x", 1:12))
```

We start with a model positing three correlated latent common factors, consistent with the rationale behind the development of the instrument.

```r
# Three correlated factors: coping (x1-x4), social (x5-x8), enhancement (x9-x12) motives
model1 <- '
  copingm  =~ x1 + x2 + x3 + x4
  socialm  =~ x5 + x6 + x7 + x8
  enhancem =~ x9 + x10 + x11 + x12
'
```
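Before fitting, it can help to count the degrees of freedom of `model1`. The sketch below assumes `lavaan`'s default scaling, in which the first loading of each factor is fixed to 1; the mean structure added by `mimic = "mplus"` contributes as many observed means as free intercepts and therefore leaves the df unchanged.

```r
# Degrees of freedom of model1: observed moments minus free parameters
p <- 12                                # number of items
n_moments <- p * (p + 1) / 2           # unique variances and covariances
n_loadings   <- 12 - 3                 # one marker loading per factor is fixed to 1
n_resid_var  <- 12                     # one residual variance per item
n_factor_var <- 3                      # factor variances
n_factor_cov <- 3                      # covariances among the three factors
n_free <- n_loadings + n_resid_var + n_factor_var + n_factor_cov
df_model1 <- n_moments - n_free
c(moments = n_moments, free = n_free, df = df_model1)
```

Freeing one cross-loading (`model2`) and then one residual covariance (`model3`) removes one df each, which is consistent with the 50 and 49 df reported by `lavTestLRT()` further on.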

We fit the model to the data.

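```r
# mimic = "mplus" adds a mean structure; since only a covariance matrix is supplied,
# lavaan issues the warning shown below
fit1 <- cfa(
  model1, 
  sample.cov = covs, 
  sample.nobs = 500, 
  mimic = "mplus"
)
```

```
sample.mean= argument is missing, but model contains
mean/intercept parameters.
```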


Judging from the fit measures alone, we might conclude that the model is adequate.

```r
effectsize::interpret(fit1)
```

| Name | Value | Threshold | Interpretation |
|---|---|---|---|
| GFI | 0.97009178 | 0.95 | satisfactory |
| AGFI | 0.94722078 | 0.9 | satisfactory |
| NFI | 0.94785001 | 0.9 | satisfactory |
| NNFI | 0.97102541 | 0.9 | satisfactory |
| CFI | 0.97761054 | 0.9 | satisfactory |
| RMSEA | 0.03745791 | 0.05 | satisfactory |
| SRMR | 0.03438699 | 0.08 | satisfactory |
| RFI | 0.93251177 | 0.9 | satisfactory |
| PNFI | 0.73242955 | 0.5 | satisfactory |
| IFI | 0.97781875 | 0.9 | satisfactory |


However, a closer look reveals anomalous behavior of item `x4`, as well as some anomalous features of the model as a whole.

```r
standardizedSolution(fit1)
```

| lhs | op | rhs | est.std | se | z | pvalue | ci.lower | ci.upper |
|---|---|---|---|---|---|---|---|---|
| copingm | =~ | x1 | 0.43164226 | 0.03913432 | 11.029763 | 0.0 | 0.3549404 | 0.50834412 |
| copingm | =~ | x2 | 0.43575459 | 0.03899692 | 11.174078 | 0.0 | 0.359322 | 0.51218714 |
| copingm | =~ | x3 | 0.4512378 | 0.03847027 | 11.729519 | 0.0 | 0.3758375 | 0.52663815 |
| copingm | =~ | x4 | 0.9532301 | 0.02446252 | 38.966968 | 0.0 | 0.9052845 | 1.00117576 |
| socialm | =~ | x5 | 0.6332428 | 0.03156182 | 20.063571 | 0.0 | 0.5713828 | 0.69510283 |
| socialm | =~ | x6 | 0.74779892 | 0.02546753 | 29.362831 | 0.0 | 0.6978835 | 0.79771437 |
| socialm | =~ | x7 | 0.68994737 | 0.02856498 | 24.153608 | 0.0 | 0.633961 | 0.74593371 |
| socialm | =~ | x8 | 0.72872974 | 0.02648066 | 27.519319 | 0.0 | 0.6768286 | 0.78063089 |
| enhancem | =~ | x9 | 0.60199381 | 0.03863571 | 15.58128 | 0.0 | 0.5262692 | 0.67771841 |
| enhancem | =~ | x10 | 0.5973762 | 0.03879788 | 15.397135 | 0.0 | 0.5213338 | 0.67341864 |


In particular, item `x4` has a very strong loading on the coping-motives factor (.953), and the correlation between the coping-motives and social-motives factors is very high (.798).

{cite:t}`brown2015confirmatory` suggests examining the modification indices. This reveals a very large MI (18.916) for the loading of `x4` on `socialm`, which the model fixes to zero.
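To focus on a single item, the output of `modindices()` can be filtered like any data frame. The sketch below rebuilds the covariance matrix and `model1` so that it is self-contained, then keeps only the MIs for the loadings (`op == "=~"`) of `x4`. The filtering step is our own illustration, not part of Brown's presentation; since `x4` loads only on `copingm` in `model1`, the remaining candidate loadings are those on `socialm` and `enhancem`.

```r
library(lavaan)

# Item standard deviations and correlations (Cooper, 1994), as above
sds <- c(2.06, 1.52, 1.92, 1.41, 1.73, 1.77, 2.49, 2.27, 2.68, 1.75, 2.57, 2.66)
cors <- '
  1.000
  0.300  1.000
  0.229  0.261  1.000
  0.411  0.406  0.429  1.000
  0.172  0.252  0.218  0.481  1.000
  0.214  0.268  0.267  0.579  0.484  1.000
  0.200  0.214  0.241  0.543  0.426  0.492  1.000
  0.185  0.230  0.185  0.545  0.463  0.548  0.522  1.000
  0.134  0.146  0.108  0.186  0.122  0.131  0.108  0.151  1.000
  0.134  0.099  0.061  0.223  0.133  0.188  0.105  0.170  0.448  1.000
  0.160  0.131  0.158  0.161  0.044  0.124  0.066  0.061  0.370  0.350  1.000
  0.087  0.088  0.101  0.198  0.077  0.177  0.128  0.112  0.356  0.359  0.507  1.000'
covs <- getCov(cors, sds = sds, names = paste0("x", 1:12))

model1 <- '
  copingm  =~ x1 + x2 + x3 + x4
  socialm  =~ x5 + x6 + x7 + x8
  enhancem =~ x9 + x10 + x11 + x12
'
fit1 <- cfa(model1, sample.cov = covs, sample.nobs = 500, mimic = "mplus")

# All MIs, largest first; then keep only the candidate loadings of x4
mi_all <- modindices(fit1, sort. = TRUE)
mi_x4  <- subset(mi_all, op == "=~" & rhs == "x4")
print(mi_x4[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")])
```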

```r
modindices(fit1)
```

| | lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox |
|---|---|---|---|---|---|---|---|---|
| 46 | copingm | =~ | x5 | 0.030005989 | -0.029864968 | -0.026528841 | -0.015349946 | -0.015349946 |
| 47 | copingm | =~ | x6 | 0.483607013 | 0.126715141 | 0.112560167 | 0.063657017 | 0.063657017 |
| 48 | copingm | =~ | x7 | 0.779713184 | 0.220132468 | 0.195542120 | 0.078609654 | 0.078609654 |
| 49 | copingm | =~ | x8 | 1.962276016 | -0.323442154 | -0.287311386 | -0.126695694 | -0.126695694 |
| 50 | copingm | =~ | x9 | 0.101119531 | 0.044105439 | 0.039178551 | 0.014633498 | 0.014633498 |
| 51 | copingm | =~ | x10 | 2.016179576 | 0.128686040 | 0.114310902 | 0.065385941 | 0.065385941 |
| 52 | copingm | =~ | x11 | 1.870017679 | -0.181368830 | -0.161108654 | -0.062750981 | -0.062750981 |
| 53 | copingm | =~ | x12 | 0.039774895 | -0.027386556 | -0.024327285 | -0.009154759 | -0.009154759 |
| 54 | socialm | =~ | x1 | 6.926785939 | -0.520304949 | -0.569429069 | -0.276698865 | -0.276698865 |
| 55 | socialm | =~ | x2 | 0.052053171 | -0.033339419 | -0.036487130 | -0.024028734 | -0.024028734 |


These observations suggest that the model may not adequately describe the relations between `x4` and the latent common factors. Suppose that, on theoretical grounds, it makes sense for `x4` to load not only on the coping-motives factor but also on the social-motives factor. We therefore specify a new model as follows.

```r
model2 <- '
  copingm  =~ x1 + x2 + x3 + x4
  socialm  =~ x4 + x5 + x6 + x7 + x8
  enhancem =~ x9 + x10 + x11 + x12
'
```

We fit the model.

```r
fit2 <- cfa(
  model2, 
  sample.cov = covs, 
  sample.nobs = 500, 
  mimic = "mplus"
)
```

```
sample.mean= argument is missing, but model contains
mean/intercept parameters.
```


We examine the goodness-of-fit indices.

```r
effectsize::interpret(fit2)
```

| Name | Value | Threshold | Interpretation |
|---|---|---|---|
| GFI | 0.97684139 | 0.95 | satisfactory |
| AGFI | 0.95831451 | 0.9 | satisfactory |
| NFI | 0.95826773 | 0.9 | satisfactory |
| NNFI | 0.98393923 | 0.9 | satisfactory |
| CFI | 0.98783275 | 0.9 | satisfactory |
| RMSEA | 0.02788804 | 0.05 | satisfactory |
| SRMR | 0.02887855 | 0.08 | satisfactory |
| RFI | 0.9449134 | 0.9 | satisfactory |
| PNFI | 0.7259604 | 0.5 | satisfactory |
| IFI | 0.98795337 | 0.9 | satisfactory |


Goodness of fit has improved (for example, RMSEA drops from .037 to .028 and CFI rises from .978 to .988).

We examine the standardized solution. The two anomalies found earlier have now disappeared: `x4` has moderate loadings on both `copingm` (.538) and `socialm` (.439).

```r
standardizedSolution(fit2)
```

| lhs | op | rhs | est.std | se | z | pvalue | ci.lower | ci.upper |
|---|---|---|---|---|---|---|---|---|
| copingm | =~ | x1 | 0.5136919 | 0.04268561 | 12.034311 | 0.0 | 0.43002967 | 0.59735419 |
| copingm | =~ | x2 | 0.5149158 | 0.04265451 | 12.071778 | 0.0 | 0.43131452 | 0.59851714 |
| copingm | =~ | x3 | 0.5160398 | 0.04262607 | 12.106202 | 0.0 | 0.43249424 | 0.59958536 |
| copingm | =~ | x4 | 0.5380264 | 0.0621268 | 8.660133 | 0.0 | 0.41626007 | 0.65979267 |
| socialm | =~ | x4 | 0.438997 | 0.06093489 | 7.204362 | 5.830891e-13 | 0.3195668 | 0.5584272 |
| socialm | =~ | x5 | 0.6318747 | 0.0316009 | 19.995465 | 0.0 | 0.56993804 | 0.69381129 |
| socialm | =~ | x6 | 0.7464621 | 0.02549441 | 29.279444 | 0.0 | 0.69649398 | 0.79643023 |
| socialm | =~ | x7 | 0.6905804 | 0.02849475 | 24.235357 | 0.0 | 0.63473172 | 0.74642908 |
| socialm | =~ | x8 | 0.7308592 | 0.0263256 | 27.762304 | 0.0 | 0.67926197 | 0.78245641 |
| enhancem | =~ | x9 | 0.6026163 | 0.0385664 | 15.625422 | 0.0 | 0.52702753 | 0.67820505 |


Examining the MIs, we note that the model could be improved by allowing the residuals (unique factors) of `x11` and `x12` to correlate.
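The MIs for residual covariances can be isolated in the same way, by keeping the rows with `op == "~~"` that involve two different observed variables. A minimal sketch on `model2`, rebuilding the data so that it is self-contained; it lists the largest candidates without reproducing Brown's own output.

```r
library(lavaan)

# Item standard deviations and correlations (Cooper, 1994), as above
sds <- c(2.06, 1.52, 1.92, 1.41, 1.73, 1.77, 2.49, 2.27, 2.68, 1.75, 2.57, 2.66)
cors <- '
  1.000
  0.300  1.000
  0.229  0.261  1.000
  0.411  0.406  0.429  1.000
  0.172  0.252  0.218  0.481  1.000
  0.214  0.268  0.267  0.579  0.484  1.000
  0.200  0.214  0.241  0.543  0.426  0.492  1.000
  0.185  0.230  0.185  0.545  0.463  0.548  0.522  1.000
  0.134  0.146  0.108  0.186  0.122  0.131  0.108  0.151  1.000
  0.134  0.099  0.061  0.223  0.133  0.188  0.105  0.170  0.448  1.000
  0.160  0.131  0.158  0.161  0.044  0.124  0.066  0.061  0.370  0.350  1.000
  0.087  0.088  0.101  0.198  0.077  0.177  0.128  0.112  0.356  0.359  0.507  1.000'
covs <- getCov(cors, sds = sds, names = paste0("x", 1:12))

# model2: x4 loads on both coping and social motives
model2 <- '
  copingm  =~ x1 + x2 + x3 + x4
  socialm  =~ x4 + x5 + x6 + x7 + x8
  enhancem =~ x9 + x10 + x11 + x12
'
fit2 <- cfa(model2, sample.cov = covs, sample.nobs = 500, mimic = "mplus")

# Observed variable names, used to exclude rows involving latent factors
ov <- lavNames(fit2, type = "ov")

# MIs for residual covariances between two different items, largest first
mi_all   <- modindices(fit2, sort. = TRUE)
mi_resid <- subset(mi_all, op == "~~" & lhs %in% ov & rhs %in% ov & lhs != rhs)
print(head(mi_resid[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")], 5))
```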

```r
modindices(fit2)
```

| | lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox |
|---|---|---|---|---|---|---|---|---|
| 47 | copingm | =~ | x5 | 0.075879281 | 0.032464745 | 0.034320003 | 0.019858025 | 0.019858025 |
| 48 | copingm | =~ | x6 | 1.412911873 | 0.143137105 | 0.151316941 | 0.085575383 | 0.085575383 |
| 49 | copingm | =~ | x7 | 0.244623644 | 0.083197038 | 0.087951488 | 0.035357264 | 0.035357264 |
| 50 | copingm | =~ | x8 | 3.668026217 | -0.294652924 | -0.311491413 | -0.137358226 | -0.137358226 |
| 51 | copingm | =~ | x9 | 0.242920085 | 0.065568420 | 0.069315449 | 0.025889887 | 0.025889887 |
| 52 | copingm | =~ | x10 | 0.566100546 | 0.065434430 | 0.069173802 | 0.039567476 | 0.039567476 |
| 53 | copingm | =~ | x11 | 0.119109121 | -0.043934458 | -0.046445174 | -0.018090153 | -0.018090153 |
| 54 | copingm | =~ | x12 | 0.598292941 | -0.101897430 | -0.107720548 | -0.040537007 | -0.040537007 |
| 55 | socialm | =~ | x1 | 1.947767101 | -0.395605353 | -0.244629140 | -0.118870915 | -0.118870915 |
| 56 | socialm | =~ | x2 | 0.718049453 | 0.177418638 | 0.109709761 | 0.072249732 | 0.072249732 |


The new model is therefore the following.

```r
model3 <- '
  copingm  =~ x1 + x2 + x3 + x4
  socialm  =~ x4 + x5 + x6 + x7 + x8
  enhancem =~ x9 + x10 + x11 + x12
  # residual covariance between x11 and x12
  x11 ~~ x12
'
```

We fit the model.

```r
fit3 <- cfa(
  model3, 
  sample.cov = covs, 
  sample.nobs = 500, 
  mimic = "mplus"
)
```

```
sample.mean= argument is missing, but model contains
mean/intercept parameters.
```


A likelihood-ratio (χ² difference) test confirms that the improvement in fit is statistically significant (Δχ² = 24.49, Δdf = 1, p < .001).

```r
lavTestLRT(fit2, fit3)
```

| | Df | AIC | BIC | Chisq | Chisq diff | RMSEA | Df diff | Pr(>Chisq) |
|---|---|---|---|---|---|---|---|---|
| fit3 | 49 | 23934.34 | 24107.14 | 44.95535 | | | | |
| fit2 | 50 | 23956.83 | 24125.41 | 69.44357 | 24.48823 | 0.2167405 | 1 | 7.476528e-07 |
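The quantities in this table are linked by simple identities, which the sketch below checks using the rounded values printed above (N = 500). Since Δχ² equals −2 times the difference in log-likelihood between the two nested models, AIC = −2 logL + 2k and BIC = −2 logL + k ln N give ΔAIC = Δχ² − 2Δdf and ΔBIC = Δχ² − ln(N)Δdf. The formula used for the RMSEA of the difference test, with N rather than N − 1 in the denominator, is our reconstruction: it reproduces the value in the table but should not be taken as lavaan's documented definition.

```r
# Rounded values from the lavTestLRT() table above
n_obs      <- 500
chisq_fit2 <- 69.44357   # model2
chisq_fit3 <- 44.95535   # model3
df_fit2    <- 50
df_fit3    <- 49

chisq_diff <- chisq_fit2 - chisq_fit3
df_diff    <- df_fit2 - df_fit3

# p-value of the chi-square difference test
p_value <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)

# RMSEA of the difference test (reconstruction that matches the table, using N)
rmsea_diff <- sqrt(max(0, (chisq_diff - df_diff) / (n_obs * df_diff)))

# Information-criterion differences (model2 minus model3)
aic_diff <- chisq_diff - 2 * df_diff
bic_diff <- chisq_diff - log(n_obs) * df_diff

signif(c(chisq_diff = chisq_diff, p = p_value, rmsea = rmsea_diff,
         aic_diff = aic_diff, bic_diff = bic_diff), 4)
```

Both criteria are lower for `model3` (AIC 23934.34 vs 23956.83; BIC 24107.14 vs 24125.41), in line with the test.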


We examine the goodness-of-fit indices.

```r
out <- summary(fit3, fit.measures = TRUE)
print(out)
```

```
lavaan 0.6.15 ended normally after 61 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        41

  Number of observations                           500

Model Test User Model:

  Test statistic                                44.955
  Degrees of freedom                                49
  P-value (Chi-square)                           0.638

Model Test Baseline Model:

  Test statistic                              1664.026
  Degrees of freedom                                66
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.003

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11926.170
  Loglikelihood unrestricted model (H1)     -119
```

The fit indices have improved further (χ² = 44.955, df = 49, p = .638; CFI = 1.000; TLI = 1.003).

We examine the standardized solution.

```r
standardizedSolution(fit3)
```

| lhs | op | rhs | est.std | se | z | pvalue | ci.lower | ci.upper |
|---|---|---|---|---|---|---|---|---|
| copingm | =~ | x1 | 0.5137373 | 0.0427527 | 12.016486 | 0.0 | 0.42994349 | 0.59753101 |
| copingm | =~ | x2 | 0.5150194 | 0.0427208 | 12.055473 | 0.0 | 0.43128819 | 0.59875065 |
| copingm | =~ | x3 | 0.5144101 | 0.04273594 | 12.036943 | 0.0 | 0.43064919 | 0.598171 |
| copingm | =~ | x4 | 0.5396994 | 0.06268938 | 8.609105 | 0.0 | 0.41683052 | 0.66256838 |
| socialm | =~ | x4 | 0.4377501 | 0.06140696 | 7.128672 | 1.013412e-12 | 0.31739467 | 0.55810553 |
| socialm | =~ | x5 | 0.63195 | 0.03159184 | 20.003584 | 0.0 | 0.57003112 | 0.69386884 |
| socialm | =~ | x6 | 0.746493 | 0.02548552 | 29.290865 | 0.0 | 0.69654231 | 0.79644372 |
| socialm | =~ | x7 | 0.6901567 | 0.02851159 | 24.206176 | 0.0 | 0.63427496 | 0.74603836 |
| socialm | =~ | x8 | 0.7311852 | 0.0263013 | 27.800344 | 0.0 | 0.67963558 | 0.78273478 |
| enhancem | =~ | x9 | 0.6691959 | 0.04083337 | 16.388454 | 0.0 | 0.58916395 | 0.74922784 |


There are no further causes for concern. {cite:t}`brown2015confirmatory` concludes that `model3` is the most appropriate model.

In my view, however, in this case the residual correlation between `x11` and `x12` could have been left out, since `model2`, which is less tied to idiosyncrasies of the sample, had already proven adequate.

## Loading on the wrong factor

{cite:t}`brown2015confirmatory` also considers the opposite case, in which the researcher specifies a spurious loading. For the data under discussion, this situation corresponds to the following model, in which `x12` is specified to load on `socialm` instead of `enhancem`.

```r
model4 <- '
  copingm  =~ x1 + x2 + x3 + x4
  socialm  =~ x4 + x5 + x6 + x7 + x8 + x12
  enhancem =~ x9 + x10 + x11
'
```

We fit the model to the data.

```r
fit4 <- cfa(
  model4, 
  sample.cov = covs, 
  sample.nobs = 500, 
  mimic = "mplus"
)
```

```
sample.mean= argument is missing, but model contains
mean/intercept parameters.
```


We examine the resulting solution.

```r
out <- summary(fit4, fit.measures = TRUE)
print(out)
```

```
lavaan 0.6.15 ended normally after 59 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        40

  Number of observations                           500

Model Test User Model:

  Test statistic                               212.717
  Degrees of freedom                                50
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                              1664.026
  Degrees of freedom                                66
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.898
  Tucker-Lewis Index (TLI)                       0.866

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -12010.051
  Loglikelihood unrestricted model (H1)     -119
```

`model4` is clearly inadequate (χ² = 212.717, df = 50, p < .001; CFI = .898; TLI = .866). The problem also emerges clearly from the MIs.

```r
modindices(fit4)
```

| | lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox |
|---|---|---|---|---|---|---|---|---|
| 47 | copingm | =~ | x5 | 0.09017503 | 0.03562312 | 0.03784791 | 0.021899323 | 0.021899323 |
| 48 | copingm | =~ | x6 | 0.55421214 | 0.09003505 | 0.09565805 | 0.054098226 | 0.054098226 |
| 49 | copingm | =~ | x7 | 0.10746410 | 0.05543532 | 0.05889745 | 0.023677281 | 0.023677281 |
| 50 | copingm | =~ | x8 | 3.91880191 | -0.30573892 | -0.32483339 | -0.143241711 | -0.143241711 |
| 51 | copingm | =~ | x12 | 6.10941671 | 0.49879335 | 0.52994476 | 0.199427006 | 0.199427006 |
| 52 | copingm | =~ | x9 | 0.38979797 | -0.09562061 | -0.10159245 | -0.037945578 | -0.037945578 |
| 53 | copingm | =~ | x10 | 0.02663662 | -0.01608649 | -0.01709114 | -0.009776150 | -0.009776150 |
| 54 | copingm | =~ | x11 | 0.82324384 | 0.12309854 | 0.13078648 | 0.050940643 | 0.050940643 |
| 55 | socialm | =~ | x1 | 1.99011638 | -0.39751600 | -0.25053040 | -0.121738462 | -0.121738462 |
| 56 | socialm | =~ | x2 | 0.63818689 | 0.16639198 | 0.10486685 | 0.069060382 | 0.069060382 |


The MI for the loading of `x12` on `enhancem` is 116.781. Clearly, this problem should be addressed when revising the model.
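As a further check, `model4` can be compared with `model2`, in which `x12` loads on `enhancem`. The two models are not nested but have the same number of free parameters (both have 50 df), so a χ² difference test does not apply, while their AIC difference (and likewise their BIC difference) reduces to the difference between their χ² values. The sketch below uses the χ² values reported earlier in this section and assumes, as is the case here, that both models were fitted by ML to the same covariance matrix.

```r
# Chi-square values and df reported above for model2 and model4 (N = 500)
n_obs      <- 500
chisq_fit2 <- 69.44357
chisq_fit4 <- 212.717
df_fit2    <- 50
df_fit4    <- 50

# AIC(model4) - AIC(model2) = (chi2_4 - chi2_2) - 2 * (df_4 - df_2)
aic_diff <- (chisq_fit4 - chisq_fit2) - 2 * (df_fit4 - df_fit2)
# BIC(model4) - BIC(model2) = (chi2_4 - chi2_2) - log(N) * (df_4 - df_2)
bic_diff <- (chisq_fit4 - chisq_fit2) - log(n_obs) * (df_fit4 - df_fit2)

c(aic_diff = aic_diff, bic_diff = bic_diff)
```

A positive difference means that `model4` has the larger (worse) criterion value.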

## Comments and final remarks {-}

The examples discussed by {cite:t}`brown2015confirmatory` show how the psychologist can use MIs, together with an examination of the factor solution, to improve the proposed model.
