In [1]:
import numpy as np
import pandas as pd
import os
from IPython.display import IFrame

One of the goals of this small project was to conduct a latent profile analysis to identify latent subpopulations (regarding vaccination attitudes) based on various sets of variables from the LISS panel. I consider the following sets of variables, which are also specified in the files `src/model_specs/lpa_var_set_x.json`:

#### first set:
- covid_vaccine_safe
- covid_vaccine_effective
- covid_health_concern
- confidence_science
- confidence_media
- trust_gov
- subj_effect_measures

#### second set:

Same as first plus

- p_2m_infected
- effect_mask
- effect_wash_hands

#### third set:

Same as second plus

- flu_vaccine_safe
- flu_vaccine_effective
- flu_health_concern

#### fourth set:

Same as second plus

- effect_pray
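
The nesting of the four sets can be written out explicitly. The snippet below is only an illustrative sketch: the names `SET_1` to `SET_4` and `var_sets` are made up here, and it does not reproduce the actual layout of the `src/model_specs/lpa_var_set_x.json` files, which is not shown in this notebook.

```python
# Illustrative only: the structure of the real JSON spec files is an assumption.
SET_1 = [
    "covid_vaccine_safe",
    "covid_vaccine_effective",
    "covid_health_concern",
    "confidence_science",
    "confidence_media",
    "trust_gov",
    "subj_effect_measures",
]

# The second set extends the first; the third and fourth both extend the second.
SET_2 = SET_1 + ["p_2m_infected", "effect_mask", "effect_wash_hands"]
SET_3 = SET_2 + ["flu_vaccine_safe", "flu_vaccine_effective", "flu_health_concern"]
SET_4 = SET_2 + ["effect_pray"]

var_sets = {1: SET_1, 2: SET_2, 3: SET_3, 4: SET_4}

for set_id, variables in var_sets.items():
    print(f"set {set_id}: {len(variables)} variables")
```

Note that the third and fourth sets are not nested in each other: both branch off the second set.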



### Short motivation

- The sets try to reflect the categories from the *Vaccine Confidence Inventory* (VCI) in `Rossen, Isabel, et al. "Accepters, fence sitters, or rejecters: Moral profiles of vaccination attitudes." Social Science & Medicine 224 (2019): 23-27.`, given the available variables in the LISS panel. 
- The VCI consists of five major concerns: (1) vaccines are unsafe, (2) vaccines are ineffective, (3) malevolence of government and pharmaceutical companies, (4) vaccines are unnatural/alternative remedies or a healthy lifestyle is sufficient, and (5) parents should retain the right to decide whether their child is vaccinated.
- Since the last category is not applicable in the LISS data, it was excluded. 
- Since there is no direct equivalent of the first four categories in our data, four different sets (above) have been defined. For illustration purposes (no formal publication at this point), I will give a quick walkthrough of the results for the first set of variables only.

In [2]:
# Build the path to the performance table relative to the project root
model_performances = pd.read_csv(
    os.path.join(os.path.abspath("../.."), "bld", "analysis", "lpa_var_set_1_performance.csv"),
    index_col=0,
)

In [3]:
model_performances

Unnamed: 0,Model,Classes,LogLik,AIC,AWE,BIC,CAIC,CLC,KIC,SABIC,ICL,Entropy,prob_min,prob_max,n_min,n_max,BLRT_val,BLRT_p
1,1,3,-29810.754878,59681.509756,60201.403996,59867.378043,59897.378043,59623.352091,59714.509756,59772.052976,-60084.626899,0.921167,0.950163,0.990502,0.102621,0.469241,2109.765384,0.009901
2,3,3,-29030.986841,58163.973681,59049.483764,58479.949769,58530.949769,58063.415774,58217.973681,58317.897154,-59603.275359,0.721046,0.782327,0.910362,0.104552,0.664,77.835568,0.009901
3,1,4,-29694.64283,59465.28566,60124.46783,59700.718823,59738.718823,59390.969817,59506.28566,59579.973737,-60466.424935,0.842078,0.717086,0.990752,0.102621,0.469241,2883.27095,0.009901
4,3,4,-28701.205702,57520.411403,58544.809606,57885.952368,57944.952368,57404.09513,57582.411403,57698.479735,-58607.639194,0.841863,0.763435,0.999293,0.102897,0.469517,659.819916,0.009901
5,1,5,-27549.01393,55190.02786,55988.061208,55475.0259,55521.0259,55099.990593,55239.02786,55328.860797,-55573.760515,0.981366,0.686765,1.0,0.02069,0.427586,222.351198,0.009901
6,3,5,-28557.523905,57249.047809,58412.679892,57664.15365,57731.15365,57116.627408,57319.047809,57451.260999,-58821.319985,0.789799,0.81276,0.896798,0.017103,0.480276,155.49627,0.009901
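
As a minimal sketch of how the candidate models in this table can be compared, the snippet below copies a few columns (rounded to two decimals) into a small DataFrame, ranks the rows by BIC, and recovers the number of free parameters from the standard definition AIC = 2k - 2 LogLik. The names `perf`, `ranked`, `best`, and `n_params` are introduced here for illustration and are not part of the project code.

```python
import pandas as pd

# Values copied from the table above, rounded to two decimals.
perf = pd.DataFrame(
    {
        "Model": [1, 3, 1, 3, 1, 3],
        "Classes": [3, 3, 4, 4, 5, 5],
        "LogLik": [-29810.75, -29030.99, -29694.64, -28701.21, -27549.01, -28557.52],
        "AIC": [59681.51, 58163.97, 59465.29, 57520.41, 55190.03, 57249.05],
        "BIC": [59867.38, 58479.95, 59700.72, 57885.95, 55475.03, 57664.15],
    }
)

# AIC = 2k - 2 * LogLik, hence k = (AIC + 2 * LogLik) / 2 (rounded to absorb rounding error).
perf["n_params"] = ((perf["AIC"] + 2 * perf["LogLik"]) / 2).round().astype(int)

# Lower BIC is better, so the first row after sorting is the preferred model.
ranked = perf.sort_values("BIC").reset_index(drop=True)
best = ranked.iloc[0]

print(ranked[["Model", "Classes", "n_params", "BIC"]])
print(f"Lowest BIC: Model {int(best['Model'])} with {int(best['Classes'])} classes")
```

For Model 1 the recovered parameter counts are 30, 38, and 46 for 3, 4, and 5 classes. This is consistent with seven means per class, seven shared variances, and one mixing proportion fewer than the number of classes.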


- Apparently, a Gaussian mixture model with five classes, equal variances, and covariances fixed to zero (i.e. Model = 1 in the table) fits the data best when BIC is used as the measure of performance (the lower the better; see the documentation of the `tidyLPA` R package for more details).
- The estimation of models with more than five classes fails in the `tidyLPA` package (which calls the `mclust` package for this task), which is probably a degrees-of-freedom issue. Thus, only 3, 4, and 5 classes were inspected.
- Let's have a look at the profile plot for the best model (a commonly used tool in the LPA literature):

In [4]:
IFrame("lpa_var_set_1_profile_plot.pdf", width=1000, height=600)

- The plot above shows the average responses to each of the normalized variables in set 1 for the five different profiles of participants. Error bars represent 95% confidence intervals.
- It becomes clear that in this dataset we have a more complex class structure (and more classes) than in `Rossen, Isabel, et al. (2019)`, where only three groups are described, labelled as (1) *vaccine accepters*, (2) *fence sitters*, and (3) *vaccine rejecters*.

- Group 3 (green) appears to be the least concerned about covid, but also the least confident in the safety and effectiveness of the vaccines, whereas it is only marginally below average in the other categories, such as trust in the government and confidence in the media. Apparently, it is also not too sceptical about science in general (in its own perception). One would expect its members to show low values for their intention to take a vaccine.

- Group 4 (purple) is extraordinarily worried about covid as a health concern and shows high trust in the safety and effectiveness of vaccines. In the other categories it also scores higher than any other group, but by a smaller margin. One would expect its members to show high values for their intention to take a vaccine.

- Group 2 (blue) shows a pattern highly similar to that of group 4, but starts from a lower level in the first categories.

- Group 1 (red) starts off with a pattern similar to that of group 2, but then drops sharply (even below group 3) on variables such as government trust or confidence in the media and in science. The large standard errors here, however, reflect high uncertainty due to the small number of participants in this group. A subsequent analysis should examine this group and its persistence in more detail.

- Group 5 (yellow) is somewhere in the middle, possibly indicating some scepticism towards, but not rejection of, vaccines against covid.
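
The quantities shown in a profile plot (class means of standardized indicators with 95% intervals) can be approximated along the following lines. This is a rough sketch on synthetic data: the `profile` column, the sample size, and the normal-approximation interval (mean ± 1.96 standard errors) are assumptions made here, and the plotting routine used for the figure may derive its error bars differently.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in data: two indicators and a hypothetical class assignment.
df = pd.DataFrame(
    {
        "covid_vaccine_safe": rng.normal(size=300),
        "trust_gov": rng.normal(size=300),
        "profile": rng.integers(1, 4, size=300),  # classes 1, 2, 3
    }
)
indicators = ["covid_vaccine_safe", "trust_gov"]

# Standardize each indicator over the whole sample (z-scores).
z = (df[indicators] - df[indicators].mean()) / df[indicators].std(ddof=0)
z["profile"] = df["profile"]

# Class means and standard errors of the standardized indicators.
grouped = z.groupby("profile")[indicators]
means = grouped.mean()
sem = grouped.sem()

# Normal-approximation 95% confidence intervals.
ci_low = means - 1.96 * sem
ci_high = means + 1.96 * sem

print(means.round(2))
```

Small classes produce large standard errors and therefore wide intervals, which is the effect described for group 1 above.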

Let us now explore some group averages on auxiliary variables (group membership as in the five groups above):

In [5]:
IFrame("lpa_aux_var_set_1_barplot.pdf", width=1000, height=1000)

- The values in the figure above are normalized such that, for each variable, the values sum to one across the different groups.
- As expected, members of group 4 (purple) show the highest average values for their intention to take a vaccine, and members of group 3 (green) show the lowest. There is no qualitative difference in the vaccination attitudes of participants between July and January.
- Less pronounced differences also persist across other auxiliary variables, but I will not go into detail at this point.
- No confidence intervals are provided, since this is a more exploratory approach (even though it is common in the LPA literature) and not completely statistically rigorous (due to the two dependent steps of the analysis).
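
The normalization used in the bar plot can be sketched as follows. The numbers and the column names `vaccine_intention_jan` and `age` are invented for illustration and are not taken from the LISS data. The sketch assumes non-negative group means, since otherwise the shares are hard to interpret.

```python
import pandas as pd

# Hypothetical group means of two auxiliary variables (made-up numbers).
group_means = pd.DataFrame(
    {
        "vaccine_intention_jan": [3.0, 4.0, 1.0, 5.0, 2.0],
        "age": [40.0, 50.0, 45.0, 55.0, 60.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="group"),
)

# Divide each column by its sum so that every variable sums to one across groups.
shares = group_means / group_means.sum(axis=0)

print(shares.round(3))
```

After this step the bars are comparable across variables measured on different scales, at the cost of losing the original units.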