Surveys - ESS #33

briatte · 2023-05-06T09:35:49Z

This one is complex enough to be its own issue…

Weighting guide

https://www.europeansocialsurvey.org/methodology/ess_methodology/data_processing_archiving/weighting.html
https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1_1.pdf

From the weighting guide, v1.1 (2020), page 7:

From round 9 onwards, all the necessary sample design indicators and weights are already included in the integrated (second release) data file, but if you are working with data from earlier rounds you will first need to merge the sample design indicators on to the main data file. For rounds 7 and 8, the sample design indicators are in the integrated SDDF (sample design data file), so you need to merge this file with the main integrated (questionnaire data) file. For rounds 1 to 6, sample design indicators are stored in a separate file for each country (and files are missing for some countries in some rounds), so you would need to merge several files. Furthermore, for these rounds the indicators psu and stratify have not been recoded in a manner suitable for cross-country analysis, so you will need to do this if you are analysing data from more than one country. Follow the guidance in section 2 of Kaminska & Lynn (2017) and ensure that each value is exclusive to one country.
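A minimal sketch of the merge-and-recode step described in that passage, using toy data in base R. The data frames `main` and `sddf` and all their values are invented for illustration. It assumes that both files share `cntry` and `idno`, and that prefixing `psu` and `stratify` with the country code is an acceptable way to make each value exclusive to one country (Kaminska & Lynn's guidance may prescribe a different recoding).

```r
# Toy stand-ins for the main questionnaire file and the SDDF (hypothetical values)
main <- data.frame(
  cntry   = c("BE", "BE", "GB", "GB"),
  idno    = c(1, 2, 1, 2),
  pspwght = c(0.9, 1.1, 1.2, 0.8)
)
sddf <- data.frame(
  cntry    = c("BE", "BE", "GB", "GB"),
  idno     = c(1, 2, 1, 2),
  psu      = c(10, 11, 10, 12),
  stratify = c(1, 1, 1, 2)
)

# idno is only unique within a country, so merge on both keys
merged <- merge(main, sddf, by = c("cntry", "idno"), all.x = TRUE)

# Prefix the design indicators with the country code so that no value
# is shared between countries
merged$psu_x      <- paste(merged$cntry, merged$psu, sep = "_")
merged$stratify_x <- paste(merged$cntry, merged$stratify, sep = "_")

# Each recoded PSU should now belong to exactly one country
psu_countries <- tapply(merged$cntry, merged$psu_x,
                        function(x) length(unique(x)))
all(psu_countries == 1)
```

In this toy data, PSU `10` exists in both countries before recoding and becomes `BE_10` and `GB_10` afterwards.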

The guide asks for the creation of anweight ('analytical weights') from the following variables:

# R, data.table syntax
data1[, anweight := pspwght * pweight * 10e3]
# Stata
# gen anweight=pspwght*pweight

Once anweight exists, the weighting guide specifies the following design:

# R
svydesign(ids = ~psu, strata = ~stratum, weights = ~anweight, data = data1)
# Stata
# svyset psu [pweight=anweight], strata(stratum)

Details on analytical weights (ESS9+)

Quoting again from the weighting guide:

It is constructed by first deriving the design weight, then applying a post-stratification adjustment, and then a population size adjustment. Further details of how the weights are derived are documented in the round-specific report on the production of weights. Starting from Round 9, anweight is provided for you in the integrated data file. If you are using data from earlier ESS rounds, you can derive anweight yourself.

Full range of weighting variables, quoted from ESS9 codebook:

idno - Respondent's identification number
cntry - Country
dweight - Design weight
pspwght - Post-stratification weight including design weight
pweight - Population size weight (must be combined with dweight or pspwght)
anweight - Analysis weight
prob - Sampling probability
stratum - Sampling stratum
psu - Primary sampling unit

Notes:

pspwght includes dweight
anweight is just the product of pspwght and pweight
no obvious use for prob

Discussions

InductiveStep/R-notes#1
ropensci/essurvey#39
ropensci/essurvey#9 (comment)

The second link above (ropensci/essurvey#39) recommends the following for ESS4:

svydesign(
  ids = ~ psu + idno, # further comment at the link: specifying just `psu` would be enough
  strata = ~ stratify,
  weights = ~ dweight,
  nest = TRUE,
  data = ess4gb
)

Example: Andi Fugard, ESS9

Intermediate Quantitative Social Research, Birkbeck, University of London (2017-2020)
https://inductivestep.github.io/R-notes/complex-surveys.html

Working on a multi-country example:

# using srvyr
as_survey_design(
  ids = idno, # instead of `psu` or `psu + idno` because `psu` is not in ESS9?
  strata = cntry,
  nest = TRUE,
  weights = pspwght
)

From the text:

The nest option takes account of the ids being nested within strata: in other words the same ID is used more than once across the dataset but only once in a country.
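A small base R illustration of that point, on invented data: `idno` repeats across countries but is unique within each one, which is the situation `nest = TRUE` is meant to handle.

```r
# Hypothetical respondents: the same idno values appear in both countries
d <- data.frame(
  cntry = c("BE", "BE", "BE", "GB", "GB", "GB"),
  idno  = c(1, 2, 3, 1, 2, 3)
)

# idno alone is duplicated across the pooled data ...
dup_overall <- anyDuplicated(d$idno) > 0

# ... but the (cntry, idno) pair identifies each respondent uniquely
dup_within <- anyDuplicated(d[, c("cntry", "idno")]) > 0

c(overall = dup_overall, within_country = dup_within)
```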

Example: Federico Vegetti, ESS7

Introduction to Survey Statistics, University of Heidelberg, 2018
https://federicovegetti.github.io/teaching/heidelberg_2018/lab/sst_lab_day2.html

When working on countries separately:

# using srvyr
as_survey_design(weights = c(dweight, pspwght)) %>%
  group_by(cntry) %>%
  # etc.

# ... doesn't pspwght include dweight?
# ... what about stratum? psu?

When working on all countries together:

# using srvyr
as_survey(weights = c(dweight, pspwght, pweight))

Example: Daniel Oberski, ESS7

http://asdfree.com/european-social-survey-ess.html

Working on a single country (Belgium) after merging the data to the SDDF file:

svydesign(
  ids = ~psu ,
  strata = ~stratify,
  probs = ~prob,
  data = ess_df
)


briatte · 2023-05-08T18:29:52Z

ESS now featured in Session 12 via a spatial viz example.

briatte · 2023-05-22T22:45:28Z

Try this, using lmer to get predicted probabilities: https://github.com/halhen/viz-pub/tree/master/ess-political-expression

briatte · 2023-05-25T19:34:22Z

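library(tidyverse)

# list all zip archives below the working directory ("*.zip" is a glob, not a regex)
z <- fs::dir_ls(glob = "*.zip", recurse = TRUE)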
v <- tibble()
for (i in z) {
  
  cat(fs::path_file(i))
  d <- unzip(i, exdir = tempdir())
  f <- str_subset(d, "dta$")
  cat(" ->", fs::path_file(f), "...\n")
  d <- haven::read_dta(f)
  n <- names(d)
  n <- n[ n %in% c("essround", "cntry", "psu", "idno", "stratify", "stratum",
                   "dweight", "pspwght", "pweight", "prob", "anweight") ]
  v <- bind_rows(v, tibble(file = f, n))
  
}

v %>% 
  mutate(file = fs::path_file(file)) %>% 
  pivot_wider(values_from = n, names_from = n) %>% 
  mutate(essround = as.integer(str_extract(file, "\\d+"))) %>% 
  arrange(essround)

# A tibble: 14 × 11
   file          essround idno  cntry dweight pspwght pweight anweight prob  stratum psu  
   <chr>            <int> <chr> <chr> <chr>   <chr>   <chr>   <chr>    <chr> <chr>   <chr>
 1 ESS1e06_6.dta        1 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 2 ESS4AT.dta           4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 3 ESS4LT.dta           4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 4 ESS4e04_5.dta        4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 5 ESS5ATe1_1.d…        5 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 6 ESS5e03_4.dta        5 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 7 ESS6e02_5.dta        6 idno  cntry dweight pspwght pweight anweight NA    NA      NA   
 8 ESS7SDDFe1_2…        7 idno  cntry NA      NA      NA      NA       prob  stratum psu  
 9 ESS7e02_2.dta        7 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
10 ESS8SDDFe01_…        8 idno  cntry NA      NA      NA      NA       prob  stratum psu  
11 ESS8e02_2.dta        8 idno  cntry dweight pspwght pweight anweight NA    NA      NA   
12 ESS9ROe01.dta        9 idno  cntry dweight pspwght pweight anweight prob  stratum psu  
13 ESS9e03_1.dta        9 idno  cntry dweight pspwght pweight anweight prob  stratum psu  
14 ESS10.dta           10 idno  cntry dweight pspwght pweight anweight prob  stratum psu

briatte · 2023-05-26T13:38:30Z

Did some more tests, found weird things: gergness/srvyr#157

Best guess, based on weighting guide:

as_survey_design(ids = psu,
                 strata = c(cntry, stratum),
                 nest = TRUE,
                 weights = anweight)

briatte · 2023-05-26T14:14:51Z

More tests with other designs. Conclusions:

Use psu and stratum for more accurate sampling error estimation
Use anweight for the same reason
Using psu + idno is redundant with the above
Using nest = TRUE seems optional, but use it just in case

library(srvyr)
library(tidyverse)

ess9_extract <- readr::read_rds("https://f.briatte.org/temp/ess9_extract.rds")

# Andi Fugard's design
ess9_af1 <- ess9_extract %>%
  as_survey_design(ids = idno, strata = cntry, nest = TRUE,
                   weights = pspwght)
# Fugard, using PSU
ess9_af2 <- ess9_extract %>%
  as_survey_design(ids = psu, strata = cntry, nest = TRUE,
                   weights = pspwght)

# weighting guide + cntry
ess9_wg1 <- ess9_extract %>%
  as_survey_design(ids = psu,
                   strata = c(cntry, stratum), # adding cntry
                   nest = TRUE,
                   weights = anweight)

# weighting guide, no cntry
ess9_wg2 <- ess9_extract %>%
  as_survey_design(ids = psu,
                   strata = stratum, # as recommended
                   nest = TRUE,
                   weights = anweight)

# Vegetti's design -- implicit `ids = idno`
ess9_mv1 <- ess9_extract %>%
  as_survey_design(weights = c(dweight, pspwght))
# Vegetti, using PSU
ess9_mv2 <- ess9_extract %>%
  as_survey_design(ids = psu, weights = c(dweight, pspwght))

# Oberski's design -- implicit `nest = TRUE`
# (note: the asdfree example passes `probs = ~prob`; here `prob` is passed as `weights`)
ess9_do <- ess9_extract %>%
  as_survey_design(ids = psu, strata = stratum, weights = prob)

# Stefan Zins' design
# https://github.com/ropensci/essurvey/issues/39#issuecomment-507855290
ess9_sz <- ess9_extract %>%
  as_survey_design(ids = psu, strata = stratum, weights = dweight)

# results -----------------------------------------------------------------

list("AF_idno" = ess9_af1, "AF_psu" = ess9_af2,
     "WG_cntry" = ess9_wg1, "WG_stratum" = ess9_wg2,
     "MV_idno" = ess9_mv1, "MV_psu" = ess9_mv2, "DO_psu" = ess9_do,
     "SZ_psu" = ess9_sz) %>%
  map_dfr(
    ~ .x %>%
      filter(cntry == "GB") %>%
      group_by(wltdffr_group) %>%
      summarise(prop = srvyr::survey_mean(vartype = "se")),
    .id = "design"
  ) %>%
  filter(wltdffr_group == "Fair") %>%
  arrange(-prop_se)

# A tibble: 8 × 4
  design     wltdffr_group  prop prop_se
  <chr>      <fct>         <dbl>   <dbl>
1 MV_psu     Fair          0.200  0.0204
2 MV_idno    Fair          0.200  0.0166
3 WG_cntry   Fair          0.196  0.0128
4 AF_psu     Fair          0.196  0.0128
5 WG_stratum Fair          0.196  0.0125
6 SZ_psu     Fair          0.190  0.0116
7 DO_psu     Fair          0.191  0.0104
8 AF_idno    Fair          0.196  0.0102
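To make the comparison easier to read, the standard errors in that table can be expressed relative to the weighting-guide design (`WG_stratum`). The sketch below re-enters the figures from the table by hand in base R. No new estimates are computed.

```r
# Standard errors copied from the results table
res <- data.frame(
  design  = c("MV_psu", "MV_idno", "WG_cntry", "AF_psu",
              "WG_stratum", "SZ_psu", "DO_psu", "AF_idno"),
  prop_se = c(0.0204, 0.0166, 0.0128, 0.0128,
              0.0125, 0.0116, 0.0104, 0.0102)
)

# Ratio of each design's SE to the weighting-guide design's SE:
# values below 1 mean a smaller reported sampling error than WG_stratum
ref <- res$prop_se[res$design == "WG_stratum"]
res$se_ratio <- round(res$prop_se / ref, 2)
res
```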

briatte · 2023-05-26T14:19:06Z

Availability of weighting vars:

ESS 9 and 10 have the required vars
ESS 7 and 8 require merging with the SDDF
ESS 6 has anweight, but psu and stratum have to be retrieved from individual SDDFs
ESS 5 and below do not have anweight, so even more work is required

… so, use ESS 9 or 10 in examples, or use 7 or 8 for one more example of a merge.
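A quick way to see what a given file still needs is to compare its column names against the variables used in the weighting-guide design. The helper below is a hypothetical sketch in base R (`missing_design_vars` is not part of any package), and the empty toy data frames only mimic the column sets of the ESS 8 main file and the ESS 9 file listed in the availability table.

```r
# Variables used by the weighting-guide design
required <- c("psu", "stratum", "anweight")

# Hypothetical helper: which required variables are absent from a data frame?
missing_design_vars <- function(data, vars = required) {
  setdiff(vars, names(data))
}

# Empty data frames whose columns mimic an ESS 8 main file and an ESS 9 file
ess8_like <- data.frame(idno = integer(), cntry = character(),
                        dweight = numeric(), pspwght = numeric(),
                        pweight = numeric(), anweight = numeric())
ess9_like <- cbind(ess8_like,
                   data.frame(prob = numeric(), stratum = numeric(),
                              psu = numeric()))

missing_design_vars(ess8_like)  # psu and stratum absent: merge with the SDDF first
missing_design_vars(ess9_like)  # nothing missing
```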

briatte added data surveys labels May 6, 2023

briatte added this to the v1.0 complete milestone May 6, 2023

briatte self-assigned this May 6, 2023

briatte mentioned this issue May 6, 2023

Surveys and survey weighting #19

Open

7 tasks

briatte closed this as completed May 26, 2023
