Surveys - ESS #33

Closed
Tracked by #19
briatte opened this issue May 6, 2023 · 6 comments

briatte commented May 6, 2023

This one is complex enough to be its own issue…

Weighting guide

https://www.europeansocialsurvey.org/methodology/ess_methodology/data_processing_archiving/weighting.html
https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1_1.pdf

From the weighting guide, v1.1 (2020), page 7:

From round 9 onwards, all the necessary sample design indicators and weights are already included in the integrated (second release) data file, but if you are working with data from earlier rounds you will first need to merge the sample design indicators on to the main data file. For rounds 7 and 8, the sample design indicators are in the integrated SDDF (sample design data file), so you need to merge this file with the main integrated (questionnaire data) file. For rounds 1 to 6, sample design indicators are stored in a separate file for each country (and files are missing for some countries in some rounds), so you would need to merge several files. Furthermore, for these rounds the indicators psu and stratify have not been recoded in a manner suitable for cross-country analysis, so you will need to do this if you are analysing data from more than one country. Follow the guidance in section 2 of Kaminska & Lynn (2017) and ensure that each value is exclusive to one country.
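A minimal sketch of the last point in the quote, making `psu` and `stratify` values exclusive to one country. This is one simple way to do it (prefixing with the country code), not necessarily the exact recoding described by Kaminska & Lynn (2017). The toy data and the column names `psu_x` and `stratify_x` are made up for illustration.

```r
# Toy stand-in for stacked country SDDFs (values are invented)
sddf <- data.frame(
  cntry    = c("BE", "BE", "BE", "FR", "FR", "FR"),
  idno     = c(1, 2, 3, 1, 2, 3),
  psu      = c(1, 1, 2, 1, 2, 2),
  stratify = c(1, 2, 2, 1, 1, 2)
)

# Prefix each indicator with the country code so that no value is
# shared between countries (psu_x, stratify_x are hypothetical names)
sddf$psu_x      <- paste(sddf$cntry, sddf$psu, sep = "_")
sddf$stratify_x <- paste(sddf$cntry, sddf$stratify, sep = "_")

# Check: every recoded PSU value now belongs to exactly one country
n_countries <- tapply(sddf$cntry, sddf$psu_x, function(x) length(unique(x)))
```

If `n_countries` contains anything other than 1, some identifier is still shared across countries.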

The guide asks for the creation of anweight ('analytical weights') from the following variables:

# R, data.table syntax
data1[, anweight := pspwght * pweight * 10e3]
# Stata
# gen anweight=pspwght*pweight
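The R and Stata lines above differ by the constant `10e3`. A small base R sketch, on invented toy data, suggests why that should not matter for means and proportions: a constant multiplier on the weights cancels out (it would matter for estimated totals). The column name `anweight_k` is hypothetical.

```r
# Toy data: two countries, made-up weights and a binary outcome
ess <- data.frame(
  cntry   = c("BE", "BE", "FR", "FR"),
  pspwght = c(0.8, 1.2, 0.9, 1.1),
  pweight = c(0.5, 0.5, 2.0, 2.0),
  y       = c(1, 0, 1, 1)
)

# Plain product, as in the Stata line
ess$anweight <- ess$pspwght * ess$pweight
# Rescaled product, as in the data.table line (10e3 is 10000 in R)
ess$anweight_k <- ess$anweight * 10e3

# Weighted means under both versions of the weight
m1 <- weighted.mean(ess$y, ess$anweight)
m2 <- weighted.mean(ess$y, ess$anweight_k)
```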

Once anweight exists, the weighting guide recommends the following design:

# R
svydesign(ids = ~psu, strata = ~stratum, weights = ~anweight, data = data1)
# Stata
# svyset psu [pweight=anweight], strata(stratum)
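For reference, a self-contained sketch of that design on invented toy data, assuming the survey package is installed. The data frame `toy` and the outcome `y` are made up. PSU ids are unique across strata here, so `nest = TRUE` is not needed.

```r
library(survey)

# Toy data: 2 strata, 3 PSUs per stratum, 2 respondents per PSU
toy <- data.frame(
  stratum  = rep(c("s1", "s2"), each = 6),
  psu      = rep(1:6, each = 2),
  anweight = c(1.0, 1.2, 0.8, 0.9, 1.1, 1.0, 1.3, 0.7, 1.0, 1.0, 0.9, 1.1),
  y        = c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Design as in the weighting guide: PSUs as clusters, strata, anweight
design <- svydesign(ids = ~psu, strata = ~stratum, weights = ~anweight,
                    data = toy)

# Weighted mean with a design-based standard error
est <- svymean(~y, design)
```

The point estimate matches a plain weighted mean; only the standard error depends on `psu` and `stratum`.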

Details on analytical weights (ESS9+)

Quoting again from the weighting guide:

It is constructed by first deriving the design weight, then applying a post-stratification adjustment, and then a population size adjustment. Further details of how the weights are derived are documented in the round-specific report on the production of weights. Starting from Round 9, anweight is provided for you in the integrated data file. If you are using data from earlier ESS rounds, you can derive anweight yourself.

Full range of weighting variables, quoted from ESS9 codebook:

  • idno - Respondent's identification number
  • cntry - Country
  • dweight - Design weight
  • pspwght - Post-stratification weight including design weight
  • pweight - Population size weight (must be combined with dweight or pspwght)
  • anweight - Analysis weight
  • prob - Sampling probability
  • stratum - Sampling stratum
  • psu - Primary sampling unit

Notes:

  • pspwght includes dweight
  • anweight is just the product of pspwght and pweight
  • no obvious use for prob

Discussions

InductiveStep/R-notes#1
ropensci/essurvey#39
ropensci/essurvey#9 (comment)

The second link above recommends the following for ESS4:

svydesign(
  ids = ~ psu + idno, # further comment at the link: specifying just `psu` would be enough
  strata = ~ stratify,
  weights = ~ dweight,
  nest = TRUE,
  data = ess4gb
)

Example: Andi Fugard, ESS9

Intermediate Quantitative Social Research, Birkbeck, University of London (2017-2020)
https://inductivestep.github.io/R-notes/complex-surveys.html

Working on a multi-country example:

# using srvyr
as_survey_design(
  ids = idno, # instead of `psu` or `psu + idno` because `psu` is not in ESS9?
  strata = cntry,
  nest = TRUE,
  weights = pspwght
)

From the text:

The nest option takes account of the ids being nested within strata: in other words the same ID is used more than once across the dataset but only once in a country.
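A sketch of what `nest = TRUE` changes, using the survey package (which srvyr wraps) and invented toy data where the same PSU ids are reused in each country. To my understanding, `svydesign()` refuses ids that are reused across strata unless `nest = TRUE` is set; the error is captured here rather than assumed to have a particular wording.

```r
library(survey)

# Toy data: PSU ids 1 and 2 appear in both countries
toy <- data.frame(
  cntry   = rep(c("BE", "FR"), each = 4),
  psu     = rep(c(1, 1, 2, 2), times = 2),
  pspwght = rep(1, 8),
  y       = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Without nest = TRUE: capture the error message instead of stopping
no_nest <- tryCatch(
  svydesign(ids = ~psu, strata = ~cntry, weights = ~pspwght, data = toy),
  error = function(e) conditionMessage(e)
)

# With nest = TRUE: ids are relabelled within strata, so BE/1 and FR/1
# are treated as different clusters
nested <- svydesign(ids = ~psu, strata = ~cntry, weights = ~pspwght,
                    data = toy, nest = TRUE)
```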

Example: Federico Vegetti, ESS7

Introduction to Survey Statistics, University of Heidelberg, 2018
https://federicovegetti.github.io/teaching/heidelberg_2018/lab/sst_lab_day2.html

When working on countries separately:

# using srvyr
as_survey_design(weights = c(dweight, pspwght)) %>%
  group_by(cntry) %>%
  # etc.

# ... doesn't pspwght include dweight?
# ... what about stratum? psu?

When working on all countries together:

# using srvyr
as_survey(weights = c(dweight, pspwght, pweight))

Example: Daniel Oberski, ESS7

http://asdfree.com/european-social-survey-ess.html

Working on a single country (Belgium) after merging the data to the SDDF file:

svydesign(
  ids = ~psu ,
  strata = ~stratify,
  probs = ~prob,
  data = ess_df
)
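A sketch of what `probs = ~prob` does in `svydesign()`, on invented toy data: as far as I understand the survey package, sampling probabilities are turned into weights of `1 / prob`, so the two designs below should be equivalent. The column `w` is a hypothetical helper.

```r
library(survey)

# Toy data: 2 strata, 2 PSUs per stratum, made-up sampling probabilities
toy <- data.frame(
  psu      = rep(1:4, each = 2),
  stratify = rep(c("a", "b"), each = 4),
  prob     = c(0.01, 0.01, 0.02, 0.02, 0.01, 0.04, 0.02, 0.02),
  y        = c(1, 0, 1, 1, 0, 0, 1, 0)
)
toy$w <- 1 / toy$prob  # inverse-probability weight

# Design specified through probabilities
d_probs <- svydesign(ids = ~psu, strata = ~stratify, probs = ~prob, data = toy)
# Same design specified through the inverse-probability weight
d_weights <- svydesign(ids = ~psu, strata = ~stratify, weights = ~w, data = toy)

m_probs   <- svymean(~y, d_probs)
m_weights <- svymean(~y, d_weights)
```

Passing `prob` itself as `weights` would instead give more weight to respondents who were more likely to be sampled, which is the opposite of the intended correction.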
@briatte briatte added this to the v1.0 complete milestone May 6, 2023
@briatte briatte self-assigned this May 6, 2023

briatte commented May 8, 2023

ESS now featured in Session 12 via a spatial viz example.


briatte commented May 22, 2023


briatte commented May 25, 2023

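library(tidyverse) # tibble, stringr, dplyr, tidyr

# find all zipped ESS files below the working directory
z <- fs::dir_ls(glob = "*.zip", recurse = TRUE)
v <- tibble()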
for (i in z) {
  
  cat(fs::path_file(i))
  d <- unzip(i, exdir = tempdir())
  f <- str_subset(d, "dta$")
  cat(" ->", fs::path_file(f), "...\n")
  d <- haven::read_dta(f)
  n <- names(d)
  n <- n[ n %in% c("essround", "cntry", "psu", "idno", "stratify", "stratum",
                   "dweight", "pspwght", "pweight", "prob", "anweight") ]
  v <- bind_rows(v, tibble(file = f, n))
  
}

v %>% 
  mutate(file = fs::path_file(file)) %>% 
  pivot_wider(values_from = n, names_from = n) %>% 
  mutate(essround = as.integer(str_extract(file, "\\d+"))) %>% 
  arrange(essround)
# A tibble: 14 × 11
   file          essround idno  cntry dweight pspwght pweight anweight prob  stratum psu  
   <chr>            <int> <chr> <chr> <chr>   <chr>   <chr>   <chr>    <chr> <chr>   <chr>
 1 ESS1e06_6.dta        1 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 2 ESS4AT.dta           4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 3 ESS4LT.dta           4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 4 ESS4e04_5.dta        4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 5 ESS5ATe1_1.d…        5 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 6 ESS5e03_4.dta        5 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 7 ESS6e02_5.dta        6 idno  cntry dweight pspwght pweight anweight NA    NA      NA   
 8 ESS7SDDFe1_2…        7 idno  cntry NA      NA      NA      NA       prob  stratum psu  
 9 ESS7e02_2.dta        7 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
10 ESS8SDDFe01_…        8 idno  cntry NA      NA      NA      NA       prob  stratum psu  
11 ESS8e02_2.dta        8 idno  cntry dweight pspwght pweight anweight NA    NA      NA   
12 ESS9ROe01.dta        9 idno  cntry dweight pspwght pweight anweight prob  stratum psu  
13 ESS9e03_1.dta        9 idno  cntry dweight pspwght pweight anweight prob  stratum psu  
14 ESS10.dta           10 idno  cntry dweight pspwght pweight anweight prob  stratum psu
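The table above can be turned into a quick pre-flight check before building a design. A minimal base R sketch; the function name `missing_design_vars` and the toy data frames are hypothetical, and the list of required variables follows the weighting guide design (psu, stratum, anweight).

```r
# Variables needed for the weighting guide design
design_vars <- c("psu", "stratum", "anweight")

# Return the required variables that are absent from a data frame
missing_design_vars <- function(data, required = design_vars) {
  setdiff(required, names(data))
}

# Toy examples: a main file without design indicators, and a complete file
main_only <- data.frame(idno = 1, cntry = "BE", dweight = 1, pspwght = 1,
                        pweight = 1)
full <- data.frame(idno = 1, cntry = "BE", psu = 1, stratum = 1,
                   anweight = 1)

missing_main <- missing_design_vars(main_only)
missing_full <- missing_design_vars(full)
```

A non-empty result means the file still needs an SDDF merge, a derived `anweight`, or both.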


briatte commented May 26, 2023

Did some more tests, found weird things: gergness/srvyr#157

Best guess, based on weighting guide:

as_survey_design(ids = psu,
                 strata = c(cntry, stratum),
                 nest = TRUE,
                 weights = anweight)


briatte commented May 26, 2023

More tests with other designs. Conclusions:

  • Use psu and stratum for more accurate sampling error estimation
  • Use anweight for same reason
  • Using psu + idno is redundant with above
  • Using nest = TRUE seems optional, but use it just in case

library(srvyr)
library(tidyverse)

ess9_extract <- readr::read_rds("https://f.briatte.org/temp/ess9_extract.rds")

# Andi Fugard's design
ess9_af1 <- ess9_extract %>%
  as_survey_design(ids = idno, strata = cntry, nest = TRUE,
                   weights = pspwght)
# Fugard, using PSU
ess9_af2 <- ess9_extract %>%
  as_survey_design(ids = psu, strata = cntry, nest = TRUE,
                   weights = pspwght)

# weighting guide + cntry
ess9_wg1 <- ess9_extract %>%
  as_survey_design(ids = psu,
                   strata = c(cntry, stratum), # adding cntry
                   nest = TRUE,
                   weights = anweight)

# weighting guide, no cntry
ess9_wg2 <- ess9_extract %>%
  as_survey_design(ids = psu,
                   strata = stratum, # as recommended
                   nest = TRUE,
                   weights = anweight)

# Vegetti's design -- implicit `ids = idno`
ess9_mv1 <- ess9_extract %>%
  as_survey_design(weights = c(dweight, pspwght))
# Vegetti, using PSU
ess9_mv2 <- ess9_extract %>%
  as_survey_design(ids = psu, weights = c(dweight, pspwght))

# Oberski's design -- implicit `nest = TRUE`
# note: Oberski's original uses `probs = ~prob`; here `prob` is passed as `weights`
ess9_do <- ess9_extract %>%
  as_survey_design(ids = psu, strata = stratum, weights = prob)

# Stefan Zins' design
# https://github.com/ropensci/essurvey/issues/39#issuecomment-507855290
ess9_sz <- ess9_extract %>%
  as_survey_design(ids = psu, strata = stratum, weights = dweight)

# results -----------------------------------------------------------------

list("AF_idno" = ess9_af1, "AF_psu" = ess9_af2,
     "WG_cntry" = ess9_wg1, "WG_stratum" = ess9_wg2,
     "MV_idno" = ess9_mv1, "MV_psu" = ess9_mv2, "DO_psu" = ess9_do,
     "SZ_psu" = ess9_sz) %>%
  map_dfr(
    ~ .x %>%
      filter(cntry == "GB") %>%
      group_by(wltdffr_group) %>%
      summarise(prop = srvyr::survey_mean(vartype = "se")),
    .id = "design"
  ) %>%
  filter(wltdffr_group == "Fair") %>%
  arrange(-prop_se)
# A tibble: 8 × 4
  design     wltdffr_group  prop prop_se
  <chr>      <fct>         <dbl>   <dbl>
1 MV_psu     Fair          0.200  0.0204
2 MV_idno    Fair          0.200  0.0166
3 WG_cntry   Fair          0.196  0.0128
4 AF_psu     Fair          0.196  0.0128
5 WG_stratum Fair          0.196  0.0125
6 SZ_psu     Fair          0.190  0.0116
7 DO_psu     Fair          0.191  0.0104
8 AF_idno    Fair          0.196  0.0102
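In the results above, the design that ignores PSUs (AF_idno) reports the smallest standard error. A toy sketch of why that can happen, assuming the survey package: when respondents within a PSU resemble each other, treating them as independent understates the sampling error. The data are invented and deliberately extreme (identical answers within each PSU).

```r
library(survey)

# Toy data: 4 PSUs of 5 respondents, identical answers within each PSU
toy <- data.frame(
  psu = rep(1:4, each = 5),
  w   = 1,
  y   = rep(c(0, 0, 1, 1), each = 5)
)

# Design that ignores clustering (every respondent is their own unit)
d_ids1 <- svydesign(ids = ~1, weights = ~w, data = toy)
# Design that declares the PSUs as clusters
d_psu <- svydesign(ids = ~psu, weights = ~w, data = toy)

# Same point estimate, different standard errors
se_ids1 <- as.numeric(SE(svymean(~y, d_ids1)))
se_psu  <- as.numeric(SE(svymean(~y, d_psu)))
```

Real ESS data are far less clustered than this, so the gap is smaller, but the direction is the same as in the table.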

@briatte briatte closed this as completed May 26, 2023

briatte commented May 26, 2023

Availability of weighting vars:

  • ESS 9 and 10 include all required vars (psu, stratum, anweight)
  • ESS 7 and 8 require merging the main file with the integrated SDDF
  • ESS 6 has anweight but psu and stratum have to be retrieved from individual SDDFs
  • ESS 5 and below do not have anweight, so even more work required

… so, use ESS 9 or 10 in examples, or use 7 or 8 for one more example of a merge.
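A minimal sketch of the ESS 7 or 8 merge, in base R on invented toy data. The assumption (consistent with the variables listed above) is that both files share `cntry` and `idno`, and that `idno` alone is not unique across countries, so both columns are used as the key.

```r
# Toy stand-in for the main (questionnaire) file
main <- data.frame(
  cntry   = c("BE", "BE", "FR", "FR"),
  idno    = c(1, 2, 1, 2),
  pspwght = c(0.9, 1.1, 1.0, 1.0),
  pweight = c(0.5, 0.5, 2.0, 2.0)
)

# Toy stand-in for the integrated SDDF
sddf <- data.frame(
  cntry   = c("BE", "BE", "FR", "FR"),
  idno    = c(1, 2, 1, 2),
  psu     = c(11, 12, 21, 22),
  stratum = c(1, 1, 2, 2),
  prob    = c(0.01, 0.01, 0.02, 0.02)
)

# Left join on country + respondent id, keeping every main-file row
merged <- merge(main, sddf, by = c("cntry", "idno"), all.x = TRUE)

# Derive anweight where the main file does not provide it
merged$anweight <- merged$pspwght * merged$pweight
```

After the merge, it is worth checking that the row count is unchanged and that no respondent is left without `psu` or `stratum`.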
