Many QOG variables have low-N for cross-sectional analysis #29

Open
Tracked by #31
briatte opened this issue Jan 30, 2021 · 0 comments

library(tidyverse)
d <- haven::read_dta('/Users/fr/Documents/Teaching/SRQM/data/qog2019.dta')

tibble(
  var = names(d),
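  # data source prefix: everything up to the first underscore, e.g. "wdi_"
  # (variables without an underscore get NA and are dropped below)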
  src = str_extract(names(d), ".*?_"),
  n = apply(d, 2, function(x) sum(!is.na(x)))
) %>% 
  group_by(src) %>% 
  summarise(n_vars = n(), min_N = min(n), med_N = median(n), max_N = max(n)) %>%
  arrange(min_N) %>% 
  # arbitrary threshold at N = 50
  filter(!is.na(src), min_N < 50) %>% 
  print(n = 100)

PSI, EU, OECD, WWBI and a few others are particularly at fault:

# A tibble: 28 x 5
   src     n_vars min_N med_N max_N
   <chr>    <int> <int> <dbl> <int>
 1 psi_         6     1  10.5    20
 2 mad_         4    15  29     163
 3 eu_        277    16  34      48
 4 une_        47    16 146     193
 5 wwbi_       38    17  41      62
 6 oecd_      281    19  37      44
 7 wdi_       278    19 156     192
 8 dev_         4    20  20      20
 9 dpi_        70    26 160.    175
10 bs_          8    28  28      28
11 ess_         9    28  28      28
12 ideavt_      6    28 107     180
13 wel_        36    29  32     189
14 wvs_        42    29  34      34
15 aid_         6    31 139     139
16 cses_        2    31  31.5    32
17 gol_        20    33 127     129
18 wiid_       18    34  35      35
19 ucdp_        2    35  70     105
20 cpds_       49    36  36      36
21 h_          11    37 165     185
22 lis_        23    37  37      37
23 r_           5    40  98     144
24 sgi_        29    41  41      41
25 top_         2    41  41      41
26 nelda_      10    44  45      45
27 vi_         13    45  48      50
28 qs_          9    47 112     115

Not a bug, but it leads students to build research designs with low sample sizes.
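
The summary above works per data source; a per-variable listing might make it easier to see exactly which variables to steer students away from. A minimal sketch, assuming the same arbitrary N = 50 cutoff: `low_n_vars` is a hypothetical helper name, and `toy` is made-up data standing in for the QOG cross-section (base R only, so it runs without the tidyverse).

```r
# Toy data standing in for the QOG cross-section (made-up values)
toy <- data.frame(
  cname   = c("A", "B", "C", "D"),
  wdi_gdp = c(1.2, 3.4, NA, 5.6),
  psi_x   = c(NA, NA, NA, 0.5),
  eu_y    = c(NA, 2, NA, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list variables with fewer than `min_n`
# non-missing observations, sorted from lowest to highest N
low_n_vars <- function(data, min_n = 50) {
  n <- vapply(data, function(x) sum(!is.na(x)), integer(1))
  out <- data.frame(var = names(n), n = unname(n), stringsAsFactors = FALSE)
  out <- out[out$n < min_n, ]
  out <- out[order(out$n), ]
  rownames(out) <- NULL
  out
}

# Threshold lowered to 3 only because the toy data has 4 rows
low_n_vars(toy, min_n = 3)
```

On the real data, something like `low_n_vars(d)` would list every variable under the default cutoff of 50, which could then be excluded or flagged in the course codebook.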

@briatte briatte self-assigned this Jan 30, 2021