# [Workshop 5. Assessment of Survey Data Quality](https://wapor.org/events/annual-conference/current-conference/training-workshops/)

# Part D: Response styles

Author: <a href="mailto:Dimitri.Prandner@jku.at?subject=Regarding the WAPOR 2023 workshop">Dimitri Prandner @ JKU</a>

Date: September 19, 2023

## Requirements

To run the script without errors, the required data must be in the `data` directory and the relevant R packages must be installed and loaded. For the data, please review the DataDownload notebook; the R packages are taken care of by sourcing `install.R`.

In [None]:
source("install.R")

## Prepare the data

First, we load the data for our short experiment:

In [None]:
# Read data for UK and Austria
uk_ten <- readRDS("data/ESS10.Rdata") %>%
  dplyr::filter(cntry == "GB") 
at_ten <- readRDS("data/ESS10SC.Rdata") %>%
  dplyr::filter(cntry == "AT")

In the second step, we select the variables of interest:
- the ID numbers of the cases (`idno`),
- the country in which the data was collected (`cntry`),
- five variables that contain different assessments.

You can look up information on the data via the ESS homepage:

- [ESS10 Source Questionnaire](https://stessrelpubprodwe.blob.core.windows.net/data/round10/fieldwork/source/ESS10_source_questionnaires.pdf)
- [ESS10 Codebook](https://stessrelpubprodwe.blob.core.windows.net/data/round10/survey/ESS10_appendix_a7_e03_1.pdf)
- [ESS10 Source Questionnaire (Self completion, Paper)](https://stessrelpubprodwe.blob.core.windows.net/data/round10/fieldwork/source/ESS10_source_questionnaire_self_completion_paper.pdf)
- [ESS10 Codebook (Self completion)](https://stessrelpubprodwe.blob.core.windows.net/data/round10/survey/ESS10SC_appendix_a7_e03_0.pdf)

In [None]:
# Define variables for experiments
Exp_Vars <- c("gvhanc19", "gvjobc19", "gveldc19", "gvfamc19", "hscopc19")
# Define variables for subsetting
Subset_Vars <- c("idno", "cntry", Exp_Vars)

# Combine both datasets and select variables of interest
dataWS <- rbind(uk_ten[, Subset_Vars], 
                at_ten[, Subset_Vars])

# Print the merged dataset
head(dataWS, n=20)

In [None]:
# Create a table of counts for countries "AT" and "GB" to verify everything worked correctly
table(dataWS$cntry)

In [None]:
table(dataWS$gvhanc19,
      dataWS$cntry,
      useNA="always")

## Response styles

Now we derive some simple indicators of response styles.
This is a first experiment: we simply count how often someone agreed with a statement, used the extreme categories at the far ends of a scale, or selected the same answer on all items.
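
To make the three indicators concrete, here is a minimal sketch on a made-up data frame. The name `toy_items` and its values are invented for illustration and are not ESS data. For each respondent it counts the extreme answers (0 or 10), counts the answers at the top of the scale (10), and flags identical answers across all items.

```r
# Toy data: three hypothetical respondents answering three 0-10 items
toy_items <- data.frame(a = c(0, 5, 10),
                        b = c(10, 5, 10),
                        c = c(0, 6, 10))

# Number of extreme answers (0 or 10) per respondent
toy_extreme <- rowSums(toy_items == 0 | toy_items == 10)

# Number of answers at the top of the scale (10) per respondent
toy_yes <- rowSums(toy_items == 10)

# TRUE if a respondent gave the same answer on every item
toy_straight <- apply(toy_items, 1, function(x) min(x) == max(x))

# Show the items next to the three indicators
data.frame(toy_items, toy_extreme, toy_yes, toy_straight)
```

The experiments below apply the same logic to the five ESS items in `Exp_Vars`.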

### Experiment 1: Count extremes

In [None]:
# count "extremes" #
lapply(Exp_Vars, 
       function(VarName){
         sum(dataWS[VarName]  == 0 | # Extremely dissatisfied, disagree
             dataWS[VarName]  == 10, # Extremely satisfied, agree
             na.rm=TRUE)
       }) %>%
  setNames(Exp_Vars) 

### Experiment 2: Count yes

In [None]:
# Count "yes"
lapply(Exp_Vars, 
       function(VarName){
         sum(dataWS[VarName] == 10,
             na.rm=TRUE)
       }) %>%
  setNames(Exp_Vars)

### Experiment 3: Count extremes by case

In [None]:
# Function to count extremes by case
calculateExtremeValues <- function(data, Exp_Vars) {
  extreme_count <- rowSums(data[Exp_Vars] == 0 | data[Exp_Vars] == 10)
  data$ExtremeValues <- extreme_count
  return(data)
}

# Execute function
dataWS <- calculateExtremeValues(dataWS, Exp_Vars)

# Show first 20 cases of data
head(dataWS, n=20)
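
Note that `rowSums()` without `na.rm = TRUE` returns `NA` for every case with at least one missing item. Whether that is desirable depends on the analysis. The following sketch contrasts both variants on made-up data; the names `toy_missing`, `strict` and `lenient` are invented for illustration.

```r
# Toy data with missing answers (not ESS data)
toy_missing <- data.frame(a = c(0, 10, NA),
                          b = c(10, NA, NA))

# Logical matrix: TRUE where an answer is 0 or 10, NA where it is missing
is_extreme <- toy_missing == 0 | toy_missing == 10

# Strict: the count is NA as soon as one item is missing
strict <- rowSums(is_extreme)

# Lenient: count extremes among the answered items only
lenient <- rowSums(is_extreme, na.rm = TRUE)

data.frame(toy_missing, strict, lenient)
```

The lenient variant assigns a count of 0 to a respondent who answered nothing at all, so it may need an additional check on the number of valid answers.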

### Experiment 4: Count yes by case

In [None]:
# Function to count yes by case
calculateYesValues <- function(data, Exp_Vars) {
  yes_count <- rowSums(data[Exp_Vars] == 10)
  data$YesValues <- yes_count
  return(data)
}

# Execute function
dataWS <- calculateYesValues(dataWS, Exp_Vars)

# Show first 20 cases of data
head(dataWS, n=20)

### Categorize answering tendency

In [None]:
# Categorize extremes 
dataWS$ExtremeCat[dataWS$ExtremeValues<3] <- 0
dataWS$ExtremeCat[dataWS$ExtremeValues==3] <- 1
dataWS$ExtremeCat[dataWS$ExtremeValues==4] <- 2
dataWS$ExtremeCat[dataWS$ExtremeValues==5] <- 3
dataWS$ExtremeCat <-  factor(dataWS$ExtremeCat, 
                             levels = 0:3,
                             labels = c("no tendency", "some tendency", 
                                        "tendency", "complete tendency"),
                             ordered = TRUE)

# Categorize yes
dataWS$YesCat[dataWS$YesValues<3] <- 0
dataWS$YesCat[dataWS$YesValues==3] <- 1
dataWS$YesCat[dataWS$YesValues==4] <- 2
dataWS$YesCat[dataWS$YesValues==5] <- 3
dataWS$YesCat <- factor(dataWS$YesCat,
                        levels = 0:3,
                        labels = c("no tendency", "some tendency",
                                   "tendency", "complete tendency"),
                        ordered = TRUE)
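
The same recoding can be written more compactly with `cut()` from base R. This is a sketch under the assumption that the counts run from 0 to 5; `counts` is a made-up vector and `tendency` a hypothetical name, neither is part of the workshop data.

```r
# Made-up counts of extreme answers, including a missing value
counts <- c(0, 2, 3, 4, 5, NA)

# Intervals are right-closed by default: (-Inf, 2], (2, 3], (3, 4], (4, 5]
tendency <- cut(counts,
                breaks = c(-Inf, 2, 3, 4, 5),
                labels = c("no tendency", "some tendency",
                           "tendency", "complete tendency"),
                ordered_result = TRUE)

tendency
```

Missing counts stay `NA`, and the result is already an ordered factor, so no separate call to `factor()` is needed.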

### Experiment 5: Identify straightlining

Here we identify all cases that gave the same answer on all five variables. We do so by checking whether the minimum and the maximum of a respondent's answers are identical.

In [None]:
## flag cases whose minimum and maximum across the items are identical ##
dataWS$Straight <- apply(dataWS[Exp_Vars],1,
                         function(x) min(x) == max(x))
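
The comparison `min(x) == max(x)` returns `NA` as soon as one of the items is missing. Whether partially answered cases should count as straightlining is a judgment call. The sketch below shows one possible variant: the helper `is_straight()` and its argument `min_valid` are hypothetical and not part of the workshop script. It compares only the answered items and requires a minimum number of valid answers.

```r
# Hypothetical helper: TRUE if all non-missing answers are identical,
# NA if fewer than `min_valid` items were answered
is_straight <- function(x, min_valid = 2) {
  x <- x[!is.na(x)]
  if (length(x) < min_valid) return(NA)
  min(x) == max(x)
}

# Toy data (not ESS data): complete, partial, empty and varied answers
toy_na <- data.frame(a = c(5, 5, NA, 3),
                     b = c(5, NA, NA, 4),
                     c = c(5, 5, NA, 3))

toy_result <- apply(toy_na, 1, is_straight)
toy_result
```

Raising `min_valid` to the number of items would flag complete cases only.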

### Summary of response styles

First, we take a look at the newly created variables.

In [None]:
dataWS %>%
  dplyr::select(idno, ExtremeCat, YesCat, Straight) %>%
  datatable()

#### Extreme categories

Now, we check the distributions of the extreme categories by country.

In [None]:
table(dataWS$ExtremeCat,
      dataWS$cntry) %>%
  prop.table(2) %>%
  round(3)

Next, we test for differences across countries:

In [None]:
wilcox.test(as.numeric(ExtremeCat)~cntry, dataWS)

#### Yes categories

In [None]:
table(dataWS$YesCat,
      dataWS$cntry) %>%
  prop.table(2) %>%
  round(3)

Test for differences across countries:

In [None]:
wilcox.test(as.numeric(YesCat)~cntry, dataWS)

#### Straightlining

In [None]:
table(dataWS$Straight,
      dataWS$cntry) %>%
  prop.table(2) %>%
  round(3)

Test for differences across countries:

In [None]:
wilcox.test(as.numeric(Straight)~cntry, dataWS)
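
Because `Straight` is binary, a test on the two-by-two table of straightlining by country is a common alternative to the rank-based test. The following sketch uses `chisq.test()` from base R on made-up data; `toy_tests`, `tab` and `res` are invented names, and the counts are not ESS results.

```r
# Made-up data: 50 cases per country, 10 straightliners in AT and 20 in GB
toy_tests <- data.frame(
  cntry    = rep(c("AT", "GB"), each = 50),
  Straight = c(rep(c(TRUE, FALSE), times = c(10, 40)),
               rep(c(TRUE, FALSE), times = c(20, 30)))
)

# Cross-tabulate straightlining by country
tab <- table(toy_tests$Straight, toy_tests$cntry)

# Chi-squared test of independence (continuity-corrected for 2x2 tables)
res <- chisq.test(tab)
res$p.value
```

With small expected cell counts, `fisher.test(tab)` would be a possible substitute.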