# WPA #4 - Chapter 10

## Why do we overestimate others' willingness to pay?

<img src="https://virginialord.files.wordpress.com/2008/03/shack1.jpg" height="200" />

In this WPA, we will analyze data from Matthews et al. (2016): Why do we overestimate others' willingness to pay? The purpose of this research was to test whether our beliefs about other people's affluence (i.e., wealth) affect how much we think they will be willing to pay for items. You can find the full paper at http://journal.sjdm.org/15/15909/jdm15909.pdf.

### Study 1

We will analyze data from their first study. In Study 1, participants first estimated the proportion of other survey participants who have more money than they do. They then indicated, for each of 10 items, whether other people would be willing to pay more for it than they would.

The following table shows the 10 products and the proportion of participants who indicated that others would be willing to pay more for each product than they would (Table 1 in Matthews et al., 2016).

Product Number| Product | Reported p(other > self)
------------  | ------- | --------
1             | A freshly-squeezed glass of apple juice | .695
2             | A Parker ballpoint pen | .863
3             | A pair of Bose noise-cancelling headphones | .705
4             | A voucher giving dinner for two at Applebee's | .853
5             | A 16 oz jar of Planters dry-roasted peanuts | .774
6             | A one-month movie pass | .800
7             | An Ikea desk lamp | .863
8             | A Casio digital watch | .900
9             | A large, ripe pineapple| .674
10             | A handmade wooden chess set | .732

**Table 1**: Proportion of participants who indicated that the "typical participant" would pay more than they would for each product in Study 1.

**Study 1 variables description**

Here are descriptions of the data variables (taken from the authors' dataset notes, available at http://journal.sjdm.org/15/15909/Notes.txt):

- `id`: participant id code
- `gender`: participant gender. 1 = male, 2 = female
- `age`: participant age
- `income`: participant's annual household income, on a categorical scale with 8 options: Less than 5,000; 15,001–25,000; 25,001–35,000; 35,001–50,000; 50,001–75,000; 75,001–100,000; 100,001–150,000; greater than 150,000.
- `p1` to `p10`: whether the "typical" survey respondent would pay more (coded 1) or less (coded 0) than oneself, for each of the 10 products
- `task`: whether the participant had to judge the proportion of other people who "have more money than you do" (coded 1) or the proportion who "have less money than you do" (coded 0)
- `havemore`: participant's response when task = 1
- `haveless`: participant's response when task = 0
- `pcmore`: participant's estimate of the proportion of people who have more than they do (calculated as `100 - haveless` when `task = 0`)
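The sketch below shows how `pcmore` relates to `task`, `havemore`, and `haveless`. It uses a hypothetical three-row data frame, `toy.df`, whose values are copied from rows of the `head()` output further down. The notes only describe the `task = 0` case, so the code also assumes that `pcmore` simply equals `havemore` when `task = 1`.

```r
# Hypothetical mini data set with the same columns as matthews.df
toy.df <- data.frame(task     = c(0, 0, 1),
                     havemore = c(NA, NA, 99),
                     haveless = c(50, 25, NA))

# task = 1: the answer is already "% who have more" (assumption)
# task = 0: convert "% who have less" into "% who have more"
toy.df$pcmore <- ifelse(toy.df$task == 1,
                        toy.df$havemore,
                        100 - toy.df$haveless)

toy.df$pcmore   # 50 75 99
```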

## A. Revision: Loading and Saving data

1. If you created an R project last week (I recommended calling it `RCourse` or something similar), open it again. If you did not, set your working directory with `setwd()` to the location of the `RCourse` folder you created last week. This working directory should contain at least two folders: `data` and `R`.

2. Open a new R script and save it as **wpa_4_LastFirst.R** in the **R** folder.

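3. The data are stored at https://github.com/laurafontanesi/RcourseSpring2019/blob/master/data/data_wpa4.csv. Using `read.csv()`, load the data into a new object called `matthews.df`. If you read the file directly from GitHub, use its raw URL (the "Raw" button on that page). Passing the page URL above to `read.csv()` reads the HTML page, not the CSV file.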

4. Using `write.table()`, save the data as a tab-delimited text file called `matthews.txt` in the `data` folder of your working directory.

5. `R` also has its own file format in which you can save data. Using `save()`, save a copy of the `matthews.df` data in the `.Rdata` format.

6. Look at the first few rows of `matthews.df` using `head()`, and inspect it with `View()` and `str()`.

7. Clear your workspace by running `rm(list = ls())`. You can check that it worked by re-running `head()`, `View()`, or `str()` on `matthews.df`: you should now get an error.

8. Reload the Matthews data using either `load()` or `read.table()`. Remember to use the correct file extension.
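Steps 4 to 8 can be tried end to end on a small scale before running them on the real data. The sketch below uses a hypothetical data frame, `toy.save.df`, and writes to temporary files (`tempdir()`), so nothing in your `data` folder is touched. It shows one difference between the two reload methods. `load()` puts the object back under its original name and returns that name. `read.table()` returns a new data frame, which you have to assign yourself.

```r
# Hypothetical toy data frame standing in for matthews.df
toy.save.df <- data.frame(id = c("a", "b", "c"), age = c(26L, 32L, 25L))

# Temporary files, so the sketch does not touch your data folder
txt.file   <- file.path(tempdir(), "toy.txt")
rdata.file <- file.path(tempdir(), "toy.Rdata")

write.table(x = toy.save.df, file = txt.file, sep = "\t")  # tab-delimited text
save(toy.save.df, file = rdata.file)                        # R's own format

rm(toy.save.df)          # remove the object from the workspace
exists("toy.save.df")    # FALSE

# load() restores the object under its original name and returns that name
restored <- load(file = rdata.file)
restored                 # "toy.save.df"

# read.table() returns a data frame that you must assign yourself;
# write.table() also wrote row names, which read.table() uses as row names
toy.from.txt <- read.table(file = txt.file, sep = "\t", header = TRUE)
```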

```r
# For example (replace with the path to your own course folder):
setwd("~/git/RcourseSpring2019/")
```

```r
matthews.df <- read.csv(file = "data/data_wpa4.csv", 
                        header = TRUE)
```

```r
write.table(x = matthews.df, 
            file = "data/matthews.txt", 
            sep = "\t")
```

```r
save(matthews.df, 
     file = "data/matthews.Rdata")
```

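```r
head(matthews.df)
```

| id | gender | age | income | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | p10 | task | havemore | haveless | pcmore |
|----|--------|-----|--------|----|----|----|----|----|----|----|----|----|-----|------|----------|----------|--------|
| R_3PtNn51LmSFdLNM | 2 | 26 | 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | NA | 50.0 | 50 |
| R_2AXrrg62pgFgtMV | 2 | 32 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | NA | 25.0 | 75 |
| R_cwEOX3HgnMeVQHL | 1 | 25 | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | NA | 10.0 | 90 |
| R_d59iPwL4W6BH8qx | 1 | 33 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | NA | 50.0 | 50 |
| R_1f3K2HrGzFGNelZ | 1 | 24 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 99.0 | NA | 99 |
| R_3oN5ijzTfoMy4ca | 1 | 22 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | NA | 20.0 | 80 |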


```r
rm(list = ls())
head(matthews.df)
```

```
Error in head(matthews.df): object 'matthews.df' not found
```


```r
load(file = "data/matthews.Rdata")

# or

matthews.df <- read.table(file = "data/matthews.txt", 
                          sep = "\t",
                          header = TRUE)
```

## B. Data Frame Manipulation

9. What are the names of the data columns?

10. Currently gender is coded as 1 and 2. Let's create a new character column called `gender.a` that codes the data as `male` and `female`. 

11. What percent of participants were male?

12. What was the mean age?

13. Create a new data frame called `product.df` that contains only columns p1, p2, ... p10 from `matthews.df`. (Hint: Use `paste()`)

14. The `colMeans()` function takes a data frame (or matrix) as an argument and returns a vector containing the mean of each column. Using `colMeans()`, calculate the proportion of participants who indicated that the 'typical' participant would be willing to pay more than them for each item. Do your values match what the authors reported in Table 1?

15. The `rowMeans()` function is like `colMeans()`, but it returns the mean of each row. Using `rowMeans()`, calculate, for each participant, the proportion of the 10 items on which the participant believed other people would spend more. Save this as a vector called `pall`.

16. Add the `pall` vector as a new column called `pall` to the `matthews.df` data frame.

17. What was the mean value of `pall` across participants? This value is the answer to the question: "How often does the average participant think that someone else would pay more for an item than themselves?"

18. I created a new table containing fictional demographic information about each participant. The data are stored at https://github.com/laurafontanesi/RcourseSpring2019/blob/master/data/matthews_demographics.csv. Load the data into R as an object called `demo.df`.

19. Using `merge()`, add the demographic data to `matthews.df`.

20. Using either basic indexing or `subset()`, calculate the mean age for males only.

21. Using either basic indexing or `subset()`, calculate the mean age for females only.

22. Using `aggregate()` calculate the mean age of male and female participants separately. Do you get the same answers as before?

23. Using `aggregate()` calculate the mean `pall` value for male and female participants separately. Which gender tends to think that others would pay more for products than them?

24. Using `aggregate()`, calculate the mean `pall` value of participants for each level of income. Do you find a consistent relationship between `pall` and income?

25. Now repeat the previous analysis, but only for females. (Hint: use the `subset` argument within the `aggregate()` function.)

26. What was the mean age of participants for each combination of gender and income?

27. The variable `pcmore` reflects the question: "What percent of people taking part in this survey do you think earn more than you do?". Using `aggregate()`, calculate the median value of this variable separately for each level of income. What does the result tell you?

```r
names(matthews.df)

# or 
colnames(matthews.df)
```

```r
# Create a new column called gender.a that codes gender as a string
matthews.df$gender.a <- "male"
matthews.df$gender.a[matthews.df$gender == 2] <- "female"

# or, as a factor (note: this makes gender.a a factor, not a character column):
matthews.df$gender.a <- factor(matthews.df$gender, 
                               levels = c(1, 2), 
                               labels = c("male", "female"))
```
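The two recodings above give the same labels but different types. The sketch below runs both on a hypothetical vector of gender codes. It shows that the factor version keeps the codes 1 and 2 underneath its labels, and that it flags codes it does not expect. The character version assumes that only 1 and 2 ever occur.

```r
# Hypothetical gender codes (1 = male, 2 = female, as in the data notes)
gender <- c(2, 1, 1, 2, 1)

# Character version: start with "male", then overwrite where gender == 2
gender.chr <- rep("male", length(gender))
gender.chr[gender == 2] <- "female"

# Factor version: codes are matched to levels, labels replace them
gender.fac <- factor(gender, levels = c(1, 2), labels = c("male", "female"))

class(gender.chr)                                 # "character"
class(gender.fac)                                 # "factor"
as.integer(gender.fac)                            # underlying codes: 2 1 1 2 1
identical(as.character(gender.fac), gender.chr)   # TRUE

# An unexpected code (3) becomes NA in the factor version, whereas the
# character approach above would silently label it "male"
factor(c(1, 2, 3), levels = c(1, 2), labels = c("male", "female"))
```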

```r
mean(matthews.df$gender.a == "male") * 100
```

```r
mean(matthews.df$age)
```

```r
# Create product.df, a data frame containing only columns p1, p2, ... p10
product.df <- matthews.df[, paste("p", 1:10, sep = "")]
```

```r
# Proportion of participants answering 1 for each item
colMeans(product.df)
# or, equivalently: 
colMeans(product.df == 1)
```

```r
pall <- rowMeans(product.df)
```

```r
matthews.df$pall <- pall
```

```r
mean(matthews.df$pall)
```
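Questions 14 to 17 use the same 0/1 responses in two directions. The sketch below repeats them on a hypothetical 3-participant × 4-item data frame, `toy.items`. `colMeans()` gives per-item proportions and `rowMeans()` gives per-participant proportions. When no responses are missing, the two grand means are identical, so the mean of `pall` should also be the average of the per-item proportions, provided no item response is `NA`.

```r
# Hypothetical 0/1 responses: 3 participants (rows) x 4 items (columns)
toy.items <- data.frame(p1 = c(1, 0, 1),
                        p2 = c(1, 1, 1),
                        p3 = c(0, 0, 1),
                        p4 = c(1, 0, 1))

# paste() builds the column names used to select the item columns
item.cols <- paste("p", 1:4, sep = "")   # "p1" "p2" "p3" "p4"
toy.items <- toy.items[, item.cols]

colMeans(toy.items)   # per item: proportion of participants answering 1
rowMeans(toy.items)   # per participant: 0.75 0.25 1.00

# With no missing values, both routes give the same grand mean (2/3)
mean(rowMeans(toy.items))
mean(colMeans(toy.items))
```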

```r
demo.df <- read.csv("https://raw.githubusercontent.com/laurafontanesi/RcourseSpring2019/master/data/matthews_demographics.csv", 
                    header = TRUE)
```

```r
matthews.df <- merge(x = matthews.df,
                     y = demo.df,
                     by = "id")
```
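By default, `merge()` keeps only the `id`s that appear in both tables and sorts the result by `id`. This is why the row order of `matthews.df` after merging differs from the `head()` output shown earlier. The sketch below demonstrates this with two hypothetical tables, `resp.df` and `demo.toy`, whose `height` column is made up. It also shows `all.x = TRUE`, which keeps every row of `x`.

```r
# Hypothetical tables sharing an id column; the ids only partly overlap
resp.df  <- data.frame(id = c("c", "a", "b"), pall = c(0.9, 0.7, 0.8))
demo.toy <- data.frame(id = c("a", "b", "d"), height = c(170, 182, 165))

# Default: keep only ids found in BOTH tables, sorted by id
merged.inner <- merge(x = resp.df, y = demo.toy, by = "id")
merged.inner   # rows for "a" and "b" only

# all.x = TRUE keeps every row of x, filling missing y columns with NA
merged.left <- merge(x = resp.df, y = demo.toy, by = "id", all.x = TRUE)
merged.left    # rows "a", "b", "c"; height is NA for "c"
```

Comparing `nrow()` before and after a merge is a quick check that no participants were dropped.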

```r
mean(matthews.df$age[matthews.df$gender.a == "male"])

# or
with(subset(matthews.df, gender.a == "male"), mean(age))
```

```r
mean(matthews.df$age[matthews.df$gender.a == "female"])
```

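```r
# The formula is passed unnamed: its argument name changed from `formula`
# to `x` in R 4.2.0, so naming it is not portable across R versions.
aggregate(age ~ gender.a,
          FUN = mean,
          data = matthews.df)
```

| gender.a | age |
|----------|-----|
| male | 29.76471 |
| female | 34.98592 |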


```r
aggregate(pall ~ gender.a,
          FUN = mean,
          data = matthews.df)
```

| gender.a | pall |
|----------|------|
| male | 0.7764706 |
| female | 0.8014085 |


```r
aggregate(pall ~ income,
          FUN = mean,
          data = matthews.df)
```

| income | pall |
|--------|------|
| 1 | 0.9037037 |
| 2 | 0.8044444 |
| 3 | 0.737037 |
| 4 | 0.7862069 |
| 5 | 0.75 |
| 6 | 0.6958333 |
| 7 | 0.8142857 |
| 8 | 0.8666667 |


```r
aggregate(pall ~ income,
          FUN = mean,
          data = matthews.df,
          subset = gender.a == "female")
```

| income | pall |
|--------|------|
| 1 | 0.8875 |
| 2 | 0.8294118 |
| 3 | 0.8857143 |
| 4 | 0.8625 |
| 5 | 0.75 |
| 6 | 0.6785714 |
| 7 | 0.76 |
| 8 | 0.9 |
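`aggregate()` with a formula splits the outcome by the grouping variable(s), applies `FUN` to each group, and returns a data frame. The `subset` argument filters rows before the grouping. The sketch below checks this on a hypothetical five-person data frame, `toy.df`, and compares the result with `tapply()`, which computes the same group means but returns a named vector.

```r
# Hypothetical data: pall scores for five participants
toy.df <- data.frame(pall   = c(0.9, 0.7, 0.8, 0.6, 0.7),
                     gender = c("female", "male", "female", "male", "female"),
                     income = c(1, 1, 2, 2, 2))

# aggregate(): one row per income level, returned as a data frame
by.income <- aggregate(pall ~ income, data = toy.df, FUN = mean)
by.income                                   # income 1: 0.8, income 2: 0.7

# tapply(): the same group means, returned as a named vector
tapply(toy.df$pall, toy.df$income, mean)

# subset = restricts the rows BEFORE grouping (here: females only)
by.income.f <- aggregate(pall ~ income, data = toy.df, FUN = mean,
                         subset = gender == "female")
by.income.f                                 # income 1: 0.9, income 2: 0.75
```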


```r
aggregate(age ~ income + gender.a,
          FUN = mean,
          data = matthews.df)
```

| income | gender.a | age |
|--------|----------|-----|
| 1 | male | 28.73684 |
| 2 | male | 30.14286 |
| 3 | male | 29.45 |
| 4 | male | 28.52381 |
| 5 | male | 31.0 |
| 6 | male | 29.6 |
| 7 | male | 43.5 |
| 8 | male | 23.0 |
| 1 | female | 31.125 |
| 2 | female | 36.35294 |


```r
aggregate(pcmore ~ income,
          FUN = median,
          data = matthews.df)
```

| income | pcmore |
|--------|--------|
| 1 | 80 |
| 2 | 75 |
| 3 | 50 |
| 4 | 60 |
| 5 | 50 |
| 6 | 45 |
| 7 | 50 |
| 8 | 50 |


## C. Matrices

28. Columns 2 to 18 of `matthews.df` contain only numeric values. Using the `as.matrix` function on these columns, create a matrix of these values, and call it `matthews.mx`.

29. Use `head()`, `View()`, and `str()` to see how this matrix compares to the original data frame.

30. We can use the `apply(X, MARGIN, FUN)` function to apply a function to every row or every column of a matrix. The `MARGIN` argument determines whether the function is applied to the rows (1), the columns (2), or both (1:2, i.e., to each individual cell) of the matrix. The `FUN` argument specifies the function to be applied. Use `?apply` to see details. For instance, the code below returns the mean for each of the 10 item columns (p1, p2, ... p10), similar to `colMeans()`.

31. `apply()` is more flexible than `colMeans()` or `rowMeans()`, because we can specify different functions. Use `apply()` with `table()` to obtain the frequency of each response for each of the 10 items.

32. Suppose we just discovered that there was another participant whose data we lost. We decide to estimate this participant's data (including their gender, age, income, etc.) by taking the median of the other participants' values. Use `apply()` to calculate these medians, and save the results as `participant.191`.

33. Add a new row containing participant 191's estimated data to the matrix `matthews.mx` (Hint: use `rbind()`). Call this new matrix `new.matthews`.

34. Add the column containing gender coded as `male` and `female` from `matthews.df` to `matthews.mx` (Hint: use `cbind()`). Call the resulting matrix `matthews.gender.mx`. What happens to the data?

35. Using `save()`, save `matthews.df`, `matthews.mx`, `matthews.gender.mx` and `new.matthews` objects to a file called `wpa4data.RData` in the `data` folder in your working directory.
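Before applying `apply()` to `matthews.mx`, you can try the `MARGIN` and extra-argument behaviour from questions 30 to 32 on a small hypothetical matrix, `toy.mx`, which has one missing value. Any extra named argument, such as `na.rm = TRUE`, is passed on to `FUN`.

```r
# Hypothetical 3 x 4 matrix (3 participants, 4 variables); one value is NA.
# matrix() fills column by column.
toy.mx <- matrix(c(1, 0, 1,      # p1
                   1, 1, 1,      # p2
                   0, 0, NA,     # p3
                   30, 25, 41),  # age
                 nrow = 3,
                 dimnames = list(NULL, c("p1", "p2", "p3", "age")))

apply(X = toy.mx, MARGIN = 2, FUN = mean)                # column means; p3 is NA
apply(X = toy.mx, MARGIN = 2, FUN = mean, na.rm = TRUE)  # na.rm is passed to mean()
apply(X = toy.mx[, 1:3], MARGIN = 1, FUN = sum, na.rm = TRUE)  # row sums: 2 1 2
apply(X = toy.mx, MARGIN = c(1, 2), FUN = function(v) v * 2)   # each cell doubled
```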

```r
matthews.mx <- as.matrix(matthews.df[, 2:18])
```

```r
str(matthews.df[, 2:18])
```

```
'data.frame':	190 obs. of  17 variables:
 $ gender  : int  2 1 1 2 1 1 1 1 1 1 ...
 $ age     : int  45 25 28 44 30 29 33 24 27 24 ...
 $ income  : int  4 3 1 6 2 3 1 6 2 5 ...
 $ p1      : int  1 1 0 1 1 0 1 0 1 1 ...
 $ p2      : int  1 0 1 1 1 1 1 1 0 0 ...
 $ p3      : int  1 1 1 0 1 0 1 0 1 1 ...
 $ p4      : int  1 0 1 0 1 1 1 1 1 1 ...
 $ p5      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ p6      : int  0 1 0 1 0 1 1 1 1 1 ...
 $ p7      : int  1 1 1 1 1 1 1 0 1 0 ...
 $ p8      : int  1 1 1 1 1 1 1 0 1 1 ...
 $ p9      : int  1 0 1 1 0 0 1 1 1 1 ...
 $ p10     : int  1 0 1 1 0 1 1 1 0 0 ...
 $ task    : int  1 1 1 1 1 1 1 0 0 0 ...
 $ havemore: int  80 60 75 50 65 50 70 NA NA NA ...
 $ haveless: int  NA NA NA NA NA NA NA 50 40 45 ...
 $ pcmore  : int  80 60 75 50 65 50 70 50 60 55 ...
```


```r
str(matthews.mx)

# The matrix version doesn't describe each column separately:
# it stores all values as a single vector of one type (here, integer).
# You can think of a matrix as a vector that has been given dimensions
# (rows and columns), while a data.frame is a list of equal-length
# columns, each of which can have its own type.
```

```
 int [1:190, 1:17] 2 1 1 2 1 1 1 1 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:17] "gender" "age" "income" "p1" ...
```
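The difference described in the comments above can be checked directly. In the sketch below, a plain vector becomes a matrix once it gets a `dim` attribute. The hypothetical data frame `toy.frame` is a list whose columns keep their own types. `stringsAsFactors = FALSE` is set so that the character column stays character in older R versions as well.

```r
# A matrix is an atomic vector with a "dim" attribute
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)     # TRUE
length(v)        # 6: still one vector of values

# A data frame is a list of equal-length columns, each with its own type
toy.frame <- data.frame(age = c(26, 32),
                        gender.a = c("female", "male"),
                        stringsAsFactors = FALSE)
is.list(toy.frame)          # TRUE
sapply(toy.frame, class)    # age: "numeric", gender.a: "character"
```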


```r
apply(X = matthews.mx[, 4:13], MARGIN = 2, FUN = mean)

# similar to:
apply(product.df, 2, mean)
```

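```r
apply(X = matthews.mx[, 4:13], MARGIN = 2, FUN = table)

# frequency() is the wrong function here: it returns the sampling frequency
# of a time series (1 for an ordinary vector), not response counts.
apply(matthews.mx[, 4:13], 2, frequency)
```

Output of the `table` call:

|   | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | p10 |
|---|----|----|----|----|----|----|----|----|----|-----|
| 0 | 58 | 26 | 56 | 28 | 43 | 38 | 26 | 19 | 62 | 51 |
| 1 | 132 | 164 | 134 | 162 | 147 | 152 | 164 | 171 | 128 | 139 |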
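The result above is a neat 2 × 10 matrix only because every item received both 0 and 1 responses. If one column contained only one response value, `table()` would return results of different lengths and `apply()` would return a list instead. The sketch below shows this on a hypothetical matrix, `toy.mx`, and fixes it by declaring the possible responses with `factor(..., levels = c(0, 1))`.

```r
# Hypothetical 0/1 responses where item "p2" happens to contain only 1s
toy.mx <- cbind(p1 = c(0, 1, 1), p2 = c(1, 1, 1))

# table() gives p1 two counts (0 and 1) but p2 only one (1), so the results
# cannot be stacked into a matrix and apply() returns a list instead
counts <- apply(X = toy.mx, MARGIN = 2, FUN = table)
is.list(counts)   # TRUE

# Fixing the possible values with factor() makes every table the same length
counts.fixed <- apply(X = toy.mx, MARGIN = 2,
                      FUN = function(v) table(factor(v, levels = c(0, 1))))
counts.fixed      # 2 x 2 matrix: rows are responses 0 and 1
```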


```r
# Without na.rm, columns containing NAs (havemore, haveless) return NA:
apply(X = matthews.mx, MARGIN = 2, FUN = median)

# apply() passes additional arguments on to FUN. median() lets you specify
# how to treat NA values: with na.rm = TRUE, NAs are ignored, so every
# column gets an estimate.
participant.191 <- apply(X = matthews.mx, MARGIN = 2, FUN = median, na.rm = TRUE)
participant.191
```

```r
new.matthews <- rbind(matthews.mx, participant.191)
```

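```r
# as.character() ensures the labels (not the factor's integer codes) are
# added, even if gender.a was stored as a factor
matthews.gender.mx <- cbind(matthews.mx,
                            gender.a = as.character(matthews.df$gender.a))
head(matthews.gender.mx)

# Everything is converted to character: a matrix can hold only one type,
# so it cannot mix numeric and character columns.
```

| gender | age | income | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | p10 | task | havemore | haveless | pcmore | gender.a |
|--------|-----|--------|----|----|----|----|----|----|----|----|----|-----|------|----------|----------|--------|----------|
| 2 | 45 | 4 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 80 | NA | 80 | female |
| 1 | 25 | 3 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 60 | NA | 60 | male |
| 1 | 28 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 75 | NA | 75 | male |
| 2 | 44 | 6 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 50 | NA | 50 | female |
| 1 | 30 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 65 | NA | 65 | male |
| 1 | 29 | 3 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 50 | NA | 50 | male |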
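How `cbind()` treats the added column depends on its type. The sketch below, built on a hypothetical numeric matrix `num.mx`, contrasts the two cases. A character column turns the whole matrix into character. A factor column is added as its integer codes, so the matrix stays numeric but the labels are lost. This is why the code above converts `gender.a` with `as.character()` first.

```r
# Hypothetical 2 x 2 numeric matrix
num.mx <- cbind(age = c(26, 32), income = c(7, 4))

# Character column: a matrix holds ONE type, so every value
# (including age and income) is converted to character
chr.mx <- cbind(num.mx, gender.a = c("female", "male"))
typeof(chr.mx)         # "character"
chr.mx[1, "age"]       # "26" (now a string)

# Factor column: cbind() uses the factor's integer codes,
# so the labels are lost but the matrix stays numeric
fac.mx <- cbind(num.mx, gender.a = factor(c("female", "male"),
                                          levels = c("male", "female")))
typeof(fac.mx)         # "double"
fac.mx[, "gender.a"]   # 2 1
```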


```r
save(matthews.df, matthews.mx, matthews.gender.mx, new.matthews,
     file = "data/wpa4data.RData")
```

### That's it! Now it's time to submit your assignment!

Save and email your `wpa_4_LastFirst.R` file to me at [laura.fontanesi@unibas.ch](mailto:laura.fontanesi@unibas.ch). 

Assignments sent after Sunday 31st March will not be considered (to pass the course you have to hand in all assignments for each week). 