# WPA #3 - Chapters 8 and 9

## Project management

Whenever you want to **load** or **save** data in R, you need to tell R where the data is to be saved or where it is being loaded from. Last week, for instance, we loaded data directly from a URL. This week we'll load data that is saved on our machine. To do this we need to set a working directory. This is the default folder R will look in to save or load data. If we wish to load (or save) data from a different folder, we can either change the working directory, or specify the filepath from the current working directory to the desired folder.
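
As a minimal sketch of the difference between a path relative to the working directory and a full path, the following uses a temporary folder so that it runs on any machine. The names `demo_dir` and `example.csv` are hypothetical and only serve the illustration.

```r
# Throwaway folder so the sketch does not touch your own files (hypothetical names)
demo_dir <- file.path(tempdir(), "Rcourse_demo")
dir.create(file.path(demo_dir, "data"), recursive = TRUE, showWarnings = FALSE)

old_wd <- setwd(demo_dir)   # setwd() returns the previous working directory

# A relative path is resolved starting from the working directory
rel_path <- file.path("data", "example.csv")
write.csv(data.frame(x = 1:3), rel_path, row.names = FALSE)
rel_ok <- file.exists(rel_path)

setwd(old_wd)               # restore the original working directory

# The same file addressed by its full path works from any working directory
abs_path <- file.path(demo_dir, "data", "example.csv")
abs_ok <- file.exists(abs_path)

c(relative = rel_ok, absolute = abs_ok)
```

Building paths with `file.path()` avoids typing the separator by hand, which differs between operating systems.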

There are two methods we can use for setting our working directory. The instructions below describe how to load your data using both methods. You only need to do one. If you want to try both, make sure you use different folder names.

### Method One- setwd()

1. Outside R, create a new folder called `Rcourse` (or any name you choose). Create two separate subfolders within `Rcourse` called `R` and `data`.

2. Open a new R script called `wpa_3_LastFirst.R` (where LastFirst is your last and first name), and save it in the `R` folder.

3. Use the `setwd()` function to change the working directory to the `Rcourse` folder. You need to give the filepath to this folder. You can also check the current directory by using `getwd()`:

In [1]:
setwd("~/git/RcourseSpring2019/")
# This should be the filepath to the Rcourse folder on your machine.

getwd()
# This matches the path you just entered.

### Method Two- R-project

You can also control your working directory by creating an R-project in RStudio (see [Chapter 9](https://bookdown.org/ndphillips/YaRrr/importingdata.html) for more details on R-projects). Basically, working in an R-project automatically sets your working directory to the location of the R-project file (`.Rproj`).

1. Start a new R-project called `RCourse` (or something similar) using the menus. Then (either within RStudio or outside of R), navigate to the location of your `RCourse` project, and add the folders `R` and `data`.

2. Open a new R script called `wpa_3_LastFirst.R` (where LastFirst is your last and first name), and save it in the `R` folder.
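
The `R` and `data` folders can also be created from within R rather than through the file browser. The sketch below assumes a hypothetical project location inside a temporary folder; in practice `project_dir` would be the folder of your own project.

```r
# Hypothetical project location; replace with the folder of your own project
project_dir <- file.path(tempdir(), "RCourse")

# recursive = TRUE also creates the parent folder if it does not exist yet
dir.create(file.path(project_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# List what was created
list.files(project_dir)
```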

## Loading data

1. The text file containing the data is called `data_wpa3.csv`. It is available at https://github.com/laurafontanesi/RcourseSpring2019/blob/master/data/data_wpa3.csv. Save the file in the `data` folder that you previously created.

2. Using `read.table()`, load the data into a new R object called `priming`. Note that the text file is comma-delimited and contains a header row, so be sure to include the `sep = ","` and `header = TRUE` arguments! As `data_wpa3.csv` is in a subfolder of your working directory, you need to include the subfolder in the filename.

In [2]:
priming <- read.table('data/data_wpa3.csv', 
                      header=TRUE, 
                      sep=',')
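
For comma-delimited files, `read.csv()` is a convenience wrapper around `read.table()` that already sets the header and separator. The sketch below compares the two on a tiny made-up file written to a temporary location; it does not use the course data.

```r
# Write a tiny comma-delimited file to a temporary location (made-up data)
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,x", "2,y"), tmp_file)

# read.table() needs the header and separator spelled out ...
via_table <- read.table(tmp_file, header = TRUE, sep = ",")

# ... while read.csv() uses header = TRUE and sep = "," by default
via_csv <- read.csv(tmp_file)

all.equal(via_table, via_csv)
```

Note that in R 4.0.0 and later, text columns are read as character vectors unless `stringsAsFactors = TRUE` is passed, so `str()` may report `chr` where the output in this document shows `Factor`.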

### Dataset description

In a provocative paper, Bargh, Chen and Burrows (1996) sought to test whether priming people with trait concepts would trigger trait-consistent behavior. In one study, they primed participants with either neutral words (e.g., bat, cookie, pen) or words related to an elderly stereotype (e.g., wise, stubborn, old). They then, unbeknownst to the participants, used a stopwatch to record how long it took the participants to walk down a hallway at the conclusion of the experiment. They predicted that participants primed with words related to the elderly would walk more slowly than those primed with neutral words.

In this WPA, you will analyze *fake* data corresponding to this study. 

Our fake study has 3 primary independent variables:

- `prime`: What kind of primes was the participant given? `neutral` means neutral primes, `elderly` means elderly primes.
- `prime.duration`: How long (in minutes) were primes displayed to participants? There were 4 conditions: 1, 5, 10 or 30.
- `grandparents`: Did the participant have a close relationship with their grandparents? `yes` means yes, `no` means no, `none` means they never met their grandparents.

There was one primary dependent variable:

- `walk`: How long (in seconds) did participants take to walk down the hallway?

There were 4 additional variables:

- `id`: The order in which participants completed the study
- `age`: Participants' age
- `sex`: Participants' sex
- `attention`: Did the participant pass an attention check? 0 means they failed the attention check, 1 means they passed.

## Now it's your turn:

### A. Understanding and cleaning the data

1. Get to know the data using `View()`, `summary()`, `head()` and `str()`.

2. Look at the names of the dataframe with `names()`. Those aren't very informative, are they? Change the names to the correct values (make sure to use the naming scheme I describe in the dataset description). The first column is a row index that is not part of the description; call it `index`.

In [3]:
summary(priming)
head(priming)
str(priming)

       X               a         b             c            d        
 Min.   :  1.0   Min.   :  1.0   f:252   Min.   :19   Min.   :0.000  
 1st Qu.:125.8   1st Qu.:125.8   m:248   1st Qu.:21   1st Qu.:1.000  
 Median :250.5   Median :250.5           Median :22   Median :1.000  
 Mean   :250.5   Mean   :250.5           Mean   :22   Mean   :0.886  
 3rd Qu.:375.2   3rd Qu.:375.2           3rd Qu.:23   3rd Qu.:1.000  
 Max.   :500.0   Max.   :500.0           Max.   :25   Max.   :1.000  
       e             f            g             h         
 asdf   :153   Min.   : 1.00   no  :175   Min.   :-92.35  
 elderly:161   1st Qu.: 1.00   none:157   1st Qu.: 25.40  
 neutral:186   Median :10.00   yes :168   Median : 34.70  
               Mean   :12.23              Mean   : 30.10  
               3rd Qu.:20.00              3rd Qu.: 36.40  
               Max.   :60.00              Max.   : 51.50  

X,a,b,c,d,e,f,g,h
1,1,m,21,1,asdf,1,no,25.4
2,2,m,21,1,asdf,30,no,23.6
3,3,f,22,1,asdf,30,none,34.5
4,4,m,23,1,elderly,1,yes,40.4
5,5,m,23,1,asdf,10,none,25.0
6,6,m,22,1,asdf,10,yes,24.7


'data.frame':	500 obs. of  9 variables:
 $ X: int  1 2 3 4 5 6 7 8 9 10 ...
 $ a: int  1 2 3 4 5 6 7 8 9 10 ...
 $ b: Factor w/ 2 levels "f","m": 2 2 1 2 2 2 1 2 1 2 ...
 $ c: int  21 21 22 23 23 22 20 24 21 22 ...
 $ d: int  1 1 1 1 1 1 1 1 1 1 ...
 $ e: Factor w/ 3 levels "asdf","elderly",..: 1 1 1 2 1 1 1 2 1 3 ...
 $ f: int  1 30 30 1 10 10 10 1 10 5 ...
 $ g: Factor w/ 3 levels "no","none","yes": 1 1 2 3 2 3 3 2 1 2 ...
 $ h: num  25.4 23.6 34.5 40.4 25 24.7 35.3 34 35.4 24.4 ...


In [4]:
names(priming)

In [5]:
names(priming) <- c("index", 
                    "id", 
                    "sex", 
                    "age", 
                    "attention", 
                    "prime", 
                    "prime.duration", 
                    "grandparents", 
                    "walk")

In [6]:
head(priming)

index,id,sex,age,attention,prime,prime.duration,grandparents,walk
1,1,m,21,1,asdf,1,no,25.4
2,2,m,21,1,asdf,30,no,23.6
3,3,f,22,1,asdf,30,none,34.5
4,4,m,23,1,elderly,1,yes,40.4
5,5,m,23,1,asdf,10,none,25.0
6,6,m,22,1,asdf,10,yes,24.7
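
Assigning a full vector to `names()` replaces every name at once. When only one column needs a new name, it can be matched by its current name instead. A minimal sketch on a made-up dataframe (`toy` is a hypothetical object, not the course data):

```r
# Toy data frame with the same uninformative names as the first three columns
toy <- data.frame(X = 1:2, a = 1:2, b = c("m", "f"))

# Rename one column by matching its current name
names(toy)[names(toy) == "b"] <- "sex"

names(toy)
```

Matching by name is less fragile than relying on position, because it still works if the column order changes.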


### B. Applying functions to columns

1. What was the mean participant age?

2. How many participants were there from each sex?

3. What was the median walking time?

4. What *percent* of participants passed the attention check? (Hint: To calculate a percentage from a 0, 1 variable, use `mean()` and multiply the result by 100.)

5. Walking time is currently in seconds. Add a new column to the dataframe called `walking.m` that shows the walking time in minutes rather than seconds.

In [7]:
mean(priming$age)
#or
with(priming, mean(age))

In [8]:
table(priming$sex)

# or (this returns counts only if sex is stored as a factor)
summary(priming$sex)


  f   m 
252 248 

In [9]:
median(priming$walk)

In [10]:
mean(priming$attention)*100

# or
mean(priming$attention==1)*100
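
The reason `mean()` works here is that the sum of a 0/1 variable counts the 1s, so dividing by the number of values gives the proportion of 1s. A small worked example with made-up values (`attention_toy` is hypothetical):

```r
# Made-up attention checks for five participants: 1 = passed, 0 = failed
attention_toy <- c(1, 0, 1, 1, 0)

# The sum counts the 1s, so sum / length is the proportion who passed
prop_by_hand <- sum(attention_toy) / length(attention_toy)

# mean() computes exactly that ratio
prop_by_mean <- mean(attention_toy)

prop_by_mean * 100   # percentage who passed
```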

In [11]:
priming$walking.m <- priming$walk / 60

### C. Indexing and subsetting dataframes

*Try to split your answers to these problems into two steps*

*Step 1: Index or subset the original data and store as a new object with a new name.*

*Step 2: Calculate the appropriate summary statistic using the new, subsetted object you just created.*

1. What were the sexes of the first 10 participants?

2. What was the data for the 50th participant?

3. What was the mean walking time for the elderly prime condition?

4. What was the mean walking time for the neutral prime condition?

5. What was the mean walking time for participants less than 23 years old?

6. What was the mean walking time for females with a close relationship with their grandparents?

7. What was the mean walking time for males over 24 years old *without* a close relationship with their grandparents?

In [12]:
priming.10 <- subset(priming,
                     subset = id <= 10)

priming.10$sex

#OR

priming$sex[1:10]

In [13]:
subset(priming, subset = id == 50)

# OR

priming[50,]
# Remember when indexing dataframes to specify both the rows (i.e. 50) and the columns (by leaving the column index blank we retrieve all columns)

Unnamed: 0,index,id,sex,age,attention,prime,prime.duration,grandparents,walk,walking.m
50,50,50,m,21,1,elderly,1,none,34.3,0.5716667


Unnamed: 0,index,id,sex,age,attention,prime,prime.duration,grandparents,walk,walking.m
50,50,50,m,21,1,elderly,1,none,34.3,0.5716667


In [14]:
priming.e <- subset(priming, subset = prime == "elderly")

mean(priming.e$walk)

# OR

walk.elderly<- priming$walk[priming$prime=="elderly"] 
#or
walk.elderly<- with(priming, walk[prime=="elderly"])

mean(walk.elderly)

# The first method results in a dataframe with all columns but only the subset of rows/participants with an elderly prime. 
# The second two methods both result in a vector of walking times for participants in the elderly condition 
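
The difference between the two kinds of result can be checked directly with `class()`. A minimal sketch on made-up data (`toy` and its values are hypothetical):

```r
toy <- data.frame(prime = c("elderly", "neutral", "elderly"),
                  walk  = c(40, 30, 42))

# subset() keeps the data frame structure (all columns, fewer rows)
toy_df <- subset(toy, subset = prime == "elderly")

# Indexing a single column returns a plain vector
toy_vec <- toy$walk[toy$prime == "elderly"]

class(toy_df)
class(toy_vec)

# Both routes lead to the same mean
mean(toy_df$walk) == mean(toy_vec)
```

Keeping the dataframe is convenient when several columns of the subset are needed later; the vector is enough when only one summary statistic is required.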

In [15]:
walk.neutral<- priming$walk[priming$prime=="neutral"] 
mean(walk.neutral)

#could also use with, or subset as above

In [16]:
walk.23<- priming$walk[priming$age < 23]
mean(walk.23)

#could also use with, or subset as above

In [17]:
priming.fclose <- subset(priming, subset = sex == "f" & grandparents == "yes")

mean(priming.fclose$walk)

# OR

walk.fclose<-priming$walk[priming$sex=="f"& priming$grandparents=="yes"] 
#or
walk.fclose<-with(priming, walk[sex=="f"& grandparents=="yes"] )

mean(walk.fclose)

In [18]:
walk.m24notclose<-with(priming, walk[sex=="m"& grandparents!="yes"& age>24] )
mean(walk.m24notclose)

# mean() returns NaN here; check the vector to see why
walk.m24notclose
# the vector is empty, meaning no participants meet these criteria

#alternative method
priming.m24notclose <- subset(priming,
              subset = sex == "m" &
                age > 24 &
                grandparents %in% c("no", "none"))


mean(priming.m24notclose$walk)
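
When no rows meet the criteria, the selected vector has length zero and `mean()` returns `NaN`. The sketch below reproduces this with made-up values (`age_toy` and `walk_toy` are hypothetical) and shows one possible way to guard against it.

```r
# Made-up ages and walking times
age_toy  <- c(21, 22, 23)
walk_toy <- c(30, 35, 40)

# No age exceeds 24, so the logical index selects nothing
walk_old <- walk_toy[age_toy > 24]

length(walk_old)   # number of matching participants
mean(walk_old)     # mean of an empty numeric vector is NaN

# Guarding against the empty case before averaging
if (length(walk_old) > 0) mean(walk_old) else NA
```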

### D. Creating new dataframe objects

1. Create a new dataframe called `priming.att` that *only* includes rows where participants passed the attention check. (Hint: use indexing or `subset()`)

2. Some of the data don't make any sense. For example, some walking times are negative, some prime values aren't correct, and some prime.duration values weren't part of the original study plan. Create a new dataframe called `priming.c` (a.k.a. priming clean) that *only* includes rows with valid values for each column -- do this by looking for a few strange values in each column, and by looking at the original dataset description. Additionally, *only* include participants who passed the attention check.

3. How many participants gave valid data and passed the attention check? (Hint: Use the result from your previous answer!)

4. Of those participants who gave valid data and passed the attention check, what was the mean walking time of those given the elderly and neutral prime? (Calculate these separately.)

In [19]:
priming.att <- subset(priming, subset = attention == 1)
#or
priming.att<- priming[priming$attention==1,]

In [20]:
unique(priming$prime)
unique(priming$prime.duration)

In [21]:
# Create priming.c, a subset of the original priming data
#  that keeps only rows with valid values in every column
priming.c <- subset(x = priming,
                    subset = attention == 1 &
                             prime %in% c("elderly", "neutral") &
                             prime.duration %in% c(1, 5, 10, 30) &
                             walk > 0)

#or 
priming.c <- subset(x = priming.att, 
                    subset = walk > 0 & 
                             prime %in% c('elderly', 'neutral') &
                             prime.duration %in% c(1, 5, 10, 30))
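
The `%in%` operator used above tests each element against a set of allowed values, and `table()` is a quick way to spot values that should not be there. A minimal sketch with made-up durations (`duration_toy` is hypothetical):

```r
# Made-up durations; 20 and 60 are not among the planned conditions
duration_toy <- c(1, 5, 20, 10, 60, 30)
valid_durations <- c(1, 5, 10, 30)

# table() counts each distinct value, which makes odd values easy to spot
table(duration_toy)

# %in% returns TRUE for each element found in the set of valid values
is_valid <- duration_toy %in% valid_durations
is_valid

# Keep only the valid entries
duration_toy[is_valid]
```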

In [22]:
nrow(priming.c)
#or
dim(priming.c)[1]
#dim gives a vector c(nrows, ncolumns)
# or 
length(priming.c$id)

# NOTE: str(priming.c) is not a good answer because it doesn't return a single number

In [23]:
priming.c.eld <- subset(priming.c, subset = prime == "elderly")
priming.c.neu <- subset(priming.c, subset = prime == "neutral")

mean(priming.c.eld$walk)
mean(priming.c.neu$walk)

#or

walk.c.eld<- priming.c$walk[priming.c$prime=="elderly"]
walk.c.neut<-priming.c$walk[priming.c$prime=="neutral"]

mean(walk.c.eld)
mean(walk.c.neut)

#or
aggregate(walk~prime, data=priming.c, FUN=mean)

prime,walk
elderly,41.93209
neutral,30.25669


### E. Saving and loading data

1. Save your two dataframe objects `priming` and `priming.c` in an .RData file called `priming.RData` in the data folder of your project.

2. Save your `priming.c` object as a tab-delimited text file called `priming_clean.txt` in the data folder of your project.

3. Clean your workspace by running `rm(list = ls())`

4. Re-load your two dataframe objects using `load()`.

5. A colleague of yours wants access to the data from the females given the neutral prime in your experiment. Create a dataframe called `priming.f` that only includes these data. Additionally, do *not* include the `id` column as this could be used to identify the participants.

6. Save your `priming.f` object as a tab-delimited text file called `priming_females.txt` in the data folder of your project.

7. Save your entire workspace using `save.image` to an .RData file called `priming_ws.RData` in the data folder of your project.

In [24]:
save(priming, priming.c, file = "data/priming.RData")

In [25]:
write.table(priming.c, file = "data/priming_clean.txt", sep = "\t")

In [26]:
rm(list = ls())

In [27]:
load(file = "data/priming.RData")
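
To see what `save()` and `load()` do without touching the project files, the round trip can be tried on two made-up objects and a temporary file (`toy_a`, `toy_b` and `tmp_file` are hypothetical names):

```r
toy_a <- data.frame(x = 1:3)
toy_b <- c("one", "two")
tmp_file <- tempfile(fileext = ".RData")

# save() stores the named objects together in one file
save(toy_a, toy_b, file = tmp_file)

# Remove only these two objects (rm(list = ls()) would clear everything)
rm(toy_a, toy_b)

# load() restores the objects under their original names and returns those names
restored <- load(tmp_file)
restored
```

Because `load()` recreates objects under their saved names, it silently overwrites any objects in the workspace that share those names.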

In [28]:
priming.f <- subset(priming, 
                    subset = sex == "f" & prime=="neutral",
                    select = c("sex", "age", "attention", "prime", "prime.duration", "grandparents", "walk", "walking.m"))

#OR

priming.f<- priming[priming$sex=="f" & priming$prime=="neutral", -c(1, 2)]
# a negative column index drops columns: -c(1, 2) means all columns except column 1 (index) and column 2 (id).

In [29]:
write.table(priming.f, file = "data/priming_females.txt", sep = "\t")
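
By default `write.table()` also writes the row names and puts quotes around text values. If a plainer file is preferred, both can be switched off. The sketch below writes a made-up dataframe to a temporary file and reads it back to check that nothing was lost (`toy` and `tmp_file` are hypothetical):

```r
toy <- data.frame(sex = c("f", "m"), walk = c(30.5, 28.1))
tmp_file <- tempfile(fileext = ".txt")

# row.names = FALSE leaves out the row-name column; quote = FALSE drops the quotes
write.table(toy, file = tmp_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back, stating the same separator and that a header is present
toy_back <- read.table(tmp_file, header = TRUE, sep = "\t")

all.equal(toy, toy_back)
```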

In [30]:
save.image(file = "data/priming_ws.RData")

### F. Final steps...

The following questions apply to your cleaned dataframe (`priming.c`)

1. Did the effect of priming condition on walking times differ between the first 100 and the last 100 participants (Hint: Make sure to index the data using `id`!)?

2. Due to a computer error, the data from every participant with an even id number is invalid. Remove these data from your `priming.c` dataframe.

3. Do you find evidence that a participant's relationship with their grandparents affects how they responded to the primes?

In [31]:
length(priming.c$id)
length(priming$id)

In [32]:
# First 100 participants
neutral.f100 <- subset(priming.c, id <= 100 & prime == "neutral")
elderly.f100 <- subset(priming.c, id <= 100 & prime == "elderly")

# Difference between conditions in first 100
mean(elderly.f100$walk) - mean(neutral.f100$walk)

# Last 100 participants (ids 401 to 500)
neutral.l100 <- subset(priming.c, id > 400 & prime == "neutral")
elderly.l100 <- subset(priming.c, id > 400 & prime == "elderly")

# Difference between conditions in last 100
mean(elderly.l100$walk) - mean(neutral.l100$walk)


# Answer: The results appear similar

In [33]:
priming.c <- priming.c[priming.c$id %in% seq(1, 499, 2),]

# or (inside subset() columns can be referred to by name directly)
priming.c <- subset(priming.c, subset = id %% 2 != 0)
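
The `%%` operator returns the remainder after division, which is what separates odd from even ids. A short illustration on a made-up vector of ids:

```r
ids <- 1:6

# %% gives the remainder after division; odd numbers leave a remainder of 1
ids %% 2

# Keep only the ids whose remainder is not 0, i.e. the odd ones
ids[ids %% 2 != 0]
```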

In [34]:
# No relationship conditions
neutral.no <- subset(priming.c, 
                      subset = grandparents == "no" & 
                        prime == "neutral")

elderly.no <- subset(priming.c, 
                      subset = grandparents == "no" & 
                        prime == "elderly")

# Condition effect for grandparents == "no"
mean(elderly.no$walk) - mean(neutral.no$walk)

# Yes relationship conditions
neutral.yes <- subset(priming.c, 
                      subset = grandparents == "yes" & 
                        prime == "neutral")

elderly.yes <- subset(priming.c, 
                      subset = grandparents == "yes" & 
                        prime == "elderly")

# Condition effect for grandparents == "yes"
mean(elderly.yes$walk) - mean(neutral.yes$walk)

# none relationship conditions
neutral.none <- subset(priming.c, 
                      subset = grandparents == "none" & 
                        prime == "neutral")

elderly.none <- subset(priming.c, 
                      subset = grandparents == "none" & 
                        prime == "elderly")

# Condition effect for grandparents == "none"

mean(elderly.none$walk) - mean(neutral.none$walk)

# Answer: It appears that the effect was strongest for people with a close relationship with their grandparents

In [35]:
# Alternatively (more efficiently)

walk.mns<- aggregate(walk~prime*grandparents, data=priming.c, FUN=mean)

# Effect for grandparents == "no"
walk.mns[1,3]-walk.mns[2,3]

# Effect for grandparents == "yes"
walk.mns[5,3]-walk.mns[6,3]

# Effect for grandparents == "none"
walk.mns[3,3]-walk.mns[4,3]

# You can look at the data.frame to determine the appropriate row indices. The rows are ordered alphabetically (prime varies fastest within each grandparents category), so you can also work them out without looking. Alternatively, you could use logical indexing. For example:

walk.mns[walk.mns$prime=="elderly"&walk.mns$grandparents=="no",3]-walk.mns[walk.mns$prime=="neutral"&walk.mns$grandparents=="no",3]

walk.mns[walk.mns$prime=="elderly"&walk.mns$grandparents=="yes",3]-walk.mns[walk.mns$prime=="neutral"&walk.mns$grandparents=="yes",3]

walk.mns[walk.mns$prime=="elderly"&walk.mns$grandparents=="none",3]-walk.mns[walk.mns$prime=="neutral"&walk.mns$grandparents=="none",3]
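
Another option, offered here only as a sketch and not as part of the original solution, is to arrange the cell means in a matrix with `tapply()` so that the effect for every grandparents category is a single column subtraction. The data below are made up (`toy` is a hypothetical dataframe with two categories per variable).

```r
# Made-up data with two primes and two grandparent categories
toy <- data.frame(
  prime        = c("elderly", "neutral", "elderly", "neutral",
                   "elderly", "neutral", "elderly", "neutral"),
  grandparents = c("no", "no", "no", "no", "yes", "yes", "yes", "yes"),
  walk         = c(40, 30, 42, 32, 45, 30, 47, 32)
)

# Rows are grandparents categories, columns are prime conditions
cell_means <- with(toy, tapply(walk, list(grandparents, prime), mean))
cell_means

# Priming effect (elderly minus neutral) for every grandparents category at once
effect <- cell_means[, "elderly"] - cell_means[, "neutral"]
effect
```

Indexing the columns by name avoids having to remember which row of the `aggregate()` output belongs to which combination.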

### That's it! Now it's time to submit your assignment!

Save and email your `wpa_3_LastFirst.R` file to me at [laura.fontanesi@unibas.ch](mailto:laura.fontanesi@unibas.ch). 

Assignments sent after Sunday 24th March will not be considered (to pass the course you have to hand in all assignments for each week). 