-
Notifications
You must be signed in to change notification settings - Fork 0
/
testableScript.Rmd
184 lines (124 loc) · 7.77 KB
/
testableScript.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
---
title: "Testable Script"
author: "Jonathan Phillips"
date: "2/18/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Contact info
If you have questions or problems, email me (`jonathan.s.phillips@dartmouth.edu`) or create a fork with the proposed changes to the github repo: https://github.com/phillipsjs/testable
## Summary
Testable stores data files for each individual participant separately and lets you download the entrire data set as a zipped folder. This script takes those individual participant-level files and merges them to create two new combined files:
(1) a single .csv file with only the participants' demographic information
(2) a single .csv file that combines all trial-level data from each participant along with their demographic information
## Downloading the data
First download and unzip the folder of results files from testable.
Next, you'll put this .Rmd file in the same folder that contains the folder with the now unzipped data files.
## Importing the data
First, you need to set the working directory in this chunk to the *unzipped* folder containing your results files. We'll also clear out the environment in R just in case there are any existing objects that may cause conflicts.
Then, in this chuck, we are creating a list of the names of all of the files you just downloaded then importing them each as a separate data frame. Depending on the number of files you have an the size of each file, this can take a second.
```{r importData, message=FALSE}
## set the directory to the folder of your data
setwd("exampleData")
## clear out the environment
rm(list=ls())
files <- list.files()
for(i in 1:length(files)) assign(files[i], read.csv(files[i],header=F,stringsAsFactors=F,na.strings=c("", "NA")))
```
## Creating a separate demographics file
Testable writes each participant's demographic information in the top two rows of that participants' results file. Here's an example, where the top two rows are the participant's demographic info and the trial-level is stored below.
```{r exampleData}
get(files[1])[1:2,c(1:9,12:15)] ## this only shows a selection of the columns for the first rows
```
Sometimes it's helpful to have a separate demographics data frame containing only the demographic information for the participants. The next chunk of code creates such a demographic data frame.
```{r demographics}
## start by building the data frame with the right column names
r <- get(files[1]) # take the first file as an example
descriptN <- as.character(r[1,][!is.na(r[1,])]) # get names for demographics
demos <- data.frame(r[2,]) # get first participant's demographic data
colnames(demos) <- descriptN # rename columns to demographic variable names
for(i in 2:length(files)){
r <- get(files[i]) # retrieve each other file separately
descript <- data.frame(r[2,]) # get that participant's demographic data
colnames(descript) <- descriptN # rename columns to demographic variable names
demos <- rbind(demos,descript) # add each participant to the growing data frame
}
demos <- demos[,!is.na(names(demos))] ## remove any columns with no data
```
Here's what the first part of this demographics file will look like:
```{r demoPrint}
print(demos[1:5,1:5]) ## only showing the first several columns and rows
```
We will save the demographics data to a .csv file that can be easily imported into a separate analysis script.
Note that this will be saved in the same folder as this .Rmd file!
```{r writeDemos}
write.csv(demos,file="demographicData.csv", row.names=F) #write data file ## You'll likely want to rename this
```
## Rewriting demographic information to the trial-level data
This next bit of the script rewrites the demographic information to every row containing that participants' data. I find this is easier especially if you end up modeling participant-level data in analyzing participants' responses.
``` {r rewriteDemos}
for(i in 1:length(files)){
r <- get(files[i]) # retrieve each file separately
descriptN <- as.character(r[1,][!is.na(r[1,])]) # get the names of the participant decriptors for this file
descript <- data.frame(r[2,]) # get the values for each of the descriptors
colnames(descript) <- descriptN # write the descriptor names as column names
variablesN <- as.character(r[4,]) # get the names of the data fields for this file
w <- length(c(variablesN,descriptN)) # get the total number of columns needed (width of the data)
l <- length(r[-c(1:4),1]) # get the number of trials completed (length of the data)
d <- data.frame(matrix(ncol=w,nrow=l)) # build a new dataframe with these widths and lengths
colnames(d) <- c(variablesN,descriptN) # set the column names to so that data fields come first and then the descriptors
d[,variablesN] <- r[-c(1:4),] # write the trial-level data to the dataframe
d[,descriptN] <- descript[1,descriptN] # write the descriptor-level data to each trial
d <- d[,unique(colnames(d))] # remove any redundant columns,e.g., subjectGroup gets written to both trial-level and participant-level datafields
assign(files[i],d) # rewrite the file with the descriptor info appended to each trial
}
```
Here's an example of the rewritten data frame:
```{r exampleData2}
print(get(files[1])[1:5,c(1:8,24:29)]) ## Note that I'm only showing some of the data here to give a picture
```
## Combining the data
This next part takes those files and combines them into a single long-format data frame. Doing this is made slightly complicated by the fact that some results files may have different columns than other results files and some participants may have completed more trials than other particpiants. The solution implemented here is to have a single data frame that contains all possible columns, and each participant's data is written only to the subset of that data frame that data were recorded for.
```{r }
allNames <- c() # create vector to store all names used
allLengths <- c() # create vector to store all lengths observed
for(i in 1:length(files)){ # do this for each file separately
n <- colnames(get(files[i])) # get the column names for that file
allNames <- unique(c(n,allNames)) # append any unique column names to the vector of names
l <- length(get(files[i])[,1]) # get the length (number of trials) for that file
allLengths <- c(allLengths,l) # append the length of that file to the vector of lengths
}
totalRows <- sum(allLengths) # calculate the total number of trials across all data
# create vector of rows where each participants' data ends
stops <- allLengths
for(n in 1:length(allLengths)){
stops[n] <- sum(allLengths[0:n])
}
# create vector of rows where each participants' data starts
starts <- allLengths
for(n in 1:length(allLengths)){
starts[n] <- sum(allLengths[0:(n-1)])+1
}
# build the dataframe to contain all of the data from individual files
d <- data.frame(matrix(ncol=length(allNames),nrow=totalRows))
colnames(d) <- allNames
# write each individual participants' data to the relevant subset of that dataframe
for(i in 1:length(files)){
r <- get(files[i])
names <- colnames(r)
start <- starts[i]
stop <- stops[i]
d[c(start:stop),names] <- r
}
## this is a warning that your participants answered uneven numbers of trials
if(length(unique(allLengths))>1){
print("Warning: You have subjects with uneven numbers of responses!")}
```
## Saving the combined data
Lastly, we will save the combined data to a .csv file that can be easily imported into a separate analysis script.
Note that this will be saved in the same folder as this .Rmd file!
```{r writeData}
write.csv(d,file="combinedData.csv", row.names=F) #write data file ## You'll likely want to rename this
```