# NEXTdrawings TripletEmbeddingsAssembly
-  Karl Rosengren, PI; Heather Kirkorian, PI; Tim Rogers, PI;  University of Wisconsin – Madison
-  Clint Jensen, Graduate Student, University of Wisconsin – Madison

This script takes dimensional embedding data split over multiple .csv files and creates one file for visualizations and graphs. Additionally, it gathers the metadata .csv files for Studies 1 and 3 so that additional factors of interest can be used to predict drawing outcomes (rank order and MDS position within an embedding).

NEXT is an online suite of algorithms for presenting stimuli (images, text, videos) so that participants can make judgments, yielding group-level data about the stimulus set. Additional tools within NEXT visualize that data as representations such as multidimensional spaces or rank-order graphs. NEXT is hosted on AWS as an Amazon Machine Image (AMI). Participants for this project were Amazon MTurk workers. Each MTurk worker was asked to make 200 judgments and was paid 1 dollar for what was intended to be 10 minutes of work.

### The script uses concatenated stub.yaml-style NEXTdrawings data (DAP, plus CUBE and CYLINDER 2D and 3D objects and renderings) for triplet visualization

#### Drawing Study 1: Drawing Across Media (DAM)

* Mediums: Children and adults copied simple shapes using each of three mediums (marker on paper, stylus on tablet, finger on tablet). 

* Shape copies: They copied 4 familiar shapes (circle=CI, square=SQ, triangle=TR, cross=CR) and 2 novel shapes (novel triangle=NT, novel cross=NC) using each of the three mediums, resulting in 18 shape copies per participant (1 for each shape/medium combination).

* Draw-a-person: They also drew a person using each of the three mediums (3 person drawings per participant). 


#### Drawing Study 3: Drawing 2D and 3D Objects and Renderings 
-- Study 3 also had children draw a person; those drawings are included in our overall DAP analysis.

* Mediums: Children and adults copied simple shapes using each of three mediums (marker on paper, stylus on tablet, finger on tablet). 

* One 2D rendering was assigned to the marker-paper condition and the other was assigned to the two tablet conditions (stylus and finger). The opposite shape was used for 3D object drawings (e.g., if the paper condition has 2D cube and 3D cylinder, then the two tablet conditions have 3D cube and 2D cylinder).  
 
* For Study 3 DAP images, children drew one person at the end of the study during free draw, using a medium of their choice (they first chose a surface, then the marker/stylus was placed on the table to use if they wanted). 

### Code Book for Analysis Critical Elements
* Subject IDs for children are "DAM" followed by a 3-digit number (e.g., DAM001). Subject IDs for adults are "DAMa" followed by a 3-digit number (e.g., DAMa001).
* marker on paper - (_P_)
* stylus on tablet - (_T_)
* finger on tablet - (_F_)
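
The ID and media-code conventions above can be checked mechanically. The following is a minimal base R sketch, not part of the original analysis; `classify_id` and `media_names` are hypothetical names introduced for illustration. IDs that match neither pattern (for example the `DAM3D...` IDs that appear in the embedding labels later in this notebook) are labeled "other" because the code book does not describe them.

```r
# Hypothetical helper: classify a subject ID using the code book's two patterns.
classify_id <- function(id) {
  ifelse(grepl("^DAMa[0-9]{3}$", id), "adult",
         ifelse(grepl("^DAM[0-9]{3}$", id), "child", "other"))
}

# Media codes taken from the code book entries above.
media_names <- c(P = "marker on paper",
                 T = "stylus on tablet",
                 F = "finger on tablet")

ids <- c("DAM001", "DAMa001", "DAM3D026")
groups <- classify_id(ids)
print(groups)
print(unname(media_names["T"]))
```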


* Participant.ID - NEXT generated ID -- different from MTurk ID
* Timestamp - Date of participation -- stub.yaml requires a very specific format; the script may need adjusting if the .csv has been edited in Excel
* Center - Image presented at top center -- the image against which the other two images are compared for the similarity judgment
* Left - Image presented at bottom left
* Right - Image presented at bottom right 
* Answer - Image participants chose as most similar to Center image
* Alg.label - All set to Test initially; these are then reassigned so that 90% are designated training and 10% test, creating a hold-out set for model fit comparisons
* Response.Time..s. - Response time for individual triplet set
* Participant.ID.sub - NEXT generated ID minus experimentID
* Condition - Name given to Study within this script
* Session - NEXT generated ID experimentID
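
The 90%/10% reassignment of `Alg.label` described above could look roughly like the sketch below. This is an illustration under assumptions, not the procedure actually used: the toy `responses` table, the "Train" label, the random row-level split, and the seed are all made up for the example.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Toy stand-in for the triplet response table (values are made up).
responses <- data.frame(
  Center = sprintf("img%03d", 1:200),
  Alg.label = "Test",
  stringsAsFactors = FALSE
)

# Randomly pick 10% of rows to stay "Test"; relabel the rest "Train".
n <- nrow(responses)
test_rows <- sample(n, size = round(0.10 * n))
responses$Alg.label <- "Train"
responses$Alg.label[test_rows] <- "Test"

print(table(responses$Alg.label))
```

Splitting by row is only one option; a split by participant or session would keep each worker's judgments on one side of the hold-out boundary.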

#### Setting CRAN to load packages (might be necessary depending on how Jupyter is loaded)

In [82]:
options(repos=structure(c(CRAN="https://rweb.crmda.ku.edu/cran/")))

#### Loading packages used in analysis, plots, data management

In [83]:
# Install each package only if it cannot already be loaded
if (!require("tidyverse")) {  # used for data wrangling
  install.packages("tidyverse")
}

if (!require("readtext")) {
  install.packages("readtext", dependencies = TRUE)
}

library("tidyverse")
library("readtext") 

# <font color='BLUE'>Bringing in NEXT Draw-A-Person (DAP) Data</font>

#### Setting R to file where Embedding data is stored

In [84]:
#PC - must map network Z drive to rogerslab.drive.wisc.edu first
#setwd("Z:/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2019_11_05stub.yamlStyle_DAP_triplets/1")

#Mac - must connect to rogerslab.drive.wisc.edu first
#setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2019_11_05stub.yamlStyle_DAP_triplets/1")
setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2020_10_27stub.yamlStyle_DAP_triplets/1")

#laptop
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/NEXTdrawingsData/TripletEmbeddings/2019_11_05stub.yamlStyle_DAP_triplets/1")
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/NEXTdrawingsData/TripletEmbeddings/2020_10_27stub.yamlStyle_DAP_triplets/1")

In [85]:
Server_data <- list.files(pattern=".", full.names = TRUE)

In [86]:
Server_data

In [87]:
loss <- read.csv("loss.csv", header=FALSE)
model <- read.csv("model.csv", header=FALSE)
labels <- read.delim("labels.txt", header = FALSE, sep = "\t", dec = ".")

In [88]:
loss
head(labels)
head(model)

V1,V2
<dbl>,<dbl>
0.2694422,0.3009471


V1
<fct>
DAM001_F_DAPedit
DAM001_P_DAPedit
DAM001_T_DAPedit
DAM002_F_DAPedit
DAM002_P_DAPedit
DAM002_T_DAPedit


V1,V2
<dbl>,<dbl>
-0.26757699,-0.12373534
-0.02132229,-0.94671753
-0.50324862,-0.07551372
-0.45441354,-0.43675412
-0.54208839,-0.73267946
-0.2809104,-0.37405575


In [89]:
labels <- labels %>% rename(image = V1)

In [90]:
model <- model %>% rename(DAP_x = V1, DAP_y = V2)

In [91]:
DAP_NEXT_TripletsXYdata <- cbind(labels,model)

#### Checking data file

In [92]:
head(DAP_NEXT_TripletsXYdata)

image,DAP_x,DAP_y
<fct>,<dbl>,<dbl>
DAM001_F_DAPedit,-0.26757699,-0.12373534
DAM001_P_DAPedit,-0.02132229,-0.94671753
DAM001_T_DAPedit,-0.50324862,-0.07551372
DAM002_F_DAPedit,-0.45441354,-0.43675412
DAM002_P_DAPedit,-0.54208839,-0.73267946
DAM002_T_DAPedit,-0.2809104,-0.37405575
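
Because `cbind()` pairs `labels` and `model` purely by row position, it can help to confirm that the two inputs line up before combining them. The sketch below is a hedged illustration with toy data copied from the first two rows shown above; the checks are an addition, not part of the original script.

```r
# Toy stand-ins for labels.txt and model.csv (first two rows from the output above).
labels <- data.frame(image = c("DAM001_F_DAPedit", "DAM001_P_DAPedit"),
                     stringsAsFactors = FALSE)
model  <- data.frame(DAP_x = c(-0.26757699, -0.02132229),
                     DAP_y = c(-0.12373534, -0.94671753))

# cbind() matches rows by position only, so stop if the row counts differ
# or if any label appears more than once.
stopifnot(nrow(labels) == nrow(model))
stopifnot(!anyDuplicated(labels$image))

xy <- cbind(labels, model)
print(xy)
```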


#### Fixing image IDs for a few labels that were missing their media type or had a '-' instead of a '_', which led NEXT to insert '%20'

In [93]:
DAP_NEXT_TripletsXYdata$image<- str_replace(DAP_NEXT_TripletsXYdata$image, "DAMa015_DAPedit", "DAMa015_T_DAPedit")
DAP_NEXT_TripletsXYdata$image<- str_replace(DAP_NEXT_TripletsXYdata$image, "DAM3D026_DAPedit", "DAM3D026_P_DAPedit")
DAP_NEXT_TripletsXYdata$image<- str_replace(DAP_NEXT_TripletsXYdata$image, "DAM3D058_P%20_DAPedit", "DAM3D058_P_DAPedit")
DAP_NEXT_TripletsXYdata$image<- str_replace(DAP_NEXT_TripletsXYdata$image, "DAM3D060_P%20_DAPedit", "DAM3D060_P_DAPedit")
DAP_NEXT_TripletsXYdata$image<- str_replace(DAP_NEXT_TripletsXYdata$image, "DAM3D069_P%20_DAPedit", "DAM3D069_P_DAPedit")

In [94]:
#Removing image that is blank
DAP_NEXT_TripletsXYdata<- DAP_NEXT_TripletsXYdata %>%
  filter(!((image == "DAM072_P_DAPedit")))
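
The label corrections above can also be expressed as a lookup table, which keeps the old/new pairs in one place and replaces only exact matches. This is a minimal base R sketch with two of the pairs from the cell above; `fixes` and the toy `image` vector are hypothetical names and data for illustration.

```r
# Lookup table: names are the malformed labels, values are the corrected ones.
fixes <- c(
  "DAMa015_DAPedit"       = "DAMa015_T_DAPedit",
  "DAM3D058_P%20_DAPedit" = "DAM3D058_P_DAPedit"
)

image <- c("DAM001_F_DAPedit", "DAMa015_DAPedit", "DAM3D058_P%20_DAPedit")

# Replace only exact matches, leaving every other label untouched.
hit <- image %in% names(fixes)
image[hit] <- fixes[image[hit]]
image <- unname(image)

# After the fix, every label should split into exactly three "_" pieces.
n_pieces <- lengths(strsplit(image, "_", fixed = TRUE))
print(image)
print(n_pieces)
```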

In [95]:
# Split the image label on underscores into id, media type, and image label
# (separate() splits a single column into several)
DAP_NEXT_TripletsXYdata <- separate(DAP_NEXT_TripletsXYdata, image, 
                            c("id", "DAP_mediaType","imageLabel"), sep = "_", remove = TRUE, convert = FALSE)

In [96]:
DAP_NEXT_TripletsXYdata <- select(DAP_NEXT_TripletsXYdata, id, DAP_mediaType, DAP_x, DAP_y)

In [113]:
head(DAP_NEXT_TripletsXYdata)

id,DAP_mediaType,DAP_x,DAP_y
<chr>,<chr>,<dbl>,<dbl>
DAM001,F,-0.26757699,-0.12373534
DAM001,P,-0.02132229,-0.94671753
DAM001,T,-0.50324862,-0.07551372
DAM002,F,-0.45441354,-0.43675412
DAM002,P,-0.54208839,-0.73267946
DAM002,T,-0.28091040,-0.37405575
DAM003,F,-0.35753167,-0.19157602
DAM003,P,-0.18081694,-0.35677236
DAM003,T,-0.31257299,-0.14542227
DAM004,F,-0.29255056,0.05899558


In [98]:
# How many unique participants are there per media type?
aggregate(id ~ DAP_mediaType, data=DAP_NEXT_TripletsXYdata, FUN=function(x) length(unique(x)))

DAP_mediaType,id
<chr>,<int>
F,102
P,89
T,114
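
The per-medium counts differ, so some participants do not have a drawing in every medium. A quick way to list them is sketched below in base R with made-up toy rows; `incomplete` is a hypothetical name. Study 3 children drew only one person, so they would be expected to appear in such a list.

```r
# Toy data: DAM001 has all three media, DAM002 is missing the tablet-stylus drawing.
xy <- data.frame(
  id = c("DAM001", "DAM001", "DAM001", "DAM002", "DAM002"),
  DAP_mediaType = c("F", "P", "T", "F", "P"),
  stringsAsFactors = FALSE
)

# Cross-tabulate id by media type, then flag ids lacking at least one medium.
tab <- table(xy$id, xy$DAP_mediaType)
incomplete <- rownames(tab)[rowSums(tab > 0) < ncol(tab)]
print(incomplete)
```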


#### Export data file

In [99]:
#PC - must map network Z drive to rogerslab.drive.wisc.edu first

#Mac - must connect to rogerslab.drive.wisc.edu first
setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/EmbeddingData")

#laptop
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/Data/NEXTdrawingsData/EmbeddingData")

In [100]:
write.table(DAP_NEXT_TripletsXYdata, "DAP_NEXT_TripletsXYdata_2020_10_27.csv", sep=",", col.names=TRUE, row.names=FALSE)

# <font color='blue'>Bringing in NEXT Cube Data</font>

#### Setting R to file where Embedding data is stored

In [101]:
#PC - must map network Z drive to rogerslab.drive.wisc.edu first
#setwd("Z:/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2019_10_30stub.yamlStyle_CUBE_triplets/1")

#Mac - must connect to rogerslab.drive.wisc.edu first
setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2019_10_30stub.yamlStyle_CUBE_triplets/1")

#laptop
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/Data/NEXTdrawingsData/TripletEmbeddings/2019_10_30stub.yamlStyle_CUBE_triplets/1")

In [102]:
Server_data <- list.files(pattern=".", full.names = TRUE)

In [103]:
Server_data

In [104]:
loss <- read.csv("loss.csv", header=FALSE)
labels <- read.delim("labels.txt", header = FALSE, sep = "\t", dec = ".")
model <- read.csv("model.csv", header=FALSE)

In [105]:
loss
head(labels)
head(model)

V1,V2
<dbl>,<dbl>
0.2643133,0.2824011


V1
<fct>
CUA_3D_Cube_edit
CUD_2D_Cube_edit
DAM3D001_P_CUAedit
DAM3D002_F_CUDedit
DAM3D002_P_CUAedit
DAM3D002_T_CUDedit


V1,V2
<dbl>,<dbl>
-0.4575054,-1.0960024
-0.5322104,-0.85375702
-0.5028356,-0.06529614
-0.5140222,-0.06359127
-0.5326146,-0.43417944
-0.9392444,-0.28619059


In [106]:
labels <- labels %>% rename(image = V1)

In [107]:
model <- model %>% rename(Cube_x = V1, Cube_y = V2)

In [108]:
Cube_NEXT_TripletsXYdata <-  cbind(labels,model)

#### Checking data file

In [109]:
head(Cube_NEXT_TripletsXYdata)

image,Cube_x,Cube_y
<fct>,<dbl>,<dbl>
CUA_3D_Cube_edit,-0.4575054,-1.0960024
CUD_2D_Cube_edit,-0.5322104,-0.85375702
DAM3D001_P_CUAedit,-0.5028356,-0.06529614
DAM3D002_F_CUDedit,-0.5140222,-0.06359127
DAM3D002_P_CUAedit,-0.5326146,-0.43417944
DAM3D002_T_CUDedit,-0.9392444,-0.28619059


In [110]:
# Split the image label on underscores into id, media type, and image label
# (separate() splits a single column into several)
Cube_NEXT_TripletsXYdata <- separate(Cube_NEXT_TripletsXYdata, image, 
                            c("id", "Cube_mediaType","imageLabel"), sep = "_", remove = TRUE, convert = FALSE)

“Expected 3 pieces. Additional pieces discarded in 2 rows [1, 2].”

#### Explanation for Warning message: “Expected 3 pieces. Additional pieces discarded in 2 rows [1, 2].”
The images CUA and CUD are the stimuli shown during the task, not drawings produced by participants.
Their labels therefore differ from the common three-part structure: they contain four underscore-separated pieces (e.g., CUA_3D_Cube_edit).
`separate()` keeps the first three pieces and discards the trailing 'edit'; the output below shows how these rows fit the common structure afterwards.

In [111]:
Cube_NEXT_TripletsXYdata <- select(Cube_NEXT_TripletsXYdata, id, Cube_mediaType, Cube_x, Cube_y)

In [112]:
head(Cube_NEXT_TripletsXYdata)

id,Cube_mediaType,Cube_x,Cube_y
<chr>,<chr>,<dbl>,<dbl>
CUA,3D,-0.4575054,-1.0960024
CUD,2D,-0.5322104,-0.85375702
DAM3D001,P,-0.5028356,-0.06529614
DAM3D002,F,-0.5140222,-0.06359127
DAM3D002,P,-0.5326146,-0.43417944
DAM3D002,T,-0.9392444,-0.28619059
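
The "Expected 3 pieces" warning from `separate()` can be reproduced by counting underscore-separated pieces directly. The base R sketch below uses two labels from the cube data to show that the stimulus label has four pieces while a participant drawing has three; keeping the first three pieces mirrors what `separate()` did here. If the warning is unwanted, tidyr's `separate()` also accepts `extra = "drop"`.

```r
labels <- c("CUA_3D_Cube_edit", "DAM3D001_P_CUAedit")

# Split on literal underscores and count the pieces per label.
pieces <- strsplit(labels, "_", fixed = TRUE)
n_pieces <- lengths(pieces)
print(n_pieces)

# Keep only the first three pieces of each label; any extra piece is dropped.
first3 <- t(vapply(pieces, function(p) p[1:3], character(3)))
colnames(first3) <- c("id", "Cube_mediaType", "imageLabel")
print(first3)
```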


In [114]:
# How many unique participants are there per media type?
aggregate(id ~ Cube_mediaType, data=Cube_NEXT_TripletsXYdata, FUN=function(x) length(unique(x)))

Cube_mediaType,id
<chr>,<int>
2D,1
3D,1
F,51
P,65
T,51


#### Export data file

In [115]:
#PC - must map network Z drive to rogerslab.drive.wisc.edu first

#Mac - must connect to rogerslab.drive.wisc.edu first
setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/EmbeddingData")

#laptop
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/Data/NEXTdrawingsData/EmbeddingData")

In [116]:
#write.csv(Cube_NEXT_TripletsXYdata , file = "Cube_NEXT_TripletsXYdata.csv")

write.table(Cube_NEXT_TripletsXYdata, "Cube_NEXT_TripletsXYdata.csv", sep=",", col.names=TRUE, row.names=FALSE)

# <font color='blue'>  Bringing in NEXT Cylinder (CYL) Data </font>

#### Setting R to file where Embedding data is stored

In [117]:
#PC - must map network Z drive to rogerslab.drive.wisc.edu first
#setwd("Z:/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2019_11_01stub.yamlStyle_CYLINDER_triplets/1")

#Mac - must connect to rogerslab.drive.wisc.edu first
setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/TripletEmbeddings/2019_11_01stub.yamlStyle_CYLINDER_triplets/1")

#laptop
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/Data/NEXTdrawingsData/TripletEmbeddings/2019_11_01stub.yamlStyle_CYLINDER_triplets/1")

In [118]:
Server_data <- list.files(pattern=".", full.names = TRUE)

In [119]:
Server_data

In [120]:
loss <- read.csv("loss.csv", header=FALSE)
model <- read.csv("model.csv", header=FALSE)
labels <- read.delim("labels.txt", header = FALSE, sep = "\t", dec = ".")

In [122]:
loss
head(labels)
head(model)

V1,V2
<dbl>,<dbl>
0.2550747,0.249248


V1
<fct>
CYA_3D_Cylinder_edit
CYD_2D_Cylinder_edit
DAM3D001_F_CYAedit
DAM3D001_P_CYDedit
DAM3D001_T_CYAedit
DAM3D002_F_CYAedit


V1,V2
<dbl>,<dbl>
-0.16907949,-0.8764045
0.02231006,-0.9875438
-0.68576273,-0.1298833
-0.27881001,-0.1230406
-0.41527247,-0.1129261
-0.4927101,-0.3213007


In [123]:
labels <- labels %>% rename(image = V1)

In [124]:
model <- model %>% rename(CYL_x = V1, CYL_y = V2)

In [125]:
CYL_NEXT_TripletsXYdata <- cbind(labels,model)

#### Checking data file

In [126]:
head(CYL_NEXT_TripletsXYdata)

image,CYL_x,CYL_y
<fct>,<dbl>,<dbl>
CYA_3D_Cylinder_edit,-0.16907949,-0.8764045
CYD_2D_Cylinder_edit,0.02231006,-0.9875438
DAM3D001_F_CYAedit,-0.68576273,-0.1298833
DAM3D001_P_CYDedit,-0.27881001,-0.1230406
DAM3D001_T_CYAedit,-0.41527247,-0.1129261
DAM3D002_F_CYAedit,-0.4927101,-0.3213007


In [127]:
# Split the image label on underscores into id, media type, and image label
# (separate() splits a single column into several)
CYL_NEXT_TripletsXYdata <- separate(CYL_NEXT_TripletsXYdata, image, 
                            c("id", "CYL_mediaType","imageLabel"), sep = "_", remove = TRUE, convert = FALSE)

“Expected 3 pieces. Additional pieces discarded in 2 rows [1, 2].”

#### Explanation for Warning message: “Expected 3 pieces. Additional pieces discarded in 2 rows [1, 2].”
The images CYA and CYD are the stimuli shown during the task, not drawings produced by participants.
Their labels therefore differ from the common three-part structure: they contain four underscore-separated pieces (e.g., CYA_3D_Cylinder_edit).
`separate()` keeps the first three pieces and discards the trailing 'edit'; the output below shows how these rows fit the common structure afterwards.

In [128]:
CYL_NEXT_TripletsXYdata <- select(CYL_NEXT_TripletsXYdata, id, CYL_mediaType, CYL_x, CYL_y)

In [129]:
head(CYL_NEXT_TripletsXYdata)

id,CYL_mediaType,CYL_x,CYL_y
<chr>,<chr>,<dbl>,<dbl>
CYA,3D,-0.16907949,-0.8764045
CYD,2D,0.02231006,-0.9875438
DAM3D001,F,-0.68576273,-0.1298833
DAM3D001,P,-0.27881001,-0.1230406
DAM3D001,T,-0.41527247,-0.1129261
DAM3D002,F,-0.4927101,-0.3213007


In [130]:
# How many unique participants are there per media type?
aggregate(id ~ CYL_mediaType, data=CYL_NEXT_TripletsXYdata, FUN=function(x) length(unique(x)))

CYL_mediaType,id
<chr>,<int>
2D,1
3D,1
F,59
P,65
T,56


#### Export data file

In [131]:
#PC - must map network Z drive to rogerslab.drive.wisc.edu first

#Mac - must connect to rogerslab.drive.wisc.edu first
setwd("/Volumes/rogerslab/NEXT/NEXTdrawings/Data/NEXTdrawingsData/EmbeddingData")

#laptop
#setwd("/Users/clintjensen/NEXT/NEXTdrawingsDownloads/Data/NEXTdrawingsData/EmbeddingData")

In [132]:
#write.csv(CYL_NEXT_TripletsXYdata , file = "CYL_NEXT_TripletsXYdata.csv")

write.table(CYL_NEXT_TripletsXYdata, "CYL_NEXT_TripletsXYdata.csv", sep=",", col.names=TRUE, row.names=FALSE)
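
The DAP, Cube, and Cylinder sections repeat the same steps: read `labels.txt` and `model.csv`, split the label, and name the columns with a prefix. One possible way to factor that out is sketched below in base R. `assemble_embedding` is a hypothetical helper, not part of the original notebook; it assumes a two-column `model.csv`, and the demo runs on a temporary directory with made-up values rather than the lab's network drive.

```r
# Hypothetical helper wrapping the steps repeated for DAP, Cube, and Cylinder.
# `dir` holds labels.txt and model.csv; `prefix` names the output columns.
assemble_embedding <- function(dir, prefix) {
  labels <- read.delim(file.path(dir, "labels.txt"), header = FALSE,
                       sep = "\t", stringsAsFactors = FALSE)
  model  <- read.csv(file.path(dir, "model.csv"), header = FALSE)
  stopifnot(nrow(labels) == nrow(model))

  # Split each label on "_" and keep the first two pieces (id, media type).
  pieces <- strsplit(labels$V1, "_", fixed = TRUE)
  out <- data.frame(
    id        = vapply(pieces, `[`, character(1), 1),
    mediaType = vapply(pieces, `[`, character(1), 2),
    x         = model$V1,
    y         = model$V2,
    stringsAsFactors = FALSE
  )
  names(out)[2:4] <- paste0(prefix, c("_mediaType", "_x", "_y"))
  out
}

# Demo on a temporary directory with two made-up rows.
demo_dir <- tempfile("embedding_")
dir.create(demo_dir)
writeLines(c("CYA_3D_Cylinder_edit", "DAM3D001_F_CYAedit"),
           file.path(demo_dir, "labels.txt"))
writeLines(c("-0.169,-0.876", "-0.686,-0.130"),
           file.path(demo_dir, "model.csv"))

cyl <- assemble_embedding(demo_dir, "CYL")
print(cyl)
```

The label fixes and blank-image removal in the DAP section are specific to that dataset and would still need to be applied separately.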