## Seeking the unicorn datasets

*Principal Investigator / Contact person:*

Edgar J. Andrade-Lotero, Universidad del Rosario, Bogotá, Colombia, edgar.andrade@urosario.edu.co

*In collaboration with:*

Robert L. Goldstone, Indiana University, Bloomington, Indiana, USA, rgoldsto@indiana.edu

*Date of data collection:* From 2017-11-01 to 2017-11-15

*Location of data collection:* Bloomington, Indiana, USA

*Publications using data described here:* 

[1] Andrade-Lotero, E., & Goldstone, R. L. (2019). Self-Organized Division of Cognitive Labor. In Proceedings of the 41st Annual Conference of the Cognitive Science Society (pp. 91-97). Montreal, Canada: Cognitive Science Society. <a href="https://cogsci.mindmodeling.org/2019/papers/0038/index.html">(pdf)</a>

[2] Andrade-Lotero, E., & Goldstone, R. L. (2021). Self-Organized Division of Cognitive Labor. To appear in PLOS ONE.

*Datasets described here:*

* <a href="https://github.com/EAndrade-Lotero/SODCL/blob/master/Data/performances.csv">performances.csv</a>
* <a href="https://github.com/EAndrade-Lotero/SODCL/blob/master/Data/humans_only_absent.csv">humans_only_absent.csv</a>


### performances.csv

*Filename:* performances.csv

*Location:* https://github.com/EAndrade-Lotero/SODCL/blob/master/Data/performances.csv

Behavioral data of 45 dyads playing the <a href="https://www.protocols.io/view/seeking-the-unicorn-bts9nnh6">“Seek the unicorn” experiment</a>. 

*Method of data collection:* The experiment was run on the nodeGame platform; its implementation is freely available <a href="https://github.com/Slendercoder/Seeking_the_unicorn">here</a>.

*Methods of data processing:* Dataset generated from the raw data by a Python script that merges the multiple JSON files output by nodeGame into a single CSV file.



```python
import pandas as pd

data = pd.read_csv('Data/performances.csv')
data.head()
```

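| | Dyad | Round | Player | Answer | Time | a11 | a12 | a13 | a14 | a15 | ... | b84 | b85 | b86 | b87 | b88 | Score | Joint | Is_there | where_x | where_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 303-869 | 1 | 303-869PL1 | Absent | 126606 | 1 | 1 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 30 | 25 | 17 | 15 | Unicorn_Absent | -1 | -1 |
| 1 | 303-869 | 1 | 303-869PL2 | Absent | 46295 | 0 | 0 | 0 | 1 | 0 | ... | 32 | 0 | 0 | 0 | 0 | 17 | 15 | Unicorn_Absent | -1 | -1 |
| 2 | 303-869 | 2 | 303-869PL1 | Present | 46056 | 1 | 0 | 0 | 1 | 1 | ... | 30 | 25 | 26 | 27 | 2 | 30 | 2 | Unicorn_Present | 2 | 7 |
| 3 | 303-869 | 2 | 303-869PL2 | Present | 44503 | 0 | 0 | 0 | 1 | 0 | ... | 4 | 0 | 0 | 0 | 0 | 30 | 2 | Unicorn_Present | 2 | 7 |
| 4 | 303-869 | 3 | 303-869PL1 | Present | 13585 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 32 | 0 | Unicorn_Present | 0 | 7 |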


```python
data.shape
```

```
(5400, 138)
```

```python
data.columns
```

```
Index(['Dyad', 'Round', 'Player', 'Answer', 'Time', 'a11', 'a12', 'a13', 'a14',
       'a15',
       ...
       'b84', 'b85', 'b86', 'b87', 'b88', 'Score', 'Joint', 'Is_there',
       'where_x', 'where_y'],
      dtype='object', length=138)
```

*Number of variables:* 138

*Number of rows:* 5400
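
The count of 138 follows from the naming scheme in the variable list: 5 identifier and timing columns, 64 `a` columns and 64 `b` columns (one per tile of the 8×8 grid, row by row), and 5 outcome columns. The sketch below rebuilds the expected column names, assuming the file keeps the order shown by `data.columns` above; the variable names in it are illustrative only.

```python
# Expected column layout of performances.csv, rebuilt from the naming scheme.
ids = ["Dyad", "Round", "Player", "Answer", "Time"]
tiles = [f"{r}{c}" for r in range(1, 9) for c in range(1, 9)]  # "11", "12", ..., "88"
visited_cols = ["a" + t for t in tiles]  # a11 ... a88
order_cols = ["b" + t for t in tiles]    # b11 ... b88
outcomes = ["Score", "Joint", "Is_there", "where_x", "where_y"]

expected_columns = ids + visited_cols + order_cols + outcomes
print(len(expected_columns))             # 138
print(expected_columns.index("Score"))   # 133
```

Comparing `expected_columns` with `list(data.columns)` is a quick way to check that a downloaded copy of the file has the documented layout.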

*Variables:*

* Dyad: dyad identifier, built from the participant numbers that nodeGame assigns to the two players (e.g., `303-869`).
* Round: round number within the experiment.
* Player: player identifier, formed by the dyad identifier followed by `PL1` or `PL2` (e.g., `303-869PL1`).
* Answer: answer submitted in the round, either “Absent” or “Present”.
* Time: time, in milliseconds, from the start of the round until the player finished it.
* a11 to a88: whether the tile was visited (1) or not (0); the first digit is the row and the second the column of the 8×8 grid. For example, if a34=1, the tile in the third row, fourth column was visited during the round.
* b11 to b88: order in which the tile was visited. For example, if b34=5, the tile in the third row, fourth column was the fourth tile to be visited during the round.
* Score: score of the round.
* Joint: number of overlapping tiles, that is, tiles visited by both players during the round.
* Is_there: whether a unicorn was present in the round (`Unicorn_Present` or `Unicorn_Absent`).
* where_x: column where the unicorn was placed, or -1 if the unicorn was absent.
* where_y: row where the unicorn was placed, or -1 if the unicorn was absent.
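
Because each row stores the grid as 64 flat `a` columns and 64 flat `b` columns, it is often convenient to reshape them. The following is a minimal sketch, not part of the published scripts: the helpers `visited_grid`, `visit_sequence` and `joint_tiles` are hypothetical, `visit_sequence` assumes that an unvisited tile has a `b` value of 0, and the two example rows are synthetic.

```python
import numpy as np
import pandas as pd

# All (row, column) tiles of the 8x8 grid, in the same row-major order as a11 ... a88.
TILES = [(r, c) for r in range(1, 9) for c in range(1, 9)]


def visited_grid(row: pd.Series) -> np.ndarray:
    """8x8 array of 0/1 values; entry [r - 1, c - 1] holds column a{r}{c}."""
    cols = [f"a{r}{c}" for r, c in TILES]
    return row[cols].to_numpy(dtype=int).reshape(8, 8)


def visit_sequence(row: pd.Series) -> list:
    """(row, column) tiles sorted by their b value; assumes b == 0 marks an unvisited tile."""
    order = {(r, c): int(row[f"b{r}{c}"]) for r, c in TILES}
    return sorted((tile for tile, k in order.items() if k > 0), key=order.get)


def joint_tiles(row_pl1: pd.Series, row_pl2: pd.Series) -> int:
    """Number of tiles visited by both players in the same round (cf. the Joint column)."""
    return int((visited_grid(row_pl1) & visited_grid(row_pl2)).sum())


# Synthetic rows for illustration only; they are not taken from performances.csv.
columns = [f"{p}{r}{c}" for p in "ab" for r, c in TILES]
pl1 = pd.Series(0, index=columns)
pl1[["a11", "a34"]] = 1
pl1[["b34", "b11"]] = [2, 3]  # tile (3, 4) visited before tile (1, 1)
pl2 = pd.Series(0, index=columns)
pl2[["a34", "a88"]] = 1
pl2[["b88", "b34"]] = [2, 3]

print(visited_grid(pl1)[2, 3])  # 1 (third row, fourth column)
print(visit_sequence(pl1))      # [(3, 4), (1, 1)]
print(joint_tiles(pl1, pl2))    # 1
```

Applied to the `PL1` and `PL2` rows of one dyad in the same round, `joint_tiles` should reproduce that round's `Joint` value, provided the columns have the meaning described above.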


### humans_only_absent.csv

*Filename:* humans_only_absent.csv

*Location:* https://github.com/EAndrade-Lotero/SODCL/blob/master/Data/humans_only_absent.csv

Behavioral data of 45 dyads playing the <a href="https://www.protocols.io/view/seeking-the-unicorn-bts9nnh6">“Seek the unicorn” experiment</a>, keeping only the rounds in which the unicorn was absent and adding further measures (accumulated score, consistency, difference in consistency between players, DLIndex, maximum similarity to a focal region, and best-fitting focal region).

*Method of data collection:* Processing of `performances.csv`.

*Methods of data processing:* Dataset generated from `performances.csv` by a Python script that computes the measures described in [2].
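
Several of the added columns (`_LAG1`, `LEAD`) are previous-round or next-round copies of a per-round measure. A minimal pandas sketch of how such columns can be built, assuming one row per player and round; it is not the authors' script, the helper `add_lag_lead` is hypothetical, and the toy data are invented. It shifts within each player before discarding unicorn-present rounds, so "previous round" means the immediately preceding round of the experiment; whether the published file follows exactly this convention is an assumption.

```python
import pandas as pd


def add_lag_lead(df: pd.DataFrame) -> pd.DataFrame:
    """Add previous-round (Score_LAG1) and next-round (ScoreLEAD) scores per player."""
    out = df.sort_values(["Player", "Round"]).copy()
    by_player = out.groupby("Player")["Score"]
    out["Score_LAG1"] = by_player.shift(1)   # NaN in each player's first round
    out["ScoreLEAD"] = by_player.shift(-1)   # NaN in each player's last round
    return out


# Toy data for a single hypothetical player; values are illustrative only.
perf = pd.DataFrame({
    "Player": ["111-222PL1"] * 3,
    "Round": [1, 2, 3],
    "Score": [10, 20, 30],
    "Is_there": ["Unicorn_Absent", "Unicorn_Present", "Unicorn_Absent"],
})

derived = add_lag_lead(perf)
# Keep only unicorn-absent rounds, as in humans_only_absent.csv.
absent_only = derived[derived["Is_there"] == "Unicorn_Absent"]
print(absent_only[["Round", "Score", "Score_LAG1", "ScoreLEAD"]])
```

Filtering after shifting keeps the lagged value of round 3 equal to the score of round 2, even though round 2 itself is dropped.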


```python
data = pd.read_csv('Data/humans_only_absent.csv')
data.head()
```

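| | Dyad | Round | Player | Answer | Time | a11 | a12 | a13 | a14 | a15 | ... | Dif_consist | DLIndex | Category | Similarity | Score_LAG1 | Consistency_LEAD1 | Joint_LAG1 | Dif_consist_LAG1 | RegionGo | Similarity_LAG1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 140-615 | 1 | 140-615PL1 | Absent | 41857 | 0 | 0 | 1 | 0 | 0 | ... | NaN | 0.609375 | RS | 0.727273 | NaN | 0.673077 | NaN | NaN | RS | NaN |
| 1 | 140-615 | 9 | 140-615PL1 | Absent | 28215 | 0 | 0 | 0 | 0 | 0 | ... | 0.149167 | 0.828125 | RS | 0.424242 | 18.0 | 0.0 | 14.0 | 0.249215 | NOTHING | 0.727273 |
| 2 | 140-615 | 10 | 140-615PL1 | Absent | 15686 | 0 | 0 | 0 | 0 | 0 | ... | 0.415094 | 0.421875 | NOTHING | 1.0 | 27.0 | 1.0 | 5.0 | 0.149167 | NOTHING | 0.424242 |
| 3 | 140-615 | 11 | 140-615PL1 | Absent | 17704 | 0 | 0 | 0 | 0 | 0 | ... | 0.613636 | 0.53125 | NOTHING | 1.0 | 32.0 | 1.0 | 0.0 | 0.415094 | NOTHING | 1.0 |
| 4 | 140-615 | 39 | 140-615PL1 | Absent | 26626 | 1 | 1 | 1 | 1 | 1 | ... | 0.15843 | 0.546875 | ALL | 1.0 | 22.0 | 0.4375 | 10.0 | 0.684211 | OUT | 1.0 |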


```python
data.shape
```

```
(1244, 154)
```

```python
data.columns[133:]
```

```
Index(['Score', 'Joint', 'Is_there', 'where_x', 'where_y', 'Is_there_LEAD',
       'Ac_Score', 'ScoreLEAD', 'Size_visited', 'Consistency',
       'Total_visited_dyad', 'Dif_consist', 'DLIndex', 'Category',
       'Similarity', 'Score_LAG1', 'Consistency_LEAD1', 'Joint_LAG1',
       'Dif_consist_LAG1', 'RegionGo', 'Similarity_LAG1'],
      dtype='object')
```

*Number of variables:* 154

*Number of rows:* 1244

*Variables:*

* Dyad: dyad identifier, built from the participant numbers that nodeGame assigns to the two players (e.g., `140-615`).
* Round: round number within the experiment.
* Player: player identifier, formed by the dyad identifier followed by `PL1` or `PL2` (e.g., `140-615PL1`).
* Answer: answer submitted in the round, either “Absent” or “Present”.
* Time: time, in milliseconds, from the start of the round until the player finished it.
* a11 to a88: whether the tile was visited (1) or not (0); the first digit is the row and the second the column of the 8×8 grid. For example, if a34=1, the tile in the third row, fourth column was visited during the round.
* b11 to b88: order in which the tile was visited. For example, if b34=5, the tile in the third row, fourth column was the fourth tile to be visited during the round.
* Score: score of the round.
* Joint: number of overlapping tiles, that is, tiles visited by both players during the round.
* Is_there: whether a unicorn was present in the round (`Unicorn_Present` or `Unicorn_Absent`).
* where_x: column where the unicorn was placed, or -1 if the unicorn was absent.
* where_y: row where the unicorn was placed, or -1 if the unicorn was absent.
* Ac_Score: score accumulated up to the round.
* ScoreLEAD: score of the next round.
* Size_visited: number of tiles the player visited during the round.
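* Consistency: tiles uncovered by the player in both Round $n{-}1$ and Round $n$, divided by the tiles uncovered in either of the two rounds: $\text{Consistency}_n = \frac{|V_{n-1} \cap V_n|}{|V_{n-1} \cup V_n|}$, where $V_n$ is the set of tiles the player uncovered in Round $n$.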
* Total_visited_dyad: total number of tiles visited by the two players during the round.
* Dif_consist: difference between the two players' consistency in the round.
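* DLIndex: (tiles uncovered by one or both of the players - overlapping tiles) / tiles in the grid, that is, $\text{DLIndex} = \frac{|A \cup B| - |A \cap B|}{64}$, where $A$ and $B$ are the sets of tiles uncovered by each player during the round and 64 is the number of tiles in the 8×8 grid.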
* Category: focal region that best fits the tiles visited during the round, that is, the focal region with maximum similarity to them.
* Similarity: maximum similarity between the tiles visited during the round and the focal regions, where $\text{sim}(a, b) = \frac{|a \cap b|}{|a \cup b|}$, i.e., the number of tiles in both $a$ and $b$ divided by the number of tiles in $a$ or $b$.
* Score_LAG1: score of the previous round.
* Consistency_LEAD1: consistency of the next round.
* Joint_LAG1: number of overlapping tiles, that is, tiles visited by both players, during the previous round.
* Dif_consist_LAG1: difference between the two players' consistency during the previous round.
* RegionGo: best-fitting focal region for the tiles visited in the next round.
* Similarity_LAG1: maximum similarity between the tiles visited in the previous round and the focal regions.
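
The set-based formulas above (Consistency, DLIndex, and the similarity behind Category and Similarity) can be sketched with Python sets of (row, column) tiles. This is a minimal illustration under stated assumptions, not the script used to build the file: the helper names and the focal regions `LEFT`, `RIGHT` and `NOTHING` are hypothetical (the focal regions actually used are described in [2]), and two empty sets are given similarity 1.0, which is assumed here because it matches the rows above in which `Category` is `NOTHING` and `Similarity` is 1.0.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical (1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consistency(prev_tiles: set, curr_tiles: set) -> float:
    """Consistency of one player between Round n-1 and Round n."""
    return jaccard(prev_tiles, curr_tiles)


def dl_index(tiles_pl1: set, tiles_pl2: set, n_tiles: int = 64) -> float:
    """(Tiles uncovered by one or both players - overlapping tiles) / tiles in the grid."""
    return (len(tiles_pl1 | tiles_pl2) - len(tiles_pl1 & tiles_pl2)) / n_tiles


def best_focal_region(visited: set, regions: dict) -> tuple:
    """Return (Category, Similarity): the focal region most similar to the visited tiles."""
    name = max(regions, key=lambda k: jaccard(visited, regions[k]))
    return name, jaccard(visited, regions[name])


# Hypothetical focal regions on the 8x8 grid, tiles given as (row, column).
left = {(r, c) for r in range(1, 9) for c in range(1, 5)}
right = {(r, c) for r in range(1, 9) for c in range(5, 9)}
regions = {"LEFT": left, "RIGHT": right, "NOTHING": set()}

pl1 = left - {(1, 1)}  # 31 tiles, all in the left half
pl2 = right            # 32 tiles, all in the right half

print(dl_index(pl1, pl2))                 # 0.984375 (63 / 64)
print(best_focal_region(pl1, regions))    # ('LEFT', 0.96875)
print(best_focal_region(set(), regions))  # ('NOTHING', 1.0)
print(consistency(left, pl1))             # 0.96875
```

With real data, the sets would be built from the `a` columns, e.g. the tiles `(r, c)` for which `a{r}{c}` equals 1 in a given row.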