# Explore the Data

The data format is somewhat complicated. This notebook walks through how everything is laid out.

In [1]:
%pylab inline
import json
import glob

Populating the interactive namespace from numpy and matplotlib


The data is split into files, one for each simultaneous block of games. First, we'll read all of these blocks into a list:

In [2]:
output_dir = "../results-anonymized/experiment/"
files = glob.glob(output_dir+'block_*.json')

blocks = []
for file in files:
    with open(file) as f:
        blocks.append(json.load(f))

len(blocks)

30

Each block has two entries, one for games conducted on the caveman network, and one for games conducted on the dodecahedral network. These are split by network and not by (network * condition) because I originally developed the experiment with just the two information conditions, and added the second network later on. This was the fastest (but maybe not cleanest) way to implement the extra condition.

In [3]:
blocks[-1].keys()

dict_keys(['block_15_dodec_prereg_20200702_170602', 'block_15_caveman_prereg_20200702_170602'])
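
Since each block holds one entry per network, it can help to look entries up by network name rather than by full key. The following is a minimal sketch, assuming every block's keys follow the `block_<n>_<network>_prereg_<timestamp>` pattern seen above (only block 15 is shown here, so this is an assumption). The helper name `split_by_network` and the stand-in dictionary are hypothetical.

```python
def split_by_network(block):
    """Return {'dodec': entry, 'caveman': entry} for one loaded block dict."""
    by_network = {}
    for key, entry in block.items():
        tokens = key.split("_")
        for network in ("dodec", "caveman"):
            # Assumption: the network name appears as its own token in the key.
            if network in tokens:
                by_network[network] = entry
    return by_network


# Tiny stand-in for one element of `blocks`; real entries are much larger,
# and the "stand_in" field is purely illustrative.
example_block = {
    "block_15_dodec_prereg_20200702_170602": {"stand_in": "dodec"},
    "block_15_caveman_prereg_20200702_170602": {"stand_in": "caveman"},
}

networks = split_by_network(example_block)
print(sorted(networks))  # ['caveman', 'dodec']
```

With the real data, `split_by_network(blocks[-1])["dodec"]` would stand in for the longer `blocks[-1]['block_15_dodec_prereg_20200702_170602']` lookups used below, provided the naming assumption holds.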

Within each network condition, we have information about the games played by the 40 players in that condition, such as when they began. I'll describe the interesting keys below:

In [4]:
blocks[-1]['block_15_dodec_prereg_20200702_170602'].keys()

dict_keys(['_id', 'finishedAt', 'gameLobbyId', 'treatmentId', 'roundIds', 'playerIds', 'batchId', 'createdAt', 'data.nodes', 'data.clues', 'data.network', 'data.gameSetupId', 'players', 'gameSetupId', 'log', 'stages'])

**data.nodes** lists which elements were selected to make up the clues:

In [5]:
blocks[-1]['block_15_dodec_prereg_20200702_170602']['data.nodes']

{'CrimeScene_1': 'the Kentwood Mansion',
 'StolenObject_1': 'the antique',
 'Suspect_1': 'Edwards',
 'Suspect_2': 'Stevens',
 'Suspect_3': 'Collins',
 'Clothing_1': 'a pair of overalls',
 'Clothing_2': 'a blue denim jacket',
 'Appearance_1': 'a blonde-haired man',
 'Appearance_2': 'a partially-bald man',
 'Tool_1': 'a serrated knife',
 'Tool_2': 'a set of hex keys',
 'Vehicle_1': 'a silver BMW',
 'Vehicle_2': 'a blue Chevrolet Corvette'}

**data.clues** lists both the treatment clues (id preceded by 't', as in 'tclue_1_2') and the control clues, along with the components that went into building each one.

In [6]:
blocks[-1]['block_15_dodec_prereg_20200702_170602']['data.clues']

{'tclue_1_2': {'id': 'tclue_1_2',
  'nodeNames': ['StolenObject_1', 'CrimeScene_1'],
  'nodes': ['the antique', 'the Kentwood Mansion'],
  'edge': '{StolenObject_1} was kept in a case at {CrimeScene_1}',
  'content': 'The antique was kept in a case at the Kentwood Mansion.'},
 'tclue_1_3': {'id': 'tclue_1_3',
  'nodeNames': ['Suspect_1', 'CrimeScene_1'],
  'nodes': ['Edwards', 'the Kentwood Mansion'],
  'edge': '{Suspect_1} was seen at {CrimeScene_1}',
  'content': 'Edwards was seen at the Kentwood Mansion.'},
 'tclue_1_4': {'id': 'tclue_1_4',
  'nodeNames': ['Suspect_2', 'CrimeScene_1'],
  'nodes': ['Stevens', 'the Kentwood Mansion'],
  'edge': '{Suspect_2} was seen at {CrimeScene_1}',
  'content': 'Stevens was seen at the Kentwood Mansion.'},
 'tclue_1_5': {'id': 'tclue_1_5',
  'nodeNames': ['Suspect_3', 'CrimeScene_1'],
  'nodes': ['Collins', 'the Kentwood Mansion'],
  'edge': '{Suspect_3} was seen at {CrimeScene_1}',
  'content': 'Collins was seen at the Kentwood Mansion.'},
 'tclu
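
The relationship between `edge`, `data.nodes`, and `content` can be sketched as a template substitution. This is an inference from the examples shown above, not the experiment's actual generation code: the hypothetical helper `render_clue` fills the placeholders, capitalizes the first letter, and appends a period. The sample dictionaries are copied from the output above and shortened.

```python
# A few entries copied from data.nodes above.
nodes = {
    "CrimeScene_1": "the Kentwood Mansion",
    "StolenObject_1": "the antique",
    "Suspect_1": "Edwards",
}

# Two clues copied from data.clues above.
clues = {
    "tclue_1_2": {
        "edge": "{StolenObject_1} was kept in a case at {CrimeScene_1}",
        "content": "The antique was kept in a case at the Kentwood Mansion.",
    },
    "tclue_1_3": {
        "edge": "{Suspect_1} was seen at {CrimeScene_1}",
        "content": "Edwards was seen at the Kentwood Mansion.",
    },
}


def render_clue(edge, nodes):
    """Fill an edge template with node values (assumed construction rule)."""
    sentence = edge.format(**nodes)  # unused node names are simply ignored
    return sentence[0].upper() + sentence[1:] + "."


for clue_id, clue in clues.items():
    # Compare the reconstructed sentence against the stored content.
    print(clue_id, render_clue(clue["edge"], nodes) == clue["content"])
```

Both comparisons print `True` for these two clues. Whether the same rule reproduces every clue in the dataset is not something the excerpt above can confirm.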

**data.network** gives the edge list for both conditions, stored as a mapping from each position to the positions of its neighbors.

In [7]:
blocks[-1]['block_15_dodec_prereg_20200702_170602']['data.network']

{'t0': ['t1', 't19', 't10'],
 'c0': ['c1', 'c19', 'c10'],
 't1': ['t0', 't2', 't8'],
 'c1': ['c0', 'c2', 'c8'],
 't2': ['t1', 't3', 't6'],
 'c2': ['c1', 'c3', 'c6'],
 't3': ['t2', 't4', 't19'],
 'c3': ['c2', 'c4', 'c19'],
 't4': ['t3', 't5', 't17'],
 'c4': ['c3', 'c5', 'c17'],
 't5': ['t4', 't6', 't15'],
 'c5': ['c4', 'c6', 'c15'],
 't6': ['t5', 't7', 't2'],
 'c6': ['c5', 'c7', 'c2'],
 't7': ['t6', 't8', 't14'],
 'c7': ['c6', 'c8', 'c14'],
 't8': ['t7', 't9', 't1'],
 'c8': ['c7', 'c9', 'c1'],
 't9': ['t8', 't10', 't13'],
 'c9': ['c8', 'c10', 'c13'],
 't10': ['t9', 't11', 't0'],
 'c10': ['c9', 'c11', 'c0'],
 't11': ['t10', 't12', 't18'],
 'c11': ['c10', 'c12', 'c18'],
 't12': ['t11', 't13', 't16'],
 'c12': ['c11', 'c13', 'c16'],
 't13': ['t12', 't14', 't9'],
 'c13': ['c12', 'c14', 'c9'],
 't14': ['t13', 't15', 't7'],
 'c14': ['c13', 'c15', 'c7'],
 't15': ['t14', 't16', 't5'],
 'c15': ['c14', 'c16', 'c5'],
 't16': ['t15', 't17', 't12'],
 'c16': ['c15', 'c17', 'c12'],
 't17': ['t16', 't18
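
To work with one condition at a time, the mapping can be split on the position prefix. The sketch below assumes that positions starting with 't' belong to the treatment condition and those starting with 'c' to the control condition, by analogy with the clue ids. It uses only the first few entries from the output above, and the helper names are hypothetical.

```python
# First four entries copied from data.network above.
network = {
    "t0": ["t1", "t19", "t10"],
    "c0": ["c1", "c19", "c10"],
    "t1": ["t0", "t2", "t8"],
    "c1": ["c0", "c2", "c8"],
}


def split_conditions(network):
    """Separate positions by prefix: 't' (assumed treatment), 'c' (assumed control)."""
    treatment = {pos: nbrs for pos, nbrs in network.items() if pos.startswith("t")}
    control = {pos: nbrs for pos, nbrs in network.items() if pos.startswith("c")}
    return treatment, control


def is_mirrored(treatment, control):
    """Check that relabeling every 't' position to 'c' reproduces the control network."""
    relabeled = {
        "c" + pos[1:]: ["c" + nbr[1:] for nbr in nbrs]
        for pos, nbrs in treatment.items()
    }
    return relabeled == control


treatment, control = split_conditions(network)
print(sorted(treatment))                 # ['t0', 't1']
print(is_mirrored(treatment, control))   # True
```

In the visible portion of the output, the two conditions share the same structure and every position has three neighbors, so a check like `is_mirrored` is a cheap sanity test before analyzing the full network.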

The **players** entry contains all of the players in the two information conditions, along with specific attributes for each. Keys are their anonymous Empirica-assigned ids.

In [8]:
blocks[-1]['block_15_dodec_prereg_20200702_170602']['players'].keys()

dict_keys(['v8gTpCezwhpuoeC7K', 'EL83DFL5cRm4zbsgL', '3KDanQqAh4KwyX2Ji', 'LQS6QbML6aesu7zC9', 'fBsTCcKmSTpcFiW6a', '4NbXHH25sZgLPyrAR', 'G7Wo2qwXRtPmWoGtK', 'iJAqYtyfwsS4LRpX8', 'PgCy86c2CgL4fEsm2', '2iMW9rPaC6Rd5qNhb', 'tGsPfbW28BscAnCJd', 'QvKGtgGT7LiRGsnhP', 'HvQ3ko4mqriDrFa7L', 'PfimkSPrzSBAgv2tH', 'hu2jpz97peYAh29b6', 'JG5bHQ7wignwEw3Zs', 'PkWEEMC7vkpbYenvB', 'F6NdBEobk9JJh7sFb', 'E5xwTY6BcHGyCNn2s', 'dqhvwQTJEHXBcYdTw', 'pNW9vdB9mTnp96qc7', '9BHnTBJtJa4h6ovnB', 'yzXqSW22si9XFGvS6', 'ToEBti4LASFkNNdpb', '6hw7eEwdWi4JtJN7J', 'A5BJHS5zHufbGyKia', 'aRWEMKP7fGcBijBLm', '8hKCNMoxnjCwFy6wq', 'jKKTdLcy46Adv3SqA', 'E3AxouAMJRTjERD5C', 'DqGvJBfXvWiKdtMyy', 'cSF3Y54iCiXL3M95c', 'QxtGv6BcncYXyG47e', 'ErxrhYTuLaMMnnbe3', 'WqinggFqbpsXrDi2K', 'xJzEZmH688Cs62u5T', '4nwNCTHKtsfegeiSn', 'QL5pMezwZbLjoj3Nv', 'j6hX39CQyJc2redde', 'TDuRbgfBPweBzE4F6'])

Each **player** object gives information about when the player joined the game, how far they progressed through the game, which position they took in the social network, the ids of their social network neighbors, the final state of their player notebooks (i.e., the ids of the clues they have listed as promising leads or dead ends), and their responses to each of the survey questions.

In [9]:
blocks[-1]['block_15_dodec_prereg_20200702_170602']['players']['v8gTpCezwhpuoeC7K']

{'_id': 'v8gTpCezwhpuoeC7K',
 'readyAt': '2020-07-10T18:31:06.323Z',
 'exitStepsDone': ['MakeTheCase', 'ExitSurvey'],
 'exitAt': '2020-07-10T18:43:21.442Z',
 'exitStatus': 'finished',
 'createdAt': '2020-07-10T18:31:05.344Z',
 'data.index': 37,
 'data.position': 't5',
 'data.alterIDs': ['jKKTdLcy46Adv3SqA',
  'DqGvJBfXvWiKdtMyy',
  'E5xwTY6BcHGyCNn2s'],
 'data.log': [],
 'data.activity': 'active',
 'data.notebookOrder': ['promising_leads', 'dead_ends'],
 'data.notebooks': {'promising_leads': {'id': 'promising_leads',
   'title': 'Promising Leads',
   'clueIDs': ['tclue_3_4',
    'tclue_1_10',
    'tclue_7_12',
    'tclue_3_10',
    'tclue_3_7',
    'tclue_10_12',
    'tclue_1_12',
    'tclue_9_10',
    'tclue_5_12',
    'tclue_1_9',
    'tclue_6_12',
    'tclue_4_9',
    'tclue_3_12',
    'tclue_9_12']},
  'dead_ends': {'id': 'dead_ends',
   'title': 'Dead Ends',
   'clueIDs': ['tclue_2_13']}},
 'data.initialState': {'promising_leads': {'id': 'promising_leads',
   'title': 'Promising L
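
As an illustration of reading a player object, the sketch below computes the time between `readyAt` and `exitAt` and counts the clues in each notebook. The player dictionary is a shortened copy of the output above (the promising-leads list is cut to two clues), the timestamp format is inferred from the values shown, and the helper names are hypothetical.

```python
from datetime import datetime

# Shortened copy of the player shown above; only the fields used here are kept.
player = {
    "_id": "v8gTpCezwhpuoeC7K",
    "readyAt": "2020-07-10T18:31:06.323Z",
    "exitAt": "2020-07-10T18:43:21.442Z",
    "data.position": "t5",
    "data.notebooks": {
        "promising_leads": {"clueIDs": ["tclue_3_4", "tclue_1_10"]},  # truncated list
        "dead_ends": {"clueIDs": ["tclue_2_13"]},
    },
}

# Assumption: every timestamp uses this layout with a literal trailing 'Z'.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_from_ready_to_exit(player):
    """Elapsed seconds between the player's readyAt and exitAt timestamps."""
    ready = datetime.strptime(player["readyAt"], TIMESTAMP_FORMAT)
    exited = datetime.strptime(player["exitAt"], TIMESTAMP_FORMAT)
    return (exited - ready).total_seconds()


def notebook_sizes(player):
    """Number of clue ids in each of the player's notebooks."""
    return {
        name: len(notebook["clueIDs"])
        for name, notebook in player["data.notebooks"].items()
    }


print(round(seconds_from_ready_to_exit(player), 3))  # 735.119
print(notebook_sizes(player))  # {'promising_leads': 2, 'dead_ends': 1}
```

Note that keys such as `data.position` contain a literal dot, so they are looked up as a single string rather than as nested dictionaries.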

The **log** lists each event that players perform. The interesting ones are "pickup" events, where the player grabs a clue card and begins to drag it, and "drop" events, where they place it in their notebooks. The source of a drag is the id of the player who exposed the individual to that clue, and the destination is the notebook (and position within that notebook) that the clue was assigned to.

In [10]:
blocks[-1]['block_15_dodec_prereg_20200702_170602']['log'][400:404]

[{'_id': 'QtcbqeMkGnpTaRYqw',
  'playerId': 'ToEBti4LASFkNNdpb',
  'gameId': 'd2N6gipGzAM2B3GPn',
  'roundId': '6msnRJjnJAxC49E3o',
  'stageId': 'PLaYc7WSL6qR47TTs',
  'name': 'drop',
  'jsonData': '{"clue":"cclue_4_8","source":"PkWEEMC7vkpbYenvB","dest":"dead_ends","destIndex":0}',
  'createdAt': '2020-07-10T18:36:10.471Z',
  'data': {'clue': 'cclue_4_8',
   'source': 'PkWEEMC7vkpbYenvB',
   'dest': 'dead_ends',
   'destIndex': 0}},
 {'_id': 'ELe59zEvyJpAsNGud',
  'playerId': '2iMW9rPaC6Rd5qNhb',
  'gameId': 'd2N6gipGzAM2B3GPn',
  'roundId': '6msnRJjnJAxC49E3o',
  'stageId': 'PLaYc7WSL6qR47TTs',
  'name': 'pickup',
  'jsonData': '{"clue":"tclue_1_5","source":"8hKCNMoxnjCwFy6wq"}',
  'createdAt': '2020-07-10T18:36:10.713Z',
  'data': {'clue': 'tclue_1_5', 'source': '8hKCNMoxnjCwFy6wq'}},
 {'_id': 'BwZZi37jX8r4s4mtS',
  'playerId': '9BHnTBJtJa4h6ovnB',
  'gameId': 'd2N6gipGzAM2B3GPn',
  'roundId': '6msnRJjnJAxC49E3o',
  'stageId': 'PLaYc7WSL6qR47TTs',
  'name': 'drop',
  'jsonData': '{"
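
One way to pull the "drop" events out of the log is sketched below. The two sample events are copied from the output above with the id fields omitted. The helper `extract_drops` is hypothetical, and it decodes `jsonData` rather than reading `data`, on the assumption that both carry the same payload, as they do in the events shown.

```python
import json

# Two events copied from the log output above (id fields omitted for brevity).
log = [
    {
        "playerId": "ToEBti4LASFkNNdpb",
        "name": "drop",
        "jsonData": '{"clue":"cclue_4_8","source":"PkWEEMC7vkpbYenvB","dest":"dead_ends","destIndex":0}',
        "createdAt": "2020-07-10T18:36:10.471Z",
    },
    {
        "playerId": "2iMW9rPaC6Rd5qNhb",
        "name": "pickup",
        "jsonData": '{"clue":"tclue_1_5","source":"8hKCNMoxnjCwFy6wq"}',
        "createdAt": "2020-07-10T18:36:10.713Z",
    },
]


def extract_drops(log):
    """Return one (player, clue, source, notebook, index) tuple per drop event."""
    rows = []
    for event in log:
        if event["name"] != "drop":
            continue  # skip pickups and any other event types
        payload = json.loads(event["jsonData"])
        rows.append(
            (
                event["playerId"],
                payload["clue"],
                payload["source"],
                payload["dest"],
                payload["destIndex"],
            )
        )
    return rows


for row in extract_drops(log):
    print(row)
```

Each resulting tuple records who placed which clue, whom they got it from, and where in their notebooks it landed, which is the information needed to trace how a clue spreads between neighbors.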