In [1]:
import pandas as pd

# Idea

The test data is a bunch of events (which may include assessments) followed by the start of an assessement, for which we have to predict the number of attempts. 

The train data has a bunch of events (which may include assessments) and we have the full data for all of the assessments.

This means that each observation is a collection of events up to the start of an assessment, and the labels are calculated by what happens during that assessment. So each installation id in train gives us a bunch of data points (one for each assessment).

We should drop data from installation_ids which do not have assessments.

In [2]:
train = pd.read_pickle('../data/processed/train.pkl')
train.head(3)

Unnamed: 0,event_id,game_session,timestamp,event_data,installation_id,event_count,event_code,game_time,title,type,world
4152505,27253bdc,5f9ff9bf9350a7ef,2019-07-23T02:12:17.279Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Welcome to Lost Lagoon!,Clip,NONE
4152506,27253bdc,9fc66af070776a8d,2019-07-23T02:12:53.427Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Tree Top City - Level 1,Clip,TREETOPCITY
4152507,27253bdc,0f1889a6816bd427,2019-07-23T02:14:27.038Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Ordering Spheres,Clip,TREETOPCITY


We want to be able to recreate this information ourselves... this is because we're eventually going to have to calculate it for stuff in the test data.

In [14]:
labels = pd.read_csv('../data/raw/train_labels.csv')
labels.query("installation_id == '0006a69f'")

Unnamed: 0,game_session,installation_id,title,num_correct,num_incorrect,accuracy,accuracy_group
0,6bdf9623adc94d89,0006a69f,Mushroom Sorter (Assessment),1,0,1.0,3
1,77b8ee947eb84b4e,0006a69f,Bird Measurer (Assessment),0,11,0.0,0
2,901acc108f55a5a1,0006a69f,Mushroom Sorter (Assessment),1,0,1.0,3
3,9501794defd84e4d,0006a69f,Mushroom Sorter (Assessment),1,1,0.5,2
4,a9ef3ecb3d1acc6a,0006a69f,Bird Measurer (Assessment),1,0,1.0,3


In [4]:
labels.query("game_session == 'a9ef3ecb3d1acc6a'")

Unnamed: 0,game_session,installation_id,title,num_correct,num_incorrect,accuracy,accuracy_group
4,a9ef3ecb3d1acc6a,0006a69f,Bird Measurer (Assessment),1,0,1.0,3


For all assessments except bird measurer, attempts are captured with code 4100. Bird measurer uses 4110. The corresponding event_data contains one of 

- "correct":true
- "correct":false

to indicate whether the attempt was successful.

However, Bird measurer also has "correct":true and stuff in code 4100 so we must remove it.

Below is an example of all attempts by one particular installation. Pre-calculated labels for it are shown above.

In [8]:
train.query("installation_id == '0006a69f' and type == 'Assessment' and event_code in [4100, 4110]")

Unnamed: 0,event_id,game_session,timestamp,event_data,installation_id,event_count,event_code,game_time,title,type,world,attempt,correct
2228,25fa8af4,901acc108f55a5a1,2019-08-06T05:22:32.357Z,"{""correct"":true,""stumps"":[1,2,4],""event_count""...",0006a69f,44,4100,31011,Mushroom Sorter (Assessment),Assessment,TREETOPCITY,True,True
2709,17113b36,77b8ee947eb84b4e,2019-08-06T05:35:54.898Z,"{""correct"":false,""caterpillars"":[11,8,3],""even...",0006a69f,29,4110,35771,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2715,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:01.927Z,"{""correct"":false,""caterpillars"":[11,8,11],""eve...",0006a69f,35,4110,42805,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2720,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:06.512Z,"{""correct"":false,""caterpillars"":[11,8,5],""even...",0006a69f,40,4110,47388,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2725,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:09.739Z,"{""correct"":false,""caterpillars"":[11,8,7],""even...",0006a69f,45,4110,50605,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2730,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:13.951Z,"{""correct"":false,""caterpillars"":[11,8,4],""even...",0006a69f,50,4110,54822,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2733,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:17.407Z,"{""correct"":false,""caterpillars"":[11,8,4],""even...",0006a69f,53,4110,58280,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2738,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:21.390Z,"{""correct"":false,""caterpillars"":[11,8,2],""even...",0006a69f,58,4110,62256,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2743,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:26.296Z,"{""correct"":false,""caterpillars"":[11,8,1],""even...",0006a69f,63,4110,67164,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2750,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:32.187Z,"{""correct"":false,""caterpillars"":[11,8,1],""even...",0006a69f,70,4110,73056,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False


Add binary column specifying whether event is an attempt or not

In [15]:
train['attempt'] = train.type == "Assessment" & ((train.event_code == 4100) & (train.title != 'Bird Measurer (Assessment)') | ((train.event_code == 4110) & (train.title == 'Bird Measurer (Assessment)')))
train.head(2)

TypeError: cannot compare a dtyped [bool] array with a scalar of type [bool]

Add binary column specifying whether event is correct. True means success, false means fail, and NaN on all other rows.

In [7]:
train.loc[train.attempt, 'correct'] = train.loc[train.attempt].event_data.str.contains('"correct":true')
train

Unnamed: 0,event_id,game_session,timestamp,event_data,installation_id,event_count,event_code,game_time,title,type,world,attempt,correct
4152505,27253bdc,5f9ff9bf9350a7ef,2019-07-23T02:12:17.279Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Welcome to Lost Lagoon!,Clip,NONE,False,
4152506,27253bdc,9fc66af070776a8d,2019-07-23T02:12:53.427Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Tree Top City - Level 1,Clip,TREETOPCITY,False,
4152507,27253bdc,0f1889a6816bd427,2019-07-23T02:14:27.038Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Ordering Spheres,Clip,TREETOPCITY,False,
4152508,27253bdc,eb119170f7cf8826,2019-07-23T02:37:51.719Z,"{""event_code"": 2000, ""event_count"": 1}",5b826029,1,2000,0,Welcome to Lost Lagoon!,Clip,NONE,False,
3926440,27253bdc,be15f0d9402d5900,2019-07-23T14:38:25.256Z,"{""event_code"": 2000, ""event_count"": 1}",55ef8814,1,2000,0,Welcome to Lost Lagoon!,Clip,NONE,False,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9850786,363c86c9,55f34118ea63aa9c,2019-10-22T17:41:43.854Z,"{""source"":""resources"",""coordinates"":{""x"":774,""...",dbecd01e,37,4035,33581,Bug Measurer (Activity),Activity,TREETOPCITY,False,
9850787,0a08139c,55f34118ea63aa9c,2019-10-22T17:41:43.855Z,"{""description"":""Let's put this bug back where ...",dbecd01e,38,3010,33581,Bug Measurer (Activity),Activity,TREETOPCITY,False,
9850788,e79f3763,55f34118ea63aa9c,2019-10-22T17:41:44.052Z,"{""bug"":""grassHopper"",""source"":""resources"",""coo...",dbecd01e,39,4030,33741,Bug Measurer (Activity),Activity,TREETOPCITY,False,
9850789,363c86c9,55f34118ea63aa9c,2019-10-22T17:41:45.786Z,"{""source"":""resources"",""coordinates"":{""x"":527,""...",dbecd01e,40,4035,35558,Bug Measurer (Activity),Activity,TREETOPCITY,False,


In [12]:
train.query("installation_id == '0006a69f' and attempt")

Unnamed: 0,event_id,game_session,timestamp,event_data,installation_id,event_count,event_code,game_time,title,type,world,attempt,correct
2228,25fa8af4,901acc108f55a5a1,2019-08-06T05:22:32.357Z,"{""correct"":true,""stumps"":[1,2,4],""event_count""...",0006a69f,44,4100,31011,Mushroom Sorter (Assessment),Assessment,TREETOPCITY,True,True
2308,14de4c5d,80d34a30c2998653,2019-08-06T05:24:50.323Z,"{""distance"":10,""target_distances"":[5,6,7,8,9,1...",0006a69f,76,4100,114370,Air Show,Game,TREETOPCITY,True,True
2335,14de4c5d,80d34a30c2998653,2019-08-06T05:25:11.292Z,"{""distance"":9,""target_distances"":[5,6,7],""corr...",0006a69f,103,4100,135341,Air Show,Game,TREETOPCITY,True,False
2375,14de4c5d,80d34a30c2998653,2019-08-06T05:25:37.207Z,"{""distance"":3,""target_distances"":[5,6,7],""corr...",0006a69f,143,4100,161258,Air Show,Game,TREETOPCITY,True,False
2409,14de4c5d,80d34a30c2998653,2019-08-06T05:26:01.055Z,"{""distance"":8,""target_distances"":[5,6,7],""corr...",0006a69f,177,4100,185103,Air Show,Game,TREETOPCITY,True,False
2709,17113b36,77b8ee947eb84b4e,2019-08-06T05:35:54.898Z,"{""correct"":false,""caterpillars"":[11,8,3],""even...",0006a69f,29,4110,35771,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2715,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:01.927Z,"{""correct"":false,""caterpillars"":[11,8,11],""eve...",0006a69f,35,4110,42805,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2720,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:06.512Z,"{""correct"":false,""caterpillars"":[11,8,5],""even...",0006a69f,40,4110,47388,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2725,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:09.739Z,"{""correct"":false,""caterpillars"":[11,8,7],""even...",0006a69f,45,4110,50605,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
2730,17113b36,77b8ee947eb84b4e,2019-08-06T05:36:13.951Z,"{""correct"":false,""caterpillars"":[11,8,4],""even...",0006a69f,50,4110,54822,Bird Measurer (Assessment),Assessment,TREETOPCITY,True,False
