![image](https://py-rates.fr/assets/welcomePage/logoSmall.png)

# PyratesIA - Feature Engineering

This notebook processes the raw logs generated by Pyrates into feature vectors. The features capture how students interact with Pyrates by computing summary statistics (mean, SD, ...) over every type of behavior that Pyrates logs (e.g., copying from the memo, executing a solution).
- Input: data/ML_data.xlsx (the raw log files)
- Output: A nested dictionary with the game levels as the first keys ("Level1", "Level2"...), feature sets as the second keys ("sum_features_nohelp", "meansd_features_nohelp"), the feature names as the third keys ("CO_avg_while_impl", "CO_avg_string_impl"...), and the computed features as the values.

The purpose of each feature and feature set is documented directly in the code.

The output dictionary is exported as a pickle to "pickle/FEATURES".
Other exported pickles contain the constants used in the project, namely pickle/FEATURES_CONSTANTS, pickle/LABELS_KEY, pickle/FEATURES_SETS_KEY and pickle/LEVELS_KEYS.
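
For orientation, the nested output structure and its pickle export can be sketched as follows. This is a minimal illustration and not the notebook's actual export code: the feature values are placeholders, the helpers `export_features` and `load_features` are hypothetical, and the round trip uses a temporary folder rather than the real "pickle/FEATURES" path.

```python
import pickle
import tempfile
from pathlib import Path

# Placeholder values only: the real values are computed from the raw logs.
features = {
    "Level1": {
        "sum_features_nohelp": {"CO_tot_while_impl": 0.0},
        "meansd_features_nohelp": {
            "CO_avg_while_impl": 0.0,
            "CO_avg_string_impl": 0.0,
        },
    },
}


def export_features(obj, path):
    """Write obj to path as a pickle, creating the parent folder (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(obj, handle)


def load_features(path):
    """Read a pickled object back from path (hypothetical helper)."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Round trip in a temporary folder that mimics the "pickle/FEATURES" layout.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pickle" / "FEATURES"
    export_features(features, target)
    restored = load_features(target)

# Access pattern: level -> feature set -> feature name -> value.
value = restored["Level1"]["meansd_features_nohelp"]["CO_avg_while_impl"]
```

Keeping the level as the outermost key makes it easy to pull out everything needed for one level at a time.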

## 1) Imports

In [33]:
import sys
print(sys.version)
print(sys.path)

3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)]
['c:\\Users\\Branthôme\\Google Drive\\Thèse\\Recherches\\Séquence\\MachineLearning\\PyratesIA_aied_modular\\PyratesIA', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\python39.zip', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\DLLs', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\lib', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39', '', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\win32', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\win32\\lib', 'c:\\Users\\Branthôme\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\Pythonwin']


In [34]:
import pandas as pd
import numpy as np
from dateutil import parser
from locale import normalize
from statistics import mean
import pickle
import ast
import re

pd.options.display.max_rows = 999  # show up to 999 rows so that large tables print in full
# from dill import dump

## 2) Constants

Define a set of constants used for processing the raw xAPI data and engineering the features.

### 2.1 Student lists for the experiment

In [35]:
ELORN_E_1 = ['dCJ5QLs','MTE9Kcn','phXLQcS','b219sQC','VGk44aY','1JGuRXr','aHq6971','LUGAJ4L','bhvwsMn','drLR1FS','tpfFT4q','LwzBYJp','xGjAvY1','QMfie87','SAfuHiz']
ELORN_E_2 = ['bg8ccyh','RseCDTL','QUJ8XMJ','n8Lue7e','f242TBr','ur6bEsr','rybW57b','uvhTPQ5','XbDHeFk','nHH8igR','W4KvAaQ','iZP3yv1','gqkUHyc']

ELORN_F_1 = ['U76K7Jv','9rR71Ak','QrkubiW','pWPMxfR','BLqFpvD','Jh8Wndt','a6aYCCt','2y8TGBk','9WHRsPm','MMkyQby','fp7MQ5Q','ZX9gbZR','UxiMLJv','b4WkxX2','nnuUaP4','A8ZM66L']
ELORN_F_2 = ['tuw1qhm','rNuSCZm','UE4Tf1b','zA39HDn','r98dnSQ','mcgfhGs','mUBrQDS','n2VJTdQ','c4ULuXL','QvEbiut','w3GSSbq','KN82fs2','LjrUZqq','ej5zi1h','G7QQ5ay']

ELORN_I_1 = ['MsVugZB','jNwKFHH','CYfd944','dfu7FJY','72ES1hQ','G6gpDEr','MBjkEwf','B2Mjv4T','JyCWVWS','wdVnvFW','avAjLiu','LXgG71X','f1vVSkg','2zZuYNG','fpfruwh']
ELORN_I_2 = ['zsz7mcB','2kbDChZ','ERhr44G','ZTTFvps','THK1fjh','NZmc17R','RYx9QYw','Etjd6n8','2HbAWz6','2gABReE','7dhUBk7','Gw1Mfv9','Qh8zWTw','1CNxWqE','VgbyTfB','bd6vy2Q']

ELORN_J_1 = ['QL2sXGf','dnLkn9G','NtEzSbX','ZM8WhYr','jQJVCEk','2FL5w9z','PeGkX3n','FX75PyM','9wZ26Tr','7pYCG4M','HfqXQZL','6JDSdav','eL1Rk6q','72wayfW']
ELORN_J_2 = ['1YtLuhq','MyYnpgn','Cf9k4hR','je3V5Bf','rQLwv4v','eZrEvgC','KpLddTZ','n9jF2Sm','gADrXKJ','1jfkM1y','avUxea3','9wkfdpK','cWJMs5r','XjwabLH']

SAINT_LOUIS_2_1 = ['RQEj55k','52CWXjm','2qU9xTv','uPKK26Q','astTuAd','Zua3WB2','Vi2Rb36','54C9g49','AmfFmWQ','yHtKjj7','LbEfYjL','zKwCLGq']
SAINT_LOUIS_2_2 = ['QPLj3x6','cUdiLFR','YgQxZ4e','ebyHpN2','uSmNwZs','aEERv16','UwxfM5w','3jWiqH9','6Usc2NM','wDK9TH6','yzwPBBq','yAmrduN','xdebmpg']

SAINT_LOUIS_4_1 = ['iCBWfxp','Fw7Enr8','yXWUVNd','71fYz6c','DfPMh5j','m9RfM34','Y48DVQF','JbUbfW2','9mn1KRA','WZGwmLN','dARsDsc','Cw5Wixc','aXXNuGc','Kzibeh3','cgEFFdc']
SAINT_LOUIS_4_2 = ['8VzzQbz','p85GNLD','cFCe31R','AdQZZMK','WqM3FMN','cW2ZT7w','xpN5QZ5','F9mUzGJ','eSPcBuu','zAmLmvt','ZbWZCte','TwjMvs9','avnY5qS','PS8FRLS','Q9q9pX1']

SAINT_LOUIS_9 = ['qTRNqQ1','VzWXkp3','6jHr2Sf','B3Esnm7','dj3v8U7','Efha8cA','TwCJgTR','f3sav75','mZBPwe5','zg9vRMq','sZ1ckL9','bbQ8RNQ','AZxhDpg','T9dkP5K','5pUJnk9','XTBYghL','KU8Nn7y','xibU2tu','qgNJ6eQ','8akqupJ','x78PyGC','8yRRuAs']
SAINT_LOUIS_10 = ['gdhuFx3','VZRSAiq','Utv8Zrq','Jex4UEs','QYREsuy','75svE5F','NgBTgAn','pxSkcqf','Pnn34ks','L4shuas','cmsBycb','AT9F2T8','DFTwPFq','2EqPQVn','bY8aCmT','RMXbtKt','TH4z73V','4LAM74n','ApWm9vL','cMjJUmx']
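
One way such lists might be used is to look up the group a student belongs to. The sketch below is an assumption about usage, not code from this notebook: it uses truncated copies of two of the lists above, and the `GROUPS` dictionary and `build_student_index` helper are hypothetical.

```python
# Truncated copies of two lists defined above, kept short for illustration.
ELORN_E_1 = ['dCJ5QLs', 'MTE9Kcn']
SAINT_LOUIS_9 = ['qTRNqQ1', 'VzWXkp3']

# Hypothetical grouping: the labels are simply the constant names.
GROUPS = {"ELORN_E_1": ELORN_E_1, "SAINT_LOUIS_9": SAINT_LOUIS_9}


def build_student_index(groups):
    """Map each student ID to its group label, rejecting IDs listed twice."""
    index = {}
    for label, students in groups.items():
        for student in students:
            if student in index:
                raise ValueError(
                    f"{student} appears in {index[student]} and {label}"
                )
            index[student] = label
    return index


STUDENT_TO_GROUP = build_student_index(GROUPS)
```

The duplicate check is a cheap safeguard against copy-paste mistakes when the ID lists are maintained by hand.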

### 2.2 Raw data constants

Define the keywords used in the raw log data generated by Pyrates.

In [36]:
# ---- Column keys ----
ID_DATA_KEY = "_id"
TYPE_DATA_KEY = "_type"
LEVEL_DATA_KEY = "_level"
STUDENT_DATA_KEY = "_student"
DATE_DATA_KEY = "_date"
OBJECT_ID_DATA_KEY = "_object_id"
GAME_ERROR_REASON_DATA_KEY = "_game_error_reason"
IMPLEMENTED_CONCEPTS_DATA_KEY = "_implemented_concepts"
USED_CONTROL_FUNCTIONS_DATA_KEY = "_used_control_functions"
LOST_LEVEL_DATA_KEY = "_lost_level"
GAME_PROGRESSION_DATA_KEY = "_game_progression"
DURATION_DATA_KEY = "_duration"
EXTRA_LINES_NUMBER_DATA_KEY ="_extra_lines_number"
ERROR_DATA_KEY ="_error"
GAME_TIME_DATA_KEY ="_game_time"
HELP_ORIGIN_DATA_KEY ="_help_origin"
STOPPED_LINE_DATA_KEY ="_stopped_line"
EXECUTION_SPEED_MULTIPLIER_DATA_KEY ="_execution_speed_multiplier"
EXECUTION_SPEED_CHANGED_DATA_KEY ="_execution_speed_changed"
CODE_DATA_KEY = "_code"

ALL_DATA_KEYS = [
    ID_DATA_KEY,
    TYPE_DATA_KEY ,
    LEVEL_DATA_KEY ,
    STUDENT_DATA_KEY ,
    DATE_DATA_KEY ,
    OBJECT_ID_DATA_KEY ,
    GAME_ERROR_REASON_DATA_KEY ,
    IMPLEMENTED_CONCEPTS_DATA_KEY ,
    USED_CONTROL_FUNCTIONS_DATA_KEY ,
    LOST_LEVEL_DATA_KEY ,
    GAME_PROGRESSION_DATA_KEY ,
    DURATION_DATA_KEY ,
    EXTRA_LINES_NUMBER_DATA_KEY ,
    ERROR_DATA_KEY ,
    GAME_TIME_DATA_KEY ,
    HELP_ORIGIN_DATA_KEY ,
    STOPPED_LINE_DATA_KEY ,
    EXECUTION_SPEED_MULTIPLIER_DATA_KEY ,
    EXECUTION_SPEED_CHANGED_DATA_KEY ,
    CODE_DATA_KEY
]

# ---- Column values ----

# _type values 
ASKED_TYPE ="https://py-rates.org/xAPI/verbs/asked"
CHANGED_TYPE = "https://py-rates.org/xAPI/verbs/changed"
COMPLETED_TYPE = "https://py-rates.org/xAPI/verbs/completed"
CONSULTED_TYPE = "https://py-rates.org/xAPI/verbs/consulted"
COPIED_TYPE = "https://py-rates.org/xAPI/verbs/copied"
LAUNCHED_TYPE ="https://py-rates.org/xAPI/verbs/launched"
LEAVED_TYPE ="https://py-rates.org/xAPI/verbs/leaved"
PASTED_TYPE = "https://py-rates.org/xAPI/verbs/pasted"
RECEIVED_TYPE = "https://py-rates.org/xAPI/verbs/received"
RESTARTED_TYPE = "https://py-rates.org/xAPI/verbs/restarted"
RESUMED_TYPE = "https://py-rates.org/xAPI/verbs/resumed"
STARTED_TYPE = "https://py-rates.org/xAPI/verbs/started"

# _level values
LEVEL_1 = "Level1"
LEVEL_2 = "Level2"
LEVEL_3 = "Level3"
LEVEL_4 = "Level4"
LEVEL_5 = "Level5"
LEVEL_6 = "Level6"
LEVEL_7 = "Level7"
LEVEL_8 = "Level8"
LEVELS_KEYS = [LEVEL_1,LEVEL_2,LEVEL_3,LEVEL_4,LEVEL_5,LEVEL_6,LEVEL_7,LEVEL_8]

# _object_id values
FULLY_EXECUTED_PROGRAM = "https://py-rates.org/xAPI/activities/programs/fully-executed"
SYNTACTIC_ERROR_PROGRAM = "https://py-rates.org/xAPI/activities/programs/syntactic-error"
GAME_ERROR_PROGRAM = "https://py-rates.org/xAPI/activities/programs/game-error"
USER_STOPPED_PROGRAM = "https://py-rates.org/xAPI/activities/programs/user-stopped"
LEVEL_COMPLETED_PROGRAM = "https://py-rates.org/xAPI/activities/programs/level-completed"
SEMANTIC_ERROR_PROGRAM = "https://py-rates.org/xAPI/activities/programs/semantic-error"
TOO_MANY_LINES_PROGRAM = "https://py-rates.org/xAPI/activities/programs/too-many-lines"
LEVEL_LOST_PROGRAM = "https://py-rates.org/xAPI/activities/programs/level-lost"

CODE_EDITOR_CONTENT ="https://py-rates.org/xAPI/activities/contents/code-editor"
CONTROL_FUNCTIONS_CONTENT ="https://py-rates.org/xAPI/activities/contents/control-functions"
HELP_CONTENT ="https://py-rates.org/xAPI/activities/contents/help-content"

STARTUP_OPERATION_CONTENT= "https://py-rates.org/xAPI/activities/contents/startup-operation"
STARTUP_GOAL_CONTENT= "https://py-rates.org/xAPI/activities/contents/startup-goal"
STARTUP_SAVE_CONTENT= "https://py-rates.org/xAPI/activities/contents/startup-save"

BASE_PROGRAM_CONTENT ="https://py-rates.org/xAPI/activities/contents/base-program"
BASE_ERROR_CONTENT ="https://py-rates.org/xAPI/activities/contents/base-error"
BASE_STRUCTURE_CONTENT ="https://py-rates.org/xAPI/activities/contents/base-structure"
BASE_COMMENT_CONTENT ="https://py-rates.org/xAPI/activities/contents/base-comment"

VAR_CREATION_CONTENT ="https://py-rates.org/xAPI/activities/contents/var-creation"
VAR_USAGE_CONTENT ="https://py-rates.org/xAPI/activities/contents/var-usage"
VAR_MODIFICATION_CONTENT ="https://py-rates.org/xAPI/activities/contents/var-modification"
VAR_TYPE_CONTENT ="https://py-rates.org/xAPI/activities/contents/var-type"

CONDI_1BRAN_CONTENT ="https://py-rates.org/xAPI/activities/contents/condi-1bran"
CONDI_2BRAN_CONTENT ="https://py-rates.org/xAPI/activities/contents/condi-2bran"
CONDI_3BRAN_CONTENT ="https://py-rates.org/xAPI/activities/contents/condi-3bran"

FOR_SIMPLE_CONTENT ="https://py-rates.org/xAPI/activities/contents/for-simple"
FOR_COUNTER_1_CONTENT ="https://py-rates.org/xAPI/activities/contents/for-counter-0"  # NOTE: the name says "1" but the URL and the related feature keys use "for-counter-0"
FOR_COUNTER_N_CONTENT ="https://py-rates.org/xAPI/activities/contents/for-counter-n"

WHILE_SUB_CONTENT ="https://py-rates.org/xAPI/activities/contents/while-simple"

GAME_HELP ="https://py-rates.org/xAPI/activities/helps/game"
CONTROL_HELP ="https://py-rates.org/xAPI/activities/helps/control"
NOTION_HELP ="https://py-rates.org/xAPI/activities/helps/notion"
IMPLEMENTATION_HELP ="https://py-rates.org/xAPI/activities/helps/implementation"
SOLUTION_HELP ="https://py-rates.org/xAPI/activities/helps/solution"
OTHER_HELP ="https://py-rates.org/xAPI/activities/helps/other"

# Content type
STARTUP_CONTENT = "startup-content"
BASE_CONTENT = "base-content"
VAR_CONTENT = "var-content"
CONDI_CONTENT = "condi-content"
FOR_CONTENT = "for-content"
WHILE_CONTENT = "while-content"

# _game_error_reason
WALK_LOCATION_GAME_ERROR = "walk-location"
READ_MESSAGE_LOCATION_GAME_ERROR ="read-message-location"
FUNCTION_PARAMETERS_GAME_ERROR ="function_parameters"
NOT_ALLOWED_FUNCTION_GAME_ERROR ="not-allowed-function"
OPEN_CHEST_LOCATION_GAME_ERROR ="open-chest-location"
OPEN_CHEST_KEY_GAME_ERROR ="open-chest-key"

# _lost_level
SPIKES_TOUCH_LOST_LEVEL = "spikes-touch"
BARREL_EXPLOSION_LOST_LEVEL ="barrel-explosion"
PIRATE_SHOT_LOST_LEVEL ="pirate-shot"

# _implemented_concepts

VAR_AFFECTATION_CONCEPT = "var-affectation-concept"
BOOLEAN_CONCEPT = "boolean-concept"
STRING_CONCEPT = "string-concept"
IF_BRANCH_CONCEPT = "if-branch-concept"
ELIF_BRANCH_CONCEPT = "elif-branch-concept"
ELSE_BRANCH_CONCEPT = "else-branch-concept"
FOR_SIMPLE_CONCEPT = "for-simple-concept"
FOR_COUNTER_0_CONCEPT = "for-counter-0-concept"
FOR_COUNTER_N_CONCEPT = "for-counter-n-concept"
WHILE_CONCEPT = "while-concept"

DETECTED_CONCEPTS_LIST = [
    VAR_AFFECTATION_CONCEPT,
    BOOLEAN_CONCEPT,
    STRING_CONCEPT,
    IF_BRANCH_CONCEPT,
    ELIF_BRANCH_CONCEPT,
    ELSE_BRANCH_CONCEPT,
    FOR_SIMPLE_CONCEPT,
    FOR_COUNTER_0_CONCEPT,
    FOR_COUNTER_N_CONCEPT,
    WHILE_CONCEPT
]

# _used_control_functions
WALK_NAME = "avancer"
LEFT_NAME = "gauche"
RIGHT_NAME = "droite"
OPEN_NAME = "ouvrir"
JUMP_NAME = "sauter"
JUMP_HEIGHT_NAME = "sauter_hauteur"
JUMP_HIGH_NAME = "sauter_haut"
GET_HEIGHT_NAME = "mesurer_hauteur"
READ_STRING_NAME = "lire_chaine"
READ_INT_NAME = "lire_nombre"
ATTACK_NAME = "coup"
DETECT_OBSTACLE_NAME = "detecter_obstacle"
TURN_NAME = "tourner"
SHOOT_NAME = "tirer"

WALK_CTR_FCT = "walk"
LEFT_CTR_FCT = "left"
RIGHT_CTR_FCT = "right"
OPEN_CTR_FCT = "open"
JUMP_CTR_FCT = "jump"
JUMP_HEIGHT_CTR_FCT = "jump-height"
JUMP_HIGH_CTR_FCT = "jump-high"
GET_HEIGHT_CTR_FCT = "get-height"
READ_STRING_CTR_FCT = "read-string"
READ_INT_CTR_FCT = "read-int"
ATTACK_CTR_FCT = "attack"
DETECT_OBSTACLE_CTR_FCT = "detect-obstacle"
TURN_CTR_FCT = "turn"
SHOOT_CTR_FCT = "shoot"

DETECTED_CONTROL_FUNCTIONS_LIST = [
    WALK_CTR_FCT ,
    LEFT_CTR_FCT ,
    RIGHT_CTR_FCT ,
    OPEN_CTR_FCT ,
    JUMP_CTR_FCT ,
    JUMP_HEIGHT_CTR_FCT ,
    JUMP_HIGH_CTR_FCT ,
    GET_HEIGHT_CTR_FCT ,
    READ_STRING_CTR_FCT ,
    READ_INT_CTR_FCT ,
    ATTACK_CTR_FCT ,
    DETECT_OBSTACLE_CTR_FCT ,
    TURN_CTR_FCT ,
    SHOOT_CTR_FCT
]
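
The comments in section 2.3 state that control-function features come from analyzing the abstract syntax trees of student programs. A minimal sketch of that idea, using Python's standard `ast` module, is shown here. It is not the notebook's implementation: the `NAME_TO_CTR_FCT` dictionary and `count_control_functions` helper are hypothetical, only three of the name pairs above are included, and the count is static (one per call site, regardless of how many times a loop would run).

```python
import ast
from collections import Counter

# Subset of the French function names and their control-function keys from above.
NAME_TO_CTR_FCT = {
    "avancer": "walk",
    "sauter": "jump",
    "ouvrir": "open",
}


def count_control_functions(code):
    """Count call sites of known control functions in a student's program."""
    counts = Counter()
    for node in ast.walk(ast.parse(code)):
        # Only plain calls such as avancer() are counted, not attribute calls.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            key = NAME_TO_CTR_FCT.get(node.func.id)
            if key is not None:
                counts[key] += 1
    return counts


program = "for i in range(3):\n    avancer()\nsauter()\n"
usage = count_control_functions(program)
```

Note that `ast.parse` raises `SyntaxError` on invalid code, so programs logged with a syntactic error would need to be skipped or handled separately.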

### 2.3 ML Features constants

Define the feature names.
Each feature is computed in two ways:
- cumulatively, as the *sum* of each behavior's occurrences
- in standardized form, as the *rate* or the *mean/sd* of each behavior
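
The difference between the two kinds of features can be illustrated with a small example. The numbers are made up, and two details are assumptions rather than facts about this notebook: the standard deviation is taken as the population SD, and the rate is taken as the behavior count divided by the total number of logged events.

```python
from statistics import mean, pstdev

# Hypothetical per-execution counts of one behavior (one value per program run).
counts_per_execution = [2, 0, 1, 1]
# Hypothetical total number of logged events, used here as the rate denominator.
total_events = 20

total = sum(counts_per_execution)      # cumulative feature ("tot")
average = mean(counts_per_execution)   # standardized feature ("avg")
spread = pstdev(counts_per_execution)  # standardized feature ("std"), population SD
# Standardized feature ("rate"); guard against an empty log.
rate = total / total_events if total_events else 0.0
```

Cumulative features grow with the time a student spends in a level, whereas the standardized ones stay comparable between students with very different amounts of activity.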

In [37]:
# --------------------
# --- Features ---
# --------------------

# Time spent in the level (ms)
LEVEL_TIME_SPENT_KEY="level_time_spent"


# ----------------
# Memo-related features
# ----------------

# Total display time of each memo rubric (var, loop...) in ms
BASE_DISPLAY_TOTAL_TIME_KEY = "CO_tot_base_disp_time"
VAR_DISPLAY_TOTAL_TIME_KEY = "CO_tot_var_disp_time"
CONDI_DISPLAY_TOTAL_TIME_KEY = "CO_tot_condi_disp_time"
FOR_DISPLAY_TOTAL_TIME_KEY = "CO_tot_for_disp_time"
WHILE_DISPLAY_TOTAL_TIME_KEY = "CO_tot_while_disp_time"

# Mean and std.dev display time of each memo rubric (var, loop...) in ms
BASE_DISPLAY_MEAN_TIME_KEY = "CO_avg_base_disp_time"
VAR_DISPLAY_MEAN_TIME_KEY = "CO_avg_var_disp_time"
CONDI_DISPLAY_MEAN_TIME_KEY = "CO_avg_condi_disp_time"
FOR_DISPLAY_MEAN_TIME_KEY = "CO_avg_for_disp_time"
WHILE_DISPLAY_MEAN_TIME_KEY = "CO_avg_while_disp_time"

BASE_DISPLAY_STD_TIME_KEY = "CO_std_base_disp_time"
VAR_DISPLAY_STD_TIME_KEY = "CO_std_var_disp_time"
CONDI_DISPLAY_STD_TIME_KEY = "CO_std_condi_disp_time"
FOR_DISPLAY_STD_TIME_KEY = "CO_std_for_disp_time"
WHILE_DISPLAY_STD_TIME_KEY = "CO_std_while_disp_time"

# All memo content display features together
CONTENT_DISPLAY_KEYS_TOTAL = [
    BASE_DISPLAY_TOTAL_TIME_KEY,
    VAR_DISPLAY_TOTAL_TIME_KEY,
    CONDI_DISPLAY_TOTAL_TIME_KEY,
    FOR_DISPLAY_TOTAL_TIME_KEY,
    WHILE_DISPLAY_TOTAL_TIME_KEY
]
CONTENT_DISPLAY_KEYS_MEAN = [
    BASE_DISPLAY_MEAN_TIME_KEY,
    VAR_DISPLAY_MEAN_TIME_KEY,
    CONDI_DISPLAY_MEAN_TIME_KEY,
    FOR_DISPLAY_MEAN_TIME_KEY,
    WHILE_DISPLAY_MEAN_TIME_KEY,
]
CONTENT_DISPLAY_KEYS_SD = [
    BASE_DISPLAY_STD_TIME_KEY,
    VAR_DISPLAY_STD_TIME_KEY,
    CONDI_DISPLAY_STD_TIME_KEY,
    FOR_DISPLAY_STD_TIME_KEY,
    WHILE_DISPLAY_STD_TIME_KEY
]

TIME_FEATURES_KEYS_TOTAL = [LEVEL_TIME_SPENT_KEY] + CONTENT_DISPLAY_KEYS_TOTAL


# Content copied from the memo (total + rate)
NB_CODE_EDITOR_COPIED_KEY = "CO_tot_code_editor_copied"
NB_CONTROL_FUNCTION_COPIED_KEY = "CO_tot_control_function_copied"
NB_HELP_COPIED_KEY = "CO_tot_help_copied"
NB_BASE_PROGRAM_COPIED_KEY = "CO_tot_base_program_copied"
NB_BASE_ERROR_COPIED_KEY = "CO_tot_base_error_copied"
NB_BASE_STRUCTURATION_COPIED_KEY = "CO_tot_base_structuration_copied"
NB_BASE_COMMENT_COPIED_KEY = "CO_tot_base_comment_copied"
NB_VAR_CREATION_COPIED_KEY = "CO_tot_var_creation_copied"
NB_VAR_MODIFICATION_COPIED_KEY = "CO_tot_var_modif_copied"
NB_VAR_USAGE_COPIED_KEY = "CO_tot_var_usage_copied"
NB_VAR_TYPE_COPIED_KEY = "CO_tot_var_type_copied"
NB_CONDI_1BRAN_COPIED_KEY = "CO_tot_condi_1bran_copied"
NB_CONDI_2BRAN_COPIED_KEY = "CO_tot_condi_2bran_copied"
NB_CONDI_3BRAN_COPIED_KEY = "CO_tot_condi_3bran_copied"
NB_FOR_SIMPLE_COPIED_KEY = "CO_tot_for_simple_copied"
NB_FOR_COUNTER_0_COPIED_KEY = "CO_tot_for_counter_0_copied"
NB_FOR_COUNTER_N_COPIED_KEY = "CO_tot_for_counter_n_copied"
NB_WHILE_SIMPLE_COPIED_KEY = "CO_tot_while_simple_copied"

RATE_CODE_EDITOR_COPIED_KEY = "CO_rate_code_editor_copied"
RATE_CONTROL_FUNCTION_COPIED_KEY = "CO_rate_control_function_copied"
RATE_HELP_COPIED_KEY = "CO_rate_help_copied"
RATE_BASE_PROGRAM_COPIED_KEY = "CO_rate_base_program_copied"
RATE_BASE_ERROR_COPIED_KEY = "CO_rate_base_error_copied"
RATE_BASE_STRUCTURATION_COPIED_KEY = "CO_rate_base_structuration_copied"
RATE_BASE_COMMENT_COPIED_KEY = "CO_rate_base_comment_copied"
RATE_VAR_CREATION_COPIED_KEY = "CO_rate_var_creation_copied"
RATE_VAR_MODIFICATION_COPIED_KEY = "CO_rate_var_modif_copied"
RATE_VAR_USAGE_COPIED_KEY = "CO_rate_var_usage_copied"
RATE_VAR_TYPE_COPIED_KEY = "CO_rate_var_type_copied"
RATE_CONDI_1BRAN_COPIED_KEY = "CO_rate_condi_1bran_copied"
RATE_CONDI_2BRAN_COPIED_KEY = "CO_rate_condi_2bran_copied"
RATE_CONDI_3BRAN_COPIED_KEY = "CO_rate_condi_3bran_copied"
RATE_FOR_SIMPLE_COPIED_KEY = "CO_rate_for_simple_copied"
RATE_FOR_COUNTER_0_COPIED_KEY = "CO_rate_for_counter_0_copied"
RATE_FOR_COUNTER_N_COPIED_KEY = "CO_rate_for_counter_n_copied"
RATE_WHILE_SIMPLE_COPIED_KEY = "CO_rate_while_simple_copied"
# Content pasted (total + rate)
NB_PASTED_KEY = "CO_tot_pasted"
RATE_PASTED_KEY = "CO_rate_pasted"

CONTENT_COPIED_PASTED_KEYS_TOTAL = [
    NB_CODE_EDITOR_COPIED_KEY,
    NB_CONTROL_FUNCTION_COPIED_KEY,
    NB_BASE_PROGRAM_COPIED_KEY,
    NB_BASE_ERROR_COPIED_KEY,
    NB_BASE_STRUCTURATION_COPIED_KEY,
    NB_BASE_COMMENT_COPIED_KEY,
    NB_VAR_CREATION_COPIED_KEY,
    NB_VAR_MODIFICATION_COPIED_KEY,
    NB_VAR_USAGE_COPIED_KEY,
    NB_VAR_TYPE_COPIED_KEY,
    NB_CONDI_1BRAN_COPIED_KEY,
    NB_CONDI_2BRAN_COPIED_KEY,
    NB_CONDI_3BRAN_COPIED_KEY,
    NB_FOR_SIMPLE_COPIED_KEY,
    NB_FOR_COUNTER_0_COPIED_KEY,
    NB_FOR_COUNTER_N_COPIED_KEY,
    NB_WHILE_SIMPLE_COPIED_KEY,
    NB_PASTED_KEY    
]

HELP_COPIED_KEYS_TOTAL = [
    NB_HELP_COPIED_KEY
]

CONTENT_COPIED_PASTED_KEYS_RATE = [
    RATE_CODE_EDITOR_COPIED_KEY,
    RATE_CONTROL_FUNCTION_COPIED_KEY,
    RATE_BASE_PROGRAM_COPIED_KEY,
    RATE_BASE_ERROR_COPIED_KEY,
    RATE_BASE_STRUCTURATION_COPIED_KEY,
    RATE_BASE_COMMENT_COPIED_KEY,
    RATE_VAR_CREATION_COPIED_KEY,
    RATE_VAR_MODIFICATION_COPIED_KEY,
    RATE_VAR_USAGE_COPIED_KEY,
    RATE_VAR_TYPE_COPIED_KEY,
    RATE_CONDI_1BRAN_COPIED_KEY,
    RATE_CONDI_2BRAN_COPIED_KEY,
    RATE_CONDI_3BRAN_COPIED_KEY,
    RATE_FOR_SIMPLE_COPIED_KEY,
    RATE_FOR_COUNTER_0_COPIED_KEY,
    RATE_FOR_COUNTER_N_COPIED_KEY,
    RATE_WHILE_SIMPLE_COPIED_KEY,
    RATE_PASTED_KEY    
]

HELP_COPIED_KEYS_RATE = [
    RATE_HELP_COPIED_KEY,
]


# ----------------
# Features about the execution of the games, and the possible errors generated during the execution
# ----------------

# Game Errors total + mean + std
NB_GAME_ERROR_OPEN_CHEST_LOCATION_KEY = "GA_tot_open_chest_loc_error"
NB_GAME_ERROR_OPEN_CHEST_KEY_KEY = "GA_tot_open_chest_key_error"
NB_GAME_ERROR_READ_MESSAGE_LOCATION_KEY = "GA_tot_read_message_loc_error"
NB_GAME_ERROR_WALK_LOCATION_KEY = "GA_tot_walk_loc_error"
NB_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY = "GA_tot_not_allowed_func_error"
NB_GAME_ERROR_FUNCTION_PARAMETERS_KEY = "GA_tot_function_param_error"

MEAN_GAME_ERROR_OPEN_CHEST_LOCATION_KEY = "GA_avg_open_chest_loc_error"
MEAN_GAME_ERROR_OPEN_CHEST_KEY_KEY = "GA_avg_open_chest_key_error"
MEAN_GAME_ERROR_READ_MESSAGE_LOCATION_KEY = "GA_avg_read_message_loc_error"
MEAN_GAME_ERROR_WALK_LOCATION_KEY = "GA_avg_walk_loc_error"
MEAN_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY = "GA_avg_not_allowed_func_error"
MEAN_GAME_ERROR_FUNCTION_PARAMETERS_KEY = "GA_avg_function_param_error"

STD_GAME_ERROR_OPEN_CHEST_LOCATION_KEY = "GA_std_open_chest_loc_error"
STD_GAME_ERROR_OPEN_CHEST_KEY_KEY = "GA_std_open_chest_key_error"
STD_GAME_ERROR_READ_MESSAGE_LOCATION_KEY = "GA_std_read_message_loc_error"
STD_GAME_ERROR_WALK_LOCATION_KEY = "GA_std_walk_loc_error"
STD_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY = "GA_std_not_allowed_func_error"
STD_GAME_ERROR_FUNCTION_PARAMETERS_KEY = "GA_std_function_param_error"

ERRORS_KEYS_TOTAL = [
    NB_GAME_ERROR_OPEN_CHEST_LOCATION_KEY,
    NB_GAME_ERROR_OPEN_CHEST_KEY_KEY,
    NB_GAME_ERROR_READ_MESSAGE_LOCATION_KEY,
    NB_GAME_ERROR_WALK_LOCATION_KEY,
    NB_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY,
    NB_GAME_ERROR_FUNCTION_PARAMETERS_KEY
]

ERRORS_KEYS_MEAN = [
    MEAN_GAME_ERROR_OPEN_CHEST_LOCATION_KEY,
    MEAN_GAME_ERROR_OPEN_CHEST_KEY_KEY,
    MEAN_GAME_ERROR_READ_MESSAGE_LOCATION_KEY,
    MEAN_GAME_ERROR_WALK_LOCATION_KEY,
    MEAN_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY,
    MEAN_GAME_ERROR_FUNCTION_PARAMETERS_KEY
]

ERRORS_KEYS_STD = [
    STD_GAME_ERROR_OPEN_CHEST_LOCATION_KEY,
    STD_GAME_ERROR_OPEN_CHEST_KEY_KEY,
    STD_GAME_ERROR_READ_MESSAGE_LOCATION_KEY,
    STD_GAME_ERROR_WALK_LOCATION_KEY,
    STD_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY,
    STD_GAME_ERROR_FUNCTION_PARAMETERS_KEY
]

# Level lost total + mean + std
NB_LEVEL_LOST_SPIKE_TOUCH_KEY = "GA_tot_spike_touch_lost"
NB_LEVEL_LOST_BARREL_EXPLOSION_KEY = "GA_tot_barrel_explosion_lost"
NB_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY = "GA_tot_pirate_shot_lost"

MEAN_LEVEL_LOST_SPIKE_TOUCH_KEY = "GA_avg_spike_touch_lost"
MEAN_LEVEL_LOST_BARREL_EXPLOSION_KEY = "GA_avg_barrel_explosion_lost"
MEAN_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY = "GA_avg_pirate_shot_lost"

STD_LEVEL_LOST_SPIKE_TOUCH_KEY = "GA_std_spike_touch_lost"
STD_LEVEL_LOST_BARREL_EXPLOSION_KEY = "GA_std_barrel_explosion_lost"
STD_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY = "GA_std_pirate_shot_lost"

LEVEL_LOST_KEYS_TOTAL = [
    NB_LEVEL_LOST_SPIKE_TOUCH_KEY,
    NB_LEVEL_LOST_BARREL_EXPLOSION_KEY,
    NB_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY,
]

LEVEL_LOST_KEYS_MEAN = [
    MEAN_LEVEL_LOST_SPIKE_TOUCH_KEY,
    MEAN_LEVEL_LOST_BARREL_EXPLOSION_KEY,
    MEAN_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY,
]

LEVEL_LOST_KEYS_STD = [
    STD_LEVEL_LOST_SPIKE_TOUCH_KEY,
    STD_LEVEL_LOST_BARREL_EXPLOSION_KEY,
    STD_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY,
]

# Execution total + mean + std

NB_TOO_MANY_LINES_ERROR_KEY = "EX_tot_too_many_lines_error"
NB_SYNTACTIC_ERROR_KEY = "EX_tot_syntactic_error"
NB_SEMANTIC_ERROR_KEY = "EX_tot_semantic_error"
NB_USER_STOPPED_EXECUTION_KEY = "EX_tot_user_stopped"
NB_COMPLETED_EXECUTION_KEY = "EX_tot_executed"

MEAN_TOO_MANY_LINES_ERROR_KEY = "EX_avg_too_many_lines_error"
MEAN_SYNTACTIC_ERROR_KEY = "EX_avg_syntactic_error"
MEAN_SEMANTIC_ERROR_KEY = "EX_avg_semantic_error"
MEAN_USER_STOPPED_EXECUTION_KEY = "EX_avg_user_stopped"
MEAN_COMPLETED_EXECUTION_KEY = "EX_avg_executed"

STD_TOO_MANY_LINES_ERROR_KEY = "EX_std_too_many_lines_error"
STD_SYNTACTIC_ERROR_KEY = "EX_std_syntactic_error"
STD_SEMANTIC_ERROR_KEY = "EX_std_semantic_error"
STD_USER_STOPPED_EXECUTION_KEY = "EX_std_user_stopped"
STD_COMPLETED_EXECUTION_KEY = "EX_std_executed"

EXECUTION_KEYS_TOTAL = [
    NB_TOO_MANY_LINES_ERROR_KEY ,
    NB_SYNTACTIC_ERROR_KEY ,
    NB_SEMANTIC_ERROR_KEY ,
    NB_USER_STOPPED_EXECUTION_KEY ,
    NB_COMPLETED_EXECUTION_KEY
]
EXECUTION_KEYS_MEAN = [
    MEAN_TOO_MANY_LINES_ERROR_KEY ,
    MEAN_SYNTACTIC_ERROR_KEY ,
    MEAN_SEMANTIC_ERROR_KEY ,
    MEAN_USER_STOPPED_EXECUTION_KEY ,
    MEAN_COMPLETED_EXECUTION_KEY
]
EXECUTION_KEYS_STD = [
    STD_TOO_MANY_LINES_ERROR_KEY ,
    STD_SYNTACTIC_ERROR_KEY ,
    STD_SEMANTIC_ERROR_KEY ,
    STD_USER_STOPPED_EXECUTION_KEY ,
    STD_COMPLETED_EXECUTION_KEY
]


# Max game progression (percentage)
MAX_GAME_PROGRESSION_KEY = "GA_max_progression"

# Execution speed changed total + rate
NB_EXECUTION_SPEED_CHANGED_KEY = "EX_tot_speed_changed"
RATE_EXECUTION_SPEED_CHANGED_KEY = "EX_rate_speed_changed"


# ----------------
# Features about the implemented concepts (var, loop, etc.) and the control functions used (walk, etc.)
# Generated by analyzing the abstract syntax trees of the programs executed by the students
# ----------------

# Implemented concepts (var, loop..) per completed execution (total) 
TOTAL_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY = "CO_tot_var_affect_impl"
TOTAL_BOOLEAN_CONCEPT_IMPLEMENTED_KEY = "CO_tot_boolean_impl"
TOTAL_STRING_CONCEPT_IMPLEMENTED_KEY = "CO_tot_string_impl"
TOTAL_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_tot_if_branch_impl"
TOTAL_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_tot_elif_branch_impl"
TOTAL_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_tot_else_branch_impl"
TOTAL_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY = "CO_tot_for_simple_impl"
TOTAL_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY = "CO_tot_for_counter_0_impl"
TOTAL_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY = "CO_tot_for_counter_n_impl"
TOTAL_WHILE_CONCEPT_IMPLEMENTED_KEY = "CO_tot_while_impl"

# Implemented concepts (var, loop..) per completed execution (avg, sd ) 
MEAN_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY = "CO_avg_var_affect_impl"
MEAN_BOOLEAN_CONCEPT_IMPLEMENTED_KEY = "CO_avg_boolean_impl"
MEAN_STRING_CONCEPT_IMPLEMENTED_KEY = "CO_avg_string_impl"
MEAN_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_avg_if_branch_impl"
MEAN_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_avg_elif_branch_impl"
MEAN_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_avg_else_branch_impl"
MEAN_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY = "CO_avg_for_simple_impl"
MEAN_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY = "CO_avg_for_counter_0_impl"
MEAN_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY = "CO_avg_for_counter_n_impl"
MEAN_WHILE_CONCEPT_IMPLEMENTED_KEY = "CO_avg_while_impl"

STD_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY = "CO_std_var_affect_impl"
STD_BOOLEAN_CONCEPT_IMPLEMENTED_KEY = "CO_std_boolean_impl"
STD_STRING_CONCEPT_IMPLEMENTED_KEY = "CO_std_string_impl"
STD_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_std_if_branch_impl"
STD_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_std_elif_branch_impl"
STD_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY = "CO_std_else_branch_impl"
STD_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY = "CO_std_for_simple_impl"
STD_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY = "CO_std_for_counter_0_impl"
STD_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY = "CO_std_for_counter_n_impl"
STD_WHILE_CONCEPT_IMPLEMENTED_KEY = "CO_std_while_impl"

IMPLEMENTED_CONCEPTS_KEYS_TOTAL = [
    TOTAL_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_BOOLEAN_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_STRING_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY,
    TOTAL_WHILE_CONCEPT_IMPLEMENTED_KEY,
]

IMPLEMENTED_CONCEPTS_KEYS_MEAN = [
    MEAN_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY,
    MEAN_BOOLEAN_CONCEPT_IMPLEMENTED_KEY,
    MEAN_STRING_CONCEPT_IMPLEMENTED_KEY,
    MEAN_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    MEAN_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    MEAN_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    MEAN_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY,
    MEAN_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY,
    MEAN_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY,
    MEAN_WHILE_CONCEPT_IMPLEMENTED_KEY,
]

IMPLEMENTED_CONCEPTS_KEYS_SD = [
    STD_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY,
    STD_BOOLEAN_CONCEPT_IMPLEMENTED_KEY,
    STD_STRING_CONCEPT_IMPLEMENTED_KEY,
    STD_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    STD_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    STD_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    STD_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY,
    STD_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY,
    STD_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY,
    STD_WHILE_CONCEPT_IMPLEMENTED_KEY,
]

# Use of each control function (walk...) per completed execution (total)
TOTAL_WALK_CTR_FUN_USED_KEY = "GA_tot_walk_ctr_fun_used"
TOTAL_LEFT_CTR_FUN_USED_KEY = "GA_tot_left_ctr_fun_used"
TOTAL_RIGHT_CTR_FUN_USED_KEY = "GA_tot_right_ctr_fun_used"
TOTAL_OPEN_CTR_FUN_USED_KEY = "GA_tot_open_ctr_fun_used"
TOTAL_JUMP_CTR_FUN_USED_KEY = "GA_tot_jump_ctr_fun_used"
TOTAL_JUMP_HEIGHT_CTR_FUN_USED_KEY = "GA_tot_jump_height_ctr_fun_used"
TOTAL_JUMP_HIGH_CTR_FUN_USED_KEY = "GA_tot_jump_high_ctr_fun_used"
TOTAL_GET_HEIGHT_CTR_FUN_USED_KEY = "GA_tot_get_height_ctr_fun_used"
TOTAL_READ_STRING_CTR_FUN_USED_KEY = "GA_tot_read_string_ctr_fun_used"
TOTAL_READ_INT_CTR_FUN_USED_KEY = "GA_tot_read_int_ctr_fun_used"
TOTAL_ATTACK_CTR_FUN_USED_KEY = "GA_tot_attack_ctr_fun_used"
TOTAL_DETECT_OBSTACLE_CTR_FUN_USED_KEY = "GA_tot_detect_obstacle_ctr_fun_used"
TOTAL_TURN_CTR_FUN_USED_KEY = "GA_tot_turn_ctr_fun_used"
TOTAL_SHOOT_CTR_FUN_USED_KEY = "GA_tot_shoot_ctr_fun_used"

# Use of each control function (walk...) per completed execution (avg, sd )
MEAN_WALK_CTR_FUN_USED_KEY = "GA_avg_walk_ctr_fun_used"
MEAN_LEFT_CTR_FUN_USED_KEY = "GA_avg_left_ctr_fun_used"
MEAN_RIGHT_CTR_FUN_USED_KEY = "GA_avg_right_ctr_fun_used"
MEAN_OPEN_CTR_FUN_USED_KEY = "GA_avg_open_ctr_fun_used"
MEAN_JUMP_CTR_FUN_USED_KEY = "GA_avg_jump_ctr_fun_used"
MEAN_JUMP_HEIGHT_CTR_FUN_USED_KEY = "GA_avg_jump_height_ctr_fun_used"
MEAN_JUMP_HIGH_CTR_FUN_USED_KEY = "GA_avg_jump_high_ctr_fun_used"
MEAN_GET_HEIGHT_CTR_FUN_USED_KEY = "GA_avg_get_height_ctr_fun_used"
MEAN_READ_STRING_CTR_FUN_USED_KEY = "GA_avg_read_string_ctr_fun_used"
MEAN_READ_INT_CTR_FUN_USED_KEY = "GA_avg_read_int_ctr_fun_used"
MEAN_ATTACK_CTR_FUN_USED_KEY = "GA_avg_attack_ctr_fun_used"
MEAN_DETECT_OBSTACLE_CTR_FUN_USED_KEY = "GA_avg_detect_obstacle_ctr_fun_used"
MEAN_TURN_CTR_FUN_USED_KEY = "GA_avg_turn_ctr_fun_used"
MEAN_SHOOT_CTR_FUN_USED_KEY = "GA_avg_shoot_ctr_fun_used"

STD_WALK_CTR_FUN_USED_KEY = "GA_std_walk_ctr_fun_used"
STD_LEFT_CTR_FUN_USED_KEY = "GA_std_left_ctr_fun_used"
STD_RIGHT_CTR_FUN_USED_KEY = "GA_std_right_ctr_fun_used"
STD_OPEN_CTR_FUN_USED_KEY = "GA_std_open_ctr_fun_used"
STD_JUMP_CTR_FUN_USED_KEY = "GA_std_jump_ctr_fun_used"
STD_JUMP_HEIGHT_CTR_FUN_USED_KEY = "GA_std_jump_height_ctr_fun_used"
STD_JUMP_HIGH_CTR_FUN_USED_KEY = "GA_std_jump_high_ctr_fun_used"
STD_GET_HEIGHT_CTR_FUN_USED_KEY = "GA_std_get_height_ctr_fun_used"
STD_READ_STRING_CTR_FUN_USED_KEY = "GA_std_read_string_ctr_fun_used"
STD_READ_INT_CTR_FUN_USED_KEY = "GA_std_read_int_ctr_fun_used"
STD_ATTACK_CTR_FUN_USED_KEY = "GA_std_attack_ctr_fun_used"
STD_DETECT_OBSTACLE_CTR_FUN_USED_KEY = "GA_std_detect_obstacle_ctr_fun_used"
STD_TURN_CTR_FUN_USED_KEY = "GA_std_turn_ctr_fun_used"
STD_SHOOT_CTR_FUN_USED_KEY = "GA_std_shoot_ctr_fun_used"

CONTROL_FUNCTION_KEYS_TOTAL = [
    TOTAL_WALK_CTR_FUN_USED_KEY,
    TOTAL_LEFT_CTR_FUN_USED_KEY,
    TOTAL_RIGHT_CTR_FUN_USED_KEY,
    TOTAL_OPEN_CTR_FUN_USED_KEY,
    TOTAL_JUMP_CTR_FUN_USED_KEY,
    TOTAL_JUMP_HEIGHT_CTR_FUN_USED_KEY,
    TOTAL_JUMP_HIGH_CTR_FUN_USED_KEY,
    TOTAL_GET_HEIGHT_CTR_FUN_USED_KEY,
    TOTAL_READ_STRING_CTR_FUN_USED_KEY,
    TOTAL_READ_INT_CTR_FUN_USED_KEY,
    TOTAL_ATTACK_CTR_FUN_USED_KEY,
    TOTAL_DETECT_OBSTACLE_CTR_FUN_USED_KEY,
    TOTAL_TURN_CTR_FUN_USED_KEY,
    TOTAL_SHOOT_CTR_FUN_USED_KEY
]

CONTROL_FUNCTION_KEYS_MEAN = [
    MEAN_WALK_CTR_FUN_USED_KEY,
    MEAN_LEFT_CTR_FUN_USED_KEY,
    MEAN_RIGHT_CTR_FUN_USED_KEY,
    MEAN_OPEN_CTR_FUN_USED_KEY,
    MEAN_JUMP_CTR_FUN_USED_KEY,
    MEAN_JUMP_HEIGHT_CTR_FUN_USED_KEY,
    MEAN_JUMP_HIGH_CTR_FUN_USED_KEY,
    MEAN_GET_HEIGHT_CTR_FUN_USED_KEY,
    MEAN_READ_STRING_CTR_FUN_USED_KEY,
    MEAN_READ_INT_CTR_FUN_USED_KEY,
    MEAN_ATTACK_CTR_FUN_USED_KEY,
    MEAN_DETECT_OBSTACLE_CTR_FUN_USED_KEY,
    MEAN_TURN_CTR_FUN_USED_KEY,
    MEAN_SHOOT_CTR_FUN_USED_KEY,
]

CONTROL_FUNCTION_KEYS_SD = [
    STD_WALK_CTR_FUN_USED_KEY,
    STD_LEFT_CTR_FUN_USED_KEY,
    STD_RIGHT_CTR_FUN_USED_KEY,
    STD_OPEN_CTR_FUN_USED_KEY,
    STD_JUMP_CTR_FUN_USED_KEY,
    STD_JUMP_HEIGHT_CTR_FUN_USED_KEY,
    STD_JUMP_HIGH_CTR_FUN_USED_KEY,
    STD_GET_HEIGHT_CTR_FUN_USED_KEY,
    STD_READ_STRING_CTR_FUN_USED_KEY,
    STD_READ_INT_CTR_FUN_USED_KEY,
    STD_ATTACK_CTR_FUN_USED_KEY,
    STD_DETECT_OBSTACLE_CTR_FUN_USED_KEY,
    STD_TURN_CTR_FUN_USED_KEY,
    STD_SHOOT_CTR_FUN_USED_KEY,
]


# ----------------
# Help labels
# ----------------

# Different types of received help (0 or 1)
CONTROL_HELP_RECEIVED_KEY = "FE_tot_control_received"
NOTION_HELP_RECEIVED_KEY = "FE_tot_notion_received"
IMPLEMENTATION_HELP_RECEIVED_KEY = "FE_tot_implementation_received"
SOLUTION_HELP_RECEIVED_KEY = "FE_tot_solution_received"

RECEIVED_HELP_KEYS = [
    CONTROL_HELP_RECEIVED_KEY,
    NOTION_HELP_RECEIVED_KEY,    
    IMPLEMENTATION_HELP_RECEIVED_KEY,
    SOLUTION_HELP_RECEIVED_KEY,
]
# Requested help (no details about the received help type)
REQUESTED_HELP_TOTAL = "FE_tot_requested"
# REQUESTED_HELP_RATE = "FE_rate_requested" => Not relevant ?

REQUESTED_HELP_KEYS = [
    REQUESTED_HELP_TOTAL, 
    # REQUESTED_HELP_RATE
]
# Used to compute the rates
ALL_RATE_FEATURES = CONTENT_COPIED_PASTED_KEYS_RATE + HELP_COPIED_KEYS_RATE + [RATE_EXECUTION_SPEED_CHANGED_KEY] # + [REQUESTED_HELP_RATE]


In [38]:
# ----------------
# --- Features ---
# ----------------

# All behaviors data features (SUM/MEAN/SD/RATE)
SUM_MEAN_SD_RATE_FEATURES_KEYS = [LEVEL_TIME_SPENT_KEY] \
    + CONTENT_DISPLAY_KEYS_TOTAL \
    + CONTENT_DISPLAY_KEYS_MEAN \
    + CONTENT_DISPLAY_KEYS_SD \
    + CONTENT_COPIED_PASTED_KEYS_TOTAL \
    + CONTENT_COPIED_PASTED_KEYS_RATE \
    + ERRORS_KEYS_TOTAL \
    + ERRORS_KEYS_MEAN \
    + ERRORS_KEYS_STD \
    + LEVEL_LOST_KEYS_TOTAL \
    + LEVEL_LOST_KEYS_MEAN \
    + LEVEL_LOST_KEYS_STD \
    + EXECUTION_KEYS_TOTAL \
    + EXECUTION_KEYS_MEAN \
    + EXECUTION_KEYS_STD \
    + IMPLEMENTED_CONCEPTS_KEYS_TOTAL \
    + IMPLEMENTED_CONCEPTS_KEYS_MEAN \
    + IMPLEMENTED_CONCEPTS_KEYS_SD \
    + CONTROL_FUNCTION_KEYS_TOTAL \
    + CONTROL_FUNCTION_KEYS_MEAN \
    + CONTROL_FUNCTION_KEYS_SD \
    + [MAX_GAME_PROGRESSION_KEY] \
    + [NB_EXECUTION_SPEED_CHANGED_KEY, RATE_EXECUTION_SPEED_CHANGED_KEY]


# Sum behaviors data features
SUM_FEATURES_KEYS = [LEVEL_TIME_SPENT_KEY] \
    + CONTENT_DISPLAY_KEYS_TOTAL \
    + CONTENT_COPIED_PASTED_KEYS_TOTAL \
    + ERRORS_KEYS_TOTAL \
    + LEVEL_LOST_KEYS_TOTAL \
    + EXECUTION_KEYS_TOTAL \
    + IMPLEMENTED_CONCEPTS_KEYS_TOTAL \
    + CONTROL_FUNCTION_KEYS_TOTAL \
    + [MAX_GAME_PROGRESSION_KEY] \
    + [NB_EXECUTION_SPEED_CHANGED_KEY]


# Mean/SD/RATE behaviors data features
MEAN_SD_RATE_FEATURES_KEYS = CONTENT_DISPLAY_KEYS_MEAN \
    + CONTENT_DISPLAY_KEYS_SD \
    + CONTENT_COPIED_PASTED_KEYS_RATE \
    + ERRORS_KEYS_MEAN \
    + ERRORS_KEYS_STD \
    + LEVEL_LOST_KEYS_MEAN \
    + LEVEL_LOST_KEYS_STD \
    + EXECUTION_KEYS_MEAN \
    + EXECUTION_KEYS_STD \
    + IMPLEMENTED_CONCEPTS_KEYS_MEAN \
    + IMPLEMENTED_CONCEPTS_KEYS_SD \
    + CONTROL_FUNCTION_KEYS_MEAN \
    + CONTROL_FUNCTION_KEYS_SD \
    + [MAX_GAME_PROGRESSION_KEY] \
    + [RATE_EXECUTION_SPEED_CHANGED_KEY]

# All data features with help received/requested or no help
ALL_FEATURES_KEYS_HELP_RECEIVED = SUM_MEAN_SD_RATE_FEATURES_KEYS \
    + HELP_COPIED_KEYS_TOTAL \
    + HELP_COPIED_KEYS_RATE \
    + RECEIVED_HELP_KEYS

ALL_FEATURES_KEYS_HELP_REQUESTED = SUM_MEAN_SD_RATE_FEATURES_KEYS \
    + HELP_COPIED_KEYS_TOTAL \
    + HELP_COPIED_KEYS_RATE \
    + [REQUESTED_HELP_TOTAL]

ALL_FEATURES_KEYS_NOHELP = SUM_MEAN_SD_RATE_FEATURES_KEYS + [] # to avoid list referencing (create a new list)

# Sum features with help received/requested or no help
SUM_FEATURES_KEYS_HELP_RECEIVED = SUM_FEATURES_KEYS \
    + HELP_COPIED_KEYS_TOTAL \
    + RECEIVED_HELP_KEYS

SUM_FEATURES_KEYS_HELP_REQUESTED = SUM_FEATURES_KEYS \
    + HELP_COPIED_KEYS_TOTAL \
    + [REQUESTED_HELP_TOTAL]

SUM_FEATURES_KEYS_NOHELP = SUM_FEATURES_KEYS + [] # to avoid list referencing (create a new list)

# Mean/SD features with help received/requested or no help
MEAN_SD_RATE_FEATURES_KEYS_HELP_RECEIVED = MEAN_SD_RATE_FEATURES_KEYS \
    + HELP_COPIED_KEYS_RATE \
    + RECEIVED_HELP_KEYS
MEAN_SD_RATE_FEATURES_KEYS_HELP_REQUESTED = MEAN_SD_RATE_FEATURES_KEYS \
    + HELP_COPIED_KEYS_RATE \
    + [REQUESTED_HELP_TOTAL]

MEAN_SD_RATE_FEATURES_KEYS_NOHELP = MEAN_SD_RATE_FEATURES_KEYS + [] # to avoid list referencing (create a new list)

# -------------
# --- Group ---
# -------------
# Student id (used as the group key for grouped, stratified k-fold splits to avoid data contamination)
STUDENT_ID_KEY = "student_id"

# --------------
# --- Labels ---
# --------------
HELP_TYPE_KEY = "help_type"
# Label values
CONTROL_HELP_LABEL_VALUE_KEY = 1
NOTION_HELP_LABEL_VALUE_KEY = 2
IMPLEMENTATION_HELP_LABEL_VALUE_KEY = 3
SOLUTION_HELP_LABEL_VALUE_KEY = 4
# Data label
LABELS_KEY = HELP_TYPE_KEY

# -----------------
# --- Data sets ---
# -----------------

FEATURES_SETS = {
    "all_features_help_received": ALL_FEATURES_KEYS_HELP_RECEIVED,
    "all_features_help_requested": ALL_FEATURES_KEYS_HELP_REQUESTED,
    "all_features_nohelp": ALL_FEATURES_KEYS_NOHELP,
    "sum_features_help_received": SUM_FEATURES_KEYS_HELP_RECEIVED,
    "sum_features_help_requested": SUM_FEATURES_KEYS_HELP_REQUESTED,
    "sum_features_nohelp": SUM_FEATURES_KEYS_NOHELP,
    "mean_sd_rate_features_help_received": MEAN_SD_RATE_FEATURES_KEYS_HELP_RECEIVED,
    "mean_sd_rate_features_help_requested": MEAN_SD_RATE_FEATURES_KEYS_HELP_REQUESTED,
    "mean_sd_rate_features_nohelp": MEAN_SD_RATE_FEATURES_KEYS_NOHELP
}
for key in FEATURES_SETS:
    FEATURES_SETS[key] += [STUDENT_ID_KEY] + [LABELS_KEY]

# All keys of the dataset : features, group and labels
ALL_KEYS = SUM_MEAN_SD_RATE_FEATURES_KEYS \
    + HELP_COPIED_KEYS_TOTAL \
    + HELP_COPIED_KEYS_RATE \
    + RECEIVED_HELP_KEYS \
    + REQUESTED_HELP_KEYS \
    + [STUDENT_ID_KEY] \
    + [LABELS_KEY]

In [39]:
ALL_KEYS

['level_time_spent',
 'CO_tot_base_disp_time',
 'CO_tot_var_disp_time',
 'CO_tot_condi_disp_time',
 'CO_tot_for_disp_time',
 'CO_tot_while_disp_time',
 'CO_avg_base_disp_time',
 'CO_avg_var_disp_time',
 'CO_avg_condi_disp_time',
 'CO_avg_for_disp_time',
 'CO_avg_while_disp_time',
 'CO_std_base_disp_time',
 'CO_std_var_disp_time',
 'CO_std_condi_disp_time',
 'CO_std_for_disp_time',
 'CO_std_while_disp_time',
 'CO_tot_code_editor_copied',
 'CO_tot_control_function_copied',
 'CO_tot_base_program_copied',
 'CO_tot_base_error_copied',
 'CO_tot_base_structuration_copied',
 'CO_tot_base_comment_copied',
 'CO_tot_var_creation_copied',
 'CO_tot_var_modif_copied',
 'CO_tot_var_usage_copied',
 'CO_tot_var_type_copied',
 'CO_tot_condi_1bran_copied',
 'CO_tot_condi_2bran_copied',
 'CO_tot_condi_3bran_copied',
 'CO_tot_for_simple_copied',
 'CO_tot_for_counter_0_copied',
 'CO_tot_for_counter_n_copied',
 'CO_tot_while_simple_copied',
 'CO_tot_pasted',
 'CO_rate_code_editor_copied',
 'CO_rate_control_f
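
The `+ []` idiom used for the `*_NOHELP` lists matters because the loop over `FEATURES_SETS` extends each list in place with `+=`. If a feature set were only an alias of `SUM_MEAN_SD_RATE_FEATURES_KEYS`, the base list would also receive the student id and label keys, and `ALL_KEYS` would then contain them twice. The minimal sketch below uses made-up names (`base`, `alias`, `fresh` and the `"f1"`-style strings are illustrative, not project constants) to show the difference between aliasing a list and concatenating to obtain a new one:

```python
# Hypothetical stand-ins for the feature-key lists (not the project constants)
base = ["f1", "f2"]

alias = base        # same list object as base
fresh = base + []   # concatenation builds a new list

feature_sets = {"aliased": alias}
for name in feature_sets:
    # += on a list extends it in place, so every name bound to it sees the change
    feature_sets[name] += ["student_id", "help_type"]

print(base)    # extended through the alias
print(fresh)   # unaffected, because it is a separate list
```
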

## 3) Data fetching

Data export from the LRS: Learning Locker export configuration

In [40]:
# {
#   "_help_origin": "$statement.context.extensions.https://py-rates&46;org/xAPI/extensions/help-origin",
#   "_stopped_line": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/stopped-line",
#   "_level": "$statement.context.contextActivities.other.definition.name.en-US",
#   "_implemented_concepts": "$statement.object.definition.extensions.https://py-rates&46;org/xAPI/extensions/implemented-concepts",
#   "_lost_level": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/level-lost-reason",
#   "_error": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/error",
#   "_game_error_reason": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/game-error-reason",
#   "_code": "$statement.object.definition.extensions.https://py-rates&46;org/xAPI/extensions/code",
#   "_game_progression": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/progression",
#   "_date": "$statement.timestamp",
#   "_duration": "$statement.result.duration",
#   "_used_control_functions": "$statement.object.definition.extensions.https://py-rates&46;org/xAPI/extensions/used-control-functions",
#   "_execution_speed_changed": "$statement.object.definition.extensions.https://py-rates&46;org/xAPI/extensions/execution-speed-multiplier",
#   "_extra_lines_number": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/extra-lines-number",
#   "_type": "$statement.verb.id",
#   "_id": 1,
#   "_game_time": "$statement.context.extensions.https://py-rates&46;org/xAPI/extensions/game-time",
#   "_object_id": "$statement.object.id",
#   "_student": "$statement.actor.account.name",
#   "_execution_speed_multiplier": "$statement.result.extensions.https://py-rates&46;org/xAPI/extensions/execution-speed-multiplier"
# }

Raw CSV file from the LRS to a pandas DataFrame

In [41]:
# Load data from the CSV file
all_data = pd.read_csv("data/raw_data.csv",header = 0, quotechar="\"")

# Reorder columns
all_data=all_data[ALL_DATA_KEYS]

# Keep only the data recorded during the experimentation period
START_XP_DATE = "2022-09-26T12:38:54.190+02:00"
END_XP_DATE = "2022-10-18T14:19:08.897+02:00"
all_data = all_data.loc[all_data[DATE_DATA_KEY].between(START_XP_DATE,END_XP_DATE)]

# Strip the list delimiters ([" and "]) left by the export
# (raw strings avoid invalid escape sequences in the patterns)
all_data[LEVEL_DATA_KEY] = all_data[LEVEL_DATA_KEY].str.replace(r'\["',"",regex=True)
all_data[LEVEL_DATA_KEY] = all_data[LEVEL_DATA_KEY].str.replace(r'"\]',"",regex=True)
all_data[LEVEL_DATA_KEY] = all_data[LEVEL_DATA_KEY].str.replace('Level ','Level',regex=True)

all_data[IMPLEMENTED_CONCEPTS_DATA_KEY] = all_data[IMPLEMENTED_CONCEPTS_DATA_KEY].str.replace(r'\["',"",regex=True)
all_data[IMPLEMENTED_CONCEPTS_DATA_KEY] = all_data[IMPLEMENTED_CONCEPTS_DATA_KEY].str.replace(r'"\]',"",regex=True)
all_data[IMPLEMENTED_CONCEPTS_DATA_KEY] = all_data[IMPLEMENTED_CONCEPTS_DATA_KEY].str.replace(r'","',",",regex=True)

all_data[USED_CONTROL_FUNCTIONS_DATA_KEY] = all_data[USED_CONTROL_FUNCTIONS_DATA_KEY].str.replace(r'\["',"",regex=True)
all_data[USED_CONTROL_FUNCTIONS_DATA_KEY] = all_data[USED_CONTROL_FUNCTIONS_DATA_KEY].str.replace(r'"\]',"",regex=True)
all_data[USED_CONTROL_FUNCTIONS_DATA_KEY] = all_data[USED_CONTROL_FUNCTIONS_DATA_KEY].str.replace(r'","',",",regex=True)

# Set missing _game_progression values to zero for fully executed or user-stopped programs
cond = (all_data[OBJECT_ID_DATA_KEY].isin([FULLY_EXECUTED_PROGRAM, USER_STOPPED_PROGRAM])) & (all_data[GAME_PROGRESSION_DATA_KEY].isna())
# Assign through .loc rather than an in-place mask on the column, which may act on a copy
all_data.loc[cond, GAME_PROGRESSION_DATA_KEY] = 0


# Export to Excel file for manual checking
# all_data.to_excel("data/ML_data.xlsx")
print(all_data.info())



<class 'pandas.core.frame.DataFrame'>
Int64Index: 73189 entries, 1186 to 74374
Data columns (total 20 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   _id                          73189 non-null  object 
 1   _type                        73189 non-null  object 
 2   _level                       73189 non-null  object 
 3   _student                     73189 non-null  object 
 4   _date                        73189 non-null  object 
 5   _object_id                   73189 non-null  object 
 6   _game_error_reason           7975 non-null   object 
 7   _implemented_concepts        8785 non-null   object 
 8   _used_control_functions      11192 non-null  object 
 9   _lost_level                  191 non-null    object 
 10  _game_progression            10636 non-null  float64
 11  _duration                    51529 non-null  object 
 12  _extra_lines_number          339 non-null    float64
 13  _error       
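
The date filter in the loading cell compares ISO 8601 timestamps as plain strings. That is safe only while every timestamp is written with the same UTC offset, which the raw export may or may not guarantee. As a hedged alternative, the sketch below parses the timestamps into timezone-aware `datetime` objects before comparing them. `in_experiment_window` is a hypothetical helper and the sample timestamps are invented; only the two bounds come from the notebook.

```python
from datetime import datetime

# Window bounds copied from the notebook
START_XP_DATE = "2022-09-26T12:38:54.190+02:00"
END_XP_DATE = "2022-10-18T14:19:08.897+02:00"

def in_experiment_window(timestamp, start=START_XP_DATE, end=END_XP_DATE):
    """Return True if an ISO 8601 timestamp falls inside [start, end] (hypothetical helper)."""
    moment = datetime.fromisoformat(timestamp)
    return datetime.fromisoformat(start) <= moment <= datetime.fromisoformat(end)

# Invented sample timestamp inside the window
print(in_experiment_window("2022-10-01T09:00:00.000+02:00"))

# Same instant as the start bound, written with a +00:00 offset
same_instant = "2022-09-26T10:38:54.190+00:00"
print(in_experiment_window(same_instant))   # compared as instants
print(same_instant >= START_XP_DATE)        # compared as strings
```

The last two lines disagree: the parsed comparison keeps the event, while the string comparison would drop it.
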

In [42]:

# # Load data from the Excel file
# all_data = pd.read_excel("data/ML_data.xlsx",header = 0, sheet_name="data")
# print(all_data.info())
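
The chained `str.replace` calls in the loading cell strip `["`, `"]` and `","` from the list-valued columns. Assuming those cells hold JSON-encoded lists of strings (an assumption, since the raw values are not shown here), the same flattening can be sketched with the standard `json` module. `flatten_json_list` is a hypothetical helper and the sample strings are invented.

```python
import json

def flatten_json_list(cell):
    """Turn '["a","b"]' into 'a,b'; leave missing or non-list values untouched."""
    if not isinstance(cell, str):
        return cell  # e.g. NaN for rows without this field
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError:
        return cell  # not JSON: keep the original text
    if isinstance(parsed, list):
        return ",".join(str(item) for item in parsed)
    return cell

print(flatten_json_list('["walk","jump"]'))
print(flatten_json_list('["Level 1"]').replace("Level ", "Level"))
```

On a DataFrame column, such a helper could be applied with `Series.map`; parsing is slower than a regex but does not depend on the exact quoting of the export.
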

## 4) Data filtration

In [43]:
# Remove statements logged by mistake (manipulation errors) during the experimentation
ERRONEOUS_STATEMENTS = [
    '634e6a59521857063da7b63e', 
    '633ea161521857063da766c9',
    '633aad73521857063da752c8',
    '63451355521857063da7849a',
    '634cf95c490d8f065137ef1b']
all_data.drop(all_data[all_data[ID_DATA_KEY].isin(ERRONEOUS_STATEMENTS)].index,inplace=True)

# Keep only the data from the students of the experimentation
XP_STUDENTS =  ELORN_E_1 + ELORN_E_2 \
    + ELORN_F_1 + ELORN_F_2 \
    + ELORN_I_1 + ELORN_I_2 \
    + ELORN_J_1 + ELORN_J_2 \
    + SAINT_LOUIS_2_1 + SAINT_LOUIS_2_2 \
    + SAINT_LOUIS_4_1 + SAINT_LOUIS_4_2 \
    + SAINT_LOUIS_9 + SAINT_LOUIS_10

print("Number of students :", len(XP_STUDENTS))

students_data = all_data[all_data[STUDENT_DATA_KEY].isin(XP_STUDENTS)]

print("Size of raw dataset :", len(students_data))

# Delete startup guide content view trace (irrelevant)
students_data = students_data[~students_data[OBJECT_ID_DATA_KEY].isin([STARTUP_GOAL_CONTENT,STARTUP_OPERATION_CONTENT,STARTUP_SAVE_CONTENT])]
# print(students_data.head())

# Split the data by level
levels_raw_dataframes = {}
for level_key in LEVELS_KEYS :
    levels_raw_dataframes[level_key] = students_data[students_data[LEVEL_DATA_KEY]==level_key]
    
print(levels_raw_dataframes.keys())

Number of students : 215
Size of raw dataset : 72932
dict_keys(['Level1', 'Level2', 'Level3', 'Level4', 'Level5', 'Level6', 'Level7', 'Level8'])


## 5) Raw dataset to machine learning dataset

### 5.1 Content mapping

This section defines the mappings from the raw-data constants to the ML feature constants. In essence, each mapping indicates which behavior a feature is derived from.
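
To make the role of these mappings concrete, here is a minimal sketch of how a mapping turns a raw event into a feature update. The event-type values and raw object identifiers (`"copied"`, `"pasted"`, `"code-editor"`, `"memo/var/creation"`) are invented for illustration, and `count_events` is a hypothetical helper. Only the three feature names are taken from the `ALL_KEYS` output.

```python
# Hypothetical raw identifiers; the feature names come from the ALL_KEYS output
COPIED_TYPE = "copied"
PASTED_TYPE = "pasted"
COPIED_CONTENT_MAPPING = {
    "memo/var/creation": "CO_tot_var_creation_copied",
    "code-editor": "CO_tot_code_editor_copied",
}
NB_PASTED_KEY = "CO_tot_pasted"

def count_events(events):
    """Accumulate copy/paste counters from (event_type, object_id) pairs."""
    state = {key: 0 for key in COPIED_CONTENT_MAPPING.values()}
    state[NB_PASTED_KEY] = 0
    for event_type, object_id in events:
        if event_type == COPIED_TYPE:
            # The mapping selects the feature fed by this raw object
            state[COPIED_CONTENT_MAPPING[object_id]] += 1
        elif event_type == PASTED_TYPE:
            state[NB_PASTED_KEY] += 1
    return state

events = [("copied", "code-editor"), ("pasted", None), ("copied", "code-editor")]
print(count_events(events))
```
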

In [44]:
# Display time of contents
CONTENT_DISPLAY_TOTAL_TIME_MAPPING = {
    BASE_CONTENT : BASE_DISPLAY_TOTAL_TIME_KEY,
    VAR_CONTENT : VAR_DISPLAY_TOTAL_TIME_KEY,
    CONDI_CONTENT : CONDI_DISPLAY_TOTAL_TIME_KEY,
    FOR_CONTENT : FOR_DISPLAY_TOTAL_TIME_KEY,
    WHILE_CONTENT : WHILE_DISPLAY_TOTAL_TIME_KEY
}

CONTENT_DISPLAY_MEAN_TIME_MAPPING = {
    BASE_CONTENT : BASE_DISPLAY_MEAN_TIME_KEY,
    VAR_CONTENT : VAR_DISPLAY_MEAN_TIME_KEY,
    CONDI_CONTENT : CONDI_DISPLAY_MEAN_TIME_KEY,
    FOR_CONTENT : FOR_DISPLAY_MEAN_TIME_KEY,
    WHILE_CONTENT : WHILE_DISPLAY_MEAN_TIME_KEY
}

CONTENT_DISPLAY_STD_TIME_MAPPING = {
    BASE_CONTENT : BASE_DISPLAY_STD_TIME_KEY,
    VAR_CONTENT : VAR_DISPLAY_STD_TIME_KEY,
    CONDI_CONTENT : CONDI_DISPLAY_STD_TIME_KEY,
    FOR_CONTENT : FOR_DISPLAY_STD_TIME_KEY,
    WHILE_CONTENT : WHILE_DISPLAY_STD_TIME_KEY
}

# Sub-content classification in contents
SUB_CONTENT_CLASSIFICATION = {
    BASE_PROGRAM_CONTENT : BASE_CONTENT,
    BASE_ERROR_CONTENT : BASE_CONTENT,
    BASE_STRUCTURE_CONTENT : BASE_CONTENT,
    BASE_COMMENT_CONTENT : BASE_CONTENT,    
    VAR_CREATION_CONTENT : VAR_CONTENT,
    VAR_USAGE_CONTENT : VAR_CONTENT,
    VAR_MODIFICATION_CONTENT : VAR_CONTENT,
    VAR_TYPE_CONTENT : VAR_CONTENT,
    CONDI_1BRAN_CONTENT : CONDI_CONTENT,
    CONDI_2BRAN_CONTENT : CONDI_CONTENT,
    CONDI_3BRAN_CONTENT : CONDI_CONTENT,
    FOR_SIMPLE_CONTENT : FOR_CONTENT,
    FOR_COUNTER_1_CONTENT : FOR_CONTENT,
    FOR_COUNTER_N_CONTENT : FOR_CONTENT,
    WHILE_SUB_CONTENT : WHILE_CONTENT
}

# Copied content
COPIED_CONTENT_MAPPING = {
    CODE_EDITOR_CONTENT : NB_CODE_EDITOR_COPIED_KEY,
    CONTROL_FUNCTIONS_CONTENT : NB_CONTROL_FUNCTION_COPIED_KEY,
    HELP_CONTENT : NB_HELP_COPIED_KEY,
    BASE_PROGRAM_CONTENT : NB_BASE_PROGRAM_COPIED_KEY,
    BASE_ERROR_CONTENT : NB_BASE_ERROR_COPIED_KEY,
    BASE_STRUCTURE_CONTENT : NB_BASE_STRUCTURATION_COPIED_KEY,
    BASE_COMMENT_CONTENT : NB_BASE_COMMENT_COPIED_KEY,
    VAR_CREATION_CONTENT : NB_VAR_CREATION_COPIED_KEY,
    VAR_USAGE_CONTENT : NB_VAR_USAGE_COPIED_KEY,
    VAR_MODIFICATION_CONTENT : NB_VAR_MODIFICATION_COPIED_KEY,
    VAR_TYPE_CONTENT : NB_VAR_TYPE_COPIED_KEY,
    CONDI_1BRAN_CONTENT : NB_CONDI_1BRAN_COPIED_KEY,
    CONDI_2BRAN_CONTENT : NB_CONDI_2BRAN_COPIED_KEY,
    CONDI_3BRAN_CONTENT : NB_CONDI_3BRAN_COPIED_KEY,
    FOR_SIMPLE_CONTENT : NB_FOR_SIMPLE_COPIED_KEY,
    FOR_COUNTER_1_CONTENT : NB_FOR_COUNTER_0_COPIED_KEY,
    FOR_COUNTER_N_CONTENT : NB_FOR_COUNTER_N_COPIED_KEY,
    WHILE_SUB_CONTENT : NB_WHILE_SIMPLE_COPIED_KEY
}

# Implemented concept
TOTAL_IMPLEMENTED_CONCEPT_MAPPING = {
    VAR_AFFECTATION_CONCEPT : TOTAL_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY,
    BOOLEAN_CONCEPT : TOTAL_BOOLEAN_CONCEPT_IMPLEMENTED_KEY,
    STRING_CONCEPT : TOTAL_STRING_CONCEPT_IMPLEMENTED_KEY,
    IF_BRANCH_CONCEPT : TOTAL_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    ELIF_BRANCH_CONCEPT : TOTAL_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    ELSE_BRANCH_CONCEPT : TOTAL_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    FOR_SIMPLE_CONCEPT : TOTAL_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY,
    FOR_COUNTER_0_CONCEPT : TOTAL_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY,
    FOR_COUNTER_N_CONCEPT : TOTAL_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY,
    WHILE_CONCEPT : TOTAL_WHILE_CONCEPT_IMPLEMENTED_KEY
}

MEAN_IMPLEMENTED_CONCEPT_MAPPING = {
    VAR_AFFECTATION_CONCEPT : MEAN_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY,
    BOOLEAN_CONCEPT : MEAN_BOOLEAN_CONCEPT_IMPLEMENTED_KEY,
    STRING_CONCEPT : MEAN_STRING_CONCEPT_IMPLEMENTED_KEY,
    IF_BRANCH_CONCEPT : MEAN_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    ELIF_BRANCH_CONCEPT : MEAN_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    ELSE_BRANCH_CONCEPT : MEAN_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    FOR_SIMPLE_CONCEPT : MEAN_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY,
    FOR_COUNTER_0_CONCEPT : MEAN_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY,
    FOR_COUNTER_N_CONCEPT : MEAN_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY,
    WHILE_CONCEPT : MEAN_WHILE_CONCEPT_IMPLEMENTED_KEY
}

STD_IMPLEMENTED_CONCEPT_MAPPING = {
    VAR_AFFECTATION_CONCEPT : STD_VAR_AFFECTATION_CONCEPT_IMPLEMENTED_KEY,
    BOOLEAN_CONCEPT : STD_BOOLEAN_CONCEPT_IMPLEMENTED_KEY,
    STRING_CONCEPT : STD_STRING_CONCEPT_IMPLEMENTED_KEY,
    IF_BRANCH_CONCEPT : STD_IF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    ELIF_BRANCH_CONCEPT : STD_ELIF_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    ELSE_BRANCH_CONCEPT : STD_ELSE_BRANCH_CONCEPT_IMPLEMENTED_KEY,
    FOR_SIMPLE_CONCEPT : STD_FOR_SIMPLE_CONCEPT_IMPLEMENTED_KEY,
    FOR_COUNTER_0_CONCEPT : STD_FOR_COUNTER_0_CONCEPT_IMPLEMENTED_KEY,
    FOR_COUNTER_N_CONCEPT : STD_FOR_COUNTER_N_CONCEPT_IMPLEMENTED_KEY,
    WHILE_CONCEPT : STD_WHILE_CONCEPT_IMPLEMENTED_KEY
}

# Game errors and lost levels
TOTAL_GAME_ERROR_MAPPING = {

    OPEN_CHEST_LOCATION_GAME_ERROR : NB_GAME_ERROR_OPEN_CHEST_LOCATION_KEY ,
    OPEN_CHEST_KEY_GAME_ERROR  : NB_GAME_ERROR_OPEN_CHEST_KEY_KEY ,
    READ_MESSAGE_LOCATION_GAME_ERROR  : NB_GAME_ERROR_READ_MESSAGE_LOCATION_KEY ,
    WALK_LOCATION_GAME_ERROR  : NB_GAME_ERROR_WALK_LOCATION_KEY ,
    NOT_ALLOWED_FUNCTION_GAME_ERROR : NB_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY ,
    FUNCTION_PARAMETERS_GAME_ERROR  : NB_GAME_ERROR_FUNCTION_PARAMETERS_KEY ,

    SPIKES_TOUCH_LOST_LEVEL : NB_LEVEL_LOST_SPIKE_TOUCH_KEY ,
    BARREL_EXPLOSION_LOST_LEVEL : NB_LEVEL_LOST_BARREL_EXPLOSION_KEY ,
    PIRATE_SHOT_LOST_LEVEL : NB_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY ,
}

MEAN_GAME_ERROR_MAPPING = {
   
    OPEN_CHEST_LOCATION_GAME_ERROR : MEAN_GAME_ERROR_OPEN_CHEST_LOCATION_KEY ,
    OPEN_CHEST_KEY_GAME_ERROR  : MEAN_GAME_ERROR_OPEN_CHEST_KEY_KEY ,
    READ_MESSAGE_LOCATION_GAME_ERROR  : MEAN_GAME_ERROR_READ_MESSAGE_LOCATION_KEY ,
    WALK_LOCATION_GAME_ERROR  : MEAN_GAME_ERROR_WALK_LOCATION_KEY ,
    NOT_ALLOWED_FUNCTION_GAME_ERROR : MEAN_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY ,
    FUNCTION_PARAMETERS_GAME_ERROR  : MEAN_GAME_ERROR_FUNCTION_PARAMETERS_KEY ,

    SPIKES_TOUCH_LOST_LEVEL : MEAN_LEVEL_LOST_SPIKE_TOUCH_KEY ,
    BARREL_EXPLOSION_LOST_LEVEL : MEAN_LEVEL_LOST_BARREL_EXPLOSION_KEY ,
    PIRATE_SHOT_LOST_LEVEL : MEAN_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY ,
}

STD_GAME_ERROR_MAPPING = {
    
    OPEN_CHEST_LOCATION_GAME_ERROR : STD_GAME_ERROR_OPEN_CHEST_LOCATION_KEY ,
    OPEN_CHEST_KEY_GAME_ERROR  : STD_GAME_ERROR_OPEN_CHEST_KEY_KEY ,
    READ_MESSAGE_LOCATION_GAME_ERROR  : STD_GAME_ERROR_READ_MESSAGE_LOCATION_KEY ,
    WALK_LOCATION_GAME_ERROR  : STD_GAME_ERROR_WALK_LOCATION_KEY ,
    NOT_ALLOWED_FUNCTION_GAME_ERROR : STD_GAME_ERROR_NOT_ALLOWED_FUNCTION_KEY ,
    FUNCTION_PARAMETERS_GAME_ERROR  : STD_GAME_ERROR_FUNCTION_PARAMETERS_KEY ,

    SPIKES_TOUCH_LOST_LEVEL : STD_LEVEL_LOST_SPIKE_TOUCH_KEY ,
    BARREL_EXPLOSION_LOST_LEVEL : STD_LEVEL_LOST_BARREL_EXPLOSION_KEY ,
    PIRATE_SHOT_LOST_LEVEL : STD_LEVEL_LOST_OTHER_PIRATE_SHOT_KEY ,
}

# Execution
TOTAL_EXECUTION_MAPPING = {
    TOO_MANY_LINES_PROGRAM : NB_TOO_MANY_LINES_ERROR_KEY ,
    SYNTACTIC_ERROR_PROGRAM : NB_SYNTACTIC_ERROR_KEY ,
    SEMANTIC_ERROR_PROGRAM : NB_SEMANTIC_ERROR_KEY ,
    USER_STOPPED_PROGRAM : NB_USER_STOPPED_EXECUTION_KEY,
    FULLY_EXECUTED_PROGRAM : NB_COMPLETED_EXECUTION_KEY,
}

MEAN_EXECUTION_MAPPING = {
    TOO_MANY_LINES_PROGRAM : MEAN_TOO_MANY_LINES_ERROR_KEY ,
    SYNTACTIC_ERROR_PROGRAM : MEAN_SYNTACTIC_ERROR_KEY ,
    SEMANTIC_ERROR_PROGRAM : MEAN_SEMANTIC_ERROR_KEY ,
    USER_STOPPED_PROGRAM : MEAN_USER_STOPPED_EXECUTION_KEY,
    FULLY_EXECUTED_PROGRAM : MEAN_COMPLETED_EXECUTION_KEY,
}

STD_EXECUTION_MAPPING = {
    TOO_MANY_LINES_PROGRAM : STD_TOO_MANY_LINES_ERROR_KEY ,
    SYNTACTIC_ERROR_PROGRAM : STD_SYNTACTIC_ERROR_KEY ,
    SEMANTIC_ERROR_PROGRAM : STD_SEMANTIC_ERROR_KEY ,
    USER_STOPPED_PROGRAM : STD_USER_STOPPED_EXECUTION_KEY,
    FULLY_EXECUTED_PROGRAM : STD_COMPLETED_EXECUTION_KEY,
}

# Used control functions

CONTROL_FUNCTION_NAME_TO_ID_MAPPING = {
    WALK_NAME : WALK_CTR_FCT,
    LEFT_NAME : LEFT_CTR_FCT,
    RIGHT_NAME : RIGHT_CTR_FCT,
    OPEN_NAME : OPEN_CTR_FCT,
    JUMP_NAME : JUMP_CTR_FCT,
    JUMP_HEIGHT_NAME : JUMP_HEIGHT_CTR_FCT,
    JUMP_HIGH_NAME : JUMP_HIGH_CTR_FCT,
    GET_HEIGHT_NAME : GET_HEIGHT_CTR_FCT,
    READ_STRING_NAME : READ_STRING_CTR_FCT,
    READ_INT_NAME : READ_INT_CTR_FCT,
    ATTACK_NAME : ATTACK_CTR_FCT,
    DETECT_OBSTACLE_NAME : DETECT_OBSTACLE_CTR_FCT,
    TURN_NAME : TURN_CTR_FCT,
    SHOOT_NAME : SHOOT_CTR_FCT
}

TOTAL_USED_CONTROL_FUNCTION_MAPPING = {
    WALK_CTR_FCT : TOTAL_WALK_CTR_FUN_USED_KEY,
    LEFT_CTR_FCT : TOTAL_LEFT_CTR_FUN_USED_KEY,
    RIGHT_CTR_FCT : TOTAL_RIGHT_CTR_FUN_USED_KEY,
    OPEN_CTR_FCT : TOTAL_OPEN_CTR_FUN_USED_KEY,
    JUMP_CTR_FCT : TOTAL_JUMP_CTR_FUN_USED_KEY,
    JUMP_HEIGHT_CTR_FCT : TOTAL_JUMP_HEIGHT_CTR_FUN_USED_KEY,
    JUMP_HIGH_CTR_FCT : TOTAL_JUMP_HIGH_CTR_FUN_USED_KEY,
    GET_HEIGHT_CTR_FCT : TOTAL_GET_HEIGHT_CTR_FUN_USED_KEY,
    READ_STRING_CTR_FCT : TOTAL_READ_STRING_CTR_FUN_USED_KEY,
    READ_INT_CTR_FCT : TOTAL_READ_INT_CTR_FUN_USED_KEY,
    ATTACK_CTR_FCT : TOTAL_ATTACK_CTR_FUN_USED_KEY,
    DETECT_OBSTACLE_CTR_FCT : TOTAL_DETECT_OBSTACLE_CTR_FUN_USED_KEY,
    TURN_CTR_FCT : TOTAL_TURN_CTR_FUN_USED_KEY,
    SHOOT_CTR_FCT : TOTAL_SHOOT_CTR_FUN_USED_KEY,
}

MEAN_USED_CONTROL_FUNCTION_MAPPING = {
    WALK_CTR_FCT : MEAN_WALK_CTR_FUN_USED_KEY,
    LEFT_CTR_FCT : MEAN_LEFT_CTR_FUN_USED_KEY,
    RIGHT_CTR_FCT : MEAN_RIGHT_CTR_FUN_USED_KEY,
    OPEN_CTR_FCT : MEAN_OPEN_CTR_FUN_USED_KEY,
    JUMP_CTR_FCT : MEAN_JUMP_CTR_FUN_USED_KEY,
    JUMP_HEIGHT_CTR_FCT : MEAN_JUMP_HEIGHT_CTR_FUN_USED_KEY,
    JUMP_HIGH_CTR_FCT : MEAN_JUMP_HIGH_CTR_FUN_USED_KEY,
    GET_HEIGHT_CTR_FCT : MEAN_GET_HEIGHT_CTR_FUN_USED_KEY,
    READ_STRING_CTR_FCT : MEAN_READ_STRING_CTR_FUN_USED_KEY,
    READ_INT_CTR_FCT : MEAN_READ_INT_CTR_FUN_USED_KEY,
    ATTACK_CTR_FCT : MEAN_ATTACK_CTR_FUN_USED_KEY,
    DETECT_OBSTACLE_CTR_FCT : MEAN_DETECT_OBSTACLE_CTR_FUN_USED_KEY,
    TURN_CTR_FCT : MEAN_TURN_CTR_FUN_USED_KEY,
    SHOOT_CTR_FCT : MEAN_SHOOT_CTR_FUN_USED_KEY,
}

STD_USED_CONTROL_FUNCTION_MAPPING = {
    WALK_CTR_FCT : STD_WALK_CTR_FUN_USED_KEY,
    LEFT_CTR_FCT : STD_LEFT_CTR_FUN_USED_KEY,
    RIGHT_CTR_FCT : STD_RIGHT_CTR_FUN_USED_KEY,
    OPEN_CTR_FCT : STD_OPEN_CTR_FUN_USED_KEY,
    JUMP_CTR_FCT : STD_JUMP_CTR_FUN_USED_KEY,
    JUMP_HEIGHT_CTR_FCT : STD_JUMP_HEIGHT_CTR_FUN_USED_KEY,
    JUMP_HIGH_CTR_FCT : STD_JUMP_HIGH_CTR_FUN_USED_KEY,
    GET_HEIGHT_CTR_FCT : STD_GET_HEIGHT_CTR_FUN_USED_KEY,
    READ_STRING_CTR_FCT : STD_READ_STRING_CTR_FUN_USED_KEY,
    READ_INT_CTR_FCT : STD_READ_INT_CTR_FUN_USED_KEY,
    ATTACK_CTR_FCT : STD_ATTACK_CTR_FUN_USED_KEY,
    DETECT_OBSTACLE_CTR_FCT : STD_DETECT_OBSTACLE_CTR_FUN_USED_KEY,
    TURN_CTR_FCT : STD_TURN_CTR_FUN_USED_KEY,
    SHOOT_CTR_FCT : STD_SHOOT_CTR_FUN_USED_KEY,
}

# Received helps
RECEIVED_HELP_MAPPING = {
    GAME_HELP : CONTROL_HELP_RECEIVED_KEY,
    CONTROL_HELP : CONTROL_HELP_RECEIVED_KEY,
    NOTION_HELP : NOTION_HELP_RECEIVED_KEY,
    IMPLEMENTATION_HELP : IMPLEMENTATION_HELP_RECEIVED_KEY,
    SOLUTION_HELP : SOLUTION_HELP_RECEIVED_KEY,
}
# Help Label
HELP_LABEL_MAPPING = {
    GAME_HELP : CONTROL_HELP_LABEL_VALUE_KEY,
    CONTROL_HELP : CONTROL_HELP_LABEL_VALUE_KEY,
    NOTION_HELP : NOTION_HELP_LABEL_VALUE_KEY,
    IMPLEMENTATION_HELP : IMPLEMENTATION_HELP_LABEL_VALUE_KEY,
    SOLUTION_HELP : SOLUTION_HELP_LABEL_VALUE_KEY,
}

### 5.2 Compute feature values

Helper functions

In [45]:
# ##########
# Used for help type classification
# ##########
def exists_higher_help(state_dict,current_help_key):
    # True if a help of a higher grade than current_help_key is flagged in state_dict
    # (grades follow the order of RECEIVED_HELP_KEYS)
    current_help_grade = RECEIVED_HELP_KEYS.index(current_help_key)
    return any(state_dict[help_key] == 1 for help_key in RECEIVED_HELP_KEYS[current_help_grade+1:])

def exists_equal_help(state_dict,current_help_key):
    # True if a help of the same grade as current_help_key is flagged in state_dict
    current_help_grade = RECEIVED_HELP_KEYS.index(current_help_key)
    return state_dict[RECEIVED_HELP_KEYS[current_help_grade]] == 1

# ##########
# Implemented concepts detection using AST
# ##########

class ConceptLister(ast.NodeVisitor):
    def __init__(self):
        self.concepts_list = []
    def __str__(self):
        res = "["
        for concept in self.concepts_list:
            res += concept+", "
        res += "]"
        return res
    # Each visit_* method is called automatically, depending on the node type,
    # during tree traversal
    def visit_While(self, node):
        self.concepts_list.append(WHILE_CONCEPT)
        self.generic_visit(node)
    
    def visit_For(self, node):
        loop_var_name = node.target.id
        is_simple_for = not self.var_used(node,loop_var_name)
        if is_simple_for :
            self.concepts_list.append(FOR_SIMPLE_CONCEPT)
        else:
            for_iter = node.iter
            if isinstance(for_iter,ast.Call):
                if for_iter.func.id == "range":
                    nb_args = len(for_iter.args)
                    if nb_args == 1 :
                        self.concepts_list.append(FOR_COUNTER_0_CONCEPT)
                    elif nb_args >= 2:
                        if for_iter.args[0].value == 0:
                            self.concepts_list.append(FOR_COUNTER_0_CONCEPT)
                        else:
                            self.concepts_list.append(FOR_COUNTER_N_CONCEPT)
        self.generic_visit(node)
                
    def visit_Assign(self, node):
        self.concepts_list.append(VAR_AFFECTATION_CONCEPT)
        self.generic_visit(node)
    
    def visit_AugAssign(self, node):
        self.concepts_list.append(VAR_AFFECTATION_CONCEPT)
        self.generic_visit(node)
        
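    def visit_If(self, node):
        # An elif is an If node stored as the only statement of its parent's orelse;
        # an if nested in the body of another if is a regular if
        is_elif = isinstance(node.parent,ast.If) and node.parent.orelse == [node]
        if is_elif:
            self.concepts_list.append(ELIF_BRANCH_CONCEPT)
        else:
            self.concepts_list.append(IF_BRANCH_CONCEPT)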
        has_else = len(node.orelse) != 0 and not isinstance(node.orelse[0],ast.If)
        if has_else:
            self.concepts_list.append(ELSE_BRANCH_CONCEPT)
        self.generic_visit(node)
    
    def visit_Constant(self, node):
        if isinstance(node.value,str):
            self.concepts_list.append(STRING_CONCEPT)
        elif isinstance(node.value,bool):
            self.concepts_list.append(BOOLEAN_CONCEPT)
        self.generic_visit(node)
    
    # Check if a variable (name) is used in a sub-tree (node)
    def var_used(self,node,name):
        result = False
        child_nodes = ast.iter_child_nodes(node)
        for child_node in child_nodes:
            child_result = False
            if isinstance(child_node,ast.Name) and isinstance(child_node.ctx,ast.Load):
                child_result = (child_node.id == name)
            else:
                child_result = self.var_used(child_node,name)
            result = result or child_result
        return result

# Add a parent attribute to every node (used to detect elif statements)
class Parentage(ast.NodeTransformer):
    # current parent (module)
    parent = None

    def visit(self, node):
        # set parent attribute for this node
        node.parent = self.parent
        # This node becomes the new parent
        self.parent = node
        # Do any work required by super class 
        node = super().visit(node)
        # If we have a valid node (ie. node not being removed)
        if isinstance(node, ast.AST):
            # update the parent, since this may have been transformed 
            # to a different node by super
            self.parent = node.parent
        return node
# ##########
# Used control functions detection using AST
# ##########

class ControlFunctionLister(ast.NodeVisitor):
    def __init__(self):
        self.control_functions_list = []
    def __str__(self):
        res = "["
        for control_function in self.control_functions_list:
            res += control_function+", "
        res += "]"
        return res
    # Each visit_* method is called automatically, depending on the node type,
    # during tree traversal
    def visit_Call(self, node):
        # node.func is an ast.Name for plain calls such as walk(); method calls
        # (ast.Attribute) have no id and are skipped
        function_name = getattr(node.func, "id", None)
        if function_name in CONTROL_FUNCTION_NAME_TO_ID_MAPPING:
            self.control_functions_list.append(CONTROL_FUNCTION_NAME_TO_ID_MAPPING[function_name])
        self.generic_visit(node)
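
The detection classes above rely on two ideas: annotating each AST node with its parent, and reading call names from `ast.Call` nodes. The self-contained sketch below illustrates both on an invented program. It is a simplified variant, not the notebook's classes: `annotate_parents`, `BranchLister` and `called_names` are hypothetical names, and the elif test also checks that the node is the only statement in its parent's `orelse`, so an `if` nested in the body of another `if` is not counted as an `elif`.

```python
import ast

def annotate_parents(tree):
    """Attach a .parent attribute to every node (None for the root)."""
    tree.parent = None
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            child.parent = node
    return tree

class BranchLister(ast.NodeVisitor):
    """Collect 'if' / 'elif' / 'else' labels in traversal order."""
    def __init__(self):
        self.branches = []

    def visit_If(self, node):
        parent = node.parent
        # An elif is an If that is the sole statement of its parent's else part
        is_elif = isinstance(parent, ast.If) and parent.orelse == [node]
        self.branches.append("elif" if is_elif else "if")
        if node.orelse and not isinstance(node.orelse[0], ast.If):
            self.branches.append("else")
        self.generic_visit(node)

def called_names(tree):
    """Names of plain function calls such as walk(); method calls are skipped."""
    return [n.func.id for n in ast.walk(tree)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]

# Invented student program
source = """
if x > 0:
    if y:
        walk()
elif x == 0:
    jump()
else:
    msg.upper()
"""
tree = annotate_parents(ast.parse(source))
lister = BranchLister()
lister.visit(tree)
print(lister.branches)
print(sorted(called_names(tree)))
```
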
    
    

Iterate over all raw traces to compute the feature values
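
The loop below recomputes the total, mean and standard deviation each time a new event arrives, so the state dictionary always holds statistics over everything seen so far in the level. A minimal sketch of that update for display times, using the standard `statistics` module instead of NumPy (`update_display_stats` is a hypothetical helper and the durations are invented):

```python
import statistics

def update_display_stats(durations_ms, new_duration_ms):
    """Append one display duration and return (total, mean, population std)."""
    durations_ms.append(new_duration_ms)
    total = sum(durations_ms)
    mean = statistics.fmean(durations_ms)
    # np.std defaults to ddof=0, i.e. the population standard deviation
    std = statistics.pstdev(durations_ms)
    return total, mean, std

history = []
for duration in (2000.0, 4000.0, 6000.0):  # invented durations in milliseconds
    total, mean, std = update_display_stats(history, duration)
    print(total, mean, round(std, 2))
```

With a single observation the standard deviation is 0, which matches the initial value of the features in the state dictionary.
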

In [46]:
# Initialise the dict of levels restructured datasets
levels_processed_dataframes = {}
# Iterate on levels
for level_key, level_raw_dataframe in levels_raw_dataframes.items():
    print(f"------------------ {level_key} ------------------")
    # Counters kept for statistics
    nb_other_helps = 0
    nb_stored_helps = 0
    nb_equal_helps = 0
    nb_higher_helps = 0
    # Initialise the list of level processed data
    level_processed_data = []
    # Iterate on students
    for student in XP_STUDENTS:
        # Filter student rows
        student_level_data = level_raw_dataframe[level_raw_dataframe[STUDENT_DATA_KEY]==student]
        # Order rows by date
        student_level_data = student_level_data.sort_values(by=DATE_DATA_KEY)
        # If the student has started the current level
        level_start_data = student_level_data[student_level_data[TYPE_DATA_KEY]==STARTED_TYPE]
        if(len(level_start_data) > 0):
            # Init state dictionary
            state_dict = dict()
            for key in ALL_KEYS :
                state_dict[key] = 0
            # Get level start date
            level_start_string_date = level_start_data.iloc[0][DATE_DATA_KEY]
            level_start_date = parser.parse(level_start_string_date)
            # Initialise date variables
            level_last_action_date = level_start_date
            level_spent_time = 0
            # Init max progression
            level_max_progression = 0
            # Init content display time dict of lists
            content_display_time_lists = {}
            for content_type in CONTENT_DISPLAY_MEAN_TIME_MAPPING.keys():
                content_display_time_lists[content_type] = []
            # Init implemented concepts dict of lists
            implemented_concepts_lists = {}
            for concept_type in DETECTED_CONCEPTS_LIST:
                implemented_concepts_lists[concept_type] = []
            # Init used control function dict of lists
            used_control_functions_lists = {}
            for control_function_type in DETECTED_CONTROL_FUNCTIONS_LIST:
                used_control_functions_lists[control_function_type] = []
            # Init game errors dict of lists
            game_error_lists = {}
            for game_error_type in MEAN_GAME_ERROR_MAPPING.keys():
                game_error_lists[game_error_type] = []
            # Init execution dict of lists
            execution_lists = {}
            for execution_type in MEAN_EXECUTION_MAPPING.keys():
                execution_lists[execution_type] = []
            
            # Iterate on actions
            for index, row in student_level_data.iterrows():
                # Initialise store flag
                store_state = False
                # Initialise current help type
                current_help_type = OTHER_HELP
                # viewed content
                if row[TYPE_DATA_KEY]==CONSULTED_TYPE :
                    viewed_sub_content = row[OBJECT_ID_DATA_KEY]
                    # Manage sub-content
                    sub_content_string_display_time = row[DURATION_DATA_KEY]
                    sub_content_display_time = pd.Timedelta(sub_content_string_display_time).total_seconds()*1000
                    
                    # Manage content
                    viewed_content = SUB_CONTENT_CLASSIFICATION[viewed_sub_content]
                    content_display_time = sub_content_display_time
                    # Save to content displayed list
                    content_display_time_lists[viewed_content].append(content_display_time)
                    # Get total, mean and std
                    content_total_display_time = np.sum(content_display_time_lists[viewed_content])
                    content_mean_display_time = np.mean(content_display_time_lists[viewed_content])
                    content_std_display_time = np.std(content_display_time_lists[viewed_content])
                    # Store total, mean and std
                    state_dict[CONTENT_DISPLAY_TOTAL_TIME_MAPPING[viewed_content]]=content_total_display_time
                    state_dict[CONTENT_DISPLAY_MEAN_TIME_MAPPING[viewed_content]]=content_mean_display_time
                    state_dict[CONTENT_DISPLAY_STD_TIME_MAPPING[viewed_content]]=content_std_display_time

                # Copied content    
                elif row[TYPE_DATA_KEY]== COPIED_TYPE :
                    copied_concept = row[OBJECT_ID_DATA_KEY]
                    state_dict[COPIED_CONTENT_MAPPING[copied_concept]]+=1
                # Pasted content    
                elif row[TYPE_DATA_KEY]== PASTED_TYPE :
                    state_dict[NB_PASTED_KEY]+=1                    
                # Launched program
                elif row[TYPE_DATA_KEY]== LAUNCHED_TYPE :
                    # Reset flags
                    manage_code_content = False
                    manage_game_progression = False
                    manage_game_errors = False
                    manage_execution = False
                    
                    # Too many lines
                    if row[OBJECT_ID_DATA_KEY] == TOO_MANY_LINES_PROGRAM :
                        manage_execution = True
                        manage_game_errors = True
                    # Syntactic error
                    elif row[OBJECT_ID_DATA_KEY] == SYNTACTIC_ERROR_PROGRAM :
                        manage_execution = True
                        manage_game_errors = True
                    # Semantic error
                    elif row[OBJECT_ID_DATA_KEY] == SEMANTIC_ERROR_PROGRAM :
                        manage_execution = True
                        manage_game_errors = True
                    # Game error
                    elif row[OBJECT_ID_DATA_KEY] == GAME_ERROR_PROGRAM :
                        manage_execution = True
                        manage_game_errors = True
                    # Level lost
                    elif row[OBJECT_ID_DATA_KEY] == LEVEL_LOST_PROGRAM :
                        manage_execution = True
                        manage_game_errors = True
                        # manage_code_content = True
                    # Stopped execution
                    elif row[OBJECT_ID_DATA_KEY] == USER_STOPPED_PROGRAM :
                        manage_execution = True
                        manage_game_errors = True
                        # manage_code_content = True

                    # Fully executed
                    elif row[OBJECT_ID_DATA_KEY] == FULLY_EXECUTED_PROGRAM :
                        manage_code_content = True
                        manage_game_progression = True
                        manage_execution = True
                        manage_game_errors = True

                    #  Level completed
                    elif row[OBJECT_ID_DATA_KEY] == LEVEL_COMPLETED_PROGRAM :
                        # manage_code_content = True
                        # manage_game_progression = True
                        # manage_execution = True
                        # manage_game_errors = True
                        pass
                    # Game errors
                    if manage_game_errors:
                        # Update game errors counters
                        for game_error_type in game_error_lists.keys():
                            if row[OBJECT_ID_DATA_KEY] == GAME_ERROR_PROGRAM:
                                game_error_reason = row[GAME_ERROR_REASON_DATA_KEY] 
                                if game_error_reason == game_error_type:
                                    game_error_lists[game_error_type].append(1)
                                else: 
                                    game_error_lists[game_error_type].append(0)
                                    
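                            # elif (not if): append exactly one indicator per
                            # launched program; a plain if would also append an
                            # extra 0 for every game error row
                            elif row[OBJECT_ID_DATA_KEY] == LEVEL_LOST_PROGRAM: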
                                level_lost_reason = row[LOST_LEVEL_DATA_KEY]
                                if level_lost_reason == game_error_type:
                                    game_error_lists[game_error_type].append(1)
                                else:
                                    game_error_lists[game_error_type].append(0)
                            else:
                                game_error_lists[game_error_type].append(0)
                        # Store data
                        for game_error_type in game_error_lists.keys():
                            # Get total, mean and std
                            total_game_error = np.sum(game_error_lists[game_error_type])
                            mean_game_error = np.mean(game_error_lists[game_error_type])
                            std_game_error = np.std(game_error_lists[game_error_type])
                            # Store total, mean and std
                            state_dict[TOTAL_GAME_ERROR_MAPPING[game_error_type]]= total_game_error
                            state_dict[MEAN_GAME_ERROR_MAPPING[game_error_type]]=mean_game_error
                            state_dict[STD_GAME_ERROR_MAPPING[game_error_type]]=std_game_error
                    
                    if manage_execution:
                        # Update execution counters
                        for execution_type in execution_lists.keys():
                            if execution_type == row[OBJECT_ID_DATA_KEY]:
                                execution_lists[execution_type].append(1)
                            else :
                                execution_lists[execution_type].append(0)
                        
                        # Store data
                        for execution_type in execution_lists.keys():
                            # Get total, mean and std
                            total_execution = np.sum(execution_lists[execution_type])
                            mean_execution = np.mean(execution_lists[execution_type])
                            std_execution = np.std(execution_lists[execution_type])
                            # print("execution_lists: ", execution_lists[execution_type])
                            # print(f"total_execution : {total_execution}, mean_execution : {mean_execution},  std_execution : {std_execution}")
                            # Store total, mean and std
                            state_dict[TOTAL_EXECUTION_MAPPING[execution_type]] = total_execution
                            state_dict[MEAN_EXECUTION_MAPPING[execution_type]] = mean_execution
                            state_dict[STD_EXECUTION_MAPPING[execution_type]] = std_execution
                        

                    # Code content
                    if manage_code_content :
                        # Update implemented concepts for the current execution
                        try:
                            program = row[CODE_DATA_KEY]
                            # Convert tabs to 4 spaces (as Pyrates does
                            # before running the code)
                            program = re.sub("\t", "    ", str(program))
                            # Parse code in AST
                            programTree = ast.parse(program)
                            # print(ast.dump(programTree))
                            # Get concepts from AST
                            Parentage().visit(programTree)
                            concept_visitor = ConceptLister()
                            concept_visitor.visit(programTree)
                            program_concepts_list = concept_visitor.concepts_list
                            # Get control functions from AST
                            control_function_visitor = ControlFunctionLister()
                            control_function_visitor.visit(programTree)
                            program_control_functions_list = control_function_visitor.control_functions_list
                            
                            # Count used concepts
                            unique, counts = np.unique(program_concepts_list, return_counts=True)
                            concepts_counts = dict(zip(unique, counts))
                            # Update level concepts counters
                            for concept_type in implemented_concepts_lists.keys():
                                if concept_type in concepts_counts.keys():
                                    implemented_concepts_lists[concept_type].append(concepts_counts[concept_type])
                                else:
                                    implemented_concepts_lists[concept_type].append(0)
                            # Store data
                            for concept_type in implemented_concepts_lists.keys():
                                # Get total, mean and std
                                total_concept_implemented = np.sum(implemented_concepts_lists[concept_type])
                                mean_concept_implemented = np.mean(implemented_concepts_lists[concept_type])
                                std_concept_implemented = np.std(implemented_concepts_lists[concept_type])
                                # Store total, mean and std
                                state_dict[TOTAL_IMPLEMENTED_CONCEPT_MAPPING[concept_type]]= total_concept_implemented
                                state_dict[MEAN_IMPLEMENTED_CONCEPT_MAPPING[concept_type]]=mean_concept_implemented
                                state_dict[STD_IMPLEMENTED_CONCEPT_MAPPING[concept_type]]=std_concept_implemented

                            # Update used control functions for the current
                            # execution
                            
                            # Count used control functions
                            unique, counts = np.unique(program_control_functions_list, return_counts=True)
                            control_functions_counts = dict(zip(unique, counts))
                            # Update level control functions counters
                            for control_function in used_control_functions_lists.keys():
                                if control_function in control_functions_counts.keys():
                                    used_control_functions_lists[control_function].append(control_functions_counts[control_function])
                                else:
                                    used_control_functions_lists[control_function].append(0)
                            # Store data
                            for control_function in used_control_functions_lists.keys():
                                # Get total, mean and std
                                total_control_functions_used = np.sum(used_control_functions_lists[control_function])
                                mean_control_functions_used = np.mean(used_control_functions_lists[control_function])
                                std_control_functions_used = np.std(used_control_functions_lists[control_function])
                                # Store total, mean and std
                                state_dict[TOTAL_USED_CONTROL_FUNCTION_MAPPING[control_function]]= total_control_functions_used
                                state_dict[MEAN_USED_CONTROL_FUNCTION_MAPPING[control_function]]=mean_control_functions_used
                                state_dict[STD_USED_CONTROL_FUNCTION_MAPPING[control_function]]=std_control_functions_used

                        except Exception as error:
                            raw_id = row[ID_DATA_KEY]
                            print("Program AST error : ",error)
                            print("RawId: ",raw_id)
                            print(program) 

                    # Game progression
                    if manage_game_progression:
                        current_game_progression = row[GAME_PROGRESSION_DATA_KEY]
                        if current_game_progression > level_max_progression : 
                            level_max_progression = current_game_progression

                # Execution speed changed    
                elif row[TYPE_DATA_KEY] == CHANGED_TYPE:
                    state_dict[NB_EXECUTION_SPEED_CHANGED_KEY]+=1

                # Received help (labels)
                elif row[TYPE_DATA_KEY]== RECEIVED_TYPE:
                    current_help_type = row[OBJECT_ID_DATA_KEY]
                    if current_help_type == OTHER_HELP : 
                        nb_other_helps +=1
                    else :
                        # Check whether a higher or equal help grade has
                        # already been given in the level
                        if(exists_higher_help(state_dict,RECEIVED_HELP_MAPPING[current_help_type])):
                            nb_higher_helps +=1
                        elif(exists_equal_help(state_dict,RECEIVED_HELP_MAPPING[current_help_type])):
                            nb_equal_helps +=1
                        else :
                            state_dict[REQUESTED_HELP_TOTAL] += 1
                            state_dict[HELP_TYPE_KEY]=HELP_LABEL_MAPPING[current_help_type]
                            nb_stored_helps +=1
                            store_state = True
                             
                # Level resumed or restarted    
                elif row[TYPE_DATA_KEY]== RESUMED_TYPE or row[TYPE_DATA_KEY]== RESTARTED_TYPE:
                    resumed_string_date = row[DATE_DATA_KEY]
                    resumed_date = parser.parse(resumed_string_date)
                    level_last_action_date = resumed_date
                if store_state :
                    # Time spent calculation
                    row_string_date = row[DATE_DATA_KEY]
                    row_date = parser.parse(row_string_date)
                    current_spent_time = (row_date - level_last_action_date).total_seconds() * 1000
                    level_spent_time += current_spent_time
                    level_last_action_date = row_date
                    state_dict[LEVEL_TIME_SPENT_KEY] = level_spent_time
                    # Set max game progression
                    state_dict[MAX_GAME_PROGRESSION_KEY] = level_max_progression
                    
                    # Set rate features (matching total divided by the time spent in the level)
                    for rate_feature in ALL_RATE_FEATURES:
                        state_dict[rate_feature] = state_dict[rate_feature.replace("_rate_", "_tot_", 1)] / float(level_spent_time)
                    
                    # Set group identifier
                    state_dict[STUDENT_ID_KEY] = row[STUDENT_DATA_KEY]
                    # copy current state
                    level_processed_data.append(state_dict.copy())
                    # Set current help type
                    state_dict[RECEIVED_HELP_MAPPING[current_help_type]]=1
    # Create the level restructured dataframe
    level_dataframe =  pd.DataFrame(level_processed_data, columns=ALL_KEYS)
    levels_processed_dataframes[level_key] = level_dataframe
    # Export to Excel file for manual checking
    level_dataframe.to_excel("data/df_"+level_key+".xlsx")
    
    # For statistical purposes
    nb_relevant_helps = nb_stored_helps + nb_equal_helps + nb_higher_helps
    nb_total_helps = nb_relevant_helps + nb_other_helps
    print(f"Number of other type helps:  {nb_other_helps} / ratio: {nb_other_helps/nb_total_helps*100}")
    print(f"Number of stored helps:  {nb_stored_helps} / ratio: {nb_stored_helps/nb_relevant_helps*100}")
    print(f"Number of equal helps:  {nb_equal_helps} / ratio: {nb_equal_helps/nb_relevant_helps*100}")
    print(f"Number of higher helps:  {nb_higher_helps} / ratio: {nb_higher_helps/nb_relevant_helps*100}")


------------------ Level1 ------------------


Number of other type helps:  18 / ratio: 8.144796380090497
Number of stored helps:  176 / ratio: 86.69950738916256
Number of equal helps:  18 / ratio: 8.866995073891626
Number of higher helps:  9 / ratio: 4.433497536945813
------------------ Level2 ------------------
Number of other type helps:  9 / ratio: 19.565217391304348
Number of stored helps:  35 / ratio: 94.5945945945946
Number of equal helps:  1 / ratio: 2.7027027027027026
Number of higher helps:  1 / ratio: 2.7027027027027026
------------------ Level3 ------------------
Number of other type helps:  56 / ratio: 10.546139359698682
Number of stored helps:  408 / ratio: 85.89473684210527
Number of equal helps:  55 / ratio: 11.578947368421053
Number of higher helps:  12 / ratio: 2.526315789473684
------------------ Level4 ------------------
Number of other type helps:  47 / ratio: 10.398230088495575
Number of stored helps:  338 / ratio: 83.4567901234568
Number of equal helps:  59 / ratio: 14.5679012345679
Number of higher helps:  8
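
The execution counters in the cell above follow one pattern: each launched program appends a 0/1 indicator to every list, and the total, mean, and standard deviation are then recomputed over the whole list. The sketch below reproduces that pattern with the standard library only. The outcome labels and helper names are hypothetical, and `statistics.pstdev` stands in for `np.std`, which also computes the population standard deviation by default.

```python
from statistics import fmean, pstdev

# Hypothetical outcome labels; the notebook uses constants such as
# FULLY_EXECUTED_PROGRAM or SYNTACTIC_ERROR_PROGRAM instead.
OUTCOME_TYPES = ["fully_executed", "syntactic_error", "game_error"]


def update_indicator_lists(indicator_lists, observed_outcome):
    """Append 1 to the observed outcome's list and 0 to all the others."""
    for outcome_type, indicators in indicator_lists.items():
        indicators.append(1 if outcome_type == observed_outcome else 0)


def summarize(indicators):
    """Return (total, mean, population std) of a non-empty indicator list."""
    return sum(indicators), fmean(indicators), pstdev(indicators)


indicator_lists = {outcome_type: [] for outcome_type in OUTCOME_TYPES}
for outcome in ["syntactic_error", "syntactic_error", "game_error", "fully_executed"]:
    update_indicator_lists(indicator_lists, outcome)

total, mean, std = summarize(indicator_lists["syntactic_error"])
print(total, mean, std)
```

Because every list grows by one entry per execution, the mean of a list can be read as the share of executions that ended with that outcome.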

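The code-content branch of the cell above parses each fully executed program into an AST and counts the concepts found in it. The `ConceptLister` and `ControlFunctionLister` visitors are defined elsewhere in the notebook, so the following is only a minimal sketch of the general idea, using a hypothetical `NodeTypeCounter` that tracks three node types.

```python
import ast
from collections import Counter


class NodeTypeCounter(ast.NodeVisitor):
    """Hypothetical visitor that counts a few control-flow node types."""

    TRACKED = {ast.For: "for", ast.While: "while", ast.If: "if"}

    def __init__(self):
        self.counts = Counter()

    def generic_visit(self, node):
        label = self.TRACKED.get(type(node))
        if label is not None:
            self.counts[label] += 1
        super().generic_visit(node)  # keep walking into the child nodes


program = "for i in range(3):\n\tif i % 2:\n\t\tprint(i)\nwhile False:\n\tpass\n"
# Tabs are converted to four spaces before parsing, as in the notebook.
tree = ast.parse(program.replace("\t", "    "))
counter = NodeTypeCounter()
counter.visit(tree)
concept_counts = dict(counter.counts)
print(concept_counts)
```

Concepts that never appear in a program are simply absent from `concept_counts`, which is why the notebook appends 0 for any tracked concept missing from the counts.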
In [47]:
# Sanity check: summary statistics of the Level1 features
levels_processed_dataframes["Level1"].describe()

statistic,level_time_spent,CO_tot_base_disp_time,CO_tot_var_disp_time,CO_tot_condi_disp_time,CO_tot_for_disp_time,CO_tot_while_disp_time,CO_avg_base_disp_time,CO_avg_var_disp_time,CO_avg_condi_disp_time,CO_avg_for_disp_time,...,EX_tot_speed_changed,EX_rate_speed_changed,CO_tot_help_copied,CO_rate_help_copied,FE_tot_control_received,FE_tot_notion_received,FE_tot_implementation_received,FE_tot_solution_received,FE_tot_requested,help_type
count,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,...,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0
mean,767101.2,57486.522727,53287.477273,23497.079545,70819.3125,3541.125,8703.864694,4920.51628,3785.768467,7774.630339,...,1.5,2.291985e-06,0.034091,2.77462e-08,0.136364,0.215909,0.039773,0.0,1.392045,2.301136
std,302141.2,95549.125172,98828.196163,76729.446591,99215.319791,12986.301594,15154.821421,6524.377346,9661.403871,9205.178195,...,1.465801,3.139976e-06,0.25962,2.085836e-07,0.344153,0.412625,0.195982,0.0,0.641199,0.774743
min,37466.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0
25%,556179.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,7.51128e-07,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0
50%,781614.5,0.0,0.0,0.0,9868.0,0.0,0.0,0.0,0.0,4994.666667,...,1.0,1.687212e-06,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0
75%,957715.5,75549.0,58300.0,6684.5,133935.25,0.0,12551.458333,8444.395833,3521.625,13600.75,...,2.0,2.96679e-06,0.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0
max,1794794.0,385015.0,518469.0,748198.0,542296.0,81932.0,95574.25,29409.75,70987.0,37300.0,...,11.0,2.669087e-05,3.0,2.334276e-06,1.0,1.0,1.0,0.0,4.0,4.0
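
The `*_rate_*` columns in this summary are derived from their `*_tot_*` counterparts: the feature engineering loop replaces `_rate_` with `_tot_` in the feature name and divides the total by the time spent in the level, in milliseconds. The sketch below illustrates that derivation on a plain dictionary. The helper `add_rate_features` is hypothetical, and returning 0.0 for a zero elapsed time is an assumption, not the notebook's behavior.

```python
def add_rate_features(state, rate_feature_names, time_key="level_time_spent"):
    """Fill each *_rate_* entry with the matching *_tot_* entry divided by elapsed time."""
    elapsed = float(state[time_key])
    for rate_name in rate_feature_names:
        total_name = rate_name.replace("_rate_", "_tot_", 1)
        # Assumption: a zero elapsed time yields a rate of 0.0 instead of raising.
        state[rate_name] = state[total_name] / elapsed if elapsed else 0.0
    return state


# Illustrative values only; the keys follow the column names shown above.
state = {
    "level_time_spent": 500000.0,  # milliseconds
    "EX_tot_speed_changed": 2,
    "EX_rate_speed_changed": 0.0,
}
add_rate_features(state, ["EX_rate_speed_changed"])
print(state["EX_rate_speed_changed"])
```

Since time is measured in milliseconds, the rates are very small numbers, which matches the order of magnitude of the `EX_rate_speed_changed` column in the summary.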


# Pickle dumps

In [48]:
with open('pickle/FEATURES_CONSTANTS', 'wb') as f:
    pickle.dump(ALL_KEYS, f)

with open('pickle/FEATURES', 'wb') as f:
    pickle.dump(levels_processed_dataframes, f)

with open('pickle/LABELS_KEY', 'wb') as f:
    pickle.dump(LABELS_KEY, f)
    
with open('pickle/RECEIVED_HELP_KEYS', 'wb') as f:
    pickle.dump(RECEIVED_HELP_KEYS, f)
    
with open('pickle/FEATURES_SETS_KEY', 'wb') as f:
    pickle.dump(FEATURES_SETS, f)
    
with open('pickle/LEVELS_KEYS', 'wb') as f:
    pickle.dump(LEVELS_KEYS, f)   
    
with open('pickle/TIME_FEATURES_KEYS_TOTAL', 'wb') as f:
    pickle.dump(TIME_FEATURES_KEYS_TOTAL, f)      
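
A downstream notebook can reload these files with `pickle.load`. The sketch below shows a round trip in a temporary directory so that it does not touch the real `pickle/` folder. The helper names and the list contents are illustrative assumptions. Note that `pickle.load` should only be used on files from a trusted source.

```python
import pickle
import tempfile
from pathlib import Path


def dump_object(obj, path):
    """Serialize obj to path in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load a pickled object from path (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "LEVELS_KEYS"
    # Illustrative content; the real LEVELS_KEYS is defined earlier in the notebook.
    levels_keys = ["Level1", "Level2", "Level3", "Level4"]
    dump_object(levels_keys, target)
    reloaded = load_object(target)

print(reloaded)
```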