## Analysis of xAPI statements

In this notebook, we run an exploratory data analysis of the xAPI statements collected during the evaluation of the app in a school environment.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path
import json

Let's define here the file(s) we are going to use. Each trial stores its statements in a separate file.

In [2]:
input_file = Path('statements_prueba_salesianos.csv')

In [3]:
# We don't need the "_id" column, so we load only the columns listed below
df = pd.read_csv(input_file,
                 usecols=["timestamp", "actor", "verb", "object", "result", "stored"])

df['timestamp'] = pd.to_datetime(df['timestamp'])

df = df.sort_values(by=['timestamp'], ascending=True)
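
Since each trial is stored in its own file, the same loading steps can be applied to several files at once. The following is only a sketch of that idea: `load_statements` and the extra `trial` column are hypothetical names that do not appear in the notebook, and it assumes every file has the same columns as the one loaded above.

```python
from pathlib import Path

import pandas as pd

COLUMNS = ["timestamp", "actor", "verb", "object", "result", "stored"]


def load_statements(paths):
    """Read one CSV per trial and return a single frame sorted by timestamp."""
    frames = []
    for path in paths:
        frame = pd.read_csv(path, usecols=COLUMNS)
        # Hypothetical extra column: remembers which file each row came from
        frame["trial"] = Path(path).stem
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    # Parse the ISO 8601 strings as timezone-aware UTC timestamps
    combined["timestamp"] = pd.to_datetime(combined["timestamp"], utc=True)
    return combined.sort_values(by="timestamp").reset_index(drop=True)
```

Calling `load_statements([input_file])` with a single path should give the same rows as the cell above, plus the `trial` column and a fresh index.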

Next, we define some helper functions that clean the dataframe by keeping only the relevant information.

In [4]:
def clean_entry(val, column):
    """Return the human-readable label stored in a JSON-encoded xAPI cell."""
    tmp = json.loads(val)
    if column == 'actor':
        return tmp['name']
    elif column == 'verb':
        return tmp['display']['en-US']
    elif column == 'object':
        return tmp['definition']['name']['en-US']
    # Any other column has no label to extract
    return None
    
def filter_dates(df, start_date, end_date=""):
    """Keep rows with start_date <= timestamp <= end_date (no upper bound if end_date is empty)."""
    df = df[df['timestamp'] >= start_date]
    if end_date != "":
        df = df[df['timestamp'] <= end_date]
    return df
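
To make the key lookups in `clean_entry` concrete, here is a small sketch run on hand-written fragments. The three JSON strings are hypothetical examples shaped like xAPI actor, verb and object fields, not rows from the dataset, and `extract_label` with its `LABEL_PATHS` table is an illustrative table-driven variant, not the helper used in the rest of the notebook.

```python
import json

# Hypothetical fragments, shaped like the raw CSV cells (JSON strings)
raw = {
    "actor": '{"objectType": "Agent", "name": "Nuria"}',
    "verb": '{"id": "https://example.org/verbs/logged-in", '
            '"display": {"en-US": "Logged In"}}',
    "object": '{"id": "https://example.org/activities/salesianos", '
              '"definition": {"name": {"en-US": "Salesianos"}}}',
}

# Key path to the human-readable label inside each column's JSON
LABEL_PATHS = {
    "actor": ("name",),
    "verb": ("display", "en-US"),
    "object": ("definition", "name", "en-US"),
}


def extract_label(val, column):
    """Return the readable label for one cell, or None for other columns."""
    path = LABEL_PATHS.get(column)
    if path is None:
        return None
    node = json.loads(val)
    for key in path:
        node = node[key]  # walk one level deeper into the parsed JSON
    return node


labels = {column: extract_label(val, column) for column, val in raw.items()}
print(labels)
# {'actor': 'Nuria', 'verb': 'Logged In', 'object': 'Salesianos'}
```

Unlike `clean_entry`, this variant skips JSON parsing for columns it does not know, so it also tolerates non-JSON cells in those columns.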

In [5]:
df = filter_dates(df, '2023-03-10T10:00:00.000Z', '2023-03-10T18:00:00.000Z') # evaluation was on March 10
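
Both bounds of the window are inclusive and expressed in UTC. The toy example below shows which rows survive such a filter. It uses `Series.between` as an equivalent one-line formulation rather than `filter_dates` itself, and the four timestamps are made up to sit around the window edges.

```python
import pandas as pd

# Made-up timestamps around the edges of the evaluation window
frame = pd.DataFrame({
    "timestamp": pd.to_datetime(
        [
            "2023-03-10T09:59:59.000Z",  # one second too early
            "2023-03-10T10:00:00.000Z",  # exactly on the lower bound
            "2023-03-10T11:41:29.439Z",  # inside the window
            "2023-03-10T18:00:00.001Z",  # one millisecond too late
        ],
        utc=True,
    )
})

start = pd.Timestamp("2023-03-10T10:00:00.000Z")
end = pd.Timestamp("2023-03-10T18:00:00.000Z")

# between() is inclusive on both ends by default
inside = frame[frame["timestamp"].between(start, end)]
print(len(inside))  # 2
```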

In [6]:
# Save the filtered dataset as csv
df.to_csv('Salesianos_filtered.csv')

In [7]:
for nm in ['actor', 'verb', 'object']:
    df[nm] = df[nm].map(lambda x: clean_entry(x, nm))

In [8]:
# Save the cleaned dataset as csv
#df.to_csv('Salesianos_filtered_cleaned.csv')

In [9]:
df.head(10)

index,timestamp,stored,actor,verb,object,result
1399,2023-03-10 11:41:29.439000+00:00,2023-03-10T11:41:29.439Z,Nuria,Logged In,Salesianos,
1398,2023-03-10 11:41:41.906000+00:00,2023-03-10T11:41:41.906Z,Eider,Logged In,Salesianos,
1397,2023-03-10 11:41:42.372000+00:00,2023-03-10T11:41:42.372Z,Janire,Logged In,Salesianos,
1396,2023-03-10 11:42:19.063000+00:00,2023-03-10T11:42:19.063Z,Lucia,Logged In,Salesianos,
1395,2023-03-10 11:42:29.061000+00:00,2023-03-10T11:42:29.061Z,unai,Logged In,Salesianos,
1394,2023-03-10 11:45:09.638000+00:00,2023-03-10T11:45:09.638Z,Teacher,Logged In,Salesianos,
1393,2023-03-10 11:52:00.020000+00:00,2023-03-10T11:52:00.020Z,PC006,Logged In,Salesianos,
1392,2023-03-10 11:52:04.063000+00:00,2023-03-10T11:52:04.063Z,PC008,Logged In,Salesianos,
1391,2023-03-10 11:52:05.177000+00:00,2023-03-10T11:52:05.177Z,Tablet1,Logged In,Salesianos,"{""score"":{""raw"":0}}"
1390,2023-03-10 11:52:05.679000+00:00,2023-03-10T11:52:05.679Z,PC004,Logged In,Salesianos,


In [10]:
df['object'].unique()

array(['Salesianos', 'Earth', 'Left', 'Right', '5/0', 'test', '0',
       'iPad2', 'europe', 'africa', 'iPhone_1',
       '(0.37046387385144497,_0.3210237599378206, 0.09849016117744623), (39.94, 14.89)',
       'up',
       '(0.3478570580482483,_0.34704893827438354, 0.09247999638319016), (39.94, 14.89) 19',
       '{"type":"Answer","sender":"#u_s_iPhone_1","name":"Earth","screenshot":"","px":0.3478570580482483,"py":0.34704893827438354,"pz":0.09247999638319016,"sx":0.0020000000949949026,"sy":0.0020000000949949026,"sz":0.0020000000949949026,"rx":0.4296983480453491,"ry":0.339872807264328,"rz":0.8172593116760254,"time_left":19.152952194213867}',
       '7.72;iPhone_1', '5/1', '1', 'Android2',
       '(0.38506929627142844,_0.30822955487654385, -0.08195193332199552), (38.06, -12.01)',
       '(0.3590110097241027,_0.3479541735942822, 0.006261896238456117), (44.10, 1.00)',
       '(0.3337985873222351,_0.3722161054611206, 0.005822139326483011), (44.10, 1.00) 14',
       '8.15;Android2',
       

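The `object` labels are heterogeneous: plain names, coordinate tuples, and at least one full JSON payload. A possible way to separate out the JSON payloads is sketched below. `parse_payload` is a hypothetical helper, and the third sample label is an abridged copy of the JSON value shown in the output above.

```python
import json


def parse_payload(label):
    """Return the decoded dict if the label is a JSON object, else None."""
    if not isinstance(label, str) or not label.lstrip().startswith("{"):
        return None
    try:
        decoded = json.loads(label)
    except json.JSONDecodeError:
        return None
    return decoded if isinstance(decoded, dict) else None


sample_labels = [
    "Earth",
    "7.72;iPhone_1",
    # Abridged from the JSON label in the output above
    '{"type":"Answer","sender":"#u_s_iPhone_1","name":"Earth",'
    '"time_left":19.152952194213867}',
]

payloads = [parse_payload(label) for label in sample_labels]
print([p is not None for p in payloads])  # [False, False, True]
```

Applied to the real column, something like `df['object'].map(parse_payload)` would leave `None` for the plain labels and a dict for the JSON ones.
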
In [11]:
len(df)

1400
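
A natural next step for the exploratory analysis is to count statements per verb and per actor. The sketch below runs on toy rows in the cleaned layout. The actors and the `Answered` verb are placeholders, not values taken from the evaluation data.

```python
import pandas as pd

# Toy rows in the cleaned layout (actor, verb, object); not real data
toy = pd.DataFrame(
    {
        "actor": ["A", "B", "A", "A", "C"],
        "verb": ["Logged In", "Logged In", "Answered", "Answered", "Logged In"],
        "object": ["Salesianos", "Salesianos", "Earth", "Earth", "Salesianos"],
    }
)

# How often each verb occurs overall
verb_counts = toy["verb"].value_counts()

# One row per actor, one column per verb, zero where an actor never used a verb
per_actor = toy.groupby(["actor", "verb"]).size().unstack(fill_value=0)

print(verb_counts)
print(per_actor)
```

On the real dataframe the same two lines would apply unchanged to `df`, since it carries the same `actor` and `verb` columns after cleaning.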