# Read xAPI statements stored in a csv file

> The methods in this notebook implement the functionality for reading a collection of xAPI statements stored in a ```csv``` file.
The standard way to export statements from a Learning Locker instance is in the form of ```csv``` files.
Here we show how to import and parse such a file, and provide helper functions to process the information and create useful plots for exploratory data analysis.

In [None]:
#| default_exp input_csv

In [None]:
#| hide
from nbdev.showdoc import *

The libraries used to import the data:

In [None]:
import pandas as pd
import numpy as np
from typing import Set, List
from datetime import datetime
from fastcore.test import *

As an example, in this package we provide a ```csv``` file containing around 1000 xAPI statements.

In [None]:
csv_file = '../example_statements.csv'

#### Load statements from file
Let's start by reading the csv file

In [None]:
statements = pd.read_csv(csv_file, index_col=0, delimiter=',').reset_index(drop=True)
statements.head()

Unnamed: 0,timestamp,stored,actor,verb,object,result
0,2023-03-10 11:45:09.638000+00:00,2023-03-10T11:45:09.638Z,Teacher,Logged In,Salesianos,
1,2023-03-10 11:52:00.020000+00:00,2023-03-10T11:52:00.020Z,PC006,Logged In,Salesianos,
2,2023-03-10 11:52:04.063000+00:00,2023-03-10T11:52:04.063Z,PC008,Logged In,Salesianos,
3,2023-03-10 11:52:05.177000+00:00,2023-03-10T11:52:05.177Z,Tablet1,Logged In,Salesianos,"{""score"":{""raw"":0}}"
4,2023-03-10 11:52:05.679000+00:00,2023-03-10T11:52:05.679Z,PC004,Logged In,Salesianos,


The three most important columns are **actor**, **verb** and **object**, which create a sentence-like structure. We can see the actions that the app registers from the verb column.

In [None]:
#| export
def get_all_verbs(df: pd.DataFrame # The dataset containing the xAPI statements (one statement per row)
                 ) -> Set: # Set containing all the verbs occurring in the dataset
    """
    Returns a set with all verbs in the dataset
    """
    return set(df["verb"].unique())

In [None]:
test_verbs = {'Logged In', 'Placed', 'Swiped', 'Asked', 'Started', 'Logged Out',
       'Accepted', 'Set Turn', 'Suggested', 'Ran Out', 'Sent', 'Checked',
       'Assigned', 'Canceled', 'Ended'}
test_eq(get_all_verbs(statements), test_verbs)
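
Beyond the set of distinct verbs, the frequency of each verb is often a useful first look at the data. The following is a minimal sketch on a small hand-made dataframe (the rows are illustrative and not taken from the example file); `count_verbs` is a hypothetical helper, not part of this package.

```python
import pandas as pd

def count_verbs(df: pd.DataFrame) -> pd.Series:
    """Return the number of statements per verb, most frequent first."""
    return df["verb"].value_counts()

# Toy data for illustration only
toy = pd.DataFrame({
    "actor": ["pc001", "pc001", "pc002", "pc002"],
    "verb": ["Placed", "Swiped", "Placed", "Placed"],
    "object": ["Earth", "Left", "Earth", "Mars"],
})
verb_counts = count_verbs(toy)
```

The same call could be applied to the `actor` or `object` column to see which devices or targets dominate the dataset.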

We provide similar functions for **actors** and **objects**

In [None]:
#| export
def get_all_actors(df: pd.DataFrame # The dataset containing the xAPI statements (one statement per row)
                 ) -> Set: # Set containing all the actors occurring in the dataset
    """
    Returns a set with all actors in the dataset
    """
    return set(df["actor"].unique())

In [None]:
test_actors = {'Teacher', 'PC006', 'PC008', 'Tablet1', 'PC004', 'PC009', 'PC007', 'PC003', 'Iphone 1',
       'PC005', 'iPad2', 'Tablet 2', 'Android1', 'Android2', 'iPad1', 'PC002', 'Android4', 'Android3',
       'iphone 1', 'iPhone 1', 'Ipad1', 'Tablet1 ', 'Ipad2'}
test_eq(get_all_actors(statements), test_actors)

In [None]:
#| export
def get_all_objects(df: pd.DataFrame # The dataset containing the xAPI statements (one statement per row)
                 ) -> Set: # Set containing all the objects occurring in the dataset
    """
    Returns a set with all objects in the dataset
    """
    return set(df["object"].unique())

The list of unique objects is quite big, so we will not print it in this example.

As the **actor** values are usually associated with user input (for example, the username provided when starting the app), it makes sense to clean the values so that *User1*, *user1* and *user 1* are treated as the same user. The following functions do just that on the desired columns.

In [None]:
#| export
def remove_whitespaces(df: pd.DataFrame, # The dataset containing the xAPI statements (one statement per row)
                       cols: List # the columns on which whitespaces should be removed
                      ) -> pd.DataFrame: # The dataframe after applying the function
    """
    Removes whitespaces from the specified columns in the dataframe.
    """
    df[cols] = df[cols].apply(lambda s : s.str.replace(" ", ""))
    return df

In [None]:
#| export
def to_lowercase(df: pd.DataFrame, # The dataset containing the xAPI statements (one statement per row)
                       cols: List # the columns whose content should be made lowercase
                      ) -> pd.DataFrame: # The dataframe after applying the function
    """
    Converts to lowercase the elements in the specified columns.
    Only elements of type *str* are converted; other values are left unchanged
    """
    df[cols] = df[cols].applymap(lambda s: s.lower() if isinstance(s, str) else s)
    return df

In [None]:
test_actors = {'teacher', 'pc006', 'pc008', 'tablet1', 'pc004', 'pc009', 'pc007', 'pc003', 'iphone1',
               'pc005', 'ipad2', 'tablet2', 'android1', 'android2', 'ipad1', 'pc002', 'android4', 'android3'}
df = remove_whitespaces(statements, ["actor"])
df2 = to_lowercase(df, ["actor"])
test_eq(get_all_actors(df2), test_actors)
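
Note that `remove_whitespaces` and `to_lowercase` assign into the dataframe they receive, so the object passed in is modified as well. If the raw values need to be preserved, one option is to work on a copy. A minimal sketch, assuming a hypothetical helper `normalise_columns` (not part of this package) and toy data:

```python
import pandas as pd
from typing import List

def normalise_columns(df: pd.DataFrame, cols: List[str]) -> pd.DataFrame:
    """Return a copy of df with spaces removed and text lowercased in cols."""
    out = df.copy()  # leave the caller's dataframe untouched
    for col in cols:
        # The .str accessor propagates missing values instead of failing on them
        out[col] = out[col].str.replace(" ", "", regex=False).str.lower()
    return out

# Toy data for illustration only
raw = pd.DataFrame({"actor": ["iPhone 1", "iphone 1", "Tablet1 ", None]})
clean = normalise_columns(raw, ["actor"])
```

Here `raw` keeps its original spelling, while `clean` holds the normalised values.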

We may also be interested in removing specific rows from the dataset, for example the ones associated with an **actor** that opted out of the intervention, or with **verbs** we do not care about. This could be the case for verbs like *Logged In* or *Logged Out*, which provide information about when a user starts and stops the app, but may not be relevant if our analysis only concerns the interactions within the app.

In [None]:
#| export
def remove_actors(df: pd.DataFrame, # The dataset containing the xAPI statements (one statement per row)
                       cols: List # the list of actors to remove
                      ) -> pd.DataFrame: # The dataframe with the specified actors removed
    """
    Removes from the dataframe all the rows whose actor is in the specified list
    """
    return df[~df['actor'].isin(cols)]

In [None]:
statements = pd.read_csv(csv_file, index_col=0, delimiter=',').reset_index(drop=True)
test_actors = {'Teacher', 'PC006', 'PC008', 'Tablet1', 'PC004', 'PC009', 'PC007', 'PC003', 'Iphone 1',
       'PC005', 'iPad2', 'Tablet 2', 'Android1', 'Android2'}
test_df = remove_actors(statements, ['iPad1', 'PC002', 'Android4', 'Android3',
       'iphone 1', 'iPhone 1', 'Ipad1', 'Tablet1 ', 'Ipad2'])
test_eq(get_all_actors(test_df), test_actors)

In [None]:
#| export
def remove_verbs(df: pd.DataFrame, # The dataset containing the xAPI statements (one statement per row)
                       cols: List # the list of verbs to remove
                      ) -> pd.DataFrame: # The dataframe with the specified verbs removed
    """
    Removes from the dataframe all the rows whose verb is in the specified list
    """
    return df[~df['verb'].isin(cols)]

In [None]:
test_verbs = {'Placed', 'Swiped', 'Asked', 'Started', 'Accepted', 'Set Turn', 'Suggested', 'Ran Out',
              'Sent', 'Checked', 'Assigned', 'Canceled', 'Ended'}
test_df = remove_verbs(statements, ["Logged In", "Logged Out"])
test_eq(get_all_verbs(test_df), test_verbs)
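
Both `remove_actors` and `remove_verbs` rely on boolean masking with `isin`, which keeps the original row labels of the surviving rows. A minimal sketch on toy data (illustrative values, not from the example file) shows the effect, and how `reset_index(drop=True)` can be used when a contiguous index is preferred:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "actor": ["teacher", "pc001", "pc002", "pc001"],
    "verb": ["Logged In", "Placed", "Logged Out", "Swiped"],
})

# Same masking pattern as remove_verbs: keep rows whose verb is NOT in the list
filtered = toy[~toy["verb"].isin(["Logged In", "Logged Out"])]

# The surviving rows keep their original labels (1 and 3 here);
# reset_index(drop=True) renumbers them from 0
renumbered = filtered.reset_index(drop=True)
```

Whether to renumber is a matter of preference: the original labels make it easy to trace a row back to the source file.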

#### Statement analysis
Here we present some functions that are typically applied when analysing xAPI statement data. For this, we will use a clean version of the statements dataset, to which some of the functions described above have been applied.

In [None]:
statements = remove_whitespaces(statements, ["actor"])
statements = to_lowercase(statements, ["actor"])
statements = remove_verbs(statements, ["Logged In", "Logged Out"])
statements = remove_actors(statements, ["android3"])
statements.head(5)

Unnamed: 0,timestamp,stored,actor,verb,object,result
14,2023-03-10 11:52:18.277000+00:00,2023-03-10T11:52:18.277Z,iphone1,Placed,Earth,"{""score"":{""raw"":0}}"
15,2023-03-10 11:52:18.847000+00:00,2023-03-10T11:52:18.847Z,iphone1,Swiped,Left,"{""score"":{""raw"":0}}"
18,2023-03-10 11:52:29.001000+00:00,2023-03-10T11:52:29.001Z,iphone1,Placed,Earth,"{""score"":{""raw"":0}}"
19,2023-03-10 11:52:29.094000+00:00,2023-03-10T11:52:29.094Z,android2,Placed,Earth,"{""score"":{""raw"":0}}"
20,2023-03-10 11:52:29.194000+00:00,2023-03-10T11:52:29.194Z,iphone1,Swiped,Right,"{""score"":{""raw"":0}}"
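
In the preview above, `timestamp` is read as plain text and `result` holds a JSON string (or is empty). A typical next step is to convert both into types that are easier to analyse. The following is a minimal sketch on two hand-copied rows; `parse_raw_score` is a hypothetical helper, not part of this package, and it assumes that a score, when present, is stored under `score.raw`.

```python
import json
import pandas as pd

def parse_raw_score(result):
    """Extract score.raw from a JSON-encoded result, or None if unavailable."""
    if not isinstance(result, str):
        return None  # missing results are not strings (pandas reads them as NaN)
    return json.loads(result).get("score", {}).get("raw")

# Two rows copied by hand for illustration; the second has no result
toy = pd.DataFrame({
    "timestamp": ["2023-03-10 11:52:18.277000+00:00",
                  "2023-03-10 11:52:18.847000+00:00"],
    "result": ['{"score":{"raw":0}}', None],
})

# Convert the text timestamps into timezone-aware datetimes
toy["timestamp"] = pd.to_datetime(toy["timestamp"], utc=True)
# Pull the raw score out of the JSON payload, where there is one
toy["raw_score"] = toy["result"].apply(parse_raw_score)
# Time elapsed between consecutive statements
elapsed = toy["timestamp"].diff()
```

With proper datetimes in place, the gaps between consecutive statements in `elapsed` can be used, for instance, to look at how quickly actions follow one another.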


In [None]:
#| hide
import nbdev; nbdev.nbdev_export()