# Indicators for data collected on Quick-Pi

Indicators to implement.

## Strategy

All indicators are computed over the interval between two consecutive prompts, without taking events from earlier intervals into account.

| Indicator | Description | Insight |
| --- | --- | --- |
| **Exercise version trend (evol_version)** | Mean version of the exercises opened and attempted | Linear behavior or point maximization |
| **Validated exercise version trend (evol_version_val)** | Mean version of the exercises solved | Success of the strategy relative to **Exercise difficulty trend (evol_diff)** |
| **Abandonment rate (abandonment_rate)** | (exercises - validated exercises) / exercises | Ability to switch to other exercises, persistence in the face of difficulty |
| **Time-weighted abandonment rate (time_abandonment_rate)** | (time spent on exercises - time spent on validated exercises) / time spent on exercises | Ability to switch to other exercises, persistence in the face of difficulty |
| **Time to first test per exercise (first_test_time_rate)** | Mean time elapsed before the first test | Time spent reading the statement and thinking |
| **Time between re-tests per exercise (retest_time_rate)** | Mean time between consecutive tests, per exercise | Compulsive testing strategy, reflection between tests |
| **Number of tests per exercise (test_nb)** | Mean number of tests per exercise | Compulsive testing strategy, reflection between tests |
| **Number of tests per exercise and version (test_nb_version_n), with n in [2, 3, 4]** | Mean number of tests per exercise and difficulty level | Compulsive testing strategy, reflection between tests |
| **Number of navigations per module (nav_nb_module)** | Number of navigation events between Quick-Pi modules | Exploration of the environment, lost user, stable or changing behavior |
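
The ratio-style indicators in this table can be illustrated on plain tuples before touching the database. The sketch below is a minimal illustration, not the notebook's implementation: it assumes each validation is an `(id_sujet, version, score)` tuple for a single connection and that an exercise counts as validated when its summed score is positive. The function names are hypothetical.

```python
from collections import defaultdict

def abandonment_rate(validations):
    """(exercises - validated exercises) / exercises for one connection.

    validations: iterable of (id_sujet, version, score) tuples (assumed layout).
    """
    total_score = defaultdict(int)
    for id_sujet, version, score in validations:
        total_score[(id_sujet, version)] += score
    if not total_score:
        return 0.0
    validated = sum(1 for s in total_score.values() if s > 0)
    return (len(total_score) - validated) / len(total_score)

def mean_tests_per_exercise(validations):
    """Mean number of validation attempts per (id_sujet, version) pair."""
    counts = defaultdict(int)
    for id_sujet, version, _score in validations:
        counts[(id_sujet, version)] += 1
    return sum(counts.values()) / len(counts) if counts else 0.0

# Two attempts on exercise 1 (second one succeeds), one failed attempt on exercise 2.
sample = [(1, 2, 0), (1, 2, 100), (2, 3, 0)]
print(abandonment_rate(sample))         # 0.5
print(mean_tests_per_exercise(sample))  # 1.5
```

Keying on the `(id_sujet, version)` tuple rather than on a formatted string avoids having to parse the key again later.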

## Performance / Behavior

| Indicator | Description | Insight |
| --- | --- | --- |
| **Time per exercise version (time_version_n)** | Mean time the student spends on exercises of a given difficulty | Performance in the face of difficulty, ability to solve a problem more or less quickly |
| **Cumulative time per zone (cumul_time_zone)** | Total time spent in a zone | Use of the available resources (help, statement, step-by-step ...) |
| **Time per zone (zone_time)** | Mean time spent in a zone | Whether the user merely skims the zone or actually uses it |
| **Score trend (evol_score)** | Total score added at each prompt | Evolution and progress of the user |
| **Score per unit of time (time_score_n)** | Mean score over a given duration (for example: x / 1 minute) | Performance and consistency of the user |
| **Cumulative number of validated exercises per version (nb_version_val)** | Ratio of validated exercises to the total number of exercises, for a fixed version | Progress and difficulty of the exercises solved |
| **Step-by-step usage per exercise (pasapas_usage)** | Mean number of uses of the step-by-step tool per exercise | Use of the available tools |
| **Modification size per exercise (modif_size)** | Mean size of the modifications per exercise | Whether the user is close to or far from the solution |
| **Error rate (error_rate)** | Failed validations / all validations | Ability to find the solution quickly, to pause and think |
| **Error rate per version (error_rate_version)** | Failed validations / all validations, per version | Ability to find the solution quickly, to pause and think when facing difficulty |
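
The error-rate indicators are not implemented further down in this notebook. The following is a hedged sketch of one possible reading of them. It assumes that a validation with a score of 0 (or less) is a failed one, which mirrors the `score > 0` test used by the strategy indicators but is not confirmed by the source, and that each validation is a `(version, score)` tuple. Both function names are hypothetical.

```python
def error_rate(scores):
    """Failed validations / all validations (assumption: score <= 0 means failed)."""
    if not scores:
        return 0.0
    failed = sum(1 for s in scores if s <= 0)
    return failed / len(scores)

def error_rate_by_version(validations):
    """Error rate per version; validations is an iterable of (version, score)."""
    scores_by_version = {}
    for version, score in validations:
        scores_by_version.setdefault(version, []).append(score)
    return {v: error_rate(s) for v, s in scores_by_version.items()}

sample = [(2, 0), (2, 100), (3, 0), (3, 0)]
print(error_rate([score for _, score in sample]))  # 0.75
print(error_rate_by_version(sample))               # {2: 0.5, 3: 1.0}
```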

## SRL

Reuse the SRL questions as they are.

# Process

## Imports

### Data science

In [None]:
# visualization
import matplotlib.pyplot as plt
import seaborn
# data handling
import numpy as np
import pandas as pd

### Python basics

In [None]:
# Database
#!sudo apt-get install python3-dev default-libmysqlclient-dev
!pip install mysql-connector-python
from mysql.connector import connect
# error
import traceback
from collections import defaultdict
from datetime import time, timedelta, datetime



## Config

In [None]:
_db_config = {
    'user': 'cajuge',
    'password': 'Kz1773qMWIVhRZUZ',
    'host': 'franceioi.cinniket56wn.eu-central-1.rds.amazonaws.com',
    'database': 'srl',
    'port':'3306'
}
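
Keeping credentials in the notebook itself makes them easy to leak when the notebook is shared. As a sketch of an alternative, the same dictionary could be built from environment variables. The variable names below (`QUICKPI_DB_USER` and so on) are hypothetical and would have to be defined in the environment running the notebook.

```python
import os

def load_db_config(env=os.environ):
    """Build the connection settings from environment variables.

    The QUICKPI_DB_* names are assumptions made for this example.
    """
    return {
        "user": env["QUICKPI_DB_USER"],
        "password": env["QUICKPI_DB_PASSWORD"],
        "host": env["QUICKPI_DB_HOST"],
        "database": env.get("QUICKPI_DB_NAME", "srl"),
        "port": int(env.get("QUICKPI_DB_PORT", "3306")),
    }

# Demo with an explicit mapping instead of the real environment.
demo_env = {
    "QUICKPI_DB_USER": "someone",
    "QUICKPI_DB_PASSWORD": "secret",
    "QUICKPI_DB_HOST": "localhost",
}
print(load_db_config(demo_env)["port"])  # 3306
```

Passing the mapping as a parameter keeps the function testable without modifying the process environment.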

In [None]:
_tables = ["clavier", "focus", "modification",
          "navigation", "pas_a_pas",
          "souris","srl_final_prompt",
          "srl_initial_prompt","srl_prompt","validation"]

## Classes

### Data from DB

#### Participant

In [None]:
class Participant:
    def __init__(self, connection, id_participation, timestamp_server, timestamp, tables):
        self.id_participation = id_participation
        self.connections = [connection]
        self.timestamps_server = [timestamp_server]
        self.timestamps = [timestamp]
        self.tables = {i:[] for i in tables}
        self.tables_index = {}
        
    # ------------------------------------------------------------------------------------------------------------------- #
    # ------------------------------------------ AUXILIARIES FUNCTION --------------------------------------------------- #
    # ------------------------------------------------------------------------------------------------------------------- #
    def set_indexes(self, indexes):
        self.tables_index = indexes
    def find_index(self, table, column):
        for i, col in enumerate(self.tables_index[table]):
            if col == column: return i
        return -1
    
    # returns the last event regarding a connection
    def get_last_event(self, connection):
        tables = ["clavier", "focus", "modification", "navigation", "pas_a_pas", "validation"]
        last_events = []
        for table in tables:
            for i in range(self.tables[table].shape[0]-1, -1, -1):
                if self.tables[table][i, self.find_index(table, "id_connexion")] == connection:
                    for j in self.tables[table][i, :]:
                        if isinstance(j, datetime):
                            last_events.append(j)
                    break
        return(max(last_events))

    # returns the time spent on an exercise + version 
    def get_time_spent_sujet_version(self, sujet, version, connection):
        table = "navigation"
        table_content = self.tables[table]
        time = timedelta(0)
        for i in range(table_content.shape[0]):
            if table_content[i, self.find_index(table, "id_sujet")] == sujet and table_content[i, self.find_index(table, "version")] == version and table_content[i, self.find_index(table, "id_connexion")] == connection:
                temp_time = table_content[i, self.find_index(table, "timestamp")]
                for j in range(i+1, table_content.shape[0]):
                    # the interval ends at the next event of this connection on another exercise/version (row j, not row i)
                    if not(table_content[j, self.find_index(table, "id_sujet")] == sujet and table_content[j, self.find_index(table, "version")] == version) and table_content[j, self.find_index(table, "id_connexion")] == connection:
                        temp_time = table_content[j, self.find_index(table, "timestamp")] - temp_time
                        time += temp_time
                        break
                else:
                    temp_time = self.get_last_event(connection) - temp_time
                    time += temp_time
        return time

    # get the trace in chronological order (merges the rows of all tables by timestamp)
    def get_ordered_trace(self, timestamp_indexes):
        traces = []
        indices = {i:0 for i in self.tables.keys() if self.tables[i].shape[0] != 0}
        while(indices != {}):
            # get each current row of tables
            rows = []
            for table in indices.keys():
                rows.append(self.tables[table][indices[table], timestamp_indexes[table]])
            rows = np.array(rows)
            # find the oldest
            if rows.shape[0] != 0:
                min_index = np.argmin(rows)
                table_index = list(indices.keys())[min_index]
                traces.append(self.tables[table_index][indices[table_index]])
                
                # handle increment
                indices[table_index] = indices[table_index] + 1
                if indices[table_index] >= self.tables[table_index].shape[0]:
                    del indices[table_index]
            else:
                return traces
        return traces

    def __repr__(self):
        return self.__str__()
    def __str__(self):
        s = "# -------------------------------------------------- #\n"
        s += " > id : {}\n".format(self.id_participation)
        s += "    > connections : {}\n".format(self.connections)
        s += "    > timestamps : {}\n".format(self.timestamps)
        s += "    > timestamps_server : {}\n".format(self.timestamps_server)
        for table in self.tables.keys():
            s+= "    > {} : {}\n".format(table, self.tables[table].shape)
        s += "# -------------------------------------------------- #\n"
        return s
    # ------------------------------------------------------------------------------------------------------------------- #
    # ----------------------------------------------- INDICATORS -------------------------------------------------------- #
    # ------------------------------------------------------------------------------------------------------------------- #

    # -------------------------------- evol_version ----------------------------------------- # 
    def evol_version(self, date):
        table = "validation"
        table_content = self.tables[table]
        evol_version_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                rows = [i for i in range(table_content.shape[0]) if table_content[i, self.find_index(table, "id_connexion")] == connection]
                evol_version_indicators[str(connection)] = np.mean(table_content[rows, self.find_index(table, "version")]) if len(rows) > 0 else 0.
        return evol_version_indicators
    
    def evol_version_columns(self):
        return ['evol_version']
    def evol_version_columns_type(self):
        return [float]  # the np.float alias was removed in NumPy 1.24
    
    # -------------------------------- evol_version_val ----------------------------------------- #   
    def evol_version_val(self, date):
        table = "validation"
        table_content = self.tables[table]
        evol_version_val_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                rows = [i for i in range(table_content.shape[0]) if table_content[i, self.find_index(table, "id_connexion")] == connection and table_content[i, self.find_index(table, "score")] > 0]
                evol_version_val_indicators[str(connection)] = np.mean(table_content[rows, self.find_index(table, "version")]) if len(rows) > 0 else 0
        return evol_version_val_indicators
 
    def evol_version_val_columns(self):
        return ['evol_version_val']
    def evol_version_val_columns_type(self):
        return [float]  # the np.float alias was removed in NumPy 1.24

    # -------------------------------- abandonment_rate ----------------------------------------- #     
    def abandonment_rate(self, date):
        table = "validation"
        table_content = self.tables[table]
        abandonment_rate_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet = defaultdict(int)
                for j in range(table_content.shape[0]):  # j avoids shadowing the connection index i
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        sujet["{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")])] += table_content[j, self.find_index(table, "score")]
                abandonment_rate_indicators[str(connection)] = (len(sujet) - sum([1 for sujet_version in sujet.keys() if sujet[sujet_version] > 0])) / len(sujet) if len(sujet) > 0 else 0
        return abandonment_rate_indicators

    def abandonment_rate_columns(self):
        return ['abandonment_rate']
    def abandonment_rate_columns_type(self):
        return [float]  # the np.float alias was removed in NumPy 1.24

    # -------------------------------- time_abandonment_rate ----------------------------------------- #  
    def time_abandonment_rate(self, date):
        table = "validation"
        table_content = self.tables[table]
        time_abandonment_rate_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet_score = defaultdict(int)
                sujet_time = defaultdict(timedelta)
                for j in range(table_content.shape[0]):
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        key = "{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")])
                        sujet_score[key] += table_content[j, self.find_index(table, "score")]
                        if key not in sujet_time.keys():
                            sujet_time[key] = self.get_time_spent_sujet_version(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")], connection)
                # the indicator is a ratio of durations, hence a float: use 0. (not a timedelta) as the fallback
                if len(sujet_time.keys()) == 0:
                    time_abandonment_rate_indicators[str(connection)] = 0.
                else:
                    sum_time = sum([sujet_time[sujet_version] for sujet_version in sujet_time], timedelta(0))
                    sum_time_validated = sum([sujet_time[sujet_version] for sujet_version in sujet_time if sujet_score[sujet_version] > 0], timedelta(0))
                    print(sum_time, sum_time_validated)  # debug trace
                    time_abandonment_rate_indicators[str(connection)] = (sum_time - sum_time_validated) / sum_time if sum_time != timedelta(0) else 0.
                    print(type(time_abandonment_rate_indicators[str(connection)]))  # debug trace
        return time_abandonment_rate_indicators

    def time_abandonment_rate_columns(self):
        return ['time_abandonment_rate']
    def time_abandonment_rate_columns_type(self):
        return [float]

    # ------------------------------------ test_nb ----------------------------------------- # 
    def test_nb(self, date):
        table = "validation"
        table_content = self.tables[table]
        test_nb_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet_test = defaultdict(int)
                for j in range(table_content.shape[0]):
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        sujet_test["{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")])] += 1
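                # guard against connections without any validation (np.mean([]) would return nan)
                test_nb_indicators[str(connection)] = np.mean([sujet_test[sujet_version] for sujet_version in sujet_test]) if len(sujet_test) > 0 else 0.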
        return test_nb_indicators

    def test_nb_columns(self):
        return ['test_nb']
    def test_nb_columns_type(self):
        return [float]  # the np.float alias was removed in NumPy 1.24

    # ------------------------------------ test_nb_version_n ----------------------------------------- # 
    def test_nb_version_n(self, date):
        table = "validation"
        table_content = self.tables[table]
        test_nb_version_n_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet_test = defaultdict(int)
                for j in range(table_content.shape[0]):
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        sujet_test["{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")])] += 1
                version = defaultdict(float)
                counter = defaultdict(int)
                for sujet_version in sujet_test:
                    # keys are "id_sujet,version": split on the comma instead of taking the last character
                    version_key = sujet_version.split(",")[1]
                    version[version_key] += sujet_test[sujet_version]
                    counter[version_key] += 1
                for key in version.keys():
                    version[key] /= counter[key]
                test_nb_version_n_indicators[str(connection)] = [version["2"],version["3"],version["4"]]
        return test_nb_version_n_indicators

    def test_nb_version_n_columns(self):
        return ['test_nb_version_2','test_nb_version_3','test_nb_version_4']
    def test_nb_version_n_columns_type(self):
        return [float, float, float]  # the np.float alias was removed in NumPy 1.24

    # -------------------------------- retest_time_rate ----------------------------------------- #  
    def retest_time_rate(self, date):
        table = "validation"
        table_content = self.tables[table]
        retest_time_rate_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet_time = defaultdict(list)
                sujet = {}
                for j in range(table_content.shape[0]):
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        sujet_time["{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")])].append(table_content[j, self.find_index(table, "timestamp")])
                for sujet_version in sujet_time.keys():
                    values = sujet_time[sujet_version]
                    if len(values) <= 1:
                        continue
                    # durations between consecutive tests on the same exercise/version
                    gaps = [values[k+1] - values[k] for k in range(len(values)-1)]
                    sujet[sujet_version] = np.mean(gaps)
                # the fallback must be a timedelta and must sit outside np.mean()
                retest_time_rate_indicators[str(connection)] = np.mean([sujet[sujet_version] for sujet_version in sujet.keys()]) if len(sujet.keys()) > 0 else timedelta(seconds=0)
        return retest_time_rate_indicators

    def retest_time_rate_columns(self):
        return ['retest_time_rate']
    def retest_time_rate_columns_type(self):
        return [timedelta]

    # -------------------------------- first_test_time_rate ----------------------------------------- #  
    def first_test_time_rate(self, date):
        table = "validation"
        table_content = self.tables[table]
        first_test_time_rate_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet_test = {}
                for j in range(table_content.shape[0]):
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        key = "{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")])
                        if key not in sujet_test.keys():
                            sujet_test[key] = table_content[j, self.find_index(table, "timestamp")]
                # use dedicated names: reassigning table/table_content here would break the next connection's pass
                nav_table = "navigation"
                nav_content = self.tables[nav_table]
                sujet_found = {}
                if nav_content.shape[0] != 0:
                    for j in range(nav_content.shape[0]):
                        # only navigation events of the current connection are relevant
                        if nav_content[j, self.find_index(nav_table, "id_connexion")] != connection:
                            continue
                        key = "{},{}".format(nav_content[j, self.find_index(nav_table, "id_sujet")], nav_content[j, self.find_index(nav_table, "version")])
                        if key not in sujet_found.keys() and key in sujet_test.keys():
                            sujet_test[key] -= nav_content[j, self.find_index(nav_table, "timestamp")]
                            sujet_found[key] = 1
                first_test_time_rate_indicators[str(connection)] = np.mean([sujet_test[sujet_version] for sujet_version in sujet_found]) if len(sujet_found.keys()) > 0 else timedelta(seconds=0)
        return first_test_time_rate_indicators

    def first_test_time_rate_columns(self):
        return ['first_test_time_rate']
    def first_test_time_rate_columns_type(self):
        return [timedelta]

    # -------------------------------- nav_nb ----------------------------------------- #  
    def nav_nb_module(self, date):
        table = "navigation"
        table_content = self.tables[table]
        nav_nb_module_indicators = {}
        for i, connection in enumerate(self.connections):
            if self.timestamps[i] >= date and table_content.shape[0] != 0:
                sujet_nav = defaultdict(int)
                for j in range(table_content.shape[0]):
                    if table_content[j, self.find_index(table, "id_connexion")] == connection:
                        module = table_content[j, self.find_index(table, "module")]
                        module = "_" if module == "" else module
                        sujet_nav["{},{},{}".format(table_content[j, self.find_index(table, "id_sujet")], table_content[j, self.find_index(table, "version")], module)] += 1
                sujet = defaultdict(int)
                for sujet_version in sujet_nav:
                    sujet[sujet_version.split(",")[2]] += sujet_nav[sujet_version]
                nav_nb_module_indicators[str(connection)] = [sujet["Accueil"], sujet["Exercice"], sujet["Aide"]]
        return nav_nb_module_indicators

    def nav_nb_module_columns(self):
        return ['nav_nb_accueil', 'nav_nb_exercice', 'nav_nb_aide']
    def nav_nb_module_columns_type(self):
        return [int, int, int]  # the np.int alias was removed in NumPy 1.24
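
`get_ordered_trace` above repeatedly picks the oldest row among the current heads of every table, which is a k-way merge. As a sketch of the same idea with the standard library, `heapq.merge` can do the merging when each table is already sorted by timestamp (as the `ORDER BY id_connexion, timestamp` query suggests for a single connection). The helper names and the tuple layout are assumptions made for this example, and unlike the method above it returns `(table_name, row)` pairs rather than bare rows.

```python
import heapq
from datetime import datetime

def _tagged(name, rows, ts_index):
    """Yield (timestamp, table name, position, row) so rows are never compared directly."""
    for position, row in enumerate(rows):
        yield (row[ts_index], name, position, row)

def ordered_trace(tables, timestamp_indexes):
    """Merge per-table rows, each list sorted by timestamp, into one chronological list."""
    streams = [_tagged(name, rows, timestamp_indexes[name]) for name, rows in tables.items()]
    return [(name, row) for _, name, _, row in heapq.merge(*streams)]

tables = {
    "navigation": [(1, datetime(2021, 6, 2, 8, 0, 0)), (2, datetime(2021, 6, 2, 8, 5, 0))],
    "validation": [(10, datetime(2021, 6, 2, 8, 3, 0))],
}
timestamp_indexes = {"navigation": 1, "validation": 1}
trace = ordered_trace(tables, timestamp_indexes)
print([name for name, _ in trace])  # ['navigation', 'validation', 'navigation']
```

Events with identical timestamps are ordered by table name here, which is an arbitrary but deterministic tie-break.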

#### DB Iterator

In [None]:
class DataDB:
    def __init__(self):
        self.participants = []
        self.tables_index = {}
        
    def get_timestamp_indexes(self):
        indexes = {}
        for table in self.tables_index.keys():
            for i, index in enumerate(self.tables_index[table]):
                if index == "timestamp":
                    indexes[table] = i
                    break
        return indexes
    
    # get all the data from DB and rearrange them
    def get_data_from_db(self, connection, tables, min_id=0):
        with connection.cursor() as cursor:
            try:
                # ----------- Get every connection / participant ---------- #
                query_select = "SELECT id, id_participation, timestamp_server, timestamp FROM connexion WHERE id >= %s"
                cursor.execute(query_select, (min_id,))

                result = cursor.fetchall()
                ids = {}
                reversed_ids = {}
                
                for row in result:
                    reversed_ids[row[0]] = row[1]
                    if row[1] not in ids.keys():
                        ids[row[1]] = len(self.participants)
                        obj = Participant(row[0],row[1],row[2],row[3],tables)
                        self.participants.append(obj)
                    else:
                        obj = self.participants[ids[row[1]]]
                        obj.connections.append(row[0])
                        obj.timestamps_server.append(row[2])
                        obj.timestamps.append(row[3])
                
                # ----------- Get every table per participant ---------- #
                for table in tables:
                    query_select = "SELECT * FROM {} WHERE id_connexion >= %s ORDER BY id_connexion, timestamp".format(table)
                    cursor.execute(query_select, (min_id,))

                    result = cursor.fetchall()
                    id_connexion_index = -1
                    self.tables_index[table] = [cursor.description[i][0] for i in range(len(cursor.description)) 
                                              if cursor.description[i][0]]
                    for i, index in enumerate(cursor.description):
                        if index[0] == "id_connexion":
                            id_connexion_index = i
                            break
                    for row in result:
                        obj = self.participants[ids[reversed_ids[row[id_connexion_index]]]]
                        obj.tables[table].append([row[i] for i in range(len(row))])
                
                #convert to numpy array
                for participant in self.participants:
                    participant.set_indexes(self.tables_index)
                    for table in participant.tables.keys():
                        participant.tables[table] = np.array(participant.tables[table])
                
                # end transaction
                connection.commit()
            except Exception as exc:
                cursor.close()
                print("SQL error while loading data from the database")
                traceback.print_exc()
                # chain the original exception so its cause is not lost
                raise Exception("---- SQL Error ----") from exc

    # get all participants evol_version indicator
    def evol_version(self, date):
        evol_version_indicators = {}
        for participant in self.participants:
            evol_version_indicators = dict(evol_version_indicators, **participant.evol_version(date))
        evol_version_indicators = pd.DataFrame.from_dict(evol_version_indicators, orient='index', columns=participant.evol_version_columns())
        return evol_version_indicators
    
    def evol_version_val(self, date):
        evol_version_val_indicators = {}
        for participant in self.participants:
            evol_version_val_indicators = dict(evol_version_val_indicators, **participant.evol_version_val(date))
        evol_version_val_indicators = pd.DataFrame.from_dict(evol_version_val_indicators, orient='index', columns=participant.evol_version_val_columns())
        return evol_version_val_indicators

    def abandonment_rate(self, date):
        abandonment_rate_indicators = {}
        for participant in self.participants:
            abandonment_rate_indicators = dict(abandonment_rate_indicators, **participant.abandonment_rate(date))
        abandonment_rate_indicators = pd.DataFrame.from_dict(abandonment_rate_indicators, orient='index', columns=participant.abandonment_rate_columns())
        return abandonment_rate_indicators

    def time_abandonment_rate(self, date):
        time_abandonment_rate_indicators = {}
        for participant in self.participants:
            time_abandonment_rate_indicators = dict(time_abandonment_rate_indicators, **participant.time_abandonment_rate(date))
        time_abandonment_rate_indicators = pd.DataFrame.from_dict(time_abandonment_rate_indicators, orient='index', columns=participant.time_abandonment_rate_columns())
        return time_abandonment_rate_indicators

    def test_nb(self, date):
        test_nb_indicators = {}
        for participant in self.participants:
            test_nb_indicators = dict(test_nb_indicators, **participant.test_nb(date))
        test_nb_indicators = pd.DataFrame.from_dict(test_nb_indicators, orient='index', columns=participant.test_nb_columns())
        return test_nb_indicators 

    def test_nb_version_n(self, date):
        test_nb_version_n_indicators = {}
        for participant in self.participants:
            test_nb_version_n_indicators = dict(test_nb_version_n_indicators, **participant.test_nb_version_n(date))
        test_nb_version_n_indicators = pd.DataFrame.from_dict(test_nb_version_n_indicators, orient='index', columns=participant.test_nb_version_n_columns())
        return test_nb_version_n_indicators
    
    def retest_time_rate(self, date):
        retest_time_rate_indicators = {}
        for participant in self.participants:
            retest_time_rate_indicators = dict(retest_time_rate_indicators, **participant.retest_time_rate(date))
        retest_time_rate_indicators = pd.DataFrame.from_dict(retest_time_rate_indicators, orient='index', columns=participant.retest_time_rate_columns())
        return retest_time_rate_indicators

    def first_test_time_rate(self, date):
        first_test_time_rate_indicators = {}
        for participant in self.participants:
            first_test_time_rate_indicators = dict(first_test_time_rate_indicators, **participant.first_test_time_rate(date))
        first_test_time_rate_indicators = pd.DataFrame.from_dict(first_test_time_rate_indicators, orient='index', columns=participant.first_test_time_rate_columns())
        return first_test_time_rate_indicators
    
    def nav_nb_module(self, date):
        nav_nb_module_indicators = {}
        for participant in self.participants:
            nav_nb_module_indicators = dict(nav_nb_module_indicators, **participant.nav_nb_module(date))
        nav_nb_module_indicators = pd.DataFrame.from_dict(nav_nb_module_indicators, orient='index', columns=participant.nav_nb_module_columns())
        return nav_nb_module_indicators

    def get_strategy(self, date):
        indicators = [self.evol_version(date), self.evol_version_val(date), self.abandonment_rate(date), self.time_abandonment_rate(date), self.test_nb(date), 
            self.test_nb_version_n(date), self.retest_time_rate(date), self.first_test_time_rate(date), self.nav_nb_module(date)]
        indexes = set()
        for i in indicators:
            for index in list(i.index):
                indexes.add(index)
        indexes = list(indexes)
        p = self.participants[0]
        columns_name = p.evol_version_columns()+p.evol_version_val_columns()+p.abandonment_rate_columns()+p.time_abandonment_rate_columns()+p.test_nb_columns()+\
            p.test_nb_version_n_columns()+p.retest_time_rate_columns()+p.first_test_time_rate_columns()+p.nav_nb_module_columns()
        columns_type = [
            p.evol_version_columns_type,
            p.evol_version_val_columns_type,
            p.abandonment_rate_columns_type,
            p.time_abandonment_rate_columns_type,
            p.test_nb_columns_type,
            p.test_nb_version_n_columns_type,
            p.retest_time_rate_columns_type,
            p.first_test_time_rate_columns_type,
            p.nav_nb_module_columns_type
        ]

        df = []
        for i, type_f in enumerate(columns_type):
            types = type_f()
            for j in range(len(types)):
                new_col = np.zeros((len(indexes),), dtype=types[j])
                dataf = indicators[i]
                dataf_index = list(dataf.index)
                dataf_col = list(dataf.values[:,j])
                for k, index in enumerate(indexes):
                    if index not in dataf_index:
                        # built-in int/float replace the np.int/np.float aliases removed in NumPy 1.24
                        new_col[k] = timedelta(0) if types[j] == timedelta else (0 if types[j] == int else (0. if types[j] == float else ""))
                    else:
                        new_col[k] = dataf_col[dataf_index.index(index)]
                df.append(new_col)
        return pd.DataFrame(np.transpose(np.array(df)), index=indexes, columns=columns_name)
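
`get_strategy` essentially performs an outer join of the per-indicator results on the connection id and fills the gaps with a typed default. The sketch below shows that step with plain dictionaries, as an illustration only. `combine_indicators` and its argument layout are hypothetical and are not part of the notebook.

```python
def combine_indicators(indicators, defaults):
    """Outer-join several indicators on the connection id.

    indicators: {column name: {connection id: value}}
    defaults:   {column name: value used when a connection has no entry}
    """
    connections = sorted({c for values in indicators.values() for c in values})
    return {
        c: {col: values.get(c, defaults[col]) for col, values in indicators.items()}
        for c in connections
    }

indicators = {
    "test_nb": {"138": 1.5, "153": 1.5},
    "nav_nb_aide": {"152": 0, "153": 3},
}
defaults = {"test_nb": 0.0, "nav_nb_aide": 0}
combined = combine_indicators(indicators, defaults)
print(combined["152"])  # {'test_nb': 0.0, 'nav_nb_aide': 0}
```

With the DataFrames produced by the methods above, a comparable result could probably be obtained with `pd.concat(indicators, axis=1)` followed by a per-column `fillna`, since concatenating along columns aligns on the union of the indexes by default.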

#### Functions

In [None]:
def get_connection(config):
    return connect(**config)
def close_connection(connection):
    connection.close()
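
Because `get_connection` and `close_connection` are called in separate cells, a failing cell can leave the connection open. One possible way to pair them is a context manager, sketched below. `open_connection` is a hypothetical helper, and `FakeConnection` is a stand-in used only so the example runs without a database; with the real driver, `connect_fn` would be the `connect` function imported earlier.

```python
from contextlib import contextmanager

@contextmanager
def open_connection(config, connect_fn):
    """Open a connection with connect_fn(**config) and always close it on exit."""
    connection = connect_fn(**config)
    try:
        yield connection
    finally:
        connection.close()

class FakeConnection:
    """Hypothetical stand-in for a database connection (demo only)."""
    def __init__(self, **config):
        self.config = config
        self.closed = False

    def close(self):
        self.closed = True

with open_connection({"database": "srl"}, FakeConnection) as conn:
    print(conn.closed)  # False while the block is running
print(conn.closed)      # True once the block has exited
```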

## Process

### Handle DB

In [None]:
mysql_connection = get_connection(_db_config)

In [None]:
data = DataDB()

In [None]:
min_date = datetime(2021, 6, 2)
min_id = 138

In [None]:
data.get_data_from_db(mysql_connection, _tables, min_id=min_id)

In [None]:
last_connect = data.participants[0]
print("Participants : {}".format(len(data.participants)))
last_connect

Participants : 16


# -------------------------------------------------- # > id : 92
    > connections : [138]
    > timestamps : [datetime.datetime(2021, 6, 2, 8, 3, 55)]
    > timestamps_server : [datetime.datetime(2021, 6, 2, 10, 4, 21, 881000)]
    > clavier : (3, 6)
    > focus : (0,)
    > modification : (54, 7)
    > navigation : (23, 6)
    > pas_a_pas : (14, 7)
    > souris : (361, 17)
    > srl_final_prompt : (0,)
    > srl_initial_prompt : (1, 9)
    > srl_prompt : (0,)
    > validation : (3, 7)
# -------------------------------------------------- #

### Strategy indicators

In [None]:
data.evol_version(min_date)

Unnamed: 0,evol_version
138,0.0
140,0.0
153,2.666667


In [None]:
data.evol_version_val(min_date)

Unnamed: 0,evol_version_val
138,0.0
140,0.0
153,2.5


In [None]:
data.abandonment_rate(min_date)

Unnamed: 0,abandonment_rate
138,0.5
140,0.0
153,0.0


In [None]:
data.time_abandonment_rate(min_date)

0:40:08 0:24:10
<class 'float'>
0:00:53 0:00:53
<class 'float'>
0:00:00 0:00:00
<class 'datetime.timedelta'>


Unnamed: 0,time_abandonment_rate
138,0.397841
140,0
153,0:00:00


In [None]:
data.test_nb(min_date)

Unnamed: 0,test_nb
138,1.5
140,1.0
153,1.5


In [None]:
data.test_nb_version_n(min_date)

Unnamed: 0,test_nb_version_2,test_nb_version_3,test_nb_version_4
138,0.0,0.0,0
140,0.0,0.0,0
153,1.0,2.0,0


In [None]:
data.retest_time_rate(min_date)

Unnamed: 0,retest_time_rate
138,0:00:40
140,0
153,0:00:25


In [None]:
data.first_test_time_rate(min_date)

Unnamed: 0,first_test_time_rate
138,0 days 00:00:05.500000
140,0 days 00:00:04
153,0 days 00:00:00


In [None]:
data.nav_nb_module(min_date)

Unnamed: 0,nav_nb_accueil,nav_nb_exercice,nav_nb_aide
138,0,0,0
140,0,0,0
146,0,0,0
151,0,0,0
152,2,3,0
153,2,21,3


In [None]:
data.get_strategy(min_date)

0:40:08 0:24:10
<class 'float'>
0:00:53 0:00:53
<class 'float'>
0:00:00 0:00:00
<class 'datetime.timedelta'>


Unnamed: 0,evol_version,evol_version_val,abandonment_rate,time_abandonment_rate,test_nb,test_nb_version_2,test_nb_version_3,test_nb_version_4,retest_time_rate,first_test_time_rate,nav_nb_accueil,nav_nb_exercice,nav_nb_aide
138,0.0,0.0,0.5,0.397841,1.5,0,0,0,0:00:40,0 days 00:00:05.500000,0,0,0
140,0.0,0.0,0.0,0,1.0,0,0,0,0,0 days 00:00:04,0,0,0
152,0.0,0.0,0.0,0:00:00,0.0,0,0,0,0:00:00,0 days 00:00:00,2,3,0
151,0.0,0.0,0.0,0:00:00,0.0,0,0,0,0:00:00,0 days 00:00:00,0,0,0
146,0.0,0.0,0.0,0:00:00,0.0,0,0,0,0:00:00,0 days 00:00:00,0,0,0
153,2.66667,2.5,0.0,0:00:00,1.5,1,2,0,0:00:25,0 days 00:00:00,2,21,3


### Behavior indicators