This notebook reproduces the evaluation and comparison of various learning approaches across 10 experiments.

**Approaches**

The approaches considered are listed below.

Activity discovery
- baseline activities
- relational activities, based on ground-truth instances (to assess potential)
- relational activities, based on instance predictions from the supervised baseline
- iterative relational approach based on frozen instance predictions from the supervised baseline

Instance discovery
- baseline instances (supervised)
- baseline instances (unsupervised)
- relational instances, based on ground-truth activities (to assess potential)
- relational instances, based on activity baseline predictions
- iterative collaborative approach

**Process**

For each experiment, the process:

1.   generates dedicated train and test datasets
2.   builds the corresponding classifiers
3.   tests and evaluates them using ground truth labels

Results are then averaged across all experiments. Because training takes a long time on Google Colab®, the seed is incremented and set explicitly for each experiment, and results are saved after each one. If the notebook disconnects, the process can therefore be resumed from the last completed experiment with the same seed sequence.
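
As a minimal sketch of how the resume parameters relate to each other (the helper name `resume_parameters` is hypothetical and not part of this notebook): the experiment loop adds 1 to the seed before each run, so after a given number of completed experiments the loop can presumably be restarted with the initial seed plus that number, and with only the remaining experiments to run.

```python
def resume_parameters(initial_seed, nb_xp_total, nb_completed):
    """Return (seed, nb_xp) to use when restarting after a disconnection.

    Assumption: the loop increments the seed by 1 *before* each experiment,
    as the `experiences` function of this notebook does.
    """
    if not 0 <= nb_completed <= nb_xp_total:
        raise ValueError("nb_completed must be between 0 and nb_xp_total")
    seed = initial_seed + nb_completed   # last seed value already consumed
    nb_xp = nb_xp_total - nb_completed   # experiments still to run
    return seed, nb_xp


# Example: 10 experiments planned, 3 finished before the disconnection
seed, nb_xp = resume_parameters(initial_seed=42, nb_xp_total=10, nb_completed=3)
print(seed, nb_xp)  # 45 7
```

With these values the next experiment would run with seed 46, which is the seed it would have received in an uninterrupted run.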

**Requirements**

The original emails must first be cleaned and formatted into a dataframe using the notebook 0_Camel_Email_Dataset_Extraction.ipynb. This dataframe must then be pre-processed using 1_Dataset_preprocessing.ipynb.

This notebook has been designed to run on Google Colab®, with input and output file locations on Google Drive®. The input CSV of preprocessed emails must therefore first be uploaded to Google Drive® so that this notebook can access it.
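
Since a missing input file only surfaces once the data loading cell runs, a quick check before starting can save time. This is a small sketch using the standard library only; the helper name `input_csv_is_ready` is hypothetical, and the path is the placeholder from the parameters cell.

```python
import os

def input_csv_is_ready(path):
    """Return True if `path` is an existing, non-empty file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

# Placeholder path taken from the parameters cell; replace it with your own
email_embedded_path_csv = '/content/drive/MyDrive/input/path/to/preprocessed/emails.csv'
if not input_csv_is_ready(email_embedded_path_csv):
    print("Input CSV not found: upload it to Google Drive and mount the drive first.")
```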

# Parameters

In [20]:
# Experience parameters
nb_xp = 10
nb_collaborative_iter = 5
act_threshold = 0.95

# Parameters if resuming the experimentation process
resume = False # Set to True when resuming after a notebook disconnection between experiments
nb_xp_total = 10 # Used to average the aggregated results after resuming
seed = 42 # When resuming, set this to the incremented seed value reached at the resuming iteration

# Input files paths (Google Drive)
email_embedded_path_csv = '/content/drive/MyDrive/input/path/to/preprocessed/emails.csv'

# Output files paths (Google Drive)
results_sup_save_path = '/content/drive/MyDrive/output/path/for/results_supervized_approaches.pickle'
results_unsup_save_path = '/content/drive/MyDrive/output/path/for/results_unsupervized_approaches.pickle'
user_metrics_save_path = '/content/drive/MyDrive/output/path/for/results_user_queries.pickle'
best_preds_path = '/content/drive/MyDrive/output/path/for/emails/predictions.csv'


# Imports

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!pip install spacy --upgrade
!python -m spacy download en_core_web_lg


In [4]:
import numpy as np
import pickle
import pandas as pd
import pprint
import math
import spacy
import random
import csv
import copy
from sklearn.metrics import f1_score, precision_recall_fscore_support

In [5]:
nlp = spacy.load("en_core_web_lg")

In [105]:
!cp /content/drive/MyDrive/RECHERCHE/Computing_Email_mining/notebooks/notebooks\ finaux/rendu_ICPM/functions.py .

In [7]:
from functions import *

In [8]:
random.seed(seed)
np.random.seed(seed) 

# Data loading

In [11]:
emails_relational = pd.read_csv(email_embedded_path_csv, quoting=csv.QUOTE_ALL).sort_values('Email_ID')
# Remove single-step instances and single-instance activities
emails_relational = filter_emails_for_split(emails_relational, [1, 5, 6, 7, 9, 12, 14, 16, 17, 19, 20, 24, 26, 27, 28, 29, 32, 33, 34, 38, 39, 43, 44, 46, 47, 51, 54, 55, 56, 58, 59, 60, 64, 65])
emails_relational, label_encoder, onehot_encoder, ohe_domain = preprocess(emails_relational, nlp)

Number of remaining emails:  157
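
The comment in the cell above mentions removing single-step instances and single-instance activities. `filter_emails_for_split` is defined in functions.py and is not shown here, so the following is only a sketch of how such cases could be identified with pandas. It assumes one row per email with the `Email_ID`, `Trace_ID` and `Action` columns used later in this notebook, and it interprets a single-step instance as a trace containing one email and a single-instance activity as an action occurring in only one trace. The function name `find_rare_cases` and the toy data are hypothetical.

```python
import pandas as pd

def find_rare_cases(emails):
    """Return (single_step_traces, single_instance_actions)."""
    # Instances (traces) that contain a single email
    trace_sizes = emails.groupby("Trace_ID")["Email_ID"].count()
    single_step_traces = sorted(trace_sizes[trace_sizes == 1].index.tolist())
    # Activities that occur in one instance only
    traces_per_action = emails.groupby("Action")["Trace_ID"].nunique()
    single_instance_actions = sorted(traces_per_action[traces_per_action == 1].index.tolist())
    return single_step_traces, single_instance_actions

toy = pd.DataFrame({
    "Email_ID": [1, 2, 3, 4, 5],
    "Trace_ID": [10, 10, 11, 12, 12],
    "Action":   ["ask", "answer", "ask", "ask", "close"],
})
traces, actions = find_rare_cases(toy)
print(traces)   # [11]
print(actions)  # ['answer', 'close']
```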


# Experiments

## Functions

In [12]:
def test_approaches(
    threshold_act, rel_acts_output_classes,
    train, test, n_iter,
    baseline_inst_sup, baseline_inst_unsup, 
    baseline_act_sup, 
    clf_rel_inst, 
    clf_rel_act, 
    onehot_encoder, label_encoder, ohe_domain, nlp,
    approaches=[
        'rel_with_gt_inst', 'rel_with_gt_act', 
        'rel_with_baseline_inst', 'rel_with_baseline_act',
        'collab_inst_and_act',
        'collab_threshold_inst_and_act',
        'iterative_with_frozen_baseline_inst_act',
        'iterative_threshold_with_frozen_baseline_inst_act'
    ],
    baselines=True
):
    
    
    X = test.copy(deep=True)  
    generations = nb_generations(train) + 1
    
    user_metrics = {
        'ground_truth_inst_metrics': {
            'avg_length_inst': get_avg_max_length_inst(X, 'Trace_ID')[0],
            'max_length_inst': get_avg_max_length_inst(X, 'Trace_ID')[1],
            'avg_steps_inst': get_avg_max_steps_inst(X, 'Trace_ID')[0],
            'max_steps_inst': get_avg_max_steps_inst(X, 'Trace_ID')[1],
            'avg_nb_users_inst': get_avg_max_nb_users_inst(X, 'Trace_ID')[0],
            'max_nb_users_inst': get_avg_max_nb_users_inst(X, 'Trace_ID')[1]
        },
        'ground_truth_acts_metrics': {
            'avg_length_act': get_avg_max_length_act(X, 'Action', label_encoder)[0][('act_length', 'mean')],
            'max_length_act': get_avg_max_length_act(X, 'Action', label_encoder)[1][('act_length', 'max')],
            'avg_nb_users_act': get_avg_max_nb_users_act(X, 'Action', label_encoder)[0][('nb_users', 'mean')],
            'max_nb_users_act': get_avg_max_nb_users_act(X, 'Action', label_encoder)[1][('nb_users', 'max')]
        },
        'relational_inst_metrics' : [],
        'relational_acts_metrics_from_sup_inst' : [],
        'relational_acts_metrics_from_unsup_inst' : [],
        'relational_inst_metrics_collab': [],
        'relational_inst_metrics_collab_threshold': [],
        'relational_acts_metrics_from_sup_inst_collab' : [],
        'relational_acts_metrics_from_sup_inst_frozen' : [],
        'relational_acts_metrics_from_sup_inst_collab_threshold': [],
        'relational_acts_metrics_from_sup_inst_frozen_threshold': []
    }
    
    results_sup = []
    results_unsup = []
    
    ###
    ### Apply and test baselines
    ###
    
    if baselines:
    
        ## Instances 

        # Supervized
        X_pairs, X_insts_sup, y_instances = get_X_y_instances(X, nlp)
        pred_instances_sup = baseline_inst_sup.predict(X_insts_sup)
        for pair, pred in zip(X_pairs, pred_instances_sup):
            label_pairs_of_rows_from_ground_truth_instances(pair)
            pair['same_instance_pred_sup'] = pred
        precision, recall, f_score = evaluate_instances_discovery(X_pairs, 'same_instance', 'same_instance_pred_sup')
        results_baseline_inst_sup = {
            'baseline_instances_sup':{
                'precision':precision,
                'recall':recall,
                'f1':f_score
            }
        }
        results_sup.append(results_baseline_inst_sup)
        X_baseline = get_trace_from_insts_preds(X, X_pairs, 'same_instance_pred_sup', 'Trace_ID_pred_sup')
        X_pairs_sup = get_pairs_of_rows(X_baseline)
        X = X.merge(X_baseline[['Email_ID', 'Trace_ID_pred_sup']], on='Email_ID', how='left')

        # Unsupervized
        pred_instances_unsup = baseline_inst_unsup(X)
        X['Trace_ID_pred_unsup'] = pred_instances_unsup
        X_pairs_unsup = get_pairs_of_rows(X)
        for pair_unsup in X_pairs_unsup:
            label_pairs_of_rows_from_ground_truth_instances(pair_unsup)
            pair_unsup_traces = pair_unsup['Trace_ID_pred_unsup'].drop_duplicates()
            if pair_unsup_traces.shape[0] == 1 :
                pair_unsup['same_instance_pred_unsup'] = 1 
            else:
                pair_unsup['same_instance_pred_unsup'] = 0
        precision, recall, f_score = evaluate_instances_discovery(X_pairs_unsup, 'same_instance', 'same_instance_pred_unsup')
        results_baseline_inst_unsup = {
            'baseline_instances_unsup':{
                'precision':precision,
                'recall':recall,
                'f1':f_score
            }
        }
        results_unsup.append(results_baseline_inst_unsup)
        
        user_metrics['baseline_inst_sup_metrics'] = {
            'avg_length_inst': get_avg_max_length_inst(X_baseline, 'Trace_ID_pred_sup')[0],
            'max_length_inst': get_avg_max_length_inst(X_baseline, 'Trace_ID_pred_sup')[1],
            'avg_steps_inst': get_avg_max_steps_inst(X_baseline, 'Trace_ID_pred_sup')[0],
            'max_steps_inst': get_avg_max_steps_inst(X_baseline, 'Trace_ID_pred_sup')[1],
            'avg_nb_users_inst': get_avg_max_nb_users_inst(X_baseline, 'Trace_ID_pred_sup')[0],
            'max_nb_users_inst': get_avg_max_nb_users_inst(X_baseline, 'Trace_ID_pred_sup')[1]
        }
        user_metrics['baseline_inst_unsup_metrics'] = {
            'avg_length_inst': get_avg_max_length_inst(X, 'Trace_ID_pred_unsup')[0],
            'max_length_inst': get_avg_max_length_inst(X, 'Trace_ID_pred_unsup')[1],
            'avg_steps_inst': get_avg_max_steps_inst(X, 'Trace_ID_pred_unsup')[0],
            'max_steps_inst': get_avg_max_steps_inst(X, 'Trace_ID_pred_unsup')[1],
            'avg_nb_users_inst': get_avg_max_nb_users_inst(X, 'Trace_ID_pred_unsup')[0],
            'max_nb_users_inst': get_avg_max_nb_users_inst(X, 'Trace_ID_pred_unsup')[1]
        }


        ## Activities

        X_acts_sup, y_acts = get_X_y_activities(X, label_encoder, ohe_domain)
        pred_activities_sup = baseline_act_sup.predict(X_acts_sup)
        score_baseline_act_sup = precision_recall_fscore_support(y_acts, pred_activities_sup, average='micro')
        X['act_pred_sup'] = label_encoder.inverse_transform(pred_activities_sup)

        results_baseline_act_sup = {
            'baseline_activities_sup':{
                'unpredicted_labels': set(y_acts) - set(pred_activities_sup),
                'precision':score_baseline_act_sup[0],
                'recall':score_baseline_act_sup[1],
                # 'f1':score_baseline_act_sup[2]
                'f1':f1_score(y_acts, pred_activities_sup, average=None)

            }
        }
        results_sup.append(results_baseline_act_sup)

        user_metrics['baseline_act_sup_metrics'] = {
            'avg_length_act': get_avg_max_length_act(X, 'act_pred_sup', label_encoder)[0][('act_length', 'mean')],
            'max_length_act': get_avg_max_length_act(X, 'act_pred_sup', label_encoder)[1][('act_length', 'max')],
            'avg_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_sup', label_encoder)[0][('nb_users', 'mean')],
            'max_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_sup', label_encoder)[1][('nb_users', 'max')]
        }
    
    
    ###
    ### Relational with ground truth
    ###
    
    
    if 'rel_with_gt_inst' in approaches:
    
        ## Instances

        X_pairs, X_insts_rel_sup, y_instances = get_X_y_instances_rel(X, 'Action', nlp, onehot_encoder)
        pred_instances_rel_gt_sup = clf_rel_inst.predict(X_insts_rel_sup)
        pred_instances_rel_gt_sup_list = []
        for pair, pred in zip(X_pairs, pred_instances_rel_gt_sup):
            label_pairs_of_rows_from_ground_truth_instances(pair)
            pair['same_instance_pred_rel_gt_sup'] = pred
        precision, recall, f_score = evaluate_instances_discovery(X_pairs, 'same_instance', 'same_instance_pred_rel_gt_sup')
        X_rel_gt = get_trace_from_insts_preds(X, X_pairs, 'same_instance_pred_rel_gt_sup', 'Trace_ID_pred_rel_gt_sup')
        X_pairs_rel_gt = get_pairs_of_rows(X_rel_gt)

        results_rel_gt_inst_sup = {
            'rel_gt_instances_sup':{
                'precision':precision,
                'recall':recall,
                'f1':f_score
            }
        }
        results_sup.append(results_rel_gt_inst_sup)
        
        user_metrics['rel_gt_inst_sup_metrics'] = {
            'avg_length_inst': get_avg_max_length_inst(X_rel_gt, 'Trace_ID_pred_rel_gt_sup')[0],
            'max_length_inst': get_avg_max_length_inst(X_rel_gt, 'Trace_ID_pred_rel_gt_sup')[1],
            'avg_steps_inst': get_avg_max_steps_inst(X_rel_gt, 'Trace_ID_pred_rel_gt_sup')[0],
            'max_steps_inst': get_avg_max_steps_inst(X_rel_gt, 'Trace_ID_pred_rel_gt_sup')[1],
            'avg_nb_users_inst': get_avg_max_nb_users_inst(X_rel_gt, 'Trace_ID_pred_rel_gt_sup')[0],
            'max_nb_users_inst': get_avg_max_nb_users_inst(X_rel_gt, 'Trace_ID_pred_rel_gt_sup')[1]
        }
    
    if 'rel_with_gt_act' in approaches:
    
        ## Activities

        X_acts_rel_sup, y_acts = get_X_y_activities_rel(X, 'Trace_ID', 'Action', label_encoder, onehot_encoder, ohe_domain, generations)
        pred_activities_rel_sup = clf_rel_act.predict(X_acts_rel_sup)
        score_rel_act_sup = precision_recall_fscore_support(y_acts, pred_activities_rel_sup, average='micro')
        X['act_pred_rel_gt_sup'] = label_encoder.inverse_transform(pred_activities_rel_sup)

        results_rel_gt_act_sup = {
            'rel_gt_activities_sup':{
                'unpredicted_labels': set(y_acts) - set(pred_activities_rel_sup),
                'precision':score_rel_act_sup[0],
                'recall':score_rel_act_sup[1],
                'f1':f1_score(y_acts, pred_activities_rel_sup, average=None)

            }
        }
        results_sup.append(results_rel_gt_act_sup)
        
        user_metrics['rel_gt_act_sup_metrics'] = {
            'avg_length_act': get_avg_max_length_act(X, 'act_pred_rel_gt_sup', label_encoder)[0][('act_length', 'mean')],
            'max_length_act': get_avg_max_length_act(X, 'act_pred_rel_gt_sup', label_encoder)[1][('act_length', 'max')],
            'avg_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_rel_gt_sup', label_encoder)[0][('nb_users', 'mean')],
            'max_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_rel_gt_sup', label_encoder)[1][('nb_users', 'max')]
        }
    
    
    ###
    ### Relational with baselines predictions
    ###
    
    
    rel_baseline_results = {}

    if 'rel_with_baseline_inst' in approaches:

        ## Instances with activity baseline preds

        X_pairs, X_insts_rel_sup, y_instances = get_X_y_instances_rel(X, 'act_pred_sup', nlp, onehot_encoder)
        pred_instances_rel_sup = clf_rel_inst.predict(X_insts_rel_sup)
        pred_instances_rel_sup_list = []
        for pair, pred in zip(X_pairs, pred_instances_rel_sup):
            label_pairs_of_rows_from_ground_truth_instances(pair)
            pair['same_instance_pred_rel_sup'] = pred
        precision, recall, f_score = evaluate_instances_discovery(X_pairs, 'same_instance', 'same_instance_pred_rel_sup')
        X_rel_sup = get_trace_from_insts_preds(X, X_pairs, 'same_instance_pred_rel_sup', 'Trace_ID_pred_rel_sup')
        X_pairs_rel_sup = get_pairs_of_rows(X_rel_sup)
        
        rel_baseline_results['rel_instances'] = {
            'precision': precision,
            'recall': recall,
            'f1': f_score
        }
        user_metrics['relational_inst_metrics'] ={
            'avg_length_inst': get_avg_max_length_inst(X_rel_sup, 'Trace_ID_pred_rel_sup')[0],
            'max_length_inst': get_avg_max_length_inst(X_rel_sup, 'Trace_ID_pred_rel_sup')[1],
            'avg_steps_inst': get_avg_max_steps_inst(X_rel_sup, 'Trace_ID_pred_rel_sup')[0],
            'max_steps_inst': get_avg_max_steps_inst(X_rel_sup, 'Trace_ID_pred_rel_sup')[1],
            'avg_nb_users_inst': get_avg_max_nb_users_inst(X_rel_sup, 'Trace_ID_pred_rel_sup')[0],
            'max_nb_users_inst': get_avg_max_nb_users_inst(X_rel_sup, 'Trace_ID_pred_rel_sup')[1]
        }
    
    if 'rel_with_baseline_act' in approaches:

        ## Activity with instances baseline preds

        # With instances supervized baseline preds
        X_acts_rel_sup, y_acts = get_X_y_activities_rel(X, 'Trace_ID_pred_sup', 'act_pred_sup', label_encoder, onehot_encoder, ohe_domain, generations)
        pred_activities_rel_sup = clf_rel_act.predict(X_acts_rel_sup)
        score_rel_act_sup = precision_recall_fscore_support(y_acts, pred_activities_rel_sup, average='micro')
        X['act_pred_sup_from_sup_inst'] = label_encoder.inverse_transform(pred_activities_rel_sup)

        rel_baseline_results_sup = copy.deepcopy(rel_baseline_results)
        rel_baseline_results_sup['rel_activities_from_sup_inst'] = {
            'unpredicted_labels': set(y_acts) - set(pred_activities_rel_sup),
            'precision':score_rel_act_sup[0],
            'recall':score_rel_act_sup[1],
            'f1':f1_score(y_acts, pred_activities_rel_sup, average=None)

        }
        user_metrics['relational_acts_metrics_from_sup_inst'] = {
            'avg_length_act': get_avg_max_length_act(X, 'act_pred_sup_from_sup_inst', label_encoder)[0][('act_length', 'mean')],
            'max_length_act': get_avg_max_length_act(X, 'act_pred_sup_from_sup_inst', label_encoder)[1][('act_length', 'max')],
            'avg_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_sup_from_sup_inst', label_encoder)[0][('nb_users', 'mean')],
            'max_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_sup_from_sup_inst', label_encoder)[1][('nb_users', 'max')]
        }

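    # Guard against a NameError when 'rel_with_baseline_act' is not selected
    if 'rel_with_baseline_act' not in approaches:
        rel_baseline_results_sup = rel_baseline_results
    results_sup.append(rel_baseline_results_sup)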
        
    ###    
    ### Relational collaborative
    ###
    
    
    X_frozen = copy.deepcopy(X)
    X_frozen_threshold = copy.deepcopy(X)
    X_collab_threshold = copy.deepcopy(X)
    
    for i in range(n_iter):
        
        rel_collab_results = {}
        rel_collab_threshold_results = {}
        
        if 'iterative_threshold_with_frozen_baseline_inst_act' in approaches:
        
            ## Activities with frozen instances from supervized baseline

            X_frozen_acts_rel_sup, y_frozen_acts = get_X_y_activities_rel(X_frozen_threshold, 'Trace_ID_pred_sup', 'act_pred_sup', label_encoder, onehot_encoder, ohe_domain, generations)
            pred_frozen_activities_rel_sup = clf_rel_act.predict_proba(X_frozen_acts_rel_sup)
            X_frozen_threshold['act_pred_amax'] = np.argmax(pred_frozen_activities_rel_sup, 1)
            X_frozen_threshold['act_pred_max'] = [np.max(x) for x in pred_frozen_activities_rel_sup]
            X_frozen_threshold['act_pred_sup'] = np.where(X_frozen_threshold['act_pred_max'] > threshold_act, label_encoder.inverse_transform(rel_acts_output_classes[X_frozen_threshold['act_pred_amax']]), X_frozen_threshold['act_pred_sup'])
            pred_frozen_activities_rel_sup_list = label_encoder.transform(X_frozen_threshold['act_pred_sup'])
            score_frozen_rel_act_sup = precision_recall_fscore_support(y_frozen_acts, pred_frozen_activities_rel_sup_list, average='micro')

            rel_frozen_threshold_results = {
                'rel_activities_from_sup_inst_frozen_threshold': {
                    'unpredicted_labels': set(),
                    'precision':score_frozen_rel_act_sup[0],
                    'recall':score_frozen_rel_act_sup[1],
                    'f1':f1_score(y_frozen_acts, pred_frozen_activities_rel_sup_list, average=None)

                }
            }
            results_sup.append(rel_frozen_threshold_results)
            
            user_metrics['relational_acts_metrics_from_sup_inst_frozen_threshold'].append(
                {
                    # Metrics are computed on the thresholded frame updated above
                    'avg_length_act': get_avg_max_length_act(X_frozen_threshold, 'act_pred_sup', label_encoder)[0][('act_length', 'mean')],
                    'max_length_act': get_avg_max_length_act(X_frozen_threshold, 'act_pred_sup', label_encoder)[1][('act_length', 'max')],
                    'avg_nb_users_act': get_avg_max_nb_users_act(X_frozen_threshold, 'act_pred_sup', label_encoder)[0][('nb_users', 'mean')],
                    'max_nb_users_act': get_avg_max_nb_users_act(X_frozen_threshold, 'act_pred_sup', label_encoder)[1][('nb_users', 'max')]
                }
            )

            # For best predictions CSV output
            if i == (n_iter-1):
                print("Writing best preds to CSV.")
                df = X_frozen_threshold[['Email_ID', 'Trace_ID', 'Action', 'Trace_ID_pred_sup', 'act_pred_sup']]
                df.to_csv(best_preds_path, index=False)

        if 'collab_inst_and_act' in approaches:
            
            ## Instances with activity rel preds

            X_pairs, X_insts_collab, y_instances = get_X_y_instances_rel(X, 'act_pred_sup', nlp, onehot_encoder)
            pred_instances_collab = clf_rel_inst.predict(X_insts_collab)
            pred_instances_collab_list = []
            for pair, pred in zip(X_pairs, pred_instances_collab):
                label_pairs_of_rows_from_ground_truth_instances(pair)
                pair['same_instance_pred_rel_sup'] = pred
            X_collab = get_trace_from_insts_preds(X, X_pairs, 'same_instance_pred_rel_sup', 'Trace_ID_pred_sup')
            X_pairs_collab = get_pairs_of_rows(X_collab)
            precision, recall, f_score = evaluate_instances_discovery(X_pairs, 'same_instance', 'same_instance_pred_rel_sup')
            rel_collab_results['rel_instances_collab'] = {
                'precision': precision,
                'recall': recall,
                'f1': f_score
            }
            user_metrics['relational_inst_metrics_collab'].append(
                {
                    'avg_length_inst': get_avg_max_length_inst(X_collab, 'Trace_ID_pred_sup')[0],
                    'max_length_inst': get_avg_max_length_inst(X_collab, 'Trace_ID_pred_sup')[1],
                    'avg_steps_inst': get_avg_max_steps_inst(X_collab, 'Trace_ID_pred_sup')[0],
                    'max_steps_inst': get_avg_max_steps_inst(X_collab, 'Trace_ID_pred_sup')[1],
                    'avg_nb_users_inst': get_avg_max_nb_users_inst(X_collab, 'Trace_ID_pred_sup')[0],
                    'max_nb_users_inst': get_avg_max_nb_users_inst(X_collab, 'Trace_ID_pred_sup')[1]
                }
            )

            ## Activities with instances rel preds

            # With instances supervized rel preds
            X_acts_rel_sup, y_acts = get_X_y_activities_rel(X, 'Trace_ID_pred_sup', 'act_pred_sup', label_encoder, onehot_encoder, ohe_domain, generations)
            pred_activities_rel_sup = clf_rel_act.predict(X_acts_rel_sup)
            score_rel_act_sup = precision_recall_fscore_support(y_acts, pred_activities_rel_sup, average='micro')
            X['act_pred_sup'] = label_encoder.inverse_transform(pred_activities_rel_sup)

            rel_collab_results['rel_activities_from_sup_inst_collab'] = {
                'unpredicted_labels': set(y_acts) - set(pred_activities_rel_sup),
                'precision':score_rel_act_sup[0],
                'recall':score_rel_act_sup[1],
                'f1':f1_score(y_acts, pred_activities_rel_sup, average=None)

            }
            results_sup.append(rel_collab_results)
            
            user_metrics['relational_acts_metrics_from_sup_inst_collab'].append(
                {
                    'avg_length_act': get_avg_max_length_act(X, 'act_pred_sup', label_encoder)[0][('act_length', 'mean')],
                    'max_length_act': get_avg_max_length_act(X, 'act_pred_sup', label_encoder)[1][('act_length', 'max')],
                    'avg_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_sup', label_encoder)[0][('nb_users', 'mean')],
                    'max_nb_users_act': get_avg_max_nb_users_act(X, 'act_pred_sup', label_encoder)[1][('nb_users', 'max')]
                }
            )
        
            # Update instances for next iteration
            X['Trace_ID_pred_sup'] = X_collab['Trace_ID_pred_sup']
        
        # Halve the threshold for the next iteration
        threshold_act = threshold_act / 2

    return results_sup, results_unsup, user_metrics
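
The thresholded iterative step in the function above replaces an email's current activity label only when the relational classifier's top class probability exceeds the threshold, and the threshold is halved after each iteration. Below is a minimal NumPy sketch of that update rule. The helper name `update_labels`, the class names and the probabilities are made up for illustration, and the probabilities are kept fixed across iterations for simplicity, whereas the notebook recomputes them with `predict_proba` at every iteration.

```python
import numpy as np

def update_labels(current, proba, classes, threshold):
    """Keep `current` unless the top predicted probability exceeds `threshold`."""
    best = np.argmax(proba, axis=1)       # index of the most likely class
    confidence = np.max(proba, axis=1)    # probability of that class
    return np.where(confidence > threshold, classes[best], current)

classes = np.array(["ask", "answer", "close"])
current = np.array(["ask", "ask", "close"])
proba = np.array([
    [0.02, 0.97, 0.01],   # above 0.95: overrides the current label at once
    [0.40, 0.50, 0.10],   # below 0.95 but above 0.475
    [0.30, 0.30, 0.40],   # never confident enough in this example
])

threshold = 0.95
for i in range(2):
    current = update_labels(current, proba, classes, threshold)
    print(i, threshold, current.tolist())
    threshold = threshold / 2  # halved after each pass, as in the loop above
```

The second email keeps its label in the first pass and is only relabelled once the threshold has dropped to 0.475.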


In [13]:
def experiences(n_xp, nb_collaborative_iter, act_threshold, onehot_encoder, label_encoder, ohe_domain, nlp, approaches=[
    'rel_with_gt_inst', 'rel_with_gt_act',
    'rel_with_baseline_inst', 'rel_with_baseline_act',
    'collab_inst_and_act',
    'collab_threshold_inst_and_act',
    'iterative_with_frozen_baseline_inst_act',
    'iterative_threshold_with_frozen_baseline_inst_act'
], resume=False, seed=42):
    results_sup = []
    results_unsup = []
    user_metrics = []

    if resume:
        with open(results_sup_save_path, 'rb') as f:
            results_sup = pickle.load(f)
        with open(results_unsup_save_path, 'rb') as f:
            results_unsup = pickle.load(f)
        with open(user_metrics_save_path, 'rb') as f:
            user_metrics = pickle.load(f)

    for i in range(n_xp):
        print("=====================")
        print("START EXPERIENCE: ", str(i))

        seed = seed + 1
        print("Seed value: ", seed)
        random.seed(seed)
        np.random.seed(seed)
        
        train, test = split_train_test(emails_relational)
        
        supervized_baseline_instances_structured(train, test, nlp, seed=42, save_path="baseline_inst.pickle.dat")
        supervized_baseline_activities_structured(train, test, label_encoder, ohe_domain, seed=seed, save_path="baseline_act.pickle.dat")
        supervized_relational_instances_structured(train, test, nlp, onehot_encoder, seed=42, save_path="relational_inst.pickle.dat")
        supervized_relational_activities_structured(train, test, label_encoder, onehot_encoder, ohe_domain, seed=seed, save_path="relational_act.pickle.dat")     
            
        with open("baseline_inst.pickle.dat", "rb") as f:
            clf_supervized_baseline_instances = pickle.load(f)
        with open("baseline_act.pickle.dat", "rb") as f:
            clf_supervized_baseline_activities = pickle.load(f)
        with open("relational_inst.pickle.dat", "rb") as f:
            model_relational_instances = pickle.load(f)
        with open("relational_act.pickle.dat", "rb") as f:
            model_relational_activities = pickle.load(f)
        rel_acts_output_classes = model_relational_activities.classes_
        
        res_sup, res_unsup, u_metrics = test_approaches(
            act_threshold, rel_acts_output_classes,
            train, test, nb_collaborative_iter,
            clf_supervized_baseline_instances, unsupervized_baseline_instances,
            clf_supervized_baseline_activities, 
            model_relational_instances, 
            model_relational_activities,
            onehot_encoder, label_encoder, ohe_domain, nlp,
            approaches, 
            baselines=True
        )
        
        results_sup.append(res_sup)
        results_unsup.append(res_unsup)
        user_metrics.append(u_metrics)
        
        del train
        del test

        # Save results between each iteration
        with open(results_sup_save_path, 'wb') as f:
            pickle.dump(results_sup, f, protocol=pickle.HIGHEST_PROTOCOL)
        with open(results_sup_save_path, 'rb') as f:
            results_sup = pickle.load(f)
        
        with open(results_unsup_save_path, 'wb') as f:
            pickle.dump(results_unsup, f, protocol=pickle.HIGHEST_PROTOCOL)
        with open(results_unsup_save_path, 'rb') as f:
            results_unsup = pickle.load(f)

        with open(user_metrics_save_path, 'wb') as f:
            pickle.dump(user_metrics, f, protocol=pickle.HIGHEST_PROTOCOL)
        with open(user_metrics_save_path, 'rb') as f:
            user_metrics = pickle.load(f)

        print("END EXPERIENCE: ", str(i))
        print("=====================")
        
    return results_sup, results_unsup, user_metrics
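
The loop above pickles the accumulated results after every experiment so that a run can be resumed. The following sketch illustrates that save-then-resume pattern with hypothetical helper names `save_checkpoint` and `load_checkpoint`, and with a temporary file standing in for the Google Drive path. Writing to a temporary file and renaming it afterwards is an addition of this sketch, not something the notebook does; it avoids leaving a half-written pickle if the session dies during the write.

```python
import os
import pickle
import tempfile

def save_checkpoint(obj, path):
    """Pickle `obj` to `path`, replacing the old file only after the write succeeded."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)

def load_checkpoint(path, default=None):
    """Return the pickled object stored at `path`, or `default` if the file is missing."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "results.pickle")
results = load_checkpoint(path, default=[])   # nothing saved yet
for xp in range(3):
    results.append({"xp": xp})                # stand-in for one experiment's results
    save_checkpoint(results, path)
print(len(load_checkpoint(path)))  # 3
```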

## Application

We evaluate our various approaches and compare their results.
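
Instance discovery is scored on pairs of emails: a pair is positive when both emails belong to the same instance. The helper `evaluate_instances_discovery` is defined in functions.py and is not shown here, so the following is only a sketch of one way to compute pairwise precision, recall and F1 from ground-truth and predicted trace IDs with scikit-learn. The function name `pairwise_scores` and the toy IDs are hypothetical, and comparing all possible pairs is an assumption of this sketch.

```python
from itertools import combinations
from sklearn.metrics import precision_recall_fscore_support

def pairwise_scores(true_ids, pred_ids):
    """Precision, recall and F1 of the 'same instance' relation over all email pairs."""
    pairs = list(combinations(range(len(true_ids)), 2))
    y_true = [int(true_ids[a] == true_ids[b]) for a, b in pairs]
    y_pred = [int(pred_ids[a] == pred_ids[b]) for a, b in pairs]
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return precision, recall, f1

true_ids = [1, 1, 1, 2, 2]   # ground-truth trace of each email
pred_ids = [1, 1, 2, 2, 2]   # predicted trace: the third email is misassigned
precision, recall, f1 = pairwise_scores(true_ids, pred_ids)
print(precision, recall, f1)
```

In this toy case a single misassigned email affects four of the ten pairs, which illustrates why pairwise scores can drop quickly even when most emails are placed correctly.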

In [14]:
%%time
results_sup, results_unsup, user_metrics = experiences(nb_xp, nb_collaborative_iter, act_threshold, onehot_encoder, label_encoder, ohe_domain, nlp, approaches=[
    'rel_with_gt_inst', 'rel_with_gt_act',
    'rel_with_baseline_inst', 'rel_with_baseline_act',
    'collab_inst_and_act',
    'iterative_threshold_with_frozen_baseline_inst_act'
], resume=resume, seed=seed)

START EXPERIENCE:  0
43
Writing best preds to CSV.
END EXPERIENCE:  0
CPU times: user 1h 8min 53s, sys: 1min 50s, total: 1h 10min 43s
Wall time: 2h 22min 29s


# Average results
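
For the iterative approaches, scores from all experiments are collected in one flat list, ordered experiment by experiment and, within each experiment, iteration by iteration. The averaging function later in this section selects every `nb_collaborative_iter`-th entry to average a given iteration across experiments. The sketch below, with made-up scores, shows that this stride selection is equivalent to reshaping the flat list into an array with one row per experiment and one column per iteration.

```python
import numpy as np

nb_xp = 3                    # number of experiments
nb_collaborative_iter = 2    # iterations per experiment
# Flat list: xp0-iter0, xp0-iter1, xp1-iter0, xp1-iter1, xp2-iter0, xp2-iter1
flat_scores = [0.50, 0.60, 0.70, 0.80, 0.30, 0.40]

# Stride selection: entries i, i + n_iter, i + 2 * n_iter, ... belong to iteration i
by_stride = [
    float(np.mean([flat_scores[j] for j in range(i, nb_xp * nb_collaborative_iter, nb_collaborative_iter)]))
    for i in range(nb_collaborative_iter)
]

# Equivalent formulation: one row per experiment, one column per iteration
by_reshape = np.array(flat_scores).reshape(nb_xp, nb_collaborative_iter).mean(axis=0)

print(np.allclose(by_stride, by_reshape))  # True
```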

## Functions

In [87]:
def clean_nan_dict(d):
    gt_inst_metrics = d.get('ground_truth_inst_metrics')
    gt_act_metrics = d.get('ground_truth_acts_metrics')
    baseline_inst_sup_metrics = d.get('baseline_inst_sup_metrics')
    baseline_inst_unsup_metrics = d.get('baseline_inst_unsup_metrics')
    baseline_act_sup_metrics = d.get('baseline_act_sup_metrics')
    rel_baseline_inst_metrics = d.get('relational_inst_metrics')
    rel_act_from_sup_frozen_metrics_threshold = d.get('relational_acts_metrics_from_sup_inst_frozen_threshold')
    
    for i, elt in enumerate(rel_act_from_sup_frozen_metrics_threshold):
        for k,v in elt.items():
            for v, metric in elt[k].items():
                if math.isnan(metric):
                    rel_act_from_sup_frozen_metrics_threshold[i][k][v] = 0
                                  
    for k,v in baseline_act_sup_metrics.items():
        for v, metric in baseline_act_sup_metrics[k].items():
            if math.isnan(metric):
                baseline_act_sup_metrics[k][v] = 0
            
    for k, metric in baseline_inst_sup_metrics.items():
        if math.isnan(metric):
            baseline_inst_sup_metrics[k] = 0
                            
    for k,v in gt_act_metrics.items():
        for v, metric in gt_act_metrics[k].items():
            if math.isnan(metric):
                gt_act_metrics[k][v] = 0
            
    for k, metric in gt_inst_metrics.items():
        if math.isnan(metric):
            gt_inst_metrics[k] = 0
            
    result = {
        'ground_truth_inst_metrics': gt_inst_metrics,
        'ground_truth_acts_metrics': gt_act_metrics,
        'baseline_inst_sup_metrics': baseline_inst_sup_metrics,
        'baseline_act_sup_metrics' : baseline_act_sup_metrics,
    }
    
    return result
                 
def get_avg_results_sup(results, nb_xp, nb_collaborative_iter):
    summed_results = {
        'baseline_instances_sup': {
            'f1': 0,
            'precision': 0,
            'recall': 0
        },
        'baseline_activities_sup': {
            'f1': [],
            'precision': 0,
            'recall': 0,
            'unpredicted_labels':[]
        },
        'rel_gt_instances_sup': {
            'f1': 0,
            'precision': 0,
            'recall': 0
        },
        'rel_gt_activities_sup': {
            'f1': [],
            'precision': 0,
            'recall': 0,
            'unpredicted_labels':[]
        },
        'rel_instances': {
            'f1': 0,
            'precision': 0,
            'recall': 0,
        },
        'rel_activities_from_sup_inst': {
            'f1': [],
            'precision': 0,
            'recall': 0,
            'unpredicted_labels':[]
        },        
        'rel_instances_collab': {
            'f1': [],
            'precision': [],
            'recall': []
        },
        'rel_activities_from_sup_inst_collab': {
            'f1': [],
            'precision': [],
            'recall': [],
            'unpredicted_labels':[]
        },
        'rel_activities_from_sup_inst_frozen_threshold': {
            'f1': [],
            'precision': [],
            'recall': [],
            'unpredicted_labels':[]
        },
        
    }

    for xp in results:
        for i in xp:
            for model in i.keys():
                scores = i[model]
                if model in [
                    'rel_instances_collab', 'rel_activities_from_sup_inst_collab', 
                    'rel_instances_collab_threshold', 'rel_activities_from_sup_inst_collab_threshold',
                    'rel_activities_from_sup_inst_frozen', 'rel_activities_from_sup_inst_frozen_threshold'
                
                ]:
                    # Iterative models keep every score as a list, one entry per iteration
                    for k in scores.keys():
                        summed_results[model][k].append(scores[k])
                else:
                    for k in scores.keys():
                        if (k in ['unpredicted_labels']) or (k in ['f1'] and 'activities' in model):
                            summed_results[model][k].append(scores[k])
                        else:     
                            summed_results[model][k] = summed_results[model][k] + scores[k]

    avg_results = copy.deepcopy(summed_results)

    for model in avg_results.keys():
        if model in [
            'rel_instances_collab', 'rel_activities_from_sup_inst_collab', 
            'rel_instances_collab_threshold', 'rel_activities_from_sup_inst_collab_threshold',
            'rel_activities_from_sup_inst_frozen', 'rel_activities_from_sup_inst_frozen_threshold'
        ]:
            for k in avg_results[model]:
                if (k == 'f1' and 'activities' in model):
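                    # Average the per-label F1 arrays of each iteration across experiments
                    # (a dedicated name avoids shadowing the `results` parameter)
                    per_iteration_f1 = []
                    for i in range(nb_collaborative_iter):
                        iteration_res = np.mean(np.array([avg_results[model][k][j] for j in range(i, nb_xp*nb_collaborative_iter, nb_collaborative_iter)]), axis=0)
                        per_iteration_f1.append(iteration_res)
                    avg_results[model][k] = per_iteration_f1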
                elif (k not in ['unpredicted_labels']):
                    tmp = copy.deepcopy(avg_results[model][k])
                    avg_results[model][k] = [0] * nb_collaborative_iter
                    for j in range(nb_xp * nb_collaborative_iter):
                        try:
                            avg_results[model][k][j % nb_collaborative_iter] = avg_results[model][k][j % nb_collaborative_iter] + tmp[j]
                        except Exception:
                            pass
                    avg_results[model][k] = [avg_results[model][k][i]/nb_xp for i in list(range(nb_collaborative_iter))]
        else:
            for k in avg_results[model]:
                if (k == 'f1' and 'activities' in model):
                    avg_results[model][k] = np.mean(np.array([avg_results[model][k][i] for i in range(nb_xp)]), axis=0)
                elif (k not in ['unpredicted_labels']):
                    avg_results[model][k] = avg_results[model][k] / nb_xp
                    
    return avg_results, summed_results
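
The iterative branches above store scalar scores in one flat list and then average every `nb_collaborative_iter`-th entry. A minimal pure-Python sketch of that per-iteration averaging is given below, assuming experiment `e`, iteration `i` sits at index `e * nb_collaborative_iter + i`. The helper name `average_per_iteration` is hypothetical and not part of the notebook.

```python
def average_per_iteration(flat_scores, nb_xp, nb_collaborative_iter):
    """Average scalar scores per collaborative iteration across experiments."""
    expected = nb_xp * nb_collaborative_iter
    if len(flat_scores) != expected:
        # Fail loudly instead of silently skipping missing entries
        raise ValueError(f"expected {expected} scores, got {len(flat_scores)}")
    # The slice i::nb_collaborative_iter picks iteration i of every experiment
    return [
        sum(flat_scores[i::nb_collaborative_iter]) / nb_xp
        for i in range(nb_collaborative_iter)
    ]


# Illustrative values only: two experiments, three iterations each
scores = [0.25, 0.5, 0.75,   # experiment 0
          0.75, 1.0, 0.25]   # experiment 1
per_iteration = average_per_iteration(scores, nb_xp=2, nb_collaborative_iter=3)
```

Raising on a length mismatch is a design choice of this sketch: it surfaces incomplete runs that a bare `try`/`except` would hide.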

### Metrics initializers

In [84]:
summed_results_camel = {
        'ground_truth_inst_metrics': {
            'avg_length_inst': 0,
            'max_length_inst': 0,
            'avg_steps_inst': 0,
            'max_steps_inst': 0,
            'avg_nb_users_inst': 0,
            'max_nb_users_inst': 0
        },
        'ground_truth_acts_metrics': {
            'avg_length_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            },
            'max_length_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            },
            'avg_nb_users_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            },
            'max_nb_users_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            }
        },
        'baseline_inst_sup_metrics': {
            'avg_length_inst': 0,
            'max_length_inst': 0,
            'avg_steps_inst': 0,
            'max_steps_inst': 0,
            'avg_nb_users_inst': 0,
            'max_nb_users_inst': 0
        },
        'baseline_act_sup_metrics': {
            'avg_length_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            },
            'max_length_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            },
            'avg_nb_users_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            },
            'max_nb_users_act': {
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            }
        },
        'relational_acts_metrics_from_sup_inst_frozen_threshold': {
            'avg_length_act': [{
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            } for i in range(nb_collaborative_iter)],
            'max_length_act': [{
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            } for i in range(nb_collaborative_iter)],
            'avg_nb_users_act': [{
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            } for i in range(nb_collaborative_iter)],
            'max_nb_users_act': [{
                'ask a question': 0,
                'assign issue': 0,
                'automated comment issue': 0,
                'build system update': 0,
                'close pull request': 0,
                'close question': 0,
                'commit changes': 0,
                'create issue': 0,
                'distribute situational awareness': 0,
                'issue comment': 0,
                'issue update': 0,
                'open pull request': 0,
                'provide support': 0,
                'publicity': 0,
                'reopen issue': 0,
                'resolve issue': 0,
                'update question': 0,
                'version release planning': 0,
                'work started issue': 0
            } for i in range(nb_collaborative_iter)]
        },
    }
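
The initializer above repeats the same 19 activity labels for every metric. The same structure could be generated with comprehensions, as in the sketch below. The names `ACTIVITY_LABELS`, `ACT_METRICS`, `INST_METRICS` and `build_summed_results` are hypothetical; the keys and labels are copied from the cell above.

```python
ACTIVITY_LABELS = [
    'ask a question', 'assign issue', 'automated comment issue',
    'build system update', 'close pull request', 'close question',
    'commit changes', 'create issue', 'distribute situational awareness',
    'issue comment', 'issue update', 'open pull request', 'provide support',
    'publicity', 'reopen issue', 'resolve issue', 'update question',
    'version release planning', 'work started issue',
]
ACT_METRICS = ['avg_length_act', 'max_length_act',
               'avg_nb_users_act', 'max_nb_users_act']
INST_METRICS = ['avg_length_inst', 'max_length_inst', 'avg_steps_inst',
                'max_steps_inst', 'avg_nb_users_inst', 'max_nb_users_inst']


def zero_instance_metrics():
    """One zeroed counter per instance metric."""
    return {metric: 0 for metric in INST_METRICS}


def zero_activity_metrics():
    """One zeroed counter per (activity metric, activity label) pair."""
    return {metric: {label: 0 for label in ACTIVITY_LABELS}
            for metric in ACT_METRICS}


def build_summed_results(nb_collaborative_iter):
    """Build a zeroed accumulator with the same shape as summed_results_camel."""
    return {
        'ground_truth_inst_metrics': zero_instance_metrics(),
        'ground_truth_acts_metrics': zero_activity_metrics(),
        'baseline_inst_sup_metrics': zero_instance_metrics(),
        'baseline_act_sup_metrics': zero_activity_metrics(),
        # Iterative approach: one dict of labels per collaborative iteration
        'relational_acts_metrics_from_sup_inst_frozen_threshold': {
            metric: [{label: 0 for label in ACTIVITY_LABELS}
                     for _ in range(nb_collaborative_iter)]
            for metric in ACT_METRICS
        },
    }


summed_results_sketch = build_summed_results(nb_collaborative_iter=5)
```

Each call builds fresh dictionaries, so the accumulator can be reset between runs without copying.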

### Average unsupervised results and activity metrics

In [95]:
def get_avg_results_unsup(results, nb_xp, nb_collaborative_iter):
    summed_results = {
        'baseline_instances_unsup': {
            'f1': 0,
            'precision': 0,
            'recall': 0
        }
    }

    for xp in results:
        for i in xp:
            for model in i.keys():
                scores = i[model]
                if model in ['rel_instances_collab', 'rel_activities_from_sup_inst_collab']:
                    # Iterative models keep every score as a list, one entry per iteration
                    for k in scores.keys():
                        summed_results[model][k].append(scores[k])
                else:
                    for k in scores.keys():
                        if (k == 'unpredicted_labels') or (k in ['f1'] and 'activities' in model):
                            summed_results[model][k].append(scores[k])
                        else:     
                            summed_results[model][k] = summed_results[model][k] + scores[k]

    avg_results = copy.deepcopy(summed_results)

    for model in avg_results.keys():
        if model in ['rel_instances_collab', 'rel_activities_from_sup_inst_collab']:
            for k in avg_results[model]:

                if (k == 'f1' and 'activities' in model):
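                    # Average the per-label F1 arrays of each iteration across experiments
                    # (a dedicated name avoids shadowing the `results` parameter)
                    per_iteration_f1 = []
                    for i in range(nb_collaborative_iter):
                        iteration_res = np.mean(np.array([avg_results[model][k][j] for j in range(i, nb_xp*nb_collaborative_iter, nb_collaborative_iter)]), axis=0)
                        per_iteration_f1.append(iteration_res)
                    avg_results[model][k] = per_iteration_f1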
                elif (k not in ['unpredicted_labels']):

                    tmp = copy.deepcopy(avg_results[model][k])
                    avg_results[model][k] = [0] * nb_collaborative_iter
                    for j in range(nb_xp * nb_collaborative_iter):
                        try:
                            avg_results[model][k][j % nb_collaborative_iter] = avg_results[model][k][j % nb_collaborative_iter] + tmp[j]
                        except Exception:
                            pass
                    avg_results[model][k] = [avg_results[model][k][i]/nb_xp for i in list(range(nb_collaborative_iter))]
        else:
            for k in avg_results[model]:
                if (k == 'f1' and 'activities' in model):
                    try:
                        avg_results[model][k] = np.mean(np.array([avg_results[model][k][i] for i in range(nb_xp)]), axis=0)
                    except Exception:
                        pass
                elif (k not in ['unpredicted_labels']):
                    avg_results[model][k] = avg_results[model][k] / nb_xp
                    
    return avg_results, summed_results
    

def get_avg_user_metrics(user_metrics, nb_xp, nb_collaborative_iter, summed_results):
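    # Work on a copy so that the caller's initializer is not mutated
    # and the cell can be re-run without accumulating sums twice
    summed_results = copy.deepcopy(summed_results)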
    
    for xp in user_metrics:
        
        # Replace NaN
        try:
            xp = clean_nan_dict(xp)
        except Exception:
            pass
        for key in xp.keys():
            # Instances metrics
            if key in [
                'ground_truth_inst_metrics', 
                'baseline_inst_sup_metrics', 
                'baseline_inst_unsup_metrics', 
                'rel_gt_inst_sup_metrics', 
                'relational_inst_metrics', 
                'relational_inst_metrics_collab',
                'relational_inst_metrics_collab_threshold'
            ]:
                if key not in ['relational_inst_metrics_collab', 'relational_inst_metrics_collab_threshold']:
                    for subkey in xp[key]:
                        summed_results[key][subkey] = summed_results[key][subkey] + xp[key][subkey]
                else:
                    for i, d in enumerate(xp[key]):
                        for c in d.keys():
                            summed_results[key][i][c] = summed_results[key][i][c] + xp[key][i][c]
                        
            # Activities metrics       
            else:
                if key not in [
                    'relational_acts_metrics_from_sup_inst_collab', 
                    'relational_acts_metrics_from_sup_inst_frozen',
                    'relational_acts_metrics_from_sup_inst_collab_threshold', 
                    'relational_acts_metrics_from_sup_inst_frozen_threshold'
                ]:
                    try:
                        for subkey in xp[key].keys():
                            for s_subkey in xp[key][subkey].keys():
                                summed_results[key][subkey][s_subkey] = summed_results[key][subkey][s_subkey] + xp[key][subkey][s_subkey]
                    except Exception:
                        pass
                else:
                    for i, d in enumerate(xp[key]):
                        for c in d.keys():
                            for elt in d[c].keys():
                                summed_results[key][c][i][elt] = summed_results[key][c][i][elt] + xp[key][i][c][elt]
                    
    avg_results = copy.deepcopy(summed_results)  
    
    for key in avg_results.keys():
        # Instances metrics
        if key in [
            'ground_truth_inst_metrics', 
            'baseline_inst_sup_metrics', 
            'baseline_inst_unsup_metrics', 
            'rel_gt_inst_sup_metrics', 
            'relational_inst_metrics', 
            'relational_inst_metrics_collab',
            'relational_inst_metrics_collab_threshold'
        ]:
            if key not in ['relational_inst_metrics_collab', 'relational_inst_metrics_collab_threshold']:
                for subkey in avg_results[key]:
                    avg_results[key][subkey] = avg_results[key][subkey] / nb_xp
            else:
                for i, d in enumerate(avg_results[key]):
                    for c in d.keys(): 
                        avg_results[key][i][c] = avg_results[key][i][c] / nb_xp
        
        # Activities metrics
        else:
            if key not in [
                'relational_acts_metrics_from_sup_inst_collab', 
                'relational_acts_metrics_from_sup_inst_frozen',
                'relational_acts_metrics_from_sup_inst_collab_threshold', 
                'relational_acts_metrics_from_sup_inst_frozen_threshold'
            ]:
                for subkey in avg_results[key].keys():
                    for s_subkey in avg_results[key][subkey].keys():
                        avg_results[key][subkey][s_subkey] = avg_results[key][subkey][s_subkey] / nb_xp
            else:
                for metric in avg_results[key].keys():
                    for i, d in enumerate(avg_results[key][metric]):
                        for c in d.keys():
                            avg_results[key][metric][i][c] = avg_results[key][metric][i][c] / nb_xp
                
        
    return avg_results, summed_results 
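
`get_avg_user_metrics` replaces NaN values before summing, and the metrics it receives mix flat dictionaries, nested dictionaries and lists of dictionaries. A recursive replacement handles all of these shapes uniformly. The sketch below is an illustration under that assumption; `replace_nan` is a hypothetical helper, not the notebook's `clean_nan_dict`, and it returns a new structure instead of modifying its input.

```python
import math


def replace_nan(value, default=0):
    """Return a copy of `value` in which every NaN float is replaced by `default`."""
    if isinstance(value, dict):
        return {key: replace_nan(item, default) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_nan(item, default) for item in value]
    if isinstance(value, float) and math.isnan(value):
        return default
    return value


# Illustrative input covering the three shapes (flat, nested, list of dicts)
raw_metrics = {
    'ground_truth_inst_metrics': {'avg_length_inst': float('nan')},
    'ground_truth_acts_metrics': {'avg_length_act': {'publicity': float('nan'),
                                                     'create issue': 2.5}},
    'relational_inst_metrics_collab': [{'avg_steps_inst': float('nan')}],
}
clean_metrics = replace_nan(raw_metrics)
```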

## Application

In [106]:
if resume:
    nb_xp = nb_xp_total
  
avg_results_sup, summed_results_sup = get_avg_results_sup(results_sup, nb_xp, nb_collaborative_iter)
avg_results_unsup, summed_results_unsup = get_avg_results_unsup(results_unsup, nb_xp, nb_collaborative_iter)
avg_metrics, summed_metrics = get_avg_user_metrics(user_metrics, nb_xp, nb_collaborative_iter, summed_results_camel)

Save the averaged results and metrics dictionaries to disk, then reload them.

In [107]:
with open(results_sup_save_path, 'wb') as f:
    pickle.dump(avg_results_sup, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(results_sup_save_path, 'rb') as f:
    final_results_sup = pickle.load(f)
    
with open(results_unsup_save_path, 'wb') as f:
    pickle.dump(avg_results_unsup, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(results_unsup_save_path, 'rb') as f:
    final_results_unsup = pickle.load(f)
    
with open(user_metrics_save_path, 'wb') as f:
    pickle.dump(avg_metrics, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(user_metrics_save_path, 'rb') as f:
    final_metrics = pickle.load(f)
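
The cell above repeats the same write-then-read pattern three times. It could be factored into one helper, as sketched below. The name `save_and_reload` is hypothetical, and the demonstration writes made-up placeholder values to a temporary directory instead of the Google Drive paths used by the notebook.

```python
import os
import pickle
import tempfile


def save_and_reload(obj, path):
    """Pickle `obj` to `path`, then load it back from disk and return it."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, 'rb') as f:
        return pickle.load(f)


# Placeholder dictionary, not actual results
demo_results = {'baseline_instances_unsup': {'f1': 0.5, 'precision': 0.25, 'recall': 0.75}}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_results.pkl')
    reloaded_results = save_and_reload(demo_results, demo_path)
```

With such a helper, each pair of `with` blocks would reduce to a single call such as `final_results_sup = save_and_reload(avg_results_sup, results_sup_save_path)`.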

In [108]:
pprint.pprint(final_results_sup)

{'baseline_activities_sup': {'f1': array([0.        , 0.        , 0.4       , 1.        , 0.88      ,
       0.66666667, 0.28571429, 0.5       , 0.66666667, 0.        ,
       1.        ]),
                             'precision': 0.6470588235294118,
                             'recall': 0.6470588235294118,
                             'unpredicted_labels': [{0, 1, 9}]},
 'baseline_instances_sup': {'f1': 0.7311827956989247,
                            'precision': 0.7391304347826086,
                            'recall': 0.723404255319149},
 'rel_activities_from_sup_inst': {'f1': array([0.        , 0.66666667, 1.        , 1.        , 0.91666667,
       1.        , 0.8       , 0.4       , 1.        , 0.        ,
       1.        ]),
                                  'precision': 0.7941176470588235,
                                  'recall': 0.7941176470588235,
                                  'unpredicted_labels': [{0, 9}]},
 'rel_activities_from_sup_inst_collab': {'f1': [array([0. 

In [109]:
pprint.pprint(final_results_unsup)

{'baseline_instances_unsup': {'f1': 0.5438596491228069,
                              'precision': 0.4626865671641791,
                              'recall': 0.6595744680851063}}


In [100]:
pprint.pprint(final_metrics)

{'baseline_act_sup_metrics': {'avg_length_act': {'ask a question': 0.0,
                                                 'assign issue': 0.0,
                                                 'automated comment issue': 668.25,
                                                 'build system update': 0.0,
                                                 'close pull request': 554.6666666666666,
                                                 'close question': 0.0,
                                                 'commit changes': 13612.384615384615,
                                                 'create issue': 3059.0,
                                                 'distribute situational awareness': 0.0,
                                                 'issue comment': 459.0,
                                                 'issue update': 44573.666666666664,
                                                 'open pull request': 29557.0,
                                                

# Encodings and information

In [101]:
def get_integer_mapping(le):
    '''
    Return a dict mapping each label to its integer code,
    as assigned by a fitted scikit-learn LabelEncoder.
    le: a fitted scikit-learn LabelEncoder
    '''
    res = {}
    for cl in le.classes_:
        res.update({cl:le.transform([cl])[0]})

    return res

In [102]:
print("Label encoding for activity prediction:")
get_integer_mapping(label_encoder)

Label encoding for activity prediction:


{'ask a question': 0,
 'assign issue': 1,
 'automated comment issue': 2,
 'close pull request': 3,
 'commit changes': 4,
 'create issue': 5,
 'issue comment': 6,
 'issue update': 7,
 'open pull request': 8,
 'provide support': 9,
 'resolve issue': 10}
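
The `unpredicted_labels` entries in the results are sets of integer codes, such as `{0, 1, 9}` for `baseline_activities_sup`. Inverting the mapping above turns them back into activity names. This is a minimal sketch that hard-codes the mapping printed above; `int_to_label` and `decode_labels` are hypothetical names.

```python
# Mapping copied from the output of get_integer_mapping(label_encoder)
label_to_int = {
    'ask a question': 0,
    'assign issue': 1,
    'automated comment issue': 2,
    'close pull request': 3,
    'commit changes': 4,
    'create issue': 5,
    'issue comment': 6,
    'issue update': 7,
    'open pull request': 8,
    'provide support': 9,
    'resolve issue': 10,
}

# Inverse mapping: integer code -> activity label
int_to_label = {code: label for label, code in label_to_int.items()}


def decode_labels(encoded_labels):
    """Translate a set of integer codes into a sorted list of activity names."""
    return sorted(int_to_label[code] for code in encoded_labels)


unpredicted = decode_labels({0, 1, 9})
# ['ask a question', 'assign issue', 'provide support']
```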