# **1. Info**
This notebook applies naive estimators to predict which event occurs next and the start time of that next event. It uses the data that was preprocessed and written out by the 2_Preprocessing.ipynb notebook.

# **2. Initialization**
This section loads the dependencies required to run (most of) the notebook: the libraries are imported, the data is read, and the functions/classes are defined.

## **2.1 Loading libraries**

In [1]:
import pandas as pd
import numpy as np

import statistics

In [2]:
df_train = pd.read_csv('preprocessed_train.csv')
test = pd.read_csv('preprocessed_test.csv')

## **2.2 Functions**

### **2.2.1 Function 'naive_event_estimator'**

In [3]:
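def naive_event_estimator(df, current_event, position):
    # Placeholder: the naive event estimator has not been implemented yet
    raise NotImplementedError('naive_event_estimator is not implemented yet')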
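The cell above is only a placeholder. As a minimal sketch of what a naive event estimator could look like, the code below counts which event most often follows each event within the same case. The helper names `build_next_event_table` and `most_common_next_event` and the toy data are hypothetical, and the sketch assumes the rows are ordered by case and by time within each case (the same assumption the other functions in this notebook make when they look at the next row).

```python
from collections import Counter, defaultdict


def build_next_event_table(cases, events):
    """Count which event directly follows each event within the same case.

    Assumes the rows are ordered by case and, within a case, by time.
    """
    followers = defaultdict(Counter)
    rows = list(zip(cases, events))
    for (case, event), (next_case, next_event) in zip(rows, rows[1:]):
        # Only count the pair when both events belong to the same case
        if case == next_case:
            followers[event][next_event] += 1
    return followers


def most_common_next_event(followers, current_event):
    """Return the most frequent follower of current_event, or None if it never had one."""
    counts = followers.get(current_event)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy data (made up for illustration): two short cases
toy_cases = [1, 1, 1, 2, 2, 2]
toy_events = ['A', 'B', 'C', 'A', 'B', 'D']
toy_table = build_next_event_table(toy_cases, toy_events)
```

With the notebook's data, the table could be built once from `df_train['case concept:name']` and `df_train['event concept:name']` and then queried per event, which avoids rescanning the whole dataframe for every prediction.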

### **2.2.2 Function 'complex_event_estimator'**

In [4]:
def complex_event_estimator(df, current_event, position, prev_event=None, second_prev_event=None):
    """An estimator for predicting the next event; makes use of the previous 2 events, if they exist

    Args:
        df (pd.DataFrame): preprocessed dataframe
        current_event (str): the name of the current event
        position (int): the 1-based position of the current event within its case
        prev_event (str, optional): the event 1 position before the current event. Defaults to None.
        second_prev_event (str, optional): the event 2 positions before the current event. Defaults to None.

    Returns:
        tuple: the most common next event type (str) and the list of all next events found
    """

    next_list=[]
    
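    # Skip the last row: it has no following event, so df.iloc[row[0]+1] would be out of bounds.
    # This assumes df has a default RangeIndex, so row[0] is also the integer position of the row.
    for row in df.iloc[:-1].iterrows():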
        
        # if 2 previous events exist and are provided, go through the data and record the next event (if it exists) 
        # every time the same 3 events (as 2nd prev, prev, current) occur in the same order
        
        if position >= 3:     
            
            # verify the data point examined also has 2 preceding events
            if row[1]['event concept:name'] == current_event and row[1]['position'] >= 3:
                
                # check if the 2 previous events in the data match the previous 2 events of our current event
                if df.iloc[row[0]-1]['event concept:name'] == prev_event and df.iloc[row[0]-2]['event concept:name'] == second_prev_event:
                    
                    # check if the next event in the data is a part of the same sequence
                    if row[1]['case concept:name'] == df.iloc[row[0]+1]['case concept:name']:
                        
                        # if yes, store the next event
                        next_list.append(df.iloc[row[0]+1]['event concept:name'])
           
        # if only the last previous event exists, do the same but without having the 2nd previous event
        
        if position == 2:
            if row[1]['event concept:name'] == current_event and row[1]['position'] >= 2:
                if df.iloc[row[0]-1]['event concept:name'] == prev_event:
                    if row[1]['case concept:name'] == df.iloc[row[0]+1]['case concept:name']:
                        next_list.append(df.iloc[row[0]+1]['event concept:name'])
           
        # if no previous events exist, get the most common event that follows after the current one in the data
        
        if position == 1:
            if row[1]['event concept:name'] == current_event:
                if row[1]['case concept:name'] == df.iloc[row[0]+1]['case concept:name']:
                    next_list.append(df.iloc[row[0]+1]['event concept:name'])

        
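    # statistics.mode raises StatisticsError on an empty list, so return None when nothing matched
    most_common = statistics.mode(next_list) if next_list else None
    return most_common, next_list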
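The function above rescans the whole dataframe for every prediction. As a hedged alternative sketch (not the notebook's implementation), the same idea can be expressed as a lookup table that is built once: each context of up to three consecutive events maps to a counter of the events that followed it. The names `build_context_table` and `predict_next_event` are hypothetical, the rows are assumed to be ordered by case and by time within each case, and the fallback to a shorter context when the full one was never seen is an addition that the original function does not have.

```python
from collections import Counter, defaultdict


def build_context_table(cases, events, max_history=2):
    """Map (up to max_history previous events..., current event) -> Counter of next events."""
    table = defaultdict(Counter)
    trace, trace_case = [], None
    for case, event in zip(cases, events):
        if case != trace_case:
            # A new case starts: forget the events of the previous case
            trace, trace_case = [], case
        # 'trace' holds the events that precede 'event' in the same case;
        # register 'event' as the follower of every context ending at trace[-1]
        for n in range(1, max_history + 2):
            if len(trace) >= n:
                table[tuple(trace[-n:])][event] += 1
        trace.append(event)
    return table


def predict_next_event(table, context):
    """Predict the next event for a context given oldest first, ending with the current event.

    None entries (missing previous events) are dropped. If the full context was never
    observed, the oldest event is dropped and the lookup is retried.
    """
    context = tuple(e for e in context if e is not None)
    while context:
        counts = table.get(context)
        if counts:
            return counts.most_common(1)[0][0]
        context = context[1:]
    return None


# Toy data (made up for illustration): two cases of four events each
toy_cases = ['c1'] * 4 + ['c2'] * 4
toy_events = ['A', 'B', 'C', 'D', 'X', 'B', 'C', 'E']
toy_table = build_context_table(toy_cases, toy_events)
```

In this toy example the context `('A', 'B', 'C')` leads to `'D'` while `('X', 'B', 'C')` leads to `'E'`, which shows how the two previous events can change the prediction for the same current event.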

In [5]:
# test['complex_event']= None
# dct={}

# for idx, row in test.iterrows():
    
#         next_event = complex_event_estimator(test, row['event concept:name'], 
#                                              row['position'], row['prev_event'], row['2prev_event'])
        
#         test['complex_event'].iloc[idx] = next_event[0]

### **2.2.3 Function 'naive_time_estimator'**

In [6]:
def naive_time_estimator(df):
    """A naive estimator for predicting the time of the next event

    Args:
        df (pd.DataFrame): A Pandas DataFrame without the naive time estimation

    Returns:
        pd.DataFrame: A Pandas DataFrame with the naive time estimation
    """
    
    df['Prediction time'] = 0

    # Keep a sum and a count per position so that each average uses only its own observations.
    # Index 0 holds the step from position 1 -> 2 and so on.
    max_position = int(df['position'].max())
    total_time = [0] * max_position
    count = [0] * max_position
    
    for index, row in df.iterrows():
        if index < df.shape[0]-1 and row['case concept:name'] == df.iloc[index + 1]['case concept:name']:
            position = row['position']
            t_finish_last = row['timestamp_finish']
            t_start_next_event = df.iloc[index + 1]['timestamp_finish']
            total_time[position - 1] += t_start_next_event - t_finish_last
            count[position - 1] += 1
    
    # Average time until the next event per position (0 if no transition was observed)
    avg_time = [t / c if c else 0 for t, c in zip(total_time, count)]
            
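    # Build the whole column in one assignment instead of writing cell by cell through
    # chained indexing, which triggers SettingWithCopyWarning and is very slow.
    df['Prediction time'] = [
        t + avg_time[p - 1] for t, p in zip(df['timestamp_finish'], df['position'])
    ]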
        
    return df
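The same per-position average can also be computed without explicit Python loops. The following is a minimal sketch under stated assumptions, not a drop-in replacement: the name `naive_time_estimator_vectorized` is hypothetical, `timestamp_finish` is assumed to be numeric (for example seconds), and the rows are assumed to be ordered by time within each case. Unlike the function above, it returns a copy and leaves the prediction empty (NaN) for positions that never had a following event.

```python
import pandas as pd


def naive_time_estimator_vectorized(df):
    """Add a 'Prediction time' column based on the mean waiting time per position."""
    out = df.copy()
    # Timestamp of the following event in the same case (NaN for the last event of a case)
    next_time = out.groupby('case concept:name')['timestamp_finish'].shift(-1)
    delta = next_time - out['timestamp_finish']
    # Mean waiting time per position; missing deltas are skipped by mean()
    avg_delta = delta.groupby(out['position']).mean()
    out['Prediction time'] = out['timestamp_finish'] + out['position'].map(avg_delta)
    return out


# Toy data (made up for illustration): numeric timestamps, two cases
toy = pd.DataFrame({
    'case concept:name': ['a', 'a', 'a', 'b', 'b'],
    'position': [1, 2, 3, 1, 2],
    'timestamp_finish': [0.0, 10.0, 30.0, 100.0, 130.0],
})
toy_result = naive_time_estimator_vectorized(toy)
```

In the toy data the waiting times after position 1 are 10 and 30, so the average of 20 is added to every event at position 1.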

# **3. Naive estimation**
This section applies the naive estimation to both the next event and its time.

## **3.1 Event estimation**

In [7]:
# Note: with position=1 the estimator ignores the two previous events passed here
event_estimator = complex_event_estimator(df_train, 'W_Completeren aanvraag', 1, 'A_PREACCEPTED', 'A_PARTLYSUBMITTED')

In [8]:
event_estimator

('W_Completeren aanvraag',
 ['W_Completeren aanvraag',
  'A_ACCEPTED',
  'W_Nabellen offertes',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'A_ACCEPTED',
  'W_Nabellen offertes',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Nabellen offertes',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'A_ACCEPTED',
  'W_Nabellen offertes',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'A_CANCELLED',
  'W_Afhandelen leads',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'A_DECLINED',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'A_CANCELLED',
  'W_Completeren aanvraag',
  'W_Afhandelen leads',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',
  'W_Completeren aanvraag',

### **3.1.1 Results**

The naive event estimator does not work properly yet. We have reformatted the code, and since then we have not been able to get it working again. For now, the code is included for interpretation only; it will be fixed as soon as possible during the next sprint.

## **3.2 Time estimation**

In [9]:
time_estimator = naive_time_estimator(df_train)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


KeyboardInterrupt: 
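The warning above is raised by chained indexing of the form `df['Prediction time'][index] = ...`, which first selects a column and then writes into that intermediate object. A small sketch on made-up data shows the single-step `.loc` assignment that the linked pandas documentation recommends; the dataframe `toy_frame` and its values are purely illustrative.

```python
import pandas as pd

# Toy frame (made up for illustration)
toy_frame = pd.DataFrame({
    'timestamp_finish': [5.0, 8.0],
    'Prediction time': [0.0, 0.0],
})

# One .loc call addresses the row label and the column name together,
# so the value is written into toy_frame itself and not into a temporary object
toy_frame.loc[0, 'Prediction time'] = toy_frame.loc[0, 'timestamp_finish'] + 2.5
```

Writing cell by cell inside `iterrows()` remains slow on a large dataframe even with `.loc`, so assigning a whole column at once is usually preferable where possible.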

In [None]:
time_estimator

### **3.2.1 Results**

The naive time estimator does not work properly yet. We have reformatted the code, and since then we have not been able to get it working again. For now, the code is included for interpretation only; it will be fixed as soon as possible during the next sprint.