Nonparametric estimation of the baseline and the kernel by an Expectation Maximization algorithm using piecewise functions

EM package: custom functions inspired by the tick library

In [1]:
import EM
import pandas as pd
import numpy as np

In [2]:
import matplotlib.pyplot as plt

30 trading periods for the month of April 2015

In [3]:
df_avril = pd.read_csv(filepath_or_buffer="dates_avril_2015.csv", parse_dates=['date'])

In [4]:
liste_date = df_avril['date'].tolist()
n = len(liste_date)

In [5]:
filename_a = "timestamp_"
filename_c = ".csv"
filename_ps_a = "delivery_start_"

In [6]:
ticks_avril_2015 = []

In [7]:
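for i in range(n):
    # One pair of files per trading period: event timestamps and the period's start
    s = filename_a + str(i) + filename_c
    s_ps = filename_ps_a + str(i) + filename_c
    df_i = pd.read_csv(filepath_or_buffer=s, parse_dates=['timestamp'])
    df_ps_i = pd.read_csv(filepath_or_buffer=s_ps, parse_dates=['trading_start'])
    # Elapsed time since trading start (ns), rescaled so that 8.25 hours maps to 1
    elapsed_ns = (df_i['timestamp'].values - df_ps_i['trading_start'].values).astype(np.float64)
    ticks_avril_2015.append(elapsed_ns / (1e9 * 3600 * 8.25))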
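The rescaling in the loop above can be checked on a toy example. The sketch below uses made-up timestamps (not the April 2015 data) and assumes nanosecond-precision datetimes, which is what the factor 1e9 in the cell above implies.

```python
import numpy as np

# Hypothetical data: one trading start and three event timestamps.
trading_start = np.array(["2015-04-01T09:00:00"], dtype="datetime64[ns]")
timestamps = np.array(
    ["2015-04-01T09:00:00", "2015-04-01T13:07:30", "2015-04-01T17:15:00"],
    dtype="datetime64[ns]",
)

PERIOD_HOURS = 8.25                        # length of one trading period
NS_PER_PERIOD = 1e9 * 3600 * PERIOD_HOURS  # nanoseconds in one period

# Elapsed nanoseconds since trading start, then fraction of the period.
elapsed_ns = (timestamps - trading_start).astype(np.float64)
normalized = elapsed_ns / NS_PER_PERIOD
print(normalized)
```

An event at the start of the period maps to 0, one at mid-period to 0.5, and one 8.25 hours later to 1.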

To instantiate the EM class, the size $\eta$ of the kernel support $[0,\eta]$ may be specified, so that the baseline and the kernel can live on different time scales

Default values:
- kernel_support=1
- n_bins_baseline=10
- n_bins_kernel=10
- end_time=1
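To make the role of these parameters concrete, here is a minimal sketch of a piecewise-constant function defined by `n_bins` values on `[0, support]`. It assumes equal-width bins, which the notebook does not state explicitly; `piecewise_constant` is a hypothetical helper, not part of the EM package.

```python
import numpy as np

def piecewise_constant(values, support, t):
    """Evaluate a step function with len(values) equal-width bins on [0, support).

    Returns 0 outside the support. Equal-width bins are an assumption.
    """
    values = np.asarray(values, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n_bins = len(values)
    width = support / n_bins
    # Bin index of each point, clipped so that masked-out points stay in range.
    idx = np.clip(np.floor(t / width).astype(int), 0, n_bins - 1)
    inside = (t >= 0) & (t < support)
    out = np.zeros_like(t)
    out[inside] = values[idx[inside]]
    return out

# Hypothetical kernel with 4 bins on [0, 0.4]: each bin is 0.1 wide.
example = piecewise_constant([1.0, 2.0, 3.0, 4.0], 0.4, [0.05, 0.15, 0.35, 0.5])
print(example)
```

With a small `kernel_support`, the same number of bins resolves the kernel on a much finer time scale than the baseline, which spans the whole period.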

In [8]:
ticksEM = EM.EM(kernel_support=1, n_bins_baseline=10, n_bins_kernel=10)

The fit method takes the list of arrays (one array of event times in $[0,1]$ per period)

In [9]:
ticksEM.fit(ticks_avril_2015)

<EM.EM at 0x1693b311a90>

The estimated values are stored in the baseline and kernel attributes

In [10]:
ticksEM.baseline

array([0.70779872, 1.66666667, 1.39315921, 1.27053389, 2.02143165,
       2.48299122, 1.30990209, 1.59571543, 1.34706924, 0.96903935])
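As a reading aid, the baseline values can be mapped back to clock time. The sketch below copies the array printed above and assumes the ten bins are equal-width over `[0, end_time]`; the 8.25-hour factor is the one used to normalize the timestamps.

```python
import numpy as np

# Baseline values copied from the output above.
baseline = np.array([0.70779872, 1.66666667, 1.39315921, 1.27053389, 2.02143165,
                     2.48299122, 1.30990209, 1.59571543, 1.34706924, 0.96903935])

END_TIME = 1.0       # normalized period length (default end_time)
PERIOD_HOURS = 8.25  # scaling used when the timestamps were normalized

# Assumption: equal-width bins covering [0, END_TIME].
edges = np.linspace(0.0, END_TIME, len(baseline) + 1)

# Locate the bin with the highest estimated baseline and convert to hours.
peak = int(np.argmax(baseline))
peak_start_h = edges[peak] * PERIOD_HOURS
peak_end_h = edges[peak + 1] * PERIOD_HOURS
print(f"highest baseline in bin {peak}: "
      f"{peak_start_h:.3f} h to {peak_end_h:.3f} h after trading start")
```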

In [11]:
ticksEM.kernel

array([11.47044085,  2.64418194,  2.09696698,  1.96410044,  2.02174424,
        4.11959899,  8.3366716 ,  4.44600228,  2.73558957, 24.40229089])
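For readers without access to the EM package, the kind of update such a fit might iterate can be sketched as follows. This is not the package's code: `em_step` is a hypothetical function, bins are assumed equal-width, edge effects are ignored in the kernel normalization, and the starting baseline is assumed strictly positive.

```python
import numpy as np

def em_step(events, baseline, kernel, kernel_support=1.0, end_time=1.0):
    """One EM iteration for a Hawkes process with step-function baseline and kernel."""
    baseline = np.asarray(baseline, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n_b, n_k = len(baseline), len(kernel)
    w_b, w_k = end_time / n_b, kernel_support / n_k  # equal bin widths (assumption)
    base_mass = np.zeros(n_b)  # expected count of baseline-driven events per bin
    kern_mass = np.zeros(n_k)  # expected count of triggered events per lag bin
    n_events = 0
    for t in events:
        t = np.sort(np.asarray(t, dtype=float))
        n_events += len(t)
        b_idx = np.minimum((t / w_b).astype(int), n_b - 1)
        for i in range(len(t)):
            # Lags to earlier events that fall inside the kernel support.
            lags = t[i] - t[:i]
            lags = lags[(lags > 0) & (lags < kernel_support)]
            k_idx = np.minimum((lags / w_k).astype(int), n_k - 1)
            mu_i = baseline[b_idx[i]]
            phi_i = kernel[k_idx]
            lam = mu_i + phi_i.sum()  # intensity at event i
            # E-step: probability that event i is baseline-driven or triggered.
            base_mass[b_idx[i]] += mu_i / lam
            np.add.at(kern_mass, k_idx, phi_i / lam)
    # M-step: expected counts divided by exposure.
    new_baseline = base_mass / (len(events) * w_b)
    new_kernel = kern_mass / (n_events * w_k)
    return new_baseline, new_kernel

# Usage on synthetic data: 3 periods of 20 uniform event times each.
rng = np.random.default_rng(0)
toy_events = [rng.uniform(0.0, 1.0, size=20) for _ in range(3)]
b, k = np.ones(10), np.ones(10)
for _ in range(5):
    b, k = em_step(toy_events, b, k)
```

Since each event's probabilities sum to one, the updated baseline and kernel together account for exactly the observed number of events, which gives a quick sanity check on any implementation.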

Starting values for the baseline can be passed to fit through baseline_start

In [16]:
baseline_0 = np.array([1, 1, 1, 1, 200])

In [17]:
ticksEM_0 = EM.EM(kernel_support=0.1, n_bins_baseline=5)

In [18]:
ticksEM_0.fit(events=ticks_avril_2015, baseline_start=baseline_0)

<EM.EM at 0x28170671e48>

In [19]:
ticksEM_0.baseline

array([1.19187053, 1.30944064, 2.51270068, 1.64763288, 1.37064803])

In [20]:
ticksEM_0.baseline_start

array([  1,   1,   1,   1, 200])

In [21]:
ticksEM_0.kernel

array([ 1.53042967e+01,  1.87266712e-01,  1.98393727e-01,  6.66078564e+00,
        2.68167196e+02, -3.61028561e+01, -1.83927774e-16,  4.16929487e-54,
        3.03018418e-62,  5.15585000e-81,  9.46197224e+00])