In [10]:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


In [7]:
X_guided = np.load("./guided/guided_dataset_X.npy")
Y_guided = np.load("./guided/guided_dataset_y.npy")
testset = np.load("./guided/guided_testset_X.npy")


In [8]:
print(testset.shape)
print(X_guided.shape)
print(Y_guided.shape)

(5, 332, 8, 500)
(5, 8, 230000)
(5, 51, 230000)


#### Question 1

In [9]:
from scipy.signal import butter, filtfilt, firwin

#### Question 2

For this question, we decided to use the sliding_window_view function from the NumPy library for several reasons:

- It avoids an explicit double loop in Python: the windows are produced by manipulating array strides, so no per-window work is done in Python code.

- The sliding_window_view function creates a view of the array rather than copying the data, which minimizes memory usage.

- The function simplifies the implementation by automating window creation and indexing.
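
To make these points concrete, here is a minimal sketch on a toy array (the names `signal`, `all_windows` and `windows` are illustrative and not part of the assignment data). It assumes the same approach as the function below: build every window with a stride of one sample, then keep every `step`-th one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy signal: 2 channels, 12 samples (a stand-in for the electrode data).
signal = np.arange(24).reshape(2, 12)
window_size, overlap = 4, 0.5
step = int(window_size * (1 - overlap))  # 2 samples between window starts

# All windows with a stride of 1 sample: shape (2, 9, 4).
all_windows = sliding_window_view(signal, window_size, axis=1)

# Keep every `step`-th window: shape (2, 5, 4).
windows = all_windows[:, ::step, :]

# The strided result is still a view on the original buffer, not a copy.
shares_buffer = np.shares_memory(signal, windows)

# Expected number of windows: (n_samples - window_size) // step + 1.
expected_count = (signal.shape[1] - window_size) // step + 1

print(all_windows.shape, windows.shape, shares_buffer, expected_count)
```

Applied to the guided dataset, the same formula gives (230000 - 500) // 250 + 1 = 919 windows per session for 500-sample windows with 50% overlap.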

In [16]:
def create_overlap_windows(x, y, window_size, axis, overlap):

    # Number of samples between the starts of two consecutive windows.
    step = int(window_size * (1 - overlap))

    # sliding_window_view generates every possible window (stride of 1 sample), which is more than we want.
    x_w = sliding_window_view(x, window_size, axis)
    y_w = sliding_window_view(y, window_size, axis)

    print(x_w.shape)
    print(y_w.shape)

    # Keep only the windows whose start index is a multiple of step.
    # Note: this indexing assumes 3-D inputs windowed along axis=2.
    x_w = x_w[:, :, ::step, :]
    y_w = y_w[:, :, ::step, :]

    print(x_w.shape)
    print(y_w.shape)

    # Swap the window and electrode/signal axes:
    # (session, electrode, window, time) -> (session, window, electrode, time)
    x_w = x_w.transpose(0, 2, 1, 3)
    y_w = y_w.transpose(0, 2, 1, 3)

    print(x_w.shape)
    print(y_w.shape)

    return x_w, y_w

x_windows,y_windows = create_overlap_windows(X_guided,Y_guided,500,2,0.5)


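# Sanity check: with 50% overlap, the second half of window 0 equals the first half of window 1.
print(x_windows[0, 0, :, 250:])
print(x_windows[0, 1, :, :250])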


(5, 8, 229501, 500)
(5, 51, 229501, 500)
(5, 8, 919, 500)
(5, 51, 919, 500)
(5, 919, 8, 500)
(5, 919, 51, 500)
[[  1.0986581   -8.41673625  -4.84972558 ...  -1.76873947  -6.56577493
  -10.88819442]
 [ 12.21837735  15.3262705    4.77276143 ...  -3.94538397  -5.50181954
   -7.3692657 ]
 [ -1.02035145   2.75337535 -18.0932997  ...  -9.69223461 -21.23125079
   -6.27851702]
 ...
 [ -4.18208003  -8.56125167  -3.20060061 ... -22.45992589 -95.24715478
  -53.12527546]
 [-14.74996522 -16.58254728 -10.64747431 ...  -1.2357502  -12.37910359
   -1.52116206]
 [-11.45309457 -12.52041445   2.03051126 ...   1.56962661   1.08560054
   -2.22653059]]
[[  1.0986581   -8.41673625  -4.84972558 ...  -1.76873947  -6.56577493
  -10.88819442]
 [ 12.21837735  15.3262705    4.77276143 ...  -3.94538397  -5.50181954
   -7.3692657 ]
 [ -1.02035145   2.75337535 -18.0932997  ...  -9.69223461 -21.23125079
   -6.27851702]
 ...
 [ -4.18208003  -8.56125167  -3.20060061 ... -22.45992589 -95.24715478
  -53.12527546]
 [-14.74

#### Question 3

For this question, we considered several cross-validation methods. Our data are continuous signals, so preserving the temporal structure is important: we cannot use a cross-validation method that randomly shuffles the windows.

We also need to prevent data leakage, so we cannot use a method that draws training and validation windows from the same session. Because the windows within a session overlap, two windows can share the same samples; if one of them ends up in the training set and the other in the validation set, part of the training data also appears in the validation data, which leads to overly optimistic performance estimates.
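
The leakage argument can be illustrated with a small sketch. Everything here is a hypothetical stand-in (a single toy session whose values are simply the sample indices, and an arbitrarily seeded shuffle); it only shows that a shuffled split inside one session puts shared samples on both sides.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical toy session: the values are the sample indices themselves.
n_samples, window_size, step = 2000, 500, 250
session = np.arange(n_samples)
windows = sliding_window_view(session, window_size)[::step]  # shape (7, 500)

# Number of samples shared by two consecutive windows (50% overlap).
shared = np.intersect1d(windows[0], windows[1]).size

# A shuffled split of the windows of this single session (seed chosen arbitrarily).
rng = np.random.default_rng(0)
order = rng.permutation(len(windows))
train_idx, val_idx = order[:5], order[5:]

# Fraction of validation samples that also appear in at least one training window.
train_samples = np.unique(windows[train_idx])
leaked = np.isin(windows[val_idx], train_samples).mean()

print(shared, leaked)
```

Whatever the shuffle, `leaked` is strictly positive here, because every window has at least one overlapping neighbour on the training side.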

We therefore chose the "Leave One Group Out" method, which uses each session as the validation set exactly once and the remaining sessions for training. This prevents the leakage caused by overlapping windows, because windows from different sessions share no samples, and it reduces bias because every session is used for validation.

In our case, "LOGO" and "GroupKFold(5)" produce the same splits, but we chose "LOGO" because it is more explicit: readers immediately see that one session is held out for validation each time, whereas "GroupKFold" needs 5 as its parameter to do the same thing.
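
The equivalence claimed above can be checked with a short sketch. It uses a hypothetical stand-in with 5 sessions of 4 windows each (not the real dataset), and the helper `held_out_sessions` is introduced only for this illustration.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeaveOneGroupOut

# Hypothetical stand-in: 5 sessions with 4 windows each.
n_sessions, n_windows = 5, 4
groups = np.repeat(np.arange(1, n_sessions + 1), n_windows)
X = np.zeros((groups.size, 1))  # the splitters only need the number of samples


def held_out_sessions(splitter):
    # Sessions found in the validation part of each split, sorted so that
    # the order in which the folds are produced does not matter.
    return sorted(
        tuple(int(g) for g in np.unique(groups[val_idx]))
        for _, val_idx in splitter.split(X, groups=groups)
    )


logo_folds = held_out_sessions(LeaveOneGroupOut())
gkf_folds = held_out_sessions(GroupKFold(n_splits=n_sessions))

print(logo_folds == gkf_folds)
```

Both splitters hold out exactly one session per fold; they may simply yield the folds in a different order.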

In [28]:
n_sessions = x_windows.shape[0]
n_windows = x_windows.shape[1]
print(n_sessions,n_windows)

groups = np.repeat(np.arange(1, n_sessions + 1), n_windows)  # 1 repeated 919 times, then 2 repeated 919 times, ...
print(groups)
print(groups.shape)

# LeaveOneGroupOut (and later cross_val_score) expects the samples along the first axis,
# so we flatten the session and window axes of the dataset X:
# (5, 919, 8, 500) becomes (4595, 8, 500), where 4595 = 5 * 919.
# All windows are now stored one after another, and the groups array above tells
# LeaveOneGroupOut which session each window belongs to.
# For example, the third window (x_windows_flat[2]) belongs to session groups[2] = 1.
x_windows_flat = x_windows.reshape(n_sessions * n_windows, x_windows.shape[2], x_windows.shape[3])


5 919
[1 1 1 ... 5 5 5]
(4595,)
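
As a sanity check on the flattening, the sketch below verifies on a small hypothetical array (the `*_toy` names and sizes are assumptions, not the real data) that the reshape keeps the windows in the same order as the group labels.

```python
import numpy as np

# Hypothetical small stand-in for x_windows: (session, window, electrode, time).
n_sessions, n_windows, n_electrodes, n_time = 3, 4, 2, 5
x_toy = np.arange(n_sessions * n_windows * n_electrodes * n_time).reshape(
    n_sessions, n_windows, n_electrodes, n_time
)

groups_toy = np.repeat(np.arange(1, n_sessions + 1), n_windows)
x_flat = x_toy.reshape(n_sessions * n_windows, n_electrodes, n_time)

# With a row-major reshape, window w of session s (0-based) lands at
# flat index s * n_windows + w.
s, w = 2, 1
flat_index = s * n_windows + w
same_window = np.array_equal(x_flat[flat_index], x_toy[s, w])
session_label = int(groups_toy[flat_index])  # sessions are labelled from 1

print(flat_index, same_window, session_label)
```

The same reshape would presumably be applied to y_windows so that the labels stay aligned with the flattened windows.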


In [12]:
from sklearn.model_selection import LeaveOneGroupOut

logo = LeaveOneGroupOut()