## Competition description

<a href=https://www.kaggle.com/competitions/tlvmc-parkinsons-freezing-gait-prediction/overview>This Kaggle competition</a> is about detecting episodes of freezing of gait (FOG) in people with Parkinson's disease. The task is to locate the start and end of each freezing episode; in practice this means predicting, for every time step of a recording, the probability of each event type. There are three types of freezing: Start Hesitation (freezing when initiating movement), Turn (freezing while turning) and Walking (freezing while walking). Submissions are scored by average precision computed separately for each event type, and the three values are then averaged (mean average precision).
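A minimal sketch of the scoring rule described above, assuming per-time-step 0/1 labels and predicted probabilities arranged as three columns (StartHesitation, Turn, Walking). The arrays are made-up toy values, not competition data, and the official Kaggle scorer may differ in details.

```python
import numpy as np
from sklearn.metrics import average_precision_score

EVENTS = ["StartHesitation", "Turn", "Walking"]

def mean_average_precision(y_true, y_score):
    """Average precision per event type, then the mean over event types.

    y_true, y_score: arrays of shape (n_timesteps, 3), columns ordered as EVENTS.
    """
    per_event = [average_precision_score(y_true[:, k], y_score[:, k])
                 for k in range(y_true.shape[1])]
    return float(np.mean(per_event)), dict(zip(EVENTS, per_event))

# Toy example (hypothetical values): every positive is ranked above every negative
y_true = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 1, 1]])
y_score = np.array([[0.1, 0.2, 0.9],
                    [0.8, 0.1, 0.3],
                    [0.3, 0.7, 0.2],
                    [0.6, 0.9, 0.8]])
score, per_event = mean_average_precision(y_true, y_score)
print(score, per_event)
```

Because average precision only depends on how positives are ranked against negatives, a perfect ranking in every column yields a score of 1.0 regardless of the absolute probability values.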

## Dataset description

The dataset consists of 3D accelerometer recordings from patients with freezing-of-gait symptoms. It includes two subsets:

1. **tdcsfog**: data collected in the lab while subjects performed a FOG-provoking protocol.
2. **defog**: data collected in the subjects' homes, also while they performed a FOG-provoking protocol.

The **tdcsfog** and **defog** subsets were annotated by experts, who marked the start, end and type of each episode.

The metadata files describe the subjects, their condition (for example, medication state) and the recorded trials.

## Goal

The goal is to build a model that detects FOG episodes from the accelerometer data, together with additional information from the metadata.

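## Solution
This solution uses the tsfel library to compute features over consecutive windows of the signal (500 samples per window), both on the three acceleration axes and on velocities obtained by integrating them. The feature configuration used here contains only temporal-domain features, such as absolute energy, area under the curve, mean and median (absolute) differences, neighbourhood peak count, peak-to-peak distance and slope. LightGBM is used as the classifier: one binary model per event type, trained separately for the defog and tdcsfog subsets.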
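To make the windowing idea concrete, here is a minimal numpy sketch of window-based temporal features. It is our own simplified reimplementation of three features from the configuration used below (absolute energy, mean absolute difference, peak-to-peak distance), not tsfel's code; tsfel's exact definitions and its handling of a trailing partial window may differ (this sketch simply drops it). With 500-sample windows, one window spans 5 s of defog data (100 Hz) and about 3.9 s of tdcsfog data (128 Hz).

```python
import numpy as np

def window_features(signal, window_size):
    """Split a 1-D signal into consecutive non-overlapping windows and compute
    a few temporal features per window (a trailing partial window is dropped)."""
    n_windows = len(signal) // window_size
    windows = np.asarray(signal[: n_windows * window_size], dtype=float)
    windows = windows.reshape(n_windows, window_size)
    return {
        # Sum of squared samples in each window
        "abs_energy": np.sum(windows ** 2, axis=1),
        # Mean of |x[t+1] - x[t]| inside each window
        "mean_abs_diff": np.mean(np.abs(np.diff(windows, axis=1)), axis=1),
        # Range of the signal inside each window
        "pk_pk_distance": windows.max(axis=1) - windows.min(axis=1),
    }

feats = window_features(np.array([0., 1., 0., 1., 2., 2., 2., 2.]), window_size=4)
print(feats)
```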

### Result: 139th place out of 1379

<a id="0"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 0. Importing libraries and loading data</b></div>

In [None]:
import time, sys
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 

import pandas as pd
import seaborn as sns
import numpy as np
from glob import glob

import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
import lightgbm as lgb

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, ConfusionMatrixDisplay, precision_score, make_scorer, recall_score, average_precision_score


!pip -q install --no-index --no-deps ../input/tsfel-15-whl/wheelhouse/*.whl

import tsfel

In [None]:
data_dir = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/'

daily_metadata = pd.read_csv(f'{data_dir}daily_metadata.csv')
defog_metadata = pd.read_csv(f'{data_dir}defog_metadata.csv')
tdcsfog_metadata = pd.read_csv(f'{data_dir}tdcsfog_metadata.csv')

events_data = pd.read_csv(f'{data_dir}events.csv')
subjects_data = pd.read_csv(f'{data_dir}subjects.csv')
tasks_data = pd.read_csv(f'{data_dir}tasks.csv')

<a id="1"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 1. Helper functions </b></div>

In [None]:
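def easy_read(path):
    """Read one recording and add its file Id and relative time position."""
    df = pd.read_csv(path)
    # File name without extension, e.g. ".../abc123.csv" -> "abc123"
    df["Id"] = path.split("/")[-1].split(".")[0]
    # Position of each sample in the recording, scaled to [0, 1]
    # (assuming Time runs from 0 to len(df) - 1)
    df['time_frac'] = df['Time'] / (len(df) - 1)
    
    return df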

In [None]:
# Estimate velocity by integrating acceleration over time
def vel_gen(df, dt):
    
    aV = df['AccV'].values
    aML = df['AccML'].values
    aAP = df['AccAP'].values
    
    vV = np.zeros(aV.shape[0])
    vML = np.zeros(aML.shape[0])
    vAP = np.zeros(aAP.shape[0])

    v_mean = df['AccV'].mean()
    ml_mean = df['AccML'].mean()
    ap_mean = df['AccAP'].mean()
    
    # Integrate the mean-centred acceleration (cumulative sum times dt) to obtain velocity
    for i in range(1, vV.shape[0]):
        vV[i] = vV[i - 1] + (aV[i] - v_mean) * dt
        vML[i] = vML[i - 1] + (aML[i] - ml_mean) * dt
        vAP[i] = vAP[i - 1] + (aAP[i] - ap_mean) * dt

        
    df['VelV'] = vV
    df['VelML'] = vML
    df['VelAP'] = vAP
        
    return df
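`vel_gen` integrates each axis with an explicit Python loop, which is slow on long recordings. The sketch below (the name `vel_gen_vectorized` is hypothetical) computes the same recurrence with `np.cumsum`: velocity is 0 at the first sample, and each later sample adds `(a[i] - mean(a)) * dt`. Subtracting the mean removes the constant component of the signal (for example gravity on the vertical axis), so the velocity does not grow linearly with time, although some drift may remain. It assumes the acceleration columns contain no missing values.

```python
import numpy as np
import pandas as pd

def vel_gen_vectorized(df, dt):
    """Same result as the loop in vel_gen: cumulative sum of the mean-centred
    acceleration, scaled by dt, with the velocity fixed to 0 at the first sample."""
    for acc, vel in [("AccV", "VelV"), ("AccML", "VelML"), ("AccAP", "VelAP")]:
        a = df[acc].to_numpy(dtype=float)
        increments = (a[1:] - a.mean()) * dt
        df[vel] = np.concatenate(([0.0], np.cumsum(increments)))
    return df

# Usage on a tiny hypothetical recording
toy = pd.DataFrame({"AccV": [1.0, 2.0, 3.0],
                    "AccML": [0.0, 0.0, 0.0],
                    "AccAP": [1.0, 1.0, 4.0]})
toy = vel_gen_vectorized(toy, dt=0.5)
print(toy)
```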

In [None]:
cfg_file = {'temporal': {'Absolute energy': {'complexity': 'log',
   'description': 'Computes the absolute energy of the signal.',
   'function': 'tsfel.abs_energy',
   'parameters': '',
   'n_features': 1,
   'use': 'yes',
   'tag': 'audio'},
  'Area under the curve': {'complexity': 'log',
   'description': 'Computes the area under the curve of the signal computed with trapezoid rule.',
   'function': 'tsfel.auc',
   'parameters': {'fs': 100},
   'n_features': 1,
   'use': 'yes'},
  'Centroid': {'complexity': 'constant',
   'description': 'Computes the centroid along the time axis.',
   'function': 'tsfel.calc_centroid',
   'parameters': {'fs': 100},
   'n_features': 1,
   'use': 'yes'},
  'Mean absolute diff': {'complexity': 'constant',
   'description': 'Computes mean absolute differences of the signal.',
   'function': 'tsfel.mean_abs_diff',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Mean diff': {'complexity': 'constant',
   'description': 'Computes mean of differences of the signal.',
   'function': 'tsfel.mean_diff',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Median absolute diff': {'complexity': 'constant',
   'description': 'Computes median absolute differences of the signal.',
   'function': 'tsfel.median_abs_diff',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Median diff': {'complexity': 'constant',
   'description': 'Computes median of differences of the signal.',
   'function': 'tsfel.median_diff',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Neighbourhood peaks': {'complexity': 'constant',
   'description': 'Computes the number of peaks from a defined neighbourhood of the signal.',
   'function': 'tsfel.neighbourhood_peaks',
   'parameters': {'n': 10},
   'n_features': 1,
   'use': 'yes'},
  'Peak to peak distance': {'complexity': 'constant',
   'description': 'Computes the peak to peak distance.',
   'function': 'tsfel.pk_pk_distance',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Signal distance': {'complexity': 'constant',
   'description': 'Computes signal traveled distance.',
   'function': 'tsfel.distance',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Slope': {'complexity': 'log',
   'description': 'Computes the slope of the signal by fitting a linear equation to the observed data.',
   'function': 'tsfel.slope',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'},
  'Sum absolute diff': {'complexity': 'constant',
   'description': 'Computes sum of absolute differences of the signal.',
   'function': 'tsfel.sum_abs_diff',
   'parameters': '',
   'n_features': 1,
   'use': 'yes'}}}

# Extract temporal features per window with tsfel and attach them to every row of that window
def add_f(data, w_size, f_s):
    signals = data[['AccV', 'AccML', 'AccAP', 'VelV', 'VelML', 'VelAP']]
    df = tsfel.time_series_features_extractor(cfg_file, signals, verbose=0, fs=f_s, window_size=w_size)
    # Row k of the feature table describes the window that starts at sample k * w_size
    df.index = np.arange(len(df)) * w_size
    data = data.merge(df, how="left", left_index=True, right_index=True)
    # Propagate each window's features to the remaining rows of the window
    data = data.ffill()
    
    return data
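The alignment step in `add_f` can be checked in isolation without tsfel. The sketch below uses a hypothetical helper `attach_window_features` and toy data: window k is placed on row `k * window_size` and forward-filled. It assumes `data` has a default 0..n-1 index, as produced by `pd.read_csv`. One consequence worth noting: rows after the last complete window inherit the features of the previous window.

```python
import numpy as np
import pandas as pd

def attach_window_features(data, window_feats, window_size):
    """Attach per-window features to per-sample rows: window k is aligned to row
    k * window_size and its values are forward-filled over the rest of the window."""
    window_feats = window_feats.copy()
    window_feats.index = np.arange(len(window_feats)) * window_size
    merged = data.merge(window_feats, how="left", left_index=True, right_index=True)
    return merged.ffill()

# 7 samples, windows of 3 samples -> 2 complete windows plus 1 leftover sample
data = pd.DataFrame({"AccV": np.arange(7.0)})
feats = pd.DataFrame({"f": [10.0, 20.0]})
out = attach_window_features(data, feats, window_size=3)
print(out)
```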

<a id="2"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 2. Working with the defog dataset </b></div>
<a id="2.1"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> 2.1 Preparing data for training</div>

In [None]:
paths = glob(f"/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/defog/*")
train_defog_df = pd.concat([add_f(vel_gen(easy_read(p), 1/100), 500, 100) for p in tqdm(paths)], axis=0)
train_defog_df = train_defog_df.merge(defog_metadata, left_on='Id', right_on='Id')
train_defog_df = train_defog_df.merge(subjects_data, left_on=['Subject', 'Visit'], right_on=['Subject', 'Visit'])
    
    
train_defog_df = train_defog_df[(train_defog_df['Valid'] == True) & (train_defog_df['Task'] == True)]
train_defog_df = train_defog_df.drop(['Valid', 'Task', 'Id', 'Subject'], axis=1)

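# Integer-encode the categorical columns. The encoder names are swapped relative to
# the columns they encode; defog_sub applies them the same way, so the mapping stays consistent.
sex_encoder = LabelEncoder()
medication_encoder = LabelEncoder()

train_defog_df["Medication"] = sex_encoder.fit_transform(train_defog_df["Medication"])
train_defog_df["Sex"] = medication_encoder.fit_transform(train_defog_df["Sex"])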
train_defog_df = train_defog_df.reset_index(drop=True)

In [None]:
cols = [c for c in train_defog_df.columns if c not in ['StartHesitation', 'Turn' , 'Walking']]
pcols = ['StartHesitation', 'Turn' , 'Walking']

best_params_ = {'colsample_bytree': 0.5282057895135501,
 'max_depth': 8,
 'min_child_weight': 3.1233911067827616,
 'n_estimators': 200,
 'subsample': 0.9961057796456088,
 }

# Custom LightGBM evaluation metric: returns (name, value, is_higher_better)
def custom_average_precision(y_true, y_pred):
    score = average_precision_score(y_true, y_pred)
    return 'average_precision', score, True

X_defog_train, X_defog_test, y_defog_train, y_defog_test= train_test_split(train_defog_df[cols],
                                                                           train_defog_df[pcols],test_size=0.3, random_state=42)

del train_defog_df
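`train_test_split` above shuffles individual rows, so neighbouring samples of the same recording (which share identical window features after the forward fill) end up in both the training and the validation split; the validation AP is therefore likely optimistic. A common alternative, sketched below under the assumption that a group column such as the recording `Id` or `Subject` is kept until the split (in this notebook both are dropped beforehand), is a group-wise split with scikit-learn's `GroupShuffleSplit`. The helper name `split_by_group` is hypothetical. Note that `test_size` here is a fraction of groups, not of rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def split_by_group(X, y, groups, test_size=0.3, random_state=42):
    """Split rows so that every group lies entirely in train or entirely in test."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(X, y, groups=groups))
    return X.iloc[train_idx], X.iloc[test_idx], y.iloc[train_idx], y.iloc[test_idx]

# Toy usage: 5 hypothetical recordings with 4 samples each
X = pd.DataFrame({"x": np.arange(20)})
y = pd.Series(np.arange(20) % 2)
groups = np.repeat(["a", "b", "c", "d", "e"], 4)
X_tr, X_te, y_tr, y_te = split_by_group(X, y, groups)
print(sorted(set(groups[X_tr.index.to_numpy()])), sorted(set(groups[X_te.index.to_numpy()])))
```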

<a id="2.2"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> 2.2 Model training and test-set predictions</div>

In [None]:
aps = []
models = []

for label in pcols:
    
    y_tr = y_defog_train[label]
    y_ts = y_defog_test[label]
    # Up-weight the rare positive class by sqrt(#negatives / #positives)
    scale_pos_weight = np.sqrt(y_tr.value_counts()[0] / y_tr.value_counts()[1])
    classifier = lgb.LGBMClassifier(objective="binary", scale_pos_weight=scale_pos_weight, max_depth=9)
    
    # Early stopping monitors the metrics on the held-out split
    classifier.fit(
        X_defog_train, y_tr,
        eval_set=[(X_defog_test, y_ts)],
        eval_metric=custom_average_precision,
        callbacks=[lgb.early_stopping(20)]
    )

    # Sigmoid (Platt) calibration of the already fitted model, fitted here on the training split
    calibrated_classifier = CalibratedClassifierCV(classifier, cv='prefit', method='sigmoid')
    calibrated_classifier.fit(X_defog_train, y_tr)

    # Average precision needs scores (probabilities), not hard 0/1 predictions
    proba = calibrated_classifier.predict_proba(X_defog_test)[:, 1]
    ap = average_precision_score(y_ts, proba)
    print("average precision for {} = {}".format(label, ap))
    print("recall for {} = {}".format(label, recall_score(y_ts, calibrated_classifier.predict(X_defog_test))))
    aps.append(ap)
    models.append(calibrated_classifier)

print('ap for 3 labels = {}'.format(np.mean(aps)))
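The toy example below illustrates two details of the training loop: the class-imbalance weight `sqrt(#negatives / #positives)` passed as `scale_pos_weight`, and why average precision should be computed from predicted probabilities rather than from the 0/1 output of `.predict()`. Average precision is a ranking metric, and thresholding the scores at 0.5 throws away most of the ranking. The labels and scores are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import average_precision_score

# Class-imbalance weight as used above: sqrt(#negatives / #positives)
y_tr = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
counts = y_tr.value_counts()
scale_pos_weight = np.sqrt(counts[0] / counts[1])    # sqrt(8 / 2) = 2.0

# Average precision computed from scores vs. from thresholded labels
y_true = np.array([0, 0, 1, 1])
proba = np.array([0.1, 0.4, 0.35, 0.8])
hard = (proba >= 0.5).astype(int)                    # what .predict() returns at the 0.5 threshold
ap_scores = average_precision_score(y_true, proba)   # 5/6 ≈ 0.833
ap_hard = average_precision_score(y_true, hard)      # 0.75
print(scale_pos_weight, ap_scores, ap_hard)
```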


In [None]:
def defog_sub(Id):
  
    test_defog_df = pd.read_csv(f'{data_dir}test/defog/{Id}.csv')
    test_defog_df['Id'] = Id

    test_defog_df['time_frac'] = test_defog_df['Time'] / (len(test_defog_df) - 1)

    
    test_defog_df = add_f(vel_gen(test_defog_df, 1/100), 500, 100)

    
    test_defog_df = test_defog_df.merge(defog_metadata, left_on='Id', right_on='Id')
    test_defog_df = test_defog_df.merge(subjects_data, left_on=['Subject', 'Visit'], right_on=['Subject', 'Visit'])

    test_defog_df = test_defog_df.drop(['Id', 'Subject'], axis=1)    

    test_defog_df["Medication"] = sex_encoder.transform(test_defog_df["Medication"])
    test_defog_df["Sex"] = medication_encoder.transform(test_defog_df["Sex"])
            
        
    # Submission Ids have the form "<file Id>_<Time>"; one probability column per event type
    defog_submit = pd.DataFrame({'Id': Id + '_' + test_defog_df['Time'].astype(str)})
    for i, label in enumerate(pcols):
        pred = models[i].predict_proba(test_defog_df[cols])[:, 1]
        defog_submit[label] = pred
    
                
    return defog_submit


In [None]:
pathh = glob(f"/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/defog/*")
defog_submit = pd.concat([defog_sub(p.split("/")[-1].split(".")[0]) for p in pathh], axis=0)

<a id="3"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 3. Working with the tdcsfog dataset </b></div>
<a id="3.1"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> 3.1 Preparing data for training</div>

In [None]:
paths = glob(f"/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/tdcsfog/*")
train_tdcsfog_df = pd.concat([add_f(vel_gen(easy_read(p), 1/128), 500, 128) for p in paths], axis=0)
train_tdcsfog_df = train_tdcsfog_df.merge(tdcsfog_metadata, left_on='Id', right_on='Id')
train_tdcsfog_df = train_tdcsfog_df.merge(subjects_data, left_on=['Subject'], right_on=['Subject'])

        
train_tdcsfog_df = train_tdcsfog_df.drop(['Id', 'Subject', 'Visit_x', 'Visit_y'], axis=1)


sex_encoder = LabelEncoder()
medication_encoder = LabelEncoder()


train_tdcsfog_df["Medication"] = sex_encoder.fit_transform(train_tdcsfog_df["Medication"])
train_tdcsfog_df["Sex"] = medication_encoder.fit_transform(train_tdcsfog_df["Sex"])
train_tdcsfog_df = train_tdcsfog_df.reset_index(drop=True)

In [None]:
cols = [c for c in train_tdcsfog_df.columns if c not in ['StartHesitation', 'Turn' , 'Walking']]
pcols = ['StartHesitation', 'Turn' , 'Walking']


X_tdcsfog_train, X_tdcsfog_test, y_tdcsfog_train, y_tdcsfog_test = train_test_split(train_tdcsfog_df[cols],train_tdcsfog_df[pcols],test_size=0.3, random_state=42)

del train_tdcsfog_df

<a id="3.2"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> 3.2 Model training and test-set predictions</div>

In [None]:
aps = []
models = []

for label in pcols:
    
    y_tr = y_tdcsfog_train[label]
    y_ts = y_tdcsfog_test[label]
    # Up-weight the rare positive class by sqrt(#negatives / #positives)
    scale_pos_weight = np.sqrt(y_tr.value_counts()[0] / y_tr.value_counts()[1])
    classifier = lgb.LGBMClassifier(objective="binary", scale_pos_weight=scale_pos_weight, max_depth=9)
    
    # Early stopping monitors the metrics on the held-out split
    classifier.fit(
        X_tdcsfog_train, y_tr,
        eval_set=[(X_tdcsfog_test, y_ts)],
        eval_metric=custom_average_precision,
        callbacks=[lgb.early_stopping(20)]
    )

    # Sigmoid (Platt) calibration of the already fitted model, fitted here on the training split
    calibrated_classifier = CalibratedClassifierCV(classifier, cv='prefit', method='sigmoid')
    calibrated_classifier.fit(X_tdcsfog_train, y_tr)

    # Average precision needs scores (probabilities), not hard 0/1 predictions
    proba = calibrated_classifier.predict_proba(X_tdcsfog_test)[:, 1]
    ap = average_precision_score(y_ts, proba)
    print("average precision for {} = {}".format(label, ap))
    print("recall for {} = {}".format(label, recall_score(y_ts, calibrated_classifier.predict(X_tdcsfog_test))))
    aps.append(ap)
    models.append(calibrated_classifier)

print('ap for 3 labels = {}'.format(np.mean(aps)))

In [None]:
del X_tdcsfog_train
del X_tdcsfog_test
del y_tdcsfog_train
del y_tdcsfog_test

In [None]:
def tdcsfog_sub(Id):
    test_tdcsfog_df = pd.read_csv(f'{data_dir}test/tdcsfog/{Id}.csv')
    test_tdcsfog_df['Id'] = Id

    test_tdcsfog_df['time_frac'] = test_tdcsfog_df['Time'] / (len(test_tdcsfog_df) - 1)
    
    test_tdcsfog_df = add_f(vel_gen(test_tdcsfog_df, 1/128), 500, 128)

    
    test_tdcsfog_df = test_tdcsfog_df.merge(tdcsfog_metadata, left_on='Id', right_on='Id')
    test_tdcsfog_df = test_tdcsfog_df.merge(subjects_data, left_on=['Subject'], right_on=['Subject'])
    
    test_tdcsfog_df = test_tdcsfog_df.drop(['Id', 'Subject', 'Visit_x', 'Visit_y'], axis=1)    

    test_tdcsfog_df["Medication"] = sex_encoder.transform(test_tdcsfog_df["Medication"])
    test_tdcsfog_df["Sex"] = medication_encoder.transform(test_tdcsfog_df["Sex"])
        
        
    # Submission Ids have the form "<file Id>_<Time>"; one probability column per event type
    tdcsfog_submit = pd.DataFrame({'Id': Id + '_' + test_tdcsfog_df['Time'].astype(str)})
    for i, label in enumerate(pcols):
        pred = models[i].predict_proba(test_tdcsfog_df[cols])[:, 1]
        tdcsfog_submit[label] = pred

    return tdcsfog_submit


In [None]:
pathh = glob(f"/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/tdcsfog/*")
tdcsfog_submit = pd.concat([tdcsfog_sub(p.split("/")[-1].split(".")[0]) for p in pathh], axis=0)

In [None]:
submit = pd.concat([tdcsfog_submit, defog_submit], axis=0)
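Before writing the file, a few cheap sanity checks can catch problems such as duplicated Ids, missing columns or NaN predictions. The helper below (`check_submission`) is a hypothetical sketch, not part of any Kaggle API; it only checks the column layout used in this notebook (`Id` plus the three event columns) and that the predictions are valid probabilities.

```python
import numpy as np
import pandas as pd

EVENT_COLUMNS = ["StartHesitation", "Turn", "Walking"]

def check_submission(submit):
    """Return a list of detected problems (an empty list means all checks passed)."""
    missing = [c for c in ["Id"] + EVENT_COLUMNS if c not in submit.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if submit["Id"].duplicated().any():
        problems.append("duplicate Ids")
    probs = submit[EVENT_COLUMNS].to_numpy(dtype=float)
    if np.isnan(probs).any():
        problems.append("missing predictions")
    # NaN compares as False, so only finite out-of-range values are reported here
    if ((probs < 0) | (probs > 1)).any():
        problems.append("probabilities outside [0, 1]")
    return problems

# Toy usage with a hypothetical two-row submission
good = pd.DataFrame({"Id": ["a_0", "a_1"], "StartHesitation": [0.1, 0.2],
                     "Turn": [0.3, 0.4], "Walking": [0.0, 1.0]})
print(check_submission(good))
```

In the notebook, `assert not check_submission(submit)` could be run right before `submit.to_csv(...)`.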


In [None]:
submit.to_csv('submission.csv', index=False)