<a href="https://www.kaggle.com/code/mohdmuttalib/parkinson-s-freezing-of-gait-prediction-eda?scriptVersionId=132688938" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<a id="0"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 0. Import all dependencies </b></div>

In [1]:
import os

import pandas as pd
import plotly.express as px

In [2]:
class color:
    """ANSI escape codes used to highlight values in printed output."""
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'  # resets all styling

<a id="1"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 1. Overview of available directories </b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ The data series include three datasets, collected under distinct circumstances:</p>

* <p style="font-family: consolas; font-size: 16px;">The <b>tDCS FOG</b> (<code>tdcsfog</code>) dataset, comprising data series collected in the lab, as subjects completed a FOG-provoking protocol.</p>
* <p style="font-family: consolas; font-size: 16px;">The <b>DeFOG</b> (<code>defog</code>) dataset, comprising data series collected in the subject's home, as subjects completed a FOG-provoking protocol.</p>
* <p style="font-family: consolas; font-size: 16px;">The <b>Daily Living</b> (<code>daily</code>) dataset, comprising one week of continuous 24/7 recordings from sixty-five subjects. Forty-five subjects exhibit FOG symptoms and also have series in the <code>defog</code> dataset, while the other twenty subjects do not exhibit FOG symptoms and do not have series elsewhere in the data.</p>


<p style="font-family: consolas; font-size: 16px;">⚪ Trials from the <code>tdcsfog</code> and <code>defog</code> datasets were videotaped and annotated by expert reviewers, who documented the freezing of gait episodes. That is, the start, end and type of each episode were marked by the experts. Series in the <code>daily</code> dataset are unannotated.</p>
<p style="font-family: consolas; font-size: 16px;">⚪ You will be detecting FOG episodes for the <code>tdcsfog</code> and <code>defog</code> series. You may wish to apply unsupervised or semi-supervised methods to the series in the <code>daily</code> dataset to support your detection modelling.</p>
<p style="font-family: consolas; font-size: 16px;">🔴 See this page for more on these datasets as well as video examples of freezing of gait events: <a href="https://www.kaggle.com/competitions/tlvmc-parkinsons-freezing-gait-prediction/overview/additional-data-documentation"><strong>Additional Data Documentation</strong></a>.</p>

<a id="1.1"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 1.1 Overview <i>train/</i> directory</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>train/</b> Folder containing the data series in the training set within three subfolders: <b>tdcsfog/</b>, <b>defog/</b>, and <b>notype/</b>. Series in the notype folder are from the <code>defog</code> dataset but lack event-type annotations. The fields present in these series vary by folder.</p>

* <p style="font-family: consolas; font-size: 16px;"><code>Time</code> An integer timestep. Series from the <code>tdcsfog</code> dataset are recorded at 128Hz (128 timesteps per second), while series from the <code>defog</code> and <code>daily</code> datasets are recorded at 100Hz (100 timesteps per second).</p>
* <p style="font-family: consolas; font-size: 16px;"><code>AccV</code>, <code>AccML</code>, and <code>AccAP</code> Acceleration from a lower-back sensor on three axes: V - vertical, ML - mediolateral, AP - anteroposterior. Data is in units of m/s^2 for <b>tdcsfog/</b> and in units of g for <b>defog/</b> and <b>notype/</b>.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>StartHesitation</code>, <code>Turn</code>, <code>Walking</code> Indicator variables for the occurrence of each of the event types.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Event</code> Indicator variable for the occurrence of any FOG-type event. Present only in the <b>notype</b> series, which lack type-level annotations.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Valid</code> During video annotation, it was sometimes hard for the annotator to decide whether there was an akinetic (i.e., essentially no movement) FoG or the subject stopped voluntarily. Only event annotations where <code>Valid</code> is <b>true</b> should be considered unambiguous.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Task</code> Series were only annotated where this value is <b>true</b>. Portions marked <b>false</b> should be considered unannotated.</p>
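
<p style="font-family: consolas; font-size: 16px;">⚪ The field descriptions above can be turned into two small helpers: one converts the integer <code>Time</code> timestep into seconds using the stated sampling rates, the other keeps only rows that are both annotated (<code>Task</code>) and unambiguous (<code>Valid</code>). This is a minimal sketch; the helper names <code>add_seconds</code> and <code>usable_rows</code> and the toy values are assumptions made for illustration, not part of the competition data.</p>

```python
import pandas as pd

# Sampling rates from the field list above, in timesteps per second.
# "notype" series come from the defog dataset, so they share its rate.
SAMPLE_RATE_HZ = {"tdcsfog": 128, "defog": 100, "notype": 100, "daily": 100}


def add_seconds(df: pd.DataFrame, dataset: str) -> pd.DataFrame:
    """Return a copy of df with a Seconds column derived from the Time timestep."""
    out = df.copy()
    out["Seconds"] = out["Time"] / SAMPLE_RATE_HZ[dataset]
    return out


def usable_rows(df: pd.DataFrame) -> pd.Series:
    """Boolean mask of rows that are annotated (Task) and unambiguous (Valid)."""
    return df["Task"].astype(bool) & df["Valid"].astype(bool)


# Toy frame with made-up values, only to show the call pattern.
toy = pd.DataFrame({
    "Time": [0, 100, 200, 300],
    "Valid": [True, True, False, True],
    "Task": [True, False, True, True],
})
toy = add_seconds(toy, "defog")
mask = usable_rows(toy)
print(toy.loc[mask, ["Time", "Seconds"]])
```

<p style="font-family: consolas; font-size: 16px;">⚪ Only the <b>defog/</b> series carry <code>Valid</code> and <code>Task</code> in this description, so the mask would apply to those files.</p>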

<p style="font-family: consolas; font-size: 16px;">❔ Let's check which directories are in the <i>train/</i> folder.</p>

In [3]:
os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train")

['defog', 'tdcsfog', 'notype']

In [4]:
temp = len(os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/tdcsfog"))
print(
    f"Number of files in folder tdcsfog/: {color.BLUE}{temp}{color.END}",
)

Number of files in folder tdcsfog/: 833


<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [5]:
train_tdcsfog_example_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/tdcsfog/003f117e14.csv")

In [6]:
temp = len(train_tdcsfog_example_df)
print(
    f"Length of dataframe: {color.BLUE}{temp}{color.END}",
)

Length of dataframe: 4682


In [7]:
train_tdcsfog_example_df.head()

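|   | Time | AccV | AccML | AccAP | StartHesitation | Turn | Walking |
|---|---|---|---|---|---|---|---|
| 0 | 0 | -9.533939 | 0.566322 | -1.413525 | 0 | 0 | 0 |
| 1 | 1 | -9.53614 | 0.564137 | -1.440621 | 0 | 0 | 0 |
| 2 | 2 | -9.529345 | 0.561765 | -1.429332 | 0 | 0 | 0 |
| 3 | 3 | -9.531239 | 0.564227 | -1.41549 | 0 | 0 | 0 |
| 4 | 4 | -9.540825 | 0.561854 | -1.429471 | 0 | 0 | 0 |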


In [8]:
train_tdcsfog_example_df.describe()

|   | Time | AccV | AccML | AccAP | StartHesitation | Turn | Walking |
|---|---|---|---|---|---|---|---|
| count | 4682.0 | 4682.0 | 4682.0 | 4682.0 | 4682.0 | 4682.0 | 4682.0 |
| mean | 2340.5 | -9.151214 | 0.753518 | 2.471637 | 0.0 | 0.168304 | 0.0 |
| std | 1351.72131 | 1.38439 | 1.102125 | 2.239906 | 0.0 | 0.374176 | 0.0 |
| min | 0.0 | -23.796051 | -9.09737 | -7.353417 | 0.0 | 0.0 | 0.0 |
| 25% | 1170.25 | -9.537719 | 0.322877 | 1.966646 | 0.0 | 0.0 | 0.0 |
| 50% | 2340.5 | -9.234702 | 0.580891 | 3.137857 | 0.0 | 0.0 | 0.0 |
| 75% | 3510.75 | -8.47046 | 1.368355 | 3.819931 | 0.0 | 0.0 | 0.0 |
| max | 4681.0 | -3.91559 | 5.996704 | 10.28108 | 0.0 | 1.0 | 0.0 |


<p style="font-family: consolas; font-size: 16px;">❔ Are there any NaN values in the dataframe?</p>

In [9]:
train_tdcsfog_example_df.isnull().sum()

Time               0
AccV               0
AccML              0
AccAP              0
StartHesitation    0
Turn               0
Walking            0
dtype: int64

<p style="font-family: consolas; font-size: 16px;">❔ How many files are in the <i>defog/</i> folder?</p>

In [10]:
temp = len(os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/defog"))
print(
    f"Number of files in folder defog/: {color.BLUE}{temp}{color.END}",
)

Number of files in folder defog/: 91


<p style="font-family: consolas; font-size: 16px;">❔ How many files are in the <i>notype/</i> folder?</p>

In [11]:
temp = len(os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/notype"))
print(
    f"Number of files in folder notype/: {color.BLUE}{temp}{color.END}",
)

Number of files in folder notype/: 46
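
<p style="font-family: consolas; font-size: 16px;">⚪ The three counting cells above repeat the same pattern. As a compact alternative, the sketch below counts the files in every immediate subfolder in one call. The helper name <code>count_files</code> is an assumption, and the demo runs on a throwaway temporary directory rather than on the competition data.</p>

```python
import tempfile
from pathlib import Path


def count_files(root):
    """Map each immediate subfolder of root to the number of files it contains."""
    root = Path(root)
    return {
        sub.name: sum(1 for p in sub.iterdir() if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Demo on a temporary directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    for folder, n_files in {"defog": 2, "tdcsfog": 3}.items():
        (Path(tmp) / folder).mkdir()
        for i in range(n_files):
            (Path(tmp) / folder / f"{i}.csv").touch()
    demo_counts = count_files(tmp)

print(demo_counts)

# On Kaggle the same helper would be pointed at the train/ path used above:
# count_files("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train")
```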


<a id="1.2"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 1.2 Overview <i>test/</i> directory</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>test/</b> Only the <code>Time</code>, <code>AccV</code>, <code>AccML</code>, and <code>AccAP</code> fields are provided for the test series.</p>


In [12]:
os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test")

['defog', 'tdcsfog']

<p style="font-family: consolas; font-size: 16px;">❔ How many files are in the <i>tdcsfog/</i> folder?</p>

In [13]:
temp = len(os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/tdcsfog"))
print(
    f"Number of files in folder tdcsfog/: {color.BLUE}{temp}{color.END}",
)

Number of files in folder tdcsfog/: 1


<p style="font-family: consolas; font-size: 16px;">❔ How many files are in the <i>defog/</i> folder?</p>

In [14]:
temp = len(os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/defog"))
print(
    f"Number of files in folder defog/: {color.BLUE}{temp}{color.END}",
)

Number of files in folder defog/: 1


<p style="font-family: consolas; font-size: 16px;">❔ What does the test data look like?</p>

In [15]:
test_tdcsfog_example_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/tdcsfog/003f117e14.csv")

In [16]:
temp = len(test_tdcsfog_example_df)
print(
    f"Length of dataframe: {color.BLUE}{temp}{color.END}",
)

Length of dataframe: 4682


In [17]:
test_tdcsfog_example_df.head()

|   | Time | AccV | AccML | AccAP |
|---|---|---|---|---|
| 0 | 0 | -9.533939 | 0.566322 | -1.413525 |
| 1 | 1 | -9.53614 | 0.564137 | -1.440621 |
| 2 | 2 | -9.529345 | 0.561765 | -1.429332 |
| 3 | 3 | -9.531239 | 0.564227 | -1.41549 |
| 4 | 4 | -9.540825 | 0.561854 | -1.429471 |


In [18]:
test_tdcsfog_example_df.describe()

|   | Time | AccV | AccML | AccAP |
|---|---|---|---|---|
| count | 4682.0 | 4682.0 | 4682.0 | 4682.0 |
| mean | 2340.5 | -9.151214 | 0.753518 | 2.471637 |
| std | 1351.72131 | 1.38439 | 1.102125 | 2.239906 |
| min | 0.0 | -23.796051 | -9.09737 | -7.353417 |
| 25% | 1170.25 | -9.537719 | 0.322877 | 1.966646 |
| 50% | 2340.5 | -9.234702 | 0.580891 | 3.137857 |
| 75% | 3510.75 | -8.47046 | 1.368355 | 3.819931 |
| max | 4681.0 | -3.91559 | 5.996704 | 10.28108 |


<a id="1.3"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 1.3 Overview <i>unlabeled/</i> directory</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>unlabeled/</b> Folder containing the unannotated data series from the <code>daily</code> dataset, one series per subject. Forty-five of the subjects also have series in the <code>defog</code> dataset, some in the training split and some in the test split.</p>


<p style="font-family: consolas; font-size: 16px;">❔ How many files are in the <i>unlabeled/</i> folder?</p>

In [19]:
temp = len(os.listdir("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/unlabeled"))
print(
    f"Number of files in folder unlabeled/: {color.BLUE}{temp}{color.END}",
)

Number of files in folder unlabeled/: 65


<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [20]:
unlabeled_example_df = pd.read_parquet("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/unlabeled/00c4c9313d.parquet")

In [21]:
unlabeled_example_df.head()

|   | Time | AccV | AccML | AccAP |
|---|---|---|---|---|
| 0 | 0 | 0.328125 | -0.109375 | 0.671875 |
| 1 | 1 | 0.453108 | -0.124721 | 0.811273 |
| 2 | 2 | 0.423042 | -0.264046 | 0.921238 |
| 3 | 3 | 0.150015 | -0.310241 | 0.937483 |
| 4 | 4 | -0.202003 | -0.545908 | 0.890842 |


In [22]:
unlabeled_example_df.describe()

|   | Time | AccV | AccML | AccAP |
|---|---|---|---|---|
| count | 69722390.0 | 69722390.0 | 69722390.0 | 69722390.0 |
| mean | 34861190.0 | -0.5277971 | -0.08455515 | -0.04769068 |
| std | 20127120.0 | 0.4339047 | 0.4878394 | 0.5178868 |
| min | 0.0 | -7.035982 | -6.183435 | -6.459452 |
| 25% | 17430600.0 | -0.953125 | -0.1875 | -0.328125 |
| 50% | 34861190.0 | -0.59375 | -0.015625 | -0.0625 |
| 75% | 52291790.0 | -0.07835454 | 0.1253366 | 0.2291177 |
| max | 69722390.0 | 2.936576 | 6.082396 | 7.138464 |
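
<p style="font-family: consolas; font-size: 16px;">⚪ The row count can be cross-checked against the "one week of continuous recording" description. A minimal worked calculation, assuming the 100Hz rate stated for the <code>daily</code> dataset and taking the row count as displayed by <code>describe()</code> (which rounds to seven significant digits):</p>

```python
# Row count as displayed by describe() above (rounded in the display).
n_rows = 69_722_390
sample_rate_hz = 100  # rate stated for the daily dataset

seconds = n_rows / sample_rate_hz
hours = seconds / 3600
days = hours / 24

print(f"{hours:.1f} hours, about {days:.2f} days")
# prints "193.7 hours, about 8.07 days"
```

<p style="font-family: consolas; font-size: 16px;">⚪ Roughly eight days for this subject is broadly in line with the week-long recording described in the overview.</p>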


<a id="2"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 2. Overview of available csv files </b></div>

<a id="2.1"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 2.1 Overview <i>tdcsfog_metadata.csv</i> file</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>tdcsfog_metadata.csv</b> Identifies each series in the <code>tdcsfog</code> dataset by a unique <code>Subject, Visit, Test, Medication</code> condition.</p>

* <p style="font-family: consolas; font-size: 16px;"><code>Visit</code> Lab visits consist of a baseline assessment, two post-treatment assessments for different treatment stages, and one follow-up assessment.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Test</code> Which of three test types was performed, with 3 the most challenging.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Medication</code> Subjects may have been either off or on anti-parkinsonian medication during the recording.</p>
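
<p style="font-family: consolas; font-size: 16px;">⚪ The claim that each series is identified by a unique <code>Subject, Visit, Test, Medication</code> combination can be checked directly. The sketch below is one way to do it; the helper name <code>duplicated_conditions</code> and the toy rows are hypothetical, and on Kaggle the helper would be applied to the loaded metadata dataframe instead.</p>

```python
import pandas as pd

KEY = ["Subject", "Visit", "Test", "Medication"]


def duplicated_conditions(metadata: pd.DataFrame, key=KEY) -> pd.DataFrame:
    """Rows whose key combination occurs more than once (empty if the key is unique)."""
    return metadata[metadata.duplicated(subset=key, keep=False)]


# Hypothetical toy rows shaped like tdcsfog_metadata.csv.
toy_meta = pd.DataFrame({
    "Id": ["a1", "b2", "c3"],
    "Subject": ["s1", "s1", "s2"],
    "Visit": [2, 2, 2],
    "Test": [1, 2, 1],
    "Medication": ["on", "on", "off"],
})

dupes = duplicated_conditions(toy_meta)
print(f"Rows sharing a condition with another row: {len(dupes)}")
```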

<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [23]:
tdcsfog_metadata_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/tdcsfog_metadata.csv") 

In [24]:
tdcsfog_metadata_df.head()

|   | Id | Subject | Visit | Test | Medication |
|---|---|---|---|---|---|
| 0 | 003f117e14 | 4dc2f8 | 3 | 2 | on |
| 1 | 009ee11563 | f62eec | 4 | 2 | on |
| 2 | 011322847a | 231c3b | 2 | 2 | on |
| 3 | 01d0fe7266 | 231c3b | 2 | 1 | off |
| 4 | 024418ba39 | fa8764 | 19 | 3 | on |


In [25]:
tdcsfog_metadata_df.describe()

|   | Visit | Test |
|---|---|---|
| count | 833.0 | 833.0 |
| mean | 6.460984 | 1.97479 |
| std | 6.171914 | 0.813402 |
| min | 2.0 | 1.0 |
| 25% | 2.0 | 1.0 |
| 50% | 4.0 | 2.0 |
| 75% | 5.0 | 3.0 |
| max | 20.0 | 3.0 |


<p style="font-family: consolas; font-size: 16px;">❔ What is the length of the dataframe?</p>

In [26]:
temp = len(tdcsfog_metadata_df)
print(
    f"Length of the tdcsfog_metadata.csv file is: {color.BLUE}{temp}{color.END}",
)

Length of the tdcsfog_metadata.csv file is: 833


<p style="font-family: consolas; font-size: 16px;">❔ How many unique subjects does the dataframe have?</p>

In [27]:
temp = len(tdcsfog_metadata_df.Subject.unique())
print(
    f"Number of unique subjects: {color.BLUE}{temp}{color.END}",
)

Number of unique subjects: 62


<p style="font-family: consolas; font-size: 16px;">❔ What does the data for a single subject look like?</p>

In [28]:
unique_subject_id = "13abfd"
tdcsfog_metadata_df[tdcsfog_metadata_df.Subject == unique_subject_id]

|   | Id | Subject | Visit | Test | Medication |
|---|---|---|---|---|---|
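
<p style="font-family: consolas; font-size: 16px;">⚪ The result above is empty, which suggests the hard-coded id does not occur in this file. One way to avoid that is to pick a subject id from the data itself, for example the subject with the most series. A minimal sketch on the five rows shown by <code>head()</code>; the helper name <code>most_recorded_subject</code> is an assumption.</p>

```python
import pandas as pd


def most_recorded_subject(metadata: pd.DataFrame) -> str:
    """Return the Subject id that appears in the largest number of rows."""
    return metadata["Subject"].value_counts().idxmax()


# The first rows of tdcsfog_metadata.csv, as shown by head() above.
sample = pd.DataFrame({
    "Id": ["003f117e14", "009ee11563", "011322847a", "01d0fe7266", "024418ba39"],
    "Subject": ["4dc2f8", "f62eec", "231c3b", "231c3b", "fa8764"],
    "Visit": [3, 4, 2, 2, 19],
    "Test": [2, 2, 2, 1, 3],
    "Medication": ["on", "on", "on", "off", "on"],
})

subject_id = most_recorded_subject(sample)
subject_rows = sample[sample["Subject"] == subject_id]
print(subject_rows)
```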


<p style="font-family: consolas; font-size: 16px;">❔ Is there any missing data in the dataframe?</p>

In [29]:
tdcsfog_metadata_df.isnull().sum()

Id            0
Subject       0
Visit         0
Test          0
Medication    0
dtype: int64

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Visit</code>?</p>

In [30]:
tdcsfog_visit_counts = tdcsfog_metadata_df.Visit.value_counts()

fig = px.bar(x=tdcsfog_visit_counts.index, y=tdcsfog_visit_counts.values)
fig.update_layout(xaxis_title="Visit", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Test</code>?</p>

In [31]:
tdcsfog_test_counts = tdcsfog_metadata_df.Test.value_counts()

fig = px.bar(x=tdcsfog_test_counts.index, y=tdcsfog_test_counts.values)
fig.update_layout(xaxis_title="Test", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Medication</code>?</p>

In [32]:
tdcsfog_medication_counts = tdcsfog_metadata_df.Medication.value_counts()

fig = px.bar(x=tdcsfog_medication_counts.index, y=tdcsfog_medication_counts.values)
fig.update_layout(xaxis_title="Medication", yaxis_title="Count")
fig.show()
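
<p style="font-family: consolas; font-size: 16px;">⚪ The count-then-plot pattern used in the three cells above recurs throughout this notebook. A minimal sketch of a reusable helper follows: it returns the counts as a tidy two-column frame, sorted by category value so that the bars appear in a stable order. The name <code>category_counts</code> is an assumption, and the plotting call is left as a comment so the block depends only on pandas.</p>

```python
import pandas as pd


def category_counts(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Counts per category, sorted by category value, as a two-column frame."""
    counts = df[column].value_counts().sort_index()
    return pd.DataFrame({column: counts.index, "Count": counts.to_numpy()})


# Visit and Medication values from the head() output of tdcsfog_metadata.csv.
sample = pd.DataFrame({
    "Visit": [3, 4, 2, 2, 19],
    "Medication": ["on", "on", "on", "off", "on"],
})

visit_counts = category_counts(sample, "Visit")
print(visit_counts)

# With plotly available, the frame can be passed straight to a bar chart:
# px.bar(visit_counts, x="Visit", y="Count").show()
```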

<a id="2.2"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 2.2 Overview <i>defog_metadata.csv</i> file</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>defog_metadata.csv</b> Identifies each series in the <code>defog</code> dataset by a unique <code>Subject, Visit, Medication</code> condition.</p>

<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [33]:
defog_metadata_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/defog_metadata.csv")

In [34]:
defog_metadata_df.head()

|   | Id | Subject | Visit | Medication |
|---|---|---|---|---|
| 0 | 02ab235146 | e1f62e | 2 | on |
| 1 | 02ea782681 | ae2d35 | 2 | on |
| 2 | 06414383cf | 8c1f5e | 2 | off |
| 3 | 092b4c1819 | 2874c5 | 1 | off |
| 4 | 0a900ed8a2 | 0e3d49 | 2 | on |


<p style="font-family: consolas; font-size: 16px;">❔ Is there any missing data?</p>

In [35]:
defog_metadata_df.isnull().sum()

Id            0
Subject       0
Visit         0
Medication    0
dtype: int64

<p style="font-family: consolas; font-size: 16px;">❔ What is the length of the dataframe?</p>

In [36]:
temp = len(defog_metadata_df)
print(
    f"Length of the defog_metadata.csv file is: {color.BLUE}{temp}{color.END}",
)

Length of the defog_metadata.csv file is: 137


In [37]:
temp = len(defog_metadata_df.Subject.unique())
print(
    f"Number of unique subjects: {color.BLUE}{temp}{color.END}",
)

Number of unique subjects: 45


<p style="font-family: consolas; font-size: 16px;">❔ What does the data for a single subject look like?</p>

In [38]:
unique_subject_id = "bf608b"
defog_metadata_df[defog_metadata_df.Subject == unique_subject_id]

|   | Id | Subject | Visit | Medication |
|---|---|---|---|---|


<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Visit</code>?</p>

In [39]:
defog_visit_counts = defog_metadata_df.Visit.value_counts()

fig = px.bar(x=defog_visit_counts.index, y=defog_visit_counts.values)
fig.update_layout(xaxis_title="Visit", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Medication</code>?</p>

In [40]:
defog_medication_counts = defog_metadata_df.Medication.value_counts()

fig = px.bar(x=defog_medication_counts.index, y=defog_medication_counts.values)
fig.update_layout(xaxis_title="Medication", yaxis_title="Count")
fig.show()

<a id="2.3"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 2.3 Overview <i>daily_metadata.csv</i> file</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>daily_metadata.csv</b> Each series in the <code>daily</code> dataset is identified by the <code>Subject</code> id. This file also contains the time of day the recording began.</p>

<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [41]:
daily_metadata_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/daily_metadata.csv")

In [42]:
daily_metadata_df.head()

|   | Id | Subject | Visit | Beginning of recording [00:00-23:59] |
|---|---|---|---|---|
| 0 | 00c4c9313d | fba3a3 | 1 | 10:19 |
| 1 | 07a96f89ec | 7da72f | 1 | 07:30 |
| 2 | 0d1bc672a8 | 056372 | 2 | 08:30 |
| 3 | 0e333c9833 | b4bd22 | 1 | 11:30 |
| 4 | 164adaed7b | 9f72eb | 1 | 13:00 |


<p style="font-family: consolas; font-size: 16px;">❔ Is there any missing data?</p>

In [43]:
daily_metadata_df.isnull().sum()

Id                                      0
Subject                                 0
Visit                                   0
Beginning of recording [00:00-23:59]    0
dtype: int64

<p style="font-family: consolas; font-size: 16px;">❔ What is the length of the dataframe?</p>

In [44]:
temp = len(daily_metadata_df)
print(
    f"Length of the daily_metadata.csv file is: {color.BLUE}{temp}{color.END}",
)

Length of the daily_metadata.csv file is: 65


In [45]:
temp = len(daily_metadata_df.Subject.unique())
print(
    f"Number of unique subjects: {color.BLUE}{temp}{color.END}",
)

Number of unique subjects: 65


<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Visit</code>?</p>

In [46]:
daily_visit_counts = daily_metadata_df.Visit.value_counts()

fig = px.bar(x=daily_visit_counts.index, y=daily_visit_counts.values)
fig.update_layout(xaxis_title="Visit", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the field <code>Beginning of recording</code>?</p>

In [47]:
daily_bor_counts = daily_metadata_df["Beginning of recording [00:00-23:59]"].value_counts()

fig = px.bar(x=daily_bor_counts.index, y=daily_bor_counts.values)
fig.update_layout(xaxis_title="Beginning of recording", yaxis_title="Count")
fig.show()
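
<p style="font-family: consolas; font-size: 16px;">⚪ Counting the raw <code>HH:MM</code> strings treats every distinct start time as its own category. Parsing the strings first makes it possible to bucket recordings by hour instead. A minimal sketch on the five start times shown by <code>head()</code>; the derived column names <code>StartHour</code> and <code>StartMinuteOfDay</code> are assumptions.</p>

```python
import pandas as pd

COLUMN = "Beginning of recording [00:00-23:59]"

# Start times from the head() output of daily_metadata.csv above.
sample = pd.DataFrame({COLUMN: ["10:19", "07:30", "08:30", "11:30", "13:00"]})

# Parse the HH:MM strings; only the time-of-day part is meaningful here.
parsed = pd.to_datetime(sample[COLUMN], format="%H:%M")
sample["StartHour"] = parsed.dt.hour
sample["StartMinuteOfDay"] = parsed.dt.hour * 60 + parsed.dt.minute

print(sample)
```

<p style="font-family: consolas; font-size: 16px;">⚪ A histogram over <code>StartHour</code> would then show how recording starts are spread across the day.</p>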

<a id="2.4"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 2.4 Overview <i>subjects.csv</i> file</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>subjects.csv</b> Metadata for each <code>Subject</code> in the study, including their <code>Age</code> and <code>Sex</code> as well as:</p>

* <p style="font-family: consolas; font-size: 16px;"><code>Visit</code> Only available for subjects in the daily and defog datasets.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>YearsSinceDx</code> Years since Parkinson's diagnosis.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>UPDRSIII_On</code>/<code>UPDRSIII_Off</code> Unified Parkinson's Disease Rating Scale score during on/off medication respectively.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>NFOGQ</code> Self-report <a href="https://pubmed.ncbi.nlm.nih.gov/19660949/"><strong>FoG questionnaire score</strong></a>.</p>

<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [48]:
subjects_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/subjects.csv")

In [49]:
subjects_df.head()

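|   | Subject | Visit | Age | Sex | YearsSinceDx | UPDRSIII_On | UPDRSIII_Off | NFOGQ |
|---|---|---|---|---|---|---|---|---|
| 0 | 00f674 | 2.0 | 63 | M | 27.0 | 43.0 | 49.0 | 24 |
| 1 | 00f674 | 1.0 | 63 | M | 27.0 | 31.0 | 30.0 | 26 |
| 2 | 02bc69 | NaN | 69 | M | 4.0 | 21.0 | NaN | 22 |
| 3 | 040587 | 2.0 | 75 | M | 26.0 | 52.0 | 69.0 | 21 |
| 4 | 040587 | 1.0 | 75 | M | 26.0 | 47.0 | 75.0 | 24 |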


<p style="font-family: consolas; font-size: 16px;">❔ Is there any missing data?</p>

In [50]:
subjects_df.isnull().sum()

Subject          0
Visit           62
Age              0
Sex              0
YearsSinceDx     0
UPDRSIII_On      1
UPDRSIII_Off    41
NFOGQ            0
dtype: int64

<p style="font-family: consolas; font-size: 16px;">❔ What is the length of the dataframe?</p>

In [51]:
temp = len(subjects_df)
print(
    f"Length of the subjects.csv file is: {color.BLUE}{temp}{color.END}",
)

Length of the subjects.csv file is: 173


In [52]:
temp = len(subjects_df.Subject.unique())
print(
    f"Number of unique subjects: {color.BLUE}{temp}{color.END}",
)

Number of unique subjects: 136
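
<p style="font-family: consolas; font-size: 16px;">⚪ The file has more rows than unique subjects because some subjects appear once per <code>Visit</code>, as the <code>head()</code> output shows. That suggests joining subject metadata to a series on both <code>Subject</code> and <code>Visit</code>. The sketch below illustrates such a join on hypothetical toy rows; the ids and values are made up, and the choice of join key is an assumption about how the files relate.</p>

```python
import pandas as pd

# Hypothetical toy rows shaped like defog_metadata.csv.
series = pd.DataFrame({
    "Id": ["s01", "s02", "s03"],
    "Subject": ["aaa111", "aaa111", "bbb222"],
    "Visit": [1, 2, 1],
    "Medication": ["on", "off", "on"],
})

# Hypothetical toy rows shaped like subjects.csv.
subjects = pd.DataFrame({
    "Subject": ["aaa111", "aaa111", "bbb222"],
    "Visit": [1.0, 2.0, 1.0],  # float, as in the notebook, because of missing values
    "Age": [63, 63, 75],
    "NFOGQ": [26, 24, 21],
})

# Align the key dtypes; nullable Int64 keeps any missing visits as <NA>.
series["Visit"] = series["Visit"].astype("Int64")
subjects["Visit"] = subjects["Visit"].astype("Int64")

# validate= raises if a (Subject, Visit) pair is not unique in subjects.
merged = series.merge(
    subjects, on=["Subject", "Visit"], how="left", validate="many_to_one"
)
print(merged)
```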


<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Visit</code>?</p>

In [53]:
subjects_visit_counts = subjects_df.Visit.value_counts()

fig = px.bar(x=subjects_visit_counts.index, y=subjects_visit_counts.values)
fig.update_layout(xaxis_title="Visit", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What is the distribution of the field <code>Age</code>?</p>

In [54]:
fig = px.histogram(subjects_df, x="Age", nbins=30)
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Sex</code>?</p>

In [55]:
subjects_sex_counts = subjects_df.Sex.value_counts()

fig = px.bar(x=subjects_sex_counts.index, y=subjects_sex_counts.values)
fig.update_layout(xaxis_title="Sex", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What is the distribution of the field <code>YearsSinceDx</code>?</p>

In [56]:
fig = px.histogram(subjects_df, x="YearsSinceDx", nbins=30)
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What is the distribution of the field <code>UPDRSIII_On</code>?</p>

In [57]:
fig = px.histogram(subjects_df, x="UPDRSIII_On", nbins=30)
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What is the distribution of the field <code>NFOGQ</code>?</p>

In [58]:
fig = px.histogram(subjects_df, x="NFOGQ", nbins=20)
fig.show()

<a id="2.5"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 2.5 Overview <i>events.csv</i> file</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>events.csv</b> Metadata for each FoG event in all data series. The event times agree with the labels in the data series.</p>

* <p style="font-family: consolas; font-size: 16px;"><code>Id</code> The data series the event occurred in.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Init</code> Time (s) the event began.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Completion</code> Time (s) the event ended.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Type</code> Whether <code>StartHesitation</code>, <code>Turn</code>, or <code>Walking</code>.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Kinetic</code> Whether the event was kinetic (1) and involved movement, or akinetic (0) and static.</p>

<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [59]:
events_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/events.csv")

In [60]:
events_df.head()

|   | Id | Init | Completion | Type | Kinetic |
|---|---|---|---|---|---|
| 0 | 003f117e14 | 8.61312 | 14.7731 | Turn | 1.0 |
| 1 | 009ee11563 | 11.3847 | 41.1847 | Turn | 1.0 |
| 2 | 009ee11563 | 54.6647 | 58.7847 | Turn | 1.0 |
| 3 | 011322847a | 28.0966 | 30.2966 | Turn | 1.0 |
| 4 | 01d0fe7266 | 30.3184 | 31.8784 | Turn | 1.0 |
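
<p style="font-family: consolas; font-size: 16px;">⚪ The statement that event times agree with the labels in the data series can be spot-checked with numbers already shown in this notebook. Series <code>003f117e14</code> is the <code>tdcsfog</code> example loaded earlier, where <code>describe()</code> reported a <code>Turn</code> mean of 0.168304 over 4682 rows. The sketch below compares that with the duration of the first event, assuming the 128Hz rate and assuming this is the only event in that series.</p>

```python
# First row of events.csv above: series 003f117e14, a Turn event.
init_s, completion_s = 8.61312, 14.7731
sample_rate_hz = 128  # tdcsfog series are recorded at 128 Hz

duration_s = completion_s - init_s
expected_steps = duration_s * sample_rate_hz

# describe() for the same series reported mean(Turn) = 0.168304 over 4682 rows.
labelled_steps = 0.168304 * 4682

print(f"event duration: {duration_s:.2f} s")
print(f"timesteps implied by events.csv: {expected_steps:.0f}")
print(f"timesteps labelled Turn: {labelled_steps:.0f}")
```

<p style="font-family: consolas; font-size: 16px;">⚪ Both figures come out at about 788 timesteps, so for this series the event table and the per-row labels line up.</p>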


<p style="font-family: consolas; font-size: 16px;">❔ What is the length of the dataframe?</p>

In [61]:
temp = len(events_df)
print(
    f"Length of the events.csv file is: {color.BLUE}{temp}{color.END}",
)

Length of the events.csv file is: 3544


<p style="font-family: consolas; font-size: 16px;">❔ Is there any missing data?</p>

In [62]:
events_df.isnull().sum()

Id               0
Init             0
Completion       0
Type          1042
Kinetic       1042
dtype: int64

<p style="font-family: consolas; font-size: 16px;">❔ What is the distribution of the field <code>Completion</code>?</p>

In [63]:
fig = px.histogram(events_df, x="Completion", nbins=50)
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Type</code>?</p>

In [64]:
events_type_counts = events_df.Type.value_counts()

fig = px.bar(x=events_type_counts.index, y=events_type_counts.values)
fig.update_layout(xaxis_title="Type", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Kinetic</code>?</p>

In [65]:
events_kinetic_counts = events_df.Kinetic.value_counts()

fig = px.bar(x=events_kinetic_counts.index, y=events_kinetic_counts.values)
fig.update_layout(xaxis_title="Kinetic", yaxis_title="Count")
fig.show()

<a id="2.6"></a>
## <div style="box-shadow: rgba(0, 0, 0, 0.18) 0px 2px 4px inset; padding:20px; font-size:24px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(67, 66, 66)"> <b> 2.6 Overview <i>tasks.csv</i> file</b></div>

<p style="font-family: consolas; font-size: 16px;">⚪ <b>tasks.csv</b> Task metadata for series in the <code>defog</code> dataset. (Not relevant for the series in the <code>tdcsfog</code> or <code>daily</code> datasets.)</p>

* <p style="font-family: consolas; font-size: 16px;"><code>Id</code> The data series where the task was measured.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Begin</code> Time (s) the task began.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>End</code> Time (s) the task ended.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Task</code> One of seven task types in the DeFOG protocol, described on <a href="https://www.kaggle.com/competitions/tlvmc-parkinsons-freezing-gait-prediction/overview/additional-data-documentation"><strong>this page</strong></a>.</p>
* <p style="font-family: consolas; font-size: 16px;"><code>Description</code> Description of the task.</p>

<p style="font-family: consolas; font-size: 16px;">❔ What does the data look like?</p>

In [66]:
tasks_df = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/tasks.csv")

In [67]:
tasks_df.head()

|   | Id | Begin | End | Task |
|---|---|---|---|---|
| 0 | 02ab235146 | 10.0 | 190.48 | Rest1 |
| 1 | 02ab235146 | 211.24 | 271.56 | Rest2 |
| 2 | 02ab235146 | 505.88 | 522.4 | 4MW |
| 3 | 02ab235146 | 577.96 | 594.64 | 4MW-C |
| 4 | 02ab235146 | 701.32 | 715.28 | MB1 |


<p style="font-family: consolas; font-size: 16px;">❔ What is the length of the dataframe?</p>

In [68]:
temp = len(tasks_df)
print(
    f"Length of the tasks.csv file is: {color.BLUE}{temp}{color.END}",
)

Length of the tasks.csv file is: 2817


<p style="font-family: consolas; font-size: 16px;">❔ Is there any missing data?</p>

In [69]:
tasks_df.isnull().sum()

Id       0
Begin    0
End      0
Task     0
dtype: int64
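
<p style="font-family: consolas; font-size: 16px;">⚪ Since <code>Begin</code> and <code>End</code> are both given in seconds, a task's duration is simply their difference. A minimal sketch on the five rows shown by <code>head()</code>, summarising duration per task type; the column name <code>Duration</code> is an assumption.</p>

```python
import pandas as pd

# Rows from the head() output of tasks.csv above.
sample_tasks = pd.DataFrame({
    "Id": ["02ab235146"] * 5,
    "Begin": [10.0, 211.24, 505.88, 577.96, 701.32],
    "End": [190.48, 271.56, 522.4, 594.64, 715.28],
    "Task": ["Rest1", "Rest2", "4MW", "4MW-C", "MB1"],
})

# Duration in seconds, rounded to hide floating-point noise.
sample_tasks["Duration"] = (sample_tasks["End"] - sample_tasks["Begin"]).round(2)

# Per-task summary; on the full file each task type has many rows.
per_task = sample_tasks.groupby("Task")["Duration"].agg(["count", "median"])
print(per_task)
```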

<p style="font-family: consolas; font-size: 16px;">❔ What is the distribution of the difference between the fields <code>Begin</code> and <code>End</code>?</p>

In [70]:
task_duration = tasks_df.End - tasks_df.Begin
fig = px.histogram(x=task_duration, nbins=30)
fig.update_layout(xaxis_title="Task duration (s)", yaxis_title="Count")
fig.show()

<p style="font-family: consolas; font-size: 16px;">❔ What does the bar chart look like for the categorical field <code>Task</code>?</p>

In [71]:
tasks_task_counts = tasks_df.Task.value_counts()

fig = px.bar(x=tasks_task_counts.index, y=tasks_task_counts.values)
fig.update_layout(xaxis_title="Task", yaxis_title="Count")
fig.show()
