<a href="https://colab.research.google.com/github/ErlantzCalvo/Parkinson_Detection/blob/master/Parkinson_disease.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import os
import pandas as pd
from tqdm import tqdm

In [3]:
%cd 'pd_datasets/'

/home/unaice/UNI/CUARTO/MLNN/Parkinson_Detection/pd_datasets


# Data explanation
The data .csv has the next format: \\
  X ; Y; Z; Pressure; GripAngle; Timestamp; Test ID

----------------
Test ID: \\
0: Static Spiral Test ( Draw on the given spiral pattern) \\
1: Dynamic Spiral Test ( Spiral pattern will blink in a certain time, so subjects need to continue on their draw) \\
2: Circular Motion Test (Subjectd draw circles around the red point)

## Sources and explanation
Isenkul, Muhammed & Sakar, Betul & Kursun, O.. (2014). Improved Spiral Test Using Digitized Graphics Tablet for Monitoring Parkinson’s Disease. 10.13140/RG.2.1.1898.6005.

Sakar, Betul & Isenkul, Muhammed & Sakar, C. Okan & Sertbaş, Ahmet & Gurgen, F. & Delil, Sakir & Apaydin, Hulya & Kursun, Olcay. (2013). Collection and Analysis of a Parkinson Speech Dataset With Multiple Types of Sound Recordings. Biomedical and Health Informatics, IEEE Journal of. 17. 828-834. 10.1109/JBHI.2013.2245674.

## Dataset
https://archive.ics.uci.edu/ml/datasets/Parkinson+Disease+Spiral+Drawings+Using+Digitized+Graphics+Tablet

# Loading data
We've got 2 different datasets:
  

*   **Control**: People who did the tests without having the Parkinson disease. 
*   **Parkinson**: People who did the tests having the Parkinson disease. 

*Note:* The parkinson dataset is splitted in two different paths, so we'll load both of them and then mix them.

In [44]:
CONTROL_DATASET_PATH = 'hw_dataset/control'
PARKINSON_DATASET_PATH = 'hw_dataset/parkinson'
PARKINSON_DATASET_PATH_2 = 'new_dataset/parkinson'
COLUMN_NAMES = ['X', 'Y', 'Z', 'Pressure', 'GripAngle', 'Timestamp', 'Test ID','UserId']

userid = 0
#Load Control dataset
control_files = os.listdir(CONTROL_DATASET_PATH)
df_control = pd.DataFrame()
for i in tqdm(range(len(control_files)), desc= 'Control files: ', unit=' files'):
  new_user_path = os.path.join(CONTROL_DATASET_PATH, control_files[i])
  new_user = pd.read_csv(new_user_path, header=None,  sep=';')
  new_user['UserId'] = userid
  df_control = df_control.append(new_user)
  userid +=1
  
df_control.columns = COLUMN_NAMES


#Load parkinson dataset
parkinson_files = os.listdir(PARKINSON_DATASET_PATH)
df_parkinson = pd.DataFrame()

for j in tqdm(range(len(parkinson_files)), desc= 'Parkinson files: ', unit=' files'):
  new_user_path = os.path.join(PARKINSON_DATASET_PATH, parkinson_files[i])  
  new_user = pd.read_csv(new_user_path, header=None,  sep=';')
  new_user['UserId'] = userid
  df_parkinson = df_parkinson.append(new_user)
  userid +=1

parkinson_files =  os.listdir(PARKINSON_DATASET_PATH_2)
for k in tqdm(range(len(parkinson_files)), desc= 'Parkinson files: ', unit=' files'):
  new_user_path = os.path.join(PARKINSON_DATASET_PATH_2, parkinson_files[i])  
  new_user = pd.read_csv(new_user_path, header=None,  sep=';')
  new_user['UserId'] = userid
  df_parkinson = df_parkinson.append(new_user)
  userid +=1


df_parkinson.columns = COLUMN_NAMES
df_parkinson


Control files: 100%|██████████| 15/15 [00:00<00:00, 89.23 files/s]
Parkinson files: 100%|██████████| 25/25 [00:00<00:00, 100.02 files/s]
Parkinson files: 100%|██████████| 37/37 [00:00<00:00, 102.33 files/s]


Unnamed: 0,X,Y,Z,Pressure,GripAngle,Timestamp,Test ID,UserId
0,274,206,0,178,1490,5482221,0,15
1,273,206,0,222,1490,5482230,0,15
2,273,206,0,261,1480,5482239,0,15
3,273,206,0,273,1480,5482248,0,15
4,273,206,0,283,1480,5482257,0,15
5,273,205,0,316,1480,5482266,0,15
6,274,205,0,350,1480,5482275,0,15
7,274,205,0,370,1480,5482284,0,15
8,275,205,12,382,1480,5482293,0,15
9,275,205,0,396,1480,5482302,0,15


### Feature creation
#### Pressure_delta:

In [60]:
# Parkinson

prev_pressure = 0
pressure_delta = []
angle_delta = []

# Number (id) of pacients
n_pacient = df_parkinson.loc[0]['UserId']
for i in n_pacient:
    # Actual i pacient
    actual_pacient = df_parkinson.loc[df_parkinson['UserId']==i]
    # Samples of i pacient
    n_samples = actual_pacient.index
    prev_pressure = 0
    prev_angle = 0
    for j in n_samples:
        if j > 0:
            pressure_delta_value = abs(actual_pacient.loc[j]['Pressure'] - prev_pressure)
            angle_delta_value = abs(actual_control.loc[j]['GripAngle'] - prev_angle)
        else:
            pressure_delta_value = 0
            angle_delta_value = 0
        angle_delta.append(angle_delta_value)
        pressure_delta.append(pressure_delta_value)
        prev_angle = actual_pacient.loc[j]['GripAngle']
        prev_pressure = actual_pacient.loc[j]['Pressure']

df_parkinson['PressureDelta'] = pressure_delta
df_parkinson['GripAngleDelta'] = angle_delta


# Control
        
prev_pressure = 0
pressure_delta = []
angle_delta = []

# Number (id) of control pacients
n_control = df_control.loc[0]['UserId']
for i in n_control:
    # Actual i pacient
    actual_control = df_control.loc[df_control['UserId']==i]
    # Samples of i pacient
    n_samples = actual_control.index
    prev_pressure = 0
    prev_angle = 0
    for j in n_samples:
        if j > 0:
            pressure_delta_value = abs(actual_control.loc[j]['Pressure'] - prev_pressure)
            angle_delta_value = abs(actual_control.loc[j]['GripAngle'] - prev_angle)
        else:
            pressure_delta_value = 0
            angle_delta_value = 0
        angle_delta.append(angle_delta_value)
        pressure_delta.append(pressure_delta_value)
        prev_pressure = actual_control.loc[j]['Pressure']
df_control['GripAngleDelta'] = angle_delta
df_control['PressureDelta'] = pressure_delta

In [61]:
c_mean = sum(df_control['PressureDelta'])/len(df_control['PressureDelta'])
p_mean = sum(df_parkinson['PressureDelta'])/len(df_parkinson['PressureDelta'])
print("Control: {}\nParkinson {}".format(c_mean, p_mean))

c_mean = sum(df_control['GripAngleDelta'])/len(df_control['GripAngleDelta'])
p_mean = sum(df_parkinson['GripAngleDelta'])/len(df_parkinson['GripAngleDelta'])
print("Control: {}\nParkinson {}".format(c_mean, p_mean))


Control: 2.2118610256836506
Parkinson 2.477980497011639
Control: 10.443271548499709
Parkinson 350.05557303135157


In [100]:
pacient_sample = df_parkinson.loc[(df_parkinson['Test ID'] == 0) & (df_parkinson['UserId'] == 15)]
pressure_delta = []

for i in range(len(pacient_sample.index)):
    if i > 0:
        pressure_delta.append(abs(pacient_sample.loc[i]['Pressure']-pacient_sample.loc[i-1]['Pressure']))
    else:
        pressure_delta.append(0);

pacient_sample['Pressure Delta'] = pressure_delta



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.


In [101]:
control_sample = df_control.loc[(df_control['Test ID'] == 0) & (df_control['UserId'] == 14)]
pressure_delta_control = []

for i in range(len(control_sample.index)):
    if i > 0:
        pressure_delta_control.append(abs(control_sample.loc[i]['Pressure']-control_sample.loc[i-1]['Pressure']))
    else:
        pressure_delta_control.append(0);

control_sample['Pressure Delta'] = pressure_delta_control


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.


In [117]:
a = sum(pressure_delta_control[:(len(pressure_delta_control)//2)])
b = sum
print("Size control: {}\t Size pacient: {}".format(len(pressure_delta_control)/2,len(pressure_delta)))
print("Control pressure delta: {}\nPacient pressure delta: {}".format(a/(len(pressure_delta_control)/2), sum(pressure_delta)/(len(pressure_delta)/2)))

Size control: 3041.0	 Size pacient: 3620
Control pressure delta: 1.8704373561328511
Pacient pressure delta: 4.392817679558011
