# Detecting the Severity of Parkinson's Disease

Kelvin Basdeo

## Reference

https://archive.ics.uci.edu/dataset/189/parkinsons+telemonitoring

## Introduction

TBD

## Business Problem Statement

TBD

## Goal

TBD

## Exploratory Data Analysis

### Importing Libraries and Loading Data

In [1]:
# imports

from ucimlrepo import fetch_ucirepo

In [12]:
# fetch dataset
pt_data = fetch_ucirepo(id=189)

# split the dataset into feature and target DataFrames
pt_features = pt_data.data.features
pt_target = pt_data.data.targets

print(pt_features.shape, pt_target.shape)

(5875, 19) (5875, 2)


In [13]:
pt_features.head()

,age,test_time,Jitter(%),Jitter(Abs),Jitter:RAP,Jitter:PPQ5,Jitter:DDP,Shimmer,Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,Shimmer:APQ11,Shimmer:DDA,NHR,HNR,RPDE,DFA,PPE,sex
0,72,5.6431,0.00662,3.4e-05,0.00401,0.00317,0.01204,0.02565,0.23,0.01438,0.01309,0.01662,0.04314,0.01429,21.64,0.41888,0.54842,0.16006,0
1,72,12.666,0.003,1.7e-05,0.00132,0.0015,0.00395,0.02024,0.179,0.00994,0.01072,0.01689,0.02982,0.011112,27.183,0.43493,0.56477,0.1081,0
2,72,19.681,0.00481,2.5e-05,0.00205,0.00208,0.00616,0.01675,0.181,0.00734,0.00844,0.01458,0.02202,0.02022,23.047,0.46222,0.54405,0.21014,0
3,72,25.647,0.00528,2.7e-05,0.00191,0.00264,0.00573,0.02309,0.327,0.01106,0.01265,0.01963,0.03317,0.027837,24.445,0.4873,0.57794,0.33277,0
4,72,33.642,0.00335,2e-05,0.00093,0.0013,0.00278,0.01703,0.176,0.00679,0.00929,0.01819,0.02036,0.011625,26.126,0.47188,0.56122,0.19361,0


### DataFrame Description

In [14]:
pt_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5875 entries, 0 to 5874
Data columns (total 19 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   age            5875 non-null   int64  
 1   test_time      5875 non-null   float64
 2   Jitter(%)      5875 non-null   float64
 3   Jitter(Abs)    5875 non-null   float64
 4   Jitter:RAP     5875 non-null   float64
 5   Jitter:PPQ5    5875 non-null   float64
 6   Jitter:DDP     5875 non-null   float64
 7   Shimmer        5875 non-null   float64
 8   Shimmer(dB)    5875 non-null   float64
 9   Shimmer:APQ3   5875 non-null   float64
 10  Shimmer:APQ5   5875 non-null   float64
 11  Shimmer:APQ11  5875 non-null   float64
 12  Shimmer:DDA    5875 non-null   float64
 13  NHR            5875 non-null   float64
 14  HNR            5875 non-null   float64
 15  RPDE           5875 non-null   float64
 16  DFA            5875 non-null   float64
 17  PPE            5875 non-null   float64
 18  sex     

### Feature Description

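- age: Age of the participant in years
- test_time: Time since the participant was recruited into the trial, in days (the integer part is the number of days since recruitment); it is not the length of the voice recording
- Jitter(%): Average percentage variation in pitch period between consecutive voice cycles
- Jitter(Abs): Average absolute difference in pitch period between consecutive cycles
- Jitter:RAP: Relative average perturbation, the average pitch variation between each cycle and its two nearest neighbors
- Jitter:PPQ5: Five-point period perturbation quotient, the average pitch variation over five consecutive cycles
- Jitter:DDP: Equal to three times RAP, so it carries the same short-term pitch irregularity information as Jitter:RAP
- Shimmer: Average absolute difference in amplitude between consecutive cycles
- Shimmer(dB): Shimmer expressed in decibels
- Shimmer:APQ3: Amplitude perturbation quotient, the average amplitude variation over 3 consecutive cycles
- Shimmer:APQ5: Average amplitude variation over 5 consecutive cycles
- Shimmer:APQ11: Average amplitude variation over 11 consecutive cycles, which captures longer-term amplitude changes
- Shimmer:DDA: Equal to three times APQ3, so it carries the same short-term loudness variation information as Shimmer:APQ3
- NHR: Noise-to-harmonics ratio, the amount of noise energy relative to harmonic energy in the signal
- HNR: Harmonics-to-noise ratio, the strength of the harmonic components relative to noise; it is the conceptual counterpart of NHR but not its numeric reciprocal in this data
- RPDE: Recurrence period density entropy, which measures how unpredictable the vocal fold vibrations are
- DFA: Detrended fluctuation analysis, which assesses the fractal scaling of the voice signal
- PPE: Pitch period entropy, a nonlinear measure of pitch variability
- sex: Gender of the participant (0 = male, 1 = female)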
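
The feature list describes Jitter:DDP as three times Jitter:RAP and Shimmer:DDA as three times Shimmer:APQ3. If that holds, each pair is redundant for modeling. The sketch below is one possible way to check this; it uses the first three rows of the `pt_features.head()` output rather than the full dataset, and the variable names `sample`, `ddp_ratio`, and `dda_ratio` are illustrative, not part of the notebook.

```python
import pandas as pd

# First three rows of pt_features.head(), copied from the output shown earlier.
# In the notebook, `sample` could be replaced by pt_features itself.
sample = pd.DataFrame({
    "Jitter:RAP": [0.00401, 0.00132, 0.00205],
    "Jitter:DDP": [0.01204, 0.00395, 0.00616],
    "Shimmer:APQ3": [0.01438, 0.00994, 0.00734],
    "Shimmer:DDA": [0.04314, 0.02982, 0.02202],
})

# Ratio of each derived column to its base column.
# Values close to 3 suggest the derived column adds no new information.
ddp_ratio = sample["Jitter:DDP"] / sample["Jitter:RAP"]
dda_ratio = sample["Shimmer:DDA"] / sample["Shimmer:APQ3"]

# Largest deviation from the expected factor of 3 (rounding in the
# published values causes small differences).
print("max |DDP/RAP - 3|:", (ddp_ratio - 3).abs().max())
print("max |DDA/APQ3 - 3|:", (dda_ratio - 3).abs().max())
```

If the deviations stay small on the full data as well, dropping one column from each pair before modeling is a reasonable option to consider.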

In [19]:
pt_target.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5875 entries, 0 to 5874
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   motor_UPDRS  5875 non-null   float64
 1   total_UPDRS  5875 non-null   float64
dtypes: float64(2)
memory usage: 91.9 KB


### Target Description

- motor_UPDRS: The clinician's motor symptom score on the Unified Parkinson's Disease Rating Scale (UPDRS), used as a target to assess disease severity
- total_UPDRS: The overall Parkinson's disease severity score, combining motor and non-motor symptoms

In [20]:
pt_features.describe()

,age,test_time,Jitter(%),Jitter(Abs),Jitter:RAP,Jitter:PPQ5,Jitter:DDP,Shimmer,Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,Shimmer:APQ11,Shimmer:DDA,NHR,HNR,RPDE,DFA,PPE,sex
count,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0,5875.0
mean,64.804936,92.863722,0.006154,4.4e-05,0.002987,0.003277,0.008962,0.034035,0.31096,0.017156,0.020144,0.027481,0.051467,0.03212,21.679495,0.541473,0.65324,0.219589,0.317787
std,8.821524,53.445602,0.005624,3.6e-05,0.003124,0.003732,0.009371,0.025835,0.230254,0.013237,0.016664,0.019986,0.039711,0.059692,4.291096,0.100986,0.070902,0.091498,0.465656
min,36.0,-4.2625,0.00083,2e-06,0.00033,0.00043,0.00098,0.00306,0.026,0.00161,0.00194,0.00249,0.00484,0.000286,1.659,0.15102,0.51404,0.021983,0.0
25%,58.0,46.8475,0.00358,2.2e-05,0.00158,0.00182,0.00473,0.01912,0.175,0.00928,0.01079,0.015665,0.02783,0.010955,19.406,0.469785,0.59618,0.15634,0.0
50%,65.0,91.523,0.0049,3.4e-05,0.00225,0.00249,0.00675,0.02751,0.253,0.0137,0.01594,0.02271,0.04111,0.018448,21.92,0.54225,0.6436,0.2055,0.0
75%,72.0,138.445,0.0068,5.3e-05,0.00329,0.00346,0.00987,0.03975,0.365,0.020575,0.023755,0.032715,0.061735,0.031463,24.444,0.614045,0.711335,0.26449,1.0
max,85.0,215.49,0.09999,0.000446,0.05754,0.06956,0.17263,0.26863,2.107,0.16267,0.16702,0.27546,0.48802,0.74826,37.875,0.96608,0.8656,0.73173,1.0
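
The summary statistics show a minimum `test_time` of -4.2625, which is unexpected for an elapsed-time measure and worth inspecting before modeling. The sketch below shows one way to count such values. It assumes a small stand-in series built from the min, quartile, and max values in the `describe()` output; in the notebook the series would be `pt_features["test_time"]`. The names `negative_mask` and `n_negative` are illustrative.

```python
import pandas as pd

# Stand-in values taken from the min/25%/50%/75%/max rows of describe();
# in the notebook, use pt_features["test_time"] instead.
test_time = pd.Series(
    [-4.2625, 46.8475, 91.523, 138.445, 215.49], name="test_time"
)

# Boolean mask marking the rows with a negative elapsed time
negative_mask = test_time < 0
n_negative = int(negative_mask.sum())

print(f"{n_negative} of {len(test_time)} values are negative")
print(test_time[negative_mask])
```

Whether to drop, clip, or keep these rows depends on how the negative values arose, which the output alone does not explain.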
