# Parkinson Disease Diagnosis

Kaggle: https://www.kaggle.com/debasisdotcom/parkinson-disease-detection

Feature information:

- `name`: ASCII subject name and recording number
- `MDVP:Fo(Hz)`: average vocal fundamental frequency
- `MDVP:Fhi(Hz)`: maximum vocal fundamental frequency
- `MDVP:Flo(Hz)`: minimum vocal fundamental frequency
- `MDVP:Jitter(%)`, `MDVP:Jitter(Abs)`, `MDVP:RAP`, `MDVP:PPQ`, `Jitter:DDP`: several measures of variation in fundamental frequency
- `MDVP:Shimmer`, `MDVP:Shimmer(dB)`, `Shimmer:APQ3`, `Shimmer:APQ5`, `MDVP:APQ`, `Shimmer:DDA`: several measures of variation in amplitude
- `NHR`, `HNR`: two measures of the ratio of noise to tonal components in the voice
- `status`: health status of the subject (1 = Parkinson's, 0 = healthy)
- `RPDE`, `D2`: two nonlinear dynamical complexity measures
- `DFA`: signal fractal scaling exponent
- `spread1`, `spread2`, `PPE`: three nonlinear measures of fundamental frequency variation
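The jitter and shimmer features are built from the same sequence of pitch periods (or peak amplitudes), which is one reason they tend to be strongly correlated with each other. The sketch below follows the standard definitions documented for Praat; MDVP's exact formulas may differ in detail, the helper names are hypothetical, and the period and amplitude values are made up. Under these definitions `Jitter:DDP` is exactly three times `MDVP:RAP`.

```python
import numpy as np

def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period, in %."""
    t = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(t))) / np.mean(t)

def rap(periods):
    """Relative average perturbation: deviation of each period from its 3-point neighbourhood mean."""
    t = np.asarray(periods, dtype=float)
    three_point_mean = (t[:-2] + t[1:-1] + t[2:]) / 3.0
    return np.mean(np.abs(t[1:-1] - three_point_mean)) / np.mean(t)

def ddp(periods):
    """Mean absolute difference of consecutive period differences, relative to the mean period."""
    t = np.asarray(periods, dtype=float)
    # np.diff(t, n=2) gives t[i+2] - 2*t[i+1] + t[i], which is 3x the RAP deviation
    return np.mean(np.abs(np.diff(t, n=2))) / np.mean(t)

def local_shimmer(amplitudes):
    """Same idea as local jitter, applied to peak amplitudes instead of periods (as a ratio)."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Hypothetical pitch periods (seconds) of a slightly irregular ~200 Hz voice
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052, 0.0050]
print(local_jitter_percent(periods))
print(ddp(periods) / rap(periods))  # 3.0 up to floating-point rounding
```

Because several of these columns are near-deterministic functions of one another, they carry largely overlapping information for the model.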

<b>Citation Request: </b>
    
If you use this dataset, please cite the following paper:

'Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection',
Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM.
BioMedical Engineering OnLine 2007, 6:23 (26 June 2007)

In [5]:
import tensorflow as tf
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [7]:
df = pd.read_csv('/kaggle/input/parkinson-disease-detection/Parkinsson disease.csv')

In [8]:
df.head()

In [9]:
df.isnull().sum()

In [10]:
df.info()

In [11]:
df.describe()

## EDA

In [12]:
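# 'name' is still a text column here, so restrict the correlation to numeric columns.
# Drop 'status' by label: it is not the last column, so slicing with [:-1] would drop a real feature.
df.corr(numeric_only=True)['status'].drop('status').sort_values().plot(kind='bar')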

In [13]:
df = df.drop('name', axis = 1)

In [14]:
plt.figure(figsize=(30,30))
sns.heatmap(df.corr(), annot = True, cmap= "coolwarm")
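With more than twenty features the annotated heatmap is dense, so a small helper that lists only the strongly correlated pairs can make it easier to read. `top_correlated_pairs` is a hypothetical helper (not part of pandas), the 0.9 threshold is an arbitrary assumption, and the demo uses synthetic columns rather than the Parkinson's data.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(frame, threshold=0.9):
    """Return feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = frame.corr(numeric_only=True).abs()
    # Keep only the strict upper triangle so each pair appears once and the diagonal is skipped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # Comparisons with NaN are False, so masked-out cells are filtered here as well
    return pairs[pairs > threshold].sort_values(ascending=False)

# Synthetic stand-in: b is almost a multiple of a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 3 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
print(top_correlated_pairs(demo))
```

In this notebook the same helper could be called as `top_correlated_pairs(df)`.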

# <font color='green'><b> NEURAL NETWORKS </b> (Multi-Layer Perceptron)</font>

In [15]:
X = df.drop('status', axis = 1)
Y = df['status']

### Train Test Split

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.25,random_state=101)
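If the classes in `status` are imbalanced (`Y.value_counts()` shows the split), a plain random split can leave the test set with a noticeably different class ratio than the training set. The `stratify` argument of `train_test_split` keeps the ratio the same in both parts. This is an alternative to the unstratified split above, not what the notebook does, and the labels below are synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80% positive, 20% negative
y = np.array([1] * 160 + [0] * 40)
X = np.arange(len(y)).reshape(-1, 1)

# Same test_size and random_state as the notebook, plus stratify=y
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=101, stratify=y
)
print(y_tr.mean(), y_te.mean())  # both splits keep the 0.8 positive rate
```

Applied here, the call would be `train_test_split(X, Y, test_size=0.25, random_state=101, stratify=Y)`.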

### Scaling Data

In [17]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

In [18]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
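`MinMaxScaler` rescales each column as (x - min) / (max - min), where min and max are learned from the training data by `fit_transform`; `transform` then reuses those training statistics on the test set. A NumPy sketch of the same idea, with hypothetical helper names and made-up arrays, shows why scaled test values can fall outside [0, 1].

```python
import numpy as np

def fit_min_max(train):
    """Learn per-column minimum and range from the training data only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Avoid dividing by zero for constant columns (scikit-learn's MinMaxScaler also guards this case)
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def transform_min_max(data, col_min, col_range):
    """Apply the training-set statistics to any data, including the test set."""
    return (data - col_min) / col_range

train = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
test = np.array([[6.0, 5.0]])

col_min, col_range = fit_min_max(train)
print(transform_min_max(train, col_min, col_range))  # every column spans exactly [0, 1]
print(transform_min_max(test, col_min, col_range))   # [[1.25, -0.25]]
```

Fitting only on the training split keeps test-set information out of preprocessing, at the cost of occasional out-of-range test values.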

In [19]:
X_train_scaled.shape

### Creating The Model

In [20]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

In [21]:
model = Sequential()

model.add(Dense(units=30,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=25,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=10,activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(units=1,activation='sigmoid'))

# For a binary classification problem
model.compile(loss='binary_crossentropy', optimizer='adam')
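The single sigmoid unit turns the last layer's output into a probability p, and `binary_crossentropy` averages -[y log p + (1 - y) log(1 - p)] over the batch. A minimal NumPy sketch of that loss, assuming probabilities are clipped away from 0 and 1 by a small epsilon (Keras applies a similar clip internally), with hypothetical logits:

```python
import numpy as np

def sigmoid(z):
    """Map real-valued logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y log p + (1 - y) log(1 - p)], with p clipped to [eps, 1 - eps]."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y = np.array([1, 0, 1, 1])
logits = np.array([2.0, -1.0, 0.5, -0.5])  # hypothetical pre-sigmoid outputs
print(binary_crossentropy(y, sigmoid(logits)))
```

Confident wrong predictions are penalised far more than confident correct ones, which is what pushes the sigmoid output toward the true label.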

In [22]:
from tensorflow.keras.callbacks import EarlyStopping
cb = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
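`EarlyStopping(monitor='val_loss', mode='min', patience=10)` stops training once `val_loss` has gone `patience` consecutive epochs without improving on its best value so far. The pure-Python sketch below reproduces that counting logic on a list of losses; `early_stopping_epoch` is a hypothetical helper, not Keras's implementation, and it ignores options such as `baseline` and `restore_best_weights`.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it runs to the end.

    The wait counter resets whenever the loss improves on the best value by more than min_delta.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # new best: reset the counter
            wait = 0
        else:
            wait += 1  # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

print(early_stopping_epoch([1.0, 0.8, 0.9, 0.95, 0.85], patience=3))  # 4
```

With the default `restore_best_weights=False`, Keras keeps the weights from the final epoch rather than from the best one; passing `restore_best_weights=True` changes that.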

In [23]:
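# The training split has fewer than 450 rows, so batch_size=450 means one full-batch update per epoch.
# The test set also serves as validation data for early stopping, so the test scores below are
# somewhat optimistic; a separate validation split would avoid this.
model.fit(x=X_train_scaled, y=Y_train,
          validation_data=(X_test_scaled, Y_test),
          batch_size=450, epochs=600, callbacks=[cb])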

In [24]:
losses = pd.DataFrame(model.history.history)

In [25]:
losses.plot()

### Model Evaluation

In [26]:
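# Threshold the sigmoid probabilities at 0.5 and flatten the (n, 1) output to a 1-D label array
predictions = (model.predict(X_test_scaled) > 0.5).astype("int32").ravel()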

In [27]:
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score

In [28]:
print(confusion_matrix(Y_test,predictions))

In [29]:
print(classification_report(Y_test,predictions))

In [30]:
print(accuracy_score(Y_test,predictions))
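Accuracy, precision and recall are simple ratios of the four confusion-matrix counts. The sketch below computes them by hand with a hypothetical `binary_metrics` helper and toy labels; the `[[tn, fp], [fn, tp]]` layout matches what `confusion_matrix` prints for labels 0 and 1.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Confusion counts and basic metrics for 0/1 labels (returns 0.0 where a ratio is undefined)."""
    # ravel() so that (n, 1) model outputs do not broadcast against (n,) labels
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows = true label, columns = predicted label
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

probs = np.array([0.9, 0.2, 0.6, 0.4, 0.8])  # toy sigmoid outputs
y_pred = (probs > 0.5).astype("int32")       # same 0.5 threshold as the notebook
y_true = np.array([1, 0, 0, 1, 1])
print(binary_metrics(y_true, y_pred))
```

For a screening task, recall on the Parkinson's class (how many affected subjects are caught) is often worth reading alongside overall accuracy.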