# Parkinson's Disease Detector
## Dataset Info
##### Matrix column entries (attributes):
##### name - ASCII subject name and recording number
##### MDVP:Fo(Hz) - Average vocal fundamental frequency
##### MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
##### MDVP:Flo(Hz) - Minimum vocal fundamental frequency
##### MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP - Several measures of variation in fundamental frequency
##### MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA - Several measures of variation in amplitude
##### NHR,HNR - Two measures of ratio of noise to tonal components in the voice
##### status - Health status of the subject: 1 = Parkinson's, 0 = healthy
##### RPDE,D2 - Two nonlinear dynamical complexity measures
##### DFA - Signal fractal scaling exponent
##### spread1,spread2,PPE - Three nonlinear measures of fundamental frequency variation

## Importing Libraries

In [1]:
import pandas as pd
import sklearn
import numpy as np

## Importing Dataset and Splitting Data

In [7]:
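from sklearn.model_selection import train_test_split

# Load the voice-measurement dataset
dataset = pd.read_csv('Parkinsson Disease.csv')
# Features: every column except the label ('status') and the identifier ('name')
X = dataset.drop(['status', 'name'], axis=1)
# Target: 1 = Parkinson's, 0 = healthy
y = dataset['status']
# Hold out 20% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)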
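
With a small, imbalanced dataset, a purely random split can leave the test set with a different class ratio than the full data. One option, not used in the split above, is the `stratify` parameter of `train_test_split`. The sketch below uses made-up labels (50 healthy, 150 Parkinson's) and a dummy feature matrix rather than the real CSV, so the names `X_demo` and `y_demo` and all counts are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels standing in for dataset['status']: 50 zeros, 150 ones
y_demo = np.array([0] * 50 + [1] * 150)
# Hypothetical single-feature matrix; the values do not matter for the split
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the 1:3 class ratio in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

test_counts = np.bincount(y_te)    # per-class counts in the test split
train_counts = np.bincount(y_tr)   # per-class counts in the training split
print(test_counts, train_counts)
```

Applying the same idea to the real data would mean passing `stratify=y` in the call above; this changes which rows land in the test set, so the reported scores would differ.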

## Feature Scaling

In [10]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Fit the scaler on the training set only, then reuse its statistics on the test set
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
print(X_test)

[[ 0.42106183  0.05057195 -0.93690774 ... -0.0748254   0.49483246
  -0.64440366]
 [-0.95694554 -0.73133513 -0.10848063 ...  0.00608646  0.11335106
  -0.22920446]
 [ 1.21847901  0.28739437 -0.67783415 ... -0.22879857  0.17981954
  -0.18449393]
 ...
 [ 0.41390707  2.52073344 -0.85166044 ...  1.6624923   1.51751222
   0.85121716]
 [ 2.46491405  0.59348186  1.65322464 ... -0.37747054 -0.99101608
  -1.31837596]
 [-1.09278562 -0.91648442 -0.34208385 ... -0.47870566 -1.36072382
  -0.3488855 ]]
[[-1.39240702e+00 -1.03177646e+00 -6.39882078e-01 -4.06992866e-01
  -1.42903286e-01 -2.98448321e-01 -3.22593528e-01 -2.97492537e-01
  -4.07764083e-01 -4.23973499e-01 -2.99011770e-01 -3.73880760e-01
  -5.11323405e-01 -2.98999852e-01 -4.23039828e-01  1.08991102e-01
   7.63394256e-01  1.02739841e+00 -3.01882476e-01 -1.83882781e+00
  -9.24841548e-01 -3.14383159e-01]
 [-4.35375863e-01 -4.44074995e-01  4.09642413e-01 -6.72245888e-01
  -6.83337529e-01 -6.97675522e-01 -6.78064904e-01 -6.96714983e-01
  -8.999203
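
To make the scaling step concrete: `StandardScaler` standardizes each column as z = (x - mean) / std, using the population standard deviation, and `transform` reuses the statistics learned by `fit_transform`. Below is a minimal pure-Python sketch of that idea for a single column. The helper names `fit_standardizer` and `standardize` and the sample values are hypothetical, not part of the notebook.

```python
from statistics import fmean, pstdev

def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

# Hypothetical training and test values for one feature
train_col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_col = [5.0, 11.0]

mean, std = fit_standardizer(train_col)          # learned from training data only
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)   # reuses the training statistics

print(mean, std)
print(test_scaled)
```

Because the test column is scaled with the training mean and standard deviation, its scaled values need not have zero mean or unit variance, which is expected and avoids leaking test-set information into preprocessing.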

## Model

In [12]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=200, n_jobs=4, random_state=0, criterion='entropy')

model.fit(X_train, y_train)

RandomForestClassifier(criterion='entropy', n_estimators=200, n_jobs=4,
                       random_state=0)

## Prediction and accuracy

In [13]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[ 9  1]
 [ 2 27]]


0.9230769230769231
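
Accuracy alone can hide how the model treats each class, especially since this test set holds 29 Parkinson's cases and only 10 healthy ones. As a rough sketch, the per-class rates can be read off the confusion matrix printed above, assuming scikit-learn's layout (true labels on rows, predicted labels on columns, classes in sorted order). The variable names here are illustrative.

```python
# Confusion matrix printed above: rows are true labels, columns are predictions,
# with class 0 (healthy) first and class 1 (Parkinson's) second
cm = [[9, 1],
      [2, 27]]

tn, fp = cm[0]   # true class 0 (healthy)
fn, tp = cm[1]   # true class 1 (Parkinson's)
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # recall for the Parkinson's class
specificity = tn / (tn + fp)   # recall for the healthy class
precision = tp / (tp + fp)
# Accuracy of always predicting the majority class in this test set
baseline = max(tp + fn, tn + fp) / total

print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} precision={precision:.4f} "
      f"baseline={baseline:.4f}")
```

The accuracy reproduces the 36/39 figure reported above, while the majority-class baseline of 29/39 (about 0.74) gives a reference point for judging it. With only 39 test rows, each of these rates moves noticeably if a single prediction changes.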