Detecting Parkinson’s Disease – Python Machine Learning Project
What is Parkinson’s Disease?
Parkinson’s disease is a progressive neurodegenerative disorder of the central nervous system that affects movement, causing tremors and stiffness. It damages the dopamine-producing neurons in the brain. It is commonly described in 5 stages and is reported to affect more than 1 million individuals every year in India. The condition is chronic and currently has no cure.

What is XGBoost?
XGBoost is an open-source machine learning library designed with speed and performance in mind. The name stands for eXtreme Gradient Boosting: it builds an ensemble of decision trees in which each new tree corrects the errors of the trees before it. In this project, we import XGBClassifier from the xgboost library; it is XGBoost’s classifier exposed through the scikit-learn estimator API (fit, predict).
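
To make the idea of boosting on decision trees concrete, here is a minimal sketch of gradient boosting for squared error. It uses scikit-learn's DecisionTreeRegressor rather than XGBoost itself. The toy data, the learning rate of 0.3, and the 20 rounds are assumptions chosen for illustration. XGBoost adds regularization, second-order gradient information, and many optimizations that are not shown here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an assumption for illustration, not the project dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3   # shrinks each tree's contribution
n_rounds = 20         # number of trees in the ensemble

# Start from a constant prediction: the mean of the targets.
prediction = np.full_like(y, y.mean())
baseline_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_rounds):
    # For squared error, the negative gradient is simply the residual.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)
    # Each new tree nudges the ensemble towards the remaining error.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

boosted_mse = np.mean((y - prediction) ** 2)
print(f"MSE of constant baseline: {baseline_mse:.4f}")
print(f"MSE after {n_rounds} boosting rounds: {boosted_mse:.4f}")
```

The training error should drop well below the constant baseline, which is the behaviour the "each tree corrects the previous ones" description refers to.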

Detecting Parkinson’s Disease with XGBoost – Objective
To build a model that accurately detects the presence of Parkinson’s disease in an individual.

Detecting Parkinson’s Disease with XGBoost – About the Python Machine Learning Project
In this Python machine learning project, we use the Python libraries scikit-learn, numpy, pandas, and xgboost to build a model with an XGBClassifier. We’ll load the data, extract the features and labels, scale the features, split the dataset into training and test sets, train the XGBClassifier, and then calculate the accuracy of the model.

In [40]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from xgboost import XGBClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [41]:
data = pd.read_csv(r"C:\Users\amany\Desktop\DSProject\parkisondata.csv")
data.head()

Unnamed: 0.1,Unnamed: 0,no_strokes_st,no_strokes_dy,speed_st,speed_dy,magnitude_vel_st,magnitude_horz_vel_st,magnitude_vert_vel_st,magnitude_vel_dy,magnitude_horz_vel_dy,...,magnitude_horz_jerk_dy,magnitude_vert_jerk_dy,ncv_st,ncv_dy,nca_st,nca_dy,in_air_stcp,on_surface_st,on_surface_dy,target
0,0,12.0,2.0,0.000293,0.000431,0.061342,0.038319,0.03905,0.084891,0.053885,...,6e-06,6e-06,185.25,412.857143,61.833333,470.0,0.0,3678.0,4852.0,1.0
1,1,4.0,6.0,0.000286,0.000281,0.119159,0.077012,0.074216,0.160497,0.10136,...,1e-05,9e-06,192.777778,173.875,102.5,54.0,0.0,1688.0,1587.0,1.0
2,2,4.0,4.0,0.000278,0.000277,0.113889,0.071547,0.073336,0.191965,0.117724,...,2.3e-05,4.2e-05,263.571429,136.333333,120.5,25.2,0.0,2246.0,1332.0,1.0
3,3,4.0,4.0,0.000286,0.000275,0.154621,0.101146,0.0963,0.152413,0.101235,...,9e-06,5e-06,225.285714,308.4,59.333333,177.0,0.0,1775.0,1703.0,1.0
4,4,2.0,2.0,0.000279,0.00027,0.060058,0.039449,0.036737,0.081577,0.052451,...,1e-05,7e-06,226.7,168.0,79.5,43.857143,0.0,4764.0,3335.0,1.0
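
The Unnamed: 0 column in the output above typically appears when a DataFrame was saved together with its row index and then read back without declaring that index. The sketch below uses an in-memory CSV with made-up values, not the project file, to show how index_col=0 keeps the extra column out of the data.

```python
import io
import pandas as pd

# Made-up two-row CSV that mimics a file saved with its row index (illustrative only).
csv_text = ",speed_st,target\n0,0.000293,1.0\n1,0.000286,1.0\n"

# Default read: the unnamed index column becomes a regular column 'Unnamed: 0'.
with_extra = pd.read_csv(io.StringIO(csv_text))

# Declaring the first column as the index keeps it out of the features.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```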


In [42]:
# data.info() prints its summary directly and returns None, so it needs no print()
data.info()
print(data.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 31 columns):
Unnamed: 0                77 non-null int64
no_strokes_st             77 non-null float64
no_strokes_dy             77 non-null float64
speed_st                  77 non-null float64
speed_dy                  77 non-null float64
magnitude_vel_st          77 non-null float64
magnitude_horz_vel_st     77 non-null float64
magnitude_vert_vel_st     77 non-null float64
magnitude_vel_dy          77 non-null float64
magnitude_horz_vel_dy     77 non-null float64
magnitude_vert_vel_dy     77 non-null float64
magnitude_acc_st          77 non-null float64
magnitude_horz_acc_st     77 non-null float64
magnitude_vert_acc_st     77 non-null float64
magnitude_acc_dy          77 non-null float64
magnitude_horz_acc_dy     77 non-null float64
magnitude_vert_acc_dy     77 non-null float64
magnitude_jerk_st         77 non-null float64
magnitude_horz_jerk_st    77 non-null float64
magnitude_vert_jerk_st   

In [43]:
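# Drop the target column, then skip the first remaining column
# ('Unnamed: 0', a leftover row index) so only the 29 measurements remain
features = data.loc[:, data.columns != 'target'].values[:, 1:]
label = data.target.values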
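
Selecting features by position (the [:, 1:] slice) silently depends on the column order. Here is a sketch of the same extraction by column name, on a made-up three-row frame. The column names are taken from the output above, and the values are invented for illustration.

```python
import pandas as pd

# Made-up frame with a few of the project's column names (values are illustrative).
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "no_strokes_st": [12.0, 4.0, 4.0],
    "speed_st": [0.000293, 0.000286, 0.000278],
    "target": [1.0, 1.0, 0.0],
})

# Drop the leftover index column and the label by name; errors="ignore"
# lets the same line work if 'Unnamed: 0' is absent.
feature_frame = toy.drop(columns=["Unnamed: 0", "target"], errors="ignore")
features = feature_frame.to_numpy()
label = toy["target"].to_numpy()

print(feature_frame.columns.tolist())
print(features.shape, label.shape)
```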

In [44]:
print(features)
print(label)

[[1.20000000e+01 2.00000000e+00 2.92878961e-04 ... 0.00000000e+00
  3.67800000e+03 4.85200000e+03]
 [4.00000000e+00 6.00000000e+00 2.85679200e-04 ... 0.00000000e+00
  1.68800000e+03 1.58700000e+03]
 [4.00000000e+00 4.00000000e+00 2.77679639e-04 ... 0.00000000e+00
  2.24600000e+03 1.33200000e+03]
 ...
 [4.80000000e+01 1.76000000e+02 1.20092956e-03 ... 1.53600000e+03
  5.72000000e+02 5.75000000e+02]
 [6.00000000e+00 1.00000000e+01 1.15945149e-03 ... 1.53600000e+03
  8.35000000e+02 3.63000000e+02]
 [0.00000000e+00 0.00000000e+00 1.15165005e-03 ... 1.53600000e+03
  0.00000000e+00 0.00000000e+00]]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0.]


In [45]:
data.target.value_counts()

1.0    62
0.0    15
Name: target, dtype: int64
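
The counts above show an imbalanced dataset, which matters when reading an accuracy figure later on. The sketch below rebuilds labels with the same counts and computes the accuracy of a trivial predictor that always outputs the majority class. It also computes the negative-to-positive ratio, which is commonly used as a starting value for XGBoost's scale_pos_weight parameter. Whether rebalancing helps on this dataset is not something the source establishes.

```python
import numpy as np

# Labels with the same class counts as the output above: 62 ones, 15 zeros.
y = np.array([1.0] * 62 + [0.0] * 15)

# Accuracy of a "model" that always predicts the majority class.
majority_class = 1.0 if (y == 1).sum() >= (y == 0).sum() else 0.0
baseline_accuracy = np.mean(y == majority_class)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")

# Ratio of negative to positive examples; XGBoost's scale_pos_weight parameter
# is commonly set to this ratio to rebalance the classes.
neg_to_pos_ratio = (y == 0).sum() / (y == 1).sum()
print(f"negative / positive ratio: {neg_to_pos_ratio:.3f}")
```

Any model evaluated on this data should be compared against that baseline of roughly 80% rather than against 50%.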

In [46]:
# Rescale every feature to the range [-1, 1]; the scaler is fit on all rows here
scaler = MinMaxScaler((-1, 1))
x = scaler.fit_transform(features)
y = label
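
MinMaxScaler((-1, 1)) maps each feature value v to 2 * (v - min) / (max - min) - 1. In the cell above, the minimum and maximum are computed over all rows, including those that later end up in the test set. A common alternative, sketched here on made-up numbers, is to fit the scaler on the training rows only and reuse it for the test rows. For tree-based models such as XGBoost the effect is usually small, because trees are insensitive to monotonic rescaling of a feature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data, already split (illustrative values only).
train = np.array([[0.0], [5.0], [10.0]])
test = np.array([[2.5], [12.0]])

scaler = MinMaxScaler((-1, 1))
train_scaled = scaler.fit_transform(train)   # learns min=0, max=10 from train only
test_scaled = scaler.transform(test)         # reuses the training min and max

print(train_scaled.ravel())   # training values span exactly [-1, 1]
print(test_scaled.ravel())    # test values may fall outside [-1, 1]
```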

In [47]:
print(x)
print(y)

[[-0.91608392 -0.97727273 -0.99152353 ... -1.         -0.29795763
   0.2089199 ]
 [-0.97202797 -0.93181818 -0.99173191 ... -1.         -0.67780111
  -0.60458453]
 [-0.97202797 -0.95454545 -0.99196343 ... -1.         -0.57129223
  -0.66812009]
 ...
 [-0.66433566  1.         -0.96524283 ... -0.37801174 -0.89081886
  -0.85673352]
 [-0.95804196 -0.88636364 -0.96644329 ... -0.37801174 -0.84061844
  -0.90955525]
 [-1.         -1.         -0.96666908 ... -0.37801174 -1.
  -1.        ]]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0.]


In [48]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(61, 29)
(61,)
(16, 29)
(16,)
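
With only 15 negative rows in total, a purely random split can leave very few of them in the 16-row test set. The sketch below shows the stratify argument of train_test_split, which keeps the class ratio roughly equal in both splits. The stand-in data has the project's class counts but placeholder feature values; the original notebook does not use stratification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same class counts as the project (62 ones, 15 zeros);
# the feature values are placeholders.
y = np.array([1.0] * 62 + [0.0] * 15)
x = np.arange(len(y)).reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=7, stratify=y
)

# With stratify=y, both splits keep roughly the original 62:15 class ratio.
print("train:", int((y_train == 1).sum()), "ones,", int((y_train == 0).sum()), "zeros")
print("test: ", int((y_test == 1).sum()), "ones,", int((y_test == 0).sum()), "zeros")
```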


In [70]:
model = XGBClassifier(n_estimators=80)
model.fit(x_train,y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=80, n_jobs=0, num_parallel_tree=1,
              objective='binary:logistic', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [71]:
y_pred=model.predict(x_test)

In [72]:
# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(y_test, y_pred) * 100)

100.0
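
An accuracy of 100% on 16 test rows is a small-sample result, and with 62 positives to 15 negatives, accuracy alone can hide poor performance on the minority class. The sketch below shows per-class evaluation with scikit-learn. It uses made-up labels and predictions, not this model's output, in which every row is predicted as class 1.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Made-up test labels: 13 positives, 3 negatives (illustrative only).
y_true = np.array([1.0] * 13 + [0.0] * 3)
# A degenerate "model" that always predicts the majority class.
y_always_one = np.ones_like(y_true)

acc = accuracy_score(y_true, y_always_one)
bal_acc = balanced_accuracy_score(y_true, y_always_one)
# Rows are true classes (0, 1), columns are predicted classes (0, 1).
cm = confusion_matrix(y_true, y_always_one, labels=[0.0, 1.0])

print(f"accuracy: {acc:.4f}")
print(f"balanced accuracy: {bal_acc:.4f}")
print(cm)
```

Plain accuracy looks respectable here even though no negative case is detected, while balanced accuracy and the confusion matrix expose the problem. Passing the project's y_test and y_pred to the same functions would show how the few negative rows in the test split are classified.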
