# About Dataset

Parkinson's Data Set
This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column which is set to 0 for healthy and 1 for PD.

The data is in ASCII CSV format. The rows of the CSV file contain an instance corresponding to one voice recording. There are around six recordings per patient, the name of the patient is identified in the first column.For further information or to pass on comments, please contact Max Little (little '@' robots.ox.ac.uk).

Further details are contained in the following reference -- if you use this dataset, please cite:
Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering (to appear).

# Attribute Information:

Matrix column entries (attributes):

1.name - ASCII subject name and recording number

2.MDVP:Fo(Hz) - Average vocal fundamental frequency

3.MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
    
4.MDVP:Flo(Hz) - Minimum vocal fundamental frequency
    
5.MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several measures of variation in fundamental frequency
                    
6.MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA - Several measures of variation in amplitude NHR, HNR - Two measures of the ratio of noise to tonal components in the voice

7.status - The health status of the subject (one) - Parkinson's, (zero) - healthy RPDE, D2 - Two nonlinear dynamical complexity measures

8.DFA - Signal fractal scaling exponent

9.spread1,spread2,PPE - Three nonlinear measures of fundamental frequency variation

In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [2]:
df=pd.read_csv('parkinsons.data')

In [3]:
df.head()

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,...,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
0,phon_R01_S01_1,119.992,157.302,74.997,0.00784,7e-05,0.0037,0.00554,0.01109,0.04374,...,0.06545,0.02211,21.033,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,phon_R01_S01_2,122.4,148.65,113.819,0.00968,8e-05,0.00465,0.00696,0.01394,0.06134,...,0.09403,0.01929,19.085,1,0.458359,0.819521,-4.075192,0.33559,2.486855,0.368674
2,phon_R01_S01_3,116.682,131.111,111.555,0.0105,9e-05,0.00544,0.00781,0.01633,0.05233,...,0.0827,0.01309,20.651,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,phon_R01_S01_4,116.676,137.871,111.366,0.00997,9e-05,0.00502,0.00698,0.01505,0.05492,...,0.08771,0.01353,20.644,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,phon_R01_S01_5,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,...,0.1047,0.01767,19.649,1,0.417356,0.823484,-3.747787,0.234513,2.33218,0.410335


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              195 non-null    object 
 1   MDVP:Fo(Hz)       195 non-null    float64
 2   MDVP:Fhi(Hz)      195 non-null    float64
 3   MDVP:Flo(Hz)      195 non-null    float64
 4   MDVP:Jitter(%)    195 non-null    float64
 5   MDVP:Jitter(Abs)  195 non-null    float64
 6   MDVP:RAP          195 non-null    float64
 7   MDVP:PPQ          195 non-null    float64
 8   Jitter:DDP        195 non-null    float64
 9   MDVP:Shimmer      195 non-null    float64
 10  MDVP:Shimmer(dB)  195 non-null    float64
 11  Shimmer:APQ3      195 non-null    float64
 12  Shimmer:APQ5      195 non-null    float64
 13  MDVP:APQ          195 non-null    float64
 14  Shimmer:DDA       195 non-null    float64
 15  NHR               195 non-null    float64
 16  HNR               195 non-null    float64
 1

In [5]:
df=df.drop('name',axis=1)

In [6]:
df.head()

Unnamed: 0,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,MDVP:Shimmer(dB),...,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
0,119.992,157.302,74.997,0.00784,7e-05,0.0037,0.00554,0.01109,0.04374,0.426,...,0.06545,0.02211,21.033,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,122.4,148.65,113.819,0.00968,8e-05,0.00465,0.00696,0.01394,0.06134,0.626,...,0.09403,0.01929,19.085,1,0.458359,0.819521,-4.075192,0.33559,2.486855,0.368674
2,116.682,131.111,111.555,0.0105,9e-05,0.00544,0.00781,0.01633,0.05233,0.482,...,0.0827,0.01309,20.651,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,116.676,137.871,111.366,0.00997,9e-05,0.00502,0.00698,0.01505,0.05492,0.517,...,0.08771,0.01353,20.644,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,0.584,...,0.1047,0.01767,19.649,1,0.417356,0.823484,-3.747787,0.234513,2.33218,0.410335


In [7]:
df.columns

Index(['MDVP:Fo(Hz)', 'MDVP:Fhi(Hz)', 'MDVP:Flo(Hz)', 'MDVP:Jitter(%)',
       'MDVP:Jitter(Abs)', 'MDVP:RAP', 'MDVP:PPQ', 'Jitter:DDP',
       'MDVP:Shimmer', 'MDVP:Shimmer(dB)', 'Shimmer:APQ3', 'Shimmer:APQ5',
       'MDVP:APQ', 'Shimmer:DDA', 'NHR', 'HNR', 'status', 'RPDE', 'DFA',
       'spread1', 'spread2', 'D2', 'PPE'],
      dtype='object')

In [8]:
df=df.rename(columns={'MDVP:Fo(Hz)':'MDVP_Fo_Hz',      'MDVP:Fhi(Hz)':'MDVP_Fhi_Hz',    'MDVP:Flo(Hz)':'MDVP_Flo_Hz',
                      'MDVP:Jitter(%)':'MDVP_Jitter_%','MDVP:Jitter(Abs)':'MDVP_Jitter_Abs', 'MDVP:RAP':'MDVP_RAP', 
                      'MDVP:PPQ':'MDVP_PPQ', 'Jitter:DDP':'Jitter_DDP','MDVP:Shimmer':'MDVP_Shimmer', 
                      'MDVP:Shimmer(dB)':'MDVP_Shimmer_dB', 'Shimmer:APQ3':'Shimmer_APQ_3', 'Shimmer:APQ5':'Shimmer_APQ_5',
       'MDVP:APQ':'MDVP_APQ', 'Shimmer:DDA':'Shimmer_DDA',})

In [9]:
df.columns

Index(['MDVP_Fo_Hz', 'MDVP_Fhi_Hz', 'MDVP_Flo_Hz', 'MDVP_Jitter_%',
       'MDVP_Jitter_Abs', 'MDVP_RAP', 'MDVP_PPQ', 'Jitter_DDP', 'MDVP_Shimmer',
       'MDVP_Shimmer_dB', 'Shimmer_APQ_3', 'Shimmer_APQ_5', 'MDVP_APQ',
       'Shimmer_DDA', 'NHR', 'HNR', 'status', 'RPDE', 'DFA', 'spread1',
       'spread2', 'D2', 'PPE'],
      dtype='object')

In [10]:
from sklearn.model_selection import train_test_split
x=df.drop('status',axis=1)
y=df.status
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.8,random_state=345)

In [11]:
from sklearn.preprocessing import MinMaxScaler
sc=MinMaxScaler()
x_train=sc.fit_transform(x_train)

In [12]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix,accuracy_score
lr=LogisticRegression()
lr.fit(x_train,y_train)
y_pred=lr.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for Logistic Regression',accuracy_score(y_test,y_pred))

[[15  0]
 [24  0]]
Accuracy for Logistic Regression 0.38461538461538464


In [13]:
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,accuracy_score
svc=SVC(class_weight='balanced')
svc.fit(x_train,y_train)
y_pred=svc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for SVC (RBF)',accuracy_score(y_test,y_pred))

[[ 0 15]
 [ 0 24]]
Accuracy for SVC (RBF) 0.6153846153846154


In [14]:
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,accuracy_score
svc=SVC(class_weight='balanced',kernel='linear')
svc.fit(x_train,y_train)
y_pred=svc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for SVC(Linear)',accuracy_score(y_test,y_pred))


[[ 2 13]
 [ 2 22]]
Accuracy for SVC(Linear) 0.6153846153846154


In [15]:
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,accuracy_score
svc=SVC(class_weight='balanced',kernel='poly')
svc.fit(x_train,y_train)
y_pred=svc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for SVC (Poly)',accuracy_score(y_test,y_pred))

[[15  0]
 [24  0]]
Accuracy for SVC (Poly) 0.38461538461538464


In [16]:
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,accuracy_score
svc=SVC(class_weight='balanced',kernel='sigmoid')
svc.fit(x_train,y_train)
y_pred=svc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for SVC(Sigmoid)',accuracy_score(y_test,y_pred))


[[15  0]
 [24  0]]
Accuracy for SVC(Sigmoid) 0.38461538461538464


In [17]:
from sklearn.tree import DecisionTreeClassifier
dt=DecisionTreeClassifier(class_weight='balanced')
dt.fit(x_train,y_train)
y_pred=dt.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for Decision Tree (GINI)',accuracy_score(y_test,y_pred))

[[15  0]
 [12 12]]
Accuracy for Decision Tree (GINI) 0.6923076923076923


In [18]:
from sklearn.tree import DecisionTreeClassifier
dt=DecisionTreeClassifier(class_weight='balanced',criterion='entropy',max_depth=5)
dt.fit(x_train,y_train)
y_pred=dt.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for Decision Tree (ENTROPY)',accuracy_score(y_test,y_pred))

[[15  0]
 [12 12]]
Accuracy for Decision Tree (ENTROPY) 0.6923076923076923


In [19]:
from sklearn.ensemble import RandomForestClassifier
rf=RandomForestClassifier(class_weight='balanced')
rf.fit(x_train,y_train)
y_pred=rf.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for Random Forest',accuracy_score(y_test,y_pred))

[[10  5]
 [ 2 22]]
Accuracy for Random Forest 0.8205128205128205


In [20]:
from sklearn.ensemble import ExtraTreesClassifier
exc=ExtraTreesClassifier(class_weight='balanced')
exc.fit(x_train,y_train)
y_pred=exc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for Extratrees Classifier',accuracy_score(y_test,y_pred))

[[15  0]
 [24  0]]
Accuracy for Extratrees Classifier 0.38461538461538464


In [21]:
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix,accuracy_score
cbc=CatBoostClassifier()
cbc.fit(x_train,y_train)
y_pred=cbc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for CatBoost Classifier',accuracy_score(y_test,y_pred))

Learning rate set to 0.00466
0:	learn: 0.6881011	total: 138ms	remaining: 2m 18s
1:	learn: 0.6847345	total: 142ms	remaining: 1m 10s
2:	learn: 0.6805422	total: 144ms	remaining: 47.9s
3:	learn: 0.6767162	total: 147ms	remaining: 36.5s
4:	learn: 0.6731048	total: 149ms	remaining: 29.7s
5:	learn: 0.6688288	total: 152ms	remaining: 25.1s
6:	learn: 0.6638891	total: 154ms	remaining: 21.9s
7:	learn: 0.6602894	total: 156ms	remaining: 19.4s
8:	learn: 0.6559933	total: 159ms	remaining: 17.5s
9:	learn: 0.6511996	total: 161ms	remaining: 16s
10:	learn: 0.6466554	total: 164ms	remaining: 14.7s
11:	learn: 0.6421386	total: 166ms	remaining: 13.7s
12:	learn: 0.6367405	total: 169ms	remaining: 12.8s
13:	learn: 0.6328004	total: 171ms	remaining: 12.1s
14:	learn: 0.6289093	total: 174ms	remaining: 11.4s
15:	learn: 0.6249447	total: 176ms	remaining: 10.8s
16:	learn: 0.6199186	total: 179ms	remaining: 10.3s
17:	learn: 0.6168762	total: 182ms	remaining: 9.9s
18:	learn: 0.6125609	total: 184ms	remaining: 9.5s
19:	learn: 0.6

210:	learn: 0.2299974	total: 728ms	remaining: 2.72s
211:	learn: 0.2289099	total: 731ms	remaining: 2.72s
212:	learn: 0.2278367	total: 734ms	remaining: 2.71s
213:	learn: 0.2272221	total: 737ms	remaining: 2.71s
214:	learn: 0.2263348	total: 740ms	remaining: 2.7s
215:	learn: 0.2255618	total: 744ms	remaining: 2.7s
216:	learn: 0.2246585	total: 746ms	remaining: 2.69s
217:	learn: 0.2237213	total: 750ms	remaining: 2.69s
218:	learn: 0.2229429	total: 752ms	remaining: 2.68s
219:	learn: 0.2217718	total: 755ms	remaining: 2.68s
220:	learn: 0.2211001	total: 757ms	remaining: 2.67s
221:	learn: 0.2203654	total: 760ms	remaining: 2.66s
222:	learn: 0.2195929	total: 763ms	remaining: 2.66s
223:	learn: 0.2184444	total: 766ms	remaining: 2.65s
224:	learn: 0.2176660	total: 769ms	remaining: 2.65s
225:	learn: 0.2167300	total: 771ms	remaining: 2.64s
226:	learn: 0.2157069	total: 774ms	remaining: 2.63s
227:	learn: 0.2147402	total: 777ms	remaining: 2.63s
228:	learn: 0.2137624	total: 781ms	remaining: 2.63s
229:	learn: 0.

411:	learn: 0.1185485	total: 1.31s	remaining: 1.87s
412:	learn: 0.1180329	total: 1.31s	remaining: 1.87s
413:	learn: 0.1176589	total: 1.31s	remaining: 1.86s
414:	learn: 0.1172604	total: 1.32s	remaining: 1.86s
415:	learn: 0.1168058	total: 1.32s	remaining: 1.85s
416:	learn: 0.1165343	total: 1.32s	remaining: 1.85s
417:	learn: 0.1162626	total: 1.33s	remaining: 1.85s
418:	learn: 0.1157271	total: 1.33s	remaining: 1.84s
419:	learn: 0.1153760	total: 1.33s	remaining: 1.84s
420:	learn: 0.1150252	total: 1.34s	remaining: 1.84s
421:	learn: 0.1148463	total: 1.34s	remaining: 1.83s
422:	learn: 0.1144175	total: 1.34s	remaining: 1.83s
423:	learn: 0.1140987	total: 1.34s	remaining: 1.83s
424:	learn: 0.1138189	total: 1.35s	remaining: 1.82s
425:	learn: 0.1134455	total: 1.35s	remaining: 1.82s
426:	learn: 0.1131978	total: 1.35s	remaining: 1.81s
427:	learn: 0.1130228	total: 1.35s	remaining: 1.81s
428:	learn: 0.1127855	total: 1.36s	remaining: 1.81s
429:	learn: 0.1123860	total: 1.36s	remaining: 1.8s
430:	learn: 0

577:	learn: 0.0786883	total: 1.77s	remaining: 1.29s
578:	learn: 0.0785285	total: 1.77s	remaining: 1.29s
579:	learn: 0.0784547	total: 1.78s	remaining: 1.29s
580:	learn: 0.0782901	total: 1.78s	remaining: 1.28s
581:	learn: 0.0780911	total: 1.78s	remaining: 1.28s
582:	learn: 0.0779545	total: 1.79s	remaining: 1.28s
583:	learn: 0.0778301	total: 1.79s	remaining: 1.27s
584:	learn: 0.0776221	total: 1.79s	remaining: 1.27s
585:	learn: 0.0773465	total: 1.8s	remaining: 1.27s
586:	learn: 0.0771081	total: 1.8s	remaining: 1.27s
587:	learn: 0.0769039	total: 1.8s	remaining: 1.26s
588:	learn: 0.0766870	total: 1.8s	remaining: 1.26s
589:	learn: 0.0764308	total: 1.81s	remaining: 1.26s
590:	learn: 0.0763253	total: 1.81s	remaining: 1.25s
591:	learn: 0.0761062	total: 1.81s	remaining: 1.25s
592:	learn: 0.0759750	total: 1.82s	remaining: 1.25s
593:	learn: 0.0758295	total: 1.82s	remaining: 1.24s
594:	learn: 0.0756213	total: 1.82s	remaining: 1.24s
595:	learn: 0.0755340	total: 1.82s	remaining: 1.24s
596:	learn: 0.07

755:	learn: 0.0542113	total: 2.28s	remaining: 736ms
756:	learn: 0.0541119	total: 2.28s	remaining: 733ms
757:	learn: 0.0540029	total: 2.29s	remaining: 730ms
758:	learn: 0.0538460	total: 2.29s	remaining: 727ms
759:	learn: 0.0537289	total: 2.29s	remaining: 724ms
760:	learn: 0.0536533	total: 2.3s	remaining: 721ms
761:	learn: 0.0535373	total: 2.3s	remaining: 718ms
762:	learn: 0.0534354	total: 2.3s	remaining: 715ms
763:	learn: 0.0533742	total: 2.3s	remaining: 712ms
764:	learn: 0.0532901	total: 2.31s	remaining: 709ms
765:	learn: 0.0531719	total: 2.31s	remaining: 706ms
766:	learn: 0.0531062	total: 2.31s	remaining: 703ms
767:	learn: 0.0529924	total: 2.32s	remaining: 700ms
768:	learn: 0.0529374	total: 2.32s	remaining: 697ms
769:	learn: 0.0528108	total: 2.32s	remaining: 693ms
770:	learn: 0.0527563	total: 2.32s	remaining: 690ms
771:	learn: 0.0526372	total: 2.33s	remaining: 687ms
772:	learn: 0.0525576	total: 2.33s	remaining: 684ms
773:	learn: 0.0525108	total: 2.33s	remaining: 681ms
774:	learn: 0.05

924:	learn: 0.0401552	total: 2.77s	remaining: 224ms
925:	learn: 0.0400888	total: 2.77s	remaining: 222ms
926:	learn: 0.0400077	total: 2.77s	remaining: 219ms
927:	learn: 0.0399646	total: 2.78s	remaining: 216ms
928:	learn: 0.0398967	total: 2.78s	remaining: 213ms
929:	learn: 0.0398374	total: 2.79s	remaining: 210ms
930:	learn: 0.0397491	total: 2.79s	remaining: 207ms
931:	learn: 0.0396886	total: 2.79s	remaining: 204ms
932:	learn: 0.0396295	total: 2.79s	remaining: 201ms
933:	learn: 0.0395493	total: 2.8s	remaining: 198ms
934:	learn: 0.0394666	total: 2.8s	remaining: 195ms
935:	learn: 0.0394188	total: 2.81s	remaining: 192ms
936:	learn: 0.0393351	total: 2.81s	remaining: 189ms
937:	learn: 0.0392625	total: 2.81s	remaining: 186ms
938:	learn: 0.0392125	total: 2.81s	remaining: 183ms
939:	learn: 0.0391375	total: 2.82s	remaining: 180ms
940:	learn: 0.0390712	total: 2.82s	remaining: 177ms
941:	learn: 0.0389888	total: 2.82s	remaining: 174ms
942:	learn: 0.0389425	total: 2.83s	remaining: 171ms
943:	learn: 0.

In [22]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.8,random_state=345)
from imblearn.over_sampling import SMOTE
sm=SMOTE()
x_train,y_train=sm.fit_resample(x_train,y_train)

In [23]:
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix,accuracy_score
cbc=CatBoostClassifier()
cbc.fit(x_train,y_train)
y_pred=cbc.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for CatBoost Classifier',accuracy_score(y_test,y_pred))

Learning rate set to 0.005661
0:	learn: 0.6870732	total: 4.25ms	remaining: 4.24s
1:	learn: 0.6816596	total: 7.29ms	remaining: 3.64s
2:	learn: 0.6766889	total: 10.3ms	remaining: 3.42s
3:	learn: 0.6696515	total: 13.9ms	remaining: 3.47s
4:	learn: 0.6621713	total: 17.2ms	remaining: 3.42s
5:	learn: 0.6553533	total: 20.3ms	remaining: 3.36s
6:	learn: 0.6500209	total: 23.2ms	remaining: 3.29s
7:	learn: 0.6453391	total: 26.4ms	remaining: 3.27s
8:	learn: 0.6408097	total: 29.9ms	remaining: 3.29s
9:	learn: 0.6345228	total: 33.2ms	remaining: 3.29s
10:	learn: 0.6288013	total: 36.7ms	remaining: 3.3s
11:	learn: 0.6213898	total: 40.2ms	remaining: 3.31s
12:	learn: 0.6170566	total: 44.5ms	remaining: 3.38s
13:	learn: 0.6129461	total: 49.4ms	remaining: 3.48s
14:	learn: 0.6086534	total: 53.6ms	remaining: 3.52s
15:	learn: 0.6041162	total: 58.6ms	remaining: 3.6s
16:	learn: 0.5995665	total: 63.5ms	remaining: 3.67s
17:	learn: 0.5945309	total: 68.2ms	remaining: 3.72s
18:	learn: 0.5894277	total: 72.3ms	remaining: 

203:	learn: 0.1813418	total: 821ms	remaining: 3.2s
204:	learn: 0.1808146	total: 826ms	remaining: 3.2s
205:	learn: 0.1798508	total: 832ms	remaining: 3.21s
206:	learn: 0.1790383	total: 836ms	remaining: 3.2s
207:	learn: 0.1780762	total: 842ms	remaining: 3.2s
208:	learn: 0.1773073	total: 845ms	remaining: 3.2s
209:	learn: 0.1762350	total: 849ms	remaining: 3.19s
210:	learn: 0.1756751	total: 853ms	remaining: 3.19s
211:	learn: 0.1749544	total: 858ms	remaining: 3.19s
212:	learn: 0.1740288	total: 862ms	remaining: 3.19s
213:	learn: 0.1730763	total: 867ms	remaining: 3.18s
214:	learn: 0.1721105	total: 871ms	remaining: 3.18s
215:	learn: 0.1707987	total: 875ms	remaining: 3.18s
216:	learn: 0.1697675	total: 880ms	remaining: 3.17s
217:	learn: 0.1689545	total: 885ms	remaining: 3.17s
218:	learn: 0.1679753	total: 889ms	remaining: 3.17s
219:	learn: 0.1675003	total: 893ms	remaining: 3.17s
220:	learn: 0.1666633	total: 897ms	remaining: 3.16s
221:	learn: 0.1661395	total: 901ms	remaining: 3.16s
222:	learn: 0.165

400:	learn: 0.0840759	total: 1.59s	remaining: 2.38s
401:	learn: 0.0837756	total: 1.6s	remaining: 2.38s
402:	learn: 0.0836020	total: 1.6s	remaining: 2.37s
403:	learn: 0.0832705	total: 1.61s	remaining: 2.37s
404:	learn: 0.0828833	total: 1.62s	remaining: 2.38s
405:	learn: 0.0826882	total: 1.62s	remaining: 2.38s
406:	learn: 0.0823843	total: 1.63s	remaining: 2.37s
407:	learn: 0.0821787	total: 1.63s	remaining: 2.37s
408:	learn: 0.0818896	total: 1.64s	remaining: 2.37s
409:	learn: 0.0815078	total: 1.64s	remaining: 2.36s
410:	learn: 0.0813461	total: 1.65s	remaining: 2.36s
411:	learn: 0.0811127	total: 1.65s	remaining: 2.36s
412:	learn: 0.0807883	total: 1.66s	remaining: 2.35s
413:	learn: 0.0804866	total: 1.66s	remaining: 2.35s
414:	learn: 0.0801625	total: 1.66s	remaining: 2.35s
415:	learn: 0.0799234	total: 1.67s	remaining: 2.34s
416:	learn: 0.0795783	total: 1.67s	remaining: 2.34s
417:	learn: 0.0792556	total: 1.68s	remaining: 2.33s
418:	learn: 0.0789220	total: 1.68s	remaining: 2.33s
419:	learn: 0.

602:	learn: 0.0477154	total: 2.38s	remaining: 1.57s
603:	learn: 0.0475787	total: 2.39s	remaining: 1.57s
604:	learn: 0.0474520	total: 2.39s	remaining: 1.56s
605:	learn: 0.0473518	total: 2.4s	remaining: 1.56s
606:	learn: 0.0472334	total: 2.4s	remaining: 1.55s
607:	learn: 0.0470942	total: 2.41s	remaining: 1.55s
608:	learn: 0.0469631	total: 2.41s	remaining: 1.55s
609:	learn: 0.0468216	total: 2.41s	remaining: 1.54s
610:	learn: 0.0467257	total: 2.42s	remaining: 1.54s
611:	learn: 0.0466487	total: 2.42s	remaining: 1.53s
612:	learn: 0.0465836	total: 2.42s	remaining: 1.53s
613:	learn: 0.0464751	total: 2.43s	remaining: 1.53s
614:	learn: 0.0463390	total: 2.43s	remaining: 1.52s
615:	learn: 0.0461581	total: 2.44s	remaining: 1.52s
616:	learn: 0.0461073	total: 2.44s	remaining: 1.51s
617:	learn: 0.0459859	total: 2.44s	remaining: 1.51s
618:	learn: 0.0458479	total: 2.45s	remaining: 1.51s
619:	learn: 0.0457274	total: 2.45s	remaining: 1.5s
620:	learn: 0.0455725	total: 2.45s	remaining: 1.5s
621:	learn: 0.04

808:	learn: 0.0301789	total: 3.17s	remaining: 749ms
809:	learn: 0.0301484	total: 3.18s	remaining: 745ms
810:	learn: 0.0300836	total: 3.18s	remaining: 741ms
811:	learn: 0.0300422	total: 3.19s	remaining: 738ms
812:	learn: 0.0299514	total: 3.19s	remaining: 734ms
813:	learn: 0.0298982	total: 3.19s	remaining: 730ms
814:	learn: 0.0298479	total: 3.2s	remaining: 726ms
815:	learn: 0.0298158	total: 3.2s	remaining: 722ms
816:	learn: 0.0297612	total: 3.21s	remaining: 718ms
817:	learn: 0.0297148	total: 3.21s	remaining: 715ms
818:	learn: 0.0296319	total: 3.21s	remaining: 711ms
819:	learn: 0.0295774	total: 3.22s	remaining: 707ms
820:	learn: 0.0295234	total: 3.22s	remaining: 703ms
821:	learn: 0.0294570	total: 3.23s	remaining: 699ms
822:	learn: 0.0293773	total: 3.23s	remaining: 695ms
823:	learn: 0.0293076	total: 3.23s	remaining: 691ms
824:	learn: 0.0292593	total: 3.24s	remaining: 687ms
825:	learn: 0.0292242	total: 3.24s	remaining: 683ms
826:	learn: 0.0291833	total: 3.25s	remaining: 679ms
827:	learn: 0.

[[15  0]
 [ 0 24]]
Accuracy for CatBoost Classifier 1.0


In [24]:
fea_imp=pd.DataFrame()
fea_imp['Features']=x_train.columns
fea_imp['Importance']=rf.feature_importances_


In [25]:
fea_imp=fea_imp.sort_values('Importance',ascending=False)

In [26]:
fea_imp

Unnamed: 0,Features,Importance
21,PPE,0.119725
18,spread1,0.099142
0,MDVP_Fo_Hz,0.068854
8,MDVP_Shimmer,0.064113
12,MDVP_APQ,0.058507
11,Shimmer_APQ_5,0.053558
19,spread2,0.048508
7,Jitter_DDP,0.047557
13,Shimmer_DDA,0.04281
1,MDVP_Fhi_Hz,0.041611


In [27]:
fea_imp=fea_imp.head(9)

In [28]:
fea_imp

Unnamed: 0,Features,Importance
21,PPE,0.119725
18,spread1,0.099142
0,MDVP_Fo_Hz,0.068854
8,MDVP_Shimmer,0.064113
12,MDVP_APQ,0.058507
11,Shimmer_APQ_5,0.053558
19,spread2,0.048508
7,Jitter_DDP,0.047557
13,Shimmer_DDA,0.04281


In [29]:
l1=list(fea_imp.Features)
l1

['PPE',
 'spread1',
 'MDVP_Fo_Hz',
 'MDVP_Shimmer',
 'MDVP_APQ',
 'Shimmer_APQ_5',
 'spread2',
 'Jitter_DDP',
 'Shimmer_DDA']

In [30]:
l1.append('status')

In [31]:
l1

['PPE',
 'spread1',
 'MDVP_Fo_Hz',
 'MDVP_Shimmer',
 'MDVP_APQ',
 'Shimmer_APQ_5',
 'spread2',
 'Jitter_DDP',
 'Shimmer_DDA',
 'status']

In [32]:
df=df.loc[:,['PPE',
 'MDVP_APQ',
 'spread2',
 'MDVP_Fhi_Hz',
 'MDVP_Fo_Hz',
 'Shimmer_APQ_5',
 'Jitter_DDP',
 'RPDE',
 'status']]

In [33]:
df.head()

Unnamed: 0,PPE,MDVP_APQ,spread2,MDVP_Fhi_Hz,MDVP_Fo_Hz,Shimmer_APQ_5,Jitter_DDP,RPDE,status
0,0.284654,0.02971,0.266482,157.302,119.992,0.0313,0.01109,0.414783,1
1,0.368674,0.04368,0.33559,148.65,122.4,0.04518,0.01394,0.458359,1
2,0.332634,0.0359,0.311173,131.111,116.682,0.03858,0.01633,0.429895,1
3,0.368975,0.03772,0.334147,137.871,116.676,0.04005,0.01505,0.434969,1
4,0.410335,0.04465,0.234513,141.781,116.014,0.04825,0.01966,0.417356,1


In [34]:
df[df.status==0]

Unnamed: 0,PPE,MDVP_APQ,spread2,MDVP_Fhi_Hz,MDVP_Fo_Hz,Shimmer_APQ_5,Jitter_DDP,RPDE,status
30,0.085569,0.00802,0.177551,206.896,197.076,0.0068,0.00498,0.422229,0
31,0.068501,0.00762,0.173319,209.512,199.228,0.00641,0.00402,0.432439,0
32,0.09632,0.00951,0.175181,215.203,198.383,0.00825,0.00339,0.465946,0
33,0.056141,0.00719,0.17854,211.604,202.266,0.00606,0.00278,0.368535,0
34,0.044539,0.00726,0.163519,211.526,203.184,0.0061,0.00283,0.340068,0
35,0.05761,0.00957,0.170183,210.565,201.464,0.0076,0.00314,0.344252,0
42,0.095032,0.01133,0.098648,247.326,237.226,0.01024,0.00507,0.305062,0
43,0.117399,0.01251,0.158266,248.834,241.404,0.01038,0.0047,0.457702,0
44,0.09147,0.01033,0.091608,250.912,243.439,0.00898,0.00327,0.438296,0
45,0.102706,0.01014,0.102083,255.034,242.852,0.00879,0.0035,0.431285,0


In [35]:
df.columns

Index(['PPE', 'MDVP_APQ', 'spread2', 'MDVP_Fhi_Hz', 'MDVP_Fo_Hz',
       'Shimmer_APQ_5', 'Jitter_DDP', 'RPDE', 'status'],
      dtype='object')

In [36]:
df.columns

Index(['PPE', 'MDVP_APQ', 'spread2', 'MDVP_Fhi_Hz', 'MDVP_Fo_Hz',
       'Shimmer_APQ_5', 'Jitter_DDP', 'RPDE', 'status'],
      dtype='object')

In [37]:
df.shape

(195, 9)

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   PPE            195 non-null    float64
 1   MDVP_APQ       195 non-null    float64
 2   spread2        195 non-null    float64
 3   MDVP_Fhi_Hz    195 non-null    float64
 4   MDVP_Fo_Hz     195 non-null    float64
 5   Shimmer_APQ_5  195 non-null    float64
 6   Jitter_DDP     195 non-null    float64
 7   RPDE           195 non-null    float64
 8   status         195 non-null    int64  
dtypes: float64(8), int64(1)
memory usage: 13.8 KB


In [39]:
df.describe()

Unnamed: 0,PPE,MDVP_APQ,spread2,MDVP_Fhi_Hz,MDVP_Fo_Hz,Shimmer_APQ_5,Jitter_DDP,RPDE,status
count,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0
mean,0.206552,0.024081,0.22651,197.104918,154.228641,0.017878,0.00992,0.498536,0.753846
std,0.090119,0.016947,0.083406,91.491548,41.390065,0.012024,0.008903,0.103942,0.431878
min,0.044539,0.00719,0.006274,102.145,88.333,0.0057,0.00204,0.25657,0.0
25%,0.137451,0.01308,0.174351,134.8625,117.572,0.00958,0.004985,0.421306,1.0
50%,0.194052,0.01826,0.218885,175.829,148.79,0.01347,0.00749,0.495954,1.0
75%,0.25298,0.0294,0.279234,224.2055,182.769,0.02238,0.011505,0.587562,1.0
max,0.527367,0.13778,0.450493,592.03,260.105,0.0794,0.06433,0.685151,1.0


In [40]:
x=df.drop('status',axis=1)
y=df.status
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.8,random_state=345)

In [41]:
x_train.shape,x_test.shape,y_train.shape,y_test.shape

((156, 8), (39, 8), (156,), (39,))

In [42]:
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix,accuracy_score
cbc_f=CatBoostClassifier()
cbc_f.fit(x_train,y_train)
y_pred=cbc_f.predict(x_test)
tab1=confusion_matrix(y_test,y_pred)
print(tab1)
print('Accuracy for CatBoost Classifier',accuracy_score(y_test,y_pred))

Learning rate set to 0.00466
0:	learn: 0.6876839	total: 3.29ms	remaining: 3.28s
1:	learn: 0.6815775	total: 5.57ms	remaining: 2.78s
2:	learn: 0.6768241	total: 7.06ms	remaining: 2.35s
3:	learn: 0.6716573	total: 8.46ms	remaining: 2.1s
4:	learn: 0.6674321	total: 9.87ms	remaining: 1.96s
5:	learn: 0.6631072	total: 11.4ms	remaining: 1.88s
6:	learn: 0.6582131	total: 13.1ms	remaining: 1.86s
7:	learn: 0.6536570	total: 15.1ms	remaining: 1.87s
8:	learn: 0.6478448	total: 17.1ms	remaining: 1.89s
9:	learn: 0.6430525	total: 18.6ms	remaining: 1.84s
10:	learn: 0.6392828	total: 20.1ms	remaining: 1.81s
11:	learn: 0.6348478	total: 21.7ms	remaining: 1.78s
12:	learn: 0.6299499	total: 23.1ms	remaining: 1.76s
13:	learn: 0.6258898	total: 24.7ms	remaining: 1.74s
14:	learn: 0.6209675	total: 26.2ms	remaining: 1.72s
15:	learn: 0.6154484	total: 28.6ms	remaining: 1.76s
16:	learn: 0.6115008	total: 31.5ms	remaining: 1.82s
17:	learn: 0.6077928	total: 34.4ms	remaining: 1.88s
18:	learn: 0.6036503	total: 37.1ms	remaining: 

179:	learn: 0.2524296	total: 387ms	remaining: 1.76s
180:	learn: 0.2514719	total: 390ms	remaining: 1.76s
181:	learn: 0.2508238	total: 392ms	remaining: 1.76s
182:	learn: 0.2498880	total: 394ms	remaining: 1.76s
183:	learn: 0.2489447	total: 397ms	remaining: 1.76s
184:	learn: 0.2478912	total: 399ms	remaining: 1.76s
185:	learn: 0.2470403	total: 402ms	remaining: 1.76s
186:	learn: 0.2455784	total: 405ms	remaining: 1.76s
187:	learn: 0.2441542	total: 408ms	remaining: 1.76s
188:	learn: 0.2430752	total: 411ms	remaining: 1.76s
189:	learn: 0.2418202	total: 413ms	remaining: 1.76s
190:	learn: 0.2406909	total: 416ms	remaining: 1.76s
191:	learn: 0.2392749	total: 418ms	remaining: 1.76s
192:	learn: 0.2379493	total: 420ms	remaining: 1.76s
193:	learn: 0.2370015	total: 422ms	remaining: 1.75s
194:	learn: 0.2357788	total: 424ms	remaining: 1.75s
195:	learn: 0.2352088	total: 426ms	remaining: 1.75s
196:	learn: 0.2346032	total: 427ms	remaining: 1.74s
197:	learn: 0.2334675	total: 429ms	remaining: 1.74s
198:	learn: 

356:	learn: 0.1386155	total: 759ms	remaining: 1.37s
357:	learn: 0.1382146	total: 762ms	remaining: 1.37s
358:	learn: 0.1379413	total: 764ms	remaining: 1.36s
359:	learn: 0.1374528	total: 766ms	remaining: 1.36s
360:	learn: 0.1371197	total: 768ms	remaining: 1.36s
361:	learn: 0.1367066	total: 771ms	remaining: 1.36s
362:	learn: 0.1362609	total: 773ms	remaining: 1.36s
363:	learn: 0.1358896	total: 776ms	remaining: 1.35s
364:	learn: 0.1354599	total: 778ms	remaining: 1.35s
365:	learn: 0.1350672	total: 780ms	remaining: 1.35s
366:	learn: 0.1347086	total: 781ms	remaining: 1.35s
367:	learn: 0.1343255	total: 783ms	remaining: 1.34s
368:	learn: 0.1339100	total: 785ms	remaining: 1.34s
369:	learn: 0.1336862	total: 786ms	remaining: 1.34s
370:	learn: 0.1333699	total: 788ms	remaining: 1.34s
371:	learn: 0.1330469	total: 790ms	remaining: 1.33s
372:	learn: 0.1324920	total: 792ms	remaining: 1.33s
373:	learn: 0.1318285	total: 794ms	remaining: 1.33s
374:	learn: 0.1316070	total: 796ms	remaining: 1.33s
375:	learn: 

533:	learn: 0.0884805	total: 1.09s	remaining: 952ms
534:	learn: 0.0883165	total: 1.09s	remaining: 951ms
535:	learn: 0.0881654	total: 1.1s	remaining: 949ms
536:	learn: 0.0879841	total: 1.1s	remaining: 947ms
537:	learn: 0.0877285	total: 1.1s	remaining: 946ms
538:	learn: 0.0875519	total: 1.1s	remaining: 944ms
539:	learn: 0.0873054	total: 1.11s	remaining: 942ms
540:	learn: 0.0869347	total: 1.11s	remaining: 941ms
541:	learn: 0.0866802	total: 1.11s	remaining: 939ms
542:	learn: 0.0865048	total: 1.11s	remaining: 938ms
543:	learn: 0.0863418	total: 1.11s	remaining: 935ms
544:	learn: 0.0861979	total: 1.12s	remaining: 933ms
545:	learn: 0.0860139	total: 1.12s	remaining: 931ms
546:	learn: 0.0858458	total: 1.12s	remaining: 929ms
547:	learn: 0.0856329	total: 1.12s	remaining: 927ms
548:	learn: 0.0854512	total: 1.13s	remaining: 925ms
549:	learn: 0.0851951	total: 1.13s	remaining: 924ms
550:	learn: 0.0850308	total: 1.13s	remaining: 922ms
551:	learn: 0.0848662	total: 1.13s	remaining: 920ms
552:	learn: 0.08

729:	learn: 0.0594275	total: 1.46s	remaining: 542ms
730:	learn: 0.0593241	total: 1.47s	remaining: 540ms
731:	learn: 0.0591838	total: 1.47s	remaining: 538ms
732:	learn: 0.0590506	total: 1.47s	remaining: 537ms
733:	learn: 0.0589933	total: 1.48s	remaining: 535ms
734:	learn: 0.0588955	total: 1.48s	remaining: 533ms
735:	learn: 0.0588012	total: 1.48s	remaining: 531ms
736:	learn: 0.0587411	total: 1.48s	remaining: 529ms
737:	learn: 0.0586450	total: 1.49s	remaining: 528ms
738:	learn: 0.0585018	total: 1.49s	remaining: 526ms
739:	learn: 0.0584125	total: 1.49s	remaining: 523ms
740:	learn: 0.0582606	total: 1.49s	remaining: 521ms
741:	learn: 0.0581155	total: 1.49s	remaining: 519ms
742:	learn: 0.0579850	total: 1.5s	remaining: 518ms
743:	learn: 0.0579287	total: 1.5s	remaining: 516ms
744:	learn: 0.0577887	total: 1.5s	remaining: 513ms
745:	learn: 0.0577017	total: 1.5s	remaining: 511ms
746:	learn: 0.0575997	total: 1.5s	remaining: 509ms
747:	learn: 0.0575154	total: 1.51s	remaining: 507ms
748:	learn: 0.057

938:	learn: 0.0416708	total: 1.87s	remaining: 121ms
939:	learn: 0.0416396	total: 1.87s	remaining: 119ms
940:	learn: 0.0415526	total: 1.87s	remaining: 117ms
941:	learn: 0.0414573	total: 1.87s	remaining: 115ms
942:	learn: 0.0413942	total: 1.88s	remaining: 113ms
943:	learn: 0.0413350	total: 1.88s	remaining: 112ms
944:	learn: 0.0412565	total: 1.88s	remaining: 110ms
945:	learn: 0.0411797	total: 1.9s	remaining: 108ms
946:	learn: 0.0410980	total: 1.9s	remaining: 106ms
947:	learn: 0.0410513	total: 1.9s	remaining: 104ms
948:	learn: 0.0410016	total: 1.9s	remaining: 102ms
949:	learn: 0.0409539	total: 1.91s	remaining: 100ms
950:	learn: 0.0408667	total: 1.91s	remaining: 98.3ms
951:	learn: 0.0407959	total: 1.91s	remaining: 96.3ms
952:	learn: 0.0407307	total: 1.91s	remaining: 94.3ms
953:	learn: 0.0406433	total: 1.92s	remaining: 92.3ms
954:	learn: 0.0405998	total: 1.92s	remaining: 90.3ms
955:	learn: 0.0405594	total: 1.92s	remaining: 88.3ms
956:	learn: 0.0404919	total: 1.92s	remaining: 86.3ms
957:	lear

In [43]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report,precision_score,recall_score,f1_score

In [44]:
print('Accuracy for CatBoost Classifier',accuracy_score(y_test,y_pred))
print('Precision for CatBoost Classifier',precision_score(y_test,y_pred))
print('Recall for CatBoost Classifier',recall_score(y_test,y_pred))
print('F1 Score for CatBoost Classifier',f1_score(y_test,y_pred))

Accuracy for CatBoost Classifier 0.9743589743589743
Precision for CatBoost Classifier 0.96
Recall for CatBoost Classifier 1.0
F1 Score for CatBoost Classifier 0.9795918367346939


In [45]:
import pickle
import numpy as np
filename='model.pkl'
pickle.dump(cbc_f,open(filename,'wb'))

In [46]:
model=pickle.load(open('model.pkl','rb'))