# Introduction 

Neurodegenerative diseases are a heterogeneous group of disorders characterized
by the progressive degeneration of the structure and function of the nervous system. They
are incurable and debilitating conditions, and those that impair mental functioning
are called dementias.

Neurodegenerative diseases affect millions of people worldwide. Alzheimer’s disease and
Parkinson’s disease are the most common neurodegenerative diseases. In 2016, an
estimated 5.4 million Americans were living with Alzheimer’s disease. An estimated
930,000 people in the United States could be living with Parkinson’s disease by 2020.

The goal of this project is to build a model that accurately predicts the presence of a
neurodegenerative disease in an individual. Early detection could help identify people
who can participate in trials of neuroprotective agents and, ultimately, help halt
disease progression once effective disease-modifying interventions have been identified.

Fortunately, over the last couple of months you have been acquiring the skills needed to
build machine learning models in Python. Now it is time to put those skills to the test!


For this task, you should do the following:

* Obtain any neurodegenerative disease dataset of your choice from a reliable source, for example the [UCI ML Parkinson’s dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/). Perform exploratory data analysis on this dataset.
* Highlight at least 3 important questions that you would like to address, which will provide information to the public and help to create awareness about neurodegenerative diseases.
* Carry out a detailed data analysis to answer your questions above. Your analysis should be done in a Jupyter notebook with sections, headings and comments that aid readability.
* Build a model that accurately predicts the presence of a neurodegenerative disease in an individual, using the dataset you obtained in the first step. The Python libraries scikit-learn, numpy, pandas, and xgboost might be useful. Remember to evaluate the accuracy of your model.
* Submit your complete notebook to your GitHub repository and provide the link here. Submission is due on or before August 7, 2020.
* Write a blog post on neurodegenerative diseases, your dataset, the analysis you performed, the model you built, and your findings and recommendations. Give your blog post a unique title. It should be written here (https://www.datainsightonline.com/blog) with code from your Jupyter notebook embedded, similar to "Python: The Language for Data Science". The due date for completing the blog post is August 14, 2020.
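
The model-building step above can be sketched as follows. This is a minimal baseline under stated assumptions, not the model this notebook ends up with: `train_and_evaluate` is a hypothetical helper name, a random forest from scikit-learn stands in for whichever classifier you choose, and `X` and `y` are assumed to be a numeric feature table and the matching 0/1 labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, test_size=0.25, random_state=0):
    """Fit a baseline classifier and score it on held-out data.

    Hypothetical helper: X holds the numeric features, y the 0/1 labels.
    """
    # Stratify so that both classes appear in the train and the test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "model": model,
        "accuracy": accuracy_score(y_test, y_pred),
        # Per-class precision and recall say more than accuracy alone
        # when the classes are imbalanced.
        "report": classification_report(y_test, y_pred, zero_division=0),
    }


# Possible usage once a dataset is loaded (names are assumptions):
# result = train_and_evaluate(X, y)
# print(result["accuracy"]); print(result["report"])
```

The fitted model is returned as well, so its `feature_importances_` attribute can be inspected afterwards.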



**Questions:**
* How are the features correlated with one another?
* Which feature is most important for predicting Parkinson's disease?
* What recommendations follow from the analysis?
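
One possible way to approach the first two questions is a Pearson correlation matrix. The sketch below is illustrative only: `sample` and `correlation_with_target` are hypothetical names, and the nine rows are copied from the `parkinsons.data` preview printed further down in this notebook, which is far too few to draw conclusions from. With the full file loaded, pass the complete DataFrame instead.

```python
import pandas as pd

# Nine rows copied from the parkinsons.data preview in this notebook.
# Replace `sample` with the full DataFrame for a real analysis.
sample = pd.DataFrame(
    {
        "MDVP:Fo(Hz)": [119.992, 122.400, 116.682, 116.676, 116.014,
                        174.188, 209.516, 174.688, 198.764],
        "spread1": [-4.813031, -4.075192, -4.443179, -4.117501, -3.747787,
                    -6.538586, -6.195325, -6.787197, -6.744577],
        "PPE": [0.284654, 0.368674, 0.332634, 0.368975, 0.410335,
                0.133050, 0.168895, 0.131728, 0.123306],
        "status": [1, 1, 1, 1, 1, 0, 0, 0, 0],
    }
)


def correlation_with_target(frame, target="status"):
    """Pearson correlation of each numeric feature with the target,
    ordered by absolute strength (strongest first)."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order]


feature_corr = sample.corr()                    # feature-by-feature matrix
target_corr = correlation_with_target(sample)   # each feature against status
print(target_corr)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it complements rather than replaces a model-based importance ranking.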

In [1]:
import numpy as np
import pandas as pd

> The dataset's metadata is stored in a separate text file, so we read it programmatically.

In [3]:
# Read and print the dataset metadata
with open('parkinsons.names') as f:
    for line in f:
        # Each line already ends with a newline, so suppress print's own
        print(line, end='')

Title: Parkinsons Disease Data Set

Abstract: Oxford Parkinson's Disease Detection Dataset

-----------------------------------------------------

Data Set Characteristics: Multivariate

Number of Instances: 197

Area: Life

Attribute Characteristics: Real

Number of Attributes: 23

Date Donated: 2008-06-26

Associated Tasks: Classification

Missing Values? N/A

-----------------------------------------------------

Source:

The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders.

-----------------------------------------------------

Data Set Information:

This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds one of 195 voice recording from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to "status" column which is set to 0 for healthy and 1 for PD.



Attribute Information:

Matrix column entries (attributes):

name - ASCII subject name and recording number

MDVP:Fo(Hz) - Average vocal fundamental frequency

MDVP:Fhi(Hz) - Maximum vocal fundamental frequency

MDVP:Flo(Hz) - Minimum vocal fundamental frequency

MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP - Several measures of variation in fundamental frequency

MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA - Several measures of variation in amplitude

NHR,HNR - Two measures of ratio of noise to tonal components in the voice

status - Health status of the subject (one) - Parkinson's, (zero) - healthy

RPDE,D2 - Two nonlinear dynamical complexity measures

DFA - Signal fractal scaling exponent

spread1,spread2,PPE - Three nonlinear measures of fundamental frequency variation 
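
The attribute list above can be turned into column groups, which helps when plotting or modelling one family of measures at a time. This is a sketch under assumptions: `FEATURE_GROUPS` and `split_features_target` are hypothetical names, and the grouping simply mirrors the families in the attribute information.

```python
import pandas as pd

# Column families as listed in the attribute information.
FEATURE_GROUPS = {
    "fundamental_frequency": ["MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)"],
    "jitter": ["MDVP:Jitter(%)", "MDVP:Jitter(Abs)", "MDVP:RAP",
               "MDVP:PPQ", "Jitter:DDP"],
    "shimmer": ["MDVP:Shimmer", "MDVP:Shimmer(dB)", "Shimmer:APQ3",
                "Shimmer:APQ5", "MDVP:APQ", "Shimmer:DDA"],
    "noise": ["NHR", "HNR"],
    "nonlinear": ["RPDE", "D2", "DFA", "spread1", "spread2", "PPE"],
}


def split_features_target(frame, target="status", id_column="name"):
    """Return (X, y): the voice measures and the 0/1 health status.

    The identifier column is dropped because it is a label for the
    recording, not a measurement.
    """
    X = frame.drop(columns=[target, id_column], errors="ignore")
    y = frame[target]
    return X, y


# Possible usage with the loaded data (assumes a DataFrame named df):
# X, y = split_features_target(df)
# jitter_only = X[FEATURE_GROUPS["jitter"]]
```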


In [2]:
# Import the dataframe using Pandas

df = pd.read_csv('parkinsons.data')
df

|  | name | MDVP:Fo(Hz) | MDVP:Fhi(Hz) | MDVP:Flo(Hz) | MDVP:Jitter(%) | MDVP:Jitter(Abs) | MDVP:RAP | MDVP:PPQ | Jitter:DDP | MDVP:Shimmer | ... | Shimmer:DDA | NHR | HNR | status | RPDE | DFA | spread1 | spread2 | D2 | PPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | phon_R01_S01_1 | 119.992 | 157.302 | 74.997 | 0.00784 | 0.00007 | 0.00370 | 0.00554 | 0.01109 | 0.04374 | ... | 0.06545 | 0.02211 | 21.033 | 1 | 0.414783 | 0.815285 | -4.813031 | 0.266482 | 2.301442 | 0.284654 |
| 1 | phon_R01_S01_2 | 122.400 | 148.650 | 113.819 | 0.00968 | 0.00008 | 0.00465 | 0.00696 | 0.01394 | 0.06134 | ... | 0.09403 | 0.01929 | 19.085 | 1 | 0.458359 | 0.819521 | -4.075192 | 0.335590 | 2.486855 | 0.368674 |
| 2 | phon_R01_S01_3 | 116.682 | 131.111 | 111.555 | 0.01050 | 0.00009 | 0.00544 | 0.00781 | 0.01633 | 0.05233 | ... | 0.08270 | 0.01309 | 20.651 | 1 | 0.429895 | 0.825288 | -4.443179 | 0.311173 | 2.342259 | 0.332634 |
| 3 | phon_R01_S01_4 | 116.676 | 137.871 | 111.366 | 0.00997 | 0.00009 | 0.00502 | 0.00698 | 0.01505 | 0.05492 | ... | 0.08771 | 0.01353 | 20.644 | 1 | 0.434969 | 0.819235 | -4.117501 | 0.334147 | 2.405554 | 0.368975 |
| 4 | phon_R01_S01_5 | 116.014 | 141.781 | 110.655 | 0.01284 | 0.00011 | 0.00655 | 0.00908 | 0.01966 | 0.06425 | ... | 0.10470 | 0.01767 | 19.649 | 1 | 0.417356 | 0.823484 | -3.747787 | 0.234513 | 2.332180 | 0.410335 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 190 | phon_R01_S50_2 | 174.188 | 230.978 | 94.261 | 0.00459 | 0.00003 | 0.00263 | 0.00259 | 0.00790 | 0.04087 | ... | 0.07008 | 0.02764 | 19.517 | 0 | 0.448439 | 0.657899 | -6.538586 | 0.121952 | 2.657476 | 0.133050 |
| 191 | phon_R01_S50_3 | 209.516 | 253.017 | 89.488 | 0.00564 | 0.00003 | 0.00331 | 0.00292 | 0.00994 | 0.02751 | ... | 0.04812 | 0.01810 | 19.147 | 0 | 0.431674 | 0.683244 | -6.195325 | 0.129303 | 2.784312 | 0.168895 |
| 192 | phon_R01_S50_4 | 174.688 | 240.005 | 74.287 | 0.01360 | 0.00008 | 0.00624 | 0.00564 | 0.01873 | 0.02308 | ... | 0.03804 | 0.10715 | 17.883 | 0 | 0.407567 | 0.655683 | -6.787197 | 0.158453 | 2.679772 | 0.131728 |
| 193 | phon_R01_S50_5 | 198.764 | 396.961 | 74.904 | 0.00740 | 0.00004 | 0.00370 | 0.00390 | 0.01109 | 0.02296 | ... | 0.03794 | 0.07223 | 19.020 | 0 | 0.451221 | 0.643956 | -6.744577 | 0.207454 | 2.138608 | 0.123306 |
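
Right after loading, a few sanity checks are worth running before any modelling. The helper below is a minimal sketch: `quick_summary` is a hypothetical name, and it assumes that each `name` value has the form `<subject>_<recording number>`, as in `phon_R01_S01_1`, so that several recordings can be traced back to one person.

```python
import pandas as pd


def quick_summary(frame, target="status", id_column="name"):
    """Basic checks to run right after loading the data (hypothetical helper)."""
    # Assumption: names look like 'phon_R01_S01_1', so stripping the last
    # underscore-separated part leaves the subject identifier.
    subjects = frame[id_column].str.rsplit("_", n=1).str[0]
    return {
        "shape": frame.shape,
        "missing_values": int(frame.isna().sum().sum()),
        "class_counts": frame[target].value_counts().to_dict(),
        "n_subjects": int(subjects.nunique()),
    }


# Possible usage with the DataFrame loaded above:
# print(quick_summary(df))
```

The class counts show how balanced the two `status` groups are, and the subject count can be compared with the number of people reported in the dataset description.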
