https://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings

https://en.wikipedia.org/wiki/Unified_Parkinson's_disease_rating_scale

Attribute information
* column 1: subject id
* columns 2-27: features
* features 1-5: Jitter (local), Jitter (local, absolute), Jitter (rap), Jitter (ppq5), Jitter (ddp)
* features 6-11: Shimmer (local), Shimmer (local, dB), Shimmer (apq3), Shimmer (apq5), Shimmer (apq11), Shimmer (dda)
* features 12-14: AC, NTH, HTN
* features 15-19: Median pitch, Mean pitch, Standard deviation, Minimum pitch, Maximum pitch
* features 20-23: Number of pulses, Number of periods, Mean period, Standard deviation of period
* features 24-26: Fraction of locally unvoiced frames, Number of voice breaks, Degree of voice breaks
* column 28: UPDRS (not in the test file)
* column 29: class information

Training Dataset 

Each subject has 26 voice samples including sustained vowels, numbers, words and short sentences. The voice samples in the training data file are given in the following order: 
* sample# - corresponding voice samples 
* 1: sustained vowel (aaa…)
* 2: sustained vowel (ooo…)
* 3: sustained vowel (uuu…)
* 4-13: numbers from 1 to 10 
* 14-17: short sentences 
* 18-26: words 
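
The ordering above can be turned into a small lookup. The sketch below is only an illustration: the function names are hypothetical, and it assumes the training file stores each subject's 26 samples as consecutive rows in the listed order.

```python
# Hypothetical helpers (not part of the notebook): map a 1-based sample number
# within a training subject to the recording type listed above.
def training_sample_type(sample_number):
    if sample_number == 1:
        return "sustained vowel a"
    if sample_number == 2:
        return "sustained vowel o"
    if sample_number == 3:
        return "sustained vowel u"
    if 4 <= sample_number <= 13:
        return "number"
    if 14 <= sample_number <= 17:
        return "short sentence"
    if 18 <= sample_number <= 26:
        return "word"
    raise ValueError("training subjects have samples 1-26, got %r" % (sample_number,))


# Assumption: rows are stored subject by subject, 26 consecutive rows each,
# so a 0-based row index maps to sample number (row_index % 26) + 1.
def sample_number_for_row(row_index, samples_per_subject=26):
    return (row_index % samples_per_subject) + 1


print(training_sample_type(sample_number_for_row(0)))
print(training_sample_type(sample_number_for_row(25)))
```

Such a lookup would make it possible to restrict the training rows to the sustained vowels, which are the only recording types present in the test file.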

Test Dataset

28 PD patients were asked to say only the sustained vowels 'a' and 'o', three times each, for a total of 168 recordings (6 voice samples per subject). The voice samples in the test data file are given in the following order:

sample# - corresponding voice samples 
* 1-3: sustained vowel (aaa…)
* 4-6: sustained vowel (ooo…)


### Configuration of Notebook

In [3]:
import pandas as pd
import tensorflow as tf
import tempfile
import tensorflow.contrib.learn.python.learn as skflow
from sklearn import datasets, metrics
import numpy as np

### Load Datasets

In [4]:
# Initialize Variables
num_runs = 1 # run once, but have infrastructure built for multiple statistical runs
nSteps = 100000 # increase these

LABEL_COLUMN = "class_information"
CLASSIFIER_TYPE = "SLR"

In [8]:
COLUMN_NAMES = ["subject_id", "jitter_local", "jitter_local_absolute", "jitter_rap", "jitter_ppq5",
                 "jitter_ddp","shimmer_local","shimmer_local_db","shimmer_apq3","shimmer_apq5",
                 "shimmer_apq11","shimmer_dda","ac","nth","htn","pitch_median","pitch_mean","pitch_stddev",
                 "pitch_min","pitch_max","number_of_pulses", "number_of_periods", "period_mean",
                 "period_stddev","locally_unvoiced_frames_fraction","number_of_voice_breaks",
                 "degree_of_voice_breaks","updrs","class_information"]

df_train = pd.read_csv('data/train_data.txt', names=COLUMN_NAMES, header=None)

# The test file has the same columns except that it does not contain "updrs"
COLUMN_NAMES.remove("updrs")
df_test = pd.read_csv('data/test_data.txt', names=COLUMN_NAMES, header=None)

# Drop the label and the id so that COLUMN_NAMES holds only the 26 feature columns
COLUMN_NAMES.remove("class_information")
COLUMN_NAMES.remove("subject_id")
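
The cell above reuses one list for three purposes by removing entries in place, so the meaning of `COLUMN_NAMES` changes as the cell runs. A minimal sketch of an alternative that builds separate lists follows; the names `FEATURE_COLUMNS`, `TRAIN_COLUMNS` and `TEST_COLUMNS` are illustrative and do not appear in the notebook.

```python
# Illustrative alternative: one list per purpose instead of mutating a shared list.
FEATURE_COLUMNS = [
    "jitter_local", "jitter_local_absolute", "jitter_rap", "jitter_ppq5", "jitter_ddp",
    "shimmer_local", "shimmer_local_db", "shimmer_apq3", "shimmer_apq5",
    "shimmer_apq11", "shimmer_dda",
    "ac", "nth", "htn",
    "pitch_median", "pitch_mean", "pitch_stddev", "pitch_min", "pitch_max",
    "number_of_pulses", "number_of_periods", "period_mean", "period_stddev",
    "locally_unvoiced_frames_fraction", "number_of_voice_breaks",
    "degree_of_voice_breaks",
]

# Training file: id, 26 features, UPDRS, class label (29 columns).
TRAIN_COLUMNS = ["subject_id"] + FEATURE_COLUMNS + ["updrs", "class_information"]

# Test file: same layout without the UPDRS column (28 columns).
TEST_COLUMNS = [name for name in TRAIN_COLUMNS if name != "updrs"]

print(len(TRAIN_COLUMNS), len(TEST_COLUMNS), len(FEATURE_COLUMNS))
```

With lists like these, `TRAIN_COLUMNS` and `TEST_COLUMNS` could be passed as `names=` to the two `pd.read_csv` calls, and `FEATURE_COLUMNS` could select the model inputs.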

### Function Definitions

In [12]:
def get_data():
    # UPDRS is only recorded in the training file, so use it alone for that label;
    # otherwise combine the training and test files.
    if LABEL_COLUMN == "updrs":
        df = df_train
    else:
        df = pd.concat([df_train, df_test])
    return df

def convert_data():
    df = get_data()
    # Convert the feature columns to a float32 NumPy array
    X = np.array(df[COLUMN_NAMES]).astype(np.float32)
    if CLASSIFIER_TYPE == "SLR":
        # Keep the raw label values
        Y = np.array(df[LABEL_COLUMN]).astype(np.float32)
    else:
        # Map each distinct label value to a class index (sorted so the mapping is reproducible)
        labels = sorted(set(df[LABEL_COLUMN]))
        Y = np.array([labels.index(x) for x in df[LABEL_COLUMN]]).astype(np.float32)

    feature_columns = skflow.infer_real_valued_columns_from_input(X)
    return X, Y, feature_columns

def slice_data():
    # Random 70/30 split over individual records
    df = get_data()
    randomInd = np.random.permutation(df.shape[0])  # number of records (1040 + 168 when both files are combined)
    mid = int(.7 * df.shape[0])  # ~70% for training, ~30% for testing
    trainidx = randomInd[:mid]
    testidx = randomInd[mid:]
    # Number of distinct label values plus one, later passed to the classifier as n_classes
    labelcats = df[LABEL_COLUMN].unique().shape[0] + 1
    return trainidx, testidx, labelcats
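
`slice_data` draws a fresh permutation on every call, so the train/test split changes between runs. A minimal sketch of the same 70/30 index split with a fixed seed follows; `split_indices` and its `seed` parameter are assumptions for illustration, not part of the notebook.

```python
import numpy as np


def split_indices(n_records, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) as disjoint arrays of row positions."""
    rng = np.random.RandomState(seed)      # fixed seed gives a repeatable shuffle
    order = rng.permutation(n_records)     # shuffled row positions 0..n_records-1
    mid = int(train_fraction * n_records)  # same truncation as in slice_data
    return order[:mid], order[mid:]


# 1040 training records plus 168 test records, as in the combined data set.
train_idx, test_idx = split_indices(1040 + 168)
print(len(train_idx), len(test_idx))
```

Note that this split, like the one in `slice_data`, shuffles individual recordings, so samples from the same subject can land on both sides of the split.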

### Model 1 - Deep Neural Network, class_information label

In [13]:
LABEL_COLUMN = "class_information"
X, Y, feature_columns = convert_data()
trainidx, testidx, labelcats = slice_data()
    
classifier = skflow.DNNClassifier(
    hidden_units=[48, 24, 24], 
    n_classes= labelcats, # set to be the number of distinct categories of labels
    feature_columns=feature_columns,
    enable_centered_bias=False,
    model_dir='models/SpeechDataset/DNN/'+LABEL_COLUMN+"_model_1")



In [14]:
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=300)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]
print(metrics.accuracy_score(y_t, y_p))

Instructions for updating:
The default behavior of predict() is changing. The default value for
as_iterable will change to True, and then the flag will be removed
altogether. The behavior of this flag is described below.


0.661157024793


In [15]:
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=300)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]
print(metrics.accuracy_score(y_t, y_p))



0.67217630854


In [16]:
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=1000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]
print(metrics.accuracy_score(y_t, y_p))



0.674931129477
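
An accuracy figure is easier to read next to the score of a trivial predictor. The sketch below computes the accuracy obtained by always predicting the most frequent training label; `majority_baseline_accuracy` is a hypothetical helper and the labels in the example are made up, not taken from the data set.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _count = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / float(len(test_labels))


# Made-up labels purely for illustration.
example_train = [1, 1, 0]
example_test = [1, 0, 1, 1]
print(majority_baseline_accuracy(example_train, example_test))
```

In the notebook this could be called with `Y[trainidx]` and `Y[testidx]` converted to lists, under the assumption that the labels are the 0/1 values of `class_information`.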


### Model 2 - Deep Neural Network, UPDRS label

In [21]:
LABEL_COLUMN = "updrs"
CLASSIFIER_TYPE = "DNN"

X, Y, feature_columns = convert_data()
trainidx, testidx, labelcats = slice_data()
    
classifier = skflow.DNNClassifier(
    hidden_units=[48, 24, 24], 
    n_classes= labelcats, # set to be the number of distinct categories of labels
    feature_columns=feature_columns,
    enable_centered_bias=False,
    model_dir='models/SpeechDataset/DNN/'+LABEL_COLUMN+"_model_1")
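
With `CLASSIFIER_TYPE = "DNN"`, `convert_data` replaces each distinct label value with a class index. A minimal sketch of the same encoding using `np.unique`, which returns the distinct values in sorted order together with the index of each element, is shown below; `encode_labels` is a hypothetical name and the example values are made up.

```python
import numpy as np


def encode_labels(values):
    """Return (classes, encoded) where classes[encoded] reproduces values."""
    classes, encoded = np.unique(np.asarray(values), return_inverse=True)
    return classes, encoded


# Made-up label values purely for illustration.
raw = [23.0, 1.0, 11.0, 1.0, 23.0]
classes, encoded = encode_labels(raw)
print(classes)
print(encoded)
```

Keeping `classes` around makes it possible to translate a predicted class index back to the original label value with `classes[index]`.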



In [22]:
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=1000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]
print(metrics.accuracy_score(y_t, y_p))



0.509615384615


In [23]:
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=1000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]
print(metrics.accuracy_score(y_t, y_p))



0.512820512821


In [24]:
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=1000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]
print(metrics.accuracy_score(y_t, y_p))



0.5


### Model 3 - Simple Linear Regression, UPDRS label

In [27]:
LABEL_COLUMN = "updrs"
CLASSIFIER_TYPE = "SLR"

X, Y, feature_columns = convert_data()
trainidx, testidx, labelcats = slice_data()

### Define classifier - Simple Linear Regression
classifier = skflow.LinearRegressor(
    feature_columns=feature_columns,
    model_dir='models/SpeechDataset/SLR/'+LABEL_COLUMN+"_model_3",
    enable_centered_bias=False)



In [28]:
%%time
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=1000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]

# L2 norm of the residuals divided by the number of test records
print(np.linalg.norm(y_t - y_p) / y_t.shape[0])



0.882544982128


In [29]:
%%time
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=1000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]

# L2 norm of the residuals divided by the number of test records
print(np.linalg.norm(y_t - y_p) / y_t.shape[0])



0.881388248541


In [30]:
%%time
### Fit Model
classifier.fit(X[trainidx, :], Y[trainidx].astype(int), steps=5000)

### Evaluate Model
y_p = classifier.predict(X[testidx, :])
y_t = Y[testidx]

# L2 norm of the residuals divided by the number of test records
print(np.linalg.norm(y_t - y_p) / y_t.shape[0])



0.879742451203
CPU times: user 12.7 s, sys: 1.31 s, total: 14 s
Wall time: 11.8 s
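
The printed value is the L2 norm of the residuals divided by the number of test records `n`, which is not the root-mean-square error (RMSE): the RMSE divides the same norm by `sqrt(n)`, so it is larger by a factor of `sqrt(n)`. A small sketch of the relationship follows; the function names and the example numbers are illustrative only.

```python
import numpy as np


def norm_over_n(y_true, y_pred):
    """The quantity printed in the notebook: ||y_true - y_pred||_2 / n."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.linalg.norm(y_true - y_pred) / y_true.shape[0]


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


# Made-up values: residuals are 3 and -4, so the norm is 5 and n is 2.
y_true = [3.0, 0.0]
y_pred = [0.0, 4.0]
print(norm_over_n(y_true, y_pred))  # 5 / 2
print(rmse(y_true, y_pred))         # 5 / sqrt(2)
```

If the test split holds 312 records (30% of the 1040 training records used for the `updrs` label), a printed value of 0.8797 would correspond to an RMSE of roughly 0.8797 × sqrt(312), or about 15.5 UPDRS points.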


In [33]:
classifier = skflow.LinearRegressor(
    feature_columns=feature_columns,
    model_dir='models/SpeechDataset/SLR/'+LABEL_COLUMN+"_model_3",
    enable_centered_bias=False)

### Predict the label for a single record (row 4) with the re-created estimator
y_p = classifier.predict(X[4, :])
print(y_p)



[ 14.26342964]
