# Assignment: Automated classification to diagnose cardiac Single Photon Emission Computed Tomography (SPECT) images

We will create a classifier that diagnoses a subject's Single Photon Emission Computed Tomography (SPECT) image, determining whether the subject's heart is normal or abnormal. The classifier is based on the data set available here: <a href="https://archive.ics.uci.edu/ml/datasets/SPECT+Heart">Link</a>.

There are 23 attributes recorded for each of 267 subjects. A partial diagnosis that is present is recorded as 1, and one that is absent is recorded as 0. Likewise, an abnormal overall diagnosis is recorded as 1 and a normal overall diagnosis as 0. The following information describes these attributes and the values they can have:

<b>Attribute Information:</b>
1. OVERALL_DIAGNOSIS: 0,1 (class attribute, binary) 
2. F1: 0,1 (the partial diagnosis 1, binary) 
3. F2: 0,1 (the partial diagnosis 2, binary) 
4. F3: 0,1 (the partial diagnosis 3, binary) 
5. F4: 0,1 (the partial diagnosis 4, binary) 
6. F5: 0,1 (the partial diagnosis 5, binary) 
7. F6: 0,1 (the partial diagnosis 6, binary) 
8. F7: 0,1 (the partial diagnosis 7, binary) 
9. F8: 0,1 (the partial diagnosis 8, binary) 
10. F9: 0,1 (the partial diagnosis 9, binary) 
11. F10: 0,1 (the partial diagnosis 10, binary) 
12. F11: 0,1 (the partial diagnosis 11, binary) 
13. F12: 0,1 (the partial diagnosis 12, binary) 
14. F13: 0,1 (the partial diagnosis 13, binary) 
15. F14: 0,1 (the partial diagnosis 14, binary) 
16. F15: 0,1 (the partial diagnosis 15, binary) 
17. F16: 0,1 (the partial diagnosis 16, binary) 
18. F17: 0,1 (the partial diagnosis 17, binary) 
19. F18: 0,1 (the partial diagnosis 18, binary) 
20. F19: 0,1 (the partial diagnosis 19, binary) 
21. F20: 0,1 (the partial diagnosis 20, binary) 
22. F21: 0,1 (the partial diagnosis 21, binary) 
23. F22: 0,1 (the partial diagnosis 22, binary) 

The dataset is divided into:
1. Training data ("SPECT.train" 80 instances)
2. Testing data ("SPECT.test" 187 instances)

We will use this data to build a classifier that diagnoses the cardiac SPECT images of the patients in the test data set described above. The "OVERALL_DIAGNOSIS" attribute will be used as the training target for the classifier.

# Perform the following steps to build your classifier: 

## 1. Open the file 'SPECT.train' in a text editor. 

Note: the extension is ‘.train’, and even though it is a text file, your operating system may not recognize it as one. One option is to change the extension (to something like ‘.txt’). Another option is to start your text editor and open the file through the editor. Either way, opening the file gives you a chance to observe the structure of the data while you read the data file details.

## 2. Read the text file in as a dataframe, and then print it to make sure it was read in correctly. 

This data set does not include column headers. Therefore, when you read in the file, you can either set up the read function so that it does not expect column headers, or supply your own 23-item list of strings to serve as column headers. You can use the list below as a starting point.

['overall_diagnosis', 'partial_1', 'partial_2', 'partial_3', 'partial_4', ...]
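
Typing 23 names by hand is error-prone, so the list can be generated instead. This is a minimal sketch that assumes the `partial_N` naming suggested above; the variable name `column_names` is only illustrative.

```python
# Build the 23 column headers: the class attribute first,
# followed by partial_1 through partial_22.
column_names = ['overall_diagnosis'] + [f'partial_{i}' for i in range(1, 23)]

print(len(column_names))    # 23
print(column_names[:3])     # first three headers
print(column_names[-1])     # partial_22
```

A list built this way can be passed to `pd.read_csv` through its `names` argument, in place of the hand-written list.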

In [94]:
import pandas as pd
spect_train = pd.read_csv('SPECT.train',names=['overall_diagnosis','partial_1','partial_2','partial_3','partial_4','partial_5','partial_6','partial_7','partial_8','partial_9','partial_10','partial_11','partial_12','partial_13','partial_14','partial_15','partial_16','partial_17','partial_18','partial_19','partial_20','partial_21','partial_22'])
print(spect_train)

    overall_diagnosis  partial_1  partial_2  partial_3  partial_4  partial_5  \
0                   1          0          0          0          1          0   
1                   1          0          0          1          1          0   
2                   1          1          0          1          0          1   
3                   1          0          0          0          0          0   
4                   1          0          0          0          0          0   
5                   1          0          0          0          1          0   
6                   1          1          0          1          1          0   
7                   1          0          0          1          0          0   
8                   1          0          0          1          0          0   
9                   1          0          1          0          0          0   
10                  1          1          1          0          0          1   
11                  1          1        

## 3. Create an empty training data array, and call it training_data. 

This should be an array that has 80 rows and 22 columns, and starts filled with all zeros.

In [95]:
import numpy as np
training_data = np.zeros((80,22)) 

## 4. Now fill the array with the training data, using the data frame that you read your data into in step #2. Remember to skip the first column when filling the array, since we are using the overall diagnosis as our training target. Also, print the training_data array to make sure it worked.

To convert a pandas dataframe into a numpy 2D array, select the columns you want and call the dataframe's "to_numpy()" method. (Older tutorials use the "as_matrix" function (https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.as_matrix.html), which was deprecated in pandas 0.23 and removed in pandas 1.0.)

You can specify the columns you want by creating a list with the column names. In this case, we want the columns from 'partial_1' to 'partial_22'.

In [96]:
#Names of the 22 feature columns, partial_1 through partial_22
feature_columns = [f'partial_{i}' for i in range(1, 23)]
#Select only the feature columns and convert them to a numpy 2D array
training_data = spect_train[feature_columns].to_numpy()
print(training_data)

[[0 0 0 ..., 0 0 0]
 [0 0 1 ..., 0 0 1]
 [1 0 1 ..., 0 0 0]
 ..., 
 [1 0 0 ..., 0 0 0]
 [0 0 1 ..., 0 1 1]
 [1 0 0 ..., 0 0 0]]
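
The split between features and target can also be done by position rather than by name. The sketch below uses a few made-up rows in the same headerless, comma-separated layout (with 4 features instead of 22 to keep it short). It assumes a pandas version that provides `to_numpy()` (0.24 or later); the names `toy`, `features`, and `target` are illustrative.

```python
import io
import pandas as pd

# Three made-up rows: one class value followed by four binary features.
raw = "1,0,0,1,1\n0,1,0,0,0\n1,1,1,0,1\n"
names = ['overall_diagnosis'] + [f'partial_{i}' for i in range(1, 5)]
toy = pd.read_csv(io.StringIO(raw), names=names)

# Every column except the first becomes the feature matrix.
features = toy.iloc[:, 1:].to_numpy()
# The first column becomes a 1-D target vector.
target = toy['overall_diagnosis'].to_numpy()

print(features.shape)   # (3, 4)
print(target)           # [1 0 1]
```

Selecting by position avoids retyping the column names, at the cost of relying on the class attribute always being the first column.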


## 5. Open the file 'SPECT.test' in a text editor. 


Note: the extension is ‘.test’, and even though it is a text file, your operating system may not recognize it as one. One option is to change the extension (to something like ‘.txt’). Another option is to start your text editor and open the file through the editor. Either way, opening the file gives you a chance to observe the structure of the data while you read the data file details.

## 6. Read the text file in as a dataframe, and then print it to make sure it was read in correctly. 

This data set does not include column headers. Therefore, when you read in the file, you can either set up the read function so that it does not expect column headers, or supply your own 23-item list of strings to serve as column headers, as you did in step #2.

In [97]:
import pandas as pd
spect_test = pd.read_csv('SPECT.test',names=['overall_diagnosis','partial_1','partial_2','partial_3','partial_4','partial_5','partial_6','partial_7','partial_8','partial_9','partial_10','partial_11','partial_12','partial_13','partial_14','partial_15','partial_16','partial_17','partial_18','partial_19','partial_20','partial_21','partial_22'])
print(spect_test)

     overall_diagnosis  partial_1  partial_2  partial_3  partial_4  partial_5  \
0                    1          1          0          0          1          1   
1                    1          1          0          0          1          1   
2                    1          0          0          0          1          0   
3                    1          0          1          1          1          0   
4                    1          0          0          1          0          0   
5                    1          0          0          1          1          0   
6                    1          1          0          0          1          0   
7                    1          1          0          0          1          0   
8                    1          0          0          0          0          0   
9                    1          1          0          0          1          1   
10                   1          1          0          0          0          1   
11                   1      

## 7. Create an empty test data array, and call it test_data. 

This should be an array that is 187 rows and 22 columns, and starts filled with all zeros.

In [98]:
import numpy as np
test_data = np.zeros((187,22))

## 8. Now fill the array with the test data, using the data frame that you read your data into in step #6.  Also, print the test_data array to make sure it worked. 

To convert a pandas dataframe into a numpy 2D array, select the columns you want and call the dataframe's "to_numpy()" method. (Older tutorials use the "as_matrix" function (https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.as_matrix.html), which was deprecated in pandas 0.23 and removed in pandas 1.0.)

You can specify the columns you want by creating a list with the column names.

In [99]:
#Names of the 22 feature columns, partial_1 through partial_22
feature_columns = [f'partial_{i}' for i in range(1, 23)]
#Select only the feature columns and convert them to a numpy 2D array
test_data = spect_test[feature_columns].to_numpy()
print(test_data)

[[1 0 0 ..., 1 0 0]
 [1 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 1]
 ..., 
 [1 0 1 ..., 0 0 0]
 [1 0 1 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]


## 9. Train your classifier on training_data and overall_diagnosis (as the training target). Then test the created classifier on the test_data you just retrieved to diagnose each test subject's corresponding SPECT image.

### 9a. Create an overall_diagnosis array and fill it with the values from the first column of the training data frame. 

In [100]:
#to_numpy() replaces as_matrix, which was removed in pandas 1.0
overall_diagnosis = spect_train[['overall_diagnosis']].to_numpy()
print(overall_diagnosis)
print(overall_diagnosis.shape)

[[1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]]
(80, 1)


### 9b. Use the k-nearest neighbors classifier with a k-value = 3. Fit the classifier on the training_data array using the overall_diagnosis array as the training target.

In [101]:
from sklearn import neighbors

#k-NN classifier for k=3; weights='distance' weights each neighbor's vote by inverse distance
k3 = neighbors.KNeighborsClassifier(3,weights='distance')
k3.fit(training_data,overall_diagnosis.ravel())

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='distance')
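
To make the classifier's behavior concrete, here is a minimal sketch of distance-weighted k-nearest-neighbors voting for a single query row. It is not scikit-learn's implementation; the function name `knn_predict` and the small data set are made up for illustration, and only numpy is assumed.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Predict a binary label for one query row by distance-weighted vote."""
    # Euclidean distance from the query to every training row.
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Indices of the k closest training rows (stable sort keeps ties in order).
    nearest = np.argsort(dists, kind='stable')[:k]
    # Neighbors at distance 0 would get infinite weight, so in this sketch
    # exact matches decide the vote on their own.
    exact = nearest[dists[nearest] == 0]
    if exact.size:
        votes = np.bincount(train_y[exact], minlength=2)
    else:
        votes = np.bincount(train_y[nearest],
                            weights=1.0 / dists[nearest], minlength=2)
    return int(np.argmax(votes))

# Made-up binary data (not taken from SPECT.train).
train_X = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0]])
train_y = np.array([0, 0, 1, 1])

print(knn_predict(train_X, train_y, np.array([1, 0, 1])))  # 0
print(knn_predict(train_X, train_y, np.array([1, 1, 1])))  # 1
```

In the first query, two neighbors with label 0 (weights 1 and about 0.71) outvote one neighbor with label 1 (weight 1). In the second, the query matches a training row exactly, so that row's label is returned.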

### 9c. Clearly display which of the 187 test subjects will receive an abnormal diagnosis from his or her SPECT image. 

We can determine the overall accuracy, average recall, and average precision of the predictions by comparing them to the overall diagnosis values from the test data set. Therefore, we first create an array of the overall diagnosis values from the test data set and then perform the various computations.

In [102]:
#Predictions of overall diagnosis from the k-classifier
k3_predictions = k3.predict(test_data)

#Predictions from the classifier in a 2d-array
k3_predictions_2d = np.reshape(k3_predictions,(187,-1))
print(k3_predictions_2d)

[[1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [0]
 [1]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [0]
 [1]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]]
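
A long column of 0s and 1s is hard to scan, so one way to display the abnormal subjects more directly is to list their row positions. This is a sketch with a short stand-in array; in the notebook the same call would be applied to `k3_predictions`. The name `abnormal_rows` is illustrative.

```python
import numpy as np

# Stand-in for k3_predictions: one predicted label per test subject.
predictions = np.array([1, 1, 0, 1, 0])

# Zero-based row positions of the subjects predicted abnormal (label 1).
abnormal_rows = np.flatnonzero(predictions == 1)

print(abnormal_rows)   # [0 1 3]
print(f"{abnormal_rows.size} of {predictions.size} subjects predicted abnormal")
```

The positions are zero-based, matching the index shown when the test data frame is printed.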


In [103]:
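#Actual overall diagnosis values from the test data
#(assigned only to overall_diagnosis_test, so the training target is not overwritten;
#to_numpy() replaces as_matrix, which was removed in pandas 1.0)
overall_diagnosis_test = spect_test[['overall_diagnosis']].to_numpy()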
print(overall_diagnosis_test)
print(overall_diagnosis_test.shape)

[[1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]]
(187, 1)


In [104]:
from sklearn.metrics import accuracy_score,recall_score,precision_score,confusion_matrix

#Overall accuracy
overall_accuracy = accuracy_score(overall_diagnosis_test,k3_predictions_2d)
print('Overall Accuracy of the Classifier Predictions')
print(overall_accuracy)

#Average recall
average_recall = recall_score(overall_diagnosis_test,k3_predictions_2d,average='macro')
print('Average recall of the Classifier Predictions')
print(average_recall)

#Average precision
average_prec = precision_score(overall_diagnosis_test,k3_predictions_2d,average='macro')
print('Average precision of the Classifier Predictions')
print(average_prec)

Overall Accuracy of the Classifier Predictions
0.737967914439
Average recall of the Classifier Predictions
0.553294573643
Average precision of the Classifier Predictions
0.521853146853
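
The three scores above can also be computed by hand, which shows what `average='macro'` means: recall and precision are computed separately for each class and then averaged with equal weight. The sketch below uses plain Python and made-up labels; `macro_scores` is a hypothetical helper, not a scikit-learn function.

```python
def macro_scores(actual, predicted, labels=(0, 1)):
    """Return (accuracy, macro recall, macro precision) for two label lists."""
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    recalls, precisions = [], []
    for label in labels:
        # True positives for this class, plus how often it occurs and is predicted.
        tp = sum(a == label and p == label for a, p in pairs)
        actual_n = sum(a == label for a in actual)
        predicted_n = sum(p == label for p in predicted)
        recalls.append(tp / actual_n if actual_n else 0.0)
        precisions.append(tp / predicted_n if predicted_n else 0.0)
    return (correct / len(actual),
            sum(recalls) / len(labels),
            sum(precisions) / len(labels))

# Made-up labels, not the SPECT results.
actual    = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
acc, rec, prec = macro_scores(actual, predicted)
print(acc, rec, prec)
```

Because macro averaging gives both classes equal weight, it is sensitive to class imbalance. In the test labels printed earlier, 172 subjects are abnormal and 15 are normal, so weak performance on the small normal class lowers the macro recall and precision much more than it lowers the overall accuracy.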
