In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

# SPECTF Heart Data Set

## Data Set Information

The dataset describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal or abnormal. The database of 267 SPECT image sets (one per patient) was processed to extract features that summarize the original SPECT images. As a result, a pattern of 44 continuous features was created for each patient.

## Attribute Information

1. OVERALL_DIAGNOSIS: 0,1 (class attribute, binary)
2. F1R: continuous (count in ROI (region of interest) 1 in rest)
3. F1S: continuous (count in ROI 1 in stress)
4. F2R: continuous (count in ROI 2 in rest)
5. F2S: continuous (count in ROI 2 in stress)
6. F3R: continuous (count in ROI 3 in rest)
7. F3S: continuous (count in ROI 3 in stress)
8. F4R: continuous (count in ROI 4 in rest)
9. F4S: continuous (count in ROI 4 in stress)
10. F5R: continuous (count in ROI 5 in rest)
11. F5S: continuous (count in ROI 5 in stress)
12. F6R: continuous (count in ROI 6 in rest)
13. F6S: continuous (count in ROI 6 in stress)
14. F7R: continuous (count in ROI 7 in rest)
15. F7S: continuous (count in ROI 7 in stress)
16. F8R: continuous (count in ROI 8 in rest)
17. F8S: continuous (count in ROI 8 in stress)
18. F9R: continuous (count in ROI 9 in rest)
19. F9S: continuous (count in ROI 9 in stress)
20. F10R: continuous (count in ROI 10 in rest)
21. F10S: continuous (count in ROI 10 in stress)
22. F11R: continuous (count in ROI 11 in rest)
23. F11S: continuous (count in ROI 11 in stress)
24. F12R: continuous (count in ROI 12 in rest)
25. F12S: continuous (count in ROI 12 in stress)
26. F13R: continuous (count in ROI 13 in rest)
27. F13S: continuous (count in ROI 13 in stress)
28. F14R: continuous (count in ROI 14 in rest)
29. F14S: continuous (count in ROI 14 in stress)
30. F15R: continuous (count in ROI 15 in rest)
31. F15S: continuous (count in ROI 15 in stress)
32. F16R: continuous (count in ROI 16 in rest)
33. F16S: continuous (count in ROI 16 in stress)
34. F17R: continuous (count in ROI 17 in rest)
35. F17S: continuous (count in ROI 17 in stress)
36. F18R: continuous (count in ROI 18 in rest)
37. F18S: continuous (count in ROI 18 in stress)
38. F19R: continuous (count in ROI 19 in rest)
39. F19S: continuous (count in ROI 19 in stress)
40. F20R: continuous (count in ROI 20 in rest)
41. F20S: continuous (count in ROI 20 in stress)
42. F21R: continuous (count in ROI 21 in rest)
43. F21S: continuous (count in ROI 21 in stress)
44. F22R: continuous (count in ROI 22 in rest)
45. F22S: continuous (count in ROI 22 in stress)

**Data Source:** [SPECTF Heart Data Set](https://archive.ics.uci.edu/ml/datasets/SPECTF+Heart). Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
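
The attribute list above follows a regular pattern: one class label, then a rest (`R`) and a stress (`S`) count for each of the 22 regions of interest. As a minimal sketch, the column names can be generated rather than typed out. The variable names `columns` and `feature_columns` are illustrative and do not appear in the notebook.

```python
# Build the 45 column names: the class label plus a rest (R) and
# stress (S) count for each of the 22 regions of interest (ROI).
columns = ["OVERALL_DIAGNOSIS"]
for roi in range(1, 23):
    columns.append(f"F{roi}R")  # count in this ROI at rest
    columns.append(f"F{roi}S")  # count in this ROI under stress

# Everything except the class label is a feature.
feature_columns = columns[1:]
print(len(columns), len(feature_columns))
```

A list like this can be compared against `df.columns` after loading to confirm that the file matches the documented layout.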

In [2]:
# Load dataset
file_path = Path("../Resources/heart_data.csv")
df = pd.read_csv(file_path)
df.head()

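|   | OVERALL_DIAGNOSIS | F1R | F1S | F2R | F2S | F3R | F3S | F4R | F4S | F5R | ... | F18R | F18S | F19R | F19S | F20R | F20S | F21R | F21S | F22R | F22S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 63 | 70 | 64 | 72 | 56 | 64 | 58 | 69 | 68 | ... | 66 | 69 | 59 | 58 | 53 | 64 | 59 | 54 | 43 | 49 |
| 1 | 1 | 69 | 71 | 70 | 78 | 61 | 63 | 67 | 65 | 59 | ... | 61 | 61 | 66 | 65 | 72 | 73 | 68 | 68 | 59 | 63 |
| 2 | 1 | 65 | 62 | 67 | 68 | 65 | 67 | 71 | 71 | 64 | ... | 67 | 63 | 74 | 63 | 77 | 79 | 68 | 70 | 59 | 56 |
| 3 | 0 | 75 | 75 | 70 | 77 | 67 | 75 | 75 | 75 | 67 | ... | 71 | 69 | 66 | 63 | 70 | 73 | 66 | 68 | 58 | 59 |
| 4 | 1 | 65 | 56 | 67 | 58 | 76 | 79 | 70 | 71 | 59 | ... | 68 | 63 | 54 | 38 | 57 | 52 | 71 | 74 | 59 | 65 |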


In [3]:
# Define the X (features) and y (target) sets
y = df["OVERALL_DIAGNOSIS"].values
X = df.drop("OVERALL_DIAGNOSIS", axis=1)

Split the data into training and testing sets

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
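
Because no `test_size` is passed, `train_test_split` falls back to its default and holds out 25% of the rows. For 267 patients that gives 200 training rows and 67 test rows. The sketch below reproduces those sizes on placeholder data. The arrays `X_demo` and `y_demo` are random stand-ins, not the real measurements, and the class counts (55 and 212) are taken from the row sums of the full-data confusion matrix in this notebook. The `stratify` variant is an optional alternative, not something the notebook uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same shape as the notebook's: 267 rows, 44 features.
# The values are random placeholders, not the real SPECTF measurements.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 100, size=(267, 44))
y_demo = np.array([0] * 55 + [1] * 212)

# Default split: test_size is unset, so 25% of the rows are held out.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)
print(X_tr.shape, X_te.shape)

# Optional: stratify=y_demo keeps the class ratio similar in both subsets,
# which can matter when one class is much smaller than the other.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, random_state=1, stratify=y_demo
)
print(np.bincount(y_tr_s), np.bincount(y_te_s))
```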

Create a logistic regression model

In [5]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(max_iter=10000)
classifier

LogisticRegression(max_iter=10000)

Fit (train) our model by using the training data

In [6]:
classifier.fit(X_train, y_train)

LogisticRegression(max_iter=10000)
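
After fitting, the learned parameters are available as `classifier.coef_` and `classifier.intercept_`. For a binary problem, a logistic regression prediction amounts to passing a weighted sum of the features through the sigmoid function and thresholding the result at 0.5. The sketch below illustrates that idea with two features. The weights, intercept, and input values are hypothetical and are not taken from the fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for illustration only (not the fitted model's).
weights = np.array([0.05, -0.03])
intercept = -1.0
x = np.array([63.0, 70.0])  # two feature values for one patient

score = x @ weights + intercept       # linear combination of the features
probability = sigmoid(score)          # estimated probability of class 1
prediction = int(probability >= 0.5)  # class label after thresholding
print(score, probability, prediction)
```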

Validate the model by using the test data

In [7]:
print(f"Training Data Score: {classifier.score(X_train, y_train)}")
print(f"Testing Data Score: {classifier.score(X_test, y_test)}")

Training Data Score: 0.925
Testing Data Score: 0.835820895522388


In [8]:
from sklearn.metrics import confusion_matrix

y_true = y_test
y_pred = classifier.predict(X_test)
confusion_matrix(y_true, y_pred)

array([[ 9,  6],
       [ 5, 47]], dtype=int64)
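
scikit-learn lays out the confusion matrix with true classes as rows and predicted classes as columns, so for labels 0 and 1 the cells read `[[TN, FP], [FN, TP]]`. As a worked example, the common summary metrics can be computed directly from the four counts above. Treating class 1 as the "positive" class is an assumption made here for the arithmetic.

```python
# Test-set confusion matrix from the cell above:
# rows are true classes (0, 1), columns are predicted classes (0, 1).
tn, fp = 9, 6
fn, tp = 5, 47

total = tn + fp + fn + tp
accuracy = (tn + tp) / total   # same quantity as the testing score
precision = tp / (tp + fp)     # of rows predicted 1, share that are truly 1
recall = tp / (tp + fn)        # of rows that are truly 1, share predicted 1
specificity = tn / (tn + fp)   # of rows that are truly 0, share predicted 0

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```

The accuracy of 56/67 agrees with the testing score printed earlier. The specificity of 9/15 shows that the model is noticeably weaker on class 0, which a single accuracy figure hides.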

In [9]:
confusion_matrix(y, classifier.predict(X))

array([[ 40,  15],
       [ 11, 201]], dtype=int64)
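
This last matrix is computed over all 267 rows, so it mixes the 200 rows the model was trained on with the 67 held-out rows and should not be read as an independent estimate of performance. As a quick consistency check, subtracting the test-set matrix from it recovers the training-set matrix, whose accuracy should match the training score reported earlier.

```python
import numpy as np

full_cm = np.array([[40, 15], [11, 201]])  # all 267 rows
test_cm = np.array([[9, 6], [5, 47]])      # the 67 held-out rows

# The full data is the union of the training and test rows, so the
# training-set matrix is the element-wise difference.
train_cm = full_cm - test_cm
train_accuracy = np.trace(train_cm) / train_cm.sum()
print(train_cm)
print(train_accuracy)
```

The difference covers 200 rows with 185 correct predictions, which is the 0.925 training score.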