# Heart Disease Predictor
Dataset: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

## Data pre-processing
### Features chosen
1. age: age in years <br />
2. sex: 1 = male; 0 = female <br />
3. cp: chest pain type <br />
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital) <br />
5. chol: serum cholesterol in mg/dl <br />
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false) <br />
7. restecg: resting electrocardiographic results <br />
8. thalach: maximum heart rate achieved <br />
9. exang: exercise-induced angina (1 = yes; 0 = no) <br />
10. oldpeak: ST depression induced by exercise relative to rest <br />
11. slope: the slope of the peak exercise ST segment <br />
12. ca: number of major vessels (0-3) colored by fluoroscopy <br />
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect <br />
14. target: 1 or 0

In [1]:
import pandas as pd

dataset = pd.read_csv("heart.csv")
dataset.head()

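| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |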


In [2]:
# detailed statistical info of the dataset 
dataset.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 303.0 | 54.366337 | 9.082101 | 29.0 | 47.5 | 55.0 | 61.0 | 77.0 |
| sex | 303.0 | 0.683168 | 0.466011 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| cp | 303.0 | 0.966997 | 1.032052 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 |
| trestbps | 303.0 | 131.623762 | 17.538143 | 94.0 | 120.0 | 130.0 | 140.0 | 200.0 |
| chol | 303.0 | 246.264026 | 51.830751 | 126.0 | 211.0 | 240.0 | 274.5 | 564.0 |
| fbs | 303.0 | 0.148515 | 0.356198 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| restecg | 303.0 | 0.528053 | 0.52586 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 |
| thalach | 303.0 | 149.646865 | 22.905161 | 71.0 | 133.5 | 153.0 | 166.0 | 202.0 |
| exang | 303.0 | 0.326733 | 0.469794 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| oldpeak | 303.0 | 1.039604 | 1.161075 | 0.0 | 0.0 | 0.8 | 1.6 | 6.2 |


### Feature scaling

In [3]:
dataset.columns

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')

In [4]:
# age is left out of the scaling (the plan to group it into categories is not applied here)
# target is not normalized because it is the 0/1 class label
cols_to_norm = ['sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']

# min-max scaling of each column to [0, 1]; alternatively, use MinMaxScaler from sklearn
dataset[cols_to_norm] = dataset[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

dataset.head()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1.0 | 1.0 | 0.481132 | 0.244292 | 1.0 | 0.0 | 0.603053 | 0.0 | 0.370968 | 0.0 | 0.0 | 0.333333 | 1 |
| 1 | 37 | 1.0 | 0.666667 | 0.339623 | 0.283105 | 0.0 | 0.5 | 0.885496 | 0.0 | 0.564516 | 0.0 | 0.0 | 0.666667 | 1 |
| 2 | 41 | 0.0 | 0.333333 | 0.339623 | 0.178082 | 0.0 | 0.0 | 0.770992 | 0.0 | 0.225806 | 1.0 | 0.0 | 0.666667 | 1 |
| 3 | 56 | 1.0 | 0.333333 | 0.245283 | 0.251142 | 0.0 | 0.5 | 0.816794 | 0.0 | 0.129032 | 1.0 | 0.0 | 0.666667 | 1 |
| 4 | 57 | 0.0 | 0.0 | 0.245283 | 0.520548 | 0.0 | 0.5 | 0.70229 | 1.0 | 0.096774 | 1.0 | 0.0 | 0.666667 | 1 |
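
The scaling cell maps every selected column to the range [0, 1] with (x - min) / (max - min). Below is a minimal plain-Python sketch of that formula, checked against the first `trestbps` value using the minimum (94) and maximum (200) reported by `describe()`. The helper name `min_max_scale` and its constant-column guard are illustrative assumptions, not part of the notebook.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 rather than dividing
        # by zero (the pandas lambda in the notebook would yield NaN here).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# trestbps: min 94, max 200 (from describe()); the first row has 145
scaled = min_max_scale([94, 145, 200])
print(round(scaled[1], 6))  # 0.481132
```

With its default settings, `MinMaxScaler` from `sklearn.preprocessing` applies the same per-column formula, which is presumably why the notebook mentions it as an alternative.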


### Logistic Regression Implementation

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [6]:
lr = LogisticRegression()

In [7]:
dataset.columns.values

array(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'], dtype=object)

In [8]:
cols = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal']

In [9]:
X = dataset[cols]
y = dataset['target']

In [10]:
lr.fit(X,y)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

In [11]:
lr.score(X,y)

0.84158415841584155
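
This score is the mean accuracy on the same 303 rows the model was fitted on, so it probably overstates how well the model would do on unseen patients. A rough plain-Python sketch of a hold-out split and an accuracy check follows. `holdout_split` and `accuracy` are hypothetical helpers written for illustration; in the notebook, the already-imported `train_test_split` would normally play the splitting role.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into train and test index lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    return indices[n_test:], indices[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 303 rows, as in heart.csv; only the row count matters for the split
train_idx, test_idx = holdout_split(range(303))
print(len(train_idx), len(test_idx))  # 227 76

# 255 correct predictions out of 303 is consistent with the reported score
training_accuracy = accuracy([1] * 303, [1] * 255 + [0] * 48)
print(training_accuracy)
```

Fitting on the training indices only and scoring on the held-out ones would give a more honest estimate than `lr.score(X, y)` on the full data.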

In [24]:
# class probabilities for the second row (index 1); column 1 is P(target = 1)
results = lr.predict_proba(X.iloc[[1]])
print("Chance of getting heart attack:", results[0][1]*100, '%')

Chance of getting heart attack: 62.1511004811 %
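
For a binary target, the probability returned by `predict_proba` for class 1 is the logistic (sigmoid) function of a weighted sum of the features. Here is a minimal sketch of that calculation. The weights, intercept, and feature values are made-up placeholders, not the fitted model's coefficients (those would be found in `lr.coef_` and `lr.intercept_`).

```python
import math


def positive_class_probability(features, weights, intercept):
    """P(target = 1) = 1 / (1 + exp(-(w . x + b))) for binary logistic regression."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder numbers chosen only to illustrate the arithmetic
p = positive_class_probability(features=[1.0, 0.5], weights=[0.8, -0.4], intercept=0.1)
print(f"{p * 100:.1f} %")
```

A weighted sum of zero gives exactly 50 %, and larger positive sums push the probability toward 100 %.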
