## Heart Disease Prediction Model

This model can be used to predict whether a patient has heart disease.

## Data Source

[Heart-disease-uci Kaggle](https://www.kaggle.com/ronitf/heart-disease-uci)

## Features
* age: age
* sex: sex
* cp: chest pain type (4 values; 1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
* trestbps: resting blood pressure
* chol: serum cholesterol in mg/dl (10 dl = 1 l)
* fbs: fasting blood sugar > 120 mg/dl
* restecg: resting electrocardiographic results (values 0, 1, 2)
* thalach: maximum heart rate achieved
* exang: exercise-induced angina
* oldpeak: ST depression induced by exercise relative to rest (see [ST segment](https://en.wikipedia.org/wiki/ST_segment))
* slope: the slope of the peak exercise ST segment
* ca: number of major vessels (0-3) colored by fluoroscopy
* thal: thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
* target: whether the patient has heart disease (0 = no, 1 = yes)

In [170]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import altair as alt

In [171]:
path = "tests/data/hdresult.csv"
# path = "tests/data/hdorigin.csv"
df = pd.read_csv(path)
df

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,oldpeak_small,target,target_predict,diff,fp,fn
0,59.0,1.0,3.0,170.0,288.0,0.0,0.0,159.0,0.0,0.2,1.0,0.0,3.0,1.0,1.0,0.0,True,False,True
1,54.0,1.0,1.0,192.0,283.0,0.0,0.0,195.0,0.0,0.0,2.0,1.0,3.0,1.0,0.0,0.0,False,False,False
2,47.0,1.0,2.0,130.0,253.0,0.0,1.0,179.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0,1.0,False,False,False
3,65.0,1.0,0.0,135.0,254.0,0.0,0.0,127.0,0.0,2.8,1.0,1.0,3.0,0.0,0.0,0.0,False,False,False
4,71.0,0.0,2.0,110.0,265.0,1.0,0.0,130.0,0.0,0.0,2.0,1.0,2.0,1.0,1.0,1.0,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,54.0,1.0,2.0,150.0,232.0,0.0,0.0,165.0,0.0,1.6,2.0,0.0,3.0,0.0,1.0,1.0,False,False,False
87,45.0,1.0,0.0,115.0,260.0,0.0,0.0,185.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0,1.0,False,False,False
88,63.0,0.0,0.0,124.0,197.0,0.0,1.0,136.0,1.0,0.0,1.0,0.0,2.0,1.0,1.0,0.0,True,False,True
89,45.0,0.0,0.0,138.0,236.0,0.0,0.0,152.0,1.0,0.2,1.0,0.0,2.0,1.0,1.0,1.0,False,False,False
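
The `diff`, `fp`, and `fn` columns in the output above look like error flags derived from `target` and `target_predict`. The following is a minimal sketch of how such flags could be computed, assuming that is what they mean. The helper name `add_error_flags` is hypothetical and does not come from the original notebook.

```python
import pandas as pd


def add_error_flags(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with diff/fp/fn flags (assumed definitions, not the author's code)."""
    actual = frame["target"].astype(int)
    predicted = frame["target_predict"].astype(int)
    return frame.assign(
        diff=actual != predicted,             # any misclassification
        fp=(actual == 0) & (predicted == 1),  # false positive: healthy, predicted sick
        fn=(actual == 1) & (predicted == 0),  # false negative: sick, predicted healthy
    )


# Toy rows for illustration; the label pairs mirror rows 0, 1, and 2 shown above.
sample = pd.DataFrame({"target": [1.0, 0.0, 1.0], "target_predict": [0.0, 0.0, 1.0]})
flags = add_error_flags(sample)
print(flags)
```

With these definitions, the first toy row is flagged as a false negative, which agrees with row 0 of the displayed output.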


In [172]:
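# Correlation of every column with the label; reset_index exposes the column names as "index".
corr = df.corr()["target"].reset_index()
# Horizontal bar chart: one bar per column, bar length = correlation with target.
alt.Chart(corr).mark_bar().encode(
    alt.Y("index:O", sort=None),
    x="target:Q",
)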
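
The bar chart above keeps the columns in their original order. To read off which features are most strongly related to `target`, the same correlations can be ranked by absolute value. This is a minimal pandas-only sketch on toy data; the helper name `rank_by_target_correlation` and the toy values are hypothetical.

```python
import pandas as pd


def rank_by_target_correlation(frame: pd.DataFrame, label: str = "target") -> pd.Series:
    """Pearson correlation of each numeric/bool column with `label`, strongest first."""
    # Cast to float so boolean flag columns are handled the same way as numeric ones.
    numeric = frame.select_dtypes(include=["number", "bool"]).astype(float)
    corr = numeric.corr()[label].drop(label)
    # Sort by magnitude so strong negative correlations rank as high as positive ones.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "target": [0, 1, 0, 1],
    "a": [0, 1, 0, 1],
    "b": [3, 1, 2, 0],
    "c": [0, 0, 1, 1],
})
ranked = rank_by_target_correlation(toy)
print(ranked)
```

On the notebook's data, the call would be `rank_by_target_correlation(df)`.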

In [173]:

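# Label each sample as "<target>_<target_predict>" so the outcome groups can be color-coded.
df = df.assign(check=df["target"].astype(str) + "_" + df["target_predict"].astype(str))
# One binned histogram per column, colored by outcome label.
alt.Chart(df).mark_bar().encode(
    alt.X(alt.repeat("row"), type="quantitative", bin=True),
    y="count()",
    color="check",
    tooltip="check",
).repeat(
    # Skip the string label itself: it cannot be binned as a quantitative field.
    row=[col for col in df.columns if col != "check"]
)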

In [174]:
# df.groupby(["target","target_predict"]).mean().round(2).T

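## Intuitive Analysis

Samples classified as false negatives tend to be younger, and most of them have cp_3.0. It may be worth keeping "exang" as a 0/1 value.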
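
The observation about false negatives can be checked numerically by comparing that group against all other samples. The sketch below assumes that `cp_3.0` corresponds to `cp == 3` and that a false negative means `target == 1` with `target_predict == 0`. The helper name `summarize_false_negatives` and the toy rows are hypothetical.

```python
import pandas as pd


def summarize_false_negatives(frame: pd.DataFrame) -> pd.DataFrame:
    """Compare false negatives with all other samples on age, cp == 3, and exang."""
    is_fn = (frame["target"] == 1) & (frame["target_predict"] == 0)
    group = is_fn.map({True: "false_negative", False: "other"}).rename("group")
    return (
        frame.assign(cp_is_3=frame["cp"] == 3)  # assumption: cp_3.0 means cp == 3
        .groupby(group)
        .agg(
            n=("age", "size"),               # number of samples in the group
            mean_age=("age", "mean"),        # average age
            share_cp_3=("cp_is_3", "mean"),  # fraction with chest pain type 3
            share_exang=("exang", "mean"),   # fraction with exercise-induced angina
        )
    )


# Toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "age": [40, 45, 60, 65, 70],
    "cp": [3, 3, 0, 2, 3],
    "exang": [0, 0, 1, 1, 0],
    "target": [1, 1, 0, 1, 0],
    "target_predict": [0, 0, 0, 1, 0],
})
summary = summarize_false_negatives(toy)
print(summary)
```

On the notebook's data, `summarize_false_negatives(df)` would put the two groups side by side, so the age and chest pain differences can be read as numbers as well as from the histograms.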