# Heart attack prediction

In [1]:
import pandas as pd
import numpy as np

## Goal

#### Current Cost

The goal of this project is to reduce the cost of heart attacks to the national healthcare system by 20%. The current expected cost per person is:

$$ Current cost = (1-HA) * CostNHA + HA * CostHA $$

$$ Current cost = 0.9057986 * 0 + 0.0942014 * 50000 = 4710.07€ $$

HA = Heart attack probability = 1 - 0.9057986 = 0.0942014

CostHA = Cost of one person having a heart attack = 50000

CostNHA = Cost of one person not having a heart attack = 0
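As a quick check of the formula above, here is a minimal sketch that computes the current cost and the 20% reduction target. The function and variable names are illustrative assumptions. The no-heart-attack share is the proportion of `HeartAttack == 0` rows reported for the dataset later in this notebook.

```python
def expected_cost_per_person(p_heart_attack, cost_ha=50_000, cost_nha=0):
    """Expected cost per person: (1 - HA) * CostNHA + HA * CostHA."""
    return (1 - p_heart_attack) * cost_nha + p_heart_attack * cost_ha

# Share of people without a heart attack, as measured on the dataset
p_no_heart_attack = 0.9057986439608956
p_heart_attack = 1 - p_no_heart_attack

current_cost = expected_cost_per_person(p_heart_attack)
target_cost = 0.8 * current_cost  # cost after a 20% reduction

print(f"Current cost per person: {current_cost:.2f}€")
print(f"Target cost per person:  {target_cost:.2f}€")
```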

#### After plan cost

The cost per person after applying this program splits into several branches:
* People who will not have a heart attack but are predicted to have one and agree to follow the plan -> Cost: 1.000€
$$ C1 = FPP * 1.000€ $$

* People who will have a heart attack but are predicted not to have one -> Cost: 50.000€
$$ C2 = FNP * 50.000€ $$

* People who will have a heart attack, are predicted to have one and decide not to take the plan -> Cost: 50.000€
$$ C3 = TPP * (1- P(takesplan)) * 50.000€ $$

* People who will have a heart attack, are predicted to have one and decide to take the plan, but do not adhere to it -> Cost: 51.000€
$$ C4 = TPP * P(takesplan) * (1-P(adheresplan)) * 51.000€ $$

* People who will have a heart attack, are predicted to have one, decide to take the plan and adhere to it, but the plan does not work -> Cost: 51.000€
$$ C5 = TPP * P(takesplan) * P(adheresplan) * (1-P(planworks)) * 51.000€ $$

* People who will have a heart attack, are predicted to have one, decide to take the plan and adhere to it, and the plan works -> Cost: 1.000€
$$ C6 = TPP * P(takesplan) * P(adheresplan) * P(planworks) * 1.000€ $$

FPP = False Positive Percentage = FP / Total Cases

FNP = False Negative Percentage = FN / Total Cases

TPP = True Positive Percentage = TP / Total Cases

P(takesplan) = Probability that someone takes the plan when it is offered to them = 0.85

P(adheresplan) = Probability that someone adheres to the plan = Unknown

P(planworks) = Probability that the plan works = 0.75
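The six branches can be summed in a few lines of code. This is a minimal sketch that follows the formulas exactly as written, treating FPP, FNP and TPP as fractions of all cases. The function name, the parameter names and the example rates are hypothetical and are not taken from any fitted model.

```python
def plan_cost_per_person(fpp, fnp, tpp, p_adheres,
                         p_takes=0.85, p_works=0.75,
                         cost_plan=1_000, cost_ha=50_000):
    """Expected cost per person after the plan: C1 + C2 + C3 + C4 + C5 + C6."""
    failed_plan_cost = cost_ha + cost_plan  # 51,000€: plan paid for, heart attack still happens
    c1 = fpp * cost_plan                                              # false positives on the plan
    c2 = fnp * cost_ha                                                # missed heart attacks
    c3 = tpp * (1 - p_takes) * cost_ha                                # detected, plan declined
    c4 = tpp * p_takes * (1 - p_adheres) * failed_plan_cost           # plan taken, not followed
    c5 = tpp * p_takes * p_adheres * (1 - p_works) * failed_plan_cost # followed, did not work
    c6 = tpp * p_takes * p_adheres * p_works * cost_plan              # followed and worked
    return c1 + c2 + c3 + c4 + c5 + c6

# Hypothetical classifier rates, chosen so that tpp + fnp is close to the heart attack rate
cost_half_adherence = plan_cost_per_person(fpp=0.05, fnp=0.0142, tpp=0.08, p_adheres=0.5)
cost_no_adherence = plan_cost_per_person(fpp=0.05, fnp=0.0142, tpp=0.08, p_adheres=0.0)

print(f"Cost with 50% adherence: {cost_half_adherence:.2f}€")
print(f"Cost with 0% adherence:  {cost_no_adherence:.2f}€")
```

Since only C4, C5 and C6 depend on P(adheresplan), the cost falls linearly as adherence rises, which is what makes a closed-form solution for the minimum adherence possible.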

#### Formula
We need a cost analysis to find the minimum adherence rate required to reduce the cost by 20%. That is, we solve for the value of P(adheresplan) at which the new cost equals 80% of the current cost.

$$ Currentcost * 0.8 = Newcost $$
$$ Currentcost * 0.8 - (C1 + C2 + C3) = C4 + C5 + C6 $$

$$ \frac{Currentcost * 0.8 - (C1 + C2 + C3)}{TPP * P(takesplan)} = $$
$$ = (1-P(adheresplan)) * 51000 + P(adheresplan) * (1-P(planworks)) * 51000 + P(adheresplan) * P(planworks) * 1000 = $$
$$ = 51000 - 51000 * P(adheresplan) + P(adheresplan) * (1-P(planworks)) * 51000 + P(adheresplan) * P(planworks) * 1000 $$

$$ \frac{\frac{Currentcost * 0.8 - (C1 + C2 + C3)}{TPP * P(takesplan)} - 51000}{P(adheresplan)} = $$
$$ = -51000 + (1-P(planworks)) * 51000 + P(planworks) * 1000 $$

$$ P(adheresplan) = $$
$$ = \frac{\frac{Currentcost * 0.8 - (C1 + C2 + C3)}{TPP * P(takesplan)} - 51000}{-51000 + (1-P(planworks)) * 51000 + P(planworks) * 1000} = $$
$$ = \frac{\frac{Currentcost * 0.8 - (C1 + C2 + C3)}{TPP * P(takesplan)} - 51000}{-P(planworks) * 51000 + P(planworks) * 1000} = $$
$$ = \frac{\frac{Currentcost * 0.8 - (C1 + C2 + C3)}{TPP * P(takesplan)} - 51000}{-P(planworks) * 50000} $$
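One way to sanity-check the algebra is a round trip: compute P(adheresplan) from the closed form, plug it back into the sum of the six branches, and confirm the result equals 80% of the current cost. The sketch below assumes FPP, FNP and TPP are fractions of all cases, as in the definitions above. The helper names and the input rates are hypothetical.

```python
COST_HA = 50_000   # cost of a heart attack, from the text
COST_PLAN = 1_000  # cost of the prevention plan, from the text

def min_adherence(current_cost, fpp, fnp, tpp, p_takes=0.85, p_works=0.75):
    """Closed-form P(adheresplan) that brings the new cost to 80% of current_cost."""
    target = 0.8 * current_cost
    fixed = fpp * COST_PLAN + fnp * COST_HA + tpp * (1 - p_takes) * COST_HA  # C1 + C2 + C3
    return ((target - fixed) / (tpp * p_takes) - (COST_HA + COST_PLAN)) / (-p_works * COST_HA)

def new_cost(fpp, fnp, tpp, p_adheres, p_takes=0.85, p_works=0.75):
    """Sum of branches C1..C6, written out independently of the closed form."""
    failed = COST_HA + COST_PLAN  # 51,000€
    treated = tpp * p_takes       # true positives who take the plan
    return (fpp * COST_PLAN + fnp * COST_HA + tpp * (1 - p_takes) * COST_HA
            + treated * (1 - p_adheres) * failed
            + treated * p_adheres * (1 - p_works) * failed
            + treated * p_adheres * p_works * COST_PLAN)

# Hypothetical inputs: a 9.42% heart attack rate and made-up classifier rates
current = 0.0942 * COST_HA
rates = dict(fpp=0.05, fnp=0.0142, tpp=0.08)

adherence = min_adherence(current, **rates)
check = new_cost(p_adheres=adherence, **rates)

print(f"Minimum adherence: {adherence:.4f}")
print(f"New cost: {check:.2f}€ vs target {0.8 * current:.2f}€")
```

A result outside the interval [0, 1] would mean the 20% target cannot be reached with those classifier rates, whatever the adherence.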


In [168]:
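def minimum_adheresplan_prob(fpp, fnp, tpp, p_takesplan=0.85, p_planworks=0.75):
    """Return the minimum plan adherence probability needed to cut the cost per person by 20%."""
    # Share of people without and with a heart attack in the dataset (reads the global df)
    nha = (df.HeartAttack.value_counts()/len(df))[0]
    ha = 1-nha
    
    # Current expected cost per person and the 20% reduction target
    current_cost = ha*50000 + nha*0
    target_new_cost = 0.8 * current_cost
    
    print(f"We need to reduce the cost per person to {target_new_cost}€\n")
    
    # Branch costs that do not depend on adherence (C1, C2 and C3);
    # fpp is weighted by the no-heart-attack share, fnp and tpp by the heart-attack share
    c1 = nha * fpp * 1000
    c2 = ha * fnp * 50000
    c3 = ha * tpp * (1-p_takesplan) * 50000
    
    # Closed-form solution for P(adheresplan) from the derivation above
    p_adheresplan = (((target_new_cost - (c1+c2+c3)) / (tpp*p_takesplan)) - 51000) / (-p_planworks*50000)
    
    print(f"The minimum plan adherence rate to reduce the cost by 20% is {p_adheresplan}\n")
    
    return p_adheresplan

In [169]:
minimum_adheresplan_prob(0.01, 0.001, 0.09)

We need to reduce the cost per person to 3768.0542415641758€

The minimum plan adherence rate to reduce the cost by 20% is 0.07348138666798575



0.07348138666798575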

## Load the dataset 

In [2]:
df = pd.read_csv("heart_disease.csv")

## Data Cleaning

In [3]:
df.head()

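| Unnamed: 0 | HeartAttack | HighBP | HighChol | CholCheck | BMI | Smoker | Stroke | Diabetes | PhysActivity | Fruits | Veggies | HvyAlcoholConsump | GenHlth | MentHlth | PhysHlth | DiffWalk | Sex | Age | Education | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 1.0 | 1.0 | 40.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 5.0 | 18.0 | 15.0 | 1.0 | 0.0 | 9.0 | 4.0 | 3.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 25.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 6.0 | 1.0 |
| 2 | 0.0 | 1.0 | 1.0 | 1.0 | 28.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 5.0 | 30.0 | 30.0 | 1.0 | 0.0 | 9.0 | 4.0 | 8.0 |
| 3 | 0.0 | 1.0 | 0.0 | 1.0 | 27.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | 3.0 | 6.0 |
| 4 | 0.0 | 1.0 | 1.0 | 1.0 | 24.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 2.0 | 3.0 | 0.0 | 0.0 | 0.0 | 11.0 | 5.0 | 4.0 |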


In [20]:
(df.HeartAttack.value_counts()/len(df))[0]

0.9057986439608956

## Prediction

In [None]:
pass