<h1><b>💔 Heart Attack EDA, Visualization & Prediction 💗</b></h1>


<h2 style="color:red;"><b>Heart Attack:-</b></h2>
<h3 style="background-color:powderblue;">
A heart attack is a medical emergency. It usually occurs when a blood clot blocks blood flow to the heart. Without blood, tissue loses oxygen and dies. Symptoms include tightness or pain in the chest, neck, back or arms, as well as fatigue, lightheadedness, abnormal heartbeat and anxiety. Women are more likely than men to have atypical symptoms. Treatment ranges from lifestyle changes and cardiac rehabilitation to medication, stents and bypass surgery.</h3>


<h2 style="color:SlateBlue;"><b>About Data:-</b></h2>

* age : age of the patient
* sex : sex of the patient
* exng : exercise-induced angina (1 = yes; 0 = no)
* caa : number of major vessels (0-3)
* cp : chest pain type
            Value 1: typical angina
            Value 2: atypical angina
            Value 3: non-anginal pain
            Value 4: asymptomatic
* trtbps : resting blood pressure (in mm Hg)
* chol : cholesterol in mg/dl fetched via BMI sensor
* fbs : fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
* restecg : resting electrocardiographic results
            Value 0: normal
            Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
            Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
* thalachh : maximum heart rate achieved
* output : target variable (0 = less chance of heart attack; 1 = more chance of heart attack)
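
The coded values in the list above can be mapped to readable labels for tables and plot legends. The following is a minimal sketch that uses the codes exactly as they are listed here; the `sample` frame and the `CP_LABELS` and `RESTECG_LABELS` names are illustrative assumptions, and the codes actually stored in the CSV should be checked (for example with `data['cp'].unique()`) before relying on the mapping.

```python
import pandas as pd

# Value codes as listed in the data description (assumed to match the CSV).
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}
RESTECG_LABELS = {
    0: "normal",
    1: "ST-T wave abnormality",
    2: "left ventricular hypertrophy",
}

# Hypothetical rows standing in for the real dataset.
sample = pd.DataFrame({"cp": [1, 3, 4, 2], "restecg": [0, 1, 2, 0]})

# Add human-readable label columns next to the original codes.
decoded = sample.assign(
    cp_label=sample["cp"].map(CP_LABELS),
    restecg_label=sample["restecg"].map(RESTECG_LABELS),
)
print(decoded)
```

Any code that is missing from a mapping becomes `NaN` in the label column, which makes unexpected values easy to spot.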




<h2 style="color:SlateBlue;"><b> Importing Required Libraries</b></h2>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, roc_curve, confusion_matrix, roc_auc_score
import warnings
warnings.filterwarnings("ignore")

In [None]:
data=pd.read_csv('../input/heart-attack-analysis-prediction-dataset/heart.csv')
data.head()

In [None]:
data.info()

In [None]:
data.isnull().sum()

In [None]:
data.describe()

<h2 style="color:SlateBlue;"><b>EDA & Visualization</b></h2>

In [None]:
# Heat Map Correlation
corr = data.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
f, ax = plt.subplots(figsize=(8, 8))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .5})

In [None]:
plt.figure(figsize=(14,12))
ax = sns.heatmap(corr, square=True, annot=True, fmt='.2f')
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)          
plt.show()
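
A heatmap shows every pairwise correlation at once; for prediction it is often easier to read a ranked list of correlations with the target alone. The sketch below illustrates the idea on synthetic data: the `toy` frame and its `strong`, `weak` and `inverse` columns are invented for the example and are not part of the heart dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset (all column names except "output" are hypothetical).
rng = np.random.default_rng(0)
n = 200
output = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "output": output,
    "strong": output + rng.normal(0, 0.1, n),    # moves with the target
    "weak": rng.normal(0, 1, n),                 # unrelated to the target
    "inverse": -output + rng.normal(0, 0.1, n),  # moves against the target
})

# Correlation of each feature with the target, ordered by absolute strength.
corr_with_target = toy.corr()["output"].drop("output")
order = corr_with_target.abs().sort_values(ascending=False).index
ranked = corr_with_target.reindex(order)
print(ranked)
```

On the real data the same three lines would be applied to `data` instead of `toy`.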

In [None]:
plt.figure(figsize=(15,10))
sns.countplot(x=data['age'])
plt.title('AGE OF PATIENTS')
plt.xlabel('AGE')
plt.ylabel('COUNT')
plt.show()

**Observation:-** The most common patient age is 58

In [None]:
plt.figure(figsize=(10,7))
sns.countplot(x=data['sex'])
plt.title('MALE VS FEMALE')
plt.xlabel('SEX')
plt.ylabel('COUNT')
plt.show()

**Observation:-** Male patients are almost twice as numerous as female patients

In [None]:
plt.figure(figsize=(10,7))
sns.countplot(x=data['cp'])
plt.title('TYPES OF CHEST PAIN')
plt.xlabel('TYPES')
plt.ylabel('COUNT')
plt.show()

**Observation:-** Most patients have chest pain type Value 1: typical angina
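
Observations read off count plots can be confirmed numerically with `mode()` and `value_counts()`. The sketch below uses a small hypothetical `toy` frame rather than the real data, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
toy = pd.DataFrame({
    "age": [58, 58, 61, 45, 58, 61],
    "sex": [1, 1, 0, 1, 1, 0],
    "cp": [1, 2, 1, 1, 3, 1],
})

# Most frequent age (first value if several ages tie).
most_common_age = toy["age"].mode().iloc[0]

# Ratio of rows coded 1 to rows coded 0 in the sex column.
sex_counts = toy["sex"].value_counts()
sex_ratio = sex_counts.loc[1] / sex_counts.loc[0]

# Share of each chest pain code.
cp_share = toy["cp"].value_counts(normalize=True)

print(most_common_age, sex_ratio)
print(cp_share)
```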

In [None]:
plt.figure(figsize=(20,15))
# histplot draws on the current figure; displot is figure-level and would ignore plt.figure()
sns.histplot(data["trtbps"])
plt.title("DISTRIBUTION OF RESTING BLOOD PRESSURE AMONG PATIENTS",fontsize=20)
plt.xlabel("BLOOD PRESSURE",fontsize=20)
plt.ylabel("COUNT",fontsize=20)
plt.show()

In [None]:
labels = ['More Chance of Heart Attack', 'Less Chance of Heart Attack']
# order the counts explicitly (1 first, then 0) so they always match the labels
sizes = data['output'].value_counts().reindex([1, 0])

colors = ["#ffb3b3","#C2C4E2"]
explode = (0.05,0)

plt.figure(figsize=(7,7))
plt.suptitle("Number of Targets in the dataset",y=0.9, family='serif', size=18, weight='bold')
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=90)

plt.show()

In [None]:
numerical = ['age','trtbps','chol','thalachh','oldpeak']

j=0
fig=plt.figure(figsize=(10,10),constrained_layout =True)
plt.suptitle("Distribution of the Numeric Variables",y=1.07, family='serif', size=18, weight='bold')
fig.text(0.315,1.02,"Numerical Data without Condition", size=13, fontweight='light', fontfamily='monospace')
for i in data[numerical]:
    ax=plt.subplot(321+j)
    ax.set_aspect('auto')
    ax.grid(color='gray', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
    ax=sns.kdeplot(data=data, x=i, color='#D0DBEE', fill=True, edgecolor='black', alpha=1)
    for s in ['left','right','top','bottom']:
        ax.spines[s].set_visible(False)
    j=j+1

In [None]:
# Distribution Plot of Numerical Data with Condition
colors = ['#D0DBEE','#ffcccc']
j=0
fig=plt.figure(figsize=(10,10),constrained_layout =True)
plt.suptitle("Distribution of the Numeric Variables",y=1.07, family='serif', size=18, weight='bold')
fig.text(0.333,1.02,"Numerical Data with Condition", size=13, fontweight='light', fontfamily='monospace')
for i in data[numerical]:
    ax=plt.subplot(321+j)
    ax.set_aspect('auto')
    ax.grid(color='gray', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
    ax=sns.kdeplot(data=data, x=i, hue='output', palette=colors, fill=True, edgecolor='black', alpha=1)
    for s in ['left','right','top','bottom']:
        ax.spines[s].set_visible(False)
    j=j+1
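
The class-conditional density plots can be complemented by a per-class numeric summary, which puts a number on how far apart the two groups are. This is a small sketch on hypothetical values; only the column names `output`, `thalachh` and `oldpeak` come from the notebook.

```python
import pandas as pd

# Hypothetical values; only the column names are taken from the dataset.
toy = pd.DataFrame({
    "output": [0, 0, 1, 1],
    "thalachh": [120, 140, 160, 180],
    "oldpeak": [2.0, 1.0, 0.5, 0.0],
})

# Mean of each numeric column within each target class.
group_means = toy.groupby("output").mean()
print(group_means)
```

On the real data, `data.groupby('output')[numerical].mean()` would give the same kind of table for all five numeric columns.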

#### Let's Check for Outliers

In [None]:
# Outliers Detection
colors = ['#CBE4F9','#CDF5F6','#EFF9DA','#F9EBDF','#F9D8D6']
plt.figure(figsize=(10,9))
plt.suptitle("Outliers of Numeric Variables",y=0.94, family='serif', size=18, weight='bold')
# subtitle placed in figure coordinates, so it does not depend on axes left over from an earlier cell
plt.figtext(0.5, 0.9, 'Detecting Outliers', horizontalalignment='center', verticalalignment='center', size=14, fontweight='light', fontfamily='monospace')
sns.boxenplot(data = data[numerical],palette = colors)
plt.grid(color='gray', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
plt.xticks(rotation=45)
plt.show()

### Replacing Outliers with the Median

In [None]:
for col in numerical:
    q1 = data[col].quantile(0.25)
    q3 = data[col].quantile(0.75)
    iqr = q3 - q1
    lower_tail = q1 - 1.5 * iqr
    upper_tail = q3 + 1.5 * iqr
    med = data[col].median()
    # replace values outside the 1.5 * IQR fences with the column median
    is_outlier = (data[col] < lower_tail) | (data[col] > upper_tail)
    data[col] = data[col].mask(is_outlier, med)
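
To make the rule concrete: a value counts as an outlier when it lies below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`, and each such value is replaced by the column median. The sketch below applies that rule to a short hypothetical series; the helper name `replace_outliers_with_median` and its `k` parameter are illustrative and do not appear in the notebook.

```python
import pandas as pd

def replace_outliers_with_median(series, k=1.5):
    """Replace values outside the k * IQR fences with the series median."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (series < lower) | (series > upper)
    return series.mask(is_outlier, series.median())

# Hypothetical values: Q1 = 11, Q3 = 12.5, so the fences are 8.75 and 14.75.
values = pd.Series([10, 12, 11, 13, 12, 11, 100])
cleaned = replace_outliers_with_median(values)
print(cleaned.tolist())
```

Here only the value 100 falls outside the fences, so it is the only entry replaced by the median, 12.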

In [None]:
colors = ['#CBE4F9','#CDF5F6','#EFF9DA','#F9EBDF','#F9D8D6']
plt.figure(figsize=(9,9))
plt.suptitle("Outliers of Numeric Variables",y=0.94, family='serif', size=18, weight='bold')
# subtitle placed in figure coordinates, so it does not depend on axes left over from an earlier cell
plt.figtext(0.5, 0.9, 'After Replacing Outliers', horizontalalignment='center', verticalalignment='center', size=14, fontweight='light', fontfamily='monospace')
sns.boxenplot(data = data[numerical],palette = colors)
plt.grid(color='gray', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
plt.xticks(rotation=45)
plt.show()

<h2 style="color:SlateBlue;"><b>Data Preprocessing</b></h2>

In [None]:
# work on a copy so the original DataFrame stays untouched
df1 = data.copy()

# define the columns to be encoded and scaled
cat_cols = ['sex','exng','caa','cp','fbs','restecg','slp','thall']
con_cols = ["age","trtbps","chol","thalachh","oldpeak"]

# encoding the categorical columns
df1 = pd.get_dummies(df1, columns = cat_cols, drop_first = True)

# defining the features and target (y as a 1-D Series, which is what scikit-learn expects)
X = df1.drop(['output'],axis=1)
y = df1['output']

# instantiating the scaler
scaler = RobustScaler()

# scaling the continuous features
# note: the scaler is fit on the full dataset here, before the train/test split
X[con_cols] = scaler.fit_transform(X[con_cols])
print("The first 5 rows of X are")
X.head()
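
Fitting the scaler on the full dataset lets statistics from the test rows influence the training features. One common alternative, sketched here as an option rather than as the notebook's method, is to wrap scaling and encoding in a scikit-learn `Pipeline` so they are fit on the training split only. The synthetic `toy` frame, its reduced column set and the rule that generates `output` are assumptions made purely for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Synthetic stand-in data (values and the target rule are invented).
rng = np.random.default_rng(42)
n = 300
toy = pd.DataFrame({
    "age": rng.integers(30, 80, n),
    "chol": rng.normal(240, 50, n),
    "cp": rng.integers(0, 4, n),
    "sex": rng.integers(0, 2, n),
})
toy["output"] = (toy["age"] + rng.normal(0, 5, n) > 55).astype(int)

con_cols = ["age", "chol"]
cat_cols = ["cp", "sex"]

# Scale continuous columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("scale", RobustScaler(), con_cols),
    ("encode", OneHotEncoder(drop="first"), cat_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    toy.drop(columns="output"), toy["output"], test_size=0.2, random_state=42
)

# fit() learns the scaling and encoding from the training rows only.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

Because preprocessing lives inside the pipeline, the same object can be passed to cross-validation utilities without refitting the scaler by hand.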

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 42)
print("The shape of X_train is      ", X_train.shape)
print("The shape of X_test is       ",X_test.shape)
print("The shape of y_train is      ",y_train.shape)
print("The shape of y_test is       ",y_test.shape)

<h2 style="color:SlateBlue;"><b>Applying ML Models</b></h2>

<h3 style="color:red;"><b> Logistic Regression</b></h3>

In [None]:
logreg = LogisticRegression()

# fitting the object
logreg.fit(X_train, y_train)

# calculating the probabilities
y_pred_proba = logreg.predict_proba(X_test)

# finding the predicted values
y_pred = np.argmax(y_pred_proba,axis=1)

# printing the test accuracy
print("The test accuracy score of Logistic Regression is ", accuracy_score(y_test, y_pred))


In [None]:
# calculating the probabilities
y_pred_prob = logreg.predict_proba(X_test)[:,1]

# computing the ROC curve
fpr,tpr,thresholds=roc_curve(y_test,y_pred_prob)

# plotting the curve (the dashed diagonal is the chance-level reference)
plt.plot([0,1],[0,1],"k--")
plt.plot(fpr,tpr,label='Logistic Regression')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Logistic Regression ROC Curve")
plt.legend()
plt.show()
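
The ROC curve can be summarized by its area (AUC), and a confusion matrix shows which kinds of errors the classifier makes at the default 0.5 threshold. The sketch below uses `confusion_matrix`, `classification_report` and `roc_auc_score` on a handful of hypothetical labels and probabilities; `y_true`, `y_prob` and `y_hat` are made-up values, not model output.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Hypothetical true labels and predicted probabilities for class 1.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.45, 0.6, 0.7, 0.9])

# Hard predictions at the default 0.5 threshold.
y_hat = (y_prob >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
auc = roc_auc_score(y_true, y_prob)

print(cm)
print(classification_report(y_true, y_hat))
print("AUC:", auc)
```

With the notebook's variables, the equivalent calls would be `confusion_matrix(y_test, y_pred)` and `roc_auc_score(y_test, y_pred_prob)`.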

**Using Logistic Regression, we got an accuracy of 0.9016393442622951**
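
A single train/test split gives one accuracy figure, and with a small test set that figure can shift noticeably with the choice of `random_state`. Stratified k-fold cross-validation is one way to see how stable the estimate is. The sketch below runs it on data from `make_classification`; the sample size, feature counts and fold count are placeholder assumptions, and the scores it prints say nothing about the heart dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data (placeholder for the real features and target).
X_demo, y_demo = make_classification(
    n_samples=303, n_features=10, n_informative=5, random_state=42
)

# Five stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv, scoring="accuracy"
)

print("Fold accuracies:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```

Reporting the mean together with the standard deviation across folds gives a more cautious picture than a single split.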

