# Heart Disease Prediction: Data Exploration & Preprocessing

This notebook covers the initial analysis and preprocessing of the Heart Disease dataset. We explore the data, handle missing values, scale the features, and train a baseline Random Forest classifier.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("heart.csv")
df.head()

In [None]:
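# A cell only displays its last expression automatically, so print the others explicitly
df.info()
print(df.describe())
print(df.isnull().sum())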
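
The checks above can be folded into a single per-column table. The following is a minimal sketch on a small made-up frame; the `demo` DataFrame and its values are assumptions for illustration and are not taken from `heart.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; values are illustrative only.
demo = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, 204, np.nan],
    "target": [1, 1, 0, 0],
})

# One row per column: dtype, number of missing values, and share missing.
summary = pd.DataFrame({
    "dtype": demo.dtypes.astype(str),
    "n_missing": demo.isnull().sum(),
    "pct_missing": demo.isnull().mean() * 100,
})
print(summary)
```

Replacing `demo` with `df` would give the same overview for the notebook's dataset.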

In [None]:
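# Class balance of the label column
sns.countplot(x='target', data=df)
plt.show()

# Pairwise correlations between numeric columns (non-numeric columns are excluded)
plt.figure(figsize=(12, 10))
sns.heatmap(df.select_dtypes('number').corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.show()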
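
The plots can be complemented with numbers: class shares and a ranking of features by absolute correlation with the label. This is a sketch on synthetic data; `feature_a`, `feature_b`, and the generated values are assumptions, not columns of the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)

# Synthetic frame: feature_a is built to track the label, feature_b is pure noise.
demo = pd.DataFrame({
    "feature_a": target + rng.normal(0, 0.5, size=n),
    "feature_b": rng.normal(0, 1, size=n),
    "target": target,
})

# Share of each class (the numeric counterpart of the count plot).
class_share = demo["target"].value_counts(normalize=True)

# Correlation of every feature with the label, strongest first by absolute value.
corr = demo.corr()["target"].drop("target")
corr_with_target = corr.loc[corr.abs().sort_values(ascending=False).index]

print(class_share)
print(corr_with_target)
```

Pearson correlation only captures linear relationships, so a low value here does not rule out a feature being useful to a tree-based model.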

In [None]:
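# Drop rows with missing values, if any, and report how many were removed
n_before = len(df)
df = df.dropna()
print(f"Dropped {n_before - len(df)} rows with missing values")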
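
Dropping rows is the simplest option, but it discards data. As a point of comparison, here is a sketch of median imputation, a different technique from the one used in this notebook. The `demo` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two incomplete rows; not taken from heart.csv.
demo = pd.DataFrame({
    "age": [63.0, 37.0, np.nan, 56.0, 41.0],
    "chol": [233.0, np.nan, 204.0, 236.0, 354.0],
    "target": [1, 1, 0, 0, 1],
})

# Option 1: drop incomplete rows and count what was lost.
dropped = demo.dropna()
rows_lost = len(demo) - len(dropped)

# Option 2: fill each gap with that column's median instead of dropping the row.
imputed = demo.fillna(demo.median(numeric_only=True))

print(f"Rows lost by dropping: {rows_lost} of {len(demo)}")
print(imputed)
```

When a model is trained afterwards, the medians would normally be computed on the training split only and then reused for the test split.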

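In [None]:
from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1)
y = df['target']
# Split before scaling so the test set does not influence the scaler's statistics
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [None]:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)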
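
With a small test set, a purely random split can shift the class proportions. The sketch below compares a plain split with one that passes `stratify` to `train_test_split`; the labels are synthetic and the 30% positive rate is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 30 positives and 70 negatives (illustrative only).
y_demo = np.array([1] * 30 + [0] * 70)
X_demo = np.arange(100).reshape(-1, 1)

# Plain random split: the positive share in the test set depends on the draw.
_, _, _, y_te_plain = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Stratified split: each class keeps its overall share in both subsets.
_, _, _, y_te_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Positive share, plain split:     ", y_te_plain.mean())
print("Positive share, stratified split:", y_te_strat.mean())
```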

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

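# Fix the seed so the forest, and therefore the reported accuracy, is reproducible
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")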
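
Accuracy from a single split is only one view of the model. The sketch below adds cross-validated accuracy, a confusion matrix, and a per-class report, with the scaler wrapped in a pipeline so it is refit inside each fold. It runs on data from `make_classification`, which is a synthetic stand-in; the sample count, feature count, and `n_estimators` value are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the real features and label.
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Scaler and model in one estimator: the scaler is fit only on the data
# that the pipeline itself is fit on.
clf = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)

# 5-fold cross-validation on the training portion.
cv_scores = cross_val_score(clf, X_tr, y_tr, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final fit and evaluation on the held-out test portion.
clf.fit(X_tr, y_tr)
y_hat = clf.predict(X_te)
cm = confusion_matrix(y_te, y_hat)
print(cm)
print(classification_report(y_te, y_hat))
```

For a medical screening task, the recall on the positive class in the report is often more informative than overall accuracy.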

### Conclusion

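Preprocessing was minimal: rows with missing values were dropped, the features were standardized, and the data was split 80/20 into training and test sets. A basic Random Forest then serves as the baseline model, and its test accuracy is printed in the cell above. Because that figure comes from a single split, it should be read as an initial estimate of performance.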