# Task 5: Train-Test Split & Evaluation Metrics
### Dataset: Heart Disease Dataset (heart.csv)

## Import Required Libraries

In [3]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

print("Libraries imported successfully.")

Libraries imported successfully.


## Load the Dataset

In [4]:
df = pd.read_csv("heart.csv")

print("Dataset loaded successfully.")
print("Dataset shape:", df.shape)

Dataset loaded successfully.
Dataset shape: (1025, 14)


## View Dataset Sample

In [5]:
print("First 5 rows of the dataset:")
print(df.head())

First 5 rows of the dataset:
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   
2   70    1   0       145   174    0        1      125      1      2.6      0   
3   61    1   0       148   203    0        1      161      0      0.0      2   
4   62    0   0       138   294    1        1      106      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  


## Check Dataset Information

In [6]:
print("\nDataset information:")
df.info()  # info() prints its report directly and returns None


Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   cp        1025 non-null   int64  
 3   trestbps  1025 non-null   int64  
 4   chol      1025 non-null   int64  
 5   fbs       1025 non-null   int64  
 6   restecg   1025 non-null   int64  
 7   thalach   1025 non-null   int64  
 8   exang     1025 non-null   int64  
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64  
 11  ca        1025 non-null   int64  
 12  thal      1025 non-null   int64  
 13  target    1025 non-null   int64  
dtypes: float64(1), int64(13)
memory usage: 112.2 KB


## Check Missing Values

In [7]:
print("\nMissing values in each column:")
print(df.isnull().sum())


Missing values in each column:
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64
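
Missing values are one data-quality check; exact duplicate rows are another, because a row that lands in both the training and test sets can make test metrics look better than they are. A minimal sketch on a small made-up frame (the `toy` DataFrame and its values are hypothetical, not taken from heart.csv):

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "age": [52, 53, 52, 70],
    "chol": [212, 203, 212, 174],
    "target": [0, 0, 0, 1],
})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(toy.duplicated().sum())
print("Duplicate rows:", n_duplicates)

# drop_duplicates() keeps the first occurrence of each row.
deduped = toy.drop_duplicates().reset_index(drop=True)
print("Shape after dropping duplicates:", deduped.shape)
```

Running `df.duplicated().sum()` on the loaded data would show whether this matters for this dataset.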


## Separate Features and Target Variable

In [9]:
X = df.drop(columns=['target'])
y = df['target']

print("\nFeature set shape:", X.shape)
print("Target variable shape:", y.shape)
print("Target variable value counts:")
print(y.value_counts())


Feature set shape: (1025, 13)
Target variable shape: (1025,)
Target variable value counts:
target
1    526
0    499
Name: count, dtype: int64
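
The two counts above give a reference point for the evaluation: a classifier that always predicts the majority class sets a floor for accuracy. A short sketch using the printed counts (the variable names are illustrative):

```python
# Class counts copied from the value_counts() output above.
count_pos = 526  # rows with target == 1
count_neg = 499  # rows with target == 0
total = count_pos + count_neg

share_pos = count_pos / total
share_neg = count_neg / total

# Accuracy of a model that always predicts the majority class.
majority_baseline = max(share_pos, share_neg)

print(f"Share of class 1: {share_pos:.3f}")
print(f"Share of class 0: {share_neg:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Because the classes are close to balanced, accuracy is a reasonable headline metric here, though precision and recall still show which kind of error dominates.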


## Train-Test Split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print("\nTrain-Test Split completed.")
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)


Train-Test Split completed.
X_train shape: (820, 13)
X_test shape: (205, 13)
y_train shape: (820,)
y_test shape: (205,)
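
The split above is random without stratification, so the class ratio in the test set can drift slightly from that of the full dataset. One common variant passes `stratify=y` to keep the ratio the same in both parts. A sketch on synthetic data with the same class counts as the output above (`X_demo` and `y_demo` are made-up stand-ins, not the heart data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 526 ones and 499 zeros, matching the counts above.
rng = np.random.default_rng(42)
y_demo = np.array([1] * 526 + [0] * 499)
X_demo = rng.normal(size=(len(y_demo), 13))

# stratify=y_demo preserves the class ratio in both train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

print("Train share of class 1:", round(float(y_tr.mean()), 3))
print("Test share of class 1 :", round(float(y_te.mean()), 3))
```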


## Train Logistic Regression Model

In [11]:
# max_iter is raised above the default of 100 to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("\nLogistic Regression model trained successfully.")


Logistic Regression model trained successfully.
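
The features are on quite different scales (in the sample rows, `chol` is in the hundreds while `oldpeak` is a small decimal), which is one reason the solver can need many iterations. A common alternative is to standardize the features inside a pipeline so the scaler is fit on the training data only. A sketch on synthetic data (the `make_classification` output stands in for the heart features; this is not the notebook's model):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 13 features, like the feature set above.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on X_tr only, then reuses it for X_te.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

demo_score = scaled_model.score(X_te, y_te)
print("Test accuracy on synthetic data:", demo_score)
```

Keeping the scaler inside the pipeline avoids computing scaling statistics from test rows.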


## Make Predictions on Test Data

In [12]:
y_pred = model.predict(X_test)

print("\nPredictions on test data:")
print(y_pred[:10])  # show first 10 predictions


Predictions on test data:
[1 1 0 1 0 1 0 0 1 0]


## Evaluate the Model

In [13]:
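accuracy = accuracy_score(y_test, y_pred)
# precision_score and recall_score use pos_label=1 by default,
# so label 1 in the target column is treated as the positive class.
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)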

print("\nModel Evaluation Metrics:")
print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)


Model Evaluation Metrics:
Accuracy : 0.7951219512195122
Precision: 0.7563025210084033
Recall   : 0.8737864077669902


## Confusion Matrix

In [14]:
cm = confusion_matrix(y_test, y_pred)

print("\nConfusion Matrix:")
print(cm)


Confusion Matrix:
[[73 29]
 [13 90]]
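
scikit-learn lays out the matrix with actual classes as rows and predicted classes as columns, in sorted label order. A tiny example that can be checked by hand (the label lists are made up):

```python
from sklearn.metrics import confusion_matrix

# Made-up labels, small enough to count by hand.
y_true_demo = [0, 0, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# For binary 0/1 labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm_demo.ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```

Read against the matrix above, the top-right cell (29) counts actual 0s predicted as 1, and the bottom-left cell (13) counts actual 1s predicted as 0.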


## Interpret the Confusion Matrix

In [15]:
print("\nInterpretation:")
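# Rows are actual classes and columns are predicted classes,
# so the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print("True Positives :", tp)
print("True Negatives :", tn)
print("False Positives:", fp)
print("False Negatives:", fn)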


Interpretation:
True Positives : 90
True Negatives : 73
False Positives: 29
False Negatives: 13
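
The accuracy, precision, and recall reported earlier follow directly from these four counts. As a check, the sketch below recomputes them by hand, along with specificity and F1, which the notebook does not report (both use their standard definitions and are added here for illustration only):

```python
# Counts copied from the interpretation output above.
tp, tn, fp, fn = 90, 73, 29, 13

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found
specificity = tn / (tn + fp)                # share of actual negatives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy   : {accuracy:.4f}")
print(f"Precision  : {precision:.4f}")
print(f"Recall     : {recall:.4f}")
print(f"Specificity: {specificity:.4f}")
print(f"F1 score   : {f1:.4f}")
```

Recall is higher than precision because the model makes more false positives (29) than false negatives (13).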


## Final Confirmation

In [16]:
print("\nTask 5 completed successfully.")
print("Model training and evaluation done.")


Task 5 completed successfully.
Model training and evaluation done.
