<a href="https://colab.research.google.com/github/DammuNikhitha/AI-ML-Internship-Task-5/blob/main/01_Task5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task 5: Train-Test Split & Evaluation Metrics

---



In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

In [5]:
df = pd.read_csv("heart.csv")
df.head()

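| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |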


**Target column:** `target`


*   1 --> Heart disease present
*   0 --> No heart disease



In [7]:
X = df.drop("target", axis=1)  # features: every column except the label
y = df["target"]               # label: 1 = heart disease, 0 = no heart disease

# Step 1: Train-Test Split



In [8]:
# Hold out 20% of the rows for testing; random_state=42 makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training size:", X_train.shape)
print("Testing size:", X_test.shape)

Training size: (820, 13)
Testing size: (205, 13)
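
The sizes follow from `test_size=0.2`: the dataset has 1,025 rows, so 0.2 × 1025 = 205 rows go to the test set and the remaining 820 to training. The sketch below reproduces that arithmetic with the standard library only. `holdout_split` is a hypothetical helper, and because it shuffles with Python's `random` module rather than scikit-learn's NumPy-based generator, it will not select the same rows as `train_test_split`.

```python
import math
import random

def holdout_split(n_samples, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) for one shuffled holdout split."""
    # Round the test count up; the rest of the rows form the training set.
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(1025)
print("Training rows:", len(train_idx))
print("Testing rows:", len(test_idx))
```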


# Step 2: Purpose of Training vs. Testing Data

Machine learning models need separate datasets for training and testing so that performance is measured on data the model has never seen.

**Training Data**
- Used to fit the model: it learns the relationship between the input features and the `target` label
- The model's parameters (here, the logistic regression coefficients) are estimated from this data only

**Testing Data**
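- Held back during training and used only for evaluation
- Estimates how well the model generalizes to unseen patients
- Helps detect overfitting: a much higher score on the training data than on the test data means the model has memorized rather than learned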

**Why Split the Data?**
If the same data is used for both training and testing, a model can score well simply by memorizing the training examples, so the score overstates real-world performance. Holding out a test set gives an honest estimate of how the model will perform on new data.
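
To make the memorization point concrete, here is a minimal pure-Python sketch. The `Memorizer` class and the randomly labelled toy data are hypothetical: the labels are coin flips, so there is nothing real to learn. A model that simply stores its training rows is perfect on them and near chance on held-out rows, which only a separate test set reveals.

```python
import random

# Hypothetical toy data: 400 unique IDs whose labels are random coin flips,
# so there is no real pattern to learn.
rng = random.Random(0)
data = [(i, rng.randint(0, 1)) for i in range(400)]
train_rows, test_rows = data[:300], data[300:]

class Memorizer:
    """A 'model' that stores every training example and looks it up verbatim."""

    def fit(self, rows):
        self.table = dict(rows)
        labels = [label for _, label in rows]
        self.majority = max(set(labels), key=labels.count)  # fallback for unseen IDs
        return self

    def predict(self, x):
        return self.table.get(x, self.majority)

def score(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model.predict(x) == label for x, label in rows) / len(rows)

memorizer = Memorizer().fit(train_rows)
train_acc = score(memorizer, train_rows)  # perfect: every training row is recalled exactly
test_acc = score(memorizer, test_rows)    # roughly chance: stored IDs say nothing about new ones
print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```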


# Step 3: Train Logistic Regression Model
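**Why Logistic Regression?**

*   Simple and interpretable: each coefficient shows how a feature shifts the log-odds of heart disease
*   Designed for binary classification (here: disease vs. no disease) and outputs probabilities, not just labels
*   A common baseline for clinical risk prediction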
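
Logistic regression scores each patient with a weighted sum of the features and passes that score through the sigmoid function to get a probability of heart disease. The minimal sketch below uses made-up weights (not the coefficients learned in this notebook) to show the model form and why it counts as interpretable: the score is a log-odds, so `exp(weight)` is the factor by which the odds change per unit increase of that feature. The function names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_proba(weights, bias, x):
    """P(class 1 | x) for a logistic regression model with the given parameters."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))  # linear score = log-odds
    return sigmoid(score)

# Hypothetical two-feature model; these weights are made up, not learned from heart.csv.
weights = [0.8, -0.5]
bias = -0.2

p = logistic_proba(weights, bias, [1.0, 0.4])
label = int(p >= 0.5)  # a 0.5 cut-off turns the probability into a class label

# Interpretability: raising feature 0 by one unit multiplies the odds by exp(0.8).
odds_ratio_feature0 = math.exp(weights[0])
print(f"P(disease) = {p:.3f}, label = {label}, odds ratio = {odds_ratio_feature0:.3f}")
```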





In [9]:
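# max_iter=1000 raises the default limit of 100 solver iterations; the features are on
# very different scales (e.g. chol vs. sex), which can slow convergence
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # estimate the coefficients from the training data only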
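
`fit` chooses the weights that minimise the log-loss (negative log-likelihood) on the training data. The sketch below does this with plain batch gradient descent on a one-feature toy dataset. This is a swapped-in teaching technique, not what scikit-learn runs: its default is the lbfgs solver with an L2 penalty (`C=1.0`), so the coefficients would differ. The helper names and data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(X, y, w, b):
    """Average negative log-likelihood of labels y under weights w and bias b."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss (no penalty)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy one-feature data with overlapping classes, so the best weights are finite.
X_toy = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y_toy = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic_gd(X_toy, y_toy)
loss_before = mean_log_loss(X_toy, y_toy, [0.0], 0.0)  # log(2): every probability is 0.5
loss_after = mean_log_loss(X_toy, y_toy, w, b)
print(f"weight = {w[0]:.3f}, bias = {b:.3f}, log-loss {loss_before:.3f} -> {loss_after:.3f}")
```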

# Step 4: Make Predictions

In [10]:
y_pred = model.predict(X_test)  # predicted class labels (0 or 1) for the held-out rows
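
For a binary logistic regression, `predict` returns hard labels by thresholding the predicted probability of class 1 at 0.5; the probabilities themselves are available in scikit-learn via `model.predict_proba(X_test)[:, 1]`. Moving that threshold trades precision against recall, which matters when missed cases are costly. The sketch below uses invented labels and probabilities and hypothetical helper names.

```python
def predict_with_threshold(probs, threshold=0.5):
    """Label a case positive (1) when its predicted probability reaches the threshold."""
    return [int(p >= threshold) for p in probs]

def precision_recall(y_true, y_pred):
    """Precision and recall with label 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

# Invented example: true labels and predicted probabilities of heart disease.
toy_true = [1, 1, 1, 0, 0, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.4, 0.45, 0.32, 0.35, 0.55, 0.1]

# Lowering the threshold catches more true cases (higher recall)
# at the cost of more false alarms (lower precision).
for threshold in (0.5, 0.3):
    prec, rec = precision_recall(toy_true, predict_with_threshold(toy_probs, threshold))
    print(f"threshold={threshold}: precision={prec:.3f}, recall={rec:.3f}")
```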

# Step 5: Evaluation Metrics
**Metrics Used:**


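*   Accuracy --> Fraction of all predictions that are correct
*   Precision --> Of the patients predicted to have heart disease, the fraction who actually have it
*   Recall --> Of the patients who actually have heart disease, the fraction the model detects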
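
Written in terms of the confusion-matrix counts defined in Step 6 (true positives TP, true negatives TN, false positives FP, false negatives FN), the three metrics are:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$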





In [11]:
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)

Accuracy: 0.7951219512195122
Precision: 0.7563025210084033
Recall: 0.8737864077669902
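
These scores can be reproduced by hand. The hypothetical `binary_metrics` helper below counts true positives, false positives and false negatives directly, treating label 1 (heart disease) as the positive class, which matches the default `pos_label=1` of `precision_score` and `recall_score`. Returning 0.0 when a denominator is zero mirrors scikit-learn's default `zero_division` behaviour (which additionally emits a warning). The toy labels are invented.

```python
def binary_metrics(y_true, y_pred, pos_label=1):
    """Return (accuracy, precision, recall), treating pos_label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions -> 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0     # no actual positives -> 0.0
    return accuracy, precision, recall

# Invented labels: 2 true positives, 2 false positives, 1 false negative, 1 true negative.
toy_true = [1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 1, 1]
print(binary_metrics(toy_true, toy_pred))
```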


# Step 6: Confusion Matrix

A confusion matrix is a table used to evaluate the performance of a classification model by comparing actual values with predicted values.

It contains four components:

- **True Positive (TP):** Model correctly predicts the presence of heart disease.
- **True Negative (TN):** Model correctly predicts the absence of heart disease.
- **False Positive (FP):** Model predicts heart disease when it is not present.
- **False Negative (FN):** Model predicts no heart disease when it is actually present.
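
For binary 0/1 labels, scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, both in sorted order (0, then 1), giving the layout `[[TN, FP], [FN, TP]]`. A minimal pure-Python sketch of that counting follows; the helper name and sample labels are hypothetical.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows = actual, columns = predicted)."""
    counts = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts

# Invented labels: 2 TN, 2 FP, 1 FN, 3 TP.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 1, 1, 1, 0]
print(confusion_matrix_2x2(toy_true, toy_pred))
```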

### Importance of Confusion Matrix
- Provides detailed insight into classification errors
- Helps understand how accuracy, precision, and recall are calculated
- Especially important in medical applications, where a false negative means a patient with heart disease goes undetected

Together, these four counts show not only how often the model is right but also which kinds of mistakes it makes.

In [12]:
cm = confusion_matrix(y_test, y_pred)  # rows = actual class (0, 1), columns = predicted class (0, 1)
cm

array([[73, 29],
       [13, 90]])
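
scikit-learn orders the rows by actual class and the columns by predicted class (0, then 1), so this output reads TN = 73, FP = 29, FN = 13, TP = 90. Plugging those counts into the metric definitions reproduces the Step 5 scores exactly, as the short check below shows. The `miss_rate` line is an extra: the share of actual heart-disease cases the model failed to flag (equal to 1 − recall).

```python
# Counts copied from the confusion-matrix output above.
cm_counts = [[73, 29],
             [13, 90]]

# Rows = actual class, columns = predicted class, both ordered 0 then 1.
(tn, fp), (fn, tp) = cm_counts

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
miss_rate = fn / (fn + tp)  # actual heart-disease cases the model missed

print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, "
      f"Recall: {recall:.4f}, Miss rate: {miss_rate:.4f}")
```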

# Step 7: Result Interpretation

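* Accuracy ≈ 0.80 → 163 of the 205 test patients are classified correctly. The test set is nearly balanced (102 without and 103 with heart disease), so accuracy is not inflated by a dominant class
* Recall ≈ 0.87 → the model detects 90 of the 103 heart-disease cases; the 13 missed cases (false negatives) are the costliest error in diagnosis
* Precision ≈ 0.76 → about 1 in 4 positive predictions is a false alarm (29 false positives)
* The errors are not balanced: false positives (29) outnumber false negatives (13), so the model leans toward predicting heart disease, which is generally the safer direction for screening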

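✅ Logistic regression is a reasonable baseline for this medical prediction task: on the held-out set it reaches about 80% accuracy and 87% recall, a reference point for more complex models to beat.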