<a href="https://colab.research.google.com/github/Vaibhav9369755717/AI-ML-2-internship-/blob/main/janday6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DAY 6 – Model Behavior, Overfitting, Underfitting & Feature Scaling

## Objective of Day 6
The goal of this session is to understand **why a Machine Learning model behaves the way it does** and how to **improve model performance correctly**.

By the end of this class, students will be able to:
- Explain overfitting and underfitting
- Compare training vs test performance
- Understand why feature scaling is required
- Apply feature scaling correctly
- Improve model stability using industry practices


## Context from Day 5

In Day 5, we:
- Used Train–Test Split
- Trained a model only on training data
- Evaluated performance on unseen test data

We noticed that accuracy is not fixed: it can differ between the training data and the test data.
Day 6 explains **why this happens** and **what to do about it**.


## Overfitting

Overfitting occurs when:
- Training accuracy is very high
- Test accuracy is significantly lower

This pattern suggests the model has memorized the training data, including its noise, instead of learning patterns that generalize.
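
The gap between training and test accuracy can be reproduced on purpose. The sketch below is only an illustration under stated assumptions: it uses a synthetic noisy dataset from `make_classification` and an unrestricted `DecisionTreeClassifier`, neither of which is part of the Day 5 project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration, not the Day 5 dataset).
# flip_y=0.2 assigns a random class to about 20% of the samples (label noise).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0,
    flip_y=0.2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# A decision tree with no depth limit keeps splitting until every
# training row is classified correctly, noisy labels included.
deep_tree = DecisionTreeClassifier(random_state=0)
deep_tree.fit(X_tr, y_tr)

overfit_train_acc = deep_tree.score(X_tr, y_tr)
overfit_test_acc = deep_tree.score(X_te, y_te)

print("Training Accuracy:", overfit_train_acc)
print("Test Accuracy:", overfit_test_acc)
print("Gap:", overfit_train_acc - overfit_test_acc)
```

A large positive gap is the overfitting signature described above. Limiting the tree with `max_depth` is one common way to shrink it.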


## Underfitting

Underfitting occurs when:
- Training accuracy is low
- Test accuracy is also low

This usually means the model is too simple, or too heavily constrained, to capture the pattern in the data.
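
Underfitting can be provoked in a similar way. The sketch below assumes a synthetic dataset from `make_circles` (two concentric rings, not the Day 5 data) and fits a plain Logistic Regression, whose straight-line decision boundary cannot separate one ring from the other.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): one class forms an
# inner ring, the other an outer ring around it.
X_ring, y_ring = make_circles(
    n_samples=1000, noise=0.05, factor=0.5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_ring, y_ring, test_size=0.25, random_state=0
)

# A linear decision boundary is too simple for ring-shaped classes.
linear_model = LogisticRegression()
linear_model.fit(X_tr, y_tr)

underfit_train_acc = linear_model.score(X_tr, y_tr)
underfit_test_acc = linear_model.score(X_te, y_te)

print("Training Accuracy:", underfit_train_acc)
print("Test Accuracy:", underfit_test_acc)
```

Both scores are expected to be low and close to each other, which is the underfitting signature: the model does poorly even on the data it was trained on.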


## Import Required Libraries


In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


## Dataset (Same as Day 5)

We keep the dataset unchanged to focus only on **model behavior and improvement techniques**.


In [None]:
# Feature columns: CGPA, internships, coding skill
X = np.array([
    [7.5, 1, 3],
    [6.2, 0, 2],
    [8.1, 2, 4],
    [7.0, 1, 3],
    [8.5, 3, 5],
    [5.9, 0, 1],
    [7.8, 2, 4],
    [6.8, 1, 2]
])

# Binary target: one label per row of X
y = np.array([1, 0, 1, 1, 1, 0, 1, 0])

## Train–Test Split

We split the data to evaluate how well the model generalizes to unseen data.


In [None]:
# Hold out 25% of the 8 rows (2 samples) for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

## Training the Model Without Feature Scaling

First, we train the model using raw feature values.
This helps us observe baseline behavior.


In [None]:
# Baseline: default LogisticRegression trained on the raw (unscaled) features
model = LogisticRegression()
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("Training Accuracy (No Scaling):", train_accuracy)
print("Test Accuracy (No Scaling):", test_accuracy)

Training Accuracy (No Scaling): 0.8333333333333334
Test Accuracy (No Scaling): 1.0


## Why Feature Scaling is Required

Different features have different ranges:
- CGPA: 0–10
- Internships: 0–3
- Coding skill: 1–5

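Scikit-learn's LogisticRegression applies L2 regularization by default, and that penalty acts on the raw coefficient values, so the fitted model depends on the units of each feature.
Without scaling, features measured on larger numeric scales can receive disproportionate weight, and the solver may converge more slowly.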
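
The ranges listed above are the possible ranges. The spread actually present in the data can be checked directly, as in this small sketch. The column-to-name mapping (`feature_names`) is an assumption based on the list above.

```python
import numpy as np

X = np.array([
    [7.5, 1, 3],
    [6.2, 0, 2],
    [8.1, 2, 4],
    [7.0, 1, 3],
    [8.5, 3, 5],
    [5.9, 0, 1],
    [7.8, 2, 4],
    [6.8, 1, 2]
])

# Assumed column order, taken from the feature list in this section.
feature_names = ["CGPA", "Internships", "Coding skill"]

col_min = X.min(axis=0)   # smallest observed value per feature
col_max = X.max(axis=0)   # largest observed value per feature
col_std = X.std(axis=0)   # observed spread per feature

for name, lo, hi, sd in zip(feature_names, col_min, col_max, col_std):
    print(f"{name}: min={lo}, max={hi}, std={sd:.2f}")
```

In this dataset the observed CGPA values only run from 5.9 to 8.5, so the observed spread of a feature can be much narrower than its nominal range. Scaling works from the observed statistics, not the nominal ones.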


## Feature Scaling Using StandardScaler

StandardScaler transforms features so that:
- Mean becomes 0
- Standard deviation becomes 1

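The scaler is fitted on the training data only, and the same learned mean and standard deviation are then applied to the test data. Fitting on the full dataset would let test-set information influence training (data leakage).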
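
The transformation itself is z = (x - mean) / std, computed per feature. As a rough check, the sketch below does the arithmetic by hand with NumPy and compares it with StandardScaler. The names `train_mean`, `train_std` and `check_scaler` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3],
    [8.5, 3, 5], [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2]
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Statistics come from the training rows only.
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

# Apply the same training statistics to both sets.
manual_train = (X_train - train_mean) / train_std
manual_test = (X_test - train_mean) / train_std

check_scaler = StandardScaler().fit(X_train)

print("Train matches:", np.allclose(manual_train, check_scaler.transform(X_train)))
print("Test matches:", np.allclose(manual_test, check_scaler.transform(X_test)))
print("Scaled train mean:", manual_train.mean(axis=0).round(6))
print("Scaled train std:", manual_train.std(axis=0).round(6))
```

Only the training set ends up with mean 0 and standard deviation 1. The scaled test set generally does not, because it is transformed with the training statistics.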


In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

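# Learn each feature's mean and standard deviation from the training data, then scale it
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test data (no refitting, so no leakage)
X_test_scaled = scaler.transform(X_test)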

## Training the Model After Feature Scaling

We retrain the same model using scaled features.


In [None]:
scaled_model = LogisticRegression()
scaled_model.fit(X_train_scaled, y_train)

scaled_train_accuracy = scaled_model.score(X_train_scaled, y_train)
scaled_test_accuracy = scaled_model.score(X_test_scaled, y_test)

print("Training Accuracy (Scaled):", scaled_train_accuracy)
print("Test Accuracy (Scaled):", scaled_test_accuracy)

Training Accuracy (Scaled): 0.8333333333333334
Test Accuracy (Scaled): 1.0
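
A common way to keep the fit-on-train-only rule from being broken by accident is to bundle the scaler and the model into one object. The sketch below uses scikit-learn's `make_pipeline` for this. It is an alternative arrangement of the same two steps, not something the original notebook does.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3],
    [8.5, 3, 5], [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2]
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# fit() fits the scaler on X_train, transforms X_train, then fits the model.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

# score() scales the inputs with the training statistics before predicting.
pipe_train_acc = pipeline.score(X_train, y_train)
pipe_test_acc = pipeline.score(X_test, y_test)

print("Training Accuracy (Pipeline):", pipe_train_acc)
print("Test Accuracy (Pipeline):", pipe_test_acc)
```

Because the pipeline performs the same steps as the manual version, its scores should match the scaled model above.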


## Key Observations

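- Training and test accuracy can differ: a large gap in favor of training suggests overfitting, while low accuracy on both suggests underfitting
- In this run, scaling left both accuracies unchanged (0.83 train, 1.0 test); with only two test samples, test accuracy can only be 0, 0.5, or 1.0, so small effects are invisible
- Feature scaling often improves training stability and puts features on a comparable footing, even when accuracy does not change
- Test accuracy, not training accuracy, is the better estimate of real-world performance
- Model improvement is an iterative process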
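
With 8 rows and `test_size=0.25`, the test set holds just 2 samples, so a single test score depends heavily on which rows land in it. One rough way to see this is to repeat the split with different seeds, as in the sketch below. The loop over `random_state` values 0 to 9 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3],
    [8.5, 3, 5], [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2]
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0])

split_scores = []
for seed in range(10):
    # Same split size as before, different shuffle each time.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    clf = LogisticRegression()
    clf.fit(X_tr, y_tr)
    # With 2 test samples, each score is 0.0, 0.5, or 1.0.
    split_scores.append(clf.score(X_te, y_te))

print("Test accuracy per split:", split_scores)
print("Mean test accuracy:", np.mean(split_scores))
```

Averaging over several splits gives a steadier estimate than any single two-sample score.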


## Industry Perspective

In real-world projects:
- Models are evaluated repeatedly
- Features are scaled and engineered
- Performance is monitored and improved over time

Day 6 introduces this professional mindset.


## What Comes Next

Next, we will:
- Compare multiple ML models
- Understand model selection
- Use additional evaluation metrics
- Make the project interview-ready
