
# DAY 7 – Model Comparison, Selection & Final Prediction

## Objective of Day 7
In this session, we move from using a single Machine Learning model to **comparing multiple models** and **selecting the best one based on evidence**.

By the end of this notebook, you will:
- Understand why multiple models are tried in ML projects
- Compare different algorithms fairly
- Use proper evaluation metrics
- Select the best-performing model
- Make reliable predictions on new input data


## Context from Previous Days

Through Day 6, we:
- Built ML models correctly
- Used Train–Test Split
- Improved model behavior using feature scaling
- Understood overfitting and underfitting

Now the next logical question is:

**Which model should we actually use?**


## Import Required Libraries


In [None]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score


## Dataset – Student Placement Data

Features:
- CGPA
- Number of internships
- Coding skill level

Target:
- Placement status (0 = Not Placed, 1 = Placed)


In [None]:
X = np.array([
    [7.5, 1, 3],
    [6.2, 0, 2],
    [8.1, 2, 4],
    [7.0, 1, 3],
    [8.5, 3, 5],
    [5.9, 0, 1],
    [7.8, 2, 4],
    [6.8, 1, 2],
    [9.0, 3, 5],
    [6.0, 0, 1]
])

y = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 0])

## Train–Test Split

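We use the same split for all models to ensure a fair comparison. With 10 students and `test_size=0.3`, the test set holds 3 students and the training set holds 7.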


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
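To see what the split actually produces, the sketch below prints the set sizes and the class counts in the test set. The stratified variant (`stratify=y`) is an assumption added for comparison, not something this notebook uses; it asks `train_test_split` to keep the class ratio roughly equal in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy dataset as in the notebook: [CGPA, internships, coding skill]
X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3], [8.5, 3, 5],
    [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2], [9.0, 3, 5], [6.0, 0, 1],
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 0])

# Plain split, as used in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Stratified split (an assumption for comparison, not the notebook's choice)
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Train size:", len(y_train), "| Test size:", len(y_test))
# bincount gives [count of class 0, count of class 1]
print("Test class counts (plain):     ", np.bincount(y_test, minlength=2))
print("Test class counts (stratified):", np.bincount(y_test_s, minlength=2))
```

On a dataset this small, a test set can easily end up with very few examples of one class, which is worth checking before trusting any accuracy number.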

## Feature Scaling

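We scale features because some models are sensitive to feature ranges: KNN relies on distances between points, and scikit-learn's logistic regression applies regularization by default. The scaler is fit on the training set only and then applied to the test set, so the test data does not influence the scaling.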


In [None]:
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
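As a quick check of what `fit_transform` and `transform` do, the sketch below prints the statistics the scaler learned from the training set and confirms that the scaled training columns have mean 0 and standard deviation 1. The test columns are scaled with the training statistics, so their means are generally not exactly 0.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same toy dataset as in the notebook: [CGPA, internships, coding skill]
X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3], [8.5, 3, 5],
    [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2], [9.0, 3, 5], [6.0, 0, 1],
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training mean/std

print("Learned means:   ", scaler.mean_)
print("Learned std devs:", scaler.scale_)
print("Scaled train column means:", X_train_scaled.mean(axis=0).round(6))
print("Scaled train column stds: ", X_train_scaled.std(axis=0).round(6))
print("Scaled test column means: ", X_test_scaled.mean(axis=0).round(6))
```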

## Model 1 – Logistic Regression (Baseline Model)


In [None]:
lr_model = LogisticRegression()
lr_model.fit(X_train_scaled, y_train)

lr_pred = lr_model.predict(X_test_scaled)
lr_accuracy = accuracy_score(y_test, lr_pred)

print("Logistic Regression Accuracy:", lr_accuracy)

Logistic Regression Accuracy: 1.0


## Model 2 – K-Nearest Neighbors (KNN)


In [None]:
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train_scaled, y_train)

knn_pred = knn_model.predict(X_test_scaled)
knn_accuracy = accuracy_score(y_test, knn_pred)

print("KNN Accuracy:", knn_accuracy)

KNN Accuracy: 1.0


## Model 3 – Decision Tree


In [None]:
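# Decision trees split on per-feature thresholds, so they do not need
# scaled inputs; we train on the unscaled features here.
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)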

dt_pred = dt_model.predict(X_test)
dt_accuracy = accuracy_score(y_test, dt_pred)

print("Decision Tree Accuracy:", dt_accuracy)

Decision Tree Accuracy: 1.0
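The decision tree above is trained on unscaled features, unlike the other two models. The sketch below is one way to check that this does not put it at a disadvantage: it trains one tree on raw features and one on scaled features, then compares their test predictions. The variable names are illustrative, and agreement is what we would typically expect for a tree, not a guarantee.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same toy dataset as in the notebook: [CGPA, internships, coding skill]
X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3], [8.5, 3, 5],
    [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2], [9.0, 3, 5], [6.0, 0, 1],
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
scaler = StandardScaler().fit(X_train)

# One tree on raw features, one on scaled features, same random_state
tree_raw = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
tree_scaled = DecisionTreeClassifier(random_state=42).fit(
    scaler.transform(X_train), y_train
)

pred_raw = tree_raw.predict(X_test)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

print("Raw features:   ", pred_raw)
print("Scaled features:", pred_scaled)
print("Predictions agree:", np.array_equal(pred_raw, pred_scaled))
```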


## Model Comparison Summary


In [None]:
results = pd.DataFrame({
    "Model": ["Logistic Regression", "KNN", "Decision Tree"],
    "Accuracy": [lr_accuracy, knn_accuracy, dt_accuracy]
})

results

|   | Model               | Accuracy |
|---|---------------------|----------|
| 0 | Logistic Regression | 1.0      |
| 1 | KNN                 | 1.0      |
| 2 | Decision Tree       | 1.0      |
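A single split of 10 students gives a very coarse accuracy estimate. One common way to get a steadier comparison is k-fold cross-validation, sketched below. This is an addition, not part of the notebook's workflow: the choice of 4 stratified folds and the use of `make_pipeline` to refit the scaler inside each fold are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same toy dataset as in the notebook: [CGPA, internships, coding skill]
X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3], [8.5, 3, 5],
    [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2], [9.0, 3, 5], [6.0, 0, 1],
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 0])

# Pipelines refit the scaler on each training fold, so no fold's
# test data influences the scaling.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# 4 folds is an assumption chosen to suit 10 samples (4 of class 0, 6 of class 1)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

cv_means = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy = {scores.mean():.2f} (std = {scores.std():.2f})")
```

Every model is scored on the same folds, which keeps the comparison fair in the same sense as using one shared train-test split.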


## Selecting the Best Model

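All three models reach an accuracy of 1.0 on the test set, but that set contains only three students, so accuracy alone cannot separate them. With the scores tied, we select Logistic Regression, our baseline model.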


In [None]:
best_model = lr_model
print("Selected Model: Logistic Regression")

Selected Model: Logistic Regression
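The selection above is hard-coded. A minimal sketch of making it explicit is shown below: find the top accuracy, list the models that share it, and break ties with a stated preference. The `preference_order` list is a hypothetical tie-break rule introduced here for illustration (baseline first); the accuracies are the ones reported in the comparison table.

```python
import pandas as pd

# Accuracies as reported in the comparison table above
results = pd.DataFrame({
    "Model": ["Logistic Regression", "KNN", "Decision Tree"],
    "Accuracy": [1.0, 1.0, 1.0],
})

# Find every model that reaches the top accuracy
best_accuracy = results["Accuracy"].max()
tied = results.loc[results["Accuracy"] == best_accuracy, "Model"].tolist()

# Hypothetical tie-break: prefer the baseline model, then the others in order
preference_order = ["Logistic Regression", "KNN", "Decision Tree"]
selected = min(tied, key=preference_order.index)

print("Best accuracy:", best_accuracy)
print("Models tied at the top:", tied)
print("Selected Model:", selected)
```

Writing the tie-break down makes the justification for the final choice visible instead of leaving it implicit.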


## Final Prediction on New Student Data

You can change the values below and observe how the model prediction changes.

Format:
[CGPA, Internships, Coding Skill]


In [None]:
new_student = np.array([[7.2, 1, 3]])

new_student_scaled = scaler.transform(new_student)

prediction = best_model.predict(new_student_scaled)

if prediction[0] == 1:
    print("Prediction: Student is likely to be PLACED")
else:
    print("Prediction: Student is likely to be NOT PLACED")

Prediction: Student is likely to be PLACED
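A class label alone hides how confident the model is. As a sketch, `predict_proba` on the logistic regression model returns a probability for each class, which makes it easier to see how the output shifts as the inputs change. The second and third students below are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same toy dataset as in the notebook: [CGPA, internships, coding skill]
X = np.array([
    [7.5, 1, 3], [6.2, 0, 2], [8.1, 2, 4], [7.0, 1, 3], [8.5, 3, 5],
    [5.9, 0, 1], [7.8, 2, 4], [6.8, 1, 2], [9.0, 3, 5], [6.0, 0, 1],
])
y = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
scaler = StandardScaler()
model = LogisticRegression()
model.fit(scaler.fit_transform(X_train), y_train)

# First row is the notebook's example; the other two are hypothetical students
students = np.array([
    [7.2, 1, 3],
    [5.5, 0, 1],
    [8.8, 3, 5],
])

# Columns follow model.classes_, i.e. [P(Not Placed), P(Placed)]
probs = model.predict_proba(scaler.transform(students))

for features, (p_not_placed, p_placed) in zip(students, probs):
    print(f"{features}: P(Not Placed) = {p_not_placed:.2f}, P(Placed) = {p_placed:.2f}")
```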


## Key Learning from Day 7

- Machine Learning is about comparison, not assumption
- Multiple models must be evaluated fairly
- Accuracy must be measured on unseen data
- Final model selection must be justified
- Changing the input values can change the prediction, so test the final model on several inputs


## What Comes Next

Next, we will:
- Study the confusion matrix
- Learn precision, recall, and F1-score
- Improve model explainability
- Prepare the project for interviews and deployment
