
# CSE 412 - ML Lab Final (Summer 2025)
## Student Name: Maruf Hossain
## ID: 221902318

This notebook contains solutions for the ML Lab Final.  
It is divided into 3 main parts:
1. Titanic Dataset (Linear Regression)
2. Liver Disease Dataset (KNN Classification)
3. MNIST Dataset (CNN for Digit Recognition)


# **1. Titanic Dataset (titanic_train.csv)**
### Steps:
1. Load the Titanic dataset.
2. Convert categorical features into numeric form using encoding.
3. Normalize numerical features.
4. Train a Linear Regression model (80/20 split).
5. Convert the regression outputs to 0/1 labels and print the model's accuracy on the test set.
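
Steps 2 and 3 can be tried on a toy table before running them on the real file. The sketch below uses a made-up four-row frame (the column names mirror the Titanic file, but the values are invented) to show that `LabelEncoder` assigns integer codes in sorted class order and that `StandardScaler` rescales a column to roughly zero mean and unit variance.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data: invented values, not rows from titanic_train.csv
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

# LabelEncoder sorts the class names, so "female" -> 0 and "male" -> 1
sex_codes = LabelEncoder().fit_transform(toy["Sex"])
print(sex_codes.tolist())  # [1, 0, 0, 1]

# StandardScaler subtracts the column mean and divides by the
# population standard deviation
fare_scaled = StandardScaler().fit_transform(toy[["Fare"]])
print(fare_scaled.mean(), fare_scaled.std())  # approximately 0 and 1
```

Note that the integer codes imply an ordering between categories, which is harmless for a two-valued column such as `Sex` but is an assumption worth keeping in mind for columns with three or more values.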


In [None]:
# --- Titanic Dataset ---
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Load Titanic dataset
url = "https://tinyurl.com/labfinaldataset"
titanic = pd.read_csv(url + "/titanic_train.csv")

# Drop irrelevant columns
titanic = titanic.drop(["PassengerId", "Name", "Ticket", "Cabin"], axis=1)

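# Handle missing values: forward-fill, then back-fill any leading gaps
# (fillna(method='ffill') is deprecated in recent pandas versions)
titanic = titanic.ffill().bfill()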

# Encode categorical columns
le = LabelEncoder()
for col in titanic.select_dtypes(include=['object']).columns:
    titanic[col] = le.fit_transform(titanic[col])

# Features and target
X = titanic.drop("Survived", axis=1)
y = titanic["Survived"]

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train Linear Regression
model = LinearRegression()
model.fit(X_train, y_train)

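# Predictions: threshold at 0.5, since the regression output is a float
# that can fall outside [0, 1] (rounding could then yield labels like -1 or 2)
y_pred = (model.predict(X_test) >= 0.5).astype(int)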

# Accuracy
acc = accuracy_score(y_test, y_pred)
print("Titanic Linear Regression Accuracy:", acc)
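
Linear regression returns unbounded floats, so some rule is needed to turn them into 0/1 labels before accuracy can be computed. The sketch below compares plain rounding with a 0.5 threshold on a few hypothetical raw outputs (the numbers are invented for illustration, not taken from the fitted model).

```python
import numpy as np

# Hypothetical raw outputs of a linear regression fitted on 0/1 labels
raw = np.array([-0.6, 0.2, 0.5, 0.7, 1.6])

rounded = np.round(raw)                  # can leave the label set {0, 1}
thresholded = (raw >= 0.5).astype(int)   # always 0 or 1

print(rounded.tolist())      # [-1.0, 0.0, 0.0, 1.0, 2.0]
print(thresholded.tolist())  # [0, 0, 1, 1, 1]
```

Rounding produces the invalid labels -1 and 2, and NumPy rounds 0.5 down to 0 (round half to even), whereas the threshold keeps every prediction inside the two classes.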


# **2. Liver Disease Dataset (ILPD.csv)**
### Steps:
1. Load ILPD dataset.
2. Display first and last 10 rows.
3. Preprocess dataset (scaling, encoding if needed).
4. Train KNN classifier (K=5).
5. Create two synthetic samples and test the model.
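
To make step 4 concrete, here is a minimal from-scratch sketch of what a K=5 nearest-neighbour vote does. The helper `knn_predict` and the eight 2-D points are invented for illustration; the notebook itself uses scikit-learn's `KNeighborsClassifier`.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Return the majority label among the k training points closest to x_query."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                    # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Invented 2-D points: label 1 clusters near (0, 0), label 2 near (5, 5)
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y_train = np.array([1, 1, 1, 1, 2, 2, 2, 2])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 1
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 2
```

With K=5 each query collects four neighbours from its own cluster and one from the other, so the majority vote still returns the nearby cluster's label.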


In [None]:
# --- Liver Disease Dataset ---
from sklearn.neighbors import KNeighborsClassifier

# Load dataset
ilpd = pd.read_csv(url + "/ILPD.csv")

# Display first and last 10 rows
print("First 10 rows:\n", ilpd.head(10))
print("\nLast 10 rows:\n", ilpd.tail(10))

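# Separate features and target
X = ilpd.iloc[:, :-1].copy()   # all except last column
y = ilpd.iloc[:, -1]           # last column is target

# Encode any text columns (e.g. gender) and fill missing numeric values,
# since StandardScaler cannot handle strings
for col in X.select_dtypes(include=['object']).columns:
    X[col] = LabelEncoder().fit_transform(X[col].astype(str))
X = X.fillna(X.median())

# Normalize features with a fresh scaler fitted on this dataset
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)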

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# KNN model (K=5)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Accuracy
print("Liver Disease KNN Accuracy:", knn.score(X_test, y_test))

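# Create two synthetic samples by drawing each feature uniformly
# between its observed minimum and maximum
rng = np.random.default_rng(42)
synthetic_samples = pd.DataFrame(
    rng.uniform(X.min().to_numpy(), X.max().to_numpy(), size=(2, X.shape[1])),
    columns=X.columns)
synthetic_samples = scaler.transform(synthetic_samples)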

predictions = knn.predict(synthetic_samples)
print("Synthetic Sample Predictions:", predictions)
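
Because KNN compares samples by distance, the scaling step matters, and new samples have to pass through the same fitted scaler as the training data. The sketch below uses three invented rows (an age in years and a made-up enzyme level) to show how an unscaled wide-range column dominates the Euclidean distance.

```python
import numpy as np

# Invented rows: [age in years, enzyme level]
a = np.array([30.0, 200.0])
b = np.array([60.0, 210.0])   # very different age, similar enzyme level
c = np.array([31.0, 400.0])   # similar age, very different enzyme level
data = np.vstack([a, b, c])

# Raw distances from a: the enzyme column dominates
raw_ab = np.linalg.norm(a - b)
raw_ac = np.linalg.norm(a - c)

# Standardize each column (zero mean, unit variance), then recompute
z = (data - data.mean(axis=0)) / data.std(axis=0)
z_ab = np.linalg.norm(z[0] - z[1])
z_ac = np.linalg.norm(z[0] - z[2])

print(raw_ab, raw_ac)  # c looks several times farther from a than b does
print(z_ab, z_ac)      # after scaling the two distances are comparable
```

Before scaling, the age difference between `a` and `b` barely registers; after scaling, both features contribute on a similar footing.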


# **3. MNIST Handwritten Digit Classification (CNN)**
### Steps:
1. Load MNIST dataset from `tensorflow.keras.datasets`.
2. Preprocess (normalize pixels 0–1).
3. Visualize 5 random images with labels.
4. Build CNN with 3 conv layers + pooling + dense layer.
5. Train and evaluate with classification report + confusion matrix.
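
The feature-map sizes in step 4 can be checked by hand. The sketch below assumes what the model in the following cell uses: 3x3 kernels with Keras's default `'valid'` padding and stride 1, and 2x2 max pooling after the first two convolutions. The helpers `conv_out` and `pool_out` are illustrative names, not Keras functions.

```python
def conv_out(size, kernel=3):
    """Output width of a 'valid' (no padding), stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (floor division)."""
    return size // pool

size = 28
size = conv_out(size)   # 26 after Conv2D(32)
size = pool_out(size)   # 13 after MaxPooling2D
size = conv_out(size)   # 11 after Conv2D(64)
size = pool_out(size)   # 5  after MaxPooling2D
size = conv_out(size)   # 3  after Conv2D(128)

flat_features = size * size * 128   # inputs to the first Dense layer
print(size, flat_features)  # 3 1152
```

So the `Flatten` layer hands 1152 values to the 128-unit dense layer, which can be confirmed with `model.summary()`.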


In [None]:
# --- MNIST CNN ---
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import random
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values
x_train, x_test = x_train / 255.0, x_test / 255.0

# Reshape for CNN (28x28x1)
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Visualize 5 random images
plt.figure(figsize=(10,4))
for i in range(5):
    idx = random.randrange(len(x_train))  # randint's upper bound is inclusive and could overflow the index
    plt.subplot(1,5,i+1)
    plt.imshow(x_train[idx].reshape(28,28), cmap="gray")
    plt.title("Label: " + str(y_train[idx]))
    plt.axis("off")
plt.show()

# CNN architecture
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(128, (3,3), activation='relu'),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train
history = model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.1)

# Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("MNIST CNN Test Accuracy:", test_acc)

# Predictions for classification report
y_pred = model.predict(x_test)
y_pred_classes = y_pred.argmax(axis=1)

print("\nClassification Report:\n", classification_report(y_test, y_pred_classes))

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred_classes)
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
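
The classification report and the confusion matrix carry the same information. As a minimal sketch, the snippet below derives per-class precision and recall from a small invented 3-class matrix laid out the way `confusion_matrix` returns it (rows are true classes, columns are predicted classes); the counts are made up, not MNIST results.

```python
import numpy as np

# Invented 3-class confusion matrix: rows = true class, columns = predicted class
cm = np.array([
    [9, 1, 0],
    [2, 7, 1],
    [0, 2, 8],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)   # column sums = times each class was predicted
recall = true_positives / cm.sum(axis=1)      # row sums = true count of each class
accuracy = true_positives.sum() / cm.sum()

print(precision.round(3), recall.round(3), accuracy)
```

Off-diagonal cells show which pairs of classes are confused; on the heatmap above, a bright cell away from the diagonal points to such a pair.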
