<a href="https://colab.research.google.com/github/Ethansb16/Data/blob/main/Lab6_480.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Make your own copy of this notebook: File > Save a Copy.

Lab 6 – Day 1: Decision Trees on the Breast Cancer Dataset
Dataset


- We will use the Breast Cancer Wisconsin dataset, a built-in dataset from sklearn.datasets.

- It contains 30 features computed from digitized images of a breast mass and a binary target: malignant or benign.


Lab Tasks
1. Setup

In [None]:
from sklearn.datasets import load_breast_cancer

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier, plot_tree

from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

import matplotlib.pyplot as plt

2. Load and Explore the Data

In [None]:
data = load_breast_cancer()

X, y = data.data, data.target

print("Features:", data.feature_names)

print("Target names:", data.target_names)

print("Shape:", X.shape)


3. Split the Data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
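As a quick sanity check on the split above: the dataset has 569 samples, and with a fractional `test_size` scikit-learn rounds the test count up. The sketch below reproduces that arithmetic by hand; the variable names are illustrative and not part of the lab code.

```python
import math

n_samples = 569   # rows in the Breast Cancer Wisconsin dataset
test_size = 0.2   # same fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # fractional test sizes are rounded up
n_train = n_samples - n_test               # the remainder is used for training

print("test samples:", n_test)
print("train samples:", n_train)
```

Comparing these counts with `X_train.shape` and `X_test.shape` is an easy way to confirm the split behaved as expected.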


4. Train the Decision Tree

In [None]:
clf = DecisionTreeClassifier(max_depth=3, random_state=42)

clf.fit(X_train, y_train)


5. Visualize the Tree

In [None]:
plt.figure(figsize=(20,10))

plot_tree(clf, filled=True, feature_names=data.feature_names, class_names=data.target_names)

plt.show()
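Each node in the plotted tree reports a `gini` value, because `DecisionTreeClassifier` uses Gini impurity as its default splitting criterion. The sketch below shows how that number can be computed from the class counts in a node. The helper `gini` is a hypothetical name introduced here for illustration, and the counts are made up.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # 1 minus the sum of squared class proportions
    return 1.0 - sum((c / total) ** 2 for c in counts)

balanced = gini([50, 50])   # evenly mixed node: the maximum for two classes
pure = gini([100, 0])       # only one class present: impurity is zero
skewed = gini([10, 90])     # mostly one class: low impurity

print(balanced, pure, round(skewed, 2))
```

A split is attractive when the child nodes have lower impurity than the parent, which is why the leaves of the plotted tree tend to have values close to 0.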


6. Evaluate the Classifier

In [None]:
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)

print("Testing Accuracy:", acc)

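# Training accuracy: a value far above the test accuracy suggests overfitting
y_train_pred = clf.predict(X_train)

train_acc = accuracy_score(y_train, y_train_pred)

print("Training Accuracy:", train_acc)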

cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=data.target_names)

disp.plot()

plt.show()
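To read the confusion matrix: rows are true labels and columns are predicted labels, in the order given by `data.target_names` (malignant, then benign). The sketch below derives accuracy and malignant recall from such a matrix. The counts are hypothetical and chosen only for illustration; they are not the lab's actual result.

```python
# Hypothetical counts for illustration only (not the lab's actual output).
# Rows = true label, columns = predicted label, order: [malignant, benign].
cm = [[40, 3],
      [2, 69]]

total = sum(sum(row) for row in cm)

# Correct predictions sit on the diagonal
accuracy = (cm[0][0] + cm[1][1]) / total

# Recall for malignant: fraction of truly malignant cases that were caught
malignant_recall = cm[0][0] / sum(cm[0])

# Malignant cases predicted as benign (top-right cell)
missed_malignant = cm[0][1]

print(f"accuracy={accuracy:.3f}, malignant recall={malignant_recall:.3f}")
print("missed malignant cases:", missed_malignant)
```

Accuracy alone hides which kind of error the tree makes, so it is worth checking the off-diagonal cells when answering the reflection questions.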


Answer the accompanying reflection questions.

Lab 6 – Day 2: Feedforward Neural Networks on Fashion MNIST


Objective
- Train a dense (non-convolutional) neural network on image data using TensorFlow/Keras to classify clothing items.

Setup
- Requirements: Google Colab (no setup required, TensorFlow is pre-installed)

Dataset: Fashion MNIST
- A set of grayscale images (28×28) of clothing types.

Classes:

- 0: T-shirt/top
- 1: Trouser
- 2: Pullover
- 3: Dress
- 4: Coat
- 5: Sandal
- 6: Shirt
- 7: Sneaker
- 8: Bag
- 9: Ankle boot
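The labels in the dataset are integers, so a lookup list makes plots and printed predictions easier to read. This is a minimal sketch; `class_names` and `label_to_name` are names introduced here for illustration and are not defined elsewhere in the lab.

```python
# Index i holds the name of class i, in the order listed above
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_to_name(label):
    """Translate an integer label (0-9) into its clothing category."""
    return class_names[int(label)]

print(label_to_name(0))
print(label_to_name(9))
```

For example, a plot title could use `label_to_name(y_train[i])` in place of the raw integer.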


1. Import Libraries


In [None]:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np


2. Load and Normalize Data

In [None]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

X_train, X_test = X_train / 255.0, X_test / 255.0  # Normalize to [0, 1]


3. Visualize the Dataset

In [None]:
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()


4. Build a Feedforward Neural Network

In [None]:
model = keras.Sequential([
    keras.Input(shape=(28, 28)),  # declare the input shape explicitly
    keras.layers.Flatten(),       # 28×28 image → 784-element vector
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
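It can help to count the trainable parameters in this network by hand. A dense layer has one weight per input-unit pair plus one bias per unit. The sketch below applies that rule to the two dense layers; `dense_params` is a hypothetical helper introduced for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 28 * 28                       # Flatten turns each image into 784 values
hidden = dense_params(flattened, 128)     # Dense(128, relu)
output = dense_params(128, 10)            # Dense(10, softmax)
total = hidden + output

print("hidden layer:", hidden)
print("output layer:", output)
print("total:", total)
```

The total should agree with the figure reported by `model.summary()`. Nearly all of the parameters sit in the first dense layer, because every pixel connects to every hidden unit.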


5. Compile and Train

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train, epochs=10, validation_split=0.1)
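The loss `sparse_categorical_crossentropy` takes integer labels and the softmax probabilities, and returns the negative log of the probability assigned to the true class. The sketch below computes it by hand for a single example using only the standard library. The function names and the input scores are illustrative assumptions, not Keras internals.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability of the true class (label is an integer index)."""
    return -math.log(probs[label])

probs = softmax([2.0, 1.0, 0.1])               # made-up scores for a 3-class example
loss_right = sparse_categorical_crossentropy(probs, 0)  # true class has the highest score
loss_wrong = sparse_categorical_crossentropy(probs, 2)  # true class has the lowest score

# A model that guesses uniformly over 10 classes has loss ln(10)
uniform_loss = sparse_categorical_crossentropy([0.1] * 10, 3)

print(loss_right < loss_wrong)
print(round(uniform_loss, 4))
```

The uniform-guess value is a useful reference point: a first-epoch loss well below it indicates the network is already learning something.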


6. Evaluate the Model

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")


7. Plot Learning Curves

In [None]:
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()


Answer the accompanying reflection questions.

Lab 6 – Day 3: CNNs for Fashion MNIST

Objective

Introduce students to convolutional neural networks and show how they improve performance on image classification tasks by leveraging spatial structure.


Setup
- Platform: Google Colab (TensorFlow pre-installed)
- Dataset: Fashion MNIST (same as Day 2, but input needs reshaping for CNN)

1. Import Libraries


In [None]:
import tensorflow as tf

from tensorflow import keras

import matplotlib.pyplot as plt


2. Load and Preprocess Data

In [None]:
fashion_mnist = keras.datasets.fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Normalize and reshape for CNN: (28,28) → (28,28,1)

X_train = X_train.reshape(-1, 28, 28, 1) / 255.0

X_test = X_test.reshape(-1, 28, 28, 1) / 255.0


3. Build a Simple CNN

In [None]:
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),  # one grayscale channel
    keras.layers.Conv2D(32, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
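To see how the image shrinks as it passes through the network, the sketch below traces the spatial size and parameter counts by hand. It assumes the Keras defaults used above: convolutions with no padding and stride 1, and pooling with stride equal to the pool size. The helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width/height of max pooling; leftover rows/columns are dropped."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus a bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size = 28
size = conv_out(size, 3)    # after Conv2D(32, 3x3)
size = pool_out(size, 2)    # after MaxPooling2D(2x2)
size = conv_out(size, 3)    # after Conv2D(64, 3x3)
size = pool_out(size, 2)    # after MaxPooling2D(2x2)
flat = size * size * 64     # Flatten: height * width * channels

total_params = (
    conv_params(3, 1, 32)       # first convolution
    + conv_params(3, 32, 64)    # second convolution
    + flat * 64 + 64            # Dense(64)
    + 64 * 10 + 10              # Dense(10)
)

print("final feature map:", size, "x", size)
print("flattened length:", flat)
print("total parameters:", total_params)
```

These numbers can be compared against `model.summary()`. The convolutional layers themselves are small; most parameters again sit in the first dense layer after `Flatten`.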


4. Compile and Train


In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train, epochs=10, validation_split=0.1)


5. Evaluate and Plot Learning Curve

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f"Test Accuracy: {test_acc:.4f}")

plt.plot(history.history['accuracy'], label='train accuracy')

plt.plot(history.history['val_accuracy'], label='val accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.grid(True)

plt.show()


6. Confusion Matrix

In [None]:
import numpy as np

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_pred = model.predict(X_test).argmax(axis=1)

cm = confusion_matrix(y_test, y_pred)

disp = ConfusionMatrixDisplay(confusion_matrix=cm)

disp.plot()

plt.show()


Lab 6 – Day 4: Reinforcement Learning – The Vacuum Robot (4×4 Grid)

Objective
- Introduce students to reinforcement learning by training a vacuum robot with Q-learning in a 4×4 grid world.

Scenario
- A robot operates in a 4x4 grid (16 locations). Each cell may be dirty or clean.

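- The agent can move up, down, left, or right, or clean the tile it is standing on.

- Cleaning a dirty tile gives a reward of +10.

- Every other action (a move, or cleaning a tile that is already clean) gives a reward of -1, to encourage efficiency.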

Setup: Import Libraries and Initialize Environment


In [None]:
import numpy as np
import random
import matplotlib.pyplot as plt

Part 1: Define the Grid World
We'll define a 4×4 grid in which each tile starts out clean (0) or dirty (1) with equal probability. The action set is the four moves from the scenario plus CLEAN.

In [None]:
GRID_SIZE = 4
ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT', 'CLEAN']
NUM_ACTIONS = len(ACTIONS)

def random_dirty_grid():
    return np.random.choice([0, 1], size=(GRID_SIZE, GRID_SIZE), p=[0.5, 0.5])


Part 2: Define the Environment Dynamics

In [None]:
class VacuumEnv:
    def __init__(self):
        self.reset()

    def reset(self):
        self.grid = random_dirty_grid()
        self.agent_pos = [0, 0]
        return self._get_state()

    def _get_state(self):
        x, y = self.agent_pos
        return (x, y, self.grid[x][y])

    def step(self, action):
        x, y = self.agent_pos
        reward = -1  # default move penalty

        if action == 'CLEAN':
            if self.grid[x][y] == 1:
                self.grid[x][y] = 0
                reward = 10
        else:
            if action == 'UP' and x > 0:
                x -= 1
            elif action == 'DOWN' and x < GRID_SIZE - 1:
                x += 1
            elif action == 'LEFT' and y > 0:
                y -= 1
            elif action == 'RIGHT' and y < GRID_SIZE - 1:
                y += 1
            self.agent_pos = [x, y]

        done = np.sum(self.grid) == 0  # all clean
        return self._get_state(), reward, done


Part 3: Initialize Q-Table

In [None]:
q_table = {}

def get_q(state):
    if state not in q_table:
        q_table[state] = np.zeros(NUM_ACTIONS)
    return q_table[state]
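Because the state returned by `_get_state` is only `(row, column, dirty flag of the current tile)`, the Q-table stays small. The sketch below enumerates the possible states and contrasts that with the number of full grid configurations the agent does not observe. It repeats `GRID_SIZE` and the action count from the lab so that it runs on its own.

```python
from itertools import product

GRID_SIZE = 4
NUM_ACTIONS = 5   # UP, DOWN, LEFT, RIGHT, CLEAN

# Every (row, column, dirty_flag) combination the agent can observe
states = list(product(range(GRID_SIZE), range(GRID_SIZE), (0, 1)))
max_q_entries = len(states) * NUM_ACTIONS

# For comparison: agent position times every clean/dirty pattern of the whole grid
full_states = (GRID_SIZE * GRID_SIZE) * 2 ** (GRID_SIZE * GRID_SIZE)

print("observable states:", len(states))
print("maximum Q-table entries:", max_q_entries)
print("fully observed states:", full_states)
```

The small table is what makes tabular Q-learning practical here, at the cost that the agent cannot tell where the remaining dirt is.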


Part 4: Q-Learning Algorithm

In [None]:
EPISODES = 5000
LEARNING_RATE = 0.1
DISCOUNT = 0.9
EPSILON = 0.2

for ep in range(EPISODES):
    env = VacuumEnv()
    state = env.reset()
    done = False

    while not done:
        # Epsilon-greedy: explore with probability EPSILON, otherwise exploit
        if random.uniform(0, 1) < EPSILON:
            action_idx = random.randint(0, NUM_ACTIONS - 1)
        else:
            action_idx = np.argmax(get_q(state))

        action = ACTIONS[action_idx]
        next_state, reward, done = env.step(action)

        old_q = get_q(state)[action_idx]
        future_q = np.max(get_q(next_state))  # value of the best next action

        # Q-learning update: move old_q toward reward + DISCOUNT * future_q
        new_q = old_q + LEARNING_RATE * (reward + DISCOUNT * future_q - old_q)
        get_q(state)[action_idx] = new_q

        state = next_state
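To make the update rule in the loop concrete, the sketch below applies it to two single steps with the same `LEARNING_RATE` and `DISCOUNT`. The function `q_update` is a hypothetical wrapper around the one-line formula, and the input values are made up for illustration.

```python
LEARNING_RATE = 0.1
DISCOUNT = 0.9

def q_update(old_q, reward, future_q):
    """One Q-learning step: nudge old_q toward reward + DISCOUNT * future_q."""
    target = reward + DISCOUNT * future_q
    return old_q + LEARNING_RATE * (target - old_q)

# First time the agent cleans a dirty tile: every estimate is still zero,
# so the value moves one tenth of the way toward the +10 reward.
first = q_update(old_q=0.0, reward=10, future_q=0.0)

# A move that costs -1 and leads to a state whose best value is 1.0:
# the target is below the old estimate, so the value decreases.
second = q_update(old_q=1.0, reward=-1, future_q=1.0)

print(first, second)
```

Because each update covers only a fraction of the gap to the target, estimates converge gradually over many episodes.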


Part 5: Evaluate the Policy

In [None]:
def run_episode():
    env = VacuumEnv()
    state = env.reset()
    total_reward = 0
    steps = 0
    done = False

    while not done and steps < 50:
        action_idx = np.argmax(get_q(state))
        action = ACTIONS[action_idx]
        state, reward, done = env.step(action)
        total_reward += reward
        steps += 1

    return total_reward, steps

rewards = [run_episode()[0] for _ in range(100)]
print(f"Average reward over 100 episodes: {np.mean(rewards)}")


Part 6: Visualize the Learned Policy

In [None]:
policy_grid = np.empty((GRID_SIZE, GRID_SIZE), dtype=object)

for i in range(GRID_SIZE):
    for j in range(GRID_SIZE):
        state = (i, j, 1)  # assume dirty tile
        best_action = ACTIONS[np.argmax(get_q(state))]
        policy_grid[i][j] = best_action

print("Learned policy (assuming all tiles are dirty):")
print(policy_grid)
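One caveat when reading the printed policy: `np.argmax` returns the first index when several values tie. A state that was never visited has all-zero Q-values, so it would show up as `UP` even though nothing was learned for it. The sketch below mimics that tie-breaking with plain Python lists; `greedy_action` is a hypothetical helper, not part of the lab code.

```python
ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT', 'CLEAN']

def greedy_action(q_values):
    """Pick the action with the highest Q-value; ties go to the first index."""
    best = max(q_values)
    return ACTIONS[q_values.index(best)]   # index() returns the first match

unvisited = greedy_action([0.0, 0.0, 0.0, 0.0, 0.0])   # all-zero row: tie on every action
learned = greedy_action([-0.5, -0.2, -0.7, -0.1, 4.2])  # made-up values favouring CLEAN

print(unvisited, learned)
```

Printing the Q-values for a cell alongside its action is one way to tell a learned `UP` from a default one.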
