<h1>AI School 2</h1>

<h2>Overview</h2>

- Feature Engineering
    - What is Feature Engineering?
    - Training and Testing Logic
    - Different Types of Models
- Neural Networks
    - What is a Neural Network?
    - What is Gradient Descent?
    - What are Nets?


# AI School 2 Pre-Test Survey

Form: https://forms.gle/sq9R3xTHe4hHvLvB6


#### Pip Installs if necessary!

In [None]:
!pip install scikit-learn
!pip install torch
!pip install torchvision

In [None]:
#%pip install scikit-learn
#%pip install torch
#%pip install torchvision

<h1>What is Feature Engineering?</h1>

Feature engineering is the process of using data to create, transform, and select features to enhance a model.

You can think of Features as Columns on a Data Set!
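
To make the "features are columns" idea concrete, here is a minimal sketch using a tiny made-up table. The column names (`height_cm`, `weight_kg`, `bmi`) and their values are hypothetical and chosen only for illustration. Building `bmi` from the two existing columns is one example of creating a new feature.

```python
import pandas as pd

# A tiny, made-up dataset: each column is a feature, each row is one example
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 180.0],
    "weight_kg": [50.0, 64.0, 81.0],
})

# Create a new feature (column) from the existing ones
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

print(df.columns.tolist())
print(df["bmi"].round(1).tolist())
```

Selecting features would then mean deciding which of these columns the model actually gets to see.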

In [None]:
import numpy as np
import pandas as pd

<h1>Imports and Libraries</h1>

Python code is organized into modules, packages, and libraries, and the `import` statement lets you bring any of them into your notebook!

For Feature Engineering, there are a variety of libraries we can use but the most common are:

* NumPy
* Pandas
* Scikit-Learn
* PyTorch

But there are even more for whatever task you want to do using Python Code!



To import these libraries, the code should look like this:

```python
import numpy as np
import pandas as pd
```

In the following blank code cell, import the following modules:

Some libraries are so big that importing all of them at once can be slow or use a lot of memory on a laptop.

Scikit-Learn and PyTorch are two examples.

Because of this, we will import only the individual pieces we need from them rather than the whole library, which also saves time.

Normally, if you publish code to GitHub or another cloud platform, the standard is to put all of your imports together in one cell.

For teaching and learning purposes, whenever we use a specific module in this notebook, you will see its import statement right next to it.

But for now, let's get back to Feature Engineering!

<h1> Feature Engineering</h1>

For feature engineering, we can think of features as columns.

Run the following cell to see the dataset!

In [None]:
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt

We will be using the MNIST dataset, a collection of handwritten digit images that have been converted into numeric pixel values.

The MNIST dataset has 70,000 images; for learning purposes and time efficiency we will work with only the first 1,000 images!
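
Each MNIST image is a 28×28 grid of pixels, and the `mnist_784` version stores it as one flat row of 784 numbers. The sketch below shows that relationship on a made-up row (the variable names `flat_image` and `image_grid` are illustrative, and the values are not real MNIST pixels).

```python
import numpy as np

# A stand-in for one row of X: 784 pixel values (made up, not real MNIST data)
flat_image = np.arange(784, dtype=float)

# Reshape the flat row back into a 28 x 28 grid of pixels
image_grid = flat_image.reshape(28, 28)

print(flat_image.shape)   # (784,)
print(image_grid.shape)   # (28, 28)
```

In other words, every one of the 784 columns is a feature: the brightness of one pixel position.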

We will compare different models for image classification.

Specifically, we want to find out which model is best at predicting the number shown in each image!

In [None]:
mnist = fetch_openml('mnist_784', as_frame=False)
X = mnist.data[:1000]
y= mnist.target.astype(int)[:1000]

In [None]:
X

In [None]:
y

## Training and Testing Logic

When training and testing a model, first ask yourself: how many times do I want to test my model?

Because laptops have limited computing power, your answer might be once or twice.

If it's once, we split the data 70-30 into a training set and a testing set.

About 70% of the data is used for training and about 30% of the data is used for testing.

If it's twice, we split the data 70-15-15 into a training set, a validation set, and a holdout set.

What is a Validation Set?

A validation set is a test set that sits in between your training set and your final test (holdout) set. You use it to check and tune the model before the final holdout test.
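
As a rough illustration of the 70-15-15 arithmetic, the following NumPy-only sketch shuffles 1,000 row numbers and cuts them into three groups. The variable names are hypothetical. It only demonstrates the idea of a three-way split and is not a substitute for `train_test_split`.

```python
import numpy as np

rng = np.random.default_rng(41)        # fixed seed so the shuffle is repeatable
n_samples = 1000
indices = rng.permutation(n_samples)   # shuffled row numbers 0..999

n_train = n_samples * 70 // 100        # 700 rows for training
n_validation = n_samples * 15 // 100   # 150 rows for validation

train_idx = indices[:n_train]
validation_idx = indices[n_train:n_train + n_validation]
holdout_idx = indices[n_train + n_validation:]   # the remaining 150 rows

print(len(train_idx), len(validation_idx), len(holdout_idx))
```

Every row lands in exactly one of the three groups, so the model is never tested on data it was trained on.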

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=41)

In [None]:
X_validation, X_holdout_test, y_validation, y_holdout_test = train_test_split(X_test, y_test, test_size=0.5, random_state=41)

Really Important to Scale Your Data

Scaling puts every feature on a comparable range (here, roughly zero mean and unit variance), which helps many models train faster and perform better.

Time is our biggest enemy for Neural Networks and Feature Engineering so saving time is always a plus!
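
To show what standard scaling does, here is a hand-written NumPy sketch on made-up numbers. It is an approximation of the idea behind `StandardScaler`, not its actual implementation. The key point is that the mean and standard deviation are learned from the training data only and then reused on every other split.

```python
import numpy as np

# Made-up data: 4 training rows and 2 test rows with 2 features each
train = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0], [6.0, 40.0]])
test = np.array([[3.0, 25.0], [8.0, 50.0]])

# "Fit": learn the mean and standard deviation from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)

# "Transform": apply those same training statistics to every split
scaled_train = (train - mean) / std
scaled_test = (test - mean) / std

print(scaled_train.mean(axis=0))  # approximately 0 for each feature
print(scaled_train.std(axis=0))   # approximately 1 for each feature
print(scaled_test[0])             # a test row equal to the training mean maps to 0
```

Refitting the scaler on the validation or test data would give those splits different statistics, so the model would see inputs on a different scale than it was trained on.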

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
scaled_X_holdout_test = scaler.transform(X_holdout_test)
scaled_X_validation = scaler.transform(X_validation)  # reuse the training-set statistics; do not refit

Different Types of Models We Can Use Are

* Logistic Regression
* Support Vector Machine
* Random Forest Classifier

Logistic Regression

Each model has a lot of parameters you can change.

The most important ones for Logistic Regression are

* max_iter
* solver
* penalty
* C : Inverse of Regularization Strength (the smaller the value, the stronger the regularization)


`max_iter` takes positive integers (try values from 1 to 1000)

`solver` takes values ['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga']

`penalty` takes values ['l1', 'l2', 'elasticnet', None]

`C` takes positive values
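
The effect of `C` is easier to see in a small experiment. The sketch below fits the same model twice on synthetic stand-in data from `make_classification` (not MNIST), changing only `C`. The names `X_demo`, `y_demo`, `strong_reg`, and `weak_reg` are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not MNIST), so this runs quickly
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=41)

# Same model twice; only C (inverse regularization strength) differs
strong_reg = LogisticRegression(C=0.01, max_iter=1000).fit(X_demo, y_demo)
weak_reg = LogisticRegression(C=100.0, max_iter=1000).fit(X_demo, y_demo)

# Stronger regularization (smaller C) pulls the learned weights toward zero
print(np.linalg.norm(strong_reg.coef_))
print(np.linalg.norm(weak_reg.coef_))
```

A smaller `C` is not automatically better or worse: it trades a simpler model against a closer fit to the training data, so it is worth trying several values.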


In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
log_model = LogisticRegression(solver='saga',max_iter=1000,random_state=41)

In [None]:
log_model.fit(scaled_X_train, y_train)

In [None]:
log_validation_predictions = log_model.predict(scaled_X_validation)


In [None]:
log_holdout_predictions = log_model.predict(scaled_X_holdout_test)

In [None]:
log_y_final_pred = log_model.predict(scaled_X_test)

In [None]:
accuracy_score(y_test,log_y_final_pred)

Parameters you can change!

`max_iter` takes positive integers (try values from 1 to 1000)

`solver` takes values ['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga']

`penalty` takes values ['l1', 'l2', 'elasticnet', None]

`C` takes positive values

Example:

```python

log_model2 = LogisticRegression(solver='liblinear', max_iter=500, penalty='l2', random_state=41)

```

Make sure to keep the random_state as 41 or else you may get varying results even with the same code!

Create a new variable `log_model2` and change the parameters and see your accuracy!

## SVC

Support Vector Machine Classifier

There are multiple types of Support Vector Machines:

* SVC : Base Model
* LinearSVC : Fastest but not most Accurate
* SGDClassifier : A Mix of Both

The most important parameters for Support Vector Machines are

* max_iter
* multi_class
* penalty
* loss
* C


`max_iter` takes positive integers (try values from 1 to 5000)

`multi_class` takes values ['ovr', 'crammer_singer']

`penalty` takes values ['l1', 'l2']

`loss` takes values ['hinge', 'squared_hinge']

`C` takes positive values
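
The two `loss` options differ in how strongly they punish mistakes. The sketch below computes both using the standard textbook formulas on made-up scores. It is an illustration, not scikit-learn's internal code, and it assumes labels encoded as -1 and +1.

```python
import numpy as np

# Labels use -1 / +1 here; scores are made-up raw model outputs
y_true = np.array([1, 1, -1, -1])
scores = np.array([2.0, 0.5, -0.5, 1.0])

margins = y_true * scores                  # large positive = confidently correct
hinge = np.maximum(0.0, 1.0 - margins)     # 'hinge'
squared_hinge = hinge ** 2                 # 'squared_hinge'

print(hinge)
print(squared_hinge)
```

The last example is on the wrong side of the boundary, and squaring makes its penalty grow much faster than the plain hinge loss does.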


In [None]:
from sklearn.svm import LinearSVC

In [None]:
linsvc_model = LinearSVC()

In [None]:
linsvc_model = LinearSVC(max_iter=1000,multi_class='ovr',penalty='l2',loss='squared_hinge',C=1,random_state=41)

In [None]:
linsvc_model.fit(scaled_X_train,y_train)


In [None]:
linsvc_validation_predictions = linsvc_model.predict(scaled_X_validation)

In [None]:
accuracy_score(y_validation,linsvc_validation_predictions)

In [None]:
linsvc_holdout_predictions = linsvc_model.predict(scaled_X_holdout_test)



In [None]:
accuracy_score(y_holdout_test,linsvc_holdout_predictions)

In [None]:
linsvc_y_final_pred = linsvc_model.predict(scaled_X_test)


In [None]:
accuracy_score(y_test,linsvc_y_final_pred)

Parameters you can change!

The most important parameters for Support Vector Machines are

* max_iter
* multi_class
* penalty
* loss
* C


`max_iter` takes positive integers (try values from 1 to 5000)

`multi_class` takes values ['ovr', 'crammer_singer']

`penalty` takes values ['l1', 'l2']

`loss` takes values ['hinge', 'squared_hinge']

`C` takes positive values


Example:

```python

svc_model2 = LinearSVC(max_iter=5000,multi_class='crammer_singer',penalty='l2',loss='hinge',C=1,random_state=41)

```

Make sure to keep the random_state as 41 or else you may get varying results even with the same code!

Create a new variable `svc_model2` and change the parameters and see your accuracy!

## Random Forest Classifier

Random Forest Classifier

There are multiple types of tree-based ensemble classifiers:

* Random Forest Classifier : Base Model
* ExtraTreesClassifier : Fastest but not most Accurate
* GradientBoostingClassifier : Slower but more Accurate

The most important parameters for Random Forest Classifiers are

* n_estimators
* max_depth
* criterion


`n_estimators` takes positive integers (try values from 1 to 200)

`max_depth` takes positive integers (try values from 1 to 200)

`criterion` takes values ["gini", "entropy", "log_loss"]
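
The `criterion` options are different ways of measuring how mixed the labels in a tree node are. The sketch below uses the standard formulas for Gini impurity and entropy on made-up class proportions. The helper names `gini` and `entropy` are hypothetical and written for this example only.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_k^2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Entropy in bits: sum(p_k * log2(1 / p_k)), skipping empty classes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # evenly mixed node: most impure
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # pure node: both measures are 0
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why the choice of criterion often changes the results only slightly.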


In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
forest_model = RandomForestClassifier()

In [None]:
forest_model = RandomForestClassifier(n_estimators = 200, max_depth = 25, criterion ='gini', random_state=41)

In [None]:
forest_model.fit(scaled_X_train,y_train)


In [None]:
forest_validation_predictions = forest_model.predict(scaled_X_validation)

In [None]:
forest_holdout_predictions = forest_model.predict(scaled_X_holdout_test)


In [None]:
forest_y_final_pred = forest_model.predict(scaled_X_test)


In [None]:
accuracy_score(y_test,forest_y_final_pred)

## Google Form Time!

This will take about 5-10 minutes. Make sure to talk with your neighbors about the answers to further your understanding!

Google Form: https://forms.gle/UPZQxEbdHJnZyjLc6


# Neural Network Time

Let's make a Neural Network Model that Identifies whether or not the Image is a Number!

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Dataset, ConcatDataset, Subset
import matplotlib.pyplot as plt
import numpy as np

Loading the MNIST data into Tensors for our Neural Network to Understand.

Tensors are ways to hold data such as:

* input data (images, embeddings)
* model weights and biases
* hidden layers
* gradients during backpropagation

In [None]:
# STEP 1: Load the MNIST dataset (images of numbers 0–9)
transform = transforms.ToTensor()
mnist_full = datasets.MNIST(root="./data", train=True, download=True, transform=transform)

The full MNIST dataset consists of 70,000 images (the training split loaded above holds 60,000 of them), and we are reducing it back down to 1,000 images to match our previous models. Furthermore, having a smaller subset will lead to faster experimentation and model training!

In [None]:
# Select only 1,000 random samples from MNIST
subset_indices = np.random.choice(len(mnist_full), 1000, replace=False)
mnist = Subset(mnist_full, subset_indices)

In this step, we create a custom dataset of random noise images.

These will serve as examples of images that are NOT handwritten numbers, allowing us to train our model to distinguish between “numbers” and “not numbers.”

Explanation

`count:`

* The total number of fake images we want to generate.

`__len__: `

* Returns the number of items in the dataset, which is required for PyTorch datasets.

`__getitem__:`

* Returns a single random image (img) and its label (label) for a given index idx.

`torch.rand(1, 28, 28) `

* generates a single-channel (grayscale) image with random pixel values between 0 and 1.

We assign a label of 0 because these images are not handwritten digits.

* Label 1 = MNIST digit

* Label 0 = random noise

In [None]:
# STEP 2: Make fake "not a number" images by using random noise
class RandomNoise(Dataset):
    def __init__(self, count):
        self.count = count

    def __len__(self):
        return self.count

    def __getitem__(self, idx):
        img = torch.rand(1, 28, 28)  # random pixels
        label = 0                    # label 0 = not a number
        return img, label

## Activation Layer

In this step, we convert the original MNIST dataset into a binary classification format where every image of a handwritten digit is labeled as 1 (“is a number”).

`mnist_data:`

* The original MNIST dataset (handwritten digits 0–9).

`__len__:`

* Returns the total number of MNIST images.

`__getitem__:`

* Retrieves the image at index idx.


In [None]:
# STEP 3: Turn MNIST into a dataset of "is a number" = 1
class MNISTBinary(Dataset):
    def __init__(self, mnist_data):
        self.mnist_data = mnist_data

    def __len__(self):
        return len(self.mnist_data)

    def __getitem__(self, idx):
        img, _ = self.mnist_data[idx]
        label = 1                    # label 1 = is a number
        return img, label

In this step, we merge the positive and negative datasets to create a single dataset for training our binary classifier.

In [None]:
# STEP 4: Combine both datasets together
train_data = ConcatDataset([
    MNISTBinary(mnist),
    RandomNoise(len(mnist))  # same number of fake samples (1000)
])

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)


## Building our Neural Network!

`nn.Flatten()`

* Converts each 28×28 image into a 784-dimensional vector.

* Input: (1, 28, 28) → Output: (784)

`nn.Linear(28*28, 64)`

* This is the HIDDEN LAYER: it takes the 784 input features and maps them to 64 neurons.

* These 64 neurons are not the output — they are internal representations that the network uses to learn patterns.

`nn.ReLU()`

* Applies a nonlinear activation to the hidden layer outputs.

* ReLU ensures the network can learn nonlinear relationships, not just linear ones.

`nn.Linear(64, 1)`

* The output layer: reduces 64 hidden features to a single number.

`nn.Sigmoid()`

* Converts the output to a probability between 0 and 1, suitable for binary classification (number vs. not a number).
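
To see how the data changes shape as it moves through these layers, the following sketch pushes a batch of four made-up images through the same sequence, one layer at a time. The variable names (`batch`, `hidden`, `output`, `probs`) are illustrative, and the weights are untrained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch = torch.rand(4, 1, 28, 28)     # 4 made-up grayscale images

flatten = nn.Flatten()
hidden = nn.Linear(28 * 28, 64)
output = nn.Linear(64, 1)

x = flatten(batch)
print(x.shape)                        # torch.Size([4, 784])
x = torch.relu(hidden(x))
print(x.shape)                        # torch.Size([4, 64])
probs = torch.sigmoid(output(x))
print(probs.shape)                    # torch.Size([4, 1])
```

The first dimension is the batch size, so each of the four images ends up with one probability.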


In [None]:
# STEP 5: Build a very small neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.fc(x)

`model = SimpleNet()`

* Creates an instance of our neural network defined in Step 5.

* This network takes a 28×28 image as input and outputs a probability that the image is a number (1) or not a number (0).

`loss_fn = nn.BCELoss()`

* Binary Cross Entropy (BCE) is ideal for binary classification tasks.

* Measures how far the predicted probability is from the actual label (0 or 1).

* The network will try to minimize this loss during training.
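
As a worked example, the sketch below computes binary cross entropy by hand with NumPy, using the standard formula averaged over the examples. The probabilities are made up and the helper name `binary_cross_entropy` is hypothetical. It is meant to show the idea behind BCE rather than PyTorch's own code.

```python
import numpy as np

def binary_cross_entropy(probs, labels):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    per_example = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    return float(np.mean(per_example))

labels = [1, 0]
confident_and_right = binary_cross_entropy([0.9, 0.1], labels)
confident_and_wrong = binary_cross_entropy([0.1, 0.9], labels)

print(confident_and_right)  # small loss
print(confident_and_wrong)  # large loss
```

Confident correct predictions give a loss near zero, while confident wrong predictions are punished heavily.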

`optimizer = optim.Adam(model.parameters(), lr=0.001)`

* Adam is a popular optimization algorithm that updates the model’s weights to reduce loss.

* `model.parameters()` tells Adam which parameters to update.

`lr=0.001`

* sets the learning rate, controlling how big each update step is.
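
To illustrate what "update the weights to reduce loss" and "learning rate" mean, here is a sketch of plain gradient descent on a one-variable toy function. Adam is more elaborate than this, so treat it as the basic idea only. The function and starting value are made up.

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
def gradient(w):
    return 2.0 * (w - 3.0)

lr = 0.1          # learning rate: how big each update step is
w = 0.0           # starting guess

for step in range(50):
    w = w - lr * gradient(w)   # move against the gradient

print(w)  # close to 3.0, the minimum of f
```

A learning rate that is too small makes progress slow, and one that is too large can overshoot the minimum, which is why `lr` is one of the first settings to experiment with.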


In [None]:
# STEP 6: Set up the model, loss, and optimizer
model = SimpleNet()
loss_fn = nn.BCELoss()               # Binary Cross Entropy = good for yes/no tasks
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
# STEP 7: Train the model
for epoch in range(3):  # just 3 rounds to keep it fast
    total_loss = 0
    for imgs, labels in train_loader:
        labels = labels.float()
        preds = model(imgs).squeeze()
        loss = loss_fn(preds, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: Loss = {total_loss/len(train_loader):.4f}")

In [None]:
# STEP 8: Test the model visually
model.eval()
imgs, labels = next(iter(train_loader))
with torch.no_grad():
    preds = model(imgs).squeeze()
    preds = (preds > 0.5).float()

In [None]:
# Show first 10 examples
fig, axes = plt.subplots(2, 5, figsize=(8,6))
for i, ax in enumerate(axes.flat):
    ax.imshow(imgs[i].squeeze(), cmap="gray")
    ax.set_title("Number" if preds[i]==1 else "Not a number")
    ax.axis("off")
plt.show()


In [None]:
model.eval()  # put model in evaluation mode

# Get 1000 samples from the training set (or test set)
# Here we just reuse train_loader for simplicity
test_images = []
test_labels = []

for imgs, labels in train_loader:
    test_images.append(imgs)
    test_labels.append(labels)
    if len(torch.cat(test_images)) >= 1000:
        break  # stop once we have 1000 images

test_images = torch.cat(test_images)[:1000]
test_labels = torch.cat(test_labels)[:1000]

In [None]:
with torch.no_grad():
    preds = model(test_images).squeeze()
    preds = (preds > 0.5).float()

# STEP 10: Measure accuracy
correct = (preds == test_labels).sum().item()
accuracy = correct / len(test_labels)
print(f"✅ Accuracy on 1000 images: {accuracy*100:.2f}%")

# STEP 11: See a few random predictions
import matplotlib.pyplot as plt

# Pick 10 random examples
idxs = torch.randint(0, 1000, (10,))
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, ax in enumerate(axes.flat):
    idx = idxs[i]
    ax.imshow(test_images[idx].squeeze(), cmap="gray")
    ax.set_title("Number" if preds[idx]==1 else "Not a number")
    ax.axis("off")
plt.show()

## Google Form Time Part 2!

This will take about 5-10 minutes. Make sure to talk with your neighbors about the answers to further your understanding!

Google Form: https://forms.gle/ZZuYoh4xXNEhGaE76


## Making Another Neural Network!

Now let's build a model that not only identifies whether an image is a number, but also which number it actually is!

In [None]:
transform = transforms.ToTensor()
mnist_full = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
subset_indices = np.random.choice(len(mnist_full), 1000, replace=False)
mnist_small = Subset(mnist_full, subset_indices)

# STEP 2: Create random noise "not a number" images
class RandomNoise(Dataset):
    def __init__(self, count):
        self.count = count
    def __len__(self):
        return self.count
    def __getitem__(self, idx):
        img = torch.rand(1, 28, 28)
        label = 0  # "not a number"
        return img, label

# Convert MNIST digits into "is a number" = 1
class MNISTBinary(Dataset):
    def __init__(self, mnist_data):
        self.mnist_data = mnist_data
    def __len__(self):
        return len(self.mnist_data)
    def __getitem__(self, idx):
        img, _ = self.mnist_data[idx]
        label = 1
        return img, label

# Combine both for the binary classifier
binary_train = ConcatDataset([MNISTBinary(mnist_small), RandomNoise(len(mnist_small))])
binary_loader = DataLoader(binary_train, batch_size=64, shuffle=True)

# STEP 3: Binary classifier ("is it a number?")
class BinaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.fc(x)

binary_model = BinaryNet()
binary_loss = nn.BCELoss()
binary_opt = optim.Adam(binary_model.parameters(), lr=0.001)

for epoch in range(3):
    total_loss = 0
    for imgs, labels in binary_loader:
        labels = labels.float()
        preds = binary_model(imgs).squeeze()
        loss = binary_loss(preds, labels)
        binary_opt.zero_grad()
        loss.backward()
        binary_opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: Binary Loss = {total_loss/len(binary_loader):.4f}")


### =====================================================================
#ALL THE NEW CODE!!
# STEP 4: Digit classifier (0–9)
digit_loader = DataLoader(mnist_small, batch_size=64, shuffle=True)

class DigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.fc(x)

digit_model = DigitNet()
digit_loss = nn.CrossEntropyLoss()
digit_opt = optim.Adam(digit_model.parameters(), lr=0.001)

### =====================================================================


for epoch in range(3):
    total_loss = 0
    correct = 0
    total = 0
    for imgs, labels in digit_loader:
        preds = digit_model(imgs)
        loss = digit_loss(preds, labels)
        digit_opt.zero_grad()
        loss.backward()
        digit_opt.step()
        total_loss += loss.item()
        correct += (preds.argmax(1) == labels).sum().item()
        total += labels.size(0)
    print(f"Epoch {epoch+1}: Digit Loss = {total_loss/len(digit_loader):.4f}, Accuracy = {100*correct/total:.2f}%")

# STEP 5: Test combined system visually
binary_model.eval()
digit_model.eval()

# Make a test batch: half MNIST + half noise
test_data = ConcatDataset([Subset(mnist_small, range(5)), RandomNoise(5)])
test_loader = DataLoader(test_data, batch_size=1, shuffle=False)

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
axes = axes.flat

print("\nPredictions:")
for i, (img, _) in enumerate(test_loader):
    with torch.no_grad():
        prob_is_num = binary_model(img).item()
        if prob_is_num > 0.5:
            # Stage 2: predict which number
            pred_digit = digit_model(img).argmax(1).item()
            label = f"Number {pred_digit}"
        else:
            label = "Not a number"
    ax = axes[i]
    ax.imshow(img.squeeze(), cmap="gray")
    ax.set_title(label)
    ax.axis("off")

plt.show()


## New Code!

The only additional code is

```python
digit_loader = DataLoader(mnist_small, batch_size=64, shuffle=True)

class DigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.fc(x)

digit_model = DigitNet()
digit_loss = nn.CrossEntropyLoss()
digit_opt = optim.Adam(digit_model.parameters(), lr=0.001)
```

All that has changed is

```python

nn.Linear(64, 10)

...

digit_loss = nn.CrossEntropyLoss()
```

`nn.Linear(64, 10)`: this line takes the 64-dimensional output vector of the previous layer and produces 10 scores, one for each digit from 0 to 9. The digit with the highest score is the prediction.

`digit_loss = nn.CrossEntropyLoss()`: our previous loss function was Binary Cross Entropy (BCE), and now it is Categorical Cross Entropy Loss, which handles more than two classes.
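
The sketch below walks through that calculation by hand with NumPy for a single image: ten raw scores become probabilities through softmax, the highest one is the prediction, and the loss is the negative log of the probability given to the true digit. The scores and the `true_digit` value are made up.

```python
import numpy as np

# Made-up raw scores (logits) for one image, one score per digit 0-9
logits = np.array([0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3, -0.2, 0.0, 0.1])
true_digit = 2

# Softmax turns the 10 scores into probabilities that sum to 1
exp_scores = np.exp(logits - logits.max())   # subtract max for numerical stability
probs = exp_scores / exp_scores.sum()

predicted_digit = int(np.argmax(probs))      # the digit with the highest score
loss = float(-np.log(probs[true_digit]))     # cross entropy for this one image

print(predicted_digit)
print(loss)
```

`nn.CrossEntropyLoss` applies the softmax step internally to the raw scores, which is why `DigitNet` does not end with a `Sigmoid` or `Softmax` layer.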

### What happens if we do 5 or more epochs?

```python
for epoch in range(...):
```

If you want to see more than 5 epochs, change the number in the range function to your desired number!

## Model for 5 Epochs

In [None]:
transform = transforms.ToTensor()
mnist_full = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
subset_indices = np.random.choice(len(mnist_full), 1000, replace=False)
mnist_small = Subset(mnist_full, subset_indices)

# STEP 2: Create random noise "not a number" images
class RandomNoise(Dataset):
    def __init__(self, count):
        self.count = count
    def __len__(self):
        return self.count
    def __getitem__(self, idx):
        img = torch.rand(1, 28, 28)
        label = 0  # "not a number"
        return img, label

# Convert MNIST digits into "is a number" = 1
class MNISTBinary(Dataset):
    def __init__(self, mnist_data):
        self.mnist_data = mnist_data
    def __len__(self):
        return len(self.mnist_data)
    def __getitem__(self, idx):
        img, _ = self.mnist_data[idx]
        label = 1
        return img, label

# Combine both for the binary classifier
binary_train = ConcatDataset([MNISTBinary(mnist_small), RandomNoise(len(mnist_small))])
binary_loader = DataLoader(binary_train, batch_size=64, shuffle=True)

# STEP 3: Binary classifier ("is it a number?")
class BinaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.fc(x)

binary_model = BinaryNet()
binary_loss = nn.BCELoss()
binary_opt = optim.Adam(binary_model.parameters(), lr=0.001)

for epoch in range(5):
    total_loss = 0
    for imgs, labels in binary_loader:
        labels = labels.float()
        preds = binary_model(imgs).squeeze()
        loss = binary_loss(preds, labels)
        binary_opt.zero_grad()
        loss.backward()
        binary_opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: Binary Loss = {total_loss/len(binary_loader):.4f}")


### =====================================================================
#ALL THE NEW CODE!!
# STEP 4: Digit classifier (0–9)
digit_loader = DataLoader(mnist_small, batch_size=64, shuffle=True)

class DigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.fc(x)

digit_model = DigitNet()
digit_loss = nn.CrossEntropyLoss()
digit_opt = optim.Adam(digit_model.parameters(), lr=0.001)

### =====================================================================


for epoch in range(5):
    total_loss = 0
    correct = 0
    total = 0
    for imgs, labels in digit_loader:
        preds = digit_model(imgs)
        loss = digit_loss(preds, labels)
        digit_opt.zero_grad()
        loss.backward()
        digit_opt.step()
        total_loss += loss.item()
        correct += (preds.argmax(1) == labels).sum().item()
        total += labels.size(0)
    print(f"Epoch {epoch+1}: Digit Loss = {total_loss/len(digit_loader):.4f}, Accuracy = {100*correct/total:.2f}%")

# STEP 5: Test combined system visually
binary_model.eval()
digit_model.eval()

# Make a test batch: half MNIST + half noise
test_data = ConcatDataset([Subset(mnist_small, range(5)), RandomNoise(5)])
test_loader = DataLoader(test_data, batch_size=1, shuffle=False)

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
axes = axes.flat

print("\nPredictions:")
for i, (img, _) in enumerate(test_loader):
    with torch.no_grad():
        prob_is_num = binary_model(img).item()
        if prob_is_num > 0.5:
            # Stage 2: predict which number
            pred_digit = digit_model(img).argmax(1).item()
            label = f"Number {pred_digit}"
        else:
            label = "Not a number"
    ax = axes[i]
    ax.imshow(img.squeeze(), cmap="gray")
    ax.set_title(label)
    ax.axis("off")

plt.show()


## Test Model for X Epochs

In [None]:
transform = transforms.ToTensor()
mnist_full = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
subset_indices = np.random.choice(len(mnist_full), 1000, replace=False)
mnist_small = Subset(mnist_full, subset_indices)

# STEP 2: Create random noise "not a number" images
class RandomNoise(Dataset):
    def __init__(self, count):
        self.count = count
    def __len__(self):
        return self.count
    def __getitem__(self, idx):
        img = torch.rand(1, 28, 28)
        label = 0  # "not a number"
        return img, label

# Convert MNIST digits into "is a number" = 1
class MNISTBinary(Dataset):
    def __init__(self, mnist_data):
        self.mnist_data = mnist_data
    def __len__(self):
        return len(self.mnist_data)
    def __getitem__(self, idx):
        img, _ = self.mnist_data[idx]
        label = 1
        return img, label

# Combine both for the binary classifier
binary_train = ConcatDataset([MNISTBinary(mnist_small), RandomNoise(len(mnist_small))])
binary_loader = DataLoader(binary_train, batch_size=64, shuffle=True)

# STEP 3: Binary classifier ("is it a number?")
class BinaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.fc(x)

binary_model = BinaryNet()
binary_loss = nn.BCELoss()
binary_opt = optim.Adam(binary_model.parameters(), lr=0.001)

for epoch in range(...):
    total_loss = 0
    for imgs, labels in binary_loader:
        labels = labels.float()
        preds = binary_model(imgs).squeeze()
        loss = binary_loss(preds, labels)
        binary_opt.zero_grad()
        loss.backward()
        binary_opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: Binary Loss = {total_loss/len(binary_loader):.4f}")


### =====================================================================
#ALL THE NEW CODE!!
# STEP 4: Digit classifier (0–9)
digit_loader = DataLoader(mnist_small, batch_size=64, shuffle=True)

class DigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.fc(x)

digit_model = DigitNet()
digit_loss = nn.CrossEntropyLoss()
digit_opt = optim.Adam(digit_model.parameters(), lr=0.001)

### =====================================================================


for epoch in range(...):
    total_loss = 0
    correct = 0
    total = 0
    for imgs, labels in digit_loader:
        preds = digit_model(imgs)
        loss = digit_loss(preds, labels)
        digit_opt.zero_grad()
        loss.backward()
        digit_opt.step()
        total_loss += loss.item()
        correct += (preds.argmax(1) == labels).sum().item()
        total += labels.size(0)
    print(f"Epoch {epoch+1}: Digit Loss = {total_loss/len(digit_loader):.4f}, Accuracy = {100*correct/total:.2f}%")

# STEP 5: Test combined system visually
binary_model.eval()
digit_model.eval()

# Make a test batch: half MNIST + half noise
test_data = ConcatDataset([Subset(mnist_small, range(5)), RandomNoise(5)])
test_loader = DataLoader(test_data, batch_size=1, shuffle=False)

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
axes = axes.flat

print("\nPredictions:")
for i, (img, _) in enumerate(test_loader):
    with torch.no_grad():
        prob_is_num = binary_model(img).item()
        if prob_is_num > 0.5:
            # Stage 2: predict which number
            pred_digit = digit_model(img).argmax(1).item()
            label = f"Number {pred_digit}"
        else:
            label = "Not a number"
    ax = axes[i]
    ax.imshow(img.squeeze(), cmap="gray")
    ax.set_title(label)
    ax.axis("off")

plt.show()


# AI School 2 Post-Test Survey

Form: https://forms.gle/jLmmB61b4dxZpErz5


# Post-AI School Stuff

## Logistic Regression Model

In [None]:
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression(random_state=41)
penalty = [None, 'l1', 'l2', 'elasticnet']  # None (not the string 'none') turns regularization off
solver = ['lbfgs','liblinear','saga']
max_iter = [500, 1000]
from sklearn.model_selection import GridSearchCV
param_grid = {'penalty': penalty,
              'solver' : solver,
              'max_iter': max_iter}
lr_grid_model = GridSearchCV(log_model,param_grid=param_grid)
lr_grid_model.fit(scaled_X_train,y_train)
lr_grid_model.best_params_

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Scale first: fit the scaler on the training set only, then reuse it on every other split
scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
scaled_X_holdout_test = scaler.transform(X_holdout_test)
scaled_X_validation = scaler.transform(X_validation)

# Fill in each ... with your own values (for example from lr_grid_model.best_params_)
best_lr_model = LogisticRegression(C=..., max_iter=..., penalty=...,random_state=41)
best_lr_model.fit(scaled_X_train,y_train)
lr_validation_predictions = best_lr_model.predict(scaled_X_validation)
lr_holdout_predictions = best_lr_model.predict(scaled_X_holdout_test)
accuracy_score(y_holdout_test,lr_holdout_predictions)

## LinearSVC

In [None]:
from sklearn.svm import LinearSVC
penalty = ['l1', 'l2']
C = [0.01, 0.1, 1, 10]
max_iter = [100, 500, 1000]
param_grid = {'penalty': penalty,
              'C': C,
              'max_iter': max_iter}
linsvc_model = LinearSVC(random_state=41)
svc_grid_model = GridSearchCV(linsvc_model,param_grid=param_grid)
svc_grid_model.fit(scaled_X_train, y_train)
svc_grid_model.best_params_


In [None]:
best_svc_model = LinearSVC(random_state=41,C=...,max_iter=..., penalty=...)
best_svc_model.fit(scaled_X_train,y_train)
svc_validation_predictions = best_svc_model.predict(scaled_X_validation)
svc_holdout_predictions = best_svc_model.predict(scaled_X_holdout_test)
svc_y_final_pred = best_svc_model.predict(scaled_X_test)
accuracy_score(y_test,svc_y_final_pred)
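
Accuracy is a single number, so it can hide which classes a model confuses. The `confusion_matrix` and `classification_report` functions imported earlier give a more detailed view. The sketch below uses short made-up label lists (`y_true_demo`, `y_pred_demo`) rather than the notebook's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Made-up true labels and predictions (an assumption for illustration)
y_true_demo = [0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0]

demo_accuracy = accuracy_score(y_true_demo, y_pred_demo)
demo_matrix = confusion_matrix(y_true_demo, y_pred_demo)

print(demo_accuracy)
# Rows are true classes, columns are predicted classes
print(demo_matrix)
# Per-class precision, recall, and F1 score
print(classification_report(y_true_demo, y_pred_demo))
```

The same three calls should work on any pair of true labels and predictions, such as `y_test` and `svc_y_final_pred`.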

## Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier
forest_model = RandomForestClassifier(random_state=41)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2']
}
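forest_grid_model = GridSearchCV(estimator=forest_model, param_grid=param_grid)
# Fit the search before reading best_params_ (216 combinations, so this can take a while)
forest_grid_model.fit(scaled_X_train, y_train)
forest_grid_model.best_params_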

In [None]:
# Replace each ... with your chosen value (for example from forest_grid_model.best_params_)
best_forest_model = RandomForestClassifier(n_estimators=..., max_depth=..., max_features=..., random_state=41)
best_forest_model.fit(scaled_X_train, y_train)
forest_validation_predictions = best_forest_model.predict(scaled_X_validation)
forest_holdout_predictions = best_forest_model.predict(scaled_X_holdout_test)
# Use the fitted best model (not the unfitted forest_model) for the final predictions
forest_y_final_pred = best_forest_model.predict(scaled_X_test)
accuracy_score(y_test, forest_y_final_pred)
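
Instead of retyping the best hyperparameters by hand, a fitted `GridSearchCV` also exposes `best_estimator_`, a copy of the model that has already been refit on the full training data with the best settings (this is the default `refit=True` behaviour). The sketch below is one possible shortcut, shown on synthetic data with a deliberately small grid; `X_demo`, `small_grid`, `demo_search`, and `tuned_forest` are names invented for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; swap in your own splits)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=41)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=41)

# A deliberately small grid so the search finishes quickly
small_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}
demo_search = GridSearchCV(RandomForestClassifier(random_state=41),
                           param_grid=small_grid, cv=3)
demo_search.fit(X_tr, y_tr)

# best_estimator_ is already refit on X_tr with the best parameters
tuned_forest = demo_search.best_estimator_
demo_test_accuracy = tuned_forest.score(X_te, y_te)
print(demo_search.best_params_, demo_test_accuracy)
```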

## Neural Network Identifying Which Number

In [None]:
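import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, Subset, ConcatDataset
from torchvision import datasets, transforms

# Load MNIST as tensors and keep a random 1000-image subset to train quickly
transform = transforms.ToTensor()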
mnist_full = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
subset_indices = np.random.choice(len(mnist_full), 1000, replace=False)
mnist_small = Subset(mnist_full, subset_indices)

class RandomNoise(Dataset):
    def __init__(self, count):
        self.count = count
    def __len__(self):
        return self.count
    def __getitem__(self, idx):
        img = torch.rand(1, 28, 28)
        label = 0  # "not a number"
        return img, label

class MNISTBinary(Dataset):
    def __init__(self, mnist_data):
        self.mnist_data = mnist_data
    def __len__(self):
        return len(self.mnist_data)
    def __getitem__(self, idx):
        img, _ = self.mnist_data[idx]
        label = 1
        return img, label

binary_train = ConcatDataset([MNISTBinary(mnist_small), RandomNoise(len(mnist_small))])
binary_loader = DataLoader(binary_train, batch_size=64, shuffle=True)

class BinaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )
    def forward(self, x):
        return self.fc(x)

binary_model = BinaryNet()
binary_loss = nn.BCELoss()
binary_opt = optim.Adam(binary_model.parameters(), lr=0.001)

for epoch in range(5):
    total_loss = 0
    for imgs, labels in binary_loader:
        labels = labels.float()
        preds = binary_model(imgs).squeeze(1)  # shape (batch, 1) -> (batch,)
        loss = binary_loss(preds, labels)
        binary_opt.zero_grad()
        loss.backward()
        binary_opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: Binary Loss = {total_loss/len(binary_loader):.4f}")


digit_loader = DataLoader(mnist_small, batch_size=64, shuffle=True)

class DigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.fc(x)

digit_model = DigitNet()
digit_loss = nn.CrossEntropyLoss()
digit_opt = optim.Adam(digit_model.parameters(), lr=0.001)

for epoch in range(5):
    total_loss = 0
    correct = 0
    total = 0
    for imgs, labels in digit_loader:
        preds = digit_model(imgs)
        loss = digit_loss(preds, labels)
        digit_opt.zero_grad()
        loss.backward()
        digit_opt.step()
        total_loss += loss.item()
        correct += (preds.argmax(1) == labels).sum().item()
        total += labels.size(0)
    print(f"Epoch {epoch+1}: Digit Loss = {total_loss/len(digit_loader):.4f}, Accuracy = {100*correct/total:.2f}%")

binary_model.eval()
digit_model.eval()

test_data = ConcatDataset([Subset(mnist_small, range(5)), RandomNoise(5)])
test_loader = DataLoader(test_data, batch_size=1, shuffle=False)

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
axes = axes.flat

print("\nPredictions:")
for i, (img, _) in enumerate(test_loader):
    with torch.no_grad():
        prob_is_num = binary_model(img).item()
        if prob_is_num > 0.5:
            # Stage 2: predict which number
            pred_digit = digit_model(img).argmax(1).item()
            label = f"Number {pred_digit}"
        else:
            label = "Not a number"
    ax = axes[i]
    ax.imshow(img.squeeze(), cmap="gray")
    ax.set_title(label)
    ax.axis("off")

plt.show()
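
`DigitNet` ends in a plain `nn.Linear(64, 10)` layer, so it outputs raw scores (logits) rather than probabilities; `nn.CrossEntropyLoss` applies the softmax step internally during training. The short sketch below, using made-up logits for a three-class case, shows why calling `argmax(1)` directly on the logits picks the same class as converting to probabilities first.

```python
import torch

# Made-up logits for two images and three classes (an assumption for illustration)
demo_logits = torch.tensor([[0.1, 2.0, -1.0],
                            [3.0, 0.5, 0.2]])

# Softmax turns each row of logits into probabilities that sum to 1
demo_probs = torch.softmax(demo_logits, dim=1)

# argmax(1) picks the index of the largest value in each row
demo_pred = demo_logits.argmax(1)

print(demo_probs)
print(demo_pred.tolist())
```

Softmax keeps the ordering of the scores, so the largest logit always corresponds to the largest probability.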