# Nursery dataset
Was originally created to rank and evaluate nursery school applications. 
So an application with for example, a family that is financially stable, has good housing, and has no social or health problems would be classified as priority.
And applications that may involve severe financial, social, or health issues that make it highly unlikely for the application to be accepted, would be classified as

Dit weet ik niet zeker??..

So the ranking from best to worst is:

Very Recommended > Recommended > Priority > Special Priority > Not Recommended.

In [1]:
#!pip install -r ../requirements.txt

In [2]:
# ucimlrepo is a tool that provides easy access to datasets hosted on the UCI Machine Learning Repository
from ucimlrepo import fetch_ucirepo

# Fetch dataset from UCI repository, which has ud 76
nursery = fetch_ucirepo(id=76)

# Display metadata and variable information
print(nursery.metadata) # metadata 
print(nursery.variables) # variable information 
print("\n"+ "The first 5 rows of the dataset:")
print(nursery.data.features.head())  # Display first 5 rows of features
# Show the target variable, possibilities

# Show the target variable and its unique possibilities
unique_targets = nursery.data.targets['class'].unique()  # Access the 'class' column and get unique values
print("\nPossible target classes:")
print(unique_targets)  # Display unique target classes

{'uci_id': 76, 'name': 'Nursery', 'repository_url': 'https://archive.ics.uci.edu/dataset/76/nursery', 'data_url': 'https://archive.ics.uci.edu/static/public/76/data.csv', 'abstract': ' Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools.', 'area': 'Social Science', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 12960, 'num_features': 8, 'feature_types': ['Categorical'], 'demographics': [], 'target_col': ['class'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 1989, 'last_updated': 'Sun Jan 14 2024', 'dataset_doi': '10.24432/C5P88W', 'creators': ['Vladislav Rajkovic'], 'intro_paper': {'ID': 372, 'type': 'NATIVE', 'title': 'An application for admission in public school systems', 'authors': 'M. Olave, V. Rajkovic, M. Bohanec', 'venue': 'Expert Systems in Public Administration', 'year': 1989, 'journal': None, 'DOI': None, 

# Imports

In [3]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import accuracy_score
# from model import Mamba, ModelArgs  # Import your custom Mamba implementation

# Preprocessing
The datta will be preprocessed, and converted into tensors

In [4]:
X = nursery.data.features  # These are all the feature columns in the dataset
Y = nursery.data.targets  # This is the target column in the dataset
print("\nOriginal  values (before encoding):")
print(X[1:4])  # Display a sample of the feature values
print(Y[1:4])  # Display a sample of the target values

# In case of future errors: Y = Y.values.ravel()  # Flatten Y to make it a 1D array if needed
label_encoder = LabelEncoder()  # Used to encode the categorical target variables into numerical values
X = X.apply(label_encoder.fit_transform)  # Encode the feature variables (X)
Y = label_encoder.fit_transform(Y)  # Encode the target variable (Y)

print("\nEncoded target values (after encoding):")
print(X[1:4])  # Display the encoded feature values
print(Y[1:4])  # Display the encoded target values

# Split into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.21, random_state=42)

# Convert the train/test data into PyTorch tensors
# We must do this because PyTorch models only accept tensors as input
# Both the MambaClassifier and Mamba classes inherit from torch.nn.Module
# which is the base class for all neural network modules in PyTorch.
X_train_tensor = torch.tensor(X_train.values, dtype=torch.long)
X_test_tensor = torch.tensor(X_test.values, dtype=torch.long)
Y_train_tensor = torch.tensor(Y_train, dtype=torch.long)
Y_test_tensor = torch.tensor(Y_test, dtype=torch.long)

# Lets see how these tensors look like
print("\nSample of training data tensor:")
print(X_train_tensor[0:2])  # Display a sample of the training data tensor
print("\nSample of training target tensor:")
print(Y_train_tensor[0:2])  # Display a sample of the training target tensor
print("\nSample of testing data tensor:")
print(X_test_tensor[0:2])  # Display a sample of the testing data tensor
print("\nSample of testing target tensor:")
print(Y_test_tensor[0:2])  # Display a sample of the testing target tensor

# Create PyTorch datasets and data loaders
train_dataset = TensorDataset(X_train_tensor, Y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, Y_test_tensor)

# DataLoader to help in batch processing during model training/testing
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)


Original  values (before encoding):
  parents has_nurs      form children     housing     finance         social  \
1   usual   proper  complete        1  convenient  convenient        nonprob   
2   usual   proper  complete        1  convenient  convenient        nonprob   
3   usual   proper  complete        1  convenient  convenient  slightly_prob   

        health  
1     priority  
2    not_recom  
3  recommended  
       class
1   priority
2  not_recom
3  recommend

Encoded target values (after encoding):
   parents  has_nurs  form  children  housing  finance  social  health
1        2         3     0         0        0        0       0       1
2        2         3     0         0        0        0       0       0
3        2         3     0         0        0        0       2       2
[1 0 2]

Sample of training data tensor:
tensor([[0, 2, 0, 3, 2, 0, 2, 1],
        [2, 0, 3, 2, 2, 1, 1, 1]])

Sample of training target tensor:
tensor([3, 3])

Sample of testing data tensor:
tenso

  y = column_or_1d(y, warn=True)


# Defining the model

In [5]:
from model import Mamba, ModelArgs
# I also want another mamba model which was not pretrained
d_model = 64
n_layer = 4
vocabsize = len(X.nunique())
# or might it be vocabsize = X.apply(lambda col: col.nunique()).max()
model_args = ModelArgs(d_model=d_model, n_layer=n_layer, vocab_size=vocabsize)
num_classes = len(nursery.data.targets['class'].unique())
model = Mamba(model_args, num_classes=num_classes)


# Training MAMBA on Nursery

In [6]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Set the device (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        logits, probabilities = model(inputs)  # Unpack the logits and probabilities

        # Flatten labels if they are not already
        labels = labels.view(-1)  # Flatten the labels to [batch_size * seq_length]

        # Compute loss
        loss = criterion(logits, labels)  # Use logits for loss computation

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')

Epoch [1/10], Loss: 0.8944
Epoch [2/10], Loss: 0.4998
Epoch [3/10], Loss: 0.3088
Epoch [4/10], Loss: 0.1135
Epoch [5/10], Loss: 0.0523
Epoch [6/10], Loss: 0.0291
Epoch [7/10], Loss: 0.0182
Epoch [8/10], Loss: 0.0124
Epoch [9/10], Loss: 0.0085
Epoch [10/10], Loss: 0.0066


# Evaluating the model + testing inference

In [7]:
# Switch to evaluation mode
model.eval()
y_pred = []
y_true = []
y_prob = []  # List to store probabilities

# Test the model
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        logits, probabilities = model(inputs)  # Now get both logits and probabilities
        _, predicted = torch.max(logits, 1)
        y_pred.extend(predicted.cpu().numpy())
        y_true.extend(labels.cpu().numpy())
        y_prob.extend(probabilities.cpu().numpy())  # Store probabilities

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy on the test set: {accuracy:.4f}')

# Convert probabilities to percentages
y_prob_percentages = [prob * 100 for prob in y_prob]

# Map the predicted class indices to their corresponding class names
class_names = unique_targets  # Use the unique target classes from the dataset

# Display the first few predicted class names along with their probabilities and the correct class
print("First few predicted class names and their probabilities (in percentages):")
for i in range(5):
    print(f"Instance {i+1}:")
    print(f"  Correct Class: {class_names[y_true[i]]}")
    for class_index, class_name in enumerate(class_names):
        print(f"  Class: {class_name}, Probability: {y_prob_percentages[i][class_index]:.2f}%")

# Switch to evaluation mode
model.eval()
y_pred_train = []
y_true_train = []
y_prob_train = []  # List to store probabilities

# Test the model on training data
with torch.no_grad():
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        logits, probabilities = model(inputs)  # Now get both logits and probabilities
        _, predicted = torch.max(logits, 1)
        y_pred_train.extend(predicted.cpu().numpy())
        y_true_train.extend(labels.cpu().numpy())
        y_prob_train.extend(probabilities.cpu().numpy())  # Store probabilities

# Calculate accuracy on training data
accuracy_train = accuracy_score(y_true_train, y_pred_train)
print(f'Accuracy on the training set: {accuracy_train:.4f}')

# Convert probabilities to percentages
y_prob_train_percentages = [prob * 100 for prob in y_prob_train]

# Map the predicted class indices to their corresponding class names
class_names = unique_targets  # Use the unique target classes from the dataset

# Display the first few predicted class names along with their probabilities and the correct class
print("First few predicted class names and their probabilities (in percentages) for training data:")
for i in range(5):
    print(f"Instance {i+1}:")
    print(f"  Correct Class: {class_names[y_true_train[i]]}")
    for class_index, class_name in enumerate(class_names):
        print(f"  Class: {class_name}, Probability: {y_prob_train_percentages[i][class_index]:.2f}%")

Accuracy on the test set: 0.9993
First few predicted class names and their probabilities (in percentages):
Instance 1:
  Correct Class: recommend
  Class: recommend, Probability: 99.58%
  Class: priority, Probability: 0.08%
  Class: not_recom, Probability: 0.12%
  Class: very_recom, Probability: 0.08%
  Class: spec_prior, Probability: 0.15%
Instance 2:
  Correct Class: very_recom
  Class: recommend, Probability: 0.10%
  Class: priority, Probability: 0.13%
  Class: not_recom, Probability: 0.11%
  Class: very_recom, Probability: 99.57%
  Class: spec_prior, Probability: 0.10%
Instance 3:
  Correct Class: priority
  Class: recommend, Probability: 0.07%
  Class: priority, Probability: 99.42%
  Class: not_recom, Probability: 0.19%
  Class: very_recom, Probability: 0.03%
  Class: spec_prior, Probability: 0.28%
Instance 4:
  Correct Class: very_recom
  Class: recommend, Probability: 0.10%
  Class: priority, Probability: 0.03%
  Class: not_recom, Probability: 0.20%
  Class: very_recom, Probabil

In [8]:
import numpy as np

# Extract probabilities for the correct class on the test data
correct_class_probs_test = [y_prob[i][y_true[i]] for i in range(len(y_true))]
average_prob_correct_class_test = np.mean(correct_class_probs_test)
print(f'Average probability for the correct class on the test data: {average_prob_correct_class_test:.4f}')

# Extract probabilities for the correct class on the training data
correct_class_probs_train = [y_prob_train[i][y_true_train[i]] for i in range(len(y_true_train))]
average_prob_correct_class_train = np.mean(correct_class_probs_train)
print(f'Average probability for the correct class on the training data: {average_prob_correct_class_train:.4f}')


Average probability for the correct class on the test data: 0.9927
Average probability for the correct class on the training data: 0.9942


# Try 2, simpler model
As we can see there is not much difference between the probabilities of true class predictions on the traindataset instances, and the testdataset instances.

This might indicate that the dataset was too easy for this model. Below i will try the same on a simpler version of the same mamba model e.g. less layers and lower dimensionality:

In [9]:
# I also want another mamba model which was not pretrained
d_model = 16
n_layer = 3
vocabsize = len(X.nunique())
# or might it be vocabsize = X.apply(lambda col: col.nunique()).max()
model_args = ModelArgs(d_model=d_model, n_layer=n_layer, vocab_size=vocabsize)
num_classes = len(nursery.data.targets['class'].unique())
smaller_model = Mamba(model_args, num_classes=num_classes)

In [10]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(smaller_model.parameters(), lr=1e-4)

# Set the device (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
smaller_model.to(device)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    smaller_model.train()
    running_loss = 0.0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        logits, probabilities = smaller_model(inputs)  # Unpack the logits and probabilities

        # Flatten labels if they are not already
        labels = labels.view(-1)  # Flatten the labels to [batch_size * seq_length]

        # Compute loss
        loss = criterion(logits, labels)  # Use logits for loss computation

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')

Epoch [1/10], Loss: 1.4998
Epoch [2/10], Loss: 1.0255
Epoch [3/10], Loss: 0.6812
Epoch [4/10], Loss: 0.6144
Epoch [5/10], Loss: 0.5780
Epoch [6/10], Loss: 0.5498
Epoch [7/10], Loss: 0.5099
Epoch [8/10], Loss: 0.4239
Epoch [9/10], Loss: 0.3326
Epoch [10/10], Loss: 0.2680


In [11]:
# Switch to evaluation mode
smaller_model.eval()
y_pred = []
y_true = []
y_prob = []  # List to store probabilities

# Test the model
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        logits, probabilities = smaller_model(inputs)  # Now get both logits and probabilities
        _, predicted = torch.max(logits, 1)
        y_pred.extend(predicted.cpu().numpy())
        y_true.extend(labels.cpu().numpy())
        y_prob.extend(probabilities.cpu().numpy())  # Store probabilities

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy on the test set: {accuracy:.4f}')

# Convert probabilities to percentages
y_prob_percentages = [prob * 100 for prob in y_prob]

# Map the predicted class indices to their corresponding class names
class_names = unique_targets  # Use the unique target classes from the dataset

# Display the first few predicted class names along with their probabilities and the correct class
print("First few predicted class names and their probabilities (in percentages):")
for i in range(5):
    print(f"Instance {i+1}:")
    print(f"  Correct Class: {class_names[y_true[i]]}")
    for class_index, class_name in enumerate(class_names):
        print(f"  Class: {class_name}, Probability: {y_prob_percentages[i][class_index]:.2f}%")

# Switch to evaluation mode
smaller_model.eval()
y_pred_train = []
y_true_train = []
y_prob_train = []  # List to store probabilities

# Test the model on training data
with torch.no_grad():
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        logits, probabilities = smaller_model(inputs)  # Now get both logits and probabilities
        _, predicted = torch.max(logits, 1)
        y_pred_train.extend(predicted.cpu().numpy())
        y_true_train.extend(labels.cpu().numpy())
        y_prob_train.extend(probabilities.cpu().numpy())  # Store probabilities

# Calculate accuracy on training data
accuracy_train = accuracy_score(y_true_train, y_pred_train)
print(f'Accuracy on the training set: {accuracy_train:.4f}')

# Convert probabilities to percentages
y_prob_train_percentages = [prob * 100 for prob in y_prob_train]

# Map the predicted class indices to their corresponding class names
class_names = unique_targets  # Use the unique target classes from the dataset

# Display the first few predicted class names along with their probabilities and the correct class
print("First few predicted class names and their probabilities (in percentages) for training data:")
for i in range(5):
    print(f"Instance {i+1}:")
    print(f"  Correct Class: {class_names[y_true_train[i]]}")
    for class_index, class_name in enumerate(class_names):
        print(f"  Class: {class_name}, Probability: {y_prob_train_percentages[i][class_index]:.2f}%")

Accuracy on the test set: 0.9269
First few predicted class names and their probabilities (in percentages):
Instance 1:
  Correct Class: recommend
  Class: recommend, Probability: 95.08%
  Class: priority, Probability: 1.15%
  Class: not_recom, Probability: 1.18%
  Class: very_recom, Probability: 0.89%
  Class: spec_prior, Probability: 1.70%
Instance 2:
  Correct Class: very_recom
  Class: recommend, Probability: 0.97%
  Class: priority, Probability: 39.42%
  Class: not_recom, Probability: 1.39%
  Class: very_recom, Probability: 56.56%
  Class: spec_prior, Probability: 1.66%
Instance 3:
  Correct Class: priority
  Class: recommend, Probability: 5.82%
  Class: priority, Probability: 84.26%
  Class: not_recom, Probability: 3.90%
  Class: very_recom, Probability: 1.97%
  Class: spec_prior, Probability: 4.06%
Instance 4:
  Correct Class: very_recom
  Class: recommend, Probability: 1.76%
  Class: priority, Probability: 2.19%
  Class: not_recom, Probability: 1.68%
  Class: very_recom, Probabi

In [12]:
import numpy as np

# Extract probabilities for the correct class on the test data
correct_class_probs_test = [y_prob[i][y_true[i]] for i in range(len(y_true))]
average_prob_correct_class_test = np.mean(correct_class_probs_test)
print(f'Average probability for the correct class on the test data: {average_prob_correct_class_test:.4f}')

# Extract probabilities for the correct class on the training data
correct_class_probs_train = [y_prob_train[i][y_true_train[i]] for i in range(len(y_true_train))]
average_prob_correct_class_train = np.mean(correct_class_probs_train)
print(f'Average probability for the correct class on the training data: {average_prob_correct_class_train:.4f}')


Average probability for the correct class on the test data: 0.8227
Average probability for the correct class on the training data: 0.8257


As we see even on a smaller model there is no siginificant difference in the probability the model outputs for a correct class.
Therefore I will move on to different datasets / problems.