## Classification on the MONK Dataset Using Differentiable Logic Gates

This notebook focuses on solving a binary classification task using the MONK dataset. The dataset contains examples described by a set of categorical features, and the goal is to predict whether each instance belongs to the target class.

The model is a custom neural network in which each hidden unit computes a learnable, softmax-weighted mixture of the 16 two-input logic gates, each written as a differentiable real-valued formula. The gates are applied to the outputs of ordinary linear layers, so the network can still be trained with gradient-based methods, and the learned gate weights can be inspected afterwards to see which logic operation each unit favors.


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

In [None]:
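def load_monk_data(filepath):
    """Load the MONK CSV and return X_train, X_test, y_train, y_test."""
    data = pd.read_csv(filepath)
    # Drop the first column, then treat the next column as the label.
    data = data.drop(columns=data.columns[0])

    X = data.iloc[:, 1:].values
    y = data.iloc[:, 0].values

    # Rescale every feature to [0, 1].
    scaler = MinMaxScaler()
    X = scaler.fit_transform(X)

    # Hold out 20% of the rows for testing; the seed makes the split repeatable.
    return train_test_split(X, y, test_size=0.2, random_state=2137)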
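
As a rough illustration of what the scaling step does to one column, the sketch below applies the same `(x - min) / (max - min)` formula that `MinMaxScaler` uses with its default range. The integer codes are hypothetical and only stand in for a categorical attribute; `min_max_scale` is a helper written for this example, not part of the notebook.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1], one column at a time."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical categorical attribute coded as the integers 1..4.
codes = [1, 2, 3, 4, 2]
scaled = min_max_scale(codes)
print(scaled)
```

The smallest code maps to 0 and the largest to 1, so every feature ends up in the range the gate formulas are usually written for.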

### Differentiable Logic Gate Model

The core of the model is a neural network where each layer consists of neurons that behave like logic gates. These gates are designed in a differentiable way, enabling training through backpropagation.
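
To make "differentiable" concrete, here is a minimal standalone sketch (plain Python, no PyTorch) that checks four of the gate formulas used in the model against their Boolean truth tables. The dictionaries and the helper `matches_truth_table` are illustrative names introduced for this example.

```python
from itertools import product

# Real-valued relaxations of a few two-input gates (same formulas as the model).
SOFT_GATES = {
    "AND": lambda a, b: a * b,
    "OR": lambda a, b: a + b - a * b,
    "XOR": lambda a, b: a + b - 2 * a * b,
    "NAND": lambda a, b: 1 - a * b,
}

# Reference Boolean definitions to compare against.
HARD_GATES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
    "NAND": lambda a, b: not (a and b),
}


def matches_truth_table(name):
    """True if the relaxation equals the Boolean gate on all four binary inputs."""
    return all(
        SOFT_GATES[name](float(a), float(b)) == float(HARD_GATES[name](a, b))
        for a, b in product([False, True], repeat=2)
    )


for name in SOFT_GATES:
    print(name, matches_truth_table(name))

# Between the corners the output varies smoothly, e.g. a soft AND of 0.9 and 0.5.
soft_and = SOFT_GATES["AND"](0.9, 0.5)
print(soft_and)
```

Each formula is a polynomial in `a` and `b`, which is what lets gradients flow through the gates during backpropagation.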

The network is configured with:

- Input dimension matching the MONK dataset features
- Six hidden layers, each containing 24 logic-based neurons
- One output neuron for binary classification
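
Each logic-based neuron does not pick a single gate during training; it outputs a softmax-weighted sum over all gate formulas. The sketch below shows that idea for one neuron, using only three gates to stay short. The logits are hypothetical values chosen for illustration, and `softmax` and `soft_neuron` are helper names introduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three of the sixteen gates, to keep the example short.
gates = [
    lambda a, b: a * b,              # AND
    lambda a, b: a + b - a * b,      # OR
    lambda a, b: a + b - 2 * a * b,  # XOR
]


def soft_neuron(a, b, logits):
    """Softmax-weighted sum of gate outputs for one neuron."""
    probs = softmax(logits)
    return sum(p * g(a, b) for p, g in zip(probs, gates))


# Equal logits give the plain average of the gate outputs: (1 + 1 + 0) / 3.
uniform = soft_neuron(1.0, 1.0, [0.0, 0.0, 0.0])
# A large logit on XOR makes the neuron behave almost like XOR, which is 0 here.
peaked = soft_neuron(1.0, 1.0, [0.0, 0.0, 10.0])
print(uniform, peaked)
```

As training sharpens the logits, a neuron's output moves from an average of many gates towards the output of its dominant gate.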



In [None]:
class DifferentiableLogicGate(nn.Module):
    """MLP whose hidden units mix 16 relaxed two-input logic gates."""

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

        # num_layers linear layers, all producing hidden_dim units.
        self.layers = nn.ModuleList()
        self.layers.append(nn.Linear(input_dim, hidden_dim))
        for _ in range(num_layers - 1):
            self.layers.append(nn.Linear(hidden_dim, hidden_dim))
        self.out_layer = nn.Linear(hidden_dim, output_dim)

        # One vector of 16 gate logits per hidden unit, reused by every layer.
        self.logic_weights = nn.ParameterList([
            nn.Parameter(torch.randn(16)) for _ in range(hidden_dim)
        ])

    def forward_logic_gate(self, a, b, gate_id):
        """Evaluate the real-valued relaxation of gate `gate_id` (0..15) on a, b."""
        if gate_id == 0:  # FALSE
            return torch.zeros_like(a)
        elif gate_id == 1:  # AND
            return a * b
        elif gate_id == 2:  # NOT(A->B)
            return a - a * b
        elif gate_id == 3:  # A
            return a
        elif gate_id == 4:  # NOT(B->A)
            return b - a * b
        elif gate_id == 5:  # B
            return b
        elif gate_id == 6:  # XOR
            return a + b - 2 * a * b
        elif gate_id == 7:  # OR
            return a + b - a * b
        elif gate_id == 8:  # NOR
            return 1 - (a + b - a * b)
        elif gate_id == 9:  # XNOR
            return 1 - (a + b - 2 * a * b)
        elif gate_id == 10:  # NOT(B)
            return 1 - b
        elif gate_id == 11:  # B -> A (A OR NOT B)
            return 1 - b + a * b
        elif gate_id == 12:  # NOT(A)
            return 1 - a
        elif gate_id == 13:  # A -> B
            return 1 - a + a * b
        elif gate_id == 14:  # NAND
            return 1 - a * b
        elif gate_id == 15:  # TRUE
            return torch.ones_like(a)
        raise ValueError(f"gate_id must be in 0..15, got {gate_id}")

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
            # Pair each unit with its right-hand neighbour (wrapping around)
            # and mix all 16 relaxed gates using that unit's softmax weights.
            new_x = []
            for neuron_idx in range(self.hidden_dim):
                a = x[:, neuron_idx]
                b = x[:, (neuron_idx + 1) % self.hidden_dim]
                gate_probs = F.softmax(self.logic_weights[neuron_idx], dim=0)
                neuron_output = sum(
                    gate_probs[j] * self.forward_logic_gate(a, b, j)
                    for j in range(16)
                )
                new_x.append(neuron_output)
            x = torch.stack(new_x, dim=1)
            x = F.relu(x)
        return torch.sigmoid(self.out_layer(x))

In [None]:
filepath = "monk.csv"
X_train, X_test, y_train, y_test = load_monk_data(filepath)

X_train, X_test = torch.tensor(X_train, dtype=torch.float32), torch.tensor(X_test, dtype=torch.float32)
y_train, y_test = torch.tensor(y_train, dtype=torch.float32), torch.tensor(y_test, dtype=torch.float32)

In [None]:
input_dim = X_train.shape[1]
hidden_dim = 24
output_dim = 1
num_layers = 6

model = DifferentiableLogicGate(input_dim, hidden_dim, output_dim, num_layers)
optimizer = Adam(model.parameters(), lr=0.01)
loss_fn = nn.BCELoss()

In [None]:
epochs = 200
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    y_pred = model(X_train).squeeze()
    loss = loss_fn(y_pred, y_train)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss.item():.4f}")

Epoch 10/200, Loss: 0.6429
Epoch 20/200, Loss: 0.6422
Epoch 30/200, Loss: 0.6423
Epoch 40/200, Loss: 0.6421
Epoch 50/200, Loss: 0.6421
Epoch 60/200, Loss: 0.6261
Epoch 70/200, Loss: 0.5467
Epoch 80/200, Loss: 0.5103
Epoch 90/200, Loss: 0.4089
Epoch 100/200, Loss: 0.3473
Epoch 110/200, Loss: 0.2690
Epoch 120/200, Loss: 0.2102
Epoch 130/200, Loss: 0.1288
Epoch 140/200, Loss: 0.2194
Epoch 150/200, Loss: 0.0761
Epoch 160/200, Loss: 0.0198
Epoch 170/200, Loss: 0.0017
Epoch 180/200, Loss: 0.0001
Epoch 190/200, Loss: 0.0000
Epoch 200/200, Loss: 0.0000
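
The plateau near 0.642 over the first 50 epochs is close to what a model would score if it ignored the inputs and always predicted the class prior. The sketch below computes that baseline under a loudly stated assumption: that the training split has about the same class balance as the 121-example test split, which contains 42 positives. The helper `bce_constant` is introduced only for this check.

```python
import math


def bce_constant(p, positive_fraction):
    """Mean binary cross-entropy when every example is assigned probability p."""
    return -(
        positive_fraction * math.log(p)
        + (1 - positive_fraction) * math.log(1 - p)
    )


# Assumption: the training class balance matches the test split (42 of 121).
prior = 42 / 121
baseline = bce_constant(prior, prior)
print(f"{baseline:.4f}")
```

Under that assumption the baseline is about 0.646, which suggests the network only started using the features after roughly epoch 50.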


In [None]:
model.eval()
# Inference only, so skip gradient tracking.
with torch.no_grad():
    y_pred_test = model(X_test).squeeze().round()
accuracy = (y_pred_test == y_test).float().mean()
print(f"Accuracy: {accuracy.item() * 100:.2f}%")

Accuracy: 97.52%


In [None]:
y_pred_test_np = y_pred_test.detach().numpy()
y_test_np = y_test.numpy()
conf_matrix = confusion_matrix(y_test_np, y_pred_test_np)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[77  2]
 [ 1 41]]
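
The counts above can be turned into the usual summary metrics by hand. This short sketch assumes scikit-learn's layout for `confusion_matrix` (rows are true labels, columns are predictions) and uses plain Python so it runs without the model.

```python
# Counts copied from the confusion matrix output above.
tn, fp = 77, 2
fn, tp = 1, 41

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

The accuracy computed this way, 118 correct out of 121, matches the 97.52% printed earlier.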


### Summary

After training, the model classified 118 of the 121 test examples correctly, an accuracy of **97.52%**. The confusion matrix shows 2 false positives and 1 false negative. The training loss fell to 0.0000 by epoch 190, so the model fits the training set almost exactly while still generalizing well to the held-out split.

On this task, a network built around differentiable logic gates trains successfully with standard gradient descent. Its learned gate weights also offer a starting point for inspecting which logic operation each unit favors, which is the sense in which such architectures sit between symbolic and neural approaches.
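
One possible way to look at the internal structure is to read off, for each neuron, the gate with the largest logit. The sketch below does this on hypothetical logits in plain Python; `GATE_NAMES` and `dominant_gates` are names introduced for this example, and the names follow the gate order used in `forward_logic_gate`.

```python
GATE_NAMES = [
    "FALSE", "AND", "A AND NOT B", "A", "NOT A AND B", "B", "XOR", "OR",
    "NOR", "XNOR", "NOT B", "B -> A", "NOT A", "A -> B", "NAND", "TRUE",
]


def dominant_gates(logit_rows):
    """Return the name of the highest-logit gate for each neuron."""
    names = []
    for row in logit_rows:
        best = max(range(len(row)), key=lambda j: row[j])
        names.append(GATE_NAMES[best])
    return names


# Hypothetical logits for two neurons. With the trained model, the rows could
# come from something like [w.tolist() for w in model.logic_weights].
example_logits = [
    [0.0] * 7 + [3.0] + [0.0] * 8,  # largest logit at index 7
    [0.0] * 14 + [2.5, 0.0],        # largest logit at index 14
]
print(dominant_gates(example_logits))
```

Because softmax preserves ordering, the largest logit is also the most probable gate. In this model the gate inputs come from linear layers and are not restricted to [0, 1], so such a readout is only an approximate description of what each unit computes.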