<a href="https://colab.research.google.com/github/ShovalBenjer/deep_learning_neural_networks/blob/main/Deep_exc_2_adir_shoval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **TL;DR:**

**Collaborators: Shoval Benjer 319037404, Adir Amar 209017755**

This assignment implements a neural network in PyTorch to solve the XOR problem, explores the network configurations specified in the instructions, and documents the results with reproducible experiments and clear outputs.

# **Setup:**


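To run this code, you'll need the following:

Python 3.x

PyTorch

NumPy

Matplotlib

TensorFlow (only for the optional Phase 2 section)

You can install these requirements using pip (drop the leading `!` when running outside a notebook):

`!pip install torch numpy matplotlib tensorflow`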

To run the code:

1. Copy the provided code into a Python file (e.g., xor_network.py)
2. Run the file using Python:
`python xor_network.py`

If you need to run this in VLab:

1. Log in to your VLab account.
2. Open a terminal.
3. **Ensure the required packages are installed** (use the pip command above if needed).
4. Navigate to the directory containing your Python file.
5. Run the file using Python as described above.

The code automatically runs the experiments for hidden sizes 1 (with bypass), 2, and 4, and prints the mean number of epochs, the train and validation losses, and the number of failed runs for each configuration.

Note: The code uses a low temperature (0.001) for the BTU/sigmoid function as requested in the assignment. No additional setup is required beyond having the necessary Python packages installed.
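
The note above mentions a sigmoid with a low temperature. As a minimal sketch, assuming the usual definition `sigmoid(z / T)`, the snippet below shows how a small temperature pushes the sigmoid towards a hard threshold unit. The helper name `sigmoid_with_temperature` is hypothetical and does not appear in the notebook; the `Network` class in this notebook calls `nn.Sigmoid()` directly, so this only illustrates the idea.

```python
import math

def sigmoid_with_temperature(z, temperature=0.001):
    """Numerically stable sigmoid(z / temperature); hypothetical helper for illustration."""
    s = z / temperature
    if s >= 0:
        # exp(-s) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)  # s < 0, so exp(s) is below 1 and cannot overflow
    return e / (1.0 + e)

# Compare a low temperature (0.001) with the plain sigmoid (temperature 1.0).
for z in (-0.01, 0.0, 0.01):
    low_t = sigmoid_with_temperature(z)
    plain = sigmoid_with_temperature(z, temperature=1.0)
    print(f"z={z:+.2f}  T=0.001: {low_t:.5f}  T=1: {plain:.5f}")
```

With the low temperature, inputs only slightly away from zero already map very close to 0 or 1, while the plain sigmoid stays near 0.5.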

In [9]:
!pip install torch matplotlib



In [10]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import statistics
import matplotlib.pyplot as plt

In [11]:
# Prepare XOR dataset
train_x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]], dtype=torch.float32)
train_y = torch.tensor([[0.], [1.], [1.], [0.]], dtype=torch.float32)
val_x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [1., 0.1], [1., 0.9], [0.9, 0.9], [0.1, 0.9]], dtype=torch.float32)
val_y = torch.tensor([[0.], [1.], [1.], [0.], [1.], [0.], [0.], [1.]], dtype=torch.float32)

# Experiments
experiments = [
    (0.1, 2, False), (0.1, 2, True), (0.1, 4, False), (0.1, 4, True),
    (0.01, 2, False), (0.01, 2, True), (0.01, 4, False), (0.01, 4, True),
    (1.0, 1, True)
]

# **Configuration and Data Definition**
Here we define the training and validation sets. According to the assignment, we train a small MLP to learn the XOR function, with a validation set that includes some additional points.

**Training set:** the standard XOR pattern (four points).

**Validation set:** the four training points plus four additional specified points.

**Stopping criteria:**

1) Stop successfully if the validation loss hasn't improved by more than 0.0001 over the last 10 epochs AND the best validation loss so far is below 0.2.

2) Stop with failure if 40,000 epochs are reached without the success condition being met.
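
The stopping rule can be replayed on a plain list of validation losses, without any model. The sketch below is only an illustration of that rule; the function name `first_success_epoch` and the sample loss sequences are made up for the example.

```python
def first_success_epoch(val_losses, patience=10, min_delta=0.0001, goal=0.2):
    """Return the 1-based epoch at which the success condition fires, or None."""
    best = float("inf")
    stale = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
        # Success needs both a plateau and a low enough best loss.
        if stale >= patience and best < goal:
            return epoch
    return None

# Hypothetical loss curve: drops to 0.1 by epoch 3, then stays flat.
plateau_low = [0.5, 0.3, 0.1] + [0.1] * 10
# Hypothetical loss curve: flat from the start, but above the 0.2 goal.
plateau_high = [0.5] * 50

print(first_success_epoch(plateau_low))
print(first_success_epoch(plateau_high))
```

The second curve never succeeds: it plateaus, but its best loss stays above the goal, so in the real training loop it would run until the epoch limit and count as a failure.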

**Experiments:**
We have 8 experiments with combinations of:

LR ∈ {0.1, 0.01}, hidden ∈ {2,4}, bypass ∈ {False,True}

This yields 2x2x2=8 experiments.

Plus a 9th custom experiment: LR=1, hidden=1, bypass=True.
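
As a small sketch, the same nine parameter tuples can be generated instead of typed by hand. The variable names here are illustrative; the notebook itself lists the tuples explicitly.

```python
from itertools import product

learning_rates = [0.1, 0.01]
hidden_sizes = [2, 4]
bypass_options = [False, True]

# 2 x 2 x 2 = 8 combinations, in the same order as the hand-written list.
grid = list(product(learning_rates, hidden_sizes, bypass_options))

# The 9th custom experiment: LR=1, hidden=1, bypass=True.
experiment_grid = grid + [(1.0, 1, True)]

for lr, hidden, bypass in experiment_grid:
    print(f"LR={lr}, hidden={hidden}, bypass={bypass}")
```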


In [14]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import statistics
import matplotlib.pyplot as plt

train_x = torch.tensor([[0., 0.],
                        [0., 1.],
                        [1., 0.],
                        [1., 1.]], dtype=torch.float32)
train_y = torch.tensor([[0.],
                        [1.],
                        [1.],
                        [0.]], dtype=torch.float32)
val_x = torch.tensor([[0.,0.],
                      [0.,1.],
                      [1.,0.],
                      [1.,1.],
                      [1.,0.1],
                      [1.,0.9],
                      [0.9,0.9],
                      [0.1,0.9]], dtype=torch.float32)
val_y = torch.tensor([[0.],
                      [1.],
                      [1.],
                      [0.],
                      [1.],
                      [0.],
                      [0.],
                      [1.]], dtype=torch.float32)

MAX_EPOCHS = 40000
PATIENCE = 10
IMPROVEMENT_THRESHOLD = 0.0001
VAL_LOSS_GOAL = 0.2
experiment_params = [
    (0.1, 2, False),
    (0.1, 2, True),
    (0.1, 4, False),
    (0.1, 4, True),
    (0.01, 2, False),
    (0.01, 2, True),
    (0.01, 4, False),
    (0.01, 4, True),
    (1.0, 1, True)
]

# **Model Definition**

In [15]:
class Network(nn.Module):
    """
    A simple Multi-Layer Perceptron (MLP) for XOR-like tasks.

    Attributes:
    -----------
    hidden : nn.Linear
        The hidden layer mapping from input (2D) to a hidden dimension.
    output : nn.Linear
        The output layer mapping from hidden representation (and possibly inputs if bypass is True) to a single scalar output.
    bypass : bool
        If True, the original inputs are concatenated to the hidden layer output before the output layer.
    activation : nn.Module
        The non-linear activation function used in the hidden layer and output.

    Parameters:
    -----------
    hidden_size : int
        The number of hidden neurons.
    bypass : bool, optional
        Whether to concatenate the input directly to the output layer's input. Default is False.
    """
    def __init__(self, hidden_size, bypass=False):
        super(Network, self).__init__()
        self.bypass = bypass
        self.hidden = nn.Linear(2, hidden_size)
        out_input_size = hidden_size + (2 if bypass else 0)
        self.output = nn.Linear(out_input_size, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        """
        Forward pass of the network.

        Parameters:
        -----------
        x : torch.Tensor
            Input tensor of shape (N, 2).

        Returns:
        --------
        torch.Tensor
            The scalar output after the forward pass, shape (N, 1).
        """
        h = self.activation(self.hidden(x))
        if self.bypass:
            h = torch.cat((x, h), dim=1)
        return self.activation(self.output(h))


def train_model(lr, hidden, bypass):
    """
    Train a model with given parameters until one of the stop conditions is met.

    Stopping Conditions:
    - Success: If validation loss hasn't improved by more than 0.0001 in the last
      10 epochs and the best validation loss so far is < 0.2.
    - Failure: If we reach 40,000 epochs without meeting the success condition.

    Parameters:
    -----------
    lr : float
        Learning rate for the optimizer.
    hidden : int
        Number of hidden units.
    bypass : bool
        Whether to use a bypass connection (concatenating inputs to the output layer).

    Returns:
    --------
    success : bool
        True if the training stopped successfully, False if failed.
    epoch : int
        Number of epochs trained.
    train_loss_val : float
        Final train loss value at stopping.
    val_loss_val : float
        Final validation loss value at stopping.
    model : Network
        The trained model instance.
    """
    model = Network(hidden, bypass)
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()

    best_val_loss = float('inf')
    epochs_no_improve = 0
    epoch = 0
    success = False

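    while epoch < MAX_EPOCHS:
        epoch += 1

        # One full-batch gradient step on the four training points.
        y_pred = model(train_x)
        train_loss = loss_fn(y_pred, train_y)
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        # Validation loss after the update; no gradients are needed here.
        with torch.no_grad():
            val_pred = model(val_x)
            val_loss = loss_fn(val_pred, val_y)

        # An epoch counts as an improvement only if it beats the best loss by the threshold.
        if val_loss.item() < best_val_loss - IMPROVEMENT_THRESHOLD:
            best_val_loss = val_loss.item()
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1

        # Success: the loss has plateaued for PATIENCE epochs and the best loss is below the goal.
        if epochs_no_improve >= PATIENCE and best_val_loss < VAL_LOSS_GOAL:
            success = True
            break

    # Leaving the loop without a break means MAX_EPOCHS was reached (failure).
    return success, epoch, train_loss.item(), val_loss.item(), model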
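
The `Network` docstring describes the bypass as concatenating the two inputs to the hidden output before the output layer. A quick way to see what that changes is to count trainable parameters per configuration. This is a minimal sketch in plain Python that mirrors the layer shapes in `Network.__init__`; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(hidden_size, bypass=False, n_inputs=2):
    """Weights plus biases of a 2-layer MLP shaped like Network (illustrative helper)."""
    # Hidden layer: n_inputs weights per hidden neuron, plus one bias each.
    hidden_params = n_inputs * hidden_size + hidden_size
    # Output layer sees the hidden outputs, plus the raw inputs when bypass is on.
    out_inputs = hidden_size + (n_inputs if bypass else 0)
    output_params = out_inputs + 1  # one output neuron with one bias
    return hidden_params + output_params

for hidden_size, bypass in [(1, True), (2, False), (2, True), (4, False), (4, True)]:
    print(f"hidden={hidden_size}, bypass={bypass}: {count_parameters(hidden_size, bypass)} parameters")
```

The bypass always adds exactly two weights (one per input) to the output layer.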

# **Running All Experiments**
Note 1: The standard deviation is computed only when there is more than one sample.

Note 2: STD% is defined as (std / mean) * 100. It is reported for the epoch counts; the losses use the plain standard deviation.
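
As a worked example of the STD% definition, the sketch below uses the population standard deviation (`statistics.pstdev`), which is what the experiment loop calls. The helper name `std_percent` and the sample epoch counts are made up for illustration.

```python
import statistics

def std_percent(samples):
    """Population standard deviation as a percentage of the mean (illustrative helper)."""
    if len(samples) < 2:
        return 0.0  # a single sample has no spread to report
    return statistics.pstdev(samples) * 100 / statistics.mean(samples)

# Hypothetical epoch counts from three successful runs.
epochs = [100, 200, 300]
print(f"mean={statistics.mean(epochs)}, STD%={std_percent(epochs):.2f}%")
```

Here the mean is 200 and the population standard deviation is about 81.65, so STD% is about 40.82%.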


In [None]:
experiment_results = []

for i, (lr, hidden, bypass) in enumerate(experiment_params):
    print(f"=== Experiment {i+1} / {len(experiment_params)} ===")
    print(f"Params: LR={lr}, Hidden={hidden}, Bypass={bypass}")
    successes = 0
    fail_count = 0
    epochs_list = []
    train_losses_list = []
    val_losses_list = []
    models_list = []
    while successes < 10:
        success, epochs, train_l, val_l, model = train_model(lr, hidden, bypass)
        if success:
            successes += 1
            epochs_list.append(epochs)
            train_losses_list.append(train_l)
            val_losses_list.append(val_l)
            models_list.append(model)
        else:
            fail_count += 1
    mean_epochs = statistics.mean(epochs_list)
    std_epochs = (statistics.pstdev(epochs_list)*100/mean_epochs) if len(epochs_list)>1 else 0
    mean_train_loss = statistics.mean(train_losses_list)
    std_train_loss = statistics.pstdev(train_losses_list) if len(train_losses_list)>1 else 0
    mean_val_loss = statistics.mean(val_losses_list)
    std_val_loss = statistics.pstdev(val_losses_list) if len(val_losses_list)>1 else 0

    experiment_results.append({
        'lr': lr,
        'hidden': hidden,
        'bypass': bypass,
        'mean_epochs': mean_epochs,
        'std_epochs_%': std_epochs,
        'mean_train_loss': mean_train_loss,
        'std_train_loss': std_train_loss,
        'mean_val_loss': mean_val_loss,
        'std_val_loss': std_val_loss,
        'fail_count': fail_count,
        'models': models_list
    })
    print("Results:")
    print(f"Mean epochs: {mean_epochs:.2f} (std %: {std_epochs:.2f}%)")
    print(f"Mean Train Loss: {mean_train_loss:.4f} (std: {std_train_loss:.4f})")
    print(f"Mean Val Loss: {mean_val_loss:.4f} (std: {std_val_loss:.4f})")
    print(f"Failed runs until 10 successes: {fail_count}")
    print("==========================================\n")

=== Experiment 1 / 9 ===
Params: LR=0.1, Hidden=2, Bypass=False
Results:
Mean epochs: 12857.70 (std %: 32.79%)
Mean Train Loss: 0.0295 (std: 0.0037)
Mean Val Loss: 0.0390 (std: 0.0032)
Failed runs until 10 successes: 0

=== Experiment 2 / 9 ===
Params: LR=0.1, Hidden=2, Bypass=True
Results:
Mean epochs: 12908.30 (std %: 42.30%)
Mean Train Loss: 0.0390 (std: 0.0071)
Mean Val Loss: 0.0564 (std: 0.0119)
Failed runs until 10 successes: 0

=== Experiment 3 / 9 ===
Params: LR=0.1, Hidden=4, Bypass=False
Results:
Mean epochs: 9157.00 (std %: 11.61%)
Mean Train Loss: 0.0266 (std: 0.0023)
Mean Val Loss: 0.0377 (std: 0.0031)
Failed runs until 10 successes: 0

=== Experiment 4 / 9 ===
Params: LR=0.1, Hidden=4, Bypass=True
Results:
Mean epochs: 10562.60 (std %: 16.32%)
Mean Train Loss: 0.0330 (std: 0.0033)
Mean Val Loss: 0.0494 (std: 0.0092)
Failed runs until 10 successes: 0

=== Experiment 5 / 9 ===
Params: LR=0.01, Hidden=2, Bypass=False


# **Detailed Analysis of the 9th Experiment**

The 9th experiment parameters: (LR=1, hidden=1, bypass=True)
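
To see why a single hidden neuron can be enough once the bypass is present, here is a sketch with hand-picked weights and a hard threshold in place of the sigmoid. These weights are chosen for illustration and are not the ones the trained model learns: the hidden neuron computes AND, and the output neuron combines the raw inputs with the hidden output.

```python
def step(z):
    """Hard threshold unit: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_one_hidden_with_bypass(a, b):
    # Hidden neuron: fires only when both inputs are 1 (logical AND).
    hidden = step(a + b - 1.5)
    # Output neuron: sees a and b through the bypass, and subtracts the AND case.
    return step(a + b - 2 * hidden - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "hidden:", step(a + b - 1.5), "output:", xor_one_hidden_with_bypass(a, b))
```

Comparing the hidden column of this table with the hidden outputs printed by the analysis cell is one way to check whether the trained neuron behaves like AND, OR, or something else.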


In [None]:
exp9 = experiment_results[8]
model9 = exp9['models'][0]
model9.eval()
print("=== Experiment 9 Detailed Analysis ===")
print("Hidden neuron output on training set (A,B):")

with torch.no_grad():
    h_out = model9.activation(model9.hidden(train_x))
print("Input (A,B) | Hidden Output | Target")
print("------------------------------------")
for i in range(len(train_x)):
    inp = train_x[i].tolist()
    hidden_output = h_out[i].item()
    target = train_y[i].item()
    print(f"{inp}     {hidden_output:.4f}       {target}")

print("\nAnalyze this output. Depending on initialization and training, the single hidden neuron\n"
      "may be acting as a threshold function distinguishing certain input regions. Try multiple runs\n"
      "to see if it behaves similarly or differently. See if it resembles a known logical function like\n"
      "AND, OR, or acts as a line separator enabling XOR behavior.")


# **Plotting Analysis**

We can plot some results to understand the influence of hidden units, bypass, and learning rates.
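
The grouping behind these plots can be sketched without Matplotlib. The example below uses only the four LR=0.1 mean-epoch values that appear in the printed output above; the helper name `mean_by` is hypothetical.

```python
from statistics import mean

# Mean epochs for experiments 1-4 (all LR=0.1), copied from the printed results.
printed_results = [
    {"hidden": 2, "bypass": False, "mean_epochs": 12857.70},
    {"hidden": 2, "bypass": True, "mean_epochs": 12908.30},
    {"hidden": 4, "bypass": False, "mean_epochs": 9157.00},
    {"hidden": 4, "bypass": True, "mean_epochs": 10562.60},
]

def mean_by(results, key, value="mean_epochs"):
    """Average `value` over all rows that share the same `key` (illustrative helper)."""
    groups = {}
    for row in results:
        groups.setdefault(row[key], []).append(row[value])
    return {group: mean(values) for group, values in groups.items()}

print("by hidden:", mean_by(printed_results, "hidden"))
print("by bypass:", mean_by(printed_results, "bypass"))
```

On these four runs alone, 4 hidden units stop earlier on average than 2, and the bypass runs take somewhat longer; the full plots also include the LR=0.01 experiments.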

In [None]:
exp_data = experiment_results[:-1]
hidden2 = [res for res in exp_data if res['hidden']==2]
hidden4 = [res for res in exp_data if res['hidden']==4]
mean_epochs_h2 = np.mean([r['mean_epochs'] for r in hidden2])
mean_epochs_h4 = np.mean([r['mean_epochs'] for r in hidden4])

plt.figure()
plt.bar(['Hidden=2','Hidden=4'], [mean_epochs_h2, mean_epochs_h4])
plt.title('Mean Epochs Until Stopping by Number of Hidden Units')
plt.ylabel('Mean Epochs')
plt.show()

bypass_true = [res for res in exp_data if res['bypass']]
bypass_false = [res for res in exp_data if not res['bypass']]

mean_epochs_btrue = np.mean([r['mean_epochs'] for r in bypass_true])
mean_epochs_bfalse = np.mean([r['mean_epochs'] for r in bypass_false])

plt.figure()
plt.bar(['Bypass=True','Bypass=False'], [mean_epochs_btrue, mean_epochs_bfalse])
plt.title('Mean Epochs Until Stopping by Bypass')
plt.ylabel('Mean Epochs')
plt.show()

lr_01 = [res for res in exp_data if res['lr']==0.1]
lr_001 = [res for res in exp_data if res['lr']==0.01]

std_epochs_01 = np.mean([r['std_epochs_%'] for r in lr_01]) if len(lr_01)>0 else 0
std_epochs_001 = np.mean([r['std_epochs_%'] for r in lr_001]) if len(lr_001)>0 else 0

plt.figure()
plt.bar(['LR=0.1','LR=0.01'], [std_epochs_01, std_epochs_001])
plt.title('Average STD% of Epochs by Learning Rate')
plt.ylabel('STD% of Epochs')
plt.show()

print("All done!")

# **Phase 2 - just for fun**

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD, RMSprop, Adam
import matplotlib.pyplot as plt

# Define XOR data with Gaussian noise
def generate_xor_data(noise_std=0):
    """Generates XOR data with optional Gaussian noise."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])  # XOR outputs
    if noise_std > 0:
        X = X + np.random.normal(0, noise_std, X.shape)
    return X, y

# Define function to build a model with flexible architecture and activation
def build_model(hidden_layers=1, neurons_per_layer=4, activation='relu', optimizer='adam', dropout_rate=0):
    """Builds a customizable MLP model for XOR learning."""
    model = Sequential()
    # Input layer and first hidden layer
    model.add(Dense(neurons_per_layer, input_dim=2, activation=activation))
    if dropout_rate > 0:
        model.add(Dropout(dropout_rate))
    # Additional hidden layers
    for _ in range(hidden_layers - 1):
        model.add(Dense(neurons_per_layer, activation=activation))
        if dropout_rate > 0:
            model.add(Dropout(dropout_rate))
    # Output layer
    model.add(Dense(1, activation='sigmoid'))
    # Compile the model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Define function to visualize decision boundary
def plot_decision_boundary(X, y, model):
    """Plots decision boundary for a trained model."""
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = (Z > 0.5).astype(int).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Paired)
    plt.title("Decision Boundary for XOR")
    plt.show()

# Main function to run experiments
def run_experiments():
    """Runs advanced XOR learning experiments."""
    X, y = generate_xor_data(noise_std=0.1)  # Generate XOR data with noise

    # Experiment parameters
    experiments = [
        {'hidden_layers': 1, 'neurons_per_layer': 4, 'activation': 'relu', 'optimizer': 'adam', 'dropout_rate': 0},
        {'hidden_layers': 2, 'neurons_per_layer': 6, 'activation': 'relu', 'optimizer': 'adam', 'dropout_rate': 0.2},
        {'hidden_layers': 1, 'neurons_per_layer': 4, 'activation': 'tanh', 'optimizer': 'rmsprop', 'dropout_rate': 0},
        {'hidden_layers': 2, 'neurons_per_layer': 8, 'activation': 'elu', 'optimizer': 'sgd', 'dropout_rate': 0.3}
    ]

    for i, params in enumerate(experiments):
        print(f"\nRunning Experiment {i + 1} with params: {params}")
        model = build_model(
            hidden_layers=params['hidden_layers'],
            neurons_per_layer=params['neurons_per_layer'],
            activation=params['activation'],
            optimizer=params['optimizer'],
            dropout_rate=params['dropout_rate']
        )

        # Train the model
        history = model.fit(X, y, epochs=500, verbose=0, batch_size=4)

        # Evaluate the model
        loss, accuracy = model.evaluate(X, y, verbose=0)
        print(f"Experiment {i + 1} Accuracy: {accuracy:.2f}")

        # Visualize decision boundary
        plot_decision_boundary(X, y, model)

# Run the experiments
run_experiments()