<a href="https://colab.research.google.com/github/ShovalBenjer/deep_learning_neural_networks/blob/main/Deep_exc_2_adir_shoval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **TL;DR:**

**Collaborators: Shoval Benjer 319037404, Adir Amar 209017755**

# **Setup:**


To run this code, you'll need the following requirements:

Python 3.x

PyTorch

NumPy

Matplotlib

Seaborn

You can install these requirements using pip (drop the leading `!` when running outside a notebook):

`!pip install torch numpy matplotlib seaborn`

To run the code:

1. Copy the provided code into a Python file (e.g., xor_network.py)
2. Run the file using Python:
`python xor_network.py`

If you need to run this in VLab:

1. Log in to your VLab account
2. Open a terminal
3. **Ensure the required packages are installed** (use the pip command above if needed)
4. Navigate to the directory containing your Python file
5. Run the file using Python as described above

The code automatically runs experiments for k=1 (with bypass), k=2, and k=4 hidden neurons. For each configuration it prints the mean number of epochs, the mean training and validation losses, and the number of failed runs. For the k=1 configuration it also prints the hidden-neuron outputs on the training set.

Note: The assignment requests a low temperature (0.001) for the BTU/sigmoid function. The `Network` class below applies the standard `nn.Sigmoid()` and does not set a temperature itself. No additional setup is required beyond having the necessary Python packages installed.
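
The note above mentions a temperature of 0.001 for the BTU/sigmoid activation. One common way to express a temperature is to divide the pre-activation by it before applying the sigmoid, so that a small value makes the curve approach a step function. The sketch below assumes that formulation; `TemperatureSigmoid` is a hypothetical helper name and is not part of the code in this notebook.

```python
import torch
import torch.nn as nn


class TemperatureSigmoid(nn.Module):
    """Hypothetical helper: computes sigmoid(z / temperature).

    Assumption: the temperature divides the pre-activation, so a small
    temperature pushes outputs toward 0 or 1 (step-like, BTU behavior).
    """

    def __init__(self, temperature: float = 0.001):
        super().__init__()
        self.temperature = temperature

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(z / self.temperature)


act = TemperatureSigmoid(temperature=0.001)
z = torch.tensor([-1.0, -0.01, 0.0, 0.01, 1.0])
out = act(z)
print(out)  # values near 0 for negative inputs, 0.5 at zero, near 1 for positive inputs
```

Such a module could be assigned in place of `nn.Sigmoid()` as a network's activation. At a temperature this low the sigmoid saturates almost everywhere, so gradients away from zero become very small, which may make gradient-based training difficult.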

In [1]:
!pip install torch seaborn matplotlib



In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import statistics
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
def create_datasets():
    """
    Creates the training and validation sets for the XOR problem.

    Returns:
        tuple: (train_x, train_y, val_x, val_y) all as torch.FloatTensors.
    """
    train_x = torch.tensor([[0., 0.],
                            [0., 1.],
                            [1., 0.],
                            [1., 1.]], dtype=torch.float32)
    train_y = torch.tensor([[0.],
                            [1.],
                            [1.],
                            [0.]], dtype=torch.float32)

    val_x = torch.tensor([[0.,0.],
                          [0.,1.],
                          [1.,0.],
                          [1.,1.],
                          [1.,0.1],
                          [1.,0.9],
                          [0.9,0.9],
                          [0.1,0.9]], dtype=torch.float32)
    val_y = torch.tensor([[0.],
                          [1.],
                          [1.],
                          [0.],
                          [1.],
                          [0.],
                          [0.],
                          [1.]], dtype=torch.float32)
    return train_x, train_y, val_x, val_y

class Network(nn.Module):
    """
    A small neural network class for learning the XOR function.
    Can optionally include a bypass connection from inputs to output layer.
    """
    def __init__(self, hidden_size: int, bypass: bool = False):
        """
        Args:
            hidden_size (int): Number of hidden neurons.
            bypass (bool): If True, output layer receives original input as well.
        """
        super(Network, self).__init__()
        self.bypass = bypass
        self.hidden = nn.Linear(2, hidden_size)
        output_input_dim = hidden_size + (2 if bypass else 0)
        self.output = nn.Linear(output_input_dim, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the network.

        Args:
            x (torch.Tensor): Input data of shape (batch_size, 2).

        Returns:
            torch.Tensor: Model predictions of shape (batch_size, 1).
        """
        h = self.activation(self.hidden(x))
        if self.bypass:
            h = torch.cat((x, h), dim=1)
        return self.activation(self.output(h))

def train_model(model: nn.Module, train_x: torch.Tensor, train_y: torch.Tensor,
                val_x: torch.Tensor, val_y: torch.Tensor, lr: float,
                max_epochs: int = 40000, patience: int = 10,
                improvement_threshold: float = 0.0001, val_loss_goal: float = 0.2) -> tuple:
    """
    Trains the given model until one of the specified stop conditions is met.

    Stop conditions:
    1. Validation loss not improved by more than `improvement_threshold` in `patience` consecutive epochs
       AND best validation loss < val_loss_goal => success
    2. If `max_epochs` reached without success => fail

    Args:
        model (nn.Module): The neural network to train.
        train_x (torch.Tensor): Training inputs.
        train_y (torch.Tensor): Training targets.
        val_x (torch.Tensor): Validation inputs.
        val_y (torch.Tensor): Validation targets.
        lr (float): Learning rate.
        max_epochs (int): Maximum number of epochs allowed.
        patience (int): Number of epochs to wait for improvement.
        improvement_threshold (float): Minimum improvement to reset patience.
        val_loss_goal (float): Validation loss goal for success.

    Returns:
        tuple: (success (bool), epochs (int), final_train_loss (float), final_val_loss (float))
    """
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()

    best_val_loss = float('inf')
    epochs_no_improve = 0
    success = False
    epoch = 0

    while epoch < max_epochs:
        epoch += 1
        model.train()
        optimizer.zero_grad()
        y_pred = model(train_x)
        train_loss = loss_fn(y_pred, train_y)
        train_loss.backward()
        optimizer.step()
        model.eval()
        with torch.no_grad():
            val_pred = model(val_x)
            val_loss = loss_fn(val_pred, val_y).item()

        if val_loss < best_val_loss - improvement_threshold:
            best_val_loss = val_loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1

        if epochs_no_improve >= patience and best_val_loss < val_loss_goal:
            success = True
            break

    return success, epoch, train_loss.item(), val_loss

def run_experiment(lr: float, hidden: int, bypass: bool, n_successes: int = 10) -> dict:
    """
    Runs multiple trials of the XOR training with given parameters until `n_successes` successful runs are obtained.
    Tracks and aggregates statistics from successful runs.

    Args:
        lr (float): Learning rate.
        hidden (int): Number of hidden neurons.
        bypass (bool): Whether to use bypass connection.
        n_successes (int): Number of successful runs required.

    Returns:
        dict: A dictionary with computed statistics and details of runs.
    """
    train_x, train_y, val_x, val_y = create_datasets()

    successes = 0
    fail_count = 0
    epochs_list = []
    train_losses_list = []
    val_losses_list = []
    models_list = []

    while successes < n_successes:
        model = Network(hidden, bypass)
        success, epochs, train_l, val_l = train_model(
            model, train_x, train_y, val_x, val_y, lr=lr)

        if success:
            successes += 1
            epochs_list.append(epochs)
            train_losses_list.append(train_l)
            val_losses_list.append(val_l)
            models_list.append(model)
        else:
            fail_count += 1

    mean_epochs = statistics.mean(epochs_list)
    std_epochs_percent = (statistics.pstdev(epochs_list) * 100 / mean_epochs) if len(epochs_list) > 1 else 0
    mean_train_loss = statistics.mean(train_losses_list)
    std_train_loss = statistics.pstdev(train_losses_list) if len(train_losses_list) > 1 else 0
    mean_val_loss = statistics.mean(val_losses_list)
    std_val_loss = statistics.pstdev(val_losses_list) if len(val_losses_list) > 1 else 0

    return {
        'lr': lr,
        'hidden': hidden,
        'bypass': bypass,
        'mean_epochs': mean_epochs,
        'std_epochs_%': std_epochs_percent,
        'mean_train_loss': mean_train_loss,
        'std_train_loss': std_train_loss,
        'mean_val_loss': mean_val_loss,
        'std_val_loss': std_val_loss,
        'fail_count': fail_count,
        'models': models_list
    }

def print_experiment_results(result: dict):
    """
    Prints experiment results in a structured format.

    Args:
        result (dict): Result dictionary from run_experiment().
    """
    print("=== Experiment Results ===")
    print(f"Params: LR={result['lr']}, Hidden={result['hidden']}, Bypass={result['bypass']}")
    print(f"Mean epochs: {result['mean_epochs']:.2f} (std %: {result['std_epochs_%']:.2f}%)")
    print(f"Mean Train Loss: {result['mean_train_loss']:.4f} (std: {result['std_train_loss']:.4f})")
    print(f"Mean Val Loss: {result['mean_val_loss']:.4f} (std: {result['std_val_loss']:.4f})")
    # Report the actual number of successful runs instead of a hard-coded 10.
    print(f"Failed runs until {len(result['models'])} successes: {result['fail_count']}")
    print("==========================================\n")

def print_hidden_outputs(model: nn.Module, train_x: torch.Tensor, train_y: torch.Tensor):
    """
    Prints the hidden neuron outputs for the training set inputs.

    Args:
        model (nn.Module): The trained model.
        train_x (torch.Tensor): Training input features.
        train_y (torch.Tensor): Training targets.
    """
    model.eval()
    with torch.no_grad():
        h = model.activation(model.hidden(train_x))
        print("Hidden neuron outputs:")
        for i, inp in enumerate(train_x):
            print(f"Input: {inp.tolist()}, Hidden: {h[i].tolist()}, Target: {train_y[i].item()}")

def main():
    """
    Main function to run all experiments as required and print results.
    """
    experiment_params = [
        (0.1, 2, False),
        (0.1, 2, True),
        (0.1, 4, False),
        (0.1, 4, True),
        (0.01, 2, False),
        (0.01, 2, True),
        (0.01, 4, False),
        (0.01, 4, True),
        (1.0, 1, True)
    ]

    results = []
    for lr, hidden, bypass in experiment_params:
        result = run_experiment(lr, hidden, bypass)
        print_experiment_results(result)
        results.append(result)

    exp9 = results[-1]
    train_x, train_y, _, _ = create_datasets()
    model9 = exp9['models'][0]
    print("=== Experiment 9 Hidden Layer Analysis ===")
    print_hidden_outputs(model9, train_x, train_y)
    exp_data = results[:-1]
    hidden2 = [res for res in exp_data if res['hidden'] == 2]
    hidden4 = [res for res in exp_data if res['hidden'] == 4]
    mean_epochs_h2 = np.mean([r['mean_epochs'] for r in hidden2])
    mean_epochs_h4 = np.mean([r['mean_epochs'] for r in hidden4])

    plt.figure()
    plt.bar(['Hidden=2','Hidden=4'], [mean_epochs_h2, mean_epochs_h4])
    plt.title('Mean Epochs Until Stopping by Hidden Units')
    plt.ylabel('Mean Epochs')
    plt.show()
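
The stop rule described in the `train_model` docstring can be examined in isolation by replaying it over a fixed list of validation losses. The sketch below mirrors the same counters and comparisons in plain Python; `stopping_epoch` and the loss traces are hypothetical and exist only for illustration.

```python
def stopping_epoch(val_losses, patience=10, improvement_threshold=1e-4,
                   val_loss_goal=0.2):
    """Replay the patience-based stop rule on a recorded loss trace.

    Returns (success, epoch), where epoch is 1-based like in train_model.
    """
    best = float("inf")
    no_improve = 0
    for epoch, loss in enumerate(val_losses, start=1):
        # An epoch counts as an improvement only if it beats the best
        # loss by more than the threshold.
        if loss < best - improvement_threshold:
            best = loss
            no_improve = 0
        else:
            no_improve += 1
        # Success requires both a stalled loss and a best loss under the goal.
        if no_improve >= patience and best < val_loss_goal:
            return True, epoch
    return False, len(val_losses)


# Loss drops below the goal at epoch 4, then stalls for 3 epochs.
converged = stopping_epoch([0.7, 0.5, 0.3, 0.15] + [0.15] * 5, patience=3)
# Loss stalls above the goal, so the rule never reports success.
stuck = stopping_epoch([0.5] * 20, patience=3)
print(converged, stuck)
```

The second trace shows why a run can reach `max_epochs` and count as a failure: a plateau alone does not stop training unless the best validation loss is already under `val_loss_goal`.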

In [None]:
if __name__ == "__main__":
    main()

=== Experiment Results ===
Params: LR=0.1, Hidden=2, Bypass=False
Mean epochs: 11327.70 (std %: 35.78%)
Mean Train Loss: 0.0301 (std: 0.0043)
Mean Val Loss: 0.0408 (std: 0.0051)
Failed runs until 10 successes: 0

=== Experiment Results ===
Params: LR=0.1, Hidden=2, Bypass=True
Mean epochs: 11702.50 (std %: 15.25%)
Mean Train Loss: 0.0391 (std: 0.0051)
Mean Val Loss: 0.0554 (std: 0.0077)
Failed runs until 10 successes: 0

=== Experiment Results ===
Params: LR=0.1, Hidden=4, Bypass=False
Mean epochs: 9562.20 (std %: 22.51%)
Mean Train Loss: 0.0267 (std: 0.0018)
Mean Val Loss: 0.0373 (std: 0.0026)
Failed runs until 10 successes: 0

=== Experiment Results ===
Params: LR=0.1, Hidden=4, Bypass=True
Mean epochs: 9424.20 (std %: 14.31%)
Mean Train Loss: 0.0312 (std: 0.0059)
Mean Val Loss: 0.0461 (std: 0.0078)
Failed runs until 10 successes: 0
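
The `std %` figures in the output above are, per `run_experiment`, the population standard deviation of the epoch counts expressed as a percentage of their mean. A minimal sketch of that calculation follows; the epoch counts are made-up values chosen for easy arithmetic, not results from the runs above.

```python
import statistics

# Illustrative epoch counts (hypothetical, not taken from the experiments).
epochs_list = [100, 120, 80, 100]

mean_epochs = statistics.mean(epochs_list)
# Population standard deviation, scaled to a percentage of the mean.
std_percent = statistics.pstdev(epochs_list) * 100 / mean_epochs

print(f"Mean epochs: {mean_epochs:.2f} (std %: {std_percent:.2f}%)")
```

Expressing the spread relative to the mean makes configurations with very different epoch counts easier to compare than raw standard deviations would.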

