# Problem Formulation

This experiment investigates binary classification in a scenario with missing side information, where a decision-maker's preference can influence the outcome.

- **Features (X)**: A 2D feature vector in $\mathbb{R}^2$.
- **Label (Y)**: A binary label in `{-1, +1}`, representing a primary category (e.g., university branch allocation: +1 for Computer Science, -1 for Electrical Engineering).
- **Side Information (Z)**: A binary variable in `{-1, +1}` that provides additional context (e.g., qualifying exam subject: +1 for Maths, -1 for Physics). This information is only sometimes available.
- **Preference (U)**: A binary variable in `{-1, +1}` indicating the candidate's preferred outcome (e.g., their desired branch). This is always known.

The joint probability distribution $P(Y, Z, U)$ is parameterized by a correlation factor $\rho$ that controls the alignment between the true label $Y$ and the candidate's preference $U$. The feature vector $X$ is drawn from one of eight Gaussian distributions, conditioned on the combined values of $(Y, Z, U)$.

The core challenge is to build a classifier that can handle cases where the side information $Z$ is missing. We will compare two models that address this problem differently, especially when the candidate's preference $U$ is taken into account. The goal is to determine which model performs better across varying levels of correlation between $Y$ and $U$ ($\rho$) and of $Z$ observability ($p_o$).

# Proposed Solution

- **High-Level Strategy:** Implement and compare two PyTorch-based models for binary classification under conditions of randomly missing side information ($Z$). The comparison will be performed across a grid of parameters for $\rho$ (correlation between $Y$ and $U$) and $p_o$ (observability of $Z$).
- **Models:**
    1.  **Model 1 (Complex):** When $Z$ is observed, this model combines the observed-$Z$ score $f_w(x, z)$ with the missing-$Z$ score $g_{w, \lambda}(x)$ through a soft-max (if $U = +1$) or a soft-min (if $U = -1$). When $Z$ is unobserved, it uses $g_{w, \lambda}(x)$, a weighted average of the two linear scores $f_w(x, +1)$ and $f_w(x, -1)$.
    2.  **Model 2 (Simple):** When $Z$ is observed, this model directly uses the output of the linear model $f_w(x, z)$. When $Z$ is unobserved, it behaves identically to Model 1.
- **Workflow:**
    1.  **Data Generation:** Create a function to generate training and testing datasets based on the specified joint and conditional distributions. This function will take $\rho$ and $p_o$ as inputs.
    2.  **Model Implementation:** Define a PyTorch `nn.Module` that can represent both models, with a flag to switch between the complex and simple forward passes.
    3.  **Training and Evaluation:** Develop functions to train the models using logistic loss and evaluate them using the 0-1 loss (misclassification rate).
    4.  **Experiment Loop:** Iterate through all combinations of $\rho$ and $p_o$. In each iteration, generate data, train both models, and record their test risks.
    5.  **Results:** Display the final results in a formatted table, highlighting which model performed better for each parameter combination.

# Implementation

### ⚙️ Block 0: Configuration Block

This block contains all the configurable parameters for the experiment, such as hyperparameters, random seeds, and constants. Keeping them in one place makes the notebook reusable and easy to modify.

In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

# Configuration
SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)

# Experiment parameters
N_TRAIN = 2000
N_TEST = 1000
RHO_VALUES = np.linspace(0.0, 1.0, 8)
P_O_VALUES = np.linspace(0.0, 1.0, 8)

# Model hyperparameters
LEARNING_RATE = 0.01
EPOCHS = 100
BATCH_SIZE = 64
TAU = 0.5  # Temperature for soft-max/min

# Set device
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {DEVICE}")

Using device: cpu


### 📦 Dataset

This section handles all data-related steps, including the generation of the dataset based on the problem's specifications.

**What:** The `generate_data` function creates a synthetic dataset for the experiment.
**Why:** We need a controlled way to generate data that reflects the properties of our problem, including the correlation $\rho$ and the observability of side information $p_o$.
**How:**
1.  **Constructing the Joint Distribution $P(Y, Z, U)$:**
    - We start with fixed marginal and conditional probabilities: $P(Y=1) = 0.5$, $P(U=1) = 0.5$, $P(Z=1|Y=1) = 0.8$, and $P(Z=1|Y=-1) = 0.2$.
    - The correlation parameter $\rho$ is used to define the joint probability of $Y$ and $U$. Specifically, $P(Y=1, U=1) = 0.25 + \rho \times 0.25$. This formula ensures that when $\rho=0$, $Y$ and $U$ are independent ($P(Y=1, U=1) = P(Y=1)P(U=1) = 0.25$), and when $\rho=1$, they are maximally aligned.
    - From $P(Y=1, U=1)$, we can derive the other three probabilities for the $(Y,U)$ pairs, since the marginals are fixed. For example, $P(Y=1, U=-1) = P(Y=1) - P(Y=1, U=1)$.
    - We assume that $Z$ is conditionally independent of $U$ given $Y$, i.e., $P(Z|Y,U) = P(Z|Y)$.
    - The full joint probability for each of the 8 combinations of $(y, z, u)$ is then calculated as $P(Y=y, Z=z, U=u) = P(Z=z|Y=y) \times P(Y=y, U=u)$.
2.  **Gaussian Means:** The function defines the means of the 8 Gaussian distributions, placing them on the unit circle at 45° intervals (0° to 315°) and indexing them by the binary encoding of $(y, z, u)$.
3.  **Sampling:** It samples `(Y, Z, U)` triplets from the constructed joint distribution.
4.  **Feature Generation:** For each triplet, it samples the feature vector $X$ from the corresponding Gaussian distribution, using a shared diagonal covariance matrix $\mathrm{diag}(1.0, 0.5)$.
5.  **Missingness:** It randomly sets the $Z$ values to 0 (representing "missing") with probability $1 - p_o$.
6.  **Tensor Conversion:** The final dataset is converted to PyTorch tensors.
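
The construction in step 1 can be checked numerically. The sketch below rebuilds the eight-entry table in plain Python and confirms that it sums to one, that $P(Y=1) = 0.5$, and that $E[YU] = \rho$ (so, with zero-mean $\pm 1$ variables, $\rho$ is exactly the correlation between $Y$ and $U$). The name `joint_table` is an illustrative helper and is not part of the notebook.

```python
from itertools import product

def joint_table(rho, p_z1_given_y=None):
    """Return {(y, z, u): P(Y=y, Z=z, U=u)} for the construction described above."""
    if p_z1_given_y is None:
        p_z1_given_y = {1: 0.8, -1: 0.2}   # P(Z=1 | Y=y), as stated in the text
    p11 = 0.25 + 0.25 * rho                # P(Y=1, U=1)
    p_yu = {
        (1, 1): p11,
        (1, -1): 0.5 - p11,    # P(Y=1) - P(Y=1, U=1)
        (-1, 1): 0.5 - p11,    # P(U=1) - P(Y=1, U=1)
        (-1, -1): p11,         # remaining probability mass
    }
    table = {}
    for y, z, u in product([-1, 1], repeat=3):
        p_z = p_z1_given_y[y] if z == 1 else 1.0 - p_z1_given_y[y]
        table[(y, z, u)] = p_z * p_yu[(y, u)]   # Z is independent of U given Y
    return table

table = joint_table(rho=0.5)
total = sum(table.values())
p_y1 = sum(p for (y, _, _), p in table.items() if y == 1)
e_yu = sum(y * u * p for (y, _, u), p in table.items())
print(f"sum={total:.3f}  P(Y=1)={p_y1:.3f}  E[YU]={e_yu:.3f}")
```

For $\rho = 0.5$ the three printed values are 1.000, 0.500, and 0.500.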

In [8]:
def generate_data(n_samples, rho, p_o):
    # 1. Define the joint probability table P(Y, Z, U)
    # Base probabilities
    p_y1 = 0.5
    p_u1 = 0.5
    p_z1_y1 = 0.8
    p_z1_y_1 = 0.2

    # P(Y, U) based on rho
    p_y1_u1 = 0.25 + rho * 0.25
    p_y1_u_1 = p_y1 - p_y1_u1
    p_y_1_u1 = p_u1 - p_y1_u1
    p_y_1_u_1 = 1 - p_y1_u1 - p_y1_u_1 - p_y_1_u1
    
    probs = np.zeros(8)
    combs = []
    for y in [-1, 1]:
        for z in [-1, 1]:
            for u in [-1, 1]:
                combs.append((y, z, u))
                p_yu = {
                    (1, 1): p_y1_u1, (1, -1): p_y1_u_1,
                    (-1, 1): p_y_1_u1, (-1, -1): p_y_1_u_1
                }[(y, u)]
                
                p_z_y = p_z1_y1 if y == 1 else p_z1_y_1
                if z == -1: p_z_y = 1 - p_z_y
                
                idx = int(f"{int((y+1)/2)}{int((z+1)/2)}{int((u+1)/2)}", 2)
                probs[idx] = p_yu * p_z_y

    probs /= probs.sum()

    # 2. Define means for the 8 Gaussians
    angles = np.linspace(0, 315, 8)
    means = np.array([[np.cos(np.deg2rad(a)), np.sin(np.deg2rad(a))] for a in angles])
    cov = np.array([[1.0, 0.0], [0.0, 0.5]])

    # 3. Sample (Y, Z, U) and generate X
    indices = np.random.choice(8, size=n_samples, p=probs)
    yzu_samples = np.array(combs)[indices]
    
    X = np.zeros((n_samples, 2))
    for i in range(n_samples):
        mean = means[indices[i]]
        X[i] = np.random.multivariate_normal(mean, cov)

    Y = yzu_samples[:, 0]
    Z = yzu_samples[:, 1]
    U = yzu_samples[:, 2]

    # 4. Apply missingness to Z
    missing_mask = np.random.rand(n_samples) > p_o
    Z[missing_mask] = 0  # 0 indicates missing

    # 5. Convert to tensors
    return (torch.tensor(X, dtype=torch.float32).to(DEVICE),
            torch.tensor(Y, dtype=torch.float32).to(DEVICE),
            torch.tensor(Z, dtype=torch.float32).to(DEVICE),
            torch.tensor(U, dtype=torch.float32).to(DEVICE))

# Test the function
X_sample, Y_sample, Z_sample, U_sample = generate_data(5, 0.5, 0.7)
print("Sample X:\n", X_sample)
print("Sample Y:\n", Y_sample)
print("Sample Z (0=missing):\n", Z_sample)
print("Sample U:\n", U_sample)

Sample X:
 tensor([[ 0.9861,  1.4216],
        [ 0.1262, -1.0785],
        [ 0.1357, -1.3605],
        [-3.3197, -0.0351],
        [ 1.8164, -1.0775]])
Sample Y:
 tensor([-1.,  1.,  1.,  1., -1.])
Sample Z (0=missing):
 tensor([-1.,  1.,  1., -1.,  0.])
Sample U:
 tensor([ 1.,  1.,  1.,  1., -1.])
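
To make the geometry concrete, the sketch below lists which unit-circle mean each $(y, z, u)$ combination receives under the indexing used in `generate_data`, where each variable is mapped from $\{-1, +1\}$ to $\{0, 1\}$ and the three bits are read as a binary number. It is written in plain Python, and `mean_for` is an illustrative helper name.

```python
import math
from itertools import product

def mean_for(y, z, u):
    """Return (angle in degrees, mean vector) for one (y, z, u) combination."""
    # Map each variable from {-1, +1} to {0, 1} and read the bits as a binary number.
    idx = 4 * ((y + 1) // 2) + 2 * ((z + 1) // 2) + ((u + 1) // 2)
    angle = 45 * idx                       # np.linspace(0, 315, 8) has a 45-degree step
    rad = math.radians(angle)
    return angle, (math.cos(rad), math.sin(rad))

for y, z, u in product([-1, 1], repeat=3):
    angle, (mx, my) = mean_for(y, z, u)
    print(f"y={y:+d} z={z:+d} u={u:+d} -> {angle:3d} deg, mean=({mx:+.2f}, {my:+.2f})")
```

Under this layout the four $Y = -1$ components sit at 0° to 135° and the four $Y = +1$ components at 180° to 315°, so the mean of the second coordinate is non-negative for $Y = -1$ and non-positive for $Y = +1$.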


### 🧱 Architecture

This section defines the architecture of the classification models.

**What:** The `StrategicModel` class implements the core logic for both models. It's a PyTorch `nn.Module`.
**Why:** A single class is used to encapsulate the shared linear layer and the different forward pass logics for Model 1 (complex) and Model 2 (simple). This promotes code reuse.
**How:**
-   **Initialization:** It creates a linear layer for $f_w(x, z) = w_0 + w_1x_1 + w_2x_2 + w_3z$ and a parameter `log_lambda_sq` for the weighted average $g_{w, \lambda}(x)$.
-   **`f_w`:** A helper method that computes the score from the linear layer.
-   **`g_w_lambda`:** Computes the score for unobserved $Z$ by taking a weighted average of $f_w(x, -1)$ and $f_w(x, +1)$:
    $g_{w, \lambda}(x) = \frac{\lambda^2}{1+\lambda^2} f_w(x, -1) + \frac{1}{1+\lambda^2} f_w(x, +1)$
-   **`forward`:** This is the main method.
    -   It first identifies which samples have observed $Z$ and which do not.
    -   For unobserved $Z$, both models compute the score using $g_{w, \lambda}$.
    -   For observed $Z$:
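        -   **Model 1 (complex):** If `use_strategic=True`, it combines $f_w(x, z)$ and $g_{w, \lambda}(x)$ according to the preference $U$: the soft-max branch when $U = +1$ and the soft-min branch when $U = -1$, both with temperature `TAU`.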
        -   **Model 2 (simple):** If `use_strategic=False`, it directly computes the score using $f_w(x, z)$.
    -   It combines the scores from the observed and unobserved batches to produce the final output.
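
Because $f_w$ is linear in $z$, the weighted average collapses to a single evaluation of $f_w$ at an effective side-information value. A short derivation from the definitions above (the symbol $\bar{z}$ is introduced here only for illustration):

$$
\begin{aligned}
g_{w, \lambda}(x) &= \frac{\lambda^2}{1+\lambda^2}\,(w_0 + w_1 x_1 + w_2 x_2 - w_3) + \frac{1}{1+\lambda^2}\,(w_0 + w_1 x_1 + w_2 x_2 + w_3) \\
&= w_0 + w_1 x_1 + w_2 x_2 + w_3\,\bar{z}, \qquad \bar{z} = \frac{1-\lambda^2}{1+\lambda^2} \in (-1, 1).
\end{aligned}
$$

With the initial value `log_lambda_sq = 0`, $\lambda^2 = 1$ and $\bar{z} = 0$, so both weights start at $0.5$.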

In [9]:
class StrategicModel(nn.Module):
    def __init__(self):
        super(StrategicModel, self).__init__()
        # Linear layer for f_w(x,z) = w0 + w1*x1 + w2*x2 + w3*z
        self.linear = nn.Linear(3, 1)
        # log(lambda^2) for stable optimization
        self.log_lambda_sq = nn.Parameter(torch.zeros(1))

    def f_w(self, x, z):
        # Create input for linear layer
        xz = torch.cat([x, z.unsqueeze(1)], dim=1)
        return self.linear(xz).squeeze(-1)

    def g_w_lambda(self, x):
        lambda_sq = torch.exp(self.log_lambda_sq)
        w_neg = lambda_sq / (1 + lambda_sq)
        w_pos = 1 / (1 + lambda_sq)
        
        f_neg = self.f_w(x, torch.full_like(x[:, 0], -1))
        f_pos = self.f_w(x, torch.full_like(x[:, 0], 1))
        
        return w_neg * f_neg + w_pos * f_pos

    def forward(self, x, z, u, use_strategic=False):
        scores = torch.zeros_like(z)
        
        # Identify observed and unobserved samples
        unobserved_mask = (z == 0)
        observed_mask = ~unobserved_mask

        # --- Unobserved Case ---
        if unobserved_mask.any():
            scores[unobserved_mask] = self.g_w_lambda(x[unobserved_mask])

        # --- Observed Case ---
        if observed_mask.any():
            x_obs, z_obs, u_obs = x[observed_mask], z[observed_mask], u[observed_mask]
            
            if not use_strategic: # Model 2 (Simple)
                scores[observed_mask] = self.f_w(x_obs, z_obs)
            else: # Model 1 (Complex)
                g = self.g_w_lambda(x_obs)
                f = self.f_w(x_obs, z_obs)
                
                exp_g = torch.exp(g / TAU)
                exp_f = torch.exp(f / TAU)
                
                # Soft-max for U = +1
                sm_num = g * exp_g + f * exp_f
                sm_den = exp_g + exp_f
                soft_max = sm_num / sm_den
                
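                # Soft-min branch for U = -1.
                # NOTE: because of the leading minus signs in smin_num, this expression
                # equals -softmin(g, f), i.e. the soft-max of (-g, -f). A plain soft-min
                # would use g * exp_neg_g + f * exp_neg_f in the numerator. The outputs
                # recorded in this notebook come from the code as written.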
                exp_neg_g = torch.exp(-g / TAU)
                exp_neg_f = torch.exp(-f / TAU)
                smin_num = -g * exp_neg_g - f * exp_neg_f
                smin_den = exp_neg_g + exp_neg_f
                soft_min = smin_num / smin_den
                
                strat_scores = torch.where(u_obs == 1, soft_max, soft_min)
                scores[observed_mask] = strat_scores
                
        return scores

# Test the model
model_test = StrategicModel().to(DEVICE)
X_sample, _, Z_sample, U_sample = generate_data(5, 0.5, 0.7)
score_simple = model_test(X_sample, Z_sample, U_sample, use_strategic=False)
score_complex = model_test(X_sample, Z_sample, U_sample, use_strategic=True)
print("Simple Model Scores:\n", score_simple)
print("Complex Model Scores:\n", score_complex)

Simple Model Scores:
 tensor([ 1.1631,  0.1373, -0.4911,  0.8696, -0.0535],
       grad_fn=<IndexPutBackward0>)
Complex Model Scores:
 tensor([-1.0864, -0.0606, -0.5496, -0.7929, -0.0051],
       grad_fn=<IndexPutBackward0>)
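
For reference, the smooth maximum and minimum of two scores can be written as exponentially weighted averages. The sketch below is a plain-Python illustration with hypothetical helper names `soft_max` and `soft_min`; it is not the notebook's implementation. Shifting by the larger argument before exponentiating is an added assumption that avoids overflow for large scores. Note that the `soft_min` expression in `forward` above has leading minus signs in its numerator, so it evaluates to the negative of the quantity computed here.

```python
import math

def soft_max(a, b, tau=0.5):
    """Smooth maximum: average of a and b weighted by exp(a/tau) and exp(b/tau)."""
    m = max(a, b)                          # shift exponents for numerical stability
    wa = math.exp((a - m) / tau)
    wb = math.exp((b - m) / tau)
    return (a * wa + b * wb) / (wa + wb)

def soft_min(a, b, tau=0.5):
    """Smooth minimum, via the identity soft_min(a, b) = -soft_max(-a, -b)."""
    return -soft_max(-a, -b, tau)

g, f = 0.2, 1.5   # illustrative values standing in for g_{w,lambda}(x) and f_w(x, z)
print(soft_max(g, f), soft_min(g, f))      # both values lie between g and f
print(soft_max(800.0, 801.0))              # large inputs do not overflow
```

As `tau` shrinks, `soft_max` approaches the hard maximum and `soft_min` the hard minimum; when both inputs are equal, each returns that common value.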


### 🧪 Experiment

This section contains the functions for training and evaluating the models.

**What:** The `train_model` and `evaluate_model` functions define the experimental loop.
**Why:** Separating the training and evaluation logic into functions makes the main experimental loop cleaner and more readable.
**How:**
-   **`train_model`:**
    -   Takes the model, data, and hyperparameters as input.
    -   Uses the Adam optimizer and logistic loss (`BCEWithLogitsLoss`).
    -   Iterates through the data for a fixed number of epochs, updating the model weights.
-   **`evaluate_model`:**
    -   Takes the trained model and test data as input.
    -   Computes the model's predictions (scores).
    -   Calculates the 0-1 loss (risk) by comparing the sign of the scores to the true labels.
    -   Returns the average risk.
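
Per sample, `BCEWithLogitsLoss` with targets $(y+1)/2$ is the same quantity as the logistic loss $\log(1 + e^{-y s})$ on $\pm 1$ labels, where $s$ is the score. The sketch below checks this identity in plain Python for a few scores; the helper names are illustrative and the batch averaging that PyTorch applies by default is left out.

```python
import math

def bce_with_logits(score, target01):
    """Binary cross-entropy on a raw score with a target in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-score))           # sigmoid of the score
    return -(target01 * math.log(p) + (1 - target01) * math.log(1.0 - p))

def logistic_loss(score, y):
    """Logistic loss log(1 + exp(-y * score)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * score))

for score in (-2.0, 0.0, 0.7):
    for y in (-1, 1):
        a = bce_with_logits(score, (y + 1) / 2)   # same label mapping as in train_model
        b = logistic_loss(score, y)
        print(f"s={score:+.1f} y={y:+d}  bce={a:.6f}  logistic={b:.6f}")
```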

In [10]:
def train_model(model, X_train, Y_train, Z_train, U_train, use_strategic):
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
    loss_fn = nn.BCEWithLogitsLoss()
    
    model.train()
    for epoch in range(EPOCHS):
        for i in range(0, len(X_train), BATCH_SIZE):
            # Get batch
            X_batch = X_train[i:i+BATCH_SIZE]
            Y_batch = Y_train[i:i+BATCH_SIZE]
            Z_batch = Z_train[i:i+BATCH_SIZE]
            U_batch = U_train[i:i+BATCH_SIZE]

            # Forward pass
            scores = model(X_batch, Z_batch, U_batch, use_strategic=use_strategic)
            
            # BCEWithLogitsLoss expects target to be in [0,1]
            loss = loss_fn(scores, (Y_batch + 1) / 2)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def evaluate_model(model, X_test, Y_test, Z_test, U_test, use_strategic):
    model.eval()
    with torch.no_grad():
        scores = model(X_test, Z_test, U_test, use_strategic=use_strategic)
        preds = torch.sign(scores)
        # 0-1 loss is the proportion of incorrect predictions
        risk = (preds != Y_test).float().mean()
    return risk.item()
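
A small plain-Python sketch of the 0-1 risk computed in `evaluate_model`. The sign of a score that is exactly zero is 0, which matches neither label, so such a prediction counts as an error. `zero_one_risk` is an illustrative name.

```python
def sign(v):
    """Return -1, 0, or +1, mirroring torch.sign for a scalar."""
    return (v > 0) - (v < 0)

def zero_one_risk(scores, labels):
    """Fraction of samples whose predicted sign differs from the +/-1 label."""
    errors = sum(1 for s, y in zip(scores, labels) if sign(s) != y)
    return errors / len(labels)

scores = [0.8, -0.3, 0.0, 1.2]
labels = [1, 1, -1, 1]
print(zero_one_risk(scores, labels))   # 0.5: the 2nd and 3rd predictions are wrong
```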

### 🚀 Main Loop

This is the main part of the experiment where we iterate through the different parameter settings.

**What:** This block runs the full experiment by looping over all $\rho$ and $p_o$ values.
**Why:** To systematically collect the performance data for both models under all specified conditions.
**How:**
1.  **Initialization:** A dictionary `results` is created to store the risks.
2.  **Outer Loop ($\rho$):** It iterates through each correlation value $\rho$.
3.  **Inner Loop ($p_o$):** For each $\rho$, it iterates through each observability probability $p_o$.
4.  **Data Generation:** It generates fresh training and testing data for the current $(\rho, p_o)$ pair.
5.  **Model Training & Evaluation:**
    -   It initializes, trains, and evaluates Model 1 (complex).
    -   It initializes, trains, and evaluates Model 2 (simple).
6.  **Store Results:** The calculated risks for both models are stored in the `results` dictionary.
7.  **Progress Bar:** `tqdm` is used to show a progress bar for the outer loop.
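
The grid is small enough to enumerate explicitly. This sketch, in plain Python rather than NumPy, lists the eight evenly spaced values that match `np.linspace(0.0, 1.0, 8)` up to floating-point rounding and counts the $(\rho, p_o)$ cells. Since each cell trains two models, a full run involves 128 model fits.

```python
from itertools import product

# Eight evenly spaced values on [0, 1], i.e. multiples of 1/7.
grid = [k / 7 for k in range(8)]
print([f"{v:.2f}" for v in grid])

cells = list(product(grid, grid))         # every (rho, p_o) combination
print(len(cells), "cells,", 2 * len(cells), "model fits")
```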

In [11]:
results = {}

for rho in tqdm(RHO_VALUES, desc="Rho Loop"):
    results[rho] = {}
    for p_o in P_O_VALUES:
        # Generate data
        X_train, Y_train, Z_train, U_train = generate_data(N_TRAIN, rho, p_o)
        X_test, Y_test, Z_test, U_test = generate_data(N_TEST, rho, p_o)

        # --- Model 1 (Complex) ---
        model1 = StrategicModel().to(DEVICE)
        train_model(model1, X_train, Y_train, Z_train, U_train, use_strategic=True)
        risk1 = evaluate_model(model1, X_test, Y_test, Z_test, U_test, use_strategic=True)

        # --- Model 2 (Simple) ---
        model2 = StrategicModel().to(DEVICE)
        train_model(model2, X_train, Y_train, Z_train, U_train, use_strategic=False)
        risk2 = evaluate_model(model2, X_test, Y_test, Z_test, U_test, use_strategic=False)
        
        results[rho][p_o] = (risk1, risk2)

print("Experiment finished!")

Experiment finished!


### 🔍 Inspection & Visualization Block

This block is for analyzing and visualizing the outcomes of the experiment.

**What:** The code below processes the `results` dictionary into a styled pandas DataFrame.
**Why:** A formatted table is an effective way to present the comparative performance of the two models across the different experimental conditions. Highlighting helps to quickly identify where the complex model provides an advantage.
**How:**
-   It first restructures the `results` dictionary into a format suitable for a DataFrame.
-   It creates a pandas DataFrame where rows correspond to $p_o$ values and columns correspond to $\rho$ values.
-   A custom styling function `highlight_winner` is defined. This function checks if Model 1's risk is less than Model 2's risk in each cell and applies a background color if it is.
-   The element-wise `df.style.applymap` method is used to apply this styling to every cell of the DataFrame before displaying it.
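
The same comparison can also be made on the numeric risks directly, without parsing formatted strings. The sketch below uses the $\rho = 1.00$ column of the results table as sample input (risks as reported, rounded to two decimals) and counts the cells where Model 1 has strictly lower risk. `count_m1_wins` is an illustrative helper, not part of the notebook.

```python
def count_m1_wins(column):
    """Count entries where Model 1's risk is strictly below Model 2's.

    `column` maps p_o -> (risk_model1, risk_model2).
    """
    return sum(1 for r1, r2 in column.values() if r1 < r2)

# rho = 1.00 column of the results table.
rho_1_column = {
    0.00: (0.28, 0.28), 0.14: (0.29, 0.23), 0.29: (0.27, 0.23), 0.43: (0.25, 0.20),
    0.57: (0.23, 0.19), 0.71: (0.14, 0.17), 0.86: (0.06, 0.14), 1.00: (0.00, 0.13),
}
print(count_m1_wins(rho_1_column))   # 3: p_o = 0.71, 0.86 and 1.00
```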

In [12]:
# Prepare data for DataFrame
data_for_df = {}
for rho, p_o_results in results.items():
    col_name = f"rho={rho:.2f}"
    data_for_df[col_name] = [f"M1: {r1:.2f} | M2: {r2:.2f}" for r1, r2 in p_o_results.values()]

df = pd.DataFrame(data_for_df, index=[f"{p_o:.2f}" for p_o in P_O_VALUES])
df.index.name = "p_o"

# Styling function
def highlight_winner(cell_value):
    parts = cell_value.replace("M1:", "").replace("M2:", "").split("|")
    try:
        risk1 = float(parts[0].strip())
        risk2 = float(parts[1].strip())
        color = 'lightblue' if risk1 < risk2 else ''
        return f'background-color: {color}'
    except (ValueError, IndexError):
        return ''

styled_df = df.style.applymap(highlight_winner)

print("--- Results Table ---")
print("Cells are highlighted where Model 1 (Complex) outperforms Model 2 (Simple).")
display(styled_df)

--- Results Table ---
Cells are highlighted where Model 1 (Complex) outperforms Model 2 (Simple).



Each cell shows the test risk as Model 1 / Model 2. Bold marks the cells where Model 1's risk is lower, following the highlighting rule.

| p_o | rho=0.00 | rho=0.14 | rho=0.29 | rho=0.43 | rho=0.57 | rho=0.71 | rho=0.86 | rho=1.00 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.00 | 0.19 / 0.19 | 0.21 / 0.21 | 0.20 / 0.20 | 0.22 / 0.22 | 0.24 / 0.24 | 0.24 / 0.24 | 0.25 / 0.25 | 0.28 / 0.28 |
| 0.14 | 0.21 / 0.18 | 0.23 / 0.20 | 0.23 / 0.21 | 0.25 / 0.22 | 0.25 / 0.22 | 0.26 / 0.23 | 0.27 / 0.22 | 0.29 / 0.23 |
| 0.29 | 0.23 / 0.18 | 0.25 / 0.21 | 0.25 / 0.20 | 0.28 / 0.20 | 0.30 / 0.21 | 0.28 / 0.22 | 0.30 / 0.21 | 0.27 / 0.23 |
| 0.43 | 0.24 / 0.18 | 0.26 / 0.19 | 0.26 / 0.19 | 0.28 / 0.19 | 0.31 / 0.19 | 0.29 / 0.19 | 0.29 / 0.18 | 0.25 / 0.20 |
| 0.57 | 0.27 / 0.17 | 0.26 / 0.18 | 0.29 / 0.18 | 0.31 / 0.20 | 0.29 / 0.15 | 0.30 / 0.18 | 0.25 / 0.18 | 0.23 / 0.19 |
| 0.71 | 0.23 / 0.15 | 0.37 / 0.15 | 0.28 / 0.15 | 0.29 / 0.16 | 0.33 / 0.17 | 0.25 / 0.17 | 0.20 / 0.18 | **0.14 / 0.17** |
| 0.86 | 0.37 / 0.13 | 0.29 / 0.13 | 0.29 / 0.15 | 0.30 / 0.16 | 0.27 / 0.15 | 0.23 / 0.16 | **0.13 / 0.17** | **0.06 / 0.14** |
| 1.00 | 0.24 / 0.13 | 0.25 / 0.12 | 0.28 / 0.11 | 0.27 / 0.14 | 0.24 / 0.13 | **0.15 / 0.16** | **0.08 / 0.13** | **0.00 / 0.13** |


# 📊 Discussion

The experiment compared two models for binary classification in the presence of randomly missing side information ($Z$), where a user's preference ($U$) is always known. The key difference was how they handled observed $Z$: Model 1 used a "strategic" soft-max/min approach guided by $U$, while Model 2 used a standard linear classifier.

**Summary of Results:**

The results table shows the 0-1 risk for both models across different values of $\rho$ (correlation between $Y$ and $U$) and $p_o$ ($Z$ observability).

-   **Effect of $\rho$:** The two models respond differently. Model 2 never uses $U$, and its risk does not improve with $\rho$: at $p_o = 0$ it rises from 0.19 to 0.28, and at $p_o = 1$ it stays between 0.11 and 0.16. Model 1's risk at $p_o = 1$ stays between 0.24 and 0.28 for $\rho \le 0.57$ and then falls to 0.00 at $\rho = 1$.
-   **Effect of $p_o$:** For Model 2, risk generally decreases as $p_o$ increases (from 0.19 to 0.13 at $\rho = 0$), as expected when the informative variable $Z$ is observed more often. For Model 1 the trend depends on $\rho$: at $\rho = 0$ its risk rises with $p_o$ and peaks at 0.37, while at $\rho = 1$ it falls from 0.28 to 0.00.
-   **Model Comparison:** The highlighting in the table indicates where Model 1 (the complex, strategic model) has lower risk than Model 2.
    -   At $p_o = 0$ the two models have identical risk for every $\rho$, because $Z$ is never observed and both reduce to $g_{w, \lambda}$.
    -   For $p_o > 0$ and $\rho \le 0.57$, Model 2 is better in every cell, by margins ranging from 0.02 to 0.24. The strategic component of Model 1 hurts here because the preference $U$ is only weakly aligned with the true label $Y$.
    -   Model 1 has the lower risk in only 6 of the 64 cells, all with $\rho \ge 0.71$ and $p_o \ge 0.71$. This is the key finding: **the strategic approach helps only when the user's preference is a reliable indicator of the true outcome and the side information $Z$ is frequently available.**

**Interpretation:**

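The strategic component of Model 1 ties the score on observed-$Z$ samples to the preference $U$: it uses the soft-max branch when $U = +1$ and the soft-min branch when $U = -1$. When $U$ is aligned with $Y$ (high $\rho$), letting $U$ shape the score lowers the risk, which is consistent with the gains at the high-$\rho$, high-$p_o$ corner of the table. When $U$ carries little information about $Y$ (low $\rho$), the same mechanism adds a label-irrelevant signal, and Model 1 does worse than the plain linear model. The mechanism can only act when there is an observed $Z$ to be strategic about, which is why the two models coincide at $p_o = 0$.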

**Strengths and Limitations:**

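-   **Strengths:** The experiment provides a controlled comparison that identifies the conditions under which a strategic classification model can be advantageous. The setup is seeded and the results are easy to read off the table.
-   **Limitations:** The dataset is synthetic and based on strong distributional assumptions (e.g., Gaussian features), so performance in real-world scenarios might differ. Each cell is a single run evaluated on 1,000 test samples, for which the standard error of a risk near 0.2 is about 0.013; differences of 0.01 to 0.02 are therefore within sampling noise. The hyperparameter $\tau$ for the soft-max/min was fixed, and tuning it could alter the results. The $U = -1$ branch of `forward`, as written, evaluates to the negative of the soft-min, which lets the sign of the score follow $U$ directly; the zero risk at $\rho = p_o = 1$ may partly reflect this. Finally, the experiment only considers one type of strategic behavior.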