# Dieting Network

## Problem statement

The neural network below wants to lose weight and go on a diet.

```
NN: I want to slim down
You: ... what?
NN: I want to have less weight! I think the less numbers I store the better.
You: (trying to think)
You: I suppose you can try sparsity? The more of your weights are zero, the fewer numbers you have to save to disk. But you must save your weights as a sparse tensor. By default, network weights are saved as dense tensors, so even a zero still takes up space.
NN: (scrolling on phone)
NN: I'm reading this TikTok that says Python stores integers as objects, so when I have one million copies of the same integer, I'm actually only using one number.
You: (thinking)
You: Uhhh
You: I feel like there are many other things that need to be done properly for your statement to stand! But most importantly though
You: How are you going to function if you practically have no weights?
NN: Well, here's where you come in to help me out!
You: Hm, but I don't know how!
NN: Hahaha
NN: You don't know how **yet!**
You: ...
You: This is so strange.
Narrator: _This is not even the strangest problem in Selection 2_
```
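The NN's claim about Python integers can be checked directly. The sketch below is specific to CPython and is only an illustration (the name `values` and the value `7` are arbitrary choices): a list of one million copies of the same integer does share a single integer object, but the list still stores one pointer per element, so the storage is far from free.

```python
import struct
import sys

n = 1_000_000
values = [7] * n  # one million references to the same int object

# Every element is the very same object, so no extra ints are allocated...
shares_one_object = all(v is values[0] for v in values)

# ...but the list itself still holds n pointers (8 bytes each on a 64-bit build).
pointer_size = struct.calcsize("P")
list_bytes = sys.getsizeof(values)

print(shares_one_object)
print(list_bytes >= n * pointer_size)
```

This is one of the "many other things" the dialogue alludes to: sharing the object does not remove the per-element reference.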

Do what you can to make the network below perform well when all of its weights are frozen to unity with no biases (i.e. all weights = 1 and all biases = 0). You may only adjust the activation functions. Implement whatever activation function you want!
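To see why only the activations matter, note what an all-ones layer does: each output unit is simply the sum of its inputs. The pure-Python sketch below mirrors the 8-5-5-5-1 layout with identity activations (the helper names `ones_linear` and `identity_network` are illustrative and not part of the provided code) and shows that the whole network collapses to 125 times the sum of the input features.

```python
def ones_linear(row, out_features):
    """All-ones linear layer without bias: every output equals sum(row)."""
    total = sum(row)
    return [total] * out_features


def identity_network(row):
    """8 -> 5 -> 5 -> 5 -> 1 network with unit weights and identity activations."""
    h = ones_linear(row, 5)      # each unit: s = sum of the 8 inputs
    h = ones_linear(h, 5)        # each unit: 5 * s
    h = ones_linear(h, 5)        # each unit: 25 * s
    return ones_linear(h, 1)[0]  # single output: 125 * s


x = [1.0, 2.0, 0.5, 0.0, -1.0, 3.0, 0.25, 0.25]
print(identity_network(x), 125 * sum(x))
```

Any useful behaviour therefore has to come from how the activations reshape the single quantity that reaches them.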

You are given:
- a baseline network below where you are to modify `self.act1`, `self.act2` and `self.act3`
- tensors `X_train`, `y_train`, `X_val`, `y_val`, `X_test` to act as your training data, validation data and test data
- helper code to submit your predictions on `X_test` for scoring!

The following restrictions apply:

- Each activation function may contain a small number of parameters, at most 5.
- Each activation function must be stateless during inference, i.e. it should return exactly the same output when given exactly the same input. Inputs from a previous iteration must not affect the outputs of the current iteration.
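As a rough illustration of the two restrictions, the sketch below defines a hypothetical activation `ScaledTanh` with two parameters and a hypothetical helper `check_activation`. Neither is part of the competition code, and how the organisers actually verify submissions is not stated; the helper only shows what "at most 5 parameters" and "stateless" could mean in practice.

```python
import math


class ScaledTanh:
    """Hypothetical activation a * tanh(b * x) with two parameters."""

    def __init__(self, a, b):
        self.params = [a, b]

    def __call__(self, x):
        a, b = self.params
        return a * math.tanh(b * x)


def check_activation(act, probe_inputs, max_params=5):
    """Return True if act has at most max_params parameters and is repeatable."""
    if len(act.params) > max_params:
        return False
    first = [act(x) for x in probe_inputs]
    # Evaluate again in reverse order to expose any dependence on earlier inputs.
    second = [act(x) for x in reversed(probe_inputs)][::-1]
    return first == second


print(check_activation(ScaledTanh(2.0, 0.5), [-1.0, 0.0, 3.0]))
```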

Scores shall be awarded as follows:
- 1 pt for explaining the reasoning of your approach in this notebook
- 1 pt for scoring R2 >= 0.25 while using activations that fulfill the restrictions above.
    - Submit your predictions on `X_test` via API submission. You will be scored by `sklearn.metrics.r2_score`. See example below.
    - Your score column for this problem is NOT your aggregate score, but just your R2 score on the test set!
- Additional 0 - 3 pts to be assigned based on this formula: `(Your R2 score - baseline score) / (Benchmark score - baseline score) x 3 pts`, where:
    - Benchmark score is the highest scoring R2 achieved by all participants in this problem
    - Baseline score is 0.25 R2, by default. If the lowest scoring R2 by all participants exceeds 0.25 R2, the baseline score will be set as that instead
    - e.g. max R2 score achieved is 0.5, while min R2 score achieved is 0.3. If your score is 0.4, you get (0.4 - 0.3)/(0.5 - 0.3) x 3 = 1.5 pts
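
The bonus formula above can be written as a small function. This is only a sketch of the stated rule: the name `bonus_points` and its arguments are illustrative, it assumes the benchmark score is strictly greater than the baseline, and it does not clamp the result because the rules do not say how out-of-range values are handled.

```python
def bonus_points(r2, benchmark, lowest, default_baseline=0.25, max_points=3.0):
    """Bonus points from the formula in the scoring rules.

    benchmark: highest R2 among all participants.
    lowest: lowest R2 among all participants; it replaces the 0.25 baseline
    only when it is higher than 0.25.
    """
    baseline = max(default_baseline, lowest)
    return (r2 - baseline) / (benchmark - baseline) * max_points


# Worked example from the rules: max 0.5, min 0.3, your score 0.4
print(round(bonus_points(0.4, benchmark=0.5, lowest=0.3), 2))  # 1.5
```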

## Setup

In [None]:
import requests

auth_token = "maio_cf75a54b_d24c_4ca3_b9c2_4b90d9d61d6f"

def make_payload(x):
    # This will convert your torch tensor / numpy array
    # into a list of floats
    return {"solution": x.flatten().tolist()}

url = "https://competitions.aiolympiad.my/api/selection2_2025_day2/selection2_2025_dieting_network"

def post_answer(data: dict):
    response = requests.post(url=url, json=data, headers={"X-API-Key": auth_token})
    if response.status_code == 200:
        return response.json()
    else:
        return f"Failed to submit, status code is {response.status_code}\n{response.text}"

## Datasets

In [1]:
!curl https://storage.googleapis.com/aiolympiadmy/ioai-2025-tsp/ioai2025_tsp_selection2/dieting_network/X_train.pt -o X_train.pt
!curl https://storage.googleapis.com/aiolympiadmy/ioai-2025-tsp/ioai2025_tsp_selection2/dieting_network/X_val.pt -o X_val.pt
!curl https://storage.googleapis.com/aiolympiadmy/ioai-2025-tsp/ioai2025_tsp_selection2/dieting_network/X_test.pt -o X_test.pt
!curl https://storage.googleapis.com/aiolympiadmy/ioai-2025-tsp/ioai2025_tsp_selection2/dieting_network/y_train.pt -o y_train.pt
!curl https://storage.googleapis.com/aiolympiadmy/ioai-2025-tsp/ioai2025_tsp_selection2/dieting_network/y_val.pt -o y_val.pt

In [None]:
import torch
import torch.nn as nn
from sklearn.metrics import r2_score

with open("X_train.pt", "rb") as f: X_train = torch.load(f)
with open("y_train.pt", "rb") as f: y_train = torch.load(f)
with open("X_val.pt", "rb") as f: X_val = torch.load(f)
with open("y_val.pt", "rb") as f: y_val = torch.load(f)
with open("X_test.pt", "rb") as f: X_test = torch.load(f)

## Baseline network

In [None]:
class FixedLinear(nn.Module):
    """
    Similar to a nn.Linear layer, just not trainable
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.ones(out_features, in_features)

    def forward(self, x):
        return torch.mm(x, self.weight.t())

In [None]:
class DietNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = FixedLinear(8, 5)
        self.act1 = nn.Identity() # <- Replace me!
        self.layer2 = FixedLinear(5, 5)
        self.act2 = nn.Identity() # <- Replace me!
        self.layer3 = FixedLinear(5, 5)
        self.act3 = nn.Identity() # <- Replace me!
        self.layer4 = FixedLinear(5, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.act2(x)
        x = self.layer3(x)
        x = self.act3(x)
        x = self.layer4(x)
        return x

In [None]:
model = DietNetwork()
model.eval();

In [None]:
criterion = nn.MSELoss()

In [None]:
with torch.no_grad():
    y_train_pred = model(X_train)
    y_val_pred = model(X_val)
    
    train_r2 = r2_score(y_train, y_train_pred)
    val_r2 = r2_score(y_val, y_val_pred)
    
    print(
        f"train / val R2: {train_r2:.4f} / {val_r2:.4f}"
    )

In [None]:
with torch.no_grad():
    y_test_pred = model(X_test)

In [None]:
# Should be (400, 1)
y_test_pred.shape

In [None]:
# As long as your y_test_pred follows the format of this baseline output,
# your API submission will work!
post_answer(make_payload(y_test_pred))

## Your work below

In [None]:
class FixedLinear(nn.Module):
    """
    Similar to a nn.Linear layer, just not trainable
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.ones(out_features, in_features)

    def forward(self, x):
        return torch.mm(x, self.weight.t())
    
class PolyAct(nn.Module):
    """Degree-4 polynomial activation with 5 fixed coefficients (c0..c4)."""

    def __init__(self, coefficients):
        super().__init__()
        # Store the 5 coefficients of the degree-4 polynomial as a float tensor
        self.coefficients = torch.tensor(coefficients, dtype=torch.float32)
        if self.coefficients.numel() != 5:
            raise ValueError("PolyAct expects exactly 5 coefficients")

    def forward(self, x):
        # x has shape (batch_size, 5); after layer1 (all-ones weights) every column
        # holds the same value s, the sum of the 8 input features
        s = x[:, 0:1]  # Shape: (batch_size, 1)
        # Compute powers s^0, s^1, s^2, s^3, s^4
        powers = torch.cat([s**i for i in range(5)], dim=1)  # Shape: (batch_size, 5)
        # Column i holds c_i * s^i; the next all-ones layer sums the columns into p(s)
        out = powers * self.coefficients  # Shape: (batch_size, 5)
        return out

In [None]:
class DietNetwork(nn.Module):
    def __init__(self, coefficients):
        super().__init__()
        self.layer1 = FixedLinear(8, 5)

        self.act1 = PolyAct(coefficients)  # degree-4 polynomial in s

        self.layer2 = FixedLinear(5, 5)

        self.act2 = nn.Identity() # <- Replace me!

        self.layer3 = FixedLinear(5, 5)

        self.act3 = nn.Identity() # <- Replace me!

        self.layer4 = FixedLinear(5, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.act2(x)
        x = self.layer3(x)
        x = self.act3(x)        
        x = self.layer4(x)
        return x

In [None]:
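# s = sum of the 8 input features, which is what layer1 writes into every column
s_train = X_train.sum(dim=1, keepdim=True).numpy()
# layer3 and layer4 each multiply by 5, so the network outputs 25 * p(s): fit p to y / 25
y_train_scaled = y_train.numpy() / 25.0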

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_features = PolynomialFeatures(degree=4, include_bias=True)
s_train_poly = poly_features.fit_transform(s_train)

reg = LinearRegression(fit_intercept=False)
reg.fit(s_train_poly, y_train_scaled)
coefficients = reg.coef_.flatten()

In [None]:
model = DietNetwork(coefficients)
model.eval()

DietNetwork(
  (layer1): FixedLinear()
  (act1): PolyAct()
  (layer2): FixedLinear()
  (act2): Identity()
  (layer3): FixedLinear()
  (act3): Identity()
  (layer4): FixedLinear()
)

In [None]:
with torch.no_grad():
    y_train_pred = model(X_train)
    y_val_pred = model(X_val)
    
    train_r2 = r2_score(y_train, y_train_pred)
    val_r2 = r2_score(y_val, y_val_pred)
    
    print(
        f"train / val R2: {train_r2:.4f} / {val_r2:.4f}"
    )

train / val R2: 0.7375 / 0.7630


In [None]:
with torch.no_grad():
    y_test_pred = model(X_test)

In [None]:
# Should be (400, 1)
y_test_pred.shape

torch.Size([400, 1])

In [None]:
post_answer(make_payload(y_test_pred))

{'status': 'SUCCESS',
 'message': 'Answer for challenge selection2_2025_dieting_network submitted successfully on 2025-06-14 10:47:44.817791+00:00. Total submissions is 13 / 100.'}

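With every weight fixed to 1, `layer1` writes the same value `s` (the sum of the 8 input features) into all 5 columns, so the network can only ever be a function of `s`. Each activation is limited to 5 parameters, which is exactly enough for the 5 coefficients of a 4th-order polynomial `p(s) = c0 + c1*s + c2*s^2 + c3*s^3 + c4*s^4`. `PolyAct` puts `c_i * s^i` into column `i`, `layer2` sums the columns into `p(s)`, and the remaining two all-ones layers multiply the result by 5 * 5 = 25. I therefore fitted the coefficients by least squares on `(s, y / 25)` and left `act2` and `act3` as identity.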
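
The factor of 25 used when scaling `y_train` can be checked without PyTorch. The sketch below re-implements the data flow in plain Python, assuming `act1` is the polynomial activation and `act2`, `act3` are identity as in the cells above; the function names and the coefficient values are made up for illustration.

```python
def poly_act(row, coefficients):
    """Mimic PolyAct: take s from the first column, emit c_i * s**i per column."""
    s = row[0]
    return [c * s**i for i, c in enumerate(coefficients)]


def diet_network(features, coefficients):
    s = sum(features)                    # layer1: each of the 5 units equals s
    h = poly_act([s] * 5, coefficients)  # act1: column i holds c_i * s**i
    p = sum(h)                           # layer2: each unit equals p(s)
    h = [sum([p] * 5)] * 5               # layer3 (act2 is identity): 5 * p(s)
    return sum(h)                        # layer4 (act3 is identity): 25 * p(s)


coefficients = [1.0, 0.5, 0.0, 0.0, 2.0]  # illustrative values only
features = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
s = sum(features)
p_of_s = sum(c * s**i for i, c in enumerate(coefficients))
print(diet_network(features, coefficients), 25 * p_of_s)
```

Both printed values agree, which is why the regression target is `y / 25` rather than `y`.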