This notebook contains the training of two models: TabNet (with the pytorch_tabnet library) and a neural network built with PyTorch Lightning.

In [35]:
import torch

In [36]:
# Load the train / validation / test tensors saved earlier
X_train = torch.load('train_inputs')
y_train = torch.load('train_outputs')

X_val = torch.load('val_inputs')
y_val = torch.load('val_outputs')

X_test = torch.load('test_inputs')
y_test = torch.load('test_outputs')


In [37]:
y_test


tensor([6.4586, 6.5430, 6.5545,  ..., 6.3887, 6.3819, 7.0265])

In [38]:
from torch.utils.data import DataLoader, TensorDataset

# Detach tensors to avoid DataLoader serialization issues
X_train_detached = X_train.detach()
y_train_detached = y_train.detach()
X_test_detached = X_test.detach()
y_test_detached = y_test.detach()
X_val_detached = X_val.detach()
y_val_detached = y_val.detach()

# transformation to tensor dataset
train_dataset = TensorDataset(X_train_detached, y_train_detached)
test_dataset = TensorDataset(X_test_detached, y_test_detached)
val_dataset = TensorDataset(X_val_detached, y_val_detached)

# DataLoaders for batching (only the training set is shuffled)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4, persistent_workers=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4, persistent_workers=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4, persistent_workers=True)

In [39]:
# Example of iterating through the DataLoader
ex_X, ex_y = next(iter(train_loader))
ex_X.shape, ex_y.shape, ex_X, ex_y

(torch.Size([32, 44]),
 torch.Size([32]),
 tensor([[ 4.0000,  3.0000, -0.6466,  ...,  0.7807, -0.3420, -1.2116],
         [ 5.0000,  2.0000, -0.8801,  ...,  0.3436,  0.2235, -0.0520],
         [ 2.0000,  5.0000,  0.6191,  ..., -0.6025,  0.0446,  0.6243],
         ...,
         [ 4.0000,  4.0000, -0.1374,  ...,  0.4495,  0.7099, -0.1533],
         [ 1.0000,  9.0000, -0.5541,  ..., -1.9610, -2.2232,  1.6628],
         [ 2.0000,  5.0000,  0.3740,  ..., -0.7541,  0.3627, -0.9050]]),
 tensor([7.1826, 7.1765, 7.9275, 6.1079, 6.9360, 7.3812, 6.8740, 7.9122, 7.0419,
         6.4387, 6.3850, 7.9384, 7.7129, 6.8277, 7.9026, 6.4268, 6.5961, 6.1645,
         5.2566, 5.5081, 7.4772, 7.4965, 6.3104, 6.0463, 7.3193, 7.1586, 6.2153,
         6.0908, 7.0439, 6.7717, 6.1346, 6.3696]))

In [40]:
# TabNet
from pytorch_tabnet.tab_model import TabNetRegressor

# Initialize TabNetRegressor
tab_net = TabNetRegressor()



In [41]:
# Fit the model
tab_net.fit(
    X_train=X_train_detached.numpy(),
    y_train=y_train_detached.numpy().reshape(-1, 1),  # 2D target for a single output
    eval_set=[(X_val_detached.numpy(), y_val_detached.numpy().reshape(-1, 1))],  # validation set, same 2D target shape
    eval_name=['val'],  # name of the eval set
    eval_metric=['rmse'],  # metric evaluated on the eval set
    max_epochs=10,  # maximum number of epochs
    patience=3,  # early stopping patience
    batch_size=1024,
    virtual_batch_size=128,  # batch size and virtual batch size
    num_workers=4,  # number of workers for data loading
    drop_last=False,  # keep the last incomplete batch
)


epoch 0  | loss: 0.699   | val_rmse: 0.09278 |  0:01:05s
epoch 1  | loss: 0.01409 | val_rmse: 0.07995 |  0:02:09s
epoch 2  | loss: 0.01049 | val_rmse: 0.07148 |  0:03:12s
epoch 3  | loss: 0.00987 | val_rmse: 0.07771 |  0:04:17s
epoch 4  | loss: 0.01024 | val_rmse: 0.07504 |  0:05:19s
epoch 5  | loss: 0.00894 | val_rmse: 0.07347 |  0:06:22s

Early stopping occurred at epoch 5 with best_epoch = 2 and best_val_rmse = 0.07148
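The early-stopping message above follows from `patience=3`: training stops once the validation score has failed to improve for three consecutive epochs. A minimal sketch of that rule in plain Python, replaying the logged `val_rmse` values. The helper `early_stop_epoch` is hypothetical and written for illustration only; it is not the pytorch_tabnet implementation.

```python
def early_stop_epoch(scores, patience):
    """Replay a patience-based early-stopping rule on validation scores.

    Lower scores are better. Returns (stop_epoch, best_epoch, best_score).
    Hypothetical helper for illustration, not part of pytorch_tabnet.
    """
    best_score = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores):
        if score < best_score:
            # New best score: remember it and reset the counter
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch, best_score
    # Patience never exhausted: training ran to the last epoch
    return len(scores) - 1, best_epoch, best_score


# val_rmse values copied from the training log above
val_rmse = [0.09278, 0.07995, 0.07148, 0.07771, 0.07504, 0.07347]
stop_epoch, best_epoch, best_score = early_stop_epoch(val_rmse, patience=3)
print(stop_epoch, best_epoch, best_score)
```

Epochs 3, 4 and 5 all score worse than epoch 2, so the counter reaches the patience limit at epoch 5.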




In [42]:
# Evaluate the model on the test set: (MSE, MAE, R2)
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_test_np = y_test_detached.numpy()
preds = tab_net.predict(X_test_detached.numpy()).ravel()  # flatten (n, 1) -> (n,)
mean_squared_error(y_test_np, preds), mean_absolute_error(y_test_np, preds), r2_score(y_test_np, preds)
# Result kept from an earlier run: (0.002001001499593258, 0.03651253506541252, 0.9957746863365173)

(0.005190330091863871, 0.057450033724308014, 0.9890401363372803)
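For reference, the three numbers in this tuple are MSE, MAE and R². The sketch below computes them by hand in plain Python so the definitions are explicit. The function `regression_metrics` and the toy values are made up for illustration; they are not the notebook's data.

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R2) for two equally long lists of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mse = ss_res / n                                 # mean squared error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return mse, mae, r2


# Toy values (hypothetical, not taken from the dataset)
toy_true = [6.0, 6.5, 7.0, 7.5]
toy_pred = [6.1, 6.4, 7.1, 7.4]
toy_mse, toy_mae, toy_r2 = regression_metrics(toy_true, toy_pred)
print(toy_mse, toy_mae, toy_r2)
```

R² compares the squared error of the model with that of a constant predictor (the mean of the targets), which is why it approaches 1 when the residuals are small relative to the spread of the target.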

In [43]:
# Neural Network with Lightning
from lightning import Trainer
from torch import nn
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch import LightningModule
from lightning.pytorch.loggers import TensorBoardLogger
from torchmetrics.regression import  MeanSquaredError, MeanAbsoluteError, R2Score

In [44]:
# check if GPU is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [51]:
# Define the model
class LitModel(LightningModule):
    def __init__(self):
        super(LitModel, self).__init__()
        
        # Define the layers of the neural network
        self.layer1 = nn.Sequential(nn.Linear(X_train_detached.shape[1], 128), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
        self.layer4 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    # Training step
    def training_step(self, batch, batch_idx):
        x, y = batch
        y = y.view(-1, 1) # reshape for single output
        logits = self(x)
        loss = nn.functional.mse_loss(logits, y)
        preds = logits.detach() # detach to avoid tracking in autograd
        target = y.detach() # detach to avoid tracking in autograd

        # Per-batch metrics, created on the module's own device instead of the global `device`
        self.log("train_mse", MeanSquaredError().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)  # log MSE
        self.log("train_mae", MeanAbsoluteError().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)  # log MAE
        self.log("train_r2", R2Score().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)  # log R2
        return loss
    
    # Validation step
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y = y.view(-1, 1)
        logits = self(x)
        loss = nn.functional.mse_loss(logits, y)
        preds = logits.detach()
        target = y.detach()
        # Per-batch metrics, created on the module's own device instead of the global `device`
        self.log("val_mse", MeanSquaredError().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)
        self.log("val_mae", MeanAbsoluteError().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)
        self.log("val_r2", R2Score().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)
        return loss
    
    # Test step
    def test_step(self, batch, batch_idx):
        x, y = batch
        y = y.view(-1, 1)
        logits = self(x)
        loss = nn.functional.mse_loss(logits, y)
        preds = logits.detach()
        target = y.detach()
        # Per-batch metrics, created on the module's own device instead of the global `device`
        self.log("test_mse", MeanSquaredError().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)
        self.log("test_mae", MeanAbsoluteError().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)
        self.log("test_r2", R2Score().to(self.device)(preds, target), prog_bar=True, on_step=False, on_epoch=True)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3) # Adam optimizer
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5) # Learning rate scheduler
        return [optimizer], [scheduler]
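To see what `StepLR(optimizer, step_size=5, gamma=0.5)` does to the learning rate, the sketch below evaluates the schedule in plain Python. It assumes the scheduler is stepped once per epoch, which is how Lightning treats a scheduler returned this way by default. The function `step_lr` is a hypothetical helper, not a PyTorch API.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate at a given epoch under a step decay schedule.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    Hypothetical helper for illustration.
    """
    return base_lr * gamma ** (epoch // step_size)


# Same settings as configure_optimizers: lr=1e-3, step_size=5, gamma=0.5
schedule = [step_lr(1e-3, 5, 0.5, epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:g}")
```

With `max_epochs=10`, the rate stays at 1e-3 for epochs 0 to 4 and is halved to 5e-4 for epochs 5 to 9.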


In [52]:
# Initialize the model on the device selected above (GPU if available)
model = LitModel().to(device)

# Initialize the trainer
trainer = Trainer(
    accelerator="gpu",
    # devices=0,
    max_epochs=10,
    callbacks=[EarlyStopping(monitor="val_mse", patience=3)], # early stopping callback
    logger=TensorBoardLogger("logs/") # log to TensorBoard
)


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


In [53]:
model.device

device(type='cuda', index=0)

In [54]:
# Train the model
trainer.fit(model, train_loader, val_loader)

# Test the model
trainer.test(model, test_loader)

# Validate the model
trainer.validate(model, val_loader)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name   | Type       | Params | Mode 
----------------------------------------------
0 | layer1 | Sequential | 5.8 K  | train
1 | layer2 | Sequential | 8.3 K  | train
2 | layer3 | Sequential | 2.1 K  | train
3 | layer4 | Linear     | 33     | train
----------------------------------------------
16.1 K    Trainable params
0         Non-trainable params
16.1 K    Total params
0.065     Total estimated model params size (MB)
10        Modules in train mode
0         Modules in eval mode
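The parameter counts in this summary can be checked by hand: a fully connected layer with `in_features` inputs and `out_features` outputs has one weight per input/output pair plus one bias per output. A short sketch, taking the input width of 44 from the batch shape `torch.Size([32, 44])` shown earlier; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Number of trainable parameters of a dense layer (weights + biases)."""
    return in_features * out_features + out_features


# (in_features, out_features) for layer1..layer4 of LitModel
layer_sizes = [(44, 128), (128, 64), (64, 32), (32, 1)]
counts = [linear_params(i, o) for i, o in layer_sizes]
total = sum(counts)
print(counts, total)
```

The ReLU activations add no parameters, so the total is only the four linear layers.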



`Trainer.fit` stopped: `max_epochs=10` reached.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: |          | 0/? [00:00<?, ?it/s]

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Validation: |          | 0/? [00:00<?, ?it/s]

[{'val_mse': 0.0016909410478547215,
  'val_mae': 0.03360701724886894,
  'val_r2': 0.9960504174232483}]
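One caveat when reading `val_r2`: the steps compute each metric on a single batch, and `self.log(..., on_epoch=True)` then averages the logged per-batch values. For MSE and MAE with equal batch sizes that average equals the value over the whole set, but for R² it generally does not, because each batch uses its own target mean. A minimal sketch with made-up toy batches (not the notebook's data):

```python
def r2(y_true, y_pred):
    """Coefficient of determination for two equally long lists."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Two toy batches (hypothetical values): targets differ a lot between batches,
# but only a little within each batch
batches = [
    ([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]),
    ([10.0, 10.5, 11.0], [10.2, 10.4, 10.9]),
]

# Average of per-batch R2, which is what epoch-level averaging of batch values gives
mean_batch_r2 = sum(r2(t, p) for t, p in batches) / len(batches)

# R2 computed once over all samples
all_true = [v for t, _ in batches for v in t]
all_pred = [v for _, p in batches for v in p]
global_r2 = r2(all_true, all_pred)

print(mean_batch_r2, global_r2)
```

If an R² over the full validation set is needed, one option is to keep a single `R2Score` instance as a module attribute and log the metric object, so that torchmetrics accumulates state across batches.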

In [59]:
%reload_ext tensorboard
%tensorboard --logdir logs/


Reusing TensorBoard on port 6007 (pid 4468), started 0:00:16 ago. (Use '!kill 4468' to kill it.)

In [68]:
p = trainer.predict(model, ex_X.to(device))
torch.tensor(p).device

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Predicting: |          | 0/? [00:00<?, ?it/s]

device(type='cpu')

In [83]:
for p in pred:
    print(p)

3099002.2
3564687.8
3308868.8
53907584.0
838995.2
9060325.0
2265644.2
1533385.4
10537047.0
8630332.0
83543140.0
33295760.0
3975252.8
291948.53
601938.3
14978192.0
28004354.0
120472990.0
36804290.0
5837697.0
2177140.0
30120740.0
12189962.0
3487022.2
452428.56
7284201.0
4716561.0
1287626.4
11724625.0
2129782.8
216250.97
294929340.0


In [85]:
import pandas as pd
import numpy as np

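# Predict on one test batch and map predictions back to the original scale
ex_X, ex_y = next(iter(test_loader))
model.eval()
# trainer.predict iterates over its input, so the raw tensor is consumed row by row
# and preds is a list holding one single-element tensor per sample
preds = trainer.predict(model, ex_X.to(device))
pred = np.power(10, torch.tensor(preds).cpu().detach().numpy().flatten())  # 10**value, the inverse of a log10 transform
y_true = np.power(10, ex_y.numpy().flatten())
# Put predictions and targets side by side in a DataFrame
dict_preds = {'preds': [float(p) for p in pred], 'target': [float(t) for t in y_true]}
pd.DataFrame(dict_preds).head()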


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Predicting: |          | 0/? [00:00<?, ?it/s]

        preds      target
0   3099002.0   2875075.0
1   3564688.0   3491493.0
2   3308869.0   3585000.0
3  53907580.0  57611980.0
4    838995.2    852313.2
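On the original scale, a relative error is easier to read than an absolute one because the targets span several orders of magnitude. The sketch below computes it for the five rows displayed above; it only illustrates the calculation and says nothing about the rest of the test set.

```python
# (prediction, target) pairs copied from the five rows shown above
rows = [
    (3099002.0, 2875075.0),
    (3564688.0, 3491493.0),
    (3308869.0, 3585000.0),
    (53907580.0, 57611980.0),
    (838995.2, 852313.2),
]

# Absolute error divided by the target, per row
relative_errors = [abs(p - t) / t for p, t in rows]
mean_relative_error = sum(relative_errors) / len(relative_errors)

for (p, t), err in zip(rows, relative_errors):
    print(f"pred={p:>12.1f}  target={t:>12.1f}  relative error={err:.1%}")
print(f"mean relative error over these rows: {mean_relative_error:.1%}")
```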


# Conclusion

## Results

Both TabNet (mean_squared_error: 0.002001001499593258, mean_absolute_error: 0.03651253506541252, r2_score: 0.9957746863365173) and the neural network give very good results. In the EDA part, most columns showed a very weak correlation (the heatmap only measures linear correlation), while the budget columns showed a good one. Such good results are therefore expected, because the models can rely mostly on those columns.

## Brief explanation

In [None]:
A weak linear correlation between the columns (features) and the target does not necessarily prevent a model (especially a neural network) from performing well:

1. Correlation ≠ real relationship

Simple (Pearson) correlation only captures linear relationships.

If the relationship is non-linear (for example quadratic, logarithmic, or an interaction between variables), the correlation can be close to 0 while the model still manages to exploit this structure.

Example:

$y = x^2$

If $x$ is centered around 0, the correlation between $x$ and $y$ is very low or even zero, yet a non-linear regression learns the relationship easily.

2. Power of complex models

Neural networks (even small ones) capture complex interactions between variables that a simple correlation calculation misses.

This explains why you can get a high R² or accuracy despite columns with very little correlation.

3. Multivariate ≠ univariate

Each column taken alone can have a weak correlation with the target.

But several columns combined can carry strong information.

Assembling these weak signals is precisely the model's role.

✅ In practice:

A weak correlation is not a problem if your model learns well (good loss, good R², no overfitting).
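The quadratic example from point 1 can be checked numerically. The sketch below computes the Pearson correlation by hand on a symmetric grid of values for x (a toy setup chosen for illustration) and compares it with the correlation once x is squared.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# x on a grid centered on 0, y fully determined by x
x = [i / 10 for i in range(-20, 21)]
y = [v * v for v in x]

r_linear = pearson(x, y)                       # correlation of y with the raw feature
r_squared_feature = pearson([v * v for v in x], y)  # correlation of y with x squared

print(f"corr(x, y)   = {r_linear:.6f}")
print(f"corr(x^2, y) = {r_squared_feature:.6f}")
```

The target is an exact function of x, yet its linear correlation with x vanishes by symmetry. A model that can build the squared feature internally recovers the full relationship.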