# Click Dataset - NODE 8-Layer Baseline

This notebook trains an 8-layer NODE (Neural Oblivious Decision Ensembles) model on the Click dataset from the 2012 KDD Cup.

**Dataset Info:**
- Size: ~800K training samples, 200K test samples
- Features: 11 (a mix of categorical and numerical)
- Task: Binary classification (click prediction)
- Source: 2012 KDD Cup competition

**Model Configuration:**
- 8 layers, 128 trees per layer
- Tree depth: 6
- Tree output dimension (`tree_dim`): 3
- Feature-choice function: `entmax15`
- Split (binarization) function: `entmoid15`
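`entmax15` (1.5-entmax) is a sparse alternative to softmax. It returns a probability vector but can assign exactly zero weight to low-scoring entries. `entmoid15` is its two-class counterpart and corresponds to the first output of `entmax15([x, 0])`. The NumPy sketch below is a standalone reimplementation for illustration, using the exact sort-based algorithm. It is not the `lib` code, and the function names only mirror the config above.

```python
import numpy as np

def entmax15(z):
    """Exact 1.5-entmax of a 1-D score vector via sorting.

    Output p_i = max(z_i / 2 - tau, 0) ** 2, with tau chosen so that sum(p) == 1.
    """
    z = np.asarray(z, dtype=float) / 2.0
    z = z - z.max()                      # shifting all scores leaves the output unchanged
    zs = np.sort(z)[::-1]                # scores in descending order
    k = np.arange(1, z.size + 1)         # candidate support sizes 1..d
    mean = np.cumsum(zs) / k
    mean_sq = np.cumsum(zs ** 2) / k
    # tau_k solves sum_{j<=k} (zs_j - tau)^2 == 1 for each candidate support size k
    delta = (1.0 - k * (mean_sq - mean ** 2)) / k
    tau = mean - np.sqrt(np.clip(delta, 0.0, None))
    support = int(np.sum(tau <= zs))     # support size: count of k with tau_k <= k-th largest score
    tau_star = tau[support - 1]
    return np.clip(z - tau_star, 0.0, None) ** 2

def entmoid15(x):
    """Closed form of entmax15([x, 0])[0], applied elementwise."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    tau = (a + np.sqrt(np.clip(8.0 - a ** 2, 0.0, None))) / 2.0
    tau = np.where(tau <= a, 2.0, tau)   # |x| >= 2: the zero entry drops out of the support
    y_neg = 0.25 * np.clip(tau - a, 0.0, None) ** 2
    return np.where(x >= 0, 1.0 - y_neg, y_neg)
```

For example, `entmax15([3.0, 1.0, -2.0])` puts all of its mass on the first score and returns exact zeros for the other two, whereas softmax would give every entry a positive weight. In a soft decision tree, this sparsity lets each split attend to only a few features.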


In [None]:
import os

# PYTORCH_CUDA_ALLOC_CONF should be set before CUDA is initialized, because the
# caching allocator reads it when it is first set up; the device queries below
# already initialize CUDA, so set it here rather than in a later cell
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F

# Make the parent directory importable so the local NODE package `lib` can be found
sys.path.insert(0, '..')
import lib
from qhoptim.pyt import QHAdam

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")


In [None]:
# Select the device and release any cached GPU memory
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

if torch.cuda.is_available():
    torch.cuda.empty_cache()


## Data Loading

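`lib.Dataset` downloads the Click dataset automatically if it is not already present. Categorical features are target-encoded with a `LeaveOneOutEncoder`, and `quantile_transform=True` additionally applies a quantile transform to the features.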
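Leave-one-out target encoding replaces each category with the mean target of the *other* training rows in that category, so a row's own label never leaks into its feature. Below is a minimal sketch of the idea. The helper names `loo_fit` and `loo_transform` are hypothetical. The sketch also assumes that unseen categories and single-row categories fall back to the global training mean. The real `LeaveOneOutEncoder` has its own options for those cases.

```python
import numpy as np

def loo_fit(categories, targets):
    """Accumulate per-category target sums and counts on the training split."""
    sums, counts = {}, {}
    for cat, t in zip(categories, targets):
        sums[cat] = sums.get(cat, 0.0) + t
        counts[cat] = counts.get(cat, 0) + 1
    prior = float(np.mean(targets))  # fallback value (an assumption of this sketch)
    return sums, counts, prior

def loo_transform(categories, stats, targets=None):
    """Encode categories; pass `targets` only for the rows the stats were fit on."""
    sums, counts, prior = stats
    encoded = np.empty(len(categories))
    for i, cat in enumerate(categories):
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        if targets is not None:
            # Leave this row out: remove its own target from the category statistics
            s, n = s - targets[i], n - 1
        encoded[i] = s / n if n > 0 else prior
    return encoded

train_cats = ['a', 'a', 'a', 'b', 'b', 'c']
train_y = [1, 0, 1, 0, 0, 1]
stats = loo_fit(train_cats, train_y)
train_enc = loo_transform(train_cats, stats, train_y)  # leave-one-out means
test_enc = loo_transform(['a', 'b', 'z'], stats)       # full means; 'z' is unseen
```

The second `'a'` row (label 0) is encoded as 1.0, the mean of the two other `'a'` rows. Held-out rows instead receive the full category mean (2/3 for `'a'`).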


In [None]:
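# Load Click data via lib.Dataset (downloads the data on first use)
print("Loading Click dataset...")
data = lib.Dataset(
    'CLICK',
    random_state=1337,        # random seed used by the dataset preprocessing
    quantile_transform=True,  # quantile-transform every feature
    quantile_noise=1e-3,      # small jitter applied when fitting the quantile transform
    valid_size=100_000,       # number of samples in the validation split
    validation_seed=1337      # seed for the train/validation split
)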

print(f"Training set: {data.X_train.shape}")
print(f"Validation set: {data.X_valid.shape}")
print(f"Test set: {data.X_test.shape}")
print(f"Feature dimension: {data.X_train.shape[1]}")
print(f"Class distribution (train): {np.bincount(data.y_train)}")
print(f"Class distribution (valid): {np.bincount(data.y_valid)}")
print(f"Class distribution (test): {np.bincount(data.y_test)}")

in_features = data.X_train.shape[1]  # input dimension for the NODE model
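With `quantile_transform=True`, each feature is mapped through its empirical distribution so that skewed or heavy-tailed inputs end up on a comparable scale. `quantile_noise` adds a small jitter before fitting. The NumPy sketch below illustrates the idea for a single column. It is not `lib.Dataset`'s implementation. The normal output distribution, the noise scaling, and the helper names `fit_quantile_normal`/`transform_quantile_normal` are all assumptions.

```python
import numpy as np
from statistics import NormalDist

def fit_quantile_normal(column, noise=1e-3, seed=1337):
    """Return sorted (jittered) training values used as reference quantiles."""
    rng = np.random.default_rng(seed)
    column = np.asarray(column, dtype=float)
    scale = column.std() if column.std() > 0 else 1.0
    # Jitter only the fitting copy; noise proportional to the column's spread (assumed)
    jittered = column + noise * scale * rng.standard_normal(column.shape)
    return np.sort(jittered)

def transform_quantile_normal(column, reference, eps=1e-7):
    """Map values through the empirical CDF of `reference`, then the inverse normal CDF."""
    column = np.asarray(column, dtype=float)
    ranks = np.searchsorted(reference, column, side='right')
    u = np.clip(ranks / len(reference), eps, 1.0 - eps)  # keep away from 0 and 1
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])

# A feature with heavy ties, like an encoded categorical column
train_col = np.array([0.0] * 50 + [1.0] * 50)
ref = fit_quantile_normal(train_col)
encoded = transform_quantile_normal([0.0, 1.0], ref)
```

On this tied column, the jitter spreads the 50 training zeros around 0.0. As a result, 0.0 maps near the middle of its tied block (roughly the 25th percentile, a negative value) instead of the block's upper edge. Without jitter, 1.0 would land at the extreme upper tail of the normal distribution.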
