<p align="center">
    <img src="https://upload.wikimedia.org/wikipedia/commons/7/74/Logo_%C3%89cole_normale_sup%C3%A9rieure_-_PSL_%28ENS-PSL%29.svg"
             alt="ENS-PSL"
             width="475"
             style="margin-right: 30px; display: inline-block; vertical-align: middle;"/>
    <img src="https://challengedata.ens.fr/logo/public/CFM_CoRGB_300dpi_Tight_box_Er2kNvB.png"
             alt="Capital Fund Management (CFM)"
             width="260"
             style="display: inline-block; vertical-align: middle;"/>
</p>

# Capital Fund Management - Intraday Price Path Classification
**Predicting Afternoon (14h - 16h) Price Direction from Morning Intraday Trajectories**

## Data Challenge 
**Powered by ENS** 

<h3><span style="color:#800000;"><strong>Authored by:</strong> <em>Alexandre Mathias DONNAT, Sr</em></span></h3>

**Currently ranked 139/260** on *https://challengedata.ens.fr/challenges/84*

This notebook presents a gradient-boosting framework for predicting afternoon price direction from morning intraday paths of anonymous equities.

The objective is to classify each 09:30–14:00 price trajectory into one of three classes (strong down / flat / strong up), using only the 53 five-minute returns observed before 14:00.

The dataset is unusually large for a tabular intraday challenge: the training set contains 843,299 rows and the test set 885,799, each corresponding to a different (day, equity) scenario. Memory efficiency and feature-engineering discipline are therefore essential.
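To make the memory constraint concrete, here is a back-of-the-envelope estimate of the dense size of the 53 return columns, using the row counts printed by the loading cell later in this notebook. The `dense_size_mb` helper is hypothetical (it is not part of the pipeline), and the figures ignore pandas overhead and the ID/day/equity columns.

```python
import numpy as np

# Row counts as printed by the data-loading cell; 53 return columns r0..r52.
N_TRAIN, N_TEST, N_RETURNS = 843_299, 885_799, 53

def dense_size_mb(n_rows: int, n_cols: int, dtype) -> float:
    """Size in MB (10**6 bytes) of a dense n_rows x n_cols array of `dtype`."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 1e6

for name, n_rows in [("train", N_TRAIN), ("test", N_TEST)]:
    mb64 = dense_size_mb(n_rows, N_RETURNS, np.float64)
    mb32 = dense_size_mb(n_rows, N_RETURNS, np.float32)
    print(f"{name}: float64 ~ {mb64:.0f} MB, float32 ~ {mb32:.0f} MB")
```

Each set takes a few hundred MB in float64, and every intermediate copy (raw frame, feature frame, NumPy matrix, LightGBM dataset) adds to that, which is why an 8 GB machine leaves little room for large feature sets.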

Each training sample consists of:

- A unique (day, equity) pair — both anonymized, and never overlapping between train and test,
- A sequence of 53 intraday returns **r0…r52** in basis points, describing the price path from 09:30 to 14:00,
- A target label **reod** ∈ {−1, 0, 1} encoding the direction and magnitude of the subsequent 14:00–16:00 return:
    - **−1** = strong negative move (< −25 bps)
    - **0** = small / flat move (between −25 and +25 bps)
    - **+1** = strong positive move (> +25 bps)

The challenge is therefore to infer the future two-hour movement using only the shape, volatility, and micro-structure patterns of the morning trajectory.


## Understanding the modeling problem

Each row (each ID) corresponds to:

- One stock on one trading day,
- A contiguous intraday path of 53 five-minute returns from 09:30 to 14:00,
- A 3-class label describing how that same stock moved between 14:00 and 16:00.

**Our task is:**

Given the 09:30–14:00 return path, predict whether the 14:00–16:00 move will be strongly down, flat, or strongly up.

Several factors make this prediction intrinsically difficult:

- Afternoon returns are small, noisy, and weakly autocorrelated.
- Crucial price drivers (news releases, liquidity shocks, order-book imbalances) are absent from the dataset.
- The competition enforces strict generalisation:
    - train and test contain different days,
    - train and test contain different equities,
    - → the model cannot memorize anything — it must capture generic intraday patterns only.

The signal-to-noise ratio is extremely low, meaning even strong models typically peak around 40–42% accuracy.

Yet, intraday trajectories still carry exploitable statistical fingerprints:

- **Directional imbalance** (`ret_sum`, net bias of positive vs negative returns),
- **Intraday volatility profile** (`ret_std`, count of large moves),
- **Shock frequency** (`big_move_count_25` / `50`),
- Cases where strong morning trends extend vs mean-revert in the afternoon.


## Data description

Three CSV files are provided:

#### 1) `x_train.csv` – Intraday features (inputs)

Each row corresponds to one (day, equity) observation, uniquely identified by an ID. Main columns:

- **ID** – unique row identifier,
- **day** – day identifier (no overlap between train and test),
- **equity** – stock identifier (also disjoint between train and test),
- **r0 … r52** – 53 intraday returns in basis points:

$$r_t = \frac{P_{t+5\text{min}} - P_t}{P_t} \times 10^4$$

These 53 returns encode the morning price trajectory from 09:30 to 14:00 for each stock and each day.
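As an illustration of the formula above, the following sketch computes basis-point returns from a short price series. The prices are made up and `returns_bps` is a hypothetical helper; the challenge files already contain the returns, so this step is not needed in the pipeline.

```python
import numpy as np

def returns_bps(prices) -> np.ndarray:
    """Simple returns between consecutive prices, expressed in basis points."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] - prices[:-1]) / prices[:-1] * 1e4

# Made-up prices sampled every 5 minutes.
prices = np.array([100.00, 100.10, 100.10, 99.90])
print(np.round(returns_bps(prices), 2))
```

Note that n prices give n − 1 returns, so a path of 53 returns corresponds to 54 price observations.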



In [None]:
import pandas as pd
df = pd.read_csv("x_train.csv")
df.head()

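|   | ID | day | equity | r0 | r1 | r2 | r3 | r4 | r5 | r6 | ... | r43 | r44 | r45 | r46 | r47 | r48 | r49 | r50 | r51 | r52 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 249 | 1488 | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | ... | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN | 0.0 |
| 1 | 1 | 272 | 107 | -9.76 | 0.0 | -12.21 | 46.44 | 34.08 | 0.0 | 41.24 | ... | -4.83 | -16.92 | -4.84 | 4.84 | 0.0 | 7.26 | -9.68 | -19.38 | 9.71 | 26.68 |
| 2 | 2 | 323 | 1063 | 49.85 | 0.0 | 0.0 | -26.64 | -23.66 | -22.14 | 49.12 | ... | -6.37 | 1.59 | 6.37 | -49.32 | -9.59 | -6.4 | 22.41 | -6.39 | 7.99 | 15.96 |
| 3 | 3 | 302 | 513 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN |
| 4 | 4 | 123 | 1465 | -123.84 | -115.18 | -26.44 | 0.0 | 42.42 | 10.56 | 0.0 | ... | -5.36 | -21.44 | -21.48 | 10.78 | -21.55 | -5.4 | -10.81 | 5.41 | -32.47 | 43.43 |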


#### 2) `y_train.csv` – Target labels

For each ID in x_train.csv, this file provides the target:

- **ID** – matches the same ID as in x_train.csv,
- **reod** – integer in {−1, 0, 1} representing the direction of the 14:00–16:00 move:
    - **−1** = strong downward move (less than −25 bps),
    - **0** = flat / small move (between −25 and +25 bps),
    - **+1** = strong upward move (more than +25 bps).

This is the ground truth the model tries to predict.
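The labelling rule can be sketched as follows. The labels are supplied by the organisers, so this is only an illustration; `reod_from_return` is a hypothetical helper, and treating moves of exactly ±25 bps as flat is an assumption (the description does not say which side the boundary belongs to).

```python
import numpy as np

THRESHOLD_BPS = 25.0  # threshold quoted in the challenge description

def reod_from_return(ret_bps) -> np.ndarray:
    """Map afternoon returns (in bps) to classes -1 / 0 / +1."""
    ret_bps = np.asarray(ret_bps, dtype=float)
    # np.sign gives -1/0/+1; the boolean mask zeroes out moves inside the flat band.
    # Assumption: |return| == 25 bps exactly is counted as flat.
    return (np.sign(ret_bps) * (np.abs(ret_bps) > THRESHOLD_BPS)).astype(int)

# Made-up afternoon returns in bps.
print(reod_from_return([-40.0, -25.0, 3.2, 25.0, 61.5]))
```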

In [2]:
y = pd.read_csv("y_train.csv")
y.head()

|   | ID | reod |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
| 2 | 2 | -1 |
| 3 | 3 | 0 |
| 4 | 4 | -1 |


#### 3) `x_test.csv` – Test set (unlabeled)

Same structure as x_train.csv:

- ID, day, equity, r0 … r52,

Our final submission `y_prediction.csv` must contain:

- **ID**,
- **reod** – predicted class in {−1, 0, 1}.

## Evaluation metric and benchmark

The platform evaluates submissions using the classification accuracy on the hidden test set.

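- A uniformly random prediction yields ≈ 33% accuracy (3 classes). A constant prediction can do better when classes are imbalanced: the LightGBM start scores in the training log suggest that class 0 covers roughly 41% of the training rows.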
- The official benchmark pipeline reaches about 41.74% accuracy.
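For context on these baselines, the class frequencies of the training data can be read off the LightGBM training log further down in this notebook. The sketch below assumes that the three "Start training from score" values are log class frequencies (LightGBM's usual initialisation for the multiclass objective); the numbers are copied from that log.

```python
import numpy as np

# Start scores copied from the LightGBM training log, in mapped class order
# 0, 1, 2 (i.e. reod = -1, 0, +1).
# Assumption: each score is the natural log of the class frequency.
start_scores = np.array([-1.201410, -0.886657, -1.247578])
priors = np.exp(start_scores)

for reod, p in zip((-1, 0, 1), priors):
    print(f"reod={reod:+d}: prior ~ {p:.3f}")
print(f"sum of priors ~ {priors.sum():.4f}")
print(f"accuracy of always predicting the largest class ~ {priors.max():.3f}")
```

If the assumption holds, the priors sum to about 1 and the flat class is the most frequent one, so a constant prediction of 0 would score about 0.41 on data with the same class mix. The class mix of the hidden test set is not known, so this says nothing definite about the leaderboard.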

The objective of this notebook is not to aggressively tune every detail under heavy compute, but to build a clean, robust and interpretable pipeline that:

- uses only the intraday returns,
- respects realistic constraints (8 GB RAM, no extreme feature explosion),
- achieves a stable performance in the 0.40–0.42 accuracy range, i.e. competitive with the benchmark, showing that some signal is captured.

## Modelling philosophy

Predicting the sign of a small afternoon move from a noisy morning trajectory is trickier than it seems:

- the signal-to-noise ratio is low,
- markets can move on news (jumps) that are not observable in our features,
- train and test periods have different days and different stocks, so overfitting on IDs is useless.

Given these constraints, we adopt a minimalist but principled approach:

- Use the full intraday path (r0…r52) as raw features, so the model can exploit any pattern in the shape of the trajectory.
- Add only a small set of global statistics that summarize momentum and volatility, without exceeding the 8 GB of RAM available on my machine.
- Deliberately exclude day and equity from the feature set to avoid modeling specific days/stocks that do not appear in the test set (no leakage-like behavior).
- Train a single LightGBM multiclass model, carefully regularised, rather than stacking many heavy models that would not significantly beat the noise given the hardware and time budget.

This pipeline is deliberately simple, stable and explainable. It also reflects a realistic trade-off between:

- model complexity,
- feature engineering effort,
- computational resources.

## Feature engineering strategy

### Raw intraday path

The core of the feature space is simply the 53 intraday returns:

- r0, r1, …, r52.

They form a discrete trajectory of the price movement from 09:30 to 14:00.

We keep these columns as-is, after:

- casting them to float,
- replacing any missing values with 0.0.

This ensures the model sees the full temporal structure of the morning move, without any arbitrary aggregation that might destroy signal.
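A minimal sketch of this preprocessing on a toy frame (values made up, column names following the challenge files). Counting the missing values before filling them is an extra inspection step added here for illustration; it is not part of the pipeline.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column naming as the challenge files (values made up).
toy = pd.DataFrame({
    "ID": [0, 1],
    "r0": [0.0, -9.76],
    "r1": [np.nan, 0.0],
    "r2": [np.nan, -12.21],
})

# Return columns are "r" followed by digits.
ret_cols = [c for c in toy.columns if c.startswith("r") and c[1:].isdigit()]

missing_per_row = toy[ret_cols].isna().sum(axis=1)   # inspect before filling
toy[ret_cols] = toy[ret_cols].astype(float).fillna(0.0)

print(missing_per_row.tolist())
print(toy)
```

One consequence worth keeping in mind: after the fill, a missing return is indistinguishable from a genuinely flat five-minute interval.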

### Global trajectory statistics

On top of the raw path, we add a few global summary statistics that capture:

- net direction,
- volatility,
- asymmetry of moves.

Concretely, for each row:

- **ret_sum** – sum of all 53 returns: overall morning performance,
- **ret_std** – standard deviation of the 53 returns: intraday volatility,
- **pos_count** – number of positive 5-min returns,
- **neg_count** – number of negative 5-min returns,
- **big_move_count_25** – number of 5-min moves with |return| > 25 bps,
- **big_move_count_50** – number of moves with |return| > 50 bps.

**Rationale:**

- ret_sum and pos_count vs neg_count encode whether the morning was globally bullish or bearish.
- ret_std and the big_move_count_* features measure how turbulent the session was. A noisy morning may correlate with larger afternoon moves.

Together, the feature set is:

- 53 raw returns,
- 6 summary stats,
- → **total 59 features** per sample (minus any missing columns, handled robustly).
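To make the six statistics concrete, here is a worked example on a single made-up path of 6 returns instead of 53. It uses the same pandas operations as the feature-engineering cell later in the notebook.

```python
import numpy as np
import pandas as pd

# One made-up path of 6 returns (bps), short enough to check by hand.
path = pd.DataFrame([[10.0, -30.0, 0.0, 60.0, -5.0, 26.0]],
                    columns=[f"r{i}" for i in range(6)])

stats = pd.DataFrame({
    "ret_sum": path.sum(axis=1),
    "ret_std": path.std(axis=1),            # pandas default: sample std (ddof=1)
    "pos_count": (path > 0).sum(axis=1),
    "neg_count": (path < 0).sum(axis=1),
    "big_move_count_25": (path.abs() > 25).sum(axis=1),
    "big_move_count_50": (path.abs() > 50).sum(axis=1),
})
print(stats.T)
```

Zero returns count as neither positive nor negative, so `pos_count + neg_count` can be smaller than the number of intervals. Since missing values are filled with 0.0, sparse paths will show low counts on both sides.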

### Why we explicitly drop day and equity

The challenge is constructed so that:

- no day appears in both train and test,
- no equity appears in both train and test.

Using day or equity as features risks learning idiosyncratic patterns specific to some stocks or dates that will never generalize to the unseen test period.

In practice, I observed that including these identifiers tends to inflate validation performance while adding very little to the final public leaderboard accuracy. For this clean baseline, I therefore exclude them from the feature set and focus only on what is likely to generalize: the shape and volatility of the intraday path itself.

## Modelling with LightGBM

### Target encoding

The target in y_train.csv is reod ∈ {−1, 0, 1}.

LightGBM expects class indices in {0, 1, 2} for multiclass objectives.

We therefore apply a deterministic mapping:

- -1 → 0,
- 0 → 1,
- +1 → 2.

And we store the inverse mapping to reconstruct the original labels before exporting the submission.
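A quick round-trip check of this mapping on toy labels. Since the mapping is just a shift by one, the sketch also verifies that `y + 1` gives the same result as the dictionary lookup.

```python
import numpy as np

label_mapping = {-1: 0, 0: 1, 1: 2}
inv_label_mapping = {v: k for k, v in label_mapping.items()}

y = np.array([0, 0, -1, 0, -1, 1])   # toy reod values

# Dictionary lookup, element by element.
y_mapped = np.array([label_mapping[v] for v in y.tolist()])
# Inverse lookup recovers the original labels.
y_back = np.array([inv_label_mapping[v] for v in y_mapped.tolist()])

print(y_mapped, y_back)
print(np.array_equal(y_mapped, y + 1))   # the mapping is a shift by one
```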

### Train/validation split

To estimate generalisation performance, we perform a stratified train/validation split:

- 80% of the data for training,
- 20% for validation,
- stratification on the mapped target {0,1,2}.

This preserves the class distribution in both folds and gives a reasonably stable estimate of validation accuracy and F1-macro.
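A random stratified split places rows from the same day (and the same equity) in both folds, while the test set contains only unseen days and equities. As an alternative that this notebook does not use, the sketch below holds out whole days with scikit-learn's `GroupShuffleSplit`. The data are synthetic stand-ins; in the real pipeline the groups would be the `day` column.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_rows = 1_000
days = rng.integers(0, 50, size=n_rows)   # stand-in for the `day` column
X = rng.normal(size=(n_rows, 5))          # stand-in feature matrix
y = rng.integers(0, 3, size=n_rows)       # stand-in mapped labels

# test_size is the fraction of *groups* (days) held out, not of rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
tr_idx, val_idx = next(splitter.split(X, y, groups=days))

shared_days = set(days[tr_idx]) & set(days[val_idx])
print(f"train rows: {len(tr_idx)}, validation rows: {len(val_idx)}")
print(f"days shared between folds: {len(shared_days)}")
```

With a group-based split the class proportions are no longer guaranteed to match across folds; scikit-learn's `StratifiedGroupKFold` is one option when both properties matter.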

### LightGBM hyperparameters

We use LightGBM in multiclass mode with the following configuration:

- **objective** = "multiclass",
- **num_class** = 3,
- **learning_rate** = 0.05,
- **num_leaves** = 31 : relatively small trees, more regular,
- **max_depth** = -1  : let the model decide depth within the limit of num_leaves,
- **feature_fraction** = 0.9 : random feature subsampling per tree, improves robustness,
- **bagging_fraction** = 0.8, **bagging_freq** = 1 : row subsampling per iteration,
- **min_data_in_leaf** = 200 : large leaves, strong regularisation against overfitting,
- **lambda_l2** = 2.0 : L2 regularisation on leaf weights,
- **metric** = ["multi_logloss", "multi_error"]  : monitor both loss and accuracy during training,
- **n_estimators** = 2000,
- **random_state** = 42,
- **n_jobs** = -1.

These parameters are intentionally conservative:

- large min_data_in_leaf and non-trivial lambda_l2 bias the model towards smoother decision boundaries,
- feature_fraction and bagging_fraction act like a built-in ensemble, reducing variance.

Given the noisy nature of the problem and limited compute, we prefer a slightly underfit but stable model to a highly tuned, brittle one.

### Early stopping and internal metrics

Then we train the model with:

- up to 2000 boosting iterations,
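- early stopping with a patience of 200 rounds on the validation set. Because two metrics are tracked, LightGBM's callback checks both `multi_logloss` and `multi_error` by default; passing `first_metric_only=True` to `lgb.early_stopping` would restrict the check to the logloss.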
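The patience rule can be illustrated in plain Python. This is a simplified sketch of the idea for a single metric, not LightGBM's implementation; `best_iteration_with_patience` is a hypothetical helper and the loss values are made up.

```python
def best_iteration_with_patience(val_losses, patience):
    """Return (best_iter, last_iter_run), both 1-based, under a patience rule."""
    best_loss = float("inf")
    best_iter = 0
    last_iter = 0
    for last_iter, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_iter = loss, last_iter
        elif last_iter - best_iter >= patience:
            break   # no improvement for `patience` consecutive rounds
    return best_iter, last_iter

# Made-up validation losses; patience shortened from 200 to 3 for readability.
losses = [1.00, 0.90, 0.80, 0.85, 0.84, 0.83, 0.70]
best_iter, stopped_at = best_iteration_with_patience(losses, patience=3)
print(best_iter, stopped_at)
```

In this toy run training stops before the last, lower loss is ever evaluated. That is the trade-off of any patience rule, and the reason for choosing a generous patience such as 200 when the loss curve is flat and noisy.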

At the end of training, we compute:

- Validation accuracy (equal to 1 − multi_error),
- F1-macro (symmetric across the three classes),
- Full classification report (precision/recall/F1 per class).

Typical internal performance for this pipeline is around:

- Accuracy ≈ 0.49,
- F1-macro ≈ 0.48,

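which is consistent with the difficulty of the task. The gap to public leaderboard scores around 0.41 likely comes from the random split: rows from the same day and the same equity fall in both training and validation, whereas the test set contains only unseen days and equities.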

## Full training and submission generation

Once the best number of iterations `best_iter` is found on the validation split, we retrain a final LightGBM model on the entire training set:

- same hyperparameters as above,
- n_estimators = best_iter.

This final model is then used to predict class probabilities on x_test.csv, and we:

- take the argmax to obtain class indices {0,1,2},
- map them back to the original classes {−1, 0, 1},
- build the submission file:
    - ID copied from x_test.csv,
    - reod = predicted class.
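The post-processing steps above can be sketched on made-up probabilities. The probability values and IDs are illustrative only; in the notebook they come from the model's `predict` output and from `x_test.csv`.

```python
import numpy as np
import pandas as pd

inv_label_mapping = {0: -1, 1: 0, 2: 1}

# Made-up class probabilities for three test rows (columns: classes 0, 1, 2).
proba = np.array([
    [0.50, 0.30, 0.20],
    [0.25, 0.40, 0.35],
    [0.10, 0.30, 0.60],
])
test_ids = [1000000, 1000001, 1000002]   # illustrative IDs

class_idx = proba.argmax(axis=1)         # most probable class index per row
submission = pd.DataFrame({
    "ID": test_ids,
    "reod": [inv_label_mapping[i] for i in class_idx.tolist()],
})
print(submission)
```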

The notebook finishes by exporting:

- **y_prediction.csv** (no index),
- ready to be uploaded to the platform.


## Limitations and possible (few) improvements

This notebook intentionally remains a clean, interpretable baseline.  
Beyond this point, any improvement would rely less on modeling insight and more on brute-force search.

### Why further model tuning has limited value

Further LGBM hyperparameter tuning (`num_leaves`, `learning_rate`, `max_depth`, regularization…) would not discover new structure.  
It would only perturb LightGBM's partitioning of the 59-dimensional feature space.

Yes, such random perturbations may, by chance, push accuracy from ~0.41 to ~0.45 or slightly higher, but:

- each attempt requires long training times,
- gains are unstable and non-generalizable,
- improvements reflect noise exploitation rather than true modeling progress.

This becomes a lottery, not an insight-driven process.

### Why richer feature engineering was not pursued here

The only genuine path to higher accuracy is **massive feature generation**.  
But doing this properly requires hundreds or thousands of candidate features, which is incompatible with an 8 GB RAM budget unless the computations are distributed.

More advanced pipelines would involve:

- PCA or autoencoders on return paths,
- rolling-window micro-pattern detectors,
- volatility-regime segmentation,
- cross-statistic interactions (ratios, differences, products),
- and even random symbolic transformations such as:

    ```
    meta_1 = log(|r17| + 7) × cos(r3 − r28)
    ```

These transformations have no theoretical meaning but sometimes unlock tiny dataset-specific gains.

**The problem:**  
those gains do not generalize at all.  
They only overfit the quirks of this particular dataset and evaluation setup.

Going down that road would require:

- either distributing computation on multiple VMs,
- or implementing a full batch/streaming FE pipeline,

which turns the challenge into a **systems engineering exercise**, not a modeling one.

This was deliberately avoided.

### The real takeaway (in my view)

A multiclass accuracy **> 0.41** (compared to the 0.33 random baseline) already demonstrates something fundamental:

> **The morning trajectory does contain real predictive signal — and the model is extracting it.**

This is non-trivial.  
In financial time series, theory and empirical evidence both agree:

- Every price path contains a **predictable component** (microstructure biases, intraday seasonality, volatility clustering).
- And an **irreducible stochastic component**, which cannot be forecast from past returns alone.

In such a regime:

- accuracy > 0.33 means the model is consistently beating randomness,
- accuracy ≈ 0.4–0.45 indicates that some structure is exploitable,
- accuracy > 0.5 would likely reflect dataset-specific artifacts, not a breakthrough in predictability.

**Financial trajectories are partly orderly, partly random.**  
The objective was not to predict perfectly, but to extract the fraction of orderliness that exists. We achieve that without overfitting or relying on speculative feature explosions.


# Imports & data loading

In [None]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report

import lightgbm as lgb

# Display options 
pd.set_option("display.max_columns", 80)
pd.set_option("display.width", 200)

# File paths
PATH_X_TRAIN = "x_train.csv"
PATH_Y_TRAIN = "y_train.csv"
PATH_X_TEST  = "x_test.csv"

# Read CSVs
X_train_raw = pd.read_csv(PATH_X_TRAIN)
y_train_raw = pd.read_csv(PATH_Y_TRAIN)
X_test_raw  = pd.read_csv(PATH_X_TEST)

print("Shapes:")
print("  X_train:", X_train_raw.shape)
print("  y_train:", y_train_raw.shape)
print("  X_test :", X_test_raw.shape)

print("\nHead X_train:")
print(X_train_raw.head(3))

print("\nHead y_train:")
print(y_train_raw.head(3))

# Sanity: y_train must contain columns ['ID','reod']
assert {"ID", "reod"}.issubset(y_train_raw.columns), "y_train must contain columns ID and reod"

# Merge X_train and y_train on ID 
df_train = X_train_raw.merge(y_train_raw, on="ID", how="inner")
print("\nAfter merge on ID:")
print("  df_train:", df_train.shape)

# Basic class distribution
print("\nTarget distribution (reod):")
print(df_train["reod"].value_counts(normalize=True).sort_index())

Shapes:
  X_train: (843299, 56)
  y_train: (843299, 2)
  X_test : (885799, 56)

Head X_train:
   ID  day  equity     r0   r1     r2     r3     r4     r5     r6     r7     r8     r9    r10    r11   r12    r13   r14    r15    r16   r17    r18    r19    r20    r21    r22    r23    r24     r25  \
0   0  249    1488   0.00  NaN    NaN    NaN   0.00    NaN    NaN -68.03 -34.25    NaN    NaN    NaN  0.00    NaN   NaN   0.00    NaN   NaN    NaN    NaN   0.00   0.00   0.00    NaN   0.00  137.93   
1   1  272     107  -9.76  0.0 -12.21  46.44  34.08   0.00  41.24  12.08 -26.54  19.32  48.22  23.99 -7.18 -26.34 -2.40   0.00 -12.01  0.00   0.00  19.24  12.00   0.00 -16.78   0.00 -38.42  -19.29   
2   2  323    1063  49.85  0.0   0.00 -26.64 -23.66 -22.14  49.12  53.61  -4.70 -28.27   0.00 -33.01  6.30 -31.45 -3.15 -26.78  33.18 -9.45  40.98  -4.71 -17.26  25.14   4.71 -17.27  15.73  -18.85   

    r26    r27    r28     r29    r30   r31   r32    r33   r34    r35  r36    r37    r38    r39   r40    r

# Feature engineering

In [None]:
def build_features(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """
    Create model-ready features from raw intraday returns.
    Here we go for a *simple but strong* representation:
    - raw r0..r52
    - a few global stats (sum, std, counts)

    Returns the feature frame and the ordered list of return columns.
    """
    df = df.copy()

    # List of return columns r0..r52, ordered numerically.
    # The isdigit() check skips other columns starting with "r" (e.g. "reod").
    ret_cols = [col for col in df.columns if col.startswith("r") and col[1:].isdigit()]
    ret_cols = sorted(ret_cols, key=lambda x: int(x[1:]))

    # Cast returns to float and fill NaN with 0
    df[ret_cols] = df[ret_cols].astype(float).fillna(0.0)

    # Global stats on the path 
    df["ret_sum"] = df[ret_cols].sum(axis=1)
    df["ret_std"] = df[ret_cols].std(axis=1)
    df["pos_count"] = (df[ret_cols] > 0).sum(axis=1)
    df["neg_count"] = (df[ret_cols] < 0).sum(axis=1)
    df["big_move_count_25"] = (df[ret_cols].abs() > 25).sum(axis=1)
    df["big_move_count_50"] = (df[ret_cols].abs() > 50).sum(axis=1)

    # We do NOT include 'day' or 'equity' here to avoid overfitting
    # on IDs that do not appear in the test set.

    feature_cols = ret_cols + [
        "ret_sum",
        "ret_std",
        "pos_count",
        "neg_count",
        "big_move_count_25",
        "big_move_count_50",
    ]

    # Robustness: only keep cols that exist in the dataframe
    feature_cols = [c for c in feature_cols if c in df.columns]

    return df[feature_cols], ret_cols


# Build features for train and test 

X_train_feats, ret_cols_train = build_features(X_train_raw)
X_test_feats, ret_cols_test = build_features(X_test_raw)

# Sanity: we ensure same feature columns order between train and test
assert list(X_train_feats.columns) == list(X_test_feats.columns), \
    "Train and test feature columns mismatch"

print("Feature matrix shapes:")
print("  X_train_feats:", X_train_feats.shape)
print("  X_test_feats :", X_test_feats.shape)

print("\nSample of engineered features:")
print(X_train_feats.head(5))

Feature matrix shapes:
  X_train_feats: (843299, 59)
  X_test_feats : (885799, 59)

Sample of engineered features:
       r0      r1     r2     r3     r4     r5     r6     r7     r8     r9    r10    r11    r12    r13    r14    r15    r16    r17    r18    r19    r20    r21    r22    r23    r24     r25    r26  \
0    0.00    0.00   0.00   0.00   0.00   0.00   0.00 -68.03 -34.25   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  137.93   0.00   
1   -9.76    0.00 -12.21  46.44  34.08   0.00  41.24  12.08 -26.54  19.32  48.22  23.99  -7.18 -26.34  -2.40   0.00 -12.01   0.00   0.00  19.24  12.00   0.00 -16.78   0.00 -38.42  -19.29   0.00   
2   49.85    0.00   0.00 -26.64 -23.66 -22.14  49.12  53.61  -4.70 -28.27   0.00 -33.01   6.30 -31.45  -3.15 -26.78  33.18  -9.45  40.98  -4.71 -17.26  25.14   4.71 -17.27  15.73  -18.85  -3.93   
3    0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   

# Train/validation split & LightGBM training

In [None]:
# Target
y = df_train["reod"].values

# Map target {-1,0,1} -> {0,1,2} for LightGBM
label_mapping = {-1: 0, 0: 1, 1: 2}
inv_label_mapping = {v: k for k, v in label_mapping.items()}
y_mapped = np.array([label_mapping[val] for val in y])

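# Features were built from X_train_raw while y comes from the merged df_train:
# check that both are in the same ID order before pairing them.
assert np.array_equal(df_train["ID"].values, X_train_raw["ID"].values), \
    "Row order mismatch between features and labels"
X = X_train_feats.values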

# Train/validation split (stratified)
X_tr, X_val, y_tr, y_val = train_test_split(
    X,
    y_mapped,
    test_size=0.2,
    random_state=42,
    stratify=y_mapped,
)

print("Shapes after split:")
print("  X_tr :", X_tr.shape)
print("  X_val:", X_val.shape)

# LightGBM parameters (edit this cell when tuning; expect only marginal gains)
lgb_params = {
    "objective": "multiclass",
    "num_class": 3,
    "learning_rate": 0.05,
    "num_leaves": 31,          
    "max_depth": -1,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,
    "min_data_in_leaf": 200,   
    "lambda_l2": 2.0,
    "metric": ["multi_logloss", "multi_error"],  
    "n_estimators": 2000,
    "random_state": 42,
    "n_jobs": -1,
}

# Dataset objects for LightGBM
train_set = lgb.Dataset(X_tr, label=y_tr)
valid_set = lgb.Dataset(X_val, label=y_val, reference=train_set)

# Training with early stopping
evals_result = {}
gbm = lgb.train(
    params=lgb_params,
    train_set=train_set,
    valid_sets=[train_set, valid_set],
    valid_names=["train", "valid"],
    num_boost_round=lgb_params["n_estimators"],
    callbacks=[
        lgb.early_stopping(stopping_rounds=200),
        lgb.log_evaluation(period=100),
        lgb.record_evaluation(evals_result),
    ],
)

best_iter = gbm.best_iteration
print(f"\nBest iteration (early stopping): {best_iter}")

# Validation metrics
y_val_pred_prob = gbm.predict(X_val, num_iteration=best_iter)
y_val_pred = np.argmax(y_val_pred_prob, axis=1)

# Map back to {-1,0,1}
y_val_true_labels = np.array([inv_label_mapping[i] for i in y_val])
y_val_pred_labels = np.array([inv_label_mapping[i] for i in y_val_pred])

acc = accuracy_score(y_val_true_labels, y_val_pred_labels)
f1_macro = f1_score(y_val_true_labels, y_val_pred_labels, average="macro")

print(f"\nValidation accuracy : {acc:.4f}")
print(f"Validation F1-macro : {f1_macro:.4f}")

print("\nClassification report (validation):")
print(classification_report(y_val_true_labels, y_val_pred_labels, digits=4))

Shapes after split:
  X_tr : (674639, 59)
  X_val: (168660, 59)
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.336526 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 14207
[LightGBM] [Info] Number of data points in the train set: 674639, number of used features: 59
[LightGBM] [Info] Start training from score -1.201410
[LightGBM] [Info] Start training from score -0.886657
[LightGBM] [Info] Start training from score -1.247578
Training until validation scores don't improve for 200 rounds
[100]	train's multi_logloss: 0.997587	train's multi_error: 0.507829	valid's multi_logloss: 1.0019	valid's multi_error: 0.515688
[200]	train's multi_logloss: 0.988922	train's multi_error: 0.49589	valid's multi_logloss: 0.997918	valid's multi_error: 0.511443
[300]	train's multi_logloss: 0.982569	train's multi_error: 0.486273	valid's multi_logloss: 0.996104	valid's multi_error: 0.510441
[400]	train's multi_logloss: 0.976

# Full training & test predictions

In [None]:
# We reuse:
# - X_train_feats, X_test_feats
# - y_mapped, label_mapping, inv_label_mapping
# - lgb_params
# - best_iter

X_full = X_train_feats.values
y_full = y_mapped

# We fix n_estimators to the best_iter found on validation
lgb_params_final = lgb_params.copy()
lgb_params_final["n_estimators"] = best_iter

train_set_full = lgb.Dataset(X_full, label=y_full)

gbm_full = lgb.train(
    params=lgb_params_final,
    train_set=train_set_full,
    num_boost_round=best_iter
)

# Predictions on the test set
X_test_mat = X_test_feats.values
y_test_pred_prob = gbm_full.predict(X_test_mat, num_iteration=best_iter)
y_test_pred = np.argmax(y_test_pred_prob, axis=1)

# Remap to {-1, 0, 1}
y_test_pred_labels = np.array([inv_label_mapping[i] for i in y_test_pred])

# Build y_prediction.csv
submission = pd.DataFrame({
    "ID": X_test_raw["ID"].values,
    "reod": y_test_pred_labels,
})

print(submission.head())
print(submission["reod"].value_counts(normalize=True))

submission.to_csv("y_prediction.csv", index=False)
print("\n>> y_prediction.csv created")

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.181750 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 14208
[LightGBM] [Info] Number of data points in the train set: 843299, number of used features: 59
[LightGBM] [Info] Start training from score -1.201410
[LightGBM] [Info] Start training from score -0.886658
[LightGBM] [Info] Start training from score -1.247578
        ID  reod
0  1000000     1
1  1000001     1
2  1000002     1
3  1000003     1
4  1000004     0
reod
-1    0.389654
 1    0.309911
 0    0.300435
Name: proportion, dtype: float64

>> y_prediction.csv created
