## Let's try out Learning to Rank with the LightGBM library

We follow the code example at https://mlexplained.com/2019/05/27/learning-to-rank-explained-with-code/

To do that, we first need to download the data, which is prepared with the trans_data.py script, by running retrieve_30k.sh

In [None]:
! sh retrieve_30k.sh

In [None]:
# Import the main libraries
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_svmlight_file
from scipy.stats import spearmanr

# Load the files we obtained with the trans_data.py script
x_train, y_train = load_svmlight_file("mq2008.train")
x_valid, y_valid = load_svmlight_file("mq2008.vali")
x_test, y_test = load_svmlight_file("mq2008.test")

In [None]:
y_train

In [None]:
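# Group files hold one integer per query: the number of consecutive rows belonging to it
q_train = np.loadtxt('mq2008.train.group', dtype=int)
q_valid = np.loadtxt('mq2008.vali.group', dtype=int)
q_test = np.loadtxt('mq2008.test.group', dtype=int)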
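
To see what those group sizes mean, here is a minimal sketch on toy data. The sizes `[2, 3, 1]` and the row count are made up for illustration and are not taken from MQ2008. The only assumption is the one LightGBM itself makes: rows of the same query are stored consecutively, and the sizes add up to the number of rows.

```python
import numpy as np

# Hypothetical group sizes: 3 queries with 2, 3 and 1 documents each
group_sizes = np.array([2, 3, 1])
n_rows = 6  # number of rows in the (hypothetical) feature matrix

# The sizes must add up to the number of rows
assert group_sizes.sum() == n_rows

# Expand the sizes into one query index per row
query_ids = np.repeat(np.arange(len(group_sizes)), group_sizes)

# Row offset at which each query starts
starts = np.concatenate(([0], np.cumsum(group_sizes)[:-1]))

print(query_ids)  # one query index per row
print(starts)     # first row of each query
```

The same check (`q_train.sum() == x_train.shape[0]`) is a cheap way to confirm that a group file matches its data file.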

In [None]:
x_test

In [None]:
q_test

In [None]:
y_test[:8]

In [None]:
# LGBMRanker doc: https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRanker.html
gbm = lgb.LGBMRanker()

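# LightGBM 4.0 removed early_stopping_rounds and verbose from fit(); use callbacks instead
gbm.fit(
    x_train, y_train, group=q_train, eval_set=[(x_valid, y_valid)],
    eval_group=[q_valid], eval_at=[1, 3],
    callbacks=[
        lgb.early_stopping(stopping_rounds=20),  # stop after 20 rounds without improvement
        lgb.log_evaluation(period=1),            # print the validation metrics every round
        lgb.reset_parameter(learning_rate=lambda x: 0.95 ** x * 0.1),
    ]
)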
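
The `lgb.reset_parameter` callback above changes the learning rate at every boosting round using the lambda `0.95 ** x * 0.1`, where `x` is the 0-based round index. A small sketch of that schedule in plain Python (the helper name `decayed_learning_rate` is ours, for illustration only):

```python
def decayed_learning_rate(iteration, base=0.1, decay=0.95):
    """Same formula as the lambda passed to lgb.reset_parameter."""
    return decay ** iteration * base

# Learning rate at a few selected boosting rounds
rounds = (0, 1, 10, 50)
schedule = [decayed_learning_rate(i) for i in rounds]

for i, lr in zip(rounds, schedule):
    print(f"round {i:>2}: learning_rate = {lr:.5f}")
```

The rate starts at 0.1 and shrinks by 5% per round, so later trees make smaller and smaller corrections.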

In [None]:
# Run the predictor on the test data
preds_test = gbm.predict(x_test)
preds_test

In [None]:
# Use Spearman's rank correlation to compare the rankings
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html
spearmanr(y_test, preds_test)
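
Note that the call above correlates labels and scores over the whole test set at once, mixing documents from different queries. Since a ranker's scores are mainly meant to be compared within a query, a possible complement is to compute Spearman per query group and average. This is a sketch, not part of the original example: `per_query_spearman` is a hypothetical helper, and skipping groups with constant labels or predictions is our own choice (the correlation is undefined there).

```python
import numpy as np
from scipy.stats import spearmanr

def per_query_spearman(y_true, y_pred, group_sizes):
    """Mean Spearman correlation computed separately inside each query group."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Split points between consecutive groups
    bounds = np.cumsum(np.asarray(group_sizes, dtype=int))[:-1]
    scores = []
    for labels, preds in zip(np.split(y_true, bounds), np.split(y_pred, bounds)):
        # Skip groups where the correlation is undefined (too small or constant)
        if labels.size < 2 or labels.max() == labels.min() or preds.max() == preds.min():
            continue
        rho, _ = spearmanr(labels, preds)
        scores.append(rho)
    return float(np.mean(scores)) if scores else float("nan")

# Toy data: first query perfectly ranked, second query perfectly inverted
toy_true = [2, 1, 0, 0, 1, 2]
toy_pred = [0.9, 0.5, 0.1, 0.8, 0.5, 0.2]
toy_result = per_query_spearman(toy_true, toy_pred, [3, 3])
print(toy_result)
```

On the real data this would be called as `per_query_spearman(y_test, preds_test, q_test)`, with `q_test` being the per-query sizes loaded from the group file.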

## Let's put the whole dataset into a single group and retrain!

In [None]:
# Treat every split as one single query: the only group contains all of its rows
q_train = [x_train.shape[0]]
q_valid = [x_valid.shape[0]]
q_test = [x_test.shape[0]]

gbm = lgb.LGBMRanker()
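# LightGBM 4.0 removed early_stopping_rounds and verbose from fit(); use callbacks instead
gbm.fit(
    x_train, y_train, group=q_train, eval_set=[(x_valid, y_valid)],
    eval_group=[q_valid], eval_at=[1, 3],
    callbacks=[
        lgb.early_stopping(stopping_rounds=20),  # stop after 20 rounds without improvement
        lgb.log_evaluation(period=1),            # print the validation metrics every round
        lgb.reset_parameter(learning_rate=lambda x: 0.95 ** x * 0.1),
    ]
)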
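
To get a feel for what changes when everything goes into one group, we can count how many document pairs are available for comparison. This is only an upper bound under our own simplification: the pairwise objective only uses pairs with different relevance labels, so the real number is smaller. The sizes below are made up, and `max_pairs` is a hypothetical helper.

```python
def max_pairs(group_sizes):
    """Upper bound on comparable document pairs: n * (n - 1) / 2 per group."""
    return sum(n * (n - 1) // 2 for n in group_sizes)

# Hypothetical dataset of 10 documents
per_query = max_pairs([3, 4, 3])  # three separate queries
single_group = max_pairs([10])    # everything treated as one query

print(per_query, single_group)
```

With a single group, documents retrieved for unrelated queries are also compared against each other, so the model learns one global ordering instead of one ordering per query.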

In [None]:
preds_test = gbm.predict(x_test)
preds_test
spearmanr(y_test, preds_test)