`LightGBM` is a gradient boosting framework that is often faster to train and lighter on memory than `XGBoost`, and that scales to larger datasets. It achieves this largely by using histogram-based algorithms that bucket continuous feature values into discrete bins during training.
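
To make the binning idea concrete, here is a minimal sketch in plain Python. It is not LightGBM's actual bin-finding code: the equal-width edges and the helper names `make_bin_edges`, `to_bins`, and `gradient_histogram` are assumptions chosen for illustration. The point is that once values are mapped to a small number of bins, split finding can work on per-bin gradient sums instead of on every raw value.

```python
from bisect import bisect_right

def make_bin_edges(values, num_bins):
    """Interior edges of equal-width bins over the observed range (illustrative only)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + width * i for i in range(1, num_bins)]

def to_bins(values, edges):
    """Map each continuous value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]

def gradient_histogram(bin_ids, gradients, num_bins):
    """Accumulate the gradient sum and sample count for every bin."""
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for b, g in zip(bin_ids, gradients):
        sums[b] += g
        counts[b] += 1
    return sums, counts

# Hypothetical feature column and per-sample gradients
feature = [0.1, 0.4, 0.35, 0.8, 0.95, 0.5]
gradients = [1.0, -0.5, 0.25, 2.0, -1.0, 0.5]

edges = make_bin_edges(feature, num_bins=4)
bin_ids = to_bins(feature, edges)
sums, counts = gradient_histogram(bin_ids, gradients, num_bins=4)
print(bin_ids, sums, counts)
```

A candidate split then only needs to scan the bin boundaries (three here) rather than every distinct feature value, which is where the savings in time and memory come from.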



In [1]:
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

In [2]:
# Setup Wandb
import wandb
from wandb.lightgbm import wandb_callback, log_summary

wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin


True

### Download and Prepare Dataset

In [5]:
!wget https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.train --no-check-certificate
!wget https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.test --no-check-certificate

--2022-06-08 21:15:51--  https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.train
Resolving raw.githubusercontent.com... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com|2606:50c0:8000::154|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 1228616 (1.2M) [text/plain]
Saving to: 'regression.train'


In [6]:
# Load and Create Dataset 
df_train = pd.read_csv('regression.train', header=None, sep='\t')
df_test = pd.read_csv('regression.test', header=None, sep='\t')

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)

# Create Dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_test = lgb.Dataset(X_test, y_test)

### Train

In [7]:
# Configurations
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['rmse', 'l2', 'l1', 'huber'],
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbosity': 0
}

wandb.init(project='lightgbm-wandb_example', config=params)
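
The four entries in `metric` are all computed on the validation set at every boosting round. As a rough guide to what each one measures, the sketch below reimplements them in plain Python. The Huber form and its `alpha=0.9` threshold follow my understanding of LightGBM's defaults and should be treated as an assumption; the sample values are made up.

```python
import math

def l2(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of l2."""
    return math.sqrt(l2(y_true, y_pred))

def l1(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, alpha=0.9):
    """Quadratic for small errors, linear for large ones (alpha is an assumed default)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = abs(t - p)
        if d <= alpha:
            total += 0.5 * d * d
        else:
            total += alpha * (d - 0.5 * alpha)
    return total / len(y_true)

# Hypothetical targets and predictions; one prediction is badly off
y_true_demo = [1.0, 0.0, 2.0, 1.0]
y_pred_demo = [0.5, 0.0, 4.0, 1.0]

for name, fn in [('l2', l2), ('rmse', rmse), ('l1', l1), ('huber', huber)]:
    print(name, fn(y_true_demo, y_pred_demo))
```

Because `l2` squares each error, the single large miss dominates it, while `l1` and `huber` grow only linearly with that error. Logging all four makes it easier to tell whether a change in validation loss is driven by a few outliers.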

In [8]:
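# Train using wandb_callback
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=30,
                valid_sets=[lgb_test],
                valid_names=['validation'],
                # Early stopping as a callback works on LightGBM 3.x and 4.x;
                # the early_stopping_rounds argument was removed from train() in 4.0
                callbacks=[wandb_callback(),
                           lgb.early_stopping(stopping_rounds=5)])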
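
The training call above stops early if the validation metric fails to improve for 5 consecutive rounds. The sketch below illustrates that rule in plain Python. It is a simplification, not LightGBM's implementation: the function name `best_round_with_early_stopping` and the score lists are hypothetical, and it tracks a single lower-is-better metric.

```python
def best_round_with_early_stopping(scores, patience):
    """Return (best_round, last_round_run), both 1-based, for a lower-is-better metric."""
    best_score = float('inf')
    best_round = 0
    for round_idx, score in enumerate(scores, start=1):
        if score < best_score:
            # New best: remember it and reset the patience window
            best_score = score
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop here
            return best_round, round_idx
    return best_round, len(scores)

# Hypothetical validation scores that plateau after round 3
plateau = [0.9, 0.8, 0.75, 0.76, 0.77, 0.76, 0.78, 0.79, 0.74]
print(best_round_with_early_stopping(plateau, patience=5))   # (3, 8)

# Scores that improve on every one of 30 rounds never trigger the stop
improving = [1.0 - 0.01 * i for i in range(30)]
print(best_round_with_early_stopping(improving, patience=5))  # (30, 30)
```

The second case matches a run whose validation curves are still decreasing at the final round: with only 30 boosting rounds and steady improvement, early stopping never fires and the best iteration is the last one.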





### Log Feature Importance and upload Model with `log_summary`

`log_summary` calculates and uploads the feature importance and (optionally) uploads your trained model to W&B Artifacts so you can use it later.

In [9]:
log_summary(gbm, save_model_checkpoint=True)

### Evaluate

In [12]:
# Predict
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

# Eval: compute RMSE once, then print it and log it to W&B
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print('RMSE: ', rmse)
wandb.log({'rmse_prediction': rmse})

RMSE:  0.43421275319941804


In [13]:
wandb.finish()


Run history:

| Metric | History |
|---|---|
| iteration | ▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇███ |
| rmse_prediction | ▁ |
| validation_huber | ██▇▆▆▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁ |
| validation_l1 | ██▇▇▇▆▆▆▅▅▅▄▄▄▄▄▃▃▃▃▂▂▂▂▂▂▁▁▁▁ |
| validation_l2 | ██▇▆▆▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁ |
| validation_rmse | ██▇▆▆▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁ |

Run summary:

| Metric | Value |
|---|---|
| best_iteration | 30.0 |
| iteration | 29.0 |
| rmse_prediction | 0.43421 |
