<a href="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/boosting/Simple_LightGBM_Integration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />


<img src="https://i.imgur.com/uEtWSEb.png" width="650" alt="Weights & Biases" />


# 🏋️‍♀️ W&B + 💡 LightGBM
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration.

Gradient-boosted decision trees (GBDTs) are the state of the art when it comes to building predictive models for structured data.

[LightGBM](https://github.com/microsoft/LightGBM), a gradient boosting framework by Microsoft, has overtaken XGBoost to become a go-to GBDT library (along with CatBoost). It outperforms XGBoost in training speed, memory usage, and the size of datasets it can handle. LightGBM does so by using histogram-based algorithms that bucket continuous features into discrete bins during training.
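To make the binning idea concrete, here is a minimal sketch of bucketing one continuous feature into discrete bins. The helper names (`make_bin_edges`, `to_bin`) and the equal-frequency edges are assumptions chosen for illustration; this is not LightGBM's actual bin-finding routine, which is more elaborate.

```python
from bisect import bisect_right


def make_bin_edges(values, num_bins):
    """Return num_bins - 1 interior edges taken at evenly spaced ranks."""
    ordered = sorted(values)
    edges = []
    for i in range(1, num_bins):
        # Index of the i-th cut point in the sorted sample.
        position = i * len(ordered) // num_bins
        edges.append(ordered[position])
    return edges


def to_bin(value, edges):
    """Map a continuous value to the index of the bin it falls into."""
    return bisect_right(edges, value)


# Toy feature column, made up for illustration.
feature = [0.1, 0.4, 0.35, 2.7, 1.9, 0.8, 3.3, 1.2]
edges = make_bin_edges(feature, num_bins=4)
binned = [to_bin(v, edges) for v in feature]

print(edges)   # interior bin edges
print(binned)  # one small integer per original value
```

Once a feature is reduced to a handful of integer bins, a split search only has to consider the bin boundaries rather than every distinct value, which is where the speed and memory savings come from.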


## What this notebook covers
* Easy integration of Weights & Biases with LightGBM.
* The `wandb_callback()` callback

We want to make it incredibly easy for people to look under the hood of their models, so we built a callback that helps you visualize your LightGBM model's performance in just one line of code.

**Note**: Sections starting with _Step_ are all you need to integrate W&B.

# Install, Import, and Log in

## The Usual Suspects

In [2]:
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

## Step 0: Install W&B

In [1]:
%%capture
!pip install wandb

## Step 1: Import W&B and Log In

In [3]:
import wandb
from wandb.lightgbm import wandb_callback

wandb.login()

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

# Download and Prepare Dataset


In [4]:
!wget https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.train -qq
!wget https://raw.githubusercontent.com/microsoft/LightGBM/master/examples/regression/regression.test -qq

In [5]:
# load or create your dataset
df_train = pd.read_csv('regression.train', header=None, sep='\t')
df_test = pd.read_csv('regression.test', header=None, sep='\t')

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)

# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Train

### Step 2: Initialize your W&B run

Initialize your W&B run with `wandb.init()`. You can also pass it a dictionary of configs. [Check out the official documentation here $\rightarrow$](https://docs.wandb.com/library/init)

Configs matter in any ML/DL workflow. W&B saves the config alongside each run, so you always have access to the exact settings needed to reproduce your model.

[Learn more about configs in this colab notebook $\rightarrow$](http://wandb.me/config-colab)

In [6]:
# specify your configurations as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['rmse', 'l2', 'l1', 'huber'],
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbosity': 0
}

wandb.init(project='my-lightgbm-integration', config=params);

[34m[1mwandb[0m: Currently logged in as: [33mcharlesfrye[0m (use `wandb login --relogin` to force relogin)


> Once you have trained your model, come back and click on the **Project page** link.

### Step 3: Train with `wandb_callback`

In [7]:
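# train, logging metrics to W&B on every boosting round
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=30,
                valid_sets=[lgb_eval],
                valid_names=['validation'],
                # lgb.early_stopping replaces the early_stopping_rounds
                # argument, which LightGBM 4.0 removed from lgb.train
                callbacks=[wandb_callback(), lgb.early_stopping(5)])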

[1]	validation's huber: 0.121741	validation's l1: 0.492417	validation's l2: 0.243481	validation's rmse: 0.493438
Training until validation scores don't improve for 5 rounds.
[2]	validation's huber: 0.120023	validation's l1: 0.48874	validation's l2: 0.240045	validation's rmse: 0.489944
[3]	validation's huber: 0.118318	validation's l1: 0.485042	validation's l2: 0.236636	validation's rmse: 0.486452
[4]	validation's huber: 0.116479	validation's l1: 0.480872	validation's l2: 0.232959	validation's rmse: 0.482658
[5]	validation's huber: 0.114842	validation's l1: 0.476928	validation's l2: 0.229684	validation's rmse: 0.479254
[6]	validation's huber: 0.113471	validation's l1: 0.473545	validation's l2: 0.226942	validation's rmse: 0.476384
[7]	validation's huber: 0.111986	validation's l1: 0.469984	validation's l2: 0.223972	validation's rmse: 0.473256
[8]	validation's huber: 0.110464	validation's l1: 0.466083	validation's l2: 0.220928	validation's rmse: 0.47003
[9]	validation's huber: 0.108975	vali
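Each line of the log above corresponds to one boosting round, and a callback receives the same numbers. The sketch below shows roughly how a metric-recording callback can work. `FakeEnv` is a hypothetical stand-in for the environment object LightGBM passes to callbacks (only two of its fields are modeled), and the `{dataset}_{metric}` key naming is an assumption, not a statement about what `wandb_callback` does internally.

```python
from collections import namedtuple

# Hypothetical stand-in for LightGBM's callback environment.
FakeEnv = namedtuple("FakeEnv", ["iteration", "evaluation_result_list"])


def make_recording_callback(history):
    """Build a callback that appends one dict of metrics per boosting round."""
    def _callback(env):
        row = {"iteration": env.iteration}
        # Each entry starts with (dataset name, metric name, value).
        for entry in env.evaluation_result_list:
            data_name, metric_name, value = entry[0], entry[1], entry[2]
            row[f"{data_name}_{metric_name}"] = value
        history.append(row)
    return _callback


history = []
callback = make_recording_callback(history)

# Replay the first two rounds of the log above (l2 and rmse only).
callback(FakeEnv(0, [("validation", "l2", 0.243481, False),
                     ("validation", "rmse", 0.493438, False)]))
callback(FakeEnv(1, [("validation", "l2", 0.240045, False),
                     ("validation", "rmse", 0.489944, False)]))

print(history[-1])
```

A logging integration would send each `row` to the tracking service instead of appending it to a list; the rest of the structure stays the same.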

# Evaluate

In [8]:
# predict with the best iteration found during training
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
# eval: compute the RMSE once, then print it and log it to W&B
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print('The rmse of prediction is:', rmse)
wandb.log({'rmse_prediction': rmse})

The rmse of prediction is: 0.43626711632808296
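As a quick check on what this number means: RMSE is the square root of the mean squared error, which is why the cell above raises `mean_squared_error` to the power 0.5. Here is a minimal pure-Python sketch; the `rmse` helper and the toy values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy targets and predictions, made up for illustration.
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.5, 0.5, 1.0, 0.0]
print(rmse(y_true, y_pred))

# The same relationship holds in the training log: in round 1 the
# reported l2 is 0.243481 and the reported rmse is 0.493438.
print(math.sqrt(0.243481))
```

The second print agrees with the logged `rmse` to within rounding, confirming that the `l2` and `rmse` columns carry the same information on different scales.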


# Visualize Results

Click on the **project page** link above to see your results automatically visualized.

<img src="https://imgur.com/S6lwSig.png" alt="Viz" />


# Sweep 101

Use Weights & Biases Sweeps to automate hyperparameter optimization and explore the space of possible models.

## [Check out Hyperparameter Optimization with XGBoost using W&B Sweeps $\rightarrow$](http://wandb.me/xgb-colab)

Running a hyperparameter sweep with Weights & Biases is very easy. There are just 3 simple steps:

1. **Define the sweep:** We do this by creating a dictionary or a [YAML file](https://docs.wandb.com/library/sweeps/configuration) that specifies the parameters to search through, the search strategy, the optimization metric, and so on.

2. **Initialize the sweep:** 
`sweep_id = wandb.sweep(sweep_config)`

3. **Run the sweep agent:** 
`wandb.agent(sweep_id, function=train)`

And voila! That's all there is to running a hyperparameter sweep!
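The three steps above can be sketched for this notebook's LightGBM model as follows. This is an illustration under assumptions, not a tuned setup: the parameter ranges are arbitrary, the metric name `validation_rmse` is a guess at how the callback names the logged metric (check your run's charts for the actual key), and `run_sweep` is a hypothetical helper. Nothing contacts W&B until `run_sweep` is called.

```python
# Step 1: define the sweep as a plain dictionary.
sweep_config = {
    "method": "random",  # search strategy
    "metric": {"name": "validation_rmse", "goal": "minimize"},  # assumed metric key
    "parameters": {
        "num_leaves": {"values": [15, 31, 63]},
        "learning_rate": {"min": 0.01, "max": 0.2},
        "feature_fraction": {"values": [0.7, 0.8, 0.9]},
    },
}


def train(lgb_train, lgb_eval):
    """Train one model using the hyperparameters chosen by the sweep."""
    import lightgbm as lgb
    import wandb
    from wandb.lightgbm import wandb_callback

    with wandb.init() as run:
        params = {"objective": "regression", "metric": ["rmse"], "verbosity": 0}
        params.update(dict(run.config))  # values picked for this trial
        lgb.train(params,
                  lgb_train,
                  num_boost_round=30,
                  valid_sets=[lgb_eval],
                  valid_names=["validation"],
                  callbacks=[wandb_callback()])


def run_sweep(lgb_train, lgb_eval, count=5):
    """Steps 2 and 3: register the sweep, then run `count` trials."""
    import wandb

    sweep_id = wandb.sweep(sweep_config, project="my-lightgbm-integration")
    wandb.agent(sweep_id, function=lambda: train(lgb_train, lgb_eval), count=count)
```

With the datasets from earlier in the notebook, `run_sweep(lgb_train, lgb_eval)` would launch five trials. The `count` argument caps the number of runs, which is useful with a random search that would otherwise keep sampling.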

<img src="https://imgur.com/SVtMfa2.png" alt="Sweep Result" />


# Example Gallery

See examples of projects tracked and visualized with W&B in our [Gallery →](https://app.wandb.ai/gallery)

# Basic Setup
1. **Projects**: Log multiple runs to a project to compare them. `wandb.init(project="project-name")`
2. **Groups**: For multiple processes or cross-validation folds, log each process as a run and group them together. `wandb.init(group='experiment-1')`
3. **Tags**: Add tags to track your current baseline or production model.
4. **Notes**: Type notes in the table to track the changes between runs.
5. **Reports**: Take quick notes on progress to share with colleagues and make dashboards and snapshots of your ML projects.
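Several of these options can be combined in a single `wandb.init()` call. A minimal sketch, assuming a cross-validation setup: the tag, note text, run-name pattern, and the `start_run` helper are all made up for illustration.

```python
# Settings shared by every run in the experiment (illustrative values).
run_settings = {
    "project": "my-lightgbm-integration",  # runs in one project can be compared
    "group": "experiment-1",               # groups the folds together
    "tags": ["baseline"],                  # marks the current baseline
    "notes": "LightGBM regression, default params",
}


def start_run(fold):
    """Start one W&B run per cross-validation fold (hypothetical helper)."""
    import wandb

    return wandb.init(name=f"fold-{fold}", **run_settings)
```

Calling `start_run(0)`, `start_run(1)`, and so on would produce one run per fold, all listed under the same group on the project page.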

# Advanced Setup
1. [Environment variables](https://docs.wandb.com/library/environment-variables): Set API keys in environment variables so you can run training on a managed cluster.
2. [Offline mode](https://docs.wandb.com/library/technical-faq#can-i-run-wandb-offline): Use `dryrun` mode (called `offline` in newer releases) to train offline and sync results later.
3. [On-prem](https://docs.wandb.com/self-hosted): Install W&B in a private cloud or on air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
4. [Sweeps](https://docs.wandb.com/sweeps): Set up hyperparameter search quickly with our lightweight tool for tuning.
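The first two items can be sketched in a few lines of Python. `WANDB_API_KEY`, `WANDB_MODE`, and `WANDB_PROJECT` are environment variables read by the W&B client; the project name is the one used earlier in this notebook, and the choice to force offline mode here is just for illustration.

```python
import os

# Default project for runs that do not pass project= to wandb.init().
os.environ.setdefault("WANDB_PROJECT", "my-lightgbm-integration")

# Record runs locally without contacting the server; they can be
# uploaded later with the `wandb sync` command-line tool.
os.environ["WANDB_MODE"] = "offline"

# The API key should come from the cluster's secret store, never from code.
if "WANDB_API_KEY" not in os.environ:
    print("WANDB_API_KEY is not set; online runs will prompt for login.")
```

These variables need to be set before `wandb.init()` is called, so on a managed cluster they usually live in the job definition rather than in the training script.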