Getting the best performance out of tree-based models requires selecting the right hyperparameters, which means searching through high-dimensional hyperparameter spaces for the best-performing configuration. `Wandb Sweeps` provide an organized and efficient way to run this contest between candidate models and pick the winner: they search through combinations of hyperparameter values and track which combination scores best on a chosen metric.

## Sweeps
Running a hyperparameter sweep with Wandb takes three steps:
1. <b>Define the Sweep</b>: Declare a `dict` that specifies which parameters to search through, which search strategy to use, and which metric to optimize.
2. <b>Initialize the Sweep</b>: Call `sweep_id = wandb.sweep(sweep_config)` to register the sweep and get back its ID.
3. <b>Run the Sweep Agent</b>: Call `wandb.agent(sweep_id, function=train)`, passing the `sweep_id` and the training function the agent should call for each run.

In [1]:
import wandb

wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: raghvender. Use `wandb login --relogin` to force relogin


True

### 1. Define the Sweep

- `Metric`: The metric the sweep tries to optimize, given by its `name` and a `goal` of either `maximize` or `minimize`.
- `Search Strategy`: How the sweep picks the next combination to try, set with the `method` key:
    1. <b>Grid Search</b> (`grid`): Iterates over every combination of hyperparameter values.
    2. <b>Random Search</b> (`random`): Iterates over randomly chosen combinations of hyperparameter values.
    3. <b>Bayesian Search</b> (`bayes`): Builds a probabilistic model that maps hyperparameters to the probability of a metric score, and chooses the parameters with the highest probability of improving the metric.
- `Parameters`: A `dict` mapping hyperparameter names to discrete values, a range, or a distribution from which to draw their values on each run.
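To make the difference between grid and random search concrete, here is a minimal pure-Python sketch over the same values used in the notebook's `sweep_config`. It is an illustration, not how W&B implements its search internally, and it leaves out Bayesian search, which additionally needs a model of past results. The names `search_space`, `grid_search`, and `random_search` are hypothetical.

```python
import itertools
import random

# Illustrative search space; same values as the notebook's sweep_config
search_space = {
    "booster": ["gbtree", "gblinear"],
    "max_depth": [3, 6, 9, 12],
    "learning_rate": [0.1, 0.05, 0.2],
    "subsample": [1, 0.5, 0.3],
}


def grid_search(space):
    """Yield every combination of values, one dict per combination."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_search(space, count, seed=0):
    """Draw `count` combinations independently; the same one can repeat."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(count)]


grid = list(grid_search(search_space))
samples = random_search(search_space, count=25)

print(len(grid))     # 2 * 4 * 3 * 3 = 72 combinations
print(len(samples))  # 25 draws, possibly with duplicates
```

With a budget of 25 runs, random search covers at most about a third of the 72 combinations, while a grid sweep would need all 72 runs to finish.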

In [2]:
sweep_config = {
    "method": "random",  # alternatives: "grid" or "bayes"
    "metric": {
      "name": "accuracy",
      "goal": "maximize"   
    },
    "parameters": {
        "booster": {
            "values": ["gbtree","gblinear"]
        },
        "max_depth": {
            "values": [3, 6, 9, 12]
        },
        "learning_rate": {
            "values": [0.1, 0.05, 0.2]
        },
        "subsample": {
            "values": [1, 0.5, 0.3]
        }
    }
}
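The config above draws every parameter from a fixed list. As a hedged sketch, a Bayesian variant can instead give numeric parameters a range through the `distribution`, `min`, and `max` keys of the sweep configuration. The ranges below are illustrative assumptions, not tuned values, and the booster is pinned to `gbtree` because `max_depth` and `subsample` are tree-booster parameters in XGBoost that `gblinear` does not use. The name `sweep_config_bayes` is hypothetical.

```python
# Illustrative Bayesian sweep config; ranges are assumptions, not tuned values
sweep_config_bayes = {
    "method": "bayes",  # Bayesian search models past results to pick the next run
    "metric": {
        "name": "accuracy",  # must match the key passed to wandb.log()
        "goal": "maximize",
    },
    "parameters": {
        # max_depth and subsample only affect tree boosters, so fix the booster
        "booster": {"values": ["gbtree"]},
        "max_depth": {"distribution": "int_uniform", "min": 3, "max": 12},
        "learning_rate": {"distribution": "uniform", "min": 0.01, "max": 0.3},
        "subsample": {"distribution": "uniform", "min": 0.3, "max": 1.0},
    },
}
```

It would be registered the same way as the random sweep, e.g. `wandb.sweep(sweep_config_bayes, project='XGBoost-Sweeps-WandbExamples')`.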

### 2. Initialize Sweep

In [3]:
sweep_id = wandb.sweep(sweep_config, project='XGBoost-Sweeps-WandbExamples')

Create sweep with ID: m06t088c
Sweep URL: https://wandb.ai/raghvender/XGBoost-Sweeps-WandbExamples/sweeps/m06t088c


In [4]:
# Download Data
!wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv --no-check-certificate

--2022-06-08 21:47:12--  https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
Resolving raw.githubusercontent.com... 2606:50c0:8003::154, 2606:50c0:8001::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com|2606:50c0:8003::154|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 23278 (23K) [text/plain]
Saving to: 'pima-indians-diabetes.data.csv'

     0K .......... .......... ..                              100% 4.41M=0.005s

2022-06-08 21:47:12 (4.41 MB/s) - 'pima-indians-diabetes.data.csv' saved [23278/23278]
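Outside a notebook, the `!wget` shell magic is not available. A minimal standard-library alternative is sketched below; `download_if_missing` and `DATA_URL` are hypothetical names. Unlike `--no-check-certificate`, `urllib.request` verifies the server's TLS certificate by default, so it may fail in an environment whose certificate store cannot verify the host.

```python
import os
import urllib.request

# Hypothetical constant; same URL as the wget call above
DATA_URL = ("https://raw.githubusercontent.com/jbrownlee/Datasets/"
            "master/pima-indians-diabetes.data.csv")


def download_if_missing(url: str, path: str) -> str:
    """Download `url` to `path` unless the file already exists; return `path`."""
    if not os.path.exists(path):
        # HTTPS certificates are verified by default here
        urllib.request.urlretrieve(url, path)
    return path


# Usage (performs a network request on the first call only):
# download_if_missing(DATA_URL, "pima-indians-diabetes.data.csv")
```

Skipping the download when the file exists keeps re-runs of the notebook from fetching the CSV again.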



In [5]:
import wandb
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def train():
    config_defaults = {
        'booster': 'gbtree',
        'max_depth': 3,
        'learning_rate': 0.1,
        'subsample': 1,
        'seed': 117,
        'test_size': 0.33,
    }

    # Under a sweep, the values chosen by the agent override these defaults
    wandb.init(config=config_defaults)
    config = wandb.config

    # Load data and split into features (first 8 columns) and target (last column)
    dataset = loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
    X, Y = dataset[:, :8], dataset[:, 8]

    # Split dataset into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, Y,
                                                        test_size=config.test_size,
                                                        random_state=config.seed)

    # Fit model with the hyperparameters chosen for this run
    model = XGBClassifier(booster=config.booster, max_depth=config.max_depth,
                          learning_rate=config.learning_rate, subsample=config.subsample)
    model.fit(X_train, y_train)

    # Make predictions; predict() already returns class labels, so no rounding is needed
    y_pred = model.predict(X_test)

    # Evaluate predictions and log the metric the sweep optimizes
    accuracy = accuracy_score(y_test, y_pred)
    print(f'Accuracy: {int(accuracy * 100)}%')
    wandb.log({'accuracy': accuracy})
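With `test_size=0.33`, the held-out set is small, which matters when comparing runs. As a worked check, assuming the Pima dataset's 768 rows, scikit-learn rounds a fractional test size up, giving 254 test rows. Accuracy can therefore only change in steps of 1/254 (about 0.4 percentage points), and each logged value corresponds to a whole number of correct predictions; for example, 0.72047 is 183 of 254. The helper `correct_predictions` is hypothetical.

```python
import math

n_samples = 768   # rows in pima-indians-diabetes.data.csv (assumed)
test_size = 0.33  # same as config_defaults['test_size']

# train_test_split rounds a fractional test size up to a whole number of rows
n_test = math.ceil(test_size * n_samples)
step = 1 / n_test  # smallest possible change in accuracy between two runs


def correct_predictions(accuracy: float, n_test: int) -> int:
    """Recover how many test rows were classified correctly."""
    return round(accuracy * n_test)


print(n_test)                                # 254
print(correct_predictions(0.72047, n_test))  # 183
```

A gap of one or two points between runs is thus a difference of only a handful of test rows on a single split.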

### 3. Run Sweep with Wandb Agent

In [6]:
wandb.agent(sweep_id, train, count=25)

wandb: Agent Starting Run: lydpwtc0 with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 12
wandb: 	subsample: 0.3
Accuracy: 72%
Run summary: accuracy 0.72047


wandb: Agent Starting Run: zx8umsmb with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 3
wandb: 	subsample: 1
Accuracy: 68%
Run summary: accuracy 0.68504


wandb: Agent Starting Run: 95nhef68 with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 6
wandb: 	subsample: 0.5
Accuracy: 73%
Run summary: accuracy 0.73228


wandb: Agent Starting Run: j8konlb5 with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 12
wandb: 	subsample: 0.3
Accuracy: 72%
Run summary: accuracy 0.72047


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: atqd2jsh with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 3
wandb: 	subsample: 0.3
Accuracy: 70%
Run summary: accuracy 0.70472


wandb: Agent Starting Run: 0rkfcutk with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 3
wandb: 	subsample: 1
Accuracy: 74%
Run summary: accuracy 0.74016


wandb: Agent Starting Run: 0qzmy7jo with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 6
wandb: 	subsample: 0.5
Accuracy: 68%
Run summary: accuracy 0.68504


wandb: Agent Starting Run: 6ebkt9xd with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 6
wandb: 	subsample: 1
Accuracy: 73%
Run summary: accuracy 0.73228


wandb: Agent Starting Run: 96q69kwx with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 3
wandb: 	subsample: 0.3
Accuracy: 72%
Run summary: accuracy 0.72047


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: o8wm822f with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 3
wandb: 	subsample: 0.5
Accuracy: 68%
Run summary: accuracy 0.6811


wandb: Agent Starting Run: jogepkr5 with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 9
wandb: 	subsample: 0.3
Accuracy: 72%
Run summary: accuracy 0.72441


wandb: Agent Starting Run: jgal35qt with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 12
wandb: 	subsample: 1
Accuracy: 73%
Run summary: accuracy 0.73622


wandb: Agent Starting Run: vvq22p6d with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 3
wandb: 	subsample: 0.5
Accuracy: 71%
Run summary: accuracy 0.71654


wandb: Agent Starting Run: 3qb2qt83 with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 12
wandb: 	subsample: 0.5
Accuracy: 70%
Run summary: accuracy 0.70866


wandb: Agent Starting Run: suwhze3a with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 9
wandb: 	subsample: 0.5
Accuracy: 73%
Run summary: accuracy 0.73228


wandb: Agent Starting Run: 2gal1jug with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 6
wandb: 	subsample: 1
Accuracy: 73%
Run summary: accuracy 0.73228


wandb: Agent Starting Run: 6q6r2wnw with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 9
wandb: 	subsample: 0.3
Accuracy: 68%
Run summary: accuracy 0.68504


wandb: Agent Starting Run: 2jwft2np with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 12
wandb: 	subsample: 1
Accuracy: 73%
Run summary: accuracy 0.73622


wandb: Agent Starting Run: 4q6grkzf with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 9
wandb: 	subsample: 0.5
Accuracy: 68%
Run summary: accuracy 0.68504


wandb: Agent Starting Run: 6td34b4i with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.05
wandb: 	max_depth: 9
wandb: 	subsample: 0.3
Accuracy: 72%
Run summary: accuracy 0.72441


wandb: Agent Starting Run: 264s1apr with config:
wandb: 	booster: gbtree
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 3
wandb: 	subsample: 0.3
Accuracy: 74%
Run summary: accuracy 0.74409


wandb: Agent Starting Run: l3whuey1 with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 6
wandb: 	subsample: 0.5
Accuracy: 67%
Run summary: accuracy 0.67323


wandb: Agent Starting Run: ckmjcwgi with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 12
wandb: 	subsample: 0.3
Accuracy: 67%
Run summary: accuracy 0.67323


wandb: Agent Starting Run: vyvfvh9k with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.2
wandb: 	max_depth: 9
wandb: 	subsample: 1
Accuracy: 71%
Run summary: accuracy 0.71654


wandb: Agent Starting Run: gqyx75jg with config:
wandb: 	booster: gblinear
wandb: 	learning_rate: 0.1
wandb: 	max_depth: 9
wandb: 	subsample: 0.5
Accuracy: 68%
Run summary: accuracy 0.68504

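To pick the winner from the 25 runs, one option is simply to tabulate the logged results. The sketch below transcribes the configs and accuracies printed by the agent into plain tuples (it makes no W&B API calls), then finds the best run and the mean accuracy per booster. Because random search samples combinations independently, some combinations ran twice (for example `jgal35qt` and `2jwft2np`) and produced identical accuracies. The names `runs`, `best`, and `by_booster` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (run_id, booster, learning_rate, max_depth, subsample, accuracy), copied from the agent log
runs = [
    ("lydpwtc0", "gblinear", 0.2, 12, 0.3, 0.72047),
    ("zx8umsmb", "gblinear", 0.05, 3, 1, 0.68504),
    ("95nhef68", "gbtree", 0.1, 6, 0.5, 0.73228),
    ("j8konlb5", "gbtree", 0.2, 12, 0.3, 0.72047),
    ("atqd2jsh", "gbtree", 0.2, 3, 0.3, 0.70472),
    ("0rkfcutk", "gbtree", 0.1, 3, 1, 0.74016),
    ("0qzmy7jo", "gblinear", 0.05, 6, 0.5, 0.68504),
    ("6ebkt9xd", "gbtree", 0.2, 6, 1, 0.73228),
    ("96q69kwx", "gbtree", 0.05, 3, 0.3, 0.72047),
    ("o8wm822f", "gblinear", 0.1, 3, 0.5, 0.6811),
    ("jogepkr5", "gbtree", 0.05, 9, 0.3, 0.72441),
    ("jgal35qt", "gbtree", 0.2, 12, 1, 0.73622),
    ("vvq22p6d", "gblinear", 0.2, 3, 0.5, 0.71654),
    ("3qb2qt83", "gbtree", 0.1, 12, 0.5, 0.70866),
    ("suwhze3a", "gbtree", 0.05, 9, 0.5, 0.73228),
    ("2gal1jug", "gbtree", 0.05, 6, 1, 0.73228),
    ("6q6r2wnw", "gblinear", 0.05, 9, 0.3, 0.68504),
    ("2jwft2np", "gbtree", 0.2, 12, 1, 0.73622),
    ("4q6grkzf", "gblinear", 0.05, 9, 0.5, 0.68504),
    ("6td34b4i", "gbtree", 0.05, 9, 0.3, 0.72441),
    ("264s1apr", "gbtree", 0.1, 3, 0.3, 0.74409),
    ("l3whuey1", "gblinear", 0.1, 6, 0.5, 0.67323),
    ("ckmjcwgi", "gblinear", 0.1, 12, 0.3, 0.67323),
    ("vyvfvh9k", "gblinear", 0.2, 9, 1, 0.71654),
    ("gqyx75jg", "gblinear", 0.1, 9, 0.5, 0.68504),
]

# Best run by the sweep's metric (goal: maximize accuracy)
best = max(runs, key=lambda r: r[-1])
print("best:", best)

# Mean accuracy per booster
by_booster = defaultdict(list)
for run_id, booster, *_, acc in runs:
    by_booster[booster].append(acc)
for booster, accs in by_booster.items():
    print(booster, len(accs), round(mean(accs), 4))
```

On these numbers, the best run is `264s1apr` (gbtree, learning_rate 0.1, max_depth 3, subsample 0.3, accuracy 0.74409), and the 14 gbtree runs average about 0.728 against about 0.691 for the 11 gblinear runs. These comparisons rest on a single train/test split, so small gaps between the top gbtree runs should be read with caution.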