# Bonus: hyperparameter optimization using WandB Sweeps + Optuna

### !!! Warning
To run the code below, you need a WandB account.

## Login to WandB

In [1]:
import wandb
import inspect
from wandb import CommError
import yaml

In [2]:

wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: aaalex-lit. Use `wandb login --relogin` to force relogin


True

In [3]:
PROJECT='diabetes-prediction'

## Create a Launch Queue


In [4]:
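config = {"label": "hyperparams-finetune-optuna"}
api = wandb.Api()
try:
    # Create a project-scoped queue that runs jobs as local processes
    queue = api.create_run_queue(
        name="diabetes-prediction-queue",
        type="local-process",
        access="project",
        config=config,
    )
except CommError as e:
    # Raised, for example, with a 409 Conflict when the queue already exists
    print(e.message)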

409 response executing GraphQL.
{"errors":[{"message":"project already has queue with name diabetes-prediction-queue","path":["createRunQueue"]}],"data":{"createRunQueue":null}}
wandb: ERROR Error while calling W&B API: project already has queue with name diabetes-prediction-queue (<Response [409]>)


project already has queue with name diabetes-prediction-queue (Error 409: Conflict)
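The 409 above only means the queue was created on an earlier run of the notebook. A minimal sketch of a "create if missing" wrapper follows; `ensure_queue`, `QueueConflict`, and the in-memory `fake_create` backend are hypothetical names used for illustration, and the matched message text is taken from the response shown above.

```python
class QueueConflict(Exception):
    """Hypothetical stand-in for the 409 error raised by the real API."""


def ensure_queue(create, name):
    """Return 'created' or 'exists'; re-raise any other failure."""
    try:
        create(name)
    except Exception as err:
        # Message text copied from the 409 response in the output above.
        if "already has queue" in str(err):
            return "exists"
        raise
    return "created"


# Fake in-memory backend, used only to exercise the helper.
existing = set()


def fake_create(name):
    if name in existing:
        raise QueueConflict(f"project already has queue with name {name}")
    existing.add(name)


first = ensure_queue(fake_create, "diabetes-prediction-queue")
second = ensure_queue(fake_create, "diabetes-prediction-queue")
print(first, second)
```

With W&B, `create` could be a small function that wraps the `api.create_run_queue(...)` call from the cell above.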


## Create the training job

In [6]:
!wandb job create -p $PROJECT -n "xgb-classifier-diabetes" code ./ -E "xgb_job.py"

wandb: Creating launch job of type: code...
wandb: Adding directory to artifact (./.)... Done. 0.1s
wandb: W&B sync reduced upload amount by 6.0%
wandb: Updated job: aaalex-lit/diabetes-prediction/xgb-classifier-diabetes:v6, with alias: latest
wandb: View all jobs in project 'diabetes-prediction' here: https://wandb.ai/aaalex-lit/diabetes-prediction/jobs


## Create a function to optimize


In [7]:
import optuna 

def objective(trial):
    # Define search spaces for hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 10, 300)
    max_depth = trial.suggest_int('max_depth', 1, 20)
    min_child_weight = trial.suggest_float('min_child_weight', 0, 1)
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1, log=True)

    print(f"{n_estimators=} {max_depth=} {min_child_weight=} {learning_rate=}")

    # !! don't actually train, return -1
    return -1    
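The `log=True` flag on `learning_rate` asks for values that are uniform in log space, so each order of magnitude between `1e-5` and `1` is about equally likely under random sampling. The sketch below illustrates that idea with the standard library only; `sample_log_uniform` is a hypothetical helper and not Optuna's actual sampler.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-5, 1.0, rng) for _ in range(10_000)]

# The range spans five decades, so roughly one fifth of the draws
# should land in the smallest one, [1e-5, 1e-4).
share_smallest_decade = sum(s < 1e-4 for s in samples) / len(samples)
print(f"share below 1e-4: {share_smallest_decade:.2f}")
```

Without `log=True`, about 99.99% of the draws would fall above `1e-4`, which is why a log scale is the usual choice for learning rates.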

### Test the objective function

In [8]:
import optuna 
# Create an Optuna study
study = optuna.create_study(direction="maximize")

# Start the optimization process
study.optimize(objective, n_trials=2)


[I 2023-11-11 07:22:49,347] A new study created in memory with name: no-name-05459e36-9a5b-409d-a810-a00a592858bf
[I 2023-11-11 07:22:49,349] Trial 0 finished with value: -1.0 and parameters: {'n_estimators': 170, 'max_depth': 16, 'min_child_weight': 0.5107981357782272, 'learning_rate': 0.0023627293894579525}. Best is trial 0 with value: -1.0.
[I 2023-11-11 07:22:49,350] Trial 1 finished with value: -1.0 and parameters: {'n_estimators': 247, 'max_depth': 7, 'min_child_weight': 0.09885542063191, 'learning_rate': 0.7383983999201801}. Best is trial 0 with value: -1.0.


n_estimators=170 max_depth=16 min_child_weight=0.5107981357782272 learning_rate=0.0023627293894579525
n_estimators=247 max_depth=7 min_child_weight=0.09885542063191 learning_rate=0.7383983999201801


## Save the configuration to W&B as an artifact.

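Write the `objective` function to its own file and log that file to W&B as an artifact, so that the sweep configuration can point the scheduler at the search logic.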

In [5]:
ARTIFACT_FILENAME = "optuna_diabetes_prediction.py"
ARTIFACT_NAME = "optuna-config-diabetes-prediction"


In [8]:

# Write the function to its own file
function_lines = inspect.getsource(objective)
with open(ARTIFACT_FILENAME, 'w') as f:
    f.write(function_lines)

# Create and log the artifact to W&B
run = wandb.init(project=PROJECT)
artifact = run.log_artifact(ARTIFACT_FILENAME, name=ARTIFACT_NAME, type='optuna')
run.finish()



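Since the saved file has to work on its own, outside this notebook, it can help to confirm that such a file imports cleanly and exposes `objective`. The sketch below uses only the standard library; `MidpointTrial`, `load_objective`, and the temporary file name are hypothetical stand-ins, not part of Optuna or W&B, and the source string is a shortened copy of the function defined earlier.

```python
import importlib.util
import tempfile
from pathlib import Path

# Shortened copy of the objective, standing in for the saved artifact file.
SOURCE = '''
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 300)
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1, log=True)
    return -1
'''


class MidpointTrial:
    """Hypothetical trial stub that always suggests the middle of the range."""

    def __init__(self):
        self.params = {}

    def suggest_int(self, name, low, high, **kwargs):
        self.params[name] = (low + high) // 2
        return self.params[name]

    def suggest_float(self, name, low, high, **kwargs):
        self.params[name] = (low + high) / 2
        return self.params[name]


def load_objective(path):
    """Import the file at `path` as a module and return its `objective`."""
    spec = importlib.util.spec_from_file_location("saved_objective", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.objective


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "objective_copy.py"
    path.write_text(SOURCE)
    loaded = load_objective(path)
    trial = MidpointTrial()
    result = loaded(trial)

print(result, trial.params)
```

A file that relies on names defined elsewhere in the notebook would fail at the `exec_module` step or when called, which is the kind of problem this check is meant to surface early.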

### The following way of creating the scheduler fails:

In [21]:
!wandb job create --project diabetes-prediction --name "optuna-scheduler" git https://github.com/wandb/launch-jobs --entry-point "jobs/sweep_schedulers/optuna_scheduler/optuna_scheduler.py"

wandb: Creating launch job of type: git...
wandb: ERROR Could not find requirements.txt file in git repo at https://github.com/wandb/requirements.txt or parent directories.
wandb: ERROR Job creation failed


To work around this, I copied the file from https://github.com/wandb/launch-jobs/blob/main/jobs/sweep_schedulers/optuna_scheduler/optuna_wandb.py into this project and created the job from the local code instead:

In [11]:
!wandb job create --project diabetes-prediction --name "optuna-scheduler" code ./ -E "optuna_scheduler.py"

wandb: Creating launch job of type: code...
wandb: Adding directory to artifact (./.)... Done. 0.2s
wandb: W&B sync reduced upload amount by 99.2%
wandb: Created job: aaalex-lit/diabetes-prediction/optuna-scheduler:v0, with alias: latest
wandb: View all jobs in project 'diabetes-prediction' here: https://wandb.ai/aaalex-lit/diabetes-prediction/jobs


In [4]:
!wandb job create --project diabetes-prediction --name "optuna-scheduler" git https://github.com/aaalexlit/ml_zoomcamp_midterm_cdc_diabetes \
     --entry-point "optuna-wandb-sweeps-hyperparameter-tuning/optuna_scheduler.py"

wandb: Creating launch job of type: git...
wandb: Using requirements.txt in /
wandb: Updated job: aaalex-lit/diabetes-prediction/optuna-scheduler:v2, with alias: latest
wandb: View all jobs in project 'diabetes-prediction' here: https://wandb.ai/aaalex-lit/diabetes-prediction/jobs


## Define a sweep configuration

In [13]:
config = {
    "metric": {"name": "validation_0-custom_recall_score", "goal": "maximize"},
    "run_cap": 4,
    "job": "aaalex-lit/diabetes-prediction/xgb-classifier-diabetes:latest",
    "scheduler": {
        "job": "aaalex-lit/diabetes-prediction/optuna-scheduler:latest",
        "num_workers": 2,
        "settings": {
            "optuna_source": f"{PROJECT}/{ARTIFACT_NAME}:latest",
            "optuna_source_filename": ARTIFACT_FILENAME,
        }
    },
}

# Write the config to a YAML file; the context manager closes the file handle
config_filename = "sweep-config.yaml"
with open(config_filename, "w") as f:
    yaml.dump(config, f)
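A typo in a key name is easy to miss until the sweep is launched. As a rough pre-flight check, the sketch below looks only for the keys this notebook uses; `check_sweep_config` is a hypothetical helper and does not reflect W&B's full sweep schema.

```python
def check_sweep_config(config):
    """Return a list of problems found; an empty list means none were spotted."""
    problems = []
    for key in ("metric", "job", "scheduler"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    goal = config.get("metric", {}).get("goal")
    if goal not in ("maximize", "minimize"):
        problems.append(f"unexpected metric goal: {goal!r}")

    settings = config.get("scheduler", {}).get("settings", {})
    for key in ("optuna_source", "optuna_source_filename"):
        if not settings.get(key):
            problems.append(f"missing scheduler setting: {key}")
    return problems


# Literal copy of the configuration above, so the block runs on its own.
example = {
    "metric": {"name": "validation_0-custom_recall_score", "goal": "maximize"},
    "run_cap": 4,
    "job": "aaalex-lit/diabetes-prediction/xgb-classifier-diabetes:latest",
    "scheduler": {
        "job": "aaalex-lit/diabetes-prediction/optuna-scheduler:latest",
        "num_workers": 2,
        "settings": {
            "optuna_source": "diabetes-prediction/optuna-config-diabetes-prediction:latest",
            "optuna_source_filename": "optuna_diabetes_prediction.py",
        },
    },
}

problems = check_sweep_config(example)
print(problems or "no problems found")
```

Calling `check_sweep_config(config)` just before writing the YAML file would flag a missing `optuna_source_filename`, for instance, before anything reaches the queue.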

## Launch the agent

Run the agent from the CLI:

```shell
wandb launch-agent -q diabetes-prediction-queue
```

## Launch the sweep

In [5]:
! wandb launch-sweep sweep-config.yaml -e aaalex-lit -p $PROJECT -q diabetes-prediction-queue

wandb:   2 of 2 files downloaded.
wandb: - 0.001 MB of 0.001 MB uploaded
wandb: launch: Launching run into aaalex-lit/diabetes-prediction
wandb: Created sweep with ID: jypdy9s9
wandb: View sweep at: https://wandb.ai/aaalex-lit/diabetes-prediction/sweeps/jypdy9s9
wandb: Scheduler added to launch queue (diabetes-prediction-queue)
