# NAS for Pizza Steak Sushi

We will use the following additional libraries:
- TorchX (to run training jobs)
- Ax (to define and schedule the search)

We'll tune the following parameters to find a good trade-off between model performance and model size:
- batch_size
- embedding_dim
- num_heads
- num_encoders
- learning_rate
- epochs

The training will be carried out using the [pizza_steak_sushi.py](./pizza_steak_sushi.py) training script. Here is an example of training a model with this script:
```
python pizza_steak_sushi.py --log_path 'logs/0' --batch_size 2 --embedding_dim 128 --num_heads 2 --num_encoders 2 --learning_rate 0.001 --epochs 2 --subset_size 10
```
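
To make the command-line interface concrete, here is a minimal sketch of a parser that accepts the flags used in the command above. It is hypothetical: the flag names come from the example, but the defaults and help texts are assumptions, and the real `pizza_steak_sushi.py` may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser. Flag names mirror the example command;
    # the default values are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description='Train a model on Pizza Steak Sushi')
    parser.add_argument('--log_path', type=str, required=True)
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--embedding_dim', type=int, default=128)
    parser.add_argument('--num_heads', type=int, default=2)
    parser.add_argument('--num_encoders', type=int, default=2)
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--epochs', type=int, default=2)
    parser.add_argument('--subset_size', type=int, default=None)
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args([
    '--log_path', 'logs/0', '--batch_size', '2', '--embedding_dim', '128',
    '--num_heads', '2', '--num_encoders', '2', '--learning_rate', '0.001',
    '--epochs', '2', '--subset_size', '10',
])
print(args)
```

Every value arrives as a string on the command line and is converted by `type=`, which is why the AppDef later in this notebook passes each parameter through `str()`.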

Requirements for execution on cloud:
- [ ] Update TOTAL_TRIALS in [Choose a Generation Strategy](#7-choose-a-generation-strategy)

Cleanup:
- Move GitHub file download code to toolbox

# 1. Prepare Environment

## 1.1. Install Packages

In [25]:
try:
    import ax
    import torchx
    import torcheval
except ImportError:
    print('Installing torchx, torcheval and ax-platform')
    ! pip install torchx ax-platform torcheval 1>/dev/null
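
A variant of this check that covers every required package, including Ax, might look like the sketch below. It uses `importlib.util.find_spec` from the standard library, which looks a module up without importing it. The helper name `missing_packages` and the module-to-package mapping are illustrative assumptions.

```python
import importlib.util


def missing_packages(modules_to_packages):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for module, package in modules_to_packages.items()
        if importlib.util.find_spec(module) is None
    ]


# Import name -> pip package name (the Ax package is published as ax-platform).
required = {'torchx': 'torchx', 'torcheval': 'torcheval', 'ax': 'ax-platform'}

missing = missing_packages(required)
if missing:
    print('Run: pip install ' + ' '.join(missing))
else:
    print('All packages are already installed')
```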

## 1.2. Download Dependencies

In [2]:
from pathlib import Path

remote_files = {
    'pizza_steak_sushi.py': 'https://raw.githubusercontent.com/NareshPS/doi-ml/main/torch/projects/pizza_steak_sushi/pizza_steak_sushi.py',
}

for name, remote_path in remote_files.items():
    if Path(name).exists():
        print(f'File: {name} exists. Skipping Download')
    else:
        ! wget {remote_path}

! ls *.py

File: pizza_steak_sushi.py exists. Skipping Download
pizza_steak_sushi.py
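
If the download logic is later moved into the toolbox, as the cleanup note suggests, a pure-Python helper could replace the `wget` call. The sketch below is one possible shape for it. The function name `download_if_missing` is hypothetical, and the demonstration only exercises the skip path so that it does not need network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def download_if_missing(url: str, destination: Path) -> bool:
    """Download url to destination unless the file already exists.

    Returns True when a download happened and False when it was skipped.
    """
    if destination.exists():
        print(f'File: {destination.name} exists. Skipping download')
        return False
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return True


# Demonstrate the skip path with a placeholder file (no network needed).
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / 'pizza_steak_sushi.py'
    existing.write_text('# placeholder')
    downloaded = download_if_missing('https://example.invalid/unused', existing)
```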


## 1.3. Download Toolbox

In [19]:
toolbox_path = 'toolbox'

if not Path(toolbox_path).exists():
    print(f'Downloading toolbox at {toolbox_path}')
    ! git clone https://github.com/NareshPS/doi-ml-toolbox.git
    ! mv doi-ml-toolbox/torch/toolbox .
    ! rm -rf doi-ml-toolbox
else:
    print(f'Toolbox already exists at {toolbox_path}')

Toolbox already exists at toolbox


# 1. Define a TorchX AppDef

We'll define a TorchX AppDef that passes the search parameters to the training script as command-line arguments.

In [3]:
import torchx

from torchx import specs
from torchx.components import utils
from pathlib import Path

def trainer(
    log_path: str,
    batch_size: int,
    embedding_dim: int,
    num_heads: int,
    num_encoders: int,
    learning_rate: float,
    epochs: int,
    trial_idx: int = -1
) -> specs.AppDef:

    # 1. Configure the location of log files
    if trial_idx >= 0:
        log_path = (Path(log_path) / str(trial_idx)).absolute().as_posix()

    # 2. Define configuration arguments for the training script
    return utils.python(
        '--log_path', log_path,
        '--batch_size', str(batch_size),
        '--embedding_dim', str(embedding_dim),
        '--num_heads', str(num_heads),
        '--num_encoders', str(num_encoders),
        '--learning_rate', str(learning_rate),
        '--epochs', str(epochs),
        '--subset_size', str(10),
        name = 'trainer',
        script = 'pizza_steak_sushi.py',
        image = torchx.version.TORCHX_IMAGE
    )
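
The `trainer` component does two things: it gives each trial its own log directory and it turns the parameters into string arguments. The sketch below reproduces that logic in plain Python, without TorchX, so the result can be inspected directly. The helper name `build_script_args` is hypothetical, and unlike `trainer` it keeps the log path relative instead of calling `absolute()`.

```python
from pathlib import Path


def build_script_args(log_path: str, trial_idx: int = -1, **hyperparameters) -> list:
    # Each trial logs to its own sub-directory, e.g. logs/3 for trial 3.
    if trial_idx >= 0:
        log_path = (Path(log_path) / str(trial_idx)).as_posix()

    args = ['--log_path', str(log_path)]
    # Command-line arguments must be strings, hence the str() conversions.
    for name, value in hyperparameters.items():
        args += [f'--{name}', str(value)]
    return args


example_args = build_script_args('logs', trial_idx=3, batch_size=16, learning_rate=0.001)
print(example_args)
# ['--log_path', 'logs/3', '--batch_size', '16', '--learning_rate', '0.001']
```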

# 2. Setup TorchX Runner

The runner configures the execution environment for the TorchX AppDef.

In [17]:
from ax.runners.torchx import TorchXRunner

# 1. Create the directory where trial logs are written
log_path = Path('logs')
log_path.mkdir(parents=True, exist_ok=True)

# 2. Initialize TorchX Runner
ax_runner = TorchXRunner(
    tracker_base="/tmp",
    component=trainer,
    scheduler="local_cwd",
    component_const_params={"log_path": log_path},
    cfg={},
)

print(f'Ax Runner: {ax_runner} \nLog Path: {log_path}')

Ax Runner: <ax.runners.torchx.TorchXRunner object at 0x14fa322f0> 
Log Path: logs


# 3. Setup the Search Space

The search space defines the parameters to tune, their types, and the values they can take. Parameter types are typically integer, float, or boolean. A `ChoiceParameter` picks from a fixed list of values, and a `RangeParameter` picks from an interval, optionally on a log scale.

In [5]:
from ax.core import ChoiceParameter, ParameterType, RangeParameter, SearchSpace

parameters = [
    ChoiceParameter(
        name="batch_size",
        values=[16, 32, 64],
        parameter_type=ParameterType.INT,
        is_ordered=True,
        sort_values=True,
    ),
    ChoiceParameter(
        name="embedding_dim",
        values=[64, 128, 256, 512],
        parameter_type=ParameterType.INT,
        is_ordered=True,
        sort_values=True
    ),
    ChoiceParameter(
        name="num_heads",
        values=[2, 4, 8],
        parameter_type=ParameterType.INT,
        is_ordered=True,
        sort_values=True
    ),
    RangeParameter(
        name="num_encoders", lower=4, upper=12, parameter_type=ParameterType.INT, log_scale=True
    ),
    RangeParameter(
        name="learning_rate",
        lower=0.0001,
        upper=0.01,
        parameter_type=ParameterType.FLOAT,
        log_scale=True,
    ),

    RangeParameter(name="epochs", lower=4, upper=20, parameter_type=ParameterType.INT),
]

search_space = SearchSpace(parameters=parameters, parameter_constraints=[])

print(f'Search Space: {search_space}')

Search Space: SearchSpace(parameters=[ChoiceParameter(name='batch_size', parameter_type=INT, values=[16, 32, 64], is_ordered=True, sort_values=True), ChoiceParameter(name='embedding_dim', parameter_type=INT, values=[64, 128, 256, 512], is_ordered=True, sort_values=True), RangeParameter(name='num_heads', parameter_type=INT, range=[2, 8], log_scale=True), RangeParameter(name='num_encoders', parameter_type=INT, range=[4, 12], log_scale=True), RangeParameter(name='learning_rate', parameter_type=FLOAT, range=[0.0001, 0.01], log_scale=True), RangeParameter(name='epochs', parameter_type=INT, range=[4, 20])], parameter_constraints=[])
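
To illustrate what the parameter definitions in the cell above mean, the sketch below draws random configurations from the same choices and ranges using only the standard library. It is an illustration of choice, range, and log-scale sampling, not Ax's sampler: Ax uses quasi-random Sobol points and a surrogate model, and the helper names here are hypothetical.

```python
import math
import random


def sample_range(rng, lower, upper, log_scale=False, as_int=False):
    """Draw one value from [lower, upper], uniformly or uniformly in log space."""
    if log_scale:
        value = math.exp(rng.uniform(math.log(lower), math.log(upper)))
    else:
        value = rng.uniform(lower, upper)
    if as_int:
        value = int(round(value))
    # Clamp to guard against floating-point round-off at the bounds.
    return min(max(value, lower), upper)


def sample_configuration(rng):
    return {
        'batch_size': rng.choice([16, 32, 64]),
        'embedding_dim': rng.choice([64, 128, 256, 512]),
        'num_heads': rng.choice([2, 4, 8]),
        'num_encoders': sample_range(rng, 4, 12, log_scale=True, as_int=True),
        'learning_rate': sample_range(rng, 0.0001, 0.01, log_scale=True),
        'epochs': sample_range(rng, 4, 20, as_int=True),
    }


rng = random.Random(0)
print(sample_configuration(rng))
```

Sampling the learning rate on a log scale spends as many draws between 0.0001 and 0.001 as between 0.001 and 0.01, which is usually what is wanted for a parameter that spans orders of magnitude.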


# 4. Setup Metrics

These metrics measure the quality of the trials.

## 4.1. Define a Container Class for the Metrics

In [6]:
from ax.metrics.tensorboard import TensorboardCurveMetric

class SearchMetric(TensorboardCurveMetric):
    @classmethod
    def get_ids_from_trials(cls, trials):
        return {
            trial.index: (Path(log_path) / str(trial.index)).as_posix()
            for trial in trials
        }

    @classmethod
    def is_available_while_running(cls):
        return False

# search_metric = SearchMetric('mnist', 'loss')
# print(f'Search Metric: {search_metric}')
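
`get_ids_from_trials` tells the metric where each trial's TensorBoard logs live. The mapping has to match the per-trial directories created by `trainer`, which appends the trial index to the log path. The sketch below checks the mapping with stand-in trial objects. `SimpleNamespace` is used here only as a hypothetical substitute for Ax trials, which expose an `index` attribute.

```python
from pathlib import Path
from types import SimpleNamespace

log_path = Path('logs')


def ids_from_trials(trials):
    # Same expression as SearchMetric.get_ids_from_trials.
    return {
        trial.index: (log_path / str(trial.index)).as_posix()
        for trial in trials
    }


fake_trials = [SimpleNamespace(index=0), SimpleNamespace(index=1)]
print(ids_from_trials(fake_trials))
# {0: 'logs/0', 1: 'logs/1'}
```

The paths returned here are relative while `trainer` writes absolute paths, so both point to the same directories only when the notebook and the training jobs share a working directory, as they do with the `local_cwd` scheduler.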

## 4.2. Define Search Quality Metrics

We will define the following metrics as the targets of our multi-objective optimization:
- Validation Accuracy
- Number of Model Parameters

In [7]:
val_acc = SearchMetric(
    name='val_acc',
    curve_name='val_acc',
    lower_is_better=False
)

num_model_parameters = SearchMetric(
    name='num_params',
    curve_name='num_params',
    lower_is_better=True
)

print(f'Validation Accuracy: {val_acc}')
print(f'Number of Model Parameters: {num_model_parameters}')

Validation Accuracy: SearchMetric('val_acc')
Number of Model Parameters: SearchMetric('num_params')


In [8]:
# from ax.early_stopping.strategies import PercentileEarlyStoppingStrategy

# percentile_early_stopping_strategy = PercentileEarlyStoppingStrategy(
#     # stop if in bottom 70% of runs at the same progression
#     percentile_threshold=70,
#     # the trial must have passed `min_progression` steps before early stopping is initiated
#     # note that we are using `normalize_progressions`, so this is on a scale of [0, 1]
#     min_progression=0.3,
#     # there must be `min_curves` completed trials and `min_curves` trials reporting data in
#     # order for early stopping to be applicable
#     min_curves=5,
#     # specify, e.g., [0, 1] if the first two trials should never be stopped
#     trial_indices_to_ignore=None,
#     # check for new data every 10 seconds
#     seconds_between_polls=10,
#     normalize_progressions=True,
# )

# print(f'Early Stopping Strategy: {percentile_early_stopping_strategy}')

# 5. Setup Optimization Configuration

We'll set up the Multi-Objective Optimization configuration.

In [27]:
from ax.core import MultiObjective, Objective, ObjectiveThreshold
from ax.core.optimization_config import MultiObjectiveOptimizationConfig

optimization_config = MultiObjectiveOptimizationConfig(
    objective=MultiObjective(
        objectives=[
            Objective(metric=val_acc, minimize=False),
            Objective(metric=num_model_parameters, minimize=True)
        ]
    ),
    objective_thresholds=[
        ObjectiveThreshold(metric=val_acc, bound=0.50, relative=False),
        ObjectiveThreshold(metric=num_model_parameters, bound=2_000_000, relative=False),
    ]
)

print(f'Optimization Configuration: {optimization_config}')

# from ax.core.optimization_config import OptimizationConfig

# opt_config = OptimizationConfig(
#     objective=Objective(
#         metric=val_acc,
#         minimize=False,
#     )
# )

Optimization Configuration: MultiObjectiveOptimizationConfig(objective=MultiObjective(objectives=[Objective(metric_name="val_acc", minimize=False), Objective(metric_name="num_params", minimize=True)]), outcome_constraints=[], objective_thresholds=[ObjectiveThreshold(val_acc >= 0.5), ObjectiveThreshold(num_params <= 2000000)])
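
The two objectives pull in opposite directions, so there is usually no single best trial but a set of trade-offs. The sketch below shows the idea in plain Python: keep the trials that satisfy both thresholds from the configuration above, then keep those that no other trial beats on both objectives. The result values are made up for illustration, and Ax's own handling of thresholds and Pareto fronts is more involved than this filter.

```python
def meets_thresholds(val_acc, num_params, min_acc=0.50, max_params=2_000_000):
    # Thresholds copied from the objective_thresholds defined above.
    return val_acc >= min_acc and num_params <= max_params


def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one.

    Each point is (val_acc, num_params): higher accuracy and fewer parameters win.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (val_acc, num_params) results, for illustration only.
results = [
    (0.62, 900_000),
    (0.70, 1_500_000),
    (0.68, 1_800_000),   # dominated by (0.70, 1_500_000)
    (0.75, 2_500_000),   # too many parameters
    (0.45, 300_000),     # accuracy below the threshold
]

feasible = [r for r in results if meets_thresholds(*r)]
front = pareto_front(feasible)
print(front)
# [(0.62, 900000), (0.7, 1500000)]
```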


# 6. Create the Ax Experiment

In [10]:
from ax.core import Experiment

experiment = Experiment(
    name='torchx_mnist',
    search_space=search_space,
    optimization_config=optimization_config,
    # optimization_config=opt_config,
    runner=ax_runner
)

print(f'Experiment: {experiment}')

Experiment: Experiment(torchx_mnist)


# 7. Choose a Generation Strategy

A generation strategy describes how new trials are generated from the search space. We'll let Ax choose a generation strategy automatically.

In [11]:
from ax.modelbridge.dispatch_utils import choose_generation_strategy

# 1. Configure the total number of trials in the experiment.
# TOTAL_TRIALS = 48
TOTAL_TRIALS = 2

# 2. Choose a generation strategy
gen_strategy = choose_generation_strategy(
    search_space=experiment.search_space,
    optimization_config=experiment.optimization_config,
    num_trials=TOTAL_TRIALS
)

print(f'Generation Strategy: {gen_strategy}')

[INFO 08-03 11:06:42] ax.modelbridge.dispatch_utils: Using Models.MOO since there are more ordered parameters than there are categories for the unordered categorical parameters.
[INFO 08-03 11:06:42] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=6 num_trials=2 use_batch_trials=False
[INFO 08-03 11:06:42] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=5
[INFO 08-03 11:06:42] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=5
[INFO 08-03 11:06:42] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+MOO', steps=[Sobol for 5 trials, MOO for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.


Generation Strategy: GenerationStrategy(name='Sobol+MOO', steps=[Sobol for 5 trials, MOO for subsequent trials])
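
The log output describes a two-step strategy: Sobol for the first 5 trials, MOO afterwards. The sketch below works out which step would generate each trial under that description. It is a simplified model with a hypothetical helper name; Ax's real bookkeeping also accounts for details such as failed trials.

```python
def step_for_trial(trial_index, steps):
    """Return the name of the step that generates the given trial.

    steps is a list of (name, num_trials); num_trials of -1 means "no limit".
    """
    remaining = trial_index
    for name, num_trials in steps:
        if num_trials < 0 or remaining < num_trials:
            return name
        remaining -= num_trials
    raise ValueError('No generation step left for this trial')


# Taken from the log: Sobol for 5 trials, MOO for subsequent trials.
steps = [('Sobol', 5), ('MOO', -1)]

for total_trials in (2, 48):
    used = sorted({step_for_trial(i, steps) for i in range(total_trials)})
    print(f'TOTAL_TRIALS={total_trials}: {used}')
# TOTAL_TRIALS=2: ['Sobol']
# TOTAL_TRIALS=48: ['MOO', 'Sobol']
```

With `TOTAL_TRIALS = 2`, every trial comes from the Sobol step and the MOO model is never reached, which is worth keeping in mind when reading the model-based plots in the evaluation section.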


# 8. Configure a Scheduler

A scheduler controls the optimization loop. It communicates with the backend to launch trials, check their status, and retrieve results.

In [12]:
from ax.service.scheduler import Scheduler, SchedulerOptions

scheduler = Scheduler(
    experiment=experiment,
    generation_strategy=gen_strategy,
    options=SchedulerOptions(total_trials=TOTAL_TRIALS, max_pending_trials=4, logging_level='DEBUG')
)

print(f'Scheduler: {scheduler}')

[INFO 08-03 11:06:42] Scheduler: `Scheduler` requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to `True` on experiment.


Scheduler: Scheduler(experiment=Experiment(torchx_mnist), generation_strategy=GenerationStrategy(name='Sobol+MOO', steps=[Sobol for 5 trials, MOO for subsequent trials]), options=SchedulerOptions(max_pending_trials=4, trial_type=<TrialType.TRIAL: 0>, batch_size=None, total_trials=2, tolerated_trial_failure_rate=0.5, min_failed_trials_for_failure_rate_check=5, log_filepath=None, logging_level='DEBUG', ttl_seconds_for_trials=None, init_seconds_between_polls=1, min_seconds_before_poll=1.0, seconds_between_polls_backoff_factor=1.5, timeout_hours=None, run_trials_in_batches=False, debug_log_run_metadata=False, early_stopping_strategy=None, global_stopping_strategy=None, suppress_storage_errors_after_retries=False))
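
The loop that a scheduler runs can be pictured as: launch trials while there is capacity, poll the running ones, collect the finished ones, repeat. The toy simulation below follows that pattern with `total_trials` and `max_pending_trials` as its only options. It is not Ax's implementation: the function name is hypothetical, and trial durations are random numbers of polls rather than real jobs.

```python
import random


def simulate_scheduler(total_trials, max_pending_trials, seed=0):
    rng = random.Random(seed)
    pending = {}        # trial index -> polls left until the trial finishes
    completed = []      # trial indices in completion order
    next_index = 0
    peak_pending = 0

    while len(completed) < total_trials:
        # 1. Launch new trials while there is spare capacity.
        while next_index < total_trials and len(pending) < max_pending_trials:
            pending[next_index] = rng.randint(1, 3)
            next_index += 1
        peak_pending = max(peak_pending, len(pending))

        # 2. Poll running trials and collect the ones that finished.
        for index in list(pending):
            pending[index] -= 1
            if pending[index] == 0:
                del pending[index]
                completed.append(index)

    return completed, peak_pending


completed, peak_pending = simulate_scheduler(total_trials=10, max_pending_trials=4)
print(f'Completed {len(completed)} trials, at most {peak_pending} running at once')
```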


# 9. Run Trials

In [36]:
! python pizza_steak_sushi.py --log_path 'logs/0' --batch_size 2 --embedding_dim 64 --num_heads 2 --num_encoders 2 --learning_rate 0.001 --epochs 2 --subset_size 10
# scheduler.run_all_trials()


[INFO] Toolbox exists locally. Skipping download!
[INFO] /Users/broxoli/.datasets/pizza_steak_sushi directory exists. Skipping download.
Data Path: /Users/broxoli/.datasets/pizza_steak_sushi
Train Path: /Users/broxoli/.datasets/pizza_steak_sushi/train
Test Path: /Users/broxoli/.datasets/pizza_steak_sushi/test
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[INFO] Logging to path: logs/0
[INFO] /Users/broxoli/.datasets/pizza_steak_sushi directory exists. Skipping download.
Data Path: /Users/broxoli/.datasets/pizza_steak_sushi
Train Path: /Users/broxoli/.datasets/pizza_steak_sushi/train
Test Path: /Users/broxoli/.datasets/pizza_steak_sushi/test
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")

  | Name              | Type           | Params 
------------------------------------------------------
0 | embedding_dropout | Dropout        | 0      
1 | patch_embed

# 10. Evaluating Results

## 10.1. Experiment Summary

In [None]:
from ax.service.utils.report_utils import exp_to_df

df = exp_to_df(experiment)
df

| | trial_index | arm_name | trial_status | generation_method | num_params | val_acc | is_feasible | hidden_size_1 | hidden_size_2 | learning_rate | dropout | batch_size | epochs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0_0 | COMPLETED | Sobol | 23804.0 | 0.909035 | False | 26 | 95 | 0.002439 | 0.249959 | 32 | 1 |
| 1 | 1 | 1_0 | COMPLETED | Sobol | 88323.0 | 0.949558 | False | 103 | 67 | 0.000268 | 0.030802 | 128 | 3 |


## 10.2. Pareto Frontier of Trade-Offs between Validation Accuracy and Model Size

In [None]:
from ax.service.utils.report_utils import _pareto_frontier_scatter_2d_plotly

_pareto_frontier_scatter_2d_plotly(experiment)

## 10.3. Cross-Validation of Surrogate Model Predictions with Actual Outcomes

In [None]:
from ax.modelbridge.cross_validation import compute_diagnostics, cross_validate
from ax.plot.diagnostic import interact_cross_validation_plotly
from ax.utils.notebook.plotting import init_notebook_plotting, render

cv = cross_validate(model=gen_strategy.model)  # The surrogate model is stored on the ``GenerationStrategy``
compute_diagnostics(cv)

interact_cross_validation_plotly(cv)

## 10.4. Contour Plots of Two Inputs and their Impact on the Objectives

### 10.4.1. Contour Plot for Validation Accuracy

In [None]:
from ax.plot.contour import interact_contour_plotly

interact_contour_plotly(model=gen_strategy.model, metric_name="val_acc")

### 10.4.2. Contour Plot for Number of Parameters

In [None]:
interact_contour_plotly(model=gen_strategy.model, metric_name="num_params")