## General Introduction

In this lab, you will learn how to use the search functionality in the software stack of MASE to implement a Network Architecture Search.

There are 4 tasks in total that you need to finish, plus 1 optional task.

## What is Network Architecture Search?

The design of a network architecture can greatly impact the performance of the model. Consider the following optimization problem:

$$\min_{a \in A} \; L_{val}(w^*(a), a)$$

$$\text{s.t.} \quad w^*(a) = \arg\min_{w} L_{train}(w, a)$$


For an architecture $a$ sampled from a set of architectures $A$, we minimize the validation loss $L_{val}$ obtained when $a$ is parameterized by $w^*(a)$.

Here, $w^*(a)$ is the parameterization that gives the lowest training loss $L_{train}$ for the architecture $a$.

This is the core optimization problem in Network Architecture Search, and several approximations are possible. For instance, we can approximate $L_{train}$ or $L_{val}$, or formulate the outer minimization over $a \in A$ as a reinforcement learning process.
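
To make the two nested problems concrete, here is a minimal toy sketch in plain Python. It is not MASE code: the three "architectures" (`constant`, `through_origin`, `affine`) and all helper names are hypothetical, and each one is trained in closed form so that the inner argmin is exact. The outer loop then picks the architecture with the lowest validation loss.

```python
# Toy bilevel search: the "architectures" are three tiny model families.
# All names here are illustrative assumptions, not part of MASE.

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_through_origin(xs, ys):
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: w * x

def fit_affine(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b = my - w * mx
    return lambda x: w * x + b

# The search space A: each entry maps an architecture name to its trainer.
search_space = {
    "constant": fit_constant,
    "through_origin": fit_through_origin,
    "affine": fit_affine,
}

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [2 * x + 1 for x in train_x]
val_x = [4.0, 5.0]
val_y = [2 * x + 1 for x in val_x]

val_losses = {}
for name, fit in search_space.items():
    model = fit(train_x, train_y)                # inner problem: w*(a)
    val_losses[name] = mse(model, val_x, val_y)  # outer objective: L_val(w*(a), a)

best_arch = min(val_losses, key=val_losses.get)
print(best_arch, val_losses)
```

In a real search the inner problem is a full training run, which is why practical methods approximate it, for example by training for only a few steps.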

## A Handwritten JSC Network

We follow a procedure similar to the one in `lab3` to set up the dataset. Copy and paste the following code snippet into a file named `lab4.py`.



In [108]:
import sys
import logging
import os
from pathlib import Path
from pprint import pprint as pp

# figure out the correct path
machop_path = Path(".").resolve().parent.parent /"machop"
assert machop_path.exists(), "Failed to find machop at: {}".format(machop_path)
sys.path.append(str(machop_path))

from chop.dataset import MaseDataModule, get_dataset_info
from chop.tools.logger import set_logging_verbosity, get_logger

from chop.passes.graph.analysis import (
    report_node_meta_param_analysis_pass,
    profile_statistics_analysis_pass,
)
from chop.passes.graph import (
    add_common_metadata_analysis_pass,
    init_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
)
from chop.tools.get_input import InputGenerator
from chop.ir.graph.mase_graph import MaseGraph

from chop.models import get_model_info, get_model

set_logging_verbosity("info")

logger = get_logger("chop")
logger.setLevel(logging.INFO)

batch_size = 8
model_name = "three_layer_jsc"
dataset_name = "jsc"


data_module = MaseDataModule(
    name=dataset_name,
    batch_size=batch_size,
    model_name=model_name,
    num_workers=0,
)
data_module.prepare_data()
data_module.setup()

model_info = get_model_info(model_name)

input_generator = InputGenerator(
    data_module=data_module,
    model_info=model_info,
    task="cls",
    which_dataloader="train",
)

dummy_in = {"x": next(iter(data_module.train_dataloader()))[0]}


INFO     Set logging level to info


This time we are going to use a slightly different network, so we define it as a PyTorch model. Copy and paste this snippet into `lab4.py` as well.

> **Note**
>
> MASE integrates seamlessly with native PyTorch models.


In [109]:
from torch import nn
from chop.passes.graph.utils import get_parent_name

# define a new model
class JSC_Three_Linear_Layers(nn.Module):
    def __init__(self):
        super(JSC_Three_Linear_Layers, self).__init__()
        self.seq_blocks = nn.Sequential(
            nn.BatchNorm1d(16),  # 0
            nn.ReLU(16),  # 1
            nn.Linear(16, 16),  # linear  2
            nn.Linear(16, 16),  # linear  3
            nn.Linear(16, 5),   # linear  4
            nn.ReLU(5),  # 5
        )

    def forward(self, x):
        return self.seq_blocks(x)


model = JSC_Three_Linear_Layers()

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)

## Model Architecture Modification as a Transformation Pass

Similar to what you have done in `lab2`, one can also implement a change in model architecture as a transformation pass:


In [110]:
def instantiate_linear(in_features, out_features, bias):
    # `bias` is the original layer's bias tensor (or None),
    # while nn.Linear expects a boolean flag
    return nn.Linear(
        in_features=in_features,
        out_features=out_features,
        bias=bias is not None)

def redefine_linear_transform_pass(graph, pass_args=None):
    # note: pop() mutates pass_args, so pass a deep copy if the config is reused
    main_config = pass_args.pop('config')
    default = main_config.pop('default', None)
    if default is None:
        raise ValueError("default value must be provided.")
    for node in graph.fx_graph.nodes:
        # nodes without an entry in the config fall back to `default`
        config = main_config.get(node.name, default)['config']
        name = config.get("name", None)
        if name is not None:
            ori_module = graph.modules[node.target]
            in_features = ori_module.in_features
            out_features = ori_module.out_features
            bias = ori_module.bias
            if name == "output_only":
                out_features = out_features * config["channel_multiplier"]
            elif name == "both":
                in_features = in_features * config["channel_multiplier"]
                out_features = out_features * config["channel_multiplier"]
            elif name == "input_only":
                in_features = in_features * config["channel_multiplier"]
            elif name == "input_andoutput_only":
                # separate multipliers for the input and output widths
                in_features = in_features * config["channel_multiplier1"]
                out_features = out_features * config["channel_multiplier2"]

            new_module = instantiate_linear(in_features, out_features, bias)

            # swap the new layer into its parent module
            parent_name, child_name = get_parent_name(node.target)
            setattr(graph.modules[parent_name], child_name, new_module)
    return graph, {}



pass_config = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "name": "output_only",
        # weight
        "channel_multiplier": 2,
        }
    },
"seq_blocks_3": {
    "config": {
        "name": "both",
        "channel_multiplier": 2,
        }
    },
"seq_blocks_4": {
    "config": {
        "name": "input_only",
        "channel_multiplier": 2,
        }
    },
}

# this performs the architecture transformation based on the config
mg, _ = redefine_linear_transform_pass(
    graph=mg, pass_args={"config": pass_config})


counter:output
counter:both
counter:input


Copy and paste the above code snippet and run your code. The modified network has its linear layers expanded to double their size. However, it is unusual to stack three linear layers consecutively without any non-linear activations in between (do you know why?).

So we are interested in a modified network:


In [111]:
#define a new model
class JSC_Three_Linear_Layers(nn.Module):
    def __init__(self):
        super(JSC_Three_Linear_Layers, self).__init__()
        self.seq_blocks = nn.Sequential(
            nn.BatchNorm1d(16),  # 0
            nn.ReLU(16),  # 1
            nn.Linear(16, 16),  # linear seq_2
            nn.ReLU(16),  # 3
            nn.Linear(16, 16),  # linear seq_4
            nn.ReLU(16),  # 5
            nn.Linear(16, 5),  # linear seq_6
            nn.ReLU(5),  # 7
        )

    def forward(self, x):
        return self.seq_blocks(x)


1. Can you edit your code, so that we can modify the above network to have layers expanded to double their sizes? Note: you will have to change the `ReLU` also.


In [112]:
###Part 1
import copy
model = JSC_Three_Linear_Layers()

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)


pass_config = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "name": "output_only",
        # weight
        "channel_multiplier": 2,
        }
    },

"seq_blocks_4": {
    "config": {
        "name": "both",
        "channel_multiplier": 2,
        }
    },

"seq_blocks_6": {
    "config": {
        "name": "input_only",
        "channel_multiplier": 2,
        }
    },
}

#this performs the architecture transformation based on the config
mg, _ = redefine_linear_transform_pass(
    graph=mg, pass_args={"config": copy.deepcopy(pass_config)})


counter:output
counter:both
counter:input


2. In `lab3`, we implemented a grid search. Can we use the grid search to find the best channel multiplier value?


In [113]:
### Part 2
import copy

# build a search space
multipliers = [1, 2, 3, 4, 5, 6, 7]
search_spaces = []
for d in multipliers:
    pass_config['seq_blocks_2']['config']['channel_multiplier'] = d
    pass_config['seq_blocks_4']['config']['channel_multiplier'] = d
    pass_config['seq_blocks_6']['config']['channel_multiplier'] = d
    # dict.copy() and dict(dict) only perform shallow copies, so the nested
    # "config" dicts would be shared between entries and all end up with the
    # last multiplier; deepcopy takes an independent snapshot of each config
    search_spaces.append(copy.deepcopy(pass_config))
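
The difference between a shallow and a deep copy is easy to check in isolation. The following is a minimal sketch with a cut-down, hypothetical config (`base`) that mirrors the nesting of `pass_config`: the same dictionary is mutated in a loop and stored once with `dict(...)` and once with `copy.deepcopy(...)`.

```python
import copy

# A reduced stand-in for pass_config (illustrative only).
base = {"seq_blocks_2": {"config": {"name": "output_only", "channel_multiplier": 1}}}

shallow_space, deep_space = [], []
for m in [1, 2, 3]:
    base["seq_blocks_2"]["config"]["channel_multiplier"] = m
    shallow_space.append(dict(base))        # copies only the top-level dict
    deep_space.append(copy.deepcopy(base))  # copies the nested dicts as well

shallow_values = [c["seq_blocks_2"]["config"]["channel_multiplier"] for c in shallow_space]
deep_values = [c["seq_blocks_2"]["config"]["channel_multiplier"] for c in deep_space]

print(shallow_values)  # [3, 3, 3]: every entry aliases the same nested dict
print(deep_values)     # [1, 2, 3]: each entry kept its own value
```

With shallow copies, every point in the search space would silently describe the same network.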

In [114]:

import torch
from torchmetrics.classification import MulticlassAccuracy


metric = MulticlassAccuracy(num_classes=5)
num_batchs = 5
# This first loop is basically our search strategy,
# in this case, it is a simple brute force search

recorded_accs = []

for config in search_spaces:
    model = get_model(
        model_name,
        task="cls",
        dataset_info=data_module.dataset_info,
        pretrained=False,
        checkpoint=None)

    # generate the mase graph and initialize node metadata
    mg = MaseGraph(model=model)
    mg, _ = init_metadata_analysis_pass(mg, None)
    mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
    mg, _ = add_software_metadata_analysis_pass(mg, None)
    # apply the config of this search point (not the global pass_config);
    # the pass pops keys from its arguments, so hand it a deep copy
    mg, _ = redefine_linear_transform_pass(
        graph=mg, pass_args={"config": copy.deepcopy(config)})

    # this is the inner loop, which we also call the runner
    j = 0
    accs, losses = [], []
    for inputs in data_module.train_dataloader():
        xs, ys = inputs
        preds = mg.model(xs)
        loss = torch.nn.functional.cross_entropy(preds, ys)
        acc = metric(preds, ys)
        accs.append(acc)
        losses.append(loss)
        if j > num_batchs:
            break
        j += 1
    acc_avg = sum(accs) / len(accs)
    loss_avg = sum(losses) / len(losses)
    recorded_accs.append(acc_avg)
print(recorded_accs)

counter:output
counter:both
counter:input
[tensor(0.1810), tensor(0.1589), tensor(0.2298), tensor(0.1833), tensor(0.2238), tensor(0.1810), tensor(0.2893)]


3. You may have noticed that one problem with the channel multiplier is that it scales all layers uniformly. Ideally, we would like to be able to construct networks like the following:

In [115]:
# define a new model
class JSC_Three_Linear_Layers(nn.Module):
    def __init__(self):
        super(JSC_Three_Linear_Layers, self).__init__()
        self.seq_blocks = nn.Sequential(
            nn.BatchNorm1d(16),
            nn.ReLU(16),
            nn.Linear(16, 32),  # output scaled by 2
            nn.ReLU(32),  # scaled by 2
            nn.Linear(32, 64),  # input scaled by 2 but output scaled by 4
            nn.ReLU(64),  # scaled by 4
            nn.Linear(64, 5),  # scaled by 4
            nn.ReLU(5),
        )

    def forward(self, x):
        return self.seq_blocks(x)

In [116]:

pass_config = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "name": "output_only",
        # weight
        "channel_multiplier": 2,
        }
    },

"seq_blocks_4": {
    "config": {
        "name": "input_andoutput_only",
        # input scaled by 2, output scaled by 4 (16 -> 32 in, 16 -> 64 out)
        "channel_multiplier1": 2,
        "channel_multiplier2": 4,
        }
    },

"seq_blocks_6": {
    "config": {
        "name": "input_only",
        "channel_multiplier": 4,
        }
    },
}

model = JSC_Three_Linear_Layers()

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)
mg, _ = redefine_linear_transform_pass(
    graph=mg, pass_args={"config": copy.deepcopy(pass_config)})


counter:output
counter:output
counter:input
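
When every layer gets its own multipliers, the output width of each linear layer must still match the input width of the next one. A quick way to check a configuration before building the network is to compute the scaled shapes directly. The sketch below is plain Python and independent of MASE; the helper `scaled_linear_shapes` and its arguments are hypothetical names chosen for illustration.

```python
def scaled_linear_shapes(base_shapes, in_mults, out_mults):
    """Scale (in_features, out_features) pairs and check that neighbours agree."""
    shapes = [
        (n_in * m_in, n_out * m_out)
        for (n_in, n_out), m_in, m_out in zip(base_shapes, in_mults, out_mults)
    ]
    for idx, ((_, prev_out), (next_in, _)) in enumerate(zip(shapes, shapes[1:])):
        if prev_out != next_in:
            raise ValueError(
                f"layer {idx} outputs {prev_out} features "
                f"but layer {idx + 1} expects {next_in}"
            )
    return shapes

# The three linear layers of the base network: 16 -> 16 -> 16 -> 5.
base_shapes = [(16, 16), (16, 16), (16, 5)]

# Output x2, then input x2 / output x4, then input x4.
print(scaled_linear_shapes(base_shapes, in_mults=[1, 2, 4], out_mults=[2, 4, 1]))
# [(16, 32), (32, 64), (64, 5)]

# A mismatched choice is rejected before any model is built.
try:
    scaled_linear_shapes(base_shapes, in_mults=[1, 2, 4], out_mults=[2, 2, 1])
except ValueError as err:
    print("invalid config:", err)
```

The same consistency rule constrains the search space: the input multiplier of a layer is not a free choice but is fixed by the output multiplier of the layer before it.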


4. Integrate the search into the `chop` flow, so we can run it from the command line.

## Optional Task (scaling the search to real networks)

We have looked at how to search, at the architecture level, over a simple network built from linear layers. MASE has the following components that you can look at:

- Cifar10 dataset
- VGG, this is a variant used for CIFAR
- TPE-based Search, implemented using Optuna

Can you define a search space (maybe channel dimension) for the VGG network, and use the TPE-search to tune it?
