# Installing MASE (again)

Run the block below to install MASE in the current Colab runtime.

In [26]:
# git_token = "YOUR_GIT_TOKEN"
# short_code = "YOUR_SHORT_CODE"

# # Check the current python version (It should be using Python 3.10) and update pip to the latest version.
# !python --version
# !python -m pip install --user --upgrade pip

# # Clone MASE from your branch (the branch must already exist)
# !git clone -b lab1_{short_code} https://{git_token}@github.com/DeepWok/mase.git

# # Install requirements
# !python -m pip install -r ./mase/machop/requirements.txt

# Change working directory to machop
%cd ../../../mase/machop
!./ch --help


[Errno 2] No such file or directory: '../../../mase/machop'
/home/wfp23/ADL/mase/machop


usage: ch [--config PATH] [--task TASK] [--load PATH] [--load-type]
          [--batch-size NUM] [--debug] [--log-level]
          [--report-to {wandb,tensorboard}] [--seed NUM] [--quant-config TOML]
          [--training-optimizer TYPE] [--trainer-precision TYPE]
          [--learning-rate NUM] [--weight-decay NUM] [--max-epochs NUM]
          [--max-steps NUM] [--accumulate-grad-batches NUM]
          [--log-every-n-steps NUM] [--cpu NUM] [--gpu NUM] [--nodes NUM]
          [--accelerator TYPE] [--strategy TYPE] [--auto-requeue]
          [--github-ci] [--disable-dataset-cache] [--target STR]
          [--num-targets NUM] [--pretrained] [--max-token-len NUM]
          [--project-dir DIR] [--project NAME] [-h] [-V] [--info [TYPE]]
          action [model] [dataset]

Chop is a simple utility, part of the MASE tookit, to train, test and
transform (i.e. prune or quantise) a supported model.

main arguments:
  action                action to perform. One of
                        (train|

In [27]:
!ch --help

/bin/bash: ch: command not found


# General introduction

In this lab, you will learn how to use the search functionality in the software stack of MASE.

There are four tasks in total for you to complete.

# Writing a search using MaseGraph Transforms

In this section, we look at how the current search function in MASE is constructed. It relies on four components:

- MaseGraph: the graph representation of the model, which you should already have created in the preceding lab.
- Search space: the set of options available to the search.
- Search strategy: an implementation of a search algorithm.
- Runner: the component that runs training, evaluation, or both, and produces a quality metric.

Analyzing these components shows how the existing search function in MASE works and how effective it is.
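
To make the division of labour concrete, here is a minimal, library-free sketch of how a search space, a strategy and a runner fit together. None of the names below (`toy_search_space`, `toy_runner`, `brute_force_strategy`) are part of MASE, and the runner returns a made-up score instead of evaluating a MaseGraph.

```python
from itertools import product

# Hypothetical search space: every combination of two bit-width choices.
toy_search_space = [
    {"data_in_width": d, "weight_width": w}
    for d, w in product([16, 8, 4], [16, 8, 4])
]


def toy_runner(config):
    """Stand-in for a runner: maps a configuration to a quality metric.

    A real runner would evaluate the transformed model; this one simply
    rewards narrower widths so that the example is self-contained.
    """
    return -(config["data_in_width"] + config["weight_width"])


def brute_force_strategy(search_space, runner):
    """Stand-in for a strategy: try every configuration and keep the best."""
    best_config, best_score = None, float("-inf")
    for config in search_space:
        score = runner(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = brute_force_strategy(toy_search_space, toy_runner)
print(best_config, best_score)
```

Because the strategy only sees the search space and the runner through their interfaces, either one can be swapped out without touching the other two.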

# Turning your network into a graph

We follow a procedure similar to the one in lab 2 to produce a MaseGraph from your JSC model. Note that the cell below sets `pretrained=False` and `checkpoint=None`, so it builds the model without loading trained weights:

In [28]:
import sys
import logging
import os
from pathlib import Path
from pprint import pprint as pp

# # figure out the correct path
# machop_path = Path(".").resolve().parent.parent /"machop"
# assert machop_path.exists(), "Failed to find machop at: {}".format(machop_path)
# sys.path.append(str(machop_path))

from chop.dataset import MaseDataModule, get_dataset_info
from chop.tools.logger import set_logging_verbosity

from chop.passes.graph.analysis import (
    report_node_meta_param_analysis_pass,
    profile_statistics_analysis_pass,
)
from chop.passes.graph import (
    add_common_metadata_analysis_pass,
    init_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
)
from chop.tools.get_input import InputGenerator
from chop.ir.graph.mase_graph import MaseGraph

from chop.models import get_model_info, get_model




set_logging_verbosity("info")

batch_size = 8
model_name = "jsc-tiny"
dataset_name = "jsc"


data_module = MaseDataModule(
    name=dataset_name,
    batch_size=batch_size,
    model_name=model_name,
    num_workers=0,
    # custom_dataset_cache_path="../../chop/dataset"
)
data_module.prepare_data()
data_module.setup()

model_info = get_model_info(model_name)
model = get_model(
    model_name,
    task="cls",
    dataset_info=data_module.dataset_info,
    pretrained=False,
    checkpoint=None,
)

input_generator = InputGenerator(
    data_module=data_module,
    model_info=model_info,
    task="cls",
    which_dataloader="train",
)

dummy_in = next(iter(input_generator))
_ = model(**dummy_in)

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)

INFO     Set logging level to info


# Defining a search space

Starting from the `pass_args` template used previously, the following code generates a search space by combining different precision configurations for the weights and the input data.

In [31]:
pass_args = {
    "by": "type",
    "default": {"config": {"name": None}},
    "linear": {
        "config": {
            "name": "integer",
            # data
            "data_in_width": 8,
            "data_in_frac_width": 4,
            # weight
            "weight_width": 8,
            "weight_frac_width": 4,
            # bias
            "bias_width": 8,
            "bias_frac_width": 4,
        }
    },
}

import copy
# build a search space
data_in_frac_widths = [(16, 8), (8, 6), (8, 4), (4, 2)]
w_in_frac_widths = [(16, 8), (8, 6), (8, 4), (4, 2)]
search_spaces = []
for d_config in data_in_frac_widths:
    for w_config in w_in_frac_widths:
        pass_args['linear']['config']['data_in_width'] = d_config[0]
        pass_args['linear']['config']['data_in_frac_width'] = d_config[1]
        pass_args['linear']['config']['weight_width'] = w_config[0]
        pass_args['linear']['config']['weight_frac_width'] = w_config[1]
        # dict.copy() and dict(d) only make shallow copies, so the nested
        # "config" dict would be shared; plain assignment (a = b) never copies at all
        search_spaces.append(copy.deepcopy(pass_args))
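
The deep copy matters because `pass_args` contains nested dictionaries. The standalone sketch below shows what goes wrong with a shallow copy, and uses `itertools.product` as an alternative to the nested loops. The `template` dictionary is a cut-down, illustrative stand-in for `pass_args`, not the real configuration.

```python
import copy
from itertools import product

# Cut-down stand-in for pass_args; only the nesting matters here.
template = {
    "by": "type",
    "linear": {"config": {"data_in_width": 8, "weight_width": 8}},
}
widths = [16, 8, 4]

shallow, deep = [], []
for d_width, w_width in product(widths, widths):
    template["linear"]["config"]["data_in_width"] = d_width
    template["linear"]["config"]["weight_width"] = w_width
    shallow.append(template.copy())       # copies the outer dict only
    deep.append(copy.deepcopy(template))  # copies the nested dicts too


def distinct_settings(configs):
    """Set of (data width, weight width) pairs found in a list of configs."""
    return {
        (c["linear"]["config"]["data_in_width"], c["linear"]["config"]["weight_width"])
        for c in configs
    }


# Every shallow copy shares the same inner dict, so all entries show the
# values written in the last iteration; the deep copies stay distinct.
shallow_distinct = distinct_settings(shallow)
deep_distinct = distinct_settings(deep)
print(len(shallow_distinct), len(deep_distinct))
```

With the shallow copies, the search would evaluate the same configuration nine times without raising any error, which is why this mistake is easy to miss.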

## Defining a search strategy and a runner

The code below consists of two nested `for` loops. The outer loop is a straightforward brute-force search that iterates through the search space defined above.

The inner loop draws samples from the training data loader and uses them to compute accuracy and loss values, which can serve as quality metrics for each configuration.


In [36]:
pass_args = {
    "by": "type",
    "default": {"config": {"name": None}},
    "linear": {
        "config": {
            "name": "integer",
            # data
            "data_in_width": 8,
            "data_in_frac_width": 4,
            # weight
            "weight_width": 8,
            "weight_frac_width": 4,
            # bias
            "bias_width": 8,
            "bias_frac_width": 4,
        }
    },
}

# grid search
import torch
from torchmetrics.classification import MulticlassAccuracy
import time
import subprocess
import psutil
import numpy as np
import matplotlib.pyplot as plt

from chop.passes.graph.transforms import (
    quantize_transform_pass,
    summarize_quantization_analysis_pass,
)

mg, _ = init_metadata_analysis_pass(mg, None)
mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
mg, _ = add_software_metadata_analysis_pass(mg, None)

metric = MulticlassAccuracy(num_classes=5)
num_batchs = 5

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

def get_gpu_power_usage():
    try:
        smi_output = subprocess.check_output(['nvidia-smi', '--query-gpu=power.draw', '--format=csv,noheader,nounits']).decode().strip()
        power_usage = [float(x) for x in smi_output.split('\n')]  # power usage in watts
        return power_usage, True
    except Exception as e:
        print(f"{e}\nNo GPU found. Monitoring CPU usuage only.")
        return [], False

def get_cpu_utilization(): 
    try:
        cpu_utilization = psutil.cpu_percent(interval=None)
        return cpu_utilization, True
    except Exception as e:
        print(f"{e}\nNo CPU found.")
        return [], False


def plot_metric_search_spaces(metric, search_spaces):
    """
    Plots a 3D bar chart for a given metric across the provided search spaces.
    
    Parameters:
    - metric: str, name of the metric to plot ('accuracies', 'latencies', 'cpu_usages', or 'gpu_usages')
    - search_spaces: list of dicts, each containing configuration and corresponding metrics
    """
    # Extract the metric values for each configuration
    metric_values = [config['config'][metric] for config in search_spaces]

    # Reshape the metric values to fit a 4x4 grid (since we have 16 configurations)
    grid_shape = (4, 4)
    metric_values = np.array(metric_values).reshape(grid_shape)

    # Create the plot
    fig = plt.figure(figsize=(10, 8))
    ax = fig.add_subplot(111, projection='3d')

    # Create grid for plotting
    x, y = np.meshgrid(np.arange(grid_shape[0]), np.arange(grid_shape[1]))
    x = x.flatten()
    y = y.flatten()
    z = np.zeros(grid_shape).flatten()

    # Bar width and depth
    dx = dy = 0.5

    # Plotting the metric as a 3D bar chart
    ax.bar3d(x, y, z, dx, dy, metric_values.flatten(), shade=True)

    # Set plot labels and title
    ax.set_title(f'{metric.capitalize()} across Configurations')
    ax.set_xlabel('Data in Frac Widths')
    ax.set_ylabel('Weights in Frac Widths')
    ax.set_zlabel(metric.capitalize())

    # Show the plot
    plt.show()


import copy


def additional_metrics(mg, plot=True):
    """Brute-force search that records accuracy, loss, latency and power per configuration."""
    # Check for GPU / CPU
    _, gpu_found = get_gpu_power_usage()
    _, cpu_found = get_cpu_utilization()

    cpu_tdp = 25  # Assumed Thermal Design Power of the CPU, in watts
    results = []  # One {"config": {...metrics...}} entry per configuration, as the plot helper expects

    for config in search_spaces:
        # mg is a parameter here: assigning to a global name inside a function
        # makes it local and raises UnboundLocalError. The deep copy is
        # defensive, in case the pass modifies the dictionary it receives.
        mg, _ = quantize_transform_pass(mg, copy.deepcopy(config))
        j = 0
        accs, losses, latencies = [], [], []
        gpu_power_usages, cpu_utilizations = [], []
        for inputs in data_module.train_dataloader():
            xs, ys = inputs

            # Reset the CPU utilization counter so the next reading covers this batch
            get_cpu_utilization()

            if gpu_found:
                # Measure GPU power draw before prediction and start the GPU timer
                gpu_power_before = sum(get_gpu_power_usage()[0])
                start_gpu = torch.cuda.Event(enable_timing=True)
                end_gpu = torch.cuda.Event(enable_timing=True)
                start_gpu.record()

            start_time = time.time()
            preds = mg.model(xs)  # Model prediction
            end_time = time.time()

            # GPU latency is measured differently to CPU latency
            if gpu_found:
                end_gpu.record()
                torch.cuda.synchronize()  # Wait for GPU operations to finish
                latencies.append(start_gpu.elapsed_time(end_gpu))  # milliseconds
                # Change in instantaneous power draw across the prediction, in W
                gpu_power_after = sum(get_gpu_power_usage()[0])
                gpu_power_usages.append(gpu_power_after - gpu_power_before)
            else:
                latencies.append((end_time - start_time) * 1000)  # seconds to milliseconds

            if cpu_found:
                # CPU utilization (%) since the reset above
                cpu_utilizations.append(get_cpu_utilization()[0])

            loss = torch.nn.functional.cross_entropy(preds, ys)
            acc = metric(preds, ys)
            # .item() yields plain floats, detached from the autograd graph
            accs.append(acc.item())
            losses.append(loss.item())

            if j > num_batchs:
                break
            j += 1

        # Per-configuration averages over the evaluated batches
        metrics = {
            "accuracies": float(np.mean(accs)),
            "losses": float(np.mean(losses)),
            "latencies": float(np.mean(latencies)),
        }
        if gpu_found:
            metrics["gpu_usages"] = float(np.mean(gpu_power_usages))
        if cpu_found:
            metrics["cpu_utilizations"] = float(np.mean(cpu_utilizations))
            # Estimated CPU power in W: utilization fraction times the assumed TDP
            metrics["cpu_usages"] = metrics["cpu_utilizations"] / 100 * cpu_tdp
        results.append({"config": metrics})

    def mean_of(name):
        # Average of one metric across all configurations
        return np.mean([r["config"][name] for r in results])

    print(f"Average Accuracy per Batch: {mean_of('accuracies'):.4g}")
    print(f"Average Loss per Batch: {mean_of('losses'):.4g}")
    print(f"Average Latency per Batch: {mean_of('latencies'):.4g} milliseconds")
    if gpu_found:
        print(f"Average GPU Power Usage per Batch: {mean_of('gpu_usages'):.4g} W")
    if cpu_found:
        print(f"Average CPU Utilization per Batch: {mean_of('cpu_utilizations'):.4g}%")
        print(f"Average CPU Power Usage per Batch: {mean_of('cpu_usages'):.4g} W")

    if plot:
        # Plot the recorded per-configuration metrics, not the raw search space
        plot_metric_search_spaces('accuracies', results)
        plot_metric_search_spaces('latencies', results)
        if gpu_found:
            plot_metric_search_spaces('gpu_usages', results)
        if cpu_found:
            plot_metric_search_spaces('cpu_usages', results)

    return results

search_results = additional_metrics(mg)

[Errno 2] No such file or directory: 'nvidia-smi'
No GPU found. Monitoring CPU usuage only.

In [37]:

import copy

import torch
from torchmetrics.classification import MulticlassAccuracy

mg, _ = init_metadata_analysis_pass(mg, None)
mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
mg, _ = add_software_metadata_analysis_pass(mg, None)

metric = MulticlassAccuracy(num_classes=5)
num_batchs = 5

# The outer loop is our search strategy: a simple brute-force search
# that visits every configuration in the search space.
recorded_accs = []
for i, config in enumerate(search_spaces):
    # Pass a deep copy so that search_spaces stays intact if the pass modifies
    # its arguments; a KeyError: 'by' when this cell is run a second time is a
    # symptom of that. Rebuild search_spaces first if it has already been altered.
    mg, _ = quantize_transform_pass(mg, copy.deepcopy(config))
    j = 0

    # The inner loop is the runner: it evaluates the quantized model on a few
    # training batches and produces the quality metrics.
    acc_avg, loss_avg = 0, 0
    accs, losses = [], []
    for inputs in data_module.train_dataloader():
        xs, ys = inputs
        preds = mg.model(xs)
        loss = torch.nn.functional.cross_entropy(preds, ys)
        acc = metric(preds, ys)
        accs.append(acc)
        losses.append(loss)
        if j > num_batchs:
            break
        j += 1
    acc_avg = sum(accs) / len(accs)
    loss_avg = sum(losses) / len(losses)
    recorded_accs.append(acc_avg)

We now have the following tasks for you:

1. Explore additional metrics that can serve as quality metrics for the search process. For example, you can consider metrics such as latency, model size, or the number of FLOPs (floating-point operations) involved in the model.

2. Implement some of these additional metrics and attempt to combine them with the accuracy or loss quality metric. It's important to note that in this particular case, accuracy and loss actually serve as the same quality metric (do you know why?).
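
One way to combine several quality metrics, offered here only as a sketch, is to keep the configurations that are not dominated on any metric (a Pareto front) instead of collapsing everything into one number. The `candidates` values below are placeholders chosen for illustration, not measurements from this lab, and `dominates` and `pareto_front` are hypothetical helpers.

```python
def dominates(a, b):
    """True if a is no worse than b on both metrics and strictly better on one.

    Higher accuracy is better; lower latency is better.
    """
    no_worse = a["accuracy"] >= b["accuracy"] and a["latency_ms"] <= b["latency_ms"]
    better = a["accuracy"] > b["accuracy"] or a["latency_ms"] < b["latency_ms"]
    return no_worse and better


def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Placeholder numbers for illustration only.
candidates = [
    {"name": "A", "accuracy": 0.70, "latency_ms": 2.0},
    {"name": "B", "accuracy": 0.65, "latency_ms": 1.0},
    {"name": "C", "accuracy": 0.60, "latency_ms": 1.5},  # dominated by B
    {"name": "D", "accuracy": 0.70, "latency_ms": 2.5},  # dominated by A
]

front_names = [c["name"] for c in pareto_front(candidates)]
print(front_names)
```

A weighted sum of the metrics is a simpler alternative, but it requires choosing the weights and the units up front; the Pareto view defers that choice.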



# The search command in the MASE flow

The search flow implemented in MASE is very similar to the one you have just constructed manually. The overall flow is implemented in [search.py](../../machop/chop/actions/search/search.py), and the following bullet points give you pointers into the code base.

- MaseGraph: this is the [MaseGraph](../../machop/chop/passes/graph/mase_graph.py) that you used in lab 2.
- Search space: the base class is implemented in [base.py](../../machop/chop/actions/search/search_space/base.py); the same folder contains a range of supported search spaces.
- Search strategy: similarly, there is a base class [definition](../../machop/chop/actions/search/strategies/base.py), and the different strategies are defined in the same folder.
- Runner: different [runners](../../machop/chop/actions/search/strategies/runners) produce different metrics; they may also use `transforms` to help compute certain search metrics.

This lets you execute the search through the MASE command-line interface. Remember to change the name after the `--load` option.


In [13]:
!./ch search --config configs/examples/jsc_toy_by_type.toml --load your_pre_trained_ckpt

usage: ch [--config PATH] [--task TASK] [--load PATH] [--load-type]
          [--batch-size NUM] [--debug] [--log-level]
          [--report-to {wandb,tensorboard}] [--seed NUM] [--quant-config TOML]
          [--training-optimizer TYPE] [--trainer-precision TYPE]
          [--learning-rate NUM] [--weight-decay NUM] [--max-epochs NUM]
          [--max-steps NUM] [--accumulate-grad-batches NUM]
          [--log-every-n-steps NUM] [--cpu NUM] [--gpu NUM] [--nodes NUM]
          [--accelerator TYPE] [--strategy TYPE] [--auto-requeue]
          [--github-ci] [--disable-dataset-cache] [--target STR]
          [--num-targets NUM] [--pretrained] [--max-token-len NUM]
          [--project-dir DIR] [--project NAME] [-h] [-V] [--info [TYPE]]
          action [model] [dataset]
ch: error: argument --load: file or directory not found


In this scenario, the search functionality is specified in the `toml` configuration file rather than via command-line inputs. This approach is adopted due to the multitude of configuration parameters that need to be set; encapsulating them within a single, elegant configuration file enhances reproducibility.

In `jsc_toy_by_type.toml`, the search space is configured under `search.search_space` and the search strategy under `search.strategy`. If you are not familiar with the `toml` syntax, see the [TOML specification](https://toml.io/en/v1.0.0).

> In order to accomplish the following task, it is necessary to make direct modifications to the code base. This can be challenging within the Colab environment. **It is recommended to implement the task on a local setup and utilize Colab strictly as a server to execute the search command above.** Consider Colab as a dedicated server for this purpose.

Now that you understand how the MASE search flow works, consider the following tasks:

3. Implement brute-force search as an additional search method within the system. This would be a new search strategy in MASE.
4. Compare the brute-force search with the TPE-based search in terms of sample efficiency. Comment on the performance difference between the two search methods.
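
For task 4, sample efficiency can be read off a best-so-far curve: the best metric value found after each trial. A minimal sketch follows, assuming each search method yields one score per trial and that higher is better. `best_so_far` and `trials_to_reach` are hypothetical helpers, and the two score lists are placeholders, not results from either search method.

```python
def best_so_far(scores):
    """Running maximum of the per-trial scores."""
    curve, best = [], float("-inf")
    for score in scores:
        best = max(best, score)
        curve.append(best)
    return curve


def trials_to_reach(scores, target):
    """Number of trials needed for the best score to reach target, or None."""
    for n_trials, best in enumerate(best_so_far(scores), start=1):
        if best >= target:
            return n_trials
    return None


# Placeholder per-trial scores for two hypothetical search methods.
method_a = [0.40, 0.55, 0.50, 0.70, 0.65]
method_b = [0.60, 0.70, 0.68, 0.69, 0.71]

print(best_so_far(method_a))
print(trials_to_reach(method_a, 0.70), trials_to_reach(method_b, 0.70))
```

Plotting the two best-so-far curves on the same axes, with the trial count on the x-axis, gives a direct visual comparison of how quickly each method approaches its best result.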