Accepted as a spotlight paper at ICLR 2021.
.
├── hw_nas_bench_api # HW-NAS-Bench API
│ ├── fbnet_models # FBNet's space
│ └── nas_201_models # NAS-Bench-201's space
│ ├── cell_infers
│ ├── cell_searchs
│ ├── config_utils
│ ├── shape_infers
│ └── shape_searchs
└── nas_201_api # NAS-Bench-201 API
The code has the following dependencies:
- python >= 3.6.10
- pytorch >= 1.2.0
- numpy >= 1.18.5
No addtional file needs to be downloaded, our HW-NAS-Bench dataset has been included in this repository.
[Optional] If you want to use NAS-Bench-201 to access information about the architectures' accuracy and loss, please follow the NAS-Bench-201 guide, and download the NAS-Bench-201-v1_1-096897.pth.
More usage can be found in our jupyter notebook example
- Create an API instance from a file:
from hw_nas_bench_api import HWNASBenchAPI as HWAPI
hw_api = HWAPI("HW-NAS-Bench-v1_0.pickle", search_space="nasbench201")
- Show the real measured/estimated hardware-cost in different datasets:
# Example to get all the hardware metrics in the No.0,1,2 architectures under NAS-Bench-201's Space
for idx in range(3):
for dataset in ["cifar10", "cifar100", "ImageNet16-120"]:
HW_metrics = hw_api.query_by_index(idx, dataset)
print("The HW_metrics (type: {}) for No.{} @ {} under NAS-Bench-201: {}".format(type(HW_metrics),
Corresponding printed information:
===> Example to get all the hardware metrics in the No.0,1,2 architectures under NAS-Bench-201's Space
The HW_metrics (type: <class 'dict'>) for No.0 @ cifar10 under NAS-Bench-201: {'edgegpu_latency': 5.807418537139893, 'edgegpu_energy': 24.226614330768584, 'raspi4_latency': 10.481976820010459, 'edgetpu_latency': 0.9571811309997429, 'pixel3_latency': 3.6058499999999998, 'eyeriss_latency': 3.645620000000001, 'eyeriss_energy': 0.6872827644999999, 'fpga_latency': 2.57296, 'fpga_energy': 18.01072}
...
- Show the real measured/estimated hardware-cost for a single architecture:
# Example to get use the hardware metrics in the No.0 architectures in CIFAR-10 under NAS-Bench-201's Space
print("===> Example to get use the hardware metrics in the No.0 architectures in CIFAR-10 under NAS-Bench-201's Space")
HW_metrics = hw_api.query_by_index(0, "cifar10")
for k in HW_metrics:
if "latency" in k:
unit = "ms"
else:
unit = "mJ"
print("{}: {} ({})".format(k, HW_metrics[k], unit))
Corresponding printed information:
===> Example to get use the hardware metrics in the No.0 architectures in CIFAR-10 under NAS-Bench-201's Space
edgegpu_latency: 5.807418537139893 (ms)
edgegpu_energy: 24.226614330768584 (mJ)
raspi4_latency: 10.481976820010459 (ms)
edgetpu_latency: 0.9571811309997429 (ms)
pixel3_latency: 3.6058499999999998 (ms)
eyeriss_latency: 3.645620000000001 (ms)
eyeriss_energy: 0.6872827644999999 (mJ)
fpga_latency: 2.57296 (ms)
fpga_energy: 18.01072 (mJ)
- Create the network from api:
# Create the network
config = hw_api.get_net_config(0, "cifar10")
print(config)
from hw_nas_bench_api.nas_201_models import get_cell_based_tiny_net
network = get_cell_based_tiny_net(config) # create the network from configurration
print(network) # show the structure of this architecture
Corresponding printed information:
{'name': 'infer.tiny', 'C': 16, 'N': 5, 'arch_str': '|avg_pool_3x3~0|+|nor_conv_1x1~0|skip_connect~1|+|nor_conv_1x1~0|skip_connect~1|skip_connect~2|', 'num_classes': 10}
TinyNetwork(
TinyNetwork(C=16, N=5, L=17)
(stem): Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(cells): ModuleList(
(0): InferCell(
info :: nodes=4, inC=16, outC=16, [1<-(I0-L0) | 2<-(I0-L1,I1-L2) | 3<-(I0-L3,I1-L4,I2-L5)], |avg_pool_3x3~0|+|nor_conv_1x1~0|skip_connect~1|+|nor_conv_1x1~0|skip_connect~1|skip_connect~2|
(layers): ModuleList(
(0): POOLING(
(op): AvgPool2d(kernel_size=3, stride=1, padding=1)
)
(1): ReLUConvBN(
...
When measuring the hardware-cost, a template is shared among different devices to collect the hardware-cost, as shown below. Different devices will use their own collect() function.
def collect(arch_idx, dataset):
...
device_name = "DEVICE"
import numpy as np
for dataset in DATASET_LIST: # repeat on the targe dataset
metric_list = []
for arch_idx in ARCH_IDX_LIST: # repeat on the architectures in the space
metric = collect(arch_idx, dataset)
metric_list.appencd(metric)
assert len(metric_list) == len(ARCH_IDX_LIST)
# save to npy
metric_npy = np.array(metric_list)
np.save("measurements_logs/{}/{}.npy".format(device_name, dataset), metric_npy)
For example, when measuring the latency and energy in the EdgeGPU, we replace the collect() with the following function, and os.system() is used to do each experiments on a specific architecture on a specific dataset sperately to avoid the remaining process in the system that brings extra errors. And the EdgeGPU_Benchmark.py is contained in the project folder.
import os
import pickle
def collect(arch_idx, dataset):
# CMD to run for measure one architecture @ one dataset
CMD_to_run = "python3 EdgeGPU_Benchmark.py \
--arch_idx {} \
--log_label assigned_tasks \
--num_repeats_item 50 \
--dataset {}".format(arch_idx, dataset)
os.system(CMD_to_run)
path = os.path.join("measurements_logs", "{}_arch_idx_{}_num_repeats_{}_label_{}.pkl".format(dataset, arch_idx, 50, assigned_tasks))
with open(path, 'rb') as f:
res = pickle.load(f)
return [res["energy"], res["latency"]]
A complete project folder is here.
Part of the devices used in HW-NAS-Bench:
- The code is inspired by NAS-Bench-201.
@inproceedings{
li2021hwnasbench,
title={{\{}HW{\}}-{\{}NAS{\}}-Bench: Hardware-Aware Neural Architecture Search Benchmark},
author={Chaojian Li and Zhongzhi Yu and Yonggan Fu and Yongan Zhang and Yang Zhao and Haoran You and Qixuan Yu and Yue Wang and Yingyan (Celine) Lin},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=_0kaDkv3dVf}
}
Copyright (c) 2022 GaTech-EIC. All rights reserved.
Licensed under the MIT license.