This repository contains instructions and examples for efficient neural architecture discovery and optimization solutions developed at Intel Labs.
Shears integrates cost-effective sparsity and Neural Low-rank adapter Search (NLS) to further improve the efficiency of Parameter-Efficient Fine-Tuning (PEFT) approaches.
BootstrapNAS automates the generation of weight-sharing super-networks using the Neural Network Compression Framework (NNCF).
This is an initial exploration of using weight-sharing NAS for the compression of large language models. We explore a search space of elastic low-rank adapters while reducing the memory and compute requirements of full-scale NAS. This results in high-performing compressed models obtained from weight-sharing super-networks. We investigate the benefits and limitations of this method, motivating follow-up work.
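To make the idea of an elastic low-rank adapter more concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the class `ElasticLoRA`, the helper `sample_config`, the initialization scheme, and the rank choices are all hypothetical and serve only to illustrate weight sharing, where every sub-adapter of rank `r` reuses the first `r` components of a single maximal-rank pair of matrices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ElasticLoRA:
    """Hypothetical elastic adapter: one max-rank pair (A, B) shared by all ranks."""

    def __init__(self, d_in, d_out, max_rank, seed=0):
        rng = random.Random(seed)
        self.max_rank = max_rank
        # A has shape (max_rank, d_in); B has shape (d_out, max_rank).
        # Random values are used for both purely for illustration.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(max_rank)]
        self.B = [[rng.gauss(0, 0.02) for _ in range(max_rank)] for _ in range(d_out)]

    def delta(self, rank):
        """Weight update (d_out x d_in) built from the first `rank` components only."""
        if not 1 <= rank <= self.max_rank:
            raise ValueError("rank must be between 1 and max_rank")
        a_r = self.A[:rank]                   # first `rank` rows of A
        b_r = [row[:rank] for row in self.B]  # first `rank` columns of B
        return matmul(b_r, a_r)


def sample_config(num_adapters, rank_choices, rng):
    """Pick one rank per adapter; each sample names one sub-network of the super-network."""
    return [rng.choice(rank_choices) for _ in range(num_adapters)]


adapter = ElasticLoRA(d_in=4, d_out=3, max_rank=8)
config = sample_config(num_adapters=2, rank_choices=[2, 4, 8], rng=random.Random(1))
updates = [adapter.delta(r) for r in config]
```

Because all ranks slice the same matrices, the super-network stores only the maximal-rank adapter, and a search procedure can evaluate many sub-adapter configurations without training each one separately.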
By integrating neural architecture search (NAS) and network pruning techniques, we generate and train weight-sharing super-networks that contain efficient, high-performing, compressed transformer-based models. A common challenge in NAS is designing the search space. We propose a method that automatically obtains the boundaries of the search space and then derives the intermediate architectures using a first-order weight importance technique. The proposed end-to-end NAS solution, EFTNAS, discovers efficient subnetworks that have been compressed and fine-tuned for downstream NLP tasks.
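As a rough illustration of how a first-order weight importance score can shape a search space, the sketch below scores each output neuron by the sum of |weight × gradient| over its row, ranks neurons by that score, and derives evenly spaced width choices between a lower bound and the full width. This is an assumption-laden example, not the EFTNAS implementation: the function names, the scoring formula, and the even spacing of widths are all hypothetical.

```python
def neuron_importance(weights, grads):
    """First-order score per output neuron: sum of |w * dL/dw| over its row."""
    return [
        sum(abs(w * g) for w, g in zip(w_row, g_row))
        for w_row, g_row in zip(weights, grads)
    ]


def rank_neurons(scores):
    """Neuron indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def width_choices(num_neurons, min_width, steps):
    """Evenly spaced widths from the lower bound up to the full width (steps >= 2)."""
    span = num_neurons - min_width
    return sorted({round(min_width + i * span / (steps - 1)) for i in range(steps)})


# Toy layer with three output neurons and two inputs (values are made up).
weights = [[0.5, -1.0], [0.1, 0.2], [2.0, 0.0]]
grads = [[0.2, 0.1], [0.0, 0.1], [0.5, 1.0]]

order = rank_neurons(neuron_importance(weights, grads))
kept = sorted(order[:2])  # a width-2 subnetwork keeps the two most important neurons
```

With the neurons sorted by importance, every narrower subnetwork is a prefix of the same ordering, so all candidate widths can share weights inside one super-network.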
EZNAS is a genetic programming-driven methodology for automatically discovering Zero-Cost Neural Architecture Scoring Metrics (ZC-NASMs).