NeuroShard

This is the implementation of Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models, accepted by MLSys 2023. Sharding a large machine learning model across multiple devices to balance the costs is important in distributed training. This is challenging because partitioning is NP-hard, and estimating the costs accurately and efficiently is difficult. In this work, we explore a "pre-train, and search" paradigm for efficient sharding. The idea is to pre-train a universal and once-for-all neural network to predict the costs of all the possible shards, which serves as an efficient sharding simulator. Built upon this pre-trained cost model, we then perform an online search to identify the best sharding plans given any specific sharding task.

Neuroshard utilizes pre-trained cost models for embedding table sharding, and generally involves the following steps:

Input generation: A large number of possible table configurations with data augmentation are randomly generated, along with a large number of possible table sharding plans with varying levels of balance.
Pre-training: The cost of each generated data point on hardware is evaluated, and three neural networks are pre-trained to predict computation cost, forward communication cost, and backward communication cost, respectively.
Online search: After training the neural cost models, the optimal sharding plan is identified for any sharding task using a search algorithm. The algorithm does not rely on hardware to obtain costs during the search process, instead using the cost models, which is efficient.

Cite this Work

If you find this project helpful, please cite

@inproceedings{zha2023pretrain,
  title={Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models},
  author={Zha, Daochen and Feng, Louis and Luo, Liang and Bhushanam, Bhargav and Liu, Zirui and Hu, Yusuo and Nie, Jade and Huang, Yuzhen and Tian, Yuandong and Kejariwal, Arun and Hu, Xia},
  booktitle={Conference on Machine Learning and Systems},
  year={2023}
}

Installation

Step 1: install PyTorch

pip3 install torch==1.8.0

Step 2: install FBGEMM

Follow the instructions in https://github.com/pytorch/FBGEMM to install the embedding operators

Step 3: install NeuroShard

pip3 install -e .

Run on the DLRM Dataset

Step 1: Download DLRM dataset

Download the data with git lfs at https://github.com/facebookresearch/dlrm_datasets

Step 2: Process the dataset

python3 tools/gen_dlrm_data.py

Note that you need to change --data argument to the path of the downloaded DLRM dataset.

Step 3: Generate training and testing tasks

python3 tools/gen_tasks.py

Note that --data_dir specifies the DLRM dataset directory, and --out_dir indicates the output directory.

Step 4: Collect cost data

python3 collect_compute_cost_data.py