This repo contains a PyTorch implementation of the HALP optimizer from the paper *High-Accuracy Low-Precision Training*, as well as a full-precision SVRG optimizer. It is designed for explanatory purposes rather than high performance.
```bash
git clone git@github.com:HazyResearch/torchhalp.git && cd torchhalp
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run the test suite
pytest test/ -v
```
This only supports PyTorch version 0.3.1 or lower.
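To check which version is installed, you can run, for example:

```bash
python -c "import torch; print(torch.__version__)"
```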
To add the optimizers to your existing PyTorch code:
- Import the optimizer:

  ```python
  from torchhalp.optim import HALP
  ```

- Change the optimizer to:

  ```python
  optimizer = HALP(model.parameters(), lr=args.lr, T=T, data_loader=train_loader)
  ```
- Add a closure method which takes a datapoint and target and recomputes the gradient, for example:

  ```python
  from torch.autograd import Variable  # PyTorch <= 0.3.1 Variable API

  def closure(data=data, target=target):
      # Wrap the minibatch in Variables (no gradients needed w.r.t. the inputs).
      data = Variable(data, requires_grad=False)
      target = Variable(target, requires_grad=False)
      if cuda:
          data, target = data.cuda(), target.cuda()
      # Forward pass, then accumulate gradients for the optimizer to use.
      output = model(data)
      loss = loss_fn(output, target)
      loss.backward()
      return loss
  ```
- Pass the closure method to the step function when you call:

  ```python
  optimizer.step(closure)
  ```
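Putting these steps together, a minimal end-to-end sketch might look like the following. The toy model, loss, synthetic data, learning rate, number of epochs, and choice of `T` are illustrative placeholders rather than recommended settings; the HALP call itself follows the usage above.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import DataLoader, TensorDataset

from torchhalp.optim import HALP

# Toy model, loss, and synthetic data -- substitute your own.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
train_loader = DataLoader(TensorDataset(torch.randn(100, 10), torch.randn(100, 1)),
                          batch_size=10)

# Recompute the full gradient once per pass over the data (illustrative choice of T).
optimizer = HALP(model.parameters(), lr=0.01, T=len(train_loader), data_loader=train_loader)

for epoch in range(5):
    for data, target in train_loader:
        def closure(data=data, target=target):
            data = Variable(data, requires_grad=False)
            target = Variable(target, requires_grad=False)
            output = model(data)
            loss = loss_fn(output, target)
            loss.backward()
            return loss
        optimizer.step(closure)
```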
We include examples for linear regression and ResNet-18 on CIFAR-10.
- This is meant to be a simulation for evaluating the effect of HALP on accuracy; as a simulation, the quantization adds overhead rather than improving performance.
- The SVRG and HALP optimizers take two additional arguments compared to the SGD optimizer: `T` and `data_loader`. `T` controls how often the full gradient over the entire dataset, a key step in the SVRG algorithm, is taken: it is the number of minibatches between updates of the full gradient. The `data_loader` argument takes a PyTorch `DataLoader` so that the full-dataset gradient can be computed internally by the optimizer. The HALP optimizer has the additional arguments `mu`, `bits`, and `unbiased`, which affect the quantization: `mu` contributes to the dynamic rescaling, `bits` is the number of bits used for the quantized numbers, and `unbiased` indicates whether stochastic rounding is used. A constructor sketch is included after this list.
- Currently, the SVRG and HALP optimizers don’t support multiple per-parameter options and parameter groups.
- Stateful LSTMs are not supported due to the self-contained nature of the optimizer; however, learned hidden layers or stateless LSTMs can still be used.
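For reference, a constructor sketch showing the extra arguments described above. This assumes `SVRG` is importable from `torchhalp.optim` alongside `HALP`; the specific values of `T`, `mu`, `bits`, and `unbiased` (as well as `model`, `train_loader`, and `args.lr`) are illustrative placeholders, not recommended defaults.

```python
from torchhalp.optim import HALP, SVRG  # SVRG import path assumed to mirror HALP

# Number of minibatches between full-gradient computations (here, one epoch's worth).
T = len(train_loader)

# Full-precision SVRG: same two extra arguments as HALP.
svrg_optimizer = SVRG(model.parameters(), lr=args.lr, T=T, data_loader=train_loader)

# HALP adds the quantization-related arguments.
halp_optimizer = HALP(
    model.parameters(),
    lr=args.lr,
    T=T,
    data_loader=train_loader,
    mu=1,           # contributes to the dynamic rescaling
    bits=8,         # number of bits used for the quantized numbers
    unbiased=True,  # use stochastic rounding
)
```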