This repo contains a PyTorch implementation of the HALP optimizer from the paper *High-Accuracy Low-Precision Training*, as well as a full-precision SVRG optimizer. It is designed for explanatory purposes rather than high performance.
```bash
git clone git@github.com:HazyResearch/torchhalp.git && cd torchhalp
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run the test suite
pytest test/ -v
```
This only supports PyTorch version 0.3.1 or lower.
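To check which version is installed, you can run, for example:

```bash
python -c "import torch; print(torch.__version__)"
```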
To add the optimizers to your existing PyTorch code:
- Import the optimizer:

  ```python
  from torchhalp.optim import HALP
  ```

- Change the optimizer to:

  ```python
  optimizer = HALP(model.parameters(), lr=args.lr, T=T, data_loader=train_loader)
  ```
- Add a closure method which takes a datapoint and target and recomputes the gradient, for example:

  ```python
  from torch.autograd import Variable  # PyTorch <= 0.3.1 Variable API

  def closure(data=data, target=target):
      # Wrap the minibatch in Variables (no gradients needed w.r.t. the inputs).
      data = Variable(data, requires_grad=False)
      target = Variable(target, requires_grad=False)
      if cuda:
          data, target = data.cuda(), target.cuda()
      # Forward pass, then accumulate gradients for the optimizer to use.
      output = model(data)
      loss = loss_fn(output, target)
      loss.backward()
      return loss
  ```
- Pass the closure method to the step function when you call:

  ```python
  optimizer.step(closure)
  ```
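Putting these steps together, a minimal end-to-end sketch might look like the following. The toy model, loss, synthetic data, learning rate, number of epochs, and choice of `T` are illustrative placeholders rather than recommended settings; the HALP call itself follows the usage above.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import DataLoader, TensorDataset

from torchhalp.optim import HALP

# Toy model, loss, and synthetic data -- substitute your own.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
train_loader = DataLoader(TensorDataset(torch.randn(100, 10), torch.randn(100, 1)),
                          batch_size=10)

# Recompute the full gradient once per pass over the data (illustrative choice of T).
optimizer = HALP(model.parameters(), lr=0.01, T=len(train_loader), data_loader=train_loader)

for epoch in range(5):
    for data, target in train_loader:
        def closure(data=data, target=target):
            data = Variable(data, requires_grad=False)
            target = Variable(target, requires_grad=False)
            output = model(data)
            loss = loss_fn(output, target)
            loss.backward()
            return loss
        optimizer.step(closure)
```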
We include examples for linear regression and ResNet-18 on CIFAR-10.
- This is meant to be a simulation for evaluating the effect of HALP on accuracy; as a simulation, the quantization adds overhead rather than improving performance.
- The SVRG and HALP optimizers take two additional arguments compared to the SGD optimizer: `T` and `data_loader`. `T` controls how often the full gradient over the entire dataset, a key step in the SVRG algorithm, is taken: it is the number of minibatches between updates of the full gradient. The `data_loader` argument takes a PyTorch `DataLoader` so that the full-dataset gradient can be computed internally by the optimizer. The HALP optimizer has the additional arguments `mu`, `bits`, and `unbiased`, which affect the quantization: `mu` contributes to the dynamic rescaling, `bits` is the number of bits used for the quantized numbers, and `unbiased` indicates whether stochastic rounding is used. A constructor sketch is included after this list.
- Currently, the SVRG and HALP optimizers don’t support multiple per-parameter options and parameter groups.
- Stateful LSTMs are not supported due to the self-contained nature of the optimizer; however, learned hidden layers or stateless LSTMs can still be used.
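For reference, a constructor sketch showing the extra arguments described above. This assumes `SVRG` is importable from `torchhalp.optim` alongside `HALP`; the specific values of `T`, `mu`, `bits`, and `unbiased` (as well as `model`, `train_loader`, and `args.lr`) are illustrative placeholders, not recommended defaults.

```python
from torchhalp.optim import HALP, SVRG  # SVRG import path assumed to mirror HALP

# Number of minibatches between full-gradient computations (here, one epoch's worth).
T = len(train_loader)

# Full-precision SVRG: same two extra arguments as HALP.
svrg_optimizer = SVRG(model.parameters(), lr=args.lr, T=T, data_loader=train_loader)

# HALP adds the quantization-related arguments.
halp_optimizer = HALP(
    model.parameters(),
    lr=args.lr,
    T=T,
    data_loader=train_loader,
    mu=1,           # contributes to the dynamic rescaling
    bits=8,         # number of bits used for the quantized numbers
    unbiased=True,  # use stochastic rounding
)
```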