
Add two bit compression operator #7512

Closed
wants to merge 54 commits into from

Conversation

@aksnzhy
Contributor

aksnzhy commented Aug 17, 2017

This is the implementation of two-bit compression.

Usage:

> import mxnet as mx
>
> grad = mx.nd.array([-6, -2, 3, 1, 10, 5, -3, 2, -8, 0])
> residual = mx.nd.array([-3, 1, -1, 5, -2, 2, 3, -7, -2, -100])
> neg_threshold = mx.nd.array([-4.0])
> pos_threshold = mx.nd.array([4.0])
> out = mx.contrib.nd.quantize_2bit(grad, residual, neg_threshold, pos_threshold)
> mx.contrib.nd.dequantize_2bit(out, grad)
>
> out
> [ -4.00000000e+00   4.00000000e+00   7.40468810e-39]
> <NDArray 3 @cpu(0)>
>
> residual
> [ -5.  -1.   2.   2.   4.   3.   0.  -1.  -6. -96.]
> <NDArray 10 @cpu(0)>
>
> grad
> [-4.  0.  0.  4.  4.  4.  0. -4. -4. -4.]
> <NDArray 10 @cpu(0)>

@piiswrong
Contributor

Is the output actually compressed? Looks like it's still float?

@aksnzhy
Contributor Author

aksnzhy commented Aug 17, 2017

It is actually compressed. The float array is just a holder: every 16 values are compressed into one float element.
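
To picture that packing arithmetic, here is a toy sketch (my own illustration; the code assignment and bit order are assumptions, not necessarily what the kernel does) of fitting 16 two-bit codes into one float32 slot:

```python
import struct

# Toy illustration only: pack 16 two-bit codes into a 32-bit word and
# reinterpret it as a float32 "holder". The encoding (0 = zero,
# 1 = negative threshold, 2 = positive threshold) and the bit order
# are assumptions for illustration, not the operator's actual layout.
def pack16(codes):
    assert len(codes) == 16
    word = 0
    for i, code in enumerate(codes):
        word |= (code & 0b11) << (2 * i)
    return struct.unpack('<f', struct.pack('<I', word))[0]

# The 10-element gradient from the usage example needs one packed slot
# (the last 6 codes are padding).
holder = pack16([1, 0, 0, 2, 2, 2, 0, 1, 1, 1] + [0] * 6)
```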

@aksnzhy
Contributor Author

aksnzhy commented Aug 17, 2017

For now, this method only supports compressing float data.

@piiswrong
Contributor

We may want to consider adding a data type like int2

@piiswrong
Contributor

How is this going to be used by kvstore and ps-lite? How does it recognize a compressed array?

@aksnzhy
Contributor Author

aksnzhy commented Aug 17, 2017

For multi-GPU training, we can invoke quantize_2bit() directly before copying data to another GPU, and invoke dequantize_2bit() when receiving the data block. But for distributed training I'm not sure whether we would need to modify the ps-lite code so that a dequantize function is invoked before updating the model?
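
As a rough sketch of that multi-GPU flow (assuming the quantize_2bit/dequantize_2bit signatures from the usage example above; the helper name is mine and the aggregation step is omitted):

```python
import mxnet as mx

# Sketch only: compress on the source device, copy the small buffer,
# and decompress into a full-size gradient on the destination device.
def copy_compressed(grad, residual, neg_threshold, pos_threshold, dst_ctx):
    small = mx.contrib.nd.quantize_2bit(grad, residual,
                                        neg_threshold, pos_threshold)
    small_dst = small.copyto(dst_ctx)  # transfer only the compressed data
    grad_dst = mx.nd.zeros(grad.shape, ctx=dst_ctx)
    mx.contrib.nd.dequantize_2bit(small_dst, grad_dst)
    return grad_dst
```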

@aksnzhy
Contributor Author

aksnzhy commented Aug 17, 2017

The compressed array is also an NDArray: the first two elements are the two thresholds, and the remaining elements hold the compressed data. Every 16 values are packed into one float.
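
For illustration, under that layout the compressed length works out to two threshold slots plus one float per 16 original values:

```python
import math

# Length of the compressed NDArray under the layout described above:
# two threshold elements plus one packed float per 16 original values.
def compressed_length(n):
    return 2 + math.ceil(n / 16)

# The 10-element gradient from the usage example compresses to 3 floats,
# matching the <NDArray 3 @cpu(0)> output shown there.
assert compressed_length(10) == 3
```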

@piiswrong
Contributor

How are gradients from different gpus aggregated?

@aksnzhy
Contributor Author

aksnzhy commented Aug 18, 2017

Before aggregating the gradients, we first need to decompress them into a new array with the same size as the original gradient array; the values of that new array are then only the two thresholds (or zero). The new array can be re-used.

@piiswrong
Contributor

quantize also needs to output the residual, right?

@aksnzhy
Contributor Author

aksnzhy commented Aug 18, 2017

Yes, before compressing we need to add the residual, and we also need to calculate the new residual after compressing. I think it is better to let the user handle this, since this operator is a very low-level operator. We can build a higher-level Python wrapper for that.
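
For example, such a wrapper could own the per-device residual along these lines (a sketch only; the class and method names are mine, and it assumes the in-place residual update that was later added to the operator in this PR):

```python
import mxnet as mx

# Sketch of a possible higher-level wrapper that owns the per-device
# residual. Names are illustrative and not part of this PR.
class TwoBitCompressor:
    def __init__(self, shape, ctx, threshold=4.0):
        self.residual = mx.nd.zeros(shape, ctx=ctx)
        self.neg_threshold = mx.nd.array([-threshold], ctx=ctx)
        self.pos_threshold = mx.nd.array([threshold], ctx=ctx)

    def compress(self, grad):
        # The operator adds the residual, quantizes to 2 bits, and
        # updates self.residual in place (as in the usage example).
        return mx.contrib.nd.quantize_2bit(grad, self.residual,
                                           self.neg_threshold,
                                           self.pos_threshold)

    def decompress(self, compressed, out):
        # Expand the packed buffer back into a full-size gradient array.
        mx.contrib.nd.dequantize_2bit(compressed, out)
        return out
```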

@aksnzhy
Contributor Author

aksnzhy commented Aug 18, 2017

The residual is stored on each local device.

@piiswrong
Contributor

You can calculate the compression and the residual in one operator to make it faster.

@aksnzhy
Contributor Author

aksnzhy commented Aug 18, 2017

I will do that.

@aksnzhy
Contributor Author

aksnzhy commented Aug 21, 2017

@piiswrong I added the residual calculation to this operator and added a test case to the test file.

@aksnzhy changed the title from "add two bit compression operator" to "Add two bit compression operator" on Aug 22, 2017
@piiswrong
Contributor

Where are the tests? I don't see any.

@aksnzhy
Contributor Author

aksnzhy commented Aug 24, 2017

I removed the test file just to verify that the operator builds successfully. I will add the tests back later.

@piiswrong closed this Nov 30, 2017