This repository has been archived by the owner on May 1, 2023. It is now read-only.

Automated Deep Compression status #64

Closed
amjad-twalo opened this issue Oct 26, 2018 · 28 comments

@amjad-twalo

Hello there,
I am wondering about the state of the ADC implementation, and what remains to bring it to a functional state.
In the ADC merge commit message, you mentioned that it is still WiP and that it is using an unreleased version of Coach. Is that still the case?
Also, is there any documentation for how to use ADC in Distiller?

Thanks

@nzmora
Contributor

nzmora commented Oct 29, 2018

Hi,

Currently the status of ADC (now AMC: https://arxiv.org/abs/1802.03494) is unchanged. I'll update when we have something that can be shared.

Cheers
Neta

@amjad-twalo
Author

Thanks for the response :)
From what I've seen, the implementation seems to be almost done. If the remaining work is clear and you're open to contributions, I can set aside some time to finish it up.
I have been using Distiller for a while now, and it has saved me a lot of time. It would be awesome to have AMC up and running in it.

Cheers,
Amjad

@nzmora
Contributor

nzmora commented Nov 4, 2018

Hi Amjad,

I'm happy to hear that you're using Distiller and find it useful!
I'll be returning from Beijing in a couple of weeks and then I'll spend some time to synchronize Distiller with the public Coach APIs, and we can then see how to work together to get AMC working ASAP.
I appreciate the help!

Cheers,
Neta

@nzmora nzmora added the "automated compression" label Nov 4, 2018
@amjad-twalo
Author

Hey Neta,
Any update regarding this? I think I will have some time to work on it in the next couple of weeks.

Cheers,
Amjad

@nzmora
Contributor

nzmora commented Dec 14, 2018

Sorry Amjad, I still haven't completed the move to the public v0.11.0 Coach. I'm currently pushing code that's still integrated with an older, private branch of Coach.
I'll let you know as soon as I commit a version that can work with public Coach.
Thanks,
Neta

@nzmora
Contributor

nzmora commented Dec 16, 2018

Hi Amjad,
I pushed a commit that integrates Distiller with the Coach master branch (requires one PR I pushed to Coach - see details in the Distiller commit).
Currently only R_flops (accuracy-guaranteed compression) is enabled.
It converges to a solution quickly after finishing the first 100 exploration episodes, but the converged solution is unsatisfactory. I tried it on Plain-20 and VGG16 - both for CIFAR.
There are several open issues, which I won't enumerate right now - first, I need to try to better understand what's going on.

Cheers,
Neta

@HKLee2040

@nzmora
*NOTE: you may need to update TensorFlow to the expected version:
$ pip3 install tensorflow==1.9.0

Does that mean I have to install cuda 9.0 if I want to try AMC?

@nzmora
Contributor

nzmora commented Dec 19, 2018

Hi @HKLee2040
No, installing TF 1.9.0 doesn't require upgrading CUDA.

Cheers
Neta

@nzmora
Contributor

nzmora commented Jan 1, 2019

See https://github.com/NervanaSystems/distiller/blob/amc/examples/automated_deep_compression/amc-results.ipynb.

Work on AMC currently takes place in branch 'amc'. Your help is more than welcome.
Cheers
Neta

@nzmora
Contributor

nzmora commented Jan 6, 2019

After switching to Clipped PPO, I'm getting very encouraging results. See: https://github.com/NervanaSystems/distiller/wiki/AutoML-for-Model-Compression-(AMC):-Trials-and-Tribulations

@huxianer

@nzmora Could you share plain20.checkpoint.pth.tar? Thanks!

@nzmora
Contributor

nzmora commented Jan 10, 2019

@huxianer the schedule file for training Plain20 is here. It took me about 33 minutes on 4 GPUs.

However, since you've asked :-), I've also uploaded the image here:
https://drive.google.com/file/d/1bBhjjxkXjFHmqfTWKnxop3n6QCN8QfZJ/view?usp=sharing

Cheers,
Neta

@huxianer

@nzmora Thank you very much! I have another question: I found that the top1 performance is essentially unchanged when I don't use a pretrained model. So, if I don't have a pretrained model, what can I do?

@nzmora
Contributor

nzmora commented Jan 10, 2019

Hi @huxianer,
I am not sure I understood your question, so I will answer according to what I understood.

I think you are asking how to use AMC if you don't have a pre-trained model of the network you are compressing.
The answer is that you must have a pre-trained model, because "We aim to automatically find the redundancy for each layer, characterized by sparsity. We train a reinforcement learning agent to predict the action and give the sparsity, then perform the pruning. We quickly evaluate the accuracy after pruning but before fine-tuning as an effective delegate of final accuracy" (section 3, page 4). You can only "find the redundancy for each layer" if you are searching a pre-trained model. If the model is not trained, you cannot find any redundancy because the weights carry no meaning (they are randomly distributed).

I hope this helps,
Neta

@HKLee2040

Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?
I have some modifications:
Since I have only one GPU in my environment, I modified "conv_op = g.find_op(normalize_module_name(name))" to "conv_op = g.find_op(name)".

Also, args.amc_target_density was None, so I added
args.amc_target_density = 0.5
in my code.

@nzmora
Contributor

nzmora commented Jan 10, 2019

Hi @HKLee2040

I have some modifications:

I will need to fix the code for the case of one GPU.
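
In the meantime, one possible workaround is sketched below. It is not Distiller code: find_conv_op is a hypothetical helper, and it assumes find_op returns None when the name is not found. It tries the DataParallel-normalized name first and falls back to the raw name for single-GPU models:

def find_conv_op(g, name):
    # Hypothetical helper (not part of Distiller): try the DataParallel-normalized
    # name first, then fall back to the raw name for single-GPU (non-DataParallel)
    # models. Assumes g.find_op() returns None when the name is not found.
    op = g.find_op(normalize_module_name(name))
    if op is None:
        op = g.find_op(name)
    return op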

Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?

I don't know which protocol you are using ("mac-constrained" or "accuracy-guaranteed"), but both rewards are highly correlated with the Top1 accuracy:
[two images illustrating the rewards and their correlation with the Top1 accuracy]

So it makes sense that you see an overlap when the graphs are smoothed (I smoothed using a simple moving average), because smoothing suppresses the noise in both the reward and accuracy signals. You can see an example here.

Having said that, I think you are asking a good question. I think this is a clue as to why the reward defined in the AMC paper for accuracy-guaranteed compression is not so good. The solutions converge on maximum density for all layers (you can see this in the green bars here) - probably because the agent tries to maximize the Top1 accuracy, and not enough weight is given to the MACs (FLOPs) in the reward (5).
This is my conjecture at the moment.
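
For reference, here is a minimal sketch of the two reward forms as I read them in the AMC paper; the exact formulation in Distiller's 'amc' branch may differ (for example, the mac-constrained protocol discussed below uses Top1/100), so treat it as illustrative only:

import math

def reward_resource_constrained(top1_error):
    # Resource-constrained compression: R_err = -Error.
    # The MACs/latency budget is enforced via the action space, not the reward.
    return -top1_error

def reward_accuracy_guaranteed(top1_error, macs):
    # Accuracy-guaranteed compression: R_FLOPs = -Error * log(FLOPs).
    # log(MACs) varies slowly, so the error term dominates - consistent with
    # the agent converging to near-maximum density, as noted above.
    return -top1_error * math.log(macs)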

Thanks,
Neta

@nzmora
Contributor

nzmora commented Jan 11, 2019

Hi @HKLee2040,

My protocol is "mac-constrained". The reward fn should be top1/100.
But why are the blue and green lines in your Performance Data so different?

Thanks for the persistence. The shift you see is an illusion (and a source of confusion, I guess), caused by the fact that the reward and the Top1 accuracy use different axes (Top1 on the left; reward on the right). The reward's range is [0..1] and the accuracy's is [0..100], and because their values are exactly correlated (reward = top1/100, as you wrote above) they should align. However, when we draw the MAC values, also on the left axis, they distort the relative scaling of the two axes (they shift relative to one another). You can see this if you disable the rendering of the MACs graphs, or if you set the ylim of the axes. For example:

def plot_performance(alpha, window_size, top1, macs, params, reward, start=0, end=-1):
    plot_kwargs = {"figsize":(15,7), "lw": 1, "alpha": alpha, "title": "Performance Data"}
    smooth_kwargs = {"lw": 2 if window_size > 0 else 1, "legend": True}
    if macs:
        ax = df['normalized_macs'][start:end].plot(**plot_kwargs, color="r")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0,100])
        df['smooth_normalized_macs'] = smooth(df['normalized_macs'], window_size)
        df['smooth_normalized_macs'][start:end].plot(**smooth_kwargs, color="r")
    if top1:
        ax = df['top1'][start:end].plot(**plot_kwargs, color="b", grid=True)
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0,100])
        df['smooth_top1'] = smooth(df['top1'], window_size)
        df['smooth_top1'][start:end].plot(**smooth_kwargs, color="b")
    if params:
        ax = df['normalized_nnz'][start:end].plot(**plot_kwargs, color="black")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0,100])
        df['smooth_normalized_nnz'] = smooth(df['normalized_nnz'], window_size)
        df['smooth_normalized_nnz'][start:end].plot(**smooth_kwargs, color="black")        
    if reward:
        ax = df['reward'][start:end].plot(**plot_kwargs, secondary_y=True, color="g")
        ax.set(xlabel="Episode", ylabel="reward", ylim=[0,1.0])
        df['smooth_reward'] = smooth(df['reward'], window_size)
        df['smooth_reward'][start:end].plot(**smooth_kwargs, secondary_y=True, color="g")    
    ax.grid(True, which='minor', axis='x', alpha=0.3)
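
(The function above relies on a globally defined DataFrame df and a smooth() helper from the notebook. A minimal moving-average version of smooth, assuming it takes a pandas Series and a window size, could look like the sketch below; the actual helper in the notebook may differ.)

import pandas as pd

def smooth(series, window_size):
    # Simple moving average; signature assumed from how it is called above.
    # Returns the series unchanged when smoothing is effectively disabled.
    if window_size <= 0:
        return series
    return series.rolling(window=window_size, min_periods=1).mean()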

I uploaded my raw log files to here and you can load and try them.

Still, you ask why for you the graphs overlap and for me they don't. This is because, in my files, the big drop in the MACs (at episode 3474; to ~5%) causes the left and right axes to shift and they become unaligned.

Cheers
Neta

@HKLee2040

Hi @nzmora

Got it! That was carelessness on my part; I didn't check the scale of the axes.
Thanks for your detailed reply.

@HKLee2040

Hi @nzmora

May I know why you set pi_lr = 1e-4 and q_lr = 1e-3 in ddpg?
Did you take them from arXiv:1811.08886, where a fixed learning rate of 1e-4 is used for the actor network and 1e-3 for the critic network?

    ddpg.ddpg(env=env1, test_env=env2, actor_critic=core.mlp_actor_critic,
              ac_kwargs=dict(hidden_sizes=[hid]*layers, output_activation=tf.sigmoid),
              gamma=1,  # discount rate
              seed=seed,
              epochs=400,
              replay_size=2000,
              batch_size=64,
              start_steps=env1.amc_cfg.num_heatup_epochs,
              steps_per_epoch=800 * env1.num_layers(),  # every 50 episodes perform 10 episodes of testing
              act_noise=0.5,
              pi_lr=1e-4,
              q_lr=1e-3,
              logger_kwargs=logger_kwargs)

@nzmora
Contributor

nzmora commented Jan 15, 2019

Hi @HKLee2040,
I got these numbers from the DDPG paper, "Continuous control with deep reinforcement learning".
Cheers
Neta

@huxianer

@nzmora Hi, how do you get the YAML pruning-schedule file? Could you share the pruning-schedule YAML file for ResNet trained on ImageNet? Thanks!

@nzmora
Contributor

nzmora commented Jan 15, 2019

Hi @huxianer,
I'm not sure I understand which YAML file you refer to. AMC/ADC currently works w/o YAML.
There are some sample YAML files that use other techniques, for example AGP.
Cheers
Neta

@huxianer

@nzmora @HKLee2040 I mean the YAML files in general; the examples provide them directly, but they don't explain how to create them. You say AMC/ADC currently works without YAML; could you give an example that doesn't use a YAML file? Thank you for your help!

@HKLee2040

HKLee2040 commented Jan 16, 2019

Hi @huxianer

You can refer to nzmora's message
https://github.com/NervanaSystems/distiller/issues/64#issuecomment-451766455

The command-line is:
python3 compress_classifier.py --arch=plain20_cifar ../../../data.cifar --amc --resume=checkpoint.plain20_cifar.pth.tar --lr=0.05 --amc-action-range 0.0 0.80 --vs=0.8

@huxianer

@nzmora Hi, does Distiller support detection models? If not, do you intend to support them?

@RizhaoCai

I am also interested in using AMC for detection models. What is the current status?

@nzmora
Contributor

nzmora commented Aug 6, 2019

Hi @huxianer , @RizhaoCai ,

I merged the revised AMC implementation to 'master'. You can now try our auto-compression code.
I'll add more information on the setup soon.

It currently doesn't support object detection. @levzlotnik is working on adding an object-detection example, after which we will consider automating it. If you happen to integrate object detection with AMC, we'd be interested in considering it for integration into the Distiller code-base.
Cheers,
Neta


@wangyidong3

Hi @levzlotnik @nzmora
Thank you for your great work!
Is there any update for the example of object detection with AMC? Or do you have any suggestions?
Thanks.

@nzmora nzmora closed this as completed Apr 16, 2020