This repository has been archived by the owner on May 1, 2023. It is now read-only.

Automated Deep Compression status #64

Closed
amjad-twalo opened this issue Oct 26, 2018 · 28 comments

@amjad-twalo

Hello there,
I am wondering about the state of the ADC implementation, and what remains to bring it to a functional state.
In the ADC merge commit message, you mentioned that it is still WiP and that it is using an unreleased version of Coach. Is that still the case?
Also, is there any documentation for how to use ADC in Distiller?

Thanks

@nzmora
Contributor

nzmora commented Oct 29, 2018

Hi,

Currently the status of ADC (now AMC: https://arxiv.org/abs/1802.03494) is unchanged. I'll update when we have something that can be shared.

Cheers
Neta

@amjad-twalo
Author

Thanks for the response :)
From what I've seen, the implementation seems to be almost done. If the remaining work is clear and you're open to contributions, I can set aside some time to finish it up.
I have been using Distiller for a while now, and it has saved me a lot of time. It would be awesome to have AMC up and running in it.

Cheers,
Amjad

@nzmora
Contributor

nzmora commented Nov 4, 2018

Hi Amjad,

I'm happy to hear that you're using Distiller and find it useful!
I'll be returning from Beijing in a couple of weeks and then I'll spend some time to synchronize Distiller with the public Coach APIs, and we can then see how to work together to get AMC working ASAP.
I appreciate the help!

Cheers,
Neta

@nzmora nzmora added the "automated compression" label Nov 4, 2018
@amjad-twalo
Author

Hey Neta,
Any update regarding this? I think I will have some time to work on it in the next couple of weeks.

Cheers,
Amjad

@nzmora
Contributor

nzmora commented Dec 14, 2018

Sorry Amjad, I still haven't completed the move to the public v0.11.0 Coach. I'm currently pushing code that's still integrated with an older, private branch of Coach.
I'll let you know as soon as I commit a version that can work with public Coach.
Thanks,
Neta

@nzmora
Contributor

nzmora commented Dec 16, 2018

Hi Amjad,
I pushed a commit that integrates Distiller with the Coach master branch (requires one PR I pushed to Coach - see details in the Distiller commit).
Currently only R_flops (accuracy-guaranteed compression) is enabled.
It converges to a solution quickly after finishing the first 100 exploration episodes, but the converged solution is unsatisfactory. I tried it on Plain-20 and VGG16 - both for CIFAR.
There are several open issues, which I won't enumerate right now - first, I need to try to better understand what's going on.

Cheers,
Neta

@HKLee2040

@nzmora
*NOTE: you may need to update TensorFlow to the expected version:
$ pip3 install tensorflow==1.9.0

Does that mean I have to install cuda 9.0 if I want to try AMC?

@nzmora
Contributor

nzmora commented Dec 19, 2018

Hi @HKLee2040
No, installing TF 1.9.0 doesn't require upgrading CUDA.

Cheers
Neta

@nzmora
Contributor

nzmora commented Jan 1, 2019

See https://github.com/NervanaSystems/distiller/blob/amc/examples/automated_deep_compression/amc-results.ipynb.

Work on AMC currently takes place in branch 'amc'. Your help is more than welcome.
Cheers
Neta

@nzmora
Contributor

nzmora commented Jan 6, 2019

After switching to Clipped PPO, I'm getting very encouraging results. See: https://github.com/NervanaSystems/distiller/wiki/AutoML-for-Model-Compression-(AMC):-Trials-and-Tribulations

@huxianer

@nzmora Could you share plain20.checkpoint.pth.tar? Thanks!

@nzmora
Contributor

nzmora commented Jan 10, 2019

@huxianer the schedule file for training Plain20 is here. It took me about 33 minutes on 4 GPUs.

However, since you've asked :-), I've also uploaded the image here:
https://drive.google.com/file/d/1bBhjjxkXjFHmqfTWKnxop3n6QCN8QfZJ/view?usp=sharing

Cheers,
Neta

@huxianer

@nzmora Thank you very much! I have another question: I found that the top1 performance is essentially unchanged when I don't use a pretrained model. So, if I don't have a pretrained model, what can I do?

@nzmora
Contributor

nzmora commented Jan 10, 2019

Hi @huxianer,
I am not sure I understood your question, so I will answer according to what I understood.

I think you are asking how to use AMC if you don't have a pre-trained model of the network you are compressing.
The answer is that you must have a pre-trained model, because "We aim to automatically find the redundancy for each layer, characterized by sparsity. We train a reinforcement learning agent to predict the action and give the sparsity, then perform the pruning. We quickly evaluate the accuracy after pruning but before fine-tuning as an effective delegate of final accuracy" (section 3, page 4). You can only "find the redundancy for each layer" if you are searching a pre-trained model. If the model is not trained, you cannot find any redundancy because the weights carry no meaning (they are randomly distributed).

I hope this helps,
Neta

@HKLee2040

Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?
I have some modifications:
Since I have only one GPU in my environment, I modified "conv_op = g.find_op(normalize_module_name(name))" to "conv_op = g.find_op(name)".

Also, args.amc_target_density was None, so I added
args.amc_target_density = 0.5
in my code.

@nzmora
Contributor

nzmora commented Jan 10, 2019

Hi @HKLee2040

I have some modifications:

I will need to fix the code for the case of one GPU.
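
In the meantime, one possible workaround is sketched below. It is not Distiller code: find_conv_op is a hypothetical helper, and it assumes find_op returns None when the name is not found. It tries the DataParallel-normalized name first and falls back to the raw name for single-GPU models:

def find_conv_op(g, name):
    # Hypothetical helper (not part of Distiller): try the DataParallel-normalized
    # name first, then fall back to the raw name for single-GPU (non-DataParallel)
    # models. Assumes g.find_op() returns None when the name is not found.
    op = g.find_op(normalize_module_name(name))
    if op is None:
        op = g.find_op(name)
    return op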

Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?

I don't know which protocol you are using ("mac-constrained" or "accuracy-guaranteed"), but both rewards are highly correlated with the Top1 accuracy:
[two images illustrating the rewards and their correlation with the Top1 accuracy]

So it makes sense that you see an overlap when the graphs are smoothed (I smoothed using a simple moving average), because smoothing suppresses the noise in both the reward and accuracy signals. You can see an example here.

Having said that, I think you are asking a good question. I think this is a clue as to why the reward defined in the AMC paper for accuracy-guaranteed compression is not so good. The solutions converge on maximum density for all layers (you can see this in the green bars here) - probably because the agent tries to maximize the Top1 accuracy, and not enough weight is given to the MACs (FLOPs) in the reward (5).
This is my conjecture at the moment.
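
For reference, here is a minimal sketch of the two reward forms as I read them in the AMC paper; the exact formulation in Distiller's 'amc' branch may differ (for example, the mac-constrained protocol discussed below uses Top1/100), so treat it as illustrative only:

import math

def reward_resource_constrained(top1_error):
    # Resource-constrained compression: R_err = -Error.
    # The MACs/latency budget is enforced via the action space, not the reward.
    return -top1_error

def reward_accuracy_guaranteed(top1_error, macs):
    # Accuracy-guaranteed compression: R_FLOPs = -Error * log(FLOPs).
    # log(MACs) varies slowly, so the error term dominates - consistent with
    # the agent converging to near-maximum density, as noted above.
    return -top1_error * math.log(macs)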

Thanks,
Neta

@nzmora
Contributor

nzmora commented Jan 11, 2019

Hi @HKLee2040,

My protocol is "mac-constrained". The reward fn should be top1/100.
But why are the blue and green lines in your Performance Data so different?

Thanks for the persistence. The shift you see is an illusion (and a source of confusion, I guess), caused by the fact that the reward and the Top1 accuracy use different axes (Top1 on the left; reward on the right). The reward's range is [0..1] and the accuracy's is [0..100], and because their values are exactly correlated (reward = top1/100, as you wrote above) they should align. However, when we draw the MAC values, also on the left axis, they distort the relative scaling of the two axes (they shift relative to one another). You can see this if you disable the rendering of the MACs graphs, or if you set the ylim of the axes. For example:

def plot_performance(alpha, window_size, top1, macs, params, reward, start=0, end=-1):
    plot_kwargs = {"figsize":(15,7), "lw": 1, "alpha": alpha, "title": "Performance Data"}
    smooth_kwargs = {"lw": 2 if window_size > 0 else 1, "legend": True}
    if macs:
        ax = df['normalized_macs'][start:end].plot(**plot_kwargs, color="r")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0,100])
        df['smooth_normalized_macs'] = smooth(df['normalized_macs'], window_size)
        df['smooth_normalized_macs'][start:end].plot(**smooth_kwargs, color="r")
    if top1:
        ax = df['top1'][start:end].plot(**plot_kwargs, color="b", grid=True)
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0,100])
        df['smooth_top1'] = smooth(df['top1'], window_size)
        df['smooth_top1'][start:end].plot(**smooth_kwargs, color="b")
    if params:
        ax = df['normalized_nnz'][start:end].plot(**plot_kwargs, color="black")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0,100])
        df['smooth_normalized_nnz'] = smooth(df['normalized_nnz'], window_size)
        df['smooth_normalized_nnz'][start:end].plot(**smooth_kwargs, color="black")        
    if reward:
        ax = df['reward'][start:end].plot(**plot_kwargs, secondary_y=True, color="g")
        ax.set(xlabel="Episode", ylabel="reward", ylim=[0,1.0])
        df['smooth_reward'] = smooth(df['reward'], window_size)
        df['smooth_reward'][start:end].plot(**smooth_kwargs, secondary_y=True, color="g")    
    ax.grid(True, which='minor', axis='x', alpha=0.3)
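
(The function above relies on a globally defined DataFrame df and a smooth() helper from the notebook. A minimal moving-average version of smooth, assuming it takes a pandas Series and a window size, could look like the sketch below; the actual helper in the notebook may differ.)

import pandas as pd

def smooth(series, window_size):
    # Simple moving average; signature assumed from how it is called above.
    # Returns the series unchanged when smoothing is effectively disabled.
    if window_size <= 0:
        return series
    return series.rolling(window=window_size, min_periods=1).mean()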

I uploaded my raw log files to here and you can load and try them.

Still, you ask why for you the graphs overlap and for me they don't. This is because, in my files, the big drop in the MACs (at episode 3474; to ~5%) causes the left and right axes to shift and they become unaligned.

Cheers
Neta

@HKLee2040

Hi @nzmora

Got it! That was carelessness on my part; I didn't check the scale of the axes.
Thanks for your detailed reply.

@HKLee2040

Hi @nzmora

May I know why you set pi_lr = 1e-4 and q_lr = 1e-3 in ddpg?
Did you take them from arXiv:1811.08886, where a fixed learning rate of 1e-4 is used for the actor network and 1e-3 for the critic network?

    ddpg.ddpg(env=env1, test_env=env2, actor_critic=core.mlp_actor_critic,
              ac_kwargs=dict(hidden_sizes=[hid]*layers, output_activation=tf.sigmoid),
              gamma=1,  # discount rate
              seed=seed,
              epochs=400,
              replay_size=2000,
              batch_size=64,
              start_steps=env1.amc_cfg.num_heatup_epochs,
              steps_per_epoch=800 * env1.num_layers(),  # every 50 episodes perform 10 episodes of testing
              act_noise=0.5,
              pi_lr=1e-4,
              q_lr=1e-3,
              logger_kwargs=logger_kwargs)

@nzmora
Contributor

nzmora commented Jan 15, 2019

Hi @HKLee2040,
I got these numbers from the DDPG paper, "Continuous control with deep reinforcement learning".
Cheers
Neta

@huxianer

@nzmora Hi, how do you get the YAML pruning-schedule file? Could you share the pruning-schedule YAML file for ResNet trained on ImageNet? Thanks!

@nzmora
Contributor

nzmora commented Jan 15, 2019

Hi @huxianer,
I'm not sure I understand which YAML file you refer to. AMC/ADC currently works w/o YAML.
There are some sample YAML files that use other techniques, for example AGP.
Cheers
Neta

@huxianer

@nzmora @HKLee2040 I mean the YAML files in general; the examples provide them directly, but they don't explain how to create them. You say AMC/ADC currently works without YAML; could you give an example that doesn't use a YAML file? Thank you for your help!

@HKLee2040

HKLee2040 commented Jan 16, 2019

Hi @huxianer

You can refer to nzmora's message
https://github.com/NervanaSystems/distiller/issues/64#issuecomment-451766455

The command-line is:
python3 compress_classifier.py --arch=plain20_cifar ../../../data.cifar --amc --resume=checkpoint.plain20_cifar.pth.tar --lr=0.05 --amc-action-range 0.0 0.80 --vs=0.8

@huxianer

@nzmora Hi, does Distiller support detection models? If not, do you intend to support them?

@RizhaoCai

I am also interested in using AMC for detection models. What is the current status?

@nzmora
Contributor

nzmora commented Aug 6, 2019

Hi @huxianer , @RizhaoCai ,

I merged the revised AMC implementation to 'master'. You can now try our auto-compression code.
I'll add more information on the setup soon.

It currently doesn't support object detection. @levzlotnik is working on adding an object-detection example, after which we will consider automating it. If you happen to integrate object detection with AMC, we'd be interested in considering it for integration into the Distiller code-base.
Cheers,
Neta


@wangyidong3

Hi @levzlotnik @nzmora
Thank you for your great work!
Is there any update for the example of object detection with AMC? Or do you have any suggestions?
Thanks.

@nzmora nzmora closed this as completed Apr 16, 2020