
Model Pruning taking too much time to train #60

Closed
dhingratul opened this issue Nov 15, 2018 · 8 comments
Labels: enhancement (New feature or request)

@dhingratul

Describe the bug
Hi, I have been running the pruning training script for the last 3 days. So far it has only generated a couple of checkpoints in models_dcp, but to generate the .tflite and .pb files I need models_dcp_eval, which I assume will only be generated once training is done. I want to skip to the end and compare the inference times of the pruned vs. non-pruned model; I do not care much about accuracy at this point. If I freeze the graph from these checkpoints, will it give me the pruned model? The documentation says the "conversion script automatically detects which channels can be safely pruned, and then produces a light-weighted compressed model". I just need the pruned .pb file.

@jiaxiang-wu
Contributor

To export a TF-Lite model, you need checkpoint files of the evaluation graph (stored in "./models_dcp_eval"), not the training graph (stored in "./models_dcp"). If you want to use the training graph's checkpoint files, you need to restore variables from them and save them again as the evaluation graph (take a look at DisChnPrunedLearner's implementation).
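For reference, the directory convention above can be sketched as follows. This is a hypothetical helper, not part of PocketFlow; only the "./models_dcp" → "./models_dcp_eval" pairing comes from this thread.

```python
# Minimal sketch (not PocketFlow code): the TF-Lite export step looks for
# evaluation-graph checkpoints, which live in a sibling directory of the
# training-graph checkpoints, suffixed with "_eval".
def eval_ckpt_dir(train_ckpt_dir: str) -> str:
    """Map a training-graph checkpoint directory to its evaluation-graph twin."""
    return train_ckpt_dir.rstrip("/") + "_eval"

print(eval_ckpt_dir("./models_dcp"))  # → ./models_dcp_eval
```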

BTW, if you want to accelerate training and quickly compare run-time speed, and do not care much about accuracy, you can set --nb_epochs_rat to a small value. This argument specifies the fraction of training epochs to use. For instance, if you set --nb_epochs_rat to 0.1, only 10% of the training epochs will be used compared with the standard setting, so training takes roughly one tenth of the time.
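The effect of --nb_epochs_rat is simple arithmetic, sketched below. This is an illustration, not PocketFlow's actual implementation, and the full epoch count of 250 is an assumed example rather than a PocketFlow default.

```python
# Illustrative sketch of how --nb_epochs_rat scales training length.
def scaled_epochs(nb_epochs_full: int, nb_epochs_rat: float) -> int:
    # Keep at least one epoch so training still produces a checkpoint.
    return max(1, int(round(nb_epochs_full * nb_epochs_rat)))

# With --nb_epochs_rat 0.1, only 10% of the epochs are run:
print(scaled_epochs(250, 0.1))  # → 25
```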

@dhingratul
Author

Is there a way to resume the training from where it stopped, and pass that argument this time?

@jiaxiang-wu
Contributor

Sorry, the current DisChnPrunedLearner implementation does not support this. You would need to modify its implementation to be able to recover from a previous run.

@dhingratul
Author

@jiaxiang-wu Does the --nb_epochs_rat parameter also work with other optimizations, such as uniform quantization?

@dhingratul dhingratul reopened this Nov 20, 2018
@jiaxiang-wu
Contributor

jiaxiang-wu commented Nov 21, 2018

@dhingratul The --nb_epochs_rat argument is supported in UniformQuantTFLearner, but not in UniformQuantLearner, due to slightly different implementations of the learning rate schedule. We are considering unifying UniformQuantLearner's implementation to support this.

@jiaxiang-wu jiaxiang-wu self-assigned this Nov 21, 2018
@jiaxiang-wu jiaxiang-wu added the "enhancement" label Nov 21, 2018
@jiaxiang-wu
Contributor

Enhancement required: add support for the --nb_epochs_rat argument in UniformQuantLearner.

@dhingratul
Author

dhingratul commented Nov 21, 2018

@jiaxiang-wu It should be added for all the optimizers. Is there a comparison of how the TF version performs compared with native PocketFlow?

@jiaxiang-wu
Contributor

Basically, their performance (in accuracy) is similar, since the underlying training algorithm is the same, apart from some implementation details. The native version, UniformQuantLearner, provides more features than the TF version, UniformQuantTFLearner, including a variable number of quantization bits for each layer (so that RL can be used to optimize the strategy). However, the latter can be exported to TF-Lite models and deployed on mobile devices, while the former cannot.
