Difference between end_pruning_step and policy_end_step #4

Closed
eldarkurtic opened this issue Dec 20, 2021 · 6 comments

@eldarkurtic (Contributor)
Hi,
Could you please clarify the difference between end_pruning_step and policy_end_step in the pruning config file (for example: https://github.com/IntelLabs/Model-Compression-Research-Package/blob/main/examples/transformers/language-modeling/config/iterative_unstructured_magnitude_90_config.json)?

ofirzaf self-assigned this Dec 20, 2021
@ofirzaf (Collaborator) commented Dec 20, 2021

Hi,

Sure.

The interval [begin_pruning_step, end_pruning_step] defines when the scheduler allows the pruning masks to update and change the pruning pattern. Outside of this interval the masks remain constant and the pruning pattern stays the same regardless of the weights' magnitudes.

The interval [policy_begin_step, policy_end_step] defines the span over which the pruning policy is applied. The pruning policy determines how the sparsity is increased during training from the initial sparsity to the final sparsity within the assigned interval. For example, this library strictly uses the policy introduced in To prune, or not to prune: exploring the efficacy of pruning for model compression:

s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (t1 - t0))^3,   for t0 <= t <= t1

where
  • t = current time step
  • t0 = policy begin step
  • t1 = policy end step
  • s_t = current sparsity ratio
  • s_i = initial sparsity ratio
  • s_f = final sparsity ratio
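
A minimal Python sketch of this schedule (the function name and signature are illustrative, not the package's actual API):

```python
def sparsity_schedule(t, t0, t1, s_i, s_f):
    """Cubic sparsity schedule from Zhu & Gupta (2017).

    Ramps the target sparsity from s_i at step t0 to s_f at step t1;
    before t0 it returns s_i, after t1 it stays at s_f.
    """
    if t <= t0:
        return s_i
    if t >= t1:
        return s_f
    progress = (t - t0) / (t1 - t0)
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3

# e.g. halfway through a 0 -> 50k ramp towards 90% sparsity:
print(sparsity_schedule(25_000, 0, 50_000, 0.0, 0.9))  # 0.7875
```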

@eldarkurtic (Contributor, Author)

If we do a run with 100k steps and we specify the following pruning config:

  • begin_pruning_step: 0
  • end_pruning_step: 80k
  • policy_begin_step: 0
  • policy_end_step: 50k

we would get the following:

  • in [0, 50k] the model's sparsity would go from the initial sparsity to the final sparsity, following the pruning policy

But I'm not sure what would happen to the model and its sparsity in the [50k, 80k] and [80k, 100k] ranges.
Since the pruning policy finishes at step = 50k, and at that point the model already has its final sparsity mask, why do we need end_pruning_step at 80k?

@ofirzaf (Collaborator) commented Dec 21, 2021

In the interval [50k, 80k] the sparsity ratio of the model has reached its final value; however, the sparsity masks will continue to update every pruning_frequency steps, changing the sparsity pattern of the model according to the highest-magnitude weights currently in the model.
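
A hedged sketch of how such a training loop could behave (variable names and values are illustrative, not the package's implementation; sparsity_schedule is the function sketched above):

```python
import torch

pruning_frequency = 100            # illustrative value
weight = torch.randn(256, 256)     # stand-in for a model weight
mask = torch.ones_like(weight, dtype=torch.bool)

def magnitude_mask(w, sparsity):
    # Keep the (1 - sparsity) fraction of entries with the largest magnitude.
    k = int(sparsity * w.numel())
    if k == 0:
        return torch.ones_like(w, dtype=torch.bool)
    return w.abs() > w.abs().flatten().kthvalue(k).values

for step in range(100_000):
    # Target sparsity follows the policy in [0, 50k], then is frozen at 0.9.
    target = sparsity_schedule(step, 0, 50_000, 0.0, 0.9)
    if step <= 80_000 and step % pruning_frequency == 0:
        # In [50k, 80k] the target no longer changes, but the mask is still
        # recomputed, so the pattern tracks the current largest-magnitude weights.
        mask = magnitude_mask(weight, target)
    # ... forward/backward with (weight * mask), optimizer step, etc. ...
```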

@eldarkurtic (Contributor, Author)

Is this part described somewhere in the paper (just checking if I've missed it)?
If not, could you please clarify a bit more how the sparsity mask changes in the [50k, 80k] range?

  1. How do you pick which masked weights to reintroduce?
  2. Are they initialized to zero when reintroduced?

@ofirzaf (Collaborator) commented Dec 22, 2021

This is not described in our paper; however, it is common practice in magnitude pruning, and I think it is described in To prune, or not to prune: exploring the efficacy of pruning for model compression, which we refer to in our paper.

  1. The weights' values are kept as-is even when they are masked out. When the magnitude of an unmasked weight drops below the magnitude of a masked weight, the higher-magnitude masked weight replaces the lower-magnitude unmasked weight the next time the sparsity mask is updated.
  2. When reintroduced, the weights keep their last recorded values.
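
A tiny illustration of this behavior (plain PyTorch with made-up values, not the package's code):

```python
import torch

def magnitude_mask(w, sparsity):
    # Keep the (1 - sparsity) fraction of entries with the largest magnitude.
    k = int(sparsity * w.numel())
    if k == 0:
        return torch.ones_like(w, dtype=torch.bool)
    return w.abs() > w.abs().flatten().kthvalue(k).values

weight = torch.tensor([0.9, -0.5, 0.1, 0.05])
print(magnitude_mask(weight, 0.5))   # tensor([ True,  True, False, False])

# Training updates the dense weights; masked entries keep their stored values.
# Suppose updates shrink the unmasked -0.5 to -0.02 while the masked 0.1 is kept:
weight = torch.tensor([0.9, -0.02, 0.1, 0.05])
print(magnitude_mask(weight, 0.5))   # tensor([ True, False,  True, False])
# 0.1 is reintroduced with its last recorded value; -0.02 is masked out instead.
```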

@eldarkurtic (Contributor, Author)

Okay, thanks a lot for the clarification :)
