
Is the initial determination of which filters are unimportant based on the L1 norm of the weights? #5

Open
zxd-cqu opened this issue Jun 24, 2023 · 9 comments


@zxd-cqu commented Jun 24, 2023

Should `--prune_criterion l1-norm` be added in `./scripts/dist_train.sh`? I noticed that the default `prune_criterion` is `act_scale`: `parser.add_argument('--prune_criterion', type=str, default='act_scale', choices=['l1-norm', 'act_scale'])`. As I understand it, the full pruning process is: first, for a well-trained model, the L1 norms of the convolutional filter weights are used to select which filters are planned to be pruned; then, sparse training is performed with the added scaling factors, pushing the factors of the unimportant filters toward zero; finally, pruning removes the identified filters. I'm not sure whether this understanding is accurate.
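For concreteness, here is a toy sketch of the first stage I describe (L1-norm filter scoring); the function names and shapes are my own invention, not the repository's code:

```python
import numpy as np

def l1_filter_scores(weight):
    # weight: (out_channels, in_channels, kH, kW); one score per output filter
    return np.abs(weight).mean(axis=(1, 2, 3))

def filters_to_prune(weight, prune_ratio):
    # indices of the lowest-scoring filters for a given pruning ratio
    scores = l1_filter_scores(weight)
    n_prune = int(len(scores) * prune_ratio)
    return np.argsort(scores)[:n_prune]

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
idx = filters_to_prune(w, 0.25)  # marks 2 of the 8 filters for pruning
```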

@Zj-BinXia (Owner)

Yes. More details can be found in the paper and the source code.

@zxd-cqu (Author) commented Jun 24, 2023

However, directly switching the criterion to `l1-norm` seems incompatible, because the L1 norm of a layer's filters says nothing about the input to the first convolutional layer of a residual block, so that input cannot be pruned this way...
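To spell out what I mean (a toy sketch of my own, not the repository's code): a filter-wise L1 norm ranks a layer's output channels only, so the input channels of a residual block's first conv receive no score from it:

```python
import numpy as np

rng = np.random.default_rng(0)
# First conv of a residual block: (out_ch=16, in_ch=8, kH=3, kW=3)
first_conv_w = rng.standard_normal((16, 8, 3, 3))

# Filter-wise L1 norm yields one score per OUTPUT channel...
filter_scores = np.abs(first_conv_w).mean(axis=(1, 2, 3))

# ...so the 8 INPUT channels (the feature maps entering the block)
# get no score of their own; that is the gap act_scale_pre fills.
```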

@Zj-BinXia (Owner)

Sorry, I don't follow. That case is actually handled in the code; please take a look at the code to see it.

@zxd-cqu (Author) commented Jun 24, 2023

```python
for name, module in model.named_modules():
    if name in layers:
        layer = layers[name]
        out = get_score_layer(name, module, wg=wg, criterion=criterion)
        score = out['score']
        layer.score = score
        layer.prescore = out['act_scale_pre']
        if raw_pr[name] > 0:  # pr > 0 means this layer will be pruned, so its score is included in all_scores
            all_scores = np.append(all_scores, score)
            if hasattr(module, 'act_scale_pre'):
                all_scores = np.append(all_scores, out["act_scale_pre"])
```

In this code snippet, if the criterion is set to `l1-norm`, then `score` is the L1 norm of the filter weights, and it is appended to the `all_scores` array. In addition, `all_scores = np.append(all_scores, out["act_scale_pre"])` appends the scaling factors as scores too. These look to me like two different kinds of scores, yet they are combined for importance ranking.
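Here is a toy sketch (invented layer names and values, not the repository's data) of how the loop above pools both kinds of scores into one array and derives a global pruning threshold:

```python
import numpy as np

# Hypothetical per-layer outputs of get_score_layer
layer_outs = {
    'conv1': {'score': np.array([0.5, 0.1, 0.9]), 'act_scale_pre': np.array([0.3])},
    'conv2': {'score': np.array([0.2, 0.8]),      'act_scale_pre': np.array([0.05])},
}

all_scores = np.array([])
for out in layer_outs.values():
    all_scores = np.append(all_scores, out['score'])
    all_scores = np.append(all_scores, out['act_scale_pre'])

# Global ranking: prune the k lowest-scoring entries, whatever their origin
k = 3
threshold = np.sort(all_scores)[k - 1]  # 0.2 for these toy values
```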

@Zj-BinXia (Owner)

They essentially measure the same thing: the importance of filters. The separate naming is simply for code organization and readability. I still don't see the issue.

@zxd-cqu (Author) commented Jun 24, 2023

I'm sorry, I have only just started working on pruning, so my phrasing may be unclear. Do you mean that the L1 norm computed from the absolute mean of the filter weights and the `act_scale_pre` computed from the scaling factors can be combined for importance ranking?

@zxd-cqu (Author) commented Jun 24, 2023

```python
all_scores = np.append(all_scores, score)
if hasattr(module, 'act_scale_pre'):
    all_scores = np.append(all_scores, out["act_scale_pre"])
```

The `score` can be set to either the L1 norm or `act_scale`. I understand that when it is `act_scale`, it can be ranked together with the `act_scale_pre` appended on the third line. But when it is the L1 norm, is it still correct to rank it together with `act_scale_pre`?

@Zj-BinXia (Owner)

No, the `act_scale_pre` entry is also computed as an L1 norm, just like `act_scale`. Please see the `get_score_layer` function.
(screenshot of the `get_score_layer` function omitted)

@zxd-cqu (Author) commented Jun 24, 2023

If `criterion='l1-norm'` is passed to `get_score_layer(name, module, wg='filter', criterion='l1-norm')`, then on the second-to-last line of the function, `out['score'] = out[criterion]` becomes `out['score'] = out['l1-norm']`. The value of `out['l1-norm']` is computed as `l1 = module.weight.abs().view(-1, num_fea * scale * scale, 3, 3).mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)`. So `out['score']` is a score derived from the filter weights.

So in the outer function, the two lines `score = out['score']` and `all_scores = np.append(all_scores, score)` add the absolute mean of the filter weights to `all_scores`, while `all_scores = np.append(all_scores, out["act_scale_pre"])` adds the L1 norm of the scaling factors. The difference lies between these two types of importance scores. Can two such different scores be combined for ranking?

```python
def get_score_layer(name, module, wg='filter', criterion='l1-norm'):
    r"""Get importance score for a layer.

    Return:
        out (dict): A dict that has key 'score', whose value is a numpy array.
    """
    # -- define any scoring scheme here as you like
    shape = module.weight.data.shape
    if "upconv" in name:
        if wg == "channel":
            l1 = module.weight.abs().mean(dim=[0, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=0)
        elif wg == "filter":
            scale = 2
            num_fea = 64
            l1 = module.weight.abs().view(-1, num_fea * scale * scale, 3, 3).mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)
        elif wg == "weight":
            l1 = module.weight.abs().flatten()
    else:
        if wg == "channel":
            l1 = module.weight.abs().mean(dim=[0, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=0)
        elif wg == "filter":
            l1 = module.weight.abs().mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)
        elif wg == "weight":
            l1 = module.weight.abs().flatten()
    # --

    out = {}
    out['l1-norm'] = tensor2array(l1)
    if "upconv" in name:
        out['act_scale'] = tensor2array(module.act_scale.abs().view(-1)) if hasattr(module, 'act_scale') else [1e30] * (module.weight.size(0) // 4)
        if hasattr(module, 'act_scale_pre'):
            out['act_scale_pre'] = tensor2array(module.act_scale_pre.abs().view(-1))
        else:
            out['act_scale_pre'] = [1e30] * module.weight.size(1)
    else:
        out['act_scale'] = tensor2array(module.act_scale.abs().view(-1)) if hasattr(module, 'act_scale') else [1e30] * module.weight.size(0)
        if hasattr(module, 'act_scale_pre'):
            out['act_scale_pre'] = tensor2array(module.act_scale_pre.abs().view(-1))
        else:
            out['act_scale_pre'] = [1e30] * module.weight.size(1)
    # 1e30 marks layers that will not be pruned because of their unusually high scores
    out['score'] = out[criterion]
    return out
```
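A toy NumPy sketch (shapes and values invented, not the repository's tensors) of the point under discussion: which entries of `out` come from the weights and which from the learned scaling factors:

```python
import numpy as np

rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))  # 8 output filters, 4 input channels
act_scale = rng.standard_normal(8)          # learned scale per output channel
act_scale_pre = rng.standard_normal(4)      # learned scale per input channel

out = {
    'l1-norm': np.abs(weight).mean(axis=(1, 2, 3)),  # weight-based, per filter
    'act_scale': np.abs(act_scale),                  # scale-based, per output channel
    'act_scale_pre': np.abs(act_scale_pre),          # scale-based, per input channel
}

# criterion picks which entry becomes the ranked 'score'
criterion = 'act_scale'
out['score'] = out[criterion]
```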
