nn.Parameter is omitted (with a case) #84

Closed
zezhishao opened this issue Sep 19, 2021 · 12 comments

@zezhishao

Describe the bug
nn.Parameter is omitted from the summary when the network also contains other predefined PyTorch layers.
Details are as follows:

To Reproduce

import torch
import torch.nn as nn
from torchinfo import summary

class FCNets(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        # 2 layer fully connected networks
        super().__init__()
        # layer1 with nn.Parameter
        self.weight = nn.Parameter(torch.randn(input_dim, hidden_dim))
        self.bias = nn.Parameter(torch.randn(hidden_dim))
        # layer2 with nn.Linear
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        # activation
        self.activation = nn.ReLU()
    
    def forward(self, x):
        # x.shape = [batch_size, input_dim]
        # layer1
        h = torch.mm(x, self.weight) + self.bias
        # activation
        h = self.activation(h)
        # layer2
        out = self.fc2(h)
        return out

# device = torch.device("cuda:0")
device = torch.device("cpu")
x = torch.randn(3, 128).to(device)
fc = FCNets(128, 64, 32).to(device)
summary(fc, input_data=x)

It seems that nn.Parameter entries are dropped from the summary whenever other layers (nn.Module subclasses) are present in the model.

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets                                   --                        --
├─ReLU: 1-1                              [3, 64]                   --
├─Linear: 1-2                            [3, 32]                   2,080
==========================================================================================
Total params: 2,080
Trainable params: 2,080
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
==========================================================================================

However, if we remove self.fc2, the output is fine and the parameters are listed.
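For reference, here is a minimal sketch of that variant (the class name is illustrative and the imports from the snippet above are reused; the dimensions match the report):

class FCNetNoLinear(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Only raw nn.Parameters, no submodules such as nn.Linear
        self.weight = nn.Parameter(torch.randn(input_dim, hidden_dim))
        self.bias = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, x):
        return torch.mm(x, self.weight) + self.bias

# With no submodules present, weight and bias do appear in the summary.
summary(FCNetNoLinear(128, 64), input_data=torch.randn(3, 128))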

PyTorch version: 1.7.1 (GPU)
torchinfo version: 1.5.3

@TylerYep
Owner

TylerYep commented Oct 4, 2021

Thanks, I'll investigate this issue.

Notes:


def test_parameter_with_other_layers() -> None:
    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            # 2 layer fully connected networks
            super().__init__()
            # layer1 with nn.Parameter
            self.weight = nn.Parameter(torch.randn(input_dim, hidden_dim))
            self.bias = nn.Parameter(torch.randn(hidden_dim))
            # layer2 with nn.Linear
            self.fc2 = nn.Linear(hidden_dim, output_dim)
            # activation
            self.activation = nn.ReLU()

        def forward(self, x):
            # x.shape = [batch_size, input_dim]
            # layer1
            h = torch.mm(x, self.weight) + self.bias
            # activation
            h = self.activation(h)
            # layer2
            out = self.fc2(h)
            return out

    fc = FCNets(128, 64, 32)
    result_1 = summary(fc, input_data=torch.randn(3, 128), verbose=2)

    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            # 2 layer fully connected networks
            super().__init__()
            # layer1 with nn.Parameter
            self.weight = nn.Parameter(torch.randn(input_dim, hidden_dim))
            self.bias = nn.Parameter(torch.randn(hidden_dim))
            # layer2 with nn.Linear
            # self.fc2 = nn.Linear(hidden_dim, output_dim)
            # activation
            # self.activation = nn.ReLU()

        def forward(self, x):
            # x.shape = [batch_size, input_dim]
            # layer1
            h = torch.mm(x, self.weight) + self.bias
            # activation
            # out = self.activation(h)
            # layer2
            # out = self.fc2(h)
            return h

    fc = FCNets(128, 64, 32)
    result_2 = summary(fc, input_data=torch.randn(3, 128), verbose=2)
    assert result_1.total_params == result_2.total_params

@jeremyfix

I thought I could have a look at this issue, but I'm not actually sure how to handle these nn.Parameters of the top-level nn.Module.

I'm not completely sure I understand the hooks correctly, or whether it is possible to detect when and if the parameters declared in the top-level nn.Module (FCNets in our case) are used in the forward call.

Since there seems to be a difference between parameters declared on the top-level nn.Module and parameters of modules declared inside it, I tried the following hack of wrapping FCNets within another Module:

    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            # 2 layer fully connected networks
            super().__init__()
            # layer1 with nn.Parameter
            self.weight = nn.Parameter(torch.randn(input_dim, hidden_dim))
            self.bias = nn.Parameter(torch.randn(hidden_dim))
            # layer2 with nn.Linear
            self.fc2 = nn.Linear(hidden_dim, output_dim)
            # activation
            self.activation = nn.ReLU()

        def forward(self, x):
            # x.shape = [batch_size, input_dim]
            # layer1
            h = torch.mm(x, self.weight) + self.bias
            # activation
            out = self.activation(h)
            # layer2
            out = self.fc2(h)
            return h

    class WrapperModule(nn.Module):

        def __init__(self, submodule):
            super().__init__()
            self.submodule = submodule

        def forward(self, x):
            return self.submodule(x)
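
For reference, a minimal sketch of how the wrapped model might be passed to summary (the exact call is my assumption; the dimensions and verbose=2 match the examples above):

wrapped = WrapperModule(FCNets(128, 64, 32))
summary(wrapped, input_data=torch.randn(3, 128), verbose=2)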

The summary then lists the parameters (but also the fc2 ones):

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
WrapperModule                            --                        --
├─FCNets: 1-1                            [3, 64]                   --
│    └─weight                                                      ├─8,192
│    └─bias                                                        ├─64
│    └─fc2.weight                                                  ├─2,048
│    └─fc2.bias                                                    └─32
│    └─ReLU: 2-1                         [3, 64]                   --
│    └─Linear: 2-2                       [3, 32]                   2,080
│    │    └─weight                                                 ├─2,048
│    │    └─bias                                                   └─32
==========================================================================================

I'm not sure how to progress, but I would be happy to get your feedback on the right track to follow.

As a side note, if I'm not mistaken, the test to include is not exactly the one you propose but the one below (i.e. comparing the case of using an nn.Linear for fc1 versus a "raw" W·x + b). Does that make sense?

def test_parameter_with_other_layers() -> None:
    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            # 2 layer fully connected networks
            super().__init__()
            # layer1 with nn.Linear
            self.fc1 = nn.Linear(input_dim, hidden_dim)
            # layer2 with nn.Linear
            self.fc2 = nn.Linear(hidden_dim, output_dim)
            # activation
            self.activation = nn.ReLU()

        def forward(self, x):
            # x.shape = [batch_size, input_dim]
            # layer1
            h = self.fc1(x)
            # activation
            h = self.activation(h)
            # layer2
            out = self.fc2(h)
            return out

    fc = FCNets(128, 64, 32)
    result_1 = summary(fc, input_data=torch.randn(3, 128), verbose=2)

    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            # 2 layer fully connected networks
            super().__init__()
            # layer1 with nn.Parameter
            self.weight = nn.Parameter(torch.randn(input_dim, hidden_dim))
            self.bias = nn.Parameter(torch.randn(hidden_dim))
            # layer2 with nn.Linear
            self.fc2 = nn.Linear(hidden_dim, output_dim)
            # activation
            self.activation = nn.ReLU()

        def forward(self, x):
            # x.shape = [batch_size, input_dim]
            # layer1
            h = torch.mm(x, self.weight) + self.bias
            # activation
            h = self.activation(h)
            # layer2
            out = self.fc2(h)
            return out

    fc = FCNets(128, 64, 32)
    result_2 = summary(fc, input_data=torch.randn(3, 128), verbose=2)
    assert result_1.total_params == result_2.total_params

@TylerYep
Owner

@jeremyfix First of all, I really appreciate the time you're putting in to solve this problem!

Both test cases are important imo. The one I linked above demonstrates different behavior (i.e. missing rows in the summary table) when a module has no submodules vs. when it has at least one submodule (the Linear layer). The two models are not supposed to be equivalent, so the total params assertion may be unnecessary.

Your example is also important because it asserts that parameters are treated the same as a module, which is what the assertion is testing.

I'm not certain, but this may be related to how we treat modules in the recursive apply_hooks function in torchinfo.py. I know we have logic there that behaves differently when the current layer has no submodules.

@TylerYep
Owner

As a minified example, notice that a and b are missing from one of the outputs in the summary:

def test_parameter_with_other_layers() -> None:
    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            super().__init__()
            self.a = nn.Parameter(torch.randn(input_dim, hidden_dim))
            self.b = nn.Parameter(torch.randn(hidden_dim))
            self.fc2 = nn.Linear(hidden_dim, output_dim)

        def forward(self, x):
            h = torch.mm(x, self.a) + self.b
            return self.fc2(h)

    result_1 = summary(FCNets(128, 64, 32), input_data=torch.randn(3, 128), verbose=2)

    class FCNets(nn.Module):
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            self.a = nn.Parameter(torch.randn(input_dim, hidden_dim))
            self.b = nn.Parameter(torch.randn(hidden_dim))

        def forward(self, x):
            return torch.mm(x, self.a) + self.b

    result_2 = summary(FCNets(128, 64), input_data=torch.randn(3, 128), verbose=2)
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets                                   --                        --
├─Linear: 1-1                            [3, 32]                   2,080
│    └─weight                                                      ├─2,048
│    └─bias                                                        └─32
==========================================================================================
Total params: 2,080
Trainable params: 2,080
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
==========================================================================================
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets                                   [3, 64]                   8,256
├─a                                                                ├─8,192
├─b                                                                └─64
==========================================================================================
Total params: 8,256
Trainable params: 8,256
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.03
Estimated Total Size (MB): 0.04
==========================================================================================

@jeremyfix

Hmm... okay. Your last example is interesting because it shows that nn.Parameter is sometimes taken into account and sometimes not. I will try to have a look.

@jeremyfix

I'm advancing pretty slowly, since I'm not very clear about the logic of the hooks.

For clarity, let us call FCNets_ablin the network with the Linear module and FCNets_ab the one without (from your example above):

class FCNets_ablin(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.a = nn.Parameter(torch.randn(input_dim, hidden_dim))
        self.b = nn.Parameter(torch.randn(hidden_dim))
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h = torch.mm(x, self.a) + self.b
        return self.fc2(h)

class FCNets_ab(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.a = nn.Parameter(torch.randn(input_dim, hidden_dim))
        self.b = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, x):
        return torch.mm(x, self.a) + self.b

For now, as far as I can tell, the difference between a, b being listed or not comes from apply_hooks, in particular this test:

if module != orig_model or isinstance(module, LAYER_MODULES) or not submodules:
    if hooks is None or isinstance(module, WRAPPER_MODULES):
        pre_hook(module, None)
    else:
        hooks.append(module.register_forward_pre_hook(pre_hook))
        hooks.append(module.register_forward_hook(hook))

which is False for FCNets_ablin and True for FCNets_ab, because FCNets_ab does not have any submodules.
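
To make that concrete, here is a quick check (my own illustration, assuming submodules roughly corresponds to the module's children):

ablin = FCNets_ablin(128, 64, 32)
ab = FCNets_ab(128, 64, 32)
print(list(ablin.children()))  # [Linear(in_features=64, out_features=32, bias=True)] -> `not submodules` is False
print(list(ab.children()))     # []                                                   -> `not submodules` is True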

It then seems to me that it is the pre_hook call (registered in the else branch of the test above) on the FCNets_ab module that gathers a, b and their sizes.

Trying to find my way, I simply removed the test (just for experimenting; I'm not saying the test is useless) so that pre-hooks are registered unconditionally. That lists a, b for FCNets_ablin, but also the parameters of its submodules, which are already listed by the recursive call on the child submodules:

With the test disabled, FCNets_ablin gives:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets_ablin                             [3, 32]                   --
├─a                                                                ├─8,192
├─b                                                                ├─64
├─fc2.weight                                                       ├─2,048
├─fc2.bias                                                         └─32
├─Linear: 1-1                            [3, 32]                   2,080
│    └─weight                                                      ├─2,048
│    └─bias                                                        └─32
==========================================================================================

@jeremyfix

A follow-up on my last comment. If I change the condition around lines 522-527 above to:

    if module != orig_model or isinstance(module, LAYER_MODULES) or not submodules:
        if hooks is None or isinstance(module, WRAPPER_MODULES):
            pre_hook(module, None)
        else:
            hooks.append(module.register_forward_pre_hook(pre_hook))
            hooks.append(module.register_forward_hook(hook))
    else:
        # List the top level parameters
        hooks.append(module.register_forward_pre_hook(pre_hook))
        hooks.append(module.register_forward_hook(hook))

and also change layer_info.py:calculate_num_params() to iterate only over the module's own parameters, and not the submodules' ones (which I think we can discard by checking whether the parameter name contains a dot; a short sketch of that idea follows after the two outputs below), I get something better (though not completely correct):

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets_ablin                             [3, 32]                   --
├─a                                                                ├─8,192
├─b                                                                ├─64
├─fc2.weight                                                       ├─2,048
├─fc2.bias                                                         └─32
├─Linear: 1-1                            [3, 32]                   2,080
│    └─weight                                                      ├─2,048
│    └─bias                                                        └─32
==========================================================================================
Total params: 2,080
Trainable params: 2,080
...

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets_ab                                [3, 64]                   8,256
├─a                                                                ├─8,192
├─b                                                                └─64
==========================================================================================
Total params: 8,256
Trainable params: 8,256
...
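
As a rough illustration of the dot-based filtering idea (my own sketch, not the actual calculate_num_params() code; PyTorch's named_parameters(recurse=False) would express the same thing):

import torch.nn as nn

def own_param_count(module: nn.Module) -> int:
    # Parameters registered directly on `module` have no dot in their name
    # (e.g. "a", "b"), while submodule parameters do (e.g. "fc2.weight").
    return sum(p.numel() for name, p in module.named_parameters() if "." not in name)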

I think that, basically, the hook is not attached to the top-level module. Maybe instead of changing the code within apply_hooks, we could add some info before the first call to apply_hooks within the forward_pass function. @TylerYep, any idea on that?

Also notice that for the moment, the total number of parameters is not correct.

@TylerYep
Owner

TylerYep commented Dec 31, 2021

I can offer some insight here:

  • the purpose of calling pre_hook explicitly is to handle (often top-level) modules that are not triggered during the forward() function. Since they don't trigger hooks, we need to add them to the final list manually.
  • Adding some info before apply_hooks could be a valid approach, although we'd want to make sure the logic is as clean as possible.

One issue to call out in your modified example is that the fc2.weight and fc2.bias are listed twice, once in the topmost layer and once in the Linear layer, when they should only be listed once. This is why we have different behavior based on the layer having submodules vs not having submodules iirc.

In general, for this project I let the test cases dictate the code - anything that passes all of the tests is an improvement, and anything that is missed or breaks means we are missing a test case. Beyond this, I don't have a good idea of what the solution should be.

@jeremyfix

OK, I will have to withdraw from working on this issue. I made some progress on the output, but I'm not very happy with the code I'm producing, as I'm having a hard time understanding the logic behind it, so I prefer to stop there.

My current progress is on the branch https://github.com/jeremyfix/torchinfo/tree/issue84, in which I basically added the computation of the number of parameters for the topmost-level module (the one provided to the summary() function), with code taken from the LayerInfo class. I also added two tests that pass. In the end, I figured that adding a hook on the top-level module was probably not necessary, since forward is called on that module anyway and we just want to add its parameters, if any, to the summary list. Also, when I did add the hook, the parameters got listed but for some reason the total number of parameters was not updated, and I had to filter named_parameters to keep only the top-level ones.

In the current version of the code, 2 tests of the test suite are failing because the var_name of the topmost-level module is displayed when, according to the expected output, it should not be:

  • Test test_row_settings, because it displays LSTMNet (LSTMNet) instead of LSTMNet
  • Test test_lstm, because it displays SingleInputNet (SingleInputNet) instead of SingleInputNet

These are certainly minor things, but they probably require digging into formatting.py to understand why.

The FCNets_ablin / FCNets_ab example above now produces almost the correct output:

-------- Result 1   
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets_ablin                             --                        --
├─a                                                                ├─8,192
├─b                                                                └─64
├─Linear: 1-1                            [3, 32]                   2,080
│    └─weight                                                      ├─2,048
│    └─bias                                                        └─32
==========================================================================================
Total params: 10,336
Trainable params: 10,336
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.04
Estimated Total Size (MB): 0.04
==========================================================================================

-------- Result 2
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
FCNets_ab                                [3, 64]                   8,256
├─a                                                                ├─8,192
├─b                                                                └─64
==========================================================================================
Total params: 8,256
Trainable params: 8,256
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.03
Estimated Total Size (MB): 0.04
==========================================================================================

"Almost correct" because I did not handle in the code the computation of the MACS, therefore the Total mult-adds and Estimated Total Size outputed are incorrect.

This can be seen more easily from the expected output in parameter_with_other_layers.out, where two exactly identical networks are summarized and the test does not succeed because these fields have incorrect values.

I'm sorry to step back, but I hope that piece of work might nevertheless be of help to you.

@TylerYep
Owner

TylerYep commented Jan 2, 2022

No worries at all, thank you so much for your help!

If there's anything I can do to make the code more approachable, feel free to let me know as well. It's a tricky codebase, but I definitely want it to get better over time.

@TylerYep
Copy link
Owner

This has been fixed in 5621cc9 and will be released in torchinfo v1.7.0.

Thank you for reporting this issue!

Also, thank you to @jeremyfix for the help on investigating. The above discussion was very helpful in diagnosing and fixing this issue as well as the 4+ other issues related to this problem. It required a fairly complex rewrite of the library and the code is now in a much better place.

@jeremyfix

jeremyfix commented May 29, 2022

Thank you @TylerYep; I'm happy this was fixed. I'm sorry I did not respond to your request about making the code more approachable; I was indeed confused by the way the recursion was written and unclear about what was filled in when. But I hope to dig into your code sooner or later and hopefully better understand how things work.
