
AttributeError: 'tuple' object has no attribute 'size' #195

Closed
wants to merge 4 commits

Conversation

fabiofumarola

Fixed #141. I have tested it only on one Hugging Face model, but it should work for every model.
The only problem is that I could not fix the verification of the out file.
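
For context, a minimal sketch of the kind of model that hits the error in the title: a forward pass that returns a nested tuple instead of a plain tensor. This is a hypothetical repro, not the test added in this PR, and whether it actually fails depends on the torchinfo version.

import torch
import torchinfo

class NestedTupleOutput(torch.nn.Module):
    # Hypothetical module whose forward returns a nested tuple of tensors,
    # the output structure that made torchinfo call .size() on a tuple.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        y = self.linear(x)
        return (y, (y.detach(), y * 2))

# On affected torchinfo versions this raised:
#   AttributeError: 'tuple' object has no attribute 'size'
torchinfo.summary(NestedTupleOutput(), input_size=(1, 8))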

@mert-kurttutan
Contributor

I think the fix proposed here is a good start, but it is too specific: it seems to work only for outputs that are nested tuples. It would be better to take an approach where we traverse the data recursively and capture the sizes of all individual tensors.

An example of such a nested traversal is the traverse_input_data function in torchinfo.py. With an appropriate action_fn, we can get the sizes of all individual tensors (and collect them inside a list, for example); a standalone sketch of the idea follows.
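
For illustration, a minimal standalone sketch of that kind of traversal (this is not torchinfo's actual traverse_input_data; the helper name and the recursion scheme here are hypothetical):

import torch

def collect_tensor_sizes(data, sizes=None):
    # Recursively walk dicts, lists and tuples and record the size of every tensor found.
    if sizes is None:
        sizes = []
    if isinstance(data, torch.Tensor):
        sizes.append(list(data.size()))
    elif isinstance(data, dict):
        for value in data.values():
            collect_tensor_sizes(value, sizes)
    elif isinstance(data, (list, tuple)):
        for item in data:
            collect_tensor_sizes(item, sizes)
    return sizes

# Example with a "complicated enough" nested output: dict inside list inside tuple.
out = (torch.zeros(2, 3), [{"hidden": torch.zeros(4, 5)}, torch.zeros(1)])
print(collect_tensor_sizes(out))  # [[2, 3], [4, 5], [1]]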

I also propose adding a test case where some of the hidden layers have nested outputs and inputs, with the nesting complicated enough. By complicated enough, I mean a dict inside a list inside a tuple, and so on.

This approach seems more robust to me.

@fabiofumarola
Author

OK, two questions:

  1. Can I add a test that pulls in the transformers dependency only for testing?
  2. I have tried to run the tests with --overwrite, but the test still fails.

I'll update the pull request as you suggested.

@mert-kurttutan
Contributor

  1. Do you mean installing the transformers package? I think we should add a test case where the hidden layers have the nested output/input structure that I mentioned. Maybe you can create a custom model that has these properties and add it to the fixture/models file.
    I think the transformers models are important enough that we should test them. In that case, you would need to modify the GitHub Actions workflow files to install transformers. This might introduce additional version-compatibility issues; for instance, it might not be compatible with past versions of torch.

  2. See the reason for the error in the details of the checks. You have to modify the test.yml file to install the transformers package.

@snimu
Contributor

snimu commented Jan 12, 2023

The summary output has a problem: Total params is 76,961,152, but that's not the sum of the Param # column, which is 128,746,176.
Does this have anything to do with the proposed changes, or is it a problem somewhere else in torchinfo?

@snimu
Contributor

snimu commented Jan 13, 2023

> The summary output has a problem: Total params is 76,961,152, but that's not the sum of the Param # column, which is 128,746,176. Does this have anything to do with the proposed changes, or is it a problem somewhere else in torchinfo?

Got it: in the summary_list, some items have negative num_params.

I tested this by putting the following code into ModelStatistics.__init__(...):

import warnings  # warnings.warn instead of print so that pytest doesn't suppress the output

warnstr = "\n"
for layer_info in summary_list:
    if layer_info.is_recursive:
        continue
    # Leaf layers report their own parameters; containers report only leftovers.
    num_params = layer_info.num_params if layer_info.is_leaf_layer else layer_info.leftover_params()
    warnstr += f"{layer_info.class_name}: {num_params}\n"
warnings.warn(warnstr)

I applied the fix from this pull-request in LayerInfo.calculate_size(...) to make summary work.

Then I ran the test provided in this pull request, which builds the model AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small"). Running it yielded this output:

tests/torchinfo_xl_test.py::test_huggingface
  /Users/sebastianmuller/Documents/Programmieren/torchinfo/torchinfo/model_statistics.py:35: UserWarning: 
  T5ForConditionalGeneration: -51782336
  T5Stack: 35332800
  Embedding: 16449536
  Dropout: 0
  ModuleList: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Embedding: 192
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5LayerNorm: 512
  T5Stack: 16449536
  Dropout: 0
  ModuleList: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Embedding: 192
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5Block: 0
  ModuleList: 0
  T5LayerSelfAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerCrossAttention: 0
  T5LayerNorm: 512
  T5Attention: 0
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Linear: 196608
  Dropout: 0
  T5LayerFF: 0
  T5LayerNorm: 512
  T5DenseGatedActDense: 0
  Linear: 524288
  NewGELUActivation: 0
  Linear: 524288
  Dropout: 0
  Linear: 524288
  Dropout: 0
  T5LayerNorm: 512
  Linear: 16449536

Summing them all up gives 76,961,152. Ignoring the negative number (-51,782,336) for num_params gives 128,743,488, the correct result (of course, negative numbers also have to be ignored for trainable_params, leftover_params(), and leftover_trainable_params()). Since I don't quite understand where the negative number comes from, I can't quite vouch for the correctness of the results, only for their consistency.
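
For concreteness, a rough sketch of the kind of guard I mean by "ignoring" the negative numbers (hypothetical; the real change would live in ModelStatistics, and the attribute names follow the warning snippet above):

def total_params_skipping_negatives(summary_list):
    # Sum per-layer parameter counts, skipping recursive entries and the
    # spurious negative counts discussed above.
    total = 0
    for layer_info in summary_list:
        if layer_info.is_recursive:
            continue
        num_params = (
            layer_info.num_params
            if layer_info.is_leaf_layer
            else layer_info.leftover_params()
        )
        if num_params > 0:
            total += num_params
    return total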

I will (probably) open a pull request addressing this problem and issue #141 soon; I just wanted to answer my own question so that nobody is left hanging when they read this thread.

@TylerYep
Owner

TylerYep commented Feb 5, 2023

Thanks for the work here, all. Closing in favor of #212.

TylerYep closed this on Feb 5, 2023