
Error in model = DIORModel(opt) #27

Closed
preetshah7 opened this issue Jan 19, 2022 · 14 comments

Comments

@preetshah7

Hi Aiyu Cui,
I have been following this topic since 2017, and dressing-in-order brings many new features, such as tuck-in, into the picture. Cheers for that. However, I have tried to reproduce this framework on Google Colab and have not been able to get it working. The notebook I used: link_to_nb

Tesla K80
NVIDIA-SMI 510.39.01 Driver Version: 460.32.03 CUDA Version: 11.2

Building the custom CUDA modules went smoothly, but I am not sure whether CUDA 11.2 works with torch 1.0.0.

When setting up the DIOR model, the following error appears:

load vgg ckpt from torchvision dict.
[init] init pre-trained model vgg.
initialize network with orthogonal

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-14-81abdc6faa32> in <module>
     29 
     30 # create model
---> 31 model = DIORModel(opt)
     32 model.setup(opt)


/content/dressing-in-order/models/dior_model.py in __init__(self, opt)
      9 class DIORModel(DIORBaseModel):
     10     def __init__(self, opt):
---> 11         DIORBaseModel.__init__(self, opt)
     12         self.netE_opt = opt.netE
     13         self.frozen_flownet = opt.frozen_flownet

/content/dressing-in-order/models/dior_base_model.py in __init__(self, opt)
     21         self.n_style_blocks = opt.n_style_blocks
     22         # init_models
---> 23         self._init_models(opt)
     24 
     25         # loss

/content/dressing-in-order/models/dior_model.py in _init_models(self, opt)
     59 
     60     def _init_models(self, opt):
---> 61         super()._init_models(opt)
     62         self.model_names += ["Flow"]
     63         if opt.frozen_flownet:

/content/dressing-in-order/models/dior_base_model.py in _init_models(self, opt)
     72                                       n_style_blocks=opt.n_style_blocks, n_human_parts=opt.n_human_parts, netG=opt.netG,
     73                                       norm=opt.norm_type, relu_type=opt.relu_type,
---> 74                                       init_type=opt.init_type, init_gain=opt.init_gain, gpu_ids=self.gpu_ids)
     75 
     76         self.netE_attr = networks.define_E(input_nc=3, output_nc=opt.style_nc, netE=opt.netE, ngf=opt.ngf, n_downsample=2,

/content/dressing-in-order/models/networks/__init__.py in define_G(input_nc, output_nc, ngf, latent_nc, style_nc, n_downsampling, n_style_blocks, n_human_parts, netG, norm, relu_type, init_type, init_gain, gpu_ids, **kwargs)
     82             norm_type=norm, relu_type=relu_type, **kwargs
     83             )
---> 84     return init_net(net, init_type, init_gain, gpu_ids)
     85 
     86 def define_D(input_nc, ndf, netD, n_layers_D=3, norm='batch', use_dropout=True, use_sigmoid=False, init_type='normal', init_gain=0.02, gpu_ids=[]):

/content/dressing-in-order/models/networks/base_networks.py in init_net(net, init_type, init_gain, gpu_ids, do_init_weight)
    107         net = torch.nn.DataParallel(net, gpu_ids)  # multi-GPUs
    108     if do_init_weight:
--> 109         init_weights(net, init_type, init_gain=init_gain)
    110     return net
    111 

/content/dressing-in-order/models/networks/base_networks.py in init_weights(net, init_type, init_gain)
     88 
     89     print('initialize network with %s' % init_type)
---> 90     net.apply(init_func)  # apply the initialization function <init_func>
     91 
     92 def init_net(net, init_type='normal', init_gain=0.02, gpu_ids=[], do_init_weight=True):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    240         """
    241         for module in self.children():
--> 242             module.apply(fn)
    243         fn(self)
    244         return self

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in apply(self, fn)
    241         for module in self.children():
    242             module.apply(fn)
--> 243         fn(self)
    244         return self
    245 

/content/dressing-in-order/models/networks/base_networks.py in init_func(m)
     78                 init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')
     79             elif init_type == 'orthogonal':
---> 80                 init.orthogonal_(m.weight.data, gain=init_gain)
     81             else:
     82                 raise NotImplementedError('initialization method [%s] is not implemented' % init_type)

/usr/local/lib/python3.7/dist-packages/torch/nn/init.py in orthogonal_(tensor, gain)
    354 
    355     # Compute the qr factorization
--> 356     q, r = torch.qr(flattened)
    357     # Make Q uniform according to https://arxiv.org/pdf/math-ph/0609050.pdf
    358     d = torch.diag(r, 0)

RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:225

link_to_cell
Please look into this. Thanks :)
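The versions reported above are easy to misread: the "CUDA Version" that nvidia-smi prints (11.2 here) is the newest CUDA version the installed driver supports, not the CUDA version PyTorch was built against, which is `torch.version.cuda`. A minimal diagnostic sketch for a Colab runtime, using only standard `torch` attributes (the helper name `describe_cuda_env` is hypothetical), might look like this:

```python
def describe_cuda_env() -> dict:
    """Collect the facts that matter when matching PyTorch, CUDA and the GPU.

    Hypothetical helper; returns {"torch": None} if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # CUDA version this PyTorch binary was built against (None for CPU-only builds).
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first GPU.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

On a Tesla K80 the reported capability should be `(3, 7)`, which is the number to compare against any `-gencode` targets used to build custom CUDA extensions.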

@cuiaiyu
Owner

cuiaiyu commented Jan 19, 2022

First, check whether cudatoolkit is installed in the correct version; the command should be something like:
conda install pytorch=1.0.0 torchvision cudatoolkit=11.0 -c pytorch

Also, if you only want to run the demo (no training), you can use a newer version of PyTorch, which should make compiling easier.

Thanks.

@preetshah7
Author

I did try the conda install above, with no luck:
conda install pytorch=1.0.0 torchvision cudatoolkit=11.0 -c pytorch
Since I only want to test inference, I tried the torch 1.10 that comes pre-installed on Colab, but it couldn't build the custom CUDA modules from GFLA. Note that with torch 1.0 the build did succeed. The GFLA authors mention this:

The Colab Demo for the Global-Flow-Local-Attention Model.
Note: we suggest to use GPUs with SM architecture higher than "SM60", such as P100, P4.
Bugs are found when running with GPUs: K80 (We would really appreciate if you can offer any help) .
Therefore, if you got GPUs listed above, please try to reset your runtime and get a different GPU.

Colab always assigns me a K80, and here are the gencode flags used when building block_extractor, local_attn_reshape, and resample2d_package:

nvcc_args = [
    #'-gencode', 'arch=compute_50,code=sm_50',
    #'-gencode', 'arch=compute_52,code=sm_52',
    '-gencode', 'arch=compute_60,code=sm_60',
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
    '-gencode', 'arch=compute_70,code=compute_70'
]

Please suggest a workaround if one exists, and let me know whether this is possible on Colab.
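Why the K80 is a problem here can be checked mechanically. A Tesla K80 has compute capability 3.7, while the gencode list above only emits SASS for sm_60, sm_61 and sm_70 plus PTX for compute_70. Under CUDA's compatibility rules (a cubin built for X.y runs only on devices X.z with z ≥ y, and PTX can be JIT-compiled only for equal or newer capabilities), none of these targets can execute on a K80, which is consistent with the GFLA note. A minimal sketch, with hypothetical helper names, that only understands the simple `arch=compute_XY,code=...` form used in this thread:

```python
from typing import List, Tuple

Capability = Tuple[int, int]


def _parse_capability(target: str) -> Capability:
    """Turn 'sm_61' or 'compute_70' into (6, 1) or (7, 0)."""
    digits = target.split("_", 1)[1]
    # The last digit is the minor version; everything before it is the major version.
    return int(digits[:-1]), int(digits[-1])


def gencode_targets(nvcc_args: List[str]) -> List[Tuple[str, Capability]]:
    """Extract ('sass' | 'ptx', capability) pairs from '-gencode' values."""
    targets = []
    for arg in nvcc_args:
        if not arg.startswith("arch="):
            continue  # skip the '-gencode' flag itself
        # 'arch=compute_60,code=sm_60' -> {'arch': 'compute_60', 'code': 'sm_60'}
        fields = dict(part.split("=", 1) for part in arg.split(","))
        code = fields["code"]
        kind = "sass" if code.startswith("sm_") else "ptx"
        targets.append((kind, _parse_capability(code)))
    return targets


def kernels_can_run(nvcc_args: List[str], device: Capability) -> bool:
    """True if at least one compiled target can execute on a GPU with this capability."""
    major, minor = device
    for kind, (t_major, t_minor) in gencode_targets(nvcc_args):
        # SASS (cubin): same major version, equal or newer minor version.
        if kind == "sass" and t_major == major and t_minor <= minor:
            return True
        # PTX: the driver can JIT-compile it for equal or newer capabilities.
        if kind == "ptx" and (t_major, t_minor) <= (major, minor):
            return True
    return False


# The flags quoted above, with the commented-out entries omitted.
NVCC_ARGS = [
    "-gencode", "arch=compute_60,code=sm_60",
    "-gencode", "arch=compute_61,code=sm_61",
    "-gencode", "arch=compute_70,code=sm_70",
    "-gencode", "arch=compute_70,code=compute_70",
]

print(kernels_can_run(NVCC_ARGS, (3, 7)))  # Tesla K80 -> False
```

On the runtime itself, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` tuple to pass as `device`.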

@cuiaiyu
Owner

cuiaiyu commented Jan 20, 2022

If you only need inference, you can skip installing GFLA's CUDA functions: specifying --frozen_flownet bypasses all CUDA function calls.

@preetshah7
Author

Thanks for the response; I'm trying that now.

@preetshah7
Author

Screenshot from 2022-01-21 03-26-56
Have I passed it correctly here?

@preetshah7
Author

Since the CUDA modules won't build, the flownet doesn't exist

@cuiaiyu
Copy link
Owner

cuiaiyu commented Jan 20, 2022

flownet.pt contains the weights of the pretrained flow model. Please see issue #23.

In short, you don't need it; you can set `opt.flownet_path = ''`.
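The two settings suggested so far (`--frozen_flownet` and an empty `flownet_path`) can be captured in a small option parser plus a sanity check. This is a hypothetical sketch: the parser mirrors only those two option names from this thread and is not the repository's actual option parser, which defines many more options. It also assumes, as the maintainer's reply suggests, that an empty `flownet_path` means no flow checkpoint is loaded.

```python
import argparse
import os
from typing import List


def parse_inference_opts(argv: List[str]) -> argparse.Namespace:
    """Hypothetical minimal parser holding only the two flow-related options."""
    parser = argparse.ArgumentParser()
    # Per the maintainer, this bypasses all GFLA CUDA function calls.
    parser.add_argument("--frozen_flownet", action="store_true")
    # Per the maintainer, flownet.pt is not needed; '' is assumed to mean "no checkpoint".
    parser.add_argument("--flownet_path", default="")
    return parser.parse_args(argv)


def check_flownet_opts(opt) -> List[str]:
    """Return human-readable problems with the flow-related options (empty if none)."""
    problems = []
    if not getattr(opt, "frozen_flownet", False):
        problems.append("frozen_flownet is off, so GFLA CUDA ops may still be called")
    path = getattr(opt, "flownet_path", "")
    if path and not os.path.isfile(path):
        problems.append(f"flownet_path {path!r} does not exist; set it to '' instead")
    return problems


opt = parse_inference_opts(["--frozen_flownet"])
print(check_flownet_opts(opt))  # []
```

Because `check_flownet_opts` uses `getattr`, the same check also works in a notebook where the options are plain attributes set directly on `opt` (for example `opt.frozen_flownet = True` and `opt.flownet_path = ''`).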

@preetshah7
Author

Thanks a lot for the help, and yes, the results are amazing. All the best!

@nikky4D

nikky4D commented Feb 23, 2022

> flownet.pt is the weight of pertrained flow model. please check Issue #23 at #23
>
> In short, you don't need it, you can specify it as `opt.flownet_path = ''`

Is the flownet required for the demo? The demo sets `opt.flownet_path = 'pretrained_models/flownet.pt'`.

@cuiaiyu
Owner

cuiaiyu commented Feb 26, 2022

No, it is not required; you can set it to an empty string (`''`).

cuiaiyu closed this as completed on Feb 26, 2022
@mahachaaben99

Hey @preetshah7, did you solve the problem? I got the same error while trying the demo and couldn't fix it.

@preetshah7
Author

Hi @maziqueen79 @nikky4D, as the owner suggested, I did not provide the flownet to the model.
notebook url
opt.flownet_path = ''
This worked for me.

@mahachaaben99

Thank you for your help, @preetshah7!

@MAmmarRaza

Hi, respected researchers! I am trying to run this demo, but my output does not show the images with poses the way your output did. Any idea why?
Screenshot from 2023-09-10 20-48-35
