Colab pro error Interpolation #98

Open
AlexU225 opened this issue Sep 3, 2020 · 14 comments

@AlexU225

AlexU225 commented Sep 3, 2020

https://colab.research.google.com/github/AhabbscienceStudioPak/DAIN/blob/master/DAIN_Colab.ipynb#scrollTo=LH7EmLT2gA4l
Colab Pro assigned this GPU:
name, driver_version, memory.total [MiB]
Tesla V100-SXM2-16GB, 418.67, 16130 MiB

#interpolation

/content/DAIN
revise the unique id to a random numer 91876
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/91876-Thu-Sep-03-17-38/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=137, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/91876-Thu-Sep-03-17-38/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/91876-Thu-Sep-03-17-38', save_which=1, seed=1, start_frame=1, time_step=0.2997002997002997, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 2 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
  File "colab_interpolate.py", line 112, in <module>
    y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
    self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
  File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
    temp = model(input) # this is a single direction motion results, but not a bidirectional one
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
    corr6 = self.corr(c16, c26)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
    result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
  File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
    self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fc469e26193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7fc46625ab38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x1bd4a (0x7fc46626ad4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x18890 (0x7fc466267890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a7f5]

frame #7: python3() [0x594b01]
frame #9: THPFunction_do_forward(THPFunction*, _object*) + 0x4ac (0x7fc4b2e37d4c in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: python3() [0x54ac61]
frame #13: python3() [0x50a783]
frame #16: python3() [0x594b01]
frame #19: python3() [0x507f24]
frame #21: python3() [0x594b01]
frame #22: python3() [0x54ac61]
frame #24: python3() [0x50a783]
frame #26: python3() [0x507f24]
frame #28: python3() [0x594b01]
frame #31: python3() [0x507f24]
frame #33: python3() [0x594b01]
frame #34: python3() [0x54ac61]
frame #36: python3() [0x50a783]
frame #38: python3() [0x507f24]
frame #39: python3() [0x509c50]
frame #40: python3() [0x50a64d]
frame #42: python3() [0x507f24]
frame #44: python3() [0x594b01]
frame #47: python3() [0x507f24]
frame #49: python3() [0x594b01]
frame #50: python3() [0x54ac61]
frame #52: python3() [0x50a783]
frame #54: python3() [0x507f24]
frame #56: python3() [0x634dd2]
frame #61: __libc_start_main + 0xe7 (0x7fc4be047b97 in /lib/x86_64-linux-gnu/libc.so.6)

Could you please tell me how to deal with this error?

@tianchengdw

I have the same problem.

@AlphaGit
Contributor

AlphaGit commented Sep 4, 2020

Hi there! You seem to be using an old version of the Colab file. I believe the repository has also changed some minor things about the interpolation, so if I were in your situation I'd try the new version. You can find it here: https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

@AlexU225
Author

AlexU225 commented Sep 4, 2020

> https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

Using this Colab, Google Drive connected successfully, but an error occurred in the FPS detection block.
Sorry for my English, I'm using a translator.
[Screenshot]

cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
cp: cannot stat 'Drive/Pexels': No such file or directory
cp: cannot stat 'Videos': No such file or directory
cp: cannot stat '2759484.mp4': No such file or directory


CalledProcessError Traceback (most recent call last)

in <module>()
1 # Detecting FPS of input file.
----> 2 get_ipython().magic('shell yes | cp -f /content/gdrive/My\ Drive/{INPUT_FILEPATH} /content/DAIN/')
3
4 import os
5 filename = os.path.basename(INPUT_FILEPATH)

3 frames

/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
136 if self.returncode:
137 raise subprocess.CalledProcessError(
--> 138 returncode=self.returncode, cmd=self.args, output=self.output)
139
140 def _repr_pretty_(self, p, cycle): # pylint:disable=unused-argument

CalledProcessError: Command 'yes | cp -f /content/gdrive/My\ Drive//content/gdrive/My Drive/Pexels Videos 2759484.mp4 /content/DAIN/' returned non-zero exit status 1.

@AlphaGit
Contributor

AlphaGit commented Sep 4, 2020

@AlexU225 Hi, the error is simply that it's not finding the file path. See the error you got:

cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
cp: cannot stat 'Drive/Pexels': No such file or directory
cp: cannot stat 'Videos': No such file or directory
cp: cannot stat '2759484.mp4': No such file or directory

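So, in the notebook parameters, use Pexels... instead of /content/gdrive/My Drive/Pexels... The cell already prepends /content/gdrive/My\ Drive/ to INPUT_FILEPATH, so the value must be a path relative to the root of your Drive.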
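
The failing command above points at two separate pitfalls: the Drive prefix ends up in the path twice, and a file name containing spaces is split by the shell into several arguments (hence the separate `cp` errors for 'Videos' and '2759484.mp4'). Below is a minimal sketch of a more forgiving copy step, assuming the mount point shown in the error message. The helper names `resolve_drive_path` and `copy_input` are hypothetical and not part of the notebook.

```python
import posixpath
import shutil

# Mount point taken from the error message above.
DRIVE_ROOT = "/content/gdrive/My Drive"


def resolve_drive_path(input_filepath, drive_root=DRIVE_ROOT):
    """Return an absolute Drive path whether or not the prefix was already given."""
    root = drive_root.rstrip("/")
    path = input_filepath.strip()
    # Drop the Drive prefix if the user already included it.
    if path == root or path.startswith(root + "/"):
        path = path[len(root):]
    return posixpath.join(root, path.lstrip("/"))


def copy_input(input_filepath, dest_dir="/content/DAIN/"):
    """Copy without going through a shell, so spaces in the name are not split."""
    return shutil.copy(resolve_drive_path(input_filepath), dest_dir)
```

With this, both `"Pexels Videos 2759484.mp4"` and the full `/content/gdrive/My Drive/...` form resolve to the same file, and `shutil.copy` receives the path as a single argument.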

@AlexU225
Author

AlexU225 commented Sep 9, 2020

> @AlexU225 Hi, the error is simply that it's not finding the file path. See the error you got:
>
> cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
> cp: cannot stat 'Drive/Pexels': No such file or directory
> cp: cannot stat 'Videos': No such file or directory
> cp: cannot stat '2759484.mp4': No such file or directory
>
> So, in parameters, instead of /content/gdrive/My Drive/Pexels... you should use Pexels...

Thank you for your advice! But now there is an error in this block:

  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

@AlphaGit
Contributor

Hey @AlexU225 I'm glad you made it that far! Unfortunately, I cannot help you there. That seems like a problem with the image processing itself.

@niuhuojian

I have the same problem with a Tesla V100-SXM2-16GB, but a P100-PCIE-16GB works.
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
I tried the new version, but it still happens.
Could you please tell me how to deal with this error?

@mpriessner

mpriessner commented Oct 5, 2020

Hello,
I have the same problem as @niuhuojian, also with the Tesla V100-SXM2-16GB, which I am using on Google Colab.
I get the following error when running the notebook from: https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

[Screenshot]

I have already tried several things to fix it, using different combinations of CUDA, gcc, and torch versions: CUDA 9.0, gcc 6.5, torch 1.0.0; CUDA 9.0, gcc 6.5, torch 1.1.0; CUDA 9.0, gcc 4.8, PyTorch 0.4.1; CUDA 10.0, gcc 7.5, torch 1.4; CUDA 10.1, gcc 7.5, torch 1.6.
None of them worked for me.

I also tried the solution from CyFeng16 in issue #44, but it also seems to have stopped working.

CUDA 9.0 with gcc-4.8 and g++-4.8 used to work about four months ago. Now this combination, as well as some of the others, gives me the FilterInterpolation module error from the my_package folder, see below:

Traceback (most recent call last):
  File "train.py", line 15, in <module>
    import networks
  File "/content/DAIN/networks/__init__.py", line 1, in <module>
    from .DAIN import DAIN
  File "/content/DAIN/networks/DAIN.py", line 4, in <module>
    from my_package.FilterInterpolation import FilterInterpolationModule
  File "/content/DAIN/my_package/FilterInterpolation/__init__.py", line 1, in <module>
    from .FilterInterpolationModule import *
  File "/content/DAIN/my_package/FilterInterpolation/FilterInterpolationModule.py", line 6, in <module>
    from .FilterInterpolationLayer import FilterInterpolationLayer,WeightLayer, PixelValueLayer,PixelWeightLayer,ReliableWeightLayer
  File "/content/DAIN/my_package/FilterInterpolation/FilterInterpolationLayer.py", line 4, in <module>
    import filterinterpolation_cuda as my_lib
ModuleNotFoundError: No module named 'filterinterpolation_cuda'

I am slowly running out of ideas to fix that. Does anyone have a working notebook, or an idea what else I could try to do?
That would be great!
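
Before trying another CUDA/gcc/torch combination, it may help to confirm which compiled extensions are actually importable in the current runtime. The sketch below is an assumption-laden check, not part of DAIN: it only looks up the two extension module names that appear in the tracebacks in this thread, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Extension module names that appear in the tracebacks in this thread.
CUDA_EXTENSIONS = ["filterinterpolation_cuda", "correlation_cuda"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(CUDA_EXTENSIONS)
    if missing:
        print("Not installed (build step probably failed or was skipped):", missing)
    else:
        print("All listed extensions are importable.")
```

A name listed as missing suggests the build never installed that package, which is a different failure from the "no kernel image" runtime error reported earlier in the thread.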

@iBobbyTS

Hi there, I think that's caused by the build process of the DAIN packages.
ModuleNotFoundError: No module named 'filterinterpolation_cuda'
This means the "filterinterpolation_cuda" package is not installed. Did you run build.sh?
Since you have a V100, which has a compute capability of 7.0, you should uncomment the line
# '-gencode', 'arch=compute_70,code=sm_70',
in DAIN/my_package/compiler_args.py, then run build.sh in both my_package and PWCNet.
Ad: for easier installation and usage, see iBobbyTS/VFIN, a kind of video interpolation toolkit that of course includes DAIN. It has a Colab notebook, and you can store the whole built VFIN in Drive; each time, you only need to extract the files into the Colab runtime and you can start using it.
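
To illustrate the compute capability point: the "no kernel image is available" error is what CUDA reports when the compiled extension contains no code for the GPU's architecture, which would explain why a P100 (capability 6.0) works while a V100 (7.0) does not. A minimal sketch of deriving the matching flag is shown below. The helper names are hypothetical; `torch.cuda.get_device_capability` is the PyTorch call used to query the device.

```python
def gencode_flags(major, minor):
    """Build the nvcc flag pair for one compute capability, e.g. (7, 0) for a V100."""
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


def current_device_flags():
    """Query the active GPU through PyTorch; returns None when no CUDA device is usable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return gencode_flags(major, minor)


if __name__ == "__main__":
    print(current_device_flags())
```

If the pair printed for your GPU is commented out or absent in compiler_args.py, the extensions need to be rebuilt with it enabled.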

@semel1

semel1 commented Dec 10, 2020

I have the same problem with a Tesla V100-SXM2-16GB:
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80).
niuhuojian said that the P100-PCIE-16GB works; unfortunately, I can't specify which GPU should be used.
The only reason I'm sticking with this version is that I'm intrigued by the ability to specify an arbitrary output FPS (e.g. 60 fps).

@iBobbyTS

> I have the same problem with a Tesla V100-SXM2-16GB:
> RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80).
> niuhuojian said that the P100-PCIE-16GB works; unfortunately, I can't specify which GPU should be used.
> The only reason I'm sticking with this version is that I'm intrigued by the ability to specify an arbitrary output FPS (e.g. 60 fps).

Did you try my suggestions a month ago?

@semel1

semel1 commented Dec 12, 2020

I uncommented '-gencode', 'arch=compute_70,code=sm_70' in compiler_args.py as you suggested, and switched to !pip install torch==1.0.0 torchvision==0.2.1 as TaoTeCha suggested in another post, #117 (comment).
The Colab is working with a V100 now.
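
For anyone who wants to script the first half of this fix in a Colab cell, here is a minimal sketch that uncomments the flag line in the text of compiler_args.py. It assumes the line looks exactly like the one quoted earlier in the thread; the function name `uncomment_gencode` and the sample file content in the usage note are hypothetical.

```python
import re


def uncomment_gencode(source, arch="70"):
    """Uncomment the '-gencode' line for the given architecture in compiler_args.py text."""
    target = f"'-gencode', 'arch=compute_{arch},code=sm_{arch}',"
    # Match an optionally indented '#' followed by the exact flag line.
    pattern = re.compile(r"^(\s*)#\s*(" + re.escape(target) + r")", re.MULTILINE)
    return pattern.sub(r"\1\2", source)


if __name__ == "__main__":
    # Hypothetical excerpt; the real file may be laid out differently.
    sample = (
        "nvcc_args = [\n"
        "    '-gencode', 'arch=compute_60,code=sm_60',\n"
        "    # '-gencode', 'arch=compute_70,code=sm_70',\n"
        "]\n"
    )
    print(uncomment_gencode(sample))
```

In practice you would read DAIN/my_package/compiler_args.py, pass its text through the function, write it back, and then rebuild my_package and PWCNet with build.sh, with the torch version pinned as described in the comment above.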

@semel1

semel1 commented Dec 13, 2020

Any chance of a Windows binary?

@WilliamJudge94

> I uncommented '-gencode', 'arch=compute_70,code=sm_70' in the compiler_args.py as you suggested and switched to !pip install torch==1.0.0 torchvision==0.2.1 as TaoTeCha suggested in another post #117 (comment)
> The colab is working with a V100 now

This is in DAIN/my_package/compiler_args.py
