Colab pro error Interpolation #98

AlexU225 · 2020-09-03T17:42:48Z

https://colab.research.google.com/github/AhabbscienceStudioPak/DAIN/blob/master/DAIN_Colab.ipynb#scrollTo=LH7EmLT2gA4l
Colab Pro assigned this GPU:
name, driver_version, memory.total [MiB]
Tesla V100-SXM2-16GB, 418.67, 16130 MiB

#interpolation

/content/DAIN
revise the unique id to a random numer 91876
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/91876-Thu-Sep-03-17-38/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=137, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/91876-Thu-Sep-03-17-38/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/91876-Thu-Sep-03-17-38', save_which=1, seed=1, start_frame=1, time_step=0.2997002997002997, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 2 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
  File "colab_interpolate.py", line 112, in <module>
    y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
    self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
  File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
    temp = model(input) # this is a single direction motion results, but not a bidirectional one
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
    corr6 = self.corr(c16, c26)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
    result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
  File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
    self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fc469e26193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7fc46625ab38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x1bd4a (0x7fc46626ad4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x18890 (0x7fc466267890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a7f5]

frame #7: python3() [0x594b01]
frame #9: THPFunction_do_forward(THPFunction*, _object*) + 0x4ac (0x7fc4b2e37d4c in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: python3() [0x54ac61]
frame #13: python3() [0x50a783]
frame #16: python3() [0x594b01]
frame #19: python3() [0x507f24]
frame #21: python3() [0x594b01]
frame #22: python3() [0x54ac61]
frame #24: python3() [0x50a783]
frame #26: python3() [0x507f24]
frame #28: python3() [0x594b01]
frame #31: python3() [0x507f24]
frame #33: python3() [0x594b01]
frame #34: python3() [0x54ac61]
frame #36: python3() [0x50a783]
frame #38: python3() [0x507f24]
frame #39: python3() [0x509c50]
frame #40: python3() [0x50a64d]
frame #42: python3() [0x507f24]
frame #44: python3() [0x594b01]
frame #47: python3() [0x507f24]
frame #49: python3() [0x594b01]
frame #50: python3() [0x54ac61]
frame #52: python3() [0x50a783]
frame #54: python3() [0x507f24]
frame #56: python3() [0x634dd2]
frame #61: __libc_start_main + 0xe7 (0x7fc4be047b97 in /lib/x86_64-linux-gnu/libc.so.6)

Could you please tell me how to deal with this error?

tianchengdw · 2020-09-04T01:54:00Z

I have the same problem.

AlphaGit · 2020-09-04T11:48:47Z

Hi there! You seem to be using an old version of the Colab file. I believe the repository has also changed some minor things about the interpolation, so if I were in your situation, I'd try the new version. You can find it here: https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

AlexU225 · 2020-09-04T13:10:58Z

Using this Colab (https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb), I get an error in the FPS detection block, although Google Drive was connected successfully.
Sorry for my English, I'm using a translator.

cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
cp: cannot stat 'Drive/Pexels': No such file or directory
cp: cannot stat 'Videos': No such file or directory
cp: cannot stat '2759484.mp4': No such file or directory

CalledProcessError Traceback (most recent call last)

in ()
1 # Detecting FPS of input file.
----> 2 get_ipython().magic('shell yes | cp -f /content/gdrive/My\ Drive/{INPUT_FILEPATH} /content/DAIN/')
3
4 import os
5 filename = os.path.basename(INPUT_FILEPATH)

3 frames

/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
136 if self.returncode:
137 raise subprocess.CalledProcessError(
--> 138 returncode=self.returncode, cmd=self.args, output=self.output)
139
140 def _repr_pretty_(self, p, cycle): # pylint:disable=unused-argument

CalledProcessError: Command 'yes | cp -f /content/gdrive/My\ Drive//content/gdrive/My Drive/Pexels Videos 2759484.mp4 /content/DAIN/' returned non-zero exit status 1.

AlphaGit · 2020-09-04T22:14:05Z

@AlexU225 Hi, the error is simply that cp cannot find the file at the path it was given. See the error you got:

cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
cp: cannot stat 'Drive/Pexels': No such file or directory
cp: cannot stat 'Videos': No such file or directory
cp: cannot stat '2759484.mp4': No such file or directory

So, in the parameters, set INPUT_FILEPATH relative to your Drive root: instead of /content/gdrive/My Drive/Pexels... you should use Pexels...
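
A minimal sketch of what goes wrong, assuming the notebook pastes INPUT_FILEPATH directly after the Drive mount point, as the failing command in the traceback suggests. The helper names drive_relative and build_cp_command are hypothetical and are not part of the notebook. The sketch only illustrates two points: the value should be relative to My Drive, and a path containing spaces has to be quoted or the shell splits it into several arguments.

```python
import posixpath
import shlex

DRIVE_ROOT = "/content/gdrive/My Drive"  # mount point seen in the error message


def drive_relative(path: str) -> str:
    """Return `path` relative to the Drive root, stripping the root if it was included."""
    if path.startswith(DRIVE_ROOT + "/"):
        return path[len(DRIVE_ROOT) + 1:]
    return path.lstrip("/")


def build_cp_command(input_filepath: str, dest: str = "/content/DAIN/") -> str:
    """Build a cp command with the source quoted so spaces do not split it."""
    source = posixpath.join(DRIVE_ROOT, drive_relative(input_filepath))
    return f"cp -f {shlex.quote(source)} {shlex.quote(dest)}"


# Both the full path and the relative path end up as the same single argument.
print(build_cp_command("/content/gdrive/My Drive/Pexels Videos 2759484.mp4"))
print(build_cp_command("Pexels Videos 2759484.mp4"))
```

With the prefix stripped and the source quoted, cp receives one source argument instead of the four fragments listed in the error output.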

AlexU225 · 2020-09-09T21:21:44Z

Thank you for your advice! But now there is an error in this block:

  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

AlphaGit · 2020-09-10T00:18:47Z

Hey @AlexU225 I'm glad you made it that far! Unfortunately, I cannot help you there. That seems like a problem with the image processing itself.

niuhuojian · 2020-09-10T03:10:43Z

I have the same problem with a Tesla V100-SXM2-16GB, but a P100-PCIE-16GB works.
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
I tried the new version, but it still happened.
Could you please tell me how to deal with this error?

mpriessner · 2020-10-05T14:07:18Z

Hello,
I have the same problem as @niuhuojian, also with the Tesla V100-SXM2-16GB, which I am using on Google Colab.
I get the error when running the notebook from: https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

I have already tried several things to fix it, using different combinations of CUDA, gcc and torch versions:
CUDA 9.0, gcc 6.5, torch 1.0.0
CUDA 9.0, gcc 6.5, torch 1.1.0
CUDA 9.0, gcc 4.8, torch 0.4.1
CUDA 10.0, gcc 7.5, torch 1.4
CUDA 10.1, gcc 7.5, torch 1.6
None of them worked for me.

I also tried the solution from CyFeng16 in issue #44, but this also seems to have stopped working.

CUDA 9.0 with gcc-4.8 and g++-4.8 used to work around 4 months ago. Now this combination, as well as some of the others, gives me the FilterInterpolation module error from the my_package folder, see below:

Traceback (most recent call last):
  File "train.py", line 15, in <module>
    import networks
  File "/content/DAIN/networks/__init__.py", line 1, in <module>
    from .DAIN import DAIN
  File "/content/DAIN/networks/DAIN.py", line 4, in <module>
    from my_package.FilterInterpolation import FilterInterpolationModule
  File "/content/DAIN/my_package/FilterInterpolation/__init__.py", line 1, in <module>
    from .FilterInterpolationModule import *
  File "/content/DAIN/my_package/FilterInterpolation/FilterInterpolationModule.py", line 6, in <module>
    from .FilterInterpolationLayer import FilterInterpolationLayer,WeightLayer, PixelValueLayer,PixelWeightLayer,ReliableWeightLayer
  File "/content/DAIN/my_package/FilterInterpolation/FilterInterpolationLayer.py", line 4, in <module>
    import filterinterpolation_cuda as my_lib
ModuleNotFoundError: No module named 'filterinterpolation_cuda'

I am slowly running out of ideas to fix that. Does anyone have a working notebook, or an idea what else I could try to do?
That would be great!
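
One way to separate a failed build from a runtime failure is to check whether the compiled extensions can be found at all before starting the notebook's interpolation step. This is only a sketch: the two module names are taken from the tracebacks in this thread, and the helper missing_extensions is a hypothetical name, not something DAIN provides.

```python
import importlib.util

# Extension module names as they appear in the tracebacks in this thread.
CUDA_EXTENSIONS = ["filterinterpolation_cuda", "correlation_cuda"]


def missing_extensions(names):
    """Return the names that Python cannot locate on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_extensions(CUDA_EXTENSIONS)
if missing:
    print("Not installed (the build step failed or was skipped):", ", ".join(missing))
else:
    print("All listed extensions can be located.")
```

If a module is reported as missing, the problem lies in the build step rather than in the GPU or the input frames.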

iBobbyTS · 2020-11-11T14:31:10Z

Hi there, I think this is caused by the build process of the DAIN packages.
ModuleNotFoundError: No module named 'filterinterpolation_cuda'
This means the "filterinterpolation_cuda" package is not installed. Did you run build.sh?
Since you have a V100, which has compute capability 7.0, you should uncomment the line
# '-gencode', 'arch=compute_70,code=sm_70',
in DAIN/my_package/compiler_args.py. Then run build.sh in both my_package and PWCNet.
Ad: for easier installation and usage, see iBobbyTS/VFIN, a kind of video interpolation toolkit that of course includes DAIN. It has a Colab notebook, and you can store the whole built VFIN in Drive, so each time you only need to extract the files to the Colab runtime and you can start using it.
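
To see which -gencode line matches the GPU Colab assigned, something like the following sketch may help. It assumes PyTorch is installed and uses torch.cuda.get_device_capability. The helper names gencode_flags and detected_capability are hypothetical, and the fallback to (7, 0) is only there so the example still prints something without a GPU.

```python
def gencode_flags(major: int, minor: int) -> list:
    """Build the nvcc -gencode arguments for one compute capability."""
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


def detected_capability():
    """Return (major, minor) for GPU 0, or None if torch or a GPU is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


# Fall back to the V100's capability (7.0) purely for illustration.
capability = detected_capability() or (7, 0)
print(gencode_flags(*capability))
```

If the printed arch is not among the uncommented entries in compiler_args.py, the compiled kernels contain no image for that GPU, which matches the "no kernel image is available" message.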

semel1 · 2020-12-10T21:02:22Z

I have the same problem with a Tesla V100-SXM2-16GB:
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80).
niuhuojian said that the P100-PCIE-16GB works, but unfortunately I can't choose which GPU Colab assigns.
The only reason I'm sticking with this version is that I'm intrigued by the ability to specify an arbitrary output FPS (e.g. 60 fps).

iBobbyTS · 2020-12-12T15:22:14Z

Did you try my suggestions a month ago?

semel1 · 2020-12-12T17:50:37Z

I uncommented '-gencode', 'arch=compute_70,code=sm_70' in compiler_args.py as you suggested and switched to !pip install torch==1.0.0 torchvision==0.2.1, as TaoTeCha suggested in another post: #117 (comment).
The Colab is working with a V100 now.
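
For anyone who wants to script the edit instead of doing it by hand, here is a rough sketch. The function uncomment_gencode is a hypothetical helper, and the sample text is an invented excerpt: the real compiler_args.py may be laid out differently, so check the file before overwriting it.

```python
import re


def uncomment_gencode(source: str, arch: str = "70") -> str:
    """Uncomment the -gencode line for `arch`, leaving every other line untouched."""
    pattern = re.compile(
        rf"^([ \t]*)#[ \t]*('-gencode',[ \t]*'arch=compute_{arch},code=sm_{arch}',?)",
        flags=re.MULTILINE,
    )
    return pattern.sub(r"\1\2", source)


# Hypothetical excerpt; only the commented line is quoted from this thread.
sample = """nvcc_args = [
    '-gencode', 'arch=compute_60,code=sm_60',
    # '-gencode', 'arch=compute_70,code=sm_70',
]
"""
print(uncomment_gencode(sample))
```

After changing the file, the extensions still have to be rebuilt with build.sh in my_package and PWCNet, as described earlier in the thread.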

semel1 · 2020-12-13T02:56:20Z

Any chance of a Windows binary?

WilliamJudge94 · 2021-03-10T00:33:08Z

For anyone looking for it: the '-gencode', 'arch=compute_70,code=sm_70' line that semel1 uncommented is in DAIN/my_package/compiler_args.py.
