Error interpolating frames #117

Open
xRoyBx opened this issue Nov 10, 2020 · 15 comments

@xRoyBx

xRoyBx commented Nov 10, 2020

Hello, I always get this error during interpolation and can't proceed any further:

/content/DAIN
revise the unique id to a random numer 68776
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/68776-Tue-Nov-10-11-46/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=1259, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/68776-Tue-Nov-10-11-46/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/68776-Tue-Nov-10-11-46', save_which=1, seed=1, start_frame=1, time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 1 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
File "colab_interpolate.py", line 112, in
y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
temp = model(input) # this is a single direction motion results, but not a bidirectional one
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
corr6 = self.corr(c16, c26)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3c81ae3193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7f3c7e117b38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x1bd4a (0x7f3c7e127d4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x18890 (0x7f3c7e124890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a4a5]

frame #7: python3() [0x594a01]
frame #9: THPFunction_do_forward(THPFunction*, _object*) + 0x4ac (0x7f3ccaaf4d4c in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: python3() [0x54a971]
frame #13: python3() [0x50a433]
frame #16: python3() [0x594a01]
frame #19: python3() [0x507be4]
frame #21: python3() [0x594a01]
frame #22: python3() [0x54a971]
frame #24: python3() [0x50a433]
frame #26: python3() [0x507be4]
frame #28: python3() [0x594a01]
frame #31: python3() [0x507be4]
frame #33: python3() [0x594a01]
frame #34: python3() [0x54a971]
frame #36: python3() [0x50a433]
frame #38: python3() [0x507be4]
frame #39: python3() [0x509900]
frame #40: python3() [0x50a2fd]
frame #42: python3() [0x507be4]
frame #44: python3() [0x594a01]
frame #47: python3() [0x507be4]
frame #49: python3() [0x594a01]
frame #50: python3() [0x54a971]
frame #52: python3() [0x50a433]
frame #54: python3() [0x507be4]
frame #56: python3() [0x634e72]
frame #61: __libc_start_main + 0xe7 (0x7f3cd5d04bf7 in /lib/x86_64-linux-gnu/libc.so.6)

@iBobbyTS

I think you didn't build the packages correctly; make sure you have PyTorch >=1.0.0, <=1.4.0. If you're just interested in doing the interpolation, check my repo iBobbyTS/VFIN. It's very easy to use, DAIN is included, there are colab notebooks that I tested, and there's also a tar file with everything (python, pytorch and dain packages) installed correctly. Open an issue there if there are any problems with it.

@xRoyBx
Author

xRoyBx commented Nov 10, 2020

I don't know how to install/change PyTorch >=1.0.0, <=1.4.0 in colab (Win10).
Anyway, using the "official" notebook by Styler00Dollar and Alpha or other related notebooks, I get the same error.
I'll try with VFIN, thanks ;)

@iBobbyTS

iBobbyTS commented Nov 10, 2020

To install PyTorch 1.4, simply run this:
pip install torch==1.4.0
If you're using a notebook like Colab, add a ! before it, like:
!pip install torch==1.4.0
I'm modifying the code in VFIN very often right now, so there might be errors when someone else uses it. I'm still learning about GitHub; I might start using the branch and release systems to keep stable versions and develop somewhere else.
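A quick way to double-check what actually got installed in the runtime (a minimal sketch, not part of the DAIN notebook itself):

# Minimal sanity check, run in a Colab cell before building DAIN's extensions.
import torch

print(torch.__version__)               # should fall in the 1.0.0–1.4.0 range
print(torch.cuda.is_available())       # True when a GPU runtime is attached
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. which Tesla model Colab assigned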

@TaoTeCha

TaoTeCha commented Nov 10, 2020

This error has popped up a few times in the issues section of various DAIN colab repos. I've read it's possibly something to do with the GPU build. I would love for someone to look into this, because I'm not familiar at all with this area of coding and I've only gotten DAIN to work once (probably when I was assigned a P100). I get this same error when using different colabs for DAIN, specifically when I get a V100, I think.

Has anyone found a solution???

@AlphaGit
Contributor

Hi! Yes, I know what the error is. I don’t know exactly how to fix it.

This is the relevant error line:

error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device

This means that the native C/CUDA module is not compiled for the device it is running on. As Google keeps adding new GPU models to their Colab support, we will keep finding these issues. This explains the symptoms that @TaoTeCha is seeing.

My first attempt at this was in #87, where I manually added build targets for all the GPU models I could find in Colab at that time (June 2020). I also added a structure that hopefully makes it easier to add more over time... but it's less than ideal.

@xRoyBx Note that the version of the Colab with those fixes also suppresses a few warnings that I saw in your logs. Is it possible you’re not using the latest one? 1.5 is already in master, 1.5.1 is in a PR (#116).

If anyone knows how to achieve future general compatibility, I’d be glad to work on that.
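For illustration only, one way to see which -gencode target the attached GPU needs (this is not the repo's build script; it just reads the compute capability PyTorch reports):

# Illustrative sketch: derive a matching -gencode flag for whatever GPU
# Colab assigned, using the compute capability reported by PyTorch.
import torch

major, minor = torch.cuda.get_device_capability(0)  # e.g. (7, 0) on a V100
arch = '%d%d' % (major, minor)
gencode = ['-gencode', 'arch=compute_%s,code=sm_%s' % (arch, arch)]
print(gencode)  # flags like these must reach nvcc when the CUDA packages are built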

@TaoTeCha

I've trained and run a handful of deep learning models in colab, but in every case the GPU has been all set up and ready to go, so I'm totally ignorant about all this. I have a couple of questions.

  1. Why do you need to do a 15 minute 'build' with DAIN when I have never had to do this with any other model I've used?
  2. What parameters do I need to change in the files to find a model that works with colab's V100? I'm willing to put in the trial and error work if someone enlightens me in what I should be changing.

Thanks

@AlphaGit
Contributor

  1. DAIN is a mixture of different CNNs put together, some of them from previous papers. You can find more info here and in the original paper. So that you don’t have to run 6 CNNs in parallel, which is memory-expensive and incredibly slow, the authors compiled some of these “layers” into CUDA modules they could run on the GPU directly to train and infer with DAIN. These modules are the ones taking ~15 minutes and giving us these headaches (see the sketch after these answers).

  2. Check out this file.
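
To make the build step less mysterious: each of those modules is a PyTorch C++/CUDA extension compiled with nvcc when the notebook runs its setup script. A stripped-down sketch of what such a setup.py roughly looks like (simplified, not the exact file in the repo):

# Simplified sketch of a PyTorch CUDA extension build (not the repo's exact file).
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

setup(
    name='correlation_cuda',
    ext_modules=[
        CUDAExtension(
            'correlation_cuda',
            ['correlation_cuda.cc', 'correlation_cuda_kernel.cu'],
            extra_compile_args={
                'cxx': [],
                # the GPU Colab assigns must be covered by one of these targets,
                # otherwise: "no kernel image is available for execution on the device"
                'nvcc': ['-gencode', 'arch=compute_70,code=sm_70'],
            },
        )
    ],
    cmdclass={'build_ext': BuildExtension},
)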

@TaoTeCha

Not sure if it's a coincidence, but I uncommented '-gencode', 'arch=compute_70,code=sm_70' in compiler_args.py and switched to !pip install torch==1.0.0 torchvision==0.2.1

The colab is working with a V100 now. I'll probably use this a handful of times over the next week and I'll keep you updated if it continues to work.

Thanks!

@xRoyBx
Author

xRoyBx commented Nov 11, 2020

Thanks for the interpolation fix. Unfortunately I get this error when creating the output video; apparently it doesn't create the output frames:

/content/DAIN/output_frames
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
[image2 @ 0x55727fd56000] Could not open file : *.png
[image2 @ 0x55727fd56000] Could not find codec parameters for stream 0 (Video: png, none(pc)): unspecified size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, image2, from '*.png':
Duration: 00:00:00.02, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, none(pc), 60 tbr, 60 tbn, 60 tbc
Output #0, mp4, to '/content/gdrive/My Drive/DAIN/output.mp4':
Output file #0 does not contain any stream

CalledProcessError Traceback (most recent call last)
in ()
1 # Create output video
2 get_ipython().magic('cd {FRAME_OUTPUT_DIR}')
----> 3 get_ipython().magic("shell ffmpeg -y -r {TARGET_FPS} -f image2 -pattern_type glob -i '*.png' '/content/gdrive/My Drive/{OUTPUT_FILE_PATH}'")

3 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
136 if self.returncode:
137 raise subprocess.CalledProcessError(
--> 138 returncode=self.returncode, cmd=self.args, output=self.output)
139
140 def _repr_pretty_(self, p, cycle): # pylint:disable=unused-argument

CalledProcessError: Command 'ffmpeg -y -r 60 -f image2 -pattern_type glob -i '*.png' '/content/gdrive/My Drive/DAIN/output.mp4'' returned non-zero exit status 1.

@TaoTeCha

Are you sure your output path exists? Do you have a folder in your drive named DAIN? When you mounted your drive, did you mount it as gdrive or just drive? Try changing the output to '/content/output.mp4' and just download from the colab file folder.

Or try !ffmpeg instead of %shell ffmpeg
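
The ffmpeg log ("Could not open file : *.png" and "Output file #0 does not contain any stream") means the glob matched zero frames. As a sketch, a quick check to run before the ffmpeg cell to confirm whether the interpolation step actually wrote any PNGs (the path below matches the one in the log):

# Run in a Colab cell before the ffmpeg step: confirm the interpolation
# actually wrote PNG frames into the output folder.
import glob, os

FRAME_OUTPUT_DIR = '/content/DAIN/output_frames'
frames = sorted(glob.glob(os.path.join(FRAME_OUTPUT_DIR, '*.png')))
print(len(frames), 'PNG frames found')
if not frames:
    print('No frames were written, so ffmpeg has nothing to encode.')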

@xRoyBx
Author

xRoyBx commented Nov 11, 2020

!ffmpeg

Are you sure your output path exists? Do you have a folder in your drive named DAIN? When you mounted your drive, did you mount it as gdrive or just drive? Try changing the output to '/content/output.mp4' and just download from the colab file folder.

Or try !ffmpeg instead of %shell ffmpeg

Paths are OK, the DAIN folder is present, and the drive is mounted as gdrive. The problem is always the same (forget my previous post, sorry): it doesn't create the output PNG frames even though the output folder is present (using a Tesla V100 in colab).

@iBobbyTS

In VFIN you don't need to worry about that. Just specify -ot video and it will generate an mp4 in your input folder; if you specify the output with -o, you can use any extension and save it anywhere.

@xRoyBx
Author

xRoyBx commented Nov 14, 2020

In VFIN you don't need to worry about that. Just specify -ot video and it will generate an mp4 in your input folder; if you specify the output with -o, you can use any extension and save it anywhere.

using this command:
!/content/python/bin/python3 /content/VFIN/run.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"

I get this error:
/content/python/bin/python3: can't open file '/content/VFIN/run.py': [Errno 2] No such file or directory

@iBobbyTS

iBobbyTS commented Nov 14, 2020

In VFIN you don't need to worry about that. Just specify -ot video and it will generate an mp4 in your input folder; if you specify the output with -o, you can use any extension and save it anywhere.

using this command:

!/content/python/bin/python3 /content/VFIN/run.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"

I get this error:

/content/python/bin/python3: can't open file '/content/VFIN/run.py': [Errno 2] No such file or directory

Sorry, I changed the name of the running file. For this time, use
!/content/python/bin/python3 /content/VFIN/run_class.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"
instead.
I fixed the GitHub repo and the pre-built tar file; copy the tar file to your drive again and use it next time, and copy the notebook too, I edited it.
By the way, you need -a DAIN -ot video to make it use DAIN and output a video.

@iBobbyTS

For any problems with VFIN, please open issues there.
