Division by zero #4

Closed
Michaelwhite34 opened this issue Jan 21, 2022 · 29 comments

@Michaelwhite34

(PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
Laoding Config for evaluation
Traceback (most recent call last):
File "run.py", line 311, in <module>
config = config_lib.get_config(args.project, args.mode, args.config)
File "D:\PVDNet\configs\config_PVDNet_large.py", line 42, in get_config
IpE = math.ceil((len(list(range(0, total_frame_num - (config.frame_itr_num-1), config.frame_itr_num)))) / actual_batch_size) * config.frame_itr_num
ZeroDivisionError: division by zero

@codeslake
Owner

Hi, @Michaelwhite34.

Is Cuda available on your device?

@Michaelwhite34
Author

> Hi, @Michaelwhite34.
>
> Is Cuda available on your device?

Yes, I have a GTX 1660 Ti. By the way, it's on Windows 10.

@codeslake
Owner

codeslake commented Jan 21, 2022

I've never tried running PyTorch on Windows.
Is nvidia-smi available on Windows?
Would you prepend CUDA_VISIBLE_DEVICES=0 before the command for evaluating PVDNet?

Or, you can just set IpE=0. I just put it to check the number of iterations needed for an epoch.
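
For anyone hitting the same error: the traceback shows the division by `actual_batch_size` failing, so that value was presumably zero. The thread does not show how `actual_batch_size` is computed, so the following is only a minimal sketch of a guard around the same formula. The function name `iterations_per_epoch` is hypothetical and not part of PVDNet.

```python
import math


def iterations_per_epoch(total_frame_num, frame_itr_num, actual_batch_size):
    """Mirror the IpE formula from the traceback, but tolerate a zero batch size."""
    # Start indices of the frame windows, as in the original expression.
    window_starts = range(0, total_frame_num - (frame_itr_num - 1), frame_itr_num)
    if actual_batch_size <= 0:
        # Assumption: IpE is informational only (as the maintainer says),
        # so falling back to 0 is acceptable for evaluation.
        return 0
    return math.ceil(len(window_starts) / actual_batch_size) * frame_itr_num


print(iterations_per_epoch(100, 1, 8))  # 13
print(iterations_per_epoch(100, 1, 0))  # 0 instead of ZeroDivisionError
```

Returning 0 matches the workaround suggested above; raising a descriptive error would be the stricter choice for training runs.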

@Michaelwhite34
Author

> I've never tried running PyTorch on Windows. Is nvidia-smi available on Windows? Would you prepend CUDA_VISIBLE_DEVICES=0 before the command for evaluating PVDNet?
>
> Or, you can just set IpE=0. I just put it to check the number of iterations needed for an epoch.

After setting it to 0:
(PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
Laoding Config for evaluation
Project : PVDNet_TOG2021
Mode : PVDNet_large_nah
Config: config_PVDNet_large
Network: PVDNet_large
Trainer: trainer
Loading Model...
initializing deblurring network
Traceback (most recent call last):
File "run.py", line 341, in <module>
eval(config)
File "D:\PVDNet\eval.py", line 198, in eval
eval_quan_qual(config)
File "D:\PVDNet\eval.py", line 68, in eval_quan_qual
blur_folder_path_list, blur_file_path_list, gt_file_path_list = init(config, mode)
File "D:\PVDNet\eval.py", line 43, in init
model = create_model(config)
File "D:\PVDNet\models\__init__.py", line 5, in create_model
model = lib.Model(config)
File "D:\PVDNet\models\trainers\trainer.py", line 35, in __init__
self.network = DeblurNet(config).to(torch.device('cuda'))
File "D:\PVDNet\models\trainers\trainer.py", line 255, in __init__
lib = importlib.import_module('models.archs.{}'.format(config.network_BIMNet))
File "D:\anaconda3\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "D:\PVDNet\models\archs\liteFlowNet.py", line 6, in <module>
if float(torch.version.cuda) > 10.1:
TypeError: float() argument must be a string or a number, not 'NoneType'

@codeslake
Owner

You will need to find a way to get the CUDA version on Windows (on Ubuntu it is torch.version.cuda).
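
A note for readers: `torch.version.cuda` is `None` when the installed PyTorch build has no CUDA support (for example a CPU-only build), and that makes `float(torch.version.cuda)` raise the `TypeError` above. Whether that was the cause here is not confirmed in the thread. The sketch below shows one way to make such a version check tolerate `None`. The helper names `parse_cuda_version` and `is_newer_than` are hypothetical, and the block takes the version as a plain string so it runs without PyTorch.

```python
from typing import Optional, Tuple


def parse_cuda_version(version: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Turn a string such as '11.4' into (11, 4); return None if no version is reported."""
    if not version:
        return None
    return tuple(int(part) for part in version.split("."))


def is_newer_than(version: Optional[str], minimum: Tuple[int, ...] = (10, 1)) -> bool:
    """True only when a CUDA version is reported and it is greater than `minimum`."""
    parsed = parse_cuda_version(version)
    return parsed is not None and parsed > minimum


# In the project this would be called with torch.version.cuda,
# e.g. is_newer_than(torch.version.cuda).
print(is_newer_than("11.4"))  # True
print(is_newer_than("10.1"))  # False
print(is_newer_than(None))    # False, no TypeError
```

Comparing integer tuples also avoids a pitfall of `float()`: a version such as "11.10" would parse as 11.1.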

@Michaelwhite34
Author

> You will need to find a way to get the CUDA version on Windows (on Ubuntu it is torch.version.cuda).

nvidia-smi
Sat Jan 22 10:13:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 472.84 Driver Version: 472.84 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:01:00.0 On | N/A |
| N/A 46C P8 4W / N/A | 507MiB / 6144MiB | 22% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

@codeslake
Owner

codeslake commented Jan 22, 2022

I meant that you should find a way to replace this line on Windows. But since your machine has CUDA 11, try replacing these 4 lines with from models.archs.torch_correlation_C11 import FunctionCorrelation.

I hope the correlation layer needed for LiteFlowNet installs without problems on Windows (I can't help with that).

@Michaelwhite34
Author

(PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
Laoding Config for evaluation
Project : PVDNet_TOG2021
Mode : PVDNet_large_nah
Config: config_PVDNet_large
Network: PVDNet_large
Trainer: trainer
Loading Model...
initializing deblurring network
Warning! No positional inputs found for a module, assuming batch size is 1.
Computing model complexity...
Computational complexity (Macs): 1755.80841016 B
Number of parameters: 23.36178 M

Loading checkpoint 'PVDNet_large_nah.pytorch' on model 'PVDNet_large_nah':
Traceback (most recent call last):
File "run.py", line 341, in <module>
eval(config)
File "D:\PVDNet\eval.py", line 198, in eval
eval_quan_qual(config)
File "D:\PVDNet\eval.py", line 186, in eval_quan_qual
total_itr_time = total_itr_time / total_norm
ZeroDivisionError: division by zero

@codeslake
Owner

That means the evaluation has never gone through a loop. Make sure the test set is in the right place.
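
As a minimal sketch of how this failure could be made easier to diagnose, the average can be guarded so that an empty test set produces a message about the data path instead of a bare `ZeroDivisionError`. The names `total_itr_time` and `total_norm` come from the traceback; the function `average_iteration_time` and its `data_path` argument are hypothetical.

```python
def average_iteration_time(total_itr_time, total_norm, data_path):
    """Average time per iteration, with an explicit error when nothing was evaluated."""
    if total_norm == 0:
        # Zero iterations means the loop over input frames never ran.
        raise RuntimeError(
            "No frames were evaluated; check that the dataset exists under "
            + repr(data_path)
        )
    return total_itr_time / total_norm


print(average_iteration_time(3.0, 2, "datasets/video_deblur"))  # 1.5
```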

@Michaelwhite34
Author

Michaelwhite34 commented Jan 22, 2022

My sequence structure is D:\PVDNet\datasets\video_deblur\random\1\*.png ... Is this not right?

@codeslake
Owner

I believe it is right. Did you change data_offset?

@Michaelwhite34
Author

No, should I change it?

@Michaelwhite34
Author

Hmm, should I just uncomment the line "config.data_offset = 'datasets/video_deblur'"?

@codeslake
Owner

codeslake commented Jan 22, 2022

Set it to D:\PVDNet\datasets\video_deblur.
Also, to check the results, change log_offset to ./.

@Michaelwhite34
Author

Michaelwhite34 commented Jan 22, 2022

(PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
Traceback (most recent call last):
File "run.py", line 201, in <module>
from configs.config import set_train_path
File "D:\PVDNet\configs\config.py", line 43
config.data_offset = 'D:/PVDNet/datasets/video_deblur'
^
IndentationError: unexpected indent

@codeslake
Owner

It says indentation error. Check the indentation.

@Michaelwhite34
Author

Still
(PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
Laoding Config for evaluation
Project : PVDNet_TOG2021
Mode : PVDNet_large_nah
Config: config_PVDNet_large
Network: PVDNet_large
Trainer: trainer
Loading Model...
initializing deblurring network
Warning! No positional inputs found for a module, assuming batch size is 1.
Computing model complexity...
Computational complexity (Macs): 1755.80841016 B
Number of parameters: 23.36178 M

Loading checkpoint 'PVDNet_large_nah.pytorch' on model 'PVDNet_large_nah':
Traceback (most recent call last):
File "run.py", line 341, in <module>
eval(config)
File "D:\PVDNet\eval.py", line 198, in eval
eval_quan_qual(config)
File "D:\PVDNet\eval.py", line 186, in eval_quan_qual
total_itr_time = total_itr_time / total_norm
ZeroDivisionError: division by zero

@codeslake
Owner

codeslake commented Jan 22, 2022

Try to use '\' for data_offset, not '/'.

I will be out for a few hours. I am pretty sure it is the path problem.

@Michaelwhite34
Author

> Try to use '\' for data_offset, not '/'.
>
> I will be out for a few hours. I am pretty sure it is the path problem.

I am using \ already.

@codeslake
Owner

> (PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
> Traceback (most recent call last):
> File "run.py", line 201, in <module>
> from configs.config import set_train_path
> File "D:\PVDNet\configs\config.py", line 43
> config.data_offset = 'D:/PVDNet/datasets/video_deblur'
> ^
> IndentationError: unexpected indent

Here, you are using '/'.

@Michaelwhite34
Author

Michaelwhite34 commented Jan 22, 2022

> > (PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
> > Traceback (most recent call last):
> > File "run.py", line 201, in <module>
> > from configs.config import set_train_path
> > File "D:\PVDNet\configs\config.py", line 43
> > config.data_offset = 'D:/PVDNet/datasets/video_deblur'
> > ^
> > IndentationError: unexpected indent
>
> Here, you are using '/'.

I changed after that.
config.data_offset = 'D:\PVDNet\datasets\video_deblur'

@Michaelwhite34
Author

Do I need to change the line "config.data = 'DVD' # 'nah'"?

@codeslake
Owner

> Do I need to change the line "config.data = 'DVD' # 'nah'"?

No, it is given in python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch. Did you get it running?

@Michaelwhite34
Author

> > Do I need to change the line "config.data = 'DVD' # 'nah'"?
>
> No, it is given in python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch. Did you get it running?

Still:
(PVDNet) PS D:\PVDNet> python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data random --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
Laoding Config for evaluation
Project : PVDNet_TOG2021
Mode : PVDNet_large_nah
Config: config_PVDNet_large
Network: PVDNet_large
Trainer: trainer
Loading Model...
initializing deblurring network
Warning! No positional inputs found for a module, assuming batch size is 1.
Computing model complexity...
Computational complexity (Macs): 1755.80841016 B
Number of parameters: 23.36178 M

Loading checkpoint 'PVDNet_large_nah.pytorch' on model 'PVDNet_large_nah':
Traceback (most recent call last):
File "run.py", line 341, in <module>
eval(config)
File "D:\PVDNet\eval.py", line 198, in eval
eval_quan_qual(config)
File "D:\PVDNet\eval.py", line 186, in eval_quan_qual
total_itr_time = total_itr_time / total_norm
ZeroDivisionError: division by zero

@codeslake
Owner

Would you replace this line with the following lines and print out the result?

print(config.EVAL.data_path, config.EVAL.input_path)
blur_folder_path_list, blur_file_path_list, _ = load_file_list(config.EVAL.data_path, config.EVAL.input_path)
print(blur_file_path_list)
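
Independently of the project's own `load_file_list`, the directory layout can be checked with a few lines of standard-library code. This is only a sketch: `list_frames` is a hypothetical helper, and it assumes the layout mentioned earlier in the thread (`<data root>/random/<sequence>/*.png`); the real loader may apply different rules.

```python
from pathlib import Path


def list_frames(data_root, dataset_name, pattern="*.png"):
    """Return {sequence folder name: sorted frame paths} under data_root/dataset_name."""
    dataset_dir = Path(data_root) / dataset_name
    if not dataset_dir.is_dir():
        return {}
    return {
        seq_dir.name: sorted(seq_dir.glob(pattern))
        for seq_dir in sorted(dataset_dir.iterdir())
        if seq_dir.is_dir()
    }


if __name__ == "__main__":
    frames = list_frames("D:/PVDNet/datasets/video_deblur", "random")
    for name, paths in frames.items():
        print(name, len(paths))
    if not any(frames.values()):
        print("No frames found; the evaluation loop would not run.")
```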

@Michaelwhite34
Author

> Would you replace this line with the following lines and print out the result?
>
> print(config.EVAL.data_path, config.EVAL.input_path)
> blur_folder_path_list, blur_file_path_list, _ = load_file_list(config.EVAL.data_path, config.EVAL.input_path)
> print(blur_file_path_list)

Got it working, turns out I should use "/".

@codeslake
Owner

> Got it working, turns out I should use "/"

Great!
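
One possible explanation for the backslash version failing, not confirmed in the thread: in a normal Python string literal, `\v` is the vertical-tab escape, so `'D:\PVDNet\datasets\video_deblur'` does not contain the text `\video_deblur`. The project's path handling may also simply expect `/`. The sketch below shows the escape effect and three spellings that keep the path intact.

```python
from pathlib import PureWindowsPath

# '\v' is the vertical-tab escape, so a normal string literal silently
# changes the path: the backslash and 'v' collapse into one control character.
broken = "datasets\video_deblur"
print(repr(broken))  # 'datasets\x0bideo_deblur'

# Three ways to keep the path intact.
raw = r"D:\PVDNet\datasets\video_deblur"        # raw string
escaped = "D:\\PVDNet\\datasets\\video_deblur"  # doubled backslashes
forward = "D:/PVDNet/datasets/video_deblur"     # forward slashes

print(raw == escaped)  # True
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
```

Forward slashes are the least error-prone choice here, since Windows accepts them in file paths and they need no escaping.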

@Michaelwhite34
Author

Is there a way to deal with CUDA out-of-memory errors?

@codeslake
Owner

You may try a smaller model, e.g., python run.py --mode PVDNet_nah --config config_PVDNet --data random --ckpt_abs_name ckpt/PVDNet_nah.pytorch.
You may also try downsampling the input frames.
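
For the downsampling suggestion, a minimal sketch of picking a reduced frame size is shown below. The helper `downscaled_size` is hypothetical, and rounding each side down to a multiple of 8 is an assumption; the thread does not state what input sizes the network accepts. The resize itself would be done with whatever image library is used to load the frames.

```python
def downscaled_size(width, height, scale, multiple=8):
    """Scale (width, height) by `scale`, rounding each side down to a multiple of `multiple`."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    # Never go below one `multiple` so the result stays a valid size.
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


print(downscaled_size(1280, 720, 0.5))   # (640, 360)
print(downscaled_size(1920, 1080, 0.5))  # (960, 536)
```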
