run.sh is killed #58

Closed
YeeHoran opened this issue May 1, 2023 · 5 comments

Comments

@YeeHoran

YeeHoran commented May 1, 2023

Dear all,

When I run the script with "./run.sh" as shown below, the process is always killed. Do you know how to fix this problem, please?

Thank you in advance!

Yi Huo

(base) yihuo@yihuo:~/Documents/deephar-master$ ./run.sh

~/Documents/deephar-master ~/Documents/deephar-master
fatal: not a git repository (or any of the parent directories): .git
Initializing deephar v0.5.0
CUDA_VISIBLE_DEVICES: 2
2023-04-30 20:27:21.740019: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-04-30 20:27:21.797162: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-04-30 20:27:22.088804: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-04-30 20:27:22.089272: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-30 20:27:22.808816: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using keras version "2.12.0"
/home/yihuo/Documents/deephar-master/exp/mpii/train_mpii_singleperson.py:94: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
model.fit_generator(data_tr,
2023-04-30 20:28:17.264850: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
[[{{node Placeholder/_0}}]]
./run.sh: line 52: 104766 Killed python3 exp/mpii/train_mpii_singleperson.py output/mpii_singleperson_trial-00

@dluvizon
Owner

dluvizon commented May 1, 2023

Hi @YeeHoran ,
Please check the file requirements.txt and double-check your environment. It seems you are using a different version of Keras and that your TensorFlow installation is not able to use the GPU.
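
A minimal sketch of the environment check suggested above, using only the Python standard library. It assumes requirements.txt uses `name==version` pins, which the thread does not confirm; the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical and not part of deephar.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and requirements without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

Calling `find_mismatches(parse_pins(open("requirements.txt")))` from the repository root would list every package whose installed version differs from its pin. The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.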

@YeeHoran
Author

YeeHoran commented May 2, 2023

Dear @dluvizon

Thank you so much for your quick reply! You are right that TensorFlow is not able to use the GPU. The versions I have installed of the packages listed in requirements.txt are:

numpy: 1.23.5
keras: 2.12.0
tensorflow-gpu: 2.12.0
pillow: 9.4.0
scipy: 1.10.0
h5py: 3.7.0

As you can see, they are all newer than the versions listed in requirements.txt. I assumed that newer versions are backward compatible and could be used as well. Do you think this is correct, please?

Thank you for your kind support!

@dluvizon
Owner

dluvizon commented May 2, 2023

Hi, I'm pretty sure that it will not work. This code uses TF 1.x, which is quite often not compatible with TF 2.
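
One way to surface this incompatibility early is a fail-fast guard at the top of a training script. The sketch below is an illustration only: `require_tf1` is a hypothetical helper, not something deephar provides, and it checks just the major version rather than the exact release the repository expects.

```python
def major_version(version):
    """Return the leading integer of a version string such as '2.12.0'."""
    return int(version.split(".", 1)[0])


def require_tf1(version):
    """Raise RuntimeError unless the given TensorFlow version is a 1.x release."""
    if major_version(version) != 1:
        raise RuntimeError(
            f"TensorFlow {version} found, but this code targets TF 1.x"
        )


# Intended use, assuming TensorFlow is importable in the environment:
#   import tensorflow as tf
#   require_tf1(tf.__version__)
```

With the setup reported in this thread (tensorflow-gpu 2.12.0), the guard would raise before any model is built instead of failing later during training.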

@YeeHoran
Author

YeeHoran commented May 4, 2023

Thank you so much again for your kind support @dluvizon,

Now it runs, but it stopped again and printed the following output:

File "/home/yihuo/Documents/deephar-master/exp/mpii/train_mpii_singleperson.py", line 94, in <module>
model.fit_generator(data_tr,
File "/home/yihuo/anaconda3/lib/python3.10/site-packages/keras/src/engine/training.py", line 2810, in fit_generator
return self.fit(
File "/home/yihuo/anaconda3/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/yihuo/Documents/deephar-master/exp/common/mpii_tools.py", line 156, in on_epoch_end
scores = eval_singleperson_pckh(model, self.fval, self.pval,
File "/home/yihuo/Documents/deephar-master/exp/common/mpii_tools.py", line 67, in eval_singleperson_pckh
input_shape = model.get_input_shape_at(0)
RuntimeError: The layer model_1 has never been called and thus has no defined input shape.
./run.sh: line 52: 2374 Killed python3 exp/mpii/train_mpii_singleperson.py output/mpii_singleperson_trial-00_

Additionally, log.txt stops at Epoch 1/120, and the end of the file shows 1073/1073, so maybe the problem now is that the process runs out of RAM. Do you think this is right, please? Thank you!
log.txt
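
For context on the "Killed" line: the shell prints it when a process is terminated by SIGKILL, and the Linux out-of-memory killer is one common sender of that signal, though nothing in this thread confirms it was the cause here. The sketch below shows how a wrapper could tell a SIGKILL apart from a normal exit. It is POSIX-only, and `describe_exit` and `run_and_describe` are hypothetical helper names.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Explain a subprocess return code; negative values mean 'ended by a signal'."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    name = signal.Signals(-returncode).name
    if -returncode == signal.SIGKILL:
        # SIGKILL cannot be caught; the kernel OOM killer is one possible sender.
        return f"killed by {name} (possibly the kernel OOM killer)"
    return f"killed by {name}"


def run_and_describe(argv):
    """Run a command, wait for it to finish, and describe how it ended."""
    return describe_exit(subprocess.run(argv).returncode)
```

For example, `run_and_describe(["python3", "exp/mpii/train_mpii_singleperson.py", "output/mpii_singleperson_trial-00"])` would report a SIGKILL if the training process were killed the way the log above shows. Whether memory was actually the cause can then be checked in the kernel log.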

@github-actions

github-actions bot commented Jul 4, 2023

Stale issue message
