Model command keeps getting killed. #158

Open
dr-slater opened this issue Mar 31, 2022 · 3 comments

@dr-slater

After getting all the necessary libraries installed and resolving conflicts, I'm running into the following issue (regardless of model) from live_demo.ipynb. The following was run from the command line to simplify, but it gives the same result:

slater@linux:~/trt_pose/tasks/human_pose$ python3
Python 3.6.9 (default, Mar 15 2022, 13:55:28)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import trt_pose.coco
>>> with open('human_pose.json', 'r') as f:
...     human_pose = json.load(f)
...
>>> topology = trt_pose.coco.coco_category_to_topology(human_pose)
>>> import trt_pose.models
>>> num_parts = len(human_pose['keypoints'])
>>> num_links = len(human_pose['skeleton'])
>>> model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
Killed

There is so little information there I'm not even sure where to start. Suggestions?
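(For anyone hitting the same thing: a bare "Killed" with no traceback is the kernel's out-of-memory killer ending the Python process. A minimal diagnostic sketch, not from this thread, is to build the model on the CPU first and only move it to the GPU afterwards, so it is clearer which step runs out of memory; the parameter count also gives a rough lower bound on what the model alone needs.)

import json
import trt_pose.coco
import trt_pose.models

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

# Build on the CPU first; this only needs ordinary system RAM.
model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links)
n_params = sum(p.numel() for p in model.parameters())
print(f'parameters: {n_params} (~{n_params * 4 / 1e6:.0f} MB as fp32)')

# On a Jetson the GPU shares physical RAM with the CPU, and .cuda() also
# initializes the CUDA context, so this is often the step that gets killed.
model = model.cuda().eval()
print('model is on the GPU')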

@dr-slater
Author

So, in the tradition of "don't come to me with problems, come to me with solutions": to start, the board is a Jetson Nano 2GB (with a 1 TB microSD card).
I tried increasing the swap file to 4 GB - still killed.
I tried increasing the swap file to 8 GB - still killed.
As a last shot, I tried increasing the swap file to 16 GB - it worked!

Not sure why everyone else who has run this code managed to do it with smaller swap file sizes, but there you are.
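(For readers who want to reproduce the swap change: a minimal sketch of adding a file-backed swap file on the Nano. The path and size below are only examples, not necessarily what was used above.)

# Example only: create and enable a 16 GB swap file.
sudo fallocate -l 16G /var/swapfile
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile
free -h    # confirm the new swap size
# To keep it across reboots, also add it to /etc/fstab:
echo '/var/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab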

@dr-slater
Author

Well, I spoke too soon. Flushed with success, I kept working through the live_demo all the way down to:

model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)

At which point, after a long while, it was killed again. Even increasing the swap file to 32 GB made no difference. Clearly everyone else who has made this work has not had this problem, so any suggestions would be great :)
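(A side note on that step, as a hedged sketch rather than anything from this thread: the torch2trt conversion is the expensive part, so once it succeeds it is worth saving the result and reloading it with torch2trt's TRTModule on later runs instead of converting again. The tensor shape and file name below are assumptions based on the 224x224 resnet18 checkpoint.)

import torch
import torch2trt
from torch2trt import TRTModule

# Dummy input matching a 224x224 model; adjust if a different resolution is used.
data = torch.zeros((1, 3, 224, 224)).cuda()

# `model` is the trt_pose network built earlier. The conversion itself is the
# memory-hungry step; once it succeeds, save the engine so it never has to be
# rebuilt on this device.
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25)
torch.save(model_trt.state_dict(), 'resnet18_baseline_att_trt.pth')  # example file name

# On later runs, load the saved engine instead of converting again.
model_trt = TRTModule()
model_trt.load_state_dict(torch.load('resnet18_baseline_att_trt.pth'))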

@dr-slater
Author

Final note - as an experiment, I increased the swap file to 256 GB. It still crashed:

>>> import json
>>> import trt_pose.coco
>>> with open('human_pose.json', 'r') as f:
...     human_pose = json.load(f)
...
>>> topology = trt_pose.coco.coco_category_to_topology(human_pose)
>>> import trt_pose.models
>>> num_parts = len(human_pose['keypoints'])
>>> num_links = len(human_pose['skeleton'])
>>> model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
Killed
slater@linux:~/trt_pose/tasks/human_pose$ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.9G        1.6G         44M         15M        254M        197M
Swap:          249G        266M        249G
slater@linux:~/trt_pose/tasks/human_pose$
