
Getting segmentation_fault on training with viewer #26

Open
cduguet opened this issue Jul 6, 2023 · 2 comments

Comments


cduguet commented Jul 6, 2023

Hello,
I'm running a remote EC2 instance (24 GB VRAM, 64 GB RAM), accessed through a remote desktop client called NICE DCV (an enterprise competitor to VNC, free on EC2).

I can train without the viewer with no problems. However, when I run training with the viewer, I get a segmentation fault: the app window opens, but nothing loads before it crashes.

I have tried both the experimental and the normal Docker builds (I have only tried Docker). I have also checked out multiple versions of the repo (783c41f and e72ae5b) to see whether the problem was introduced recently. Nothing has worked so far. The output looks like this:

/workspace/permuto_sdf$ ./permuto_sdf_py/train_permuto_sdf.py --dataset dtu --scene dtu_scan24 --comp_name comp_3 --exp_info default 
args.with_mask False
args.low_res False
checkpoint_path /workspace/permuto_sdf/checkpoints
with_viewer True
has_apex True
[    D96CB740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode train
[    D96CB740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
[    D96CB740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode test
[    D96CB740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
[    D96CB740]    Mesh.cxx:3390     1| read obj with path /workspace/easy_pbr/data/sphere.obj
Segmentation fault (core dumped)

In contrast, when I train without a viewer, it looks like this:

/workspace/permuto_sdf$ ./permuto_sdf_py/train_permuto_sdf.py --dataset dtu --scene dtu_scan24 --comp_name comp_3 --exp_info default --no_viewer
args.with_mask False
args.low_res False
checkpoint_path /workspace/permuto_sdf/checkpoints
with_viewer False
has_apex True
[    2A5FF740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode train
[    2A5FF740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
[    2A5FF740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode test
[    2A5FF740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
phase.iter_nr 1000 loss  1.3530950546264648
phase.iter_nr 2000 loss  0.15609805285930634
phase.iter_nr 3000 loss  0.10311679542064667
...

How should I best troubleshoot this?
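
One generic first step for a crash like this, sketched under assumptions rather than taken from the project: rerun the script with Python's standard `faulthandler` enabled, so that the Python-level traceback is written to stderr when the process receives SIGSEGV. The helper name `run_with_faulthandler` is hypothetical. This only shows which Python call was active at the time of the crash; the native stack inside the C++ viewer would still need a debugger.

```python
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    `script_args` is the script path followed by its arguments.
    Returns (returncode, stderr). On Linux, a child killed by a signal
    reports a negative return code (the negated signal number).
    """
    # "-X faulthandler" is a standard CPython option: it dumps the Python
    # traceback to stderr on fatal signals such as SIGSEGV.
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr
```

For the command in this report, the call would look like `run_with_faulthandler(["./permuto_sdf_py/train_permuto_sdf.py", "--dataset", "dtu", "--scene", "dtu_scan24", "--comp_name", "comp_3", "--exp_info", "default"])`, after which the returned stderr text can be inspected for the last Python frame.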


yankuai commented Oct 27, 2023

Same here. I'm using WSL on Windows to get a Linux environment and running Docker as instructed. I can train without the viewer, but I get a segmentation fault with the viewer. For now, I check the results by creating a mesh from the saved checkpoint.

@RaduAlexandru
Owner

Hi @cduguet @yankuai ,

Unfortunately, the viewer cannot currently render on headless machines, so I have only used it when training locally. If you train on EC2 instances, I recommend disabling the viewer.
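
The recommendation above could be automated in a launcher script. The following is a minimal sketch, not part of permuto_sdf: the helpers `display_available` and `viewer_args` are hypothetical, and the only project-specific detail used is the `--no_viewer` flag shown earlier in this thread. The check is a heuristic based on the `DISPLAY` and `WAYLAND_DISPLAY` environment variables, so it would not catch setups like the ones reported here, where a window opens through a remote desktop but rendering still crashes.

```python
import os
import sys


def display_available(env=None, platform=None):
    """Heuristic: on Linux, a GUI window needs an X11 or Wayland display."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        # Assumption: non-Linux platforms are desktop systems with a display.
        return True
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def viewer_args(base_args, env=None, platform=None):
    """Return a copy of base_args, adding --no_viewer if no display is found."""
    args = list(base_args)
    if not display_available(env, platform) and "--no_viewer" not in args:
        args.append("--no_viewer")
    return args
```

For example, `viewer_args(["--dataset", "dtu", "--scene", "dtu_scan24"])` leaves the arguments unchanged on a machine with a display and appends `--no_viewer` on a Linux machine without one. On remote desktops or WSL it is likely safer to pass `--no_viewer` explicitly.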
