Run inference on single GPU #32

Closed
kamalasubha opened this issue Aug 28, 2022 · 7 comments

@kamalasubha

Hi,
I was able to complete the setup as per the instructions in the README.
In the evaluation step, I run:

python -m torch.distributed.launch --nproc_per_node=4 tools/test.py --cfg config/lidar_rcnn.yaml --checkpoint outputs/lidar_rcnn/checkpoint_lidar_rcnn_59.pth.tar
python tools/create_results.py --cfg config/lidar_rcnn.yaml

I have the following questions about running the evaluation:

  1. How should I change the command to run on a single GPU? I assume nproc_per_node needs to be 1.
  2. What should MODEL.Frame be for checkpoint_lidar_rcnn_59.pth.tar?

I am trying to understand the evaluation, so any help with this would be appreciated.
@Lzc6996
Collaborator

Lzc6996 commented Aug 31, 2022

@kamalasubha

  1. Keep nGPUS in the config equal to nproc_per_node; in your case, set both of them to 1.
  2. checkpoint_lidar_rcnn_59.pth.tar was trained with frame = 1.
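
As a minimal sketch of point 1, the snippet below assembles the evaluation command from the first post with the process count as a parameter, so the single-GPU variant is explicit. The helper name `build_eval_command` is hypothetical and not part of the repository; the flags and paths are copied from the original command. It only builds the command, so nGPUS in the config file still has to be set to the same value by hand.

```python
def build_eval_command(n_gpus, cfg_path, checkpoint_path):
    """Build the evaluation launch command for a given number of GPUs.

    Hypothetical helper for illustration only; n_gpus must match the
    nGPUS value in the config file, which this function does not edit.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={n_gpus}",
        "tools/test.py",
        "--cfg", cfg_path,
        "--checkpoint", checkpoint_path,
    ]


# Single-GPU case: one process, with nGPUS in the config also set to 1.
cmd = build_eval_command(
    1,
    "config/lidar_rcnn.yaml",
    "outputs/lidar_rcnn/checkpoint_lidar_rcnn_59.pth.tar",
)
print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` unchanged.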

@kamalasubha
Author

@Lzc6996
Thanks for the input.
I am facing the following error with the above config:

Traceback (most recent call last):
  File "tools/test.py", line 91, in <module>
    test(cfg, 0, valloader, model, device, cfg.TEST.TAT_PATH)
  File "/home/lidar/LiDAR_RCNN/src/LiDAR_RCNN/core/function.py", line 113, in test
    for idx, batch in enumerate(testloader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "/home/lidar/LiDAR_RCNN/src/LiDAR_RCNN/datasets/waymo/loader.py", line 87, in transform_test
    pcd_cur, pcd_pre, proposal, gt_box, gt_cls = load_data(it, self.frame)
  File "/home/lidar/LiDAR_RCNN/src/LiDAR_RCNN/datasets/waymo/data_utils.py", line 65, in load_data
    pcd_cur_ri1 = np.hstack([pcd_cur_ri1, pcd_add_ones_ri1])
  File "<__array_function__ internals>", line 6, in hstack
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/shape_base.py", line 344, in hstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/test.py', '--local_rank=0', '--cfg', 'config/lidar_rcnn.yaml', '--checkpoint', '/home/lidar/models/checkpoint_lidar_rcnn_59.pth.tar']' returned non-zero exit status 1.

I used the val tfrecords from allpreprocessor.zip, which was shared via mail. Any clue on this?

@Lzc6996
Collaborator

Lzc6996 commented Aug 31, 2022

@kamalasubha
I guess this is because we updated the code for multi-frame, but the val.tfrecord was generated by the previous code.
Can you generate the data yourself following data_processer?
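
For readers hitting the same traceback, here is a minimal sketch of how `np.hstack` produces this exact ValueError. It does not use the repository's loader or the real tfrecords; the array shapes are assumptions chosen only to show that stacking a 1-D array with a 2-D column fails, while a 2-D point array works.

```python
import numpy as np


def try_hstack(points, ones):
    """Return the stacked array, or the error message if the shapes clash."""
    try:
        return np.hstack([points, ones])
    except ValueError as err:
        return str(err)


# Hypothetical shapes: 5 points with 3 values each, plus a column of ones.
ones = np.ones((5, 1), dtype=np.float32)
points_2d = np.zeros((5, 3), dtype=np.float32)
points_flat = points_2d.reshape(-1)  # 1-D array, as reported for index 0

ok = try_hstack(points_2d, ones)     # 2-D + 2-D: stacks column-wise
bad = try_hstack(points_flat, ones)  # 1-D + 2-D: dimension mismatch

print(ok.shape)
print(bad)
```

If the point array read from a record comes back 1-D, printing its `shape` just before the `np.hstack` call in `load_data` is a quick way to confirm that the data and the code version do not match.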

@kamalasubha
Author

OK @Lzc6996, I will generate the val tfrecord based on the link given. But can you please explain which part of the code needs to be changed to support a single frame? Does a separate demo exist for this?

@Lzc6996
Collaborator

Lzc6996 commented Aug 31, 2022

@kamalasubha
I have another plan: you can use the old release version of our code, v0.1.1.
Actually, I can't figure out why the two versions are incompatible without running it myself.

@kamalasubha
Author

Thanks @Lzc6996. I will look into it.

@kamalasubha
Author

@Lzc6996 I am able to run with the older version. Thanks for the input.
