Segmentation failed #200

Closed
Oliverwang11 opened this issue Jan 26, 2024 · 10 comments

@Oliverwang11

Oliverwang11 commented Jan 26, 2024

Hi, I am trying to evaluate the transfuser-based agent using ./leaderboard/scripts/local_evaluation.sh /home/<username>/Desktop/transfuser/carla /home/<username>/Desktop/transfuser on Ubuntu 22.04.3.

But some errors pop up:

/home/<usrname>/Desktop/transfuser/leaderboard/leaderboard/leaderboard_evaluator_local.py:89: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(dist.version) < LooseVersion('0.9.10'):
./leaderboard/scripts/local_evaluation.sh: line 32: 20110 Segmentation fault (core dumped) python3 ${LEADERBOARD_ROOT}/leaderboard/leaderboard_evaluator_local.py --scenarios=${SCENARIOS} --routes=${ROUTES} --repetitions=${REPETITIONS} --track=${CHALLENGE_TRACK_CODENAME} --checkpoint=${CHECKPOINT_ENDPOINT} --agent=${TEAM_AGENT} --agent-config=${TEAM_CONFIG} --debug=${DEBUG_CHALLENGE} --resume=${RESUME}

Does anyone have an idea?

Thanks

@Kait0
Collaborator

Kait0 commented Jan 26, 2024

Segmentation faults are hard to analyse. I would suggest you use a debugger or print statements to find the line of code that crashes; then we can help you better.
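
One low-effort way to get a Python-level location for a segmentation fault is the standard library's faulthandler module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch, assuming a few lines can be added near the top of the evaluator script; the helper name and the log path are hypothetical:

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Dump the Python traceback of all threads when a fatal signal arrives.

    log_path is a hypothetical file name; with None the dump goes to stderr.
    Returns the open log file (keep a reference to it) or None.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Example: call this before the import that is suspected to crash.
# crash_log = enable_crash_traceback("crash_traceback.log")
```

The same handler can be switched on without editing any code by starting the interpreter with `python3 -X faulthandler ...` or by setting the environment variable `PYTHONFAULTHANDLER=1`.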

@Oliverwang11
Author

Thanks, I will try!

@Oliverwang11
Author

Oliverwang11 commented Jan 26, 2024

It seems the crash happens in self.module_agent = importlib.import_module(module_name), when importing module_name, which is submission_agent.
BTW, I run the evaluation on my own computer with a GTX3060 GPU (6 GB) and 16 GB RAM.

@Kait0
Collaborator

Kait0 commented Jan 28, 2024

Hm, that is strange; that line is just trying to import the agent .py file.
Are you using the conda environment from this repository?
Is the WORK_DIR variable in the script correct (e.g. does module_name point to the correct file)?
Maybe it is a second-order import problem. When the submission_agent file is executed, can you check how far it gets?
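
To narrow down a second-order import problem, each suspect module can be imported in a fresh interpreter, so that a crash in one of them does not take down the process doing the checking. The sketch below is only an illustration under assumptions: the helper names are hypothetical, and the list of modules to probe has to come from the imports at the top of the agent file.

```python
import signal
import subprocess
import sys


def try_import(module_name, cwd=None):
    """Import module_name in a child interpreter and return its exit code.

    0 means the import succeeded. On POSIX a negative value is the signal
    that killed the child (for example -signal.SIGSEGV).
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Turn an exit code from try_import into a short label."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGSEGV:
        return "segmentation fault"
    return f"failed (exit code {returncode})"


# Example (module names are assumptions; use the imports of your agent file):
# for name in ["torch", "model", "point_pillar"]:
#     print(name, describe(try_import(name, cwd="path/to/agent/dir")))
```

Running the probe from the directory that contains the agent code (the `cwd` argument) matters, because local modules are only importable from there.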

@Oliverwang11
Author

Hi, thanks for your reply. The work dir seems fine.
I dug into the submission_agent.py file; it seems the code crashes at from model import LidarCenterNet, when trying to load LidarCenterNet.

@Kait0
Collaborator

Kait0 commented Jan 28, 2024

Is it failing somewhere specific within LidarCenterNet?

You can try commenting out this line. It sometimes causes problems since it depends on an external CUDA lib. It is only needed for an optional ablation, so it's fine to turn it off.

from point_pillar import PointPillarNet

@Oliverwang11
Author

Oliverwang11 commented Jan 28, 2024

Aha, after I commented out the line from point_pillar import PointPillarNet the crash disappeared, but another problem popped up:
it seems the town map is loaded but the ego car does not show up.

./leaderboard/scripts/local_evaluation.sh /home/oliverwang/Desktop/transfuser/carla /home/oliverwang/Desktop/transfuser
/home/oliverwang/Desktop/transfuser/leaderboard/leaderboard/leaderboard_evaluator_local.py:89: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(dist.version) < LooseVersion('0.9.10'):
/home/oliverwang/Desktop/transfuser
-----submission_agen line 209
-----submission_agen line 16
-----submission_agen line 18
-----submission_agen line 20
-----submission_agen line 21
-----submission_agen line 209
-----submission_agen line 209
Registering the global statistics

Do you have any clue? Thanks!

@Kait0
Collaborator

Kait0 commented Jan 28, 2024

"-----submission_agen line": I suppose these are your debug prints.
I don't see an error here; these are just warnings and prints.
It might be that you need to delete the results.json file, because the code thinks it has already finished all routes.
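
If deleting the file outright feels risky, it can be renamed instead so the old results stay available. A minimal sketch, assuming the file in question is the one that CHECKPOINT_ENDPOINT in local_evaluation.sh points to; the helper name and the example path are hypothetical:

```python
from pathlib import Path


def set_aside_checkpoint(checkpoint_path):
    """Rename an existing checkpoint file so the next run starts fresh.

    Returns the backup path, or None if there was nothing to move.
    """
    checkpoint = Path(checkpoint_path)
    if not checkpoint.is_file():
        return None
    backup = checkpoint.with_name(checkpoint.name + ".bak")
    checkpoint.replace(backup)  # overwrites an older backup, if any
    return backup


# Example (the path is an assumption; use the value from your script):
# set_aside_checkpoint("results/results.json")
```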

@SY-LG

SY-LG commented Jan 31, 2024

I came across this segmentation fault a few days ago. Check whether your CUDA and PyTorch versions match.

"... crashed at from model import LidarCenterNet when trying to load the LidarCenterNet"

To check whether you are facing the same situation as I did, you can trace even further; eventually it would turn out that the segmentation fault takes place somewhere unrelated to transfuser but related to PyTorch.
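
A quick first check is to ask PyTorch itself which CUDA version it was built for and whether it can see a GPU. This is only a sketch (the function name is hypothetical), and it does not prove that the versions are compatible; it just collects the numbers to compare against the driver and toolkit.

```python
def torch_cuda_report():
    """Return PyTorch's own view of its CUDA build, or None without PyTorch."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,            # a "+cuXYZ" suffix names the CUDA build
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


# Example:
# print(torch_cuda_report())
```

If the import of torch itself crashes, that already points at the PyTorch installation rather than at the agent code.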

@SY-LG

SY-LG commented Jan 31, 2024

BTW, the version compatibility between PyTorch and mmcv is somewhat ambiguous; sometimes you even need to test it yourself.
You can try this setup; it works fine for me.

  • Ubuntu: 22.04
  • nvidia-smi: 12.0
  • cuda-toolkit (via sudo apt install): 10.1
  • torch: 1.8.1+cu101
  • torch-scatter: 2.0.7
  • torchaudio: 0.8.1
  • torchvision: 0.9.1
  • mmcv-full: 1.71
  • timm: 0.3.2
  • mmdet: 2.26.0
  • py-trees: 0.8.3
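
To see how a local environment differs from the list above, the installed package versions can be read with importlib.metadata from the standard library. A rough sketch: the version numbers are copied as posted and are not independently verified, and the function name is hypothetical.

```python
from importlib import metadata

# Versions copied from the list above, exactly as posted.
POSTED_VERSIONS = {
    "torch": "1.8.1+cu101",
    "torch-scatter": "2.0.7",
    "torchaudio": "0.8.1",
    "torchvision": "0.9.1",
    "mmcv-full": "1.71",
    "timm": "0.3.2",
    "mmdet": "2.26.0",
    "py-trees": "0.8.3",
}


def compare_versions(expected):
    """Map each package name to (expected, installed); installed is None if absent."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


# Example:
# for name, (wanted, installed) in compare_versions(POSTED_VERSIONS).items():
#     print(f"{name}: posted {wanted}, installed {installed}")
```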
