
Randomness of Evaluation #8

Open
coderlemon17 opened this issue Mar 4, 2024 · 3 comments

@coderlemon17

Hi, thanks for providing the code. However, when I evaluate the agent's performance on the town05-long benchmark, there is some randomness in the evaluation results, even with a fixed seed.

After checking the visualization results, I believe some of the randomness comes from the differing behaviors of the NPC vehicles, but I'm not sure why this happens with a fixed random seed. Am I doing something wrong, or is this just expected? Any help would be appreciated!
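For context, here is roughly how I fix the seeds before evaluation (a minimal sketch; `seed_everything` and the `tm_port` argument are my own naming, not from this repo):

```python
import random

import numpy as np
import torch
import carla


def seed_everything(client: carla.Client, tm_port: int, seed: int = 2024) -> None:
    """Fix every seed under our control before an evaluation run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # Seed the Traffic Manager so NPC decisions are nominally reproducible.
    traffic_manager = client.get_trafficmanager(tm_port)
    traffic_manager.set_random_device_seed(seed)
    traffic_manager.set_synchronous_mode(True)

    # Deterministic stepping: synchronous mode with a fixed physics delta.
    world = client.get_world()
    settings = world.get_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = 0.05
    world.apply_settings(settings)
```

Even with all of the above set, the NPC trajectories still diverge between runs.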

@jiaxiaosong1002
Collaborator

@coderlemon17 Yes, exactly. This randomness has been discussed widely in the community; it cannot be fully controlled.

@coderlemon17
Author

@jiaxiaosong1002 Thanks for your reply. If this randomness does exist, how many evaluation runs do you conduct to measure the model's performance under one seed? And does this randomness significantly affect the model's final reported performance?

@jiaxiaosong1002
Collaborator

@coderlemon17 Hi, 3 runs are generally used. Yes, it can affect the results, which is why we also need results on multiple benchmarks.
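In practice that means reporting the mean and spread over the repeated runs, e.g. (a minimal sketch; the scores below are illustrative placeholders, not real results):

```python
import statistics

# Driving scores from 3 evaluation runs under one seed (illustrative values).
runs = [61.2, 58.7, 63.5]

mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation across runs
print(f"Driving score: {mean:.1f} +/- {std:.1f} over {len(runs)} runs")
```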
