Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models

Abstract

Large Language Models (LLMs) are demonstrating outstanding potential for tasks such as text generation, summarization, and classification. Given that such models are trained on a vast amount of online knowledge, we hypothesize that LLMs can assess whether driving scenarios generated by autonomous driving testing techniques are realistic, i.e., aligned with real-world driving conditions. To test this hypothesis, we conducted an empirical evaluation to assess whether LLMs are effective and robust in performing this task. This reality check is an important step towards devising LLM-based autonomous driving testing techniques. For our empirical evaluation, we selected 64 realistic scenarios from DeepScenario, an open driving scenario dataset. Next, by introducing minor changes to them, we created 512 additional scenarios while preserving their realism, forming an overall dataset of 576 scenarios. With this dataset, we evaluated three LLMs (GPT-3.5, Llama2-13B, and Mistral-7B) to assess their robustness in judging the realism of driving scenarios. Our results demonstrate that: (1) overall, GPT-3.5 achieved the highest robustness compared to Llama2-13B and Mistral-7B, consistently throughout almost all scenarios, roads, and weather conditions; (2) Mistral-7B consistently performed the worst; (3) Llama2-13B achieved good results under certain conditions but not under others; and (4) roads and weather conditions do influence the robustness of the LLMs.

Setup

Python 3.8 or higher

pip install -r requirements.txt

Scenario Dataset

The deepscenario folder contains all scenario files used in the evaluation.

Script Structure

Scenario Mutation

mutate_scenarios.py contains functions related to scenario mutation.

LLM API

llm_api.py generates the prompts and calls the LLM APIs to obtain the answers.

In this script, api_key needs to be added manually:

client = OpenAI(api_key="")
...
fireworks.client.api_key = ""

An api_key for OpenAI can be obtained from OpenAI.

An api_key for fireworks.ai can be obtained from fireworks.ai.
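As an alternative to pasting keys directly into llm_api.py, the keys can be read from environment variables so they never end up in version control. The following is a minimal sketch, not part of the repository: the helper load_api_key and the variable name FIREWORKS_API_KEY are assumptions made for illustration, and the commented usage lines only mirror the two assignments shown above.

```python
import os


def load_api_key(env_var: str) -> str:
    """Return the API key stored in the environment variable `env_var`.

    Raises RuntimeError when the variable is unset or blank, so a missing
    key is reported before any request is sent.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Hypothetical usage inside llm_api.py (environment variable names are assumptions):
#   client = OpenAI(api_key=load_api_key("OPENAI_API_KEY"))
#   fireworks.client.api_key = load_api_key("FIREWORKS_API_KEY")
```

With this in place, the keys would be exported in the shell before running the script, and the source file itself would contain no secrets.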

Result Analysis

parse_results.py contains the code used to analyze the results.

LLM Outputs & Results

The outputs_results folder contains all LLM output files and analysis result files.

