# Benchmarking Behavior Planners in BARK

This notebook shows the benchmarking workflow of BARK.

Systematically benchmarking behavior models requires
1. A reproducible set of scenarios (we call it the **BenchmarkDatabase**)
2. Metrics that you use to study the performance (we call them **Evaluators**)
3. The behavior model(s) under test

Our **BenchmarkRunner** can then run the benchmark and produce the results.

In [1]:
%matplotlib tk

import os
import matplotlib.pyplot as plt
from IPython.display import Video

from benchmark_database.load.benchmark_database import BenchmarkDatabase
from benchmark_database.serialization.database_serializer import DatabaseSerializer
from bark.benchmark.benchmark_runner import BenchmarkRunner, BenchmarkConfig, BenchmarkResult
from bark.benchmark.benchmark_analyzer import BenchmarkAnalyzer

from bark.runtime.commons.parameters import ParameterServer

from bark.runtime.viewer.matplotlib_viewer import MPViewer
from bark.runtime.viewer.video_renderer import VideoRenderer


from bark.core.models.behavior import BehaviorIDMClassic, BehaviorConstantAcceleration
from bark.core.models.behavior import BehaviorModel

pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html


# Database
The benchmark database provides a reproducible set of scenarios.
A scenario is created by a ScenarioGenerator (we have a couple of them). The scenarios are serialized into binary files and packed together with the map file and the parameter files into a `.zip` archive. We call this zipped archive a release; it can be published on GitHub or processed locally.

## We will first start with the DatabaseSerializer

The **DatabaseSerializer** recursively serializes all sets of scenario parameter files within a folder.

We will process the database directory from GitHub.

In [2]:
dbs = DatabaseSerializer(test_scenarios=1, test_world_steps=5, num_serialize_scenarios=10)
dbs.process("../../../benchmark_database/data/tutorial_database")
local_release_filename = dbs.release(version="tutorial")

print('Filename:', local_release_filename)

<bark.runtime.commons.parameters.ParameterServer object at 0x7fd46dae6710>
maps/city_highway_straight.xodr


INFO:root:Testing scenario_set1 with seed 2000 from generator ConfigurableScenarioGeneration
INFO:root:Running scenario 0 of 10 in set scenario_set1
INFO:root:The following list of files will be released:
INFO:root:/maps/city_highway_straight.xodr/city_highway_straight.xodr
INFO:root:/scenario_sets/highway_merging/set_info_scenario_set1/set_info_scenario_set1
INFO:root:/scenario_sets/highway_merging/scenario_set1_scenarios10_seed2000.bark_scenarios/scenario_set1_scenarios10_seed2000.bark_scenarios
INFO:root:/scenario_sets/highway_merging/scenario_set1.json/scenario_set1.json
INFO:root:Packed release file /home/esterle/.cache/bazel/_bazel_esterle/d337abac8c371120c1b9affa1049fa7e/execroot/bark_project/bazel-out/k8-fastbuild/bin/docs/tutorials/run.runfiles/benchmark_database/data/benchmark_database_tutorial.zip
INFO:root:Assuming local release as you did not provide a github token.


Filename: ../../../benchmark_database/data/benchmark_database_tutorial.zip
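
To check what a release archive contains, Python's standard `zipfile` module is enough. The sketch below does not touch the real release: it builds a small in-memory stand-in whose member names are taken from the serializer log above (the exact layout inside a real archive is an assumption) and then lists the members.

```python
import io
import zipfile

# Stand-in member names, copied from the serializer log above.
# The exact layout inside a real release archive is an assumption.
members = [
    "maps/city_highway_straight.xodr",
    "scenario_sets/highway_merging/set_info_scenario_set1",
    "scenario_sets/highway_merging/scenario_set1_scenarios10_seed2000.bark_scenarios",
    "scenario_sets/highway_merging/scenario_set1.json",
]

# Build the stand-in archive in memory with empty placeholder files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in members:
        archive.writestr(name, b"")

# Re-open the archive for reading and pick out the serialized scenario files.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    scenario_files = [n for n in names if n.endswith(".bark_scenarios")]

print(names)
print(scenario_files)
```

For a real release, passing `local_release_filename` to `zipfile.ZipFile` instead of `buffer` should list the packed files in the same way.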


We then reload the release to test that it is parsed correctly.

In [3]:
db = BenchmarkDatabase(database_root=local_release_filename)
scenario_generation, _, _ = db.get_scenario_generator(scenario_set_id=0)

for scenario_generation, _, _ in db:
  print('Scenario: ', scenario_generation)

INFO:root:extracting zipped-database ../../../benchmark_database/data/benchmark_database_tutorial.zip to temporary directory ../tmp/bark_extracted_databases/5b5eb9c9-98cf-44ad-bf70-c5c64b6f0200
INFO:root:Found info dict set_info_scenario_set1
INFO:root:The following scenario sets are available
INFO:root:
                    GeneratorName  NumScenarios                                            Params    Seed                                         Serialized        SetName   SetParameters
0  ConfigurableScenarioGeneration          10.0  scenario_sets/highway_merging/scenario_set1.json  2000.0  scenario_sets/highway_merging/scenario_set1_sc...  scenario_set1  {'Test1': 200}


Scenario:  <bark.runtime.scenario.scenario_generation.scenario_generation.ScenarioGeneration object at 0x7fd46ca43b90>


## Evaluators

Evaluators calculate a boolean, integer or real-valued metric based on the current simulation world state.

The evaluators currently available in BARK are:
- StepCount: returns the step count the scenario is at.
- GoalReached: checks whether a controlled agent’s Goal Definition is satisfied.
- DrivableArea: checks whether the agent is inside its RoadCorridor.
- Collision(ControlledAgent): checks whether any agent, or only the currently controlled agent, collided.
- LTLEvaluator: checks traffic rules based on arbitrary LTL formulas.
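
Conceptually, an evaluator is a function from the current world state to a single value. The following plain-Python sketch illustrates that idea only; `ToyWorldState` and the three toy evaluators are hypothetical stand-ins and are not BARK's evaluator interface.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldState:
    """Hypothetical, heavily simplified world state (not a BARK class)."""
    step: int
    ego_position: float
    goal_position: float
    ego_collided: bool


# Each toy evaluator maps a world state to one metric value.
toy_evaluators = {
    "step_count": lambda world: world.step,                                  # integer metric
    "goal_reached": lambda world: world.ego_position >= world.goal_position,  # boolean metric
    "collision": lambda world: world.ego_collided,                           # boolean metric
}

state = ToyWorldState(step=6, ego_position=42.0, goal_position=100.0, ego_collided=True)
evaluation = {name: evaluate(state) for name, evaluate in toy_evaluators.items()}
print(evaluation)  # {'step_count': 6, 'goal_reached': False, 'collision': True}
```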

Let's now map those evaluators to symbols that are easier to interpret.

In [4]:
evaluators = {"success": "EvaluatorGoalReached",
              "collision": "EvaluatorCollisionEgoAgent",
              "max_steps": "EvaluatorStepCount"}

We will now define the terminal conditions of our benchmark. A scenario ends if
- a collision occurred,
- the number of time steps exceeds the limit, or
- the definition of success becomes true (which we defined as reaching the goal, using EvaluatorGoalReached).

In [5]:
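terminal_when = {"collision": lambda x: x,       # stop as soon as a collision is reported
                 "max_steps": lambda x: x > 50,  # stop once the step count exceeds 50
                 "success": lambda x: x}         # stop once the goal is reached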

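How these conditions interact with the evaluator results can be sketched in plain Python. The helper `terminal_reasons` below is a hypothetical illustration, not the BenchmarkRunner's actual code: it applies each lambda to the value of the evaluator registered under the same key and collects the names of all conditions that hold.

```python
# Same terminal conditions as in the cell above.
terminal_when = {
    "collision": lambda x: x,
    "max_steps": lambda x: x > 50,
    "success": lambda x: x,
}


def terminal_reasons(evaluation, terminal_when):
    """Return the names of all terminal conditions that hold (hypothetical helper)."""
    return [name for name, condition in terminal_when.items()
            if condition(evaluation[name])]


running = {"collision": False, "max_steps": 12, "success": False}
crashed = {"collision": True, "max_steps": 6, "success": False}
timed_out = {"collision": False, "max_steps": 51, "success": False}

print(terminal_reasons(running, terminal_when))    # [] -> the scenario keeps running
print(terminal_reasons(crashed, terminal_when))    # ['collision']
print(terminal_reasons(timed_out, terminal_when))  # ['max_steps']
```

Note that `x > 50` is strict: a step count of exactly 50 does not yet end the scenario.
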
# Behaviors Under Test
Let's now define the behaviors we want to compare. We will compare IDM (`BehaviorIDMClassic`) with a Constant Velocity baseline (instantiated here as `BehaviorConstantAcceleration`), but we could also compare two different parameter sets for IDM.

In [13]:
params = ParameterServer()

#params["BehaviorIDMClassic"]["AccelerationLowerBound"] = -0.1

behaviors_tested = {"IDM": BehaviorIDMClassic(params), \
                    "Const" : BehaviorConstantAcceleration(params)}

# Benchmark Runner

The BenchmarkRunner evaluates behavior models with different parameter configurations over the entire benchmarking database.

Technically, the benchmark runner runs every config.

A config is essentially one simulation run for which the step size, controlled agent, terminal conditions and metrics have been defined.
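
With the 10 serialized scenarios and the two behaviors defined above, pairing every behavior with every scenario gives 20 configs. The sketch below enumerates them in plain Python; the dictionary keys and the ordering (behaviors in the outer loop) are assumptions for illustration, not the runner's internal representation.

```python
from itertools import product

behavior_names = ["IDM", "Const"]  # keys of behaviors_tested
scenario_indices = range(10)       # num_serialize_scenarios=10

# One config per (behavior, scenario) pair; ordering is an assumption.
configs = [
    {"config_idx": idx, "behavior": behavior, "scen_idx": scen_idx}
    for idx, (behavior, scen_idx) in enumerate(product(behavior_names, scenario_indices))
]

print(len(configs))  # 20
print(configs[0])    # {'config_idx': 0, 'behavior': 'IDM', 'scen_idx': 0}
```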

In [14]:
benchmark_runner = BenchmarkRunner(benchmark_database=db,
                                   evaluators=evaluators,
                                   terminal_when=terminal_when,
                                   behaviors=behaviors_tested,
                                   log_eval_avg_every=10)

result = benchmark_runner.run(maintain_history=True)

INFO:BenchmarkRunner:Total number of 20 configs to run
INFO:BenchmarkRunner:Running config idx 0 being 0/19: Scenario 0 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 1 being 1/19: Scenario 1 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 2 being 2/19: Scenario 2 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 3 being 3/19: Scenario 3 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 4 being 4/19: Scenario 4 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 5 being 5/19: Scenario 5 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 6 being 6/19: Scenario 6 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 7 being 7/19: Scenario 7 of set "scenario_set1" for behavior "IDM"
INFO:BenchmarkRunner:Running config idx 8 being 8/19: Scenario 8 of set "scenario_set1" f

We will now dump the results to a file so that they can be post-processed later.

In [15]:
result.dump(os.path.join("./benchmark_results.zip"), dump_configs=True, dump_histories=True)


INFO:root:Saved BenchmarkResult to /home/esterle/.cache/bazel/_bazel_esterle/d337abac8c371120c1b9affa1049fa7e/execroot/bark_project/bazel-out/k8-fastbuild/bin/docs/tutorials/run.runfiles/bark_project/docs/tutorials/benchmark_results.zip


# Benchmark Results

Benchmark results contain
- the evaluated metrics of each simulation run, as a pandas DataFrame
- the world state of every simulation (optional)

In [16]:
result_loaded = BenchmarkResult.load(os.path.join("./benchmark_results.zip"), load_configs=True, load_histories=True)

We will first analyze the DataFrame.

In [17]:
df = result_loaded.get_data_frame()

df.head()

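| Unnamed: 0 | Terminal | Test1 | behavior | collision | config_idx | max_steps | scen_idx | scen_set | step | success |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [collision] | 200 | IDM | True | 0 | 6 | 0 | scenario_set1 | 5 | False |
| 1 | [collision] | 200 | IDM | True | 1 | 6 | 1 | scenario_set1 | 5 | False |
| 2 | [success] | 200 | IDM | False | 2 | 35 | 2 | scenario_set1 | 34 | True |
| 3 | [collision] | 200 | IDM | True | 3 | 11 | 3 | scenario_set1 | 10 | False |
| 4 | [collision] | 200 | IDM | True | 4 | 10 | 4 | scenario_set1 | 9 | False |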
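
A typical next step is to aggregate these rows into per-behavior rates. The sketch below uses only the standard library and only the five rows shown above (reduced to the columns it needs), so the numbers describe this excerpt and not the full benchmark.

```python
import csv
import io
from collections import defaultdict

# The five rows printed by df.head(), reduced to the relevant columns.
head_csv = """config_idx,behavior,collision,success
0,IDM,True,False
1,IDM,True,False
2,IDM,False,True
3,IDM,True,False
4,IDM,True,False
"""

# Count runs, collisions and successes per behavior.
counts = defaultdict(lambda: {"runs": 0, "collision": 0, "success": 0})
for row in csv.DictReader(io.StringIO(head_csv)):
    entry = counts[row["behavior"]]
    entry["runs"] += 1
    entry["collision"] += row["collision"] == "True"
    entry["success"] += row["success"] == "True"

# Turn the counts into rates.
rates = {
    behavior: {
        "collision": c["collision"] / c["runs"],
        "success": c["success"] / c["runs"],
    }
    for behavior, c in counts.items()
}
print(rates)  # {'IDM': {'collision': 0.8, 'success': 0.2}}
```

On the real DataFrame, `df.groupby("behavior")[["collision", "success"]].mean()` computes the same kind of per-behavior rates over all configs.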


# Benchmark Analyzer

The benchmark analyzer lets you filter the results to visualize what really happened. These filters are set via a dictionary of lambda functions specifying the evaluation criteria that must be fulfilled.

Let us first load the results into the BenchmarkAnalyzer and then filter the results.

In [18]:
analyzer = BenchmarkAnalyzer(benchmark_result=result_loaded)


configs_idm = analyzer.find_configs(criteria={"behavior": lambda x: x=="IDM", "collision": lambda x : x})
configs_const = analyzer.find_configs(criteria={"behavior": lambda x: x=="Const", "collision": lambda x : x})

In [19]:
configs_idm

[0, 1, 3, 4, 5, 6, 7, 8]
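
The filtering logic can be pictured as follows: a config matches when every lambda in the criteria dictionary returns a truthy value for the corresponding column. The function `find_matching_configs` is a hypothetical plain-Python stand-in for illustration, not the implementation of `find_configs`, and it runs on the five result rows shown earlier.

```python
# The five result rows shown earlier, reduced to the columns used for filtering.
rows = [
    {"config_idx": 0, "behavior": "IDM", "collision": True},
    {"config_idx": 1, "behavior": "IDM", "collision": True},
    {"config_idx": 2, "behavior": "IDM", "collision": False},
    {"config_idx": 3, "behavior": "IDM", "collision": True},
    {"config_idx": 4, "behavior": "IDM", "collision": True},
]


def find_matching_configs(rows, criteria):
    """Return config indices whose columns satisfy all criteria (hypothetical helper)."""
    return [
        row["config_idx"]
        for row in rows
        if all(check(row[column]) for column, check in criteria.items())
    ]


criteria = {"behavior": lambda x: x == "IDM", "collision": lambda x: x}
matching = find_matching_configs(rows, criteria)
print(matching)  # [0, 1, 3, 4]
```

The result agrees with the first entries of `configs_idm` above; config 2 is excluded because it ended in success rather than a collision.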

We will now create a video from two of these configs (`configs_idm[1:3]`). We use the Matplotlib viewer (`MPViewer`) and render everything to a video.

In [20]:
sim_step_time=0.2

params2 = ParameterServer()

fig = plt.figure(figsize=[10, 10])
viewer = MPViewer(params=params2, y_length = 80, enforce_y_length=True, enforce_x_length=False,\
                  follow_agent_id=True, axis=fig.gca())
video_exporter = VideoRenderer(renderer=viewer, world_step_time=sim_step_time)

analyzer.visualize(viewer = video_exporter, real_time_factor = 1, configs_idx_list=configs_idm[1:3], \
                  fontsize=6)
                   
video_exporter.export_video(filename="/tmp/tutorial_video")
