# Uber Prize Starter Kit Python Utilities: Tutorial

To simplify interaction with the Docker-based simulation execution and evaluation, we've provided a set of Python utilities (located in the `/utilities` folder of the Starter-Kit repository). This notebook demonstrates how they may be used to accomplish the following tasks:

 - Starting a simulation or several simulations
 - Checking simulation completion
 - Retrieving the simulation score in a convenient Pandas `DataFrame` format
 - Generating synthetic input data

*Note*: It is assumed that this notebook is started from the `/examples` folder.

In [1]:
# Change into the utilities folder so that its modules can be imported below
import sys
import os
import docker
from pathlib import Path
# Note that the following is idempotent when this notebook is run from "/examples"
os.chdir('../utilities')
%load_ext autoreload
%autoreload 2

## Running a simulation: the `competition_executor` module

A `CompetitionContainerExecutor` object may be used to start and stop containers running simulations, and to gather information about them, including the scores and statistics of completed simulations.

In [2]:
from competition_executor import CompetitionContainerExecutor

path_input = (Path.cwd().parent / "submission-inputs").absolute()
path_output = (Path.cwd().parent / "output").absolute()

my_executor = CompetitionContainerExecutor(input_root=path_input,
                                           output_root=path_output)

# Note: Instantiating a CompetitionContainerExecutor with path arguments is not strictly necessary. Each simulation
# can be run with its own set of input and/or output arguments. However, if you prefer to designate
# a single directory for inputs and/or a single directory for outputs, then, for convenience, 
# you may pass those arguments in here and avoid re-entering them for each simulation run.

Run the simulation using the `run_simulation` method. For example:

In [3]:
# Note: the `submission_id`s for each run must be unique; however, this cell may be run idempotently
try:
    my_submission = my_executor.client.containers.get('my_submission')
    my_submission.stop()
    my_submission.remove()
    print("Found container from previous run, removing and creating new sim container...")
except docker.errors.NotFound:
    print("Creating new simulation container...")
    
my_executor.run_simulation('my_submission', num_iterations=1, num_cpus=72)

# Note that a dictionary of the container IDs that already exist can be accessed.
for key, value in my_executor.containers.items():
    print("Container ID : {0}".format(key))

Creating new simulation container...


APIError: 500 Server Error: Internal Server Error ("Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)")
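
The `APIError` above is a timeout while contacting the Docker registry, which is often transient. The following is a minimal sketch of a retry wrapper with exponential backoff. The helper `retry` and its parameters are hypothetical (they are not part of the Starter Kit utilities), and it assumes that a failed call leaves no half-created container behind.

```python
import time


def retry(action, retry_on=Exception, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` calls have failed.

    Hypothetical helper for illustration; not part of the Starter Kit.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage with the executor from this notebook (not run here):
# retry(lambda: my_executor.run_simulation('my_submission', num_iterations=1, num_cpus=72),
#       retry_on=docker.errors.APIError)
```

If the error persists after a few attempts, the cause is more likely a network or proxy configuration problem than a temporary outage.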

You may view the **debug logs** of a specific simulation as follows:

In [4]:
logs = my_executor.output_simulation_logs('my_submission')

ValueError: Container not executed using this interface.

A submission runs for a certain amount of time. If you interrupt it before it finishes, you will not get any outputs. You can **check whether a submission has finished** with the following method:

In [55]:
my_executor.check_if_submission_complete('my_submission')

True
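
Because a run can take a while, it may be convenient to poll the completion check rather than call it by hand. The sketch below shows one way to do this. The helper `wait_until` and its default timeout and interval are assumptions made for illustration; only `check_if_submission_complete` comes from the utilities.

```python
import time


def wait_until(check, timeout=3600.0, interval=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns a truthy value or `timeout` seconds pass.

    Returns True if the check succeeded, False if the timeout was reached.
    Hypothetical helper for illustration; not part of the Starter Kit.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Possible usage with the executor from this notebook (not run here):
# finished = wait_until(
#     lambda: my_executor.check_if_submission_complete('my_submission'))
```

The function returns `False` on timeout instead of raising, so the caller can decide whether to keep waiting or stop the container.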

When a simulation run is done, you can import its **results**:

In [56]:
scores, stats = my_executor.get_submission_scores_and_stats('my_submission')

The **scores** and **statistics** are stored in pandas `DataFrame`s, which contain the information described [here](https://github.com/vgolfier/Uber-Prize-Starter-Kit/blob/master/docs/Understanding_the_outputs_and_the%20scoring_function.md).

At the end of a run, and to avoid conflicts between submission ids (particularly if you want to reuse names), it is advised to **clean up containers** by calling the following method:

In [57]:
my_executor.stop_all_simulations()

## Generating sample inputs: the `input_sampler` module

Randomly sampled data may be used either to initialize the re-planning algorithm or to otherwise test the simulation. The `input_sampler` module is provided for this purpose. This subsection shows how synthetic random input files may be generated for each of the input policies that the simulation accepts.

In [59]:
from input_sampler import *
# Specify the common string for the scenario name
# (currently only "sioux_faux", which refers to the Sioux Faux scenario).
SCENARIO_NAME = 'sioux_faux'
agency = 'sioux_faux_bus_lines'

# Set the paths appropriately
# Repository root: the parent of the current working directory (/utilities)
DIR = Path.cwd().parent
sys.path.append(str(DIR))
DATA_DIR = DIR / 'reference-data/'

agency_dict = scenario_agencies(DATA_DIR, SCENARIO_NAME)
# Create a lazy cache of GTFS data for the agency:
sf_gtfs_manager = AgencyGtfsDataManager(agency_dict[agency])

# Sample each input five (5) times (naming of functions is indicative of type of input sampled).
freq_df = sample_frequency_adjustment_input(5, sf_gtfs_manager)
mode_incentive_df = sample_mode_incentives_input(5)
vehicle_fleet_mix_df = sample_vehicle_fleet_mix_input(5, sf_gtfs_manager)

# Generated inputs may now be saved in, say, /submission-inputs or some other directory. 
# Remember the location of this directory when executing new simulations.
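
Saving the sampled inputs could look like the sketch below. The helper `save_inputs` is hypothetical, and the file names in the usage comment, as well as the choice to omit the index column, are assumptions; check the Starter Kit documentation for the names and layout the simulation actually expects.

```python
from pathlib import Path


def save_inputs(frames, out_dir):
    """Write each DataFrame-like object in `frames` to `out_dir`/<file name>.

    `frames` maps a file name to any object with a pandas-style
    `to_csv(path, index=...)` method. Hypothetical helper for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for file_name, frame in frames.items():
        target = out_dir / file_name
        frame.to_csv(target, index=False)  # assumption: no index column wanted
        written.append(target)
    return written


# Possible usage with the frames sampled above (file names are assumed):
# save_inputs({"FrequencyAdjustment.csv": freq_df,
#              "ModeIncentives.csv": mode_incentive_df,
#              "VehicleFleetMix.csv": vehicle_fleet_mix_df},
#             DIR / "submission-inputs")
```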

In [61]:
vehicle_fleet_mix_df

|   | agencyId | routeId | vehicleTypeId |
|---|---|---|---|
| 0 | 217 | 1351 | BUS-STD-HD |
| 1 | 217 | 1340 | BUS-DEFAULT |
| 2 | 217 | 1342 | BUS-SMALL-HD |
| 3 | 217 | 1344 | BUS-DEFAULT |
| 4 | 217 | 1346 | BUS-SMALL-HD |


In [62]:
mode_incentive_df

|   | mode | age | income | amount |
|---|---|---|---|---|
| 0 | drive_transit | [15:70) | [207000:260000) | 8.3 |
| 1 | walk_transit | [28:62] | [138000:259000] | 9.1 |
| 2 | drive_transit | (7:23) | [58000:60000] | 8.9 |
| 3 | drive_transit | [7:89) | [197000:239000) | 2.3 |
| 4 | ondemand_ride | (18:69) | [7000:174000) | 19.0 |
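
The `age` and `income` columns above hold ranges written as bracketed `lower:upper` strings. The sketch below parses such a string, assuming the usual mathematical convention that `[` and `]` mark an inclusive bound while `(` and `)` mark an exclusive one. That convention is not stated in this notebook, and `Interval` and `parse_interval` are hypothetical names, not part of `input_sampler`.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    lower: float
    upper: float
    lower_closed: bool  # True for "[", False for "("
    upper_closed: bool  # True for "]", False for ")"

    def contains(self, value):
        above = value >= self.lower if self.lower_closed else value > self.lower
        below = value <= self.upper if self.upper_closed else value < self.upper
        return above and below


_PATTERN = re.compile(
    r"^([\[\(])\s*(\d+(?:\.\d+)?)\s*:\s*(\d+(?:\.\d+)?)\s*([\]\)])$")


def parse_interval(text):
    """Parse a string such as "[15:70)" into an Interval (assumed convention)."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an interval: {text!r}")
    left, lower, upper, right = match.groups()
    return Interval(float(lower), float(upper), left == "[", right == "]")


age = parse_interval("[15:70)")
print(age.contains(15), age.contains(70))
```

Under this reading, the first row applies to ages from 15 up to, but not including, 70.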


In [63]:
freq_df

|   | trip_id | start_time | end_time | headway_secs | exact_times |
|---|---|---|---|---|---|
| 0 | t_75367_b_219_tn_3 | 43260 | 66000 | 6000 | 0 |
| 1 | t_75351_b_219_tn_3 | 9780 | 21000 | 3420 | 0 |
| 2 | t_72465_b_219_tn_2 | 7080 | 38700 | 6720 | 0 |
| 3 | t_75323_b_219_tn_1 | 47580 | 79200 | 7080 | 0 |
| 4 | t_75356_b_219_tn_1 | 43740 | 81120 | 2760 | 0 |
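
The columns above mirror the GTFS `frequencies.txt` layout. As a reading aid, the sketch below assumes that `start_time` and `end_time` are seconds after midnight and that vehicles depart every `headway_secs` seconds from `start_time` until strictly before `end_time`. Both helper functions are hypothetical and are not part of `input_sampler`.

```python
import math


def seconds_to_hms(seconds):
    """Format seconds after midnight as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def departures_in_window(start_time, end_time, headway_secs):
    """Count departures at start, start + headway, ... strictly before end."""
    if headway_secs <= 0 or end_time <= start_time:
        return 0
    return math.ceil((end_time - start_time) / headway_secs)


# First sampled row: start_time=43260, end_time=66000, headway_secs=6000
print(seconds_to_hms(43260), seconds_to_hms(66000))  # 12:01:00 18:20:00
print(departures_in_window(43260, 66000, 6000))      # 4
```

Under these assumptions, the first row describes a bus leaving every 100 minutes between 12:01 and 18:20, which gives four departures.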
