# Uber Prize Starter Kit Python Utilities: Tutorial

To simplify interaction with the Docker-based simulation execution and evaluation, we've provided a set of Python utilities (located in the `/utilities` folder of the Starter-Kit repository). This notebook demonstrates how they may be used to accomplish the following tasks:

 - Starting a simulation or several simulations
 - Checking simulation completion
 - Retrieving the simulation score in a convenient pandas `DataFrame` format
 - Generating randomly sampled (synthetic) input data

*Note 1*: This notebook and its accompanying utilities are written for Python 3.5+. Please install the associated requirements listed in the `requirements.txt` file in the root folder.

*Note 2*: It is assumed that this notebook is started from the `/examples` folder.

In [1]:
# Adding the module to the path for future import
import sys
import os
import docker
from pathlib import Path
# Note that the following is idempotent when this notebook is run from "/examples"
os.chdir('../utilities')
%load_ext autoreload
%autoreload 2

## Running a simulation: the `competition_executor` module

A `CompetitionContainerExecutor` object may be used to start, stop, and gather information about containers running simulations and/or completed simulation scores and stats.

In [13]:
from competition_executor import CompetitionContainerExecutor

# Current round uses "sioux faux" scenario
SCENARIO_NAME = "sioux_faux"

# Evaluation uses the 15k sample, but for demo purposes we will use the 1k sample here.
SAMPLE_SIZE = "1k"

path_input = (Path(Path.cwd()).parent / "submission-inputs").absolute()
path_output = (Path(Path.cwd()).parent / "output").absolute()

my_executor = CompetitionContainerExecutor(input_root=path_input,
                                  output_root=path_output)

# Note: Instantiating a CompetitionContainerExecutor with path arguments is not strictly necessary. Each simulation
# can be run with its own set of input and/or output arguments. However, if you prefer to designate
# a single directory for inputs and/or a single directory for outputs, then, for convenience, 
# you may pass those arguments in here and avoid re-entering them for each simulation run.
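
The two paths above are plain `pathlib` objects, so it can be worth validating them before handing them to the executor. The following is a minimal sketch, not part of the provided utilities: `resolve_io_dirs` is a hypothetical helper, and creating the output folder up front is an assumption (the executor may well handle this itself).

```python
from pathlib import Path


def resolve_io_dirs(repo_root, input_name="submission-inputs", output_name="output"):
    """Return absolute (input_dir, output_dir) paths under repo_root.

    Hypothetical helper: the folder names mirror the ones used in this notebook.
    The input directory must already exist; the output directory is created
    if it is missing (an assumption, not documented executor behavior).
    """
    root = Path(repo_root).resolve()
    input_dir = root / input_name
    output_dir = root / output_name
    if not input_dir.is_dir():
        raise FileNotFoundError("Input directory not found: {0}".format(input_dir))
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir
```

With the working directory set to `/utilities` as in this notebook, `resolve_io_dirs(Path.cwd().parent)` should yield the same two locations as `path_input` and `path_output`, barring symlinks.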

Run the simulation using the `run_simulation` method. For example:

In [16]:
# Note: the `submission name` for each run must be unique. However, this cell is designed to be run repeatedly. 
# If a simulation is currently in progress, it will be stopped and removed.
try:
    my_submission = my_executor.client.containers.get('my_submission')
    print("WARNING: The executor found a simulation named '{0}' (container ID '{1}') from a previous run.\n\
    - Stopping simulation and removing the old container.\n \
    - Creating new container and restarting simulator.\n \
    \n \
    ~~~Please wait a moment~~~".format(my_submission.name, my_submission.short_id))
    my_submission.stop()
    my_submission.remove()
except docker.errors.NotFound:
    print("Creating new container and starting simulator.\n \
    ~~~Please wait a moment~~~\n")
    
my_executor.run_simulation('my_submission', num_iterations=1, num_cpus=2, sample_size=SAMPLE_SIZE)

# Note that a dictionary indicating which simulation names already exist can be accessed.
for key, value in my_executor.containers.items():
    
    print("\nSuccessfully created new container and currently executing a simulation run\n \
    - Name: '{0}'\n \
    - Container ID : '{1}'\n \
    \n \
    ~~~Scores will be available when this run completes.~~~\n".format(value.name, value.id))

    - Stopping simulation and removing the old container.
     - Creating new container and restarting simulator.
     
     ~~~Please wait a moment~~~

Successfully created new container and currently executing a simulation run
     - Name: 'my_submission'
     - Container ID : 'c182a895e6'
     
     ~~~Scores will be available when this run completes.~~~
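
Because each submission name must be unique, an alternative to stopping and removing the old container is to generate a fresh name for every run. The sketch below is one possible approach: `unique_submission_name` is a hypothetical helper, and it assumes that the keys of `my_executor.containers` are the existing submission names.

```python
from datetime import datetime


def unique_submission_name(prefix, existing_names, now=None):
    """Build a name from `prefix` and a timestamp that is not in `existing_names`.

    Hypothetical helper, not part of the provided utilities. The timestamp
    layout follows the one printed in the simulation parameters.
    """
    now = now or datetime.now()
    base = "{0}_{1}".format(prefix, now.strftime("%Y-%m-%d_%H-%M-%S"))
    candidate = base
    suffix = 1
    # Append a counter if two runs are requested within the same second.
    while candidate in existing_names:
        candidate = "{0}_{1}".format(base, suffix)
        suffix += 1
    return candidate


# Possible usage (assumes the dictionary keys are submission names):
# name = unique_submission_name("my_submission", my_executor.containers.keys())
# my_executor.run_simulation(name, num_iterations=1, num_cpus=2, sample_size=SAMPLE_SIZE)
```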



You may view the parameters of your simulation using the `output_simulation_parameters(simulation_name)` method on the executor. 

In [20]:
my_executor.output_simulation_parameters('my_submission')
print("~~~This simulation's parameters cannot be changed until this run completes.~~~\n")

Simulation parameters
         - Submission id: 'my_submission'
         - Timestamp: '2019-02-16_06-25-22'
         - Scenario Name: 'sioux_faux'
         - Number of Iterations: '1'
         - Sample Size: '1k'

~~~This simulation's parameters cannot be changed until this run completes.~~~



You may view (or save to a file) the **debug logs** of your simulation using the `output_simulation_logs(simulation_name, file_name)` method on the executor.

To follow simulation progress, the following cell may be run repeatedly. 

In [24]:
logs = my_executor.output_simulation_logs('my_submission')

# NOTE: The simulation may take a while to finish. As long as your execution environment meets
# the minimum hardware requirements, it is (most likely) not stuck!

# However, if the logs do not change for over 5 minutes, there is likely an error somewhere
# and you should stop the simulation (see below). In that case, we would greatly appreciate it
# if you filed a bug report (issue) that includes your hardware environment (number of CPUs,
# RAM, etc.) and a copy of the output logs (preferably zipped).

Dependency Service(s) Init Complete
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.2.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 7
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _
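
The "no change for over 5 minutes" rule of thumb from the cell above can be checked programmatically rather than by eye. The following is a rough sketch under stated assumptions: `LogStallDetector` is a hypothetical class, and it assumes that whatever `output_simulation_logs` returns can be turned into text with `str()`.

```python
import hashlib
import time


class LogStallDetector:
    """Track how long the log text has stayed unchanged (hypothetical helper)."""

    def __init__(self, stall_seconds=300, clock=time.monotonic):
        self.stall_seconds = stall_seconds
        self._clock = clock
        self._last_digest = None
        self._last_change = None

    def update(self, logs):
        """Record the latest logs; return True if they look stalled."""
        text = str(logs).encode("utf-8", errors="replace")
        digest = hashlib.sha256(text).hexdigest()
        now = self._clock()
        if digest != self._last_digest:
            # The logs changed (or this is the first call): reset the timer.
            self._last_digest = digest
            self._last_change = now
            return False
        return (now - self._last_change) >= self.stall_seconds


# Possible usage, rerun each time the logs are fetched:
# detector = LogStallDetector()
# if detector.update(my_executor.output_simulation_logs('my_submission')):
#     print("Logs unchanged for 5 minutes; consider stopping the simulation.")
```

Hashing the text avoids keeping a full copy of the previous logs in memory, and the injectable `clock` makes the class easy to test without waiting.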

Each submission runs for a certain amount of time. If you interrupt it before it finishes, you will not get any outputs. You can **check whether a submission has finished** with the following method:

In [27]:
my_executor.check_if_submission_complete('my_submission')

True
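
Rather than rerunning the cell by hand, the completion check can be polled in a loop. This is a minimal sketch, assuming only that `check_if_submission_complete` returns a truthy value once the run is done; `wait_until` and its default timeout and polling interval are hypothetical choices, not part of the utilities.

```python
import time


def wait_until(predicate, timeout_seconds=3600, poll_seconds=30,
               sleep=time.sleep, clock=time.monotonic):
    """Call `predicate` repeatedly until it returns a truthy value.

    Returns True if the predicate succeeded, or False if `timeout_seconds`
    elapsed first. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout_seconds
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


# Possible usage:
# done = wait_until(lambda: my_executor.check_if_submission_complete('my_submission'))
```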

When a simulation run is done, you can import its **results**:

In [28]:
scores, stats = my_executor.get_submission_scores_and_stats('my_submission')

In [29]:
scores['Raw Score']

Component Name
Level of service: costs and benefits                                         1.002339
Accessibility: Number of secondary locations accessible within 15 minutes    1.020872
Accessibility: Number of work locations accessible within 15 minutes         0.989149
Sustainability: Total PM 2.5 Emissions                                       1.011651
Level of service: average trip expenditure - secondary                       0.880546
Level of service: average on-demand ride wait times                          0.942077
Level of service: average bus crowding experienced                           1.049553
Congestion: total vehicle miles traveled                                     1.014462
Congestion: average vehicle delay per passenger trip                         0.900014
Level of service: average trip expenditure - work                            1.010162
Submission Score                                                                  NaN
Name: Raw Score, dtype: float64

The **scores** and **statistics** are stored in pandas DataFrames, which contain the information described [here](https://github.com/vgolfier/Uber-Prize-Starter-Kit/blob/master/docs/Understanding_the_outputs_and_the%20scoring_function.md).
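
Since `scores['Raw Score']` is an ordinary pandas `Series`, the usual pandas operations apply to it. As a small illustration, the sketch below rebuilds a stand-in `Series` from three of the values printed above (so it runs without a finished simulation), separates the aggregate `Submission Score` row from the component rows, and sorts the components. It makes no claim about how the values should be interpreted; see the linked documentation for that.

```python
import pandas as pd

# Stand-in for scores['Raw Score'], using values copied from the output above.
raw_scores = pd.Series(
    {
        "Level of service: average bus crowding experienced": 1.049553,
        "Congestion: average vehicle delay per passenger trip": 0.900014,
        "Sustainability: Total PM 2.5 Emissions": 1.011651,
        "Submission Score": float("nan"),
    },
    name="Raw Score",
).rename_axis("Component Name")

# Keep only the per-component rows and order them by value (ascending).
component_scores = raw_scores.drop("Submission Score").sort_values()
print(component_scores)
```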

At the end of any run, and to avoid conflicts between submission ids (particularly if you want to reuse names), it is advisable to **clean up containers** by calling the following method:

In [30]:
my_executor.stop_all_simulations()

Stopping simulation:
 Submission_id: my_submission
	 Scenario name: sioux_faux
	 # iters: 1
	 sample size: 1k
Done.
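
One way to make sure this cleanup happens even when a cell raises an exception midway is to wrap the run in a context manager. The sketch below is an illustration rather than part of the utilities: `simulations_cleaned_up` is a hypothetical helper that relies only on the `stop_all_simulations()` method shown above.

```python
from contextlib import contextmanager


@contextmanager
def simulations_cleaned_up(executor):
    """Yield the executor and always stop its simulations on exit (hypothetical helper)."""
    try:
        yield executor
    finally:
        # Runs on normal exit and when an exception propagates.
        executor.stop_all_simulations()


# Possible usage:
# with simulations_cleaned_up(my_executor) as ex:
#     ex.run_simulation('my_submission', num_iterations=1, num_cpus=2, sample_size=SAMPLE_SIZE)
#     ...  # wait for completion and read the scores here
```

Note that this stops every simulation known to the executor, so it is best suited to sessions where no other runs need to keep going.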


## The `input_sampler` module

Randomly sampled data may be used either to initialize the re-planning algorithm or to otherwise test the simulation. The `input_sampler` module has been provided for this purpose. This subsection provides an example of how synthetic random input files may be generated for each of the input policies available to the simulation.

In [31]:
from input_sampler import *
# Specify the agency key to look up in the scenario's agency dictionary
# (here, the bus lines of the Sioux Faux scenario).
agency = 'sioux_faux_bus_lines'

# Set the paths appropriately. The working directory is /utilities at this point,
# so its parent is the repository root.
DIR = Path.cwd().parent
sys.path.append(str(DIR))
DATA_DIR = DIR / 'reference-data'

agency_dict = scenario_agencies(DATA_DIR, SCENARIO_NAME)
# Create a lazy cache of GTFS data for the agency:
sf_gtfs_manager = AgencyGtfsDataManager(agency_dict[agency])

# Sample each input five (5) times (naming of functions is indicative of type of input sampled).
freq_df = sample_frequency_adjustment_input(5, sf_gtfs_manager)
mode_incentive_df = sample_mode_incentives_input(5)
vehicle_fleet_mix_df = sample_vehicle_fleet_mix_input(5, sf_gtfs_manager)

# Generated inputs may now be saved in, say, /submission-inputs or some other directory. 
# Remember the location of this directory when executing new simulations.

In [32]:
vehicle_fleet_mix_df

|   | agencyId | routeId | vehicleTypeId |
|---|----------|---------|---------------|
| 0 | 217 | 1347 | BUS-SMALL-HD |
| 1 | 217 | 1340 | BUS-STD-ART |
| 2 | 217 | 1348 | BUS-SMALL-HD |
| 3 | 217 | 1342 | BUS-STD-ART |
| 4 | 217 | 1350 | BUS-DEFAULT |


In [33]:
mode_incentive_df

|   | mode | age | income | amount |
|---|------|-----|--------|--------|
| 0 | ondemand_ride | (92:118] | (168000:294000) | 18.7 |
| 1 | ondemand_ride | [29:32) | [104000:199000) | 20.0 |
| 2 | walk_transit | [34:91] | [98000:205000) | 1.4 |
| 3 | walk_transit | (24:88) | [38000:286000) | 6.5 |
| 4 | drive_transit | (6:90] | (37000:77000) | 19.4 |
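
The `age` and `income` columns hold ranges written with brackets and parentheses. Assuming these follow standard mathematical interval notation (square bracket for an inclusive bound, parenthesis for an exclusive one), which this notebook does not state explicitly, a value can be checked against a range as sketched below. Both `parse_interval` and `interval_contains` are hypothetical helpers.

```python
import re

# Matches strings such as "(92:118]" or "[104000:199000)".
_INTERVAL = re.compile(r"^([\[\(])(\d+):(\d+)([\]\)])$")


def parse_interval(text):
    """Return (low, high, low_inclusive, high_inclusive) for an interval string."""
    match = _INTERVAL.match(text.strip())
    if match is None:
        raise ValueError("Not a recognized interval: {0!r}".format(text))
    left, low, high, right = match.groups()
    return int(low), int(high), left == "[", right == "]"


def interval_contains(text, value):
    """Check whether `value` lies inside the interval described by `text`."""
    low, high, low_inclusive, high_inclusive = parse_interval(text)
    above = value >= low if low_inclusive else value > low
    below = value <= high if high_inclusive else value < high
    return above and below
```

Under this reading, the second sampled row applies to ages 29 through 31: `interval_contains("[29:32)", 29)` is true, while `interval_contains("[29:32)", 32)` is false.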


In [34]:
freq_df

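|   | trip_id | start_time | end_time | headway_secs | exact_times |
|---|---------|------------|----------|--------------|-------------|
| 0 | t_75331_b_219_tn_1 | 29400 | 41880 | 4320 | 0 |
| 1 | t_75326_b_219_tn_10 | 72540 | 83580 | 3540 | 0 |
| 2 | t_72464_b_219_tn_8 | 44220 | 72120 | 5100 | 0 |
| 3 | t_72463_b_219_tn_5 | 24840 | 80160 | 2820 | 0 |
| 4 | t_75350_b_219_tn_6 | 13500 | 67320 | 3780 | 0 |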
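
The column names of `freq_df` match the fields of the GTFS `frequencies.txt` file. The integer `start_time` and `end_time` values are presumably seconds after midnight; that interpretation is an assumption, not something stated in this notebook. Under it, a small hypothetical helper such as `seconds_to_hhmmss` makes the sampled service windows easier to read.

```python
def seconds_to_hhmmss(seconds):
    """Format a count of seconds after midnight as HH:MM:SS (hypothetical helper)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "{0:02d}:{1:02d}:{2:02d}".format(hours, minutes, secs)


# First sampled row above: start_time=29400, end_time=41880, headway_secs=4320
print(seconds_to_hhmmss(29400))  # 08:10:00
print(seconds_to_hhmmss(41880))  # 11:38:00
print(4320 / 60)                 # 72.0 (headway in minutes)
```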
