# Transit Boardings Report for specified list of routes

This notebook generates a single (logical) report, consisting of five physical reports (CSV files)
on the total boardings for a __user-specified list__ of transit routes under a __user-specified scenario__.
The individual reports generated are for
1. total boardings during the AM period
2. total boardings during the MD period
3. total boardings during the PM period
4. total boardings during the NT period
5. total boardings during the entire day (sum of the above)

This notebook does not produce visualizations of the report results.
Use the __TBD1__ notebook for this purpose.

To generate a report comparing the total boardings for a specified list of transit routes 
under __two__ scenarios:
* Run this notebook for the first scenario, capturing the output in a specified set of CSV files
* Run this notebook for the _second_ scenario, storing the output in a _second_ set of CSV files.
* Run the notebook __TBD2__ to generate a comparative report of the two scenarios.
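The comparison itself is the job of the __TBD2__ notebook, which is not shown here. The code below is a rough, hypothetical sketch of what such a comparison could look like. It outer-joins two reports on the aggregation key and computes a difference for each boardings column. It uses only `On` and `Off`, and it builds two small made-up reports in memory. A real comparison would instead read the two sets of CSV files with `pd.read_csv`. The name `compare_scenarios` is illustrative only.

```python
import pandas as pd

# Hypothetical subset of the boardings columns written by this notebook.
COMPARE_COLS = ['On', 'Off']

def compare_scenarios(report_a, report_b, key='metaMode', cols=COMPARE_COLS):
    """Outer-join two scenario reports on `key` and add '<col>_diff' = scenario B minus scenario A."""
    merged = pd.merge(report_a[[key] + cols], report_b[[key] + cols],
                      on=key, how='outer', suffixes=('_a', '_b'))
    # A key present in only one scenario is treated as having zero boardings in the other.
    merged = merged.fillna(0)
    for col in cols:
        merged[col + '_diff'] = merged[col + '_b'] - merged[col + '_a']
    return merged

# Made-up reports, for illustration only.
scenario_a = pd.DataFrame({'metaMode': ['MBTA_Bus', 'Ferry'], 'On': [100.0, 10.0], 'Off': [90.0, 12.0]})
scenario_b = pd.DataFrame({'metaMode': ['MBTA_Bus', 'Light_Rail'], 'On': [120.0, 50.0], 'Off': [110.0, 45.0]})
comparison = compare_scenarios(scenario_a, scenario_b)
print(comparison)
```

Selecting only `[key] + cols` also means that any extra column in the CSV files, such as a written-out row index, is ignored.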

### Organization of this notebook
1. Import required packages
2. Read config.py file to get paths to input and output directories
3. User inputs
   1. Specify scenario to run
   2. Specify list of routes for which to generate report (specified in input CSV file)
   3. Specify 'base' of output report file names
   4. Specify aggregation method: aggregate by route or by meta-mode
4. Logic of the notebook _per se_
   1. _calculate_total_daily_boardings_ - helper function for _import_transit_assignment_
   2. _import_transit_assignment_ - reads all *ONO* CSV files for each time period, sums them into one dataframe per period, and computes the daily totals
   3. _mode_to_metamode_ - maps TransCAD 'mode' to human-comprehensible 'meta-mode'; to be moved into _modxlib_.
   4. _set_up_metamode_table_ - creates the route-to-mode-to-metamode mapping table for all routes in the scenario's routes database
   5. _join_and_aggregate_ - aggregates the results of running _import_transit_assignment_ either by route or by meta-mode
5. Driver logic: run the functions above and write the five output CSV files

In [None]:
# Import required packages
import pandas as pd
import os
import glob
from functools import reduce

### User input required: Specify (and run) the config.py file to get the names of the input and output directories

In [None]:
%run "S:/jupyter_notebooks/config.py"

### User input required: Specify scenario to run, referencing relevant variable in config.py file

In [None]:
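# NOTE: 'scenario' must end with '/', because the cells below build file paths by string concatenation.
scenario = base_scenario_dir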
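Several cells below build paths by plain string concatenation, for example `scenario + r'out/'`. For that to work, `scenario` has to end with '/'. Below is a minimal sketch of a helper that normalizes the directory name. It assumes that forward slashes are acceptable on the machine running the notebook. `normalize_scenario_dir` is a hypothetical name and is not part of config.py.

```python
def normalize_scenario_dir(path):
    """Return `path` with backslashes converted to '/' and exactly one trailing '/'."""
    return path.replace('\\', '/').rstrip('/') + '/'

# Hypothetical path, for illustration only.
print(normalize_scenario_dir(r'S:\scenarios\2016 Base'))  # S:/scenarios/2016 Base/
```

In the notebook, this would be used as `scenario = normalize_scenario_dir(base_scenario_dir)`.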

In [None]:
# The contents of this cell are vestigial - kept for the moment for reference only.
#
# Reference data: CSV file containing list of _ALL_ transit routes:
# *** This is JUST reference data - it's not used
# all_transit_routes_csv_fn = \
# r'G:\Regional_Modeling\1A_Archives\LRTP_2018\2016 Scen 00_08March2019_MoDXoutputs\Databases\Statewide_Routes_2018S.csv'

### User input required: Supply the name of the CSV file listing the routes on which to report

In [None]:
# The list of routes for which to generate a report is specified in the following input CSV file:
#
# The path if 'G:' is mounted as CTPS's Google Drive
# routes_csv_fn = 'G:/Shared drives/TMD_TSA/Data/MoDX/DataStore/transit info.csv'
#
# The path if 'G:' is mounted to //lilliput/groups
routes_csv_fn = 'G:/Data_Resources/DataStore/transit info.csv'
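The notebook reads two columns from this file:

* `Route_ID`, used in the driver logic to select routes.
* `Route_Name`, used in `join_and_aggregate` to build readable route names.

Here is a sketch of a reader that fails early if either column is missing. `read_routes_csv` is a hypothetical helper, and the sample rows are made up.

```python
import io
import pandas as pd

# Columns the notebook reads from the routes CSV file.
REQUIRED_ROUTE_COLS = {'Route_ID', 'Route_Name'}

def read_routes_csv(path_or_buffer):
    """Read the routes CSV file and raise ValueError if a required column is missing."""
    df = pd.read_csv(path_or_buffer)
    missing = REQUIRED_ROUTE_COLS - set(df.columns)
    if missing:
        raise ValueError('routes CSV is missing column(s): ' + ', '.join(sorted(missing)))
    return df

# Made-up file contents, for illustration only.
sample = io.StringIO('Route_ID,Route_Name\n101,Route A\n102,Route B\n')
routes = read_routes_csv(sample)
print(routes['Route_ID'].tolist())  # [101, 102]
```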

### User input required: Specify 'base' of name of output CSV files

In [None]:
output_files_base_name = 'my_report'

### User input required: Specify aggregation mode for report: aggregate by route or by meta-mode

In [None]:
# aggregation_mode can be either 'ROUTE' or 'metaMode' (the default).
aggregation_mode = 'metaMode'
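Any other value of `aggregation_mode` is passed straight to `groupby`. There it either fails with a `KeyError` or silently groups on whatever column happens to have that name. A small check like the hypothetical one below catches typos early. The comparison is case-sensitive, so 'metamode' is rejected.

```python
VALID_AGGREGATION_MODES = ('ROUTE', 'metaMode')

def check_aggregation_mode(mode):
    """Return `mode` unchanged if it is supported; otherwise raise ValueError (case-sensitive)."""
    if mode not in VALID_AGGREGATION_MODES:
        raise ValueError("aggregation_mode must be 'ROUTE' or 'metaMode', got %r" % (mode,))
    return mode

check_aggregation_mode('metaMode')  # passes silently
```

In the notebook, the call would be `check_aggregation_mode(aggregation_mode)`.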

In [None]:
# calculate_total_daily_boardings: Calculate the daily total across all time periods.
#
#    This calculation requires a bit of subtlety, because the number of rows in the four
#    data frames produced in the calling function is NOT necessarily the same: a given
#    (ROUTE, STOP) pair may appear in one time period's results but not in another's.
#    A brute-force approach will not work, generally speaking.
#    See comments in the code below for details.
#
# NOTE: This is a helper function for import_transit_assignment (q.v.)
#
# Parameter: boardings_by_tod - a dict with the keys 'AM', 'MD', 'PM', and 'NT'
#            for which the value of each key is a data frame containing the total
#            boardings for the list of routes specified in the input CSV file.
#
# Return value: The input dict (boardings_by_tod) with an additional key 'daily'
#               the value of which is a dataframe with the total daily boardings
#               for all routes specified in the input CSV across all 4 time periods.
#
#
def calculate_total_daily_boardings(boardings_by_tod):
    # Names of the boardings columns that are summed across time periods
    boardings_cols = ['DirectTransferOff', 'DirectTransferOn', 'DriveAccessOn', 'EgressOff',
                      'Off', 'On', 'WalkAccessOn', 'WalkTransferOff', 'WalkTransferOn']

    # Outer-join two dataframes on ['ROUTE', 'STOP'], replace NaN's (rows present in only
    # one of the two inputs) with 0's, compute the sum of each boardings column, and drop
    # the un-needed suffixed columns.
    def outer_join_and_sum(left, right, suffixes):
        j = pd.merge(left, right, on=['ROUTE', 'STOP'], how='outer', suffixes=suffixes)
        j = j.fillna(0)
        for col in boardings_cols:
            j[col] = j[col + suffixes[0]] + j[col + suffixes[1]]
        cols_to_drop = [col + s for col in boardings_cols for s in suffixes]
        return j.drop(columns=cols_to_drop)

    # Step 1: Join the 'AM' and 'MD' dataframes and compute the 'AM' + 'MD' sums
    j1 = outer_join_and_sum(boardings_by_tod['AM'], boardings_by_tod['MD'], ('_am', '_md'))

    # Step 2: Join the 'PM' and 'NT' dataframes and compute the 'PM' + 'NT' sums
    j2 = outer_join_and_sum(boardings_by_tod['PM'], boardings_by_tod['NT'], ('_pm', '_nt'))

    # Step 3: Join j1 and j2 to produce a dataframe with the daily totals.
    #         Replacing NaN's with 0's is needed here too: j1 and j2 need not contain
    #         the same (ROUTE, STOP) pairs.
    daily_df = outer_join_and_sum(j1, j2, ('_j1', '_j2'))

    # Finally, we've got the 'daily' total dataframe!
    boardings_by_tod['daily'] = daily_df
    return boardings_by_tod
# end_def calculate_total_daily_boardings()
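A toy example shows why the outer join followed by filling in zeros is needed. The sketch below uses made-up data for two periods, in which route 2 has no NT rows. A plain index-aligned `+` leaves NaN wherever a (ROUTE, STOP) pair is missing from one period. An outer merge followed by `fillna(0)` gives the expected total. On this data, `DataFrame.add(..., fill_value=0)` produces the same totals; `import_transit_assignment` already uses that operation within a single time period.

```python
import pandas as pd

# Made-up boardings for two time periods: route 2 has no rows in the NT results.
am = pd.DataFrame({'ROUTE': [1, 2], 'STOP': [10, 20], 'On': [5.0, 7.0]})
nt = pd.DataFrame({'ROUTE': [1], 'STOP': [10], 'On': [2.0]})

# Naive index-aligned addition: NaN wherever a (ROUTE, STOP) pair is missing from either frame.
naive = am.set_index(['ROUTE', 'STOP'])['On'] + nt.set_index(['ROUTE', 'STOP'])['On']

# Outer join on the keys, treat missing values as zero boardings, then sum.
joined = pd.merge(am, nt, on=['ROUTE', 'STOP'], how='outer', suffixes=('_am', '_nt')).fillna(0)
joined['On'] = joined['On_am'] + joined['On_nt']

# Index-aligned add with fill_value=0 gives the same totals on this data.
aligned = am.set_index(['ROUTE', 'STOP']).add(nt.set_index(['ROUTE', 'STOP']), fill_value=0)

print(naive)
print(joined[['ROUTE', 'STOP', 'On']])
print(aligned)
```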

In [None]:
# import_transit_assignment: Import transit assignment result CSV files for a given scenario.
#
# 1. Read all CSV files for each time period ('tod'), and calculate the sums for each time period.
#    Step 1 can be performed as a brute-force sum across all columns, since the number of rows in
#    the CSVs (and thus the dataframes) for any given time period is the same.
#
# 2. Calculate the daily total across all time periods.
#    Step 2 requires a bit of subtlety, because the number of rows in the data frames produced in
#    Step 1 is NOT necessarily the same. A brute-force approach will not work, generally speaking.
#    NOTE: This step is performed by the helper function calculate_total_daily_boardings;
#          see the comments there for details.
#
# 3. Return value: a dict of the form:
#    {'AM'    : dataframe with totals for the AM period,
#     'MD'    : dataframe with totals for the MD period,
#     'PM'    : dataframe with totals for the PM period,
#     'NT'    : dataframe with totals for the NT period,
#     'daily' : dataframe with totals for the entire day
#    }
#
# NOTE: This function filters on the global variable 'route_list', which is set in the driver logic below.
# 
def import_transit_assignment(scenario):
    base = scenario + r'out/'
    tods = ["AM", "MD", "PM", "NT"]
    # At the end of execution of this function, the dictionary variable 'TODsums' will contain all the TOD summed results:
    # one key-value pair for each 'tod' AND the 'daily' total as well.
    #
    # The dict 'TODsums' is the return value of this function.
    TODsums = { 'AM' : None, 'MD' : None, 'PM' : None, 'NT' : None }

    # Import CSV files and create sum tables for each T-O-D (a.k.a. 'time period').
    for tod in tods:
        # Get full paths to _all_ CSV files for the current t-o-d
        x = tod + '/'
        fq_csv_fns = glob.glob(os.path.join(base, x, r'*.csv'))
        # Fail with a clear message if there is nothing to sum (reduce() would otherwise raise a TypeError).
        if not fq_csv_fns:
            raise FileNotFoundError('No CSV files found in ' + os.path.join(base, x))
        # 'tablist' : List of all the dataframes created from reading in all the CSV files for the current t-o-d
        tablist = []
        for csv_file in fq_csv_fns:
            # Read CSV file into dataframe, set indices, and append to 'tablist'
            tablist.append(pd.read_csv(csv_file).set_index(['ROUTE','STOP']))
        #

        # Filter each dataframe to include only rows where 'ROUTE' is one of those selected to report on
        for t in range(len(tablist)):
            tablist[t] = tablist[t][tablist[t].index.get_level_values('ROUTE').isin(route_list)]
        #

        # Sum the tables for the current TOD
        TODsums[tod] = reduce(lambda a, b: a.add(b, fill_value=0), tablist)
    # end_for over all tod's

    TODsums = calculate_total_daily_boardings(TODsums)

    # Ensure that the ROUTE and STOP columns of each dataframe in TODsums aren't indices.
    for k in TODsums.keys():
        TODsums[k] = TODsums[k].reset_index()
    #
    return TODsums
# end_def import_transit_assignment()
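Within one time period, the function does two things. It filters each file to the selected routes, then sums the files with `DataFrame.add(..., fill_value=0)`, which aligns rows on the (ROUTE, STOP) index. The toy sketch below replays those two steps on two made-up CSV files held in memory. The file contents and the local `selected_routes` list are hypothetical stand-ins for the *ONO* files and `route_list`.

```python
import io
from functools import reduce
import pandas as pd

# Two made-up CSV files for a single time period (normally located with glob.glob).
csv_texts = [
    'ROUTE,STOP,On,Off\n1,10,3,0\n2,20,4,1\n3,30,9,9\n',
    'ROUTE,STOP,On,Off\n1,10,2,1\n2,20,1,0\n3,30,9,9\n',
]
selected_routes = [1, 2]  # hypothetical stand-in for route_list

tablist = []
for text in csv_texts:
    df = pd.read_csv(io.StringIO(text)).set_index(['ROUTE', 'STOP'])
    # Keep only the rows whose ROUTE index level is one of the selected routes.
    tablist.append(df[df.index.get_level_values('ROUTE').isin(selected_routes)])

# Element-wise sum of all the filtered tables, aligned on the (ROUTE, STOP) index.
tod_sum = reduce(lambda a, b: a.add(b, fill_value=0), tablist)
print(tod_sum.reset_index())
```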

In [None]:
# Define data structure and function to map a TransCAD 'Mode' to the corresponding 'Meta-mode'
_mode_to_metamode_mapping_table = {
    1:  'MBTA_Bus',
    2:  'MBTA_Bus',
    3:  'MBTA_Bus',
    4:  'Light_Rail',
    5:  'Heavy_Rail',
    6:  'Heavy_Rail',
    7:  'Heavy_Rail',
    8:  'Heavy_Rail',
    9:  'Commuter_Rail',
    10: 'Ferry',
    11: 'Ferry',
    12: 'Light_Rail',
    13: 'Light_Rail',
    14: 'Shuttle_Express',
    15: 'Shuttle_Express',
    16: 'Shuttle_Express',
    17: 'RTA',
    18: 'RTA',
    19: 'RTA',
    20: 'RTA',
    21: 'RTA',
    22: 'RTA',
    23: 'Private',
    24: 'Private',
    25: 'Private',
    26: 'Private',
    27: 'Private',
    28: 'Private',
    29: 'Private',
    30: 'Private',
    31: 'Private',
    32: 'Commuter_Rail',
    33: 'Commuter_Rail',
    34: 'Commuter_Rail',
    35: 'Commuter_Rail',
    36: 'Commuter_Rail',
    37: 'Commuter_Rail',
    38: 'Commuter_Rail',
    39: 'Commuter_Rail',
    40: 'Commuter_Rail',
    41: 'RTA',
    42: 'RTA',
    43: 'RTA',
    70: 'Walk' }

# Return the meta-mode for a TransCAD mode, or the string 'None' if the mode is not in the mapping table.
def mode_to_metamode(mode):
    return _mode_to_metamode_mapping_table.get(mode, 'None')
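Any mode missing from `_mode_to_metamode_mapping_table` is silently reported under the meta-mode 'None'. One way to spot such gaps is to list the distinct unmapped modes, as in the sketch below. `unmapped_modes` is a hypothetical helper, and the abbreviated mapping and mode list are made up for the example.

```python
# Abbreviated, made-up mapping table for a self-contained example.
SAMPLE_MAPPING = {1: 'MBTA_Bus', 4: 'Light_Rail', 9: 'Commuter_Rail', 70: 'Walk'}

def unmapped_modes(modes, mapping):
    """Return the sorted distinct modes that have no entry in `mapping`."""
    return sorted({m for m in modes if m not in mapping})

print(unmapped_modes([1, 4, 4, 44, 9, 99], SAMPLE_MAPPING))  # [44, 99]
```

Once `routemode` has been built in the driver logic, `unmapped_modes(routemode['Mode'], _mode_to_metamode_mapping_table)` would apply the same check to the routes database.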

In [None]:
# Build the route-to-mode-to-metamode mapping table for all routes in the scenario's routes database.
def set_up_metamode_table(scenario):
    routemode = pd.read_csv(scenario + r'Databases/Statewide_Routes_2018S.csv',
                            usecols=["Routes_ID", "Mode"]).drop_duplicates()
    routemode['metaMode'] = routemode['Mode'].map(mode_to_metamode)
    return routemode
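`drop_duplicates()` removes repeated (Routes_ID, Mode) pairs. However, a route ID listed with two different modes would still occupy two rows. The later `routemode.merge(...)` in `join_and_aggregate` would then duplicate that route's boardings rows, so its boardings would be counted once per matching row in the group sums. The sketch below checks for this case on made-up data. `routes_with_multiple_modes` is a hypothetical helper.

```python
import pandas as pd

def routes_with_multiple_modes(routemode):
    """Return the Routes_ID values that appear with more than one distinct Mode."""
    counts = routemode.groupby('Routes_ID')['Mode'].nunique()
    return counts[counts > 1].index.tolist()

# Made-up table: route 1 is listed under two different modes.
sample = pd.DataFrame({'Routes_ID': [1, 1, 2, 3], 'Mode': [1, 2, 4, 9]})
print(routes_with_multiple_modes(sample))  # [1]
```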

In [None]:
# join_and_aggregate: Join each time period's boardings to the routes table and to the
# route-to-mode-to-metamode mapping table, then sum the boardings by route or by meta-mode.
#
# NOTE: This function uses the global dataframes 'routes_df' and 'routemode',
#       which are set in the driver logic below.
def join_and_aggregate(TODsums, agg_mode):
    # If the user has specified aggregation by route (aggregation_mode == 'ROUTE'),
    # be sure to actually aggregate on the ROUTE_TEXT field.
    agg_mode = 'ROUTE_TEXT' if agg_mode == 'ROUTE' else agg_mode
    for x in TODsums.keys():
        # In 'routes_df' the TransCAD route ID is found in the Route_ID field;
        # in the TODsums dataframes (generated from the '*ONO*' CSV files),
        # the TransCAD route ID is in the ROUTE field.
        TODsums[x] = routes_df.merge(TODsums[x], how='outer', left_on='Route_ID', right_on='ROUTE')
        # The following statement generates a route name that is intelligible by mere mortals
        # from the funky MBTA route name (in the 'Route_Name' field of the routes_df dataframe).
        # We store this in a new column ('ROUTE_TEXT') rather than overwriting the contents of the 'ROUTE' column.
        # Note: pandas treats a split pattern longer than one character, such as '.:()', as a
        #       regular expression rather than as a set of separator characters.
        TODsums[x]['ROUTE_TEXT'] = TODsums[x]['Route_Name'].str.split('.:()').str[0]
        # Join each 'TOD' dataframe to the route-mode-metamode mapping table
        TODsums[x] = routemode.merge(TODsums[x], how='right', left_on='Routes_ID', right_on='Route_ID')
        # *** The following line, from the original version of this notebook, is applicable when
        #     the user has _NOT_ specified a list of routes to report on. (This implicitly means
        #     that a report should be generated for all routes listed in the *ONO* CSV files.)
        #     This code is being kept here, but commented out, for reference purposes only.
        #     -- BK 10/08/2021
        # else:
        #    TODsums[x] = routemode.merge(TODsums[x], how='right', left_on='Routes_ID', right_on='ROUTE')
        #
        # Sum all On/Off fields by the specified aggregation mode.
        TODsums[x] = TODsums[x].groupby([agg_mode])[['DirectTransferOff','DirectTransferOn',
                                                     'DriveAccessOn','EgressOff','Off','On',
                                                     'WalkAccessOn','WalkTransferOff','WalkTransferOn'
                                                    ]].agg('sum').reset_index()
    return TODsums
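The final step of `join_and_aggregate` is an ordinary `groupby` sum. The sketch below applies the same logic to a small made-up joined table, keeping only `On` and `Off`. This makes it easy to compare choosing 'ROUTE' (which groups on `ROUTE_TEXT`) with choosing 'metaMode'. By default, `groupby` leaves out rows whose grouping key is missing (NaN). `aggregate_boardings` is a hypothetical name.

```python
import pandas as pd

# Subset of the nine boardings columns summed by join_and_aggregate.
SUM_COLS = ['On', 'Off']

def aggregate_boardings(df, agg_mode):
    """Sum SUM_COLS by meta-mode, or by ROUTE_TEXT when agg_mode == 'ROUTE'."""
    key = 'ROUTE_TEXT' if agg_mode == 'ROUTE' else agg_mode
    return df.groupby([key])[SUM_COLS].agg('sum').reset_index()

# Made-up rows standing in for one time period after both joins.
joined = pd.DataFrame({
    'ROUTE_TEXT': ['1', '1', '39', 'F1'],
    'metaMode':   ['MBTA_Bus', 'MBTA_Bus', 'MBTA_Bus', 'Ferry'],
    'On':  [10.0, 5.0, 8.0, 2.0],
    'Off': [9.0, 6.0, 7.0, 2.0],
})
print(aggregate_boardings(joined, 'metaMode'))
print(aggregate_boardings(joined, 'ROUTE'))
```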

### Here begins the driver logic for this notebook

In [None]:
# Read CSV file with list of routes on which to report into a pandas dataframe:
routes_df = pd.read_csv(routes_csv_fn)

# Get the list of TransCAD route IDs for the routes to report on from the 'Route_ID' field of this dataframe:
route_list = routes_df['Route_ID']

In [None]:
# Get total boardings per route for each time period ('tod') and for the day as a whole.
TODsums = import_transit_assignment(scenario) 

In [None]:
# Generate route-to-mode-to-metamode mapping table.
# Note: We have to do this for _each scenario_, because the list of routes may vary from one scenario to another!
routemode = set_up_metamode_table(scenario) 

In [None]:
# Perform aggregation: aggregate results by route or by meta-mode, as specified by user.
TODsums = join_and_aggregate(TODsums, aggregation_mode)

In [None]:
# Generate fully-qualified names of output CSV files.
#
# First, take the name of the scenario from the last component of the scenario directory path,
# after removing the trailing '/'.
raw_scenario_name = os.path.basename(scenario.rstrip('/'))
# The 'raw' scenario name may contain blanks; replace them with underscores.
clean_scenario_name = raw_scenario_name.replace(' ', '_')
#
# Each file name starts with the user-specified 'base' name, followed by the scenario name.
report_base = sandbox_dir + output_files_base_name + '_' + clean_scenario_name
am_report_csv_fn = report_base + '_am_transit_report.csv'
md_report_csv_fn = report_base + '_md_transit_report.csv'
pm_report_csv_fn = report_base + '_pm_transit_report.csv'
nt_report_csv_fn = report_base + '_nt_transit_report.csv'
daily_report_csv_fn = report_base + '_daily_transit_report.csv'
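The five file names differ only in the period label, so they can also be generated in one place. The sketch below assumes the file names should combine `output_files_base_name` with the scenario name. `report_file_names` is a hypothetical helper, and the paths in the example are made up.

```python
import os

def report_file_names(out_dir, base_name, scenario_dir,
                      periods=('am', 'md', 'pm', 'nt', 'daily')):
    """Map each period label to '<out_dir><base_name>_<scenario>_<period>_transit_report.csv'."""
    # Last component of the scenario directory, with blanks replaced by underscores.
    scenario_name = os.path.basename(scenario_dir.rstrip('/\\')).replace(' ', '_')
    return {p: out_dir + base_name + '_' + scenario_name + '_' + p + '_transit_report.csv'
            for p in periods}

names = report_file_names('S:/sandbox/', 'my_report', 'S:/scenarios/2016 Base/')
print(names['daily'])  # S:/sandbox/my_report_2016_Base_daily_transit_report.csv
```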

In [None]:
# Write output CSV report files
#
TODsums['AM'].to_csv(am_report_csv_fn, sep=',')
TODsums['MD'].to_csv(md_report_csv_fn, sep=',')
TODsums['PM'].to_csv(pm_report_csv_fn, sep=',')
TODsums['NT'].to_csv(nt_report_csv_fn, sep=',')
TODsums['daily'].to_csv(daily_report_csv_fn, sep=',')
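The five `to_csv` calls can be collapsed into a loop over a dict keyed like `TODsums`. This sketch writes a made-up report into a temporary directory and reads it back. `write_reports` is hypothetical. Unlike the cell above, it passes `index=False`, so the 0..n-1 row index created by `reset_index()` is not written to the file.

```python
import os
import tempfile
import pandas as pd

def write_reports(tod_sums, file_names):
    """Write each dataframe in `tod_sums` to the CSV path stored under the same key in `file_names`."""
    for key, df in tod_sums.items():
        # index=False: leave out the row index, which carries no information in these reports.
        df.to_csv(file_names[key], index=False)

# Made-up single-report example, written to (and read back from) a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sums = {'daily': pd.DataFrame({'metaMode': ['Ferry'], 'On': [2.0]})}
    names = {'daily': os.path.join(tmp_dir, 'daily.csv')}
    write_reports(sums, names)
    round_trip = pd.read_csv(names['daily'])

print(round_trip)
```

With the notebook's variables, the file-name dict would map 'AM' to `am_report_csv_fn`, 'MD' to `md_report_csv_fn`, and so on.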