# Preprocessing Wellington Data

**This dataset includes the raw data required to calculate carbon emissions. Our focus is on the decrease in carbon emissions from buses run primarily to reduce carbon emissions, rather than from buses run primarily for passenger transport.**

**While weekend buses tend to be predominantly employed for passenger transportation, it's important to note that weekday buses also serve as a means of passenger transport. However, during weekdays, buses take on an additional role: they are strategically utilized for carbon reduction efforts, particularly during peak hours like 7am-9am and 4pm-6pm.**
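The weekday peak windows described above could be flagged per trip roughly as follows. This is only a sketch: it assumes that `Start Minute (Sched)` holds minutes after midnight and that `Day of Week` holds full weekday names, neither of which has been checked against the export. The helper `is_weekday_peak` and the toy `sample` frame are hypothetical.

```python
import pandas as pd

# Assumption: start times are minutes after midnight; day names are spelled out.
PEAK_WINDOWS = [(7 * 60, 9 * 60), (16 * 60, 18 * 60)]  # 7am-9am and 4pm-6pm
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}

def is_weekday_peak(day_name, start_minute):
    """Return True when a trip starts on a weekday inside a peak window."""
    in_peak = any(lo <= start_minute < hi for lo, hi in PEAK_WINDOWS)
    return day_name in WEEKDAYS and in_peak

# Toy data standing in for the real trips table.
sample = pd.DataFrame({
    "Day of Week": ["Monday", "Monday", "Saturday"],
    "Start Minute (Sched)": [8 * 60, 12 * 60, 8 * 60],
})
sample["Weekday Peak"] = [
    is_weekday_peak(day, minute)
    for day, minute in zip(sample["Day of Week"], sample["Start Minute (Sched)"])
]
print(sample)
```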

### NetBI

The following dimensions were used to generate the datasets:

X:
- Actual Running Time
- Actual In-Service KM
- Passenger Km
- Scheduled In-Service Km
- Cancelled Trips
- Sched Running Time per Trip

Y:
- Date
- Route
- Route Variant
- Direction
- Trip Number
- Actual Vehicle Type
- Vehicle Number
- Vehicle Emissions Standard
- Start Minute (Sched)
- Day

### Goals

- Split running time into hourly run time intervals
- Calculate average speed of a bus (distance / time) <input type="checkbox" checked>
- Calculate average occupancy (passenger km / actual km) <input type="checkbox" checked>
- Calculate Carbon for routes
- Calculate the per person carbon emissions reduction
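
The first goal, splitting running time into hourly intervals, could be approached along these lines. The sketch assumes the scheduled start is given in minutes after midnight and the running time in seconds; the function name `split_running_time_by_hour` is hypothetical and nothing here has been run against the real data.

```python
def split_running_time_by_hour(start_minute, running_seconds):
    """Return {hour_of_day: seconds of running time that fall in that hour}."""
    intervals = {}
    current = start_minute * 60           # seconds since midnight
    end = current + running_seconds
    while current < end:
        hour = int(current // 3600)
        next_boundary = (hour + 1) * 3600  # start of the following clock hour
        chunk_end = min(end, next_boundary)
        key = hour % 24                    # wrap trips that run past midnight
        intervals[key] = intervals.get(key, 0) + (chunk_end - current)
        current = chunk_end
    return intervals

# A 30-minute trip starting at 7:45am straddles the 7am and 8am hours.
print(split_running_time_by_hour(7 * 60 + 45, 1800))
```

A trip that stays within a single clock hour yields a single entry, so the per-hour values always sum to the original running time.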

### Further Goals

- Create Heat Map
- Learn how to upload data to MongoDB
- Integrate heat map into Bean

### Where I left off on 6 September (Oxana):

- Trying to create an interactive plot
- Thinking about what should be on the X and Y axes of the plot, and which dimensions should be the filters
- At this stage, could not figure out how to export the interactive plot outside of Python


### Loading Data

In [4]:
#I needed to install dash first
# pip install dash


import pandas as pd
import numpy as np
import plotly.express as px
from ipywidgets import interact, widgets
import plotly.graph_objs as go

# # Load and process DataFrames
# jan1_3 = pd.read_csv("Wellington Raw Daily Data/Jan 1-3 2022.csv")
# jan4_6 = pd.read_csv("Wellington Raw Daily Data/Jan 4-6 2022.csv")
# jan7_9 = pd.read_csv("Wellington Raw Daily Data/Jan 7-9 2022.csv")
# jan10_11 = pd.read_csv("Wellington Raw Daily Data/Jan 10-11 2022.csv")
# jan12_13 = pd.read_csv("Wellington Raw Daily Data/Jan 12-13 2022.csv")
# jan14_15 = pd.read_csv("Wellington Raw Daily Data/Jan 14-15 2022.csv")
# jan16_17 = pd.read_csv("Wellington Raw Daily Data/Jan 16-17 2022.csv")
# jan18_19 = pd.read_csv("Wellington Raw Daily Data/Jan 18-19 2022.csv")
# jan20_21 = pd.read_csv("Wellington Raw Daily Data/Jan 20-21 2022.csv")
# jan22_23 = pd.read_csv("Wellington Raw Daily Data/Jan 22-23 2022.csv")
# jan24_25 = pd.read_csv("Wellington Raw Daily Data/Jan 24-25 2022.csv")
# jan26_27 = pd.read_csv("Wellington Raw Daily Data/Jan 26-27 2022.csv")
# jan28_29 = pd.read_csv("Wellington Raw Daily Data/Jan 28-29 2022.csv")
# jan30_31 = pd.read_csv("Wellington Raw Daily Data/Jan 30-31 2022.csv")

# # List of DataFrames
# dataframes = [
#     jan1_3, jan4_6, jan7_9, jan10_11, jan12_13,
#     jan14_15, jan16_17, jan18_19, jan20_21, jan22_23,
#     jan24_25, jan26_27, jan28_29, jan30_31
# ]

# # The bottom row of each data frame must be removed because it is a totals row.
# def drop_last_row(df):
#     return df.drop(df.tail(1).index)

# dataframes = list(map(drop_last_row, dataframes))

# # Stacking DataFrames
# combined_df = pd.concat(dataframes, ignore_index=True)

# # writing and storing processed dataframes as csv
# combined_df.to_csv("Wellington Raw Monthly Data/January 2022.csv")

# print(combined_df)
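
The commented-out loading steps above could be condensed into a loop over matching files. This is a sketch rather than the code that produced the monthly file: `load_daily_files` is a hypothetical helper, and the demonstration writes two tiny made-up CSVs to a temporary folder instead of reading the real exports.

```python
import glob
import os
import tempfile

import pandas as pd

def load_daily_files(pattern):
    """Read every CSV matching `pattern`, drop each file's final totals row, and stack them."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        frames.append(df.iloc[:-1])  # the last row of each export is a totals row
    return pd.concat(frames, ignore_index=True)

# Demonstration with made-up files that mimic the "totals in the last row" layout.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"Route": [1, 2, "Total"], "Passenger km": [10.0, 5.0, 15.0]}).to_csv(
        os.path.join(tmp, "Jan 1-3 2022.csv"), index=False)
    pd.DataFrame({"Route": [3, "Total"], "Passenger km": [7.0, 7.0]}).to_csv(
        os.path.join(tmp, "Jan 4-6 2022.csv"), index=False)
    combined = load_daily_files(os.path.join(tmp, "Jan *.csv"))

print(combined)
```

Files are read in alphabetical order of their names, which is not chronological for names like these, so the result may need sorting by date afterwards.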


In [5]:
#read in sample data from Hamish. This dataframe contains all trips for FY2022/2023

wlgt2022_trips = pd.read_csv('Wellington Raw Daily Data/Trips_2022-01-01_to_2023-06-30.csv')


In [6]:
#come back to this, trips with negative values seem to be train trips. We do not want train trips in this data frame.
sum(wlgt2022_trips['Passenger km']<0)

103
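
Until the origin of these rows is confirmed, one option is to set them aside rather than delete them outright. The sketch below uses a toy frame with made-up values; the names `suspect_trips` and `clean_trips` are hypothetical.

```python
import pandas as pd

# Toy data: the third trip has a negative Passenger km value.
trips = pd.DataFrame({
    "Route": ["1", "2", "X"],
    "Passenger km": [12.5, 0.0, -3.2],
})

negative_mask = trips["Passenger km"] < 0
suspect_trips = trips[negative_mask]                       # keep for inspection
clean_trips = trips[~negative_mask].reset_index(drop=True)  # use for the analysis

print(int(negative_mask.sum()), "negative rows set aside")
```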

**Bus Speed**

Bus speed is distance divided by time ($v = \frac{d}{t}$).

Actual Running Time is in seconds, so we divide it by 3600 to convert it to hours, which gives speed in km per hour.

In [7]:
#Calculate Average speed of a bus (distance/time) with a condition to avoid division by zero
wlgt2022_trips['Speed'] = np.where(wlgt2022_trips['Actual Running Time'] != 0,
                                    wlgt2022_trips['Actual In-Service KM'] / (wlgt2022_trips['Actual Running Time'] / 3600),
                                    0)  # Set Speed to 0 when Running Time is 0
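
As a quick check of the unit conversion, the same calculation can be run on a few made-up trips. The toy values below are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up trips: distance in km, running time in seconds.
trips = pd.DataFrame({
    "Actual In-Service KM": [10.0, 15.0, 4.0],
    "Actual Running Time": [1800, 3600, 0],
})

hours = trips["Actual Running Time"] / 3600  # seconds -> hours
trips["Speed"] = np.where(hours != 0, trips["Actual In-Service KM"] / hours, 0)

# 10 km in half an hour is 20 km/h; the zero-duration trip is set to 0.
print(trips)
```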

**Average Occupancy**

Mean occupancy (per kilometre driven) is the ratio of Passenger km to Actual In-Service KM,

where Passenger km is a unit of measurement representing the transport of 1 passenger over 1 km
and Actual In-Service KM is the length of a trip.


In [8]:
wlgt2022_trips['Average Occupancy'] = np.where(wlgt2022_trips['Actual In-Service KM'] != 0,
                                               wlgt2022_trips['Passenger km']/wlgt2022_trips['Actual In-Service KM'],
                                               0) # Set Occupancy to 0 when Distance travelled is 0
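
Setting occupancy to 0 for zero-distance trips pulls any later average downwards. A possible alternative, shown here on made-up values and not used elsewhere in this notebook, is to leave those trips as missing so that pandas skips them when averaging.

```python
import numpy as np
import pandas as pd

# Made-up trips; the third has no in-service distance.
trips = pd.DataFrame({
    "Passenger km": [120.0, 45.0, 0.0],
    "Actual In-Service KM": [10.0, 9.0, 0.0],
})

# Replace zero distances with NaN so the ratio is undefined rather than 0.
distance = trips["Actual In-Service KM"].replace(0, np.nan)
trips["Average Occupancy"] = trips["Passenger km"] / distance

mean_with_zero = trips["Average Occupancy"].fillna(0).mean()  # zero-filled version
mean_skipping_nan = trips["Average Occupancy"].mean()         # NaN rows are ignored
print(mean_with_zero, mean_skipping_nan)
```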

In [9]:
# Now look up the tare weight of each bus
wlgt2022_fleetmaster = pd.read_csv('Wellington Raw Daily Data/fleet_master.csv')
wlgt2022_fleetmaster.info()

FileNotFoundError: [Errno 2] No such file or directory: 'Wellington Raw Daily Data/fleet_master.csv'

# Calculate Emissions

In [None]:
LUBE = 0.265170857776354
PASSENGERWEIGHT = 80

def calcCO2equiv(row):
    # TODO: emissions calculation not written yet
    raise NotImplementedError

**Average car occupancy**

We use the assumption that on average 1.3 passengers travel in each car. To calculate how many Car km it would take to transport the same number of passengers we divide Passenger km by 1.3

**CO_2 emissions**

CO_2 emissions are taken to be 0.265 kg per car km.
To calculate how much CO_2 would be emitted if the bus passengers travelled by private car, we use the following calculation:
$$ \text{CO}_2\ \text{emissions (kg)} = \frac{0.265 \times \text{Passenger km}}{1.3} $$
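
The calculation can be written as a small function. The constants come from the text above, while the function name `car_equivalent_co2_kg` is hypothetical.

```python
CAR_CO2_KG_PER_KM = 0.265  # kg of CO2 per car km, as stated above
AVG_CAR_OCCUPANCY = 1.3    # assumed passengers per car, as stated above

def car_equivalent_co2_kg(passenger_km):
    """CO2 (kg) emitted if the same passenger km were travelled by private car."""
    car_km = passenger_km / AVG_CAR_OCCUPANCY
    return car_km * CAR_CO2_KG_PER_KM

# 1300 passenger km corresponds to about 1000 car km, or about 265 kg of CO2.
print(round(car_equivalent_co2_kg(1300), 2))
```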

**Valuation of CO_2 emissions**

The whole-of-government agreed shadow price of carbon ($ per tonne of CO2 equivalent) emissions, in Table 1, is to be used for calculating the economic impact of carbon for transport activities.  This means applying the **central** price path as the default analysis in the economic evaluation of transport proposals and accompanying this with sensitivity analysis based on the low and high price paths. Quoted from here: https://www.nzta.govt.nz/assets/resources/monetised-benefits-and-costs-manual/Monetised-benefits-and-costs-manual.pdf

Table 1: Shadow Price of Carbon (NZ$2022 per tonne of CO2 equivalent)



| Year  | 2023  | 2024  | 2025  | 2026  | 2027  | 2028  | 2029  | 2030  | 2031  | 2032  | 2033  | 2034  |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Low   | \$59  | \$65  | \$72  | \$78  | \$85  | \$91  | \$98  | \$104 | \$108 | \$112 | \$116 | \$120 |
| Middle| \$87  | \$97  | \$107 | \$116 | \$126 | \$136 | \$146 | \$155 | \$161 | \$167 | \$174 | \$180 |
| High  | \$171 | \$182 | \$193 | \$203 | \$214 | \$219 | \$224 | \$230 | \$235 | \$241 | \$247 | \$253 |


Therefore the shadow cost of CO_2, if the same trip were taken by private car, is calculated as follows
(dividing by 1000 because the price is per tonne):

$$ \text{CO}_2\ \text{shadow cost} = \frac{0.265 \times \text{Passenger km} \times \$\,\text{Middle price for 2023}}{1.3 \times 1000} $$

Note that part of our data is from 2022, so strictly we should use the 2022 price for that part. The manual says:
"Should your analysis require shadow prices for years prior to 2023, email MBCM@nzta.govt.nz."
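
To make the year explicit, the Middle row of Table 1 can be stored as a lookup. The sketch below copies those prices from the table. The names `MIDDLE_PRICE_PATH` and `co2_shadow_cost` are hypothetical, and years outside the table (including 2022) raise an error rather than guessing a price.

```python
# Middle price path from Table 1 (NZ$2022 per tonne of CO2 equivalent).
MIDDLE_PRICE_PATH = {
    2023: 87, 2024: 97, 2025: 107, 2026: 116, 2027: 126, 2028: 136,
    2029: 146, 2030: 155, 2031: 161, 2032: 167, 2033: 174, 2034: 180,
}

def co2_shadow_cost(passenger_km, year):
    """Shadow cost (NZ$) of the CO2 emitted if the trip were made by private car."""
    if year not in MIDDLE_PRICE_PATH:
        raise KeyError(f"No shadow price in Table 1 for {year}")
    co2_tonnes = passenger_km * 0.265 / 1.3 / 1000  # kg -> tonnes
    return co2_tonnes * MIDDLE_PRICE_PATH[year]

# About 0.265 t of CO2 at $87 per tonne is roughly $23.06.
print(round(co2_shadow_cost(1300, 2023), 2))
```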

In [32]:
#you can update shadow price here if you want to use it for a different year
shadow_price2023 = 87

wlgt2022_trips['CO2cost'] = wlgt2022_trips['Passenger km']*0.265*shadow_price2023/(1.3*1000)

In [None]:
# First attempt to create interactive plot

In [64]:
import pandas as pd
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px

app = dash.Dash(__name__)

# Layout of the web app
app.layout = html.Div([
    html.Label('Select Day of Week:'),
    dcc.Dropdown(
        id='day-dropdown',
        options=[{'label': day, 'value': day} for day in wlgt2022_trips['Day of Week'].unique()],
        value=wlgt2022_trips['Day of Week'].unique()[0]
    ),
    
    html.Label('Select Start Time (Sched):'),
    dcc.Dropdown(
        id='time-dropdown',
        options=[{'label': time, 'value': time} for time in wlgt2022_trips['Start Minute (Sched)'].unique()],
        value=wlgt2022_trips['Start Minute (Sched)'].unique()[0]
    ),
    
    dcc.Graph(id='co2-cost-plot')
])

# Callback function to update the plot based on dropdown selections
@app.callback(
    Output('co2-cost-plot', 'figure'),
    [Input('day-dropdown', 'value'), Input('time-dropdown', 'value')]
)
def update_plot(selected_day, selected_time):
    filtered_df = wlgt2022_trips[(wlgt2022_trips['Day of Week'] == selected_day) & (wlgt2022_trips['Start Minute (Sched)'] == selected_time)]
    
    fig = px.scatter(filtered_df, x='Date', y='CO2cost', title=f'CO2 Cost for {selected_day} - {selected_time}')
    
    fig.update_xaxes(title_text='Date')
    fig.update_yaxes(title_text='CO2 Cost')
    
    return fig

if __name__ == '__main__':
    app.run_server(debug=False)


In [90]:
# 2nd attempt, trying to explore what dimensions the plot should have
# check package tkinter, gui, flask app
app = dash.Dash(__name__)

# Layout of the web app
app.layout = html.Div([
    html.Label('Select Days of the Week:'),
    dcc.Checklist(
        id='day-checkboxes',
        options=[{'label': day, 'value': day} for day in wlgt2022_trips['Day of Week'].unique()],
        value=wlgt2022_trips['Day of Week'].unique()  # All days initially selected
    ),
    
    html.Label('Select Start Time (Sched):'),
    dcc.Dropdown(
        id='time-dropdown',
        options=[{'label': time, 'value': time} for time in wlgt2022_trips['Start Minute (Sched)'].unique()],
        value=wlgt2022_trips['Start Minute (Sched)'].unique()[0]
    ),
    
    html.Div(id='total-co2-cost'),  # Display the total CO2 cost
    dcc.Graph(id='co2-cost-plot')
])

# Callback function to update the plot based on checkbox selections and time dropdown
@app.callback(
    [Output('co2-cost-plot', 'figure'), Output('total-co2-cost', 'children')],
    [Input('day-checkboxes', 'value'), Input('time-dropdown', 'value')]
)
def update_plot(selected_days, selected_time):
    filtered_df = wlgt2022_trips[(wlgt2022_trips['Day of Week'].isin(selected_days)) & (wlgt2022_trips['Start Minute (Sched)'] == selected_time)]
    
    total_co2_cost = filtered_df['CO2cost'].sum()  # Calculate the total CO2 cost
    
    fig = px.scatter(filtered_df, x='Day of Week', y='CO2cost', title=f'CO2 Cost for Days of the Week - {selected_time}')
    
    fig.update_xaxes(title_text='Day of Week')
    fig.update_yaxes(title_text='CO2 Cost')
    
    return fig, f'Total CO2 Cost: {total_co2_cost:.2f}'

if __name__ == '__main__':
    app.run_server(debug=False)