# Preprocessing Wellington Data

**This dataset covers weekdays only. Our focus is the reduction in carbon emissions from buses that are run primarily to reduce emissions, rather than buses run primarily for passenger transport.**

**Weekend buses are used mainly for passenger transport. Weekday buses carry passengers too, but they also serve a second role: they are deployed strategically to reduce carbon emissions, particularly during the peak hours of 7am-9am and 4pm-6pm.**

### NetBI

The following dimensions were used to generate the datasets:

X:
- Actual Running Time
- Actual In-Service KM
- Passenger km
- Scheduled In-Service km
- Cancelled Trips
- Sched Running Time per Trip

Y:
- Date
- Route
- Route Variant
- Direction
- Trip Number
- Actual Vehicle Type
- Vehicle Number
- Vehicle Emissions Standard
- Start Minute (Sched)
- Day

### Goals

- Split running time into hourly intervals
- Calculate the average speed of each bus (distance / time)
- Calculate average occupancy (passenger km / actual km)
- Calculate carbon emissions for routes
- Calculate the per-person reduction in carbon emissions
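The first three goals can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation. It assumes `Actual Running Time` is recorded in seconds (the sample values sit close to the scheduled times, but the unit is not confirmed here), and the names `avg_speed_kmh`, `avg_occupancy`, and `split_by_hour` are hypothetical. The carbon goals are left out because they depend on emission factors that are not given in this section.

```python
import numpy as np
import pandas as pd

# Two rows copied from the sample output; units are assumptions.
trips = pd.DataFrame({
    "Start Minute (Sched)": ["07:45", "08:15"],
    "Actual Running Time": [3134, 3094],        # assumed to be seconds
    "Actual In-Service KM": [21.239, 21.239],
    "Passenger km": [42.381, 127.285],
})

# Treat zero time or distance as missing so the ratios never divide by zero.
hours = trips["Actual Running Time"].replace(0, np.nan) / 3600
km = trips["Actual In-Service KM"].replace(0, np.nan)

trips["avg_speed_kmh"] = trips["Actual In-Service KM"] / hours
trips["avg_occupancy"] = trips["Passenger km"] / km


def split_by_hour(start_hhmm, running_seconds):
    """Return {hour_of_day: seconds} for a trip starting at start_hhmm."""
    h, m = map(int, start_hhmm.split(":"))
    current = h * 3600 + m * 60                 # seconds since midnight
    remaining = int(running_seconds)
    parts = {}
    while remaining > 0:
        to_next_hour = 3600 - current % 3600    # seconds left in this hour
        chunk = min(remaining, to_next_hour)
        hour = (current // 3600) % 24           # wrap past midnight
        parts[hour] = parts.get(hour, 0) + chunk
        current += chunk
        remaining -= chunk
    return parts


hourly = split_by_hour("07:45", 3134)
```

Here a trip scheduled at 07:45 that runs for 3134 seconds is attributed partly to hour 7 and partly to hour 8. Using the scheduled start as the trip's start time is itself an assumption, since the data shown does not include an actual departure time.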

### Further Goals

- Create a heat map
- Learn how to upload data to MongoDB
- Integrate the heat map into Bean

### Loading Data

import pandas as pd

jan1_3 = pd.read_csv("Wellington Raw Daily Data/Jan 1-3 2022.csv")
jan4_6 = pd.read_csv("Wellington Raw Daily Data/Jan 4-6 2022.csv")
jan7_9 = pd.read_csv("Wellington Raw Daily Data/Jan 7-9 2022.csv")

dataframes = [jan1_3, jan4_6, jan7_9]

# The bottom row of each data frame is a totals row, so it must be removed.
def drop_last_row(df):
    return df.drop(df.tail(1).index)

dataframes = list(map(drop_last_row, dataframes))

# Stacking Dataframes
combined_df = pd.concat(dataframes, ignore_index=True)

combined_df

| | Date | Route | Route Variant | Direction | Trip Number | Actual Vehicle Type | Vehicle Number | Vehicle Emissions Standard | Start Minute (Sched) | Actual Running Time | Actual In-Service KM | Passenger km | Scheduled In-Service km | Cancelled Trips | Sched Running Time per Trip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-Jan-2022 | 1 | 1_1 | Inbound | 4000.0 | ELVDD | 3705 | ELECTRIC | 06:45 | 931 | 21.239 | 120.695 | 21.239 | 0 | 3720 |
| 1 | 01-Jan-2022 | 1 | 1_1 | Inbound | 4060.0 | ELVDD | 3704 | ELECTRIC | 07:45 | 3134 | 21.239 | 42.381 | 21.239 | 0 | 3720 |
| 2 | 01-Jan-2022 | 1 | 1_1 | Inbound | 4100.0 | LV | 3412 | EURO6 | 08:15 | 3094 | 21.239 | 127.285 | 21.239 | 0 | 3720 |
| 3 | 01-Jan-2022 | 1 | 1_1 | Inbound | 4140.0 | LV | 3401 | EURO6 | 08:45 | 3212 | 21.239 | 69.496 | 21.239 | 0 | 4080 |
| 4 | 01-Jan-2022 | 1 | 1_1 | Inbound | 4180.0 | ELVDD | 3705 | ELECTRIC | 09:15 | 948 | 21.239 | 73.054 | 21.239 | 0 | 4080 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23410 | 09-Jan-2022 | WHF | WHF_8 | Outbound | 1048.0 | Unknown | Unknown | Unknown | 17:05 | 0 | 0.000 | 0.000 | 21.804 | 0 | 3000 |
| 23411 | 09-Jan-2022 | WRL | WRL_1 | Inbound | 1613.0 | Unknown | 4398 | Unknown | 07:45 | 3174 | 90.803 | 0.000 | 90.803 | 0 | 6000 |
| 23412 | 09-Jan-2022 | WRL | WRL_1 | Inbound | 1615.0 | Unknown | 4398 | Unknown | 16:45 | 3357 | 90.803 | 0.000 | 90.803 | 0 | 6000 |
| 23413 | 09-Jan-2022 | WRL | WRL_2 | Outbound | 1614.0 | Unknown | 4398 | Unknown | 09:55 | 4536 | 90.805 | 0.000 | 90.805 | 0 | 6000 |
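
The combined output still contains 01-Jan-2022 and 09-Jan-2022, which fall on a Saturday and a Sunday, so the weekday-only scope described at the top would need an explicit filter. The sketch below shows one possible way to do that and to flag the peak windows. It is an illustration under stated assumptions: the sample rows are made up for the example (only their column names and value formats come from the output above), `is_weekday` and `is_peak` are hypothetical column names, and the peak windows are read as half-open intervals, so a 09:00 start is not counted as peak.

```python
import pandas as pd

# Illustrative rows in the same format as the combined data frame.
sample = pd.DataFrame({
    "Date": ["01-Jan-2022", "04-Jan-2022", "04-Jan-2022", "09-Jan-2022"],
    "Start Minute (Sched)": ["06:45", "07:45", "12:00", "16:45"],
})

# Parse dates such as "01-Jan-2022"; dayofweek is Monday=0 ... Sunday=6.
dates = pd.to_datetime(sample["Date"], format="%d-%b-%Y")
sample["is_weekday"] = dates.dt.dayofweek < 5

# Peak windows 7am-9am and 4pm-6pm, taken as start hours 7, 8, 16 and 17.
start_hour = sample["Start Minute (Sched)"].str.slice(0, 2).astype(int)
sample["is_peak"] = start_hour.isin([7, 8, 16, 17])

# Keep weekday trips only.
weekday_trips = sample[sample["is_weekday"]]
```

The same two steps could be applied to `combined_df` directly, provided its `Date` and `Start Minute (Sched)` columns contain no missing or malformed values.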
