# Preprocessing Wellington Data

**This dataset contains the raw data required to calculate carbon emissions. Our focus is on the decrease in carbon emissions for buses run primarily to reduce carbon emissions, rather than buses run primarily for passenger transportation.**

**Weekend buses are predominantly used for passenger transportation. Weekday buses also carry passengers, but they take on an additional role: they are strategically used for carbon reduction, particularly during peak hours such as 7am-9am and 4pm-6pm.**
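The weekday peak-hour rule described above can be sketched as a boolean mask over the trips. This is only an illustration under two assumptions that the source does not confirm: that `Start Minute(Sched)` holds minutes after midnight, and that `Day` holds English day names such as `Monday`. The helper name `is_weekday_peak` and the constants are hypothetical.

```python
import pandas as pd

# Peak windows from the text: 7am-9am and 4pm-6pm, as minutes after midnight.
PEAK_WINDOWS = [(7 * 60, 9 * 60), (16 * 60, 18 * 60)]
# Assumption: the "Day" column stores English day names.
WEEKEND = {"Saturday", "Sunday"}


def is_weekday_peak(df, minute_col="Start Minute(Sched)", day_col="Day"):
    """Return a boolean Series that is True for weekday trips starting in a peak window."""
    minutes = df[minute_col]
    in_peak = pd.Series(False, index=df.index)
    for start, end in PEAK_WINDOWS:
        # Window is start-inclusive, end-exclusive.
        in_peak |= (minutes >= start) & (minutes < end)
    is_weekday = ~df[day_col].isin(WEEKEND)
    return in_peak & is_weekday


# Small made-up sample, not taken from the Wellington data.
sample_trips = pd.DataFrame({
    "Start Minute(Sched)": [450, 600, 1000, 450],
    "Day": ["Monday", "Monday", "Friday", "Saturday"],
})
peak_mask = is_weekday_peak(sample_trips)
```

If the real columns use a different encoding (for example a day number, or a time string), only the two comparisons inside the helper would need to change.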

### NetBI

The following dimensions were used to generate the datasets:

X:
- Actual Running Time
- Actual In-Service KM
- Passenger Km
- Scheduled In-Service Km
- Cancelled Trips
- Sched Running Time per Trip

Y:
- Date
- Route
- Route Variant
- Direction
- Trip Number
- Actual Vehicle Type
- Vehicle Number
- Vehicle Emissions Standard
- Start Minute(Sched)
- Day

### Goals

- Split running time into hourly run-time intervals
- Calculate the average speed of a bus (distance / time)
- Calculate the average occupancy (passenger km / actual km)
- Calculate carbon emissions for routes
- Calculate the per-person carbon emissions reduction
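The two ratio goals above (average speed and average occupancy) can be sketched directly from the NetBI measures. This is a minimal sketch, not the notebook's own implementation: it assumes `Actual Running Time` is recorded in minutes, which the source does not state, and the function name and output column names are hypothetical. Rows with zero time or zero distance get `NaN` instead of a division by zero.

```python
import pandas as pd


def add_speed_and_occupancy(
    df,
    time_col="Actual Running Time",
    km_col="Actual In-Service KM",
    pax_km_col="Passenger Km",
):
    """Return a copy of df with average speed and average occupancy columns added."""
    out = df.copy()
    # Assumption: running time is in minutes, so divide by 60 to get hours.
    hours = out[time_col] / 60.0
    # .where() keeps positive values and turns the rest into NaN,
    # so zero-time or zero-distance trips yield NaN.
    out["Average Speed (km/h)"] = out[km_col] / hours.where(hours > 0)
    out["Average Occupancy"] = out[pax_km_col] / out[km_col].where(out[km_col] > 0)
    return out


# Small made-up sample, not taken from the Wellington data.
sample = pd.DataFrame({
    "Actual Running Time": [30.0, 60.0, 0.0],
    "Actual In-Service KM": [10.0, 20.0, 0.0],
    "Passenger Km": [50.0, 100.0, 0.0],
})
result = add_speed_and_occupancy(sample)
```

Average occupancy here is passenger km divided by in-service km, which gives the mean number of passengers on board over the trip.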

### Further Goals

- Create Heat Map
- Learn how to upload data to MongoDB
- Integrate heat map into Bean

### Loading Data

In [35]:
import pandas as pd

# Load and process DataFrames
jan1_3 = pd.read_csv("Wellington Raw Daily Data/Jan 1-3 2022.csv")
jan4_6 = pd.read_csv("Wellington Raw Daily Data/Jan 4-6 2022.csv")
jan7_9 = pd.read_csv("Wellington Raw Daily Data/Jan 7-9 2022.csv")
jan10_11 = pd.read_csv("Wellington Raw Daily Data/Jan 10-11 2022.csv")
jan12_13 = pd.read_csv("Wellington Raw Daily Data/Jan 12-13 2022.csv")
jan14_15 = pd.read_csv("Wellington Raw Daily Data/Jan 14-15 2022.csv")
jan16_17 = pd.read_csv("Wellington Raw Daily Data/Jan 16-17 2022.csv")
jan18_19 = pd.read_csv("Wellington Raw Daily Data/Jan 18-19 2022.csv")
jan20_21 = pd.read_csv("Wellington Raw Daily Data/Jan 20-21 2022.csv")
jan22_23 = pd.read_csv("Wellington Raw Daily Data/Jan 22-23 2022.csv")
jan24_25 = pd.read_csv("Wellington Raw Daily Data/Jan 24-25 2022.csv")
jan26_27 = pd.read_csv("Wellington Raw Daily Data/Jan 26-27 2022.csv")
jan28_29 = pd.read_csv("Wellington Raw Daily Data/Jan 28-29 2022.csv")
jan30_31 = pd.read_csv("Wellington Raw Daily Data/Jan 30-31 2022.csv")

# List of DataFrames
dataframes = [
    jan1_3, jan4_6, jan7_9, jan10_11, jan12_13,
    jan14_15, jan16_17, jan18_19, jan20_21, jan22_23,
    jan24_25, jan26_27, jan28_29, jan30_31
]

# Drop the bottom row of each DataFrame: it is a totals row, not a trip.
def drop_last_row(df):
    return df.drop(df.tail(1).index)

dataframes = list(map(drop_last_row, dataframes))

# Stacking DataFrames
combined_df = pd.concat(dataframes, ignore_index=True)

# Write the combined DataFrame to CSV (the index is written as the first column)
combined_df.to_csv("Wellington Raw Monthly Data/January 2022.csv")

print(combined_df)


               Date Route Route Variant Direction  Trip Number  \
0       01-Jan-2022     1           1_1   Inbound       4000.0   
1       01-Jan-2022     1           1_1   Inbound       4060.0   
2       01-Jan-2022     1           1_1   Inbound       4100.0   
3       01-Jan-2022     1           1_1   Inbound       4140.0   
4       01-Jan-2022     1           1_1   Inbound       4180.0   
...             ...   ...           ...       ...          ...   
100719  31-Jan-2022   WRL         WRL_2  Outbound       1610.0   
100720  31-Jan-2022   WRL         WRL_3   Inbound       1607.0   
100721  31-Jan-2022   WRL         WRL_3   Inbound       1609.0   
100722  31-Jan-2022   WRL         WRL_4  Outbound       1602.0   
100723  31-Jan-2022   WRL         WRL_4  Outbound       1604.0   

       Actual Vehicle Type Vehicle Number Vehicle Emissions Standard  \
0                    ELVDD           3705                   ELECTRIC   
1                    ELVDD           3704                   ELE