
# Description of Notebook

This notebook is for the following:

*   calculating distances by train between adjacent cities in each Amtrak route (e.g., A-B, B-C, C-D)
*   calculating carbon emissions based on these distances
*   creating a dataset that contains each adjacent city pair in an Amtrak route (e.g., A-B, B-C, C-D), along with the corresponding train distance between the cities and the carbon emissions
*   saving the trains dataset as a CSV to store on GitHub
*   calculating distances by car between every combination of two cities in each Amtrak route (e.g., A-B, A-C, A-D)
*   calculating carbon emissions based on these distances
*   creating a dataset that contains each city pair in an Amtrak route, along with the corresponding driving distance between the cities and the carbon emissions
*   storing the cars dataset in a cloud-based MongoDB database













In [None]:
reset -fs

In [None]:
!pip install googlemaps

In [None]:
import pandas as pd
import os
import pickle
import googlemaps
import time
import requests
import json
from itertools import combinations
import itertools

In [None]:
# mount Google Drive
from google.colab import drive 
from os.path import join
ROOT = "/content/drive"    
print(ROOT)                 

drive.mount(ROOT)           

/content/drive
Mounted at /content/drive


In [None]:
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

In [None]:
os.getcwd()

'/content'

In [None]:
os.chdir('/content/drive/MyDrive/Data Science Metis/Engineering/Project')

# Trains

Read in train station spreadsheets


*   spreadsheet of route names and stations within each route (in order)
*   spreadsheet of geocoded stations



In [None]:
# routes and stations

worksheet = gc.open('Amtrak Routes and Train Stations').sheet1
rows = worksheet.get_all_values()
routes_stations = pd.DataFrame.from_records(rows[1:])

routes_stations.columns = ['route','station_name']

In [None]:
routes_stations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1022 entries, 0 to 1021
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   route         1022 non-null   object
 1   station_name  1022 non-null   object
dtypes: object(2)
memory usage: 16.1+ KB


In [None]:
# geocoded stations

with open('stations_locations_geocoded.pkl','rb') as fid:
  stations_locations = pickle.load(fid)

Clean up geocoded stations dataframe

In [None]:
# convert latitude and longitude columns to numeric

stations_locations[['latitude', 'longitude']] = stations_locations[['latitude', 'longitude']].apply(pd.to_numeric)

In [None]:
# combine latitude and longitude into a single (latitude, longitude) tuple column

stations_locations['coordinates'] = stations_locations[['latitude', 'longitude']].apply(tuple, axis=1)

In [None]:
# create combined city_state column

stations_locations['city_state'] = stations_locations['city'] + ', ' + stations_locations['state_rev']

Clean up routes and stations dataframe in preparation for merge with geocoded stations dataframe

In [None]:
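# create station code column: each station_name ends with " (CODE)", e.g. "Boston, MA - South Station (BOS)"

def create_code_col(x):
  code = x[-5:]      # last five characters, e.g. "(BOS)"
  code = code[1:-1]  # strip the parentheses, e.g. "BOS"
  return code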

routes_stations['code'] = routes_stations['station_name'].map(create_code_col)

In [None]:
# remove code from station_name column

routes_stations['station_name'] = routes_stations['station_name'].map(lambda x: x[:-6])
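The two slicing steps above assume every raw `station_name` ends with a space plus a three-letter code in parentheses (inferred from `x[-5:]` and `x[:-6]`). A minimal sketch of a regex-based alternative that leaves a name untouched, instead of silently cutting characters, when that suffix is missing. The helper `split_station_code` and the uppercase three-letter pattern are assumptions for illustration.

```python
import re

# hypothetical pattern: a trailing " (ABC)" suffix made of three uppercase letters
CODE_PATTERN = re.compile(r"^(?P<name>.*?)\s*\((?P<code>[A-Z]{3})\)\s*$")

def split_station_code(raw):
    """Split 'Name (ABC)' into ('Name', 'ABC'); return (raw, None) when no code is found."""
    match = CODE_PATTERN.match(raw)
    if match is None:
        return raw, None
    return match.group("name"), match.group("code")

print(split_station_code("Boston, MA - South Station (BOS)"))
print(split_station_code("Route 128, MA (RTE)"))
```

Applied with `routes_stations['station_name'].map(split_station_code)`, it would yield `(name, code)` tuples, and rows with a `None` code would point at stations whose names don't follow the assumed format.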

In [None]:
routes_stations.head()

|   | route | station_name | code |
|---|---|---|---|
| 0 | Acela - Boston - DC | Boston, MA - South Station | BOS |
| 1 | Acela - Boston - DC | Boston, MA - Back Bay Station | BBY |
| 2 | Acela - Boston - DC | Route 128, MA | RTE |
| 3 | Acela - Boston - DC | Providence, RI - Amtrak/MBTA Station | PVD |
| 4 | Acela - Boston - DC | New Haven, CT - Union Station | NHV |


Merge routes and stations dataframe with geocoded stations dataframe

In [None]:
routes_stations_locations = routes_stations.merge(stations_locations, how='left', on='code',suffixes=('_route', '_location'))

In [None]:
routes_stations_locations.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1022 entries, 0 to 1021
Data columns (total 22 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   route                  1022 non-null   object 
 1   station_name_route     1022 non-null   object 
 2   code                   1022 non-null   object 
 3   x                      1022 non-null   object 
 4   y                      1022 non-null   object 
 5   objectid_1             1022 non-null   object 
 6   objectid               1022 non-null   object 
 7   station_descripton     1022 non-null   object 
 8   bus_or_train           1022 non-null   object 
 9   zip_code               1022 non-null   object 
 10  state                  1022 non-null   object 
 11  city                   1022 non-null   object 
 12  address_2              1022 non-null   object 
 13  address_1              1022 non-null   object 
 14  name                   1022 non-null   object 
 15  stat

Clean up merged routes and station locations dataframe

In [None]:
# create a column with coordinates in the opposite order, as a [longitude, latitude] list

def coordinates_rev(x):
  lat, long = x
  return [long, lat]

routes_stations_locations['coordinates_rev'] = routes_stations_locations['coordinates'].map(coordinates_rev)

In [None]:
# since I only really care about stations at the city level, I'll remove duplicate cities per route

print(sum(routes_stations_locations.duplicated(subset=['station_name_location', 'route'])))

routes_cities_locations = routes_stations_locations.drop_duplicates(subset=['station_name_location', 'route'])

24


Create a new dataframe that contains each route, the cities in that route, and each city's coordinates (taken from a station in that city)

In [None]:
routes_locations = (routes_cities_locations.groupby('route', as_index=False)
               .agg({'station_name_location': list, 'coordinates_rev': list})
               .rename(columns={'station_name_location': 'locations_per_route'}))
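For reference, the same group-into-lists step can be written with pandas named aggregation, which sets the output column names directly so no `rename` is needed. A sketch on a toy frame; the values are made up for illustration.

```python
import pandas as pd

# toy stand-in for routes_cities_locations (illustrative values only)
toy = pd.DataFrame({
    "route": ["R1", "R1", "R2"],
    "station_name_location": ["Boston, MA", "Providence, RI", "Chicago, IL"],
    "coordinates_rev": [[-71.06, 42.35], [-71.41, 41.83], [-87.64, 41.88]],
})

# named aggregation: new_column=(source_column, aggregation_function);
# `list` collects each group's values in their original row order
toy_routes = toy.groupby("route", as_index=False).agg(
    locations_per_route=("station_name_location", list),
    coordinates_rev=("coordinates_rev", list),
)
print(toy_routes)
```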

In [None]:
# pickle routes_locations for use later

with open('routes_locations_df.pkl', 'wb') as fid:
     pickle.dump(routes_locations, fid)

Prepare for calculating distances by train between cities in each route

In [None]:
# get number of routes
routes_cities_locations.route.nunique()

46

In [None]:
# create list of all route names
routes = list(routes_cities_locations.route.unique())
len(routes)

46

In [None]:
# create a dictionary where key = route and values = tuples of cities next to each other in the route plus the first and last 
  # cities in a route paired with themselves (this will make indexing and calculating down columns easier, which will come up in the app)

route_combos_dict = {}

for route in routes:
  route_df = routes_cities_locations.loc[routes_cities_locations.route == route]
  route_combos = []
  last_idx = len(route_df)-1
  for i in range(0, len(route_df)):
    if i == 0:
      x = route_df.code[i:i+1].values[0]
      combo = (x,x)
      route_combos.append(combo)
      combo = tuple(route_df.code[i:i + 2])
      route_combos.append(combo)
    elif i == last_idx:
      y = route_df.code[i:i+1].values[0]
      combo = (y,y)
      route_combos.append(combo)
    else:
      combo = tuple(route_df.code[i:i + 2])
      route_combos.append(combo)
  route_combos_dict[route] = route_combos
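The loop above can be expressed as a small pure function over a route's ordered station codes. A sketch (the helper name `route_pairs` is hypothetical) that produces the same pairs for routes with two or more stations without positional slicing on the Series; it would be called per route as `route_pairs(list(route_df.code))`.

```python
def route_pairs(codes):
    """First station paired with itself, then each adjacent pair, then the last station with itself."""
    if not codes:
        return []
    adjacent = list(zip(codes, codes[1:]))  # (A, B), (B, C), ...
    return [(codes[0], codes[0])] + adjacent + [(codes[-1], codes[-1])]

print(route_pairs(["BOS", "RTE", "PVD"]))
```

A route with n stations yields n + 1 pairs: n - 1 adjacent pairs plus the two self-pairs.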

Calculate distances between cities within a route using Google's Distance Matrix API

In [None]:
# create a list of lists, in which each nested list contains one city/station pair: the station codes,
  # location names, coordinates of each station, and the distance by train
# note: transit_mode is only honored when mode='transit'; without it, the API returns driving distances

API_key = 'my_key'
gmaps = googlemaps.Client(key=API_key)

routes_combos_list = []

for key,value in route_combos_dict.items():
    for i in range(len(value)):
      city_1_coords = routes_cities_locations.loc[routes_cities_locations.code == value[i][0]]['coordinates'].values[0]
      city_2_coords = routes_cities_locations.loc[routes_cities_locations.code == value[i][1]]['coordinates'].values[0]

      city_1_name = routes_cities_locations.loc[routes_cities_locations.code == value[i][0]]['station_name_location'].values[0]
      city_2_name = routes_cities_locations.loc[routes_cities_locations.code == value[i][1]]['station_name_location'].values[0]

      distance = gmaps.distance_matrix(city_1_coords, city_2_coords, mode='transit', transit_mode='train')["rows"][0]["elements"][0]["distance"]["value"]
      routes_combos_list.append([value[i][0], city_1_name, city_1_coords, value[i][1], city_2_name, city_2_coords, distance])
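In Distance Matrix responses, each element carries its own `status`, and `distance` is present only when that status is `"OK"` (other values include `"ZERO_RESULTS"` and `"NOT_FOUND"`), so the chained indexing above raises a `KeyError` for any pair without a route. A minimal sketch of a defensive parser; the helper name is hypothetical and the sample responses are hand-written, trimmed to the fields the function reads.

```python
def element_distance_m(response):
    """Distance in meters from a one-origin, one-destination Distance Matrix response,
    or None when the element's status is not "OK"."""
    element = response["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        return None
    return element["distance"]["value"]

# hand-written sample responses, trimmed to the fields read above
ok_response = {"status": "OK", "rows": [{"elements": [
    {"status": "OK", "distance": {"text": "1.0 km", "value": 1000}}]}]}
no_route_response = {"status": "OK", "rows": [{"elements": [{"status": "ZERO_RESULTS"}]}]}

print(element_distance_m(ok_response), element_distance_m(no_route_response))
```

In the loop this would become `distance = element_distance_m(gmaps.distance_matrix(...))`; rows with `None` should then be dropped or retried before the Climatiq step, since a missing distance can't be sent as a valid JSON number.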

In [None]:
# create a dataframe from the above nested list

cities_combos_df = pd.DataFrame(routes_combos_list, columns = ['city_1_code','city_1_name','city_1_coords','city_2_code','city_2_name',
                                                 'city_2_coords', 'distance_meters'])
cities_combos_df['distance_mi'] = cities_combos_df['distance_meters'] / 1609

In [None]:
# drop meters column

cities_combos_df = cities_combos_df.drop(columns='distance_meters',axis=1)

In [None]:
# add a column to the dataframe containing the route corresponding to each city pair

route_combos_dict_keys = list(route_combos_dict.keys())
route_combos_dict_keys_lengths = [len(v) for v in route_combos_dict.values()]

# the below code came from here: https://stackoverflow.com/questions/48837245/how-to-multiply-a-list-of-strings-by-a-list-of-integers
routes_column = sum([[s] * n for s, n in zip(route_combos_dict_keys, route_combos_dict_keys_lengths)], [])

cities_combos_df['route'] = routes_column
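The `sum(..., [])` idiom builds a new list on every addition, so its cost grows quadratically with the number of routes. With 46 routes this is negligible, but a linear version is just as short. A sketch using `itertools.chain` and `itertools.repeat` (the helper name `route_labels` is hypothetical):

```python
from itertools import chain, repeat

def route_labels(pairs_by_route):
    """Repeat each route name once per city pair, following the dict's insertion order."""
    return list(chain.from_iterable(
        repeat(route, len(pairs)) for route, pairs in pairs_by_route.items()
    ))

print(route_labels({"Route A": [("X", "X"), ("X", "Y")], "Route B": [("Z", "Z")]}))
```

With this helper, `cities_combos_df['route'] = route_labels(route_combos_dict)` would give the same column, because the dataframe rows were appended in the same dict order.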

In [None]:
# pickle cities_combos_df
with open('train_cities_combos_df_gmaps.pkl', 'wb') as fid:
     pickle.dump(cities_combos_df, fid)

Calculate carbon emissions of train travel from the distances calculated above, using calls to the Climatiq API

In [None]:
# Climatiq API info (the API key is sent as a bearer token)

headers = {'Authorization': 'Bearer my_key', 'Content-type': 'application/json'}

In [None]:
# create list of distances 

distances = list(cities_combos_df.distance_mi)

In [None]:
# save results from API calls to a dictionary

responses_dict = {}

for idx, distance in enumerate(distances):
  response_dict = requests.post('https://beta2.api.climatiq.io/estimate',
                           data=json.dumps({"emission_factor": "passenger_train-route_type_intercity-fuel_source_na",
                               "parameters": {"passengers": 1, "distance": distance, "distance_unit": "mi"}}),
                           headers=headers).json()
  responses_dict[idx] = response_dict
  time.sleep(3)
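Each request above is followed by a 3-second pause, so 1,044 requests take at least 52 minutes. Identical distances get identical estimates, and the list contains repeats: every route contributes two zero-length self-pairs, and adjacent pairs shared by several routes would repeat as well. A sketch that calls the API once per distinct distance and maps the results back by row index, in the same `{index: response}` shape as `responses_dict`. `estimate` stands for a hypothetical function wrapping the `requests.post(...).json()` call above.

```python
import time

def estimate_unique(distances, estimate, pause_s=0.0):
    """Call estimate(d) once per distinct distance; return {row_index: response}."""
    cache = {}
    for d in distances:
        if d not in cache:
            cache[d] = estimate(d)
            time.sleep(pause_s)  # pause only after real API calls
    return {idx: cache[d] for idx, d in enumerate(distances)}

# usage with a stand-in estimator instead of the real API call
print(estimate_unique([0.0, 5.0, 0.0], lambda d: {"co2e": d * 2}))
```

With the real wrapper this would be called as `responses_dict = estimate_unique(distances, post_estimate, pause_s=3)`, where `post_estimate` is the hypothetical wrapper.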

In [None]:
# pickle responses_dict

with open('climatiq_train_responses_dict.pkl', 'wb') as fid:
     pickle.dump(responses_dict, fid)

Convert train emissions data to a dataframe and concatenate it with the city combos dataframe (along the columns axis)

In [None]:
climatiq_train_responses_dict_vals = responses_dict.values()
climatiq_df = pd.json_normalize(climatiq_train_responses_dict_vals)
cities_combos_c02 = pd.concat([cities_combos_df, climatiq_df], axis=1)
cities_combos_c02.shape

(1044, 16)

Merge city combos CO2 df with routes locations df

In [None]:
cities_combos_c02 = cities_combos_c02.merge(routes_locations, how='left', on='route')

Merge geocoded stations df with city combos CO2 df to pull in location data for the origin city

In [None]:
cities_combos_c02_locs = cities_combos_c02.merge(stations_locations, how='left', left_on='city_1_code', right_on='code')

In [None]:
cities_combos_c02_locs.head(2)

Unnamed: 0,station_1_code,station_1_name,station_1_coords,station_2_code,station_2_name,station_2_coords,distance_mi,route,co2e,co2e_unit,...,address_1,name,code,station_name,state_rev,full_address,longitude,latitude,coordinates,city_state
0,BOS,"Boston, MA","(42.348695, -71.059861)",BOS,"Boston, MA","(42.348695, -71.059861)",0.0,Acela - Boston - DC,0.0,kg,...,2 South Station,South Station,BOS,"Boston, MA",Massachusetts,"2 South Station, Boston, Massachusetts 2110",-71.059861,42.348695,"(42.348695, -71.059861)","Boston, Massachusetts"
1,BOS,"Boston, MA","(42.348695, -71.059861)",RTE,"Route 128, MA","(42.2111905, -71.148665)",17.250466,Acela - Boston - DC,3.143035,kg,...,2 South Station,South Station,BOS,"Boston, MA",Massachusetts,"2 South Station, Boston, Massachusetts 2110",-71.059861,42.348695,"(42.348695, -71.059861)","Boston, Massachusetts"


Clean up merged cities combos dataframe with origin location data

In [None]:
# rename city_state, latitude and longitude columns so it's clear they correspond to the origin station

cities_combos_c02_locs.rename(columns={'city_state':'origin_location',
                                         'latitude':'origin_lat',
                                         'longitude':'origin_lon'},inplace=True)

In [None]:
# drop columns not needed

cities_combos_c02_locs = cities_combos_c02_locs.drop(columns=['co2e_unit','id','source','year','region','category','lca_activity',
                                                              'code','x','y','objectid_1','objectid','station_descripton','bus_or_train',
                                                              'address_2','name','zip_code','state','city','address_1','station_name',
                                                              'state_rev','full_address'])

Merge stations locations df with cities combos co2 df to pull in location data for destination city

In [None]:
cities_combos_c02_locs = cities_combos_c02_locs.merge(stations_locations, how='left', left_on='city_2_code', right_on='code')
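The cleanup cell that follows assigns every column name by position, which depends on the exact column order both merges produce. A sketch of an alternative that selects the needed station columns and prefixes them before each merge, so origin and destination columns arrive already named and no `_x`/`_y` suffixes appear. Toy frames stand in for the real dataframes; the column names mirror the ones in this notebook.

```python
import pandas as pd

# toy stand-ins (illustrative values only)
pairs = pd.DataFrame({"city_1_code": ["BOS", "BOS"], "city_2_code": ["BOS", "RTE"],
                      "distance_mi": [0.0, 17.25]})
stations = pd.DataFrame({
    "code": ["BOS", "RTE"],
    "city_state": ["Boston, Massachusetts", "Westwood, Massachusetts"],
    "latitude": [42.348695, 42.2111905],
    "longitude": [-71.059861, -71.148665],
})

keep = ["code", "city_state", "latitude", "longitude"]
origin = stations[keep].add_prefix("origin_")  # origin_code, origin_city_state, ...
dest = stations[keep].add_prefix("dest_")      # dest_code, dest_city_state, ...

# each merge brings in uniquely named columns, so nothing needs renaming by position
merged = (pairs
          .merge(origin, how="left", left_on="city_1_code", right_on="origin_code")
          .merge(dest, how="left", left_on="city_2_code", right_on="dest_code"))
print(merged[["origin_city_state", "dest_city_state", "distance_mi"]])
```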

In [None]:
cities_combos_c02_locs.head(2)

Unnamed: 0,station_1_code,station_1_name,station_1_coords,station_2_code,station_2_name,station_2_coords,distance_mi,route,co2e,locations_per_route,...,address_1,name,code,station_name,state_rev,full_address,longitude,latitude,coordinates_y,city_state
0,BOS,"Boston, MA","(42.348695, -71.059861)",BOS,"Boston, MA","(42.348695, -71.059861)",0.0,Acela - Boston - DC,0.0,"[Boston, MA, Route 128, MA, Providence, RI, Ne...",...,2 South Station,South Station,BOS,"Boston, MA",Massachusetts,"2 South Station, Boston, Massachusetts 2110",-71.059861,42.348695,"(42.348695, -71.059861)","Boston, Massachusetts"
1,BOS,"Boston, MA","(42.348695, -71.059861)",RTE,"Route 128, MA","(42.2111905, -71.148665)",17.250466,Acela - Boston - DC,3.143035,"[Boston, MA, Route 128, MA, Providence, RI, Ne...",...,50 University Avenue,,RTE,"Route 128, MA",Massachusetts,"50 University Avenue, Westwood, Massachusetts ...",-71.148665,42.211191,"(42.2111905, -71.148665)","Westwood, Massachusetts"


Clean up merged dataframe

In [None]:
# rename, drop and add columns

cities_combos_c02_locs.columns = ['origin_code','origin_name','origin_coords','dest_code','dest_name','dest_coords','distance_mi',
                                    'route','co2e_kg','route_locations','coordinates_rev','origin_lon','origin_lat','coordinates_x','origin_location',
                                  'x','y','objectid_1','objectid','station_descripton','bus_or_train','zip_code','state','city','address_2','address_1',
                                  'name','code','station_name','state_rev','full_address','longitude','latitude','coordinates_y','dest_location']

cities_combos_c02_locs = cities_combos_c02_locs.drop(columns=['coordinates_rev','coordinates_x','x','y','objectid_1','objectid','station_descripton',
                                                              'bus_or_train','zip_code','state','city','address_2','address_1','name','code',
                                                              'station_name','state_rev','full_address','longitude','latitude','coordinates_y',],axis=1)                       

cities_combos_c02_locs['co2e_kg_round'] = cities_combos_c02_locs['co2e_kg'].map(lambda x: int(x))

cities_combos_c02_locs['co2e_lb'] = cities_combos_c02_locs['co2e_kg'].map(lambda x: int(x*2.2))
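`int()` truncates toward zero rather than rounding, so for these non-negative values `co2e_kg_round` is effectively a floor, and 2.2 approximates the kg-to-lb factor (about 2.20462). For example, the Boston to Route 128 train row (3.143035 kg) gets `co2e_lb` = 6, although it is about 6.93 lb. If nearest-integer values are wanted, a sketch using Python's built-in `round()` and the exact pound definition (1 lb = 0.45359237 kg):

```python
KG_PER_LB = 0.45359237  # exact, by definition of the international pound

def kg_summary(co2e_kg):
    """Return (kg rounded to nearest integer, lb rounded to nearest integer)."""
    return round(co2e_kg), round(co2e_kg / KG_PER_LB)

print(int(3.143035 * 2.2), kg_summary(3.143035))
```

In pandas the vectorized equivalent would be `cities_combos_c02_locs['co2e_kg'].round().astype(int)`. Note that both Python's `round()` and pandas' `round()` send exact halves to the nearest even integer.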

In [None]:
# reorder columns

cities_combos_c02_locs = cities_combos_c02_locs[['origin_code','origin_name','origin_coords','origin_location','origin_lat','origin_lon',
                                                 'dest_code','dest_name','dest_coords','dest_location','route','route_locations',
                                                 'distance_mi','co2e_kg','co2e_kg_round','co2e_lb']]

In [None]:
cities_combos_c02_locs.head(2)

|   | origin_code | origin_name | origin_coords | origin_location | origin_lat | origin_lon | dest_code | dest_name | dest_coords | dest_location | route | route_locations | distance_mi | co2e_kg | co2e_kg_round | co2e_lb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BOS | Boston, MA | (42.348695, -71.059861) | Boston, Massachusetts | 42.348695 | -71.059861 | BOS | Boston, MA | (42.348695, -71.059861) | Boston, Massachusetts | Acela - Boston - DC | [Boston, MA, Route 128, MA, Providence, RI, Ne... | 0.0 | 0.0 | 0 | 0 |
| 1 | BOS | Boston, MA | (42.348695, -71.059861) | Boston, Massachusetts | 42.348695 | -71.059861 | RTE | Route 128, MA | (42.2111905, -71.148665) | Westwood, Massachusetts | Acela - Boston - DC | [Boston, MA, Route 128, MA, Providence, RI, Ne... | 17.250466 | 3.143035 | 3 | 6 |


Export the above dataframe to upload to GitHub for the Streamlit app

In [None]:
from google.colab import files
cities_combos_c02_locs.to_csv('train_emissions_46.csv', encoding = 'utf-8-sig', index=False) 
files.download('train_emissions_46.csv')


# Cars

Create a dictionary where key = route and values = tuples of all combinations of two cities within each Amtrak route

In [None]:
cars_route_combos_dict = {}

for route in routes:
  route_df = routes_cities_locations.loc[routes_cities_locations.route == route]
  route_combos = list(combinations(route_df.code, 2))
  cars_route_combos_dict[route] = route_combos
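`combinations(codes, 2)` yields each unordered pair exactly once, in station order (A-B, A-C, ..., never B-A). A route with n cities therefore contributes n(n-1)/2 car pairs instead of the n + 1 train pairs, which is why the car dataset (13,319 rows) is much larger than the train one (1,044 rows). A quick check on a toy route:

```python
from itertools import combinations
from math import comb

codes = ["BOS", "RTE", "PVD", "NHV"]  # toy route, in travel order
pairs = list(combinations(codes, 2))

print(pairs[:3])  # [('BOS', 'RTE'), ('BOS', 'PVD'), ('BOS', 'NHV')]
print(len(pairs), comb(len(codes), 2))  # 6 6
```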

In [None]:
# create a list of lists, in which each nested list contains each city/station combo, including the locations, station codes,
  # coordinates of each station, and the distance by car (driving mode)

API_key = 'my_key'
gmaps = googlemaps.Client(key=API_key)

cars_routes_combos_list = []

for key,value in cars_route_combos_dict.items():
    for i in range(len(value)):
      city_1_coords = routes_cities_locations.loc[routes_cities_locations.code == value[i][0]]['coordinates'].values[0]
      city_2_coords = routes_cities_locations.loc[routes_cities_locations.code == value[i][1]]['coordinates'].values[0]

      city_1_name = routes_cities_locations.loc[routes_cities_locations.code == value[i][0]]['station_name_location'].values[0]
      city_2_name = routes_cities_locations.loc[routes_cities_locations.code == value[i][1]]['station_name_location'].values[0]

      distance = gmaps.distance_matrix(city_1_coords, city_2_coords,mode='driving')["rows"][0]["elements"][0]["distance"]["value"]
      cars_routes_combos_list.append([value[i][0], city_1_name, city_1_coords, value[i][1], city_2_name, city_2_coords, distance])

In [None]:
# create a dataframe from the above nested list

cities_combos_df_cars = pd.DataFrame(cars_routes_combos_list, columns = ['city_1_code','city_1_name','city_1_coords','city_2_code','city_2_name',
                                                 'city_2_coords', 'distance_meters'])
cities_combos_df_cars['distance_mi'] = cities_combos_df_cars['distance_meters'] / 1609

In [None]:
# drop meters column

cities_combos_df_cars = cities_combos_df_cars.drop(columns='distance_meters',axis=1)

In [None]:
# add a column to the dataframe containing the route corresponding to each city pair

cars_route_combos_dict_keys = list(cars_route_combos_dict.keys())
cars_route_combos_dict_keys_lengths = [len(v) for v in cars_route_combos_dict.values()]

routes_column = sum([[s] * n for s, n in zip(cars_route_combos_dict_keys, cars_route_combos_dict_keys_lengths)], [])

cities_combos_df_cars['route'] = routes_column

In [None]:
# pickle cities_combos_df_cars

with open('train_stations_combos_df_cars_gmaps.pkl', 'wb') as fid:
     pickle.dump(cities_combos_df_cars, fid)

Calculate carbon emissions of car travel from the distances calculated above, using calls to the Climatiq API

In [None]:
# create list of distances 

cars_distances = list(cities_combos_df_cars.distance_mi)

In [None]:
len(cars_distances)

13319

In [None]:
# save results from API calls to a dictionary

cars_responses_dict = {}

for idx, distance in enumerate(cars_distances):
  response_dict = requests.post('https://beta2.api.climatiq.io/estimate',
                           data=json.dumps({"emission_factor": "passenger_vehicle-vehicle_type_car-fuel_source_na-engine_size_na-vehicle_age_na-vehicle_weight_na",
                               "parameters": {"passengers": 1, "distance": distance, "distance_unit": "mi"}}),
                           headers=headers).json()
  cars_responses_dict[idx] = response_dict
  time.sleep(3)

# pickle cars_responses_dict

with open('climatiq_car_responses_dict_gmaps.pkl', 'wb') as fid:
     pickle.dump(cars_responses_dict, fid)

Convert car emissions data to dataframe and concatenate it with city combos dataframe (along the columns axis)

In [None]:
climatiq_car_responses_dict_vals = cars_responses_dict.values()
climatiq_df_car = pd.json_normalize(climatiq_car_responses_dict_vals)
cities_combos_c02_cars = pd.concat([cities_combos_df_cars, climatiq_df_car], axis=1)
cities_combos_c02_cars.shape

(13319, 16)

Merge city combos CO2 df with routes locations df

In [None]:
cities_combos_c02_cars = cities_combos_c02_cars.merge(routes_locations, how='left', on='route')

In [None]:
cities_combos_c02_cars.head(2)

Unnamed: 0,city_1_code,city_1_name,city_1_coords,city_2_code,city_2_name,city_2_coords,distance_mi,route,co2e,co2e_unit,id,source,year,region,category,lca_activity,locations_per_route,coordinates_rev
0,BOS,"Boston, MA","(42.348695, -71.059861)",RTE,"Route 128, MA","(42.2111905, -71.148665)",17.250466,Acela - Boston - DC,5.330291,kg,passenger_vehicle-vehicle_type_car-fuel_source...,ADEME,2021,FR,Vehicle,unspecified,"[Boston, MA, Route 128, MA, Providence, RI, Ne...","[[-71.059861, 42.348695], [-71.148665, 42.2111..."
1,BOS,"Boston, MA","(42.348695, -71.059861)",PVD,"Providence, RI","(41.8305099, -71.4131785)",48.474208,Acela - Boston - DC,14.978242,kg,passenger_vehicle-vehicle_type_car-fuel_source...,ADEME,2021,FR,Vehicle,unspecified,"[Boston, MA, Route 128, MA, Providence, RI, Ne...","[[-71.059861, 42.348695], [-71.148665, 42.2111..."
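In the output above, the Climatiq estimate is proportional to distance for a given emission factor: 5.330291 / 17.250466 and 14.978242 / 48.474208 both give about 0.309 kg CO2e per mile for the car factor (the train rows give about 0.182 kg per mile). Assuming this holds at every distance, which should be spot-checked against a few more API responses, one request per emission factor would replace 13,319 requests at 3 seconds each (about 11 hours). A sketch; the helper names are hypothetical.

```python
def per_mile_factor(distance_mi, co2e_kg):
    """kg CO2e per passenger-mile implied by a single Climatiq estimate (distance must be > 0)."""
    return co2e_kg / distance_mi

def estimate_locally(distances_mi, factor):
    """Apply one constant per-mile factor to every distance (assumes linear scaling)."""
    return [d * factor for d in distances_mi]

# Boston -> Route 128 car estimate from the output above
car_factor = per_mile_factor(17.250466, 5.330291)

# cross-check against the Boston -> Providence row (48.474208 mi -> 14.978242 kg)
print(estimate_locally([0.0, 48.474208], car_factor))
```

If the check holds, `cities_combos_df_cars['distance_mi'] * car_factor` would produce the whole `co2e` column in one vectorized step.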


Merge geocoded stations df with city combos CO2 df to pull in location data for the origin city

In [None]:
cities_combos_c02_cars_locs = cities_combos_c02_cars.merge(stations_locations, how='left', left_on='city_1_code', right_on='code')

In [None]:
cities_combos_c02_cars_locs.head(2)

Unnamed: 0,city_1_code,city_1_name,city_1_coords,city_2_code,city_2_name,city_2_coords,distance_mi,route,co2e,co2e_unit,...,address_1,name,code,station_name,state_rev,full_address,longitude,latitude,coordinates,city_state
0,BOS,"Boston, MA","(42.348695, -71.059861)",RTE,"Route 128, MA","(42.2111905, -71.148665)",17.250466,Acela - Boston - DC,5.330291,kg,...,2 South Station,South Station,BOS,"Boston, MA",Massachusetts,"2 South Station, Boston, Massachusetts 2110",-71.059861,42.348695,"(42.348695, -71.059861)","Boston, Massachusetts"
1,BOS,"Boston, MA","(42.348695, -71.059861)",PVD,"Providence, RI","(41.8305099, -71.4131785)",48.474208,Acela - Boston - DC,14.978242,kg,...,2 South Station,South Station,BOS,"Boston, MA",Massachusetts,"2 South Station, Boston, Massachusetts 2110",-71.059861,42.348695,"(42.348695, -71.059861)","Boston, Massachusetts"


Clean up merged cities combos dataframe with origin location data

In [None]:
# rename city_state, latitude and longitude columns so it's clear they correspond to the origin station

cities_combos_c02_cars_locs.rename(columns={'city_state':'origin_location',
                                         'latitude':'origin_lat',
                                         'longitude':'origin_lon'},inplace=True)

In [None]:
# drop columns not needed
cities_combos_c02_cars_locs = cities_combos_c02_cars_locs.drop(columns=['co2e_unit','id','source','year','region','category','lca_activity',
                                                                        'code','x','y','objectid_1','objectid','station_descripton','bus_or_train',
                                                                        'address_2','name','zip_code','state','city','address_1','station_name',
                                                                        'state_rev','full_address'])

Merge stations locations df with cities combos co2 df to pull in location data for destination city

In [None]:
cities_combos_c02_cars_locs = cities_combos_c02_cars_locs.merge(stations_locations, how='left', left_on='city_2_code', right_on='code')

Clean up merged dataframe

In [None]:
# rename, drop and add columns

cities_combos_c02_cars_locs.columns = ['origin_code','origin_name','origin_coords','dest_code','dest_name','dest_coords','distance_mi',
                                    'route','co2e_kg','route_locations','coordinates_rev','origin_lon','origin_lat','coordinates_x','origin_location',
                                  'x','y','objectid_1','objectid','station_descripton','bus_or_train','zip_code','state','city','address_2','address_1',
                                  'name','code','station_name','state_rev','full_address','longitude','latitude','coordinates_y','dest_location']

cities_combos_c02_cars_locs = cities_combos_c02_cars_locs.drop(columns=['coordinates_rev','coordinates_x','x','y','objectid_1','objectid','station_descripton',
                                                              'bus_or_train','zip_code','state','city','address_2','address_1','name','code',
                                                              'station_name','state_rev','full_address','longitude','latitude','coordinates_y',],axis=1)                       

cities_combos_c02_cars_locs['co2e_kg_round'] = cities_combos_c02_cars_locs['co2e_kg'].map(lambda x: int(x))

cities_combos_c02_cars_locs['co2e_lb'] = cities_combos_c02_cars_locs['co2e_kg'].map(lambda x: int(x*2.2))

In [None]:
# reorder columns

cities_combos_c02_cars_locs = cities_combos_c02_cars_locs[['origin_code','origin_name','origin_coords','origin_location','origin_lat','origin_lon',
                                                 'dest_code','dest_name','dest_coords','dest_location','route','route_locations',
                                                 'distance_mi','co2e_kg','co2e_kg_round','co2e_lb']]

Since a user of the app will select an origin and destination by city rather than by specific station, I'll remove instances where the origin and destination locations are in the same city.

In [None]:
print(len(cities_combos_c02_cars_locs.loc[cities_combos_c02_cars_locs.origin_location == cities_combos_c02_cars_locs.dest_location]))

cities_combos_c02_cars_locs = cities_combos_c02_cars_locs.loc[~(cities_combos_c02_cars_locs.origin_location == cities_combos_c02_cars_locs.dest_location)]

18


In [None]:
cities_combos_c02_cars_locs.head(2)

|   | origin_code | origin_name | origin_coords | origin_location | origin_lat | origin_lon | dest_code | dest_name | dest_coords | dest_location | route | route_locations | distance_mi | co2e_kg | co2e_kg_round | co2e_lb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BOS | Boston, MA | (42.348695, -71.059861) | Boston, Massachusetts | 42.348695 | -71.059861 | RTE | Route 128, MA | (42.2111905, -71.148665) | Westwood, Massachusetts | Acela - Boston - DC | [Boston, MA, Route 128, MA, Providence, RI, Ne... | 17.250466 | 5.330291 | 5 | 11 |
| 1 | BOS | Boston, MA | (42.348695, -71.059861) | Boston, Massachusetts | 42.348695 | -71.059861 | PVD | Providence, RI | (41.8305099, -71.4131785) | Providence, Rhode Island | Acela - Boston - DC | [Boston, MA, Route 128, MA, Providence, RI, Ne... | 48.474208 | 14.978242 | 14 | 32 |


MongoDB database setup and data storage

In [None]:
# convert cities_combos_c02_cars_locs to a list of dictionaries (one per row) to insert into the MongoDB database

cars_emissions_dict = cities_combos_c02_cars_locs.to_dict('records')
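`to_dict('records')` returns a list with one dictionary per row (column name to value), which is the shape `insert_many` expects: each dictionary becomes one MongoDB document. A toy illustration:

```python
import pandas as pd

toy = pd.DataFrame({"origin_code": ["BOS", "BOS"], "dest_code": ["RTE", "PVD"],
                    "co2e_kg": [5.330291, 14.978242]})
records = toy.to_dict("records")
print(records)
```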

In [None]:
import pymongo
from pymongo import MongoClient
from getpass import getpass

In [None]:
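# connection string with the password redacted; don't store real credentials in the notebook
# (this client is replaced in the next cell, which prompts for the password with getpass)
uri = 'mongodb://udunkhpo2nmne8de5ygi:<password>@bitnqlsaoiuc1yk-mongodb.services.clever-cloud.com:27017/bitnqlsaoiuc1yk'
client = MongoClient( uri )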

In [None]:
# MongoDB connection info
hostname = 'bitnqlsaoiuc1yk-mongodb.services.clever-cloud.com'
port = 27017
username = 'udunkhpo2nmne8de5ygi'
password = getpass('Enter the secret value: ')
databaseName = 'bitnqlsaoiuc1yk'

# authenticate the database
client = MongoClient(hostname, username=username, password=password, authSource = databaseName, 
                    authMechanism = 'SCRAM-SHA-256')
db = client[databaseName]

Enter the secret value: ··········


In [None]:
# create a collection in the database in which to store the data

db.create_collection('cars_emission_gmaps_fin')

Collection(Database(MongoClient(host=['bitnqlsaoiuc1yk-mongodb.services.clever-cloud.com:27017'], document_class=dict, tz_aware=False, connect=True, authsource='bitnqlsaoiuc1yk', authmechanism='SCRAM-SHA-256'), 'bitnqlsaoiuc1yk'), 'cars_emission_gmaps_fin')

In [None]:
# insert the cars emissions data into the MongoDB database

db.cars_emission_gmaps_fin.insert_many(cars_emissions_dict)

<pymongo.results.InsertManyResult at 0x7fa7fd9d3810>