# Number of Trips to the New Employment Centers
#### Purpose:
To find the number of trips to the new employment centers by different factors such as primary mode, trip purpose, and vehicle type on weekdays of fall 2019 in the county of San Diego. 

#### Data Source:
Two data sources were used, both from Replica Place Studies (log in to your Replica account to access them):\
$\;\;\;\;\;\;$ 1. Replica Place Studies: https://studio.replicahq.com/places/studies/p544f96 \
$\;\;\;\;\;\;$ 2. Replica Place Studies: https://studio.replicahq.com/places/studies/ulztfb3


#### Transformations being performed:
Two datasets were downloaded separately from Replica Place Studies. They were then uploaded to J Drive. 

#### Location of Outputs:
J:\DataScience\DSEconProdDessem\EC2\Replica\Trips_to_EC_fall_2019_thursday\Outputs

#### Author: 
Navid Hedayati (navid.hedayati@sandag.org)

#### Date Created
3/2/2023

# Import Libraries

In [1]:
# Necessary libraries
import pandas as pd
import numpy as np
#import geopandas as gpd

# Read Data
Two datasets are read into DataFrames named replica and replica_2. As written, the first read loads only the first 10,000 rows of replica (nrows=10000).

In [8]:
replica = pd.read_csv(r"J:\DataScience\DSEconProdDessem\EC2\Replica\Trips_to_EC_fall_2019_thursday\replica-first_study-02_14_23-trips_dataset\replica-first_study-02_14_23-trips_dataset.csv", nrows=10000)
replica['dummy_val'] = 1
replica

Unnamed: 0,origin_bgrp,origin_cty,origin_st,destination_bgrp,destination_cty,destination_custom,primary_mode,trip_purpose,previous_trip_purpose,trip_start_time,...,trip_taker_resident_type,trip_taker_home_bgrp,trip_taker_home_trct,trip_taker_home_cty,trip_taker_home_st,trip_taker_work_bgrp,trip_taker_work_trct,trip_taker_work_cty,trip_taker_work_st,dummy_val
0,"1 (Tract 5520.01, Los Angeles, CA)","Los Angeles County, CA",California,"1 (Tract 187, San Diego, CA)","San Diego County, CA",Marine Corps Base Camp Pendleton,private_auto,work,home,07:39:00,...,core,"1 (Tract 5520.01, Los Angeles, CA)","5520.01 (Los Angeles, CA)","Los Angeles County, CA",California,"1 (Tract 187, San Diego, CA)","187 (San Diego, CA)","San Diego County, CA",California,1
1,"2 (Tract 1105, Orange, CA)","Orange County, CA",California,"1 (Tract 187, San Diego, CA)","San Diego County, CA",Marine Corps Base Camp Pendleton,private_auto,social,shop,10:35:51,...,visitor,\N,\N,\N,\N,\N,\N,\N,\N,1
2,"2 (Tract 5736.01, Los Angeles, CA)","Los Angeles County, CA",California,"1 (Tract 187, San Diego, CA)","San Diego County, CA",Marine Corps Base Camp Pendleton,private_auto,social,social,14:31:22,...,core,"5 (Tract 5549, Los Angeles, CA)","5549 (Los Angeles, CA)","Los Angeles County, CA",California,\N,\N,\N,\N,1
3,"2 (Tract 5039.01, Los Angeles, CA)","Los Angeles County, CA",California,"1 (Tract 187, San Diego, CA)","San Diego County, CA",Marine Corps Base Camp Pendleton,carpool,work,social,14:45:47,...,core,"5 (Tract 423.13, Orange, CA)","423.13 (Orange, CA)","Orange County, CA",California,"1 (Tract 187, San Diego, CA)","187 (San Diego, CA)","San Diego County, CA",California,1
4,"4 (Tract 5705.02, Los Angeles, CA)","Los Angeles County, CA",California,"1 (Tract 187, San Diego, CA)","San Diego County, CA",Marine Corps Base Camp Pendleton,carpool,work,home,06:25:00,...,core,"4 (Tract 5705.02, Los Angeles, CA)","5705.02 (Los Angeles, CA)","Los Angeles County, CA",California,"1 (Tract 187, San Diego, CA)","187 (San Diego, CA)","San Diego County, CA",California,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,Outside of region,Outside of region,Outside of region,"1 (Tract 62, San Diego, CA)","San Diego County, CA",San Diego Airport,other_travel_mode,region_departure,recreation,14:57:37,...,visitor,\N,\N,\N,\N,\N,\N,\N,\N,1
9996,Outside of region,Outside of region,Outside of region,"1 (Tract 62, San Diego, CA)","San Diego County, CA",San Diego Airport,other_travel_mode,region_departure,maintenance,20:49:01,...,visitor,\N,\N,\N,\N,\N,\N,\N,\N,1
9997,Outside of region,Outside of region,Outside of region,"1 (Tract 62, San Diego, CA)","San Diego County, CA",San Diego Airport,other_travel_mode,region_departure,maintenance,21:44:38,...,visitor,\N,\N,\N,\N,\N,\N,\N,\N,1
9998,Outside of region,Outside of region,Outside of region,"1 (Tract 38, San Diego, CA)","San Diego County, CA",Naval Base San Diego,other_travel_mode,other_activity_type,school,20:20:40,...,visitor,\N,\N,\N,\N,\N,\N,\N,\N,1


In [53]:
replica['origin_building_use']

0                  single_family
1                         retail
2                  single_family
3                  single_family
4                  single_family
                  ...           
9995                  open_space
9996                      office
9997                      retail
9998    transportation_utilities
9999    transportation_utilities
Name: origin_building_use, Length: 10000, dtype: object

In [50]:
replica['trip_duration_minutes'] = replica['trip_duration_minutes'].replace(0,1)
replica['trip_duration'] = pd.cut(x=replica['trip_duration_minutes'], bins=[0,5,15,30,60,np.inf], labels=['0-5 mins', '6-15 mins', '16-30 mins','31-60 mins','61+ mins'], ordered =True)
replica[['trip_duration_minutes', 'trip_duration']]

Unnamed: 0,trip_duration_minutes,trip_duration
0,94,61+ mins
1,55,31-60 mins
2,60,31-60 mins
3,131,61+ mins
4,61,61+ mins
...,...,...
9995,539,61+ mins
9996,530,61+ mins
9997,538,61+ mins
9998,543,61+ mins


In [52]:
replica['trip_distance'] = pd.cut(x=replica['trip_distance_miles'], bins=[0,1,2,5,10,25,np.inf], labels=['<1 mile','1-2 miles','3-5 miles','6-10 miles','11-25 miles','26+ miles'], ordered =True)

In [None]:
replica_2 = pd.read_csv(r"J:\DataScience\DSEconProdDessem\EC2\Replica\Trips_to_EC_fall_2019_thursday\replica-num_trips_blockgrps_to_ecs_fall_19_thusrday-02_23_23-trips_dataset\replica-num_trips_blockgrps_to_ecs_fall_19_thusrday-02_23_23-trips_dataset.csv")

# Number of Trips by Primary Mode
The goal of this section is to get the number of trips to each employment center by primary mode. The primary modes include biking, carpool, commercial, on-demand auto, and other travel modes.

In [None]:
test = replica.head(1000)
test = test[test['destination_custom'] == 'Carlsbad Palomar Airport']
test = test[test['primary_mode'] == 'other_travel_mode']
test

Unnamed: 0,origin_bgrp,origin_cty,origin_st,destination_bgrp,destination_cty,destination_custom,primary_mode,trip_purpose,previous_trip_purpose,trip_start_time,...,trip_taker_available_vehicles,trip_taker_resident_type,trip_taker_home_bgrp,trip_taker_home_trct,trip_taker_home_cty,trip_taker_home_st,trip_taker_work_bgrp,trip_taker_work_trct,trip_taker_work_cty,trip_taker_work_st
413,Outside of region,Outside of region,Outside of region,"1 (Tract 221, San Diego, CA)","San Diego County, CA",Carlsbad Palomar Airport,other_travel_mode,lodging,maintenance,15:02:37,...,\N,visitor,\N,\N,\N,\N,\N,\N,\N,\N
547,Outside of region,Outside of region,Outside of region,"3 (Tract 198.06, San Diego, CA)","San Diego County, CA",Carlsbad Palomar Airport,other_travel_mode,home,social,02:53:26,...,two,core,"3 (Tract 198.06, San Diego, CA)","198.06 (San Diego, CA)","San Diego County, CA",California,\N,\N,\N,\N
693,Outside of region,Outside of region,Outside of region,"2 (Tract 221, San Diego, CA)","San Diego County, CA",Carlsbad Palomar Airport,other_travel_mode,maintenance,shop,08:45:11,...,\N,visitor,\N,\N,\N,\N,\N,\N,\N,\N


In [None]:
replica.head(1000).groupby(['destination_custom', 'primary_mode']).agg({'count'})

Unnamed: 0_level_0,Unnamed: 1_level_0,origin_bgrp,origin_cty,origin_st,destination_bgrp,destination_cty,trip_purpose,previous_trip_purpose,trip_start_time,trip_end_time,trip_duration_minutes,...,trip_taker_available_vehicles,trip_taker_resident_type,trip_taker_home_bgrp,trip_taker_home_trct,trip_taker_home_cty,trip_taker_home_st,trip_taker_work_bgrp,trip_taker_work_trct,trip_taker_work_cty,trip_taker_work_st
Unnamed: 0_level_1,Unnamed: 1_level_1,count,count,count,count,count,count,count,count,count,count,...,count,count,count,count,count,count,count,count,count,count
destination_custom,primary_mode,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
Barona Resort & Casino,private_auto,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Carlsbad Palomar Airport,other_travel_mode,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
Carlsbad Palomar Airport,private_auto,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
Carlsbad State Beach,carpool,2,2,2,2,2,2,2,2,2,2,...,2,2,2,2,2,2,2,2,2,2
Carlsbad State Beach,other_travel_mode,7,7,7,7,7,7,7,7,7,7,...,7,7,7,7,7,7,7,7,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Vista Guajome,private_auto,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Vista Village,private_auto,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4
West Bernardo,carpool,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
West Bernardo,other_travel_mode,4,4,4,4,4,4,4,4,4,4,...,4,4,4,4,4,4,4,4,4,4


In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and primary_mode, then count the trips in each primary mode.
trips_primary_mode = replica.groupby(['destination_custom', 'primary_mode']).agg({'count'}).reset_index()[[ 'destination_custom', 'primary_mode', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})
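
The grouping step above can also be sketched with `size()`, which counts rows per group and returns a single flat `trips` column instead of a two-level column header. This is a minimal illustration on a toy table, not the notebook's own method; `toy_trips` and its values are made up for the example.

```python
import pandas as pd

# Toy stand-in for the Replica trips table (values are illustrative only).
toy_trips = pd.DataFrame({
    "destination_custom": ["Alpine", "Alpine", "Alpine", "Balboa Park"],
    "primary_mode": ["carpool", "carpool", "walking", "carpool"],
})

# size() counts rows per group and returns a Series;
# reset_index(name=...) turns it into a flat three-column table.
trips_by_mode = (
    toy_trips.groupby(["destination_custom", "primary_mode"])
    .size()
    .reset_index(name="trips")
)
print(trips_by_mode)
```

Because the result has plain column names, it can be written to CSV without the extra header row that the multi-level columns produce.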

In [9]:
replica[['destination_custom', 'primary_mode', 'dummy_val']].groupby(['destination_custom', 'primary_mode']).agg({'count'})

Unnamed: 0_level_0,Unnamed: 1_level_0,dummy_val
Unnamed: 0_level_1,Unnamed: 1_level_1,count
destination_custom,primary_mode,Unnamed: 2_level_2
Alpine,other_travel_mode,4
Alpine,private_auto,3
Balboa Park,carpool,1
Balboa Park,other_travel_mode,14
Balboa Park,private_auto,2
...,...,...
West Bernardo,commercial,8
West Bernardo,on_demand_auto,8
West Bernardo,other_travel_mode,38
West Bernardo,private_auto,61


In [10]:
['hi', 'there'] + ['ok']

['hi', 'there', 'ok']

In [25]:
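def number_of_trips_by_manipulation(agregation_column):
    # Count trips per employment center and per value of agregation_column.
    groupby_vals = ['destination_custom', agregation_column]
    # dummy_val is 1 on every row, so counting it counts trips.
    temp_df = replica[groupby_vals + ['dummy_val']]
    grouped = temp_df.groupby(groupby_vals).agg({'count'})
    grouped = grouped.reset_index()
    # Flatten the two-level column index into plain names.
    grouped.columns = ['employment_center', 'agg_column', 'count']
    return grouped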

In [33]:
output = number_of_trips_by_manipulation(agregation_column='primary_mode')
output

Unnamed: 0,employment_center,agg_column,count
0,Alpine,other_travel_mode,4
1,Alpine,private_auto,3
2,Balboa Park,carpool,1
3,Balboa Park,other_travel_mode,14
4,Balboa Park,private_auto,2
...,...,...,...
356,West Bernardo,commercial,8
357,West Bernardo,on_demand_auto,8
358,West Bernardo,other_travel_mode,38
359,West Bernardo,private_auto,61


In [80]:
input_dictionaries = {
    'trips_by_primary_mode': {'agg_column':'primary_mode', 'column_header': 'Mode'},
    'trips_by_purpose': {'agg_column': 'trip_purpose', 'column_header': 'Trip Purpose'},
    'trips_by_previous_trip': {'agg_column':'previous_trip_purpose', 'column_header': 'Prev Trip Purpose'},
    'trips_by_vehicle_type': {'agg_column': 'vehicle_type', 'column_header': 'Vehicle Type'},
    'trips_by_origin_land_use': {'agg_column': 'origin_building_use', 'column_header': 'Origin Land Use'}, 
    'trips_by_destination_land_use': {'agg_column': 'destination_land_use', 'column_header': 'Destination Land Use'},
    'trips_by_destination_builidng_use': {'agg_column': 'destination_building_use', 'column_header': 'Destination Building Use'},
    'trips_by_duration': {'agg_column': 'trip_duration', 'column_header': 'Trip Duration'},
    'trips_by_distance': {'agg_column': 'trip_distance', 'column_header': 'Trip Distance'},
    'trips_by_origin_building_use': {'agg_column': 'origin_building_use', 'column_header': 'Origin Building Use'}
}

In [32]:
def pivot_table_output(df):
    output = pd.pivot_table(df, values='count', index=['employment_center'], columns=['agg_column'])
    output.columns.name = ''
    return output
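
The pivot step leaves `NaN` wherever an employment center has no trips in a category, which also turns the counts into floats. If zeros are preferred, one option is sketched below on a toy long-format table; `long_counts` and `wide_counts` are hypothetical names, and whether a missing combination should be reported as 0 is an assumption about the intended output.

```python
import pandas as pd

# Toy long-format counts shaped like the output of number_of_trips_by_manipulation.
long_counts = pd.DataFrame({
    "employment_center": ["Alpine", "Alpine", "Balboa Park"],
    "agg_column": ["carpool", "walking", "carpool"],
    "count": [2, 1, 4],
})

# fill_value=0 replaces missing center/category combinations with zero;
# astype(int) keeps the counts as integers.
wide_counts = pd.pivot_table(
    long_counts,
    values="count",
    index="employment_center",
    columns="agg_column",
    aggfunc="sum",
    fill_value=0,
).astype(int)
wide_counts.columns.name = None
print(wide_counts)
```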


In [36]:
pivot_table_output(output)

Unnamed: 0_level_0,carpool,commercial,on_demand_auto,other_travel_mode,private_auto,public_transit,walking
employment_center,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Alpine,,,,4.0,3.0,,
Balboa Park,1.0,,,14.0,2.0,,
Barona Resort & Casino,,,,13.0,2.0,,
Barrio Logan,4.0,,,2.0,13.0,,
Carlsbad Palomar Airport,300.0,12.0,1.0,34.0,24.0,1.0,
...,...,...,...,...,...,...,...
Vista Guajome,30.0,1.0,,3.0,11.0,1.0,
Vista Sycamore,51.0,1.0,,1.0,4.0,,
Vista Tech Park,143.0,11.0,,11.0,8.0,,
Vista Village,118.0,11.0,1.0,8.0,38.0,1.0,


In [35]:
def add_header(df, column_header):
    header = pd.MultiIndex.from_product([[column_header], df.columns])
    df.columns = header
    return df

In [39]:
add_header(df=pivot_table_output(output), column_header='Mode')

Unnamed: 0_level_0,Mode,Mode,Mode,Mode,Mode,Mode,Mode
Unnamed: 0_level_1,carpool,commercial,on_demand_auto,other_travel_mode,private_auto,public_transit,walking
employment_center,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
Alpine,,,,4.0,3.0,,
Balboa Park,1.0,,,14.0,2.0,,
Barona Resort & Casino,,,,13.0,2.0,,
Barrio Logan,4.0,,,2.0,13.0,,
Carlsbad Palomar Airport,300.0,12.0,1.0,34.0,24.0,1.0,
...,...,...,...,...,...,...,...
Vista Guajome,30.0,1.0,,3.0,11.0,1.0,
Vista Sycamore,51.0,1.0,,1.0,4.0,,
Vista Tech Park,143.0,11.0,,11.0,8.0,,
Vista Village,118.0,11.0,1.0,8.0,38.0,1.0,


In [75]:
def create_output(key):
    grouped_data = number_of_trips_by_manipulation(agregation_column=input_dictionaries[key]['agg_column'])
    
    pivoted_table = pivot_table_output(grouped_data)
    if key == 'trips_by_previous_trip':
        pivoted_table = pivoted_table.drop('\\N', axis = 1)

    header_added = add_header(df=pivoted_table, column_header=input_dictionaries[key]['column_header'])

    return header_added

In [106]:
output_df = pd.DataFrame()

for input_key in input_dictionaries.keys():
    print(input_key)
    if output_df.empty:
        output_df = create_output(key=input_key)
        output_df['Total'] = output_df.sum(axis=1)
    else:
        output_df = output_df.merge(create_output(key=input_key), how='inner', left_index=True, right_index=True)

output_df = output_df.merge(ec_list, how='left', left_index=True, right_on='EC_Name')
output_df = output_df.set_index(['EC_ID', 'EC_Name', 'Tier'])
output_df.columns = pd.MultiIndex.from_tuples(output_df.columns)

output_df

trips_by_primary_mode
trips_by_purpose
trips_by_previous_trip
trips_by_vehicle_type
trips_by_origin_land_use
trips_by_destination_land_use
trips_by_destination_builidng_use
trips_by_duration
trips_by_distance
trips_by_origin_building_use



Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Mode,Mode,Mode,Mode,Mode,Mode,Mode,Total,Trip Purpose,Trip Purpose,...,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use,Origin Building Use
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,carpool,commercial,on_demand_auto,other_travel_mode,private_auto,public_transit,walking,Unnamed: 10_level_1,commercial,eat,...,industrial,multi_family,non_retail_attraction,office,open_space,other,retail,single_family,transportation_utilities,unknown
EC_ID,EC_Name,Tier,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2
2,Alpine,4,,,,4.0,3.0,,,7.0,,,...,,,,,,,3.0,3.0,1.0,
88,Balboa Park,4,1.0,,,14.0,2.0,,,17.0,,7.0,...,,,,1.0,,,5.0,2.0,6.0,3.0
100,Barona Resort & Casino,6,,,,13.0,2.0,,,15.0,,,...,,,1.0,1.0,,,,,10.0,3.0
3,Barrio Logan,4,4.0,,,2.0,13.0,,,19.0,,3.0,...,,,,1.0,,,6.0,9.0,1.0,1.0
4,Carlsbad Palomar Airport,2,300.0,12.0,1.0,34.0,24.0,1.0,,372.0,14.0,33.0,...,11.0,32.0,12.0,23.0,1.0,,90.0,153.0,9.0,22.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,Vista Guajome,4,30.0,1.0,,3.0,11.0,1.0,,46.0,1.0,5.0,...,4.0,3.0,2.0,7.0,,,13.0,14.0,1.0,1.0
76,Vista Sycamore,4,51.0,1.0,,1.0,4.0,,,57.0,1.0,7.0,...,,5.0,4.0,6.0,,,25.0,10.0,1.0,1.0
77,Vista Tech Park,3,143.0,11.0,,11.0,8.0,,,173.0,12.0,7.0,...,4.0,9.0,3.0,13.0,,,40.0,74.0,6.0,13.0
78,Vista Village,4,118.0,11.0,1.0,8.0,38.0,1.0,,177.0,11.0,36.0,...,14.0,9.0,6.0,23.0,,,53.0,40.0,5.0,11.0
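
The loop above builds the wide table by adding a header level to each pivot and merging on the employment-center index. A possible alternative, sketched here on toy tables, is `pd.concat` with a dict: the dict keys become the top column level and `join="inner"` keeps only centers present in every table. The names `by_mode`, `by_purpose`, and `combined` are hypothetical.

```python
import pandas as pd

# Toy per-attribute pivots indexed by employment center (illustrative values).
by_mode = pd.DataFrame(
    {"carpool": [2, 4], "walking": [1, 0]},
    index=["Alpine", "Balboa Park"],
)
by_purpose = pd.DataFrame(
    {"eat": [1, 3, 5], "work": [2, 1, 7]},
    index=["Alpine", "Balboa Park", "Barrio Logan"],
)

# Dict keys become the outer column level; the inner join drops
# "Barrio Logan" because it is missing from by_mode.
combined = pd.concat(
    {"Mode": by_mode, "Trip Purpose": by_purpose},
    axis=1,
    join="inner",
)
print(combined)
```

This yields a true `MultiIndex` on the columns directly, so no later call to `pd.MultiIndex.from_tuples` is needed.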


In [89]:
output_df.merge(ec_list, how='left', left_index=True, right_on='EC_Name').to_csv('new.csv')


In [84]:
ec_list = pd.read_csv('ec_list.csv')
ec_list

Unnamed: 0,EC_ID,EC_Name,Tier
0,1,San Diego Airport,3
1,2,Alpine,4
2,3,Barrio Logan,4
3,4,Carlsbad Palomar Airport,2
4,5,Carlsbad State Beach,3
...,...,...,...
97,104,Jamul Casino,6
98,105,Pala Casino Spa Resort,6
99,106,Sycuan Casino Resort,6
100,107,Valley View Casino & Hotel,6


In [79]:
create_output(key='trips_by_destination_builidng_use')

KeyError: 'column_header'

In [63]:
input_dictionaries.keys()

dict_keys(['trips_by_primary_mode', 'trips_by_purpose', 'trips_by_previous_trip', 'trips_by_vehicle_type', 'trips_by_origin_land_use', 'trips_by_destination_land_use', 'trips_by_destination_builidng_use', 'trips_by_duration', 'trips_by_distance', 'trips_by_origin_building_use'])

In [76]:
create_output(key='trips_by_previous_trip')

Unnamed: 0_level_0,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose,Prev Trip Purpose
Unnamed: 0_level_1,commercial,eat,home,lodging,maintenance,other_activity_type,recreation,school,shop,social,stage,work
employment_center,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
Alpine,,1.0,2.0,,,,,,2.0,,,1.0
Balboa Park,,1.0,2.0,1.0,1.0,,,5.0,3.0,,,
Barona Resort & Casino,,,,,1.0,,1.0,,,,,
Barrio Logan,,3.0,8.0,,2.0,3.0,,,1.0,1.0,,
Carlsbad Palomar Airport,14.0,26.0,160.0,,21.0,14.0,12.0,12.0,58.0,25.0,,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...
Vista Guajome,1.0,5.0,10.0,,2.0,,2.0,2.0,11.0,6.0,,7.0
Vista Sycamore,1.0,9.0,15.0,,4.0,2.0,4.0,3.0,15.0,,,3.0
Vista Tech Park,12.0,15.0,74.0,,15.0,8.0,3.0,4.0,24.0,9.0,,5.0
Vista Village,11.0,17.0,38.0,,13.0,5.0,4.0,7.0,31.0,14.0,,35.0


In [99]:
import pandas as pd

# example existing dataframe
existing_df = pd.DataFrame({
    ('A', 'B'): [1, 2],
    ('A', 'C'): [3, 4],
}, index=['employment_center_1', 'employment_center_2'])

# example new dataframe
new_df = pd.DataFrame({
    'D': [5, 6],
    'E': [7, 8],
    'F': [9, 10],
}, index=['employment_center_1', 'employment_center_2'])

# merge the two dataframes on the index and add the columns D, E, and F to the index
merged_df = existing_df.merge(new_df, left_index=True, right_index=True).set_index(['D', 'E', 'F'], append=True)

# print the merged dataframe
merged_df
merged_df.columns = pd.MultiIndex.from_tuples(merged_df.columns)
merged_df


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,A,A
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,B,C
Unnamed: 0_level_2,D,E,F,Unnamed: 4_level_2,Unnamed: 5_level_2
employment_center_1,5,7,9,1,3
employment_center_2,6,8,10,2,4


In [95]:
import pandas as pd

# example existing dataframe
existing_df = pd.DataFrame({
    ('A', 'B'): [1, 2],
    ('A', 'C'): [3, 4],
}, index=['employment_center_1', 'employment_center_2'])

# example new dataframe
new_df = pd.DataFrame({
    'D': [5, 6],
    'E': [7, 8],
    'F': [9, 10],
}, index=['employment_center_1', 'employment_center_2'])

# merge the two dataframes on the index and add the columns D, E, and F to the index
merged_df = existing_df.merge(new_df, left_index=True, right_index=True).set_index(['D', 'E', 'F'], append=True)

# restore the multi-index columns
merged_df.columns = merged_df.columns.reorder_levels([1, 2, 0])

# print the merged dataframe
merged_df



AttributeError: 'Index' object has no attribute 'reorder_levels'

In [96]:
import pandas as pd

# example existing dataframe
existing_df = pd.DataFrame({
    ('A', 'B'): [1, 2],
    ('A', 'C'): [3, 4],
}, index=['employment_center_1', 'employment_center_2'])

# example new dataframe
new_df = pd.DataFrame({
    'D': [5, 6],
    'E': [7, 8],
    'F': [9, 10],
}, index=['employment_center_1', 'employment_center_2'])

# merge the two dataframes on the index and add the columns D, E, and F to the index
merged_df = existing_df.merge(new_df, left_index=True, right_index=True).set_index(['D', 'E', 'F'], append=True)

# swap the levels of the multi-index columns to match the original order
merged_df.columns = merged_df.columns.swaplevel(0, 1)

# print the merged dataframe
merged_df



AttributeError: 'Index' object has no attribute 'swaplevel'

In [94]:
header = pd.MultiIndex.from_product(merged_df.columns)
merged_df.columns = header
merged_df

ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements

In [None]:
trips_primary_mode.shape

(809, 3)

In [None]:
trips_primary_mode.head()

Unnamed: 0_level_0,destination_custom,primary_mode,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,biking,75
1,Alpine,carpool,4283
2,Alpine,commercial,1510
3,Alpine,on_demand_auto,316
4,Alpine,other_travel_mode,288


In [None]:
# Save the outputs in a csv file
trips_primary_mode.to_csv("trips_primary_mode.csv",sep=",")

# Number of Trips by Purpose
The goal of this section is to get the number of trips to each employment center by trip purpose, such as commercial, eat, and home.

In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and trip_purpose, then count the trips in each trip purpose.
trips_purpose = replica.groupby(['destination_custom','trip_purpose']).agg({'count'}).reset_index()[['destination_custom', 'trip_purpose', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})

In [None]:
trips_purpose.shape

(1085, 3)

In [None]:
trips_purpose.head()

Unnamed: 0_level_0,destination_custom,trip_purpose,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,commercial,1523
1,Alpine,eat,3504
2,Alpine,home,4912
3,Alpine,lodging,142
4,Alpine,maintenance,1548


In [None]:
# Save outputs in a csv file
trips_purpose.to_csv("trips_purpose.csv", sep=",")

# Number of Trips by Previous Trip Purpose 
The goal of this section is to get the number of trips to each new employment center by previous trip purpose. 

In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and previous_trip_purpose, then count the trips in each previous trip purpose.
previous_trips_purpose = replica.groupby(['destination_custom','previous_trip_purpose']).agg({'count'}).reset_index()[['destination_custom','previous_trip_purpose', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})
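
The export uses the literal string `\N` for missing values, which is why `\N` shows up as its own previous-trip-purpose category. One way to handle this at read time, sketched below on an in-memory CSV, is the `na_values` argument of `pd.read_csv`; `groupby` then skips the missing keys by default. The CSV text and variable names are made up for the example, and whether `\N` rows should be excluded is an assumption.

```python
import io
import pandas as pd

# In-memory stand-in for the exported CSV; "\\N" is a backslash followed by N.
csv_text = (
    "destination_custom,previous_trip_purpose\n"
    "Alpine,home\n"
    "Alpine,\\N\n"
    "Alpine,home\n"
)

# Treat the \N placeholder as missing while reading.
toy_trips = pd.read_csv(io.StringIO(csv_text), na_values=["\\N"])

# Rows whose group key is missing are left out of the counts.
previous_purpose_counts = (
    toy_trips.groupby(["destination_custom", "previous_trip_purpose"])
    .size()
    .reset_index(name="trips")
)
print(previous_purpose_counts)
```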

In [None]:
previous_trips_purpose.shape

(1259, 3)

In [None]:
previous_trips_purpose.head()

Unnamed: 0_level_0,destination_custom,previous_trip_purpose,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,\N,32
1,Alpine,commercial,1523
2,Alpine,eat,1970
3,Alpine,home,7289
4,Alpine,lodging,312


In [None]:
# Save the outputs in a csv file
previous_trips_purpose.to_csv("previous_trips_purpose.csv",sep=",")

# Number of Trips by Vehicle Type
The goal of this section is to get the number of trips to each employment center by vehicle type. The vehicle types are heavy_commercial, medium_commercial, and unknown_vehicle_type.

In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and vehicle_type, then count the trips in each vehicle type.
trips_vehicle_type = replica.groupby(['destination_custom','vehicle_type']).agg({'count'}).reset_index()[['destination_custom','vehicle_type', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})

In [None]:
trips_vehicle_type.shape

(340, 3)

In [None]:
trips_vehicle_type.head()

Unnamed: 0_level_0,destination_custom,vehicle_type,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,heavy_commercial,23
1,Alpine,medium_commercial,1487
2,Alpine,unknown_vehicle_type,21581
3,Balboa Park,heavy_commercial,5
4,Balboa Park,medium_commercial,680


In [None]:
# Save the outputs in a csv file
trips_vehicle_type.to_csv("trips_vehicle_type.csv",sep=",")

# Number of Trips by Origin Land Use
The goal of this section is to get the number of trips to each employment center by the trip's origin land use. The origin_building_use field is used for this; it has categories such as civic_institutional, education, multi_family, and single_family.

In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and origin_building_use, then count the trips in each origin building use category.
trips_origin_building_use = replica.groupby(['destination_custom','origin_building_use']).agg({'count'}).reset_index()[['destination_custom','origin_building_use', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})

In [None]:
trips_origin_building_use.shape

(1350, 3)

In [None]:
trips_origin_building_use.head()

Unnamed: 0_level_0,destination_custom,origin_building_use,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,civic_institutional,531
1,Alpine,education,1638
2,Alpine,healthcare,179
3,Alpine,industrial,322
4,Alpine,multi_family,1099


In [None]:
# Save the outputs in a csv file
trips_origin_building_use.to_csv("trips_origin_building_use.csv",sep=",")

# Number of Trips by Destination Land Use
The goal of this section is to get the number of trips to each employment center by the trip's destination land use. The destination_land_use field has categories such as civic_institutional, education, healthcare, and mixed_use.

In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and destination_land_use, then count the trips in each destination land use category.
trips_destination_land_use = replica.groupby(['destination_custom','destination_land_use']).agg({'count'}).reset_index()[['destination_custom','destination_land_use', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})

In [None]:
trips_destination_land_use.shape

(1150, 3)

In [None]:
trips_destination_land_use.head()

Unnamed: 0_level_0,destination_custom,destination_land_use,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,civic_institutional,427
1,Alpine,education,659
2,Alpine,healthcare,12
3,Alpine,industrial,107
4,Alpine,mixed_use,1661


In [None]:
# Save the outputs in a csv file
trips_destination_land_use.to_csv("trips_destination_land_use.csv",sep=",")

# Number of Trips by Destination Building Use
The goal of this section is to get the number of trips to each employment center by the trip's destination building use. The destination_building_use field has categories such as civic_institutional, education, healthcare, and industrial.

In [None]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and destination_building_use, then count the trips in each destination building use category.
trips_destination_building_use = replica.groupby(['destination_custom','destination_building_use']).agg({'count'}).reset_index()[['destination_custom','destination_building_use', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})

In [None]:
trips_destination_building_use.shape

(1151, 3)

In [None]:
trips_destination_building_use.head()

Unnamed: 0_level_0,destination_custom,destination_building_use,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,civic_institutional,591
1,Alpine,education,695
2,Alpine,healthcare,168
3,Alpine,industrial,119
4,Alpine,multi_family,773


In [None]:
# Save the outputs in a csv file
trips_destination_building_use.to_csv("trips_destination_building_use.csv",sep=",")

# Number of Trips by Duration
The goal of this section is to get the number of trips to each employment center by trip duration. The trip_duration_minutes field was used to create a new field called trip_duration. This field has five categories: 0-5_min, 5-15_min, 15-30_min, 30-60_min, and 60+_min.

In [42]:
# Group the replica dataset by destination_custom (i.e., the new employment centers) and trip_duration_minutes, then count the trips for each distinct value of trip_duration_minutes.
trips_duration = replica.groupby(['destination_custom','trip_duration_minutes']).agg({'count'}).reset_index()[['destination_custom','trip_duration_minutes', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})

In [None]:
trips_duration.shape

(36020, 3)

In [None]:
set(replica[replica['trip_duration_minutes'] == 0]['primary_mode'])

{'biking',
 'carpool',
 'commercial',
 'on_demand_auto',
 'other_travel_mode',
 'private_auto',
 'walking'}

In [None]:
trips_duration.head()

Unnamed: 0_level_0,destination_custom,trip_duration_minutes,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,0,988
1,Alpine,1,1446
2,Alpine,2,1547
3,Alpine,3,1466
4,Alpine,4,1138


In [43]:
trips_duration['trip_duration_minutes'] = trips_duration['trip_duration_minutes'].replace(0,1)

In [44]:
trips_duration.head()

Unnamed: 0_level_0,destination_custom,trip_duration_minutes,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,81,1
1,Alpine,82,1
2,Alpine,116,1
3,Alpine,195,1
4,Alpine,294,1


In [45]:
# The trip_duration_minutes field was used to create a new field called trip_duration. This field has five categories: 0-5_min, 5-15_min, 15-30_min, 30-60_min, and 60+_min.
trips_duration['trip_duration'] = pd.cut(x=trips_duration['trip_duration_minutes'], bins=[0,5,15,30,60,np.inf], labels=['0-5_min', '5-15_min', '15-30_min','30-60_min','60+_min'], ordered =True)
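
With the default settings, `pd.cut` builds right-closed intervals such as (0, 5] and (5, 15], so a duration of exactly 0 falls outside the first bin; the notebook handles this by replacing 0 with 1 beforehand. A possible alternative, sketched here on a toy series, is `include_lowest=True`, which makes the first interval include its left edge. The `minutes` values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy durations chosen to sit on the bin edges.
minutes = pd.Series([0, 5, 6, 15, 30, 31, 60, 61])
bins = [0, 5, 15, 30, 60, np.inf]
labels = ["0-5_min", "5-15_min", "15-30_min", "30-60_min", "60+_min"]

# include_lowest=True places 0 in the first bin without altering the data.
binned = pd.cut(minutes, bins=bins, labels=labels, include_lowest=True)
print(pd.DataFrame({"minutes": minutes, "trip_duration": binned}))
```

Values that sit on an upper edge (5, 15, 30, 60) land in the lower of the two neighboring categories, which matches the behavior of the binning in the notebook.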

In [None]:
trips_duration.head(40)

Unnamed: 0_level_0,destination_custom,trip_duration_minutes,trips,trip_duration
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count,Unnamed: 4_level_1
0,Alpine,1,988,0-5_min
1,Alpine,1,1446,0-5_min
2,Alpine,2,1547,0-5_min
3,Alpine,3,1466,0-5_min
4,Alpine,4,1138,0-5_min
5,Alpine,5,977,0-5_min
6,Alpine,6,797,5-15_min
7,Alpine,7,621,5-15_min
8,Alpine,8,556,5-15_min
9,Alpine,9,540,5-15_min


In [None]:
trips_duration = trips_duration.drop(columns='trip_duration_minutes')

In [None]:
trips_duration = trips_duration.groupby(['destination_custom','trip_duration']).agg({'sum'}) 
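
The notebook first counts trips per minute value and then sums those counts per duration category. The same totals can be reached in one pass by binning each trip and counting once, as in this sketch; `toy_trips` is a made-up table, and `observed=True` is an explicit choice here to list only categories that actually occur.

```python
import numpy as np
import pandas as pd

# Toy trips table (illustrative values only).
toy_trips = pd.DataFrame({
    "destination_custom": ["Alpine"] * 4 + ["Balboa Park"] * 2,
    "trip_duration_minutes": [2, 4, 12, 75, 3, 20],
})

# Assign each trip to a duration category.
toy_trips["trip_duration"] = pd.cut(
    toy_trips["trip_duration_minutes"],
    bins=[0, 5, 15, 30, 60, np.inf],
    labels=["0-5_min", "5-15_min", "15-30_min", "30-60_min", "60+_min"],
)

# Count trips per employment center and duration category in one step.
duration_counts = (
    toy_trips.groupby(["destination_custom", "trip_duration"], observed=True)
    .size()
    .reset_index(name="trips")
)
print(duration_counts)
```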

In [None]:
trips_duration.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,count
Unnamed: 0_level_2,Unnamed: 1_level_2,sum
destination_custom,trip_duration,Unnamed: 2_level_3
Alpine,0-5_min,7562
Alpine,5-15_min,6021
Alpine,15-30_min,5303
Alpine,30-60_min,3409
Alpine,60+_min,796
Balboa Park,0-5_min,5879
Balboa Park,5-15_min,9181
Balboa Park,15-30_min,7339
Balboa Park,30-60_min,2665
Balboa Park,60+_min,1395


In [None]:
# Save the outputs in a csv file
trips_duration.to_csv("trips_duration.csv", sep = ",")
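
The grouped frame carries three levels of column labels, and `to_csv` writes each level as its own header row. If a single header row is preferred, one option is to flatten the labels before saving. The sketch below rebuilds a two-row frame with the same label structure (the two Alpine totals are taken from the output above) and writes it to an in-memory buffer, not to a file.

```python
import io
import pandas as pd

# Toy frame with the same column-label structure as the grouped result
cols = pd.MultiIndex.from_tuples([('trips', 'count', 'sum')])
idx = pd.MultiIndex.from_tuples(
    [('Alpine', '0-5_min'), ('Alpine', '5-15_min')],
    names=['destination_custom', 'trip_duration'])
grouped = pd.DataFrame([[7562], [6021]], index=idx, columns=cols)

# Keep only the top level of the column labels, then move the index into columns
flat = grouped.copy()
flat.columns = flat.columns.get_level_values(0)
flat = flat.reset_index()

# Write to an in-memory buffer here; a file path would work the same way
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
print(buffer.getvalue())
```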

# Number of Trips by Distance
The goal of this section is to get the number of trips to each employment center by trip distance in miles. The trip_distance_miles field is used to create a new field called trip_distance. This field has six categories: less_than_1_mile, 1-2_miles, 2-5_miles, 5-10_miles, 10-25_miles, and 25+_miles.

In [None]:
# Group the replica dataset by the destination_custom (i.e., the new employment centers) and trip_distance_miles fields, then count the trips at each distinct trip_distance_miles value.
trips_distance = replica.groupby(['destination_custom','trip_distance_miles']).agg({'count'}).reset_index()[['destination_custom','trip_distance_miles', 'origin_bgrp']].rename(columns={'origin_bgrp':'trips'})
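
The pattern above counts every column and then keeps `origin_bgrp` as the trip count, which relies on `origin_bgrp` never being null. A minimal sketch of a possible alternative on hypothetical rows: `size()` counts rows directly and returns a single flat `trips` column, while `count()` on one column only counts its non-null values.

```python
import pandas as pd

# Toy trips table (hypothetical rows); one origin_bgrp is missing on purpose
toy = pd.DataFrame({
    'destination_custom': ['Alpine', 'Alpine', 'Alpine', 'Balboa Park'],
    'trip_distance_miles': [0.1, 0.1, 0.4, 2.0],
    'origin_bgrp': ['a', None, 'c', 'd'],
})

keys = ['destination_custom', 'trip_distance_miles']

# size() counts rows per group, regardless of missing values in other columns
by_size = toy.groupby(keys).size().reset_index(name='trips')

# count() on a single column counts only the non-null entries of that column
by_count = toy.groupby(keys)['origin_bgrp'].count().reset_index(name='trips')

print(by_size)
print(by_count)
```

The two results agree whenever the counted column has no missing values, so the choice matters only if nulls can occur.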

In [None]:
trips_distance.shape

(137596, 3)

In [None]:
trips_distance.head()

Unnamed: 0_level_0,destination_custom,trip_distance_miles,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,Alpine,0.0,508
1,Alpine,0.1,602
2,Alpine,0.2,413
3,Alpine,0.3,419
4,Alpine,0.4,450


In [None]:
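# pd.cut leaves the left edge of the first bin open, so recode 0.0-mile trips as 0.1 miles to keep them in the less_than_1_mile category.
trips_distance['trip_distance_miles'] = trips_distance['trip_distance_miles'].replace(0.0, 0.1)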

In [None]:
# The trip_distance_miles field is used to create a new field called trip_distance. This field has six categories: less_than_1_mile, 1-2_miles, 2-5_miles, 5-10_miles, 10-25_miles, and 25+_miles.
trips_distance['trip_distance'] = pd.cut(x=trips_distance['trip_distance_miles'], bins=[0, 1, 2, 5, 10, 25, np.inf], labels=['less_than_1_mile', '1-2_miles', '2-5_miles', '5-10_miles', '10-25_miles', '25+_miles'], ordered=True)

In [None]:
trips_distance.head()

Unnamed: 0_level_0,destination_custom,trip_distance_miles,trips,trip_distance
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count,Unnamed: 4_level_1
0,Alpine,0.1,508,less_than_1_mile
1,Alpine,0.1,602,less_than_1_mile
2,Alpine,0.2,413,less_than_1_mile
3,Alpine,0.3,419,less_than_1_mile
4,Alpine,0.4,450,less_than_1_mile


In [None]:
trips_distance = trips_distance.drop(columns='trip_distance_miles')

In [None]:
# Sum the per-distance trip counts within each employment center and trip_distance category.
trips_distance = trips_distance.groupby(['destination_custom','trip_distance']).agg({'sum'})

In [None]:
trips_distance.head(12)

Unnamed: 0_level_0,Unnamed: 1_level_0,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,count
Unnamed: 0_level_2,Unnamed: 1_level_2,sum
destination_custom,trip_distance,Unnamed: 2_level_3
Alpine,less_than_1_mile,4687
Alpine,1-2_miles,2772
Alpine,2-5_miles,2890
Alpine,5-10_miles,2143
Alpine,10-25_miles,6301
Alpine,25+_miles,4298
Balboa Park,less_than_1_mile,4726
Balboa Park,1-2_miles,1735
Balboa Park,2-5_miles,5469
Balboa Park,5-10_miles,5093


In [None]:
# Save the outputs in a csv file 
trips_distance.to_csv("trips_distance.csv",sep=",")

# Number of Trips by Block Groups
The goal of this section is to get the number of trips to each employment center from each block group. 

In [None]:
replica_2.shape

(7557621, 37)

In [None]:
# Group the replica_2 dataset by the origin_bgrp and destination_custom (i.e., the new employment centers) fields, then count the trips from each block group to each employment center.
trips_bgrp_to_EC = replica_2.groupby(['origin_bgrp','destination_custom']).agg({'count'}).reset_index()[['origin_bgrp','destination_custom','origin_cty']].rename(columns={'origin_cty':'trips'})

In [None]:
trips_bgrp_to_EC.shape

(218271, 3)
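
A quick consistency check for this kind of aggregation is that the grouped counts add up to the number of source rows that have both grouping keys present. The sketch below shows the idea on a hypothetical four-row table. It is not a check that was run on `replica_2`.

```python
import pandas as pd

# Toy trips table (hypothetical rows); one row has no origin block group
toy = pd.DataFrame({
    'origin_bgrp': ['1 (Tract 1, San Diego, CA)', '1 (Tract 1, San Diego, CA)',
                    '2 (Tract 1, San Diego, CA)', None],
    'destination_custom': ['Alpine', 'Alpine', 'Alpine', 'Balboa Park'],
})

# groupby drops rows whose key is missing by default (dropna=True)
pairs = (toy.groupby(['origin_bgrp', 'destination_custom'])
            .size()
            .reset_index(name='trips'))

# The grouped total should equal the number of rows with both keys present
rows_with_keys = len(toy.dropna(subset=['origin_bgrp', 'destination_custom']))
assert pairs['trips'].sum() == rows_with_keys
print(pairs)
```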

In [None]:
trips_bgrp_to_EC.head()

Unnamed: 0_level_0,origin_bgrp,destination_custom,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
0,"0 (Tract 5766.02, Los Angeles, CA)",Oceanside Civic Center,1
1,"0 (Tract 5766.02, Los Angeles, CA)",Otay Mesa East,1
2,"0 (Tract 9901, San Diego, CA)",Carlsbad Village,1
3,"0 (Tract 9901, San Diego, CA)",Chula Vista Northwest,1
4,"0 (Tract 9901, San Diego, CA)",Chula Vista Otay,1


In [None]:
trips_bgrp_to_EC = trips_bgrp_to_EC.sort_values(by=['destination_custom'])

In [None]:
trips_bgrp_to_EC.head()

Unnamed: 0_level_0,origin_bgrp,destination_custom,trips
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,count
14228,"1 (Tract 153.01, San Diego, CA)",Alpine,16
44281,"1 (Tract 32.09, San Diego, CA)",Alpine,8
86384,"2 (Tract 117, San Diego, CA)",Alpine,2
13611,"1 (Tract 149.02, San Diego, CA)",Alpine,82
86351,"2 (Tract 117, Imperial, CA)",Alpine,6


In [None]:
trips_bgrp_to_EC.columns = ['origin_bgrp', 'destination_custom', 'trips']
trips_bgrp_to_EC

Unnamed: 0,origin_bgrp,destination_custom,trips
14228,"1 (Tract 153.01, San Diego, CA)",Alpine,16
44281,"1 (Tract 32.09, San Diego, CA)",Alpine,8
86384,"2 (Tract 117, San Diego, CA)",Alpine,2
13611,"1 (Tract 149.02, San Diego, CA)",Alpine,82
86351,"2 (Tract 117, Imperial, CA)",Alpine,6
...,...,...,...
78163,"1 (Tract 9203.39, Los Angeles, CA)",West Bernardo,1
21046,"1 (Tract 170.46, San Diego, CA)",West Bernardo,49
51688,"1 (Tract 427.16, Riverside, CA)",West Bernardo,3
150907,"2 (Tract 96.03, San Diego, CA)",West Bernardo,7


In [None]:
trips_bgrp_to_EC.columns

MultiIndex([(       'origin_bgrp',      ''),
            ('destination_custom',      ''),
            (             'trips', 'count')],
           )
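
The two-level column index shown above is a side effect of aggregating with `.agg({'count'})`. Below is a minimal sketch of two common ways to flatten such labels. The toy frame reuses the label structure from the output above, and the variable names `flat` and `joined` are illustrative.

```python
import pandas as pd

# Rebuild a one-row frame with the same two-level column labels
cols = pd.MultiIndex.from_tuples(
    [('origin_bgrp', ''), ('destination_custom', ''), ('trips', 'count')])
toy = pd.DataFrame(
    [['1 (Tract 153.01, San Diego, CA)', 'Alpine', 16]], columns=cols)

# Option 1: keep only the top level of each label
flat = toy.copy()
flat.columns = flat.columns.get_level_values(0)

# Option 2: join the non-empty levels with an underscore
joined = toy.copy()
joined.columns = ['_'.join(part for part in col if part) for col in toy.columns]

print(list(flat.columns))
print(list(joined.columns))
```

Both options avoid typing the column names by hand, which removes the risk of a misspelled label when assigning `.columns` directly.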

# Recreating Dave's Excel Workbooks

## EC_Trips_by_BG_Build.xlsx

In [34]:
ec_list = pd.read_csv('ec_list.csv')

In [37]:
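# Keep only trips that originate in a San Diego block group and give the columns flat names.
subset = trips_bgrp_to_EC[trips_bgrp_to_EC['origin_bgrp'].str.contains('San Diego')]
subset.columns = ['origin_bgrp', 'destination_custom', 'trip_count']
# Attach the employment center ID by matching on the employment center name.
output = subset.merge(ec_list, how='left', left_on='destination_custom', right_on='EC_Name')
# Parse the tract number and the leading block group number out of the origin_bgrp label.
output['Tract'] = output['origin_bgrp'].str.extract(r'Tract\s(\d+(?:\.\d+)?)')[0]
output['BG'] = output['origin_bgrp'].str.extract(r'(\d+)')[0]
# Copy the column subset so the type conversions below do not raise a SettingWithCopyWarning.
output = output[['EC_ID', 'destination_custom', 'Tract', 'BG', 'trip_count']].copy()
output['Tract'] = output['Tract'].astype(float)
output['BG'] = output['BG'].astype(int)
output['EC_ID'] = output['EC_ID'].astype(int)
# Sort by employment center and tract, then rename the columns to match the workbook headers.
output = output.sort_values(['EC_ID', 'Tract']).reset_index(drop=True)
output.columns = ['EC ID', 'Emp Ctr', 'Tract', 'BG', 'Trips']
output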

Unnamed: 0,EC ID,Emp Ctr,Tract,BG,Trips
0,1,San Diego Airport,1.00,1,170
1,1,San Diego Airport,1.00,2,273
2,1,San Diego Airport,2.01,1,294
3,1,San Diego Airport,2.02,1,517
4,1,San Diego Airport,2.02,2,206
...,...,...,...,...,...
140006,108,Viejas Casino & Resort,219.00,2,2
140007,108,Viejas Casino & Resort,220.00,1,3
140008,108,Viejas Casino & Resort,220.00,2,3
140009,108,Viejas Casino & Resort,221.00,1,5
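
The two `str.extract` calls above can also be expressed as a single pattern with named groups. The sketch below is one possible variant, not the notebook's method: it assumes every label follows the `BG (Tract N, County, ST)` layout seen in the outputs, and the `County` and `State` group names are introduced here for illustration.

```python
import pandas as pd

# Labels copied from the notebook outputs
labels = pd.Series([
    '1 (Tract 153.01, San Diego, CA)',
    '2 (Tract 117, Imperial, CA)',
    '1 (Tract 5520.01, Los Angeles, CA)',
])

# One named group per component; the pattern is anchored so partial matches are rejected
pattern = (r'^(?P<BG>\d+) \(Tract (?P<Tract>\d+(?:\.\d+)?), '
           r'(?P<County>[^,]+), (?P<State>[A-Z]{2})\)$')
parsed = labels.str.extract(pattern)
parsed['BG'] = parsed['BG'].astype(int)
parsed['Tract'] = parsed['Tract'].astype(float)

# Filtering on the parsed county is stricter than a substring match on the whole label
san_diego_only = parsed[parsed['County'] == 'San Diego']
print(parsed)
```

Labels that do not match the pattern come back as missing values, so the integer conversion would fail loudly on them and not pass through unnoticed.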


## Trip_Summary_Data_Build.xlsx

In [36]:
output

Unnamed: 0,index,EC_ID,destination_custom,Tract,BG,trip_count
0,70,1,San Diego Airport,1.00,1,170
1,50560,1,San Diego Airport,1.00,2,273
2,25713,1,San Diego Airport,2.01,1,294
3,25798,1,San Diego Airport,2.02,1,517
4,73458,1,San Diego Airport,2.02,2,206
...,...,...,...,...,...,...
140006,79219,108,Viejas Casino & Resort,219.00,2,2
140007,32260,108,Viejas Casino & Resort,220.00,1,3
140008,79466,108,Viejas Casino & Resort,220.00,2,3
140009,32356,108,Viejas Casino & Resort,221.00,1,5


In [28]:
output = subset.merge(ec_list, how='left', left_on='destination_custom', right_on='EC_Name')
output

Unnamed: 0,origin_bgrp,destination_custom,trip_count,EC_ID,EC_Name,Tier
0,"0 (Tract 9901, San Diego, CA)",Carlsbad Village,1,6,Carlsbad Village,4
1,"0 (Tract 9901, San Diego, CA)",Chula Vista Northwest,1,12,Chula Vista Northwest,3
2,"0 (Tract 9901, San Diego, CA)",Chula Vista Otay,1,13,Chula Vista Otay,4
3,"0 (Tract 9901, San Diego, CA)",Del Mar Fairgrounds,1,87,Del Mar Fairgrounds,4
4,"0 (Tract 9901, San Diego, CA)",Imperial Beach - Nestor,2,29,Imperial Beach - Nestor,4
...,...,...,...,...,...,...
140006,"7 (Tract 97.06, San Diego, CA)",Sycuan Casino Resort,3,106,Sycuan Casino Resort,6
140007,"7 (Tract 97.06, San Diego, CA)",University City,1,72,University City,4
140008,"7 (Tract 97.06, San Diego, CA)",University Heights,26,73,University Heights,4
140009,"7 (Tract 97.06, San Diego, CA)",University of San Diego,8,74,University of San Diego,3
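
A left merge silently leaves `EC_ID` missing for any destination name that has no match in `ec_list`, and the later `astype(int)` would then fail. The sketch below uses small stand-in frames (the `Unknown Center` row is hypothetical) to show how the `indicator` and `validate` arguments of `merge` can surface such rows and guard against duplicate names in the lookup table.

```python
import pandas as pd

# Stand-in for the trips subset; 'Unknown Center' is a made-up unmatched name
trips = pd.DataFrame({
    'destination_custom': ['Carlsbad Village', 'Chula Vista Otay', 'Unknown Center'],
    'trip_count': [1, 1, 2],
})

# Stand-in for ec_list with the same column names as the notebook's lookup table
ec_list = pd.DataFrame({
    'EC_ID': [6, 13],
    'EC_Name': ['Carlsbad Village', 'Chula Vista Otay'],
    'Tier': [4, 4],
})

# validate raises if EC_Name is duplicated; indicator adds a '_merge' column
merged = trips.merge(ec_list, how='left', left_on='destination_custom',
                     right_on='EC_Name', validate='many_to_one', indicator=True)

# Destination names that found no employment center in the lookup table
unmatched = merged.loc[merged['_merge'] == 'left_only', 'destination_custom'].unique()
print(unmatched)
```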


In [29]:
# Parse the tract number and the leading block group number out of the origin_bgrp label.
output['Tract'] = output['origin_bgrp'].str.extract(r'Tract\s(\d+(?:\.\d+)?)')[0]
output['BG'] = output['origin_bgrp'].str.extract(r'(\d+)')[0]
# Copy the column subset so the type conversions below do not raise a SettingWithCopyWarning.
output = output[['EC_ID', 'destination_custom', 'Tract', 'BG', 'trip_count']].copy()
output['Tract'] = output['Tract'].astype(float)
output['BG'] = output['BG'].astype(int)
output['EC_ID'] = output['EC_ID'].astype(int)
output = output.sort_values(['EC_ID', 'Tract']).reset_index()
output

Unnamed: 0,index,EC_ID,destination_custom,Tract,BG,trip_count
0,70,1,San Diego Airport,1.00,1,170
1,50560,1,San Diego Airport,1.00,2,273
2,25713,1,San Diego Airport,2.01,1,294
3,25798,1,San Diego Airport,2.02,1,517
4,73458,1,San Diego Airport,2.02,2,206
...,...,...,...,...,...,...
140006,79219,108,Viejas Casino & Resort,219.00,2,2
140007,32260,108,Viejas Casino & Resort,220.00,1,3
140008,79466,108,Viejas Casino & Resort,220.00,2,3
140009,32356,108,Viejas Casino & Resort,221.00,1,5
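
Selecting a subset of columns and then assigning into the result is the pattern that can make pandas emit a SettingWithCopyWarning in versions without copy-on-write. The sketch below uses a hypothetical two-row frame to show one way to sidestep it: take an explicit copy of the selection and convert several columns in a single `astype` call.

```python
import pandas as pd

# Toy merged frame (hypothetical rows); Tract and BG arrive as strings from str.extract
merged = pd.DataFrame({
    'EC_ID': [1.0, 1.0],
    'Tract': ['1', '2.01'],
    'BG': ['1', '1'],
    'Tier': [2, 2],
})

# .copy() makes the column subset an independent frame
selected = merged[['EC_ID', 'Tract', 'BG']].copy()

# A dict passed to astype converts each listed column without column-by-column assignment
selected = selected.astype({'EC_ID': int, 'Tract': float, 'BG': int})
print(selected.dtypes)
```

The original `merged` frame is left unchanged, so the string versions of `Tract` and `BG` remain available if the exact label text is needed later.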


In [24]:
output.dtypes

EC_ID                  int64
destination_custom    object
Tract                 object
BG                    object
trip_count             int64
dtype: object

In [None]:
import pandas as pd

# create sample dataframe
df = pd.DataFrame({'Location': ['# (Tract 12.34, San Diego, CA)', '# (Tract 56.78, San Diego, CA)', '# (Tract 202, San Diego, CA)']})

# extract tract number using regex and create new column
df['Tract Number'] = df['Location'].str.extract(r'Tract\s(\d+(?:\.\d+)?)')[0]

print(df)


In [15]:
output['origin_bgrp'][4].split(',')[0].split('(')[1]

'Tract 9901'

In [11]:
output[output['EC_ID'] == 61]

Unnamed: 0,origin_bgrp,destination_custom,trip_count,EC_ID,EC_Name,Tier
69,"1 (Tract 1, San Diego, CA)",Rancho Peñasquitos,6,61,Rancho Peñasquitos,4
146,"1 (Tract 10, San Diego, CA)",Rancho Peñasquitos,5,61,Rancho Peñasquitos,4
213,"1 (Tract 100.01, San Diego, CA)",Rancho Peñasquitos,2,61,Rancho Peñasquitos,4
285,"1 (Tract 100.03, San Diego, CA)",Rancho Peñasquitos,8,61,Rancho Peñasquitos,4
354,"1 (Tract 100.04, San Diego, CA)",Rancho Peñasquitos,1,61,Rancho Peñasquitos,4
...,...,...,...,...,...,...
139625,"6 (Tract 76, San Diego, CA)",Rancho Peñasquitos,5,61,Rancho Peñasquitos,4
139703,"6 (Tract 85.07, San Diego, CA)",Rancho Peñasquitos,1,61,Rancho Peñasquitos,4
139786,"6 (Tract 9, San Diego, CA)",Rancho Peñasquitos,4,61,Rancho Peñasquitos,4
139927,"7 (Tract 76, San Diego, CA)",Rancho Peñasquitos,3,61,Rancho Peñasquitos,4


In [32]:
subset['trips'].sum()

7272074

In [52]:
trips_bgrp_to_EC.shape

(218271, 3)

In [66]:
# Save the outputs in a csv file
trips_bgrp_to_EC.to_csv("trips_bgrp_to_EC.csv",sep=",")