## TIRCP CalSTA
* TIRCP outcomes for cycles 3-5, prepared for the California State Transportation Agency (CalSTA).
* Award years for [Cycles 1-6](https://calsta.ca.gov/subject-areas/transit-intercity-rail-capital-prog):
* Cycle 1: 2015
* Cycle 2: 2016
* Cycle 3: 2018
* Cycle 4: 2020
* Cycle 5: 2022
* Cycle 6: 2023
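The cycle-to-year list above can be captured as a small lookup, so award years can be labeled by cycle and the years for cycles 3-5 can be derived rather than typed by hand. This is a minimal sketch; `CYCLE_BY_AWARD_YEAR` and `cycle_for_year` are names introduced here for illustration and are not part of `A1_data_prep` or `A2_tableau`.

```python
# Hypothetical lookup built from the cycle/year pairs listed above.
CYCLE_BY_AWARD_YEAR = {2015: 1, 2016: 2, 2018: 3, 2020: 4, 2022: 5, 2023: 6}


def cycle_for_year(award_year: int) -> int:
    """Return the TIRCP cycle for an award year; raises KeyError if unknown."""
    return CYCLE_BY_AWARD_YEAR[award_year]


# Award years that belong to cycles 3 through 5
years_of_interest = sorted(
    year for year, cycle in CYCLE_BY_AWARD_YEAR.items() if 3 <= cycle <= 5
)
print(years_of_interest)
```

Filtering on an explicit list of years (for example with `Series.isin`) keeps cycle 6 out of the result if 2023 awards are later added to the tracking sheet, which a plain `>= 2018` comparison would not.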

In [243]:
import A1_data_prep
import A2_tableau
import numpy as np
import pandas as pd
from babel.numbers import format_currency
from calitp import *

In [244]:
pd.options.display.max_columns = 100
pd.options.display.float_format = "{:.2f}".format
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

In [245]:
# GCS File Path:
GCS_FILE_PATH = "gs://calitp-analytics-data/data-analyses/tircp/"

### Filter out for cycles of interest

In [246]:
df_tircp = to_snakecase(A2_tableau.tableau_dashboard())


In [247]:
df_tircp2 = df_tircp.loc[df_tircp["award_year"] >= 2018].reset_index(drop=True)

In [248]:
df_tircp2.award_year.value_counts()

2018    28
2022    23
2020    17
Name: award_year, dtype: int64

In [249]:
df_tircp2.ppno.nunique(), df_tircp2.title.nunique(), len(df_tircp2)

(59, 67, 68)

### Add info based on SCCP's output example
Columns in SCCP's output example: Project ID, Project Name, Implementing Agency, Program, Project Description, Total Cost, SB 1 Funds, Fiscal Year, Is SB 1?, Project Status, Assembly Districts, Senate Districts, Counties, Cities, Caltrans Districts, Is on SHS?, Date Updated, Cycle


#### GIS Template has Assembly District/Senate District/City/Counties info

In [250]:
# Read in sheet with Assembly info.
gis = to_snakecase(
    pd.read_excel(
        f"{GCS_FILE_PATH}TIRCP_GIS_Template_Requirements 6-1-2022.xlsx",
        sheet_name="Projects Table",
    )
)

In [251]:
# Clean some column names
gis = gis.rename(
    columns={
        "ppno_": "ppno",
        "assembly\ndistricts": "assembly_districts",
        "senate\ndistricts": "senate_districts",
        "caltrans\ndistrict": "CT_district",
    }
)

In [252]:
# Clean PPNO
gis = A1_data_prep.ppno_slice(gis)

In [253]:
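# Subset for only cols of interest. Use .copy() so later assignments modify an
# independent DataFrame rather than a slice of gis.
gis2 = gis[
    [
        "project_number",
        "ppno",
        "projecttitle",
        "projectstatus",
        "assembly_districts",
        "senate_districts",
        "city_code",
        "CT_district",
        "county_code",
    ]
].copy()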
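Assigning into a column subset of another DataFrame can trigger pandas' `SettingWithCopyWarning`, because pandas cannot tell whether the subset is a view or a copy. The sketch below uses made-up toy rows (not the GIS sheet) to show that taking an explicit `.copy()` makes a `.loc` assignment unambiguous and leaves the source frame untouched.

```python
import pandas as pd

# Toy stand-in for the GIS sheet; the values are invented for illustration
gis_demo = pd.DataFrame(
    {
        "ppno": ["CP063", "CP010"],
        "projecttitle": ["Inglewood Transit Center (2020:04)", "Other project"],
        "extra": [1, 2],
    }
)

# .copy() makes the subset an independent DataFrame
subset = gis_demo[["ppno", "projecttitle"]].copy()

# Recode one PPNO by matching on the project title
is_inglewood = subset["projecttitle"] == "Inglewood Transit Center (2020:04)"
subset.loc[is_inglewood, "ppno"] = "CP062"

print(subset["ppno"].tolist())
print(gis_demo["ppno"].tolist())  # the original frame is unchanged
```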

In [254]:
gis2.ppno.nunique()

45

In [255]:
# There are multiple entries for each ppno.
gis2.ppno.value_counts().head()

CP033    60
CP035    21
CP042    18
CP032    14
CP031    11
Name: ppno, dtype: int64

In [256]:
# Inglewood Transit Center coded as CP063, should be CP062
gis2.loc[
    gis2["projecttitle"] == "Inglewood Transit Center (2020:04)", "ppno"
] = "CP062"


In [257]:
# North State Intercity Bus System coded as CP063 in TIRCP Tracking sheet.
gis2.loc[
    gis2["projecttitle"]
    == "North State Intercity Bus System-Lake County Interregional Transit Center (2020:05)",
    "ppno",
] = "CP063"


In [258]:
# gis2.loc[gis2['ppno'] == 'CP063']

In [259]:
gis2.loc[gis2['projecttitle'] == 'North State Intercity Bus System-Lake County Interregional Transit Center (2020:05)']

,project_number,ppno,projecttitle,projectstatus,assembly_districts,senate_districts,city_code,CT_district,county_code
202,2020:05,CP063,North State Intercity Bus System-Lake County Interregional Transit Center (2020:05),PA&ED,|04|02|10|,|02|,|5427|5028|,|01|,|5914|5920|
203,2020:05,CP063,North State Intercity Bus System-Lake County Interregional Transit Center (2020:05),R/W,|04|02|10|,|02|,|5427|5028|,|01|,|5914|5920|
204,2020:05,CP063,North State Intercity Bus System-Lake County Interregional Transit Center (2020:05),Construction,|04|02|10|,|02|,|5427|5028|,|01|,|5914|5920|
205,2020:05,CP063,North State Intercity Bus System-Lake County Interregional Transit Center (2020:05),Ops./Procure,|04|02|10|,|02|,|5427|5028|,|01|,|5914|5920|


In [260]:
# Clean project_number, only keep year
gis2["project_number"] = gis2["project_number"].str.split(":").str[0]



In [261]:
gis2["project_number"] = gis2["project_number"].fillna(0).astype("int64")



In [262]:
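# Place project statuses on one row per group and remove duplicate statuses
def summarize_rows(df, col_to_group: str, col_to_summarize: str):
    """Return one row per col_to_group with the unique col_to_summarize values joined."""
    df = df.groupby(col_to_group)[col_to_summarize].apply(",".join).reset_index()

    # Split the joined string, de-duplicate, and sort so the order is stable between runs
    df[col_to_summarize] = (
        df[col_to_summarize]
        .apply(lambda x: ", ".join(sorted({y.strip() for y in x.split(",")})))
        .str.strip()
    )
    return df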
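To make the effect of this helper concrete, here is a self-contained toy run. The function is re-implemented under the hypothetical name `summarize_rows_sorted`, with a sorted de-duplication so the joined order is stable; the PPNOs and statuses are sample values, not the full GIS sheet.

```python
import pandas as pd


def summarize_rows_sorted(df, col_to_group: str, col_to_summarize: str):
    """Collapse df to one row per group, joining the unique values of one column."""
    joined = df.groupby(col_to_group)[col_to_summarize].apply(",".join).reset_index()
    joined[col_to_summarize] = joined[col_to_summarize].apply(
        lambda x: ", ".join(sorted({y.strip() for y in x.split(",")}))
    )
    return joined


demo = pd.DataFrame(
    {
        "ppno": ["CP033", "CP033", "CP033", "CP035"],
        "projectstatus": ["PA&ED", "Construction", "PA&ED", "R/W"],
    }
)

result = summarize_rows_sorted(demo, "ppno", "projectstatus")
print(result)
```

The three CP033 rows collapse into a single row whose status lists each phase once.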

In [263]:
project_status_gis = summarize_rows(gis2, "ppno", "projectstatus")

In [264]:
# Check that the number of rows equals the number of unique PPNOs
len(project_status_gis) == gis2.ppno.nunique()

True

In [265]:
# Drop old project status
gis2 = gis2.drop(columns=["projectstatus"])

In [266]:
# Merge with original gis, so there is only one row for each PPNO
final_gis = (
    pd.merge(project_status_gis, gis2, how="left", on=["ppno"])
    .drop_duplicates("ppno")
    .reset_index(drop=True)
)

In [267]:
len(final_gis), final_gis.ppno.nunique()

(45, 45)

#### Merge with TIRCP Tracking

In [268]:
# Merge with df_tircp2
merge1 = pd.merge(
    df_tircp2,
    final_gis,
    how="left",
    left_on=["ppno", "award_year"],
    right_on=["ppno", "project_number"],
    indicator=True,
)

In [269]:
merge1._merge.value_counts()

both          43
left_only     25
right_only     0
Name: _merge, dtype: int64
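With `indicator=True`, the `_merge` column can be used to pull out the tracking rows that found no GIS match, which is a natural next step when many rows come back as `left_only`. The sketch below uses invented toy frames; the column names follow the notebook, but the values are assumptions for illustration only.

```python
import pandas as pd

# Toy tracking sheet and GIS sheet (values invented for illustration)
tracking = pd.DataFrame(
    {"ppno": ["CP001", "CP002", "CP003"], "award_year": [2018, 2020, 2022]}
)
gis_rows = pd.DataFrame(
    {
        "ppno": ["CP001", "CP002"],
        "project_number": [2018, 2018],
        "projectstatus": ["R/W", "Construction"],
    }
)

merged = pd.merge(
    tracking,
    gis_rows,
    how="left",
    left_on=["ppno", "award_year"],
    right_on=["ppno", "project_number"],
    indicator=True,
)

# Rows present in the tracking sheet with no PPNO + year match in the GIS sheet
unmatched = merged.loc[merged["_merge"] == "left_only", ["ppno", "award_year"]]
print(unmatched)
```

Here CP002 is unmatched because its GIS year differs from its award year, and CP003 is unmatched because it is missing from the GIS sheet entirely; listing the unmatched rows helps tell those two cases apart.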

In [270]:
# Double-check that titles and years correspond with one another
# merge1.loc[merge1["_merge"] == "both"][
#     ["award_year", "project_number", "title", "ppno", "projecttitle", "_merge"]
# ].sort_values(["award_year", "project_number"])

### Project Sheet 

In [271]:
# Subset to cols similar to SCCP
projects = merge1[
    [
        "award_year",
        "ppno",
        "title",
        "grant_recipient",
        "district",
        "county",
        "description",
        "total__cost",
        "tircp",
        "projectstatus",
        "assembly_districts",
        "county_code",
        "senate_districts",
        "city_code",
    ]
]

In [272]:
# Fill missing values by dtype: 0 for numeric columns, "None" for text columns
projects = projects.fillna(projects.dtypes.replace({"float64": 0.0, "object": "None", "int64": 0}))
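The same dtype-based fill can be written more explicitly by building a per-column dictionary of fill values. This is an alternative sketch on toy data, not the notebook's method; it treats every non-numeric column as text, which is an assumption that may not hold for date columns.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in a numeric and a text column (values invented)
demo = pd.DataFrame(
    {
        "total_cost": [1.5, np.nan],
        "county": ["Lake", None],
        "award_year": [2018, 2020],
    }
)

numeric_cols = demo.select_dtypes(include="number").columns
text_cols = demo.columns.difference(numeric_cols)  # assumed to be text

# One fill value per column: 0 for numbers, the string "None" for text
fill_values = {col: 0 for col in numeric_cols}
fill_values.update({col: "None" for col in text_cols})

filled = demo.fillna(value=fill_values)
print(filled)
```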

In [273]:
# Format monetary cols
monetary_cols = ["total__cost", "tircp"]
for col in monetary_cols:
    projects[col] = projects[col].apply(
        lambda x: format_currency(x, currency="USD", locale="en_US")
    )
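For readers without Babel installed, a plain-Python formatter gives a rough equivalent for US dollars. `format_usd` is a hypothetical helper introduced here; it does not reproduce Babel's locale rules and is only a sketch of the idea.

```python
def format_usd(amount: float) -> str:
    """Format a number as US dollars with thousands separators and two decimals."""
    sign = "-" if amount < 0 else ""
    return f"{sign}${abs(amount):,.2f}"


print(format_usd(1234.5))
print(format_usd(0))
print(format_usd(-2500000))
```

Either way, the formatted columns hold strings afterwards, so any sums or comparisons on cost need to happen before this step.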

In [274]:
# Clean up columns 
projects = A1_data_prep.clean_up_columns(projects)

### Outcomes Sheet

In [275]:
# Create a detailed title column
merge1["award_year"] = merge1["award_year"].astype("object")

In [276]:
detailed_title_cols = [
    "award_year",
    "title",
    "grant_recipient",
]

In [277]:
# https://stackoverflow.com/questions/39291499/how-to-concatenate-multiple-column-values-into-a-single-column-in-pandas-datafra
merge1["detailed_title_cols"] = merge1[detailed_title_cols].apply(
    lambda row: "-".join(row.values.astype(str)), axis=1
)
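The row-wise `apply` above can also be expressed as column-wise string concatenation, which avoids calling a Python function per row. A minimal sketch on toy data; the titles and recipients below are placeholders.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "award_year": [2018, 2020],
        "title": ["Valley Rail", "Core Capacity Program"],
        "grant_recipient": ["Agency A", "Agency B"],
    }
)

# Cast the year to string, then join the three columns with "-"
demo["detailed_title"] = (
    demo["award_year"].astype(str)
    + "-"
    + demo["title"]
    + "-"
    + demo["grant_recipient"]
)
print(demo["detailed_title"].tolist())
```

Because some real titles and agency names contain hyphens themselves, the combined string works as a label but cannot be reliably split back into its parts. Unlike the `astype(str)` approach, this version propagates a missing title or recipient as a missing result.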

In [278]:
# Measure columns
measure_cols = [
    "estimated_tircp_ghg_reductions",
    "cost_per_ghg_ton_reduced",
    "increased_ridership",
    "service_integration",
    "improve_safety",
]

In [279]:
# Turn estimated GHG reductions into a number
merge1["estimated_tircp_ghg_reductions"] = (
    merge1["estimated_tircp_ghg_reductions"]
    .str.replace("MTCO2e", "")
    .str.replace("None", "")
    .str.replace(",", "")
)

In [280]:
merge1["estimated_tircp_ghg_reductions"] = pd.to_numeric(
    merge1["estimated_tircp_ghg_reductions"], errors="coerce"
).fillna(0)
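The two cleaning cells can be traced on a handful of sample strings. The raw format used below (a comma-separated number followed by `MTCO2e`) is an assumption about what the tracking sheet contains, inferred from the replacements above.

```python
import pandas as pd

# Assumed examples of raw values; None stands for an empty cell
raw = pd.Series(["61,000 MTCO2e", "None", "1,160,000 MTCO2e", None])

cleaned = (
    raw.str.replace("MTCO2e", "", regex=False)
    .str.replace(",", "", regex=False)
    .str.strip()
)

# Anything that is not a number becomes NaN, then 0
ghg = pd.to_numeric(cleaned, errors="coerce").fillna(0)
print(ghg.tolist())
```

Filling with 0 makes a missing estimate indistinguishable from a true zero, so yearly totals built from this column should be read with that in mind.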

In [306]:
# Subset to cols similar to SCCP
outcomes = merge1[
    [
        "award_year",
        "detailed_title_cols",
        "estimated_tircp_ghg_reductions",
        "cost_per_ghg_ton_reduced",
        "increased_ridership",
        "service_integration",
        "improve_safety",
    ]
].sort_values(["award_year", "detailed_title_cols"])

In [307]:
outcomes = A1_data_prep.clean_up_columns(outcomes)

##### Version 1

In [308]:
# Drop award year
outcomes_transformed = outcomes.drop(columns=["Award Year"]).T

In [309]:
# Use the first row (the detailed titles) as the column names
outcomes_transformed.columns = outcomes_transformed.iloc[0]

In [310]:
# Drop the first row, which now duplicates the column names
outcomes_transformed = outcomes_transformed.iloc[1:]
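The transpose, header promotion, and row drop can be collapsed into one step by setting the title column as the index before transposing. A sketch on toy data with the same column labels; the values are invented.

```python
import pandas as pd

outcomes_demo = pd.DataFrame(
    {
        "Award Year": [2018, 2020],
        "Detailed Title Cols": ["2018-Project A", "2020-Project B"],
        "Estimated Tircp Ghg Reductions": [61000.0, 325000.0],
        "Increased Ridership": ["High", "Medium"],
    }
)

# Index by title first, so the titles become column names after .T
transposed = (
    outcomes_demo.drop(columns=["Award Year"])
    .set_index("Detailed Title Cols")
    .T
)
print(transposed)
```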

In [311]:
outcomes_transformed.head(1)

Detailed Title Cols,2018-#Electrify Anaheim: Changing the Transit Paradigm in Southern California-Anaheim Transportation Network,2018-Accelerating Rail Modernization and Expansion in the Capital Region-Sacramento Regional Transit District,2018-All Aboard 2018: Transforming SoCal Rail Travel-Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency,2018-Blue Line Rail Corridor Transit Enhancements-San Diego Metropolitan Transit System,2018-Building Up: LOSSAN North Improvement Program-Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency,2018-Coastal Express/Pacific Surfliner Peak Hour Service Expansion and Integration Project-Santa Barbara County Association Of Governments,2018-Diesel Multiple Unit Vehicle to Zero- or Low-Emission Vehicle Conversion and West Valley Connector Bus Rapid Transit-San Bernardino County Transportation Authority,2018-Dublin/Pleasanton Capacity Improvement and Congestion Reduction Program-Livermore Amador Valley Transit Authority,2018-Electric Blue: Electrification of City of Santa Monica's Big Blue Bus-City Of Santa Monica,2018-Extend rail service to Monterey County-Transportation Agency For Monterey County,2018-From the Desert to the Sea: Antelope Valley Transit Authority and Long Beach Transit Zero Emission Bus Initiative-Antelope Valley Transit Authority,2018-Goleta Train Depot-Santa Barbara County Association Of Governments,2018-Los Angeles City: Leading the Transformation to Zero-Emission Electric Bus Transit Service-City Of Los Angeles,2018-Los Angeles Region Transit System Integration and Modernization Program of Projects-Los Angeles County Metropolitan Transportation,2018-North State Intercity Bus System-Shasta Regional Transportation Agency,2018-Peninsula Corridor Electrification Expansion Project-Peninsula Corridor Joint Powers Board,2018-Purchase Zero Emission High Capacity Buses to Support Transbay Tomorrow and Clean Corridors Plan-Alameda Contra Costa Transit District,2018-Ride Between the Line: Enhancing Access to 
Transit in San Diego-San Diego Association Of Governments,2018-SMART Larkspur to Windsor Corridor-Sonoma-Marin Area Rail Transit District,2018-SamTrans Express Bus Pilot-San Mateo County Transit District,2018-Solano Regional Transit Improvements-Solano Transportation Authority,2018-Southern California Optimized Rail Expansion (SCORE)-Southern California Regional Rail Authority,2018-Southwest Fresno Community Connector-City Of Fresno,2018-The Northern California Corridor Enhancement Program-Capitol Corridor Joint Powers Authority,2018-The Transbay Corridor Core Capacity Program: Vehicle Acquistion and Communications-Based Train Control System-Bay Area Rapid Transit,2018-Transit Capacity Expansion Program-San Francisco Municipal Transportation Agency,"2018-VTA’s BART Silicon Valley Extension, Phase II-Santa Clara Valley Transportation Authority",2018-Valley Rail-San Joaquin Joint Powers Authority,2020-Building Up Control: LOSSAN Service Enhancement Program-Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency,2020-Core Capacity Program-San Francisco Municipal Transportation Agency,2020-Expansion of WETA Ferry Services-San Francisco Bay Area Water Emergency Transportation Authority,"2020-For People, Place and Planet: Connecting Inglewood to Regional Opportunities-Santa Monica Big Blue Bus","2020-Improving Air Quality & Economic Growth with Electric Buses in Merced County, the Gateway to Yosemite-Transit Joint Powers Authority Of Merced County",2020-Inglewood Transit Connector Project-City Of Inglewood,2020-LBT/UCLA Electric Commuter Express-Long Beach Transit,2020-Light Rail Modernization and Expansion of Low-Floor Fleet-Sacramento Regional Transit District,2020-Metrolink Antelope Valley Line Capital and Service Improvements-Los Angeles County Metropolitan Transportation,2020-North State Intercity Bus System-Lake Transit Authority,"2020-Reaching the Most Transit-Vulnerable: AVTA's Zero Emission ""Microtransit"" & Bus Expansion Proposal-Antelope Valley Transit 
Authority",2020-SDConnect: San Diego Rail Improvement Program-San Diego Association Of Governments,2020-Sacramento Valley Station (SVS) Transit Center-Capitol Corridor Joint Powers Authority,2020-Solano Regional Transit Improvements Phase 2-Solano Transportation Authority,2020-The Transbaby Corridor Core Capacity Program: Vehicle Acquisition-Bay Area Rapid Transit,2020-Torrance Transit Bus Service Enhancement Program-Torrance Transit Department,2020-West Valley Connector Bus Rapid Transit Phase 1 & ZEB Initiative-San Bernardino County Transportation Authority,2022-ATN FAST (Family of Advanced Solutions for Transit): Revolutionizing Transit for a Global Audience-Anaheim Transportation Network,2022-City of Wasco Improving Air Quality and Economic Growth with Bus Electrification-City Of Wasco,2022-East Bay Transit-Oriented Development Mobility Enhancement Project-Bay Area Rapid Transit,2022-Expanding Transit Services and Introducing Zero-Emission Fleets on California’s North Coast-Humboldt Transit Authority,2022-Fleet Modernization Project-Sacramento Regional Transit District,2022-Fresno County Rural Transit Agency Resiliency Hub-Fresno County Rural Transit Agency,2022-I-680 Express Bus Program-Contra Costa Transportation Authority,2022-Los Angeles Nextgen and Zero Emission Bus Implementation Project-Los Angeles County Metropolitan Transportation,2022-Making a Beeline for Electrification - City of Glendale and Arroyo Verdugo Communities Zoom towards Cleaner Transportation-City Of Glendale,2022-Metrolink Perris Valley Line Capacity Improvements-Southern California Regional Rail Authority,2022-Next Wave: Expanding MTD's Electric Legacy on the South Coast-Santa Barbara Metropolitan Transit District,2022-Oakland Waterfront Mobility Hub-City Of Oakland,2022-SFMTA Core Capacity Program-San Francisco Municipal Transportation Agency,2022-SURF! 
Busway and Bus Rapid Transit-Monterey-Salinas Transit District,2022-Sacramento Valley Station (SVS) Transit Center: Priority Project-Capitol Corridor Joint Powers Authority,2022-San Francisco Zero Emissions High-Frequency Ferry Network-San Francisco Bay Area Water Emergency Transportation Authority,2022-Sonoma Regional Bus and Rail Connectivity Improvements-Sonoma County Transportation Authority,2022-South Bay Microtransit Expansion-City Of Cupertino,"2022-Sweet Home Antelope Valley, Where the Skies are so Blue-Antelope Valley Transit Authority",2022-The Regional Connectivity Improvement Bus Program-City Of Torrance,2022-Tulare Cross-Valley Corridor ZEB Expansion-Tulare County Regional Transit Agency,2022-Valley Rail Expansion: Altamont Corridor Express (ACE) Ceres to Turlock Extension-San Joaquin Regional Rail Commission,2022-Zero-Emission Transit Enhancement Project-San Diego Metropolitan Transit System
Estimated Tircp Ghg Reductions,61000.0,234000.0,957000.0,68000.0,1160000.0,7000.0,67000.0,0.0,17000.0,81000.0,23000.0,73000.0,196000.0,7966000.0,26000.0,737000.0,14000.0,7000.0,134000.0,47000.0,138000.0,5714000.0,9000.0,1348000.0,4272000.0,156000.0,4063000.0,4369000.0,325000.0,369000.0,41000.0,18000.0,31000.0,772000.0,9000.0,85000.0,584000.0,14000.0,12000.0,34000.0,39000.0,125000.0,2495000.0,30000.0,33000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### Outputs: Measures except GHG Reductions.

In [321]:
outcomes_melt = pd.melt(
    outcomes,
    id_vars=["Award Year", "Detailed Title Cols"],
    value_vars=[
        "Cost Per Ghg Ton Reduced",
        "Increased Ridership",
        "Service Integration",
        "Improve Safety",
    ],
)

In [322]:
outcomes_melt = A1_data_prep.clean_up_columns(outcomes_melt)

In [323]:
year_summary = (outcomes_melt
                .groupby(['Award Year','Variable', 'Value'])
                .agg({'Detailed Title Cols':'nunique'})
                .rename(columns = {'Detailed Title Cols':
                                   'Number of Projects in this Value Category'}) 
               )
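The melt-then-group pattern is easier to follow on a small frame: melting stacks each measure into variable/value pairs, and the group-by counts distinct projects per year, measure, and rating. The toy ratings below are invented, and the sketch keeps pandas' default `variable` and `value` column names rather than the cleaned-up labels.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Award Year": [2018, 2018, 2020],
        "Detailed Title Cols": ["2018-A", "2018-B", "2020-C"],
        "Increased Ridership": ["High", "High", "Medium"],
        "Improve Safety": ["Medium", "High", "High"],
    }
)

# Long format: one row per project and measure
melted = pd.melt(
    demo,
    id_vars=["Award Year", "Detailed Title Cols"],
    value_vars=["Increased Ridership", "Improve Safety"],
)

# Number of distinct projects in each year / measure / rating combination
summary = melted.groupby(["Award Year", "variable", "value"])[
    "Detailed Title Cols"
].nunique()
print(summary)
```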

In [324]:
year_summary

Award Year,Variable,Value,Number of Projects in this Value Category
2018,Cost Per Ghg Ton Reduced,High,16
2018,Cost Per Ghg Ton Reduced,Medium,3
2018,Cost Per Ghg Ton Reduced,Medium-High,8
2018,Cost Per Ghg Ton Reduced,,1
2018,Improve Safety,High,9
2018,Improve Safety,Medium,12
2018,Improve Safety,Medium-High,7
2018,Increased Ridership,High,13
2018,Increased Ridership,Medium,10
2018,Increased Ridership,Medium-High,5


##### GHG Reductions.

In [327]:
GHG_by_year = outcomes.groupby(['Award Year']).agg({
       'Estimated Tircp Ghg Reductions':'sum'}) 

In [328]:
GHG_by_year

Award Year,Estimated Tircp Ghg Reductions
2018,31944000.0
2020,5016000.0
2022,0.0


#### Save

In [330]:
with pd.ExcelWriter(f"{GCS_FILE_PATH}calsta_draft.xlsx") as writer:
    outcomes.to_excel(writer, sheet_name="outcomes_unpivoted", index=True)
    outcomes_transformed.to_excel(writer, sheet_name="outcomes_transformed", index=True)
    projects.to_excel(writer, sheet_name="projects", index=True)
    year_summary.to_excel(writer, sheet_name="year_summary", index=True)
    GHG_by_year.to_excel(writer, sheet_name="GHG_reduction_year", index=True)