# TIRCP Calsta
* TIRCP outcomes for cycles 3-5 for California State Transportation Agency. 
* [Cycles 1-6](https://calsta.ca.gov/subject-areas/transit-intercity-rail-capital-prog)
* Cycle 1: 2015
* Cycle 2: 2016
* Cycle 3: 2018
* CYcle 4: 2020
* Cycle 5: 2022

In [157]:
# Normal packages
# String Manipulation
import re
from collections import Counter

# My utilities
import A1_data_prep
import A2_tableau
import A7_zev
import nltk
import numpy as np
import pandas as pd

# Formatting
from babel.numbers import format_currency
from calitp import *
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk import ngrams

In [120]:
pd.options.display.max_columns = 100
pd.options.display.float_format = "{:.2f}".format
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

In [121]:
df_tircp = to_snakecase(A2_tableau.tableau_dashboard())

  warn(msg)


In [122]:
df_tircp2 = (df_tircp.loc[df_tircp["award_year"] >= 2018].reset_index(drop = True))

In [123]:
df_tircp2.award_year.value_counts(), len(df_tircp2)

(2018    28
 2022    23
 2020    17
 Name: award_year, dtype: int64,
 68)

In [124]:
# Change the description to lower case
# so string search will be more accurate
df_tircp2["description"] = df_tircp2["description"].str.lower()

# Some numbers are spelled out: replace them
# Replace numbers that are written out into integers
df_tircp2["description"] = (
    df_tircp2["description"]
    .str.replace("two", "2")
    .str.replace("three", "3")
    .str.replace("four", "4")
    .str.replace("five", "5")
    .str.replace("six", "6")
    .str.replace("seven", "7")
    .str.replace("eight", "8")
    .str.replace("nine", "9")
    .str.replace("eleven", "11")
    .str.replace("fifteen", "15")
    .str.replace("twenty", "20")
)

In [146]:
# Extract numbers from description into a new column
# cast as float, fill in NA with 0
df_tircp2["number"] = (
        df_tircp2["description"].str.extract("(\d+)").astype("float64").fillna(0)
    )


In [147]:
df_tircp2[['award_year','title','description','number']]

Unnamed: 0,award_year,title,description,number
0,2018,Purchase Zero Emission High Capacity Buses to Support Transbay Tomorrow and Clean Corridors Plan,zero-emission buses for service expansion,0.0
1,2018,#Electrify Anaheim: Changing the Transit Paradigm in Southern California,"deploys 40 zero-emission electric buses to double service levels on up to 8 routes, add 2 new routes; implements a new circulator/on-demand first-mile/last-mile service; and construction of a new maintenance facility with solar canopy structures.",40.0
2,2018,From the Desert to the Sea: Antelope Valley Transit Authority and Long Beach Transit Zero Emission Bus Initiative,"deploys 7 zero-emission battery electric buses and upgrades charging infrastructure serving avta local and commuter bus routes, bringing the entire avta system to fully electric status (the first in the nation) by 2019; deploys 5 zero-emission battery electric buses and related infrastructure for long beach transit services. increased frequency on up to 5 local and community transit routes operated by lbt.",7.0
3,2018,The Transbay Corridor Core Capacity Program: Vehicle Acquistion and Communications-Based Train Control System,"deploys 272 new rail vehicles and completes a communication-based train control system (cbtc), allowing an increase in train frequency to 30 trains per hour through the transbay tunnel as well as an increase in train length to 10 car trains during peak hours to alleviate crowing. allows for over 200,00 new riders per day to ride bart.",272.0
4,2018,The Northern California Corridor Enhancement Program,"funds the statewide service and ticket integration effort, allowing riders on at least 10 different rail and transit systems to plan travel and purchase tickets in a single, seamless transaction. \n\nre-route ccjpa service, allowing 13 minutes of travel time savings between oakland and san jose.\n\nne2rk integration\n\nnew connections to dumbarton bridge services.",10.0
5,2018,Southwest Fresno Community Connector,purchase of 6 zero-emission battery-electric buses and the construction of charging infrastructure to allow extension of 15-min service connecting southwest fresno to the northern part of fresno and creating a new route providing access to job centers.,6.0
6,2018,Los Angeles City: Leading the Transformation to Zero-Emission Electric Bus Transit Service,"acquire 112 zero-emission buses to replace existing propane vehicles and add new vehicles, in order to increase frequency of all existing dash routes to 15-minute service and add 4 new routes, serving communities throughout the city of los angeles as recommended in the comprehensive transit service analysis.",112.0
7,2018,Electric Blue: Electrification of City of Santa Monica's Big Blue Bus,construction- purchase 10- 40 foot battery electric buses,10.0
8,2018,Dublin/Pleasanton Capacity Improvement and Congestion Reduction Program,"increase bart ridership through\nconstruction of a new multi-level parking structure to create over 500 additional parking spaces, including prioritized vanpool parking, at the dublin-pleasanton bart station.",500.0
9,2018,Los Angeles Region Transit System Integration and Modernization Program of Projects,"capital improvements that will broaden and modernize transit connectivity in los angeles county and the southern california region by advancing new transit corridors simultaneously: gold line light rail extension to montclair, east san fernando valley transit corridor, west santa ana light rail transit corridor, green line light rail extension to torrance, and the orange/red line to gold line bus rapid transit connector (north hollywood to pasadena). includes support for the development of a vermont transit corridor project and regional ne2rk integration with metrolink, amtrak, and additional transit services. projects will add over 120,00 additional riders per day by 2028.",2.0


In [148]:
# Natalie's function
def get_list_of_words(df, col):

    # get just the one col
    column = df[[col]]
    # remove single-dimensional entries from the shape of an array
    col_text = column.squeeze()
    # get list of words
    text_list = col_text.tolist()
    # join list of words
    text_list = " ".join(text_list).lower()

    # remove punctuation
    text_list = re.sub(r"[^\w\s]", "", text_list)
    swords = [re.sub(r"[^A-z\s]", "", sword) for sword in stopwords.words("english")]
    # remove stopwords
    clean_text_list = [
        word for word in word_tokenize(text_list.lower()) if word not in swords
    ]

    return clean_text_list

In [167]:
# https://stackoverflow.com/questions/64593557/how-to-find-most-common-word-from-the-entire-column-of-string-in-python
descriptions_list = get_list_of_words(df_tircp2, "description")

In [150]:
counter = Counter()  # Initializing a counter variable

In [151]:
for tag in descriptions_list:
    split_string = re.findall(r"\w+", tag)
    counter.update(split_string)

In [152]:
most_common = counter.most_common

In [153]:
# Turn results into a dataframe
# https://stackoverflow.com/questions/31111032/transform-a-counter-object-into-a-pandas-dataframe
most_common_dictionary = Counter(
    {
        "service": 63,
        "transit": 52,
        "new": 51,
        "rail": 40,
        "buses": 37,
        "bus": 36,
        "station": 34,
        "project": 32,
        "zeroemission": 27,
        "construction": 24,
        "line": 24,
        "services": 23,
        "purchase": 22,
        "electric": 21,
        "includes": 21,
        "infrastructure": 20,
        "improvements": 16,
        "county": 16,
        "corridor": 16,
        "vehicles": 15,
        "train": 15,
        "san": 15,
        "santa": 15,
        "routes": 14,
        "facility": 14,
        "charging": 14,
        "including": 14,
        "also": 14,
        "center": 14,
        "trains": 13,
        "expansion": 12,
        "frequency": 12,
        "system": 11,
        "allow": 11,
        "route": 11,
        "capacity": 11,
        "downtown": 11,
        "increase": 10,
        "integration": 10,
        "existing": 10,
        "city": 10,
        "additional": 10,
        "passenger": 10,
        "emission": 10,
        "sacramento": 10,
        "bart": 9,
        "light": 9,
        "valley": 9,
        "support": 9,
        "regional": 9,
        "fleet": 9,
        "zero": 9,
        "expand": 9,
        "layover": 9,
        "la": 9,
        "2": 8,
        "serving": 8,
        "local": 8,
        "increased": 8,
        "well": 8,
        "extension": 8,
        "access": 8,
        "los": 8,
        "angeles": 8,
        "ridership": 8,
        "development": 8,
        "microtransit": 8,
        "key": 8,
        "provide": 8,
        "battery": 7,
        "allowing": 7,
        "10": 7,
        "travel": 7,
        "network": 7,
        "connecting": 7,
        "4": 7,
        "improve": 7,
        "3": 7,
        "along": 7,
        "improved": 7,
        "barbara": 7,
        "two": 7,
        "housing": 7,
        "associated": 7,
        "commuter": 6,
        "per": 6,
        "peak": 6,
        "riders": 6,
        "time": 6,
        "centers": 6,
        "communities": 6,
        "metrolink": 6,
        "projects": 6,
        "pacific": 6,
        "surfliner": 6,
        "program": 6,
        "funding": 6,
        "lowfloor": 6,
        "priority": 6,
        "future": 6,
        "stations": 6,
        "track": 6,
        "connect": 6,
        "design": 6,
        "1": 6,
        "add": 5,
        "implements": 5,
        "maintenance": 5,
        "7": 5,
        "5": 5,
        "connections": 5,
        "california": 5,
        "corridors": 5,
        "amtrak": 5,
        "signal": 5,
        "study": 5,
        "facilities": 5,
        "multiple": 5,
        "operations": 5,
        "ceres": 5,
        "area": 5,
        "lanes": 5,
        "provides": 5,
        "airport": 5,
        "purchases": 5,
        "three": 5,
        "inglewood": 5,
        "hydrogen": 5,
        "el": 5,
        "muni": 5,
        "bay": 5,
        "street": 5,
        "deploys": 4,
        "8": 4,
        "solar": 4,
        "related": 4,
        "control": 4,
        "30": 4,
        "funds": 4,
        "jose": 4,
        "6": 4,
        "creating": 4,
        "replace": 4,
        "parking": 4,
        "capital": 4,
        "gold": 4,
        "green": 4,
        "performance": 4,
        "coaster": 4,
        "longer": 4,
        "express": 4,
        "investment": 4,
        "investments": 4,
        "trolley": 4,
        "9": 4,
        "efforts": 4,
        "implement": 4,
        "one": 4,
        "improving": 4,
        "metro": 4,
        "entertainment": 4,
        "components": 4,
        "cities": 4,
        "cajon": 4,
        "reliability": 4,
        "include": 4,
        "transportation": 4,
        "levels": 3,
        "upgrades": 3,
        "entire": 3,
        "beach": 3,
        "operated": 3,
        "completes": 3,
        "hour": 3,
        "hours": 3,
        "day": 3,
        "statewide": 3,
        "effort": 3,
        "13": 3,
        "batteryelectric": 3,
        "fresno": 3,
        "providing": 3,
        "create": 3,
        "connectivity": 3,
        "torrance": 3,
        "rapid": 3,
        "right": 3,
        "way": 3,
        "diego": 3,
        "relocation": 3,
        "supports": 3,
        "allelectric": 3,
        "increases": 3,
        "expanded": 3,
        "award": 3,
        "platform": 3,
        "boarding": 3,
        "unit": 3,
        "set": 3,
        "would": 3,
        "conversion": 3,
        "acquisition": 3,
        "stops": 3,
        "bike": 3,
        "university": 3,
        "blue": 3,
        "imperial": 3,
        "current": 3,
        "ace": 3,
        "ventura": 3,
        "proposed": 3,
        "significant": 3,
        "shuttle": 3,
        "serve": 3,
        "improvement": 3,
        "opportunities": 3,
        "intercity": 3,
        "pedestrian": 3,
        "extending": 3,
        "tracks": 3,
        "orange": 3,
        "connection": 3,
        "district": 3,
        "high": 3,
        "metros": 3,
        "sonoma": 3,
        "rosa": 3,
        "fuel": 3,
        "extended": 3,
        "targeted": 3,
        "bidirectional": 3,
        "mobility": 3,
        "efficiency": 3,
        "terminals": 3,
        "currently": 3,
        "phase": 3,
        "forward": 3,
        "south": 3,
        "merced": 3,
        "better": 3,
        "francisco": 3,
        "ferry": 3,
        "svs": 3,
        "implementation": 3,
        "glendale": 3,
        "contactless": 3,
        "payment": 3,
        "smart": 3,
        "turlock": 3,
        "40": 2,
        "double": 2,
        "canopy": 2,
        "avta": 2,
        "bringing": 2,
        "fully": 2,
        "long": 2,
        "community": 2,
        "transbay": 2,
        "car": 2,
        "allows": 2,
        "least": 2,
        "systems": 2,
        "plan": 2,
        "seamless": 2,
        "oakland": 2,
        "15min": 2,
        "part": 2,
        "job": 2,
        "acquire": 2,
        "order": 2,
        "throughout": 2,
        "comprehensive": 2,
        "dublinpleasanton": 2,
        "southern": 2,
        "advancing": 2,
        "montclair": 2,
        "east": 2,
        "west": 2,
        "connector": 2,
        "north": 2,
        "hollywood": 2,
        "2028": 2,
        "ontime": 2,
        "investing": 2,
        "optimization": 2,
        "robust": 2,
        "fencing": 2,
        "prepare": 2,
        "higher": 2,
        "maintenancelayover": 2,
        "cars": 2,
        "accommodate": 2,
        "improves": 2,
        "bicycle": 2,
        "expands": 2,
        "tircp": 2,
        "15": 2,
        "direction": 2,
        "lrvs": 2,
        "replacement": 2,
        "highest": 2,
        "conversions": 2,
        "efficient": 2,
        "pilot": 2,
        "operate": 2,
        "redlands": 2,
        "used": 2,
        "testing": 2,
        "equipment": 2,
        "avenue": 2,
        "addition": 2,
        "international": 2,
        "crossing": 2,
        "stockton": 2,
        "total": 2,
        "commuters": 2,
        "use": 2,
        "employment": 2,
        "oxnard": 2,
        "overtheroad": 2,
        "rider": 2,
        "reductions": 2,
        "round": 2,
        "trip": 2,
        "reduction": 2,
        "using": 2,
        "gas": 2,
        "benefits": 2,
        "multimodal": 2,
        "ab": 2,
        "1550": 2,
        "awarded": 2,
        "clean": 2,
        "extends": 2,
        "clara": 2,
        "critical": 2,
        "delivers": 2,
        "frequent": 2,
        "reconfiguration": 2,
        "union": 2,
        "clarita": 2,
        "bernardino": 2,
        "riverside": 2,
        "11": 2,
        "supportive": 2,
        "integrated": 2,
        "intermodal": 2,
        "core": 2,
        "railyards": 2,
        "plaza": 2,
        "planned": 2,
        "lines": 2,
        "sports": 2,
        "stadium": 2,
        "growth": 2,
        "areas": 2,
        "destinations": 2,
        "cell": 2,
        "contributing": 2,
        "greater": 2,
        "antelope": 2,
        "enable": 2,
        "construct": 2,
        "brt": 2,
        "pomona": 2,
        "ontario": 2,
        "rancho": 2,
        "cucamonga": 2,
        "continuing": 2,
        "santee": 2,
        "consolidation": 2,
        "j": 2,
        "upgrade": 2,
        "enhance": 2,
        "extend": 2,
        "14": 2,
        "options": 2,
        "providers": 2,
        "weta": 2,
        "contra": 2,
        "costa": 2,
        "inductive": 2,
        "reduce": 2,
        "limited": 2,
        "shoreside": 2,
        "link": 2,
        "mission": 2,
        "vans": 2,
        "ondemand": 2,
        "micro": 2,
        "augment": 2,
        "installation": 2,
        "photovoltaic": 2,
        "eastwest": 2,
        "school": 2,
        "result": 2,
        "affordable": 2,
        "vehicle": 2,
        "locations": 2,
        "offering": 2,
        "modes": 2,
        "elements": 2,
        "boulevard": 2,
        "completing": 2,
        "operators": 2,
        "application": 2,
        "busonly": 2,
        "overall": 2,
        "periods": 2,
        "fueling": 2,
        "shared": 2,
        "hub": 2,
        "offered": 2,
        "mendocino": 2,
        "tier": 2,
        "information": 2,
        "marina": 2,
        "monterey": 2,
        "faster": 2,
        "safety": 2,
        "goleta": 2,
        "petaluma": 2,
        "perrissouth": 2,
        "circulatorondemand": 1,
        "firstmilelastmile": 1,
        "structures": 1,
        "status": 1,
        "first": 1,
        "nation": 1,
        "2019": 1,
        "lbt": 1,
        "272": 1,
        "communicationbased": 1,
        "cbtc": 1,
        "tunnel": 1,
        "length": 1,
        "alleviate": 1,
        "crowing": 1,
        "20000": 1,
        "ride": 1,
        "ticket": 1,
        "different": 1,
        "tickets": 1,
        "single": 1,
        "transaction": 1,
        "reroute": 1,
        "ccjpa": 1,
        "minutes": 1,
        "savings": 1,
        "dumbarton": 1,
        "bridge": 1,
        "southwest": 1,
        "northern": 1,
        "112": 1,
        "propane": 1,
        "dash": 1,
        "15minute": 1,
        "recommended": 1,
        "analysis": 1,
        "foot": 1,
        "multilevel": 1,
        "structure": 1,
        "500": 1,
        "spaces": 1,
        "prioritized": 1,
        "vanpool": 1,
        "broaden": 1,
        "modernize": 1,
        "region": 1,
        "simultaneously": 1,
        "fernando": 1,
        "ana": 1,
        "orangered": 1,
        "pasadena": 1,
        "vermont": 1,
        "12000": 1,
        "nctd": 1,
        "caltrain": 1,
        "expanding": 1,
        "procurement": 1,
        "lengthens": 1,
        "platforms": 1,
        "wayside": 1,
        "onboard": 1,
        "wifi": 1,
        "folsom": 1,
        "combines": 1,
        "previous": 1,
        "min": 1,
        "weekdays": 1,
        "plus": 1,
        "begins": 1,
        "initial": 1,
        "20": 1,
        "accessible": 1,
        "develop": 1,
        "zemu": 1,
        "diesel": 1,
        "dmu": 1,
        "could": 1,
        "impact": 1,
        "like": 1,
        "separated": 1,
        "lane": 1,
        "otay": 1,
        "mesa": 1,
        "border": 1,
        "supplemental": 1,
        "eleven": 1,
        "60foot": 1,
        "articulated": 1,
        "modesto": 1,
        "sjjpa": 1,
        "weekday": 1,
        "madera": 1,
        "oakley": 1,
        "natomas": 1,
        "supported": 1,
        "sb": 1,
        "132": 1,
        "procument": 1,
        "zeremission": 1,
        "complements": 1,
        "counties": 1,
        "enhancing": 1,
        "commute": 1,
        "host": 1,
        "lead": 1,
        "hov": 1,
        "completed": 1,
        "estimates": 1,
        "expect": 1,
        "45minute": 1,
        "gain": 1,
        "greenhouse": 1,
        "geographic": 1,
        "diversity": 1,
        "states": 1,
        "pedestrians": 1,
        "constructing": 1,
        "modern": 1,
        "safe": 1,
        "functional": 1,
        "inviting": 1,
        "accommodates": 1,
        "shuttles": 1,
        "250000": 1,
        "address": 1,
        "identify": 1,
        "coordina": 1,
        "facchinited": 1,
        "28": 1,
        "sbcag": 1,
        "air": 1,
        "lossan": 1,
        "luis": 1,
        "obispo": 1,
        "intod": 1,
        "52000": 1,
        "2035": 1,
        "100000": 1,
        "2075": 1,
        "increasing": 1,
        "caltran": 1,
        "diridon": 1,
        "coachstyle": 1,
        "connects": 1,
        "redding": 1,
        "crossings": 1,
        "segments": 1,
        "larkspur": 1,
        "northward": 1,
        "windsor": 1,
        "healdsburg": 1,
        "cloverdale": 1,
        "reliable": 1,
        "runthough": 1,
        "movement": 1,
        "30min": 1,
        "basin": 1,
        "moorpark": 1,
        "highperformance": 1,
        "longrange": 1,
        "vision": 1,
        "gilroy": 1,
        "salinas": 1,
        "positive": 1,
        "twotrain": 1,
        "allocation": 1,
        "500000": 1,
        "networks": 1,
        "34": 1,
        "306": 1,
        "completion": 1,
        "tube": 1,
        "23": 1,
        "operation": 1,
        "10car": 1,
        "northside": 1,
        "10000": 1,
        "units": 1,
        "i5": 1,
        "northbound": 1,
        "ramp": 1,
        "16mile": 1,
        "electrically": 1,
        "powered": 1,
        "automated": 1,
        "people": 1,
        "mover": 1,
        "apm": 1,
        "passengers": 1,
        "directly": 1,
        "crenshawlax": 1,
        "regionally": 1,
        "lased": 1,
        "parksofi": 1,
        "basketball": 1,
        "ibec": 1,
        "inglewoods": 1,
        "clearlake": 1,
        "fuelcell": 1,
        "terminal": 1,
        "technology": 1,
        "range": 1,
        "coach": 1,
        "ucla": 1,
        "capacityincreasing": 1,
        "step": 1,
        "assess": 1,
        "feasibility": 1,
        "rmu": 1,
        "propulsion": 1,
        "avl": 1,
        "together": 1,
        "regular": 1,
        "60minute": 1,
        "30minute": 1,
        "restructuring": 1,
        "frequencies": 1,
        "made": 1,
        "possible": 1,
        "central": 1,
        "coast": 1,
        "overhaul": 1,
        "modernization": 1,
        "railcars": 1,
        "leverages": 1,
        "2018": 1,
        "stateoftheart": 1,
        "mile": 1,
        "stretch": 1,
        "terminate": 1,
        "relieve": 1,
        "operational": 1,
        "constraints": 1,
        "impacting": 1,
        "still": 1,
        "del": 1,
        "mar": 1,
        "bluffs": 1,
        "stabilization": 1,
        "combination": 1,
        "transitonly": 1,
        "stop": 1,
        "complementary": 1,
        "included": 1,
        "corridorsthe": 1,
        "mlines": 1,
        "near": 1,
        "term": 1,
        "build": 1,
        "advance": 1,
        "third": 1,
        "playa": 1,
        "vista": 1,
        "disadvantaged": 1,
        "integrating": 1,
        "available": 1,
        "solano": 1,
        "travelers": 1,
        "solanoexpress": 1,
        "capitol": 1,
        "sta": 1,
        "coordinated": 1,
        "napa": 1,
        "vine": 1,
        "share": 1,
        "seeking": 1,
        "ghg": 1,
        "vacaville": 1,
        "fairfieldvacaville": 1,
        "hannigan": 1,
        "fairfield": 1,
        "vallejo": 1,
        "suisun": 1,
        "walnut": 1,
        "creek": 1,
        "4x": 1,
        "downton": 1,
        "crenshaw": 1,
        "newly": 1,
        "kaiser": 1,
        "permanente": 1,
        "medical": 1,
        "western": 1,
        "portion": 1,
        "130": 1,
        "artesia": 1,
        "galleria": 1,
        "mall": 1,
        "size": 1,
        "fixed": 1,
        "intercommunity": 1,
        "rural": 1,
        "operates": 1,
        "enough": 1,
        "keep": 1,
        "demand": 1,
        "developed": 1,
        "residential": 1,
        "vessel": 1,
        "26mile": 1,
        "creates": 1,
        "ecosystem": 1,
        "offers": 1,
        "endtoend": 1,
        "solutions": 1,
        "residents": 1,
        "employees": 1,
        "global": 1,
        "audience": 1,
        "drawn": 1,
        "tourismconvention": 1,
        "summer": 1,
        "olympics": 1,
        "events": 1,
        "john": 1,
        "wayne": 1,
        "anaheim": 1,
        "neighborhoods": 1,
        "electricity": 1,
        "generation": 1,
        "zero_x0002_emission": 1,
        "partnership": 1,
        "agency": 1,
        "essential": 1,
        "lake": 1,
        "merritt": 1,
        "cerrito": 1,
        "transitoriented": 1,
        "2000": 1,
        "homes": 1,
        "built": 1,
        "miles": 1,
        "traveled": 1,
        "grow": 1,
        "neighborhood": 1,
        "vitality": 1,
        "places": 1,
        "mix": 1,
        "uses": 1,
        "households": 1,
        "income": 1,
        "interrelated": 1,
        "introduce": 1,
        "redesigned": 1,
        "facilitate": 1,
        "convenient": 1,
        "transfers": 1,
        "realignment": 1,
        "cycle": 1,
        "h": 1,
        "pickup": 1,
        "dropoff": 1,
        "loop": 1,
        "x": 1,
        "6th": 1,
        "8th": 1,
        "richards": 1,
        "midtown": 1,
        "joaquin": 1,
        "altamont": 1,
        "12": 1,
        "cupertino": 1,
        "focus": 1,
        "facilitating": 1,
        "trips": 1,
        "27": 1,
        "busses": 1,
        "reduced": 1,
        "headway": 1,
        "college": 1,
        "transition": 1,
        "arroyo": 1,
        "verdugo": 1,
        "cañada": 1,
        "flintridge": 1,
        "crescenta": 1,
        "montrose": 1,
        "deck": 1,
        "lastly": 1,
        "400": 1,
        "phone": 1,
        "suite": 1,
        "aimed": 1,
        "historic": 1,
        "waterfront": 1,
        "several": 1,
        "underserved": 1,
        "specific": 1,
        "intersection": 1,
        "safey": 1,
        "across": 1,
        "freight": 1,
        "cng": 1,
        "deployed": 1,
        "take": 1,
        "advantage": 1,
        "wascos": 1,
        "diaaride": 1,
        "availability": 1,
        "50": 1,
        "i680": 1,
        "martinez": 1,
        "pleasanton": 1,
        "bollinger": 1,
        "canyon": 1,
        "road": 1,
        "training": 1,
        "gomentum": 1,
        "parttime": 1,
        "lanestransit": 1,
        "shoulder": 1,
        "resiliency": 1,
        "backup": 1,
        "energey": 1,
        "storage": 1,
        "ev": 1,
        "doubling": 1,
        "procure": 1,
        "install": 1,
        "private": 1,
        "trinidad": 1,
        "scotia": 1,
        "ukiah": 1,
        "located": 1,
        "lowincome": 1,
        "census": 1,
        "tracts": 1,
        "eureka": 1,
        "humboldt": 1,
        "seat": 1,
        "largest": 1,
        "261": 1,
        "deploy": 1,
        "divisions": 1,
        "18": 1,
        "silver": 1,
        "many": 1,
        "agencies": 1,
        "bulbs": 1,
        "islands": 1,
        "shelters": 1,
        "realtime": 1,
        "dedicated": 1,
        "busway": 1,
        "parallel": 1,
        "highway": 1,
        "seaside": 1,
        "mst": 1,
        "tamcowned": 1,
        "branch": 1,
        "morning": 1,
        "afternoon": 1,
        "congested": 1,
        "rapidly": 1,
        "growing": 1,
        "commercial": 1,
        "hospitality": 1,
        "jobs": 1,
        "peninsula": 1,
        "resulting": 1,
        "journeys": 1,
        "optimized": 1,
        "enhancement": 1,
        "consists": 1,
        "signaling": 1,
        "rehabilitation": 1,
        "12th": 1,
        "overhead": 1,
        "division": 1,
        "speeds": 1,
        "customer": 1,
        "communications": 1,
        "mts": 1,
        "achieving": 1,
        "full": 1,
        "2040": 1,
        "vessels": 1,
        "necessary": 1,
        "treasure": 1,
        "island": 1,
        "ferries": 1,
        "rest": 1,
        "k": 1,
        "n": 1,
        "38r": 1,
        "geary": 1,
        "times": 1,
        "comfort": 1,
        "invests": 1,
        "embarcadero": 1,
        "3rd": 1,
        "location": 1,
        "delay": 1,
        "interim": 1,
        "direct": 1,
        "eight": 1,
        "zones": 1,
        "uc": 1,
        "general": 1,
        "deployment": 1,
        "racks": 1,
        "shelter": 1,
        "constructs": 1,
        "zeb": 1,
        "amenities": 1,
        "citybus": 1,
        "among": 1,
        "partners": 1,
        "authority": 1,
        "final": 1,
        "metrolinks": 1,
        "91perris": 1,
        "91pvl": 1,
        "peakperiod": 1,
        "4th": 1,
        "cp": 1,
        "eastridge": 1,
        "moreno": 1,
        "valleymarch": 1,
        "field": 1,
        "phased": 1,
        "cross": 1,
        "purchasing": 1,
        "feeder": 1,
        "16": 1,
        "selected": 1,
        "speed": 1,
    }
)

In [154]:
df_common_words = (pd.DataFrame.from_dict(most_common_dictionary, orient="index")
      .reset_index()
      .rename(columns = {'index':'word',0:'total apperance'})
     )

In [None]:
df

Unnamed: 0,word,total apperance
0,service,63
1,transit,52
2,new,51
3,rail,40
4,buses,37
5,bus,36
6,station,34
7,project,32
8,zeroemission,27
9,construction,24


## Second Try
* Find most common phrases. 
* https://stackoverflow.com/questions/60037924/how-count-the-most-frequently-repeated-phrases-in-pandas

In [200]:
c = Counter([' '.join(y) for x in [2,3,4] for y in ngrams(descriptions_list, x)])

In [201]:
df_phrases = pd.DataFrame({'phrases': list(c.keys()),
                   'total': list(c.values())})

In [202]:
len(df_phrases)

7585

In [203]:
df_phrases.sort_values('total', ascending = False).head(100)

Unnamed: 0,phrases,total
7,electric buses,13
36,charging infrastructure,11
23,construction new,10
474,bus service,9
250,light rail,9
411,zero emission,9
0,zeroemission buses,8
200,los angeles,8
979,transit center,8
552,santa barbara,7
