# Cleaning CSV Agency names to match Official Locode List

This notebook uses the E-76 Obligated List from the Division of Local Assistance (DLA) website and the Official Locode List obtained from DLA.

In this notebook we change the names of certain agencies to match the official name in the Locode List, starting with the agencies below.


| Locode | Agency Official Name | Agency name in Obligated List | 
| ------------- | ------------- | ------------- |
| 5940 | Mariposa County | Humboldt, Mariposa| 
| 5953 | Los Angeles County | Los Angeles, Marin County, Paradise, Trinity County  |
| 5957 | San Diego County | San Diego, Cathedral City|
| 6190 | U.S. Forest Service, Pacific Southwest Region | Usda Forest Ser, United States Forest Service|
| 5939 | Merced County | Merced | 
| 5956 | Riverside County | Riverside |
| 5958 | Imperial County | Imperial | 
| 5961 | Kern County (District 9) | Kern |
| 6000 | San Francisco Bay Area Rapid Transit District | Bay Area Rt | 
| 6002 | Alameda - Contra Costa Transit District | Ala-Con Costa T | 
| 6065 | Los Angeles County Metropolitan Transportation... | La Co M T A |
| 6081 | Department of Parks and Recreation | Parks And Rec | 
| 6264 | Santa Clara Valley Transportation Authority | Vta |
| 6343 | Marin County Transit District | Mctd | 
| 6365 | San Francisco Bay Area Water Transit Authority | Wta |

In [1]:
import pandas as pd
from siuba import *  # provides _, filter, count and the >> pipe used below

import numpy as np

from datetime import date
from IPython.display import Markdown, HTML, display, display_html

from calitp import *

import ipywidgets as widgets
from ipywidgets import *

In [2]:
df = pd.read_csv('gs://calitp-analytics-data/data-analyses/dla/e-76Obligated/clean_obligated_waiting.csv', low_memory=False).drop('Unnamed: 0', axis=1)



In [3]:
df.head()

Unnamed: 0,location,prefix,project_no,agency,prepared_date,submit__to_hq_date,hq_review_date,submit_to_fhwa_date,to_fmis_date,fed_requested,...,project_location,type_of_work,seq,date_request_initiated,date_completed_request,mpo,warning,projectID,projectNO,compare_id_locode
0,Obligated,BPMPL,5904(121),Humboldt County,2018-12-18,2018-12-18,2018-12-18,2018-12-18,2018-12-27,0.0,...,14 Bridges In Humboldt County,Bridge Preventive Maintenance - Deck Joints,3,,,NON-MPO,,5904,121,True
1,Obligated,ER,32D0(008),Mendocino County,2018-12-17,2018-12-19,2018-12-20,2018-12-20,2018-12-27,11508.0,...,"Comptche Ukiah Road, Cr 223 Pm 17.25",Permanent Restoration,3,2018-12-17,2018-12-18,NON-MPO,,32D0,8,False
2,Obligated,ER,4820(004),Humboldt County,2018-12-07,2018-12-21,2018-12-21,2018-12-21,2018-12-27,45499.64,...,Mattole Rd Pm 43.17,Permanent Restoration,5,2018-12-06,2018-12-07,NON-MPO,,4820,4,False
3,Obligated,CML,5924(244),Sacramento County,2018-12-11,2018-12-11,2018-12-21,2018-12-27,2018-12-27,207002.0,...,Fair Oaks Blvd. Between Howe Ave And Munroe St,Create A Smart Growth Corridor With Barrier Se...,1,2018-12-07,2018-12-07,SACOG,,5924,244,True
4,Obligated,CML,5924(214),Sacramento County,2018-12-05,2018-12-11,2018-12-21,2018-12-27,2018-12-27,0.0,...,Florin Rd Between Power Inn Rd. And Florin Per...,Streetscape (tc),3,2018-11-28,2018-12-04,SACOG,,5924,214,True


In [4]:
# Number of unique agency names before any renaming

df.agency.nunique()

671

## Replace the Agency Names
Code help: https://stackoverflow.com/questions/53410748/change-column-value-based-on-multiple-conditions
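Each subsection in this part of the notebook applies the same pattern: select rows by `locode` and by the name used in the Obligated List, then assign the official name. As a rough sketch only, the same idea can be driven from a single mapping. The names `AGENCY_FIXES` and `apply_agency_fixes` are hypothetical, the toy frame stands in for the real data, and `locode` is assumed to be stored as a string, as the comparisons in this notebook suggest.

```python
import pandas as pd

# Hypothetical mapping: (locode, name in Obligated List) -> official name.
# Only a few pairs from the table at the top of the notebook are shown.
AGENCY_FIXES = {
    ("5940", "Mariposa"): "Mariposa County",
    ("5953", "Los Angeles"): "Los Angeles County",
    ("6190", "Usda Forest Ser"): "U.S. Forest Service, Pacific Southwest Region",
    ("6190", "United States Forest Service"): "U.S. Forest Service, Pacific Southwest Region",
}


def apply_agency_fixes(frame, fixes):
    """Return a copy of frame with agency renamed where (locode, agency) matches."""
    out = frame.copy()
    for (locode, old_name), new_name in fixes.items():
        # Match on both columns so the same name under another locode is untouched.
        mask = (out["locode"] == locode) & (out["agency"] == old_name)
        out.loc[mask, "agency"] = new_name
    return out


# Toy data standing in for the real obligated list (assumption).
toy = pd.DataFrame(
    {
        "locode": ["5940", "5940", "5940", "5953", "6190"],
        "agency": [
            "Mariposa",
            "Mariposa County",
            "Humboldt",
            "Los Angeles",
            "Usda Forest Ser",
        ],
    }
)
fixed = apply_agency_fixes(toy, AGENCY_FIXES)
print(fixed)
```

Matching on the pair rather than on the name alone is what leaves the `Humboldt` row under 5940 unchanged, which mirrors the behavior of the individual `df.loc` cells.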

### Mariposa County 5940

In [5]:
df>>filter(_.locode=='5940')>>count(_.agency)

Unnamed: 0,agency,n
0,Humboldt,1
1,Mariposa,4
2,Mariposa County,37


In [6]:
df.loc[(df['locode']=='5940') & (df['agency']=='Mariposa'),'agency'] = 'Mariposa County'

In [7]:
df>>filter(_.locode=='5940')>>count(_.agency)

Unnamed: 0,agency,n
0,Humboldt,1
1,Mariposa County,41


### Los Angeles County 5953

In [8]:
df>>filter(_.locode=='5953')>>count(_.agency)

Unnamed: 0,agency,n
0,Los Angeles,12
1,Los Angeles County,404
2,Marin County,1
3,Paradise,4
4,Trinity County,3


In [9]:
df.loc[(df['locode']=='5953') & (df['agency']=='Los Angeles'),'agency'] = 'Los Angeles County'

In [10]:
df>>filter(_.locode=='5953')>>count(_.agency)

Unnamed: 0,agency,n
0,Los Angeles County,416
1,Marin County,1
2,Paradise,4
3,Trinity County,3


### San Diego County 5957

In [11]:
df>>filter(_.locode=='5957')>>count(_.agency)

Unnamed: 0,agency,n
0,Cathedral City,1
1,San Diego,6
2,San Diego County,75


In [12]:
df.loc[(df['locode']=='5957') & (df['agency']=='San Diego'),'agency'] = 'San Diego County'

### U.S. Forest Service, Pacific Southwest Region 6190

In [13]:
df>>filter(_.locode=='6190')>>count(_.agency)

Unnamed: 0,agency,n
0,"U.S. Forest Service, Pacific Southwest Region",2
1,United States Forest Service,1
2,Usda Forest Ser,4


In [14]:
df.loc[(df['locode']=='6190') & (df['agency']=='United States Forest Service'),'agency'] = 'U.S. Forest Service, Pacific Southwest Region'
df.loc[(df['locode']=='6190') & (df['agency']=='Usda Forest Ser'),'agency'] = 'U.S. Forest Service, Pacific Southwest Region'


### Merced County 5939

In [15]:
df>>filter(_.locode=='5939')>>count(_.agency)

Unnamed: 0,agency,n
0,Merced,4
1,Merced County,88


In [16]:
df.loc[(df['locode']=='5939') & (df['agency']=='Merced'),'agency'] = 'Merced County'

### Riverside County 5956


In [17]:
df>>filter(_.locode=='5956')>>count(_.agency)

Unnamed: 0,agency,n
0,Riverside,1
1,Riverside County,103


In [18]:
df.loc[(df['locode']=='5956') & (df['agency']=='Riverside'),'agency'] = 'Riverside County'

### Imperial County 5958

In [19]:
df>>filter(_.locode=='5958')>>count(_.agency)

Unnamed: 0,agency,n
0,Imperial,4
1,Imperial County,59


In [20]:
df.loc[(df['locode']=='5958') & (df['agency']=='Imperial'),'agency'] = 'Imperial County'

### Kern County (District 9) 5961

In [21]:
df>>filter(_.locode=='5961')>>count(_.agency)

Unnamed: 0,agency,n
0,Kern,2
1,Kern County (District 9),17


In [22]:
df.loc[(df['locode']=='5961') & (df['agency']=='Kern'),'agency'] = 'Kern County (District 9)'

### San Francisco Bay Area Rapid Transit District 6000

In [23]:
df>>filter(_.locode=='6000')>>count(_.agency)

Unnamed: 0,agency,n
0,Bay Area Rt,1
1,San Francisco Bay Area Rapid Transit District,40


In [24]:
df.loc[(df['locode']=='6000') & (df['agency']=='Bay Area Rt'),'agency'] = 'San Francisco Bay Area Rapid Transit District'

### Alameda - Contra Costa Transit District 6002

In [25]:
df>>filter(_.locode=='6002')>>count(_.agency)

Unnamed: 0,agency,n
0,Ala-Con Costa T,1
1,Alameda - Contra Costa Transit District,6


In [26]:
df.loc[(df['locode']=='6002') & (df['agency']=='Ala-Con Costa T'),'agency'] = 'Alameda - Contra Costa Transit District'

### Los Angeles County Metropolitan Transportation Authority 6065

In [27]:
df>>filter(_.locode=='6065')>>count(_.agency)

Unnamed: 0,agency,n
0,La Co M T A,1
1,Los Angeles County Metropolitan Transportation...,84


In [28]:
df.loc[(df['locode']=='6065') & (df['agency']=='La Co M T A'),'agency'] = 'Los Angeles County Metropolitan Transportation Authority'


### Department of Parks and Recreation 6081

In [29]:
df>>filter(_.locode=='6081')>>count(_.agency)

Unnamed: 0,agency,n
0,Department Of Parks And Recreation,10
1,Parks And Rec,1


In [30]:
df.loc[(df['locode']=='6081') & (df['agency']=='Parks And Rec'),'agency'] = 'Department Of Parks And Recreation'

### Santa Clara Valley Transportation Authority 6264

In [31]:
df>>filter(_.locode=='6264')>>count(_.agency)

Unnamed: 0,agency,n
0,Santa Clara Valley Transportation Authority,50
1,Vta,1


In [32]:
df.loc[(df['locode']=='6264') & (df['agency']=='Vta'),'agency'] = 'Santa Clara Valley Transportation Authority'

### Marin County Transit District 6343


In [33]:
df>>filter(_.locode=='6343')>>count(_.agency)

Unnamed: 0,agency,n
0,Marin County Transit District,1
1,Mctd,1


In [34]:
df.loc[(df['locode']=='6343') & (df['agency']=='Mctd'),'agency'] = 'Marin County Transit District'

### San Francisco Bay Area Water Transit Authority 6365

In [35]:
df>>filter(_.locode=='6365')>>count(_.agency)

Unnamed: 0,agency,n
0,San Francisco Bay Area Water Transit Authority,1
1,Wta,2


In [36]:
df.loc[(df['locode']=='6365') & (df['agency']=='Wta'),'agency'] = 'San Francisco Bay Area Water Transit Authority'

# Count Agencies

In [37]:
df.agency.nunique()

662
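The count falls from 671 to 662, a drop of 9, although 16 name variants were replaced in the cells above. One possible explanation, not verified against the full data here, is that some short names also occur under other locodes and therefore remain in the column; the table of remaining agencies lists `Los Angeles` under locode 7500, for example. The toy data below is an assumption made purely to illustrate that effect.

```python
import pandas as pd

# Toy frame: the short name "Los Angeles" appears under two different locodes.
toy = pd.DataFrame(
    {
        "locode": ["5953", "5953", "7500"],
        "agency": ["Los Angeles", "Los Angeles County", "Los Angeles"],
    }
)
before = toy["agency"].nunique()

# Rename only the rows under locode 5953, as in the cells above.
mask = (toy["locode"] == "5953") & (toy["agency"] == "Los Angeles")
toy.loc[mask, "agency"] = "Los Angeles County"
after = toy["agency"].nunique()

# The short name survives under 7500, so the unique count does not change.
print(before, after)
```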

## To CSV

In [38]:
#df.to_csv('obligated_list_1202.csv',index=False)

# Agencies/Locodes Remaining 

| Locode | Agency in Locode List | Agencies to Change |
| ------------- | ------------- | ------------- |
| 5953 | Los Angeles County | Marin County, Paradise, Trinity County |
| 5903 | Modoc County | Alpine County, Monterey County, Nevada County |
| 5954 | San Bernardino County | Tehama County, Yucaipa |
| 5916 | Yuba County | Shasta County, Tuolumne County |
| 5940 | Mariposa County | Humboldt |
| 5957 | San Diego County | Cathedral City |
| 5020 | Yreka City | Sonoma County |
| 5275 | Indio | Palm Springs |
| 5351 | Pico Rivera | Los Angeles County |
| 5391 | Morro Bay | Ora Co Trans Au |
| 5463 | Calabasas | Calaveras |
| 5912 | Butte County | Santa Barbara County |
| 5921 | Napa County | Shasta County |
| 5930 | Calaveras County | Los Angeles County |
| 5936 | Santa Cruz County | Monterey County |
| 7500 | NaN | Banning, Fowler, Lancaster, Los Angeles, Palmdale, Richmond, San Luis Obispo, San Mateo, Sgvc, Stockton, Sutter, Ventura |
| 40A0 | NaN | Mendocino, San Bernardino, Santa Cruz |
| NBIL | NaN | La Quinta, Yucaipa |
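A list like the one above could be produced by looking for locodes that still carry more than one agency name. The following is a minimal sketch under stated assumptions: the helper name `locodes_with_multiple_agencies` is hypothetical, the toy frame stands in for the real data, and the columns are assumed to be named `locode` and `agency` as in this notebook. It does not consult the Official Locode List, so deciding which name is correct would remain a manual step.

```python
import pandas as pd


def locodes_with_multiple_agencies(frame):
    """Return one row per locode that maps to more than one agency name."""
    grouped = frame.groupby("locode")["agency"]
    summary = pd.DataFrame(
        {
            # Number of distinct non-null agency names per locode.
            "n_agencies": grouped.nunique(),
            # Distinct names joined into one string, skipping missing values.
            "agencies": grouped.unique().map(
                lambda names: ", ".join(sorted(n for n in names if pd.notna(n)))
            ),
        }
    )
    return summary[summary["n_agencies"] > 1].reset_index()


# Toy data standing in for the cleaned obligated list (assumption).
toy = pd.DataFrame(
    {
        "locode": ["5957", "5957", "5939", "5939"],
        "agency": ["San Diego County", "Cathedral City", "Merced County", "Merced County"],
    }
)
remaining = locodes_with_multiple_agencies(toy)
print(remaining)
```

Locodes with a single consistent name, such as 5939 in the toy data, drop out of the result, which leaves only the candidates that still need review.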
