# **District 5 - San Luis Obispo**

<hr style="height:2px;border-width:0;color:black;background-color:black">

How does funding for local agencies differ from district to district? Using E-76 obligation data, we can gain insight into how agencies in District 5 use federal program funds and help identify DLA's core customers.

In [1]:
%%capture

import numpy as np
import pandas as pd
from siuba import *

import altair as alt
import altair_saver
from plotnine import *

from IPython.display import Markdown, HTML, Image
import ipywidgets as widgets
from ipywidgets import interact, interactive

from calitp import to_snakecase
import intake

from shared_utils import altair_utils
from shared_utils import styleguide

from dla_utils import _dla_utils


In [2]:
#Parameter Cell

district = 2

In [3]:
# Parameters
district = 5
district_title = "District 5 - San Luis Obispo"


In [4]:
df= pd.read_parquet("gs://calitp-analytics-data/data-analyses/dla/e-76Obligated/dla_df.parquet")


In [5]:
df = df >> filter(_.dist == district)

# subsetting the data
df_years = _dla_utils.count_all_years(df)
df_top = _dla_utils.find_top(df)

In [6]:
display(HTML("<h3>Quick Stats</h3>"))

display(
    HTML(
        f"There are <strong>{(df.primary_agency_name.nunique())} Unique Agencies</strong>"
    )
)

def find_transit(df):
    # Obligations flagged as transit-related, computed from the df passed in
    transit = df >> filter(_.transit == 1)

    if len(transit) == 0:
        return display(HTML(f"Out of <strong>{len(df)}</strong> obligations,"
                            f" <strong>0 are transit-related</strong>."))

    # Agency with the most transit-related obligations
    top_transit_agency = (transit >> count(_.primary_agency_name) >> arrange(-_.n)).iloc[0, 0]
    return display(
        HTML(
            f"Out of <strong>{len(df)}</strong> obligations, <strong>{len(transit)} are transit-related</strong>."
            f"<br><strong>{top_transit_agency}</strong> "
            f"has the <strong>highest transit</strong> obligations."
        ))

find_transit(df)

q = df >> count(_.primary_agency_name) >> arrange(_.n)

q2 = q.n.quantile(0.95)

display(
    HTML(
        f"There are <strong>{len(q >> filter(_.n > q2))} agencies that have over {q2:.2f}</strong> "
        f"obligations (95th percentile) since {df.prepared_y.min()}"
    )
)

q3 = q.n.quantile(0.1)
display(
    HTML(
        f"There are <strong>{len(q >> filter(_.n < q3))} agencies that have fewer than"
        f" {q3:.2f}</strong> obligations (10th percentile) since {df.prepared_y.min()}"
    )
)

## tables
display(HTML("<h4>Number of Unique Prefix Codes by Agency</h4>"))
nunique_prefix_codes = ((_dla_utils.get_nunique(df, "prefix", "primary_agency_name"))
                        .rename(columns={"primary_agency_name": "Agency", "n": "Number of Unique Prefix Codes"})
                        .head(5))
display(HTML(_dla_utils.pretify_tables(nunique_prefix_codes)))

display(HTML("<h4>Number of Unique Agencies by Prefix Codes</h4>"))
prefix_codes = ((_dla_utils.get_nunique(df, "primary_agency_name", "prefix"))
                .rename(columns={"prefix": "Prefix", "n": "Number of Unique Agencies"})
                .head(5))
display(HTML(_dla_utils.pretify_tables(prefix_codes)))


display(HTML("<h4>Top 5 Types of Work</h4>"))
work_types = (
    (df >> count(_.type_of_work) >> arrange(-_.n) >> select(_.type_of_work))
    .rename(columns={"type_of_work": "Type of Work"})
    .head(5)
)
display(HTML(_dla_utils.pretify_tables(work_types)))
# get rid of index using:
## https://stackoverflow.com/questions/24644656/how-to-print-pandas-dataframe-without-index


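#### Number of Unique Prefix Codes by Agency

| Agency | Number Of Unique Prefix Codes |
|---|---|
| San Luis Obispo County | 22 |
| Monterey County | 19 |
| Santa Barbara County | 17 |
| Santa Cruz County | 13 |
| Santa Cruz | 11 |

#### Number of Unique Agencies by Prefix Codes

| Prefix | Number Of Unique Agencies |
|---|---|
| HSIPL | 18 |
| ER | 13 |
| RPSTPL | 12 |
| ACSTER | 10 |
| BRLS | 10 |

#### Top 5 Types of Work

| Type Of Work |
|---|
| Bridge Replacement (tc) |
| Bridge Replacement |
| Stabilize And Reconstruct Roadway |
| Stabilize Slope And Reconstruct Roadway |
| Bridge Preventive Maintenance |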
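The Quick Stats cell flags agencies whose obligation counts fall above the 95th or below the 10th percentile of all agency counts. The thresholding can be illustrated with a minimal pandas-only sketch (no siuba), assuming a hypothetical toy frame where each row is one obligation; the agency letters are made up, not E-76 records.

```python
import pandas as pd

# Hypothetical toy data: one row per obligation (not real E-76 records)
df = pd.DataFrame({
    "primary_agency_name": ["A"] * 10 + ["B"] * 4 + ["C"] * 3 + ["D"] * 2 + ["E"],
})

# Obligations per agency, the pandas analogue of `count(_.primary_agency_name)`
counts = df["primary_agency_name"].value_counts()

# Percentile thresholds over the per-agency counts (linear interpolation by default)
high = counts.quantile(0.95)
low = counts.quantile(0.10)

# Strict comparisons, as in the notebook's filters
above = counts[counts > high]
below = counts[counts < low]

print(f"{len(above)} agencies above {high:.2f}, {len(below)} below {low:.2f}")
```

Because the comparisons are strict and quantiles are interpolated, an agency tied exactly at a threshold is excluded, which matters when many small agencies share the same low count.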


## Obligations

<hr style="height:2px;border-width:0;color:black;background-color:black">

Each obligation is a unique entry in the E-76 dataset. By counting the obligations for each year, district, and organization, we can see the volume of each, as well as which organizations are the most and least frequent customers.

Metrics:
* Obligations by Year
* Number of Unique Agencies by District
* Agencies With The Most Obligations
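Because each row is one obligation, these three metrics reduce to row counts and distinct counts. A minimal pandas-only sketch, assuming a hypothetical toy frame that reuses the notebook's column names (`prepared_y`, `primary_agency_name`); the agencies and years are made up, and this is not the `_dla_utils` implementation.

```python
import pandas as pd

# Hypothetical toy obligations; column names follow the notebook's dataset
df = pd.DataFrame({
    "prepared_y": [2019, 2019, 2020, 2020, 2020, 2021],
    "primary_agency_name": ["Agency A", "Agency B", "Agency A",
                            "Agency A", "Agency C", "Agency B"],
})

# Obligations by year: count rows per prepared year
by_year = df.groupby("prepared_y").size()

# Unique agencies per year: distinct agency names within each year
agencies_by_year = df.groupby("prepared_y")["primary_agency_name"].nunique()

# Agencies with the most obligations, sorted descending
top_agencies = df["primary_agency_name"].value_counts()

print(by_year, agencies_by_year, top_agencies, sep="\n\n")
```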

### Number of Obligations by Year

In [7]:
# Line chart for Obligations by Year
chart_df = (df_top >> filter(_.variable == 'prepared_y')).rename(columns={"value": "Year"})

chart1 = (_dla_utils.basic_line_chart_test_no_save(chart_df, 'Year', 'count', district)
          .encode(x=alt.X('Year:O', title='Prepared Year')))
display(chart1)

### Number of Unique Agencies by District

In [8]:
# Unique Agencies by Dist
dist_years_agency = (
    df
    >> group_by(_.prepared_y, _.dist)
    >> summarize(n=_.primary_agency_name.nunique())
    >> arrange(-_.prepared_y)
).rename(columns={'dist': 'District', 'n': 'Count'})

chart10 = alt.Chart(dist_years_agency).mark_bar().encode(
    column='District:N',
    x=alt.X('prepared_y:O', title='Prepared Year'),
    y=alt.Y('Count:Q', title='Number of Unique Agencies'),
    # The color channel encodes District, so the legend is titled accordingly
    color=alt.Color(
        "District:N",
        scale=alt.Scale(range=altair_utils.CALITP_SEQUENTIAL_COLORS),
        legend=alt.Legend(title="District"),
    ),
)

chart10 = styleguide.preset_chart_config(chart10)
chart10 = _dla_utils.add_tooltip(chart10, 'prepared_y', 'Count')

display(chart10)

### Agencies With The Most Obligations

In [9]:
# Bar chart Agencies With The Most Obligations
chart_df = (df_top >> filter(_.variable == 'primary_agency_name')).rename(
    columns={"value": "Agency", "count": "Number of Obligations"}
)
chart2 = _dla_utils.basic_bar_chart_no_save(chart_df, 'Agency', 'Number of Obligations', 'Agency', district)

display(chart2)


## Prefix Codes

<hr style="height:2px;border-width:0;color:black;background-color:black">

Prefix codes identify the program an obligation belongs to. As with the number of obligations, counting the unique prefix codes shows how many programs DLA is involved in each year, as well as the workload at the district and organization level.

Metrics: 
* Number of Unique Prefix Codes by District
* Most Used Prefix Codes
* Agencies With The Most Unique Prefix Codes
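Each of these metrics is a distinct count or frequency count over the `prefix` column. A minimal pandas-only sketch, assuming a hypothetical toy frame; the prefix values are borrowed from the tables above purely for illustration, and this is not how `_dla_utils.get_nunique` is necessarily written.

```python
import pandas as pd

# Hypothetical toy obligations; prefixes are illustrative only
df = pd.DataFrame({
    "prepared_y": [2020, 2020, 2020, 2021, 2021],
    "primary_agency_name": ["Agency A", "Agency A", "Agency B", "Agency A", "Agency B"],
    "prefix": ["HSIPL", "ER", "HSIPL", "BRLS", "HSIPL"],
})

# Number of distinct programs (prefix codes) obligated each year
prefixes_by_year = df.groupby("prepared_y")["prefix"].nunique()

# Most used prefix codes, counted by number of obligations
most_used = df["prefix"].value_counts()

# Agencies ranked by how many distinct prefix codes they use
agency_prefixes = (
    df.groupby("primary_agency_name")["prefix"]
    .nunique()
    .sort_values(ascending=False)
)

print(prefixes_by_year, most_used, agency_prefixes, sep="\n\n")
```

Note the difference between the last two: `most_used` weights each program by how often it is obligated, while `agency_prefixes` rewards breadth, so an agency with many obligations in a single program ranks low there.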

### Number of Unique Prefix Codes by District

In [10]:
# Unique Prefixes by Dist
dist_years_prefix = (
    df
    >> group_by(_.prepared_y, _.dist)
    >> summarize(n=_.prefix.nunique())
    >> arrange(-_.prepared_y)
).rename(columns={'dist': 'District', 'n': 'Count'})

chart11 = alt.Chart(dist_years_prefix).mark_bar().encode(
    column='District:N',
    x=alt.X('prepared_y:O', title='Prepared Year'),
    # This chart counts unique prefix codes, not agencies
    y=alt.Y('Count:Q', title='Number of Unique Prefix Codes'),
    color=alt.Color(
        "District:N",
        scale=alt.Scale(range=altair_utils.CALITP_SEQUENTIAL_COLORS),
        legend=alt.Legend(title="District"),
    ),
)
chart11 = styleguide.preset_chart_config(chart11)
chart11 = _dla_utils.add_tooltip(chart11, 'prepared_y', 'Count')

display(chart11)

### Most Used Prefix Codes

In [11]:
# Bar chart with the Most Used Prefix Counts
chart_df = (df_top >> filter(_.variable == 'prefix')).rename(
    columns={"value": "Prefix", "count": "Number of Obligations"}
)
chart9 = _dla_utils.basic_bar_chart_no_save(chart_df, 'Prefix', 'Number of Obligations', 'Prefix', district)

display(chart9)

### Agencies With The Most Unique Prefix Codes

In [12]:
# Bar chart Agencies With The Most Unique Prefix Codes
top_prefix_agencies = _dla_utils.get_nunique(df, 'prefix', 'primary_agency_name').head(30)

chart3 = _dla_utils.basic_bar_chart_no_save(
    top_prefix_agencies, 'primary_agency_name', 'n', 'primary_agency_name', district
)

display(chart3)

## Funding Distributions

<hr style="height:2px;border-width:0;color:black;background-color:black">

With each E-76, three types of funding amounts are included in the obligations: 
* Total Requested (`total_requested`)
* Advance Construction Requested (`ac_requested`)
* Federal Requested (`fed_requested`)

Using this information, we can determine how much an organization receives on average from these funds, and how the funds are distributed.


Metrics:
* Average Total Requested Funds by Agency
* Lowest Average Total Funds by Agency
* Average Total Requested Funds by Prefix

### Average Total Requested Funds by Agency ($2021)

In [13]:
# Bar chart Average Total Requested Funds by Agency
avg_funds_agency = (
    _dla_utils.calculate_data_all(df, 'adjusted_total_requested', 'primary_agency_name', aggfunc="mean")
    >> arrange(-_.adjusted_total_requested)
).head(30)

chart4 = _dla_utils.basic_bar_chart_no_save(
    avg_funds_agency, 'primary_agency_name', 'adjusted_total_requested', 'primary_agency_name', district
)

display(chart4)

### Lowest Average Total Funds by Agency ($2021)

In [14]:
# Bar chart Bottom Average Total Requested Funds by Agency
# Sorted highest to lowest, so the last 40 rows are the 40 lowest averages
avg_funds_bottom = (
    df
    >> group_by(_.primary_agency_name)
    >> summarize(avg_funds=_.adjusted_total_requested.mean())
    >> arrange(-_.avg_funds)
).tail(40)

chart5 = _dla_utils.basic_bar_chart_no_save(
    avg_funds_bottom, 'primary_agency_name', 'avg_funds', 'primary_agency_name', district
)

display(chart5)

### Average Total Requested Funds by Prefix ($2021)

In [15]:
# Bar chart Average Total Requested Funds by Prefix
avg_funds_prefix = (
    _dla_utils.calculate_data_all(df, 'adjusted_total_requested', 'prefix', aggfunc="mean")
    >> arrange(-_.adjusted_total_requested)
).head(30)

chart8 = _dla_utils.basic_bar_chart_no_save(
    avg_funds_prefix, 'prefix', 'adjusted_total_requested', 'prefix', district
)

display(chart8)


## Work Categories

<hr style="height:2px;border-width:0;color:black;background-color:black">

While the data includes a description column, organizations have the option to enter the descriptions manually. Using the organizations' descriptions of each obligation, we can categorize the obligations by type of work. We used the following work categories:
* Active Transportation
* Transit
* Bridge
* Street
* Freeway
* Infrastructure/Resiliency/Emergency Relief 
* Congestion Relief

With these categories, we can determine which organizations have the most obligations in that category and what percent of the category that organization accounts for. 
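Once each obligation carries a 0/1 flag per category (the notebook filters on `_.transit == 1` earlier, and the loop below passes column names such as `bridge`), the per-agency ranking and share reduce to a filtered count. A minimal pandas-only sketch; `top_agencies_in_category` is a hypothetical helper, not the notebook's `_dla_utils.project_cat`, whose implementation is not shown here, and the toy data is made up.

```python
import pandas as pd

# Hypothetical toy data; `bridge` is a 0/1 flag like the notebook's category columns
df = pd.DataFrame({
    "primary_agency_name": ["Agency A", "Agency A", "Agency B",
                            "Agency B", "Agency C", "Agency C"],
    "bridge": [1, 1, 1, 0, 0, 1],
})

def top_agencies_in_category(df: pd.DataFrame, cat: str) -> pd.DataFrame:
    """Count flagged obligations per agency and each agency's share of the category."""
    flagged = df[df[cat] == 1]
    counts = flagged["primary_agency_name"].value_counts()
    return pd.DataFrame({
        "obligations": counts,
        # Share of all obligations in this category, as a percentage
        "percent_of_category": (counts / counts.sum() * 100).round(1),
    })

result = top_agencies_in_category(df, "bridge")
print(result)
```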

In [16]:
# Display the top agencies for each work category
work_cat = ['active_transp', 'transit', 'bridge', 'street', 'freeway', 'infra_resiliency_er', 'congestion_relief']

for cat in work_cat:
    _dla_utils.project_cat(df, cat, district)

### Top Agencies using Active Transportation Projects

### Top Agencies using Transit Projects

### Top Agencies using Bridge Projects

### Top Agencies using Street Projects

### Top Agencies using Freeway Projects

### Top Agencies using Infrastructure & Emergency Relief Projects

### Top Agencies using Congestion Relief Projects