![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)



<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=government-spending/government-spending.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>

# Callysto’s Weekly Data Visualization

## Government Spending

### Recommended Grade levels: 6-9
<br>

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

## Question

Which sectors receive funding from the Canadian government, and what is the allocation of the government's budget to specific categories? 

### Goal

Our goal is to look at the different sectors of the Canadian government and identify which spend the most and least of their allocated budgets. We also want to look at the distribution of expenses and find any trends in how the budget is allocated among important categories such as health and defence.

### Background

Analyzing the Canadian government's budget, expenses, and future spending habits is important as it promotes transparency and accountability, enabling citizens to assess government priorities and decisions. We can also identify areas for optimization, specifically looking at areas of high spending.

## Gather

All of the data used in this notebook come from [Statistics Canada](https://www.statcan.gc.ca/en/start) and the Government of Canada's [open government](https://search.open.canada.ca/opendata/) portal. 

### Code: 

Run the code cells below to import the libraries we need for this project. Libraries are collections of pre-made code that make it easier to analyze our data.

In [1]:
import pandas as pd
import plotly.express as px
import folium
import geopandas as gpd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
import ipywidgets
from IPython.display import display
warnings.filterwarnings("ignore")

print("Libraries imported.")

Libraries imported.


### Data

To begin, we'll load the datasets needed for this notebook using the cell below. For some datasets, the column names are changed to make their meaning clearer.

### Import the data

In [2]:
# Renaming columns
expenses_cols = ['Year','Social protection','Health','Education','General public services','Economic affairs','Other functions']
share_of_expenses_cols = ['Province','Health','Education','General public services','Social protection','Economic affairs','Other functions']

fte = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/government-spending/FTE.csv")
expenditures = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/government-spending/expenditures.csv")
overall_expenses = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/government-spending/overall_expenses.csv", header=1, names=expenses_cols)
share_of_expenses = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/government-spending/share_of_expenses.csv", header=1, names=share_of_expenses_cols)
federal_provincial = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/government-spending/federal_provincial.csv")
print("Datasets imported.")

Datasets imported.
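As a rough illustration of how `names=` renames columns while a file is being read, here is a minimal sketch. The CSV text and short column labels are made up for this example, and it uses `header=0` because the sample has a single header line (the notebook's files are read with `header=1`).

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a downloaded file.
csv_text = "yr,soc,hlth\n2008,23,23\n2009,24,23\n"

# header=0 marks the first line as the header row; names= replaces its labels.
new_names = ["Year", "Social protection", "Health"]
demo = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)

print(demo.columns.tolist())
print(demo.shape)
```

The data rows are unchanged; only the column labels differ from what the file contained.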


### Comment on the data

Now that we've obtained our data, we can look at what each dataset represents and try to draw meaning from it. Throughout this notebook, when a new dataset is introduced, the first cell will display its contents. This makes it clear when a new dataset is being analyzed and what the visualizations are based on. 

To begin, let's analyze our `fte` dataset.

In [3]:
fte

Unnamed: 0,Organization,Program,2017-18 Actual FTEs,2018-19 Actual FTEs,2019-20 Actual FTEs,2020-21 Actual FTEs,2021-22 Actual FTEs,2023-24 Planned FTEs,2024-25 Planned FTEs,2025-26 Planned FTEs
0,Atlantic Canada Opportunities Agency,Enterprise Development,217.0,0.0,0.0,0.0,0.0,0,0,0
1,Atlantic Canada Opportunities Agency,Community Development,94.0,0.0,0.0,0.0,0.0,0,0,0
2,Atlantic Canada Opportunities Agency,"Policy, Advocacy and Coordination",65.0,0.0,0.0,0.0,0.0,0,0,0
3,Atlantic Canada Opportunities Agency,Inclusive Communities,0.0,39.0,32.0,36.0,43.0,32,32,32
4,Atlantic Canada Opportunities Agency,Diversified Communities,0.0,71.0,74.0,71.0,76.0,77,76,76
...,...,...,...,...,...,...,...,...,...,...
1941,Prairies Economic Development Canada,Business Services,0.0,0.0,0.0,0.0,0.0,37,37,37
1942,Prairies Economic Development Canada,Community Initiatives,0.0,0.0,0.0,0.0,0.0,38,30,3
1943,Prairies Economic Development Canada,Business Development and Innovation,97.0,0.0,0.0,0.0,0.0,0,0,0
1944,Prairies Economic Development Canada,Internal Services,91.0,0.0,0.0,0.0,0.0,0,0,0


Moving forward, we will be calling our datasets *dataframes*. A dataframe is like a digital spreadsheet or table that contains rows and columns of data. Each row in a dataframe represents a different piece of information or a record, while each column represents a specific attribute or characteristic of that information.
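To make the idea of rows and columns concrete, here is a small sketch that builds a dataframe by hand. The organization names and numbers are hypothetical and are not taken from the notebook's datasets.

```python
import pandas as pd

# Hypothetical records, not taken from the notebook's datasets.
staff = pd.DataFrame({
    "Organization": ["Agency A", "Agency B", "Agency C"],
    "Employees": [120, 45, 300],
})

n_rows, n_cols = staff.shape      # number of rows and columns
first_record = staff.iloc[0]      # one row = one record
employees = staff["Employees"]    # one column = one attribute

print(n_rows, n_cols)
print(first_record["Organization"])
print(employees.sum())
```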

In the dataframe above, the columns `Organization` and `Program` identify each government organization and its programs. The columns `2017-18 Actual FTEs` through `2021-22 Actual FTEs` record the actual FTEs for each program. The [Government of Canada](https://open.canada.ca/data/en/dataset/e2e60f18-95fe-487b-9edd-d1f7bcdd9f9f) defines an FTE (full-time equivalent) as:

"a measure of the extent to which an employee represents a full person-year charge against the departmental budget for future spending years."

In simplified terms, one FTE represents one person working full-time for a full year, so FTEs tell us roughly how much staff time is charged to the budget. The final columns, `2023-24 Planned FTEs` through `2025-26 Planned FTEs`, indicate how many FTEs are planned for each program in future years. 
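As a rough illustration of how FTEs add up across employees, the sketch below assumes a 37.5-hour full-time week. That figure and the helper function `fte` are assumptions for this example only, not part of the government's definition.

```python
# Assumption: a full-time week is 37.5 hours (illustrative only).
FULL_TIME_HOURS_PER_WEEK = 37.5

def fte(hours_per_week, fraction_of_year=1.0):
    """Return the full-time equivalent for one employee."""
    return (hours_per_week / FULL_TIME_HOURS_PER_WEEK) * fraction_of_year

team = [
    fte(37.5),        # full time for the full year
    fte(18.75),       # half time for the full year
    fte(37.5, 0.5),   # full time for half the year
]
total_fte = sum(team)
print(total_fte)
```

Three people are on this hypothetical team, but together they count as fewer than three FTEs because two of them do not work a full person-year.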

# Explore

To begin, let's see how many different organizations are supported under the Government of Canada. 

In [4]:
# Find unique organizations
unique_organizations = set(fte['Organization'])

# Print unique organization
for org in unique_organizations:
    print(org)

Administrative Tribunals Support Service of Canada
Office of the Auditor General of Canada
Department of Finance Canada
Parole Board of Canada
Federal Economic Development Agency for Southern Ontario
Shared Services Canada
Office of the Commissioner of Official Languages
Prairies Economic Development Canada
Invest in Canada Hub
Public Safety Canada
Military Police Complaints Commission of Canada
Office of the Intelligence Commissioner
Royal Canadian Mounted Police
Natural Sciences and Engineering Research Council of Canada
Offices of the Information and Privacy Commissioners of Canada
Office of the Public Sector Integrity Commissioner of Canada
Canada Science and Technology Museum
Canadian Commercial Corporation
Natural Resources Canada
Statistics Canada
Canadian Dairy Commission
Accessibility Standards Canada
Agriculture and Agri-Food Canada
Financial Transactions and Reports Analysis Centre of Canada
Courts Administration Service
Financial Consumer Agency of Canada
Senate Ethics Offi
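To answer the "how many" question with a number rather than by reading a long list, the distinct names can be counted directly. This is a minimal sketch on a made-up column; with the notebook's data the same calls would be applied to `fte['Organization']`.

```python
import pandas as pd

# Hypothetical organization column with repeated names.
orgs = pd.Series(["Agency A", "Agency B", "Agency A", "Agency C", "Agency B"])

n_unique = orgs.nunique()             # count of distinct names
sorted_names = sorted(orgs.unique())  # distinct names in alphabetical order

print(n_unique)
for name in sorted_names:
    print(name)
```

Sorting the names also gives the same order every run, which a Python `set` does not guarantee.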

Looking at the output above, there appear to be many different organizations supported by the Government of Canada, reflecting the country's commitment to a wide array of sectors and initiatives. Using this list, let's create a visualization of these organizations. Read the commented lines in the code cell below (the lines that start with #).

In [5]:
# Change this to the organization you'd like to look at 
# Example: "Atlantic Canada Opportunities Agency" can be changed to "Canadian Grain Commission"
organization_to_find = "Atlantic Canada Opportunities Agency"

# Keep only the rows for the chosen organization
searched_df = fte[fte['Organization'] == organization_to_find]

# Total FTEs per year: melt only the year columns, then sum across programs
columns_to_melt = [col for col in searched_df.columns if col not in ('Organization', 'Program')]
melted_df = pd.melt(searched_df, id_vars=['Organization'], value_vars=columns_to_melt, var_name='Year', value_name='Value')
org_df = melted_df.groupby(['Year', 'Organization'])['Value'].sum().reset_index()

# FTEs per program and year: drop the text column before summing
df_prog = searched_df.drop(columns='Organization').groupby('Program').sum().reset_index()
columns_to_melt_prog = [col for col in df_prog.columns if col != 'Program']
program_df = pd.melt(df_prog, id_vars=['Program'], value_vars=columns_to_melt_prog, var_name='Year', value_name='Value')

fte_fig = make_subplots(rows=1, cols=2, subplot_titles=("Total FTEs", "Program FTEs"))

for org in org_df['Organization'].unique():
    org_data = org_df[org_df['Organization'] == org]
    fte_fig.add_trace(go.Scatter(x=org_data['Year'], y=org_data['Value'], mode='lines',
                             name=f'{org}'), 
                             row=1, col=1)

for program in program_df['Program'].unique():
    prog_data = program_df[program_df['Program'] == program]
    fte_fig.add_trace(go.Scatter(x=prog_data['Year'], y=prog_data['Value'], mode='lines',
                             name=f'{program}'), row=1, col=2,
                             )
   
    
fte_fig.update_layout(title=f'Progression of FTEs for: {organization_to_find}',
                  xaxis_title='Year', yaxis_title='FTEs',
                  xaxis2_title='Year', yaxis2_title='FTEs')

fte_fig.show()
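The cell above relies on `pd.melt` to turn one-column-per-year data into one-row-per-year data, which is the shape line plots expect. Here is a minimal sketch of that reshaping on a made-up table; the program names and values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({
    "Program": ["Program X", "Program Y"],
    "2017-18": [10, 20],
    "2018-19": [12, 18],
})

# Long format: one row per (Program, Year) pair.
long = pd.melt(wide, id_vars=["Program"], var_name="Year", value_name="Value")

# Summing over programs gives the total per year.
totals = long.groupby("Year")["Value"].sum()

print(long)
print(totals)
```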

After viewing the different Full-Time Equivalents (FTEs) of various Canadian organizations, have you gained a different sense of perspective? How might this newfound perspective influence your views on government support for various sectors?

Now that we've done some exploratory analysis on FTE data, let's move on to something more tangible, such as *budgeting*.

In [6]:
expenditures

Unnamed: 0,Organization,Vote 2021-22 / Statutory - Description,2017-18 - Total budgetary authority available for use,2017-18 - Expenditures,2018-19 - Total budgetary authority available for use,2018-19 - Expenditures,2019-20 - Total budgetary authority available for use,2019-20 - Expenditures,2020-21 - Total budgetary authority available for use,2020-21 - Expenditures,2021-22 - Total budgetary authority available for use,2021-22 - Expenditures
0,Atlantic Canada Opportunities Agency,Operating/Program - 1,69938738.0,6.700567e+07,70427706,65427353.0,71610830,68838886,74468028,69790224,76419187.0,70687244.0
1,Atlantic Canada Opportunities Agency,Grants & Contributions - 5,283971202.0,2.839712e+08,290437502,276137362.0,274953030,274953026,452840687,441945575,373936114.0,363964211.0
2,Atlantic Canada Opportunities Agency,Budget Implementation Vote - 10,0.0,0.000000e+00,0,0.0,26772,0,0,0,0.0,0.0
3,Atlantic Canada Opportunities Agency,Budget Implementation Vote - 15,0.0,0.000000e+00,0,0.0,0,0,0,0,0.0,0.0
4,Atlantic Canada Opportunities Agency,Contributions to employee benefit plans,8008731.0,8.008731e+06,8032837,8032837.0,7867729,7867729,9056403,9056403,8838758.0,8838758.0
...,...,...,...,...,...,...,...,...,...,...,...,...
901,Prairies Economic Development Canada,Contributions to employee benefit plans,3739442.0,3.739442e+06,3798669,3798669.0,4413622,4413622,5381397,5381397,5087300.0,5087300.0
902,Prairies Economic Development Canada,Payments for the Community Futures Network pur...,0.0,0.000000e+00,0,0.0,0,0,0,0,0.0,0.0
903,Prairies Economic Development Canada,Payments pursuant to the Public Health Events ...,0.0,0.000000e+00,0,0.0,0,0,472394942,472394942,0.0,0.0
904,Prairies Economic Development Canada,Payments to provide financing support to small...,0.0,0.000000e+00,0,0.0,0,0,0,0,0.0,0.0


# Organize

We can also look at the percentage of government expenses that goes to particular categories in the dataframe `overall_expenses`.

In [10]:
overall_expenses

Unnamed: 0,Year,Social protection,Health,Education,General public services,Economic affairs,Other functions
0,2008,23,23,13,18,9,14
1,2009,24,23,13,17,9,14
2,2010,23,23,14,18,9,13
3,2011,23,24,14,17,8,14
4,2012,23,24,14,16,9,14
5,2013,23,25,14,16,8,14
6,2014,23,25,14,16,8,14
7,2015,24,25,14,16,8,13
8,2016,24,25,14,16,8,13
9,2017,24,24,14,16,8,14


We can first create a visualization representing the general progression of government expenses.

In [11]:
columns = overall_expenses.columns[1:7]

stacked_categories = px.bar(overall_expenses, x='Year', y=columns, title="Stacked Bar Graph of Government Expenses, 2008-2021",
             labels={'variable': f"Category", 'index': "Year", 'value': 'Percentage of Budget'})

stacked_categories.update_layout(barmode='stack').show()

Looking at the visualization, we can generally interpret which categories are prioritized by the government. *Social protection* and *health* appear to be the highest priorities, with no major changes over the years. The other categories are similar, changing only slightly. However, even small percentage changes can represent large amounts of money at the scale of a government budget. 
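To see why a small percentage change can still matter, here is a rough sketch that converts a share of the budget into dollars. The total budget figure is hypothetical and is not taken from the datasets; the helper `dollars_for_share` is likewise only for illustration.

```python
# Hypothetical total budget in dollars; NOT a figure from the datasets.
total_budget = 500_000_000_000

def dollars_for_share(share_percent, total):
    """Convert a percentage share of the budget into dollars."""
    return total * share_percent / 100

before = dollars_for_share(23, total_budget)
after = dollars_for_share(24, total_budget)
difference = after - before

print(f"{difference:,.0f}")
```

Under this assumed total, moving one category from 23% to 24% shifts billions of dollars.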

Let's try to identify the changes throughout the years in more detail. We can calculate them by taking each year's percentage and subtracting the previous year's percentage. The results will be stored in new columns.

In [12]:
for col in columns:
    overall_expenses[col + '_change'] = overall_expenses[col] - overall_expenses[col].shift(1)

# Replace the first row values with 0
overall_expenses.fillna(0, inplace=True)
display(overall_expenses)

Unnamed: 0,Year,Social protection,Health,Education,General public services,Economic affairs,Other functions,Social protection_change,Health_change,Education_change,General public services_change,Economic affairs_change,Other functions_change
0,2008,23,23,13,18,9,14,0.0,0.0,0.0,0.0,0.0,0.0
1,2009,24,23,13,17,9,14,1.0,0.0,0.0,-1.0,0.0,0.0
2,2010,23,23,14,18,9,13,-1.0,0.0,1.0,1.0,0.0,-1.0
3,2011,23,24,14,17,8,14,0.0,1.0,0.0,-1.0,-1.0,1.0
4,2012,23,24,14,16,9,14,0.0,0.0,0.0,-1.0,1.0,0.0
5,2013,23,25,14,16,8,14,0.0,1.0,0.0,0.0,-1.0,0.0
6,2014,23,25,14,16,8,14,0.0,0.0,0.0,0.0,0.0,0.0
7,2015,24,25,14,16,8,13,1.0,0.0,0.0,0.0,0.0,-1.0
8,2016,24,25,14,16,8,13,0.0,0.0,0.0,0.0,0.0,0.0
9,2017,24,24,14,16,8,14,0.0,-1.0,0.0,0.0,0.0,1.0
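The subtraction with `shift(1)` used above has a built-in shortcut in pandas, `diff()`. The sketch below compares the two on a small hypothetical series of percentages.

```python
import pandas as pd

# Hypothetical percentages for one category over four years.
shares = pd.Series([23, 24, 23, 23], index=[2008, 2009, 2010, 2011])

by_shift = shares - shares.shift(1)   # the approach used in the notebook
by_diff = shares.diff()               # built-in equivalent

# The first year has no previous year, so its change is missing until filled.
print(by_diff.fillna(0).tolist())
```

Both versions leave the first year empty, which is why the notebook fills it with 0 afterwards.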


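Now let's plot each category's percentage over time. Hover over a point to see its change from the previous year, measured in percentage points. Markers are red for categories whose share fell on average and green otherwise.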

In [13]:
percentage_fig = go.Figure()

for col in columns:
    color = 'red' if overall_expenses[col + '_change'].iloc[1:].mean() < 0 else 'green'
    percentage_fig.add_trace(go.Scatter(
        x=overall_expenses['Year'],
        y=overall_expenses[col],
        mode='lines+markers',
        line=dict(dash='dot'),
        name=col,
        marker=dict(color=color),
        text=[f'{col}: {y:.2f}%<br>Change: {change:.2f}%' for y, change in zip(overall_expenses[col], overall_expenses[col + '_change'])],
        hoverinfo='text',
    ))

percentage_fig.update_layout(
    title="Progression of Expenses 2008-2021",
    xaxis=dict(title="Year"),
    # The lines show each category's share of the budget; the changes appear on hover
    yaxis=dict(title="Percentage of Budget"),
).show()

It appears that half of the categories have *decreased* in overall percentage while the other half *increased* between 2008 and 2021. Notably, the largest increase was from 2019 to 2020, when social protection rose by 7 percentage points. A likely reason is that this was the beginning of the COVID-19 pandemic: unemployment soared, and the government may have needed to provide more assistance as poverty rates rose. Many would assume that *health* would also have been prioritized during this time, but, surprisingly, its share of the budget decreased. 

Overall, upon closer examination of the specific percentage changes, it becomes apparent that the government has opted for relatively modest adjustments to the budget. This approach is generally positive, as it promotes stability and flexibility to address changing priorities and challenges.
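The largest year-over-year change can also be located with code instead of by hovering over the plot. This is a minimal sketch on hypothetical change values; with the notebook's data it would be applied to the `_change` columns of `overall_expenses`.

```python
import pandas as pd

# Hypothetical year-over-year changes in percentage points.
changes = pd.DataFrame(
    {"Health_change": [0, 1, -2], "Education_change": [0, 3, 1]},
    index=[2008, 2009, 2010],
)

column = changes.max().idxmax()   # column containing the largest increase
year = changes[column].idxmax()   # year in which that increase happened
largest = changes.loc[year, column]

print(year, column, largest)
```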

We can also look at the percentages of budget based on province using the `share_of_expenses` dataframe.

In [14]:
share_of_expenses

Unnamed: 0,Province,Health,Education,General public services,Social protection,Economic affairs,Other functions
0,Newfoundland and Labrador,39,17,19,8,9,8
1,Prince Edward Island,33,20,15,9,15,8
2,Nova Scotia,39,18,14,10,9,10
3,New Brunswick,39,18,15,8,11,9
4,Quebec,33,15,18,17,9,8
5,Ontario,35,19,9,14,12,11
6,Manitoba,39,18,14,12,6,11
7,Saskatchewan,29,18,9,13,19,12
8,Alberta,33,18,9,12,17,11
9,British Columbia,36,16,10,15,10,13


## Extended Organization

As an extension to organizing data, *data-cleaning* is an essential step in the data preparation process. Generally, it involves identifying, correcting, and handling errors, inconsistencies, and inaccuracies within a dataset. In our particular case, we will be changing the names of our columns to better suit analysis in later code cells.  

We will also be reading in a *GeoJSON* file, which contains the latitude and longitude coordinates of the borders of Canada's provinces and territories. 

In [15]:
prov_data = gpd.read_file('https://raw.githubusercontent.com/callysto/data-files/main/Science/ClimateAcrossProvinces/geopandas.geojson')

# Translate the French province names to English so they match `share_of_expenses`
prov_data['prov_name_fr'] = prov_data['prov_name_fr'].replace(
    {
        'Alberta': 'Alberta',
        'Manitoba': 'Manitoba',
        'Yukon': 'Yukon',
        'Terre-Neuve-et-Labrador': 'Newfoundland and Labrador',
        'Nouveau-Brunswick': 'New Brunswick',
        'Nouvelle-Écosse': 'Nova Scotia',
        'Territoires du Nord-Ouest': 'Northwest Territories',
        'Île-du-Prince-Édouard': 'Prince Edward Island',
        'Nunavut': 'Nunavut',
        'Québec': 'Quebec',
        'Ontario': 'Ontario',
        'Colombie-Britannique': 'British Columbia'
    }
)

prov_data.rename(columns={'prov_name_fr': "Province"}, inplace=True)
prov_data

Unnamed: 0,geo_point_2d,year,prov_code,prov_name_en,prov_area_code,prov_type,Province,geometry
0,"{'lon': -114.51101438986483, 'lat': 55.1689029...",2021,[48],[Alberta],CAN,province,Alberta,"POLYGON ((-110.00502 48.99970, -110.00449 49.5..."
1,"{'lon': -97.43375328505664, 'lat': 54.93018410...",2021,[46],[Manitoba],CAN,province,Manitoba,"POLYGON ((-94.82808 60.00000, -95.32500 60.000..."
2,"{'lon': -135.51007960983821, 'lat': 63.6304438...",2021,[60],[Yukon],CAN,territory / territoire,Yukon,"MULTIPOLYGON (((-139.13388 69.63737, -139.2478..."
3,"{'lon': -60.51575069823122, 'lat': 52.88843621...",2021,[10],[Newfoundland and Labrador],CAN,province,Newfoundland and Labrador,"MULTIPOLYGON (((-53.63678 48.87151, -53.67894 ..."
4,"{'lon': -66.37667726184117, 'lat': 46.61898879...",2021,[13],[New Brunswick],CAN,province,New Brunswick,"MULTIPOLYGON (((-66.93908 45.02470, -67.01600 ..."
5,"{'lon': -105.89075902236681, 'lat': 54.4167141...",2021,[47],[Saskatchewan],CAN,province,Saskatchewan,"POLYGON ((-102.00000 60.00000, -102.75000 60.0..."
6,"{'lon': -63.32450309629852, 'lat': 45.15131558...",2021,[12],[Nova Scotia],CAN,province,Nova Scotia,"MULTIPOLYGON (((-65.60612 43.51377, -65.65127 ..."
7,"{'lon': -118.98262958354398, 'lat': 66.3689231...",2021,[61],[Northwest Territories],CAN,territory / territoire,Northwest Territories,"MULTIPOLYGON (((-109.83296 75.93393, -110.0324..."
8,"{'lon': -63.24228135586117, 'lat': 46.38954858...",2021,[11],[Prince Edward Island],CAN,province,Prince Edward Island,"POLYGON ((-61.98606 46.46286, -62.14666 46.484..."
9,"{'lon': -88.8416338799438, 'lat': 71.041389444...",2021,[62],[Nunavut],CAN,territory / territoire,Nunavut,"MULTIPOLYGON (((-106.67059 73.67679, -106.8352..."


Now that we've properly cleaned our dataframe, we will merge our two dataframes. Don't worry about the details of why this is being done; it's primarily to prepare for a future visualization.

In [16]:
prov_data['prov_name_en'] = prov_data['prov_name_en'].apply(lambda x: ''.join(map(str, x)))

merged_data = share_of_expenses.merge(prov_data, left_on='Province', right_on='Province', how='left')
merged_data

Unnamed: 0,Province,Health,Education,General public services,Social protection,Economic affairs,Other functions,geo_point_2d,year,prov_code,prov_name_en,prov_area_code,prov_type,geometry
0,Newfoundland and Labrador,39,17,19,8,9,8,"{'lon': -60.51575069823122, 'lat': 52.88843621...",2021,[10],Newfoundland and Labrador,CAN,province,"MULTIPOLYGON (((-53.63678 48.87151, -53.67894 ..."
1,Prince Edward Island,33,20,15,9,15,8,"{'lon': -63.24228135586117, 'lat': 46.38954858...",2021,[11],Prince Edward Island,CAN,province,"POLYGON ((-61.98606 46.46286, -62.14666 46.484..."
2,Nova Scotia,39,18,14,10,9,10,"{'lon': -63.32450309629852, 'lat': 45.15131558...",2021,[12],Nova Scotia,CAN,province,"MULTIPOLYGON (((-65.60612 43.51377, -65.65127 ..."
3,New Brunswick,39,18,15,8,11,9,"{'lon': -66.37667726184117, 'lat': 46.61898879...",2021,[13],New Brunswick,CAN,province,"MULTIPOLYGON (((-66.93908 45.02470, -67.01600 ..."
4,Quebec,33,15,18,17,9,8,"{'lon': -71.78624104931967, 'lat': 53.39528349...",2021,[24],Quebec,CAN,province,"MULTIPOLYGON (((-73.86407 45.51852, -73.94004 ..."
5,Ontario,35,19,9,14,12,11,"{'lon': -86.0409603890758, 'lat': 50.452667579...",2021,[35],Ontario,CAN,province,"MULTIPOLYGON (((-79.91454 44.99301, -79.91794 ..."
6,Manitoba,39,18,14,12,6,11,"{'lon': -97.43375328505664, 'lat': 54.93018410...",2021,[46],Manitoba,CAN,province,"POLYGON ((-94.82808 60.00000, -95.32500 60.000..."
7,Saskatchewan,29,18,9,13,19,12,"{'lon': -105.89075902236681, 'lat': 54.4167141...",2021,[47],Saskatchewan,CAN,province,"POLYGON ((-102.00000 60.00000, -102.75000 60.0..."
8,Alberta,33,18,9,12,17,11,"{'lon': -114.51101438986483, 'lat': 55.1689029...",2021,[48],Alberta,CAN,province,"POLYGON ((-110.00502 48.99970, -110.00449 49.5..."
9,British Columbia,36,16,10,15,10,13,"{'lon': -124.7656381299199, 'lat': 54.75518309...",2021,[59],British Columbia,CAN,province,"MULTIPOLYGON (((-124.68935 49.54964, -124.7063..."
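A left merge keeps every row of the left dataframe and fills the right-hand columns with missing values wherever the key has no match, so it is worth checking for unmatched keys after merging. The sketch below uses two small hypothetical tables and the `indicator=True` option of `merge`; one name is deliberately left untranslated to show what a mismatch looks like.

```python
import pandas as pd

# Hypothetical tables; "Nouveau-Brunswick" is left untranslated on purpose.
spending = pd.DataFrame({"Province": ["Alberta", "New Brunswick"], "Health": [33, 39]})
shapes = pd.DataFrame({"Province": ["Alberta", "Nouveau-Brunswick"], "prov_code": [48, 13]})

# indicator=True adds a "_merge" column saying where each row was found.
checked = spending.merge(shapes, on="Province", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Province"].tolist()

print(unmatched)
```

An empty list means every province in the left table found a partner in the right table.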


Now that we have a merged dataframe, we can visualize provincial spending on a folium map! Use the dropdown menu labelled *Column* above the map to select a category to display. Provinces shaded a deeper green devote a higher percentage of their budget to that category, while lighter colours represent a lower percentage. 

In [25]:
spendingbyprov = ipywidgets.Output(layout={'border': '1px solid black'})

column_names = merged_data.columns[1:7].tolist()
dropdown_options = ipywidgets.Dropdown(
    options=column_names,
    value=column_names[0],
    description='Column:',
    disabled=False
)

def update_choropleth(change):
    spendingbyprov.clear_output()
    with spendingbyprov:
        m = folium.Map(location=[60, -85], zoom_start=4)
        folium.Choropleth(
            geo_data=prov_data,
            data=merged_data,
            columns=['prov_name_en', dropdown_options.value],  
            key_on='feature.properties.prov_name_en',  
            fill_color='YlGn',
            fill_opacity=0.7,
            line_opacity=0.2,
            legend_name=f'Spending on {dropdown_options.value} by Province',
        ).add_to(m)
        display(m)

dropdown_options.observe(update_choropleth, names='value')
display(dropdown_options)
update_choropleth({'new': column_names[0]})

spendingbyprov

Dropdown(description='Column:', options=('Health', 'Education', 'General public services', 'Social protection'…

Output(layout=Layout(border_bottom='1px solid black', border_left='1px solid black', border_right='1px solid b…

In the final section of our notebook, we'll visualize the differences between the dollar amounts allocated by the federal and provincial governments. We'll also perform another form of *data-cleaning* on the `federal_provincial` dataframe by converting all of the dollar amounts into valid numbers for later analysis.

In [18]:
federal_provincial = federal_provincial.drop(federal_provincial.index[20:], axis=0).reset_index(drop=True)
federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '] = federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '].str.strip()

columns_to_convert = federal_provincial.columns[1:15]
for col in columns_to_convert:
    federal_provincial[col] = federal_provincial[col].str.replace(',', '', regex=True).astype(float)
federal_provincial

Unnamed: 0,Canadian Classification of Functions of Government (CCOFOG),2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,Public sector components
0,General public services,97511.0,97414.0,105765.0,106642.0,104795.0,104685.0,104394.0,106415.0,107110.0,113692.0,121319.0,128317.0,130723.0,143512.0,Consolidated Canadian general government
1,Defence,15805.0,16357.0,15194.0,16710.0,16506.0,16210.0,15654.0,15773.0,15962.0,17668.0,16925.0,18209.0,21580.0,18622.0,Consolidated Canadian general government
2,Public order and safety,27167.0,28915.0,29569.0,31243.0,31460.0,32633.0,33609.0,33814.0,34703.0,36344.0,37614.0,40517.0,40528.0,43408.0,Consolidated Canadian general government
3,Economic affairs,46470.0,49274.0,51220.0,52998.0,53321.0,52867.0,53670.0,53249.0,54745.0,57730.0,67694.0,76104.0,162153.0,110569.0,Consolidated Canadian general government
4,Environmental protection,9722.0,10739.0,11724.0,11871.0,11709.0,11312.0,11705.0,13708.0,14699.0,14837.0,15760.0,16424.0,16625.0,16905.0,Consolidated Canadian general government
5,Housing and community amenities,7756.0,8252.0,8381.0,8415.0,8393.0,8606.0,9226.0,9359.0,10767.0,10865.0,10107.0,10838.0,11110.0,11706.0,Consolidated Canadian general government
6,Health,125458.0,133246.0,138998.0,145819.0,149139.0,154292.0,159331.0,165025.0,170053.0,175947.0,180593.0,188186.0,214820.0,228759.0,Consolidated Canadian general government
7,"Recreation, culture and religion",14547.0,14527.0,14796.0,15707.0,16610.0,16679.0,16335.0,17423.0,17545.0,18599.0,19126.0,19804.0,19810.0,19652.0,Consolidated Canadian general government
8,Education,72348.0,76766.0,80145.0,83711.0,85743.0,88536.0,90489.0,91762.0,94463.0,101374.0,106575.0,109645.0,110909.0,115292.0,Consolidated Canadian general government
9,Social protection,123892.0,134802.0,137155.0,140713.0,141348.0,144774.0,149281.0,157582.0,167464.0,175456.0,184106.0,191659.0,323063.0,261042.0,Consolidated Canadian general government
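The conversion above works because amounts such as `97,511` are stored as text, and the commas must be removed before pandas can treat them as numbers. Here is a minimal sketch of that step on a hypothetical column of text amounts.

```python
import pandas as pd

# Hypothetical dollar amounts stored as text with thousands separators.
raw = pd.Series(["97,511", "15,805", "1,234,567"])

# Remove the commas as plain text (no regular expression needed), then convert.
cleaned = raw.str.replace(",", "", regex=False).astype(float)

print(cleaned.tolist())
print(cleaned.sum())
```

Once the values are numeric, operations such as sums and plots behave as expected.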


Now that we have valid numeric values, let's create a visualization for each classification, comparing federal and provincial budgets. 

*Note*: Since many of the classifications have long names, we'll be using an abbreviation system. The abbreviated names will also be printed below the visualization.

In [19]:
classification_name_mapping = {
    'General public services': 'GPS',
    'Public order and safety': 'POS',
    'Economic affairs': 'EA',
    'Environmental protection': 'EP',
    'Housing and community amenities': 'HCA',
    'Recreation, culture and religion': 'RCR',
    'Social protection': 'SP'
}

federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '] = federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '].map(classification_name_mapping).fillna(federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '])
classifications = federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '].unique()
num_columns = len(classifications)

all_classifications_fig = make_subplots(rows=1, cols=num_columns, subplot_titles=classifications)

col_num = 1

for classification in classifications:
    temp_df = federal_provincial[federal_provincial['Canadian Classification of Functions of Government (CCOFOG) '] == classification]
    temp_df = temp_df.melt(id_vars=['Canadian Classification of Functions of Government (CCOFOG) ', 'Public sector components'], var_name='Year', value_name='Value')

    traces = []
    for component in temp_df['Public sector components'].unique():
        trace = go.Scatter(
            x=temp_df[temp_df['Public sector components'] == component]['Year'],
            y=temp_df[temp_df['Public sector components'] == component]['Value'],
            mode='lines+markers',
            name=component
        )
        traces.append(trace)

    for trace in traces:
        all_classifications_fig.add_trace(trace, row=1, col=col_num)

    all_classifications_fig.update_xaxes(title_text='Year', row=1, col=col_num)

    all_classifications_fig.update_yaxes(title_text='', row=1, col=col_num)

    col_num += 1

all_classifications_fig.update_layout(
    title='Spending Over the Years by Classification',
    showlegend=False  
).show()

print("Classification Name Mapping:")
for full_name, short_name in classification_name_mapping.items():
    print(f"{full_name} => {short_name}")

Classification Name Mapping:
General public services => GPS
Public order and safety => POS
Economic affairs => EA
Environmental protection => EP
Housing and community amenities => HCA
Recreation, culture and religion => RCR
Social protection => SP
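The abbreviation step above combines `map` with `fillna` so that names without an abbreviation, such as *Health*, are kept as they are. A small sketch of that pattern, using a shortened version of the mapping:

```python
import pandas as pd

names = pd.Series(["General public services", "Health", "Social protection"])
short = {"General public services": "GPS", "Social protection": "SP"}

# map() returns NaN for names missing from the dictionary;
# fillna(names) puts the original name back in those positions.
abbreviated = names.map(short).fillna(names)

print(abbreviated.tolist())
```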


Looking at our final visualization, we can see that federal government budgets are *much* larger than their provincial counterparts for most classifications. The only classification with similar federal and provincial budgets is *Health*. Interestingly, *Defence* has no budget allocated at the provincial level. Can you think of reasons why this might be?

# Interpret

### Reflect on What You See

Think about the following questions.

1. How have shifts in government spending patterns impacted your perception of public service accessibility and quality in your area?
2. What strategies or policies do you believe should be implemented to ensure fair and equitable distribution of government resources across various sectors and regions?
3. What insights can be gained from historical instances of government budget adjustments, and how can they inform proactive measures to address future challenges related to government resource allocation and spending decisions?

# Communicate

Below are some writing prompts to help you reflect on the new information presented by the data. When you look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have?

- I used to think ____________________ but now I know ____________________. 
- I wish I knew more about ____________________. 
- This visualization reminds me of ____________________. 
- I really like ____________________.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)