# City of Chicago Yearly Budget Analysis - Pulling Data from API

Data is taken from the City of Chicago Data Portal website:

https://data.cityofchicago.org/

Specifically, the 2011 through 2017 "Budget Ordinance - Positions and Salaries" datasets, found in the Administration & Finance category, are pulled through the portal's API.

In [1]:
#Dependencies
import pandas as pd
from sodapy import Socrata
from config import apikey
from pprint import pprint

In [2]:
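# Data portal domain (sodapy expects the domain, not a full URL)
url = "data.cityofchicago.org"

# Client with an app token. An app token alone does not authenticate;
# non-public datasets would also need a username and password.
client = Socrata(url, apikey)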

# Budget ordinance dataset identifiers for 2011-2017, in year order
budget_eps = ["sgie-nk8s", "c5pd-qhn7", "vjc3-qwxu", "c3a6-cgvg", "2mkx-pejj", "d823-3fqq", "ghci-5ryk"]

In [3]:
# set the first year of data to be pulled from the API
y = 2011

# Create an empty list to append the year for each row of data being pulled
year = []

# Create an empty data frame to append the yearly budget data to
overall_budget_df = pd.DataFrame()

# Loop through the yearly budget codes to pull relevant data from the API
for item in budget_eps:
    
    # Get results for a specific year and append to a data frame
    results = client.get(item, limit=10000)
    results_df = pd.DataFrame.from_records(results)
    
    # Column names differ in the 2011 data. Rename them to match the data from 2012-2017
    if y == 2011:
        results_df.rename(columns={'department_code': 'department_number', 'department_name': 'department_description',
                                   'division_name': 'division_description', 'fund_name': 'fund_description',
                                   'section_name': 'section_description', 'subsection_code': 'sub_section_code',
                                   'subsection_name': 'sub_section_description', 'total_budgeted_units': 'total_budgeted_unit'},
                          inplace=True)
    
    # append budget data to the empty dataframe above
    overall_budget_df = pd.concat([overall_budget_df,results_df])
    
    # Record the current year once for every row pulled
    year.extend([y] * len(results))
    
    # Change to the next year in the budget list
    y = y + 1
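
Each request above is capped at `limit=10000`, so a dataset with more rows than that would come back truncated. The following is a minimal sketch of paging past such a cap. The names `fetch_all_rows` and `fake_get_page` are hypothetical, and the fake page function stands in for the API so the example runs without network access.

```python
from typing import Callable, Dict, List


def fetch_all_rows(get_page: Callable[[int, int], List[Dict]],
                   page_size: int = 10000) -> List[Dict]:
    """Request pages of page_size rows until a short (or empty) page comes back."""
    rows: List[Dict] = []
    offset = 0
    while True:
        page = get_page(page_size, offset)
        rows.extend(page)
        # A page shorter than page_size means there is nothing left to fetch
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Hypothetical stand-in for the API: 25 fake rows served in pages
fake_rows = [{"title_code": str(i)} for i in range(25)]


def fake_get_page(limit: int, offset: int) -> List[Dict]:
    return fake_rows[offset:offset + limit]


all_rows = fetch_all_rows(fake_get_page, page_size=10)
print(len(all_rows))
```

With sodapy, the callable could wrap `client.get(item, limit=limit, offset=offset)`. That pairing is an assumption about how the loop might be adapted, not something this notebook does.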

In [4]:
# Add the year column to the overall dataframe
overall_budget_df['year'] = year

# Print a sample of the dataframe
overall_budget_df.head()

Unnamed: 0,bargaining_unit,budgeted_pay_rate,budgeted_unit,department,department_code,department_description,department_number,division_code,division_description,fund_code,...,schedule_grade,section_code,section_description,sub_section_code,sub_section_description,title_code,title_description,total_budgeted_amount,total_budgeted_unit,year
0,,2485,Annual,CITY CLERK,,CITY CLERK,25,2005,City Clerk,300,...,,3030,Vehicle License Data Services,0,,15,Schedule Salary Adjustments,2485,0,2011
1,,2389,Annual,DEPT OF FIN,,DEPARTMENT OF FINANCE,27,2005,City Comptroller,610,...,,3030,Auditing,0,,15,Schedule Salary Adjustments,2389,0,2011
2,,73932,Annual,DEPT OF FIN,,DEPARTMENT OF FINANCE,27,2005,City Comptroller,610,...,,3030,Auditing,0,,102,Accountant II,73932,1,2011
3,,76536,Annual,TREAS,,CITY TREASURER,28,2005,City Treasurer,100,...,,3015,Financial Reporting,0,,104,Accountant IV,76536,1,2011
4,,88140,Annual,TREAS,,CITY TREASURER,28,2005,City Treasurer,100,...,,3015,Financial Reporting,0,,104,Accountant IV,88140,1,2011
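
The year column above is built from a separate list that has to stay the same length as the combined dataframe. As an alternative sketch, the year can be written onto each yearly dataframe before concatenating, so the two cannot drift apart. The `responses` dictionary below is hypothetical stand-in data, and only two of the 2011 column renames are shown.

```python
import pandas as pd

# Hypothetical stand-ins for two API responses (2011 uses the older column names)
responses = {
    2011: [{"department_code": "25", "department_name": "CITY CLERK"}],
    2012: [{"department_number": "25", "department_description": "CITY CLERK"},
           {"department_number": "27", "department_description": "DEPARTMENT OF FINANCE"}],
}

# Subset of the 2011 renames used in the notebook
rename_2011 = {"department_code": "department_number",
               "department_name": "department_description"}

frames = []
for yr, records in responses.items():
    df = pd.DataFrame.from_records(records)
    if yr == 2011:
        df = df.rename(columns=rename_2011)
    # Tag every row with its year before combining
    df["year"] = yr
    frames.append(df)

# Concatenate once; ignore_index gives a unique 0..n-1 index
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Concatenating a list of frames once also avoids re-copying the growing dataframe on every loop iteration.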


In [15]:
# Make a clean dataframe with desired columns
budget_by_year = overall_budget_df[['year', 'fund_type', 'fund_code', 'fund_description', 'department_number', 'department_description',
                                   'organization_code', 'division_code', 'division_description', 'section_code', 'section_description',
                                   'title_code', 'title_description', 'budgeted_unit', 'total_budgeted_unit', 'budgeted_pay_rate',
                                   'total_budgeted_amount']]
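
Because column names differ between years, selecting a fixed list of columns raises a `KeyError` if any expected column is absent. A small sketch of checking for that first, using a hypothetical toy dataframe and a shortened column list:

```python
import pandas as pd

# Shortened, illustrative column list and toy data (not the real API output)
wanted = ["year", "fund_type", "fund_code", "title_code"]
df = pd.DataFrame({"year": [2011], "fund_code": ["300"], "title_code": ["15"]})

# Report which expected columns are absent before selecting
missing = [col for col in wanted if col not in df.columns]
print(missing)

# reindex keeps every wanted column and fills absent ones with NaN
# instead of raising KeyError
subset = df.reindex(columns=wanted)
print(subset)
```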

In [16]:
# print head of dataframe
budget_by_year.head()

Unnamed: 0,year,fund_type,fund_code,fund_description,department_number,department_description,organization_code,division_code,division_description,section_code,section_description,title_code,title_description,budgeted_unit,total_budgeted_unit,budgeted_pay_rate,total_budgeted_amount
0,2011,Local,300,VEHICLE FUND,25,CITY CLERK,1005,2005,City Clerk,3030,Vehicle License Data Services,15,Schedule Salary Adjustments,Annual,0,2485,2485
1,2011,Local,610,MIDWAY AIRPORT FUND,27,DEPARTMENT OF FINANCE,1005,2005,City Comptroller,3030,Auditing,15,Schedule Salary Adjustments,Annual,0,2389,2389
2,2011,Local,610,MIDWAY AIRPORT FUND,27,DEPARTMENT OF FINANCE,1005,2005,City Comptroller,3030,Auditing,102,Accountant II,Annual,1,73932,73932
3,2011,Local,100,CORPORATE FUND,28,CITY TREASURER,1005,2005,City Treasurer,3015,Financial Reporting,104,Accountant IV,Annual,1,76536,76536
4,2011,Local,100,CORPORATE FUND,28,CITY TREASURER,1005,2005,City Treasurer,3015,Financial Reporting,104,Accountant IV,Annual,1,88140,88140


In [17]:
# Convert budget information from string to float.
# assign() returns a new dataframe, which avoids the pandas SettingWithCopyWarning
# raised when writing to a column of a selection taken from overall_budget_df
budget_by_year = budget_by_year.assign(total_budgeted_amount=budget_by_year["total_budgeted_amount"].astype(float))


In [18]:
# check to ensure the budget information is now float
budget_by_year["total_budgeted_amount"].max()

189413136.0
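
Checking the maximum confirms the dtype, but `astype(float)` raises on the first value it cannot parse. A hedged sketch of an alternative check uses `pd.to_numeric` with `errors="coerce"` to list unparseable values instead of failing. The sample series is hypothetical.

```python
import pandas as pd

# Hypothetical sample: two valid amounts, one bad string, one missing value
amounts = pd.Series(["2485", "73932", "n/a", None])

# Unparseable entries become NaN instead of raising
numeric = pd.to_numeric(amounts, errors="coerce")

# Values that were present but could not be converted
bad = amounts[numeric.isna() & amounts.notna()]
print(bad.tolist())
print(numeric.max())
```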

In [19]:
# Convert pay rate information from string to float.
# assign() returns a new dataframe, which avoids the pandas SettingWithCopyWarning
budget_by_year = budget_by_year.assign(budgeted_pay_rate=budget_by_year["budgeted_pay_rate"].astype(float))


In [20]:
# check to ensure the pay rate information is now float
budget_by_year["budgeted_pay_rate"].max()

51455255.0

In [21]:
# Write the dataframe to a CSV file
budget_by_year.to_csv("budget_by_year.csv", index=False)
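
When the CSV is read back, pandas infers column types again, so code columns that were strings may return as integers. A minimal sketch of a round trip that pins one code column to `str`; the toy data and the leading zeros are hypothetical and only illustrate the effect.

```python
import os
import tempfile

import pandas as pd

# Toy data; the leading zeros are illustrative, not taken from the real dataset
df = pd.DataFrame({
    "year": [2011, 2012],
    "title_code": ["0015", "0102"],
    "total_budgeted_amount": [2485.0, 73932.0],
})

# Write to a temporary directory so the example leaves no file behind in the project
path = os.path.join(tempfile.mkdtemp(), "budget_by_year.csv")
df.to_csv(path, index=False)

# Pin the code column to str so it is not re-inferred as an integer
reloaded = pd.read_csv(path, dtype={"title_code": str})
print(reloaded.dtypes)
```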