# Analyzing New York City employees' payroll database 

## Data Source: [NYC open data](https://data.cityofnewyork.us/City-Government/Citywide-Payroll-Data-Fiscal-Year-/k397-673e/data)

In [1]:
import pandas as pd

df = pd.read_csv('Citywide_Payroll_Data__Fiscal_Year_.csv')

# Normalize column names: lowercase, with underscores instead of spaces and hyphens
df.columns = df.columns.str.replace(" ", "_")
df.columns = df.columns.str.replace("-", "_")
df.columns = df.columns.str.lower()

# Show every row and column, and format floats with thousands separators
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.options.display.float_format = '{:,.2f}'.format



In [2]:
df.shape

(2864545, 17)

In [3]:
# Previous versions of the dataset didn't import all the years correctly, so I'll do these sanity checks 
# a couple of times
df.fiscal_year.value_counts()

2019    592431
2020    590210
2021    573477
2017    562266
2018    546161
Name: fiscal_year, dtype: int64
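
Since this check gets repeated, it could be wrapped in a small helper. The following is a minimal sketch and not part of the original notebook: `check_fiscal_years` is a hypothetical name, and the expected range of 2017 to 2021 is taken from the output above. It runs on a tiny stand-in frame so that it is self-contained; in the notebook it would be called on `df`.

```python
import pandas as pd

def check_fiscal_years(frame, expected=range(2017, 2022)):
    """Return the expected fiscal years that are missing from the frame."""
    present = {int(year) for year in frame["fiscal_year"].dropna().unique()}
    return sorted(set(expected) - present)

# Stand-in data for illustration only; the real notebook would pass `df`.
sample = pd.DataFrame({"fiscal_year": [2017, 2018, 2019, 2019, 2021]})
missing = check_fiscal_years(sample)
print(missing)  # [2020]
```

An empty list means every expected year was imported.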

#### Cleaning the data

The dataset includes people whose work locations are outside NYC. For this analysis, I'm only including employees with work locations in NYC boroughs. Since Staten Island wasn't listed (unless it's included in the "other" location), this is filtered down to Queens, Manhattan, the Bronx and Brooklyn.

In [4]:
boroughs = ['QUEENS', 'MANHATTAN', 'BROOKLYN', 'BRONX']

In [5]:
df = df[df.work_location_borough.isin(boroughs)]

In [6]:
df.shape

(2760682, 17)

Note: This reduced the dataset by 103,863 rows.
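
To see what a filter like this removes, the dropped borough values can be tabulated. The sketch below uses made-up stand-in rows, so the labels `OTHER` and `RICHMOND` are assumptions and may not appear in the real data. Richmond is the county that covers Staten Island, so a county-style label is one thing such a tabulation might reveal. The two row counts are the `df.shape` outputs shown above.

```python
import pandas as pd

boroughs = ["QUEENS", "MANHATTAN", "BROOKLYN", "BRONX"]

# Stand-in rows for illustration; values outside the four boroughs are hypothetical.
sample = pd.DataFrame({
    "work_location_borough": ["QUEENS", "BRONX", "OTHER", None, "MANHATTAN", "RICHMOND"]
})

keep = sample["work_location_borough"].isin(boroughs)
rows_dropped = int((~keep).sum())

# Tabulate the values that were filtered out, counting missing values too.
dropped_values = sample.loc[~keep, "work_location_borough"].value_counts(dropna=False)
print(rows_dropped)
print(dropped_values)

# Row counts from the two df.shape outputs in the notebook.
rows_before, rows_after = 2_864_545, 2_760_682
print(rows_before - rows_after)  # 103863
```

In the notebook, `sample` would be the unfiltered `df`, so the tabulation has to run before cell 5 overwrites it.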

🚨 `Editorial choice`

There are plenty of possibilities here, but I'm mostly interested in looking at overtime pay. Let's see who made the most in overtime.

In [7]:
df.head(3)

Unnamed: 0,fiscal_year,payroll_number,agency_name,last_name,first_name,mid_init,agency_start_date,work_location_borough,title_description,leave_status_as_of_june_30,base_salary,pay_basis,regular_hours,regular_gross_paid,ot_hours,total_ot_paid,total_other_pay
0,2017,,ADMIN FOR CHILDREN'S SVCS,AARON,TERESA,,03/21/2016,BRONX,CHILD PROTECTIVE SPECIALIST,ACTIVE,51315.0,per Annum,1825.0,51709.59,588.0,22374.31,639.66
1,2017,,ADMIN FOR CHILDREN'S SVCS,AARONS,CAMELIA,M,08/08/2016,BROOKLYN,CHILD PROTECTIVE SPECIALIST,ACTIVE,51315.0,per Annum,1595.55,41960.18,121.75,3892.19,108.25
2,2017,,ADMIN FOR CHILDREN'S SVCS,ABDUL,MODUPE,,02/11/2008,BROOKLYN,CHILD PROTECTIVE SPECIALIST,ACTIVE,54720.0,per Annum,1825.0,56298.93,54.75,2455.88,3938.75


In [8]:
df.sort_values(by='total_ot_paid', ascending=False).head(10)

Unnamed: 0,fiscal_year,payroll_number,agency_name,last_name,first_name,mid_init,agency_start_date,work_location_borough,title_description,leave_status_as_of_june_30,base_salary,pay_basis,regular_hours,regular_gross_paid,ot_hours,total_ot_paid,total_other_pay
2291076,2021,996.0,NYC HOUSING AUTHORITY,PROCIDA,ROBERT,,04/13/1987,BRONX,SUPERVISOR PLUMBER,ACTIVE,387.03,per Day,1820.0,100627.8,2249.5,248749.72,7215.34
2291070,2021,816.0,DEPT OF HEALTH/MENTAL HYGIENE,MCGROARTY,MICHAEL,,10/06/2014,QUEENS,STATIONARY ENGINEER,ACTIVE,508.8,per Day,2080.0,132288.0,2374.75,238829.13,40105.0
2291085,2021,996.0,NYC HOUSING AUTHORITY,MARKOWSKI,JAKUB,,05/31/2016,BRONX,PLUMBER,ACTIVE,369.53,per Day,1820.0,96077.8,2119.5,223776.86,5899.29
2291072,2021,816.0,DEPT OF HEALTH/MENTAL HYGIENE,PETTIT,PATRICK,J,08/02/2010,MANHATTAN,STATIONARY ENGINEER,ACTIVE,508.8,per Day,2080.0,132288.0,2152.75,218694.96,38611.82
2291071,2021,816.0,DEPT OF HEALTH/MENTAL HYGIENE,HALLAHAN,PATRICK,M,02/26/2018,BROOKLYN,STATIONARY ENGINEER,ACTIVE,508.8,per Day,2080.0,132288.0,2115.25,218628.18,56616.07
2291081,2021,3.0,BOARD OF ELECTION,"ORTIZ, JR",ANTONIO,,08/27/1995,MANHATTAN,SENIOR SYSTEMS ANALYSTS,ACTIVE,117003.0,per Annum,1820.0,116673.77,2461.25,217915.94,2974.95
2234227,2020,996.0,NYC HOUSING AUTHORITY,PROCIDA,ROBERT,,04/13/1987,BRONX,SUPERVISOR PLUMBER,ACTIVE,387.03,per Day,1820.0,100627.8,1944.5,215022.81,6468.93
2234228,2020,996.0,NYC HOUSING AUTHORITY,ORTIZ,JOSE,,11/27/1989,QUEENS,SUPERVISOR PLUMBER,ACTIVE,387.03,per Day,1820.0,100627.8,1937.5,214248.85,5860.74
1069369,2018,996.0,NYC HOUSING AUTHORITY,GIURBINO,VINCENZO,,04/28/2003,BROOKLYN,PLUMBER,ACTIVE,361.48,per Day,1825.0,93984.8,2043.0,213634.68,7539.44
2291078,2021,996.0,NYC HOUSING AUTHORITY,DALEY,GARFIELD,D,05/24/1994,BRONX,SUPERVISOR ELECTRICIAN,ACTIVE,460.25,per Day,1820.0,119469.25,2032.5,200038.56,28316.97


# 📝 The top 10 overtime payouts were all over $200K

- These are 10 employee-year records covering 9 people: Robert Procida appears for both fiscal years 2020 and 2021
- 6 of the records are from NYCHA
- 3 are from the Dept of Health/Mental Hygiene and 1 from the Board of Election
- 9 out of these 10 are paid on a daily basis
- All of them logged more overtime hours than regular hours
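
The counts in this list can be checked programmatically. The sketch below rebuilds the ten rows from the output above, keeping only the relevant columns, and recomputes each figure. Treating two rows with the same first and last name as the same person is an assumption made for illustration.

```python
import pandas as pd

# Values copied from the ten rows in the output above.
top10 = pd.DataFrame(
    [
        ("NYC HOUSING AUTHORITY", "PROCIDA", "ROBERT", "per Day", 1820.0, 2249.5, 248749.72),
        ("DEPT OF HEALTH/MENTAL HYGIENE", "MCGROARTY", "MICHAEL", "per Day", 2080.0, 2374.75, 238829.13),
        ("NYC HOUSING AUTHORITY", "MARKOWSKI", "JAKUB", "per Day", 1820.0, 2119.5, 223776.86),
        ("DEPT OF HEALTH/MENTAL HYGIENE", "PETTIT", "PATRICK", "per Day", 2080.0, 2152.75, 218694.96),
        ("DEPT OF HEALTH/MENTAL HYGIENE", "HALLAHAN", "PATRICK", "per Day", 2080.0, 2115.25, 218628.18),
        ("BOARD OF ELECTION", "ORTIZ, JR", "ANTONIO", "per Annum", 1820.0, 2461.25, 217915.94),
        ("NYC HOUSING AUTHORITY", "PROCIDA", "ROBERT", "per Day", 1820.0, 1944.5, 215022.81),
        ("NYC HOUSING AUTHORITY", "ORTIZ", "JOSE", "per Day", 1820.0, 1937.5, 214248.85),
        ("NYC HOUSING AUTHORITY", "GIURBINO", "VINCENZO", "per Day", 1825.0, 2043.0, 213634.68),
        ("NYC HOUSING AUTHORITY", "DALEY", "GARFIELD", "per Day", 1820.0, 2032.5, 200038.56),
    ],
    columns=["agency_name", "last_name", "first_name", "pay_basis",
             "regular_hours", "ot_hours", "total_ot_paid"],
)

agency_counts = top10["agency_name"].value_counts()
daily_paid = int((top10["pay_basis"] == "per Day").sum())
all_more_ot = bool((top10["ot_hours"] > top10["regular_hours"]).all())
all_over_200k = bool((top10["total_ot_paid"] > 200_000).all())

# Assumption: the same first and last name means the same person.
distinct_people = len(top10.drop_duplicates(["last_name", "first_name"]))

print(agency_counts)
print(daily_paid, all_more_ot, all_over_200k, distinct_people)
```

The distinct-person count comes out one lower than the row count because one name appears in two fiscal years.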

#### I'm interested in looking at employees who worked more overtime hours than regular hours, but first, let's look at NYCHA a bit more closely.

In [11]:
nycha = df.query('agency_name == "NYC HOUSING AUTHORITY"')

In [None]:
# Preview only: with display.max_rows set to None, a bare `nycha` would render every row
nycha.head()
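
The comparison mentioned in the heading above, employees who logged more overtime hours than regular hours, could be sketched as follows. This is only one possible approach and not the notebook's own code. The rows are made up for illustration, and the column names follow the cleaned headers from cell 1.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "agency_name": ["NYC HOUSING AUTHORITY", "NYC HOUSING AUTHORITY", "BOARD OF ELECTION"],
    "regular_hours": [1820.0, 1820.0, 1820.0],
    "ot_hours": [2100.0, 300.0, 1900.0],
})

# Rows where overtime hours exceed regular hours.
more_ot = sample.query("ot_hours > regular_hours")

# Share of each agency's rows that meet the same condition.
share_by_agency = (
    sample.assign(more_ot=sample["ot_hours"] > sample["regular_hours"])
          .groupby("agency_name")["more_ot"]
          .mean()
)

print(len(more_ot))
print(share_by_agency)
```

Running the same `query` on `nycha` instead of `sample` would restrict the comparison to NYCHA.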