# **An Exploratory Analysis of Payroll and Overtime Trends in Louisville Metro Government**

# **Project Overview**

**Domain**

Public Administration | HR Analytics | Payroll & Workforce Analysis

**Objective**

This project analyzes publicly available employee salary data from the Louisville Metro Government to understand compensation structure, overtime usage, departmental spending patterns, and year-over-year salary trends. The dataset provides transparency into how public funds are allocated across departments, job roles, and time periods.

**Problem Definition**

Louisville Metro faces challenges in ensuring fair compensation practices, managing overtime costs, and maintaining budget efficiency across diverse departments. Without clear insight into salary distribution, overtime patterns, and departmental pay structures, leadership struggles to identify pay disparities, control excessive overtime expenses, and make informed, data-driven compensation and budgeting decisions.

**Dataset**

The dataset consists of 40,829 rows and 11 columns, representing employee-wise, year-wise payroll and overtime details.

# **Step 1: Data Loading and Initial Overview**

**Goal:** In this step, the dataset is loaded and an initial inspection is performed to understand its structure, size, columns, data types, and missing values.

# **1.1 Import Required Libraries**

In this step, essential Python libraries are imported to support data manipulation, numerical operations, and data visualization required for analyzing the employee salary dataset.

In [None]:
import pandas as pd # Import pandas library for data manipulation and analysis
import numpy as np  # Import numpy library for numerical computations and handling NaN values

pd.set_option('display.max_columns', None) # Set pandas option to display all columns in the DataFrame output
pd.set_option('display.float_format', '{:.2f}'.format) # Set pandas option to format floating-point numbers to two decimal places

In [None]:
print('Python libraries imported successfully')

Python libraries imported successfully


# **1.2 Data Source and Loading**

Data Source: Louisville Metro Government – Human Resources Department

Source Link: https://catalog.data.gov/dataset/louisville-metro-ky-employee-salary-data-6cc9e

**Dataset Description**

The key highlights of the given dataset are the following:

**Scale:** 40,829 records and 11 attributes

**Content:** Employee-wise, year-wise payroll data including department, job title, annual salary rate, regular pay, overtime pay, allowances, other payments, and total YTD compensation

**Focus:** Analysis of employee compensation structure, overtime dependency, departmental payroll distribution, and year-over-year salary trends within Louisville Metro Government

# **Loading Objectives:**
Let’s load the CSV file and quickly examine it to understand the data.

**Total number of records (rows) –** The dataset contains 40,829 employee-year records, with each entry representing an individual employee’s compensation details for a specific calendar year.

**Total number of attributes (columns) –** There are 11 variables describing employee compensation, including department, job title, annual salary rate, regular pay, overtime pay, allowances, other payments, and total year-to-date compensation.

**Overall data structure and quality:** The compensation fields are clearly defined, but the data is not fully clean: the `Other` column contains no values, and a few records are missing an employee name, department, or job title. After basic cleaning and validation, the dataset is suitable for analyzing salary patterns and overtime-driven changes in total compensation.

In [None]:
df = pd.read_csv('/content/Louisville_Metro_KY_-_Employee_Salary_Data.csv') # Load the employee salary dataset into a Pandas DataFrame
print('Employee salary dataset loaded successfully')

Employee salary dataset loaded successfully
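A minimal sketch of a more defensive load, assuming it is useful to fail early when a column is missing. The helper name `load_salary_data` and the `EXPECTED_COLUMNS` list are hypothetical and not part of the original notebook; the two sample rows are copied from the preview in Section 1.7 and stand in for the real CSV file.

```python
import io
import pandas as pd

# Column names as reported by df.dtypes in this notebook
EXPECTED_COLUMNS = [
    "CalYear", "Employee_Name", "Department", "jobTitle", "Annual_Rate",
    "Regular_Rate", "Overtime_Rate", "Incentive_Allowance", "Other",
    "YTD_Total", "ObjectId",
]

def load_salary_data(source):
    """Hypothetical helper: read the CSV and raise if an expected column is absent."""
    frame = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return frame

# Stand-in for the real file: two rows copied from the Section 1.7 preview
sample_csv = io.StringIO(
    "CalYear,Employee_Name,Department,jobTitle,Annual_Rate,Regular_Rate,"
    "Overtime_Rate,Incentive_Allowance,Other,YTD_Total,ObjectId\n"
    '2026,"Martin, David",Louisville Free Public Library,Library Page LU,'
    "30763.2,1183.2,0.0,0.0,,1183.2,2\n"
    '2026,"Gardner, Kevin",Facilities and Fleet Management,'
    "Mechanic III-Heavy Equipment,76440.0,3032.0,3.79,300.0,,3335.79,10\n"
)

sample_df = load_salary_data(sample_csv)
print(sample_df.shape)
```

With the real file, the same helper could be called with the path passed to `pd.read_csv` above instead of the in-memory sample.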


# **1.3 Dataset Dimensions (Rows and Columns)**
The df.shape output (40829, 11) indicates that the dataset contains 40,829 rows, representing employee-year compensation records, and 11 columns, capturing various components of employee salary and overtime-related compensation.

In [None]:
df.shape # Display the number of rows and columns in the dataset

(40829, 11)

# **1.4 Column Data Types Overview**

The df.dtypes attribute lists the data type of each column in the dataset, helping to verify whether variables are correctly interpreted as numerical or categorical for further analysis.

In [None]:
print("Data types of each column:")
print(df.dtypes) # Show the data type of every column

Data types of each column:
CalYear                  int64
Employee_Name           object
Department              object
jobTitle                object
Annual_Rate            float64
Regular_Rate           float64
Overtime_Rate          float64
Incentive_Allowance    float64
Other                  float64
YTD_Total              float64
ObjectId                 int64
dtype: object
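One way to act on the dtype listing is to split the columns into numerical and non-numerical groups with `select_dtypes`. The sketch below uses a small stand-in frame named `demo` (a hypothetical name) built from two rows of the Section 1.7 preview; on the full dataset the same calls would be made on `df`.

```python
import pandas as pd

# Stand-in frame with a few of the real columns (values from the Section 1.7 preview)
demo = pd.DataFrame({
    "CalYear": [2026, 2026],
    "Department": ["OMB Finance", "Louisville Zoo"],
    "Annual_Rate": [46248.8, 113882.82],
})

# Integer and float columns count as "number"; text columns fall into the other group
numeric_cols = demo.select_dtypes(include="number").columns.tolist()
categorical_cols = demo.select_dtypes(exclude="number").columns.tolist()

print("Numerical columns:", numeric_cols)
print("Categorical columns:", categorical_cols)
```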


# **1.5 Dataset Information Summary**

The df.info() function provides a concise summary of the dataset, including the number of non-null values, data types, and memory usage, helping to assess overall data completeness and quality before analysis.

In [None]:
print("Dataset information:")
df.info() # Display dataset information

Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40829 entries, 0 to 40828
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   CalYear              40829 non-null  int64  
 1   Employee_Name        40827 non-null  object 
 2   Department           40828 non-null  object 
 3   jobTitle             40828 non-null  object 
 4   Annual_Rate          40829 non-null  float64
 5   Regular_Rate         40829 non-null  float64
 6   Overtime_Rate        40829 non-null  float64
 7   Incentive_Allowance  40829 non-null  float64
 8   Other                0 non-null      float64
 9   YTD_Total            40829 non-null  float64
 10  ObjectId             40829 non-null  int64  
dtypes: float64(6), int64(2), object(3)
memory usage: 3.4+ MB
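The non-null counts above show that `Other` has no values at all and that a few text fields have gaps. A minimal sketch of how those gaps could be counted directly is given below. The frame `demo` is a hypothetical stand-in, and the position of the missing name in it is invented for illustration; on the full dataset the same calls would be made on `df`.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in: one missing name and an entirely empty "Other" column
demo = pd.DataFrame({
    "Employee_Name": ["Martin, David", None, "Gardner, Kevin"],
    "Regular_Rate": [1183.2, 2724.8, 3032.0],
    "Other": [np.nan, np.nan, np.nan],
})

# Number of missing values in each column
missing_counts = demo.isna().sum()

# Columns in which every row is missing
empty_columns = missing_counts[missing_counts == len(demo)].index.tolist()

print(missing_counts)
print("Completely empty columns:", empty_columns)
```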


# **1.6 Descriptive Statistics Overview**

The df.describe() function generates descriptive statistics for numerical columns, including count, mean, standard deviation, minimum, maximum, and quartile values, providing an initial understanding of data distribution and variability.

In [None]:
print("Statistical summary of the dataset:")
df.describe() # Display statistical summary of numerical columns

Statistical summary of the dataset:


| | CalYear | Annual_Rate | Regular_Rate | Overtime_Rate | Incentive_Allowance | Other | YTD_Total | ObjectId |
|---|---|---|---|---|---|---|---|---|
| count | 40829.0 | 40829.0 | 40829.0 | 40829.0 | 40829.0 | 0.0 | 40829.0 | 40829.0 |
| mean | 2023.47 | 58951.58 | 40022.06 | 6157.04 | 1929.25 | NaN | 49063.73 | 20415.0 |
| std | 1.67 | 27009.11 | 31773.6 | 12416.97 | 3921.08 | NaN | 40597.17 | 11786.46 |
| min | 2021.0 | 0.0 | -320.64 | -58.1 | -2500.0 | NaN | -320.64 | 1.0 |
| 25% | 2022.0 | 42827.2 | 6589.12 | 0.0 | 0.0 | NaN | 7490.7 | 10208.0 |
| 50% | 2024.0 | 56243.2 | 41785.54 | 289.62 | 0.0 | NaN | 47318.69 | 20415.0 |
| 75% | 2025.0 | 74131.2 | 61484.8 | 5636.03 | 2000.0 | NaN | 76387.1 | 30622.0 |
| max | 2026.0 | 520000.0 | 275961.58 | 176601.62 | 43853.33 | NaN | 294733.23 | 40829.0 |
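The summary reports negative minimums for several pay columns, which may be worth reviewing during cleaning. The sketch below shows one way to count and isolate such rows. The frame `demo` and its rows are illustrative only and are not records from the real dataset; on the full data the same expressions would be applied to `df`.

```python
import pandas as pd

pay_columns = ["Regular_Rate", "Overtime_Rate", "Incentive_Allowance", "YTD_Total"]

# Illustrative rows only; they are not records from the real dataset
demo = pd.DataFrame({
    "Regular_Rate": [1183.2, -320.64, 3032.0],
    "Overtime_Rate": [0.0, 0.0, 3.79],
    "Incentive_Allowance": [0.0, 0.0, 300.0],
    "YTD_Total": [1183.2, -320.64, 3335.79],
})

# Count of negative values in each pay column
negative_counts = (demo[pay_columns] < 0).sum()

# Rows with a negative value in at least one pay column
rows_with_negative = demo[(demo[pay_columns] < 0).any(axis=1)]

print(negative_counts)
print(rows_with_negative)
```

Whether negative amounts are errors or legitimate adjustments is not stated in the source, so this check only surfaces them for review.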


# **1.7 Preview of the Dataset (First 10 Records)**
The df.head(10) function displays the first ten rows of the dataset, allowing an initial review of the data values, column structure, and overall formatting to ensure the dataset has been loaded correctly.

In [None]:
print("First 10 rows of the dataset:")
df.head(10) # Display the first 10 rows of the dataset

First 10 rows of the dataset:


| | CalYear | Employee_Name | Department | jobTitle | Annual_Rate | Regular_Rate | Overtime_Rate | Incentive_Allowance | Other | YTD_Total | ObjectId |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2026 | Summers, William E | OMB Finance | Board Member | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 1 |
| 1 | 2026 | Martin, David | Louisville Free Public Library | Library Page LU | 30763.2 | 1183.2 | 0.0 | 0.0 | NaN | 1183.2 | 2 |
| 2 | 2026 | Ammon, Darrell Sheridan | Louisville Metro Police Department | Criminal Justice Specialist | 70844.8 | 2724.8 | 0.0 | 0.0 | NaN | 2724.8 | 3 |
| 3 | 2026 | Waggoner, David B. | Louisville Fire | Fire Prevention Inspector I | 75944.96 | 2555.84 | 0.0 | 239.25 | NaN | 14478.93 | 4 |
| 4 | 2026 | Lenahan, Larry J | OMB Finance | Budget Analyst I | 46248.8 | 1200.71 | 0.0 | 0.0 | NaN | 1200.71 | 5 |
| 5 | 2026 | Taylor, Steven R | Louisville Zoo | ASST. Director | 113882.82 | 4380.17 | 0.0 | 0.0 | NaN | 4380.17 | 6 |
| 6 | 2026 | Cole, Aaron Willis | Codes & Regulations | Board Member | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 7 |
| 7 | 2026 | May, Theresa Y | Louisville Metro Police Department | Word Processing Clerk Police | 39499.2 | 1519.2 | 0.0 | 60.12 | NaN | 1579.32 | 8 |
| 8 | 2026 | Taylor, Mary D | Louisville Zoo | Volunteer Coordinator | 56950.4 | 2190.43 | 0.0 | 0.0 | NaN | 2190.43 | 9 |
| 9 | 2026 | Gardner, Kevin | Facilities and Fleet Management | Mechanic III-Heavy Equipment | 76440.0 | 3032.0 | 3.79 | 300.0 | NaN | 3335.79 | 10 |
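The preview can also be used for a quick sanity check on how the pay components relate to the total. The sketch below assumes that `YTD_Total` is meant to equal `Regular_Rate + Overtime_Rate + Incentive_Allowance`; the source does not state this, so treat it as a hypothesis to test rather than a rule. The three rows are copied from the preview above, and `preview` and `matches_total` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Three rows copied from the first-10 preview (index 3, 7 and 9)
preview = pd.DataFrame({
    "Employee_Name": ["Waggoner, David B.", "May, Theresa Y", "Gardner, Kevin"],
    "Regular_Rate": [2555.84, 1519.2, 3032.0],
    "Overtime_Rate": [0.0, 0.0, 3.79],
    "Incentive_Allowance": [239.25, 60.12, 300.0],
    "YTD_Total": [14478.93, 1579.32, 3335.79],
})

# Assumption: the total should equal the sum of the three pay components
component_sum = preview[
    ["Regular_Rate", "Overtime_Rate", "Incentive_Allowance"]
].sum(axis=1)

# Compare with a one-cent tolerance to avoid floating-point noise
preview["matches_total"] = np.isclose(component_sum, preview["YTD_Total"], atol=0.01)

print(preview[["Employee_Name", "YTD_Total", "matches_total"]])
```

Rows where the flag is `False` would need a closer look before any conclusion is drawn, since the total may include payments not captured in these three columns.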


# **1.8 Preview of the Dataset (Last 10 Records)**
The df.tail(10) function displays the last ten rows of the dataset, helping verify the dataset’s completeness, ordering, and consistency of values toward the end of the data.



In [None]:
print("Last 10 rows of the dataset:")
df.tail(10)  # Display the last 10 rows of the dataset

Last 10 rows of the dataset:


| | CalYear | Employee_Name | Department | jobTitle | Annual_Rate | Regular_Rate | Overtime_Rate | Incentive_Allowance | Other | YTD_Total | ObjectId |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 40819 | 2021 | Craft, Mary | Public Health & Wellness | Environmental Health Spec U317 | 43160.0 | 1660.0 | 39.43 | 0.0 | NaN | 1699.43 | 40820 |
| 40820 | 2021 | Sachse, Jeremy | Public Works | Equipment Operator-TM | 39041.6 | 1501.6 | 0.0 | 0.0 | NaN | 1501.6 | 40821 |
| 40821 | 2021 | Carter, Carissa | Metro Animal Services | Animal Control Officer Trainee | 36046.4 | 1386.4 | 38.13 | 0.0 | NaN | 1424.53 | 40822 |
| 40822 | 2021 | Campbell, Gwendolyn | Develop Louisville | Staff Helper/Internal | 15600.0 | 105.0 | 0.0 | 0.0 | NaN | 105.0 | 40823 |
| 40823 | 2021 | Jones, Angela | Human Resources | Human Resources Generalist | 37440.0 | 1440.0 | 0.0 | 0.0 | NaN | 1440.0 | 40824 |
| 40824 | 2021 | Johnson, Darrin | Technology Services | PC Support Analyst II | 44803.2 | 1723.2 | 0.0 | 0.0 | NaN | 1723.2 | 40825 |
| 40825 | 2021 | Biagi, Tristan | Technology Services | PC Support Analyst II | 44803.2 | 861.6 | 0.0 | 0.0 | NaN | 861.6 | 40826 |
| 40826 | 2021 | Lett, Ethan | Develop Louisville | Planning Technician U315 | 34049.6 | 654.8 | 0.0 | 0.0 | NaN | 654.8 | 40827 |
| 40827 | 2021 | Brown, LaFon | Solid Waste Management | Sanitation Tipper-CDL | 41891.2 | 2094.56 | 594.13 | 0.0 | NaN | 2688.69 | 40828 |
| 40828 | 2021 | King, Timothy | Public Works | Equipment Operator-TM | 39041.6 | 750.8 | 0.0 | 0.0 | NaN | 750.8 | 40829 |
