# Kaggle API SQL Analysis: Team Payroll Statistics

## 5-Step Analytics Framework

**1. Define the business problem**  
Understand payroll spending patterns across MLB teams and how payrolls have evolved over time.

**2. Collect and prepare the data**  
The data comes from `mlb_payrolls.csv`, which was uploaded to the PostgreSQL table `sql_project.team_payroll_records`.

**3. Analyze the data and generate insights**  
Descriptive and diagnostic SQL queries explore how payrolls vary by year and by team.

**4. Communicate insights**  
Findings and recommendations appear below each query.

**5. Take action**  
Inform decision-makers about payroll trends and budget optimizations.

In [1]:
import os
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv

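# Read database credentials from a local .env file
load_dotenv()

pg_user = os.getenv('PG_USER')
pg_password = os.getenv('PG_PASSWORD')
pg_host = os.getenv('PG_HOST')
pg_db = os.getenv('PG_DB')

# Connect to PostgreSQL through the psycopg2 driver (default port 5432)
engine = create_engine(
    f"postgresql+psycopg2://{pg_user}:{pg_password}@{pg_host}:5432/{pg_db}"
)

# Show every row when a DataFrame is displayed
pd.set_option('display.max_rows', None)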
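
The connection string above interpolates the password directly, which can break if the password contains characters such as `@`, `:` or `/`. A minimal sketch of one way to guard against that is to percent-encode the credentials first. The helper name `build_pg_url` and the sample credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote


def build_pg_url(user, password, host, db, port=5432):
    """Build a PostgreSQL URL with percent-encoded credentials.

    Hypothetical helper: encoding the user and password keeps characters
    like '@' or ':' from being read as URL delimiters.
    """
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql+psycopg2://{safe_user}:{safe_password}@{host}:{port}/{db}"


# Placeholder values for illustration only
url = build_pg_url("analyst", "p@ss:word/1", "localhost", "payroll_db")
print(url)
```

The resulting string could then be passed to `create_engine` in place of the f-string.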

In [2]:
sql_query = '''
WITH payrolls AS (
    SELECT
        "Team Full" AS team,
        "Year" AS year,
        REPLACE(REPLACE("Total Payroll Allocations", '$', ''), ',', '')::numeric AS payroll
    FROM sql_project.team_payroll_records
)

SELECT
    year,
    AVG(payroll) AS avg_payroll,
    MAX(payroll) AS max_payroll,
    MIN(payroll) AS min_payroll
FROM payrolls
GROUP BY year
ORDER BY year;
'''

df = pd.read_sql(sql_query, engine)
df

| | year | avg_payroll | max_payroll | min_payroll |
|---|---|---|---|---|
| 0 | 2011 | 100977500.0 | 210950685.0 | 45386925.0 |
| 1 | 2012 | 107020200.0 | 225589742.0 | 54547239.0 |
| 2 | 2013 | 110210700.0 | 236694375.0 | 28727913.0 |
| 3 | 2014 | 119154000.0 | 241557818.0 | 50559679.0 |
| 4 | 2015 | 127587900.0 | 301735080.0 | 72990525.0 |
| 5 | 2016 | 133152000.0 | 264470494.0 | 62161191.0 |
| 6 | 2017 | 139775100.0 | 259311393.0 | 64576736.0 |
| 7 | 2018 | 138377600.0 | 227398860.0 | 68810167.0 |
| 8 | 2019 | 137687900.0 | 229166880.0 | 64178722.0 |
| 9 | 2020 | 61206650.0 | 124719080.0 | 23478635.0 |
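
The `payrolls` CTE removes `$` and `,` from "Total Payroll Allocations" before casting the text to `numeric`. The same cleaning step can be sketched in Python. The function name `parse_payroll` is hypothetical, and the input format (`$210,950,685`) is an assumption inferred from the two `REPLACE` calls, since the raw column is not shown in the notebook.

```python
from decimal import Decimal


def parse_payroll(raw):
    """Mirror REPLACE(REPLACE(col, '$', ''), ',', '')::numeric.

    Assumes values are formatted like '$210,950,685'.
    """
    cleaned = raw.replace("$", "").replace(",", "")
    return Decimal(cleaned)


# Sample string built from the 2011 max_payroll value in the output above
payroll = parse_payroll("$210,950,685")
print(payroll)
```

Like the SQL cast, `Decimal` raises an error on input that is still not numeric after cleaning, which surfaces malformed rows early instead of silently converting them.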


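**Insight:** In the years shown, the league-average payroll rose every year from about $101M in 2011 to about $140M in 2017, dipped slightly in 2018 and 2019, then fell to about $61M in 2020, when the season was shortened to 60 games and salaries were prorated.  
**Recommendation:** Track each team's year-over-year payroll growth against the league average, and treat 2020 as an outlier when fitting trends.  
**Prediction:** Payrolls are likely to resume rising if league revenue grows. This dataset contains no revenue figures, so that link is untested here.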
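
The year-over-year movement behind this trend can be checked directly from the `avg_payroll` column in the output above. The sketch below hard-codes those ten values and computes the percentage change between consecutive years. The function name `yoy_pct_change` is hypothetical.

```python
# avg_payroll by year, copied from the query output above
avg_payroll = {
    2011: 100977500.0,
    2012: 107020200.0,
    2013: 110210700.0,
    2014: 119154000.0,
    2015: 127587900.0,
    2016: 133152000.0,
    2017: 139775100.0,
    2018: 138377600.0,
    2019: 137687900.0,
    2020: 61206650.0,
}


def yoy_pct_change(series):
    """Return {year: percent change versus the previous year in the series}."""
    years = sorted(series)
    return {
        curr: (series[curr] - series[prev]) / series[prev] * 100
        for prev, curr in zip(years, years[1:])
    }


pct_changes = yoy_pct_change(avg_payroll)
for year, pct in pct_changes.items():
    print(f"{year}: {pct:+.1f}%")
```

Growth is positive through 2017, slightly negative in 2018 and 2019, and sharply negative in 2020.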

In [3]:
sql_query = '''
WITH payrolls AS (
    SELECT
        "Team Full" AS team,
        "Year" AS year,
        REPLACE(REPLACE("Total Payroll Allocations", '$', ''), ',', '')::numeric AS payroll
    FROM sql_project.team_payroll_records
),
payroll_growth AS (
    SELECT
        team,
        year,
        payroll,
        payroll - LAG(payroll) OVER (PARTITION BY team ORDER BY year) AS payroll_change
    FROM payrolls
)

SELECT *
FROM payroll_growth
WHERE payroll_change IS NOT NULL
ORDER BY payroll_change DESC
LIMIT 10;
'''

df2 = pd.read_sql(sql_query, engine)
df2

| | team | year | payroll | payroll_change |
|---|---|---|---|---|
| 0 | Los Angeles Dodgers | 2021 | 265343390.0 | 140624310.0 |
| 1 | Los Angeles Dodgers | 2013 | 236694375.0 | 119491658.0 |
| 2 | New York Mets | 2021 | 201189189.0 | 119243591.0 |
| 3 | Philadelphia Phillies | 2021 | 190513223.0 | 116969676.0 |
| 4 | Los Angeles Angels | 2021 | 182749560.0 | 115708667.0 |
| 5 | Houston Astros | 2021 | 194222042.0 | 111566937.0 |
| 6 | San Diego Padres | 2021 | 179767346.0 | 106905732.0 |
| 7 | Boston Red Sox | 2021 | 186718068.0 | 102507678.0 |
| 8 | Texas Rangers | 2023 | 251332754.0 | 101295308.0 |
| 9 | San Francisco Giants | 2021 | 171191545.0 | 97782728.0 |
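
The `payroll_growth` CTE subtracts each team's previous row from its current row using `LAG(payroll) OVER (PARTITION BY team ORDER BY year)`. A minimal pure-Python sketch of the same logic follows. The function name `payroll_changes` is hypothetical. The 2021 payrolls come from the output above, while the 2020 payrolls are back-computed as `payroll - payroll_change` and assume each team's preceding row is its 2020 season.

```python
from itertools import groupby
from operator import itemgetter

# (team, year, payroll). The 2020 values are inferred, not queried:
# they equal the 2021 payroll minus the reported payroll_change.
rows = [
    ("Los Angeles Dodgers", 2021, 265343390),
    ("New York Mets", 2020, 81945598),
    ("Los Angeles Dodgers", 2020, 124719080),
    ("New York Mets", 2021, 201189189),
]


def payroll_changes(rows):
    """Mimic payroll - LAG(payroll) OVER (PARTITION BY team ORDER BY year)."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # partition by team, order by year
    for team, group in groupby(ordered, key=itemgetter(0)):
        previous = None  # LAG yields NULL on the first row of each partition
        for _, year, payroll in group:
            change = None if previous is None else payroll - previous
            result.append((team, year, payroll, change))
            previous = payroll
    return result


# Equivalent of WHERE payroll_change IS NOT NULL ORDER BY payroll_change DESC
changes = [r for r in payroll_changes(rows) if r[3] is not None]
changes.sort(key=itemgetter(3), reverse=True)
for row in changes:
    print(row)
```

Note that `LAG` compares against the previous row in the partition, not the previous calendar year. If a team is missing a season in the table, the reported change spans that gap.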


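**Insight:** Eight of the ten largest year-over-year increases fall in 2021, so they likely reflect the rebound from reduced 2020 payrolls more than new spending. The Dodgers in 2013 and the Rangers in 2023 are the only entries from other seasons.  
**Recommendation:** Exclude or normalize the 2020 baseline before ranking increases, then compare the remaining jumps with performance outcomes such as wins.  
**Prediction:** Large payroll increases may not lead to proportionally better performance. Testing this requires results data, which these queries do not use.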
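
A quick tally of the `year` column in the top-10 output shows how concentrated these increases are in a single season. The list below is copied from that output in row order.

```python
from collections import Counter

# year column of the top-10 payroll_change rows, in output order
top10_years = [2021, 2013, 2021, 2021, 2021, 2021, 2021, 2021, 2023, 2021]

year_counts = Counter(top10_years)
for year, count in sorted(year_counts.items()):
    print(f"{year}: {count} of {len(top10_years)} rows")
```

Because 2021 follows the depressed 2020 payrolls, a ranking by raw change is dominated by that one transition.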