# Performance Progression Modeling

This notebook examines how average marks in the jumping events differ across age categories (U18, U20, and senior), using historical averages.

The results are meant as rough estimates, not as a predictive model.

In [1]:
import sqlite3
import pandas as pd

In [2]:
conn = sqlite3.connect("../track_jumps.db")

The `HAVING COUNT(*)` threshold was reduced to 10 so that every age category is included for each event.

In [5]:
query = """
SELECT
    e.event_name,
    e.gender,
    f.age_group,
    ROUND(AVG(f.mark), 2) AS avg_mark,
    COUNT(*) AS performances
FROM fact_performances f
JOIN dim_events e
    ON f.event_id = e.event_id
WHERE f.age_group IN ('U18', 'U20', 'SENIOR')
GROUP BY
    e.event_name,
    e.gender,
    f.age_group
HAVING COUNT(*) >= 10
ORDER BY
    e.event_name,
    e.gender,
    f.age_group;
"""
df_progression = pd.read_sql(query, conn)
df_progression


,event_name,gender,age_group,avg_mark,performances
0,high jump,F,SENIOR,1.98,3677
1,high jump,F,U18,1.92,110
2,high jump,F,U20,1.97,89
3,high jump,M,SENIOR,2.29,9499
4,high jump,M,U18,2.27,23
5,high jump,M,U20,2.27,368
6,long jump,F,SENIOR,6.89,3861
7,long jump,F,U18,6.64,32
8,long jump,F,U20,6.8,100
9,long jump,M,SENIOR,8.16,9475
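The effect of the `HAVING COUNT(*) >= 10` filter can be illustrated with a small in-memory sketch. The table `toy_performances`, its rows, and the helper `groups_kept` are hypothetical and exist only for this illustration; they are not taken from `track_jumps.db`.

```python
import sqlite3

# Toy in-memory data (hypothetical rows, not taken from track_jumps.db).
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("CREATE TABLE toy_performances (age_group TEXT, mark REAL)")
rows = [("U18", 1.90)] * 3 + [("U20", 1.95)] * 12 + [("SENIOR", 2.00)] * 40
toy_conn.executemany("INSERT INTO toy_performances VALUES (?, ?)", rows)


def groups_kept(min_count):
    """Return the age groups that survive HAVING COUNT(*) >= min_count."""
    cursor = toy_conn.execute(
        """
        SELECT age_group, COUNT(*) AS performances
        FROM toy_performances
        GROUP BY age_group
        HAVING COUNT(*) >= ?
        ORDER BY age_group
        """,
        (min_count,),
    )
    return [age_group for age_group, _ in cursor.fetchall()]


kept_at_10 = groups_kept(10)  # the U18 group has only 3 rows, so it is dropped
kept_at_3 = groups_kept(3)    # all three groups survive
print(kept_at_10, kept_at_3)
```

A lower threshold keeps more age groups in the result, at the cost of averages that rest on fewer performances.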


#### Using `pivot_table` to show the results as one row per event and gender.

In [6]:
pivot = df_progression.pivot_table(
    index=["event_name", "gender"],
    columns="age_group",
    values="avg_mark"
).reset_index()

pivot

age_group,event_name,gender,SENIOR,U18,U20
0,high jump,F,1.98,1.92,1.97
1,high jump,M,2.29,2.27,2.27
2,long jump,F,6.89,6.64,6.8
3,long jump,M,8.16,7.89,8.1
4,triple jump,F,14.32,14.15,14.17
5,triple jump,M,17.08,16.97,17.05
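The pivot sorts the age-group columns alphabetically (SENIOR, U18, U20). If a chronological order is easier to read, the columns can be reselected explicitly. This is a minimal sketch on a two-row copy of the output above; `pivot_demo` and `age_order` are names introduced here for illustration.

```python
import pandas as pd

# Two rows copied from the pivot output above, for brevity.
pivot_demo = pd.DataFrame({
    "event_name": ["high jump", "high jump"],
    "gender": ["F", "M"],
    "SENIOR": [1.98, 2.29],
    "U18": [1.92, 2.27],
    "U20": [1.97, 2.27],
})

# Age groups from youngest to oldest.
age_order = ["U18", "U20", "SENIOR"]

# Selecting columns by an explicit list returns them in that order.
pivot_ordered = pivot_demo[["event_name", "gender"] + age_order]
print(pivot_ordered)
```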


#### Calculate the percentage improvement from U18 to U20, U20 to senior, and U18 to senior.

In [8]:
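# Percentage change between age-group averages, relative to the younger group.
pivot["u18_to_u20_pct"] = (
    (pivot["U20"] - pivot["U18"]) / pivot["U18"]
) * 100

pivot["u20_to_senior_pct"] = (
    (pivot["SENIOR"] - pivot["U20"]) / pivot["U20"]
) * 100

pivot["u18_to_senior_pct"] = (
    (pivot["SENIOR"] - pivot["U18"]) / pivot["U18"]
) * 100

# Round only for display; the stored columns keep full precision.
pivot.round(2)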

age_group,event_name,gender,SENIOR,U18,U20,u18_to_u20_pct,u20_to_senior_pct,u18_to_senior_pct
0,high jump,F,1.98,1.92,1.97,2.6,0.51,3.13
1,high jump,M,2.29,2.27,2.27,0.0,0.88,0.88
2,long jump,F,6.89,6.64,6.8,2.41,1.32,3.77
3,long jump,M,8.16,7.89,8.1,2.66,0.74,3.42
4,triple jump,F,14.32,14.15,14.17,0.14,1.06,1.2
5,triple jump,M,17.08,16.97,17.05,0.47,0.18,0.65


#### Drawing Conclusions

Comparing age-group averages, marks rise from U18 to senior level in every event and for both genders, which is to be expected. The only flat step is men's high jump between U18 and U20 (0.0%).
Women show a larger percentage improvement than men from U18 to senior level in all three jumps.

Based on this data, a female triple jumper with a U18 mark of 13.9 m could expect a senior mark of around 14.07 m
(13.9 × 1.012, using the 1.2% U18-to-senior improvement), assuming training continues without interruption.
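
The arithmetic behind that estimate can be written as a small helper. This is a sketch only: `project_mark` is a hypothetical function name, and the 1.2 figure is the rounded `u18_to_senior_pct` for women's triple jump from the table above.

```python
def project_mark(current_mark, pct_change):
    """Scale a mark by a percentage change (1.2 means +1.2%)."""
    return current_mark * (1 + pct_change / 100)


# 13.9 m at U18, scaled by the 1.2% U18-to-senior change for women's triple jump.
projected = project_mark(13.9, 1.2)
print(round(projected, 2))
```

The same helper applies to any row of the table by passing that row's percentage column as `pct_change`.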