# What do I want to end up with?
## Tenniest apparatus (per year + aggregated)
## Tenniest teams (per year + aggregated)
## Top 10 (20?) goats of all time (by average score) (by apparatus?)
## Bubble maps x axis year, y axis team, size = no. 10s (colour/pie apparatus if poss?)
## Avg score over time (colour by team)

# 1 The tenniest apparatus

Which apparatus (vault, uneven bars, balance beam or floor exercise) attracts the most 10s from the judges? Has it changed over time?

Intuitively, one would assume that vault would attract the fewest deductions; gymnasts are only performing one skill, so there are fewer opportunities to make mistakes.

However, my anecdotal observation as a watcher of college gymnastics is that the judges in this competition are fairly lenient on execution: hesitancy on beam or short handstands on bars might not incur the deduction they would in other leagues. They are quite strict on landings, though. If a gymnast doesn't perfectly stick their landing, they will incur a deduction. Given how difficult vault landings are, does this cancel out the advantage of having to perform fewer skills?

In [1]:
!pip install -r ../requirements.txt



In [2]:
import os
import json
import requests
import sqlite3
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm, trange
tqdm.pandas()  # Register progress_apply on pandas objects
from sqlalchemy import create_engine
from lets_plot import *  # lets-plot: a ggplot2-style plotting API for Python
LetsPlot.setup_html()
import plotly.express as px

%load_ext sql
%config SqlMagic.autocommit=True

from pprint import pprint

In [3]:
%sql sqlite:///../data/clean/gymternet.db --alias gymternet 
engine = create_engine('sqlite:///../data/clean/gymternet.db')

In [4]:
%%sql --alias gymternet

SELECT COUNT(*) FROM gymnast_results WHERE vt_score = 10.0 OR ub_score = 10.0 OR bb_score = 10.0 OR fx_score = 10.0;

COUNT(*)
433
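Note that the `COUNT(*)` above counts rows with at least one 10, not the 10s themselves. Assuming each row of `gymnast_results` is one gymnast at one meet, a gymnast who scores two 10s at the same meet is counted once. A minimal sketch of the difference, using an in-memory SQLite table with made-up scores (a toy stand-in, not the real `gymnast_results`):

```python
import sqlite3

# Toy stand-in for gymnast_results; every score here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gymnast_results "
    "(vt_score REAL, ub_score REAL, bb_score REAL, fx_score REAL)"
)
conn.executemany(
    "INSERT INTO gymnast_results VALUES (?, ?, ?, ?)",
    [
        (10.0, 9.9, 10.0, 9.8),  # two 10s in one row
        (9.9, 10.0, 9.7, 9.9),   # one 10
        (9.8, 9.9, 9.7, 9.6),    # no 10s
    ],
)

# Rows containing at least one 10 (same shape as the query above)
rows_with_a_ten = conn.execute(
    """
    SELECT COUNT(*) FROM gymnast_results
    WHERE vt_score = 10.0 OR ub_score = 10.0 OR bb_score = 10.0 OR fx_score = 10.0
    """
).fetchone()[0]

# Individual 10s: each comparison evaluates to 0 or 1, so SUM counts the matches
total_tens = conn.execute(
    """
    SELECT SUM(vt_score = 10.0) + SUM(ub_score = 10.0)
         + SUM(bb_score = 10.0) + SUM(fx_score = 10.0)
    FROM gymnast_results
    """
).fetchone()[0]

print(rows_with_a_ten, total_tens)
```

In this toy table the two numbers differ (2 rows, 3 tens), which is why the later queries sum the comparisons per apparatus instead of counting rows.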


In [26]:
%%sql gymternet

-- Number of 10s on each apparatus, one row per season
SELECT 
    SUM(r.vt_score = 10.0) AS 'Vault',
    SUM(r.ub_score = 10.0) AS 'Uneven Bars',
    SUM(r.bb_score = 10.0) AS 'Balance Beam',
    SUM(r.fx_score = 10.0) AS 'Floor Exercise',
    m.year AS 'Season'
FROM gymnast_results AS r
LEFT JOIN meets AS m
ON m.meet_id = r.meet_id
GROUP BY m.year

-- UNION ALL

-- SELECT 
--     SUM(r.vt_score = 10.0) AS 'Vault',
--     SUM(r.ub_score = 10.0) AS 'Uneven Bars',
--     SUM(r.bb_score = 10.0) AS 'Balance Beam',
--     SUM(r.fx_score = 10.0) AS 'Floor Exercise',
--     SUM(r.vt_score = 10.0) + SUM(r.ub_score = 10.0) + SUM(r.bb_score = 10.0) + SUM(r.fx_score = 10.0) AS 'Total Tens',
--     'Overall' AS 'Season'
-- FROM gymnast_results AS r
-- LEFT JOIN meets AS m
-- ON m.meet_id = r.meet_id;

Vault,Uneven Bars,Balance Beam,Floor Exercise,Season
34,32,2,7,2015
12,8,16,28,2016
22,26,35,16,2017
10,51,53,24,2018
31,38,8,56,2019
28,10,32,4,2020
50,44,20,21,2021
59,46,38,77,2022
88,81,126,64,2023
45,56,69,103,2024
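A sanity check worth running here: the per-season counts above add up to 1,570 tens, while the earlier query found only 433 rows containing at least one 10. With four scores per row, that would mean more than three 10s per row on average, which seems unlikely. One possible explanation (an assumption; the schema of `meets` isn't shown here) is that `meets` holds more than one row per `meet_id`, so the `LEFT JOIN` repeats each result row. The sketch below uses made-up tables to show how that kind of fan-out inflates a `SUM`, and how to check for it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gymnast_results (gymnast_id INTEGER, meet_id INTEGER, vt_score REAL);
    CREATE TABLE meets (meet_id INTEGER, team_id INTEGER, year INTEGER);

    INSERT INTO gymnast_results VALUES (1, 100, 10.0), (2, 100, 9.9);
    -- Hypothetical layout: meet 100 is stored once per competing team
    INSERT INTO meets VALUES (100, 1, 2024), (100, 2, 2024);
    """
)

# Joining on meet_id alone matches every result row twice
joined = conn.execute(
    """
    SELECT SUM(r.vt_score = 10.0)
    FROM gymnast_results AS r
    LEFT JOIN meets AS m ON m.meet_id = r.meet_id
    """
).fetchone()[0]

# Joining against one row per meet keeps each result row once
deduped = conn.execute(
    """
    SELECT SUM(r.vt_score = 10.0)
    FROM gymnast_results AS r
    LEFT JOIN (SELECT DISTINCT meet_id, year FROM meets) AS m
    ON m.meet_id = r.meet_id
    """
).fetchone()[0]

# Direct check: which meet_ids appear more than once in meets?
duplicates = conn.execute(
    "SELECT meet_id, COUNT(*) FROM meets GROUP BY meet_id HAVING COUNT(*) > 1"
).fetchall()

print(joined, deduped, duplicates)
```

If the last query returns no rows against the real `meets` table, the join is not the cause and the discrepancy needs another explanation.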


In [7]:
# Export the above query to a new df
tenniest_apparatus_query = """
SELECT 
    SUM(r.vt_score = 10.0) AS 'Vault',
    SUM(r.ub_score = 10.0) AS 'Uneven Bars',
    SUM(r.bb_score = 10.0) AS 'Balance Beam',
    SUM(r.fx_score = 10.0) AS 'Floor Exercise',
    m.year AS 'Season'
FROM gymnast_results AS r
LEFT JOIN meets AS m
ON m.meet_id = r.meet_id
GROUP BY m.year;
"""

# Execute the query and store the result in a DataFrame
tenniest_apparatus_df = pd.read_sql_query(tenniest_apparatus_query, engine)

# Preview the df
tenniest_apparatus_df

Unnamed: 0,Vault,Uneven Bars,Balance Beam,Floor Exercise,Season
0,34,32,2,7,2015
1,12,8,16,28,2016
2,22,26,35,16,2017
3,10,51,53,24,2018
4,31,38,8,56,2019
5,28,10,32,4,2020
6,50,44,20,21,2021
7,59,46,38,77,2022
8,88,81,126,64,2023
9,45,56,69,103,2024


We want this table in long format, with one row per apparatus per season, so that it's easier for Plotly to read.

The new layout should look like:

| **Apparatus**    | **Number of 10s** | **Season** |
|------------------|-------------------|------------|
| 'Vault'          | 34                | 2015       |
| 'Uneven Bars'    | 32                | 2015       |
| 'Balance Beam'   | 2                 | 2015       |
| 'Floor Exercise' | 7                 | 2015       |
| 'Total'          | 75                | 2015       |

etc.


In [8]:
# Melt the DataFrame
tenniest_apparatus_melted = pd.melt(tenniest_apparatus_df, id_vars=['Season'], var_name='Apparatus', value_name='No. of Tens')

# Preview the melted DataFrame
tenniest_apparatus_melted.head()

Unnamed: 0,Season,Apparatus,No. of Tens
0,2015,Vault,34
1,2016,Vault,12
2,2017,Vault,22
3,2018,Vault,10
4,2019,Vault,31
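The target layout includes a 'Total' row for each season, which `pd.melt` on its own doesn't produce. A minimal sketch of one way to get it, using the first two seasons from the query output: add a `Total` column before melting, then sort so the rows are grouped by season. The names `wide`, `long_df` and `apparatus_cols` are just for illustration.

```python
import pandas as pd

# First two seasons, copied from the query output
wide = pd.DataFrame(
    {
        "Vault": [34, 12],
        "Uneven Bars": [32, 8],
        "Balance Beam": [2, 16],
        "Floor Exercise": [7, 28],
        "Season": [2015, 2016],
    }
)

apparatus_cols = ["Vault", "Uneven Bars", "Balance Beam", "Floor Exercise"]

# Row-wise total across the four apparatus
wide["Total"] = wide[apparatus_cols].sum(axis=1)

# Wide -> long, then a stable sort so each season's rows stay in column order
long_df = (
    wide.melt(id_vars="Season", var_name="Apparatus", value_name="No. of Tens")
    .sort_values("Season", kind="mergesort")
    .reset_index(drop=True)
)

print(long_df)
```

If the totals shouldn't appear as a fifth bar in the charts, they can be filtered out with `long_df[long_df["Apparatus"] != "Total"]` before plotting.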


In [9]:
fig = px.bar(
    tenniest_apparatus_melted, 
    x="Apparatus", 
    y="No. of Tens", 
    color="Apparatus", 
    barmode="group",
    facet_row="Season",  # Stack subplots vertically by year
    category_orders={"Season": sorted(tenniest_apparatus_melted["Season"].unique())}  # Ensure the years are sorted
)

# Update layout for better visualization
fig.update_layout(
    title="Number of Perfect 10s on Each Apparatus by Season",
    xaxis_title="Apparatus",
    yaxis_title="Number of Tens",
    height=1200,  
    width=400    
)

# Show the figure
fig

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

In [11]:
# Animated plot showing the number of 10s scored on each apparatus over the years

tenniest_apparatus = px.bar(
    tenniest_apparatus_melted,
    x="Apparatus",
    y="No. of Tens",
    animation_frame="Season",
    color="Apparatus",
    hover_name="Apparatus",
    range_y=[0, tenniest_apparatus_melted["No. of Tens"].max()],  # Keep the y-axis fixed across frames
)

# Export the plot to html file
tenniest_apparatus.write_html("../docs/figures/tenniest_apparatus_animation.html")

# Show the plot
tenniest_apparatus

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

# 2 The tenniest teams

Ok, but this is a competition, isn't it? Which *teams* have been the most successful in achieving tens over the years? Has it changed over time?

In [24]:
%%sql --alias gymternet

SELECT 
    SUM(r.vt_score = 10.0) AS 'Vault',
    SUM(r.ub_score = 10.0) AS 'Uneven Bars',
    SUM(r.bb_score = 10.0) AS 'Balance Beam',
    SUM(r.fx_score = 10.0) AS 'Floor Exercise',
    SUM(r.vt_score = 10.0) + SUM(r.ub_score = 10.0) + SUM(r.bb_score = 10.0) + SUM(r.fx_score = 10.0) AS 'Total 10s',
    g.team_id AS 'team_id',
    t.team_name AS 'Team',
    m.year AS 'Season'
FROM gymnast_results AS r
LEFT JOIN gymnasts AS g
ON g.gymnast_id = r.gymnast_id
LEFT JOIN teams AS t
ON t.team_id = g.team_id
LEFT JOIN meets AS m
ON m.meet_id = r.meet_id
GROUP BY t.team_name, r.meet_id;

In [29]:
# Export the above query to a new df
tenniest_teams_query = """
SELECT 
    SUM(r.vt_score = 10.0) AS 'Vault',
    SUM(r.ub_score = 10.0) AS 'Uneven Bars',
    SUM(r.bb_score = 10.0) AS 'Balance Beam',
    SUM(r.fx_score = 10.0) AS 'Floor Exercise',
    g.team_id AS 'team_id',
    t.team_name AS 'Team',
    m.year AS 'Season'
FROM gymnast_results AS r
LEFT JOIN gymnasts AS g
ON g.gymnast_id = r.gymnast_id
LEFT JOIN teams as t
ON t.team_id = g.team_id
LEFT JOIN meets as m
ON m.meet_id = r.meet_id
GROUP BY t.team_name, m.year;
"""

# Execute the query and store the result in a DataFrame
tenniest_teams_df = pd.read_sql_query(tenniest_teams_query, engine)

# Preview the df
tenniest_teams_df

Unnamed: 0,Vault,Uneven Bars,Balance Beam,Floor Exercise,team_id,Team,Season
0,0.0,0.0,0.0,0.0,1,Air Force,2015
1,0.0,0.0,0.0,0.0,1,Air Force,2016
2,0.0,0.0,0.0,0.0,1,Air Force,2017
3,0.0,0.0,0.0,0.0,1,Air Force,2018
4,0.0,0.0,0.0,0.0,1,Air Force,2019
...,...,...,...,...,...,...,...
809,0.0,0.0,0.0,0.0,82,Yale,2019
810,0.0,0.0,0.0,0.0,82,Yale,2020
811,0.0,0.0,0.0,0.0,82,Yale,2022
812,0.0,0.0,0.0,0.0,82,Yale,2023


In [66]:
# To find the teams that have never gotten a 10, first total each team's 10s across all seasons.
# Summing only the apparatus columns means team_id and Season don't need dropping afterwards.
apparatus_cols = ['Vault', 'Uneven Bars', 'Balance Beam', 'Floor Exercise']
grouped_teams_df = tenniest_teams_df.groupby('Team', as_index=False)[apparatus_cols].sum()

# Preview the new df
grouped_teams_df.head()

Unnamed: 0,Team,Vault,Uneven Bars,Balance Beam,Floor Exercise
0,Air Force,0.0,0.0,0.0,0.0
1,Alabama,8.0,12.0,18.0,6.0
2,Alaska,0.0,0.0,0.0,0.0
3,Arizona,0.0,0.0,0.0,0.0
4,Arizona State,0.0,0.0,0.0,0.0


In [67]:
# Create a column with total tens

grouped_teams_df['total 10s'] = grouped_teams_df[['Vault', 'Uneven Bars', 'Balance Beam', 'Floor Exercise']].sum(axis=1)

# Preview the df
grouped_teams_df.head()

Unnamed: 0,Team,Vault,Uneven Bars,Balance Beam,Floor Exercise,total 10s
0,Air Force,0.0,0.0,0.0,0.0,0.0
1,Alabama,8.0,12.0,18.0,6.0,44.0
2,Alaska,0.0,0.0,0.0,0.0,0.0
3,Arizona,0.0,0.0,0.0,0.0,0.0
4,Arizona State,0.0,0.0,0.0,0.0,0.0


In [72]:
# Drop rows where total 10s == 0

grouped_teams_df = grouped_teams_df[grouped_teams_df['total 10s'] != 0]

# Check how many we have
grouped_teams_df.shape

(29, 6)
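As an alternative to filtering in pandas, the teams without any 10s could be excluded in the query itself with a `HAVING` clause. A minimal sketch against a made-up, flattened `team_scores` table (hypothetical; the real data is spread across `gymnast_results`, `gymnasts` and `teams`, and only two apparatus are shown to keep it short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical flattened table with invented scores
    CREATE TABLE team_scores (team_name TEXT, vt_score REAL, ub_score REAL);
    INSERT INTO team_scores VALUES
        ('Team A', 10.0, 9.9),
        ('Team A', 9.8, 10.0),
        ('Team B', 9.9, 9.8);
    """
)

# HAVING filters groups after aggregation, so teams with zero 10s never reach pandas
rows = conn.execute(
    """
    SELECT
        team_name,
        SUM(vt_score = 10.0) + SUM(ub_score = 10.0) AS total_tens
    FROM team_scores
    GROUP BY team_name
    HAVING SUM(vt_score = 10.0) + SUM(ub_score = 10.0) > 0
    """
).fetchall()

print(rows)
```

Filtering in pandas, as above, has the advantage that the unfiltered frame stays available for other plots.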

In [73]:
# Make some subset dfs for easier plotting (Team plus the one column each plot needs)

total_tens_df = grouped_teams_df[['Team', 'total 10s']]
vault_queens_df = grouped_teams_df[['Team', 'Vault']]
bars_queens_df = grouped_teams_df[['Team', 'Uneven Bars']]
beam_queens_df = grouped_teams_df[['Team', 'Balance Beam']]
floor_queens_df = grouped_teams_df[['Team', 'Floor Exercise']]
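
With per-team totals in hand, ranking the teams is a short step. A sketch using `nlargest`, where Alabama's counts come from the preview above and 'Team X' and 'Team Y' are placeholders with invented counts:

```python
import pandas as pd

# Alabama's row matches the preview above; the other two rows are made up.
toy_teams_df = pd.DataFrame(
    {
        "Team": ["Alabama", "Team X", "Team Y"],
        "Vault": [8.0, 1.0, 3.0],
        "Uneven Bars": [12.0, 0.0, 2.0],
        "Balance Beam": [18.0, 1.0, 0.0],
        "Floor Exercise": [6.0, 0.0, 4.0],
    }
)
apparatus_cols = ["Vault", "Uneven Bars", "Balance Beam", "Floor Exercise"]
toy_teams_df["total 10s"] = toy_teams_df[apparatus_cols].sum(axis=1)

# The n teams with the most 10s overall, highest first
top_teams = toy_teams_df.nlargest(2, "total 10s")[["Team", "total 10s"]]

# The same idea for a single apparatus
top_vault = toy_teams_df.nlargest(2, "Vault")[["Team", "Vault"]]

print(top_teams)
print(top_vault)
```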