# What do I want to end up with?
## Tenniest apparatus (per year + aggregated)
## Tenniest teams (per year + aggregated)
## Top 10 (20?) GOATs of all time (by average score) (by apparatus?)
## Bubble maps x axis year, y axis team, size = no. 10s (colour/pie apparatus if poss?)
## Avg score over time (colour by team)

Which apparatus (vault, uneven bars, balance beam or floor exercise) attracts the most 10s from the judges? Has this changed over time?

Intuitively, one would assume that vault would attract the fewest deductions; gymnasts are only performing one skill, so there are fewer opportunities to make mistakes.

However, my anecdotal observation as a watcher of college gymnastics is that the judges in this competition are fairly lenient: hesitancy on beam or short handstands on bars might not incur the deduction they would in other leagues. They are quite strict on landings, though; if a gymnast doesn't perfectly stick their landing, they will incur a deduction. Given how difficult vault landings are, does this cancel out the advantage of having to perform fewer skills?

In [57]:
!pip install -r ../requirements.txt

Collecting plotly (from -r ../requirements.txt (line 15))
  Downloading plotly-5.23.0-py3-none-any.whl.metadata (7.3 kB)
Collecting tenacity>=6.2.0 (from plotly->-r ../requirements.txt (line 15))
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Downloading plotly-5.23.0-py3-none-any.whl (17.3 MB)
Downloading tenacity-8.5.0-py3-none-any.whl (28 kB)
Installing collected packages: tenacity, plotly
Successfully installed plotly-5.23.0 tenacity-8.5.0


In [58]:
import os
import json
import requests
import sqlite3
from tqdm.notebook import tqdm, trange
tqdm.pandas()
import numpy as np
import pandas as pd 
from sqlalchemy import create_engine
from lets_plot import *  # lets-plot provides a ggplot2-style plotting API
LetsPlot.setup_html()
import plotly.express as px

%load_ext sql
%config SqlMagic.autocommit=True

from pprint import pprint

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [40]:
%sql sqlite:///../data/clean/gymternet.db --alias gymternet 
engine = create_engine('sqlite:///../data/clean/gymternet.db')

In [41]:
%%sql --alias gymternet

SELECT COUNT(*) FROM gymnast_results WHERE vt_score = 10.0 OR ub_score = 10.0 OR bb_score = 10.0 OR fx_score = 10.0;

COUNT(*)
433


In [42]:
%%sql --alias gymternet

-- Count the perfect 10s scored on floor exercise
SELECT COUNT(*) FROM gymnast_results WHERE fx_score = 10.0;



COUNT(*)
133


In [68]:
%%sql gymternet

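-- Count the 10s on each apparatus per season (LEFT JOIN to meets for the year).
-- The aggregated 'Overall' row at the bottom is currently commented out.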
SELECT 
    SUM(r.vt_score = 10.0) AS 'Vault',
    SUM(r.ub_score = 10.0) AS 'Uneven Bars',
    SUM(r.bb_score = 10.0) AS 'Balance Beam',
    SUM(r.fx_score = 10.0) AS 'Floor Exercise',
    m.year AS 'Season'
FROM gymnast_results AS r
LEFT JOIN meets AS m
ON m.meet_id = r.meet_id
GROUP BY m.year

-- UNION ALL

-- SELECT 
--     SUM(r.vt_score = 10.0) AS 'Vault',
--     SUM(r.ub_score = 10.0) AS 'Uneven Bars',
--     SUM(r.bb_score = 10.0) AS 'Balance Beam',
--     SUM(r.fx_score = 10.0) AS 'Floor Exercise',
--     SUM(r.vt_score = 10.0) + SUM(r.ub_score = 10.0) + SUM(r.bb_score = 10.0) + SUM(r.fx_score = 10.0) AS 'Total Tens',
--     'Overall' AS 'Season'
-- FROM gymnast_results AS r
-- LEFT JOIN meets AS m
-- ON m.meet_id = r.meet_id;

Vault,Uneven Bars,Balance Beam,Floor Exercise,Season
34,32,2,7,2015
12,8,16,28,2016
22,26,35,16,2017
10,51,53,24,2018
31,38,8,56,2019
28,10,32,4,2020
50,44,20,21,2021
59,46,38,77,2022
88,81,126,64,2023
45,56,69,103,2024
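
The commented-out `UNION ALL` branch selects six columns (it adds 'Total Tens') while the first branch selects five, and SQLite requires both sides of a compound `SELECT` to return the same number of columns. Below is a minimal sketch of one way to get the aggregated row. It assumes the same table and column names as the query above, and it runs against a tiny in-memory stand-in for `gymternet.db`, so the rows are invented purely for illustration.

```python
import sqlite3

# In-memory stand-in for gymternet.db; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meets (meet_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE gymnast_results (
    meet_id INTEGER, vt_score REAL, ub_score REAL, bb_score REAL, fx_score REAL
);
INSERT INTO meets VALUES (1, 2015), (2, 2016);
INSERT INTO gymnast_results VALUES
    (1, 10.0,  9.9,  9.8, 10.0),
    (1,  9.9, 10.0,  9.7,  9.9),
    (2,  9.8,  9.9, 10.0, 10.0);
""")

query = """
SELECT
    SUM(r.vt_score = 10.0) AS "Vault",
    SUM(r.ub_score = 10.0) AS "Uneven Bars",
    SUM(r.bb_score = 10.0) AS "Balance Beam",
    SUM(r.fx_score = 10.0) AS "Floor Exercise",
    CAST(m.year AS TEXT)   AS "Season"  -- text, so 'Overall' fits the same column
FROM gymnast_results AS r
LEFT JOIN meets AS m ON m.meet_id = r.meet_id
GROUP BY m.year

UNION ALL

-- Same five columns, in the same order, aggregated over every season
SELECT
    SUM(vt_score = 10.0),
    SUM(ub_score = 10.0),
    SUM(bb_score = 10.0),
    SUM(fx_score = 10.0),
    'Overall'
FROM gymnast_results;
"""

rows = conn.execute(query).fetchall()
by_season = {row[-1]: row[:-1] for row in rows}  # key each row by its Season label
print(by_season)
```

The overall branch does not need the join at all, since it no longer groups by year. If a 'Total Tens' column is wanted, it would have to be added to both branches.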


In [69]:
# Export the above query to a new df
tenniest_apparatus_query = """
SELECT 
    SUM(r.vt_score = 10.0) AS 'Vault',
    SUM(r.ub_score = 10.0) AS 'Uneven Bars',
    SUM(r.bb_score = 10.0) AS 'Balance Beam',
    SUM(r.fx_score = 10.0) AS 'Floor Exercise',
    m.year AS 'Season'
FROM gymnast_results AS r
LEFT JOIN meets AS m
ON m.meet_id = r.meet_id
GROUP BY m.year;
"""

# Execute the query and store the result in a DataFrame
tenniest_apparatus_df = pd.read_sql_query(tenniest_apparatus_query, engine)

# Preview the df
tenniest_apparatus_df

Unnamed: 0,Vault,Uneven Bars,Balance Beam,Floor Exercise,Season
0,34,32,2,7,2015
1,12,8,16,28,2016
2,22,26,35,16,2017
3,10,51,53,24,2018
4,31,38,8,56,2019
5,28,10,32,4,2020
6,50,44,20,21,2021
7,59,46,38,77,2022
8,88,81,126,64,2023
9,45,56,69,103,2024
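
Before reshaping, a sanity check may be worthwhile: the Floor Exercise column above sums to 400, whereas the earlier unjoined `COUNT(*)` for `fx_score = 10.0` returned 133. One possible explanation (a guess, not something the output shown here confirms) is that `meet_id` is not unique in `meets`, so the join repeats result rows. The sketch below reproduces that effect on invented toy data and shows a query that would reveal duplicate keys.

```python
import sqlite3

# Toy data only: meet 1 appears twice in meets (hypothetical duplication).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meets (meet_id INTEGER, year INTEGER);
CREATE TABLE gymnast_results (meet_id INTEGER, fx_score REAL);
INSERT INTO meets VALUES (1, 2015), (1, 2015), (2, 2015);
INSERT INTO gymnast_results VALUES (1, 10.0), (2, 10.0), (2, 9.9);
""")

# Count of floor 10s straight from the results table
unjoined = conn.execute(
    "SELECT COUNT(*) FROM gymnast_results WHERE fx_score = 10.0"
).fetchone()[0]

# The same count after joining to meets; duplicated meet_ids inflate it
joined = conn.execute("""
    SELECT SUM(r.fx_score = 10.0)
    FROM gymnast_results AS r
    LEFT JOIN meets AS m ON m.meet_id = r.meet_id
""").fetchone()[0]

# Any meet_id listed here matches more than one row in meets
duplicates = conn.execute("""
    SELECT meet_id, COUNT(*) FROM meets
    GROUP BY meet_id HAVING COUNT(*) > 1
""").fetchall()

print(unjoined, joined, duplicates)
```

Running the last query against the real `meets` table would show whether this is what is happening; if it returns no rows, the discrepancy has some other cause.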


We want this table in long format, with one row per apparatus per season, so that it's easier for Plotly to read.

The new layout should look like:

| **Apparatus**    | **Number of 10s** | **Season** |
|------------------|-------------------|------------|
| 'Vault'          | 34                | 2015       |
| 'Uneven Bars'    | 32                | 2015       |
| 'Balance Beam'   | 2                 | 2015       |
| 'Floor Exercise' | 7                 | 2015       |
| 'Total'          | 75                | 2015       |

etc.


In [70]:
# Melt the DataFrame
tenniest_apparatus_melted = pd.melt(tenniest_apparatus_df, id_vars=['Season'], var_name='Apparatus', value_name='No. of Tens')

# Preview the melted DataFrame
tenniest_apparatus_melted.head()

Unnamed: 0,Season,Apparatus,No. of Tens
0,2015,Vault,34
1,2016,Vault,12
2,2017,Vault,22
3,2018,Vault,10
4,2019,Vault,31
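
The target layout includes a 'Total' row per season, which the melt above does not produce. Here is a minimal sketch of one way to add it, assuming the same column names and using only the 2015 and 2016 rows from the table above: sum the apparatus columns into a `Total` column first, then melt.

```python
import pandas as pd

# First two seasons from the wide table above
wide = pd.DataFrame({
    "Vault": [34, 12],
    "Uneven Bars": [32, 8],
    "Balance Beam": [2, 16],
    "Floor Exercise": [7, 28],
    "Season": [2015, 2016],
})

apparatus_cols = ["Vault", "Uneven Bars", "Balance Beam", "Floor Exercise"]

# Row-wise sum across the four apparatus gives the per-season total
wide["Total"] = wide[apparatus_cols].sum(axis=1)

# Wide to long: one row per (Season, Apparatus) pair, including 'Total'
long = wide.melt(id_vars="Season", var_name="Apparatus", value_name="No. of Tens")

# A stable sort keeps the apparatus order within each season
long = long.sort_values("Season", kind="stable").reset_index(drop=True)
print(long)
```

Whether the total belongs in the same frame depends on the chart: in a bar chart coloured by apparatus it would appear as a fifth bar, so it may be simpler to filter it out before plotting.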


In [71]:

px.bar(
    tenniest_apparatus_melted,
    x="Apparatus",
    y="No. of Tens",
    animation_frame="Season",
    color="Apparatus",
    hover_name="Apparatus",
)

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed
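
The error message says that Plotly's notebook renderer needs `nbformat>=4.2.0`, which is not installed in this environment. The usual remedy is to install it (for example by adding `nbformat` to `requirements.txt`) and restart the kernel. The sketch below checks what is installed; the helper names `installed_version` and `meets_minimum` are made up for this example.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version, minimum=(4, 2, 0)):
    """Compare the first three numeric parts of a version string to `minimum`."""
    if version is None:
        return False
    numbers = tuple(int(n) for n in re.findall(r"\d+", version)[:3])
    numbers += (0,) * (3 - len(numbers))  # pad e.g. "4.2" to (4, 2, 0)
    return numbers >= minimum


found = installed_version("nbformat")
if meets_minimum(found):
    print(f"nbformat {found} is installed; restart the kernel if the error persists")
else:
    print(f"nbformat is missing or too old (found: {found}); try installing nbformat>=4.2.0")
```

If installing the package is not an option, assigning the chart to a variable and calling its `write_html` method writes a standalone HTML file that can be opened outside the notebook.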