#### American Ninja Warrior Analysis and Visualizations
#### Notebook 2

By: Jeff Hale

Note that all analysis is for the first 11 seasons, because that is the data available.

Imports

In [4]:
import pandas as pd
import numpy as np
import plotly.express as px

Read in the file with eleven years of data that we created.

In [5]:
df = pd.read_csv("data/anw_data-2022-02-18-10:19.csv", index_col=0)
df.head(2)

Unnamed: 0,index,Name,Fantasy Points,CQ OBS,Time,CF OBS,Time.1,Boot Camp,Stage 1,Time.2,Stage 2,Time.3,Stage 3,Stage 4,Time.4,Year,run
0,0,Sean Bryan,49.5,Complete (LA),236.84,Complete (LA),369.99,,Complete,126.05,Complete,248.3,Ultimate Crazy Cliffhanger,,,10,
1,1,Drew Drechsel,44.0,Complete (MIA),144.74,Stair Hopper (MIA),Qualified,,Complete,96.2,Complete,231.35,Ultimate Crazy Cliffhanger,,,10,


Let's see what we have time to see. 👀



In [7]:
df_appearances = df["Name"].value_counts().to_frame().reset_index()
df_appearances.columns=['Name', 'Appearances']
df_appearances

Unnamed: 0,Name,Appearances
0,Ryan Stratis,11
1,David Campbell,11
2,Travis Rosen,10
3,Lorin Ball,9
4,Chris Wilczewski,9
...,...,...
1516,Omar Payton,1
1517,Zac Eddington,1
1518,Spenser Mestel,1
1519,Robert Taylor,1


In [143]:
fig = px.histogram(df_appearances, x='Appearances')
fig.update_layout(
    title_text='Most Competitors Make a Single Appearance',
    xaxis_title_text='Number of Seasons a Competitor Appeared',
    yaxis_title_text='Count',
    bargap=0.1,
    showlegend=False,
)


In [145]:
df_appearances.describe()

Unnamed: 0,Appearances
count,1521.0
mean,1.580539
std,1.27537
min,1.0
25%,1.0
50%,1.0
75%,2.0
max,11.0


In [148]:
df_appearances['Appearances'].median()

1.0

In [150]:
px.box(df_appearances['Appearances'])

In [151]:
px.violin(df_appearances['Appearances'])

Looks like a Poisson distribution to me.
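
One rough way to check the Poisson impression is to compare the mean and the variance, which are equal for a Poisson distribution. The sketch below uses only the summary statistics from the `describe()` output above. Shifting the counts down by one is my own assumption, made because every competitor in the data has at least one appearance.

```python
# Summary statistics copied from the describe() output above.
mean_appearances = 1.580539
std_appearances = 1.27537

# Assumption: model "appearances beyond the first" so the count starts at 0,
# since a competitor must appear at least once to be in the data.
shifted_mean = mean_appearances - 1
variance = std_appearances ** 2  # shifting by a constant leaves the variance unchanged

# For a Poisson distribution the variance equals the mean, so this ratio
# would be close to 1. Values well above 1 indicate overdispersion.
dispersion_ratio = variance / shifted_mean

print(f"shifted mean:    {shifted_mean:.3f}")
print(f"variance:        {variance:.3f}")
print(f"variance / mean: {dispersion_ratio:.2f}")
```

The ratio comes out well above 1 (about 2.8), which suggests a heavier tail than a plain Poisson would give. Under that assumption, "Poisson" is best read as a loose description of the shape rather than a fitted model.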

#### How many people appeared at least seven times in the eleven years?

In [9]:
(df_appearances['Appearances'] >= 7).sum()

22

#### Who has the most cumulative fantasy points (through 11 seasons)?

In [115]:
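# Select the numeric column before summing so the text columns
# (obstacle names, "Qualified", etc.) are not passed to sum().
df_totals = (
    df.groupby("Name")[["Fantasy Points"]]
    .sum()
    .sort_values(by="Fantasy Points", ascending=False)
    .reset_index()
)
df_totals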


Unnamed: 0,Name,Fantasy Points
0,Drew Drechsel,353.0
1,Joe Moravsky,299.5
2,Ryan Stratis,289.5
3,David Campbell,289.0
4,Travis Rosen,287.0
...,...,...
1516,Dennis Ruelas,0.0
1517,Charlie Escue,0.0
1518,Tony Geronimo,0.0
1519,Will Washington,0.0


In [154]:
fig2 = px.bar(
    df_totals.head(10),
    x='Name',
    y='Fantasy Points',
    color='Fantasy Points',
    title='Cumulative Fantasy Point Leaders',
    color_continuous_scale=px.colors.sequential.Purp,
)

fig2.update_coloraxes(showscale=False)

Drew Drechsel for the win! 🏆 

What are those fantasy points anyway? According to the [website](http://www.anwfantasy.com/how-to-play/), points are scored as follows.

    1 point for every obstacle cleared!

    2 points for City Qualifier course clear
    4 points for City Finals course clears
    4 points for Midoriyama/Las Vegas Stage 1 clears
    6 points for Midoriyama/Las Vegas Stage 2 clears
    8 points for Midoriyama/Las Vegas Stage 3 clears
    10 points for Total Victory
    Additionally, a 0.5 point bonus is awarded to the fastest runner for each timed stage

    For example Geoff Britten's "Perfect Season" is scored:
    • 2pts for City Qualifier cleared + 6 obstacles cleared +
    • 4pts for City Finals cleared + 10 obstacles cleared (+0.5 fastest bonus) +
    • 4pts for Stage 1 cleared + 8 obstacles cleared +
    • 6pts for Stage 2 cleared + 6 obstacles cleared (+0.5 fastest bonus) +
    • 8pts for Stage 3 cleared + 8 obstacles cleared +
    • 10pts for Stage 4 cleared + 1 obstacle cleared =
    74 total points!
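
To make the arithmetic concrete, here is a minimal sketch that applies the quoted rules to the "Perfect Season" example. The `season_points` helper and the tuple layout are hypothetical names of my own, not anything from the fantasy site or the dataset.

```python
# Bonus points for clearing each course, taken from the rules quoted above.
CLEAR_BONUS = {
    "City Qualifier": 2,
    "City Finals": 4,
    "Stage 1": 4,
    "Stage 2": 6,
    "Stage 3": 8,
    "Stage 4": 10,  # Total Victory
}
FASTEST_BONUS = 0.5


def season_points(runs):
    """Sum points for (course, obstacles_cleared, course_cleared, fastest) runs."""
    total = 0.0
    for course, obstacles, cleared, fastest in runs:
        total += obstacles  # 1 point per obstacle cleared
        if cleared:
            total += CLEAR_BONUS[course]
        if fastest:
            total += FASTEST_BONUS
    return total


# Geoff Britten's "Perfect Season", as itemized in the example above.
perfect_season = [
    ("City Qualifier", 6, True, False),
    ("City Finals", 10, True, True),
    ("Stage 1", 8, True, False),
    ("Stage 2", 6, True, True),
    ("Stage 3", 8, True, False),
    ("Stage 4", 1, True, False),
]

print(season_points(perfect_season))  # 74.0
```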

#### Who had the most fantasy points in a single season?

Let's look at the distribution first.

In [118]:
# Use the per-season rows in df (not the cumulative df_totals) to match the title.
fig2 = px.histogram(df, x='Fantasy Points', title='Distribution of Fantasy Points for Contestants (Per Season)')
fig2.update_layout(
    bargap=0.1,
    showlegend=False,
    xaxis_title_text='Fantasy Points',
    yaxis_title_text='Count',
)


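The distribution is right-skewed and looks roughly Poisson-like to me, though fantasy points include half-point bonuses, so they are not strict counts.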

Let's plot the top five single-season fantasy point leaders.

In [127]:
single_season_leaders = df.nlargest(5, 'Fantasy Points')[['Name', 'Fantasy Points', 'Year']]
single_season_leaders

Unnamed: 0,Name,Fantasy Points,Year
2135,Drew Drechsel,71.5,11
856,Geoff Britten,63.0,7
2136,Daniel Gil,63.0,11
857,Isaac Caldiero,56.0,7
1472,Brian Arnold,54.0,5


In [155]:
fig3 = px.bar(
    single_season_leaders,
    x='Name',
    y='Fantasy Points',
    color='Fantasy Points',
    title='Fantasy Point Single Season Leaders',
    color_continuous_scale=px.colors.sequential.Purp,
)

fig3.update_coloraxes(showscale=False)


Drew for the win again!

#### How has the average number of fantasy points per contestant per season changed over time?

In [78]:
avg_fantasy_pts = df.groupby('Year')[['Fantasy Points']].mean().reset_index()
avg_fantasy_pts

Unnamed: 0,Year,Fantasy Points
0,1,14.627451
1,2,10.875
2,3,11.411765
3,4,14.107407
4,5,13.089109
5,6,11.334545
6,7,10.568182
7,8,10.189362
8,9,10.278997
9,10,9.976821


Adjust DataFrame attributes for easier/improved Plotly styling.

In [79]:
avg_fantasy_pts['Year'] = avg_fantasy_pts['Year'].astype('string')
avg_fantasy_pts = avg_fantasy_pts.rename(columns={'Year': 'Season'})
avg_fantasy_pts

Unnamed: 0,Season,Fantasy Points
0,1,14.627451
1,2,10.875
2,3,11.411765
3,4,14.107407
4,5,13.089109
5,6,11.334545
6,7,10.568182
7,8,10.189362
8,9,10.278997
9,10,9.976821


In [156]:
fig4 = px.line(
    avg_fantasy_pts,
    x="Season",
    y="Fantasy Points",
    title="Average Fantasy Points Per Contestant",
)

fig4.update_yaxes(range=[8, 15])

fig4.update_layout(showlegend=False)

The average trended down from season four through season eight, then held roughly steady through season eleven.
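
As a quick check on that reading, the sketch below computes the season-over-season change using the ten season averages displayed in the table above. The values are copied by hand, so it covers only the seasons shown there.

```python
# Season averages copied from the table above (index 0 is season 1).
avg_points = [14.627451, 10.875, 11.411765, 14.107407, 13.089109,
              11.334545, 10.568182, 10.189362, 10.278997, 9.976821]

# changes[s] is the change in the average going from season s-1 to season s.
changes = {}
for season in range(2, len(avg_points) + 1):
    changes[season] = avg_points[season - 1] - avg_points[season - 2]

for season, delta in changes.items():
    print(f"season {season - 1} -> {season}: {delta:+.2f}")
```

Every step from season four to season eight is negative, while the later steps shown are much smaller in size.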

#### How has the number of participants per season changed over time?

In [141]:
participants = df.groupby('Year')[['Name']].count().reset_index()
participants['Year'] = participants['Year'].astype('string')
participants.rename(columns={'Year': 'Season', 'Name':'Contestants'}, inplace=True)
participants

Unnamed: 0,Season,Contestants
0,1,51
1,2,72
2,3,68
3,4,270
4,5,202
5,6,275
6,7,308
7,8,235
8,9,319
9,10,302


In [142]:
fig5 = px.line(
    participants,
    x="Season",
    y="Contestants",
    title="Contestants Per Season",
)

fig5.update_layout(
    showlegend=False,
)

fig5.update_yaxes(range=[0, 350])

The number of contestants jumped sharply between seasons three and four (from 68 to 270), then stayed between roughly 200 and 320 in the seasons that followed.

#### What's the relationship between the number of appearances and the number of fantasy points?

Presumably positive.

In [159]:
df_appearances.head(1)

Unnamed: 0,Name,Appearances
0,Ryan Stratis,11


In [163]:
df_appearances.shape

(1521, 2)

In [160]:
df_totals.head(1)

Unnamed: 0,Name,Fantasy Points
0,Drew Drechsel,353.0


In [164]:
df_totals.shape

(1521, 2)

In [165]:
df_cumulative = pd.merge(df_appearances, df_totals, on='Name')
df_cumulative

Unnamed: 0,Name,Appearances,Fantasy Points
0,Ryan Stratis,11,289.5
1,David Campbell,11,289.0
2,Travis Rosen,10,287.0
3,Lorin Ball,9,187.0
4,Chris Wilczewski,9,190.0
...,...,...,...
1516,Omar Payton,1,8.0
1517,Zac Eddington,1,8.0
1518,Spenser Mestel,1,8.0
1519,Robert Taylor,1,8.0


In [166]:
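# Restrict to the numeric columns; the Name column cannot be correlated.
df_cumulative[['Appearances', 'Fantasy Points']].corr()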

Unnamed: 0,Appearances,Fantasy Points
Appearances,1.0,0.870668
Fantasy Points,0.870668,1.0


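The correlation is quite high: about 0.87. That is partly expected, since cumulative points can only grow with each additional appearance.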

Let's plot the points with an OLS regression trend line.

In [178]:
# Note: trendline='ols' requires the statsmodels package to be installed.
px.scatter(
    df_cumulative,
    x='Appearances',
    y='Fantasy Points',
    title='A Linear Relationship',
    trendline='ols',
)
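
For anyone curious what the correlation and the OLS trend line compute, here is a hand-rolled sketch in plain Python. It uses only the nine rows displayed in the merged table above, which is a small and unrepresentative subset of the 1,521 competitors. Its numbers will therefore not match the full-data correlation or the plotted trend line.

```python
import math

# The nine (Appearances, Fantasy Points) rows displayed in the merged table above.
# Assumption: used purely for illustration; the full data has 1,521 rows.
rows = [(11, 289.5), (11, 289.0), (10, 287.0), (9, 187.0), (9, 190.0),
        (1, 8.0), (1, 8.0), (1, 8.0), (1, 8.0)]

xs = [x for x, _ in rows]
ys = [y for _, y in rows]
n = len(rows)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)

pearson_r = sxy / math.sqrt(sxx * syy)  # Pearson correlation, the default for DataFrame.corr()
slope = sxy / sxx                       # OLS slope: points per additional appearance
intercept = mean_y - slope * mean_x     # OLS intercept

print(f"r = {pearson_r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```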

## Summary 

In this project, I scraped, combined, visualized, and analyzed the American Ninja Warrior fantasy data, looking at single-season data, cumulative data, and changes over time.

## Future Directions

It would be interesting to get data on 