# American Ninja Warrior Obstacle info

## By: Jeff Hale

## Goal

### Predict which Ninja Warrior obstacles will be present. 

We'll explore the data and use a few types of machine learning algorithms. 

## Background
[American Ninja Warrior](https://www.nbc.com/american-ninja-warrior) is a televised competition where contestants try to complete a series of obstacles as quickly as possible without falling. My kids love watching it, and somehow it has been nominated for three Emmys.

Ten-time ninja warrior Matt Laessig also happens to lead [data.world](https://data.world), which offers a platform with data sets and an interface for data analysis. One of the data sets contains each of the obstacles in Ninja Warrior by year. 

This isn't a huge dataset, and machine learning might not provide value here, but let's see if it can.

## Set-up
Load the necessary libraries.
Configure the Jupyter Notebook settings.
Load the data into a pandas DataFrame.

In [1]:
import numpy as np 
import pandas as pd 
from sklearn import preprocessing

import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(34)

%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
!ls

ANW.ipynb  data  images


In [3]:
# !pip list

In [4]:
orig_df = pd.read_excel('data/American Ninja Warrior Obstacle History.xlsx')

## Exploratory Data Analysis (EDA)

Let's take a peek.

In [5]:
orig_df.head()

Unnamed: 0,Season,Location,Round/Stage,Obstacle Name,Obstacle Order
0,9,Los Angeles,Qualifying (Regional/City),Floating Steps,1
1,9,Los Angeles,Qualifying (Regional/City),Cannonball Drop,2
2,9,Los Angeles,Qualifying (Regional/City),Fly Wheels,3
3,9,Los Angeles,Qualifying (Regional/City),Block Run,4
4,9,Los Angeles,Qualifying (Regional/City),Battering Ram,5


In [6]:
orig_df.tail()

Unnamed: 0,Season,Location,Round/Stage,Obstacle Name,Obstacle Order
764,1,Sasuke 23 (Japan),National Finals - Stage 3,Hang Climb,6
765,1,Sasuke 23 (Japan),National Finals - Stage 3,Spider Flip,7
766,1,Sasuke 23 (Japan),National Finals - Stage 3,Gliding Ring,8
767,1,Sasuke 23 (Japan),National Finals - Stage 4,Heavenly Ladder,1
768,1,Sasuke 23 (Japan),National Finals - Stage 4,G-Rope,2


This dataset includes data from when the show was in Japan. Let's make a note that grouping by country could be interesting.

The *Round/Stage* column looks like it might be making this dataframe less than *tidy*. Tidy data is an important concept you can read up on [here](http://vita.had.co.nz/papers/tidy-data.html). We'll probably want to split that column in two.
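One way that split might look, as a minimal sketch. The sample rows are copied from the `head()` and `tail()` output above, and the new column names `round` and `stage` are placeholders of my own. It assumes the separator is always ` - `, which should be checked against the full data.

```python
import pandas as pd

# A few rows copied from the output above; the full file has 769 rows.
sample = pd.DataFrame({
    "Round/Stage": [
        "Qualifying (Regional/City)",
        "National Finals - Stage 3",
        "National Finals - Stage 4",
    ],
    "Obstacle Name": ["Floating Steps", "Hang Climb", "Heavenly Ladder"],
})

# Split on the first " - " only; rows without a separator get a missing stage.
parts = sample["Round/Stage"].str.split(" - ", n=1, expand=True)
sample["round"] = parts[0]
sample["stage"] = parts[1]

print(sample[["round", "stage"]])
```

Qualifying rows have no stage, so they end up with a missing value in the hypothetical `stage` column rather than an empty string.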

### Let's get some basic descriptive information.

In [7]:
# Work on a copy so orig_df keeps the original column names
df = orig_df.copy()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 769 entries, 0 to 768
Data columns (total 5 columns):
Season            769 non-null int64
Location          769 non-null object
Round/Stage       769 non-null object
Obstacle Name     769 non-null object
Obstacle Order    769 non-null int64
dtypes: int64(2), object(3)
memory usage: 30.1+ KB


### Initial Impressions
We have five variables and 769 records. Each record is one obstacle's appearance in a course, not a whole competition.

Looks like no missing data, but we need to check by looking at the values.
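`info()` only counts true nulls, so placeholders such as blank strings would slip past it. Here is a minimal sketch of a closer check. The frame is a toy with one blank location planted on purpose; the real data may well have none.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the blank location is planted on purpose.
toy = pd.DataFrame({
    "Season": [9, 9, 1],
    "Location": ["Los Angeles", " ", "Sasuke 23 (Japan)"],
    "Obstacle Name": ["Floating Steps", "Block Run", "G-Rope"],
})

# True nulls per column, which is what info() reports.
null_counts = toy.isnull().sum()

# Treat empty or whitespace-only strings as missing too, then count the extras.
cleaned = toy.replace(r"^\s*$", np.nan, regex=True)
hidden_counts = cleaned.isnull().sum() - null_counts

print(null_counts)
print(hidden_counts)
```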

First let's make these column names a little shorter and save some keystrokes.

In [8]:
df.columns = ['season', 'location', 'round_stage', 'obstacle', 'obstacle_order']
df.head()

Unnamed: 0,season,location,round_stage,obstacle,obstacle_order
0,9,Los Angeles,Qualifying (Regional/City),Floating Steps,1
1,9,Los Angeles,Qualifying (Regional/City),Cannonball Drop,2
2,9,Los Angeles,Qualifying (Regional/City),Fly Wheels,3
3,9,Los Angeles,Qualifying (Regional/City),Block Run,4
4,9,Los Angeles,Qualifying (Regional/City),Battering Ram,5


Let's see how many unique obstacles are in the dataset. 

In [9]:
df['obstacle'].value_counts()

Warped Wall                79
Salmon Ladder              35
Quintuple Steps            32
Floating Steps             22
Log Grip                   21
Jump Hang                  18
Quad Steps                 16
Jumping Spider             13
Wall Lift                  11
Invisible Ladder           11
Rolling Log                11
Bridge of Blades           10
Spider Climb                9
Cargo Climb                 9
Rope Ladder                 9
Spinning Log                8
Rope Climb                  8
Flying Bar                  8
Jumping Bars                8
Ultimate Cliffhanger        8
Unstable Bridge             8
Devil Steps                 7
Double Salmon Ladder        6
Elevator Climb              6
Hang Climb                  6
Floating Boards             6
Tarzan Rope                 6
Spinning Bridge             6
Paddle Boards               6
Metal Spin                  6
                           ..
Minefield                   1
Helix Hang                  1
Peg Cloud 

Interesting. There are 196 different obstacles in the dataset. The Warped Wall is by far the most common obstacle. 
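Because `value_counts()` truncates its printed output, the number of distinct obstacles is easier to read off with `nunique()`. Here is a small sketch on a handful of rows. The names come from the output above, but the repetition is invented for illustration.

```python
import pandas as pd

# Toy series; names are from the output above, the repeats are made up.
obstacles = pd.Series(
    ["Warped Wall", "Salmon Ladder", "Warped Wall", "Log Grip", "Warped Wall"],
    name="obstacle",
)

n_unique = obstacles.nunique()      # number of distinct obstacle names
counts = obstacles.value_counts()   # appearances per obstacle, sorted descending
most_common = counts.idxmax()       # name with the highest count

print(n_unique)
print(most_common, counts.max())
```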

Thoughts:

I wonder what the age breakdown looks like.

I wonder if some of these are always in the championship events.

This is count data (see the article I wrote on types of data), so the obstacle counts might follow a Poisson distribution.
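A quick sanity check on the Poisson idea is to compare the mean and variance of the per-obstacle counts, since a Poisson variable has the two equal. The counts below are only the five largest values printed above plus three obstacles that appear once, so the numbers are illustrative and not a result for the full data.

```python
import numpy as np

# Illustrative only: the five largest counts printed above plus three obstacles
# that appear once. The full data has many more obstacles.
counts = np.array([79, 35, 32, 22, 21, 1, 1, 1])

mean = counts.mean()
variance = counts.var(ddof=1)  # sample variance

# Poisson data has variance roughly equal to the mean, so a ratio far above 1
# points to overdispersion.
dispersion = variance / mean

print(f"mean={mean:.1f}, variance={variance:.1f}, ratio={dispersion:.1f}")
```

If the ratio on the full data is also well above 1, a plain Poisson model is probably too narrow for these counts.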



Let's look at a frequency of the counts.
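One way to get a frequency of the counts is to call `value_counts()` twice: once to count appearances per obstacle, and again to count how many obstacles share each appearance count. This is a sketch on invented toy data, not the notebook's original cell.

```python
import pandas as pd

# Toy data: names are from the output above, the repeats are made up.
obstacles = pd.Series(
    ["Warped Wall"] * 3
    + ["Salmon Ladder"] * 2
    + ["Log Grip"] * 2
    + ["Minefield", "Helix Hang"]
)

# Appearances per obstacle.
appearances = obstacles.value_counts()

# How many obstacles appear once, twice, three times, and so on.
freq_of_counts = appearances.value_counts().sort_index()

print(freq_of_counts)
```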

Let's look at a histogram of obstacles that have been in at least three competitions.
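Here is a sketch of how such a histogram could be built: filter the per-obstacle counts to those of at least three, then plot them. The data is a toy series with invented repeats, and the bin count and axis labels are my own choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data: names are from the output above, the repeats are made up.
obstacles = pd.Series(
    ["Warped Wall"] * 5 + ["Salmon Ladder"] * 3 + ["Log Grip"] * 2 + ["Minefield"]
)

appearances = obstacles.value_counts()

# Keep only obstacles that appear at least three times.
frequent = appearances[appearances >= 3]

fig, ax = plt.subplots()
ax.hist(frequent, bins=10)
ax.set_xlabel("Number of appearances")
ax.set_ylabel("Number of obstacles")
ax.set_title("Obstacles with at least three appearances")
```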