**Welcome to the Data Visualization Tutorial.**

**What is Data Visualization?**

  You've done this before. In middle school math class, your teacher told you to plot points on a graph. It probably looked something like this:
  
  ![alt text](http://chemlab.truman.edu/files/2015/06/image002.jpg)
  
  Data visualization refers to the graphical representation of information and data. 
 
**Why is it important?**
 
  Data Visualization is one of the fundamental skills that a data scientist will need. It provides simple and natural ways to visualize large quantities of data. It can be used to help understand our data at a more fundamental level and see trends that we might not have considered before. 
  
**What will we do?**

  Go over the basics of the pandas DataFrame and the Seaborn library.
  
  Seaborn allows us to easily create beautiful graphs with dataframes.
  
  ![alt text](https://www.quantinsti.com/wp-content/uploads/2017/07/seaburn-1.png)  
  
We will learn about different types of graphs and their particular use cases. Once we learn these graphs, we will practice pandas and Seaborn with data collected from NFL players. We will observe how each player's position relates to their BMI (Body Mass Index).

  
**Resources**

Tutorials on Seaborn and data visualization:

[https://www.kaggle.com/learn/data-visualisation](https://www.kaggle.com/learn/data-visualisation)

[https://elitedatascience.com/python-seaborn-tutorial](https://elitedatascience.com/python-seaborn-tutorial)



# The Set Up

In [0]:
#Install all dependencies for the project using pip
!pip install ohmysportsfeedspy
!pip install simplejson
# pathlib is part of the Python 3 standard library, so it does not need to be installed
!pip uninstall seaborn -y
!pip install seaborn


In [0]:
#Imports
import seaborn as sns
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt

#Style
sns.set(style="darkgrid")
%matplotlib inline

#Check that we have the proper Seaborn version
print("THIS IS SNS.VERSION -> {}".format(sns.__version__))



If `sns.__version__` is 0.7.1:


*   Restart the notebook in Runtime. (CMD + CTRL + m + .)
*   Rerun the cell above

# **Code from (API Data Collection Project):**


If you get errors, try changing the API key to your mentor's from yesterday.

In [0]:
# libraries
from ohmysportsfeedspy import MySportsFeeds
from pprint import pprint
from pathlib import Path
import json

# functions
def get_output_from_api(username, password):
	# where api json file is stored
	path = 'results/roster_players-nfl-2017-2018-regular.json'

	# load json from results cache
	if Path(path).exists():
		with open(path, 'r') as f:
			output = json.load(f)
	# make request to API if no cache
	else:
		msf = MySportsFeeds(version="1.2", verbose=True)
		msf.authenticate(username, password)
		output = msf.msf_get_data(league='nfl', season='2017-2018-regular', feed='roster_players', format='json')

	return output

def get_player(entry):
	# store each player as an object
	player = {}

	# add respective fields for players
	try:
		player['name'] = entry['player']['FirstName'] + ' ' + entry['player']['LastName']
		player['position'] = entry['player']['Position']
		player['height'] = entry['player']['Height']
		player['weight'] = entry['player']['Weight']

	# in the case that information is missing, set field to 'N/A'
	except KeyError as error:
		attribute = str(error).strip('\'').lower()
		player[attribute] = 'N/A'

	return player

def get_team_players(output):
	# stores teams as keys and a list of player as values
	teams = {}

	# add every player from output
	for entry in output['rosterplayers']['playerentry']:
		# get player information in an object
		player = get_player(entry)

		# initializes list once from each team
		if entry['team']['Name'] not in teams:
			teams[entry['team']['Name']] = []

		# adds player to the appropriate team list
		teams[entry['team']['Name']].append(player)

	return teams

def save_to_file(teams):
	# open the file for writing; the with block closes it automatically
	with open('results.txt', 'w') as f:
		# convert the dictionary to a string and save it in the file
		f.write(str(teams))

# commands
username = '8663abc5-a99a-4f33-90c4-c1d697'
password = 'Dk30RQHT'

output = get_output_from_api(username, password)

teams = get_team_players(output)

pprint(teams)

save_to_file(teams)


In [0]:
#Let's see how many teams there are!
print("There are {} teams".format(len(teams)))

**NOTE:** We currently have our data saved in dictionaries

# **Ex 1:** complete the BMI_calc function:

Use the equation as a guide

Use mass and height to calculate the BMI

*   mass will be in pounds
*   height will be in inches



The equation for body mass index:



![alt text](https://i.imgur.com/7JRMwEf.png?1)
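
In case the image does not load: assuming it shows the standard imperial-unit formula (an assumption, since only the image link is given here), the equation can be written as below. The constant 703 converts pounds and inches into the metric definition of BMI (kg/m²).

$$
\text{BMI} = 703 \times \frac{\text{mass (lb)}}{\text{height (in)}^2}
$$

As a quick check with a height of 70 inches and a mass of 150 pounds: $703 \times 150 / 70^2 = 105450 / 4900 \approx 21.52$.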

In [0]:
#remember height is squared

def BMI_calc(height, mass):
  #your code here
  
  
  
  return (BMI)

In [0]:
test_height = 70
test_weight = 150

print("My equation says: %.2f" % BMI_calc(test_height, test_weight))

**Was your output correct?**

The cell above should output 21.52

If it's not outputting the correct answer, check with your neighbors and find the smart one :P

**What is your BMI?**

Fill the variables below with your information to find your BMI.

In [0]:
my_height = #your height here in inches
my_weight = #your weight in pounds

print("My BMI is: %.2f" % BMI_calc(my_height, my_weight))

# EX 2: Convert height in form: 5'11" to inches



You may have noticed that in the data we used, the height is given in feet and inches, with an apostrophe after the feet and a double quote after the inches. EX: `'5\'11"'`
Write a function `convertft_to_in(height)` that takes in the unformatted string and converts it into inches.

The height data we are getting looks like this 

![](https://i.imgur.com/BMZNEvV.png)

* We need height in inches to calculate our BMI
* If you need help, refer to the Google document
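
As a hint rather than a solution, here is a small sketch of the string tools that tend to be useful for this kind of problem. It uses a made-up duration string (`"3m45s"`) instead of a height, so the names and format are only for illustration.

```python
# A made-up duration string, just for practice (not from the NFL data).
duration = "3m45s"

# split() cuts the string at the separator and returns the pieces as a list.
minutes_text, seconds_text = duration.split("m")   # "3" and "45s"

# strip() removes the given characters from both ends of a string.
seconds_text = seconds_text.strip("s")             # "45"

# int() turns text into a number we can do arithmetic with.
total_seconds = int(minutes_text) * 60 + int(seconds_text)

print(total_seconds)  # 225
```

The same three steps (split, strip, convert) should carry over to a string that uses an apostrophe and a double quote as its markers.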


In [0]:
#height will come in format: '5\'11"'

def convertft_to_in(height):
    #your code here
    

    return inches

Now test your convertft_to_in(height) method

In [0]:
test_heights = ['6\'11"','5\'4"','7\'9"','5\'10"','5\'4"']
results = []
for height in test_heights:
  results.append(convertft_to_in(height))

print(results)

Output should be [83, 64, 93, 70, 64]

# Using Pandas to create a DataFrame from our API data


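A DataFrame is the main pandas object: a two-dimensional table with labeled columns, where each row is one record (here, one player) and each column holds one attribute (team, position, height, and so on).
Converting our dictionaries into a DataFrame lets us sort, filter, summarize, and plot the data with very little code.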
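
Before building the real thing, here is a minimal sketch of how a DataFrame can be built from plain Python lists. The player names and numbers are invented for illustration and are not part of the NFL data.

```python
import pandas as pd

# Hypothetical rows, invented for illustration (not from the NFL data).
names = ["Player A", "Player B", "Player C"]
positions = ["QB", "WR", "QB"]
weights = [220, 195, 230]

# Each dictionary key becomes a column name; each list becomes that column.
toy = pd.DataFrame({"PlayerName": names, "Position": positions, "Weight": weights})

print(toy.shape)             # (number of rows, number of columns)
print(list(toy.columns))     # the column names
print(toy["Weight"].mean())  # average of one numeric column
```

All lists must have the same length, because each position across the lists describes the same row.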

In [0]:
position = []
BodyMassIndex = []
PlayerName = []
Height = []
Weight = []
Team = []
# Iterate through each team and its list of players
for tm, players in teams.items():
  for player in players:
    # "player" is a dictionary with name, position, height, and weight

    # Check that the weight exists.
    # Nine players are missing a weight, so we exclude them from the dataset.
    # This is a perfect example of the real world: the data we receive
    # is rarely perfect and often has missing values or outliers.
    if 'weight' in player:
      height = player['height']
      weight = int(player['weight'])

      #We will use your functions for both of these, so make sure they are correct!
      inches = convertft_to_in(height)
      BMI = BMI_calc(inches, weight)

      #Let's create lists for easier conversion to pandas
      Team.append(tm)
      Height.append(inches)
      Weight.append(weight)
      PlayerName.append(player['name'])
      BodyMassIndex.append(BMI)
      position.append(player['position'])

      player['BMI'] = BMI

print("number of datapoints: {}".format(len(position)))

#create a pandas.DataFrame
data = pd.DataFrame(
    {'Team': Team,
     'PlayerName': PlayerName,
     'Position': position,
     'Height': Height,
     'Weight': Weight,
     'BMI': BodyMassIndex
    })

#reorder the columns
data = data[['Team', 'PlayerName', 'Position', 'Weight', 'Height', 'BMI']]

# Try a few DataFrame commands

Learn more:
[https://www.youtube.com/watch?v=XDAnFZqJDvI&t=80s](https://www.youtube.com/watch?v=XDAnFZqJDvI&t=80s)

In [0]:
#what our dataframe looks like

#NOTE: Because we are in Colaboratory, we can refer to the dataframe
#object by itself and have it pretty-print its contents.
#If we were in IDLE or on the command line, we would need to use print().

data

In [0]:
#the first few rows of our dataframe

data.head(5)

In [0]:
#the last few rows of our dataframe

data.tail(5)

In [0]:
#grab only the position column

data['Position']

In [0]:
#count the number of occurrences of each position

data['Position'].value_counts()

In [0]:
#calculate mean of quantitative data
#numeric_only=True skips text columns such as Team and Position
data.mean(numeric_only=True)

In [0]:
#describe will give some stats on our data.  

data.describe()

# **Seaborn Basics**

[Seaborn Docs](https://seaborn.pydata.org/tutorial/relational.html)


[Ultimate Seaborn Tutorial](https://elitedatascience.com/python-seaborn-tutorial)

**To plot a scatter plot use sns.relplot(x, y, data)**

*   x = name of the column to plot on the x axis
*   y = name of the column to plot on the y axis
*   data = name of the dataframe
*   height = height of the image, in inches
*   aspect = width of the image as a multiple of its height
*   hue = color the points by the values in this column



In [0]:
#to change shape you can add height and aspect
#look for outliers

sns.relplot(x="Position", y="Weight", data=data, height=7, aspect=1.5);

**A variation of the scatter plot that reduces overlapping dots: sns.catplot()**

In [0]:
# hue adds color based off the named column from dataframe

sns.catplot(x="Position", y="Weight", hue="Team", data=data, height=7, aspect=1.5);

**To plot a simple bar graph use sns.barplot(x, y, data)**

In [0]:
fig , ax = plt.subplots(figsize=(9,4.5))

sns.barplot(x='Position', y='Weight', data=data, ax = ax)

# **Ex. 3**

**Plot Position vs. BMI on a scatter plot**

It should look something like this.


![](https://i.imgur.com/67HTg8sl.png)

In [0]:
#Use either sns.relplot or sns.catplot to plot Position vs BMI
#write code here



# Look at the graph.

*   Wait a minute, why is the BMI of one of the RBs (running backs) so low?
*   Let's take a closer look.
*   Run the cell below.

In [0]:
print(data.loc[data['BMI'] < 10])


print("\nFound an outlier!!! BMI of 5.2 (O.o)", "Oh that makes sense, he only weighs 40lbs... WAIT WHAT???")
print("We will deal with Dan Vitale later")

# Subsetting Data with Pandas Dataframes

Now that we have a basic understanding of how to create dataframes with pandas and pretty graphics with Seaborn, let's work on subsetting our data.
Subsetting is the act of selecting a specific set out of the larger data set. For example, if I only wanted to see the player info for quarterbacks, I could subset my data so that the returned dataframe contains only quarterbacks. Another example might be subsetting the data by a range of player heights.

[Subsetting with Pandas](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c)

The basic tools for subsetting data are `data[]`, `data.iloc[]`, and `data.loc[]`. Note that all three use square brackets, not parentheses.
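
Here is a small sketch of these selection styles on a toy table, including the "range of heights" case mentioned above. The data and variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
toy = pd.DataFrame({
    "PlayerName": ["Player A", "Player B", "Player C", "Player D"],
    "Position": ["QB", "WR", "QB", "RB"],
    "Height": [76, 71, 74, 69],
})

# toy[...] with a list of names selects columns.
names_and_heights = toy[["PlayerName", "Height"]]

# .loc[...] with a condition keeps the rows where the condition is True.
quarterbacks = toy.loc[toy["Position"] == "QB"]

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
mid_height = toy.loc[(toy["Height"] >= 70) & (toy["Height"] <= 75)]

# .iloc[...] selects by integer position: here, the first two rows.
first_two = toy.iloc[:2]

print(quarterbacks)
print(mid_height)
```

The parentheses around each condition matter: without them, Python applies `&` before the comparisons and the expression fails.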

In [0]:
#Grab only the specified columns

data[['Team', 'PlayerName', 'Position']]

In [0]:
#Grab only the rows with Team Name as Cardinals

cardinals_data = data.loc[data['Team'] == 'Cardinals']
cardinals_data

In [0]:
#grab all rows with quarterback as position
quarterback_data = data.loc[data['Position'] == 'QB']
quarterback_data

In [0]:
#grab a whole row

data.iloc[3]

In [0]:
#grab a specific row and column with .at[row label, column name]

data.at[10, 'Height']

# Plotting a distribution plot

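*  use sns.distplot() (deprecated since seaborn 0.11; on newer versions, sns.histplot(..., kde=True) draws a similar plot)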



In [0]:
sns.distplot(quarterback_data['Weight'])

**Ex 4: create a sns.distplot() of Quarterback BMI**
Bonus:  Try to [colorize](https://seaborn.pydata.org/generated/seaborn.distplot.html) it
      

In [0]:
#your code here

**Ex 5: create a bar plot of 49ers Team BMI.**

In [0]:
#your code here

# Removing The Outlier


*   Now that we roughly know how to subset dataframes
*   We need to try to find Dan Vitale and change his weight to the correct one
*   Google Dan Vitale and find his real weight
*   Refer to the Google Doc for help
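
As a sketch of the general technique, here is how a single wrong value can be overwritten in a toy table. The players and numbers are made up; the real fix needs the real player's name and weight.

```python
import pandas as pd

# Hypothetical data: "Player B" has an obviously mistyped weight.
toy = pd.DataFrame({
    "PlayerName": ["Player A", "Player B", "Player C"],
    "Weight": [210, 25, 240],
})

# Build a boolean mask that is True only for the row we want to change.
mask = toy["PlayerName"] == "Player B"

# .loc[rows, column] = value overwrites just the matching cells.
toy.loc[mask, "Weight"] = 250  # 250 is a made-up corrected value

print(toy)
```

Keep in mind that a column calculated from the old value, such as BMI, does not update on its own and has to be recalculated after the fix.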



 **Ex 6: Find the real weight of Dan Vitale and fix the data!**


In [0]:
#The player's info
data.loc[data['PlayerName'] == 'Dan Vitale']

In [0]:
# your code here

# Violin Plots & Box Plots
Scatter plots are nice, but other types of graphs can give us more information while still looking great.
Violin plots and box plots are well suited to showing distributions across multiple categories at the same time.

Let's use [Violin Plots](https://matplotlib.org/examples/statistics/customized_violin_demo.html) & [Box Plots](http://www.physics.csbsju.edu/stats/box2.html)


![alt text](https://i.imgur.com/WFyufIX.png?1)
![alt text](https://datavizcatalogue.com/methods/images/anatomy/box_plot.png)
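
The parts of a box plot can also be computed by hand, which may help when reading the diagram. This sketch uses invented numbers and the conventional 1.5 × IQR rule for the whiskers; plotting libraries may use slightly different conventions.

```python
import pandas as pd

# Hypothetical BMI-like values with one suspicious point at the end.
values = pd.Series([20, 22, 24, 26, 28, 30, 32, 34, 60])

q1 = values.quantile(0.25)      # bottom edge of the box
median = values.quantile(0.50)  # line inside the box
q3 = values.quantile(0.75)      # top edge of the box
iqr = q3 - q1                   # interquartile range: the height of the box

# Points beyond 1.5 * IQR from the box are usually drawn as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(q1, median, q3)
print(outliers.tolist())
```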

**To plot a violin plot use sns.violinplot()**

In [0]:
fig, ax = plt.subplots(figsize=(10,6))
sns.violinplot(ax = ax, x="Position", y="BMI",
                data=data)


**To plot a box plot use sns.boxplot()**

In [0]:
fig, ax = plt.subplots(figsize=(10,6))
sns.boxplot(ax = ax, x="Position", y="BMI",
                data=data)

# Observe the data

*  How high is their BMI?
*  Does the BMI span a large range?
*  Is BMI concentrated in one spot?
*  What position is this? [Wiki for football abbreviations](https://en.wikipedia.org/wiki/American_football_positions)


---


**TRENDS**
*   Players who need quickness tend to have lower BMIs
*   Players who need strength tend to have higher BMIs

---
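
One way to check trends like these with numbers instead of by eye is to group the rows by position and summarize each group. The sketch below uses invented BMI values, so only the method carries over to the real data.

```python
import pandas as pd

# Hypothetical BMI values, invented for illustration.
toy = pd.DataFrame({
    "Position": ["WR", "WR", "DT", "DT", "QB"],
    "BMI": [26.0, 27.0, 35.0, 37.0, 28.0],
})

# Group the rows by position, take the median BMI of each group,
# then sort from lowest to highest.
median_bmi = toy.groupby("Position")["BMI"].median().sort_values()

print(median_bmi)
```

The median is used here instead of the mean because it is less affected by a single extreme value, such as a mistyped weight.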





# Roles

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/2/25/American_Football_Positions.svg/604px-American_Football_Positions.svg.png)

**DB (Defensive Back)**

     Defensive backs are the smaller, quicker, and faster defensive players who frequently serve as a defense’s backstop – the last, best hope at stopping an offensive player who’s gotten loose 

**LB (Linebacker)**
      
     Linebackers play behind the defensive line and perform various duties depending on the situation, including rushing the passer, covering receivers, and defending against the run.
        
**LS (Long Snapper)**

     A specialized center who snaps the ball directly to the holder or punter. This player is usually distinct from the regular center, as the ball often has to be snapped much farther back on kicking plays.
    
    
**DE (Defensive end)**

     Their function is to attack the passer or stop offensive runs to the outer edges of the line of scrimmage (most often referred to as "containment"). The faster of the two is usually placed on the right side of the defensive line (quarterback's left) because that is a right-handed quarterback's blind side.
    
    
**DT (Defensive tackle)**

      Sometimes called a defensive guard, defensive tackles play at the center of the defensive line. Their function is to rush the passer (if they can get past the offensive linemen blocking them), and stop running plays directed at the middle of the line of scrimmage. 

**QB (Quarterback)**

      The quarterback is the player who receives the ball from the center to start the play. The most important position on the offensive side, the quarterback is responsible for receiving the play from the coaches on the sideline and communicating the play to the other offensive players in the huddle.
    
**OT (Offensive tackle)**

      Two tackles play outside of the guards. Their role is primarily to block on both running and passing plays. The area from one tackle to the other is an area of "close line play" in which blocks from behind, which are prohibited elsewhere on the field, are allowed. For a right-handed quarterback, the left tackle is charged with protecting the quarterback from being hit from behind (known as his "blind side")
      
      
**SS (Strong Safety)**

      The safeties are the last line of defense (farthest from the line of scrimmage) and usually help the corners with deep-pass coverage. The strong safety (SS) is usually the larger and stronger of the two, providing extra protection against run plays by standing closer to the line of scrimmage, usually on the strong (tight end) side of the field.
    
**WR (Wide Receiver)**

     The wide receivers are pass-catching specialists. Their main job is to run pass routes and get open for a pass, although they are occasionally called on to block.

**RB (Running backs)**

      Running backs are players who line up behind the offensive line, who are in position to receive the ball from the quarterback, and execute a rushing play. 
    
**FB (Fullback)**
    
      Fullbacks line up in the backfield with the running back. They tend to be larger and stronger than running backs and are used mainly as blockers, though they also carry the ball on short-yardage plays.
    
**CB (Cornerback)**
        
     Typically two players primarily cover the wide receivers. Cornerbacks attempt to prevent successful quarterback passes by either swatting the airborne ball away from the receiver or by catching the pass themselves. 

**C (Center)**
 
      The center is the player who begins the play from scrimmage by snapping the ball to the quarterback. Like all offensive linemen, the center has the responsibility to block defensive players. 
   
**P (Punter)**
      
      The punter, upon receiving the snap, drops the ball and kicks it from the air.

**G (Guard)**

      Two guards line up directly on either side of the center. Like all interior linemen, their role is to block on both running and passing plays.
    
    
**OLB (Outside Linebacker)**

     Linebackers play behind the defensive line and perform various duties depending on the situation, including rushing the passer, covering receivers, and defending against the run.
  

**ILB (Inside Linebacker)**

     Same as Middle Linebacker
    
**TE (Tight End)**

     Tight ends play on either side of, and directly next to, the tackles. Tight ends are considered hybrid players, something between a wide receiver and an offensive lineman. Because they play next to the other offensive linemen, they are frequently called on to block, especially on running plays. However, because they are eligible receivers, they may also catch passes.


**K (Kicker)**
  
    Also called the "placekicker", he handles kickoffs, extra points, and field goal attempts. All three situations require the kicker to kick the ball off of the ground, either from the hands of a "holder" or off of a "tee"

**T (Tackle)**

    Same as Offensive Tackle

**FS (Free Safety)**
     
     The free safety (FS) is usually the smaller and faster of the two safeties, and is usually the deepest player on the defense, providing help on long pass plays.

**MLB (Middle Linebacker)**
  
      Middle linebackers must be capable of stopping running backs who make it past the defensive line, covering pass plays over the middle, and rushing the quarterback on blitz plays.

**NT (Nose Tackle)**

    The most interior defensive tackle who sometimes lines up directly across from the ball (and therefore is almost nose-to-nose with the offense's center) is often called a nose tackle, alternately nose guard or middle guard.
    

More info here: https://en.wikipedia.org/wiki/American_football_positions

# Test your Knowledge

# Ex 9. Create a box plot of the 49ers team


*   Use Height as Y axis
*   Use Position as X axis



In [0]:
#your code below


# Ex 10. Create a violin plot of all Wide Receivers
  

*   Use BMI for Y axis
*   Team name for X axis



In [0]:
#your code below


# **BONUS**

What other fun [graphs](https://seaborn.pydata.org/examples/index.html) can you make?

Find cool graphs to plot

Some cool graphs I've seen.
*   [Hexbin Plot](https://seaborn.pydata.org/examples/hexbin_marginals.html)
*   [Swarmplot](https://seaborn.pydata.org/examples/scatterplot_categorical.html)
*   [3D Plots -> import needed](https://python-graph-gallery.com/370-3d-scatterplot/)

You can also set your own [style](https://medium.com/@andykashyap/top-5-tricks-to-make-plots-look-better-9f6e687c1e08)

In [0]:
#your code below
