## Introducing `seaborn`

There are several popular data visualization packages for Python, and `seaborn` is one of them. It is built on top of `matplotlib`, much as `pandas` is built on top of `numpy`.

### Loading `seaborn`

If `seaborn` is not installed on your computer, you can install it with the command below:

In [None]:
# Install seaborn
!pip install seaborn

If you remember one thing from this class, remember when and how to import modules! `sns` is a common alias for `seaborn`:

In [3]:
# Let's get started with seaborn!
import seaborn as sns


Great! Now that `seaborn` is ready to use, let's read in our data. We will be looking at vehicle mileage statistics. 

In [5]:
import pandas as pd
mpg = pd.read_csv('data/mpg.csv')
mpg.head()

| Unnamed: 0 | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.0 | 8 | 400.0 | 230 | 4278 | 9.5 | 73 | 1 | pontiac grand prix |
| 1 | 14.0 | 8 | 455.0 | 225 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
| 2 | 14.0 | 8 | 455.0 | 225 | 3086 | 10.0 | 70 | 1 | buick estate wagon (sw) |
| 3 | 12.0 | 8 | 455.0 | 225 | 4951 | 11.0 | 73 | 1 | buick electra 225 custom |
| 4 | 14.0 | 8 | 454.0 | 220 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
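The `Unnamed: 0` column is most likely a row index that was saved into `data/mpg.csv` along with the data (an assumption, since we can't see the raw file). If so, the `index_col=0` parameter of `pd.read_csv()` reads that column as the DataFrame's index instead of as a data column. This sketch puts a few rows from the output above into an in-memory string so it runs without the file:

```python
import io
import pandas as pd

# A few rows from the mpg.head() output, written the way the CSV probably
# stores them: the first column has a blank header and holds a saved index
csv_text = """,mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
0,16.0,8,400.0,230,4278,9.5,73,1,pontiac grand prix
1,14.0,8,455.0,225,4425,10.0,70,1,pontiac catalina
2,14.0,8,455.0,225,3086,10.0,70,1,buick estate wagon (sw)
"""

# Default read: pandas names the blank-header column "Unnamed: 0"
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the index instead
mpg_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns[0])   # Unnamed: 0
print(mpg_indexed.columns[0])  # mpg
```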


## Data visualization with `seaborn`

Like with most topics in this class, we are just scratching the surface of what's possible with data visualization in Python. 

To learn more, check out the [`seaborn` documentation](https://seaborn.pydata.org/) or the resources at the end of this book.

You may already be familiar with the types of charts we'll build here; they are all common, and easy to build in Excel.

However, rather than building graphs through a menu, we'll build them with code!

Fortunately, because you internalized how to work with DataFrames in the last section, you're in a great place to begin building `seaborn` plots. 

To build a plot, we will pass one or more columns from our DataFrame to a `seaborn` function.

We'll start by plotting one variable (*univariate* plotting) and then move on to two variables (*bivariate* plotting).

# Univariate plotting

### Histograms

A histogram displays how many observations are found within given intervals of a variable. 
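To make "intervals" concrete, the sketch below does by hand the counting a histogram performs, using NumPy's `np.histogram()` on the five `mpg` values from the output above. Each interval (bin) includes its left edge; only the last one also includes its right edge.

```python
import numpy as np

# mpg values from the five rows of mpg.head() above
mpg_values = np.array([16.0, 14.0, 14.0, 12.0, 14.0])

# Split the range 12-16 into two equal-width bins and count
# how many observations land in each
counts, edges = np.histogram(mpg_values, bins=2)

print(edges)   # [12. 14. 16.]
print(counts)  # [1 4]
```

The value 16.0 lands in the second bin because the last bin is closed on the right.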

We can use `histplot()` to plot a histogram in `seaborn`. (Older tutorials use `distplot()`, which was deprecated in `seaborn` 0.11 in favor of `histplot()` and `displot()`.)

Let's look at the distribution of the `mpg` variable. Because this is a DataFrame, we will refer to specific column(s) using bracket `[]` notation:

In [None]:
# Plot a histogram of mpg
sns.histplot(mpg['mpg'])

A bar chart counts up the observations for each value of a categorical variable. For example, using a dataset of baseball players, we could count how many players were born in each country.

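In [5]:
# Load the baseball player data
# (hypothetical path: adjust it to wherever your copy of the player file is saved)
baseball = pd.read_csv('data/baseball.csv')
baseball.head()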

| Unnamed: 0 | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | ... | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aardsda01 | 1981.0 | 12.0 | 27.0 | USA | CO | Denver | NaN | NaN | NaN | ... | Aardsma | David Allan | 215.0 | 75.0 | R | R | 2004-04-06 | 2015-08-23 | aardd001 | aardsda01 |
| 1 | aaronha01 | 1934.0 | 2.0 | 5.0 | USA | AL | Mobile | NaN | NaN | NaN | ... | Aaron | Henry Louis | 180.0 | 72.0 | R | R | 1954-04-13 | 1976-10-03 | aaroh101 | aaronha01 |
| 2 | aaronto01 | 1939.0 | 8.0 | 5.0 | USA | AL | Mobile | 1984.0 | 8.0 | 16.0 | ... | Aaron | Tommie Lee | 190.0 | 75.0 | R | R | 1962-04-10 | 1971-09-26 | aarot101 | aaronto01 |
| 3 | aasedo01 | 1954.0 | 9.0 | 8.0 | USA | CA | Orange | NaN | NaN | NaN | ... | Aase | Donald William | 190.0 | 75.0 | R | R | 1977-07-26 | 1990-10-03 | aased001 | aasedo01 |
| 4 | abadan01 | 1972.0 | 8.0 | 25.0 | USA | FL | Palm Beach | NaN | NaN | NaN | ... | Abad | Fausto Andres | 184.0 | 73.0 | L | L | 2001-09-10 | 2006-04-13 | abada001 | abadan01 |
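`sns.countplot()` is the `seaborn` function for this kind of bar chart: it counts the rows in each category and draws one bar per category. Here is a minimal sketch that uses the `bats` column (batting hand) from the five rows above as a stand-in for the full `baseball` DataFrame; `value_counts()` shows the numbers the bars represent:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for baseball: the bats column from the five rows shown above
players = pd.DataFrame({"bats": ["R", "R", "R", "R", "L"]})

# The counting a bar chart performs, done explicitly with pandas
counts = players["bats"].value_counts()
print(counts)

# countplot does the same counting and draws one bar per category
plt.figure()
ax = sns.countplot(data=players, x="bats")
```

With the real data, the same call would read `sns.countplot(data=baseball, x="bats")`.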


A histogram is useful when you want to visualize the *distribution* of a continuous variable. Let's plot the distribution of player weights.

## DRILLS

1. Plot the number of players born in each country.
2. Plot the distribution of player heights. 

# Bivariate plotting

- Stacked bar chart
- Box plot
- Scatter plots
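Of the plots listed above, a scatter plot is the most direct way to look at two continuous variables together. A minimal sketch with `sns.scatterplot()`, using `weight` and `mpg` from the five rows of `mpg.head()` as a stand-in for the full DataFrame:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for mpg: weight and mpg from the five rows shown earlier
mpg_sample = pd.DataFrame({
    "weight": [4278, 4425, 3086, 4951, 4354],
    "mpg": [16.0, 14.0, 14.0, 12.0, 14.0],
})

# One point per car: weight on the x-axis, fuel economy on the y-axis
plt.figure()
ax = sns.scatterplot(data=mpg_sample, x="weight", y="mpg")
```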



A box plot is useful when we want to compare the distribution of the same continuous variable across multiple categories.

For example, we could compare the distribution of heights for right- versus left-handed players.
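A minimal sketch with `sns.boxplot()`, treating batting hand (`bats`) as "handedness" and using the `bats` and `height` values from the five rows of `baseball.head()` as a stand-in. The `groupby` line prints the medians that the boxes' center lines show:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for baseball: batting hand and height from the five rows shown above
players = pd.DataFrame({
    "bats": ["R", "R", "R", "R", "L"],
    "height": [75.0, 72.0, 75.0, 75.0, 73.0],
})

# Median height per batting hand (the line drawn inside each box)
medians = players.groupby("bats")["height"].median()
print(medians)

# One box per batting hand, summarizing the spread of heights
plt.figure()
ax = sns.boxplot(data=players, x="bats", y="height")
```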

If we were interested in the *count* of each category rather than a distribution, we could use a stacked bar chart.