In [1]:
import pandas as pd 
import seaborn as sns

In [2]:
tips = sns.load_dataset("tips")
tips.head()

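|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |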


# Goals of Plotting 

For now, we're going to take a really limited view of what you want to accomplish when you're dealing with data -- you want to ask a question of the data and see the answer come back to you **as quickly as possible**. Because you're the data scientist who is asking the question, we're going to presume several things: 

1. You know what the axes are
2. You know what you're interested in for the plot 
3. You're not space constrained, so you don't have to worry about tinkering with `['this', 'that', 'the other thing']`. 

This isn't always the case. When you do work and give it to me to learn from asynchronously, I lack the context to know what you're interested in (so you have to provide a caption). I *don't* understand how the variable names relate to what is actually in each variable (so you have to provide *descriptive* variable names). And I don't know what you think you've learned from the plot or how it fits into your argument (so you have to write a paragraph that supports the plot). 

Here, our goal is for you to learn about your data as quickly as possible.

# Seaborn 

To meet the goal of "learn as much as possible, as quickly as possible" we're going to start using the `seaborn` module. Seaborn is a high-level plotting module built on top of `matplotlib`. The module is opinionated about how you should do your work. Most of the time, but not always, those opinions are reasonable. 

## Opinion #1

You should not *always* have to index every variable with the data frame that it comes from. Especially when you're plotting, either (a) you're using a single variable; or (b) the collection of your variables should be reasonably well-arranged. It is highly unlikely that you would have data in two different objects. If you do, you probably shouldn't. 

**Consequence:** Most of the Seaborn API takes a `data=` argument. For example:

    sns.displot(x = 'the_x_var', data = df); 

   
## Opinion #2

Of course you want to tinker. But tinkering is a way to waste time that you should be spending learning what is in your data, not making a plot "just-so". 

**Consequence**: Seaborn makes a reasonable set of assumptions about how you want your data to look. You're *much* better off if you consent to letting these be your defaults. If you insist, you can dig into the plot using `matplotlib`, but in my experience this is a boondoggle. 
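
If you do insist, here is a rough sketch of what "digging in" can look like. Figure-level functions such as `relplot` return a seaborn `FacetGrid`, and for a single-panel plot its `.ax` attribute is an ordinary `matplotlib` Axes. The five-row frame and the title text are stand-ins chosen for illustration, and the `Agg` backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in Jupyter
import pandas as pd
import seaborn as sns

# Assumption: a five-row stand-in built from the tips.head() rows shown earlier
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Start from seaborn's defaults
g = sns.relplot(x="total_bill", y="tip", data=df)

# g.ax is the underlying matplotlib Axes, so plain matplotlib calls work on it
g.ax.set_title("Tip vs. total bill")              # illustrative title
g.ax.set_xlabel("Total bill")                     # override the default label
g.ax.axhline(df["tip"].mean(), linestyle="--")    # reference line at the mean tip
```

Every one of those last three lines is time not spent looking at the data, which is the point of this opinion.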

## Opinion #3

There are three families of plots. Well, really there are two families, but in one of those families they don't get along very well, and so they've been separated. 

![the_crown](https://media.giphy.com/media/3oriNNdygUAoeXuV4A/giphy.gif)


**Consequence**: 

1. Plot type #1: **Relational Plots**. Relational plots work on the familiar x-y coordinate grid. They make a literal mapping of x-values and y-values onto this grid. 

       sns.relplot(x = 'x_var', y = 'y_var', data = df); 
       
  1. We can encode more information in these plots: 
    1. Adding `hue=` sets the color of the points based on another series. This can be continuous or categorical.
    2. Adding `style=` sets the shape of the points. This has to be a categorical series.
    3. Adding `col=` or `row=` creates separate plots along a column or row. This has to be a categorical series. 

2. Plot type #2: **Distribution Plots**. Distribution plots are also displayed on x-y axes, but instead of plotting values literally, they map how the data is distributed in your data frame onto the plot. Most frequently this is a binned count of values. These bins might be coarse, in which case you get a histogram. Or they might be quite fine (in the limit, smoothed into a continuous curve), in which case you get a kernel-density plot. 

       sns.displot(x = 'x_var', data = df);
       sns.kdeplot(x = 'x_var', data = df);

3. Plot type #3: the feuding party of the distribution plot family, **Categorical Plots**. With categories -- for example the things that we might group by -- we might want to display an outcome within each of those categories. Rather than mapping categorical data onto the x-axis through some relatively complex translation, `seaborn` just provides a separate type of plot, the category plot: 

       sns.catplot(x = 'x_var', y = 'y_var', data = df);