# Intro to Plotting
![](https://www.nobledesktop.com/image/pythondataviz.png)

## Types of plots and when to use them 
- [The Visual Vocabulary](https://gramener.github.io/visual-vocabulary-vega/#)
- [Matplotlib plots](https://matplotlib.org/stable/api/axes_api.html)

## Anatomy of a Matplotlib visual 
* Understand how to style a plot 
* What is the goal of an effective visual? 

![](https://files.realpython.com/media/anatomy.7d033ebbfbc8.png)

## Building with Matplotlib 
* Understand the difference between figure and axes objects. A figure is the whole canvas, and an axes is one plotting area inside it. Working with these objects directly is the "object-oriented" syntax, which is the preferred interface for Matplotlib. The object-oriented syntax uses <br/>
`fig, ax = plt.subplots()` or `fig = plt.figure()`.
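A minimal sketch of the two object-oriented entry points mentioned above, showing that the axes belongs to its figure. Calling `add_subplot()` with no arguments assumes Matplotlib 3.1 or newer.

```python
import matplotlib.pyplot as plt

# Option 1: create the Figure and a single Axes together
fig, ax = plt.subplots()

# Option 2: create an empty Figure first, then add one Axes to it
fig2 = plt.figure()
ax2 = fig2.add_subplot()

# The Figure is the whole canvas; each Axes is a plotting area that knows its Figure
print(ax.figure is fig)  # True
print(len(fig2.axes))    # 1
```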

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn

In [None]:
# range gives 25 consecutive years, 1975 through 1999 (the stop value is excluded)
years = range(1975, 2000)
# linspace creates a floating point array of 25 equally spaced
# numbers from 0 to 1000, including both endpoints
data = np.linspace(0, 1000, 25)
data
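Because `np.linspace` includes both endpoints, the gap between neighbouring values is (stop - start) / (num - 1), here 1000 / 24. A quick check of that spacing:

```python
import numpy as np

data = np.linspace(0, 1000, 25)

# Both endpoints are included, so 25 points create 24 equal gaps
step = (1000 - 0) / (25 - 1)

print(len(data), data[0], data[-1])       # 25 0.0 1000.0
print(np.allclose(np.diff(data), step))   # True
```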

### Styling and labeling a plot 

In [None]:
# Create a figure and a single axes; the variable name `ax` is only a convention
fig, ax = plt.subplots()

# Pass the x and y data to .plot(), along with the line color, width, and style
ax.plot(years, data, color='lightblue', linewidth=4, linestyle='-.')

# Add labels for x and y axes
ax.set_xlabel('Years')
ax.set_ylabel('Random Floats')

# Add a title for the plot
ax.set_title('Practice Plot', color='blue', fontsize=20)

# Add a legend; loc=4 is the same as loc='lower right'
ax.legend(["Example"], loc=4);
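An alternative sketch of the same plot that attaches the legend text to the line itself with `label=`, instead of passing a list of names to `legend()`. Passing a list relies on the order in which artists were drawn, which gets fragile once a plot has several series.

```python
import numpy as np
import matplotlib.pyplot as plt

years = range(1975, 2000)
data = np.linspace(0, 1000, 25)

fig, ax = plt.subplots()
# label= ties the legend entry to this specific line
ax.plot(years, data, color='lightblue', linewidth=4, linestyle='-.', label='Example')
# With no list of names, legend() collects the labels from the plotted artists
ax.legend(loc='lower right')
```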



### Position, Length and Size 

In [None]:
# more fake data for zooming in and out
# np.arange(101) creates an integer array 0, 1, ..., 100 (the stop value is excluded)
x = np.arange(101)
y = x**2

#### Zooming in and out using `set_xlim(min, max)` and/or `set_ylim(min, max)`

In [None]:
# let's zoom out first
# Create plot, draw graph, set title
fig, ax = plt.subplots()
ax.plot(x,y)
ax.set_title("Zoomed Out")

# Set the limits of x and y to "zoom out"
ax.set_xlim(min(x)-15, max(x)+15)
ax.set_ylim(min(y)-1500, max(y)+1500);

In [None]:
# Create plot, draw graph, set title
fig, ax = plt.subplots()
ax.plot(x,y)
ax.set_title("Zoomed In")

# Set the limits of x and y to "zoom in"
ax.set_xlim(80, 100)
ax.set_ylim(6000, 10000);
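The limit setters also accept a single bound through the `left`/`right` (x) and `bottom`/`top` (y) keywords. This sketch fixes only the lower ends and leaves the upper ends at whatever Matplotlib chose automatically:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(101)
y = x**2

fig, ax = plt.subplots()
ax.plot(x, y)

# Fix only the left x-limit and the bottom y-limit;
# the right and top limits keep their automatically chosen values
ax.set_xlim(left=80)
ax.set_ylim(bottom=6000)
print(ax.get_xlim()[0], ax.get_ylim()[0])  # 80.0 6000.0
```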

#### Using `.set_xticks()` and `.set_yticks()`

In [None]:
# Create plot, draw graph, set title
fig, ax = plt.subplots()
ax.plot(x,y)
ax.set_title("More Ticks on x-axis, Fewer Ticks on y-axis")

# Customize the x and y axis ticks so x-axis has 11 and y-axis has 5
xticks = np.linspace(start=min(x), stop=max(x), num=11)
yticks = np.linspace(start=min(y), stop=max(y), num=5)
ax.set_xticks(xticks)
ax.set_yticks(yticks);
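Tick positions and tick label text can be controlled separately. A sketch, using Matplotlib's `StrMethodFormatter`, that keeps the 5 y-ticks from above but prints each label with a thousands separator and no decimals:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

x = np.arange(101)
y = x**2

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_yticks(np.linspace(start=min(y), stop=max(y), num=5))

# The format string must refer to the tick value as {x}
ax.yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}'))
print(ax.yaxis.get_major_formatter()(7500))  # 7,500
```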

In [None]:
# Create plot, draw graph, set title
fig, ax = plt.subplots()
ax.plot(x,y)
ax.set_title('Displaying Terrible Use of xticks and yticks')

# Customize the x and y axis ticks so the max tick is far higher than the max data;
# set_xticks/set_yticks expand the axis limits to show every tick, squashing the data
xticks = np.linspace(start=min(x), stop=max(x)*2, num=11)
yticks = np.linspace(start=min(y), stop=max(y)*10, num=11)
ax.set_xticks(xticks)
ax.set_yticks(yticks);

## Creating multiple plots on 1 figure

In [None]:
# Create fake data
x1 = [1, 4, 6, 8]
y1 = [10, 15, 27, 32]
x2 = [0.5, 2.2, 4.2, 6.5]
y2 = [21, 19, 9, 26]

# Create the plot
fig, ax = plt.subplots()

# Generate a line plot
ax.plot(x1, y1)

# Draw a scatter plot on same axes
ax.scatter(x2, y2)

# Add a legend
ax.legend(["Dataset 1", "Dataset 2"]);

In [None]:
# Create the plot
fig, ax = plt.subplots()

# Set the limits of x and y axes
ax.set_xlim(0, 9)
ax.set_ylim(5, 35)

# Generate a line plot with custom styling
ax.plot(x1, y1, color='lightblue', linewidth=3, linestyle='-.')

# Draw a scatter plot on same axes with custom styling
ax.scatter(x2, y2, color='red', marker='x')

# Add a legend
ax.legend(["Dataset 1", "Dataset 2"]);

We can also create multiple axes (i.e. multiple subplots) within a single figure by specifying the `nrows` and/or `ncols` arguments.

For example, `ncols=3` creates 3 side-by-side axes within a figure, while `nrows=3` stacks 3 axes vertically:

In [None]:
fig, axes = plt.subplots(figsize=(11, 3), ncols=3)

In [None]:
fig, axes = plt.subplots(figsize=(3, 11), nrows=3)

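__Note__: We have multiple axes now, so `plt.subplots()` returns a NumPy array of axes instead of a single axes object. Each subplot is accessed by index, e.g. `axes[1]` for the middle one below.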

In [None]:
fig, axes = plt.subplots(figsize=(11, 3), ncols=3)
axes[1].set_facecolor("orange")
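When both `nrows` and `ncols` are greater than 1, the returned array is two-dimensional and indexed as `[row, col]`. A sketch of a 2x2 grid that colors one panel and titles every panel by looping over `axes.flat`:

```python
import matplotlib.pyplot as plt

# A 2x2 grid: axes is a 2D NumPy array indexed [row, col]
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))
axes[0, 1].set_facecolor("orange")

# axes.flat visits every Axes in row order, whatever the grid shape
for i, ax in enumerate(axes.flat):
    ax.set_title(f"Subplot {i}")

fig.tight_layout()  # reduce overlap between titles and tick labels
print(axes.shape)   # (2, 2)
```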

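## Object-oriented syntax vs. PyPlot syntax
So far, all of these examples have used the "object-oriented" syntax, which is the preferred interface for Matplotlib. (The object-oriented syntax uses `fig, ax = plt.subplots()` or `fig = plt.figure()`.) The PyPlot syntax instead calls functions such as `plt.plot()` directly; each call acts on the "current" figure and axes, which Matplotlib creates automatically when needed.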

In [None]:
# With PyPlot you don't need to create the figure first; plt.plot() creates one if needed

# Use plot() function to create a plot using above values
plt.plot(years, data)

# Add a legend to the plot
plt.legend(["Sample Data"]);

# You will often see this line in examples, but it isn't 
# needed with %matplotlib inline
# plt.show()
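The two styles can be mixed. In this sketch, `plt.gca()` ("get current axes") returns the Axes that the PyPlot call drew on, so object-oriented methods can take over from there:

```python
import numpy as np
import matplotlib.pyplot as plt

years = range(1975, 2000)
data = np.linspace(0, 1000, 25)

plt.figure()           # start a fresh figure so the example is self-contained
plt.plot(years, data)  # PyPlot draws on the "current" axes

# plt.gca() returns that same Axes object
ax = plt.gca()
ax.set_title("Sample Data")
print(len(ax.get_lines()))  # 1
```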

## Avoiding ugly visuals 
> Research on graphical perception suggests that the visual patterns humans judge most accurately are:

- Positional changes (scatter plots)
- Length changes (bar charts)

> We're less accurate at judging:

- Color hue changes 
- Area changes (pie charts)

### Too much going on 
![](https://static1.squarespace.com/static/55b6a6dce4b089e11621d3ed/55b6d08fe4b0d8b921b02f83/55b6d0b6e4b0d8b921b03a36/1438044342911/1000w/)

#### Tips - Don't use: 
- Heavy grid lines
- Unnecessary text
- Pictures surrounding the visual
- Shading or 3D components
- Intense colors

To use color well:
- Highlight interesting groups
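One way to highlight a group is to draw everything in a muted gray and give only the group of interest a saturated color. A minimal sketch; the category labels and values are hypothetical placeholders:

```python
import matplotlib.pyplot as plt

# Hypothetical categories and values, for illustration only
labels = ["A", "B", "C", "D"]
values = [4, 7, 3, 5]
highlight = "B"  # the group we want the reader to notice

# Muted gray for context, one strong color for the highlighted group
colors = ["tab:red" if label == highlight else "lightgray" for label in labels]

fig, ax = plt.subplots()
bars = ax.bar(labels, values, color=colors)
ax.set_title("Group B stands out")
```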

![](https://upload.wikimedia.org/wikipedia/en/timeline/15833eb5fd215305b5408fa1b9db622f.png)

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/lindseyberlin/ds-lesson-hub/master/Phase1/Visualization-MatplotlibAndSeaborn/data/titanic.csv')

In [None]:
df.info()

In [None]:
df.head()

## Let's explore!
### What are the most common last names?



In [None]:
df['Name']

The last names are embedded in the `Name` column, which is formatted as "Last, Title First", so first we'll need to extract the last names!

So many ways to do this: 

In [None]:
# string and split method: keep the text before the first comma
df['Last Name'] = df['Name'].str.split(',').str[0]

In [None]:
# use a for loop 

In [None]:
# use a list comp

In [None]:
# how about lambda
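One possible answer to each of the four prompts above, sketched on a small hypothetical Series in the same "Last, Title First" format (these names are placeholders, not rows from the dataset). All four approaches split on the comma and keep the first piece:

```python
import pandas as pd

# Hypothetical sample in the "Last, Title First" format used by the Name column
names = pd.Series(["Smith, Mr. John", "Doe, Mrs. Jane", "Smith, Miss. Anna"])

# 1. Vectorized string method
by_str = names.str.split(',').str[0]

# 2. Plain for loop
by_loop = []
for name in names:
    by_loop.append(name.split(',')[0])

# 3. List comprehension
by_listcomp = [name.split(',')[0] for name in names]

# 4. apply with a lambda
by_lambda = names.apply(lambda name: name.split(',')[0])

print(by_str.tolist())  # ['Smith', 'Doe', 'Smith']
```

Any of these results can be assigned to `df['Last Name']`; the vectorized `.str` version is usually the most concise.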


In [None]:
# count passengers per last name with groupby, then sort from most to least common
df[['PassengerId', 'Last Name']].groupby(by='Last Name').count().sort_values(by='PassengerId',
                                                                             ascending=False).head()

In [None]:
# or use value counts 
df['Last Name'].value_counts().head(10)

In [None]:
most_common_ln = df['Last Name'].value_counts().head(8)
most_common_ln

In [None]:
# Plot a bar graph of the most common last names
fig, ax = plt.subplots(figsize=(10,8))

ax.bar(most_common_ln.index, most_common_ln)

# Add labels for x and y axes
ax.set_xlabel('Last Names')
ax.set_ylabel('Number of Passengers')

# Add a title for the plot
ax.set_title('Most Common Last Names on Titanic', fontsize=15)

# Rotate the x tick labels so long names don't overlap
# (a single-series bar chart doesn't need a legend)
ax.tick_params(axis='x', labelrotation=45);

#### Practice Problem: Turn the last name values into percentage of total population. Then create a horizontal bar chart with the percentage of total population annotated at the end of each bar. 
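One possible approach, sketched as a hypothetical helper `plot_pct_barh` that takes a Series of counts and a total. The example call uses placeholder counts, not real Titanic values; in the notebook you might call it as `plot_pct_barh(most_common_ln, len(df))`.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_pct_barh(counts, total):
    """Horizontal bar chart of counts as a percentage of total, annotated at each bar's end."""
    pct = counts / total * 100
    positions = list(range(len(pct)))

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(positions, pct.values)
    ax.set_yticks(positions)
    ax.set_yticklabels(list(pct.index))
    ax.invert_yaxis()  # put the first (largest) value at the top

    # Write each percentage just past the end of its bar
    for pos, value in zip(positions, pct.values):
        ax.text(value, pos, f" {value:.2f}%", va="center")

    ax.set_xlabel("Percentage of All Passengers")
    return ax

# Hypothetical placeholder counts and total, NOT the real Titanic values
ax = plot_pct_barh(pd.Series({"Name A": 9, "Name B": 6, "Name C": 5}), total=200)
```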

### Who Paid the Highest Fare?
As always, there are a few ways we can do this: boolean indexing with `.loc` versus `sort_values`.

In [None]:
max_fare = df['Fare'].max()

In [None]:
df.loc[df['Fare'] == max_fare]

In [None]:
df.sort_values(by='Fare', ascending = False).head()
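Two more pandas shortcuts, sketched on a hypothetical mini DataFrame that has a tie for the highest fare. The tie shows an important difference: `idxmax` returns only the first maximum, while the `.loc` mask keeps every tied row.

```python
import pandas as pd

# Hypothetical mini DataFrame with two passengers tied for the highest fare
fares = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                      "Fare": [10.0, 80.0, 80.0, 25.0]})

# idxmax returns the index label of the FIRST maximum only
print(fares.loc[fares["Fare"].idxmax(), "Name"])  # B

# Boolean masking with .loc keeps every row that ties for the maximum
print(fares.loc[fares["Fare"] == fares["Fare"].max(), "Name"].tolist())  # ['B', 'C']

# nlargest(n, col) gives the same rows as sort_values(col, ascending=False).head(n)
print(fares.nlargest(2, "Fare")["Name"].tolist())  # ['B', 'C']
```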

In [None]:
# plot a histogram but let's use the PyPlot syntax 
plt.hist(df['Fare']);

In [None]:
plt.figure(figsize=(8,6))
plt.hist(df['Fare'])
# axvline draws a vertical line that spans the full height of the axes,
# so there's no need to hard-code a y-range like 0 to 750
plt.axvline(df['Fare'].mean(), label=f"Average Fare: {df['Fare'].mean():.2f}", color='red')
plt.axvline(df['Fare'].median(), label=f"Median Fare: {df['Fare'].median():.2f}", color='orange')
plt.legend()
plt.show()

## Seaborn
Seaborn and Matplotlib are two of Python's most powerful visualization libraries. Seaborn requires less code and has stunning default themes, while Matplotlib is easier to customize because you work directly with its Figure and Axes objects.

Matplotlib provides the basic functionality for creating plots and filling them with different kinds of shapes and colors. Seaborn, which is built on top of Matplotlib, takes this functionality a step further by providing ready-made statistical visualizations that are commonly used in Data Science. Best of all, Seaborn is written to be simple to use and easy to understand, so most visualizations only take 1 or 2 lines of code!

In [None]:
import seaborn as sns
sns.set(style='darkgrid')
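# sns.set() is an alias for sns.set_theme(): besides the darkgrid style,
# it also applies Seaborn's default context and color palette
import warnings
# Hide warnings to keep the output tidy (note: this hides useful ones too)
warnings.filterwarnings('ignore')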

In [None]:
# Plot pairwise relationships between the numeric columns, colored by survival
sns.pairplot(df, hue='Survived');

In [None]:
# Let's try a categorical plot (used to be a factor plot)
sns.catplot(x='Pclass', y='Age', data=df, col='Survived');
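`catplot` draws a strip plot by default, and the `kind` parameter switches to other categorical plot types. A sketch using `kind='box'` on a hypothetical mini DataFrame that reuses the Titanic column names. The ages are placeholders, not real passenger data.

```python
import pandas as pd
import seaborn as sns

# Hypothetical mini dataset with the same column names as the Titanic data
sample = pd.DataFrame({
    "Pclass":   [1, 2, 3, 1, 2, 3],
    "Age":      [38.0, 27.0, 22.0, 45.0, 30.0, 4.0],
    "Survived": [0, 0, 0, 1, 1, 1],
})

# kind='box' summarizes each class with a box plot;
# col= creates one panel per value of Survived
g = sns.catplot(x="Pclass", y="Age", data=sample, col="Survived", kind="box")
print(g.axes.shape)  # (1, 2)
```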