# Chapter 10. Graphics for Communication

By now, you should have plenty of experience making visualizations from `pandas` DataFrames. But how do you give your graphics the professional polish necessary for a presentation or a publication? This chapter discusses some general strategies for making professional-looking, information-dense graphics.

# 10.1 Polishing Graphics Using Matplotlib

The `pandas` plotting functions are built on top of `matplotlib`, the basic plotting library in Python. In this section, we will look at how to use `matplotlib` to polish graphics that were made using `pandas` plotting.

In [None]:
import pandas as pd
%matplotlib inline

titanic = pd.read_csv("/data301/data/titanic.csv")
housing = pd.read_csv("/data301/data/AmesHousing.txt", sep="\t")

## Labels and Titles

In Chapter 2, we made a graphic where we showed the number of survivors by `sex` and `pclass`. But there is no way to tell from this graph that what is being plotted is the number of survivors.

In [None]:
survival_counts = titanic.groupby(["pclass", "sex"]).survived.sum().unstack()
survival_counts.plot.bar()
survival_counts

We need to add a label to the $y$-axis. Since the `pandas` plotting functions are just wrappers around `matplotlib`, we can call `matplotlib` to set the label for the $y$-axis manually. The function `plt.ylabel()` sets the label for the $y$-axis.

In [None]:
import matplotlib.pyplot as plt

survival_counts.plot.bar()
plt.ylabel("Number of Survivors")

To set the $x$-axis label and the title, we use the `plt.xlabel()` and `plt.title()` functions.

In [None]:
survival_counts.plot.bar()
plt.ylabel("Number of Survivors")
plt.xlabel("Passenger Class")
plt.title("Number of Survivors by Class and Sex")

**Warning:** Remember that `plt.xlabel()` and `plt.ylabel()` are _functions_, not attributes. That is, you cannot set the $x$-label by writing `plt.xlabel = "Passenger Class"`. You have to call `plt.xlabel()` as a function on the string that you want to set as the label: `plt.xlabel("Passenger Class")`.
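
The same labels can also be set through the `Axes` object that the `pandas` plotting method returns, which avoids any confusion about assigning to `plt.xlabel`. The sketch below is a minimal illustration: `toy_counts` is a made-up table standing in for `survival_counts` (its numbers are not the real Titanic counts), and the "Agg" backend is selected only so that the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for `survival_counts`; not the real Titanic numbers.
toy_counts = pd.DataFrame(
    {"female": [9, 7, 5], "male": [4, 2, 3]},
    index=pd.Index([1, 2, 3], name="pclass"),
)

# The pandas plotting method returns the matplotlib Axes that it drew on.
ax = toy_counts.plot.bar()

# These Axes methods mirror plt.ylabel(), plt.xlabel(), and plt.title().
ax.set_ylabel("Number of Survivors")
ax.set_xlabel("Passenger Class")
ax.set_title("Number of Survivors by Class and Sex")
```

Like their `plt` counterparts, `ax.set_xlabel()` and its siblings are methods that must be called, not attributes to assign to.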

## Customizing the Axes

There are many reasons to customize the axes. We may want to restrict the range of the axes to where the data lives or change the spacing of the ticks.

In [None]:
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

Notice that there are only a handful of homes that are larger than 4000 square feet. We may want to zoom in on the homes that are smaller than 4000 square feet. We may also want to anchor the plot at the origin, so that the bottom left corner is (0, 0). To do this, we set the range of the $x$- and $y$-axes manually using the `plt.xlim()` and `plt.ylim()` functions.

In [None]:
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")
plt.xlim(0, 4000)
plt.ylim(0, 700000)

To change the locations of the ticks, we can set them manually using `plt.xticks()` and `plt.yticks()`. The ticks do not have to be evenly spaced. Both `plt.xticks()` and `plt.yticks()` take an optional second argument that specifies the tick labels.

In [None]:
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")
plt.xticks([1000, 2000, 4000])
plt.yticks([0, 200000, 400000, 600000], 
           ["$0", "$200K", "$400K", "$600K"]);

One reason to set the ticks manually is to eliminate "chartjunk": wasted ink on a graphic. Default graphics often have too many ticks with labels that are too long. The number of ticks can be reduced and the labels shortened without any loss in clarity.
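
Typing every tick label by hand becomes tedious when the tick locations change. One possible alternative, sketched below, is to attach a formatting function to the axis with `matplotlib.ticker.FuncFormatter`. The helper `dollars_in_thousands` and the `toy_housing` table are hypothetical names and data introduced only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical stand-in for a few rows of the housing data.
toy_housing = pd.DataFrame({
    "Gr Liv Area": [900, 1500, 2200, 3100],
    "SalePrice": [110000, 180000, 290000, 450000],
})

def dollars_in_thousands(value, pos=None):
    """Format a tick value such as 200000 as '$200K'."""
    return "$0" if value == 0 else f"${value / 1000:.0f}K"

ax = toy_housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

# Choose the tick locations, then let the formatter generate the labels.
ax.set_yticks([0, 200000, 400000, 600000])
ax.yaxis.set_major_formatter(FuncFormatter(dollars_in_thousands))
```

With this approach, changing the tick locations does not require rewriting the labels.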

Another source of "chartjunk" is unnecessary borders. For example, we can eliminate the "spines" of the top and right axes without any loss of information. To do this in `matplotlib`, we save the `Axes` object (displayed as `AxesSubplot` in older versions of `matplotlib`) returned by the `pandas` plotting function and turn off the top and right spines.

In [None]:
ax = housing.plot.scatter(x="Gr Liv Area", y="SalePrice")
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

We can even go so far as to turn off all of the spines, although this plot seems to lack the "structure" of the above plot. But this is a matter of taste.

In [None]:
ax = housing.plot.scatter(x="Gr Liv Area", y="SalePrice")
for key in ax.spines:
    ax.spines[key].set_visible(False)

Notice that turning the spines off does not eliminate the ticks. We can eliminate the ticks by passing an empty list to `plt.xticks()` or `plt.yticks()`.

In [None]:
ax = housing.plot.scatter(x="Gr Liv Area", y="SalePrice")
for key in ax.spines:
    ax.spines[key].set_visible(False)
plt.xticks([])
plt.yticks([]);
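
Passing an empty list removes the tick labels along with the tick marks. If you want to keep the numbers but drop the small tick marks, one option is to set the tick length to zero with `ax.tick_params()`. The sketch below assumes a made-up `toy_housing` table in place of the real housing data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a few rows of the housing data.
toy_housing = pd.DataFrame({
    "Gr Liv Area": [900, 1500, 2200, 3100],
    "SalePrice": [110000, 180000, 290000, 450000],
})

ax = toy_housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

# Hide all four spines.
for spine in ax.spines.values():
    spine.set_visible(False)

# Keep the tick labels but shrink the tick marks to zero length.
ax.tick_params(axis="both", length=0)
```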

## Style Sheets

The default `matplotlib` style sheet ("default") is acceptable, but leaves something to be desired aesthetically. You can choose a different style sheet using `plt.style.use()`. For example, to make plots using the "fivethirtyeight" style sheet (designed to produce graphics like the ones on [fivethirtyeight.com](http://www.fivethirtyeight.com/)), you would call `plt.style.use("fivethirtyeight")`.

In [None]:
plt.style.use("fivethirtyeight")
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

To see a list of all of the available `matplotlib` styles, print `plt.style.available`.

In [None]:
print(plt.style.available)

The choice of styles is mostly aesthetic. But there are settings where a particular style is more appropriate. For example, the "dark_background" style helps the graphic blend in if it will be inserted into a presentation with a black background.

In [None]:
plt.style.use("dark_background")
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

The "grayscale" style is useful if the graphic will be printed in grayscale, instead of in color.

In [None]:
plt.style.use("grayscale")
survival_counts.plot.bar()

Suppose you like everything about the "grayscale" style sheet, except for the gray background. You can override any aspect of the style sheet by setting the appropriate parameter in the `rcParams` object of `matplotlib`. `rcParams` is essentially a dictionary that keeps track of the style of every element of a plot, from the figure size to the color cycle. A style sheet is nothing more than a specification of the parameters in `rcParams`. For example, the background is gray because the "figure.facecolor" parameter is set to "0.75" (with "0" being black and "1" being white).

In [None]:
from matplotlib import rcParams
rcParams["figure.facecolor"]

We can make the background color white by setting the "figure.facecolor" parameter to "1".

In [None]:
plt.style.use("grayscale")
rcParams["figure.facecolor"] = "1"
survival_counts.plot.bar()
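
Both `plt.style.use()` and direct assignment to `rcParams` change the settings for every later plot. If you only want a style for a single figure, one option is to apply it temporarily with the `plt.style.context()` and `plt.rc_context()` context managers. The sketch below uses a made-up `toy_counts` table in place of `survival_counts`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for `survival_counts`; not the real Titanic numbers.
toy_counts = pd.DataFrame(
    {"female": [9, 7, 5], "male": [4, 2, 3]},
    index=pd.Index([1, 2, 3], name="pclass"),
)

facecolor_before = plt.rcParams["figure.facecolor"]

# Apply "grayscale" with a white figure background, but only inside this block.
with plt.style.context("grayscale"), plt.rc_context({"figure.facecolor": "1"}):
    ax = toy_counts.plot.bar()
    facecolor_inside = ax.figure.get_facecolor()  # RGBA tuple of the new figure

# Outside the block, the previous settings are restored.
facecolor_after = plt.rcParams["figure.facecolor"]
```

Because the settings are restored on exit, there is no need to reset the style afterward.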

Although it is not one of the available styles listed, we can revert to the original style at any time by setting the style to "default".

In [None]:
plt.style.use("default")
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

## Figure Size

You might be surprised that the figure became larger when we returned to the "default" style. The reason for this is technical. The "default" style sheet specifies that the figure size should be (6.4, 4.8). But this parameter setting is overwritten by the "inline" backend of IPython (which is loaded when we call `%matplotlib inline`), which specifies that the figure size should be (6.0, 4.0). When we call `plt.style.use("default")` explicitly, the figure size is set to (6.4, 4.8), which is why it looks bigger than before.

To set the figure size explicitly, we can use the `figsize=` argument of the `pandas` plotting method. The size is given as a (width, height) tuple in inches.

In [None]:
housing.plot.scatter(x="Gr Liv Area", y="SalePrice", figsize=(12, 8))
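
Another way to control the size, sketched below, is to create the figure yourself with `plt.subplots()` and pass the resulting `Axes` to `pandas` through the `ax=` argument. The `toy_housing` table is a made-up stand-in for the housing data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a few rows of the housing data.
toy_housing = pd.DataFrame({
    "Gr Liv Area": [900, 1500, 2200, 3100],
    "SalePrice": [110000, 180000, 290000, 450000],
})

# Create a 12-by-8 inch figure, then ask pandas to draw on its Axes.
fig, ax = plt.subplots(figsize=(12, 8))
toy_housing.plot.scatter(x="Gr Liv Area", y="SalePrice", ax=ax)

# The figure reports its size in inches as (width, height).
width, height = fig.get_size_inches()
```

Holding on to `fig` and `ax` is also convenient when you want to keep customizing the plot after `pandas` has drawn it.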

## Saving

To save a figure to disk, use `plt.savefig()`. `matplotlib` will usually be able to determine the right format from the file extension.

In [None]:
housing.plot.scatter(x="Gr Liv Area", y="SalePrice")
plt.savefig("sqft_vs_price.png")
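
For a publication, you may also want to control the resolution and trim the surrounding whitespace. The sketch below shows the `dpi=` and `bbox_inches=` arguments of `savefig()`. The `toy_housing` table is made up, and the files are written to a temporary directory only so that the example does not clutter your working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a few rows of the housing data.
toy_housing = pd.DataFrame({
    "Gr Liv Area": [900, 1500, 2200, 3100],
    "SalePrice": [110000, 180000, 290000, 450000],
})

ax = toy_housing.plot.scatter(x="Gr Liv Area", y="SalePrice")

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "sqft_vs_price.png")
pdf_path = os.path.join(out_dir, "sqft_vs_price.pdf")

# dpi sets the resolution of raster formats such as PNG;
# bbox_inches="tight" trims extra whitespace around the plot.
ax.figure.savefig(png_path, dpi=300, bbox_inches="tight")

# The .pdf extension selects a vector format, which scales without pixelation.
ax.figure.savefig(pdf_path, bbox_inches="tight")
```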

# Exercises

**Exercise 1.** Make a publication-ready graphic that communicates the information in the `tips` data set (`/data301/data/tips.csv`). Your graphic should have a title, self-explanatory axis labels, and reasonable axis ranges. Use an appropriate style, and save your graphic to disk.

In [None]:
# TYPE YOUR CODE HERE.