(communicate-plots)=
# Graphics for Communication

## Introduction

In this chapter, you'll learn about using visualisation to communicate. In {ref}`exploratory-data-analysis`, you learned how to use plots as tools for *exploration*.
When you make exploratory plots, you know—even before looking—which variables the plot will display.
You made each plot for a purpose, quickly looked at it, and then moved on to the next plot.
In the course of most analyses, you'll produce tens or hundreds of plots, most of which are immediately thrown away.

Now that you understand your data, you need to *communicate* your understanding to others.
Your audience will likely not share your background knowledge and will not be deeply invested in the data. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. In this chapter, you'll learn some of the tools that **lets-plot** provides to make charts tell a story.

### Prerequisites

As ever, there are a plethora of options (and packages) for data visualisation using code. We're focusing on the declarative, "grammar of graphics" approach using **lets-plot** here, but advanced users looking for more complex graphics might wish to use an imperative library such as the excellent **matplotlib**. You should have both **lets-plot** and **pandas** installed. Once you have them installed, import them like so:

In [None]:
# remove cell
import matplotlib_inline.backend_inline
import matplotlib.pyplot as plt

# Plot settings
plt.style.use("https://github.com/aeturrell/python4DS/raw/main/plot_style.txt")
matplotlib_inline.backend_inline.set_matplotlib_formats("svg")

In [None]:
from lets_plot import *
import pandas as pd

LetsPlot.setup_html()

## Labels and Titles

The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels, titles, and other contextual information. Let's look at an example using the MPG (miles per gallon) data, which covers the fuel economy for 38 popular models of cars from 1999 to 2008.

In [None]:
# load the data
mpg = pd.read_csv(
    "https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/mpg.csv", index_col=0
)

We want to show how fuel efficiency on the highway changes with engine displacement, in litres. The most basic chart we can make with these variables is:

In [None]:
(ggplot(mpg, aes(x="displ", y="hwy")) + geom_point())

Now we're going to add lots of extra useful information that will make the chart better. The purpose of a plot title is to summarize the main finding.
Avoid titles that just describe what the plot is, e.g., "A scatterplot of engine displacement vs. fuel economy".

We're going to:

- add a title that summarises the main finding you'd like the viewer to take away (as opposed to one just describing the obvious!)
- add a subtitle that provides more info on the y-axis, and make the x-label more understandable
- remove the y-axis label that is at an awkward viewing angle
- add a caption with the source of the data

Putting this all in, we get:

In [None]:
(
    ggplot(mpg, aes(x="displ", y="hwy"))
    + geom_point()
    + labs(
        title="Fuel efficiency generally decreases with engine size",
        subtitle="Highway fuel efficiency (miles per gallon)",
        caption="Source: fueleconomy.gov",
        y="",
        x="Engine displacement (litres)",
    )
)

This is much clearer. It's easier to read, we know where the data come from, and we can see *why* we're being shown it too.

But maybe we want a different message? You can adapt the labels to your needs, and some people prefer to keep a rotated y-axis label so that the subtitle is free to provide even more context:

In [None]:
(
    ggplot(mpg, aes(x="displ", y="hwy"))
    + geom_point(aes(color="class"))
    + geom_smooth(se=False, method="loess", size=2)
    + labs(
        x="Engine displacement (L)",
        y="Highway fuel economy (mpg)",
        color="Car type",
        title="Fuel efficiency generally decreases with engine size",
        subtitle="Two seaters (sports cars) are an exception because of their light weight",
        caption="Source: fueleconomy.gov",
    )
)

### Exercises

1.  Create one plot on the fuel economy data with customized `title`, `subtitle`, `caption`, `x`, `y`, and `color` labels.

2.  Recreate the following plot using the fuel economy data.
    Note that both the colors and shapes of points vary by type of drive train.

In [None]:
(
    ggplot(mpg, aes(x="cty", y="hwy", color="drv", shape="drv"))
    + geom_point()
    + labs(
        x="City MPG",
        y="Highway MPG",
        shape="Type of\ndrive train",
        color="Type of\ndrive train",
    )
)

3.  Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.

## Annotations

In addition to labelling major components of your plot, it's often useful to label individual observations or groups of observations.
The first tool you have at your disposal is `geom_text()`.
`geom_text()` is similar to `geom_point()`, but it has an additional aesthetic: `label`.
This makes it possible to add textual labels to your plots.

There are two possible sources of labels.
First, you might have a data frame that contains labels.
In the following plot we pull out the cars with the largest engine size in each drive type and save their information as a new data frame called `label_info`.
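
One way to build such a data frame with **pandas** is sketched below. To keep the example self-contained it uses `toy_mpg`, a made-up stand-in with the same column names as `mpg` (the rows are not real data), and the names `drive_names` and `drive_type` are hypothetical choices for this sketch.

```python
import pandas as pd

# Made-up stand-in rows with the same column names as mpg (not real data)
toy_mpg = pd.DataFrame(
    {
        "model": ["model_a", "model_b", "model_c", "model_d", "model_e", "model_f"],
        "displ": [2.8, 6.1, 1.6, 5.3, 4.6, 7.0],
        "hwy": [25, 14, 33, 25, 22, 24],
        "drv": ["4", "4", "f", "f", "r", "r"],
    }
)

# Readable names for the drive train codes (hypothetical helper mapping)
drive_names = {"4": "4-wheel drive", "f": "front-wheel drive", "r": "rear-wheel drive"}

label_info = (
    # idxmax gives, for each drive type, the row index of the largest engine
    toy_mpg.loc[toy_mpg.groupby("drv")["displ"].idxmax()]
    # add a human-readable label column to use as the text on the plot
    .assign(drive_type=lambda df: df["drv"].map(drive_names))
    .reset_index(drop=True)
)
print(label_info)
```

A frame like this could then be handed to a text layer through its `data` argument, for example `geom_text(aes(label="drive_type"), data=label_info)`, so that only the selected cars are labelled.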


To double check this works, let's use the terminal. We'll try the command `ls`, which lists everything in the current directory, and `grep '\.svg$'` to pull out any files that end in `.svg` from what is returned by `ls`. The pattern is quoted so that the shell passes it to `grep` unchanged instead of expanding it as a filename wildcard. These are strung together as commands by a `|`. (Note that the leading exclamation mark below just tells the software that builds this book to use the terminal.)

In [None]:
!ls | grep '\.svg$'
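
If you would rather stay in Python, the same check can be sketched with the standard library. The helper name `list_svg_files` is hypothetical and introduced only for this example; it assumes you want the files directly inside one directory, not in its subdirectories.

```python
from pathlib import Path


def list_svg_files(directory="."):
    """Return the sorted names of files ending in .svg directly inside `directory`."""
    return sorted(path.name for path in Path(directory).glob("*.svg"))


# List any .svg files in the current working directory
print(list_svg_files())
```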

In [None]:
# remove-cell
import os

os.remove("output_chart.svg")