(communicate-plots)=
# Graphics for Communication

## Introduction

In this chapter, you'll learn about using visualisation to communicate.

There are a plethora of options (and packages) for data visualisation using code. First, though, a note about the different philosophies of data visualisation. There are broadly two categories of approach to using code to create data visualisations: imperative, where you build what you want, and declarative, where you say what you want. Choosing which to use involves a trade-off: imperative libraries offer you flexibility but at the cost of some verbosity; declarative libraries offer you a quick way to plot your data, but only if it’s in the right format to begin with, and customisation may be more difficult.

There are also different purposes of data visualisation. It can be useful to bear in mind the three broad categories of visualisation that are out there:

- exploratory

- scientific

- narrative

Python has packages that cover all three of these.

In {ref}`exploratory-data-analysis`, you learned how to use plots as tools for *exploration*.
When you make exploratory plots, you know---even before looking---which variables the plot will display.
You made each plot for a purpose, could quickly look at it, and then moved on to the next plot.
In the course of most analyses, you'll produce tens or hundreds of plots, most of which are immediately thrown away. Exploratory visualisation is usually quick and dirty, and flexible too.

The second kind, *scientific visualisation*, is the prime cut of your exploratory visualisation. It’s the kind of plot you might include in a more technical paper, the picture that says a thousand words. The first image of a black hole {cite}`akiyama2019first` is a prime example of this. You can get away with having a high density of information in a scientific plot because it's designed for specialists. Ensuring that important values can be accurately read from the plot is especially important in these kinds of charts. But they can also be the kind of plot that presents the killer results in a study; they might not be exciting to people who don’t look at charts for a living, but they might be exciting and, just as importantly, understandable by your peers.

The third and final kind is *narrative visualisation*, and it is the focus of this chapter—though we'll only scratch the surface. This is the one that requires the most thought in the step where you go from the first view to the end product because your audience will likely not share your background knowledge and will not be deeply invested in the data. It’s a visualisation that doesn’t just show a picture, but gives an insight. These are the kind of visualisations that you might see in the Financial Times, The Economist, or on the BBC News website. They come with aids that help the viewer focus on the aspects that the creator wanted them to (you can think of these aids or focuses as doing for visualisation what bold font does for text). They’re well worth using in your work, especially if you’re trying to communicate a particular narrative, and especially if the people you’re communicating with don’t have deep knowledge of the topic. You might use them in a paper that you hope will have a wide readership, in a blog post summarising your work, or in a report intended for a policymaker.

In [None]:
# remove cell
import matplotlib_inline.backend_inline
import matplotlib.pyplot as plt

# Plot settings
plt.style.use("https://github.com/aeturrell/python4DS/raw/main/plot_style.txt")
matplotlib_inline.backend_inline.set_matplotlib_formats("svg")

### Prerequisites

As well as **pandas**, you will need to install the declarative visualisation package **seaborn** for this chapter. This chapter uses the next generation version of **seaborn**, which can be installed by running the following on the command line (aka in the terminal): 

```bash
pip install --pre seaborn
```

Although it will get installed when you install **seaborn**, we'll also be using the powerful imperative visualisation library that **seaborn** builds on, **matplotlib**.

You'll need to import the **seaborn** and **pandas** libraries into your session using:

In [None]:
import seaborn.objects as so
import pandas as pd

## Labels and Titles

The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. The examples in this chapter use the `mpg` dataset on fuel economy, so let's load it first:

In [None]:
# load the data
mpg = pd.read_csv(
    "https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/mpg.csv", index_col=0
)

Now let's make the plot with a title by passing the `title=` keyword argument to the `.label()` method.

In [None]:
(
    so.Plot(mpg, x="displ", y="hwy")
    .add(so.Dot())
    .label(title="Fuel efficiency generally decreases with engine size")
)

The purpose of a plot title is to summarise the main finding. Avoid titles that just describe what the plot is, e.g. "A scatterplot of engine displacement vs. fuel economy".

If you need to add more text, there are two other useful labels that you can use:

-   `subtitle` adds additional detail in a smaller font beneath the title.

-   `caption` adds text at the bottom right of the plot, often used to describe the source of the data.


You can use `.label` to replace the axis and legend titles. It's usually a good idea to replace short variable names with more detailed descriptions, and to include the units.

In [None]:
(
    so.Plot(mpg, x="displ", y="hwy")
    .add(so.Dot())
    .label(x="Engine displacement (L)", y="Highway fuel economy (mpg)")
)

It's possible to use mathematical equations and functions instead of text strings:

In [None]:
(so.Plot(mpg, x="displ", y="hwy").add(so.Dot()).label(y=str.capitalize, x=r"$x^{y-z}$"))

## Annotations

[TODO]

## Scales

The third way you can make your plot better for communication is to adjust the scales.
Scales control the mapping from data values to things that you can perceive.
Normally, **seaborn** automatically adds scales for you.
For example, when you type:

In [None]:
(so.Plot(mpg, x="displ", y="hwy", color="class").add(so.Dot()))

**seaborn** automatically adds default scales behind the scenes:

In [None]:
(
    so.Plot(mpg, x="displ", y="hwy", color="class")
    .add(so.Dot())
    .scale(
        x=so.Continuous(),
        y=so.Continuous(),
        color=so.Nominal(),
    )
)

Note the naming scheme for scales: `.scale` followed by the name of the dimension, then `=so.`, then the name of the scale.
The default scales are named according to the type of variable they align with: continuous, nominal, and so on.

The default scales have been carefully chosen to do a good job for a wide range of inputs.
Nevertheless, you might want to override the defaults for two reasons:

-   You might want to tweak some of the parameters of the default scale.
    This allows you to do things like change the breaks on the axes, or the key labels on the legend.

-   You might want to replace the scale altogether, and use a completely different algorithm.
    Often you can do better than the default because you know more about the data.

```{admonition} Exercise
Try a plot with a scale setting of `x="log"`.
```

### Axis Ticks

You can specify axis ticks directly by calling the `.tick()` method on a scale object:

In [None]:
(
    so.Plot(mpg, x="displ", y="hwy", color="class")
    .add(so.Dot())
    .scale(
        x=so.Continuous(),
        y=so.Continuous().tick(at=[0, 10, 20, 30, 40]),
        color=so.Nominal(),
    )
)

### Legend Keys

### Legend Layout

[TODO]

### Limits, aka 'zooming'

There are two ways to control the plot limits:

1.  Adjusting what data are plotted
2.  Setting the limits in each scale

Here is the same plot made using approach 2 (setting the limits) and then approach 1 (filtering the data).

In [None]:
(
    so.Plot(mpg, x="displ", y="hwy", color="class")
    .add(so.Dot())
    .limit(x=(5, 7), y=(10, 30))
)

In [None]:
(
    so.Plot(
        mpg.query("displ >= 5 & displ <= 7 & hwy >= 10 & hwy <= 30"),
        x="displ",
        y="hwy",
        color="class",
    ).add(so.Dot())
)

While they convey the same information, the former looks better.

## Themes

**seaborn** comes with several built-in themes that you can switch between by using `sns.set_theme`:

In [None]:
import seaborn as sns

sns.set_theme(style="darkgrid", palette="dark")

(so.Plot(mpg, x="displ", y="hwy", color="class").add(so.Dot()))

Note that you can also create your own themes using **matplotlib**, the library that sits under **seaborn** (this book uses a custom theme).


## Saving Plots

There are lots of output formats to choose from when saving your plot. Remember that, for graphics, *vector formats* are generally better than *raster formats*. In practice, this means saving plots in svg or pdf formats rather than jpg or png. The svg format works in a lot of contexts (including Microsoft Word) and is a good default. To choose between formats, just supply the file extension and the file type will change automatically, e.g. "chart.svg" for svg or "chart.png" for png (though note that raster formats often have extra options, like how many dots per inch to use).

In [None]:
(so.Plot(mpg, x="displ", y="hwy", color="class").add(so.Dot()).save("output_chart.svg"))

To double check this works, let's use the terminal. We'll try the command `ls`, which lists everything in the current directory, and `grep '\.svg$'` to pull out any file names that end in `.svg` from what is returned by `ls`. The two commands are strung together by a `|`, which passes the output of the first to the second, and the pattern is quoted so that the shell hands it to `grep` unchanged. (Note that the leading exclamation mark below just tells the software that builds this book to use the terminal.)

In [None]:
!ls | grep '\.svg$'

Great! It looks like our file saved successfully.

In [None]:
# remove-cell
import os

os.remove("output_chart.svg")