14.2. Labels, titles, and other contextual information  

http://aeturrell.github.io/python4DS/communicate-plots.html#labels-titles-and-other-contextual-information

In [1]:
import pandas as pd

from lets_plot import *

LetsPlot.setup_html()

# load the data
mpg = pd.read_csv(
    "https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/mpg.csv", index_col=0
)

In [2]:
(ggplot(mpg, aes(x="displ", y="hwy")) + geom_point())

In [3]:
# Next, we add contextual information that makes the chart easier to interpret. The purpose of a plot title is to summarize the main finding. Avoid titles that just describe what the plot is, e.g., “A scatterplot of engine displacement vs. fuel economy”.

# We’re going to:

# add a title that summarises the main finding you’d like the viewer to take away (rather than one that just describes the obvious)

# add a subtitle that explains what the y-axis shows, and make the x-axis label more understandable

# remove the y-axis label, which is rotated and awkward to read

# add a caption with the source of the data

# Putting all of this together, we get:


(
    ggplot(mpg, aes(x="displ", y="hwy"))
    + geom_point(aes(colour="class"))
    + geom_smooth(se=False, method="loess", size=1)
    + labs(
        title="Fuel efficiency generally decreases with engine size",
        subtitle="Highway fuel efficiency (miles per gallon)",
        caption="Source: fueleconomy.gov",
        y="",
        x="Engine displacement (litres)",
    )
)
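Because the title now asserts a finding, it is worth confirming that the data supports it. The sketch below is one way to do that, assuming a rank correlation is a fair summary of "generally decreases". The helper `rank_correlation` is hypothetical (not part of pandas or lets-plot). It ranks both columns, giving tied values their average rank, and then takes the Pearson correlation of the ranks, which is Spearman's rank correlation.

```python
import pandas as pd


def rank_correlation(df: pd.DataFrame, x: str, y: str) -> float:
    """Spearman's rank correlation between two columns (hypothetical helper).

    Series.rank() assigns tied values their average rank; the Pearson
    correlation of the two rank series is Spearman's rho.
    """
    return df[x].rank().corr(df[y].rank())


# Made-up toy data in which hwy falls steadily as displ rises.
toy = pd.DataFrame({"displ": [1.8, 2.0, 3.0, 4.0, 5.0], "hwy": [30, 29, 25, 20, 17]})
print(rank_correlation(toy, "displ", "hwy"))  # close to -1 for this toy data

# With the notebook's data frame loaded above, a clearly negative value
# would be consistent with the title's claim:
# rank_correlation(mpg, "displ", "hwy")
```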

In [4]:
(
    ggplot(mpg, aes(x="displ", y="hwy"))
    + geom_point(aes(colour="class"))
    + geom_smooth(se=False, method="loess", size=1)
    + labs(
        x="Engine displacement (L)",
        y="Highway fuel economy (mpg)",
        colour="Car type",
        title="Fuel efficiency generally decreases with engine size",
        subtitle="Two seaters (sports cars) are an exception because of their light weight",
        caption="Source: fueleconomy.gov",
    )
)
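The subtitle makes a narrower claim: two seaters get better highway mileage than their engine size alone would suggest. A rough check, assuming "large engine" means a displacement of at least 5 litres (an arbitrary cutoff chosen only for illustration), is to compare mean highway economy by vehicle class among large-engine cars. The helper name below is hypothetical. Note that `mpg` has no weight column, so this can show that two seaters stand out but cannot confirm that weight is the reason.

```python
import pandas as pd


def large_engine_hwy_by_class(df: pd.DataFrame, min_displ: float = 5.0) -> pd.Series:
    """Mean highway mpg per class among cars with displ >= min_displ (hypothetical helper).

    The 5.0 litre default is an illustrative assumption, not a standard cutoff.
    Results are sorted from most to least efficient.
    """
    big = df[df["displ"] >= min_displ]
    return big.groupby("class")["hwy"].mean().sort_values(ascending=False)


# Made-up toy rows; only the first four have displ >= 5.0.
toy = pd.DataFrame(
    {
        "displ": [5.7, 6.2, 5.3, 5.4, 2.0],
        "hwy": [26, 24, 17, 16, 29],
        "class": ["2seater", "2seater", "suv", "pickup", "compact"],
    }
)
print(large_engine_hwy_by_class(toy))

# On the real data: large_engine_hwy_by_class(mpg)
```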

In [7]:
# In addition to labelling major components of your plot, it’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is geom_text(). geom_text() is similar to geom_point(), but it has an additional aesthetic: label. This makes it possible to add textual labels to your plots.

# There are two possible sources of labels: ones that are part of the data, which we’ll add with geom_text(); and ones that we add directly and manually as annotations using geom_label().

# In the first case, you might have a data frame that contains labels. In the following code we summarise each drive type and save the result as a new data frame called label_info: for each value of “drv” it holds the mean of “hwy” and the mean of “displ”, which set where that group’s label is placed. We could use any other aggregation that places the labels well on the chart.


# Explanation of the code:

# This code creates a summary table showing the average highway MPG and engine displacement for each drive type, step by step:

# **1. Create a mapping dictionary:**
# ```python
# mapping = {
#     "4": "4-wheel drive",
#     "f": "front-wheel drive",
#     "r": "rear-wheel drive",
# }
# ```
# Maps abbreviations to full descriptions.

# **2. `mpg.groupby("drv")`**
# Groups the data by the `drv` column (drive type: 4, f, or r).

# **3. `.agg({"hwy": "mean", "displ": "mean"})`**
# Aggregates the grouped data by calculating:
# - Mean of `hwy` (highway MPG) for each drive type
# - Mean of `displ` (engine displacement) for each drive type

# **4. `.reset_index()`**
# Converts `drv` from an index back into a regular column, making the dataframe easier to work with.

# **5. `.assign(drive_type=lambda x: x["drv"].map(mapping))`**
# Creates a new column called `drive_type` by:
# - Taking each value in the `drv` column
# - Looking it up in the `mapping` dictionary
# - Replacing "4" → "4-wheel drive", "f" → "front-wheel drive", etc.

# **6. `.round(2)`**
# Rounds all numeric values to 2 decimal places.

# **Result:**
# A clean summary table with columns: `drv`, `hwy`, `displ`, and `drive_type`, showing average values for each drive type with readable labels.

mapping = {
    "4": "4-wheel drive",
    "f": "front-wheel drive",
    "r": "rear-wheel drive",
}
label_info = (
    mpg.groupby("drv")
    .agg({"hwy": "mean", "displ": "mean"})
    .reset_index()
    .assign(drive_type=lambda x: x["drv"].map(mapping))
    .round(2)
)
label_info

   drv    hwy  displ         drive_type
0    4  19.17   4.00      4-wheel drive
1    f  28.16   2.56  front-wheel drive
2    r  21.00   5.18   rear-wheel drive
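Any aggregation that places the labels sensibly will do. For example, instead of group means, one could label the car with the largest engine in each drive type. The sketch below uses `groupby(...).idxmax()` to find the row with the maximum `displ` in each group; `idxmax` returns the first occurrence of the maximum, so ties keep the earliest row. The function name `largest_engine_per_drive` is hypothetical. The result has the same `drv`, `displ`, `hwy` and `drive_type` columns as `label_info`, so it could be passed to `geom_text(data=...)` in its place.

```python
import pandas as pd

# Same abbreviation-to-name mapping as in the notebook cell above.
DRIVE_NAMES = {"4": "4-wheel drive", "f": "front-wheel drive", "r": "rear-wheel drive"}


def largest_engine_per_drive(df: pd.DataFrame) -> pd.DataFrame:
    """One row per drive type: the car with the largest engine displacement (hypothetical helper)."""
    # Index label of the first row with the maximum displ in each drv group.
    idx = df.groupby("drv")["displ"].idxmax()
    return (
        df.loc[idx.to_numpy(), ["drv", "displ", "hwy"]]
        .reset_index(drop=True)
        .assign(drive_type=lambda d: d["drv"].map(DRIVE_NAMES))
    )


# Made-up toy rows standing in for mpg.
toy = pd.DataFrame(
    {
        "drv": ["4", "4", "f", "f", "r"],
        "displ": [4.7, 6.5, 2.0, 3.5, 5.7],
        "hwy": [17, 12, 29, 26, 26],
    }
)
print(largest_engine_per_drive(toy))

# On the real data: largest_engine_per_drive(mpg)
```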


In [9]:
# Then we use this new data frame to label the three groups directly on the plot, replacing the legend. The fontface and size arguments customize the look of the text labels: they are bold and larger than the rest of the text on the plot. hjust="left" and vjust="bottom" anchor the bottom-left corner of each label at its (displ, hwy) point, so the text sits above and to the right of it. (theme(legend_position="none") turns all the legends off; we’ll talk about it more shortly.)

(
    ggplot(mpg, aes(x="displ", y="hwy", color="drv"))
    + geom_point(alpha=0.5)
    + geom_smooth(se=False, method="loess")
    + geom_text(
        aes(x="displ", y="hwy", label="drive_type"),
        data=label_info,
        fontface="bold",
        size=8,
        hjust="left",
        vjust="bottom",
    )
    + theme(legend_position="none")
)
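The labels pick up their colours because the `geom_text()` layer inherits `color="drv"` from the `ggplot()` call and `label_info` carries its own `drv` column. If the two tables ever disagree on the group codes (for example, after filtering one of them), a group could end up unlabelled, or a label could add a colour with no matching points. A small check along these lines can catch that before plotting; the helper name is hypothetical.

```python
import pandas as pd


def check_label_groups(data: pd.DataFrame, labels: pd.DataFrame, group: str = "drv") -> None:
    """Raise ValueError if plotted data and label table disagree on group codes (hypothetical helper)."""
    plotted = set(data[group].unique())
    labelled = set(labels[group].unique())
    missing = plotted - labelled  # groups drawn as points but with no text label
    extra = labelled - plotted    # labels whose group has no points
    if missing or extra:
        raise ValueError(
            f"unlabelled groups: {sorted(missing)}, labels without points: {sorted(extra)}"
        )


# On the notebook's data this would be: check_label_groups(mpg, label_info)
```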