# Reference: https://jupyterbook.org/interactive/hiding.html
# Use {hide, remove}-{input, output, cell} tags to hide content

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display
import myst_nb

sns.set()
sns.set_context('talk')
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option('display.max_rows', 7)
pd.set_option('display.max_columns', 8)
# Use the full option name; the bare 'precision' key is ambiguous in newer pandas
pd.set_option('display.precision', 2)
# This option stops scientific notation for pandas
# pd.set_option('display.float_format', '{:.2f}'.format)

def display_df(df, rows=pd.options.display.max_rows,
               cols=pd.options.display.max_columns):
    with pd.option_context('display.max_rows', rows,
                           'display.max_columns', cols):
        display(df)

(ch:viz)=
# Data Visualization

A well-known scientist once said:

> There is a magic in graphs. The profile of a curve reveals in a flash a whole
> situation — the life history of an epidemic, a panic, or an era of
> prosperity. The curve informs the mind, awakens the imagination, convinces.
> {cite}`brintonGraphic1939`.

As data scientists, we create data visualizations to understand our data and
to explain our analyses to other people. Every plot has a message, and it's
our job to communicate that message as clearly as possible.

In the first half of this chapter, we dive into creating plots using
`seaborn` and `matplotlib`, two popular Python plotting packages. While these
packages make it easy to create plots, we'll see that the default plots often
need adjustments to be useful.

In the second half of this chapter, we'll discuss principles of effective data
visualizations. We specifically talk about how to: choose scales for axes,
handle large amounts of data with smoothing and aggregation, facilitate
meaningful comparisons, incorporate the study design, and add contextual
information. Each section also contains code that implements
these principles using Python.

The sequence of topics in this chapter is designed to mimic real-world
practice. During an analysis, we often want to create basic plots as quickly as
possible while exploring our data. Here, we can use the simple default plots
that `seaborn` and `matplotlib` implement. After we decide on what analysis
to do, we can then fine-tune the plots. In this step, we'll apply principles of
visualization to make our plots useful for a broad audience.
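
As a rough preview of this two-step workflow, the sketch below draws the same scatter plot twice: once with `seaborn`'s defaults, and once after a few adjustments. The data are synthetic and the column names `income` and `spending` are made up for illustration, so treat this as one possible sketch rather than a recipe. Later sections explain when adjustments like these are appropriate.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic, right-skewed data invented purely for illustration
rng = np.random.default_rng(42)
df = pd.DataFrame({'income': rng.lognormal(mean=10, sigma=1, size=500)})
df['spending'] = df['income'] * rng.uniform(0.2, 0.6, size=500)

fig, (ax_quick, ax_tuned) = plt.subplots(1, 2, figsize=(12, 4))

# Step 1: a quick default plot, good enough for exploration
sns.scatterplot(data=df, x='income', y='spending', ax=ax_quick)
ax_quick.set_title('Default plot')

# Step 2: the same plot, fine-tuned for an audience
sns.scatterplot(data=df, x='income', y='spending', alpha=0.4, ax=ax_tuned)
ax_tuned.set_xscale('log')   # spread out the skewed values
ax_tuned.set_yscale('log')
ax_tuned.set_xlabel('Income (log scale)')
ax_tuned.set_ylabel('Spending (log scale)')
ax_tuned.set_title('Fine-tuned plot')

fig.tight_layout()
```

In a notebook with `%matplotlib inline`, the figure appears after the cell runs; in a plain script, call `plt.show()` at the end.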