# Reference: https://jupyterbook.org/interactive/hiding.html
# Use {hide, remove}-{input, output, cell} tags to hide content

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display
import myst_nb

sns.set()
sns.set_context('talk')
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option('display.max_rows', 7)
pd.set_option('display.max_columns', 8)
# Use the full option name; the bare 'precision' pattern is ambiguous in newer pandas
pd.set_option('display.precision', 2)
# This option stops scientific notation for pandas
# pd.set_option('display.float_format', '{:.2f}'.format)

def display_df(df, rows=pd.options.display.max_rows,
               cols=pd.options.display.max_columns):
    with pd.option_context('display.max_rows', rows,
                           'display.max_columns', cols):
        display(df)

(ch:wrangling_summary)=
# Summary

Data wrangling is an essential part of data analysis. Without it, we risk
overlooking problems in data that can have major consequences for future
analysis. This chapter covered several important data wrangling steps that we
use in nearly every analysis.

First, we covered the initial steps of reading data into Python. We
introduced different file formats and wrote code that reads data from each of
them. The shell and command-line tools let us check whether our dataset is
small enough to read directly into `pandas` and whether its encoding might
cause problems. Understanding granularity gave us insight into what kinds of
joins we want to perform and whether we should aggregate our data.
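The same pre-loading checks can also be expressed in Python. The following is a minimal sketch, not the chapter's code: it uses `os.path.getsize` and a trial decode in place of command-line tools, then tests granularity by asking whether a set of columns uniquely identifies each row. The file name `inspections.csv`, the columns `bid`, `date`, and `score`, and the helper functions are all hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical tiny dataset, written to a temp file so the sketch is self-contained.
csv_text = "bid,date,score\n19,2016-10-06,94\n19,2017-05-12,90\n24,2016-11-03,98\n"


def file_size_mb(path):
    """Return the file size in megabytes, a quick check before loading into pandas."""
    return os.path.getsize(path) / 1_000_000


def decodes_as(path, encoding="utf-8"):
    """Return True if the whole file decodes cleanly with the given encoding."""
    try:
        with open(path, encoding=encoding) as f:
            f.read()
        return True
    except UnicodeDecodeError:
        return False


def is_unique_key(df, cols):
    """Return True if the columns uniquely identify each row (the table's granularity)."""
    return not df.duplicated(subset=cols).any()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inspections.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_text)
    size = file_size_mb(path)   # tiny here; large files may need chunked reads
    ok_utf8 = decodes_as(path)  # False would signal an encoding problem
    df = pd.read_csv(path)

# Granularity: one row per (bid, date) inspection, not one row per business.
per_inspection = is_unique_key(df, ["bid", "date"])
per_business = is_unique_key(df, ["bid"])

# Aggregating coarsens the granularity to one row per business.
by_business = df.groupby("bid", as_index=False)["score"].mean()
```

Knowing that the table is at the inspection level tells us that joining it to a one-row-per-business table would repeat business attributes for each inspection, unless we aggregate first as in `by_business`.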

Next, we discussed what to look for in a dataset after reading it into a
dataframe. Quality checks help us spot problems in the data. Missing values are
an especially important and common issue, and we provided guidelines for
imputing them. We transform data to make them easier to analyze, and we
discussed transformations that modify the structure of a dataframe.
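These steps often run in sequence: a quality check flags implausible values, those values become missing, missing values are imputed, and the table is reshaped. The sketch below is one illustration under stated assumptions, not the chapter's method. The sentinel `-99.99`, the plausible range of 0 to 1000, and the column names are hypothetical, and linear interpolation is just one imputation choice that suits regularly spaced time series.

```python
import pandas as pd

# Hypothetical monthly readings; -99.99 is an assumed sentinel meaning "missing".
raw = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "month": [1, 2, 3, 1, 2, 3],
    "ppm": [410.8, -99.99, 411.8, 413.4, -99.99, 414.6],
})

# Quality check: values outside an assumed plausible range are not trusted.
bad = ~raw["ppm"].between(0, 1000)

# Replace flagged values with NaN so pandas treats them as missing.
clean = raw.assign(ppm=raw["ppm"].mask(bad))
n_missing = int(clean["ppm"].isna().sum())

# Impute by linear interpolation between neighboring months (rows are in time order).
clean["ppm_imputed"] = clean["ppm"].interpolate()

# Structural transformation: pivot from long (one row per month) to wide (one row per year).
wide = clean.pivot(index="year", columns="month", values="ppm_imputed")
```

Keeping both `ppm` and `ppm_imputed` preserves a record of which values were observed and which were filled in, which matters if a later analysis should treat imputed values differently.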

We illustrated these techniques through two sections with more detailed
examples of data wrangling, one on the CO2 data and one on the restaurant
safety data. Together, the data wrangling techniques in this chapter prepare
the data for exploratory data analysis, the topic of the next chapter.