```python
# Reference: https://jupyterbook.org/interactive/hiding.html
# Use {hide, remove}-{input, output, cell} tags to hide content

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display
# IPython.display.set_matplotlib_formats is deprecated; use matplotlib-inline's version
from matplotlib_inline.backend_inline import set_matplotlib_formats
import myst_nb

import plotly
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'plotly_mimetype+svg'
pio.templates['book'] = go.layout.Template(
    layout=dict(
        margin=dict(l=10, r=10, t=10, b=10),
        autosize=True,
        width=350, height=250,
    )
)
pio.templates.default = 'seaborn+book'

# Render inline Matplotlib figures as SVG
set_matplotlib_formats('svg')
sns.set()
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option('display.max_rows', 7)
pd.set_option('display.max_columns', 8)
# Use the full option name: in newer pandas the short pattern 'precision'
# also matches 'styler.format.precision' and raises an OptionError
pd.set_option('display.precision', 2)
# This option stops scientific notation for pandas
# pd.set_option('display.float_format', '{:.2f}'.format)

def display_df(df, rows=pd.options.display.max_rows,
               cols=pd.options.display.max_columns):
    """Display df, temporarily overriding the row and column display limits."""
    with pd.option_context('display.max_rows', rows,
                           'display.max_columns', cols):
        display(df)
```

(ch:lifecycle)=
# The Data Science Lifecycle

Data science is a rapidly evolving field.
At the time of this writing, people are still trying to pin down exactly
what data science is, what data scientists do, and what skills data
scientists should have.
What we do know, though, is that data science uses a combination of 
methods and principles to draw insights from data.
We use these insights to make all sorts of important decisions. 
Data science lets scientists see whether a vaccine works,
helps computer programs filter out spam from our email inboxes,
and tells urban planners where to build new roads.


This book covers fundamental principles and skills
that data scientists use to perform analyses.
To help you remember the bigger picture, we've organized these topics
around a workflow for analysis that we call the *data science lifecycle*.
This chapter introduces the data science lifecycle.
It also provides a map for the rest of the book by showing you where 
each chapter fits into the lifecycle.
Unlike many books that focus on a single part of the lifecycle, this book
covers the entire lifecycle from start to finish.
We'll teach you how to perform your own data analyses and draw sound
conclusions. We'll explain theoretical principles and show how they work in
practical case studies.
Throughout the book, we'll rely on real data from analyses by other data
scientists, not made-up data.

```{figure} figures/ds-lifecycle.svg
---
name: ds-lifecycle
---

This diagram of the data science lifecycle shows its four high-level stages.
The arrows show how the stages lead into one another.
```

{numref}`Figure %s <ds-lifecycle>` shows the data science lifecycle.
It's split into four stages: asking a question, obtaining data, 
understanding the data, and understanding the world.
We've made the stages very broad on purpose.
In our experience, the mechanics of a data analysis change all the time.
Programmers continue to build new software packages and programming languages
for analysis.
Statisticians develop new techniques that improve on older ones.
Despite these changes, we've found that almost every data analysis follows
the four stages in our lifecycle.
In the rest of this chapter, we discuss each stage of the lifecycle in turn
and point out which chapters of the book fall into that stage.