# Reference: https://jupyterbook.org/interactive/hiding.html
# Use {hide, remove}-{input, output, cell} tags to hide content

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display

sns.set()
sns.set_context('talk')
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option('display.max_rows', 7)
pd.set_option('display.max_columns', 8)
pd.set_option('display.precision', 2)
# This option stops scientific notation for pandas
# pd.set_option('display.float_format', '{:.2f}'.format)

def display_df(df, rows=pd.options.display.max_rows,
               cols=pd.options.display.max_columns):
    with pd.option_context('display.max_rows', rows,
                           'display.max_columns', cols):
        display(df)

(sec:scope_naturalphenomenon)=
# Measuring Natural Phenomena

The Venn diagram introduced for observing a target population can be extended to the situation where we want to measure a quantity, such as the count of particles in the air, the age of a fossil, or the speed of light. In these cases, we consider the quantity we want to measure as an unknown value. (This unknown value is often referred to as a *parameter*.) In our diagram, we shrink the target to a point that represents this unknown. The instrument’s accuracy acts as the frame, and the sample consists of the measurements taken by the instrument within the frame. You might think of the frame as a dart board, where the instrument is the person throwing the darts. If they are reasonably good, the darts land within the circle, scattered around the bullseye. The scatter of darts corresponds to the measurements taken by the instrument. The target point is not seen by the dart thrower, but ideally it coincides with the bullseye.
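The dart board picture can be sketched with a small simulation. This is only an illustration under stated assumptions: the target location, the spread of the throws, and the number of darts are all hypothetical values chosen for the example, and the throws are assumed to scatter normally around the bullseye.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical unknown parameter (the target point).
# In a real measurement problem we never get to see this value.
target = np.array([0.0, 0.0])

# Assumed instrument accuracy: how widely the throws scatter.
spread = 1.0

# Each row is one dart, that is, one measurement, as an (x, y) position.
darts = target + rng.normal(loc=0.0, scale=spread, size=(100, 2))

# The center of the scatter serves as our estimate of the unseen target.
center = darts.mean(axis=0)
distance_from_target = np.linalg.norm(center - target)
```

Under these assumptions, each dart can land far from the bullseye, yet the center of the scatter sits close to the target, so `distance_from_target` is small relative to `spread`.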

To illustrate the concepts of measurement error and the connection to sampling error, we examine the problem of calibrating air quality sensors.  

__EXAMPLE: Purple Air.__ 
Across the US, sensors to measure air pollution are widely used by individuals, community groups, and state and local air monitoring agencies {cite}`hug2020,owyang2020`. For example, on two days in September, 2020, approximately 600,000 Californians and 500,000 Oregonians viewed PurpleAir’s map as fire spread through their states and evacuations were planned. PurpleAir creates air quality maps from crowdsourced data that streams in from their sensors. See the map of monitor readings in Berkeley on Aug 21, 2020 (screenshot taken by Josh Hug).  

```{figure} PurpleAirConstruct.png
---
name: fig:PurpleAirConstruct
---
This representation is typical of many measurement processes. The access frame represents the measurement process, which reflects the accuracy of the instrument.
```

We can think of the data scope as follows: at any location and point in time, there is a true particle composition in the air surrounding the sensor; this is our target. Our instrument, the sensor, takes many measurements, in some cases a reading every second. These form a sample contained in the access frame, the dart board. If the instrument is working properly, the measurements are centered around the bullseye, and the target coincides with the bullseye. Researchers have found that low humidity can distort the readings so that they are too high {cite}`hug2020`. In {numref}`Chapter %s <ch:pa>`, we address how to use data science to calibrate these instruments to improve their accuracy. $\blacksquare$
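To see why a distorted instrument needs calibration rather than simply more readings, here is a minimal sketch with made-up numbers. The true concentration, the noise level, and the size of the distortion are all assumptions for illustration; they are not PurpleAir specifications or values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

true_pm = 12.0   # hypothetical true particle concentration (the target)
noise_sd = 2.0   # assumed second-to-second scatter of the sensor
offset = 3.0     # assumed systematic distortion that inflates readings

n = 3600  # one reading per second for an hour

# A properly working sensor: readings are centered on the target.
working = true_pm + rng.normal(0, noise_sd, n)

# A distorted sensor: readings are centered on target + offset.
distorted = true_pm + offset + rng.normal(0, noise_sd, n)

# Averaging shrinks the scatter but leaves the systematic offset in place.
working_error = working.mean() - true_pm
distorted_error = distorted.mean() - true_pm
```

In this sketch, `working_error` is close to zero, while `distorted_error` stays close to `offset` no matter how many readings are averaged.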

We continue the dart board analogy in the next section to introduce the concepts of bias and variation, describe common ways in which a sample might not be representative of the population, and draw connections between accuracy and the protocol. 