# Synopsis

In this session, we will learn about measurement.

We will discuss types of data, measurement instruments and their limitations.

# Read libraries

In [None]:
# Notebook display helpers; display() is imported explicitly because it is called further down
from IPython.display import HTML, YouTubeVideo, display

# Statistics

**Statistics is the science concerned with the study of the collection, analysis, interpretation, presentation, and organization of data.**

For concreteness let us consider two examples that hopefully you can relate to.

**The US Census Bureau collects data on people residing in the US**. The data includes measures such as the number of household occupants, their gender, their age, their familial relationships, their incomes, their education levels, and so on. The process for the data collection is called a **census** because the process **aims to measure all individuals** in the population.

**Colleges collect applications for their programs**. The data collected includes standardized test scores, GPAs, essays, recommendation letters, and so on. No college is able to collect a census, so each must make decisions based on their **sample** of applicants. Each college is able to analyze a sample comprising a very small fraction of all students applying to college each year.  


## Data

So what is data? Glad you asked!

Let's look at the census. A census survey asks people lots of questions:

* how many people live in the household
* which census block is the household in
* what are their names
* what are their genders
* what are their ethnic backgrounds
* what are their races
* what are their ages
* what are their marital statuses
* what are their education levels
* what are their income levels
* whether they own or rent their home
* what are their employment statuses
* whether they receive Social Security benefits
* etc.

All of these types of information are data that can be $-$ and are $-$ analyzed.  While the census surveys do not ask us this, many other entities do ask us about what products we consume, what shows we watch, what sports teams we root for, what music we listen to, and so on.  All of these types of information are also data.

As such, **all of them are the purview of statistics**.

# Measurement

**In order to obtain data we need to take measurements**.

If you are paying attention, you will ask **measurement with what**?

> In order to make a measurement, we need a measuring **instrument**.

This may appear to be very straightforward in physical contexts. To measure a mass, you need a scale; to measure a distance, you need a ruler; to measure a temperature, you need a thermometer.

Imagine now that I ask you what your height is. What is my measuring instrument?

<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


**If you did not answer 'I am', you were wrong.**  Yes, you presumably learned your height when someone else used a ruler to measure it. However, the value you report depends not only on that measurement but also on your memory of it, on any rounding you may have applied in the meantime, and on your desire at this moment to be taller or shorter than that value.

That means that whereas a height measured with a ruler may have an uncertainty of a fraction of an inch, the same height reported by an individual may have an uncertainty of a few inches.

Measuring instruments are much more complex than we typically consider. In fact, as described above, they may contain human cognition within the process.

**Getting back to our initial example of the census, a survey is an instrument!**

Note that this is not necessarily a bad thing. In fact, as philosophers have argued, it all starts with our senses and cognition.


# Categorical data


Thinking about how we measure temperature is, I believe, a very helpful way to think through the different types of data we may encounter.

Imagine a time before we had thermometers.  That should not be hard: we have had thermometers for fewer than 500 years.  Can you imagine cave people waking up in the morning and complaining about how cold it was?  How did they do it?

According to the NIH page [Sensing temperature](https://www.nih.gov/news-events/nih-research-matters/sensing-temperature): 

> Researchers discovered that distinct sets of neurons respond to heat and cold. The findings provide an elegant explanation for how mammals sense temperature.
>
> Thermosensation $-$ the ability to detect temperature $-$ triggers our reflex to withdraw from painful heat or cold. But mammals are also able to detect more pleasant cool and warm temperatures. We sense temperature in our environment through specialized nerve cells that project into the outer layers of the skin.

So it would appear that the instrument here is the *specialized nerve cells that project into the outer layers of the skin*.  This instrument is far from perfect.  You and I will likely disagree about what is a nice outdoor (or indoor) temperature. Moreover, as the video below shows, our temperature sensing system is susceptible to recalibration.

In [None]:
vid = YouTubeVideo('gsgiYBGiYpU', width = 600)
display(vid)

Our sensory system is able to give us a mostly **categorical** set of temperature measurements:

* freezing
* cold
* warm
* hot
* very hot

These are (mostly) non-quantitative data. In reality, under many conditions our senses are able to yield **ordinal** measurements.

Other examples of descriptive (categorical) data include:

* favorite color
* skin color
* hair color
* favorite fruit
* favorite drink
* sense of humor
* etc.

Descriptive data cannot, in principle, be ordered. We may order fruits by their name in some language, by their typical size, by their density, or by their acidity, but none of those provides an objectively unassailable ordering.
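
To make this concrete, here is a minimal sketch of the kind of summary that is safe for categorical data: counting categories and reporting the most frequent one. The list of answers is made up for illustration.

```python
from collections import Counter

# Hypothetical answers to "what is your favorite fruit?"
favorite_fruit = ["apple", "mango", "apple", "banana", "mango", "apple"]

# Counting how often each category occurs is always permissible.
counts = Counter(favorite_fruit)

# The mode (most frequent category) is a meaningful summary.
mode, mode_count = counts.most_common(1)[0]

print(counts)
print(f"mode: {mode} ({mode_count} of {len(favorite_fruit)} answers)")
```

Notice what the sketch does not do: it never sorts the fruits or averages them, because no ordering or arithmetic is defined for these labels.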


# Ordinal data

In spite of what the hands-in-water experiment may suggest, there is actually a large degree of objectivity and order to our sensing of temperature.

If we were asked what we feel the temperature to be, we would find this list

<br>

<center>
<table>
    <tr><td> freezing </td><td> cold </td><td> warm </td><td> hot </td><td> very hot </td></tr>
</table>
</center>
     
as much more sensible than this one:

<center>
<table>
    <tr><td> cold </td><td> freezing </td><td> hot </td><td> very hot </td><td> warm </td></tr>
</table>
</center>

This becomes even clearer if we assign a numerical code to each value.

<table>
    <tr><td> freezing </td><td> -2 </td></tr>
    <tr><td> cold </td><td> -1 </td></tr>
    <tr><td> warm </td><td> 0 </td></tr>
    <tr><td> hot </td><td> 1 </td></tr>
    <tr><td> very hot </td><td> 2 </td></tr>
</table>

This coding seems much more sensible than this one:

<table>
    <tr><td> cold </td><td> -2 </td></tr>
    <tr><td> freezing </td><td> -1 </td></tr>
    <tr><td> hot </td><td> 0 </td></tr>
    <tr><td> very hot </td><td> 1 </td></tr>
    <tr><td> warm </td><td> 2 </td></tr>
</table>

Even though we might disagree about where the boundary between *warm* and *hot* lies, it still makes sense to assign a larger numerical code to *hot*.

To put it in other words, **ordinal data can be ranked**.
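
As a small sketch of what "can be ranked" buys us, the snippet below sorts a handful of reported sensations and picks their median. The reports are hypothetical; the order of the levels is the one from the list above.

```python
# Ordered categories, from coldest to hottest.
LEVELS = ["freezing", "cold", "warm", "hot", "very hot"]
RANK = {label: position for position, label in enumerate(LEVELS)}

# Hypothetical sensations reported on five different days.
reports = ["hot", "cold", "very hot", "warm", "cold"]

# Sorting by rank is meaningful for ordinal data ...
ranked = sorted(reports, key=RANK.get)

# ... and so is the median, which depends only on the ordering.
median_report = ranked[len(ranked) // 2]

print(ranked)
print("median:", median_report)
```

The numbers in `RANK` are used only to decide which label comes first; nothing in the sketch adds or subtracts them.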

In the context of measuring temperatures, we built **thermoscopes** before we developed thermometers.

<table>
    <tr>
        <td> <img src = 'Images/thermoscope_galileo.png' width = 300> </td>
        <td> <img src = 'Images/thermoscope.png' width = 390> </td>
    </tr>
</table>

The image on the left reproduces the thermoscope developed by *Galileo Galilei*. An increase in temperature of the wine in the bottle leads to an expansion of the fluid and the rising of the column of wine in the pipe.  There is no scale, so we cannot assign a precise value to a temperature, but we can deduce that higher temperatures will show up as higher columns. We can even compare two temperatures if we measure the heights of the column.  Because wine expands significantly, this thermoscope could probably distinguish temperatures ranging from about the freezing of water to nearly its boiling.

The image on the right is a modern interpretation of the thermoscope.  Balls with liquids of different densities and different volume fractions float in a column of water. The density of the water varies with the temperature, leading some balls to fall to the bottom (if too dense) or rise to the top (if not dense enough).  The color of the floating balls is an indicator of the temperature. The coldest temperature detectable is when even the blue ball is floating. The warmest is when even the red ball has sunk to the bottom. From low to high, temperatures go as:


<center>
<table>
    <tr><td> blue </td><td> yellow </td><td> cyan </td><td> black </td><td> red </td></tr>
</table>
</center>


While ordinal data can be ranked, **differences do not have an objective meaning**.  The difference between the red ball sinking versus the black ball sinking cannot be assigned a clear value and it will change for two different thermoscopes. Indeed, in another thermoscope the order of the colors could be different.

Notice that even if we assign numerical values to those temperatures:

<table>
    <tr><td> blue </td><td> 0 </td></tr>
    <tr><td> yellow </td><td> 1 </td></tr>
    <tr><td> cyan </td><td> 2 </td></tr>
    <tr><td> black </td><td> 3 </td></tr>
    <tr><td> red </td><td> 4 </td></tr>
</table>

**we cannot state that the difference in temperature between red and black is equal to the difference in temperature between black and cyan**.  We also cannot state that the difference in temperature between red and yellow is three times the difference in temperature between black and cyan.
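
One way to convince yourself of this is to try a second numerical coding that preserves the same ranking. In the sketch below, `coding_b` is an arbitrary choice of mine; any increasing sequence of numbers would be just as legitimate.

```python
COLORS = ["blue", "yellow", "cyan", "black", "red"]

# Two hypothetical numerical codings that both preserve the order.
coding_a = dict(zip(COLORS, [0, 1, 2, 3, 4]))
coding_b = dict(zip(COLORS, [0, 1, 5, 6, 20]))


def same_order(coding_1, coding_2):
    """Return True if both codings rank the colors identically."""
    return sorted(COLORS, key=coding_1.get) == sorted(COLORS, key=coding_2.get)


def difference(coding, hotter, colder):
    """Difference between the codes assigned to two colors."""
    return coding[hotter] - coding[colder]


print("same ranking:", same_order(coding_a, coding_b))
for name, coding in [("A", coding_a), ("B", coding_b)]:
    print(name,
          "red - black =", difference(coding, "red", "black"),
          "| black - cyan =", difference(coding, "black", "cyan"))
```

Both codings agree on the ranking, yet they disagree on whether the two differences are equal. The differences are a property of the coding, not of the temperatures.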


# Interval data

Ordinal data cannot be used for comparison across samples.  If I take temperature measurements with Galileo's thermoscope inside my home and you take temperature measurements with the other thermoscope inside your home, how would we be able to compare the temperatures inside our respective homes? 

We wouldn't! For that we would need **scales with meaningful differences**.  Types of data with meaningful differences are called **interval** data because the intervals between different values have an objective meaning.  Consider the two temperature scales most widely used today: **Celsius** and **Fahrenheit**.

<img src = 'Images/thermometer.png' width = 120>

Two thermometers using the same scale will yield identical measurements (within the uncertainty and operating ranges of the instruments). Moreover, two thermometers using **different scales** will yield equivalent measurements: converting one reading to the other scale recovers the other reading (again within the uncertainty and operating ranges of the instruments).

<br>
<center> $ F = \frac{9}{5} C + 32$ </center>
<br>

A difference of 5$^o$C corresponds to a difference of 9$^o$F. 
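
A quick check of that statement, as a minimal sketch: the function below implements the conversion formula above, and the starting temperatures are arbitrary.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature using F = 9/5 C + 32."""
    return 9 / 5 * celsius + 32


# A 5-degree Celsius step, starting from several different temperatures.
steps_f = {}
for start_c in (-40.0, 0.0, 20.0, 95.0):
    step_f = celsius_to_fahrenheit(start_c + 5.0) - celsius_to_fahrenheit(start_c)
    steps_f[start_c] = step_f
    print(f"from {start_c:6.1f} C: a 5 C step is a {step_f:.1f} F step")
```

The additive constant 32 cancels in every difference, which is why the step is the same size wherever we start.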

Constructing an interval measurement scale is far from trivial.  **Getting these two scales to work took over a hundred years**.  The reason is that you need two things in order to build an interval measurement scale:

> two fixed points for the scale
>
> something that responds **linearly** to changes in the quantity being measured
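
Here is a minimal sketch of how those two ingredients combine into a scale. The raw readings (column heights in millimeters) are hypothetical, and I assign the values 0 and 100 to the two fixed points, as on the Celsius scale.

```python
def calibrate(reading_low, reading_high, value_low=0.0, value_high=100.0):
    """Build a reading -> scale-value function from two fixed points.

    Assumes the instrument responds linearly between (and beyond) them.
    """
    slope = (value_high - value_low) / (reading_high - reading_low)

    def to_scale(reading):
        return value_low + slope * (reading - reading_low)

    return to_scale


# Hypothetical column heights (mm) observed at the two fixed points.
to_degrees = calibrate(reading_low=12.0, reading_high=212.0)

print(to_degrees(12.0))   # lower fixed point
print(to_degrees(212.0))  # upper fixed point
print(to_degrees(112.0))  # halfway up the column
```

Everything between the fixed points is filled in by the straight line, so the scale is only as good as the linearity assumption.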



## The fixed points

The fixed points are critical in order to be able to compare measurements made with different instruments and/or scales.  Nowadays, it is accepted that the freezing and boiling temperatures of pure water at 1 atmosphere of pressure provide the fixed points for the thermometers most of us interact with.

This was not always evident.  Fahrenheit used for one of the fixed points the temperature of a mixture of ice and salt that he kept in the basement of his home, and for the other the temperature of a healthy cow. Clearly this does not provide a replicable set of fixed points.  We could not have everyone visiting Fahrenheit's basement and cow in order to set the scale of their thermometers.

Moreover, even though using the freezing and boiling temperatures of pure water at 1 atmosphere of pressure sounds simple enough, it was far from easy in practice.  Just to give an idea of the struggle, consider the determination of the boiling temperature of pure water.  We all now know that the boiling temperature of water depends on altitude above sea level. High in the Colorado mountains, water boils at a lower temperature than in the flats of Illinois.  Makers of thermometers quickly learned how to correct for that.

A much bigger problem is: **what does it mean for water to be boiling?**

[This interesting blog](https://www.goldenmoontea.com/blogs/tea/106687623-the-5-different-stages-of-boiling-water-and-how-the-chinese-use-them-for-tea) discusses how to tell whether water is boiling adequately for a certain type of tea.


<table>
    <tr>
        <td> <img src = 'Images/boiling1.png' width = 250> </td>
        <td> <img src = 'Images/boiling2.png' width = 390> </td>
    </tr>
</table>

Which of those would you define as boiling for the purpose of setting a fixed point for a thermometer? And where do you place the thermometer? in the water?... in the vapor?...

And how do you boil water? in a glass flask?... a metal pot?... a ceramic pot?...

And how pure should the water be? should it be distilled water?... should dissolved air be removed?...

It turns out that all of those things matter.  Water with too much air dissolved will boil at lower temperatures. Water that is too pure, boiling in a strictly clean glass flask, will boil at much higher temperatures. The water vapor's temperature will rise above the temperature of the boiling water.

To answer all of these questions, the Royal Society convened a committee of eminent scientists led by Cavendish and asked it to recommend a protocol to be followed.


## Linear response

Another critical aspect of a good measuring instrument is that its output changes linearly with the quantity being measured.  Let's get back to our example of a thermometer.  In the first thermometers, the material whose response to changes in temperature was being used was typically a liquid mixture that included some alcohol.  For a constant mass of material, the volume is inversely proportional to the density of the material:

<br>
<center>
$ V(T) \propto \frac{1}{\rho(T)}$.
</center>
<br>

And the assumption of linear response yields, for the height $h$ of a liquid column with cross-sectional area $A$,
<br>
<center>
$ h(T) = \frac{V(T)}{A} = \alpha_1 T + \alpha_0$.
</center>
<br>    
The two plots below show the density responses of pure ethanol and pure mercury to changes in temperature.

<table>
    <tr>
        <td> <img src = "Images/ethanol_rho_vs_T.png" width = 400> </td>
        <td> <img src = "Images/mercury_rho_vs_T.png" width = 400> </td>
    </tr>
</table>

It is visually apparent that already for a temperature of 200$^o$F, ethanol's response is no longer linear. In contrast, mercury's response remains linear until about 500$^o$F.  That is why 17$^{th}$ and 18$^{th}$ century scientists quickly realized that good thermometers would have to be made using mercury.
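
To see why non-linearity matters, consider the following sketch. The response function and its coefficients are entirely hypothetical (they are not fitted to ethanol or mercury); the point is only to show what happens when a scale is marked at two fixed points and divided evenly in between.

```python
def column_height(true_temp, curvature):
    """Hypothetical column height (mm) of a thermometric fluid.

    curvature = 0 gives a perfectly linear response.
    """
    return 10.0 + 2.0 * true_temp + curvature * true_temp ** 2


def indicated_temperature(true_temp, curvature):
    """Reading on a scale marked at 0 and 100 degrees and divided evenly,
    that is, a scale built on the assumption of linear response."""
    h_0 = column_height(0.0, curvature)
    h_100 = column_height(100.0, curvature)
    h = column_height(true_temp, curvature)
    return 100.0 * (h - h_0) / (h_100 - h_0)


for curvature in (0.0, 0.002, 0.01):
    reading = indicated_temperature(50.0, curvature)
    print(f"curvature {curvature}: true 50.0 reads as {reading:.2f}")
```

Both thermometers agree perfectly at the fixed points, because that is how the scale was set. The disagreement shows up in between, and it grows with the curvature of the response.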

However, even mercury's response does not remain linear for all temperatures. One of the first great achievements of Thermodynamics was the ideal gas law. In its simplest form, it specifies that for an ideal gas at constant pressure the volume is strictly proportional to the absolute temperature:

<br>
<center>
$ V \propto T$.
</center>



# Ratio data

Interval data are great because differences between measurements have an objective value and can be easily translated across unit systems.  They suffer, however, from an important weakness: measurement ratios have no meaning. Consider again the two most popular temperature scales: Celsius and Fahrenheit.

40$^o$C equals 104$^o$F.  

20$^o$C equals 68$^o$F.

In Celsius, 40$^o$ is twice 20$^o$. However, 104$^o$F is not twice 68$^o$F.  **Where is this difference coming from?**

<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


<br><br><br><br>


Yes, it comes from the constant in the conversion equation between Celsius and Fahrenheit. The fact that we need to include an additive constant in the equation means that the zero point of each scale was chosen arbitrarily, and that **neither of them is a true zero**.

For contrast, consider another measure with a true zero: money. If I have 2 dollars and you have 1 dollar, then I have double the amount of money you do, regardless of whether the amount is expressed in US dollars, EU euros, UK pounds, or something else.

Similarly, if I have two pounds of apples and you have only one pound of apples, I have twice as much as you regardless of the unit system.
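
The contrast can be checked in a few lines. This is only a sketch: it compares a conversion with an additive constant (Celsius to Fahrenheit) against a purely multiplicative one (pounds to kilograms).

```python
KG_PER_POUND = 0.45359237  # exact, by definition of the pound


def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature using F = 9/5 C + 32."""
    return 9 / 5 * celsius + 32


# Temperature: the conversion has an additive constant, so ratios change.
ratio_celsius = 40.0 / 20.0
ratio_fahrenheit = celsius_to_fahrenheit(40.0) / celsius_to_fahrenheit(20.0)

# Mass: the conversion is a pure multiplication, so ratios survive.
ratio_pounds = 2.0 / 1.0
ratio_kilograms = (2.0 * KG_PER_POUND) / (1.0 * KG_PER_POUND)

print(f"temperature ratio: {ratio_celsius:.3f} (C) vs {ratio_fahrenheit:.3f} (F)")
print(f"mass ratio:        {ratio_pounds:.3f} (lb) vs {ratio_kilograms:.3f} (kg)")
```

A multiplicative factor cancels in a ratio; an additive offset does not.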

So, what about temperature? Is there no way to create a ratio measure?

That is what the **Kelvin** scale provides.  Zero kelvin is a true zero, so when expressing temperatures in kelvins we can trust ratios.  If you studied Thermodynamics, you will recall that the second law limits the maximum attainable efficiency $\eta_{\rm max}$ for a heat engine connecting two reservoirs at temperatures $T_h$ and $T_l$:

<br>
<center>
    $\eta_{\rm max} = 1 - \frac{T_l}{T_h}$.
</center>
<br>

**This expression is only valid when temperatures are expressed on an absolute scale such as Kelvin, since only a scale with a true zero gives the ratio an objective meaning**.
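
As a sketch of what goes wrong otherwise, the snippet below evaluates the expression once with kelvins and once by (incorrectly) plugging in Celsius values. The reservoir temperatures are hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius temperature to the Kelvin scale."""
    return celsius + 273.15


def max_efficiency(t_hot_kelvin, t_low_kelvin):
    """Maximum efficiency 1 - T_l / T_h; both temperatures in kelvins."""
    if not 0.0 <= t_low_kelvin < t_hot_kelvin:
        raise ValueError("need 0 <= T_l < T_h, in kelvins")
    return 1.0 - t_low_kelvin / t_hot_kelvin


# Hypothetical reservoirs at 300 C and 25 C.
t_hot_c, t_low_c = 300.0, 25.0

eta = max_efficiency(celsius_to_kelvin(t_hot_c), celsius_to_kelvin(t_low_c))
eta_wrong = 1.0 - t_low_c / t_hot_c  # ratio of Celsius values: meaningless

print(f"with kelvins: {eta:.3f}")
print(f"with Celsius: {eta_wrong:.3f}  (not a valid efficiency bound)")
```

The Celsius version wildly overstates the bound, and it would give a different answer again if we had started from Fahrenheit.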
    

# The dangers of ignoring the type of data we have

These definitions are not mere sophistry.  Understanding what type of measurements we are dealing with is critical in order for us to know what kind of operations are permissible with our data.  Let's consider a couple of examples.



## Survey instruments

If you teach or study at a university in the US, you are likely familiar with instructor evaluations. At the end of a course, students are asked to rate the quality of the instruction they received.  Whether the options are descriptive $-$ Poor, Fair, Good, Very Good, Excellent $-$ or numerical $-$ 1 to 5 $-$ they are almost universally converted to numeric values on which things such as averages and standard deviations are calculated.

**What could be the problem with this?**

A priori, there is nothing really wrong with this.  The answers to the survey are clearly an ordinal measurement, so we can definitely convert them into numbers.  Once we have numbers, we can definitely perform mathematical operations with them. The question is: **what types of operations yield meaningful results?**

Imagine I receive an average score of 4.8 in `enthusiasm` and a score of 3.2 in `clarity`. Can I conclude that my `enthusiasm` score is 50% higher than my `clarity` score? 

> **NO, I CANNOT!**  My measure is not ratio so **I cannot assign meaning to ratios of values.**

Can I, instead, conclude that my `enthusiasm` score is 1.6 points higher than my `clarity` score?


> **NO, I CANNOT!**  My measure is not interval so **I cannot assign meaning to differences of values.**


Can I, at least, conclude that my `enthusiasm` score is higher than my `clarity` score?

> **YES, I CAN!** But only if I have enough data that the difference is statistically significant.
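
To see why the first two answers are "no", here is a sketch with made-up answers from five students, chosen so that the usual 1 to 5 coding reproduces the averages of 4.8 and 3.2. The second coding, `stretched_top`, is an arbitrary alternative of mine that respects the same order of the labels.

```python
from statistics import mean

LABELS = ["Poor", "Fair", "Good", "Very Good", "Excellent"]

# Two hypothetical codings that both respect the order of the labels.
evenly_spaced = dict(zip(LABELS, [1, 2, 3, 4, 5]))
stretched_top = dict(zip(LABELS, [1, 2, 3, 4, 10]))

# Hypothetical answers from five students.
enthusiasm = ["Excellent", "Excellent", "Very Good", "Excellent", "Excellent"]
clarity = ["Good", "Fair", "Very Good", "Good", "Very Good"]


def summarize(coding):
    """Return (difference, ratio) of the two average scores under a coding."""
    avg_enthusiasm = mean([coding[answer] for answer in enthusiasm])
    avg_clarity = mean([coding[answer] for answer in clarity])
    return avg_enthusiasm - avg_clarity, avg_enthusiasm / avg_clarity


for name, coding in [("1 to 5", evenly_spaced), ("stretched", stretched_top)]:
    difference, ratio = summarize(coding)
    print(f"{name:>9}: difference = {difference:.2f}, ratio = {ratio:.2f}")
```

Since nothing in the survey tells us that *Excellent* is exactly one step above *Very Good*, both codings are defensible, and they give different differences and different ratios. In this example, both still agree on which score is higher.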


Now imagine that another instructor in another department teaching a different course received an average score of 4.0 in `clarity`. Can I conclude that their `clarity` score is higher than mine?

**You tell me!**


## Differentially expressed genes

If you study molecular biology, you have almost certainly encountered studies reporting on differentially expressed genes under the effect of some intervention. 

For concreteness, I will focus on the measurement of gene expression using microarray platforms.  For full details, you can take a look at [this paper](https://www.pnas.org/doi/10.1073/pnas.1000938107).

<img src = 'Images/microarray_schematic.png' width = 600>

As illustrated in the figure above, a microarray is a glass surface where at specified positions (usually set up as a square lattice) there are nucleic acid polymers that act as probes for specific genes.  A pre-processed biological sample is then dropped over the chip and allowed to bind to the probe polymers.  After a wash, in which incorrectly bound material is supposed to be washed away, the sample is exposed to a laser and fluorescent molecules attached to the biological material emit light that is captured and quantified in a photomultiplier device.

$F(i)$ quantifies the measured intensity of emitted light for spot $i$ in the chip.  $B(i)$ quantifies the measured intensity of emitted light for the probe-less glass region near spot $i$.

In the paper above, we argue that

<br>
<center>
    $F(i) = E(i)~A~e^{\nu_{sp}(i)} + U~e^{\nu_{nsp}(i)}$,
</center>
<br>

where $E(i)$ is the expression level of the gene probed at spot $i$, $U$ and $A$ are constants for the experiment, and $\nu_{sp}(i)$ and $\nu_{nsp}(i)$ are Gaussian random variables with zero mean and some variance.

Typically, one considers the log-transformed version of these quantities:

<br>
<center>
    ${\rm log} ~F(i) = {\rm log} \left[ E(i)~A~e^{\nu_{sp}(i)} + U~e^{\nu_{nsp}(i)} \right] $.
</center>
<br>

We can pull out the first term, and get

<br>
<center>
    ${\rm log} ~F(i)  = {\rm log} ~E(i) + {\rm log} \left[ A~e^{\nu_{sp}(i)} \right] + {\rm log} \left[ 1 + \frac{U~e^{\nu_{nsp}(i)}}{E(i)~A~e^{\nu_{sp}(i)}} \right] $.
</center>



For very large expression levels, $E \gg U/A$, the final term reduces to ${\rm log}~1 = 0$.  In that case we are fine as long as we put the biological samples for the two conditions through the same analysis: the second term on the right hand side is nearly equal for every gene, so we do not need to worry about estimating the parameters in the equation.  However, for small expression levels, the non-linear final term is unavoidable and the ratio between expression levels is no longer linearly related to the ratio of measured intensities.
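
A noise-free sketch makes the two regimes visible. I set both $\nu$ terms to zero and pick hypothetical values for $A$ and $U$ (called `gain` and `background` below); a gene is assumed to be exactly twice as expressed in the second condition.

```python
import math


def expected_intensity(expression, gain=1.0, background=100.0):
    """Noise-free version of F = E*A*exp(nu_sp) + U*exp(nu_nsp).

    gain plays the role of A and background the role of U; both noise
    terms are set to zero. The default values are hypothetical.
    """
    return expression * gain + background


TRUE_FOLD_CHANGE = 2.0  # the gene is twice as expressed in condition 2

measured_fold_change = {}
for expression in (1e5, 1e3, 1e1):
    f_1 = expected_intensity(expression)
    f_2 = expected_intensity(TRUE_FOLD_CHANGE * expression)
    measured_fold_change[expression] = f_2 / f_1
    print(f"E = {expression:>8.0f}: measured ratio = {f_2 / f_1:.3f}, "
          f"log2 ratio = {math.log2(f_2 / f_1):.3f}")
```

When $E \gg U/A$ the measured ratio is very close to the true factor of 2. As $E$ approaches $U/A$ and drops below it, the measured ratio is squeezed toward 1 even though the true change in expression is the same.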
