In [6]:
import math
import numpy as np

# Basic Statistics

Throughout this book, I'll use an example of an apple farmer measuring things related to apples.
This may not be a meaningful use of statistics for anyone reading (I don't really expect this book to catch on in the farming community, but you never know).
However, it should be simple and intuitive, and that is the goal here.

## Attributes and Statistics

So, imagine yourself an apple farmer.
You've got many trees with many apples.
If you pick an apple, you can measure different things about it.
You could weigh it, or you could measure its height or its circumference.
We will refer to these as **attributes** of the apple.

> **Attribute**: A measurable property of an object

After taking measurements of your apple, you might assign some sort of quality rating based on the numbers you came up with.
Perhaps "bigger is better," so you simply add the height, weight, and circumference to get the "quality."
This would be a kind of **statistic**.

> **Statistic**: A mathematical summary of one or more **attributes** across one or more objects.

By adding the values of three attributes, you got a single number to describe your object (the apple).
By the definition above, this "quality" value is a statistic.
However, this case of summarizing multiple attributes for a single object is uncommon.
Instead, we usually focus on statistics that summarize a single attribute across many objects.
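
As a minimal sketch of the "quality" statistic described above, the snippet below adds three attributes of a single apple. The dictionary keys and the measurement values are hypothetical, chosen only for illustration; they are not data from the example farm.

```python
# Hypothetical measurements for one apple (illustrative values only).
apple = {"height_cm": 7.0, "mass_g": 191.0, "circumference_cm": 23.0}


def quality(apple):
    """Summarize one apple by adding its three attributes ("bigger is better")."""
    return apple["height_cm"] + apple["mass_g"] + apple["circumference_cm"]


apple_quality = quality(apple)
print(apple_quality)
```

The result is one number that stands in for three measurements, which is all the definition of a statistic asks for.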

For example, suppose one of your trees has 100 apples.
You could pick all 100 apples, and measure the weight of each.
In this case, it would be nice to summarize the weights within this **population** of apples.

> **Population**: A collection of objects sharing some property of interest

## Populations and Samples

## Distributions


## Measures of Center


There are many ways to summarize a single attribute across a population.
One category is called "measures of center."
These are statistics that try, in some way, to represent a "typical" value in the population.

Before we discuss these measures, we should have an example to work with.
To keep things simple, let's assume one of your trees had only 15 apples, which you picked and weighed.
The masses are given below:

<table>
  <tr>
    <th colspan="5">Masses of apples (g)</th>
  </tr>
  <tr>
    <td>180 g</td> <td>183 g</td> <td>191 g</td> <td>191 g</td> <td>191 g</td>
  </tr>
  <tr>
    <td>192 g</td> <td>203 g</td> <td>209 g</td> <td>211 g</td> <td>212 g</td>
  </tr>
  <tr>
    <td>217 g</td> <td>223 g</td> <td>224 g</td> <td>224 g</td> <td>229 g</td>
  </tr>
</table>

Now, a simple way to choose a "typical" value would be to count the number of times each different value appears in your population.
Then you could report the most-common value as your statistic.
We call this statistic the **mode**.

> **Mode**: A statistic whose value is the most-common attribute value in the population

In the example above, the value 191 appears three times, 224 appears twice, and each other value appears only once.
Thus, the mode for our example population is 191 g.
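
The counting procedure can be sketched in a few lines of Python. This is one possible approach, using `collections.Counter` from the standard library; the variable names are my own, and the masses are copied from the table above.

```python
from collections import Counter

# Masses of the 15 apples from the table, in grams.
masses_g = [180, 183, 191, 191, 191,
            192, 203, 209, 211, 212,
            217, 223, 224, 224, 229]

# Count how many times each distinct value appears.
counts = Counter(masses_g)

# most_common(1) returns a list holding the single (value, count) pair
# with the highest count.
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 191 3
```

Note that if two values were tied for the highest count, this sketch would report only one of them.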

Another simple option would be to lay out all the values for a given attribute, sorted from smallest to largest, and choose the middle value as the statistic value.
We call this statistic the **median**.

> **Median**: A statistic whose value is the middle value of the attribute when the population's values are sorted

In our example, the masses are already listed in sorted order. With 15 values, the middle one is the 8th, which sits at the center of the table; the median for our population is 209 g.
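
The same procedure can be sketched in Python: sort the values, then pick the one in the middle. The variable names are my own; `statistics.median` from the standard library is used only as a cross-check. The indexing shortcut below assumes an odd number of values, as in our example.

```python
import statistics

# Masses of the 15 apples from the table, in grams.
masses_g = [180, 183, 191, 191, 191,
            192, 203, 209, 211, 212,
            217, 223, 224, 224, 229]

# Sort first, in case the values were not already in order.
sorted_masses = sorted(masses_g)

# With an odd number of values, integer division gives the middle index
# (15 // 2 == 7, which is the 8th value because indexing starts at 0).
middle_index = len(sorted_masses) // 2
median_by_hand = sorted_masses[middle_index]

# Cross-check against the standard library.
median_from_library = statistics.median(masses_g)
print(median_by_hand, median_from_library)  # 209 209
```

For an even number of values there is no single middle value; `statistics.median` then returns the average of the two middle values.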

In [7]:
import ipywidgets as widgets
from IPython.display import display
test_slider = widgets.IntSlider()
display(test_slider)

IntSlider(value=0)

In [8]:
from basics_py import harvest_tree

# Harvest at least one apple, even when the slider is still at 0.
num_apples = max(test_slider.value, 1)
masses = harvest_tree(num_apples=num_apples)

# Print the masses in a roughly square grid, n_col values per row.
n_col = max(math.floor(math.sqrt(num_apples)), 1)
n_row = math.ceil(len(masses) / n_col)
for i in range(n_row):
    print(masses[n_col*i:n_col*(i+1)])

# print(np.reshape(np.sort(masses), (n_row, n_col)))

[211]


In [9]:
test_slider_2 = widgets.IntSlider()
test_color_picker = widgets.ColorPicker()
# label = widgets.Label("Test")
# Group both widgets in a single Stack container and display it.
box = widgets.Stack([test_slider_2, test_color_picker])
display(box)

Stack(children=(IntSlider(value=0), ColorPicker(value='black')), titles=('', ''))

## Measures of Spread