# Histograms

In [None]:
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter  # Use this if you want percentages on y-axis
import numpy as np

A <span style="color:blue">**histogram**</span> is a bar graph for quantitative data. Each bar covers an interval of values, so there are no gaps between consecutive bars.

In [None]:
# The following creates an example histogram of dog weights.

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

lcl = [35, 40, 45, 50, 55, 60, 65]  # Manually set the lower class limits

fig, ax = plt.subplots()
ax.hist(dog_weights, bins=lcl, edgecolor='black')
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of Dog Weights")

plt.show()

* Each bar represents a <span style="color:blue">**class**</span> (or <span style="color:blue">**bin**</span>), which is an interval of values.
* The <span style="color:purple">**lower class limit**</span> of the first class is 35, of the second class is 40, and so on.
* The <span style="color:red">**class width**</span> is the difference between consecutive lower class limits (5 in the histogram above).
* Each observed value is placed into exactly one class. A value equal to a lower class limit belongs to that class; for example, a weight of 45 is counted in the class 45 &leq; *x* < 50.
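You can check how values are sorted into classes without drawing anything. The sketch below uses NumPy's `np.histogram`, which returns the count in each bin. Per NumPy's documentation, every bin includes its left edge and excludes its right edge, except the last bin, which includes both. The names `edges` and `counts` are ours. The counts should come out as 2, 4, 9, 8, 1, 1.

```python
import numpy as np

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]
# Bin edges: the lower class limits plus the upper limit (65) of the last class
edges = [35, 40, 45, 50, 55, 60, 65]

# counts[i] = number of values with edges[i] <= x < edges[i + 1]
# (NumPy's last bin also includes its right edge, 65)
counts, bin_edges = np.histogram(dog_weights, bins=edges)

for lower, upper, freq in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{lower} <= x < {upper}: {freq}")
```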

Instead of lower class limits, we can also label the horizontal axis with <span style="color:orange">**class midpoints**</span>. The midpoint of a class is the average of its lower and upper limits:

|Class|Frequency|Class Midpoint|
|:---:|:---:|:---:|
|35 &leq; *x* < 40|2|37.5|
|40 &leq; *x* < 45|4|42.5|
|45 &leq; *x* < 50|9|47.5|
|50 &leq; *x* < 55|8|52.5|
|55 &leq; *x* < 60|1|57.5|
|60 &leq; *x* < 65|1|62.5|
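Each midpoint in the table is the average of a class's two limits, so you can compute the whole column from the bin edges instead of typing it by hand. Here is a minimal sketch using NumPy array slicing (the name `edges` is ours):

```python
import numpy as np

edges = np.array([35, 40, 45, 50, 55, 60, 65])  # class boundaries used above

# Midpoint of each class = (lower limit + upper limit) / 2
# edges[:-1] are the lower limits, edges[1:] are the upper limits
class_midpoints = (edges[:-1] + edges[1:]) / 2
print(class_midpoints)  # [37.5 42.5 47.5 52.5 57.5 62.5]
```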

In [None]:
dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

bins = [35, 40, 45, 50, 55, 60, 65]  # Lower class limits
class_midpoints = [37.5, 42.5, 47.5, 52.5, 57.5, 62.5]  # Class midpoints

fig, ax = plt.subplots()
ax.hist(dog_weights, bins=bins, edgecolor='black')
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of Dog Weights Using Class Midpoints")
ax.set_xticks(class_midpoints)
plt.show()

## Relative Frequency Histogram

* Create a relative frequency histogram the same way as a relative frequency bar graph: divide each class frequency by the total number of observations.
* The *heights* of all rectangles must sum to 1.00, or 100%.
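The bar heights of a relative frequency histogram are simply the class counts divided by *n*. The sketch below computes them directly with `np.histogram`. It then confirms that giving every observation a weight of 1/*n*, which is the `weights` approach the next cell passes to `ax.hist`, produces the same heights. The variable names are ours.

```python
import numpy as np

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]
edges = [35, 40, 45, 50, 55, 60, 65]
n = len(dog_weights)

# Relative frequency = class frequency / total number of observations
counts, _ = np.histogram(dog_weights, bins=edges)
rel_freq = counts / n

# Summing a weight of 1/n for each observation in a bin gives the same heights
rel_weights = np.ones(n) / n
weighted, _ = np.histogram(dog_weights, bins=edges, weights=rel_weights)

print(rel_freq)
print(np.allclose(rel_freq, weighted), rel_freq.sum())
```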

In [None]:
dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

rel_weights = np.ones_like(dog_weights) / len(dog_weights)  # weight each observation by 1/n so bar heights are relative frequencies
bins = [35, 40, 45, 50, 55, 60, 65]  # Lower class limits

fig, ax = plt.subplots()
ax.hist(dog_weights, bins=bins, edgecolor='black', weights=rel_weights)
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Relative Frequency")
ax.set_title("Relative Frequency Histogram of Dog Weights")

plt.show()

To display the y-axis of the chart above as percentages, make two changes:

* Multiply `rel_weights` by `100`, so that the bar heights sum to 100 instead of 1.
* Call `ax.yaxis.set_major_formatter(PercentFormatter())` before `plt.show()`. By default, `PercentFormatter` treats a value of 100 as 100%.

In [None]:
dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

rel_weights = 100 * np.ones_like(dog_weights) / len(dog_weights)  # weight each observation by 100/n so bar heights are percentages
bins = [35, 40, 45, 50, 55, 60, 65]  # Lower class limits

fig, ax = plt.subplots()
ax.hist(dog_weights, bins=bins, edgecolor='black', weights=rel_weights)
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Relative Frequency")
ax.set_title("Relative Frequency Histogram of Dog Weights")
ax.yaxis.set_major_formatter(PercentFormatter())  # format y-axis tick labels as percentages

plt.show()

## Density Histogram

* Similar to a relative frequency histogram but the total <span style="color:red">***area***</span> of all rectangles must equal 1.

* We will see these ***a lot*** with probability distributions later.
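Since area = height × width, each bar of a density histogram has height equal to its relative frequency divided by its class width. The matplotlib `hist` documentation describes `density=True` as counts / (sum(counts) * np.diff(bins)). The sketch below checks the hand computation against `np.histogram(..., density=True)` and confirms that the total area is 1. The variable names are ours.

```python
import numpy as np

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]
edges = np.array([35, 40, 45, 50, 55, 60, 65])
widths = np.diff(edges)  # class widths (all equal to 5 here)

counts, _ = np.histogram(dog_weights, bins=edges)
rel_freq = counts / counts.sum()

# Density height = relative frequency / class width, so height * width = relative frequency
density_by_hand = rel_freq / widths
density_numpy, _ = np.histogram(dog_weights, bins=edges, density=True)

print(density_by_hand)
print(np.allclose(density_by_hand, density_numpy))
print((density_numpy * widths).sum())  # total area of all rectangles
```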

In [None]:
# For density histograms, pass the `density=True` argument to `ax.hist`.

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

lcl = [35, 40, 45, 50, 55, 60, 65]  # Manually set the lower class limits

fig, ax = plt.subplots()
ax.hist(dog_weights, bins=lcl, edgecolor='black', density=True)  # notice the new `density=True` argument
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Density")
ax.set_title("Density Histogram of Dog Weights")

plt.show()

## Example 1

Run the cell below and then answer the questions that follow.

In [None]:
scores = [
    52, 57, 58, 63, 67, 71, 71,
    72, 72, 73, 73, 75, 76, 76, 77, 
    77, 78, 81, 82, 82, 82, 83, 83,
    84, 85, 85, 86, 87, 87, 87, 88,
    88, 89, 91, 91, 92, 92, 93, 94,
    95, 95, 96, 97, 98
]

lcl = [50, 60, 70, 80, 90, 100]  # Manually set the lower class limits

fig, ax = plt.subplots()
ax.hist(scores, bins=lcl, color='orange', edgecolor='black')
ax.set_xlabel("Score")
ax.set_ylabel("Frequency")
ax.set_title("Test Scores")
ax.grid(axis='y', linestyle='--')

plt.show()

(a) What is the class width?

(b) What is the midpoint of the 4th class?

(c) How many test scores are shown?

(d) How many students scored 80 or higher?

(e) What is the relative frequency of the 5th class?

## Creating a Histogram Using a Given Class Width

So far, we have manually set the lower class limits of each bin.

Suppose we instead want to create a histogram with a given class width.

To do so, note that

$$
\text{class width} = \frac{\text{range of values on horizontal axis}}{\text{number of bins}}
$$

To build the histogram, we need to know how many bins to include.

Multiplying both sides by the number of bins and then dividing both sides by the class width gives

$$
\text{number of bins} = \frac{\text{range of values on horizontal axis}}{\text{class width}}
$$
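For the dog weights, (65 − 35) / 5 = 6 bins. In practice, the quotient is often not a whole number. The bins must also reach past the largest observation, so the bin count is rounded. One way to do this is to take the smallest whole number *k* with first lower class limit + *k* × class width > maximum value, which is ⌊(max − first limit) / width⌋ + 1. Below is a minimal sketch using a hypothetical helper `bin_edges` (not part of any library):

```python
import math

def bin_edges(data, first_lcl, class_width):
    """Hypothetical helper: equal-width bin edges starting at first_lcl
    whose last edge lies past max(data)."""
    # Smallest whole number of bins k with first_lcl + k * class_width > max(data)
    num_bins = math.floor((max(data) - first_lcl) / class_width) + 1
    # k bins need k + 1 edges: the lower class limits plus the final upper limit
    return [first_lcl + class_width * i for i in range(num_bins + 1)]

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]
print(bin_edges(dog_weights, 35, 5))  # [35, 40, 45, 50, 55, 60, 65]
print(bin_edges(dog_weights, 35, 4))  # [35, 39, 43, 47, 51, 55, 59, 63]
```

With a class width of 4, (62 − 35) / 4 = 6.75 is not a whole number, so the sketch uses 7 bins, and the last edge, 63, lies past the heaviest dog.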

## Example 2

Create a histogram from the measurements below. 

Use the minimum value as the lower class limit of the first class and use a class width of 2.

In [None]:
measurements = [9, 2, 10, 1, 4,
                5, 1, 6, 7, 4,
                6, 5, 4, 8, 10,
                3, 1, 2, 3, 9,
                8, 6, 1, 1, 10]

In [None]:
lcl_1 = np.min(measurements)  # lower class limit of the first class (bin)
class_width = 2
# Smallest number of bins whose edges reach past the maximum value:
# floor((max - min) / class_width) + 1, written as a single floor division
num_bins = int((np.max(measurements) + class_width - np.min(measurements)) // class_width)

lcl = [lcl_1 + class_width * i for i in range(num_bins + 1)]  # bin edges: lower class limits plus the final upper limit

fig, ax = plt.subplots()
ax.hist(measurements, bins=lcl, edgecolor='black')
ax.set_xlabel("Measurements")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of Example 2")
ax.set_xticks(lcl)
ax.grid(axis='y', linestyle='--')

plt.show()

## Cumulative Histograms

A <span style="color:blue">**cumulative histogram**</span> is one in which the frequency (or relative frequency) of each class is a running total up to that class.
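A running total is a cumulative sum. The sketch below uses made-up class frequencies for illustration, along with Python's standard `itertools.accumulate`:

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest
frequencies = [3, 5, 2, 4]

# Running total up to and including each class
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 8, 10, 14]
```

The last running total always equals the total number of observations.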

## Example 3

Create a cumulative frequency histogram of the dog weights from the beginning of this section.

|Class|Frequency|Cumulative Total|
|:---:|:---:|:---:|
|35 &leq; *x* < 40|2||
|40 &leq; *x* < 45|4||
|45 &leq; *x* < 50|9||
|50 &leq; *x* < 55|8||
|55 &leq; *x* < 60|1||
|60 &leq; *x* < 65|1||

In [None]:
dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

lcl = [35, 40, 45, 50, 55, 60, 65]  # Lower class limits

fig, ax = plt.subplots()
# include `cumulative=True` to get a cumulative histogram
ax.hist(dog_weights, bins=lcl, edgecolor='black', cumulative=True)
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Cumulative Frequency")
ax.set_title("Cumulative Histogram of Dog Weights")
ax.grid(axis='y', linestyle='--')

plt.show()

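The cell below produces a cumulative relative frequency histogram of the dog weights. Passing `density=True` together with `cumulative=True` normalizes the bars so that the last bar has height 1.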

In [None]:
dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]

lcl = [35, 40, 45, 50, 55, 60, 65]  # Lower class limits

fig, ax = plt.subplots()
# cumulative=True gives running totals; adding density=True rescales them so the last bar equals 1
ax.hist(
    dog_weights,
    bins=lcl,
    edgecolor='black',
    cumulative=True,
    density=True
)
ax.set_xlabel("Dog Weights")
ax.set_ylabel("Cumulative Relative Frequency")
ax.set_title("Cumulative Relative Frequency Histogram of Dog Weights")

plt.show()
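You can reproduce the heights of the two cumulative histograms above with `np.cumsum`. According to the matplotlib `hist` documentation, combining `density=True` with `cumulative=True` normalizes the histogram so that the last bin equals 1. That amounts to dividing the running totals by the number of observations. A sketch (variable names are ours):

```python
import numpy as np

dog_weights = [
    37, 39, 41, 42, 43, 43,
    45, 45, 46, 46, 47, 47,
    47, 48, 48, 50, 50, 51,
    51, 52, 53, 53, 54, 57, 62
]
edges = [35, 40, 45, 50, 55, 60, 65]

counts, _ = np.histogram(dog_weights, bins=edges)

cum_counts = np.cumsum(counts)            # bar heights with cumulative=True
cum_rel_freq = cum_counts / counts.sum()  # bar heights with cumulative=True and density=True

print(cum_counts)
print(cum_rel_freq)
```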