# Week 3-4: Probability as a Frequency

<font size="6"> Laboratory 2 </font> <br>
<font size="3"> Last updated April 24, 2023 </font>

## <span style="color:orange;"> 00. Content </span>

<font size="5"> Mathematics </font>
- sample spaces and events
- conditional probability
- binary numbers
     
<font size="5"> Programming Skills </font>
- functions
- data visualization
    
<font size="5"> Embedded Systems </font>
- Thonny and MicroPython

## <span style="color:orange;"> 0. Required Hardware </span>

- Microcontroller: Raspberry Pi Pico
- Breadboard
- USB connector
- Camera (Arducam HM01B0)

<h3 style="background-color:lightblue"> Write your name and email below: </h3>

**Name:** me 

**Email:** me @purdue.edu

In [None]:
import numpy as np                      
import matplotlib.pyplot as plt       

## <span style="color:orange;"> Probability as a Frequency </span>

The __sample space__ of an experiment is the set of all possible outcomes, and an __event__ is any subset of the sample space. An event occurs if the outcome of an experiment is contained in the subset defining the event. When we view probability as a frequency, the probability of an event is the relative frequency with which the event occurs over many repetitions of the experiment.
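As a small illustration of a relative frequency, the sketch below counts how often an event occurs in a list of outcomes. The die-roll outcomes are made up for this example and are not measured data.

```python
# Hypothetical outcomes from 10 rolls of a six-sided die (made-up data)
outcomes = [3, 6, 1, 4, 6, 2, 5, 6, 4, 1]

# The event "the roll is even" is a subset of the sample space {1, ..., 6}
event = {2, 4, 6}

# Count the repetitions whose outcome lies in the event
occurrences = sum(1 for x in outcomes if x in event)

# Relative frequency = occurrences / number of repetitions
relative_frequency = occurrences / len(outcomes)
print(relative_frequency)
```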

The Raspberry Pi Pico has its own internal temperature sensor that returns an integer between 0 and 65535. In this example, our experiment is measuring the internal temperature of the Pico, and the sample space is the set $\{0,1,2,\dots, 65535\}$. We can compute the temperature in Celsius from the sensor reading $r$ using the formula $$ T_C =  27 - \frac{\frac{3.3r}{65535} - 0.706}{0.001721}.$$
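The conversion formula can be written as a short Python function. This is a minimal sketch. The function name `reading_to_celsius` is chosen for this example only.

```python
def reading_to_celsius(r):
    """Convert a raw sensor reading r (0 to 65535) to degrees Celsius."""
    voltage = 3.3 * r / 65535                   # scale the reading to a voltage
    return 27 - (voltage - 0.706) / 0.001721    # formula from the text

# Because of the minus sign in the formula, a larger reading means a lower temperature
print(reading_to_celsius(14000))
print(reading_to_celsius(14100))
```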

### <span style="color:red"> Exercise </span>

Let $E$ be the event that the Pico's internal temperature is between 70 and 72 degrees Fahrenheit. What subset of outcomes from the sample space $\{0,1,2,\dots, 65535\}$ defines event $E$?

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>


The Raspberry Pi Pico also has 2MB of onboard flash storage, so we can save small files of sensor data directly on the Pico. Download [pico_temperature.py]__(add link)__ from GitHub. The script takes 12 temperature readings, one every $0.25$ seconds, and saves them to a file called `temp.txt` on the Pico.
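For orientation, a logging loop of this kind might look like the sketch below. It is not the contents of `pico_temperature.py`. The function `log_readings` and its parameters are hypothetical, and the reader is passed in as an argument so the same code also runs on a regular computer.

```python
import time

def log_readings(read, n=12, delay=0.25, path="temp.txt"):
    """Take n readings, pausing `delay` seconds after each, and write one per line."""
    with open(path, "w") as f:
        for _ in range(n):
            f.write(str(read()) + "\n")   # one integer reading per line
            time.sleep(delay)             # wait before the next measurement

# Assumption: on the Pico under MicroPython, the internal temperature sensor
# is read through ADC channel 4, so the call would look like:
#   from machine import ADC
#   log_readings(ADC(4).read_u16)
```

Writing one reading per line keeps the file in a form that `np.loadtxt` can read directly.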

You only need to place the Pico in the breadboard and connect it to your computer with the USB cable. Open Thonny and run `pico_temperature.py`. After the script finishes, move the `temp.txt` file stored on the Pico into the folder you are currently working in, and delete `temp.txt` from the Pico’s storage. Modify `pico_temperature.py` as needed to take more measurements or to change how often measurements are taken.

### <span style="color:red"> Exercise </span>

__Part 1:__ Collect 1,000 temperature measurements from the Pico with at least 0.25 seconds between measurements. Display a bar plot of the values in `temp.txt`. Include descriptive axis labels and a title. 

__Part 2:__ What value occurs the most frequently in `temp.txt`? What is the relative frequency of that value?

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

In [None]:
data = np.loadtxt("temp.txt", dtype=int)    # make sure temp.txt is in the same directory as this notebook file
                                            # dtype stands for data type, every line in the txt file will be read as an integer

### <span style="color:red"> Exercise </span>

Find the mode $m$ (i.e., the value that occurs the most) of the first 11 temperature measurements. 
Define $E$ as the event that the temperature measurement is equal to $m$. The complement of event $E$ is $E'$, i.e., the event that the measurement is different from $m$.

After the first 11 readings, how frequently does the event $E$ occur? How frequently does the event $E'$ occur?

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

### <span style="color:red"> Exercise </span>

Let's only consider the first digit of the temperature measurement. Plot a bar graph of the frequency that each digit 0-9 appears.

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>



### <span style="color:red"> Exercise </span>

Repeat the previous exercise but with the second digit instead of the first, and again with the third digit, fourth digit, and fifth digit. You will have 4 bar charts.

Which digit place has the most variability in your set of measurements?

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

## <span style="color:orange;"> Conditional Probability </span>

For two events $E$ and $F$, the probability of event $E$ occurring given that event $F$ has occurred is a __conditional probability__. With our view of probability as a frequency, this means that the probability of $E$ given $F$ is the relative frequency of $E$ occurring in the sample space defined by $F$, i.e., $$ P(E\mid F) = \frac{P(E \cap F)}{P(F)}. $$
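The sketch below checks the formula on a small list of made-up die rolls. Counting $E$ only among the outcomes where $F$ occurred gives the same number as dividing $P(E \cap F)$ by $P(F)$. The data and event definitions are assumptions for illustration.

```python
# Hypothetical outcomes from 10 rolls of a six-sided die (made-up data)
outcomes = [1, 2, 3, 4, 5, 6, 4, 6, 2, 5]

E = {2, 4, 6}      # event: the roll is even
F = {4, 5, 6}      # event: the roll is greater than 3

n = len(outcomes)
p_F = sum(1 for x in outcomes if x in F) / n
p_E_and_F = sum(1 for x in outcomes if x in E and x in F) / n

# Conditional probability from the formula
p_E_given_F = p_E_and_F / p_F

# The same quantity as a relative frequency inside the restricted sample space F
in_F = [x for x in outcomes if x in F]
freq_E_in_F = sum(1 for x in in_F if x in E) / len(in_F)

print(p_E_given_F, freq_E_in_F)
```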

### Binary Numbers

For reasons that will be clear in the next section, let's take a detour into binary numbers.

We are accustomed to the base-10 number system. We can break down numbers like $9234$ as
\begin{align*}
9234 &= (9\times 1000) +( 2\times 100) + (3\times 10) + (4\times 1) \\
    &= (9\times 10^3) + (2\times 10^2) + (3\times 10^1) + (4\times 10^0)
\end{align*}

Base-2 (binary) works the same way, with powers of 2 in place of powers of 10. The binary number 1101 in base-10 is
\begin{align*}
1101 &= (1\times 2^3) + (1\times 2^2) + (0\times 2^1) + (1\times 2^0) \\
    &= (1\times 8) + (1\times 4) + (0\times 2) + (1\times 1) \\
    &= 13
\end{align*}

In Python, the prefix `0b` tells the interpreter to read the digits that follow as base-2. To go the other way, call the `bin` function, as demonstrated below.

In [None]:
x = 0b1101
print(x)

y = 13
print(bin(y))

# you can use underscores to make numbers more readable without affecting the value
z = 0b_100_001_111_111_001_100_111_001_110_001_0101
w = 9_123_456_789
print(z == w)
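When the binary digits are stored as a string rather than typed as a literal, the built-in `int` accepts a base as its second argument, and `format` can produce a zero-padded binary string. A short sketch:

```python
bits = "1101"
value = int(bits, 2)            # interpret the string as a base-2 number
print(value)                    # 13

padded = format(13, "08b")      # binary digits, zero-padded to 8 places, no 0b prefix
print(padded)                   # 00001101
```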

### <span style="color:red"> Exercise </span>

What are the following binary numbers in base-10?
- 110
- 010
- 101011
- 110000

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

### <span style="color:red"> Exercise </span>

__Part 1:__ What range of integers can we represent using 16 bits?

__Part 2:__ What is the minimum length of a random binary sequence needed to get a random number between 0 and 100?

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

---
**NOTE**

This is a 2-week lab. Turn in the exercises above. Pick up from here during the next lab session.

--- 

Identifying textures in images is an important task in many fields like medical imaging, remote sensing, and facial detection.
There is a simple but powerful method called local binary patterns (LBP) that is commonly used to classify textures when used together with some machine learning algorithms.

We are going to work with the grayscale images in this [folder](**add link**). Download the folder of images from GitHub. 

For an $m \times n$ grayscale image, there are $m$ rows of pixels and $n$ columns of pixels. Each pixel describes the intensity of the image with an integer between 0 and 255 (0 for black and 255 for white). 
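To make the row and column convention concrete, here is a tiny hand-made array standing in for a grayscale image. The name `tiny` and its values are made up for this example.

```python
import numpy as np

# A 2 x 3 "image": 2 rows and 3 columns of intensities between 0 and 255
tiny = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

rows, cols = tiny.shape     # shape is (number of rows, number of columns)
print(rows, cols)           # 2 3

# Indexing is [row, column], both starting from 0
print(tiny[0, 2])           # 255, the white pixel in the top right corner
```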

See the cell below on how to read in and display grayscale images in Python. 

In [None]:
from PIL import Image                                   # needed for reading images

img = np.array(Image.open('textures/texture_1.jpg'))    # read in the image and store it as a numpy array
print(f'image size is {img.shape}')
fig, ax = plt.subplots(figsize=(12,6))                  # create figure and set figure size
ax.imshow(img, cmap='gray', vmin=0,vmax=255)            # display the image in grayscale between 0 and 255
# ax.axis('off')                                        # uncomment this line to hide the axes
plt.show()

### <span style="color:red"> Exercise </span>

Complete the following steps of LBP for each image in the textures folder.

**1.** For each pixel, look at the values of its 8 neighboring pixels.

> Ex:
>|   |     |    |
>| --- | --- | ---| 
>| 10 | 15 | 12 |
>| 19 | 12 | 11 |
>| 20 | 16 | 12 |

**2.** Starting at the pixel directly above the center pixel, move clockwise around the center to create a binary sequence of length 8. Assign '0' if the neighbor's value is greater than the center pixel's value and '1' otherwise. 

> Using the example above, the center value is 12. We start directly above it at 15, which is greater than 12, so the binary sequence is `'0'` so far. 
> Then we move to the top right at 12, which is not greater than the center, so the sequence becomes `'01'`. Continuing clockwise, the final binary number is `'01110001'`.

**3.** Convert the 8-bit sequence to an integer in base-10.

> For our example, we get an LBP value of 113.

**4.** Plot the frequency that each LBP value 0-255 appears.
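The worked example in steps 1 to 3 can be checked with a few lines of Python. This sketch handles only the single 3 x 3 patch from the example. The names `patch`, `offsets`, and `code` are chosen for illustration.

```python
import numpy as np

# The 3 x 3 neighborhood from the example; the center pixel is patch[1, 1]
patch = np.array([[10, 15, 12],
                  [19, 12, 11],
                  [20, 16, 12]])
center = patch[1, 1]

# (row, column) offsets from the center, starting directly above and moving clockwise
offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

# '0' if the neighbor is greater than the center, '1' otherwise
bits = "".join("0" if patch[1 + di, 1 + dj] > center else "1" for di, dj in offsets)

code = int(bits, 2)     # convert the 8-bit sequence to a base-10 integer
print(bits, code)       # 01110001 113
```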

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

In [None]:
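# remove solution
from skimage import feature

# 8 neighbors at radius 1; scikit-image's neighbor ordering and comparison rule
# may differ from the steps above, so individual codes need not match a hand computation
lbp = feature.local_binary_pattern(img, 8, 1, method="default")

values, counts = np.unique(lbp.ravel(), return_counts=True)
plt.bar(values, counts)
plt.xlabel('LBP value')
plt.ylabel('count')
plt.title('Frequency of LBP values')
plt.show()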

### <span style="color:red"> Exercise </span>

The maximum possible LBP value is 255 and it seems to come up a lot. 
If we only consider pixels that result in an LBP value of 255, what grayscale values (i.e., the values of the center pixel) are the most common?

Use a bar plot to summarize your answer. Remember to label the axes and title the plot.

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

In [None]:
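# remove solution
# count center-pixel grayscale values among pixels whose LBP value is 255
counts_255 = {g: 0 for g in range(256)}     # not named `dict`, which would shadow the built-in
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        if lbp[i, j] == 255:
            counts_255[int(img[i, j])] += 1

plt.bar(list(counts_255.keys()), list(counts_255.values()))
plt.xlabel('center pixel grayscale value')
plt.ylabel('count')
plt.title('Grayscale values of pixels with LBP value 255')
plt.show()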

**add instructions for wiring the camera and add code from calc 3 labs on capturing a still image**

### <span style="color:red"> Exercise </span>

Connect the camera to the Pico and take at least 3 images of different surfaces (e.g., the desk, floor, your phone case, your hand, etc.).

Perform LBP on your images. Do the frequencies of the LBP values differ much from image to image?

Can you think of a (mathematical) rule to distinguish between two different textures based on LBP?

<h3 style="background-color:lightblue"> Write Answer for Exercise Below </h3>

## <span style="color:green;"> Reflection </span>


__1. What parts of the lab, if any, do you feel you did well? <br>
2. What are some things you learned today? <br>
3. Are there any topics that could use more clarification? <br>
4. Do you have any suggestions on parts of the lab to improve?__

<h3 style="background-color:lightblue"> Write Answers for the Reflection Below </h3>