In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab04.ipynb")

# Intro to Numpy 
Welcome to Lab 4 of DATA 271! In this lab we will get practice with the NumPy library. This lab contains small tasks ("appetizers") for you to make sure you understand the examples. The culminating task ("main course") at the end of the document is more complex, and uses most of the topics you will have worked through.


## Overview
NumPy (short for Numerical Python) is one of Python's most vital libraries for data science.  Its key data format is the array (`ndarray`), which is useful for numerical and scientific computational tasks.  The ndarray is a multidimensional array which provides fast array-oriented arithmetic operations (without having to write loops).  Computations performed this way are called vectorized.  The result is concise, readable code that also runs faster than element-by-element computation in a Python loop.
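
As a rough illustration of what "vectorized" means, the sketch below converts some made-up temperatures twice: once with an explicit loop and once with a single array expression. The data and variable names are invented for this example.

```python
import numpy as np

# Hypothetical temperatures in Celsius, made up for illustration.
celsius = np.array([0.0, 12.5, 25.0, 37.5, 100.0])

# Loop version: convert one element at a time.
fahrenheit_loop = np.empty_like(celsius)
for i in range(celsius.size):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized version: one expression applied to the whole array.
fahrenheit_vec = celsius * 9 / 5 + 32

print(fahrenheit_vec)
print(np.allclose(fahrenheit_loop, fahrenheit_vec))
```

Both versions produce the same values; the vectorized one is shorter and avoids the Python-level loop.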

Having familiarity with array-oriented semantics will help us use future tools (like Pandas) more effectively.  Numpy is the foundation for nearly all numerical libraries for Python. 

The main areas of functionality we will focus on are
- fast vectorized array operations for data munging and cleaning, subsetting, filtering, and transforming
- common array algorithms such as sorting, unique and set operations
- using descriptive statistics and aggregating/summarizing data
- group-wise data manipulations

Numpy also has statistical functions, random number functions, a linear algebra library, and other functionality.

## Appetizers

**Question 0:** Import the numpy module. 

#### 1. Creating arrays explicitly
We can explicitly generate small arrays using the `np.array` function.  The function takes a nested list where each element in the outer list contains the entries for a row in the array.  For larger arrays, it would be tedious to enter data by hand.
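
For instance, here is a small sketch with a made-up 3 x 2 array (not the one asked for below), showing how the nested list maps to rows and which attributes describe the result.

```python
import numpy as np

# A made-up 3 x 2 array: each inner list is one row.
example = np.array([[10, 20],
                    [30, 40],
                    [50, 60]])

print(example.ndim)   # number of axes (dimensions)
print(example.shape)  # (rows, columns)
print(example.size)   # total number of elements
```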

**Question 1:** Create a 2D array \begin{bmatrix}
1 & 6 & 3\\
0 & 2 & 1
\end{bmatrix}
and determine the number of dimensions, the shape, and the size using array attributes. 

In [None]:
my_array = ...

number_of_dimensions = ...
shape_of_my_array = ...
size_of_my_array = ...

In [None]:
grader.check("q1")

#### 2. Creating arrays that follow specific rules
Numpy has a number of functions to create arrays of a specific type. Some examples include:
- `np.zeros((m, n))` creates an array of size m x n with all zeros
- `np.ones((m, n))` creates an array of size m x n with all ones
- `np.full((m, n), i)` creates an m x n array with i in each entry
- `np.eye(n)` creates an n x n array with ones on the diagonal and zeros elsewhere (the identity matrix)
- `np.linspace(i, j, n)` creates an array of n evenly spaced values from i to j (both endpoints included)
- `np.arange(i, j, s)` creates an array of values from i up to but not including j with step size of s
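
A quick sketch of these functions with arbitrary sizes and values chosen only for illustration. Note that several of them return floats unless a `dtype` is given.

```python
import numpy as np

zeros = np.zeros((2, 3))             # 2 x 3 array of 0.0 (floats by default)
ones = np.ones((2, 2), dtype=int)    # dtype=int requests integer entries
filled = np.full((2, 2), 9)          # every entry is 9
identity = np.eye(3)                 # 3 x 3 identity, floats by default
spaced = np.linspace(0, 1, 5)        # 5 values from 0 to 1, endpoints included
stepped = np.arange(10, 20, 5)       # start at 10, step 5, stop before 20

print(spaced)
print(stepped)
```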

**Question 2:** Create the following arrays:
- Assign `A` to a 2 x 3 array filled with the number 7 (int)
- Assign `B` to a 4 x 4 identity array (ones on the diagonal and zeros everywhere else). Make each entry an int. 
- Assign `C` to a one-dimensional array with 5 numbers total which are evenly spaced and with first entry a 2 and last entry a 3
- Assign `D` to a one-dimensional array with the integers 0, 1, 2
- Assign `E` to a 4 x 4 array with the integers 0, 5, 10, 15 on the diagonal and zeros elsewhere.  
Feel free to use online documentation.  If there are multiple ways to create the same array, demonstrate several.

In [None]:
# option 1
A = ...
A
# option 2
A = ...
A

In [None]:
B = ...
B

In [None]:
C = ...
C

In [None]:
# option 1
D = ...
D
# option 2
D = ...
D

In [None]:
# option 1
E = ...
E
# option 2
E = ...
E

In [None]:
grader.check("q2")

#### 3. Indexing and Slicing
Elements and subarrays of NumPy arrays are accessed using the standard square bracket notation that is also used with Python lists.  In general, the expression in the bracket is a tuple where each item in the tuple is a specification of which elements to access from each axis (dimension) of the array.
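
The sketch below shows the bracket notation on two made-up arrays. Indices start at 0, negative indices count from the end, and a slice is written `start:stop:step` with `stop` excluded.

```python
import numpy as np

letters = np.array(["a", "b", "c", "d", "e", "f"])

print(letters[0])     # first element
print(letters[-1])    # last element
print(letters[1:4])   # elements at positions 1, 2, 3
print(letters[::2])   # every other element, starting at position 0

# For a 2D array, give one index or slice per axis: [rows, columns].
grid = np.arange(12).reshape(3, 4)
print(grid[1, 2])       # single element in row 1, column 2
print(grid[:, 0])       # all rows, first column
print(grid[0:2, 1:3])   # rows 0-1, columns 1-2
```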

**Question 3.1:** Create a one dimensional array `array1` with 5 evenly spaced entries starting at 2 and ending at 14 (floats). 

In [None]:
array1 = ...
array1

In [None]:
grader.check("q3_1")

**Question 3.2:** Select the third element of `array1`.

In [None]:
third_element = ...
third_element

In [None]:
grader.check("q3_2")

**Question 3.3:** Select the second to last element of `array1`.

In [None]:
second_to_last = ...
second_to_last

In [None]:
grader.check("q3_3")

**Question 3.4:** Select from the second element to last element of `array1`.

In [None]:
second_element_to_last = ...
second_element_to_last

In [None]:
grader.check("q3_4")

**Question 3.5:** Reverse the order of `array1`.

In [None]:
reversed_array1 = ...
reversed_array1

In [None]:
grader.check("q3_5")

#### 4. Views vs copies
Subarrays extracted from arrays using slicing and indexing are alternative views of the same underlying array data (they are arrays that refer to the same data in memory as the original array).  If elements in a view are assigned new values, the values of the original array are updated as well.  Be aware of this.

If you would prefer to have a copy rather than a view (so you don't overwrite original data), you can use the `.copy()` method.  Then changes to the copy do not affect the original array.
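
A minimal sketch of the difference, using a throwaway array. `np.shares_memory` is used here only to confirm which arrays refer to the same data.

```python
import numpy as np

original = np.arange(6)          # [0 1 2 3 4 5]

# Slicing gives a view: writing through it changes the original.
view = original[1:4]
view[0] = 100
print(original)                  # position 1 now holds 100

# .copy() gives an independent array: the original is untouched.
independent = original[1:4].copy()
independent[1] = -1
print(original)                  # position 2 still holds 2

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```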

**Question 4.1:** Create an array `array2`, \begin{bmatrix}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15
\end{bmatrix}

In [None]:
array2 = ...
array2

In [None]:
grader.check("q4_1")

**Question 4.2:** Make a copy of `array2` called `array2_copy`. 

In [None]:
grader.check("q4_2")

**Question 4.3:** In `array2_copy`, replace the element in the first row and first column with the number $27$.

In [None]:
grader.check("q4_3")

**Question 4.4:** Extract a subarray from `array2_copy` by taking every other element in the second and third rows.

In [None]:
subarray = ...
subarray

In [None]:
grader.check("q4_4")

**Question 4.5:** Update the element in the second row, second column of `subarray` to 12. 

In [None]:
grader.check("q4_5")

**Question 4.6:** Verify that the original array has not been modified. 

In [None]:
grader.check("q4_6")

#### 5. Fancy and Boolean Valued Indexing
Fancy indexing allows us to index an array with another NumPy array or a Python list.  We can also index with boolean values.
In these instances, the array returned is not a view but a new, independent array.
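
Here is a short sketch on made-up values. A comparison such as `values > 10` produces a Boolean array (a mask), and conditions can be combined element-wise with `&`, `|`, and `~`, with each condition wrapped in parentheses.

```python
import numpy as np

values = np.array([12, 7, 30, 5, 18, 21])

# Fancy indexing: pick positions 0, 2, and 5 with a list of indices.
picked = values[[0, 2, 5]]

# Boolean indexing: keep the entries where the mask is True.
mask = values > 10
large = values[mask]

# Combined conditions: greater than 10 AND even.
large_even = values[(values > 10) & (values % 2 == 0)]

# The result of fancy indexing is a new array, so this leaves `values` alone.
picked[0] = -1

print(large)
print(large_even)
print(values)
```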

**Question 5.1:** Create the following one-dimensional array `array3` containing $3,4,6,10,24,89,45,43,46,99,100$. Use Boolean masking to extract all the numbers that are not divisible by 3. 

In [None]:
array3 = ...
not_div_3 = ...
not_div_3

In [None]:
grader.check("q5_1")

**Question 5.2:** Using `array3` and Boolean masking, extract the numbers which are divisible by 5. 

In [None]:
div5 = ...
div5

In [None]:
grader.check("q5_2")

**Question 5.3:** Using `array3` and Boolean masking, extract the numbers which are divisible by 3 and by 5. 

In [None]:
div3_and_5 = ...
div3_and_5

In [None]:
grader.check("q5_3")

**Question 5.4:** Using `array3` and Boolean masking, update the values that are divisible by 3 in the original array to 42.

In [None]:
array3

In [None]:
grader.check("q5_4")

#### 6. Reshaping and Resizing Arrays
When working with data in arrays, it can be useful to rearrange the arrays and alter the way they are interpreted.  For example, an $N \times N$ array can be rearranged into a vector of length $N^2$, or several vectors can be concatenated into a longer vector or stacked into a matrix.  Reshaping an array does not modify the underlying array data, and produces a view of the array whenever possible (if a copy is needed, use `np.copy()`).  The requested new shape must contain the same number of elements as the original array.
The `ravel()` function is a special case of reshape which returns a flattened one-dimensional array.
The functions `vstack()` and `hstack()` allow the joining of arrays either vertically or horizontally.
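
A small sketch of these operations on arbitrary arrays chosen for illustration:

```python
import numpy as np

flat = np.arange(6)                  # [0 1 2 3 4 5]

# 6 elements can be arranged as 2 rows x 3 columns (2 * 3 == 6).
two_by_three = flat.reshape(2, 3)
print(two_by_three)

# ravel() flattens back to one dimension.
back = two_by_three.ravel()
print(back)

a = np.array([1, 2])
b = np.array([3, 4])
stacked_rows = np.vstack((a, b))     # a and b become the rows of a 2 x 2 array
joined = np.hstack((a, b))           # a and b are joined end to end
print(stacked_rows)
print(joined)
```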

**Question 6.1:** Use array methods to reshape the array  `[1,2,3,4,5,6,7,8,9,10,11,12]` into the 2d array below. 

\begin{bmatrix}
1 & 4 & 7 & 10 \\
2 & 5 & 8 & 11 \\
3 & 6 & 9 & 12
\end{bmatrix}

In [None]:
orig_array = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
reshaped_array = ...
reshaped_array

In [None]:
grader.check("q6_1")

<!-- BEGIN QUESTION -->

**Question 6.2:** Is it possible to reshape the previous array into an array with 2 rows and 5 columns? Why or why not?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 6.3:** Create the array below and then flatten it to a one dimensional array containing the numbers 1 through 15 in order. 

\begin{bmatrix}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15
\end{bmatrix}

In [None]:
big_ndarray = ...
flattened_array = ...
flattened_array

In [None]:
grader.check("q6_3")

**Question 6.4:** Create an array called `base_array`, $[1, 2, 3]$ and use array operations to build the following matrices:  
`m1` = 
    $\begin{bmatrix}
1 & 2 & 3\\
1 & 2 & 3 \\
1 & 2 & 3 
\end{bmatrix}$ 

`m2`=
$\begin{bmatrix}
1 & 1 & 1\\
2 & 2 & 2 \\
3 & 3 & 3 
\end{bmatrix}$

In [None]:
base_array = ...
m1 = ...
print(m1)
m2 = ...
print(m2)

In [None]:
grader.check("q6_4")

#### 7. Sorting arrays

**Question 7.1:** Sort each column of the array below.

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$

In [None]:
given_array = ...
sorted_columns = ...
sorted_columns

In [None]:
grader.check("q7_1")

**Question 7.2:** Sort each row of the array below.

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$

In [None]:
sorted_rows = ...
sorted_rows

In [None]:
grader.check("q7_2")

**Question 7.3:** Sort all elements of the array below. (Return a 1d array)

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$

In [None]:
sorted_elementwise = ...
sorted_elementwise

In [None]:
grader.check("q7_3")

**Question 7.4:** Sort the rows of the array below based on the elements in the first column.

$\begin{bmatrix}
3 & 1 & 2 \\
6 & 5 & 4 \\
1 & 7 & 9 
\end{bmatrix}$

*Expected output:*  
$\begin{bmatrix}
1 & 7 & 9 \\
3 & 1 & 2 \\
6 & 5 & 4
\end{bmatrix}$

In [None]:
three_by_three = np.array([[3,1,2],[6,5,4],[1,7,9]])
row_wise_sorting = ...
row_wise_sorting

In [None]:
grader.check("q7_4")

**Question 7.5:** Sort the rows of the array below based on the elements in the first column in descending order.

$\begin{bmatrix}
3 & 1 & 2 \\
6 & 5 & 4 \\
1 & 7 & 9 
\end{bmatrix}$

In [None]:
row_wise_sorting_descending = ...
row_wise_sorting_descending

In [None]:
grader.check("q7_5")

#### 8. Broadcasting
NumPy allows the user to perform element-wise operations on arrays of different shapes by broadcasting them to a common shape. This lets us add a scalar value to each element of an array, or add two arrays of different shapes by automatically expanding the dimensions of the smaller array.
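
As a minimal sketch with made-up arrays: a scalar is stretched to match every element, and a `(2, 1)` column combined with a length-3 row is expanded to a common `(2, 3)` shape.

```python
import numpy as np

row = np.array([1, 2, 3])          # shape (3,)
column = np.array([[10], [20]])    # shape (2, 1)

# The scalar 5 is broadcast to every element of row.
shifted = row + 5
print(shifted)

# column (2, 1) and row (3,) are both broadcast to shape (2, 3).
table = column + row
print(table.shape)
print(table)
```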

**Question 8.1:** In the Fibonacci Sequence, each number in the sequence is the sum of the two numbers that precede it. The sequence goes: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34,...  

Binet's formula (derived by mathematician Jacques Philippe Marie Binet) is an explicit formula used to find the $n$th term of the Fibonacci sequence. It is given by:

$$F_n=\frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n-\left(\frac{1-\sqrt{5}}{2}\right)^n\right)$$

Use NumPy and Binet's formula to make a 1d array containing the first 15 numbers (ints) in the Fibonacci Sequence. 

In [None]:
n = ...
Fn = ...
Fn

In [None]:
grader.check("q8_1")

#### 9. Random Numbers

Numpy provides the random module to work with random numbers from various distributions (e.g., uniform, normal, etc.).
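
A brief sketch of drawing from a few distributions. It uses the legacy `np.random` functions to match the `np.random.seed` call that appears later in this lab; the sizes and parameters are arbitrary. The printed values will generally differ from run to run.

```python
import numpy as np

# Ten rolls of a six-sided die: integers from 1 up to (not including) 7.
rolls = np.random.randint(1, 7, size=10)

# Three draws from the uniform distribution on [0, 1).
uniform_draws = np.random.uniform(0, 1, size=3)

# Three draws from a normal distribution with mean 0 and standard deviation 1.
normal_draws = np.random.normal(loc=0, scale=1, size=3)

print(rolls)
print(uniform_draws)
print(normal_draws)
```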

**Question 9.1:** Write a simulation for flipping a fair coin 5000 times to estimate $P(tails)$. 

In [None]:
prob_tail = ...
prob_tail

In [None]:
grader.check("q9_1")

<!-- BEGIN QUESTION -->

**Question 9.2:** In the previous part, do you get the same answer every time? Explain. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 9.3:** Repeat the first part of this problem but include `np.random.seed(1)` at the beginning of your cell.  Run your code several times and revisit the question.

In [None]:
prob_tail_seed = ...
prob_tail_seed

In [None]:
grader.check("q9_3")

<!-- BEGIN QUESTION -->

**Question 9.4:** In the previous part, do you get the same answer every time? Explain. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Main Course
Air quality is an important factor in public health and environmental studies. One of the key pollutants monitored is PM10 (particulate matter with a diameter of 10 micrometers or less). PM10 comes from sources such as dust, smoke, and vehicle emissions. These tiny particles can be inhaled and may cause respiratory problems, especially for sensitive groups like children and the elderly.

In this exercise, we will analyze daily average PM10 levels recorded at the Jacobs station in Eureka, CA throughout 2024 ([source](https://explore.openaq.org/)). Using NumPy, we will explore and process this dataset to uncover trends and insights.

Run the cell below to import the dataset. This will load three separate NumPy arrays:

- `day`: The day of the month (1–31) for each recorded observation.
- `month`: The month of the year (1–12).
- `pm10`: The daily average PM10 concentration (measured in µg/m³). These PM10 measurements correspond to the month/day in the other arrays.

In [None]:
np.set_printoptions(suppress=True)
month = np.loadtxt("month.csv", delimiter=",")
day = np.loadtxt("day.csv", delimiter=",")
pm10 = np.loadtxt("pm10.csv", delimiter=",")

**Question 10.1:** Currently, we have three separate 1D arrays: `month`, `day`, and `pm10`. Combine them into a single 2D NumPy array, where each row represents a single day's data. The first column should contain month data, second column should contain day, and the third column should contain pm10. 

In [None]:
pm10_data = ...
pm10_data

In [None]:
grader.check("q10_1")

**Question 10.2:** On which day of the year was the PM10 level the highest? Your answer should be a 1d array containing the data (month, day, pm10) for that day.

In [None]:
worst_quality = ...
worst_quality

In [None]:
grader.check("q10_2")

**Question 10.3:** Find the month that had the worst air quality on average. Your answer for `worst_month` should be a single int representing the month that had the highest average pm10 measurement.

In [None]:
unique_months = ...
monthly_avg_pm10 = ...

# Find the month with the highest average PM10
worst_month = ...
worst_month

In [None]:
grader.check("q10_3")

**Question 10.4:** Since PM10 data is recorded daily, we expect 31 days for some months and 30 or 29 days for others. If the data for any day is missing, the difference between consecutive days in a given month will not be 1.

Use NumPy functions to identify the months with missing data (if any). Your final answer should be a 1d array with the month(s) (int) that had missing data. 
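
To illustrate the idea of consecutive differences, here is a toy sketch for a single hypothetical month (not the lab data) using `np.diff`, which returns the difference between each element and the one before it. It does not handle the jump from the end of one month to the start of the next; that part is left to you.

```python
import numpy as np

# Hypothetical day numbers for one month, with days 4 and 5 missing.
toy_days = np.array([1, 2, 3, 6, 7, 8])

# Difference between each day and the previous one.
gaps = np.diff(toy_days)
print(gaps)

# Any gap other than 1 signals a missing day within this month.
has_missing = np.any(gaps != 1)
print(has_missing)
```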

In [None]:
day_gaps = ...
months_with_missing = ...
months_with_missing

In [None]:
grader.check("q10_4")

## Dessert (optional) 
Refer to the Lecture 10 demo if you choose to do this one. 
- choose an image for this task (a photo you took, a photo from the web, etc.).
- display the image.
- display its negative.
- display the image rotated 90 degrees clockwise (different than example).
- display the image as a grayscale image.
- display the image in each of its color channels.
- crop part of the image and display it.  (Choose a cropping that is visually interesting based on your image and experimentation.)
- choose one other image and blend the two images.  Play with weights until you are happy with the results.


# You're done!

Gus is so happy you made it to the end! Run the cell below to download the zip and submit to Canvas. 

<img src="gus_a_loaf_of_bread.JPG" alt="drawing" width="300"/>

### References
If you want to read more about these topics, check out
- Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib by Robert Johansson
- Data Science with Python: Probabilistic Modeling.  https://www.cdslab.org/python/notes/probabilistic-modeling/random-numbers/random-numbers.html

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)