# 🔢 Numerical Python: NumPy (np)

An extremely useful aspect of Python is the ability to import and use libraries. A library in Python is a collection of pre-written code (functions, classes, modules) that someone else has already created.

NumPy is a popular library that helps a lot when it comes to running basic mathematical operations.

In [None]:
%reset -f
# ^In addition to clearing variables, '%reset' also clears any libraries that you have imported.

# Import library (after '%reset')
import numpy as np    # If we want to call a numpy function, we can do this
                      # using the shorthand np that we defined.
                      # Example: 'np.function_name()'

## Basic Math Operations

**Python** recognizes many arithmetic operations that are quick and easy to use:
* Addition: `+`
* Subtraction: `-`
* Multiplication: `*`
* Division: `/`
* Floor Division: `//` (division, but rounds down to the nearest integer)
* Modulo: `%` (provides the remainder from division)
* Power: `**` (Note: Matlab uses `^` instead)
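
As a quick check of the operators above, the sketch below applies each one to the same pair of numbers. The variable names `p` and `q` are placeholders chosen only for this illustration.

```python
# Illustrative values (placeholders, not from the lesson)
p = 7
q = 2

print(p + q)    # Addition        -> 9
print(p - q)    # Subtraction     -> 5
print(p * q)    # Multiplication  -> 14
print(p / q)    # Division        -> 3.5 (always returns a float)
print(p // q)   # Floor division  -> 3
print(p % q)    # Modulo          -> 1
print(p ** q)   # Power           -> 49

# Floor division rounds down (toward negative infinity), not toward zero
print(-p // q)  # -> -4
```

Note the last line: because floor division always rounds down, `-7 // 2` gives `-4` rather than `-3`.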

Additionally, Python has many built-in functions for various mathematical operations:
* `sum(l)`: Returns the sum of the elements in list `l`.
* `min(l)`: Returns the minimum value of the elements in list `l`.
* `max(l)`: Returns the maximum value of the elements in list `l`.
* `len(l)`: Returns the length or number of elements in list `l`.
* `range(n)`: Creates a sequence of integers from `0` to `n-1` with a step size of `1`.
  * Remember that Python goes up to, but doesn't include, the final index, which is why `range(n)` goes to `n-1`.
  * Note that this returns a sequence (a `range` object), which is not a list or array.
  * You can also specify `start` (Default = 0) and `step` (Default = 1) when using this function: `range(start,n,step)`.
* `round(n, ndigits)`: Rounds the number `n` to the number of digits specified by `ndigits`.

Due to Python's versatility, you'll find that many of these functions also work with strings and other variable types. **NumPy** was designed specifically to work with numbers, so the equivalent functions (e.g., `np.sum(l)`, `np.min(l)`, `np.max(l)`, `np.size(l)`, `np.arange(start,stop,step)`, `np.round(l)`), as well as many other NumPy functions, typically run much more efficiently on large sets of numbers than their built-in Python counterparts.
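
To see how the built-in functions line up with their NumPy equivalents, here is a minimal sketch. The list `values` and the array `arr` are made-up names for this illustration, and the timing at the end is only a rough comparison whose numbers will vary from machine to machine.

```python
import timeit
import numpy as np

values = [4, 8, 15, 16, 23, 42]   # hypothetical example data
arr = np.array(values)

# Built-in function vs. NumPy equivalent
print(sum(values), np.sum(arr))     # 108 108
print(min(values), np.min(arr))     # 4 4
print(max(values), np.max(arr))     # 42 42
print(len(values), np.size(arr))    # 6 6

# range() gives a lazy sequence; np.arange() gives an array
print(list(range(0, 10, 2)))        # [0, 2, 4, 6, 8]
print(np.arange(0, 10, 2))          # [0 2 4 6 8]

# Rough timing on a larger data set (results depend on your machine)
big_list = list(range(100_000))
big_arr = np.arange(100_000)
t_builtin = timeit.timeit(lambda: sum(big_list), number=20)
t_numpy = timeit.timeit(lambda: np.sum(big_arr), number=20)
print("built-in sum:", t_builtin, "seconds")
print("np.sum:      ", t_numpy, "seconds")
```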

### 💪 Practice: Understanding the `np.round()` Function
In the textbox below the code cell, explain what the function `np.round()` does and how it works.
* Try it for different values of `n`
* Try it for different values of `ndigits`
* What happens when `ndigits` has a negative value?

In [None]:
# Test out some of the operations and functions described above:
n = (14**3)/17
ndigits = 2

print(n)
print(np.round(n,ndigits))

👋 Double-click here for your response.

## Defining Arrays/Matrices

The efficiency of NumPy's functions comes in handy when you're dealing with large sets of data, known as arrays. Arrays are similar to lists; however, they store numerical information in a manner that is optimized for mathematical operations. Therefore, we will primarily be using arrays (as opposed to lists).


We can use the function `np.array()` to convert a list into an array, as long as all the elements are the same type (typically `int` or `float`). If you have a mixture of `int` and `float` variables in your array, NumPy will default all elements to the `float` variable type.

In [None]:
# Create a 1-D array
l = [1, 2, 3.]
a = np.array(l)

print(l)
print(type(l)) # List

print(a)
print(type(a)) # NumPy array, not a list
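
One way to confirm the type conversion described above is to inspect the `.dtype` attribute of each array. This is a small sketch; the array names `ints` and `mixed` are placeholders.

```python
import numpy as np

ints = np.array([1, 2, 3])      # all integers
mixed = np.array([1, 2, 3.])    # one float in the list

print(ints.dtype)    # an integer type (e.g., int64 on most systems)
print(mixed.dtype)   # float64
print(mixed)         # [1. 2. 3.]  (every element is now a float)
```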

This same function (`np.array()`) can also create multi-dimensional arrays (e.g., 2D Matrix).

In [None]:
# Create a 2-D matrix
M = np.array([[1.1, 1.2, 1.3],[2.1, 2.2, 2.3],[3.1, 3.2, 3.3],[4.1, 4.2, 4.3]])
print(M)

## Indexing & Slicing Matrices

Matrix indexing follows the convention [`row`, `column`]

🧠 A trick for remembering this is that when making a list, you usually write your numbers going downwards first (by row):
1. Item 1
2. Item 2
3. Item 3
4. ...

Then if you needed to make a second list, you could write this list next to your original list (by column). And so on.

Therefore (using Python's convention of counting from 0), if you have an $n \times m$ matrix `M` and you want the element in the second row (row index `1`) and the third column (column index `2`), the index would be `M[1,2]`. This index goes down first (row) and then goes out (column).

In [None]:
# Print original matrix
print(M)
print("\n")   # This creates a space between M and M[1,2]

# Print element at row index 1, column index 2 (Remember that Python counting starts at 0)
print(M[1,2])

Slicing works the same as it did for lists/arrays, except now you need to consider rows and columns separately. The convention `[start:stop:step]` still holds, but is now `M[row_start:row_stop:row_step , col_start:col_stop:col_step]`. Remember that `stop` for slicing goes up to but does not include that index.

In [None]:
# Examples of slicing with a matrix
print(M[1:])              # Row 2 (index = 1) to end. All columns are included because no column slice is specified.
print('\n')               # This line `\n` adds a space between printed matrices

print(M[:,1:])            # Column 2 (index = 1) to end. Notice that we need `:` to signify that all rows are included.
print('\n')

print(M[-1:1:-1,::-1])    # Rows: from the final row (index = -1) back to, but not including, Row 2 (index = 1), in steps of -1.
                          # Columns: all columns, but reversed.

Just like with lists/arrays, elements of a matrix can be redefined.

In [None]:
# Print original matrix
print('Original Matrix: \n', M)
print('\n')

# Redefine value in row 2, column 2
M[1,1] = 100

# Print revised matrix
print('Revised Matrix: \n', M)

### 💪 Practice: Slicing Matrices
Use the `np.array()` function to create a $4 \times 6$ matrix comprised of whatever values you want. Try printing the following slices of this matrix:
* The element in the 2nd row, 4th column.
* The 3rd row only.
* The 5th column only.
* The 1st and 4th rows, but backwards and excluding the first and second columns.

## Common Functions for Array/Matrix Operations

**NumPy is able to tell you properties about your array (`a`) or matrix (`M`) by using attributes**. Unlike functions, these can be called upon by adding `.attribute` to the end of your array or matrix (e.g., `a.size`, `M.shape`)
* `.shape`: Returns a tuple of the array/matrix dimensions.
  * `.shape[0]`: Returns the number of elements in the first dimension of the array/matrix (e.g., length of a 1D array, number of rows of a 2D matrix)
  * `.shape[1]`: Returns the number of elements in the second dimension of the array/matrix (e.g., number of columns of a 2D matrix)
  * `.shape[n]`: Note that arrays can have more than two dimensions. In general, `.shape[n]` gives the number of elements along the dimension with index `n` (i.e., the `(n+1)`-th dimension, since counting starts at `0`).
* `.size`: Returns the total number of elements.
* `.dtype`: Returns the data type of the elements.
* `.T`: Returns the transpose (switches rows and columns) of the matrix.

In [None]:
# Create and Print M
M = np.array([[1.1, 1.2, 1.3],[2.1, 2.2, 2.3],[3.1, 3.2, 3.3],[4.1, 4.2, 4.3]])
print(M)
print('\n')

# Print Various Attributes of M
print('Dimensions of M:',M.shape)
print('# of Rows:',M.shape[0])
print('# of Columns:',M.shape[1])
print('# of Elements:',M.size)
print('Data Type of M:', M.dtype)
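
The cell above does not use `.T` or a third dimension, so here is a short sketch of both. The 3-D array `cube` is a made-up example, created with `np.zeros()` purely to have something with three dimensions.

```python
import numpy as np

M = np.array([[1.1, 1.2, 1.3],
              [2.1, 2.2, 2.3],
              [3.1, 3.2, 3.3],
              [4.1, 4.2, 4.3]])

# Transpose: rows become columns
print(M.shape)      # (4, 3)
print(M.T.shape)    # (3, 4)
print(M.T)

# A hypothetical 3-D array with dimensions 2 x 3 x 5
cube = np.zeros((2, 3, 5))
print(cube.shape)       # (2, 3, 5)
print(cube.shape[2])    # 5 (size of the third dimension)
print(cube.size)        # 30 (2 * 3 * 5 elements in total)
```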

**NumPy also has functions that make it easy to create arrays/matrices**.
* `np.zeros(n)`: Returns an array of length `n` consisting of all `0`'s.
  * This same function can make an $n \times m$ matrix: `np.zeros((n,m))`. The input must be a tuple: `(n,m)`.
  * Alternatively, you could also use `np.zeros_like(M)`, which produces a matrix of `0`'s in the same shape as matrix `M`.
  * This is especially useful if you need to create the structure of an array that you fill in later.
* `np.ones(n)`: Returns an array of length `n` consisting of all `1`'s.
  * This same function can make an $n \times m$ matrix: `np.ones((n,m))`. The input must be a tuple: `(n,m)`.
  * Alternatively, you could also use `np.ones_like(M)`, which produces a matrix of `1`'s in the same shape as matrix `M`.
  * This is especially useful if you need to create an array filled with the same number (e.g., `2*np.ones(n)` gives you an array consisting entirely of `2`'s).
* `np.eye(n)`: Returns an $n \times n$ identity matrix (i.e., `1`'s in the diagonal of the matrix starting from the upper left corner).
* `np.arange(n)`: Creates an array from `0` to `n-1` with a step size of `1`.
  * Remember that Python (and NumPy) goes up to, but doesn't include, the final index, which is why `np.arange(n)` goes to `n-1`.
  * Unlike `range(n)`, this produces an array that you can do math with.
  * You can also specify `start` (Default = 0) and `step` (Default = 1) when using this function: `np.arange(start,n,step)`.
* `np.linspace(start,stop,n)`: Returns an array of length `n` that goes from `start` to `stop` so that all `n` elements are evenly-spaced.
  * By default `n = 50` unless otherwise specified.
  * Note that `np.linspace(start,stop)` goes up to, and includes, `stop` (unlike most other Python/NumPy functions).
* `np.logspace(start,stop,n,base=10)`: Returns an array of length `n` that goes from `base**start` to `base**stop`, where the exponents of the `n` elements are evenly spaced between `start` and `stop`.
  * By default `n = 50` unless otherwise specified.
  * By default `base = 10` unless otherwise specified. Note that to change `base`, you need to put `base = b`, where `b` is the alternative base number that you want.
  * If you want `np.logspace(start,stop,n)` to give you an array that increases by powers of 10 (e.g., 1, 10, 100, 1000 ...), you would need to use the following form: `np.logspace(0,stop,stop+1)`.
* `np.diag([a,b,c],k=0)`: Returns an $n \times n$ matrix, where `n` is the length of the array represented by `[a,b,c]`.
  * `k = 0` means that the elements of the array `[a,b,c]` will be placed directly on the diagonal of the matrix.
  * `k = 1` means that the elements of the array `[a,b,c]` will be placed one position above the diagonal. Note that this increases the size of the matrix to $(n+|k|) \times (n+|k|)$.
  * `k = -2` means that the elements of the array `[a,b,c]` will be placed two positions below the diagonal. Note that this increases the size of the matrix to $(n+|k|) \times (n+|k|)$.
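
The sketch below runs each of the creation functions listed above on small inputs so you can compare the output with the descriptions. All sizes and values are arbitrary choices for illustration.

```python
import numpy as np

print(np.zeros(3))              # [0. 0. 0.]
print(np.ones((2, 3)))          # 2x3 matrix of 1's
print(np.eye(3))                # 3x3 identity matrix

print(np.arange(5))             # [0 1 2 3 4]
print(np.arange(2, 11, 4))      # [ 2  6 10]

print(np.linspace(0, 1, 5))     # 5 evenly spaced values from 0 to 1 (inclusive)
print(np.logspace(0, 3, 4))     # 1, 10, 100, 1000

# np.diag with k = 0 (main diagonal) and k = 1 (one above it)
D0 = np.diag([1, 2, 3], k=0)    # 3x3 matrix
D1 = np.diag([1, 2, 3], k=1)    # 4x4 matrix, because n + |k| = 3 + 1
print(D0)
print(D1)
print(D0.shape, D1.shape)       # (3, 3) (4, 4)
```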

### 💪 Practice: Make a 1D Array
Create an array that goes from `3` to `33` with a step size of `3`.

In [None]:
np.arange(3,34,3)

### 💪 Practice: Make a Quick Matrix

Create a $5 \times 5$ matrix where all the elements are `5`'s.

### 💪 Practice: Create a Diagonal Matrix

Create a $4 \times 4$ matrix where all the diagonal elements are `4`'s, the above diagonal elements are `5`'s, and the below diagonal elements are `3`'s. **Note**: Use the `np.diag()` function to create three separate $4 \times 4$ matrices. Then try combining these matrices by adding them together.

# 👾 Matrix Math

All kinds of data can be stored in matrices (later on we'll call these "DataFrames"). Therefore, **matrix math** is important because it allows us to implement operations on data stored in matrices so that we can process our data and conduct various data analyses.

Matrix math is also essential for linear algebra, which is generally the most computationally efficient way to solve systems of coupled equations. Note that Python/NumPy uses a slightly different notation for matrix operations than Matlab.

## Matrix Methods

Some initial methods and functions that are useful include:
* `.sum()`: Returns the sum of the values in the array/matrix.
* `.min()`: Returns the minimum value in the array/matrix.
* `.max()`: Returns the maximum value in the array/matrix.
* `.mean()`: Returns the average value in the array/matrix.

For each of the methods above, you can specify in the parentheses if you want to implement the method across all the rows in a column (e.g., `.sum(axis=0)`) or across all the columns in a row (e.g., `.sum(axis=1)`).

💡 A trick for remembering this is that `0` is typically associated with the bottom of something, so `axis=0` means doing an operation down (↓) the columns (i.e., applying it to all the rows in each column). Meanwhile, the number `1` looks like a column, so `axis=1` creates a column to the right of our matrix because it's doing an operation across (→) the rows (i.e., applying it to all the columns in each row).

In [None]:
%reset -f
import numpy as np

# Define Matrix M
M = np.array([[1,2,3],[4,5,6]])

# Example of using .sum()
M_sum = M.sum()
M_sum_by_col = M.sum(axis=0)
M_sum_by_row = M.sum(axis=1)

#Printing our Results
print("Original Matrix:\n",M)
print('\n')

print("Total Sum:",M_sum)
print('\n')

print("Sum by Column:",M_sum_by_col)
print('\n')

print("Sum by Row:",M_sum_by_row)

In [None]:
# Matrix Summation
print('Original Matrix')
print(M,'\n')

print('Sum over entire matrix')
print(M.sum())          # sum over all values
print(np.sum(M))        # sum over all values
print('\n')

print('Sum of second row onward')
print(M[1:,:].sum())     # sum of the second row onward
print(np.sum(M[1:,:]))   # sum of the second row onward

### 💪 Practice: Matrix Methods
Given the matrix `M0` use the various methods we have discussed to do the following:
* Print the minimum value of each row.
* Print the maximum value of each column.
* Slice `M0` and take the average of the four elements in the bottom right corner of the matrix.

In [None]:
M0 = np.array([[8,2,3,12],[7,11,5,9],[4,10,1,6]])
print(M0)


## Matrix Operations

By default, NumPy will usually implement matrix operations on an **element-wise basis**, meaning that if you are multiplying two matrices of the same size together (using the `*` symbol), elements of the same index will be multiplied together:

$$
\begin{bmatrix}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{bmatrix}
*
\begin{bmatrix}
b_{11} & b_{12}\\
b_{21} & b_{22}
\end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} & a_{12}b_{12}\\
a_{21}b_{21} & a_{22}b_{22}
\end{bmatrix}
$$

This element-wise behavior applies to all of the following operations:
* Addition: `A + B`
* Subtraction: `A - B`
* Scalar Multiplication: `2*A`
* Multiplication: `A * B`
* Division: `A / B`
* Raising to a Power: `A**2`

In [None]:
# Scalar Multiplication (to Create m1 and m2)
m1 = 2*np.ones((3,3))     # 3x3 matrix of 1s multiplied by the scalar 2 (all 2s)
m2 = 4*np.ones((3,3))     # 3x3 matrix of 1s multiplied by the scalar 4 (all 4s)
print('m1 = \n',m1)
print('\n')
print('m2 = \n',m2)

In [None]:
# Addition & Subtraction
print(m1 + m2) # elementwise addition
print('\n')
print(m2 - m1) # elementwise subtraction

In [None]:
# Multiplication & Division
print(m1 * m2) # elementwise multiplication (In Matlab: "m1.*m2")
print('\n')
print(m1 / m2) # elementwise division (In Matlab: "m1./m2")

In [None]:
# Raising to a Power
print(m2 ** 2) # elementwise power (In Matlab: "m2.^2")

### 🫵 Give it a try!
Practice some of the Matrix Math you have learned by first creating a $4 \times 4$ matrix called `A`, where the diagonal consists of `5`'s, the above diagonal (`+1`) consists of `9`'s, the below diagonal (`-1`) consists of `4`'s, and the remaining elements are `2`'s.
* Then create a vector `a` by multiplying element-wise the first two columns of `A` and subtract element-wise the third column of `A`.
* Finally, create a variable `b` that takes the sum of `a` and divides by the average of the fourth column of `A`.

### 👈 If you're stuck, click the 🔽 button on the left for hints.

#### 🤔 Hint #1

In [None]:
# Recall how we can construct diagonal matrices with the np.diag() function
# This is typically done by constructing individual diagonal matrices and adding them together.
A_diag = np.diag(5*np.ones(4),k=0)
A_above = np.diag(9*np.ones(3),k=1)
# A_below = ...   # Fill this in, using the two lines above as an example.

# This last one is hard, but you basically keep doing the same thing:
A_background = np.diag(2*np.ones(2),k=2)+np.diag(2*np.ones(2),k=-2)+np.diag(2*np.ones(1),k=3)+np.diag(2*np.ones(1),k=-3)

# Now how would you bring all of these together to get A?

#### 😅 Hint #2

In [None]:
A[:,0] # This will get you the first column of A. Recall Python indexing begins at 0.
A[:,1] # This will get you the second column of A.
# Just like variables you can add/subtract/multiply these element-wise
A[:,0]*A[:,1]

# Now you need to incorporate the third column of A

#### 🥲 Hint #3

In [None]:
# Now we can use methods to sum and take the average:
b1 = a.sum()
b2 = A[:,3].mean()
# Now you just need to divide these two to get b.

#### 🥳 Check your solution!

In [None]:
# Create Matrix A
A_diag = np.diag(5*np.ones(4),k=0)
A_above = np.diag(9*np.ones(3),k=1)
A_below = np.diag(4*np.ones(3),k=-1)
A_background = np.diag(2*np.ones(2),k=2)+np.diag(2*np.ones(2),k=-2)+np.diag(2*np.ones(1),k=3)+np.diag(2*np.ones(1),k=-3)
A = A_diag + A_above + A_below + A_background

# Use matrix math and slicing to get vector a.
a = (A[:,0]*A[:,1])-A[:,2]

# Use methods and matrix math to get the final result b.
b = a.sum() / A[:,3].mean()

## Linear Algebra

Matrix math is also essential for linear algebra. Linear Algebra is a branch of mathematics that deals with vectors (1D Arrays), matrices (2D Arrays), and solving systems of linear equations. A system of linear equations is a set of two or more equations that work together to describe a situation with multiple unknowns, like prices, quantities, or measurements.

Each equation shows how the unknowns (like $x$, $y$, or $z$) are related linearly — meaning:
* The unknowns (or variables) are not squared, multiplied together, or inside functions like sine or square root (e.g., $2x^2-5yz$ is not linear).
* They appear just as they are, multiplied by numbers, and then added or subtracted from one another (e.g., $3x+4y-8z=20$ is linear).

To help contextualize this, imagine a friend is selling cookies, cupcakes, and pastries at a Bake Sale. On Day 1, they sold 3 cookies, 2 cupcakes, and 1 pastry for a total of \$20. On Day 2, they sold 2 cookies, 1 cupcake, and 3 pastries for a total of \$19. And on Day 3, they sold 1 cookie, 3 cupcakes, and 2 pastries for a total of \$22. If you wanted to figure out the price of cookies ($x$), cupcakes ($y$), and pastries ($z$) respectively, you could set up a system of linear equations for each day:

$$
3x+2y+1z = 20
$$

$$
2x+1y+3z = 19
$$

$$
1x+3y+2z = 22
$$

You could solve this by substitution or by adding/subtracting equations, but that can be a lot of work. Linear algebra gives us general tools that allow us to solve this system, and even more complicated systems, of linear equations.

If we were to generalize our example from above, a system of linear equations could look like:

$$
a_{11}x_1+a_{12}x_2+a_{13}x_3 = b_1
$$

$$
a_{21}x_1+a_{22}x_2+a_{23}x_3 = b_2
$$

$$
a_{31}x_1+a_{32}x_2+a_{33}x_3 = b_3
$$

Reorganizing this system of equations as a matrix equation gives us:
$$
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3
\end{bmatrix}
$$

In linear algebra, we will simplify the above matrix equation as:
$$
\textbf{A} \textbf{x} = \textbf{b}
$$

Where $\textbf{A}$ is the coefficient matrix, $\textbf{x}$ is the vector of unknowns (the variables we are solving for), and $\textbf{b}$ is the vector of known values on the right-hand side of each equation. Note that capital letters typically indicate matrices ($\textbf{A}$) and lowercase letters typically indicate vectors ($\textbf{x}$, $\textbf{b}$), and the **bold font** indicates that we are dealing with arrays (e.g., 1D vectors and 2D matrices).

Most of linear algebra is solving this equation ($\textbf{Ax}=\textbf{b}$) and using other mathematical techniques to learn more about the system that is described by these equations. Although we discussed a somewhat trivial example of determining prices of bake sale goods, linear algebra allows us to do some incredible things, like optimize water availability in connected reservoir networks, reduce traffic congestion during rush hour, ensure the structural stability of buildings, model pollution transport, and track the spread of disease.

## Important Functions for Linear Algebra

The **Dot Product** takes two vectors of the same length and gives back a single number by multiplying element-wise and summing up the products:

$$\textbf{a}=[2,3], \textbf{b}=[4,5]$$

$$\textbf{a} \cdot \textbf{b} = (2 \cdot 4)+(3 \cdot 5) =23$$

In [None]:
%reset -f
import numpy as np

# Define Parameters
a = np.array([2, 3])
b = np.array([4, 5])

result = np.dot(a, b)   # Dot Product (In Matlab: "dot(a,b)" or "a'*b")

# Print the Dot Product
print(result)

**Matrix Multiplication** is like systematically doing the dot product for rows of an initial matrix (`A`) and columns of a secondary matrix (`B`), thereby producing a third matrix that has the same number of rows as `A` and the same number of columns as `B`. This requires the number of columns in `A` to equal the number of rows in `B`.

$$
A=
\begin{bmatrix}
1 & 2 & 3\\
4 & 5 & 6
\end{bmatrix}
,
\ B=
\begin{bmatrix}
7 & 8\\
9 & 10\\
11 & 12
\end{bmatrix}
$$

The top-left element is from the dot product of the first row of `A` and the first column of `B`:
$$(1 \cdot 7)+(2 \cdot 9)+(3 \cdot 11) = 58$$

The top-right element is from the dot product of the first row of `A` and the second column of `B`:
$$(1 \cdot 8)+(2 \cdot 10)+(3 \cdot 12) = 64$$

The bottom-left element is from the dot product of the second row of `A` and the first column of `B`:
$$(4 \cdot 7)+(5 \cdot 9)+(6 \cdot 11) = 139$$

The bottom-right element is from the dot product of the second row of `A` and the second column of `B`:
$$(4 \cdot 8)+(5 \cdot 10)+(6 \cdot 12) = 154$$

The result from this matrix multiplication is:
$$
\begin{bmatrix}
58 & 64\\
139 & 154
\end{bmatrix}
$$

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

result = A @ B    # Matrix Multiplication (In Matlab: "A*B")
                  # Note that np.dot(A, B) gives you the same result.
print(result)
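
To make the difference between `*` and `@` concrete, the sketch below reuses `A` and `B` from the cell above. Because `A` is $2 \times 3$ and `B` is $3 \times 2$, the element-wise product `A * B` is not defined for these shapes and NumPy raises a `ValueError`, while `A @ B` follows the row-by-column rule.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

# Matrix multiplication: (2x3) @ (3x2) -> (2x2)
print((A @ B).shape)    # (2, 2)

# Reversing the order changes the result: (3x2) @ (2x3) -> (3x3)
print((B @ A).shape)    # (3, 3)

# Element-wise multiplication needs compatible shapes
try:
    A * B
except ValueError as err:
    print("Element-wise product failed:", err)

# Element-wise multiplication works when shapes match, e.g. A with B transposed
print(A * B.T)          # [[ 7 18 33]
                        #  [32 50 72]]
```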

Other important linear algebra functions include:
* `np.linalg.solve(A,b)`: Solve the system of equations `Ax = b`.
* `np.linalg.inv(A)`: Takes the inverse of matrix `A`.
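* `np.linalg.det(A)`: Computes the determinant of matrix `A`, which tells you whether `A` is invertible (a nonzero determinant means `A` is invertible).
* `np.linalg.eig(A)`: Computes the eigenvalues and eigenvectors of matrix `A`, which describe the dynamic behavior of the system of equations described by `A`.
* `np.linalg.norm(A)`: Computes the norm of `A`, which is a measure of length, distance, or error.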

However, it is good to program these functions yourself in order to understand how they work before relying on them.
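
As a rough illustration of the functions listed above, the sketch below solves the bake sale system from earlier. The names `A_bake`, `b_bake`, and `prices` are placeholders for this example.

```python
import numpy as np

# Coefficient matrix: one row per day, columns = cookies, cupcakes, pastries
A_bake = np.array([[3, 2, 1],
                   [2, 1, 3],
                   [1, 3, 2]])
# Total sales for each day
b_bake = np.array([20, 19, 22])

# A nonzero determinant means the system has a unique solution
print(np.linalg.det(A_bake))            # approximately -18

# Solve A x = b for the prices
prices = np.linalg.solve(A_bake, b_bake)
print(prices)                           # approximately [2.72 4.39 3.06]

# Same answer using the inverse (solve() is generally preferred)
prices_inv = np.linalg.inv(A_bake) @ b_bake
print(np.allclose(prices, prices_inv))  # True

# Check: the norm of the residual A x - b should be essentially zero
print(np.linalg.norm(A_bake @ prices - b_bake))
```

With these sales figures, the prices work out to about \$2.72 per cookie, \$4.39 per cupcake, and \$3.06 per pastry.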

# 📊 Data Visualization: Matplotlib.PyPlot (plt)

`matplotlib.pyplot` is a popular Python module used to create a wide variety of plots and visualizations, making it easy to understand and present data. In this section, you'll learn how to use its simple, MATLAB-like functions to quickly build and customize your own plots.

Just like with `numpy`, we need to import this library before we can use it:

In [None]:
# pip install matplotlib   # This is often unnecessary because matplotlib
                           # is pre-installed in the Google Colab environment.

import matplotlib.pyplot as plt

## Basic Structure of a Plot

Before you can make a plot, you first need **data** — usually in the form of two variables: one for the x-axis and one for the y-axis. These variables must be organized in the same way, meaning they should have the **same number of elements** (for example, two arrays that both have 5 values each). Once your data is ready, you can create a simple line plot using `plt.plot(x, y)`.

In [None]:
# Plotting the squared relationship between x (independent) and y (dependent).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y)  # Create a simple line plot by default.

Notice that just using `plt.plot(x,y)` gives some extra information about the plot that you might not want (i.e., `[<matplotlib.lines.Line2D at 0x7aa13961a010>]`). To avoid this, you can use `plt.show()` after all of your plotting code to suppress this output.

In [None]:
# Plotting the squared relationship between x (independent) and y (dependent).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y)  # Create a simple line plot by default.
plt.show()

## Customizing Plots

After plotting the data, you can customize the plot using some of the following basic functions:
* `plt.title("Plot Title")`: Adds a title to the plot.
* `plt.xlabel("x-axis label")`: Adds a label to the x-axis.
* `plt.ylabel("y-axis label")`: Adds a label to the y-axis.
* `plt.legend()`: Adds a legend (used when you set label= in plot commands).
* `plt.show()`: Displays the plot window (especially important outside notebooks).

In [None]:
plt.plot(x, y)  # Create a simple line plot by default.

plt.title("Basic Plot")  # Add a title
plt.xlabel("X-axis")  # Label the x-axis
plt.ylabel("Y-axis")  # Label the y-axis
plt.show()      # Display the plot. Google Colab will display it even without
                # plt.show(), but it may be necessary in other coding environments.

In addition to these basic customizations, you can also customize:
* `plt.xlim([xmin, xmax])`: Sets the limits of the x-axis.
* `plt.ylim([ymin, ymax])`: Sets the limits of the y-axis.
* `plt.legend(loc="best")`: Adds a legend to the plot, using labels you defined with `label=` in your plot commands. The loc parameter controls the legend's location (e.g., `"upper left"`, `"lower right"`, `"best"` automatically picks a good spot).
* `plt.grid()`: Adds gridlines to make the plot easier to read.
* `plt.figure(figsize=(8, 6), dpi=300)`: Creates a new figure with a specified size in inches (`figsize`) and resolution in dots per inch (`dpi`). The default `dpi` is `100`, but increasing it (e.g., to `300`) makes saved images sharper and better for printing.

In [None]:
# Customizing Axes Limits
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y)
plt.xlim(0, 6) # Setting x-axis limit.
plt.ylim(0, 30) # Setting y-axis limit.
plt.title("Custom Axis Limits")
plt.show()

In [None]:
# Customizing the Legend
x_squared = [1, 2, 3, 4, 5]
y_squared = [1, 4, 9, 16, 25]

x_linear = [1, 2, 3, 4, 5]
y_linear = [1, 2, 3, 4, 5]

plt.plot(x_squared, y_squared, label="y = x$^2$")
plt.plot(x_linear, y_linear, label="y = x")
plt.legend(loc="best")          # Add a legend; "best" lets Matplotlib pick the location.
                                # Other options for loc (number or name):
                                #  0 or 'best', 1 or 'upper right',
                                #  2 or 'upper left', 3 or 'lower left',
                                #  4 or 'lower right', 5 or 'right',
                                #  6 or 'center left', 7 or 'center right',
                                #  8 or 'lower center', 9 or 'upper center',
                                #  and 10 or 'center'
plt.show()

In [None]:
# Adding Gridlines
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x,y)
plt.grid()  # Add gridlines
plt.show()

In [None]:
# Changing Figure Size
plt.figure(figsize=(8, 6), dpi=300)	# figsize=(8, 6): Sets figure size to 8
                                    # inches wide by 6 inches tall.

                                    # dpi=300: Increases the default
                                    # resolution from 100 dots per inch to
                                    # 300 dots per inch.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x, y)
plt.title("Figure with Custom Size")
plt.grid()
plt.show()

Lastly, we can also customize the `linestyle` (the pattern of the line), the `color` (color of the line and markers), and `marker` (adding markers at each of your data points). These customizations are done in the `plt.plot()` function itself, after you have specified `x` and `y` (and your `label`).

In [None]:
# Changing Line Style, Color, Marker
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x,y, linestyle="--", color="red", marker="o")
plt.title("Custom Line Style")
plt.grid()
plt.show()

**Table of Common Line Styles, Colors, and Marker Types**

|   **Line Style**    |      **Color**       |     **Markers**     |
|:-------------------:|:--------------------:|:-------------------:|
| Solid Line: '-'     | Blue: 'b', 'blue'    | Point: '.'          |
| Dashed Line: '--'   | Green: 'g', 'green'  | Circle: 'o'         |
| Dash-Dot Line: '-.' | Red: 'r', 'red'      | X: 'x'              |
| No Line: 'None', ' '| Cyan: 'c', 'cyan'    | Plus: '+'           |
|                     | Magenta: 'm', 'magenta' | Triangle: '^'    |
|                     | Yellow: 'y', 'yellow'   | Down Triangle: 'v'|
|                     | Black: 'k', 'black'     | Right Triangle: '>'|
|                     | White: 'w', 'white'     | Left Triangle: '<' |
|                     |                         | Diamond: 'd'       |
|                     |                         | Pentagon: 'p'      |
|                     |                         | Hexagon: 'h'       |
|                     |                         | Star: '*'          |


For more guidance on colors, check out this [website](https://matplotlib.org/stable/users/explain/colors/colors.html#sphx-glr-users-explain-colors-colors-py).

### 🫵 Give it a try!
Create a line plot for `x_squared = [1, 2, 3, 4, 5]` and `y_squared = [1, 4, 9, 16, 25]`; and `x_cubed = [1, 2, 3, 4, 5]` and `y_cubed = [1, 8, 27, 64, 125]`. Be sure to:
* Set your `figsize` to `(8, 6)` and `dpi` to `200`
* Provide a title ("Comparison Plot") and axes labels ("X Values" and "Y Values")
* Label your plots $x^2$ and $x^3$ respectively
* Set limits to your x-axis: `0` to `5`
* Set limits to your y-axis: `0` to `130`
* Add grid lines
* Add a legend and put it in the `upper left` corner.
* Give the `squared` plot: Blue Diamond markers, with Dashed Lines
* Give the `cubed` plot: Cyan Star markers, with Dash-Dot Lines

In [None]:
# Define the data you want to plot:
x_squared = [1, 2, 3, 4, 5]
y_squared = [1, 4, 9, 16, 25]
x_cubed = [1, 2, 3, 4, 5]
y_cubed = [1, 8, 27, 64, 125]


### 👈 If you're stuck, click the 🔽 button on the left for hints.

#### 🤔 Hint #1

In [None]:
# Define the data you want to plot:
x_squared = [1, 2, 3, 4, 5]
y_squared = [1, 3, 5, 7, 20]      # Be sure to adjust this.
x_cubed = [1, 2, 3, 4, 5]
y_cubed = [1, 20, 50, 75, 130]    # Be sure to adjust this.

# Also recall how to add labels:
plt.plot(x_squared, y_squared, label="label 1")   # Be sure to adjust this.
plt.plot(x_cubed, y_cubed, label="label 2")       # Be sure to adjust this.
plt.show()

#### 😅 Hint #2

In [None]:
# Remember how to set figsize and dpi
plt.figure(figsize=(2, 4), dpi=100)   # Be sure to adjust this.

# Remember how to set limits to your x-axis and y-axis
plt.xlim(0,10)    # Be sure to adjust this.
plt.ylim(0,300)   # Be sure to adjust this.

# Remember how to add gridlines and a legend.
plt.grid()
plt.legend()

# Be sure to incorporate this into your other code.

#### 🥲 Hint #3

In [None]:
# Remember how to customize your marker and lines:
plt.plot(x_squared, y_squared, linestyle="--", color="red", marker="o")   # Be sure to adjust this.
plt.plot(x_cubed, y_cubed, linestyle="--", color="red", marker="o")       # Be sure to adjust this.

#### 🥳 Check your solution!

In [None]:
# Define the data you want to plot:
x_squared = [1, 2, 3, 4, 5]
y_squared = [1, 4, 9, 16, 25]
x_cubed = [1, 2, 3, 4, 5]
y_cubed = [1, 8, 27, 64, 125]

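# Set the figure size and resolution:
plt.figure(figsize=(8, 6), dpi=200)

# Plot the data with labels, line styles, colors, and markers:
plt.plot(x_squared, y_squared, label="$x^2$", linestyle="--", color="blue", marker="d")
plt.plot(x_cubed, y_cubed, label="$x^3$", linestyle="-.", color="cyan", marker="*")
plt.title("Comparison Plot")
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.xlim(0, 5)      # Limits of the x-axis
plt.ylim(0, 130)    # Limits of the y-axis
plt.grid()
plt.legend(loc="upper left")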

plt.show()

### 💪 Extra Practice: Scatter Plot of a Math Equation
Use `NumPy` to plot `50` data points of $x \in [-3, 3]$ for $y = x^3 - 3x$ as a scatter plot.

## Common Plot Types

We've already covered line plots, but there are a number of different plot types that we could also use, including:
* **Scatter Plots**: `plt.scatter(x, y)`: creates a plot of individual data points, useful for showing relationships, patterns, or clusters between two variables.

* **Bar Charts**: `plt.bar(x, height)`: displays rectangular bars to compare quantities across different categories; the length of each bar represents its value.

* **Horizontal Bar Charts**: `plt.barh(y, width)`: creates bars that extend horizontally instead of vertically. This is useful when category names are long, or when you want to emphasize comparison across categories in a horizontal layout.

* **Pie Charts**: `plt.pie(sizes, labels=...)`: shows parts of a whole as slices of a circle, useful for illustrating proportion or percentage breakdowns.

In [None]:
# X-Y Scatter Plot
plt.scatter([0, 1, 2, 3], [0, 1, 4, 9], label="y = x$^2$")
plt.scatter([0, 1, 2, 3], [0, 1, 2, 3], label="y = x")
plt.title("Scatter Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()  # Add a legend
plt.grid()
plt.show()

In [None]:
# Bar Chart
categories = ["A", "B", "C"]
values = [3, 7, 5]

plt.bar(categories, values, color="skyblue")
plt.title("Bar Plot Example")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.grid()
plt.show()

In [None]:
# Horizontal Bar Plot
categories = ["A", "B", "C"]
values = [3, 7, 5]

plt.barh(categories, values, color="skyblue")
plt.title("Horizontal Bar Plot Example")
plt.xlabel("Values")
plt.ylabel("Categories")
plt.grid()
plt.show()

In [None]:
# Pie Chart
categories = ["A", "B", "C"]
values = [3, 7, 5]

plt.pie(values, labels=categories)
plt.title("Pie Chart Example")
plt.show()

### 💪 Practice: Create your own Bar Chart
Imagine we are counting the number of samples obtained in different boroughs of New York City. The breakdown of samples is:
* Manhattan: 52
* Brooklyn: 43
* Queens: 11
* The Bronx: 34
* Staten Island: 21

Create a bar chart that visualizes the number of samples taken for each borough.

## Multiple Plots

When you want to create separate figures, you can use `plt.figure()` to create or switch between different figure windows, allowing you to work on separate plots independently without affecting each other.

The syntax is `plt.figure(number)`, where `number` is the figure identifier (e.g., 1, 2, 3, ...). You can use this number to create a new figure or activate an existing one.

In the example below, `plt.figure(1)` creates or activates Figure 1 and draws a line plot, while `plt.figure(2)` creates or activates Figure 2 and draws a scatter plot. Both figures are then displayed together when `plt.show()` is called.

In [None]:
# First figure
plt.figure(1) # Notice the 1 in plt.figure(1)
plt.plot([1, 2, 3], [1, 4, 9])
plt.title("Figure 1")
plt.grid()

# Second figure
plt.figure(2) # Notice the 2 in plt.figure(2)
plt.scatter([1, 2, 3], [1, 2, 3])
plt.title("Figure 2")
plt.grid()

plt.show()

Sometimes you may want to show several plots side by side or in a single figure to compare data more easily. Using functions like `plt.subplot()` or creating multiple figure and axis objects, you can organize multiple plots in rows and columns within one figure, allowing you to clearly highlight similarities or differences between datasets.

The syntax is `plt.subplot(nrows, ncols, index)`, where:
* `nrows`: Number of rows in the grid.
* `ncols`: Number of columns in the grid.
* `index`: Position of the current plot (counted left to right, top to bottom, starting at 1).
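
To see how `index` maps onto grid positions, the sketch below converts each 1-based `plt.subplot` index into a zero-based (row, column) pair using `np.unravel_index`. The 2 x 3 grid and the `positions` dictionary are hypothetical and chosen only for illustration.

```python
import numpy as np

nrows, ncols = 2, 3   # Hypothetical grid size, chosen only for illustration

# plt.subplot counts from 1, left to right and then top to bottom,
# so subtracting 1 gives a zero-based position in row-major order.
positions = {}
for index in range(1, nrows * ncols + 1):
    row, col = np.unravel_index(index - 1, (nrows, ncols))
    positions[index] = (int(row), int(col))
    print(f"index {index} -> row {positions[index][0]}, column {positions[index][1]}")
```

In this grid, index 4 is the first plot of the second row, i.e., row 1, column 0 when counting from zero.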



In [None]:
# Simple Subplots
plt.subplot(1, 2, 1)  # 1 row, 2 columns, 1st plot
plt.plot([1, 2, 3], [1, 4, 9])
plt.title("Plot 1")
plt.grid()

plt.subplot(1, 2, 2)  # 1 row, 2 columns, 2nd plot
plt.plot([1, 2, 3], [1, 2, 3])
plt.title("Plot 2")
plt.grid()

plt.tight_layout()  # Adjust spacing (tight)

# For manual control over spacing, you can adjust using the following:
#plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.3, hspace=0.4)
      # ^left, right: Left and right margins as a fraction of the figure width.
      # ^top , bottom: Top and bottom margins as a fraction of the figure height.
      # ^wspace: Width spacing between subplots (default is 0.2).
      # ^hspace: Height spacing between subplots (default is 0.2).

plt.show()

When you use `plt.subplots()`, you can assign its output to `fig` (which refers to the entire figure or canvas that holds the subplots, axes, titles, legend, etc.) and `axes` (which refers to the individual subplots inside the figure). Separating `fig` and `axes` makes editing these various objects much easier.

In [None]:
# Advanced Subplots
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot([1, 2, 3], [1, 4, 9])
axes[0, 0].set_title("Plot 1")
axes[0, 0].grid()

axes[0, 1].bar(["A", "B", "C"], [3, 7, 5])
axes[0, 1].set_title("Plot 2")
axes[0, 1].grid()

axes[1, 0].scatter([1, 2, 3], [1, 4, 9])
axes[1, 0].set_title("Plot 3")
axes[1, 0].grid()

axes[1, 1].pie([3, 7, 5],labels=["A", "B", "C"])
axes[1, 1].set_title("Plot 4")
axes[1, 1].grid()

plt.tight_layout()
plt.show()

There are many nuances to plotting data using Matplotlib. For a more comprehensive guide see [Matplotlib Tutorial (GeeksforGeeks)](https://www.geeksforgeeks.org/matplotlib-tutorial/).

### 💪 Practice: Create 4 Side-By-Side Plots
Plot the following equations for $x \in [-5, 5]$ in $2 \times 2$ Subplots:
* $y=x$
* $y=x^2$
* $y=x^3$
* $y=x^4$

Be sure to give the entire figure the title "Comparison of Plots" and give each subplot a title showing its equation.

# 🧮 Statistics

Real-world datasets are often large or noisy. Statistics offers us a variety of tools to help us understand these datasets and identify correlations and trends. NumPy has a number of built-in functions that can help describe these datasets:
* `np.mean(data)`: Returns the average value of the array by adding up all the elements and dividing by the number of elements.
* `np.sort(data)`: Sorts the array in ascending order (lowest to highest value).
* `np.median(data)`: Returns the median value of an array by ordering the elements from lowest to highest and returning the value of the element that is at the midpoint in this series (if the dataset has an even number of elements, the median value will be an average of the two elements at the midpoint in the ordered series).
* `np.min(data)`: Returns the minimum value of the array.
* `np.max(data)`: Returns the maximum value of the array.

### 🤝 Characterizing Water Usage with Basic Statistics

Here's an example that uses these functions to characterize water usage of various households in a neighborhood.

* Average Water Usage: $\frac{90+92+88+89+500+91+87}{7}=148.14$ L/day

In [None]:
%reset -f
import numpy as np

water_usage = np.array([90, 92, 88, 89, 500, 91, 87]) # Units: Liters/Day

# Average Water Usage
print(f"Average Water Usage: {np.mean(water_usage):0.2f} L/day")

Water Usage (Ordered Lowest to Highest): 87, 88, 89, 90, 91, 92, 500
* Median (or midpoint of the ordered series): 90

In [None]:
# Sorted in Ascending Order:
print(np.sort(water_usage))

# Median Water Usage
print(f"Median Water Usage: {np.median(water_usage)} L/day")

In [None]:
# Minimum Water Usage
min_water_usage = np.min(water_usage)
print(f"Minimum Water Usage: {min_water_usage} L/day")

# Maximum Water Usage
max_water_usage = np.max(water_usage)
print(f"Maximum Water Usage: {max_water_usage} L/day")
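
The mean (148.14 L/day) sits far above the median (90 L/day) because a single household reported 500 L/day. As a rough sketch of how sensitive each statistic is to that one reading, the cell below recomputes both after dropping it. The 200 L/day cutoff and the name `typical_usage` are assumptions made only for this illustration.

```python
import numpy as np

water_usage = np.array([90, 92, 88, 89, 500, 91, 87])  # Units: Liters/Day

# Keep only readings below an assumed cutoff (200 L/day is arbitrary)
typical_usage = water_usage[water_usage < 200]

mean_all, median_all = np.mean(water_usage), np.median(water_usage)
mean_typ, median_typ = np.mean(typical_usage), np.median(typical_usage)

print(f"All households:  mean = {mean_all:0.2f}, median = {median_all:0.2f} L/day")
print(f"Without outlier: mean = {mean_typ:0.2f}, median = {median_typ:0.2f} L/day")
```

Dropping one value moves the mean from about 148 to 89.5 L/day, while the median only shifts from 90 to 89.5 L/day. With six values left, the median is the average of the two middle values (89 and 90).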

### 🫵 Give it a try!
Use various `NumPy` functions to perform statistical analyses on a set of grades. These analyses should include:
* Lowest Grade
* Highest Grade
* Average Grade
* Median Grade

Print these values in a clear way (e.g., "The lowest grade is ...")

In [None]:
import numpy as np

grades = [88, 61, 72, 99, 80, 92, 77, 78, 67, 95]

# Continue your code here...

### 👈 If you're stuck, click the 🔽 button on the left for hints.

#### 🤔 Hint #1

In [None]:
# Remember the functions for np.min(), np.max()
np.min(grades)
np.max(grades)

#### 😅 Hint #2

In [None]:
# Remember the other functions for np.mean(), np.median()
np.mean(grades)
np.median(grades)

#### 🥲 Hint #3

In [None]:
# Print values clearly
avg_grade = np.mean(grades)

print(f"The average grade is {avg_grade}.")

#### 🥳 Check your solution!

In [None]:
# Import Libraries
import numpy as np

# Grades
grades = [88, 61, 72, 99, 80, 92, 77, 78, 67, 95]

# Statistical Analyses
low_grade = np.min(grades)
high_grade = np.max(grades)
avg_grade = np.mean(grades)
median_grade = np.median(grades)

# Print Statistical Analyses
print(f"The lowest grade is {low_grade}.")
print(f"The highest grade is {high_grade}.")
print(f"The average grade is {avg_grade}.")
print(f"The median grade is {median_grade}.")

### 💪 Extra Practice: Characterize Electricity Usage

Use the NumPy functions `np.mean()`, `np.sort()`, `np.median()`, `np.min()`, and `np.max()` to characterize the following data about electricity usage of various households in a neighborhood.

In [None]:
%reset -f
import numpy as np

elec_usage = np.array([28.4, 31.2, 29.5, 30.7, 27.9, 32.1, 28.6, 30.3, 29.8, 31.0,
                       30.1, 29.3, 30.9, 27.8, 32.4, 28.7, 0.0, 29.6, 31.3, 30.0])

# Continue your code here...

## Analyzing Data Distributions

As we begin to deal with larger datasets, trends/behavior in the data will begin to emerge. Oftentimes, we will describe these characteristics as a distribution — showing which values are more or less common in a dataset. Understanding distributions helps us make sense of patterns like averages, variability, and outliers, which are essential when analyzing real-world data.

* `np.std(data)`: Returns the standard deviation, which measures how spread out the values are from the mean.
* `np.var(data)`: Returns the variance, which is the average of the squared differences of the data elements from the mean. The variance is often calculated in the process of calculating the standard deviation (which is the square root of the variance). Variance provides similar information as the standard deviation about the spread of the data, but is less commonly presented in statistical analyses.
* `np.percentile(data, q)`: Returns the value below which a given percentage `q` (between 0 and 100) of the data falls (e.g., the 90th percentile).
* `np.quantile(data, q)`: Returns the value below which a given fraction `q` (between 0 and 1) of the data falls (i.e., same as percentile, but uses decimals instead of percentage).
* `np.histogram(data,bins)`: Returns a tuple of two arrays: the counts of values in each bin and the bin edges used to divide the data.
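
As a minimal sketch of how variance and standard deviation relate, the cell below computes both by hand on a small made-up array (`data` is hypothetical) and compares them against `np.var()` and `np.std()`. Note that both functions divide by the number of elements by default (`ddof=0`); passing `ddof=1` gives the sample version, which divides by `n - 1`.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # Hypothetical values for illustration

# Variance: average of the squared differences from the mean
mean = np.mean(data)
var_manual = np.mean((data - mean) ** 2)
std_manual = np.sqrt(var_manual)   # Standard deviation: square root of the variance

print(f"Manual: variance = {var_manual:0.2f}, std = {std_manual:0.2f}")
print(f"NumPy:  variance = {np.var(data):0.2f}, std = {np.std(data):0.2f}")

# Sample standard deviation (divides by n - 1 instead of n)
std_sample = np.std(data, ddof=1)
print(f"Sample std (ddof=1): {std_sample:0.2f}")
```

For this array the mean is 5, the variance is 4, and the standard deviation is 2, so the manual and NumPy results agree.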

### 🤝 Understanding Class Grades using Distributions

Analyzing a class's grades on a quiz can help us understand how to use some of these functions. We'll use `np.mean()` and `np.std()` to initially characterize the performance of the class. Then we will use `np.percentile()` and `np.quantile()` to identify performance thresholds (such as the 1st quartile, median, or top 10%) and understand how individuals compare to the rest of the class. Lastly, we will bin the scores into letter grades using `np.histogram()` and plot the performance of the class.

In [None]:
%reset -f
import numpy as np

grades = np.array([
    55, 60, 62, 65, 67, 68, 70, 72, 73, 75, 76, 78, 80, 82, 83, 85, 87, 88, 89,
    90, 91, 93, 95, 98, 100])

# Basic Statistics
print(f"Average Score: {np.mean(grades):0.2f}")
print(f"Standard Deviation: +/- {np.std(grades):0.2f}")

>* The average score on the exam was 79.28 / 100 points.
>* The standard deviation tells us how spread out the scores are: if the scores are roughly normally distributed, "most" (about 68%) of them fall within +/- 12.17 points of the average.
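
One way to check that interpretation is to count how many scores actually land within one standard deviation of the mean. This is a sketch using the same `grades` array; the names `lower`, `upper`, and `within_one_std` are introduced here for illustration.

```python
import numpy as np

grades = np.array([
    55, 60, 62, 65, 67, 68, 70, 72, 73, 75, 76, 78, 80, 82, 83, 85, 87, 88, 89,
    90, 91, 93, 95, 98, 100])

mean = np.mean(grades)
std = np.std(grades)

# Bounds of the interval mean +/- 1 standard deviation
lower, upper = mean - std, mean + std

# Boolean array: True where a score falls inside the interval
within_one_std = (grades >= lower) & (grades <= upper)
fraction = np.mean(within_one_std)   # Mean of True/False values = fraction that are True

print(f"Interval: {lower:0.2f} to {upper:0.2f}")
print(f"Scores inside the interval: {np.sum(within_one_std)} of {grades.size} ({fraction:0.0%})")
```

For this class, 16 of the 25 scores (64%) fall inside the interval, which is close to but not exactly 68%, because the scores only approximately follow a normal distribution.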

In [None]:
# Percentiles
print(f"25th Percentile (Q1): {np.percentile(grades, 25)}")
print(f"50th Percentile (Median / Q2): {np.percentile(grades, 50)}")
print(f"90th Percentile: {np.percentile(grades, 90)}")
print("\n")

# Quantiles (equivalent way to express same idea, but with decimals)
print(f"1st Quartile (Quantile 0.25): {np.quantile(grades, 0.25)}")
print(f"Median (Quantile 0.5): {np.quantile(grades, 0.5)}")
print(f"9th Decile (Quantile 0.90): {np.quantile(grades, 0.90)}")


>* The above outputs demonstrate that `np.percentile()` and `np.quantile()` return the same values; they differ only in whether `q` is given as a percentage (0 to 100) or as a fraction (0 to 1).
>* The output from the 25th Percentile/Quantile tells us the score that 25% of the students scored below.
>* The output from the Median Percentile/Quantile tells us the score that 50% of the students scored below.
>* The output from the 90th Percentile/Quantile tells us the score that 90% of the students scored below.
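
A short sketch of that equivalence, assuming the same `grades` array: both functions accept a list of thresholds, and dividing the percentages by 100 gives the matching quantiles. The last lines count the fraction of students who scored strictly below the 25th percentile as a sanity check. The names `percents`, `pct_values`, `qnt_values`, and `frac_below_q1` are introduced here for illustration.

```python
import numpy as np

grades = np.array([
    55, 60, 62, 65, 67, 68, 70, 72, 73, 75, 76, 78, 80, 82, 83, 85, 87, 88, 89,
    90, 91, 93, 95, 98, 100])

percents = [25, 50, 90]
pct_values = np.percentile(grades, percents)                   # q given as 0 to 100
qnt_values = np.quantile(grades, [p / 100 for p in percents])  # q given as 0 to 1

for p, pv, qv in zip(percents, pct_values, qnt_values):
    print(f"{p}th percentile: {pv:0.1f} | quantile {p / 100:0.2f}: {qv:0.1f}")

# Fraction of students who scored strictly below the 25th percentile
q1 = pct_values[0]
frac_below_q1 = np.mean(grades < q1)
print(f"Fraction below Q1 ({q1:0.1f}): {frac_below_q1:0.2f}")
```

The fraction comes out to 0.24 rather than exactly 0.25 because there are only 25 scores. When a threshold falls between two data points, NumPy linearly interpolates between them by default, which is why the 90th percentile (94.2) is not one of the actual scores.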

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Define bins (cutoffs for F, D, C, B, A)
bins = [0, 60, 70, 80, 90, 100]
labels = ['F', 'D', 'C', 'B', 'A']

# Get histogram counts
counts, bin_edges = np.histogram(grades, bins=bins)

# Plot the histogram
plt.bar(labels, counts)
plt.title('Class Grade Distribution')
plt.xlabel('Letter Grade')
plt.ylabel('Number of Students')
plt.show()


>* The above bar plot helps us visualize the distribution of grades and how they fall into the bins that we defined (`A`, `B`, `C`, `D`, and `F`).

In [None]:
# Alternatively, you could plot using `plt.hist`
plt.hist(grades, bins=[0, 60, 70, 80, 90, 100])
plt.title("Class Grade Distribution")
plt.xlabel("Score Range")
plt.ylabel("Number of Students")
plt.show()

>The above plot uses `plt.hist(data, bins)`, where `data` contains the grades we want to plot on the histogram and `bins` is a list of our bin edges. Note that `plt.hist()` places the score range on the x-axis and draws each bar as wide as its bin, which is why it looks different from the previous bar plot.
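
To inspect the numbers behind both plots, the sketch below prints the `np.histogram()` counts for each letter-grade bin. One detail worth knowing: each bin includes its left edge and excludes its right edge, except the last bin, which includes both. So a score of exactly 60 counts as a `D`, and a score of 100 still counts as an `A`.

```python
import numpy as np

grades = np.array([
    55, 60, 62, 65, 67, 68, 70, 72, 73, 75, 76, 78, 80, 82, 83, 85, 87, 88, 89,
    90, 91, 93, 95, 98, 100])

bins = [0, 60, 70, 80, 90, 100]
labels = ['F', 'D', 'C', 'B', 'A']

counts, bin_edges = np.histogram(grades, bins=bins)

# Pair each letter grade with its bin edges and its count
for label, left, right, count in zip(labels, bin_edges[:-1], bin_edges[1:], counts):
    print(f"{label}: {left} to {right} -> {count} students")

print(f"Total students counted: {counts.sum()}")
```

Scores outside the range covered by `bins` are not counted at all, so comparing the total against the number of students is a quick way to catch missing data.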

### 🫵 Give it a try!

The data below, `rainfall`, describes the daily rainfall amounts (in millimeters) recorded in a city over the course of a month.

* Use `np.mean()` and `np.std()` to initially characterize the rainfall data.
* Use `np.percentile()` (or `np.quantile()`) to identify important thresholds, such as the median rainfall or the top 10% of rainiest days.
* Create a histogram to visualize the distribution of daily rainfall amounts for the city.

In [None]:
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt

# Example daily rainfall amounts (in mm) over a month
rainfall = np.array([0, 5, 12, 0, 20, 18, 0, 0,
                     2, 25, 10, 0, 8, 15, 5, 0,
                     30, 0, 0, 22, 14, 0, 0, 17,
                     12, 0, 0, 9, 3, 0])

# Continue code here...

### 👈 If you're stuck, click the 🔽 button on the left for hints.

#### 🤔 Hint #1

In [None]:
# Recall the functions for obtaining mean and standard deviation
mean_rain = np.mean(rainfall)
std_rain = np.std(rainfall)

#### 😅 Hint #2

In [None]:
# Recall the functions for obtaining the median and top 10% threshold
median_rain = np.percentile(rainfall, 50)
top_10_threshold = np.percentile(rainfall, 90)

#### 🥲 Hint #3

In [None]:
# Recall how to create a histogram
plt.hist(rainfall, bins=8, color='cornflowerblue', edgecolor='black')

# This allows you to create a vertical line
plt.axvline(median_rain, color='orange', linestyle='dashed', linewidth=2, label='Median')

#### 🥳 Check your solution!

In [None]:
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt

# Example daily rainfall amounts (in mm) over a month
rainfall = np.array([0, 5, 12, 0, 20, 18, 0, 0,
                     2, 25, 10, 0, 8, 15, 5, 0,
                     30, 0, 0, 22, 14, 0, 0, 17,
                     12, 0, 0, 9, 3, 0])

# Characterize the data with mean and standard deviation
mean_rain = np.mean(rainfall)
std_rain = np.std(rainfall)

print(f"Mean daily rainfall: {mean_rain:.1f} mm")
print(f"Standard deviation: {std_rain:.1f} mm")

# Identify thresholds using percentiles
median_rain = np.percentile(rainfall, 50)
top_10_threshold = np.percentile(rainfall, 90)

print(f"Median rainfall: {median_rain:.1f} mm")
print(f"Top 10% threshold (90th percentile): {top_10_threshold:.1f} mm")

# Create a histogram
plt.hist(rainfall, bins=8, color='cornflowerblue', edgecolor='black')
plt.axvline(median_rain, color='orange', linestyle='dashed', linewidth=2, label='Median')
plt.axvline(top_10_threshold, color='red', linestyle='dashed', linewidth=2, label='Top 10% threshold')
plt.title("Distribution of Daily Rainfall (Month)")
plt.xlabel("Rainfall (mm)")
plt.ylabel("Number of Days")
plt.legend()
plt.grid(True)
plt.show()

### 💪 Practice: Analyze Temperature in a City

The data below, `temps`, describes the daily high temperatures periodically recorded in a city throughout the year.
* Use `np.mean()` and `np.std()` to initially characterize the data.
* Use `np.percentile()` (or `np.quantile()`) to identify important thresholds, such as the median or top 10%, of the temperature data.
* Create a histogram to visualize the distribution of temperature data for the city.

In [None]:
%reset -f

import numpy as np
import matplotlib.pyplot as plt

# Daily high temperatures for a year
temps = np.array([
    # Winter
    54, 56, 55, 53, 52, 55, 57, 58, 56, 54,
    53, 55, 54, 56, 57, 55, 52, 53, 56, 54,
    55, 54, 53, 52, 56, 57, 58, 56, 55, 54,

    # Spring
    65, 66, 68, 67, 69, 70, 68, 67, 66, 65,
    69, 71, 72, 70, 68, 67, 66, 65, 69, 70,
    71, 72, 73, 72, 71, 70, 69, 68, 67, 66,

    # Summer
    90, 92, 91, 93, 95, 96, 94, 93, 91, 90,
    95, 97, 98, 96, 94, 93, 92, 91, 95, 96,
    97, 98, 99, 108, 110, 97, 95, 94, 92, 91,

    # Fall
    85, 84, 83, 82, 81, 80, 79, 78, 77, 76,
    75, 74, 76, 78, 79, 80, 81, 82, 83, 84,
    85, 86, 85, 84, 83, 82, 81, 80, 79, 78,
])

# Continue your code here...

## Random Number Generators & Distributions

You may encounter instances where it is useful to generate an array of random numbers (e.g., sensitivity/uncertainty analyses, Monte Carlo simulations, random sampling). NumPy has a built-in "random number generator" (`rng`) that allows you to set the bounds, define the distribution shape, and specify the number of random numbers you want.

To introduce this concept, we'll start with a **uniform distribution**, meaning that any value within a range has an equal likelihood of being generated. To achieve this requires two steps: (1) creating a random number generator (`rng`) object and (2) applying that random number generator object to a specific distribution — in this case a uniform distribution:

* `rng = np.random.default_rng()`: This defines `rng` as our random number generator object.
* `rng_array = rng.uniform(lower,upper,n)`: This uses `rng` to draw `n` random values from a uniform distribution between the lower bound `lower` and the upper bound `upper`, and stores them in the array `rng_array`.

In [None]:
%reset -f

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Create Random Number Generator Object
rng = np.random.default_rng()
# Step 2: Use rng to draw values from a uniform distribution
rng_array = rng.uniform(0,1,5)  # lower = 0, upper = 1, n = 5

# Print Result
print(rng_array)

If you run the above code cell multiple times, you'll get different values each time. However, sometimes it can be useful to have "reproducible" randomness. This means that we define a `seed` that our computer will use as a starting place to deterministically (or reproducibly) generate random numbers.
* `rng = np.random.default_rng(seed=42)`: In this case, the `seed` is set to `42`, but this could be any integer.

If this `seed` is not defined (like in the previous case), then NumPy pulls fresh, unpredictable starting values from your operating system each time the generator is created. The results are therefore not reproducible, which is also known as non-deterministic.

Note that if you are using the same `seed` (and the same version of NumPy), you should generate the same random numbers no matter which computer you are running on. This can be extremely useful if you are trying to compare results or debug your code.

In [None]:
# Run this cell multiple times to see the difference between non-deterministic randomness
# and deterministic (or reproducible) randomness.

# Non-Deterministic Randomness
rng_rand = np.random.default_rng()
data_rand = rng_rand.uniform(0,1,10)

# Reproducible Randomness
rng_seed = np.random.default_rng(seed=42)
data_seed = rng_seed.uniform(0,1,10)

# Create Side-By-Side Histograms
fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
# Non-deterministic histogram
axes[0].hist(data_rand, bins=10, edgecolor='black')
axes[0].set_title('Non-Deterministic')
axes[0].set_xlabel('Value')
axes[0].set_ylabel('Frequency')
# Reproducible histogram
axes[1].hist(data_seed, bins=10, edgecolor='black')
axes[1].set_title('Reproducible (Seed=42)')
axes[1].set_xlabel('Value')

plt.suptitle('Comparison of Uniform[0, 1] Random Numbers')
plt.tight_layout()
plt.show()

# The plot on the left should change each time, whereas the plot on the right should remain unchanged.
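
The same comparison can be made numerically instead of visually. This sketch builds two generators with the same seed and two without a seed, then checks whether their outputs match. The names `rng_a`, `rng_b`, `rng_c`, and `rng_d` are introduced here for illustration.

```python
import numpy as np

# Two separate generators created with the same seed
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
same_seed_match = np.array_equal(rng_a.uniform(0, 1, 5), rng_b.uniform(0, 1, 5))

# Two separate generators created without a seed
rng_c = np.random.default_rng()
rng_d = np.random.default_rng()
no_seed_match = np.array_equal(rng_c.uniform(0, 1, 5), rng_d.uniform(0, 1, 5))

print(f"Same seed gives identical arrays: {same_seed_match}")
print(f"No seed gives identical arrays:   {no_seed_match}")
```

Keep in mind that a generator keeps advancing as you use it: calling `uniform()` twice on the same generator returns different values each time. To replay the same sequence, create the generator again with the same seed.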

### Normal Distributions

Another common distribution is the **Normal Distribution** (also known as a Gaussian Distribution). Once you have defined your random number generator object (here called `rng_norm`), you can generate an array of normally distributed values by doing the following:
* `norm_array = rng_norm.normal(mean,std,n)`: This uses `rng_norm` to draw `n` random values from a normal distribution centered on the average value `mean` with a spread set by the standard deviation `std`, and stores them in the array `norm_array`.

In [None]:
# Normal Distribution
rng_norm = np.random.default_rng(seed=25)
norm_array = rng_norm.normal(0,1,500)

# Plot Histogram
plt.hist(norm_array, bins=10, edgecolor='black')
plt.title('Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

We actually encounter Normal Distributions fairly often (hence the name "Normal"). If you are looking to interpret a normal distribution, it's helpful to know the **68-95-99.7 Rule**:
* 68% of all data falls within $\pm 1$ standard deviation of the mean.
* 95% of all data falls within $\pm 2$ standard deviations of the mean.
* 99.7% of all data falls within $\pm 3$ standard deviations of the mean.

Although we can calculate the mean or standard deviation of any distribution, the 68-95-99.7 Rule only applies to Normal Distributions.
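
The rule can be checked empirically. The sketch below draws a large number of samples from a normal distribution and measures the fraction that falls within 1, 2, and 3 standard deviations of the mean. The seed, mean, standard deviation, and sample size are arbitrary choices for illustration, and `rng_check` and `fractions` are names introduced here.

```python
import numpy as np

rng_check = np.random.default_rng(seed=25)
mean, std, n = 0, 1, 100_000
samples = rng_check.normal(mean, std, n)

fractions = {}
for k in [1, 2, 3]:
    # True where a sample lies within k standard deviations of the mean
    inside = np.abs(samples - mean) <= k * std
    fractions[k] = float(np.mean(inside))
    print(f"Within +/- {k} std: {fractions[k]:0.1%}")
```

Because the samples are random, the printed fractions land near 68%, 95%, and 99.7% rather than exactly on them.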

### 🫵 Give it a try!

You administered an exam to a class of 100 students. The average grade on this exam was 78.4 / 100 points and the standard deviation was 12.1 points. Create a normal distribution using the random number generator to simulate the grades of the class. Plot your result as a bar plot or histogram.

### 👈 If you're stuck, click the 🔽 button on the left for hints.

#### 🤔 Hint #1

In [None]:
# Begin by initializing rng_exam
rng_exam = np.random.default_rng(seed=42) # It's up to you if you want to define the seed
                                          # or let your solution be non-deterministic.

#### 😅 Hint #2

In [None]:
# Use the information in the problem to create exam_array

# Define parameters
mean = 78.4
std = 12.1
n_students = 100

# Remember we want a normal distribution
exam_array = rng_exam.normal(mean,std,n_students)

#### 🥲 Hint #3

In [None]:
# Plot the Results (Bar Plot so that we can define grades)

# Define bins (cutoffs for F, D, C, B, A)
bins = [0, 60, 70, 80, 90, 100]
labels = ['F', 'D', 'C', 'B', 'A']

# Get histogram counts
counts, bin_edges = np.histogram(exam_array, bins=bins)

# Plot the histogram
plt.bar(labels, counts)
plt.title('Class Grade Distribution')
plt.xlabel('Letter Grade')
plt.ylabel('Number of Students')
plt.show()

#### 🥳 Check your solution!

In [None]:
# Initializing rng_exam
rng_exam = np.random.default_rng(seed=42) # It's up to you if you want to define the seed
                                          # or let your solution be non-deterministic.

# Define Parameters
mean = 78.4
std = 12.1
n_students = 100

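# Define Normal Distribution
exam_array = rng_exam.normal(mean, std, n_students)
# A normal distribution has no upper or lower limit, so clip scores to 0-100;
# otherwise scores above 100 would fall outside the bins and not be counted.
exam_array = np.clip(exam_array, 0, 100)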

# Define bins (cutoffs for F, D, C, B, A)
bins = [0, 60, 70, 80, 90, 100]
labels = ['F', 'D', 'C', 'B', 'A']

# Get histogram counts
counts, bin_edges = np.histogram(exam_array, bins=bins)

# Plot the histogram
plt.bar(labels, counts)
plt.title('Class Grade Distribution')
plt.xlabel('Letter Grade')
plt.ylabel('Number of Students')
plt.show()

### 💪 Extra Practice: Daily Water Consumption

A group of 100 households was surveyed to measure daily water consumption per household. The average daily water consumption was 250 liters, with a standard deviation of 40 liters. Simulate this data using a normal distribution and plot it to visualize how household water usage is distributed.

### Other Distributions

In addition to uniform and normal distributions, there are a variety of other distributions we can create once we define `rng`:
* `int_array = rng_int.integers(lower,upper,n)`: Creates a uniform distribution, but with only integers. By default, `upper` itself is not included.
* `lognorm_array = rng_lognorm.lognormal(mean,std,n)`: Creates a lognormal distribution, where `mean` and `std` describe the underlying normal distribution.
* `exp_array = rng_exp.exponential(scale,n)`: Creates an exponential distribution, where `scale` is the inverse of the rate (the number of events per unit time), i.e., the average time between events.
* `bn_array = rng_bn.binomial(n_sim,p,n)`: Creates a binomial distribution (e.g., the number of heads when flipping a coin), where `n_sim` is the number of times the coin is flipped per simulation, `p` is the probability of getting heads, and `n` is the number of simulations run.
* `mn_array = rng_mn.multinomial(n_rolls,pvals,n)`: Creates a multinomial distribution (e.g., the number of times each face comes up when rolling a die), where `n_rolls` is the number of times the die is rolled per simulation, `pvals` is the list of probabilities of rolling each side of the die, and `n` is the number of simulations run.

In [None]:
# Set up common parameters
seed = 25
n = 500  # Number of samples or simulations

# Create RNGs
rng_lognorm = np.random.default_rng(seed)
rng_exp = np.random.default_rng(seed)
rng_bn = np.random.default_rng(seed)
rng_mn = np.random.default_rng(seed)

# ----- Lognormal Distribution -----
mean = 0     # Mean of the underlying normal distribution
std = 0.5    # Standard deviation of the underlying normal distribution
lognorm_array = rng_lognorm.lognormal(mean, std, n)

# ----- Exponential Distribution -----
scale = 1.0  # Scale = 1 / lambda (e.g., average time between events)
exp_array = rng_exp.exponential(scale, n)

# ----- Binomial Distribution -----
n_sim = 10   # Number of coin flips per simulation
p = 0.5      # Probability of heads
bn_array = rng_bn.binomial(n_sim, p, n)

# ----- Multinomial Distribution -----
die_faces = 6
pvals = [1/die_faces] * die_faces  # Fair 6-sided die
n_rolls = 10                       # Number of dice rolls per simulation
mn_array = rng_mn.multinomial(n_rolls, pvals, n)  # Shape: (n, 6)
mn_totals = mn_array.sum(axis=0)  # Sum counts for each face across all simulations

# Plot side-by-side histograms
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Lognormal
axes[0, 0].hist(lognorm_array, bins=10, edgecolor='black')
axes[0, 0].set_title('Lognormal Distribution')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')

# Exponential
axes[0, 1].hist(exp_array, bins=10, edgecolor='black')
axes[0, 1].set_title('Exponential Distribution')
axes[0, 1].set_xlabel('Time Between Events')

# Binomial
axes[1, 0].hist(bn_array, bins=np.arange(n_sim + 2) - 0.5, edgecolor='black')
axes[1, 0].set_title('Binomial Distribution')
axes[1, 0].set_xlabel('Number of Heads in 10 Flips')

# Multinomial (sum of face counts across simulations)
axes[1, 1].bar(np.arange(1, 7), mn_totals, edgecolor='black')
axes[1, 1].set_title('Multinomial Distribution (Die Rolls)')
axes[1, 1].set_xlabel('Die Face')
axes[1, 1].set_ylabel('Total Counts over 500 Sims')
axes[1, 1].set_xticks(np.arange(1, 7))

plt.suptitle('Histograms of Different Random Distributions', fontsize=14)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
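
The cell above does not include `rng.integers()`, so here is a minimal sketch that uses it to simulate rolls of a six-sided die. Because `upper` is excluded by default, `integers(1, 7, n)` yields values from 1 through 6. The names `rolls`, `faces`, and `face_counts` are illustrative, and the seed is an arbitrary choice.

```python
import numpy as np

rng_int = np.random.default_rng(seed=25)
n = 500   # Number of die rolls

# Uniformly distributed integers from 1 up to (but not including) 7
rolls = rng_int.integers(1, 7, n)

# Count how many times each face appeared
faces, face_counts = np.unique(rolls, return_counts=True)
for face, count in zip(faces, face_counts):
    print(f"Face {face}: {count} rolls")
```

If an inclusive upper bound reads more naturally, `rng_int.integers(1, 6, n, endpoint=True)` covers the same range of values.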