<h2 style="color: #EC4000;"> numpy </h2>

`numpy` is a popular module for mathematical calculations, and `matplotlib` is a popular module for data visualization and being misspelled as `mathplotlib`. Usually these two along with `pandas` form the trio of usual suspects when it comes to what you use for data processing. `pandas` is optional for this course though, so I won't be going over it. I will go over `numpy` (this set of notes) and `matplotlib` (the other set of notes named `module_twelve_matplotlib`), but it won't be incredibly in-depth. Questions involving those modules have historically been quite straightforward: they're not anything super complex but still require you to have a good understanding of the basics.

I will be going over content *useful for this course* within the modules' quick start guides:
- `numpy` quickstart guide: https://numpy.org/doc/stable/user/quickstart.html
- `matplotlib` quickstart guide: https://matplotlib.org/stable/users/explain/quick_start.html

Note that neither module is in the standard library, so you will have to install them. Generally, installing from the terminal using `pip`/`pip3`/`conda` is recommended.

`pip` installation:

```shell
pip install numpy
pip install matplotlib
```

`pip3` installation:
```shell
pip3 install numpy
pip3 install matplotlib
```

`conda` installation:
```shell
conda install numpy
conda install matplotlib
```

There should be a PDF with instructions on how to install both modules in your Canvas course for your section.

<h5 style="color: #EC4000;"> numpy overview </h5>

The core of `numpy` lies in using multi-dimensional arrays (mathematically speaking, matrices). You have seen multi-dimensional arrays before in the form of nested/multi-dimensional lists. A list of lists is a 2D list specifically, and `numpy` allows you to make a 2D `numpy` array using a 2D list. I'll get more into the details in a bit, but just keep the last sentence in mind. Some students have mentioned in the past that mentally treating a `numpy` array as a super fancy Python list helped them learn this module more easily (and to be honest, a `numpy` array *is* a super fancy Python list).

<h5 style="color: #EC4000;"> Making the array </h5>

An array in `numpy` can be created as the result of a `numpy` operation or directly using `numpy.array()`. The former will be discussed later, but for now, I will focus on the latter. Example 12N-1 showcases a 1D array and a 2D array:  

In [3]:
# Example 12N-1: Making the data

"""
Since you will usually be using a lot of numpy functions (and thus have to type "numpy.function()" a LOT), it is easier to abbreviate numpy as np to reduce visual clutter and to save your fingers from typing so much.
"""
import numpy as np

a = np.array([2, 0, 2, 5])
b = np.array([[1, 2, 3], 
             [4, 5, 6], 
             [7, 8, 9]])

print(a)
print(b)

[2 0 2 5]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that there is always exactly *one* list that gets passed into `array()`. `a` is formed using one 1D list, and `b` is formed using one 2D list. If I wrote `np.array(2, 0, 2, 5)` or `np.array([1, 2, 3], [4, 5, 6], [7, 8, 9])`, `numpy` would get angry at me and raise an error. Also, my formatting for `b` is perfectly legal. For 2D lists, I like writing each inner list on a separate line since that makes visualizing the 2D list and its dimensions much easier.
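
To see what "angry" looks like in practice, here is a small sketch that tries the first invalid call and catches the error. The `raised` flag is just an illustrative name, and the exact wording of the error message may differ between `numpy` versions.

```python
import numpy as np

# Valid: exactly one list is passed in
a = np.array([2, 0, 2, 5])
print(a)

# Invalid: four separate numbers instead of one list
raised = False
try:
    np.array(2, 0, 2, 5)
except TypeError as err:
    raised = True
    print("numpy complained:", err)
```

Wrapping the numbers in square brackets is all it takes to fix the call.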

Now the cool thing is, when I print the arrays, the commas are removed!

<h5 style="color: #EC4000;"> Array attributes </h5>

Every `numpy` array carries attributes that describe it: accessing any one of them gives you the corresponding piece of information about the array. Though the guide lists six of them, only the first four are relevant to us at the moment:

- `ndim`: number of axes (dimensions)
- `shape`: the size of the array along each dimension, given as a tuple
- `size`: the total number of elements within the whole array
- `dtype`: the datatype of the elements within the array

Note that for `ndim`, I introduced the term "axes" (plural form of "axis"). It is important to note that when we refer to the axes, we are just talking about the dimensions *themselves*, not the sizes of the dimensions (which are shown with `shape`). The following example showcases the four attributes:

In [6]:
# Example 12N-2: numpy attributes

import numpy as np

c = np.array([[1, 2, 3],
              [4, 5, 6]])

print("ndim:", c.ndim)
print("shape:", c.shape)
print("size:", c.size)
print("dtype:", c.dtype)

ndim: 2
shape: (2, 3)
size: 6
dtype: int32


As expected of a 2D array, there are two dimensions. For a 2D array, the shape is defined as `(rows, columns)` or more typically as `(rows, cols)`.  Do not think of `(x, y)` since that would be `(cols, rows)`: just think of how you would process a 2D list normally. You would go through each inner list (each row) then each element (each column) in each row. `c` has 2 rows and 3 columns, and thus its shape is `(2, 3)`. The total number of elements is just the product of all the numbers in the `shape`, so 2x3 = 6. 
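
The relationships between the attributes can be checked directly. This is a small sketch reusing `c` from Example 12N-2; the names `rows`, `cols`, `r`, and `col` are only for illustration.

```python
import numpy as np

c = np.array([[1, 2, 3],
              [4, 5, 6]])

# Unpack the shape in (rows, cols) order
rows, cols = c.shape

# size is the product of the numbers in shape
print(rows * cols == c.size)

# ndim is how many numbers there are in shape
print(len(c.shape) == c.ndim)

# Row first, then column: the same order as processing a 2D list
for r in range(rows):
    for col in range(cols):
        print(f"c[{r}][{col}] = {c[r][col]}")
```

The loop visits the elements in the same order as they appear when the array is printed.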

The important part of the `dtype` is the part before the number, so you just need to worry about the `int`. This shows that `c` consists of just `int` values. The 32 is the number of bits, which tells you *the size* of the `int` (or of any numeric data type). You may see `int64` instead of `int32` on your machine, since the default integer size depends on your platform and `numpy` version. Generally speaking, the more bits, the larger the range of numbers the datatype can support. For `int`, this is better visualized if you consider that the range of numbers is from `-(2^(n-1))` to `(2^(n-1)) - 1` where `n` is the number of bits. Do you really need to know the full range given `n` bits? Not now, but if you end up dealing with very large or very precise numbers, then ensuring the data type is large enough can be important.
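
If you want to convince yourself of the range formula, the sketch below compares it against `np.iinfo()`, which reports the limits of a `numpy` integer type. Note that Python writes powers with `**` (not `^`), and the names `int_types`, `smallest`, and `largest` are only for illustration.

```python
import numpy as np

# (number of bits, matching numpy integer type)
int_types = [(8, np.int8), (16, np.int16), (32, np.int32), (64, np.int64)]

for n, int_type in int_types:
    # The formula from the notes
    smallest = -(2 ** (n - 1))
    largest = 2 ** (n - 1) - 1

    # The limits numpy actually uses for this type
    info = np.iinfo(int_type)
    print(n, smallest, largest, info.min == smallest, info.max == largest)
```

Each line should end in `True True`, meaning the formula and `numpy` agree.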

<h5 style="color: #EC4000;"> Zeroes and Ones </h5>

What if you wanted to make an array with just zeroes or just ones given the array shape? Well, we actually have functions for those: `zeros()` for zeroes (note the spelling: there is no "e" in the function name) and `ones()` for ones. These are great functions to quickly make a default array that you can adjust later.

In [11]:
# Example 12N-3: Zeroes and Ones

import numpy as np

blank = np.zeros((2, 3, 4))
full = np.ones((2, 3, 4))

print(blank)
print("----")
print(full)

[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]
----
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]


Using tuples instead of lists to pass in the shape is recommended for clarity. One, the `shape` attribute is a *tuple* (so it makes sense to pass one in for the shape though you can pass in a list and the code will be fine), and two, it's so you don't confuse the code with `np.array()` which directly turns your list into a `numpy` array. This is just a personal recommendation, but it's pretty common for people to use tuples for shapes (the quickstart guide also does this).
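
Here is a quick sketch of the confusion the tuple habit avoids. The variable names are only for illustration.

```python
import numpy as np

# A tuple and a list both work as the shape for zeros()
from_tuple = np.zeros((2, 3, 4))
from_list = np.zeros([2, 3, 4])
print(from_tuple.shape, from_list.shape)

# The same list passed to array() is treated as DATA, not as a shape
not_a_shape = np.array([2, 3, 4])
print(not_a_shape.shape)
```

The first two arrays are 3D with 24 elements each, while the last one is a 1D array holding the three numbers 2, 3, and 4.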

The easiest way to read a shape is *backwards*. So using (2, 3, 4) as an example:
1. Starting at the end, I see 4. This means I will start with a list of 4 elements
2. I go backwards and I see 3. This means I want exactly 3 lists with 4 elements
3. I go backwards and I see 2. This means I want exactly 2 lists of 3 lists with 4 elements each

You can then form the array visually using this logic:

```python
# 1. List of four elements
[0, 0, 0, 0]

# 2. 3 of the above in a list
[[0, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0]]

# 3. 2 of the above in a list
[[[0, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 0]],
  
 [[0, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 0]]]
```

And that's the output for `blank` from Example 12N-3. Note that the actual output has `float`s instead of the `int`s in my typed-out version above. We can verify this with Example 12N-4.

In [12]:
# Example 12N-4: What dtype is blank?

import numpy as np

blank = np.zeros((2, 3, 4))
print(blank.dtype)

float64


`zeros()` and `ones()` default to floats, but if you really need integer 0s, then you can specify the datatype (which I don't believe you need to *memorize*, just recognize):

In [13]:
# Example 12N-5: Specifying the dtype

import numpy as np

blank = np.zeros((2, 3, 4), dtype=np.int64)
print(blank.dtype)
print(blank)

int64
[[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]]


The decimal points disappeared and the datatype is now `int64`. If you forget how to adjust the datatype, you can always manually create the multi-dimensional list, fill it with 0s (or whatever number), then make a `np.array()` with that list:

In [14]:
# Example 12N-6: The hard way

import numpy as np

# Store information
multi_dim_list = []
shape = (2, 3, 4)

for x in range(shape[0]):
    # Store the lists of 0s
    sub_list = []
    for y in range(shape[1]):
        # Make a list with shape[2] 0s
        inner_most_list = [0] * shape[2]

        # Add to sublist
        sub_list.append(inner_most_list)
    # Add sublist to main list
    multi_dim_list.append(sub_list)

print(multi_dim_list)

# Form the array
blank = np.array(multi_dim_list)
print(blank.dtype)
print(blank)


[[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]
int32
[[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]]


Also notice that printing the 3D list results in a single line of text that is much less readable than the neatly formatted output when printing out the 3D array.
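
If you are comfortable with list comprehensions, the nested loops from Example 12N-6 can be written more compactly. This is just an alternative sketch that builds the same list, not something you are required to use.

```python
import numpy as np

shape = (2, 3, 4)

# Innermost comprehension: a list of shape[2] zeroes
# Middle comprehension: shape[1] of those lists
# Outer comprehension: shape[0] of those lists of lists
multi_dim_list = [[[0 for _ in range(shape[2])]
                   for _ in range(shape[1])]
                  for _ in range(shape[0])]

blank = np.array(multi_dim_list)
print(blank.shape)
print(blank)
```

Reading the comprehension from the inside out follows the same "read the shape backwards" logic as before.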

<h5 style="color: #EC4000;"> arange and reshape </h5>

Remember how `range()` effectively gives you a list of numbers that you can iterate over using a `for` loop? There's something similar in `numpy` called `arange()`, which takes the same start, stop, and step arguments but gives you an array instead. One key difference is that `arange()` also accepts *floating point arguments*, though Example 12N-7 sticks to integers:

In [15]:
# Example 12N-7: arange()

import numpy as np

print(np.arange(10))
print("---")
print(np.arange(1, 10))
print("---")
print(np.arange(1, 10, 2))

[0 1 2 3 4 5 6 7 8 9]
---
[1 2 3 4 5 6 7 8 9]
---
[1 3 5 7 9]
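
For the floating point side of `arange()`, here is a small sketch. The step of 0.25 was chosen on purpose because it can be stored exactly, so no rounding surprises show up here; `quarters` is just an illustrative name.

```python
import numpy as np

# range() refuses floats...
try:
    range(0, 1, 0.25)
except TypeError as err:
    print("range() says:", err)

# ...but arange() accepts them
quarters = np.arange(0, 1, 0.25)
print(quarters)
print(quarters.dtype)
```

The array holds 0, 0.25, 0.5, and 0.75. Just like with `range()`, the ending number is excluded.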


Now you may be thinking, this isn't super useful since it can only make 1D arrays. What if I wanted more dimensions? Remember how the `size` attribute gave you the number of elements and that it is equal to the product of the dimension sizes from the `shape` attribute? If you know how many elements you have, you can just define a valid shape that would give you the same `size`. For example, `np.arange(10)` has 10 elements, so a valid shape would be `(2, 5)` as a 2 by 5 array has 10 elements. The function you need to actually reshape this array is called `reshape()`:

In [17]:
# Example 12N-8: Reshaping a 1D array

import numpy as np

print(np.arange(10))
print("---")
print(np.arange(10).reshape(2, 5))
print("---")
print(np.arange(10).reshape(3, 4))

[0 1 2 3 4 5 6 7 8 9]
---
[[0 1 2 3 4]
 [5 6 7 8 9]]
---


ValueError: cannot reshape array of size 10 into shape (3,4)

As you can see from the error message, if your shape supports a different number of elements than the number of elements generated by `arange()`, then you get an error because `numpy` either doesn't have enough space for all the elements or doesn't know what to fill the extra space with. With the above example, `(3, 4)` implies 12 elements (3 x 4 = 12), but I only have 10 elements in my array generated by `arange(10)`. Since there is a size mismatch, an error is raised.
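
You can run the same check yourself before reshaping. The sketch below uses `math.prod()` from the standard library (Python 3.8 or newer) to multiply the numbers in each candidate shape; the list of shapes is just a made-up set of examples.

```python
import math
import numpy as np

values = np.arange(10)

for shape in [(2, 5), (5, 2), (10, 1), (3, 4)]:
    needed = math.prod(shape)
    if needed == values.size:
        # Same number of elements, so reshape() is safe
        print(shape, "works:", values.reshape(shape).shape)
    else:
        print(shape, "needs", needed, "elements, but there are", values.size)
```

Only `(3, 4)` fails the check, which matches the error from Example 12N-8.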

By the way, if you have way too many elements to be realistically printed, then `numpy` will just give you the boundaries (the corners) of the array:

In [18]:
# Example 12N-9: Corners!

import numpy as np

print(np.arange(10000).reshape(100, 100))

[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]


Each "..." marks where `numpy` skipped values: inside a row, it stands in for the columns between the value to its left and the value to its right, and on a line of its own, it stands in for the rows between the row above it and the row below it.

<h5 style="color: #EC4000;"> linspace </h5>

`arange()` is very powerful, but it has one major flaw: floating point error. Python is always susceptible to floating point error (the rounding errors that plagued Labs 1 and 2), and `arange()` may give you strange results if you make it generate floats. The main issue is that this may also *change how many elements you have*. `linspace()` is a function very similar to `arange()` except for two things:
- The ending number is *included*
- The third argument is no longer the step but rather *the size*

So `linspace()` (not `linespace()`, a common misspelling) gives you exactly the number of values you ask for between the two endpoints (inclusive), no matter what. It works out the spacing for you.

In [5]:
# Example 12N-10: linspace

import numpy as np

print(np.linspace(1, 21, 8))

[ 1.          3.85714286  6.71428571  9.57142857 12.42857143 15.28571429
 18.14285714 21.        ]


This allows you to more safely use `reshape()` since you are certain what the `size` of the array is. The auto-spacing is also nice (the output has the decimal points aligned so the numbers are easier to read). Unlike `arange()`, which works with just one argument, `linspace()` always needs both the starting number and the ending number. The size technically defaults to 50 if you leave it out, but you should always provide it so the size is never a surprise.
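
To see the floating point problem with `arange()` in action, here is a small sketch. The numbers 1, 1.3, and 0.1 are a commonly cited example of this behavior; `stepped` and `spaced` are just illustrative names.

```python
import numpy as np

# You might expect 1.0, 1.1, 1.2 (3 values, since the end is excluded)
stepped = np.arange(1, 1.3, 0.1)
print(stepped, len(stepped))

# linspace() is told the size directly, so there is no surprise
spaced = np.linspace(1, 1.2, 3)
print(spaced, len(spaced))
```

Because of rounding error, `arange()` ends up producing 4 values here instead of the 3 you would expect, while `linspace()` gives exactly the 3 that were asked for.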

<h5 style="color: #EC4000;"> Recognizing basic functions </h5>

There are *a lot of functions* in `numpy`. Based on the most recent time I checked the quickstart guide (August 28, 2024), the quickstart guide lists:
- 18 functions for array creation
- 5 functions for conversions
- 20 functions for manipulations
- 4 for questions
- 8 for ordering
- 12 for operations
- 4 for basic statistics
- 5 for basic linear algebra

And this is just the quickstart guide's *list of just some of the useful functions*. Do not panic, you do not need to memorize all of them (yet... nah I'm messing with you, you might only need to memorize a couple here and there), but you should recognize the obviously named ones. `max()` gives you the maximum value, for example.

In [6]:
# Example 12N-11: max()

import numpy as np

print(np.array([1, 5, 2, 3]).max())

5


If you are reading a question and you don't recognize the function right away, make a good guess based on the name of the function. 99.99% of the time, your guess will be almost, if not exactly, correct. Feel free to play with some of the other functions mentioned in Zybooks, but I won't cover them myself. `array()`, `arange()`, `linspace()`, and `reshape()` are essential for forming and shaping your data, which is why I talked about them. The others are more for manipulating and doing stuff with the data, so I think it's best not to go too in-depth with them all (it's quite a deep rabbit hole if I got to fully yap about everything).
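
As a quick test of the "guess from the name" strategy, try predicting what each line below prints before running it. The array is the same one from Example 12N-11, and `data` is just an illustrative name.

```python
import numpy as np

data = np.array([1, 5, 2, 3])

print(data.min())   # smallest value
print(data.sum())   # all values added together
print(data.mean())  # the average
```

If your guesses were 1, 11, and 2.75, then the names told you everything you needed.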

<h5 style="color: #EC4000;"> Basic Operations </h5>