![Erudio logo](img/erudio-logo-small.png)
---
![NumPy logo](img/numpy-logo-small.png)

# What is NumPy?

NumPy is the basic foundation for (almost) all fast numeric computation within the Python ecosystem.  It provides multi-dimensional arrays, vectorized operations on elements, and static typing of "unboxed" numbers.

For Python programmers, NumPy requires reframing our thinking about problems.  Unlike with Python lists, sets, dictionaries, and other standard data structures, NumPy arrays contain a fixed number of elements, each of the same datatype.

Moreover, in NumPy we rarely loop over data elements.  Rather, we *vectorize* operations by performing them simultaneously on many or all of the data in an array.

## Uses of NumPy

- Image and signal processing
- Linear algebra
- Data transformation and query
- Time series analysis
- Statistical analysis

## NumPy Ecosystem

We can usefully think of several *layers* for Python numeric computing. At the base is NumPy. Built on top of that are general-purpose libraries that use NumPy, such as Pandas, Matplotlib, and SciPy.  Above those are many domain-specific or purpose-specific libraries and tools.

![NumPy ecosystem](img/numpy-ecosystem.png)

# Array Creation

Arrays have two essential elements: a shape and an element type.  The element type (or `dtype`) is fixed for the life of an array.  Likewise, the *size* of an array (its total number of elements) is fixed for its life, but its *shape* is not.  These design choices are a large part of what lets NumPy operations run very quickly on contiguous blocks of memory.

Technically, a NumPy array can have at most 32 *dimensions* (as of version 1.26).  In practice, you will probably never use more than 5 or 6 dimensions, but you are very likely to create arrays with millions of *elements* (their size).

The first two dimensions are often called *rows* and *columns*. The third dimension is sometimes called *panels* or *planes*.  Higher dimensions are usually just named by number.

![2D Numpy array](img/numpy-zeros-2D.png)

## Array Creation Examples

### Create a 2-D array of all zeros

In [None]:
import numpy as np

In [None]:
arr = np.zeros(shape=(5,6))
print(arr)

In [None]:
print(arr.shape)
print(arr.size)

### Create an array based on a Python list-of-lists-of-lists

In [None]:
arr = np.array([[[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 8, 7, 6]],
                [[-1, -2, -3, -4],
                 [-5, -6, -7, -8],
                 [-0, -3, -6, -9]]])
print(arr)

In [None]:
print(arr.shape)
print(arr.size)

### Create an array based on a Python iterable

In [None]:
arr = np.array(range(17, 100, 2))
print(arr)

In [None]:
print(arr.shape)
print(arr.size)
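`range` works with `np.array()` because it is a sequence with a known length. A general iterator, such as a generator expression, behaves differently. The sketch below (the variable names are just for illustration) uses `np.fromiter()`, which consumes an iterable into a 1-D array of a dtype you specify, and contrasts it with what `np.array()` does with a generator:

```python
import numpy as np

# A generator expression yielding the odd numbers from 17 to 99
odds_gen = (n for n in range(17, 100) if n % 2 == 1)

# np.fromiter consumes the iterator into a 1-D array of the given dtype
arr_from_gen = np.fromiter(odds_gen, dtype=np.int64)
print(arr_from_gen.shape)

# By contrast, np.array() does not iterate over a generator: it wraps the
# generator object itself in a 0-dimensional array of dtype object
wrapped = np.array(n for n in range(3))
print(wrapped.shape, wrapped.dtype)
```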

## Short exercise (array creation)

Try to create a 3-D array containing all the odd numbers from 17 to 99.

In [None]:
# Complete this code
arr_odd_3d = ...

# NumPy Datatypes

Every element of an array has the same datatype.  If you were looking carefully at the above examples, you may have noticed that some of the arrays contained some kind of integer, and others contained some kind of floating point value.

NumPy makes some guesses about the datatype (called `dtype`) you want, depending on how an array is created, but you also have fine-grained control.
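A minimal sketch of that inference (variable names are illustrative). The width of the default integer type can vary by platform and NumPy version, so treat the printed integer dtype as an example rather than a guarantee:

```python
import numpy as np

# NumPy infers a dtype from the values it is given
ints = np.array([1, 2, 3])          # some integer type (width is platform-dependent)
floats = np.array([1, 2.5, 3])      # a single float makes every element float64
complexes = np.array([1, 2j])       # complex128
flags = np.array([True, False])     # bool

for a in (ints, floats, complexes, flags):
    print(a, a.dtype)

# An explicit dtype overrides the guess
print(np.array([1, 2, 3], dtype=np.float32).dtype)
```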

## Array `dtype` Examples

In [None]:
arr = np.ones(20, dtype=bool)
print(arr)

In [None]:
arr = np.arange(20, dtype=float)
print(arr)

In [None]:
arr = np.array(range(20, 0, -1), dtype=complex)
print(arr)

## Array `dtype` options

Python's built-in types can be passed as a `dtype`; each one maps to a NumPy type whose default bit-length may depend on the machine you are working on.  NumPy also provides many more specific types.

```{list-table} Python type and NumPy dtype equivalence
:header-rows: 1
:widths: 80 80

* - Python Type
  - NumPy dtype
* - `bool`
  - `np.bool_`
* - `int`
  - `np.int_`
* - `float`
  - `np.float64`
* - `complex`
  - `np.complex128`
```

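* Integers can be 8, 16, 32, or 64 bits, either signed (`np.int8` through `np.int64`) or unsigned (`np.uint8` through `np.uint64`).
* Floats can be 16, 32, or 64 bits (128 bits on some platforms).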
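To see what those bit-lengths mean in practice, `np.iinfo()` and `np.finfo()` report the limits of integer and floating-point types. A short sketch:

```python
import numpy as np

# Integer limits depend on bit-length and signedness
for t in (np.int8, np.uint8, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.bits, info.min, info.max)

# Float characteristics: bit-length, machine epsilon, and largest value
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.eps, info.max)
```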

[Full list of supported types](https://docs.scipy.org/doc/numpy/user/basics.types.html)

Looking only at the types, not their bit-lengths, we have:

![NumPy dtype hierarchy](img/dtype-hierarchy.png)
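The hierarchy in the figure can also be queried in code. A small sketch using `np.issubdtype()`, which checks whether one dtype falls under a more general category:

```python
import numpy as np

# np.issubdtype tests where a dtype sits in the hierarchy
print(np.issubdtype(np.int32, np.integer))         # True
print(np.issubdtype(np.uint8, np.signedinteger))   # False: uint8 is unsigned
print(np.issubdtype(np.float32, np.floating))      # True
print(np.issubdtype(np.complex64, np.number))      # True
print(np.issubdtype(np.bool_, np.number))          # False: bool is not a number type

# It also works on the dtype of an existing array
print(np.issubdtype(np.arange(5).dtype, np.integer))   # True
```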

## More Array `dtype` Examples

In [None]:
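# Extended precision: np.float128 does not exist on every platform (it is
# missing on Windows, for example); np.longdouble names the platform's
# extended-precision type portably
arr1 = np.ones(10, dtype=np.longdouble)
arr1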

In [None]:
# uint8 can hold only 0..255: look at what happens to the values past 255
arr2 = np.arange(0, 500, 50, dtype=np.uint8)
arr2

In [None]:
print(f"Array dtypes: {arr1.dtype}; {arr2.dtype}")
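An array's dtype cannot change, but `astype()` returns a *new* array converted to another dtype. A sketch (the values are arbitrary) showing a conversion to float and a narrowing integer conversion in which out-of-range values wrap around:

```python
import numpy as np

arr = np.array([300, -1, 7])

# astype() returns a new array converted to the requested dtype
as_float = arr.astype(np.float32)
print(as_float, as_float.dtype)

# Converting to a narrower integer type keeps only the low bits,
# so out-of-range values wrap around (300 -> 44, -1 -> 255)
as_uint8 = arr.astype(np.uint8)
print(as_uint8)

# The original array is unchanged
print(arr)
```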

## Short exercise (array creation with typing)

Try to create a 3-D array containing all the odd numbers from 17 to 99. But this time, create the numbers as complex numbers (with a zero imaginary component) using extended-precision components (or double precision if your platform does not support 128-bit extended precision).

In [None]:
# Complete this code
arr_odd_ext_complex_3d = ...

# Reshaping arrays

A very powerful feature of NumPy arrays is that they can change shape without changing the memory block where they live.  For a contiguous array like the ones created here, treating it as a different shape, perhaps even with a different number of dimensions, does not require copying any data.

In [None]:
arr_3d = np.array([[[1, 2, 3, 4],
                    [5, 6, 7, 8],
                    [9, 8, 7, 6]],
                   [[-1, -2, -3, -4],
                    [-5, -6, -7, -8],
                    [-0, -3, -6, -9]]])
print(arr_3d)

In [None]:
print("Shape:", arr_3d.shape)
print("Size:", arr_3d.size)
print("Dims:", arr_3d.ndim)

We can reshape the array and bind the new *view* of it to another name.

In [None]:
arr_2d = arr_3d.reshape(4, 6)
print(arr_2d)

In [None]:
print("Shape:", arr_2d.shape)
print("Size:", arr_2d.size)
print("Dims:", arr_2d.ndim)

In [None]:
arr_1d = arr_3d.reshape(24)
print(arr_1d)

In [None]:
print("Shape:", arr_1d.shape)
print("Size:", arr_1d.size)
print("Dims:", arr_1d.ndim)

## Views of the same data

Something very interesting happened when we reshaped our array.  It becomes more evident when we modify values using one of the names.  Below we also see the special kind of indexing that NumPy uses, with commas separating the offsets in each dimension.

In [None]:
arr_1d[10] = 999
print("1D\n", arr_1d)
print('----------')
print("2D\n", arr_2d)
print('----------')
print("3D\n", arr_3d)

Let us also change data through a different view:

In [None]:
arr_3d[1, 1, 3] = 777
arr_3d[0, 2, 1] = 444
print("1D\n", arr_1d)
print('----------')
print("2D\n", arr_2d)
print('----------')
print("3D\n", arr_3d)
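The shared data can also be checked directly with `np.shares_memory()`. The sketch below builds its own small array (named `base`, purely for illustration) and also shows two related `reshape()` behaviors: a `-1` dimension is inferred from the size, and a shape whose size does not match raises `ValueError`:

```python
import numpy as np

base = np.arange(24).reshape(2, 3, 4)

# reshape() returns a view when it can: the same memory, a new shape
flat = base.reshape(24)
print(np.shares_memory(base, flat))   # True

# One dimension may be given as -1; NumPy infers it from the size
print(base.reshape(4, -1).shape)      # (4, 6)

# The new shape must have the same total size (24 != 25)
try:
    base.reshape(5, 5)
except ValueError as err:
    print("ValueError:", err)
```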

Notice that we can also *retrieve* values with a comma-separated index, not only set them.

In [None]:
arr_3d[1, 1, 3]

In [None]:
arr_3d[0, 0, 0]

In [None]:
arr_2d[1, 4]
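For C-ordered (row-major) arrays like these, the correspondence between a multi-dimensional index and an offset into the flat view can be computed rather than worked out by hand. A sketch using `np.ravel_multi_index()` and `np.unravel_index()` with the shape of `arr_3d`; the shape is written out so the block runs on its own:

```python
import numpy as np

shape_3d = (2, 3, 4)   # the shape of arr_3d above

# Offset into the flat (1-D) view that corresponds to the index [1, 1, 3]
flat_offset = np.ravel_multi_index((1, 1, 3), shape_3d)
print(flat_offset)   # 19, i.e. 1*12 + 1*4 + 3

# And back again: the 3-D index of flat offset 19
idx_3d = np.unravel_index(19, shape_3d)
print(idx_3d)
```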

## Short exercise (array reshaping)

* First, create a 3-D array containing all the odd numbers from 17 to 99.
* Next, view the array as having 6 rows and 7 columns.
* Change each of the four "corners" of this 2-D view to contain the value 0 instead of what it had before.
* Answer two questions:
  * In the 1-D view of the array, what are the offsets of the zeros?
  * In the 3-D view of the array, what are the offsets of the zeros?
* Verify your answers by displaying the scalar at each offset you indicated.

In [None]:
# Create the array...

# View as 2-D...

# Change the values in the corners...

# Determine the offsets into 1-D view...

# Determine the offsets into 3-D view

# Ways of creating arrays

In the above examples, we created NumPy arrays using several different construction functions.  Let us review these and describe a number of additional functions.

## Zeros and ones

The functions `np.zeros()` and `np.ones()` create arrays with those respective values for all elements.  Such an array is a starting point, and we will typically modify selections of values in various ways thereafter.

In [None]:
# Specify shape and dtype on construction
np.ones(shape=(3, 7), dtype=np.float32)

In [None]:
np.zeros(shape=(2, 2, 2, 2), dtype=np.int8)
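Two related helpers are worth knowing: `np.full()` fills an array with any value, and the `*_like` functions take their shape and dtype from an existing array. A brief sketch (the `template` array is arbitrary):

```python
import numpy as np

# np.full() fills every element with a value of your choosing
sevens = np.full(shape=(2, 3), fill_value=7, dtype=np.int16)
print(sevens)

# The *_like variants copy the shape and dtype of an existing array
template = np.arange(6, dtype=np.float32).reshape(2, 3)
print(np.zeros_like(template))
print(np.ones_like(template).dtype)   # float32
print(np.full_like(template, 0.5))
```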

## Empty array

`np.empty()` is extremely fast because it simply requests some amount of memory from the operating system and then *does nothing with it*.  Thus, the array returned by `np.empty()` is *uninitialized*. `np.empty()` is useful if you know you are going to fill up all the elements of your array later, but use with caution.

In [None]:
# DANGER!  uninitialized array 
# (re-run this cell and you will very likely see different values)
np.empty(shape=(15,3), dtype=int)
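A typical safe use of `np.empty()` is a preallocated output buffer that a computation is guaranteed to fill completely. A sketch, assuming the buffer is filled by a ufunc through its `out=` argument (`np.empty_like()` is the `*_like` counterpart of `np.empty()`):

```python
import numpy as np

values = np.arange(6, dtype=np.float64)

# Preallocate an uninitialized buffer with the same shape and dtype as `values`
result = np.empty_like(values)

# The ufunc writes every element of `result`, so its arbitrary
# initial contents are never observed
np.multiply(values, 2.0, out=result)
print(result)
```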

## Value ranges

`np.arange()` generates sequences of numbers like Python's `range()` built-in, and interprets its arguments the same way:

  * a single value is a stopping point
  * two values are a starting point and a stopping point
  * three values are a start, a stop, and a step size

As with `range()`, the stopping point is *not* included.  Unlike `range()`, `np.arange()` also accepts floating-point arguments, but non-integer step values may lead to unexpected results; for these cases, you may prefer `np.linspace()`.

In [None]:
print("int arg:", np.arange(10))     # cf. range(stop)
print("float arg:", np.arange(10.0)) # cf. range(stop)
print("step:", np.arange(0, 12, 2))  # end point excluded
print("neg. step:", np.arange(10, 0, -1.0))
print("small step:", np.arange(1, 3, 0.3333)) 

Notice that the last `arange` never reaches the stop value 3, and does not include exactly 2 either: because 0.3333 is not exactly one third, the fourth value is 1.9999.
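When you need evenly spaced non-integer values from `np.arange()`, one common workaround is to generate integers and then divide, so each value is rounded only once. A sketch comparing that with `np.linspace()`, whose `endpoint=False` option excludes the stop value just as `arange` does:

```python
import numpy as np

# Thirds from 1 up to (but excluding) 3, built from an integer range
# and then divided, so each value is rounded only once
thirds = np.arange(3, 9) / 3
print(thirds)

# The same six points from linspace, excluding the end point
print(np.linspace(1, 3, 6, endpoint=False))

# The float-step version never hits 2.0 exactly
print((np.arange(1, 3, 0.3333) == 2.0).any())   # False
```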

## Ranges with closed interval

In contrast to `np.arange()`, `np.linspace()` and `np.logspace()` take the number of points you want rather than a step size, and by default include both ends of the range.  Matching that exactly with `np.arange()` is difficult or impossible because of floating-point rounding.  `np.logspace()` is similar to `np.linspace()`, but spaces its points evenly on a log scale.

In [None]:
# linspace chooses the spacing so that both end points are included
print("0-10 with 4 points:\n", np.linspace(0, 10, 4))
print("0-10 with 20 points:\n", np.linspace(0, 10, 20))

In [None]:
from math import e as ℯ
print("Powers of 10:", np.logspace(0, 3, 4))
print("Natural log scale:", np.logspace(0, 3, 4, base=ℯ))
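Two details of these functions, sketched briefly: `np.logspace()` produces the same points as raising its base to the values from `np.linspace()`, and `np.linspace()` can return the spacing it used via `retstep=True`:

```python
import numpy as np

# logspace(start, stop, num) matches base ** linspace(start, stop, num)
print(np.allclose(np.logspace(0, 3, 4), 10.0 ** np.linspace(0, 3, 4)))   # True

# linspace can also report the spacing it used between points
points, step = np.linspace(0, 10, 4, retstep=True)
print(points, step)
```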

Ranges with defined ends are often useful for plotting.  We will look at graphing and calculations in other lessons in this course; here is just a quick example.

In [None]:
%matplotlib inline
from math import pi as π
from numpy import sin, cos
import matplotlib.pyplot as plt

x = np.linspace(-π, π, 100)
y = sin(x) + cos(2*x)
plt.plot(x, y);

## Short exercise (creating ranges)

* Create an array of 400 64-bit floating point values between 0 and 2π (for example, angles in radians).
* Create an array where each row represents a different quadrant of a full 360° rotation.  Each row should contain 100 data points as radians.
* Extra Credit: Create an array where each *column* represents a different quadrant of a full 360° rotation.

In [None]:
# Array of radians
arr_rad = ...

# 2-D array of quadrants
arr_quads = ...

## Diagonal arrays:  `np.eye` and `np.diag`

`np.eye(N)` produces an array with shape (N, N) and ones on the diagonal (an N×N identity matrix). Given a 1-D array or a sequence such as a list, `np.diag()` produces a 2-D array with those values on the diagonal; given a 2-D array, it extracts the diagonal instead. So for a 1-D input `v`, `np.diag(np.diag(v))` gives back `v`.

In [None]:
print("Identity matrix:")
print(np.eye(3))

In [None]:
print("Diagonal matrix:")
print(np.diag([3, 2, 1]))

In [None]:
# diagonal of an identity matrix ...
np.diag(np.eye(3))
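Both functions also accept an offset `k` to work with diagonals above (`k > 0`) or below (`k < 0`) the main one. A short sketch:

```python
import numpy as np

# k shifts the diagonal: k=1 is just above the main diagonal
print(np.eye(3, k=1))

# np.diag() also accepts k, placing the values on an offset diagonal
print(np.diag([5, 6], k=-1))

# On a 2-D input, np.diag() extracts a diagonal instead of building one
m = np.arange(9).reshape(3, 3)
print(np.diag(m))        # [0 4 8]
print(np.diag(m, k=1))   # [1 5]
```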

## Arrays from Random Distributions

It is common to create arrays whose elements are samples from a random distribution.  For the many options, see:

  * `help(np.random)`
  * [NumPy Random](https://docs.scipy.org/doc/numpy/reference/random/index.html)

In [None]:
print("Uniform on [0,1):")
np.random.random((2, 5))

In [None]:
print("standard normal:")
print(np.random.standard_normal((2, 5))) # call with tuple
print("randn (equiv):")
print(np.random.randn(2, 5)) # argument per dimension

In [None]:
print("Uniform ints on [0,5) - upper open:")
print(np.random.randint(0, 5, (2, 5)))

:::{note}
NumPy recommends that new code use the methods of a [Generator](https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator) instance, created with `np.random.default_rng()`, rather than the legacy `np.random` functions shown above.
:::
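A sketch of the same three kinds of draws using a `Generator`. The seed value 42 is arbitrary; seeding only makes the results reproducible from run to run:

```python
import numpy as np

# A seeded Generator gives reproducible results (42 is an arbitrary seed)
rng = np.random.default_rng(seed=42)

uniform = rng.random((2, 5))                 # floats on [0, 1)
normal = rng.standard_normal((2, 5))         # standard normal samples
ints = rng.integers(0, 5, size=(2, 5))       # ints on [0, 5), upper open

print(uniform, normal, ints, sep="\n")
```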

## Short exercise (random arrays)

* Assuming colors are represented by successive positive integers, create a 3×3×3 cube where each cell is a random color.
* Create a diagonal array on a 5×5 matrix where each element of the diagonal is a random "color."
* Extra Credit: Create an array of the distinct numbers from 1 to 100 in random order
* Extra Credit: Draw a 20×20 array of values from a Poisson distribution.

In [None]:
# Colors
RED, GREEN, BLUE = 1, 2, 3

# Cube of colors
arr_colors = ...

# Diagonal of colors
arr_diag = ...

---

Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors