# Practice with NumPy

Author: Greg Wray  
2025-MAR-22 

In [None]:
import numpy as np
import sys
from datetime import date
import math
import random

## NumPy

**NumPy** is a library of data structures and functions for working with arrays and matrices in Python. It is widely used in science and engineering. NumPy is a foundational library, not only used on its own, but required for many other libraries. 

NumPy provides a large set of functions and methods written in C that provide improved performance over base Python functions. This includes a full set of math, logic, and statistics functions, as well as functions specific to working with arrays and linear algebra. 

Access the official NumPy documentation [here](https://numpy.org/doc/stable/index.html).

## Introducing ndarrays   

The fundamental data structure provided by NumPy is the **ndarray**, which stands for n-dimensional array. As the name implies, ndarrays can be created with any number of dimensions: 1-dimensional (vector), 2-dimensional (matrix), 3-dimensional, and so forth. 

In contrast to base Python iterables, ndarrays have a fixed size that is specified at creation.  

An ndarray holds a single data type; it can be any valid Python data type or object, although most commonly numeric and Boolean. While ndarrays can hold strings and even data structures, this is not their primary intended purpose.
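
The fixed size and single data type described above can be seen in a small sketch. The variable names (`mixed`, `ints`, `longer`) are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# mixed integers and floats are stored as one common type (float64 here)
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# assigning a float into an integer array truncates it to fit the array's type
ints = np.array([1, 2, 3])
ints[0] = 7.9
print(ints)

# "appending" does not grow the array; it builds a new, larger array
longer = np.append(ints, 4)
print(ints.size, longer.size)
```

Because an existing array cannot grow, repeatedly appending inside a loop is usually slower than creating an array of the final size up front.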

Importantly, ndarrays are optimized for **speed** and for **vectorized** operations. 
   
To illustrate the performance difference, we can use `%timeit`, a Jupyter "magic" that measures the time required to execute a Python statement. Note that the ndarray is typically processed 1-2 orders of magnitude faster. Try some other operations for comparison. 

In [None]:
# create a big list and a big ndarray
my_list = list(range(1_000_000))
arr_A = np.arange(1_000_000)

In [None]:
# apply a function to each item and time the operation
%timeit my_list_2 = [x * 2.333 for x in my_list]

In [None]:
# apply a function to each item and time the operation
%timeit arr_A_2 = arr_A * 2.333
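
Outside of Jupyter, the `%timeit` magic is not available. A rough equivalent, assuming the standard-library `timeit` module and a smaller array so that it runs quickly, might look like the sketch below. The names `n`, `arr`, `list_time`, and `arr_time` are illustrative; actual timings depend on your machine.

```python
import timeit
import numpy as np

n = 100_000
my_list = list(range(n))
arr = np.arange(n)

# time 10 repetitions of each approach; timeit.timeit returns total seconds
list_time = timeit.timeit(lambda: [x * 2.333 for x in my_list], number=10)
arr_time = timeit.timeit(lambda: arr * 2.333, number=10)

print(f"list comprehension: {list_time:.4f} s")
print(f"ndarray:            {arr_time:.4f} s")
```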

To illustrate vectorized operations, we can create another array, then carry out arithmetic and matrix operations directly on the array object without needing to loop through each element. This also makes code *much* easier to understand. 

In [None]:
# create an array (syntax will be explained later)
arr_B = np.arange(0, 100, 5).reshape(4,5)
arr_B

In [None]:
# multiply an array by a scalar
arr_B * 7

In [None]:
# add arrays
arr_B + arr_B

In [None]:
# calculate the square root of every item in an array
np.sqrt(arr_B)

In [None]:
# transpose an array
np.transpose(arr_B)

## Creating ndarrays

When working with tabular data, creating a data frame from scratch is uncommon. In contrast, it is often useful to create an array from scratch. NumPy provides a set of useful methods that allow you to quickly generate arrays and populate them with specified values. The following examples illustrate the most common ways to generate ndarrays; refer to the NumPy documentation for other approaches.

**Create an ndarray from an existing iterable.** The general-purpose method for creating ndarrays is `np.array()`. You can pass any iterable to it, but lists are the most common. (With dictionaries, you first need to convert to a list or else it will treat the entire dictionary as a single element in the array.) Nested iterables are interpreted as arrays with the number of dimensions corresponding to the depth of nesting. 

In [None]:
# generate some lists
simple_list = [22, -1, 5, 17, 9, 24, 31, 99, -42]
nested_list = [[1, 8, 101, 50], [5, 1, 19, 44], [72, -4, 1, -30], [50, 37, -61, 1]]

In [None]:
# create a 1D ndarray from flat list
np.array(simple_list)

In [None]:
# create a 2D ndarray from nested list
np.array(nested_list)
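
The dictionary caveat mentioned above can be sketched as follows. The dictionary `scores` is hypothetical example data, and the variable names are illustrative.

```python
import numpy as np

scores = {"a": 10, "b": 20, "c": 30}   # hypothetical example data

# passing the dictionary directly wraps the whole object in a 0-dimensional array
as_is = np.array(scores)
print(as_is.shape, as_is.dtype)

# convert the values (or keys) to a list first to get a regular 1D ndarray
values = np.array(list(scores.values()))
keys = np.array(list(scores.keys()))
print(values)
print(keys)
```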

**Create an ndarray filled with a specified value.** NumPy provides functions for creating specific kinds of arrays. The first argument in all cases is the *shape* of the array: pass the same number of integers as dimensions, with values corresponding to length of each dimension. For more than 1 dimension, pass a tuple.

In [None]:
# create a 1D ndarray filled with zeros
np.zeros(16)

In [None]:
# create a 2D ndarray filled with zeros
np.zeros((4, 4))

In [None]:
# create a 3D ndarray filled with ones
np.ones((2, 3, 4))

In [None]:
# create an nD ndarray filled with a single value
np.full((4, 5), 42)

In [None]:
# create a 2D array filled with random numbers
np.random.rand(6, 4) 

In [None]:
# create a 2D ndarray of 0s with 1s on the diagonal (identity matrix); np.eye() is a closely related function
#   only needs 1 value, since the identity matrix is square
np.identity(8)

In [None]:
# create a 2D ndarray filled with a single number except with 1s on the diagonal
#   first create the identity array as above, then filter and reassign 0 values
arr_C = np.identity(8)
arr_C[arr_C == 0] = 5
arr_C

**Create an ndarray by generating a sequence.** Use the `arange()` function to generate a sequence of numbers while creating an ndarray. This function is analogous to the `range()` function of base Python.

In [None]:
# generate a 1D ndarray from a sequence
np.arange(8)

In [None]:
# arange() takes the same optional arguments as range()
np.arange(0, 100, 5)

In [None]:
# use .reshape to make a 2D ndarray
np.arange(0, 100, 5).reshape(4, 5)

## Data types

NumPy provides a much larger set of atomic data types than base Python. Available data types are essentially the ones available in C, since much of NumPy is written in C. For numerics, you have control over not only integer, float, and complex types, but also the amount of memory per item and whether you need negative values. In general, choose the smallest and simplest data type that can hold your data; this can result in significant savings in memory and compute time.   

Examples of NumPy data types are: `float64` (64-bit float), `uint8` (unsigned 8-bit integer), and `int32` (signed 32-bit integer). To use these data types, prefix with `np.`. You can also use generic names `int` and `float` if you are not concerned with memory and performance. Strings are a special case and won't be covered here. See the NumPy documentation for a full list of available data types and their names. 

**Specify data type at creation.** Use the optional `dtype` argument with most ndarray generator functions to indicate the data type you would like.   

In [None]:
# generate a 2D ndarray of small unsigned integers
np.arange(0, 100, 5, dtype=np.uint8).reshape(4,5)

In [None]:
# generate a 2D ndarray of half-precision (16-bit) floats
np.arange(0, 100, 5, dtype=np.float16).reshape(4,5)

In [None]:
# generate a 2D ndarray of booleans; the easiest way is using full()
np.full((4, 5), False)

**Convert between data types.** Use the `.astype()` method to convert an existing ndarray to a different data type. You can check on the data type of an existing array by calling the `.dtype` attribute.

In [None]:
# check data type
arr_C = np.identity(8)
arr_C.dtype

In [None]:
# convert data type
arr_C.astype(np.int32)
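
To make the memory savings concrete, here is a minimal sketch that compares the storage needed for 1000 items in three data types, and shows that `.astype()` returns a new array rather than changing the original. The names `sizes`, `floats`, and `as_ints` are illustrative.

```python
import numpy as np

# memory used by 1000 items in each data type
sizes = {}
for dt in (np.float64, np.float32, np.uint8):
    arr = np.zeros(1000, dtype=dt)
    sizes[np.dtype(dt).name] = arr.nbytes
print(sizes)

# converting floats to integers drops the fractional part
floats = np.array([0.0, 1.7, 2.2])
as_ints = floats.astype(np.int32)
print(as_ints)

# the original array keeps its data type and values
print(floats.dtype, floats)
```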

## Input / output with arrays

The following examples illustrate how to move values into and out of ndarrays. 

In [None]:
# return a standard Python list with nesting corresponding to array dimensions
#    See "Create an ndarray from an existing iterable" (above) for the opposite operation
arr_D = np.arange(10).reshape(2, 5)
print(arr_D, '\n')
list_D = arr_D.tolist()
print(list_D)
type(list_D)

In [None]:
# read from a file directly into an ndarray
np.genfromtxt('small.txt', delimiter=',')

In [None]:
# save an ndarray as a .csv file
arr_E = np.arange(0, 20, dtype=np.uint32).reshape(4, 5)
np.savetxt('output.csv', arr_E, delimiter=',')
arr_E
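
As a rough sketch of a full round trip, the example below writes an array to a temporary file and reads it back. The file name `round_trip.csv` and the variable names are assumptions for illustration. By default `np.savetxt()` writes values in scientific notation, so `fmt="%d"` is used here to write plain integers.

```python
import os
import tempfile
import numpy as np

arr = np.arange(0, 20, dtype=np.uint32).reshape(4, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "round_trip.csv")    # hypothetical file name
    np.savetxt(path, arr, delimiter=",", fmt="%d")     # write plain integers
    restored = np.loadtxt(path, delimiter=",", dtype=np.uint32)

# the restored array has the same shape, values, and data type
print(np.array_equal(arr, restored), restored.dtype)
```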


## Retrieving information about an ndarray

**Attributes.** NumPy provides several attributes and functions to retrieve information about an ndarray. The following are some of the more useful ones; consult the NumPy documentation for a full list and more information. These attributes and functions are particularly useful when you will be changing the size, shape, or dimensions of an array within a program. Some kinds of information can be returned from either a function or an attribute: attributes can be chained, while functions are more useful for functional programming. Remember that attributes should not be followed by `()`.

In [None]:
# create an array to work with
arr_F = np.arange(0, 400, 5).reshape(4, 4, 5)
print(arr_F)

In [None]:
# retrieve the object type (numpy.ndarray); use .dtype for the data type of the items
type(arr_F)

In [None]:
# retrieve the number of dimensions; attribute
arr_F.ndim

In [None]:
# retrieve the length of each axis; attribute
arr_F.shape

In [None]:
# retrieve the total number of elements; attribute
arr_F.size

In [None]:
# retrieve the amount of memory that the array occupies; attribute
arr_F.nbytes

In [None]:
# retrieve the number of dimensions; function
np.ndim(arr_F)

In [None]:
# retrieve the length of each axis; function
np.shape(arr_F)

In [None]:
# retrieve the total number of elements; function
np.size(arr_F)

In [None]:
# retrieve the number of elements on axis 2; function
np.size(arr_F, axis=2)
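
The attributes above are related to one another, which the following sketch checks for the same array. It assumes Python 3.8 or later for `math.prod()`; the item size in bytes depends on the platform's default integer type, so it is read from the `.itemsize` attribute rather than hard-coded.

```python
import math
import numpy as np

arr_F = np.arange(0, 400, 5).reshape(4, 4, 5)

# the number of dimensions is the length of the shape tuple
print(arr_F.ndim == len(arr_F.shape))

# the total number of elements is the product of the axis lengths
print(arr_F.size == math.prod(arr_F.shape))

# memory used is the number of elements times the bytes per element
print(arr_F.nbytes == arr_F.size * arr_F.itemsize)

# attributes can be chained: shape of the transposed array
print(arr_F.T.shape)
```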

**Contents by condition.** To find out if an array contains values that meet a given condition, use the NumPy versions of `any()` and `all()`. These can be applied to a single axis with an optional argument; in this case, NumPy returns an array of Boolean values, one for each column or row (depending on which axis is requested). Note that compound conditions use `&` and `|` for AND and OR; to negate use `~`.

In [None]:
# first, create an array to use
arr_G = np.arange(0, 100, 5).reshape(4,5)
print(arr_G)

In [None]:
# test whether any values in the array are larger than 70
np.any(arr_G > 70)

In [None]:
# test whether all values in the array are positive
np.all(arr_G > 0)

In [None]:
# test whether all values in each column are larger than 10
np.all(arr_G > 10, axis=0)

In [None]:
# test whether any values in the array meet a compound condition
np.any((arr_G > 10) & (arr_G < 70))
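
The `|` and `~` operators mentioned above work the same way as `&`. A short sketch using the same array follows; the result names are illustrative.

```python
import numpy as np

arr_G = np.arange(0, 100, 5).reshape(4, 5)

# test each row separately: does it contain any value larger than 70?
any_per_row = np.any(arr_G > 70, axis=1)
print(any_per_row)

# OR: is any value smaller than 5 or larger than 90?
either = np.any((arr_G < 5) | (arr_G > 90))
print(either)

# NOT: are all values "not negative"?
none_negative = np.all(~(arr_G < 0))
print(none_negative)
```

The parentheses around each condition are required, because `&`, `|`, and `~` bind more tightly than comparison operators.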

## Indexing ndarrays
Use square bracket indexing to access values in ndarrays. This works similarly to plain Python: indexing is zero-based and accepts negative values as well as slices, including open slices. You can pass up to the same number of arguments as array dimensions; fewer arguments than dimensions will retrieve sub-arrays (see examples below). Arguments are separated by commas; each argument can have 1, 2, or 3 values separated by colons (start; start:stop; and start:stop:step).   

When mentally mapping indexes onto an ndarray, a good rule of thumb is to work your way from outer to inner representation. In a 2-dimensional array, the outer dimension (axis = 0, first argument) refers to rows and the inner dimension (axis = 1, second argument) refers to columns. In a 3-dimensional array, the outer dimension (axis = 0) refers to "planes", the intermediate (axis = 1) refers to rows, and the inner (axis = 2) refers to columns. Another way to think about this is that internal columns are always the last dimension, and internal rows are always the second-to-last. If you require 4- and higher dimensional arrays (tensors), the same basic principles hold, but are beyond the scope of this notebook.

In [None]:
# create 1-dimensional array
arr_H = np.arange(10)
print(arr_H)

In [None]:
# index a 1-dimensional array
arr_H[3]

In [None]:
# create 2-dimensional array
arr_I = np.arange(36).reshape(6, 6)
print(arr_I)

In [None]:
# index a 2-dimensional array; the first value refers to rows, the second to columns
arr_I[3, 4]

In [None]:
# create a 3-dimensional array
arr_J = np.arange(0, 400, 5).reshape(4,4,5)
print(arr_J)

In [None]:
# index a 3-dimensional array
#    the first value refers to the separated layers, the second to rows, and the third to columns
arr_J[3, 2, 1]

**Subsetting in multiple dimensions.** Indexing a multidimensional array with the *same number of indices as dimensions* returns a single element. For instance, `my_arr[3,2,1]` returns the value of a single element if `my_arr` has 3 dimensions. To retrieve a single element from a 2-dimensional array requires 2 indices. And so forth.

Indexing a multidimensional array with *fewer indices than dimensions* returns a subdimensional array. For instance, if `my_arr` has three dimensions, `my_arr[2,1]` returns a 1-dimensional array and `my_arr[2]` returns a 2-dimensional array.

In [None]:
# create a 3-dimensional array to work with
arr_J = np.arange(0, 400, 5).reshape(4,4,5)
print(arr_J)

In [None]:
# retrieve a single element from a 3-dimensional array
arr_J[3,2,1]

In [None]:
# retrieve a 1-dimensional array from a 3-dimensional array
arr_J[3,2]

In [None]:
# retrieve a 2-dimensional array from a 3-dimensional array
arr_J[3]
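
One way to confirm what each of these indexing operations returns is to inspect the shape of the result, as in this sketch. The names `element`, `row`, and `plane` are illustrative.

```python
import numpy as np

arr_J = np.arange(0, 400, 5).reshape(4, 4, 5)

element = arr_J[3, 2, 1]   # 3 indices: a single value
row = arr_J[3, 2]          # 2 indices: a 1-dimensional array
plane = arr_J[3]           # 1 index: a 2-dimensional array

print(element, np.ndim(element))
print(row, row.shape)
print(plane.shape)
```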

**Indexing with slices.** Slices provide a powerful and elegant way to access multiple values within arrays. Getting comfortable with slicing will let you efficiently index rows and columns of 2-dimensional arrays, and planes, rows, and columns in any dimension of 3-dimensional arrays. Practice indexing with slices on 1- and 2- dimensional arrays first, as it quickly gets complicated with 3- and higher-dimensional arrays. With a 2-dimensional array, the first argument refers to rows and second to columns.

In [None]:
# create a 2-dimensional array to work with
arr_K = arr_J[0]
print(arr_K)

In [None]:
# retrieve the first column (axis 1)
arr_K[:,0]

In [None]:
# retrieve the fourth row (axis 0)
arr_K[3,:]

In [None]:
# retrieve a smaller 2-dimensional array
arr_K[:3,:3]

In [None]:
# negative indices are useful for retrieving the highest-index values
arr_K[-2:,-2:]

In [None]:
# retrieve every other column 
arr_K[:,::2]

In [None]:
# retrieve an array with the order of rows, but not columns, reversed
arr_K[::-1,:]

When indexing a 3-dimensional array, work your way from the outermost to innermost brackets.

In [None]:
# create an array to work with
arr_J = np.arange(0, 400, 5).reshape(4,4,5)
print(arr_J)

In [None]:
# retrieve a column (index 2 on axis 2, all rows) from plane 1 of a 3-dimensional array
arr_J[1,:,2]

In [None]:
# retrieve a row (index 3 on axis 1, all columns) from plane 1 of a 3-dimensional array
arr_J[1,3,:]

**Fancy indexing.** It is also possible to index an ndarray using a list of integers. NumPy calls this *fancy indexing*. This approach is useful when you want to extract discontinuous or out-of-order items, rows, or columns. Fancy indexing always returns a new data object, not a view of the original. The examples below use 2-dimensional arrays, but fancy indexing works with any number of dimensions (if you can wrap your head around how to do it). Note that indexing in the nested array is by a list of values in dimension zero, followed by a list of values in dimension 1; do not pass separate pairs of values for each item you want back. 

In [None]:
# create an array to work with
arr_L = np.arange(32).reshape((8, 4))
print(arr_L)

In [None]:
# retrieve a discontinuous set of rows using fancy indexing
#   note that we are passing a list to square bracket indexing
arr_L[[3,0,5], :]

In [None]:
# retrieve a discontinuous set of columns using fancy indexing
#   note that we are passing a list to the second argument of square bracket indexing
#   note that we need to pass :, before the list to indicate that the list applies to axis=1
arr_L[:,[2, 0]]

In [None]:
# retrieve a discontinuous set of items using fancy indexing
#   note that we are passing two lists, one per axis; the two lists must be the same length
arr_L[[0,6,2], [2,0,1]]
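
To see how the two lists are paired up, the sketch below compares the fancy-indexed result with the same three items retrieved one at a time. It also shows that changing the result leaves the source array untouched. The names `picked` and `one_by_one` are illustrative.

```python
import numpy as np

arr_L = np.arange(32).reshape((8, 4))

# items at (row 0, col 2), (row 6, col 0), and (row 2, col 1)
picked = arr_L[[0, 6, 2], [2, 0, 1]]
one_by_one = np.array([arr_L[0, 2], arr_L[6, 0], arr_L[2, 1]])
print(picked, one_by_one)

# fancy indexing returns a copy, so this does not change arr_L
picked[0] = -1
print(arr_L[0, 2])
```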

**Boolean indexing.** Recall that Boolean indexing involves first making a "mask": a data object that contains True/False values for each item in the object you want to filter. Any valid condition can be used to make the mask. Boolean indexing is particularly useful when you want to perform the same filtering step more than once. Like fancy indexing, Boolean indexing with ndarrays always returns a new object. 

In [None]:
# create an array
simple_list = [21, -1, 5, 65, 17, 9, 24, 31, 99, -44, 13, -4, 29, 82, 36]
arr_M = np.array(simple_list).reshape(5, 3)
print(arr_M)

In [None]:
# create a mask based on whether value in first column is divisible by 3
div_3 = arr_M[:, 0] % 3 == 0
div_3

In [None]:
# alternatively, create a mask using where()
div_3 = np.where(arr_M[:, 0] % 3 == 0, True, False)
div_3

In [None]:
# apply the mask to retrieve values in second and third columns based on True/False in first column
arr_M[div_3,1:]

## Subsetting and copying ndarrays
**Views and copies.** When assigning subsets of ndarrays, values are typically *not* copied unless specifically requested. This saves memory and compute time. However, you need to be aware that modifications to a slice or subset will update the source array. Use the `.copy()` method if you want to create a distinct object in memory. Fancy indexing and Boolean indexing are exceptions: they always create a copy by default. NumPy documentation distinguishes between *views* and *copies* to make it clear which kind of behavior to expect.

In [None]:
# updating a view updates the source array
arr_N = np.arange(0, 6)
my_slice = arr_N[:3]
my_slice[2] = 42
arr_N[2]    # returns 42, not 2!

In [None]:
#  updating a copy of a view does not update the source array
arr_N = np.arange(0, 6)
my_slice = arr_N[:3].copy()
my_slice[2] = 42
arr_N[2]    # returns 2
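
If you are unsure whether an operation gave you a view or a copy, one way to check is `np.shares_memory()`, a NumPy function not used elsewhere in this notebook. The sketch below applies it to a slice, a fancy-indexed result, and a Boolean-indexed result; the variable names are illustrative.

```python
import numpy as np

arr_N = np.arange(0, 6)

a_slice = arr_N[:3]          # slicing
fancy = arr_N[[0, 1, 2]]     # fancy indexing
masked = arr_N[arr_N < 3]    # Boolean indexing

# True means the two arrays use the same underlying data (a view)
print(np.shares_memory(arr_N, a_slice))
print(np.shares_memory(arr_N, fancy))
print(np.shares_memory(arr_N, masked))
```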

**Subsetting by condition.** You can retrieve values meeting any valid logical expression. Note that filtering returns a 1-dimensional array, regardless of the number of dimensions in the input array. 

In [None]:
# first, create an array
arr_P = np.random.rand(6, 4)
print(arr_P)

In [None]:
# filter for values greater than or equal to 0.85
arr_P[arr_P >= 0.85]

If you want the indices rather than the values, use `where()`. This will return a tuple containing one ndarray of indexes for each dimension of the ndarray you pass to the function. For example, applying `where()` to a 2-dimensional array will return a tuple containing two ndarrays: the first with values in axis 0 and the second with values in axis 1. Pair the first value in the first array with the first value in the second array to get the coordinates of the first item, and so forth.

In [None]:
# filter for indexes of values greater than or equal to 0.85
np.where(arr_P >= 0.85)
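
The pairing of row and column indexes can be sketched with a small fixed array, so that the result is reproducible. The array `arr` and the names `rows`, `cols`, and `coords` are illustrative assumptions.

```python
import numpy as np

arr = np.array([[0.10, 0.90],
                [0.95, 0.20]])

# where() returns one index array per dimension
rows, cols = np.where(arr >= 0.85)
print(rows, cols)

# pair them up to get (row, column) coordinates
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)

# the index arrays can be passed straight back to retrieve the values
print(arr[rows, cols])
```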

## Updating subsets of an ndarray

In all of the examples below, the original ndarray is updated in-place. 

**Update with a single value.** To update a subset of items, provide an index and a new value. This works for single items or slices. To update all items, use the `.fill()` method.

In [None]:
# create an array to work with
arr_Q = np.zeros((4, 5), dtype=int)
print(arr_Q)

In [None]:
# replace a single value in a 2-dimensional array
arr_Q[2, 2] = 999
print(arr_Q)

In [None]:
# replace a 2-dimensional slice of values in a 2-dimensional array with a scalar
arr_Q[1:3, 1:3] = 999
print(arr_Q)

In [None]:
# replace all values in an ndarray with a single new value
arr_Q.fill(42)
print(arr_Q)

**Update from an iterable.** You can update values in all or a specified subset of items in an ndarray using `copyto()` and providing an iterable containing the new values. The examples below update from a flat list, but you can use nested iterables to update in more than 1 dimension. Make sure the size and dimensions of the indexing and iterable are the same!

In [None]:
# create an array to work with
arr_Q = np.zeros((4, 5), dtype=int)
print(arr_Q)

In [None]:
# create values to use for updating
new_row = [-1, -2, -3, -4, -5] 
new_col = (-10, -20, -30, -40)
new_area = [[222, 333], [444, 555]]

In [None]:
# replace values in second row 
np.copyto(arr_Q[1, :], new_row) 
print(arr_Q)

In [None]:
# replace values in third column 
np.copyto(arr_Q[:,2], new_col) 
print(arr_Q)

In [None]:
# replace a 2-dimensional slice from a nested iterable
arr_Q[1:3, 1:3] = new_area
print(arr_Q)

**Update based on condition.** Use any logical expression and supply a new value. Use `&` and `|` as operators to create compound conditions (note: different from base Python, which uses `and` and `or`). Use `~` to negate any expression or sub-expression.

In [None]:
# create an array to work with
arr_R = np.arange(0, 100, 5).reshape(4,5)
print(arr_R)

In [None]:
# replace all values divisible by 3 with 999
arr_R[arr_R % 3 == 0] = 999
print(arr_R)

In [None]:
# replace all values divisible by 3 and less than 20 with 999
arr_R[(arr_R % 3 == 0) & (arr_R < 20)] = 999
print(arr_R)
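
The `|` and `~` operators can be used for updates in the same way as `&`. A minimal sketch with two small 1-dimensional arrays follows; the names `low_or_high` and `not_even` are illustrative.

```python
import numpy as np

# OR: replace values smaller than 2 or larger than 7
low_or_high = np.arange(10)
low_or_high[(low_or_high < 2) | (low_or_high > 7)] = -1
print(low_or_high)

# NOT: replace every value that is not even
not_even = np.arange(10)
not_even[~(not_even % 2 == 0)] = 0
print(not_even)
```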

## Math and functions with ndarrays

The standard arithmetic and logical operators can be applied to ndarrays for vectorized (implicit looping) operations. NumPy also provides a large number of functions optimized to work with ndarrays called "universal functions" or "ufuncs" for short. They provide vectorized operations, avoiding the need for loops. Ufuncs are written in C and are often much faster than applying standard Python functions. Some examples are shown below. See the NumPy documentation for a full list of ufuncs. When working with ndarrays, be sure to specify the `np.` version, as the names are often identical to standard Python functions.

In [None]:
# create 2 arrays to work with
arr_S = np.arange(1, 101, 5).reshape(4,5)
arr_T = np.arange(1, 61, 3).reshape(4,5)
print(arr_S, '\n')
print(arr_T)

In [None]:
# return the product of every item multiplied by a constant
arr_S * 33.3

In [None]:
# return the square root of every item
np.sqrt(arr_S)

In [None]:
# return boolean values by applying a condition to every item
arr_S >= 50    

In [None]:
# return the sum of two arrays
arr_S + arr_T

In [None]:
# return the sum of two arrays divided by the sin of the first
(arr_S + arr_T) / np.sin(arr_S)

In [None]:
# return boolean values based on a condition comparing two arrays
(arr_S - 5) > arr_T

In [None]:
# return boolean values based on a condition comparing two arrays
(arr_S + arr_T <= 200) & ((arr_S + arr_T) % 3 == 0)

**Math updates in-place.** For simple arithmetic, you can use any of the augmented assignment operators.

In [None]:
# Multiply all values by 3 in-place 
arr_S *= 3
arr_S

In [None]:
# Divide all values by 3 in-place
arr_T /= 3       # raises a TypeError: division produces floats, which cannot be stored in-place in an integer array
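
Two possible ways around the in-place division error are sketched below: floor division, which keeps the integer data type, or converting to floats before dividing. The names `raised`, `floored`, and `as_float` are illustrative, and which option is appropriate depends on whether you need the fractional part.

```python
import numpy as np

arr_T = np.arange(1, 61, 3).reshape(4, 5)

# in-place true division on an integer array fails
raised = False
try:
    arr_T /= 3
except TypeError as err:
    raised = True
    print(type(err).__name__)

# option 1: floor division keeps the integer data type
floored = arr_T.copy()
floored //= 3
print(floored.dtype.kind, floored[0])

# option 2: convert to floats first, then divide in-place
as_float = arr_T.astype(float)
as_float /= 3
print(as_float.dtype, as_float[0, 0])
```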

## Summary statistics
Many of the summary statistics below can be called as functions or methods using the same name. Many can be applied to the entire array or just one dimension of a multi-dimensional array.

In [None]:
# first, create an array to work with
nested_list = [[1, 8, 101, 50], [5, 1, 19, 44], [72, -4, 1, -30], [50, 37, -61, 1]]
arr_U = np.array(nested_list)
print(arr_U)

In [None]:
# return the mean value from an ndarray using a function
np.mean(arr_U)

In [None]:
# return the mean value from an ndarray using a method
arr_U.mean()

In [None]:
# return the sum of all items in an ndarray
arr_U.sum()

In [None]:
# return the sum of each column (axis=0 sums down the rows)
arr_U.sum(axis=0)

In [None]:
# return the largest value in an array
arr_U.max()

In [None]:
# return the largest value in each row
arr_U.max(axis=1)

In [None]:
# return the variance of an array
arr_U.var()

In [None]:
# return the standard deviation of columns
arr_U.std(axis=0)
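
The `axis` argument is easy to mix up, so here is a small sketch using a non-square array, where the length of the result shows which axis was collapsed. The array `arr` is an illustrative assumption.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
print(arr)

# axis=0 collapses the rows: one result per column
print(arr.sum(axis=0))

# axis=1 collapses the columns: one result per row
print(arr.sum(axis=1))

# the same rule applies to other summary statistics
print(arr.max(axis=1))
print(arr.mean(axis=0))
```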

## Sorting

NumPy provides a lot of flexibility in sorting procedures and options. The following examples cover the basics, but you can control sorting in more granular ways if needed.

In [None]:
# first, create an array to work with
nested_list = [[1, 8, 101, 50], [5, 1, 19, 44], [72, -4, 1, -30], [50, 37, -61, 1]]
arr_U = np.array(nested_list)
print(arr_U)

In [None]:
# sort an array along a single axis; modifies the array in-place
#    default is to sort along the last axis (1 for a 2-dimensional array)
arr_U.sort()
print(arr_U)

In [None]:
# sort an array along the first axis; modifies the array in-place
arr_U.sort(axis=0)
print(arr_U)

In [None]:
# to make a copy and preserve the original, use the function rather than the method
sorted_arr = np.sort(arr_U)
print(sorted_arr)

In [None]:
# the original is unmodified
print(arr_U)

In [None]:
# return items sorted across the entire array (flattens into a 1-dimensional array)
np.sort(arr_U, axis=None)

In [None]:
# return a sorted array of unique values
np.unique(arr_U)
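
As one example of more granular control, the sketch below sorts in descending order and uses `np.argsort()`, a NumPy function not covered above, to get the indexes that would sort an array. The array and variable names are illustrative.

```python
import numpy as np

arr = np.array([30, 10, 20])

# sort ascending, then reverse with a slice to get descending order
descending = np.sort(arr)[::-1]
print(descending)

# argsort returns the indexes that would sort the array
order = np.argsort(arr)
print(order)

# indexing with those positions gives the sorted values
print(arr[order])
```

The indexes from `np.argsort()` can also be used to reorder a second array in the same way, which is useful when two arrays hold related values.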

## Matrix operations

NumPy provides a rich set of functions and methods for matrix manipulations and linear algebra. The examples below illustrate some of the most common operations, but there are many more. 

**Reorganizing arrays.** Reorganizing involves changing the shape or dimensions of an array, or stacking (merging) two arrays. When stacking arrays, the joining edges must be of equal length. None of these operations change the shape of the original array(s); each returns a new array object, so assign the output to save the result. Be aware that `reshape()` and transposing return a view of the original data where possible, so changing values in the result can change the original; use `.copy()` if you need an independent array.

In [None]:
# first, create an array to work with
nested_list = [[1, 8, 101, 50], [72, -4, 1, -30], [50, 37, -61, 1]]
arr_U = np.array(nested_list)
print(arr_U)

In [None]:
# return a reshaped array
#    note that the number of items must match the new dimensions
arr_U.reshape(2, 6)

In [None]:
# return the transpose of the array
np.transpose(arr_U)

In [None]:
# synonym for getting the transpose of the array
arr_U.T

In [None]:
# return a flattened version of the array in 1 dimension
#    follows row order
arr_U.flatten()

In [None]:
# vertically stack arrays
arr_V1 = np.array([1, 2, 3, 4])
arr_V2 = np.array([5, 6, 7, 8])
np.vstack((arr_V1, arr_V2))

In [None]:
# horizontally stack arrays
arr_H1 = np.ones((2, 4))
arr_H2 = np.zeros((2, 2))
np.hstack((arr_H1, arr_H2))

**Matrix multiplication.** 

In [None]:
# construct 2 matrices to multiply
arr_W = np.ones((2,3))
arr_X = np.full((3,2), 4)
print(arr_W, '\n')
print(arr_X)

In [None]:
# do matrix multiplication
np.matmul(arr_W, arr_X)
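
Matrix multiplication can also be written with the `@` operator, which is equivalent to `np.matmul()` for 2-dimensional arrays. The sketch below also shows how the shape of the result follows from the shapes of the inputs; the names `product` and `reverse` are illustrative.

```python
import numpy as np

arr_W = np.ones((2, 3))
arr_X = np.full((3, 2), 4)

# (2, 3) times (3, 2) gives (2, 2); the inner lengths must match
product = arr_W @ arr_X
print(product)
print(np.array_equal(product, np.matmul(arr_W, arr_X)))

# reversing the order gives a different shape: (3, 2) times (2, 3) gives (3, 3)
reverse = arr_X @ arr_W
print(reverse.shape)
```

Note that `*` between two arrays is element-wise multiplication, not matrix multiplication.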

**Determinant of a matrix.**

In [None]:
# find the determinant
arr_Y = np.random.rand(6, 6)
print(arr_Y)
np.linalg.det(arr_Y)

In [None]:
# sanity check: the identity matrix has a determinant of 1 regardless of matrix size
arr_Z = np.identity(7)
print(arr_Z)
np.linalg.det(arr_Z)