# NumPy

- NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.
- Many computational packages providing scientific functionality use NumPy's array objects as one of the standard interface _lingua francas_ for data exchange.

### Some of the things you'll find in NumPy:

- ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible _broadcasting_ capabilities
    
- Mathematical functions for fast operations on entire arrays of data without having to write loops
    
- Tools for reading/writing array data to disk and working with memory-mapped files
    
- Linear algebra, random number generation, and Fourier transform capabilities
    
- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN


One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are a number of reasons for this:

- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
    
- NumPy operations perform complex computations on entire arrays without the need for Python `for` loops, which can be slow for large sequences. NumPy is faster than regular Python code because its C-based algorithms avoid overhead present with regular interpreted Python code.

In [130]:
import numpy as np
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

Now let's multiply each sequence by 2:

In [131]:
%timeit my_arr2 = my_arr * 2

%timeit my_list2 = [x * 2 for x in my_list]

1.87 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
42.3 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.
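The memory claim can be checked roughly with a short sketch. It assumes CPython, where `sys.getsizeof` reports the size of a single object; the list total below is therefore an estimate that adds the list's pointer array to the size of each integer object it references.

```python
import sys
import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Bytes used by the array's data buffer: element count times bytes per element.
arr_bytes = my_arr.nbytes

# Rough estimate for the list: the list object itself (an array of pointers)
# plus every int object it points to.
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {arr_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB")
```

The exact figures depend on the platform and Python version, but the list should come out several times larger.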

## 4.1 The NumPy ndarray: A Multidimensional Array Object

- One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python.
- Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

In [132]:
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

data

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

- We can then perform mathematical operations on this NumPy array, such as multiplying every element by a scalar or adding the array to itself:

In [133]:
print(data * 10)
print("")
print(data + data)

[[ 15.  -1.  30.]
 [  0. -30.  65.]]

[[ 3.  -0.2  6. ]
 [ 0.  -6.  13. ]]


- NumPy arrays are homogeneous: all of the elements in an array must be of the same type.

- Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array.

In [134]:
print(data.shape)
print()
print(data.dtype)

(2, 3)

float64
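One consequence of homogeneity is that mixed inputs are converted to a single common type. The sketch below uses two made-up input lists (`mixed` and `with_text` are illustrative names) to show what `np.array` settles on.

```python
import numpy as np

# Mixing ints and floats: every element is stored as float64.
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# Mixing numbers and strings: every element is stored as a Unicode string.
with_text = np.array([1, "two", 3.0])
print(with_text, with_text.dtype)
```

The second case is usually a sign that the data needs cleaning before numerical work, since arithmetic on a string array will fail.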


### Creating ndarrays

**Easiest method**

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. 

In [135]:
data1 = [6, 7.5, 8, 0, 1]

arr1 = np.array(data1)

arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [136]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

arr2 = np.array(data2)

arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions, with shape inferred from the data. We can confirm this by inspecting the ndim and shape attributes:

In [137]:
print(arr2.ndim)
print()
print(arr2.shape)

2

(2, 4)


- Unless explicitly specified (discussed in Data Types for ndarrays), numpy.array tries to infer a good data type for the array that it creates.

- The data type is stored in a special dtype metadata object; for example, in the previous two examples we have:

In [138]:
print(arr1.dtype)
print()
print(arr2.dtype)

float64

int64


**Other methods**

In addition to numpy.array, there are a number of other functions for creating new arrays. As examples, numpy.zeros and numpy.ones create arrays of 0s or 1s, respectively, with a given length or shape. numpy.empty creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:

In [139]:
# np.zeros method
print(np.zeros(10))
print()
print(np.zeros((3, 6)))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]


In [140]:
# np.ones method
print(np.ones(10))
print()
print(np.ones((3, 6)))

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

[[1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]]


By default, np.ones and np.zeros create arrays with the data type float64, which means each element is a decimal (floating-point) number. If you want integer values, you can specify the dtype parameter, like this:

In [141]:
print(np.ones(10, dtype=int))
print()
print(np.zeros((3, 6), dtype=int))

[1 1 1 1 1 1 1 1 1 1]

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]


In [142]:
# np.empty method
print(np.empty(10))
print()
print(np.empty((2, 3, 6)))

[5.e-324 5.e-324 5.e-324 5.e-324 5.e-324 5.e-324 5.e-324 5.e-324 5.e-324
 5.e-324]

[[[7.07106781 7.06400028 7.05693985 7.04988652 7.05693985 7.06400028]
  [7.06400028 7.05692568 7.04985815 7.04279774 7.04985815 7.05692568]
  [7.05693985 7.04985815 7.04278354 7.03571603 7.04278354 7.04985815]]

 [[7.04988652 7.04279774 7.03571603 7.0286414  7.03571603 7.04279774]
  [7.05693985 7.04985815 7.04278354 7.03571603 7.04278354 7.04985815]
  [7.06400028 7.05692568 7.04985815 7.04279774 7.04985815 7.05692568]]]


**np.empty** returns a new NumPy array of the specified shape and dtype, but does not initialize its values. **The elements will contain whatever values already exist at that memory location**, so the contents are arbitrary and unpredictable (often random-looking numbers). Use np.empty when you plan to fill the array with your own values right after creation.
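A minimal sketch of that usage pattern follows: allocate with `np.empty`, then write every slot before reading anything back. The array name `squares` and the values stored in it are purely illustrative.

```python
import numpy as np

n = 5
squares = np.empty(n)   # contents are arbitrary at this point

# Overwrite every slot before the array is read.
for i in range(n):
    squares[i] = i ** 2

print(squares)  # [ 0.  1.  4.  9. 16.]
```

If any slot might be left unwritten, `np.zeros` is the safer choice.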

In [143]:
# np.arange method
# numpy.arange is an array-valued version of the built-in Python range function:

print(np.arange(15))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]


## Some important NumPy array creation functions:


| Function         | Description |
|------------------|-------------|
| `array`          | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a data type or explicitly specifying a data type; copies the input data by default |
| `asarray`        | Convert input to ndarray, but do not copy if the input is already an ndarray |
| `arange`         | Like the built-in range but returns an ndarray instead of a list |
| `ones`, `ones_like`   | Produce an array of all 1s with the given shape and data type; `ones_like` takes another array and produces a ones array of the same shape and data type |
| `zeros`, `zeros_like` | Like ones and ones_like but producing arrays of 0s instead |
| `empty`, `empty_like` | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros |
| `full`, `full_like`   | Produce an array of the given shape and data type with all values set to the indicated "fill value"; `full_like` takes another array and produces a filled array of the same shape and data type |
| `eye`, `identity`     | Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere) |

### Example implementation of each function:

In [144]:
# array
import numpy as np
arr = np.array([1, 2, 3])
print("array:", arr)

# asarray
lst = [4, 5, 6]
arr2 = np.asarray(lst)
print("asarray:", arr2)

# arange
arr3 = np.arange(0, 10, 2)
print("arange:", arr3)

# ones, ones_like
arr4 = np.ones((2, 3))
print("ones:", arr4)
arr5 = np.ones_like(arr)
print("ones_like:", arr5)

# zeros, zeros_like
arr6 = np.zeros((2, 3))
print("zeros:", arr6)
arr7 = np.zeros_like(arr)
print("zeros_like:", arr7)

# empty, empty_like
arr8 = np.empty((2, 3))
print("empty:", arr8)
arr9 = np.empty_like(arr)
print("empty_like:", arr9)

# full, full_like
arr10 = np.full((2, 3), 7)
print("full:", arr10)
arr11 = np.full_like(arr, 9)
print("full_like:", arr11)

# eye, identity
arr12 = np.eye(3)
print("eye:\n", arr12)
arr13 = np.identity(3)
print("identity:\n", arr13)

array: [1 2 3]
asarray: [4 5 6]
arange: [0 2 4 6 8]
ones: [[1. 1. 1.]
 [1. 1. 1.]]
ones_like: [1 1 1]
zeros: [[0. 0. 0.]
 [0. 0. 0.]]
zeros_like: [0 0 0]
empty: [[0. 0. 0.]
 [0. 0. 0.]]
empty_like: [0 0 0]
full: [[7 7 7]
 [7 7 7]]
full_like: [9 9 9]
eye:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
identity:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
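The difference between `array` and `asarray` described in the table only shows up when the input is already an ndarray. A small sketch, with illustrative variable names:

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)    # copies the data by default
same = np.asarray(original)    # input is already an ndarray, so no copy is made

print(same is original)                     # True
print(np.shares_memory(original, copied))   # False

# A later change to the original is visible through `same` but not `copied`.
original[0] = 99
print(copied[0], same[0])                   # 1 99
```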


## Data Types for ndarrays:

The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:

In [145]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

arr2 = np.array([1, 2, 3], dtype=np.int32)

print(arr1.dtype)
print(arr2.dtype)

float64
int32


<center><b>NumPy data types:</b></center>

| Type                              | Type code    | Description                                                                                                                        |
|------------------------------------|--------------|------------------------------------------------------------------------------------------------------------------------------------|
| `int8`, `uint8`                   | `i1`, `u1`   | Signed and unsigned 8-bit (1 byte) integer types                                                                                   |
| `int16`, `uint16`                 | `i2`, `u2`   | Signed and unsigned 16-bit integer types                                                                                           |
| `int32`, `uint32`                 | `i4`, `u4`   | Signed and unsigned 32-bit integer types                                                                                           |
| `int64`, `uint64`                 | `i8`, `u8`   | Signed and unsigned 64-bit integer types                                                                                           |
| `float16`                         | `f2`         | Half-precision floating point                                                                                                      |
| `float32`                         | `f4` or `f`  | Standard single-precision floating point; compatible with C float                                                                  |
| `float64`                         | `f8` or `d`  | Standard double-precision floating point; compatible with C double and Python float object                                         |
| `float128`                        | `f16` or `g` | Extended-precision floating point                                                                                                  |
| `complex64`, `complex128`, `complex256` | `c8`, `c16`, `c32` | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively                                                        |
| `bool`                            | `?`          | Boolean type storing True and False values                                                                                         |
| `object`                          | `O`          | Python object type; a value can be any Python object                                                                               |
| `string_`                         | `S`          | Fixed-length ASCII string type (1 byte per character); e.g., to create a string data type with length 10, use `'S10'`              |
| `unicode_`                        | `U`          | Fixed-length Unicode type (number of bytes platform specific); same specification semantics as string_ (e.g., `'U10'`)             |

- Data types are a source of NumPy's flexibility for interacting with data coming from other systems. In most cases they provide a mapping directly onto an underlying disk or memory representation, which makes it possible to read and write binary streams of data to disk and to connect to code written in a low-level language like C or FORTRAN.

- The numerical data types are named the same way: a type name, like float or int, followed by a number indicating the number of bits per element. A standard double-precision floating-point value (what’s used under the hood in Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64. 
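The naming rule can be confirmed from the dtype objects themselves. This sketch reads `itemsize` (bytes per element) for a few types and checks that the short type codes from the table refer to the same dtypes.

```python
import numpy as np

for name in ["int8", "int32", "float64", "complex128"]:
    dt = np.dtype(name)
    print(f"{name:<10} itemsize={dt.itemsize} bytes ({dt.itemsize * 8} bits)")

# Type codes resolve to the same dtype as the full names.
print(np.dtype("f8") == np.dtype("float64"))   # True
print(np.dtype("i4") == np.int32)              # True
```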

### Type conversion or Type casting

You can explicitly convert or cast an array from one data type to another using ndarray’s astype method:

In [146]:
# integer to float conversion
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)
float_arr = arr.astype(np.float64)
float_arr, float_arr.dtype

int64


(array([1., 2., 3., 4., 5.]), dtype('float64'))

In [147]:
# float to integer conversion
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr.dtype)
arr.astype(np.int32)

float64


array([ 3, -1, -2,  0, 12, 10], dtype=int32)

In [148]:
# string to float conversion
numeric_strings = np.array(["1.25", "-9.6", "42"])

str_to_float = numeric_strings.astype(float)

str_to_float, str_to_float.dtype

(array([ 1.25, -9.6 , 42.  ]), dtype('float64'))

There are shorthand type code strings you can also use to refer to a dtype:

In [149]:
zeros_uint32 = np.zeros(8, dtype="u4")
zeros_uint32

array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)

**Note** - Calling astype always creates a new array (a copy of the data), even if the new data type is the same as the old data type.
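A few properties of `astype` are easy to verify directly. The sketch below, using made-up sample values, checks that a same-dtype cast still copies, that float-to-integer casts drop the fractional part, and that an unparseable string makes the cast fail.

```python
import numpy as np

arr = np.array([3.7, -2.6, 0.5])

# Casting to the same dtype still returns a separate array.
same_type = arr.astype(arr.dtype)
print(same_type is arr, np.shares_memory(arr, same_type))   # False False

# Float-to-integer casts truncate toward zero.
as_int = arr.astype(np.int64)
print(as_int)   # [ 3 -2  0]

# A string that cannot be parsed as a number raises ValueError.
try:
    np.array(["1.25", "abc"]).astype(float)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("ValueError:", exc)
```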

## Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operations between equal-size arrays apply the operation element-wise:

In [150]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

print(arr)
print()
print(arr * arr)
print()
print(arr - arr)

[[1. 2. 3.]
 [4. 5. 6.]]

[[ 1.  4.  9.]
 [16. 25. 36.]]

[[0. 0. 0.]
 [0. 0. 0.]]
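To make the idea of vectorization concrete, here is a sketch that computes the same element-wise product twice: once with explicit Python loops and once with a single array expression. The loop version is shown only for comparison.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops: visit every (row, column) position by hand.
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# Vectorized: one expression, with the loop running inside NumPy.
vectorized = arr * arr

print(np.array_equal(looped, vectorized))   # True
```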


Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [151]:
print(1/arr)
print()
print(arr ** 2)

[[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]]

[[ 1.  4.  9.]
 [16. 25. 36.]]


Comparisons between arrays of the same size yield Boolean arrays:

In [152]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

print(arr2)
print()
print(arr2 > arr)

[[ 0.  4.  1.]
 [ 7.  2. 12.]]

[[False  True False]
 [ True False  True]]


## Basic Indexing and Slicing

NumPy array indexing is a deep topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In [153]:
arr = np.arange(10)

print(arr[5])
print()
print(arr[5:8])

# if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast henceforth) to the entire selection.
arr[5:8] = 12
print()
print(arr)

5

[5 6 7]

[ 0  1  2  3  4 12 12 12  8  9]


In NumPy, when you create a slice of an array (e.g., arr_slice = arr[5:8]), the slice is a view of the original array, not a copy. This means both arr_slice and the original arr share the same data in memory.

So, if you modify an element in the slice, for example with arr_slice[1] = 12345, the change will also appear in the original array arr. This is different from Python lists, where slicing creates a new list (a copy).

Why? This behavior is designed for efficiency: large arrays can be sliced and manipulated without copying data, which saves memory and time.

In [154]:
arr_slice = arr[5:8]

arr_slice

array([12, 12, 12])

In [155]:
arr_slice[1] = 12345

arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

In [156]:
# The bare slice [:] assigns to every element the slice refers to.
arr_slice[:] = 64

print(arr)

[ 0  1  2  3  4 64 64 64  8  9]


**Note**: If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array—for example, arr[5:8].copy().
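The three cases (array view, explicit array copy, list slice) can be compared side by side. This sketch uses fresh variables so it does not depend on the state of the cells above.

```python
import numpy as np

arr = np.arange(10)
py_list = list(range(10))

view = arr[5:8]             # shares arr's memory
copy = arr[5:8].copy()      # independent data
list_slice = py_list[5:8]   # list slicing always copies

view[0] = -1
copy[1] = -2
list_slice[0] = -1

print(arr)       # position 5 changed through the view; position 6 is untouched
print(py_list)   # unchanged
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))   # True False
```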

In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [157]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

arr2d[2]

array([7, 8, 9])

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:

In [158]:
print(arr2d[0][2])
print()
print(arr2d[0, 2])  # This is a more efficient way to access the same element   

3

3


In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional ndarray consisting of all the data along the higher dimensions. So in the 2 × 2 × 3 array arr3d:

In [159]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr3d)
print('------')
print(arr3d[0])

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
------
[[1 2 3]
 [4 5 6]]


Both scalar values and arrays can be assigned to arr3d[0]:

In [160]:
old_values = arr3d[0].copy()

arr3d[0] = 42

arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [161]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a one-dimensional array:

In [162]:
print(arr3d[1, 0])
# this expression again is the same as:
print(arr3d[1][0])

[7 8 9]
[7 8 9]


### Indexing with slices


Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [163]:
print(arr)

print(arr[1:6])

[ 0  1  2  3  4 64 64 64  8  9]
[ 1  2  3  4 64]


In [164]:
print(arr2d)
print()
print(arr2d[:2])

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[[1 2 3]
 [4 5 6]]


As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression arr2d[:2] as "select the first two rows of arr2d."

You can pass multiple slices just like you can pass multiple indexes:

In [165]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

Note that a colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:

In [166]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])
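A related case that the cells above do not show is mixing an integer index with a slice. As a sketch: an integer removes its axis from the result, while a slice keeps the axis even when it selects a single row.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[1, :2]       # integer on axis 0 drops that axis
block = arr2d[1:2, :2]   # slice on axis 0 keeps it, with length 1

print(row, row.shape)      # [4 5] (2,)
print(block, block.shape)  # [[4 5]] (1, 2)
```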

Assigning to a slice expression assigns to the whole selection:

In [167]:
arr2d[:2, 1:] = 0

arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

### Boolean Indexing

In [168]:
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
 
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2],[-12, -4], [3, 4]])

print(names)
print()
print(data)

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']

[[  4   7]
 [  0   2]
 [ -5   6]
 [  0   0]
 [  1   2]
 [-12  -4]
 [  3   4]]


Suppose each name corresponds to a row in the data array and we wanted to select all the rows with the corresponding name "Bob". Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string "Bob" yields a Boolean array:

In [169]:
names == "Bob"

array([ True, False, False,  True, False, False, False])

In [170]:
data[names == "Bob"]

array([[4, 7],
       [0, 0]])

The Boolean array must be of the same length as the array axis it’s indexing. You can even mix and match Boolean arrays with slices or integers.
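Both points can be sketched with the same `names` and `data` arrays, recreated here so the example runs on its own. The two-element mask at the end is deliberately the wrong length.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

bob_tail = data[names == "Bob", 1:]   # Boolean rows combined with a column slice
bob_col = data[names == "Bob", 1]     # Boolean rows combined with an integer column
print(bob_tail)
print(bob_col)

# A mask whose length does not match the axis is rejected.
try:
    data[np.array([True, False])]
    mask_rejected = False
except IndexError as exc:
    mask_rejected = True
    print("IndexError:", exc)
```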

To select everything but "Bob" you can either use != or negate the condition using ~:

In [171]:
print(names != "Bob")
print()
print(~(names == "Bob"))

[False  True  True False  True  True  True]

[False  True  True False  True  True  True]


In [172]:
data[~(names == "Bob")]

array([[  0,   2],
       [ -5,   6],
       [  1,   2],
       [-12,  -4],
       [  3,   4]])

The ~ operator can be useful when you want to invert a Boolean array referenced by a variable:

In [173]:
cond = names == "Bob"

data[~cond]

array([[  0,   2],
       [ -5,   6],
       [  1,   2],
       [-12,  -4],
       [  3,   4]])

To select two of the three names, combine multiple Boolean conditions with Boolean arithmetic operators like & (and) and | (or). The Python keywords and and or do not work with Boolean arrays.

In [174]:
mask = (names == "Bob") | (names == "Will")

data[mask]

array([[ 4,  7],
       [-5,  6],
       [ 0,  0],
       [ 1,  2]])
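The reason `|` is used rather than the keyword `or` is that `or` asks for the truth value of a whole array, which NumPy refuses to guess. A short sketch of both forms:

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])

# Element-wise OR: the parentheses are needed because | binds tighter than ==.
mask = (names == "Bob") | (names == "Will")
print(mask)

# The keyword form fails for arrays with more than one element.
try:
    (names == "Bob") or (names == "Will")
    keyword_failed = False
except ValueError as exc:
    keyword_failed = True
    print("ValueError:", exc)
```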

**Note** : Selecting data from an array by Boolean indexing and assigning the result to a new variable always creates a copy of the data, even if the returned array is unchanged.



Setting values with Boolean arrays works by substituting the value or values on the righthand side into the locations where the Boolean array's values are True. To set all of the negative values in data to 0, we need only do:

In [175]:
data[data < 0] = 0

data

array([[4, 7],
       [0, 2],
       [0, 6],
       [0, 0],
       [1, 2],
       [0, 0],
       [3, 4]])

You can also set whole rows or columns using a one-dimensional Boolean array:

In [176]:
data[names != "Joe"] = 7

data

array([[7, 7],
       [0, 2],
       [7, 7],
       [7, 7],
       [7, 7],
       [0, 0],
       [3, 4]])

### Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays. Suppose we had an 8 × 4 array:

In [177]:
arr = np.zeros((8, 4))

for i in range(8):
    arr[i] = i
    
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:

In [178]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

Using negative indices selects rows from the end:

In [179]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:

In [180]:
arr = np.arange(32).reshape((8, 4))

arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [181]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])
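In the result above, the elements at positions (1, 0), (5, 3), (7, 1), and (2, 2) were selected. If the goal is instead the rectangular block formed by those rows and those columns, one possible approach is to index in two steps, as in this sketch:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Paired indices: one element per (row, column) tuple.
pairs = arr[[1, 5, 7, 2], [0, 3, 1, 2]]
print(pairs)

# Two steps: pick the rows first, then reorder the columns of that result.
region = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
print(region)
```

The result of fancy indexing with as many integer arrays as there are axes is always one-dimensional, regardless of how many dimensions the array has.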

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array when assigning the result to a new variable. If you assign values with fancy indexing, the indexed values will be modified.
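The two halves of that statement can be sketched separately: reading through fancy indexing produces an independent copy, while assigning through it writes into the original array.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[1, 5]]   # new array holding copies of rows 1 and 5
picked[:] = 0
print(arr[1])          # [4 5 6 7], unaffected by the change to picked

arr[[1, 5]] = 0        # assignment through fancy indexing modifies arr
print(arr[1], arr[5])  # [0 0 0 0] [0 0 0 0]
```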

### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and the special T attribute:

In [182]:
arr = np.arange(15).reshape((3, 5))

print(arr)
print()
print(arr.T)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]


When doing matrix computations, you may do this very often, for example, when computing the inner matrix product using numpy.dot:

In [183]:
arr = np.array([[0, 1, 0], [1, 2, -2], [6, 3, 2], [-1, 0, -1], [1, 0, 1]])

np.dot(arr.T, arr)

array([[39, 20, 12],
       [20, 14,  2],
       [12,  2, 10]])

The @ infix operator is another way to do matrix multiplication:

In [184]:
arr.T @ arr

array([[39, 20, 12],
       [20, 14,  2],
       [12,  2, 10]])

Simple transposing with .T is a special case of swapping axes. ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:

In [185]:
print(arr)
print()
print(arr.swapaxes(0, 1))

[[ 0  1  0]
 [ 1  2 -2]
 [ 6  3  2]
 [-1  0 -1]
 [ 1  0  1]]

[[ 0  1  6 -1  1]
 [ 1  2  3  0  0]
 [ 0 -2  2 -1  1]]


swapaxes similarly returns a view on the data without making a copy.
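The view behavior of `.T` and `swapaxes`, and the agreement between `np.dot` and `@`, can be checked with a short sketch on a fresh array:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
transposed = arr.T
swapped = arr.swapaxes(0, 1)

# Both results are views that share arr's buffer.
shares = np.shares_memory(arr, transposed) and np.shares_memory(arr, swapped)
print(shares)                                # True

# For a two-dimensional array, swapping axes 0 and 1 is the transpose.
print(np.array_equal(transposed, swapped))   # True

# Writing through the view changes the original: transposed[0, 1] is arr[1, 0].
transposed[0, 1] = -1
print(arr[1, 0])                             # -1

# np.dot and @ agree for two-dimensional arrays.
print(np.array_equal(np.dot(arr.T, arr), arr.T @ arr))   # True
```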

## Pseudorandom Number Generation

The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4 × 4 array of samples from the standard normal distribution using numpy.random.standard_normal:

In [186]:
samples = np.random.standard_normal(size=(4, 4))

samples

array([[ 0.01222641,  0.54827604,  0.14127758, -0.09430783],
       [-0.64970284,  0.02692248,  2.31105568, -1.72049605],
       [-0.66105918,  0.07183933, -0.16595993,  0.40016348],
       [-0.58888324,  0.34265778,  0.8555283 ,  1.72146405]])

Python’s built-in random module, by contrast, samples only one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:

In [187]:
from random import normalvariate

N = 1_000_000

%timeit samples = [normalvariate(0, 1) for _ in range(N)]

%timeit np.random.standard_normal(N)

509 ms ± 6.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.5 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


These random numbers are pseudorandom rather than truly random: they are generated by a configurable random number generator that determines deterministically what values are created.

Functions like numpy.random.standard_normal use the numpy.random module's default random number generator, but your code can be configured to use an explicit generator:

In [188]:
rng = np.random.default_rng(seed=12345)

data = rng.standard_normal((2, 3))

The seed argument is what determines the initial state of the generator, and the state changes each time the rng object is used to generate data. The generator object rng is also isolated from other code which might use the numpy.random module:

In [189]:
type(rng)

numpy.random._generator.Generator
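Two consequences of the seed and state behavior described above can be sketched with a pair of generators (`rng_a` and `rng_b` are illustrative names): the same seed reproduces the same stream, and each draw advances the generator's state.

```python
import numpy as np

rng_a = np.random.default_rng(seed=12345)
rng_b = np.random.default_rng(seed=12345)

first_a = rng_a.standard_normal(3)
first_b = rng_b.standard_normal(3)
print(np.array_equal(first_a, first_b))    # True: same seed, same values

# A second draw from the same generator continues the stream.
second_a = rng_a.standard_normal(3)
print(np.array_equal(first_a, second_a))   # False: the state has advanced
```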

<center><b>NumPy random number generator methods:</b></center>

| Method            | Description                                                        |
|-------------------|--------------------------------------------------------------------|
| `permutation`     | Return a random permutation of a sequence, or return a permuted range |
| `shuffle`         | Randomly permute a sequence in place                               |
| `uniform`         | Draw samples from a uniform distribution over [low, high), which defaults to [0, 1) |
| `integers`        | Draw random integers from a given low-to-high range                |
| `standard_normal` | Draw samples from a normal distribution with mean 0 and standard deviation 1 |
| `binomial`        | Draw samples from a binomial distribution                          |
| `normal`          | Draw samples from a normal (Gaussian) distribution                 |
| `beta`            | Draw samples from a beta distribution                              |
| `chisquare`       | Draw samples from a chi-square distribution                        |
| `gamma`           | Draw samples from a gamma distribution                             |
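A few of the methods in the table, called on an explicit generator. This is only a sketch; the ranges and sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)

ints = rng.integers(0, 10, size=5)         # integers in [0, 10)
floats = rng.uniform(-1.0, 1.0, size=3)    # floats in [-1, 1)
perm = rng.permutation(5)                  # shuffled copy of 0..4

deck = np.arange(5)
rng.shuffle(deck)                          # reorders deck in place, returns None

print(ints)
print(floats)
print(perm)
print(deck)
```

Note the difference between the last two: `permutation` returns a new array, while `shuffle` modifies its argument.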

## Universal Functions: Fast Element-Wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

Many ufuncs are simple element-wise transformations, like numpy.sqrt or numpy.exp:

In [190]:
arr = np.arange(10)

print(np.sqrt(arr))
print()
print(np.exp(arr))

[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712 3.        ]

[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01
 5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03
 2.98095799e+03 8.10308393e+03]


These are referred to as unary ufuncs. Others, such as numpy.add or numpy.maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:

In [191]:
x = rng.standard_normal(8)
y = rng.standard_normal(8)

print(x)
print()
print(y)
np.maximum(x, y)

[-1.3677927   0.6488928   0.36105811 -1.95286306  2.34740965  0.96849691
 -0.75938718  0.90219827]

[-0.46695317 -0.06068952  0.78884434 -1.25666813  0.57585751  1.39897899
  1.32229806 -0.29969852]


array([-0.46695317,  0.6488928 ,  0.78884434, -1.25666813,  2.34740965,
        1.39897899,  1.32229806,  0.90219827])
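In the output above, numpy.maximum computed the element-wise maximum of x and y. The same behavior is easier to follow with small hand-picked values, as in this sketch, which also shows that the `+` operator and `np.add` give the same result.

```python
import numpy as np

x = np.array([-1.5, 0.5, 2.0])
y = np.array([-0.5, 0.0, 3.0])

larger = np.maximum(x, y)
print(larger)                          # [-0.5  0.5  3. ]

# Position by position, this matches Python's built-in max.
by_hand = [max(a, b) for a, b in zip(x, y)]
print(by_hand == larger.tolist())      # True

# The + operator on arrays is the add ufunc.
print(np.array_equal(np.add(x, y), x + y))   # True
```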

While not common, a ufunc can return multiple arrays. numpy.modf is one example: a vectorized version of the built-in Python math.modf, it returns the fractional and integral parts of a floating-point array:

In [192]:
arr = rng.standard_normal(7) * 5

remainder, whole_part = np.modf(arr)

print(remainder)
print()
print(whole_part)

[ 0.51459671 -0.10791367 -0.7909463   0.24741966 -0.71800536 -0.40843795
  0.62369966]

[ 4. -8. -0.  2. -6. -0.  8.]


Ufuncs accept an optional out argument that allows them to assign their results into an existing array rather than create a new one:

In [193]:
print(arr)
print()
out = np.zeros_like(arr)

print(np.add(arr, 1))
print()
np.add(arr, 1, out=out)
print(out)

[ 4.51459671 -8.10791367 -0.7909463   2.24741966 -6.71800536 -0.40843795
  8.62369966]

[ 5.51459671 -7.10791367  0.2090537   3.24741966 -5.71800536  0.59156205
  9.62369966]

[ 5.51459671 -7.10791367  0.2090537   3.24741966 -5.71800536  0.59156205
  9.62369966]
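The `out` argument can also be the input array itself, which updates the data in place without allocating a second array. A minimal sketch with made-up values:

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

# The ufunc writes into arr and returns that same array object.
result = np.add(arr, 1, out=arr)

print(result is arr)   # True
print(arr)             # [2. 3. 4.]
```

This is mostly useful for large arrays, where avoiding a temporary copy saves memory.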


<center><b>Some unary universal functions:</b></center>

| Function                                             | Description                                                                                                 |
|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| `abs`, `fabs`                                       | Compute the absolute value element-wise for integer, floating-point, or complex values                      |
| `sqrt`                                              | Compute the square root of each element (equivalent to `arr ** 0.5`)                                       |
| `square`                                            | Compute the square of each element (equivalent to `arr ** 2`)                                              |
| `exp`                                               | Compute the exponent eˣ of each element                                                                     |
| `log`, `log10`, `log2`, `log1p`                     | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively                          |
| `sign`                                              | Compute the sign of each element: 1 (positive), 0 (zero), or –1 (negative)                                 |
| `ceil`                                              | Compute the ceiling of each element (smallest integer ≥ that number)                                       |
| `floor`                                             | Compute the floor of each element (largest integer ≤ each element)                                         |
| `rint`                                              | Round elements to the nearest integer, preserving the dtype                                                |
| `modf`                                              | Return fractional and integral parts of array as separate arrays                                           |
| `isnan`                                             | Return Boolean array indicating whether each value is NaN (Not a Number)                                   |
| `isfinite`, `isinf`                                | Return Boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively|
| `cos`, `cosh`, `sin`, `sinh`, `tan`, `tanh`         | Regular and hyperbolic trigonometric functions                                                             |
| `arccos`, `arccosh`, `arcsin`, `arcsinh`, `arctan`, `arctanh` | Inverse trigonometric and inverse hyperbolic functions                                         |
| `logical_not`                                       | Compute truth value of not x element-wise (equivalent to `~arr`)                                           |
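
A few of the unary functions in the table can be tried on a small array. This is only a sketch: the input values are made up for illustration, and `np.errstate` is used just to silence the warning that `sqrt` emits for negative input.

```python
import numpy as np

arr = np.array([-2.5, -0.4, 0.0, 1.6, 4.0])

signs = np.sign(arr)      # -1, 0, or 1 for each element
ceils = np.ceil(arr)      # smallest integer >= each element
floors = np.floor(arr)    # largest integer <= each element

# modf returns two arrays: the fractional parts and the integral parts
frac, whole = np.modf(arr)

# sqrt of a negative number gives NaN, which isnan can detect afterwards
with np.errstate(invalid="ignore"):   # suppress the RuntimeWarning for this demo
    roots = np.sqrt(arr)
nan_mask = np.isnan(roots)

print(signs)
print(ceils, floors)
print(frac, whole)
print(nan_mask)
```

Note that `modf` is one of the few ufuncs that returns more than one array.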

=====================================================================================================

**Some binary universal functions:**

| Function                                                                 | Description                                                                                      |
|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| `add`                                                                    | Add corresponding elements in arrays                                                             |
| `subtract`                                                               | Subtract elements in second array from first array                                               |
| `multiply`                                                               | Multiply array elements                                                                          |
| `divide`, `floor_divide`                                                 | Divide or floor divide (truncating the remainder)                                                |
| `power`                                                                  | Raise elements in first array to powers indicated in second array                                |
| `maximum`, `fmax`                                                        | Element-wise maximum; `fmax` ignores NaN                                                         |
| `minimum`, `fmin`                                                        | Element-wise minimum; `fmin` ignores NaN                                                         |
| `mod`                                                                    | Element-wise modulus (remainder of division)                                                     |
| `copysign`                                                               | Copy sign of values in second argument to values in first argument                               |
| `greater`, `greater_equal`, `less`, `less_equal`, `equal`, `not_equal`   | Perform element-wise comparison, yielding Boolean array (equivalent to infix operators >, >=, <, <=, ==, !=) |
| `logical_and`                                                            | Compute element-wise truth value of AND (`&`) logical operation                                  |
| `logical_or`                                                             | Compute element-wise truth value of OR (`\|`) logical operation                                  |
| `logical_xor`                                                            | Compute element-wise truth value of XOR (`^`) logical operation                                  |
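
As a rough illustration of the table above, the sketch below calls a handful of binary ufuncs on small made-up arrays. It shows how `maximum` and `fmax` differ when NaN is present, and that `floor_divide` and `mod` follow Python's `//` and `%` for negative numbers.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, -4.0])
b = np.array([2.0, 5.0, np.nan, 1.0])

total = np.add(a, b)          # same as a + b; NaN propagates
prop_max = np.maximum(a, b)   # NaN wins whenever either input is NaN
skip_max = np.fmax(a, b)      # NaN is ignored if the other value is a number

x = np.array([7, -7, 8])
quot = np.floor_divide(x, 2)  # same as x // 2 (rounds toward negative infinity)
rem = np.mod(x, 2)            # same as x % 2

signed = np.copysign(np.array([1.0, 2.0, 3.0]), np.array([-1.0, 1.0, -0.5]))

m1 = np.array([True, False, True])
m2 = np.array([True, True, False])
xor = np.logical_xor(m1, m2)  # same as m1 ^ m2 for Boolean arrays

print(prop_max)
print(skip_max)
print(quot, rem)
print(signed)
print(xor)
```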

## Array-Oriented Programming with Arrays

- Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is referred to by some people as vectorization.

- Vectorized array operations are usually significantly faster than their pure Python equivalents, with the biggest impact in numerical computations.

As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The numpy.meshgrid function takes two one-dimensional arrays and produces two two-dimensional matrices corresponding to all pairs of (x, y) in the two arrays:

In [194]:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points

xs, ys = np.meshgrid(points, points)

ys

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]],
      shape=(1000, 1000))

Now, evaluating the function is a matter of writing the same expression you would write with two points:

In [195]:
z = np.sqrt(xs ** 2 + ys ** 2)
z

array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]], shape=(1000, 1000))
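
With 1000 × 1000 values the structure of the meshgrid output is hard to see, so here is a minimal sketch on a tiny grid. The coordinate values are chosen for illustration only, and the default `indexing="xy"` behavior of `np.meshgrid` is assumed.

```python
import numpy as np

x = np.array([0.0, 3.0])        # two x coordinates
y = np.array([0.0, 4.0, 8.0])   # three y coordinates

xs, ys = np.meshgrid(x, y)

print(xs.shape, ys.shape)   # both (3, 2): len(y) rows, len(x) columns
print(xs)                   # every row repeats x
print(ys)                   # every column repeats y

# z[i, j] is the distance of the point (x[j], y[i]) from the origin
z = np.sqrt(xs ** 2 + ys ** 2)
print(z)
```

The entry `z[1, 1]` corresponds to the point (3, 4), whose distance from the origin is 5.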

## Expressing Conditional Logic as Array Operations


The numpy.where function is a vectorized version of the ternary expression x if condition else y. Suppose we had a Boolean array and two arrays of values:

In [196]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])

yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr. A list comprehension doing this might look like:

In [197]:
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)] 
result 

[np.float64(1.1),
 np.float64(2.2),
 np.float64(1.3),
 np.float64(1.4),
 np.float64(2.5)]

This has multiple problems. First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code). Second, it will not work with multidimensional arrays. With numpy.where you can do this with a single function call:

In [198]:
result = np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])

The second and third arguments to numpy.where don’t need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with –2. This is possible to do with numpy.where:

In [199]:
arr = rng.standard_normal((4, 4))
print(arr)
print()
print(np.where(arr > 0, 2, -2))

[[ 2.61815943  0.77736134  0.8286332  -0.95898831]
 [-1.20938829 -1.41229201  0.54154683  0.7519394 ]
 [-0.65876032 -1.22867499  0.25755777  0.31290292]
 [-0.13081169  1.26998312 -0.09296246 -0.06615089]]

[[ 2  2  2 -2]
 [-2 -2  2  2]
 [-2 -2  2  2]
 [-2  2 -2 -2]]


You can combine scalars and arrays when using numpy.where. For example, I can replace all positive values in arr with the constant 2, like so:

In [200]:
np.where(arr > 0, 2, arr) # set only positive values to 2

array([[ 2.        ,  2.        ,  2.        , -0.95898831],
       [-1.20938829, -1.41229201,  2.        ,  2.        ],
       [-0.65876032, -1.22867499,  2.        ,  2.        ],
       [-0.13081169,  2.        , -0.09296246, -0.06615089]])
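
Because the arrays above are random, the results change from run to run. The following sketch repeats the two `np.where` calls on a small fixed array (values made up for illustration) and compares the first one with the nested list comprehension that pure Python would need for two-dimensional data.

```python
import numpy as np

arr = np.array([[1.5, -0.5],
                [-2.0, 0.25]])

both_scalars = np.where(arr > 0, 2, -2)   # scalars for both branches
mixed = np.where(arr > 0, 2, arr)         # scalar for True, original value for False

# Pure Python equivalent of the first call: one loop level per dimension
manual = [[2 if v > 0 else -2 for v in row] for row in arr]

print(both_scalars)
print(mixed)
print(np.array_equal(both_scalars, manual))
```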

### Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations (sometimes called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or using the top-level NumPy function. When you use the NumPy function, like numpy.sum, you have to pass the array you want to aggregate as the first argument.

Here I generate some normally distributed random data and compute some aggregate statistics:

In [201]:
arr = rng.standard_normal((5, 4))

print(arr)
print()
print(arr.mean()) # or np.mean(arr)
print()
print(arr.sum()) 

[[-1.10821447  0.13595685  1.34707776  0.06114402]
 [ 0.0709146   0.43365454  0.27748366  0.53025239]
 [ 0.53672097  0.61835001 -0.79501746  0.30003095]
 [-1.60270159  0.26679883 -1.26162378 -0.07127081]
 [ 0.47404973 -0.41485376  0.0977165  -1.64041784]]

-0.08719744457434529

-1.743948891486906


Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one less dimension:

In [202]:
print(arr.mean(axis=1))
print()
print(arr.sum(axis=0))

[ 0.10899104  0.3280763   0.16502112 -0.66719934 -0.37087634]

[-1.62923076  1.03990647 -0.33436331 -0.82026129]


Here, arr.mean(axis=1) means "compute the mean across the columns" (one value per row), whereas arr.sum(axis=0) means "compute the sum down the rows" (one value per column).
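
The axis convention is easier to check on a small array with known values. This is a minimal sketch; the array is made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])   # 2 rows, 3 columns

total = arr.sum()             # 21: every element
col_sums = arr.sum(axis=0)    # [5 7 9]: axis 0 is collapsed, one value per column
row_sums = arr.sum(axis=1)    # [ 6 15]: axis 1 is collapsed, one value per row
row_means = arr.mean(axis=1)  # [2. 5.]

# The array method and the top-level function give the same result
same = np.array_equal(arr.sum(axis=0), np.sum(arr, axis=0))

print(total, col_sums, row_sums, row_means, same)
```

A (2, 3) array reduced along axis 0 has shape (3,), and reduced along axis 1 has shape (2,).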

Aggregations like sum, mean, and max combine elements into a single value (or, when an axis is given, one value per row or column). cumsum (cumulative sum) and cumprod (cumulative product) are different: they do not reduce the array. Called on a one-dimensional array, they return an array of the same length in which each element is the sum (or product) of all elements up to and including that position, so you get every intermediate result rather than a single total.

In [203]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])

arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28])

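For a multidimensional array, pass an axis to accumulate along it: arr.cumsum(axis=0) accumulates down the rows (a running total within each column), while arr.cumsum(axis=1) accumulates across the columns (a running total within each row):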

In [204]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(arr.cumsum(axis=0))
print()
print(arr.cumsum(axis=1))

[[ 0  1  2]
 [ 3  5  7]
 [ 9 12 15]]

[[ 0  1  3]
 [ 3  7 12]
 [ 6 13 21]]


<center><b>Basic array statistical methods:</b></center>

| Method            | Description                                                                                   |
|-------------------|-----------------------------------------------------------------------------------------------|
| `sum`             | Sum of all the elements in the array or along an axis; zero-length arrays have sum 0          |
| `mean`            | Arithmetic mean; invalid (returns NaN) on zero-length arrays                                  |
| `std`, `var`      | Standard deviation and variance, respectively                                                 |
| `min`, `max`      | Minimum and maximum                                                                           |
| `argmin`, `argmax`| Indices of minimum and maximum elements, respectively                                         |
| `cumsum`          | Cumulative sum of elements starting from 0                                                    |
| `cumprod`         | Cumulative product of elements starting from 1                                                |
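
The methods in the table that were not shown earlier can be tried on a small fixed array. This is only a sketch with made-up values. Note that `var` and `std` use `ddof=0` by default, that is, they divide by the number of elements.

```python
import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9]])

lo, hi = arr.min(), arr.max()        # 1 and 9
flat_argmax = arr.argmax()           # 5: index into the flattened array
row_argmax = arr.argmax(axis=1)      # [2 2]: column of the maximum in each row
col_argmin = arr.argmin(axis=0)      # [1 0 0]: row of the minimum in each column

variance, stdev = arr.var(), arr.std()   # std is the square root of var
running_prod = arr.cumprod(axis=1)       # running product within each row

empty_sum = np.array([]).sum()       # zero-length arrays sum to 0.0

print(lo, hi, flat_argmax, row_argmax, col_argmin)
print(variance, stdev)
print(running_prod)
print(empty_sum)
```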

### Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods. Thus, sum is often used as a means of counting True values in a Boolean array:

In [205]:
arr = rng.standard_normal(100)

print((arr > 0).sum()) # Number of positive values
print()
print((arr < 0).sum()) # Number of negative values

48

52


Two additional methods, any and all, are useful especially for Boolean arrays. any tests whether one or more values in an array is True, while all checks if every value is True:

In [206]:
bools = np.array([False, False, True, False])

print(bools.any())  # Check if any value is True
print()
print(bools.all())  # Check if all values are True

True

False


These methods also work with non-Boolean arrays, where nonzero elements are treated as True.
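
A quick sketch of `any` and `all` on non-Boolean data, using made-up integer arrays. Both methods also accept an `axis` argument, like the aggregations above.

```python
import numpy as np

counts = np.array([0, 0, 3, 0])
has_nonzero = counts.any()   # True: at least one element is nonzero
all_nonzero = counts.all()   # False: some elements are zero

grid = np.array([[1, 2],
                 [0, 4]])
cols_all = grid.all(axis=0)  # [False  True]: is every value in each column nonzero?
rows_any = grid.any(axis=1)  # [ True  True]: does each row contain a nonzero value?

print(has_nonzero, all_nonzero)
print(cols_all, rows_any)
```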

### Sorting

Like Python’s built-in list type, NumPy arrays can be sorted in place with the sort method:

In [207]:
arr = rng.standard_normal(6)

print(arr)
print()
arr.sort()
print(arr)

[ 0.07726066 -0.68391322 -0.72083767  1.12062282 -0.05481416 -0.08241372]

[-0.72083767 -0.68391322 -0.08241372 -0.05481416  0.07726066  1.12062282]


You can sort each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to sort. In this example, the data is sorted first along axis 0 and then along axis 1:

In [208]:
arr = rng.standard_normal((5, 3))

print(arr)
print()

arr.sort(axis=0)  # Sort along the first axis (within each column)
print(arr)
print()
arr.sort(axis=1)  # Sort along the second axis (within each row)
print(arr)

[[ 0.9359865   1.23853712  1.27279553]
 [ 0.40589222 -0.05032522  0.28931754]
 [ 0.17930568  1.39748056  0.29204679]
 [ 0.63840567 -0.02788771  1.37105185]
 [-2.05280763  0.38050908  0.75539067]]

[[-2.05280763 -0.05032522  0.28931754]
 [ 0.17930568 -0.02788771  0.29204679]
 [ 0.40589222  0.38050908  0.75539067]
 [ 0.63840567  1.23853712  1.27279553]
 [ 0.9359865   1.39748056  1.37105185]]

[[-2.05280763 -0.05032522  0.28931754]
 [-0.02788771  0.17930568  0.29204679]
 [ 0.38050908  0.40589222  0.75539067]
 [ 0.63840567  1.23853712  1.27279553]
 [ 0.9359865   1.37105185  1.39748056]]


The top-level function numpy.sort returns a sorted copy of an array (like the Python built-in function sorted) instead of modifying the array in place.
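
The difference between the copying function and the in-place method can be sketched as follows, using a small made-up array. The reversed slice at the end is one common way to get descending order, since `sort` itself only sorts ascending.

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)      # new array; arr is left untouched
before = arr.copy()             # snapshot to show arr has not changed yet

result = arr.sort()             # sorts arr in place and returns None

descending = np.sort(before)[::-1]   # ascending sort, then reversed view

print(sorted_copy)   # [1 2 3]
print(before)        # [3 1 2]
print(arr, result)   # [1 2 3] None
print(descending)    # [3 2 1]
```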

### Unique and Other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is numpy.unique, which returns the sorted unique values in an array:

In [209]:
names = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])

np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [210]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])

np.unique(ints)

array([1, 2, 3, 4])

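Another function, numpy.in1d, tests membership of the values in one array in another, returning a Boolean array. NumPy 2.0 deprecated in1d in favor of numpy.isin, which accepts the same arguments in this example: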

In [211]:
values = np.array([6, 0, 0, 3, 2, 5, 6])

np.in1d(values, [2, 3, 6])

array([ True, False, False,  True,  True, False,  True])

<center><b>Array set operations:</b></center>

| Method                | Description                                                                                   |
|-----------------------|-----------------------------------------------------------------------------------------------|
| `unique(x)`           | Compute the sorted, unique elements in x                                                      |
| `intersect1d(x, y)`   | Compute the sorted, common elements in x and y                                                |
| `union1d(x, y)`       | Compute the sorted union of elements                                                          |
| `in1d(x, y)`          | Compute a Boolean array indicating whether each element of x is contained in y (deprecated since NumPy 2.0 in favor of `isin`) |
| `setdiff1d(x, y)`     | Set difference, elements in x that are not in y                                               |
| `setxor1d(x, y)`      | Set symmetric differences; elements that are in either of the arrays, but not both            |
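
The set operations in the table can be sketched on two small made-up arrays. `np.isin` is used here for the membership test, and the `return_counts` option of `np.unique` is included as a common companion.

```python
import numpy as np

x = np.array([1, 2, 2, 3, 4])
y = np.array([3, 4, 4, 5])

common = np.intersect1d(x, y)   # [3 4]
union = np.union1d(x, y)        # [1 2 3 4 5]
only_x = np.setdiff1d(x, y)     # [1 2]
either = np.setxor1d(x, y)      # [1 2 5]
member = np.isin(x, y)          # [False False False  True  True]

# unique can also report how often each value occurs
values, counts = np.unique(x, return_counts=True)

print(common, union, only_x, either)
print(member)
print(values, counts)
```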


## File Input and Output with Arrays

NumPy is able to save and load data to and from disk in some text or binary formats.

numpy.save and numpy.load are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension .npy:

In [212]:
arr = np.arange(10)

np.save("some_array", arr)

If the file path does not already end in .npy, the extension will be appended. The array on disk can then be loaded with numpy.load:

In [213]:
np.load("some_array.npy")

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can save multiple arrays in an uncompressed archive using numpy.savez and passing the arrays as keyword arguments:

In [214]:
np.savez("array_archive.npz", a=arr, b=arr)

When loading an .npz file, you get back a dictionary-like object that loads the individual arrays lazily:

In [215]:
arch = np.load("array_archive.npz")

arch["b"]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If your data compresses well, you may wish to use numpy.savez_compressed instead:

In [216]:
np.savez_compressed("arrays_compressed.npz", a=arr, b=arr)
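
To check that an archive round-trips correctly without leaving files behind, something like the sketch below can be used. The file name `demo_archive.npz` and the temporary directory are assumptions for illustration and are not part of the notebook above.

```python
import os
import tempfile

import numpy as np

a = np.arange(10)
b = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_archive.npz")   # hypothetical file name
    np.savez_compressed(path, a=a, b=b)

    # The archive object is a context manager; arrays are read when indexed
    with np.load(path) as arch:
        names = sorted(arch.files)   # keys given as keyword arguments when saving
        a_loaded = arch["a"]
        b_loaded = arch["b"]

print(names)
print(np.array_equal(a, a_loaded), np.array_equal(b, b_loaded))
```

Using `with` on the loaded archive closes the underlying file, which matters if the file is about to be deleted or overwritten.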

## Linear Algebra

- Linear algebra operations, like matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of many array libraries.

- Multiplying two two-dimensional arrays with * is an element-wise product, while matrix multiplications require either using the dot function or the @ infix operator. dot is both an array method and a function in the numpy namespace for doing matrix multiplication:

In [217]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])

y = np.array([[6., 23.], [-1, 7], [8, 9]])

print(x)
print()
print(y)
print()
print(x.dot(y))

[[1. 2. 3.]
 [4. 5. 6.]]

[[ 6. 23.]
 [-1.  7.]
 [ 8.  9.]]

[[ 28.  64.]
 [ 67. 181.]]


x.dot(y) is equivalent to np.dot(x, y):

In [218]:
np.dot(x, y)

array([[ 28.,  64.],
       [ 67., 181.]])

A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array:

In [219]:
x @ np.ones(3)

array([ 6., 15.])
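
The difference between `*` and `@` is easiest to see on two square arrays of the same shape, where both operations are valid but give different results. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([[1., 2.],
              [3., 4.]])
b = np.array([[0., 1.],
              [1., 0.]])

elementwise = a * b   # multiplies matching positions
matmul = a @ b        # rows of a combined with columns of b

print(elementwise)    # [[0. 2.] [3. 0.]]
print(matmul)         # [[2. 1.] [4. 3.]]
print(np.array_equal(matmul, a.dot(b)))   # @ and dot agree for 2D arrays
```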

numpy.linalg provides a standard set of matrix decompositions, along with operations such as the inverse and determinant:

In [220]:
from numpy.linalg import inv, qr

X = rng.standard_normal((5, 5))

mat = X.T @ X

print(inv(mat))
print()
print(mat @ inv(mat))


[[  3.49932285   2.84436268   3.59557002 -16.55376878   4.47325573]
 [  2.84436268   2.56666253   2.9001963  -13.57742      3.76776505]
 [  3.59557002   2.9001963    4.48232906 -18.34525499   4.70660032]
 [-16.55376878 -13.57742    -18.34525499  84.01018808 -22.04840478]
 [  4.47325573   3.76776505   4.70660032 -22.04840478   6.05251342]]

[[ 1.00000000e+00  1.75067964e-15 -2.80207584e-15  9.86737630e-16
  -1.66652414e-15]
 [ 1.65521536e-17  1.00000000e+00  6.05561798e-16  4.15196154e-15
  -2.52119478e-15]
 [ 3.86407015e-16 -7.07260933e-16  1.00000000e+00 -6.75452839e-15
  -1.18024337e-15]
 [ 1.82488652e-16 -1.79488644e-17  2.45759362e-16  1.00000000e+00
  -8.63545476e-16]
 [ 3.45446591e-16 -1.23954858e-15  2.22757702e-15 -1.91689727e-14
   1.00000000e+00]]


<center><b>Commonly used numpy.linalg functions:</b></center>


| Function | Description |
|----------|-------------|
| `diag`   | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
| `dot`    | Matrix multiplication |
| `trace`  | Compute the sum of the diagonal elements |
| `det`    | Compute the matrix determinant |
| `eig`    | Compute the eigenvalues and eigenvectors of a square matrix |
| `inv`    | Compute the inverse of a square matrix |
| `pinv`   | Compute the Moore-Penrose pseudoinverse of a matrix |
| `qr`     | Compute the QR decomposition |
| `svd`    | Compute the singular value decomposition (SVD) |
| `solve`  | Solve the linear system Ax = b for x, where A is a square matrix |
| `lstsq`  | Compute the least-squares solution to Ax = b |
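
Several of the functions in the table can be checked on a small system with a known solution. This is a sketch with a made-up 2 × 2 matrix. `np.allclose` is used for the comparisons because floating-point results are rarely exact, as the `mat @ inv(mat)` output above shows.

```python
import numpy as np
from numpy.linalg import det, inv, qr, solve

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

x = solve(A, b)                 # solves 2x + y = 3 and x + 3y = 5
determinant = det(A)            # 2*3 - 1*1 = 5, up to rounding

q, r = qr(A)                    # A = q @ r, q orthogonal, r upper triangular
identity_check = A @ inv(A)     # close to the identity matrix

print(x)                                        # approximately [0.8 1.4]
print(np.allclose(A @ x, b))                    # True
print(np.isclose(determinant, 5.0))             # True
print(np.allclose(q @ r, A))                    # True
print(np.allclose(identity_check, np.eye(2)))   # True
```

When only the solution of a linear system is needed, calling `solve(A, b)` avoids forming the inverse explicitly.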