NumPy is an open-source Python library for mathematical, scientific, engineering, and data science programming.
NumPy is a library (not a programming language) built around efficient multi-dimensional arrays and matrices.

In addition to arrays and matrices, NumPy provides a large collection of mathematical functions that operate on them.

# Importing

In [1]:
# import numpy
import numpy as np

In [2]:
# !pip install numpy

In [3]:
np.__version__

'1.25.2'

In IPython or Jupyter, you can display NumPy's built-in documentation with:

In [4]:
np?

# Python List vs Numpy Array
Because of Python's dynamic typing, we can create heterogeneous lists:

In [5]:
L = [True, "2", 3.0, 4]
[type(item) for item in L]

[bool, str, float, int]

This flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information–that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

![alt text](https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png)

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object such as a Python integer. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.

# NumPy Arrays
The central feature of NumPy is the array object class. Arrays are similar to lists in Python, except that every element of an array must be of the same type, typically a numeric type like float or int. Arrays make operations with large amounts of numeric data very fast and are generally much more efficient than lists.

## Creating Arrays

In [7]:
# Manual construction of 1-D arrays
one_D = np.array([0, 1, 2, 3])
print(one_D)
type(one_D)

[0 1 2 3]


numpy.ndarray

In [8]:
# Check number of array dimensions
one_D.ndim

1

Unlike Python lists, NumPy arrays can hold elements of only one type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [9]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
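The same upcasting rule applies to other mixes of types. This short sketch builds arrays from a few mixed lists and prints the dtype NumPy chooses; `dtype.kind` is a one-letter code ('i' integer, 'f' float, 'c' complex, 'U' string). Note that mixing numbers with strings converts everything to strings, which is rarely what you want.

```python
import numpy as np

# NumPy picks a single dtype that can represent every element.
# dtype.kind codes: 'b' bool, 'i' signed int, 'f' float, 'c' complex, 'U' unicode str.
samples = {
    "bools and ints": [True, 2],
    "ints and floats": [1, 2.5],
    "floats and complex": [1.5, 1 + 2j],
    "numbers and strings": [1, "a"],
}

for label, values in samples.items():
    arr = np.array(values)
    print(f"{label:20} -> dtype={arr.dtype}, kind={arr.dtype.kind}")
```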

If we want to explicitly set the data type of the resulting array, we can use the **dtype** keyword:

In [10]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [11]:
# Manual construction of 2 and 3-D arrays
two_D = np.array([[0, 1, 2],
                  [3, 4, 5]])
print(two_D.ndim)
three_D = np.array([
                    [[1],
                     [2]],
                    [[3],
                     [4]]
                    ])
print(three_D.ndim)

2
3


In [12]:
# Shape of array
print(two_D.shape)
print(three_D.shape)
print(three_D)
# print(type(three_D))

(2, 3)
(2, 2, 1)
[[[1]
  [2]]

 [[3]
  [4]]]


In [13]:
# returns the size of the first dimension
len(two_D)

2

## Functions for creating arrays
In practice, we rarely enter items one by one; instead, NumPy provides functions that build arrays of a given shape.

In [18]:
print(np.zeros(5))
np.zeros((2, 2))

[0. 0. 0. 0. 0.]

array([[0., 0.],
       [0., 0.]])

In [15]:
np.zeros(5, dtype=int)

array([0, 0, 0, 0, 0])

In [19]:
np.ones((2, 4), dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [20]:
np.full((2, 4), 3.14)

array([[3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14]])

###### Evenly spaced

In [21]:
np.arange(10) # 0 .. n-1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
np.arange(1, 9, 2) # start, end (exclusive), step

array([1, 3, 5, 7])

###### By number of points

In [23]:
np.linspace(0, 1, 6)  # start, end, num-points

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])

In [24]:
np.linspace(0, 1, 6, endpoint=False)

array([0.        , 0.16666667, 0.33333333, 0.5       , 0.66666667,
       0.83333333])
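The spacing that `np.linspace` uses can be returned alongside the values with `retstep=True`. This sketch shows that with the default `endpoint=True` the step is `(stop - start) / (num - 1)`, while with `endpoint=False` it is `(stop - start) / num`. The NumPy documentation suggests `linspace` over `arange` when a non-integer step is involved, since you control the number of points exactly.

```python
import numpy as np

# endpoint=True (default): stop is included, spacing = (stop - start) / (num - 1)
values, step = np.linspace(0, 1, 6, retstep=True)
print(values, step)

# endpoint=False: stop is excluded, spacing = (stop - start) / num
values_open, step_open = np.linspace(0, 1, 6, endpoint=False, retstep=True)
print(values_open, step_open)
```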

In [25]:
np.ones((3, 3))  # reminder: (3, 3) is a tuple

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [26]:
np.zeros((2, 2))

array([[0., 0.],
       [0., 0.]])

In [27]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [28]:
np.diag(np.array([1, 2, 3, 4]))

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

In [29]:
# Create an uninitialized 3x3 array; its contents are whatever happens to be in memory
np.empty([3, 3])

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
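Because `np.empty` does not initialize its memory (the identity matrix above is just leftover data), it is only safe when every element is written before being read. This sketch shows that pattern, plus the `*_like` helpers that copy the shape and dtype of an existing array; the `template` array is purely illustrative.

```python
import numpy as np

template = np.arange(6).reshape(2, 3)

# np.empty_like only allocates memory; contents are arbitrary until written.
scratch = np.empty_like(template)
scratch[...] = 7          # fill every element before reading it

# When a defined starting value is needed, request it explicitly.
zeros = np.zeros_like(template)              # same shape and dtype, all zeros
ones = np.ones_like(template, dtype=float)   # same shape, dtype overridden

print(scratch)
print(zeros)
print(ones)
```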

# Random numbers

In [30]:
# create a NumPy array of 4 uniformly distributed numbers
np.random.rand(4)       # uniform in [0, 1)

array([0.92752006, 0.61120413, 0.70739571, 0.03014108])

In [31]:
# create a NumPy array of 5 normally distributed numbers
np.random.randn(5)      # standard normal (mean 0, standard deviation 1)

array([-0.05471172,  0.93492538,  0.45056525, -1.78502847,  1.07759258])

In [32]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.54366605, 0.12886768, 0.15416342],
       [0.00108946, 0.5729158 , 0.59179998],
       [0.06126894, 0.36108961, 0.94804947]])

In [33]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[ 0.86430545,  2.18630167,  0.21634343],
       [ 0.50056873, -0.65639783, -1.09109527],
       [ 0.93168267,  2.90981434,  1.27492092]])

In [34]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[5, 2, 0],
       [9, 4, 2],
       [6, 8, 5]])
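The functions above belong to NumPy's legacy random interface. For new code, the NumPy documentation recommends creating a `Generator` with `np.random.default_rng`, which offers equivalent methods and makes seeding explicit. This is a sketch of the same three draws; the seed `42` is an arbitrary choice.

```python
import numpy as np

# default_rng(seed) returns a numpy.random.Generator; 42 is an arbitrary seed.
rng = np.random.default_rng(42)

uniform = rng.random((3, 3))                  # floats in [0, 1)
normal = rng.normal(0, 1, size=(3, 3))        # mean 0, standard deviation 1
integers = rng.integers(0, 10, size=(3, 3))   # ints in [0, 10)

# Re-creating a generator with the same seed reproduces the same stream.
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random((3, 3))))  # True
```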

# NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the table below. When constructing an array, a type can be specified either as a string or as the corresponding NumPy type object:

In [35]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [36]:
# or
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

| Data type	    | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)|
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)|
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)|
| ``int8``      | Byte (-128 to 127)|
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)|
| ``uint8``     | Unsigned integer (0 to 255)|
| ``uint16``    | Unsigned integer (0 to 65535)|
| ``uint32``    | Unsigned integer (0 to 4294967295)|
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)|
| ``float_``    | Shorthand for ``float64`` (removed in NumPy 2.0; use ``float64``).|
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa|
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa|
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa|
| ``complex_``  | Shorthand for ``complex128`` (removed in NumPy 2.0; use ``complex128``).|
| ``complex64`` | Complex number, represented by two 32-bit floats|
| ``complex128``| Complex number, represented by two 64-bit floats|
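The ranges in the table can be checked at runtime with `np.iinfo` (integer types) and `np.finfo` (floating-point types). This sketch also uses `astype`, which returns a converted copy; converting floats to integers truncates the fractional part toward zero.

```python
import numpy as np

# np.iinfo / np.finfo report the limits of each dtype.
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)   # -32768 32767
print(np.iinfo(np.uint8).max)                          # 255
print(np.finfo(np.float32).eps)                        # machine epsilon, single precision

# astype converts to another dtype and returns a new array.
x = np.array([1.7, -2.3, 300.0])
print(x.astype(np.int16))                 # truncated toward zero: 1, -2, 300
print(np.dtype("float32").itemsize)       # 4 bytes per element
```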

# NumPy Array Attributes

In [37]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
print(x3)

[[[2 3 2 0 8]
  [9 5 4 0 8]
  [2 0 2 0 1]
  [3 8 7 3 3]]

 [[0 2 4 4 8]
  [3 0 3 6 1]
  [5 1 5 8 4]
  [9 1 9 3 9]]

 [[3 2 5 9 8]
  [1 6 1 2 1]
  [7 1 2 6 7]
  [6 7 7 1 5]]]


In [38]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [39]:
print("dtype:", x3.dtype)

dtype: int64


Other attributes include `itemsize`, which lists the size (in bytes) of each array element, and `nbytes`, which lists the total size (in bytes) of the array:

In [41]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes


In general, we expect that `nbytes` is equal to `itemsize` times `size`.
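A quick way to see the relationship is to build arrays of the same shape with different dtypes and compare the three attributes. This is a sketch using zero-filled arrays; only the dtype changes between iterations.

```python
import numpy as np

# Same shape, different element sizes: nbytes scales with itemsize.
for dtype in (np.int8, np.int32, np.float64, np.complex128):
    arr = np.zeros((3, 4, 5), dtype=dtype)
    print(f"{arr.dtype}: itemsize={arr.itemsize}, size={arr.size}, nbytes={arr.nbytes}")
    assert arr.nbytes == arr.itemsize * arr.size
```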

# Indexing and slicing
The items of an array can be accessed and assigned to in the same way as the items of a Python list.

In [42]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [43]:
a[0], a[2], a[-1]

(0, 2, 9)

For multidimensional arrays, indexes are tuples of integers.

In [47]:
a = np.diag(np.arange(3))
a

array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])

In [48]:
a[2, 1] = 77  # third row, second column (for nested lists: a[2][1])
a

array([[ 0,  0,  0],
       [ 0,  1,  0],
       [ 0, 77,  2]])

Keep in mind that if we insert a floating-point value into an integer array, the value is silently truncated.

In [49]:
a[0, 1] =  3.14
a

array([[ 0,  3,  0],
       [ 0,  1,  0],
       [ 0, 77,  2]])
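Truncation goes toward zero for negative values too, and no warning is raised. If the fractional part matters, convert the array to a floating-point dtype first. A minimal sketch:

```python
import numpy as np

ints = np.zeros(3, dtype=int)
ints[0] = 3.99    # truncated toward zero, stored as 3
ints[1] = -3.99   # truncated toward zero, stored as -3
print(ints)

# To keep fractional parts, work on a float copy instead.
floats = ints.astype(float)
floats[2] = 3.99  # stored as-is
print(floats)
```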

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

`x[start:stop:step]`

## One-dimensional subarrays

In [50]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]:
a[2:9:3] # [start:end:step]

array([2, 5, 8])

Note that the stop index is not included.

In [52]:
a[:4]

array([0, 1, 2, 3])

In [53]:
a[1:3]

array([1, 2])

In [54]:
a[::2]

array([0, 2, 4, 6, 8])

In [55]:
a[3:]

array([3, 4, 5, 6, 7, 8, 9])

In [56]:
a[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
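Negative steps and negative indices can be combined with explicit start and stop values. When the step is negative, the defaults for `start` and `stop` are swapped, and `stop` is still excluded. A few illustrative cases:

```python
import numpy as np

a = np.arange(10)
print(a[5::-2])   # start at index 5, walk backwards: [5 3 1]
print(a[-3:])     # last three elements: [7 8 9]
print(a[:-3])     # everything except the last three
print(a[8:2:-2])  # stop (index 2) is excluded: [8 6 4]
```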

It is possible to combine assignment and slicing:

In [57]:
a[5:] = 10
a

array([ 0,  1,  2,  3,  4, 10, 10, 10, 10, 10])

In [58]:
b = np.arange(5)
b

array([0, 1, 2, 3, 4])

In [59]:
a[5:] = b[::-1]
a

array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])

## Multi-dimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by commas.

In [60]:
x2 = x3[0]
print(x2)

[[2 3 2 0 8]
 [9 5 4 0 8]
 [2 0 2 0 1]
 [3 8 7 3 3]]


In [61]:
x2[:2, :3] # two rows, three columns

array([[2, 3, 2],
       [9, 5, 4]])

In [62]:
x2[:3, ::2]  # first three rows, every other column

array([[2, 2, 8],
       [9, 4, 8],
       [2, 2, 1]])

Finally, subarray dimensions can even be reversed together:

In [63]:
x2[::-1, ::-1]

array([[3, 3, 7, 8, 3],
       [1, 0, 2, 0, 2],
       [8, 0, 4, 5, 9],
       [8, 0, 2, 3, 2]])

## Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

In [64]:
print(x2)

[[2 3 2 0 8]
 [9 5 4 0 8]
 [2 0 2 0 1]
 [3 8 7 3 3]]


In [65]:
print(x2[:, 0])  # first column of x2

[2 9 2 3]


In [66]:
print(x2[0, :])  # first row of x2

[2 3 2 0 8]


In [67]:
print(x2[0])  # equivalent to x2[0, :]

[2 3 2 0 8]


## Subarrays as no-copy views
One important and extremely useful thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.

In [68]:
print(x2)

[[2 3 2 0 8]
 [9 5 4 0 8]
 [2 0 2 0 1]
 [3 8 7 3 3]]


In [69]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[2 3]
 [9 5]]


Now if we modify this subarray, we'll see that the original array is changed!

In [70]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  3]
 [ 9  5]]


In [71]:
print(x2)

[[99  3  2  0  8]
 [ 9  5  4  0  8]
 [ 2  0  2  0  1]
 [ 3  8  7  3  3]]


In [72]:
# checks whether x2 and x2_sub may share the same memory block
np.may_share_memory(x2, x2_sub)

True

### Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to explicitly copy the data within an array or a subarray instead. This is most easily done with the `copy()` method:

In [74]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  3]
 [ 9  5]]


In [75]:
x2_sub_copy[0, 0] = 777
print(x2_sub_copy)

[[777   3]
 [  9   5]]


In [76]:
# checks whether x2 and x2_sub_copy may share the same memory block
np.may_share_memory(x2, x2_sub_copy)

False

In [77]:
print(x2)

[[99  3  2  0  8]
 [ 9  5  4  0  8]
 [ 2  0  2  0  1]
 [ 3  8  7  3  3]]
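Not every way of selecting elements produces a view. The sketch below contrasts a basic slice (a view) with indexing by a list of positions, often called "fancy" indexing, which always returns a new array. It also confirms the earlier point that Python list slices are copies. `np.shares_memory` gives an exact answer, unlike the conservative `np.may_share_memory`.

```python
import numpy as np

arr = np.arange(6)

view = arr[1:4]          # basic slice -> view of arr's buffer
fancy = arr[[1, 2, 3]]   # list of indices ("fancy" indexing) -> new array

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, fancy))  # False

view[0] = 100    # writes through to arr[1]
fancy[1] = -1    # leaves arr untouched
print(arr)       # [  0 100   2   3   4   5]

# Python lists behave the other way: a slice is a shallow copy.
lst = [0, 1, 2]
part = lst[:2]
part[0] = 99
print(lst)       # [0, 1, 2]
```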


# Modifying Arrays

## Reshaping Arrays

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:

In [78]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, **the size of the initial array must match the size of the reshaped array**. Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
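Two practical details follow from this rule: one dimension may be given as `-1` so NumPy infers it from the total size, and a shape whose product does not match raises a `ValueError`. A short sketch:

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size.
print(flat.reshape(3, -1).shape)   # (3, 4)
print(flat.reshape(-1, 6).shape)   # (2, 6)

# 5 * 3 != 12, so this shape is rejected.
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)

# Reshaping a contiguous array returns a view of the same data.
grid = flat.reshape(3, 4)
print(np.shares_memory(flat, grid))  # True
```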

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or more easily by using `np.newaxis` (an alias for `None`) within a slice operation:

In [79]:
x = np.array([1, 2, 3])
print(x.shape)

# row vector via reshape
x = x.reshape((1, 3))
print(x)
print(x.shape)

(3,)
[[1 2 3]]
(1, 3)


In [80]:
a = np.array([[1,2,3],
              [2,3,4]])

a = a.flatten()
print(a)
a = a.reshape((3, 2))
print(a)

[1 2 3 2 3 4]
[[1 2]
 [3 2]
 [3 4]]


In [None]:
x = np.array([1, 2, 3])

# row vector via newaxis
x = x[np.newaxis, :]
print(x.shape)

(1, 3)


In [None]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [None]:
x = np.array([1, 2, 3])

# column vector via newaxis
x[:, np.newaxis].shape

(3, 1)

## Resizing
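The size of an array can be changed in place with the `ndarray.resize` method; any new elements are filled with zeros. By default the method raises a `ValueError` if other arrays reference the same data.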

In [None]:
a = np.arange(4)
print(a)
a.resize((8,))
print(a)

[0 1 2 3]
[0 1 2 3 0 0 0 0]


## Concatenating Arrays
Concatenation, or joining of two arrays in **NumPy**, is primarily accomplished using the routines `np.concatenate`, `np.vstack`, and `np.hstack`.

`np.concatenate` takes a tuple or list of arrays as its first argument, as we can see here:

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [None]:
# concatenate more than two arrays
z = [99, 88, 77]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 88 77]


It can be also used for 2-D arrays:

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
# concatenate along the first axis
# if the axis argument is not set, it defaults to axis=0
np.concatenate([grid, grid], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [None]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

Similarly, `np.dstack` will stack arrays along the third axis.


## Splitting Arrays
The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points:

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [None]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [None]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, `np.dsplit` will split arrays along the third axis.
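To see the third-axis variants in action, this sketch stacks two small 2-D arrays with `np.dstack` and splits the result back apart with `np.dsplit`. It also shows `np.array_split`, which, unlike `np.split` with an integer argument, accepts a number of sections that does not divide the array evenly.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# dstack places the 2-D arrays along a new third axis.
stacked = np.dstack([a, b])
print(stacked.shape)          # (2, 2, 2)

# dsplit undoes it: two pieces of shape (2, 2, 1).
first, second = np.dsplit(stacked, 2)
print(first.shape)            # (2, 2, 1)
print(np.array_equal(first[:, :, 0], a))  # True

# np.split(np.arange(7), 3) would raise ValueError; array_split allows uneven pieces.
pieces = np.array_split(np.arange(7), 3)
print([len(p) for p in pieces])  # [3, 2, 2]
```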

# Computation on NumPy Arrays: Universal Functions
The reason that **NumPy** is so important in the Python data science world is that it provides an easy and flexible interface to optimized computation with arrays of data.

Computation on NumPy arrays can be very fast, or it can be very slow. The key to making it fast is to use *vectorized operations*, generally implemented through NumPy's universal functions (**ufuncs**).

### The Slowness of Loops
The slowness of Python generally manifests itself in situations where many small operations are being repeated – for instance looping over arrays to operate on each element. For example, imagine we have an array of values and we'd like to compute the reciprocal of each. A straightforward approach might look like this:

In [None]:
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

array([0.33333333, 0.125     , 0.16666667, 0.5       , 0.125     ])

In [None]:
big_array = np.random.randint(1, 100, size=1000000)
%time compute_reciprocals(big_array)

CPU times: user 2.32 s, sys: 0 ns, total: 2.32 s
Wall time: 2.32 s


array([0.01315789, 0.02222222, 0.03703704, ..., 0.01162791, 0.01851852,
       0.01388889])

Each time the reciprocal is computed, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.

As the timing shows, computing the reciprocals of a million elements this way takes seconds. In compiled code, the element type would be known before the code runs, and the result could be computed much more efficiently. For many types of operations, NumPy provides a convenient interface into just this kind of statically typed, compiled routine. This is known as a **vectorized** operation.

This can be accomplished by simply performing an operation on the array, which will then be applied to each element. This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

Comparing the output below with that of the loop above shows that the results are the same, while the run time drops from seconds to milliseconds:

In [None]:
%time (1.0 / big_array)

CPU times: user 3.13 ms, sys: 4.93 ms, total: 8.06 ms
Wall time: 11.8 ms


array([0.01315789, 0.02222222, 0.03703704, ..., 0.01162791, 0.01851852,
       0.01388889])
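Rather than comparing printed output by eye, the equivalence can be checked directly. This sketch reuses the loop-based `compute_reciprocals` from above on a deterministic input (`np.arange(1, 1001)`, chosen for illustration) and compares it with the vectorized expression using `np.array_equal`.

```python
import numpy as np

def compute_reciprocals(values):
    # Same element-by-element loop as above.
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.arange(1, 1001)
looped = compute_reciprocals(values)
vectorized = 1.0 / values

# Both perform the same floating-point division per element, so results match exactly.
print(np.array_equal(looped, vectorized))  # True
```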

Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. Ufuncs are extremely flexible: above we saw an operation between a scalar and an array, but we can also operate between two arrays:

In [None]:
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

Ufunc operations are not limited to one-dimensional arrays; they can act on multi-dimensional arrays as well:

In [None]:
x = np.arange(9).reshape((3, 3))
print(x)
2 ** x

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

# Array arithmetic
NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators. The standard addition, subtraction, multiplication, and division can all be used:

In [None]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


There is also a unary ufunc for negation, a `**` operator for exponentiation, and a `%` operator for modulus:

In [None]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In [None]:
# when these operators are combined,
# the standard order of operations is respected:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])

Each of these arithmetic operations is simply a convenient wrapper around a specific function built into NumPy; for example, the `+` operator is a wrapper for the `add` function.
The following table lists the arithmetic operators implemented in NumPy:

| Operator	    | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
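The correspondence in the table can be verified directly: each operator expression gives the same result as calling the listed ufunc. A minimal check:

```python
import numpy as np

x = np.arange(4)

# Each operator dispatches to the ufunc listed in the table.
pairs = [
    (x + 2, np.add(x, 2)),
    (x - 2, np.subtract(x, 2)),
    (-x, np.negative(x)),
    (x * 3, np.multiply(x, 3)),
    (x / 2, np.divide(x, 2)),
    (x // 2, np.floor_divide(x, 2)),
    (x ** 2, np.power(x, 2)),
    (x % 2, np.mod(x, 2)),
]
print(all(np.array_equal(op, fn) for op, fn in pairs))  # True
```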


### Absolute value
Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function:

In [None]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

In [None]:
print(np.abs(x))  # np.abs is an alias of np.absolute
np.absolute(x)

[2 1 0 1 2]

array([2, 1, 0, 1, 2])

### Trigonometric functions
NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions. We'll start by defining an array of angles:

In [None]:
theta = np.linspace(0, np.pi, 3)
print(theta)

[0.         1.57079633 3.14159265]


In [None]:
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta      =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
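Notice that `sin(pi)` comes out as about `1.2e-16` rather than exactly zero, because `np.pi` is itself a rounded value. Comparisons on such results should use a tolerance; this sketch contrasts `==` with `np.isclose` and `np.allclose` (default tolerances `rtol=1e-05`, `atol=1e-08`).

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
s = np.sin(theta)

# Exact equality fails because of floating-point rounding...
print(s[-1] == 0.0)                        # False
# ...while isclose/allclose compare within a tolerance.
print(np.isclose(s[-1], 0.0))              # True
print(np.allclose(s, [0.0, 1.0, 0.0]))     # True
```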


In [None]:
x = [-1, 0, 1]
print("x         = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

x         =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]


### Exponents and logarithms
Another common type of operation available as a NumPy ufunc is exponentiation:

In [None]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]


The inverse of the exponentials, the logarithms, are also available. The basic `np.log` gives the natural logarithm; if you prefer to compute the base-2 logarithm or the base-10 logarithm, these are available as well:

In [None]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


### Specifying output

For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored. Rather than creating a temporary array, this can be used to write computation results directly to the memory location where you'd like them to be. For all ufuncs, this can be done using the out argument of the function:

In [None]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

[ 0. 10. 20. 30. 40.]


In [None]:
y = np.zeros(10)
y[::2] = np.power(3, x)
print(y)

[ 1.  0.  3.  0.  9.  0. 27.  0. 81.  0.]


The cell above, `y[::2] = np.power(3, x)`, first creates a temporary array to hold the results of `np.power(3, x)` and then copies those values into `y`. Passing the strided view of `y` as the `out` argument writes the results directly and avoids the temporary array:

In [None]:
y = np.zeros(10)
np.power(3, x, out=y[::2])
print(y)

[ 1.  0.  3.  0.  9.  0. 27.  0. 81.  0.]


# Aggregations: Min, Max, and Everything In Between

Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question. Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the "typical" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).

NumPy has fast built-in aggregation functions for working on arrays; we'll discuss and demonstrate some of them here.

## Sum

Let us consider computing the sum of all values in an array. Python itself can do this using the built-in sum function:

In [None]:
L = np.random.random(100)
print("Python sum : ", sum(L))
print("NumPy sum : ", np.sum(L))

Python sum :  44.81684233959629
NumPy sum :  44.816842339596285


However, because it executes the operation in compiled code, NumPy's version of the operation is computed much more quickly:

In [None]:
big_array = np.random.rand(1000000)
%time sum(big_array)
%time np.sum(big_array)

CPU times: user 176 ms, sys: 0 ns, total: 176 ms
Wall time: 177 ms
CPU times: user 1.7 ms, sys: 0 ns, total: 1.7 ms
Wall time: 969 µs


499764.5201551815

Be careful, though: the ``sum`` function and the ``np.sum`` function are not identical, which can sometimes lead to confusion!
In particular, their optional arguments have different meanings, and ``np.sum`` is aware of multiple array dimensions, as we will see below.
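A small 2-D example makes the difference concrete. Python's built-in `sum` iterates over the first axis (adding whole rows) and treats its second argument as a starting value, whereas `np.sum` sums all elements by default and treats its second positional argument as `axis`.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]

# Built-in sum adds the rows together.
print(sum(M))          # [3 5 7]
print(np.sum(M))       # 15, the sum of all elements

# The second positional argument means different things.
print(sum(M, 10))      # start value 10 is added: [13 15 17]
print(np.sum(M, 1))    # axis=1, one sum per row: [ 3 12]
```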

## Minimum and Maximum
Similarly, Python has built-in `min` and `max` functions, used to find the minimum value and maximum value of any given array:

In [None]:
min(big_array), max(big_array)

(4.844385737001744e-08, 0.9999995203964611)

In [None]:
np.min(big_array), np.max(big_array)

(4.844385737001744e-08, 0.9999995203964611)

In [None]:
%timeit min(big_array)
%timeit np.min(big_array)

10 loops, best of 3: 101 ms per loop
1000 loops, best of 3: 409 µs per loop


For ``min``, ``max``, ``sum``, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:

In [None]:
print(big_array.min(), big_array.max(), big_array.sum())

4.844385737001744e-08 0.9999995203964611 499764.5201551815


Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!

## Multi-dimensional aggregates
One common type of aggregation operation is an aggregate along a row or column. Say you have some data stored in a two-dimensional array:

In [None]:
M = np.random.random((3, 4))
print(M)

[[0.5572068  0.56464233 0.42943773 0.32946457]
 [0.63432366 0.60177957 0.82191961 0.54604899]
 [0.53486178 0.88165557 0.16330887 0.08572345]]


By default, each NumPy aggregation function will return the aggregate over the entire array:

In [None]:
M.sum() # np.sum(M)

6.150372922997595

Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. For example, we can find the minimum/maximum value within each column by specifying **axis=0** and minimum/maximum value within each row by specifying **axis=1**:

In [None]:
M.shape

(3, 4)

In [None]:
print(M.min(axis=0))
print(M.min(axis=1))

[0.53486178 0.56464233 0.16330887 0.08572345]
[0.32946457 0.54604899 0.08572345]


In [None]:
print(M.mean(axis=0))
print(M.sum(axis=1))

[0.57546408 0.68269249 0.4715554  0.32041234]
[1.88075143 2.60407182 1.66554967]


NumPy provides many other aggregation functions, but we won't discuss them in detail here. Additionally, most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point NaN (acronym for Not a Number) value.

In [None]:
vals = np.array([1, np.nan, 3, 4])
vals.dtype

dtype('float64')

In [None]:
1 + np.nan

nan

In [None]:
0 *  np.nan

nan

In [None]:
vals.sum(), vals.min(), vals.max()

(nan, nan, nan)

In [None]:
np.nansum(vals), np.nanmin(vals), np.nanmax(vals)

(8.0, 1.0, 4.0)

Keep in mind that NaN is specifically a floating-point value; there is no equivalent NaN value for integers, strings, or other types.

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
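A few of the less obvious entries from the table, applied to a small hand-picked array: `argmin`/`argmax` return positions (the first occurrence on ties), `percentile` uses linear interpolation by default, and the NaN-safe variants skip missing values.

```python
import numpy as np

d = np.array([3, 1, 4, 1, 5, 9, 2, 6])

print(np.argmin(d), np.argmax(d))    # 1 5 (first occurrence wins on ties)
print(np.median(d))                  # 3.5
print(np.percentile(d, [25, 50]))    # 25th and 50th percentiles: 1.75 and 3.5
print(np.any(d > 8), np.all(d > 0))  # True True

# NaN-safe versions ignore missing values.
with_nan = np.array([3.0, np.nan, 1.0])
print(np.nanargmin(with_nan), np.nanmedian(with_nan))  # 2 2.0
```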
