# Introduction to NumPy

NumPy is a first-rate library for numerical programming.

- Widely used in academia, finance and industry.

- Mature, fast, stable and under continuous development.


NumPy arrays are faster and more compact than Python lists: they use much less memory to store the same data and are convenient to use. NumPy also provides a mechanism for specifying the data type of the elements, which allows the code to be optimized even further.

The most important structure that NumPy defines is an array data type formally called a `numpy.ndarray`.

NumPy arrays power a large proportion of the scientific Python ecosystem.

Let’s first import the library.

In [1]:
import numpy as np

To create a NumPy array containing only zeros we use `np.zeros`:

In [3]:
a = np.zeros(3)
a

array([0., 0., 0.])

In [4]:
type(a)

numpy.ndarray

It is also possible to create an array filled with ones:

In [5]:
np.ones(3)

array([1., 1., 1.])

You can create an array with a range of elements:

In [7]:
np.arange(4)

array([0, 1, 2, 3])

You can also create an array of evenly spaced values with `np.linspace`. To do this, specify the first number, the last number, and how many elements you need.

In [10]:
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [15]:
a = np.linspace(0, 10, 5)
a.size

5
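
The two range constructors differ in how they treat the end point, which is easiest to see side by side. The following is a minimal sketch; the variable names are only illustrative.

```python
import numpy as np

# np.arange takes (start, stop, step) and excludes the stop value.
stepped = np.arange(0, 10, 2)

# np.linspace takes (start, stop, num) and includes the stop value by default.
spaced = np.linspace(0, 10, 5)

# Passing endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(0, 10, 5, endpoint=False)

print(stepped)    # [0 2 4 6 8]
print(spaced)
print(half_open)
```

As a rule of thumb, `np.arange` is convenient when the step matters and `np.linspace` when the number of points matters.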

NumPy arrays are somewhat like native Python lists, except that

- Data must be homogeneous (all elements of the same type).

- These types must be one of the data types (dtypes) provided by NumPy.

The most important of these dtypes are:

- `float64`: 64-bit floating-point number

- `int64`: 64-bit integer

- `bool`: True or False, stored in 8 bits

There are also dtypes to represent complex numbers, unsigned integers, etc.

On modern machines, the default dtype for arrays is `float64`:

In [16]:
a = np.zeros(3)
type(a[0])

numpy.float64

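If we want to use integers, we can specify the dtype as follows. The width of the default integer type depends on the platform and NumPy version, so the output may be `numpy.int32` or `numpy.int64`: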

In [17]:
a = np.zeros(3, dtype=int)
type(a[0])

numpy.int32
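
To avoid depending on the platform default, an explicit NumPy dtype can be requested. The sketch below also shows how the dtype determines memory use; the names `small`, `wide` and `flags` are illustrative.

```python
import numpy as np

# Request explicit widths instead of relying on the platform default.
small = np.zeros(1000, dtype=np.int8)
wide = np.zeros(1000, dtype=np.int64)
flags = np.zeros(1000, dtype=bool)

# itemsize is the number of bytes per element;
# nbytes is the size of the whole data buffer.
for arr in (small, wide, flags):
    print(arr.dtype, arr.itemsize, arr.nbytes)
```

The `int64` array needs eight times the memory of the `int8` array for the same number of elements.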

NumPy arrays have the following attributes:

In [31]:
a = np.linspace(0, 10, 6)

print("ndim: ", a.ndim) # number of dimensions
print("shape: ", a.shape) # size of the array along each dimension
print("size: ", a.size) # number of elements
print("dtype: ", a.dtype) # type of elements

ndim:  1
shape:  (6,)
size:  6
dtype:  float64


You can change the dimensions of an array by modifying its shape:

In [32]:
# Before
a

array([ 0.,  2.,  4.,  6.,  8., 10.])

In [33]:
# After
a.shape = (2,3)
a

array([[ 0.,  2.,  4.],
       [ 6.,  8., 10.]])

As you can see, ndarrays are **mutable**, and our `a` array is now different than before!
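
If the original array should keep its shape, the `reshape` method is an alternative to assigning to `shape`. The following is a small sketch of that behaviour, using the same values as above.

```python
import numpy as np

a = np.linspace(0, 10, 6)

# reshape returns a new array object and leaves the shape of `a` untouched.
b = a.reshape(2, 3)
print(a.shape)  # (6,)
print(b.shape)  # (2, 3)

# For a contiguous array like this one, the result is a view: both names
# refer to the same underlying data, so a write through `b` shows up in `a`.
b[0, 0] = -1.0
print(a[0])     # -1.0
```

Call `.copy()` on the result when an independent array is needed.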

In addition, NumPy arrays can be created from Python lists, tuples, etc. using `np.array`:

In [55]:
print("From list: ", np.array([10, 20]) ) 
print("From tuple: ", np.array((10, 20)) ) 

From list:  [10 20]
From tuple:  [10 20]


In [56]:
z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
z

array([[1, 2],
       [3, 4]])

See also `np.asarray`, which performs a similar function, but does not make a distinct copy of data already in a NumPy array.

In [57]:
na = np.linspace(10, 20, 2)
na is np.asarray(na)   # Does not copy NumPy arrays

True

In [58]:
na is np.array(na)     # Does make a new copy --- perhaps unnecessarily

False
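
One place where `np.asarray` is handy is at the top of a function that should accept lists, tuples and arrays alike. The helper `normalize` below is hypothetical and only serves to illustrate the pattern.

```python
import numpy as np

def normalize(values):
    """Scale values so that they sum to 1 (hypothetical helper)."""
    # asarray converts lists and tuples, and passes an existing ndarray
    # through without copying when no dtype conversion is needed.
    arr = np.asarray(values, dtype=float)
    return arr / arr.sum()

from_list = normalize([1, 1, 2])
from_array = normalize(np.array([2.0, 2.0]))
print(from_list)
print(from_array)
```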

## Array indexing

For a flat array, indexing is the same as Python sequences:

In [59]:
a = np.linspace(0, 10, 6)
a

array([ 0.,  2.,  4.,  6.,  8., 10.])

In [60]:
type(a)

numpy.ndarray

In [61]:
a[0]

0.0

In [62]:
a[0:2]  # Two elements, starting at element 0

array([0., 2.])

In [63]:
a[-1]

10.0

For 2D arrays the index syntax is as follows:

In [64]:
z = np.array([[1, 2], [3, 4]])
z

array([[1, 2],
       [3, 4]])

In [68]:
print(z[0, 0])
print(z[0, 1])
print(z[1, 0])
print(z[1, 1])

1
2
3
4


Note that indices are still zero-based, to maintain compatibility with Python sequences.

Columns and rows can be extracted as follows:

In [72]:
print("Row 1:    ", z[0, :])
print("Column 2: ", z[: , 1])

Row 1:     [1 2]
Column 2:  [2 4]


In [None]:
# Exercise: adapt the code above to print the second row and the first column



NumPy arrays of integers can also be used to extract elements:

In [73]:
a = np.linspace(0, 10, 6)
a

array([ 0.,  2.,  4.,  6.,  8., 10.])

In [74]:
indices = np.array((0, 2, 3))
a[indices]

array([0., 4., 6.])
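
Slices and integer-array indexing differ in one respect worth knowing: a slice is a view onto the original data, while indexing with an integer array produces a new array. A short sketch with illustrative names:

```python
import numpy as np

a = np.linspace(0, 10, 6)   # 0, 2, 4, 6, 8, 10

# A slice is a view: it shares memory with the original array.
view = a[0:3]
# Indexing with an integer array produces a new, independent array.
picked = a[np.array([0, 2, 3])]

view[0] = 100.0     # also changes a[0]
picked[1] = -1.0    # does not touch `a`

print(a[0])   # 100.0
print(a[2])   # 4.0
```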

If you want to select values from your array that fulfill certain conditions, it’s straightforward with NumPy.

In [77]:
a = np.array([[1 , 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# You can easily print all of the values in the array that are less than 5.
a[a < 5]

array([1, 2, 3, 4])

In [81]:
# You can also select, for example, numbers that are equal to or greater than 5, and use that condition to index an array.

five_up = (a >= 5)
print(a[five_up])

[ 5  6  7  8  9 10 11 12]


In the first step, a Boolean array is created:

In [82]:
five_up

array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

And then it is used as a mask to select required values.

In the same way, you can combine several conditions using the `&` (and) and `|` (or) operators:

In [83]:
a[(a > 2) & (a < 11)]

array([ 3,  4,  5,  6,  7,  8,  9, 10])
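
The same pattern works for "or" conditions and for negated masks. The sketch below reuses the array from above; each comparison is wrapped in parentheses because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# `|` keeps elements for which at least one condition holds.
outer = a[(a < 3) | (a > 10)]
# `~` negates a Boolean mask.
not_even = a[~(a % 2 == 0)]

print(outer)     # [ 1  2 11 12]
print(not_even)  # [ 1  3  5  7  9 11]
```

Python's `and` and `or` keywords do not work elementwise on arrays, which is why the operator forms are used here.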

An aside: all elements of an array can be set equal to one number using slice notation:

In [84]:
a = np.empty(3)  # uninitialized memory: the values shown below are arbitrary
a

array([0., 4., 6.])

In [85]:
a[:2] = 5
a

array([5., 5., 6.])

In [87]:
a[:] = -9
a

array([-9., -9., -9.])
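
When the goal is simply an array filled with one value, `np.full` does the allocation and the fill in a single step. Assignment also works through a Boolean mask, as this small sketch shows.

```python
import numpy as np

# np.full allocates and fills at once, so no uninitialized values appear.
a = np.full(3, -9.0)
print(a)

# Assign to only those elements selected by a Boolean mask.
b = np.array([-2.0, 3.0, -1.0, 5.0])
b[b < 0] = 0.0   # replace the negative entries with zero
print(b)
```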

## Array methods

Arrays have useful methods, all of which are carefully optimized:

In [98]:
a = np.array((3, 4, 2, 1))

print("a: ",a)
a.sort()
print("a sorted: ", a )
print("Sum: ", a.sum() )
print("Mean: ", a.mean() )
print("Max: ", a.max() )
print("Index of the maximal element: ", a.argmax() )
print("Cumulative sum: ", a.cumsum() )
print("Cumulative product: ", a.cumprod() )
print("Variance: ", a.var() )
print("Standard deviation: ", a.std() )

a:  [3 4 2 1]
a sorted:  [1 2 3 4]
Sum:  10
Mean:  2.5
Max:  4
Index of the maximal element:  3
Cumulative sum:  [ 1  3  6 10]
Cumulative product:  [ 1  2  6 24]
Variance:  1.25
Standard deviation:  1.118033988749895


Many of the methods discussed above have equivalent functions in the NumPy namespace:

In [99]:
print(np.sum(a))
print(np.mean(a))

10
2.5
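
Method and function forms are not always interchangeable. Sorting is one case where they behave differently: the method sorts in place, while the function returns a sorted copy. A minimal sketch:

```python
import numpy as np

a = np.array((3, 4, 2, 1))

sorted_copy = np.sort(a)      # returns a new, sorted array
unchanged = a.tolist()        # `a` still holds the original order here
print(unchanged)              # [3, 4, 2, 1]
print(sorted_copy)            # [1 2 3 4]

result = a.sort()             # sorts `a` in place and returns None
print(a)                      # [1 2 3 4]
```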


As an example, `cumsum` turns a sequence of random increments into a random walk:

In [108]:
np.random.seed(1)
stock = np.random.randn(300)
stock[:11]

array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862,  0.86540763,
       -2.3015387 ,  1.74481176, -0.7612069 ,  0.3190391 , -0.24937038,
        1.46210794])

In [111]:
rr = stock.cumsum()
rr[-10:]

array([22.1144306 , 22.05760612, 22.54994267, 21.86926453, 21.7847565 ,
       21.48739462, 21.90469663, 22.68946728, 21.73404201, 22.31995245])
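
Recent NumPy releases also provide a generator-based interface, `np.random.default_rng`, as an alternative to the global `np.random.seed` state. The sketch below builds the same kind of random walk with it; the seed is arbitrary, and the numbers drawn will differ from the output shown above because a different generator is used.

```python
import numpy as np

rng = np.random.default_rng(1)      # seeded generator (seed chosen arbitrarily)
steps = rng.standard_normal(300)    # 300 independent standard normal draws
walk = steps.cumsum()               # running total of the increments

# The last point of the walk equals the sum of all increments.
print(np.isclose(walk[-1], steps.sum()))  # True
```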

## Arithmetic Operations

The operators `+`, `-`, `*`, `/` and `**` all act *elementwise*:

In [116]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
print(a + b)
print(a * b)
print(a + 10)
print(a * 10)

[ 6  8 10 12]
[ 5 12 21 32]
[11 12 13 14]
[10 20 30 40]


Two-dimensional arrays follow the same general rules:

In [118]:
A = np.ones((2, 2))
B = np.ones((2, 2))
print(A)
print(B)

[[1. 1.]
 [1. 1.]]
[[1. 1.]
 [1. 1.]]


In [119]:
print(A+B)
print(A+10)
print(A*B) # In particular, A * B is not the matrix product, it is an element-wise product.

[[2. 2.]
 [2. 2.]]
[[11. 11.]
 [11. 11.]]
[[1. 1.]
 [1. 1.]]
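
For the matrix product itself, the `@` operator can be used. With arrays of ones the two products are hard to tell apart, so the following sketch uses distinct values to make the difference visible.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B   # multiplies matching entries
matrix = A @ B        # rows of A times columns of B

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matrix)
# [[19 22]
#  [43 50]]
```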


## Expressing Conditional Logic as Array Operations

The `numpy.where` function is a vectorized version of the ternary expression `x if condition else y`.

A typical use of `where` in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with -2. This is possible to do with `numpy.where`:

In [122]:
arr = np.random.standard_normal((4, 4))

arr

array([[ 2.06578332, -1.47115693, -0.8301719 , -0.8805776 ],
       [-0.27909772,  1.62284909,  0.01335268, -0.6946936 ],
       [ 0.6218035 , -0.59980453,  1.12341216,  0.30526704],
       [ 1.3887794 , -0.66134424,  3.03085711,  0.82458463]])

In [123]:
arr > 0

array([[ True, False, False, False],
       [False,  True,  True, False],
       [ True, False,  True,  True],
       [ True, False,  True,  True]])

In [124]:
np.where(arr > 0, 2, -2)

array([[ 2, -2, -2, -2],
       [-2,  2,  2, -2],
       [ 2, -2,  2,  2],
       [ 2, -2,  2,  2]])

You can combine scalars and arrays when using `numpy.where`. For example, we can replace all positive values in `arr` with the constant 2 and leave the other values unchanged, like so:

In [125]:
np.where(arr > 0, 2, arr) # set only positive values to 2

array([[ 2.        , -1.47115693, -0.8301719 , -0.8805776 ],
       [-0.27909772,  2.        ,  2.        , -0.6946936 ],
       [ 2.        , -0.59980453,  2.        ,  2.        ],
       [ 2.        , -0.66134424,  2.        ,  2.        ]])
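
The correspondence with the ternary expression can be made explicit by writing both versions side by side. In the sketch below, `xarr`, `yarr` and `cond` are made-up example data.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure-Python version: one ternary expression per element.
by_loop = [x if c else y for x, c, y in zip(xarr, cond, yarr)]

# Vectorized version: take from xarr where cond is True, else from yarr.
by_where = np.where(cond, xarr, yarr)

print(by_where)  # [1.1 2.2 1.3 1.4 2.5]
```

Both give the same values, but the `np.where` version avoids the Python-level loop and also works on multidimensional arrays.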

NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is `numpy.unique`, which returns the sorted unique values in an array:

In [127]:
names = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

array([1, 2, 3, 4])

Other set operations:

- `intersect1d(x, y)` - Compute the sorted, common elements in `x` and `y`

- `union1d(x, y)` - Compute the sorted union of elements

- `in1d(x, y)` - Compute a Boolean array indicating whether each element of `x` is contained in `y` (deprecated in NumPy 2.0 in favor of `isin(x, y)`)

- `setdiff1d(x, y)` - Set difference, elements in `x` that are not in `y`

- `setxor1d(x, y)` - Set symmetric differences; elements that are in either of the arrays, but not both
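
The set operations listed above can be tried on two small arrays. This is a minimal sketch with made-up data; it uses `np.isin` for the membership test, which is the form recommended in current NumPy releases.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([3, 4, 5, 6])

common = np.intersect1d(x, y)    # sorted elements present in both
combined = np.union1d(x, y)      # sorted union
only_x = np.setdiff1d(x, y)      # in x but not in y
exclusive = np.setxor1d(x, y)    # in exactly one of the two arrays
member = np.isin(x, y)           # for each element of x: is it in y?

print(common)     # [3 4]
print(combined)   # [1 2 3 4 5 6]
print(only_x)     # [1 2]
print(exclusive)  # [1 2 5 6]
print(member)     # [False False  True  True]
```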

### Adding and removing elements

You can concatenate arrays with `np.concatenate()` (similar to `c()` in R):

In [131]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

np.concatenate((a, b)) # the arrays are passed together as one tuple, hence the double parentheses

array([1, 2, 3, 4, 5, 6, 7, 8])

In [137]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7,8]])

np.concatenate((x, y), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [138]:
np.concatenate((x, y), axis=1)

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

To remove elements from an array, use indexing to select the elements that you want to keep.
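
A few ways of doing this are sketched below with made-up data. `np.delete` is included as an alternative that removes elements by index; like indexing, it returns a new array and leaves the original unchanged.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

kept_by_slice = a[2:]                   # drop the first two elements
kept_by_mask = a[a != 3]                # drop every element equal to 3
kept_by_delete = np.delete(a, [0, 1])   # new array without indices 0 and 1

print(kept_by_slice)   # [3 4 5 6]
print(kept_by_mask)    # [1 2 4 5 6]
print(kept_by_delete)  # [3 4 5 6]
```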