# Introduction to numpy

To install numpy with conda, activate an environment and type:

```
conda install -c anaconda numpy
```


The numpy package is a fundamental package for scientific computing in Python. It allows us to efficiently work with multi-dimensional, gridded data.

The basis of the package is the _ndarray_ object, a data structure that holds our multi-dimensional data. An _ndarray_ has a few requirements:

* It has a fixed size at creation and doesn't grow dynamically like a Python list.
* All elements in the array must be of the same type.
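
As a small illustration of these two requirements (a sketch only; the variable names are made up for this example), mixing integers and floats yields one common type, and appending produces a new array rather than growing the old one:

```python
import numpy as np

# Mixing ints and floats: every element is stored as the same (float) type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Arrays have a fixed size: np.append returns a NEW, larger array
fixed = np.array([1, 2, 3])
grown = np.append(fixed, 4)
print(fixed.size, grown.size)  # 3 4
```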

Numpy arrays are the basis of MANY other Python packages for statistical and scientific computing, and they are fast because their operations are optimized with pre-compiled C code.



See [What is NumPy?](https://numpy.org/doc/stable/user/whatisnumpy.html) for more information!


This material is based on the material you'll find in the [Numpy Quickstart](https://numpy.org/doc/stable/user/quickstart.html).

In [1]:
# It's common practice to alias an import so you don't have to
# write out the full package name each time you want to use one of the tools
import numpy as np

In [2]:
# Create an array whose contents are indices, reshape it to be 3 rows, 5 columns
a = np.arange(15).reshape(3, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [3]:
# some helpful methods to describe the array
# Note: the following examples format the output using Python f-strings
print(f'Shape: {a.shape}')
print(f'Number of dimensions: {a.ndim}')
print(f'Data type: {a.dtype.name}')
print(f'Size: {a.size}')
print(f'Array type: {type(a)}')

Shape: (3, 5)
Number of dimensions: 2
Data type: int64
Size: 15
Array type: <class 'numpy.ndarray'>


In [4]:
# Create an array from a list.

my_list = [3, 5, 7]
b = np.array(my_list)
b

array([3, 5, 7])

In [5]:
# Set the data type when the array is created.
# Common data types are documented here: https://numpy.org/doc/stable/user/basics.types.html

np.zeros((3, 4), dtype=np.int16)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int16)

In [6]:
np.ones((3, 4), dtype=np.double)

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [7]:
# Create a range -- but Numpy accepts float increments, too!
np.arange(1, 11)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [8]:
np.arange(1, 11, 0.2)

array([ 1. ,  1.2,  1.4,  1.6,  1.8,  2. ,  2.2,  2.4,  2.6,  2.8,  3. ,
        3.2,  3.4,  3.6,  3.8,  4. ,  4.2,  4.4,  4.6,  4.8,  5. ,  5.2,
        5.4,  5.6,  5.8,  6. ,  6.2,  6.4,  6.6,  6.8,  7. ,  7.2,  7.4,
        7.6,  7.8,  8. ,  8.2,  8.4,  8.6,  8.8,  9. ,  9.2,  9.4,  9.6,
        9.8, 10. , 10.2, 10.4, 10.6, 10.8])

In [9]:
# To create an array with a regularly-spaced, specified number of points in a range
# Ex: Increase the number of points from 20 to 2000 to see how Numpy prints a large array
np.linspace(0, 11, 20)

array([ 0.        ,  0.57894737,  1.15789474,  1.73684211,  2.31578947,
        2.89473684,  3.47368421,  4.05263158,  4.63157895,  5.21052632,
        5.78947368,  6.36842105,  6.94736842,  7.52631579,  8.10526316,
        8.68421053,  9.26315789,  9.84210526, 10.42105263, 11.        ])

In [10]:
# Numpy gotcha: arange excludes the stop value (like Python's range), but linspace includes it by default!!!
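
To make the gotcha concrete, here is a minimal sketch comparing the endpoints of the two functions. The variable names are just for illustration; `endpoint=False` is the `linspace` argument that drops the stop value.

```python
import numpy as np

# arange stops *before* the stop value, just like range
stepped = np.arange(0, 10, 2)
print(stepped)       # [0 2 4 6 8]

# linspace includes both endpoints by default
spaced = np.linspace(0, 10, 6)
print(spaced)        # [ 0.  2.  4.  6.  8. 10.]

# endpoint=False makes linspace leave out the stop value, too
open_ended = np.linspace(0, 10, 5, endpoint=False)
print(open_ended)    # [0. 2. 4. 6. 8.]
```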

## Numpy Operations

All the standard math operations happen _elementwise_.

Arrays generally need to be the same shape for the elementwise math operations to be successful, although a single value (a scalar) can be combined with an array of any shape.

The operation returns a new array with the result of the operation.
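
A short sketch of these rules, using made-up arrays: matching shapes and single values work, while a mismatched length raises a `ValueError`.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
same_shape = np.array([1, 2, 3, 4])
wrong_shape = np.array([1, 2, 3])

print(a + same_shape)   # [21 32 43 54]
print(a + 1)            # [21 31 41 51]

# Arrays of length 4 and length 3 cannot be combined elementwise
mismatch_failed = False
try:
    a + wrong_shape
except ValueError as err:
    mismatch_failed = True
    print(f'Shapes do not match: {err}')
```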

In [11]:
a = np.arange(20, 60, 10)
b = np.ones((4,)) * 3
print(a, b)

[20 30 40 50] [3. 3. 3. 3.]


In [12]:
a - b 

array([17., 27., 37., 47.])

In [13]:
a - 17

array([ 3, 13, 23, 33])

In [14]:
a * b

array([ 60.,  90., 120., 150.])

In [15]:
# math functions are built-in...no need to import math
10 * np.sin(a)

array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
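
One thing worth keeping in mind with the trigonometric functions: `np.sin` treats its input as radians, so the values 20, 30, 40, 50 above are radians, not degrees. A minimal sketch of converting from degrees first (the array names are illustrative):

```python
import numpy as np

angles_deg = np.array([0, 30, 90])

# Convert degrees to radians before calling np.sin
angles_rad = np.deg2rad(angles_deg)
sines = np.sin(angles_rad)

# Approximately 0, 0.5 and 1
print(np.round(sines, 3))
```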

In [16]:
# Numpy also has some constants: https://numpy.org/doc/stable/reference/constants.html
# here are a few popular ones
print(np.nan, np.inf, np.pi)

nan inf 3.141592653589793


In [17]:
# Comparison operators work, too
print(a)
a < 35

[20 30 40 50]


array([ True,  True, False, False])

## Matrix operations

You need some additional operators to do matrix operations.

In [18]:
# Matrix product (multiplication) -- the shapes must be compatible, as in the math formulation:
# the number of columns in A must equal the number of rows in B

A = np.ones((2,2))
B = np.arange(4).reshape((2,2))
print(A)
print(B)
A @ B

[[1. 1.]
 [1. 1.]]
[[0 1]
 [2 3]]


array([[2., 4.],
       [2., 4.]])

In [19]:
# Take the dot product (for 2-D arrays this is the same as the matrix product)
A.dot(B)

array([[2., 4.],
       [2., 4.]])
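
It is easy to confuse `*` with `@`. The following sketch, with two small made-up matrices, shows that `*` multiplies element by element while `@` performs the matrix product:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B   # multiplies matching positions
matrix_prod = A @ B   # rows of A times columns of B

print(elementwise)
# [[0 2]
#  [3 0]]
print(matrix_prod)
# [[2 1]
#  [4 3]]
```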

## Unary operators

These computations are performed on all the elements in an array, or along an axis.

In [20]:
c = np.random.rand(3, 5)
c

array([[0.04563011, 0.70216171, 0.26612824, 0.36607774, 0.45183395],
       [0.47810394, 0.74996938, 0.40723064, 0.93300445, 0.91404728],
       [0.58741236, 0.47426637, 0.4539417 , 0.246935  , 0.56429406]])

In [21]:
c.sum()

7.64103694241172

In [22]:
c.mean()

0.5094024628274479

In [23]:
c.max()

0.9330044511624948

In [24]:
c.min()

0.04563011012008655

In [25]:
# Only along one axis: axis=0 works down the rows (one result per column),
# axis=1 works across the columns (one result per row)
# The result has one fewer dimension than the original array

# To get the minimum of each column
c.min(axis=0)

array([0.04563011, 0.47426637, 0.26612824, 0.246935  , 0.45183395])

In [26]:
# The maximum of each row:
c.max(axis=1)

array([0.70216171, 0.93300445, 0.58741236])
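
Because the random values above change on every run, a deterministic sketch may make the `axis` argument easier to follow. The array `m` is made up for this example; `keepdims=True` is an optional argument that keeps the reduced axis with length 1.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

col_sums = m.sum(axis=0)   # one value per column
row_sums = m.sum(axis=1)   # one value per row
print(col_sums)            # [3 5 7]
print(row_sums)            # [ 3 12]

# keepdims=True keeps the result 2-D, with shape (2, 1)
kept = m.sum(axis=1, keepdims=True)
print(kept.shape)          # (2, 1)
```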

## Built-in mathematical functions

In [27]:
B = np.arange(2, 7, 2)
# Exponential (e**x)
np.exp(B)

array([  7.3890561 ,  54.59815003, 403.42879349])

In [28]:
# Square root
np.sqrt(B)

array([1.41421356, 2.        , 2.44948974])

## Indexing, slicing, and iterating

1-D arrays are very similar to Python lists.

- Index with square brackets 
- Indexing starts at 0
- A colon is used to indicate a range like `start:stop:increment`
- Negative values count from the end of the collection

In [29]:
a = np.arange(10)**3
a

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

In [30]:
a[2]

8

In [31]:
a[::-1] # Reversed array

array([729, 512, 343, 216, 125,  64,  27,   8,   1,   0])

In [32]:
# Set values -- happens in place!
a[1:4:2] = 1000
a

array([   0, 1000,    8, 1000,   64,  125,  216,  343,  512,  729])
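
One difference from Python lists is worth a quick sketch: a basic slice of an array is a view onto the same data, so assigning through the slice changes the original, whereas a list slice is a copy. The names below are illustrative.

```python
import numpy as np

values = np.arange(5)
window = values[1:4]      # a view, not a copy
window[0] = 99
print(values)             # [ 0 99  2  3  4]

# Use .copy() when an independent array is needed
independent = values[1:4].copy()
independent[0] = -1
print(values)             # [ 0 99  2  3  4]

# A list slice is always a copy
py_list = [0, 1, 2, 3, 4]
py_slice = py_list[1:4]
py_slice[0] = 99
print(py_list)            # [0, 1, 2, 3, 4]
```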

In [33]:
# arrays are iterable
for i in a:
    print(i - 12)

-12
988
-4
988
52
113
204
331
500
717


## Multidimensional arrays

Same as 1-D arrays, but subsequent dimensions are indexed by comma-separated values.

The first item is the row, then the column.

In [34]:
def f(x, y):
    return x ** 2 + y ** 2

b = np.fromfunction(f, (6, 3), dtype=int)
b

array([[ 0,  1,  4],
       [ 1,  2,  5],
       [ 4,  5,  8],
       [ 9, 10, 13],
       [16, 17, 20],
       [25, 26, 29]])

In [35]:
b[4, 2] # Row index 4, column index 2 (the 5th row, 3rd column)

20

In [36]:
b[:, -1] # The entire last column

array([ 4,  5,  8, 13, 20, 29])

In [37]:
# Keep values that meet a criterion; everything else becomes nan (so the result is a float array)
np.where(b > 12, b, np.nan)

array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, 13.],
       [16., 17., 20.],
       [25., 26., 29.]])

In [38]:
# Another way: boolean indexing, which modifies the array in place
b[b <= 12] = 0
b

array([[ 0,  0,  0],
       [ 0,  0,  0],
       [ 0,  0,  0],
       [ 0,  0, 13],
       [16, 17, 20],
       [25, 26, 29]])
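
The two masking approaches above can be sketched on a smaller, made-up array. Note that `np.where` with `np.nan` returns a float array and leaves the original alone, while the comparison itself gives a boolean mask that can also be counted.

```python
import numpy as np

grid = np.array([[1, 15],
                 [20, 3]])

# The comparison produces a boolean array of the same shape
mask = grid > 12
n_matches = int(mask.sum())   # True counts as 1
print(n_matches)              # 2

# np.where builds a new array; nan forces a float result
filled = np.where(mask, grid, np.nan)
print(filled.dtype)           # float64
print(grid.dtype.kind)        # i (the original is still an integer array)
```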

In [39]:
# Iterating -- almost never a good idea! Super slow.
for row in b:
    for item in row:
        if item > 0:
            print(item)

13
16
17
20
25
26
29


In [40]:
# Iterate over all items in the array:
for item in b.flat:
    print(item, end=' ')

0 0 0 0 0 0 0 0 0 0 0 13 16 17 20 25 26 29 
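
Since looping is slow, the nested loop above can usually be replaced with a boolean selection. A minimal sketch on a made-up array, showing that both approaches pick out the same values:

```python
import numpy as np

values = np.array([[0, 13],
                   [16, 0]])

# Loop version: visit every item one at a time
looped = [int(item) for row in values for item in row if item > 0]

# Vectorized version: select all matching items in one step
selected = values[values > 0]

print(looped)     # [13, 16]
print(selected)   # [13 16]
```

The selection returns a 1-D array, in the same row-by-row order the loop visits the items.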

## Redefining shape

- ravel flattens an array to 1-D 
- reshape returns an array with a new shape (the total number of elements must stay the same)
- transpose (T) switches rows and columns (in a 2d matrix)

Note: each of the above returns an array, and does not modify the array in place.
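
A quick sketch of that note, using a made-up array: the original keeps its shape after each call. Passing `-1` to `reshape` asks Numpy to work out that dimension from the others.

```python
import numpy as np

original = np.arange(12)

table = original.reshape(3, -1)   # -1 is inferred as 4 columns
flat = table.ravel()
flipped = table.T

print(table.shape)      # (3, 4)
print(flat.shape)       # (12,)
print(flipped.shape)    # (4, 3)
print(original.shape)   # (12,)  unchanged
```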

In [41]:
a = np.indices((2, 6)).sum(axis=0)
a

array([[0, 1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5, 6]])

In [42]:
a.ravel()

array([0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6])

In [43]:
a.reshape(3, 4)

array([[0, 1, 2, 3],
       [4, 5, 1, 2],
       [3, 4, 5, 6]])

In [44]:
a.T

array([[0, 1],
       [1, 2],
       [2, 3],
       [3, 4],
       [4, 5],
       [5, 6]])

## Stacking together arrays

Arrays can be combined by stacking as long as their sizes are compatible in the desired dimension.

- concatenate stacks arrays along a specified axis
- hstack stacks arrays along their 2nd axis
- vstack stacks arrays along their 1st axis
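
For 2-D arrays, the three functions are closely related. This sketch, with two made-up arrays, checks that `hstack` and `vstack` give the same result as `concatenate` with `axis=1` and `axis=0`:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

side_by_side = np.hstack((a, b))   # shape (2, 4)
stacked = np.vstack((a, b))        # shape (4, 2)

same_h = np.array_equal(side_by_side, np.concatenate((a, b), axis=1))
same_v = np.array_equal(stacked, np.concatenate((a, b), axis=0))
print(side_by_side.shape, stacked.shape)   # (2, 4) (4, 2)
print(same_h, same_v)                      # True True
```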

In [45]:
a = np.random.randint(1, 10, size=(2,2))
a

array([[1, 8],
       [7, 5]])

In [46]:
b = np.random.randint(1, 10, size=(2,2))
b

array([[2, 1],
       [3, 4]])

In [47]:
np.hstack((a, b))

array([[1, 8, 2, 1],
       [7, 5, 3, 4]])

In [48]:
np.vstack((a, b))

array([[1, 8],
       [7, 5],
       [2, 1],
       [3, 4]])

In [49]:
c = np.random.randint(1, 10, size=(4, 2, 2))
d = np.random.randint(1, 10, size=(4, 2, 2))
c, d

(array([[[6, 7],
         [3, 5]],
 
        [[2, 8],
         [8, 6]],
 
        [[1, 1],
         [8, 6]],
 
        [[9, 8],
         [1, 7]]]),
 array([[[5, 6],
         [9, 5]],
 
        [[5, 1],
         [4, 7]],
 
        [[6, 9],
         [5, 4]],
 
        [[1, 2],
         [9, 5]]]))

In [50]:
# The two arrays need to have the same number of dimensions
np.concatenate((c[-1], a))

array([[9, 8],
       [1, 7],
       [1, 8],
       [7, 5]])

In [51]:
# Try along all 3 axes to see what happens
np.concatenate((c, d), axis=2)

array([[[6, 7, 5, 6],
        [3, 5, 9, 5]],

       [[2, 8, 5, 1],
        [8, 6, 4, 7]],

       [[1, 1, 6, 9],
        [8, 6, 5, 4]],

       [[9, 8, 1, 2],
        [1, 7, 9, 5]]])
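
To see the effect of each axis without reading through the values, it can help to print only the resulting shapes. A sketch using arrays of zeros and ones with the same `(4, 2, 2)` shape as `c` and `d` (the names here are made up):

```python
import numpy as np

zeros_block = np.zeros((4, 2, 2))
ones_block = np.ones((4, 2, 2))

# Only the axis being joined grows; the other two stay the same
shapes = []
for axis in range(3):
    joined = np.concatenate((zeros_block, ones_block), axis=axis)
    shapes.append(joined.shape)
    print(axis, joined.shape)
# 0 (8, 2, 2)
# 1 (4, 4, 2)
# 2 (4, 2, 4)
```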