# NumPy

First among these is NumPy. NumPy offers three main features: its mathematical functions (e.g. `sin`, `log`, `floor`), its `random` submodule (useful for random sampling), and the NumPy `ndarray` object. An array is homogeneous: every element has the same type.

A NumPy array is similar to a mathematical n-dimensional matrix. For example, 

$$\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{d1} & x_{d2} & x_{d3} & \dots  & x_{dn}
\end{bmatrix}$$

A NumPy array can be 1-dimensional (e.g. [1, 5, 20, 34, ...]), 2-dimensional (as above), or have more dimensions. It's important to note that all rows of a 2-dimensional array have the same length, as do all columns. The same holds along every dimension of a higher-dimensional array.
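As a small sketch of this regularity, the cell below builds arrays with one, two, and three dimensions and prints their `ndim` and `shape`. The variable names are illustrative only and do not appear elsewhere in this notebook.

```python
import numpy as np

vector = np.array([1, 5, 20, 34])          # 1-dimensional, 4 elements
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2-dimensional: 2 rows, 3 columns
cube = np.zeros((2, 3, 4))                 # 3-dimensional block of zeros

for arr in (vector, matrix, cube):
    # shape holds one length per dimension
    print(arr.ndim, arr.shape)
```

Each shape is a single tuple, which is only possible because every row (and every column, and so on) has the same length.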

In [5]:
# to access NumPy, we have to import it
import numpy as np
print(np.__version__)
np.show_config()

1.16.2
mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/Asus/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', 'C:/Users/Asus/Anaconda3\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/Asus/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', '

In [7]:
w=np.array([70, 48, 90, 52, 60])
w

array([70, 48, 90, 52, 60])

In [8]:
w[0] + w[1] + w[2] + w[3] + w[4]

320

In [9]:
np.sum(w)

320

In [10]:
np.mean(w)

64.0

In [11]:
np.std(w)

15.019986684414869

In [12]:
n = 10000

In [13]:
%timeit -n100 np.arange(n) ** 2

16.9 µs ± 7.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [14]:
%timeit -n100 [v ** 2 for v in range(n)]

3.03 ms ± 31.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [15]:
%timeit -n100 list(map(lambda v: v ** 2, range(n)))

3.65 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [3]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(list_of_lists)

an_array = np.array(list_of_lists)
print(an_array)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [4]:
an_array.ndim

2

In [5]:
an_array.shape

(3, 3)

In [6]:
an_array.size

9

In [None]:
non_rectangular = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
print(non_rectangular)

non_rectangular_array = np.array(non_rectangular)
print(non_rectangular_array)

Why did these print differently? Let's investigate their _shape_ and _data type_ (`dtype`).

In [None]:
print(an_array.shape, an_array.dtype)
print(non_rectangular_array.shape, non_rectangular_array.dtype)

The first case, `an_array`, is a 2-dimensional 3x3 array of integers. In contrast, `non_rectangular_array` is a 1-dimensional array of length 3 whose elements are _objects_, namely `list` objects.
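The following sketch reproduces the object-array case in a way that should also run on newer NumPy releases, which raise a `ValueError` for ragged input unless `dtype=object` is passed explicitly. The names `ragged` and `ragged_array` are illustrative.

```python
import numpy as np

ragged = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]

# dtype=object asks explicitly for an array of Python objects
ragged_array = np.array(ragged, dtype=object)

print(ragged_array.shape)     # a single dimension holding three elements
print(ragged_array.dtype)     # object
print(type(ragged_array[0]))  # each element is still a plain Python list
```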

In [None]:
# We can also convert the `dtype` of an array after creation.
print(np.logspace(1, 10, 10).dtype)
print(np.logspace(1, 10, 10).astype(int).dtype)

Why does any of this matter?

Arrays are often more efficient than lists for certain calculations, both in the amount of code required and in computational resources. Computationally, this efficiency comes from the fact that NumPy allocates one contiguous block of memory for the data and for the results of a computation.
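One way to look at this memory layout is through the array's `flags`, `itemsize`, and `nbytes` attributes. This is a minimal sketch; the `dtype` is fixed to `np.int64` here only so that the byte counts are the same on every platform.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)

print(arr.flags['C_CONTIGUOUS'])  # True: rows are stored one after another
print(arr.itemsize)               # bytes per element (8 for int64)
print(arr.nbytes)                 # total bytes = size * itemsize
```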

To explore the advantages in code, let's try to do some math on these numbers.

First let's simply calculate the sum of all the numbers and look at the differences in the necessary code for `list_of_lists`, `an_array`, and `non_rectangular_array`.

In [None]:
print(sum([sum(inner_list) for inner_list in list_of_lists]))
print(an_array.sum())

Summing the numbers in an array is much easier than for a list of lists. We don't have to dig into a hierarchy of lists; we just use the `sum` method of the `ndarray`. Does this still work for `non_rectangular_array`?

In [None]:
# what happens here?
print(non_rectangular_array.sum())

Remember `non_rectangular_array` is a 1-dimensional array of `list` objects. The `sum` method tries to add them together: first list + second list + third list. Addition of lists results in _concatenation_.

In [None]:
# concatenate three lists
print([1, 2] + [3, 4, 5] + [6, 7, 8, 9])

In [None]:
print('Array row sums: ', an_array.sum(axis=1))
print('Array column sums: ', an_array.sum(axis=0))

In [None]:
print('List of list row sums: ', [sum(inner_list) for inner_list in list_of_lists])

def column_sum(list_of_lists):
    running_sums = [0] * len(list_of_lists[0])
    for inner_list in list_of_lists:
        for i, number in enumerate(inner_list):
            running_sums[i] += number
            
    return running_sums

print('List of list column sums: ', column_sum(list_of_lists))

We can also create a variety of arrays with NumPy's convenience functions.

## Creating Arrays

<br>
`arange` returns evenly spaced values within a given interval.

In [7]:
n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30
n

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

<br>
`reshape` returns an array with the same data with a new shape.

In [8]:
n = n.reshape(3, 5) # reshape array to be 3x5
n

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

<br>
`linspace` returns evenly spaced numbers over a specified interval.

In [9]:
o = np.linspace(0, 4, 9) # return 9 evenly spaced values from 0 to 4
o

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

<br>
`resize` changes the shape and size of array in-place.

In [10]:
o.resize(3, 3)
o

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])
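The difference between `reshape` and `resize` can be sketched as follows: `reshape` returns a new array object and leaves the original shape alone, while the `resize` method changes the array itself and returns `None`. The names `a`, `b`, and `c` are illustrative.

```python
import numpy as np

a = np.linspace(0, 4, 9)
b = a.reshape(3, 3)        # new array object with the new shape
print(a.shape, b.shape)    # a is still 1-dimensional

c = np.linspace(0, 4, 9)
result = c.resize(3, 3)    # modifies c in place
print(result, c.shape)     # resize returns None; c is now 3x3
```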

<br>
`ones` returns a new array of given shape and type, filled with ones.

In [11]:
np.ones((3, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

<br>
`zeros` returns a new array of given shape and type, filled with zeros.

In [12]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

<br>
`eye` returns a 2-D array with ones on the diagonal and zeros elsewhere.

In [13]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

<br>
`diag` extracts a diagonal or constructs a diagonal array.

In [17]:
y = np.linspace(1, 10, 3)
np.diag(y)

array([[ 1. ,  0. ,  0. ],
       [ 0. ,  5.5,  0. ],
       [ 0. ,  0. , 10. ]])

<br>
Create an array by repeating a list (or see `np.tile`).

In [18]:
np.array([1, 2, 3] * 3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])
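For comparison, here is a short sketch of the `np.tile` alternative mentioned above. It works directly on arrays and can also repeat along more than one dimension. The name `base` is illustrative.

```python
import numpy as np

base = np.array([1, 2, 3])

once = np.tile(base, 3)        # same result as np.array([1, 2, 3] * 3)
grid = np.tile(base, (2, 2))   # 2 copies down, 2 copies across

print(once)
print(grid.shape)              # (2, 6)
```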

<br>
Repeat elements of an array using `repeat`.

In [19]:
np.repeat([1, 2, 3], 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

In [9]:
np.logspace(1, 10, 10)

array([1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08,
       1.e+09, 1.e+10])

In [10]:
np.diag([1,2,3,4])

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

In [None]:
a = [15,123,22,1,4,55,735,12321] # this is a list

In [None]:
# if we want to calculate over this list
b = [x * 10 for x in a] # we can do this with a list comprehension
b

In [None]:
# but if we use arrays
c = np.array(a)
c*10

In [None]:
a = np.array([1, 2, 3, 4, 5])
print(a + 5) # add a scalar
print(a * 5) # multiply by a scalar
print(a / 5) # divide by a scalar (note the float!)

In [None]:
b = a + 1
print(a + b) # add together two arrays
print(a * b) # multiply two arrays (element-wise)
print(a / b.astype(float)) # divide two arrays (element-wise)

In [None]:
print(np.dot(a, b)) # inner product of two arrays
print(np.outer(a, b)) # outer product of two arrays

Arrays have a lot to offer us in terms of representing and analyzing data, since we can easily apply mathematical functions to data sets or sections of data sets. Most of the time we won't run into any trouble using arrays, but it's good to be mindful of the restrictions around shape and datatype.

These restrictions around `shape` and `dtype` allow `ndarray` objects to be much more performant than a general Python `list`. There are a few reasons for this, but the main two result from the typed nature of the `ndarray`, which allows contiguous memory storage and consistent function lookup. When a Python `list` is summed, Python needs to figure out at runtime the correct way to add each pair of elements. When an `ndarray` is summed, NumPy already knows the type of each element (and the types are consistent), so it can sum them without looking up the correct add function for each element.
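A practical consequence of the single `dtype` can be seen in the sketch below: values assigned into an integer array are converted to integers, and a single float in the input is enough to make the whole array floating point. The names `ints` and `mixed` are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
ints[0] = 2.7                  # the array stays integer, so 2.7 is truncated to 2
print(ints, ints.dtype)

mixed = np.array([1, 2.5, 3])  # one float makes every element a float
print(mixed, mixed.dtype)
```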

# Indexing and Slicing

In [9]:
a=np.arange(10) * 10
a

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

### 1-dim array

`Array[index]` returns the single element at position `index`.

`Array[start:stop:step]` returns an array of the elements from position `start` up to (but not including) `stop`, taking every `step`-th element.

In [10]:
print(a[5])
print(a[-1])

50
90


In [11]:
a[3:]

array([30, 40, 50, 60, 70, 80, 90])

In [12]:
a[1::2]

array([10, 30, 50, 70, 90])

In [16]:
a[::-1]

array([90, 80, 70, 60, 50, 40, 30, 20, 10,  0])

In [11]:
a2 = np.reshape(np.arange(1,21),(4,5))
a2

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

### Index and Slice 2-dim numpy array

`Array[index1, index2]` returns the single element at row `index1` and column `index2`.

`Array[start1:stop1:step1, start2:stop2:step2]` returns the sub-array selected by the first slice along the rows and the second slice along the columns.

In [None]:
a2[1]

In [None]:
a2[1,2]

In [None]:
a2[0:2]

In [None]:
a2[:,2]

In [None]:
a2[2:4,0:2]

In [None]:
a2[:,3:5]
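Since the cells above are shown without output, here is a self-contained sketch of what a few of those selections return. The array is rebuilt under the illustrative name `grid` so the block runs on its own.

```python
import numpy as np

grid = np.reshape(np.arange(1, 21), (4, 5))

row = grid[1]            # second row: 6, 7, 8, 9, 10
element = grid[1, 2]     # row 1, column 2: the value 8
column = grid[:, 2]      # third column: 3, 8, 13, 18
block = grid[2:4, 0:2]   # rows 2-3, columns 0-1: [[11, 12], [16, 17]]

print(row, element, column, block, sep='\n')
```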

### Changing Shape

Often we will want to take arrays that are one shape and transform them to a different shape more amenable to a specific operation.

In [None]:
mat = np.random.rand(5, 4)
mat.reshape(-1,2).shape

In [None]:
mat_ravel = mat.ravel()
mat_ravel

In [None]:
mat.transpose().shape
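The sketch below repeats these three operations on a deterministic array (built with `np.arange` instead of `np.random.rand`, purely so the results are reproducible) and shows the resulting shapes. A `-1` in `reshape` asks NumPy to infer that dimension from the total number of elements.

```python
import numpy as np

mat = np.arange(20).reshape(5, 4)

# 20 elements with 2 columns leaves 10 rows for the -1 dimension
print(mat.reshape(-1, 2).shape)   # (10, 2)

flat = mat.ravel()                # flattened to 1 dimension
print(flat.shape)                 # (20,)

print(mat.transpose().shape)      # (4, 5)
```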

### Combining Arrays

In [20]:
p = np.ones([2, 3], int)
p

array([[1, 1, 1],
       [1, 1, 1]])

Use `vstack` to stack arrays in sequence vertically (row wise).

In [21]:
np.vstack([p, 2*p])

array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])

Use `hstack` to stack arrays in sequence horizontally (column wise).

In [24]:
np.hstack([p, 2*p])

array([[1, 1, 1, 2, 2, 2],
       [1, 1, 1, 2, 2, 2]])

Use `dstack` to stack arrays in sequence depth wise (along the third axis).

In [25]:
np.dstack([p, 2*p])

array([[[1, 2],
        [1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2],
        [1, 2]]])
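For 2-dimensional inputs like `p`, the three stacking functions can also be expressed with an explicit `axis` argument. This is a sketch of that equivalence using `np.concatenate` and `np.stack`, not a statement about how the stacking functions are implemented internally.

```python
import numpy as np

p = np.ones([2, 3], int)

rows = np.concatenate([p, 2 * p], axis=0)  # matches vstack for 2-D input
cols = np.concatenate([p, 2 * p], axis=1)  # matches hstack for 2-D input
depth = np.stack([p, 2 * p], axis=2)       # matches dstack for 2-D input

print(rows.shape, cols.shape, depth.shape)  # (4, 3) (2, 6) (2, 3, 2)
```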

**Dot Product:**  

$ \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3$

In [85]:
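# x and y are not defined earlier; these values are inferred from the comment below and from the output of z
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
x.dot(y) # dot product  1*4 + 2*5 + 3*6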

32
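The same dot product can be written in a few equivalent ways. The sketch below assumes the vectors `[1, 2, 3]` and `[4, 5, 6]`, matching the arithmetic in the comment above.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x.dot(y))        # method form: 32
print(x @ y)           # matrix-multiplication operator: 32
print(np.sum(x * y))   # element-wise product, then sum: 32
```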

In [86]:
z = np.array([y, y**2])
print(len(z)) # number of rows of array

2


<br>
Let's look at transposing arrays. Transposing permutes the dimensions of the array.

In [87]:
z = np.array([y, y**2])
z

array([[ 4,  5,  6],
       [16, 25, 36]])

<br>
The shape of array `z` is `(2,3)` before transposing.

In [88]:
z.shape

(2, 3)

<br>
Use `.T` to get the transpose.

In [89]:
z.T

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])

<br>
The number of rows has swapped with the number of columns.

In [90]:
z.T.shape

(3, 2)
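For arrays with more than two dimensions, `.T` reverses the order of all axes, and `transpose` accepts an explicit axis order. A minimal sketch on an illustrative 3-dimensional array `t`:

```python
import numpy as np

t = np.zeros((2, 3, 4))

print(t.T.shape)                   # (4, 3, 2): all axes reversed
print(t.transpose(1, 0, 2).shape)  # (3, 2, 4): only the first two axes swapped
```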

<br>
Use `.dtype` to see the data type of the elements in the array.

In [91]:
z.dtype

dtype('int64')

<br>
Use `.astype` to cast to a specific type.

In [92]:
z = z.astype('f')
z.dtype

dtype('float32')

<br>
## Math Functions

NumPy has many built-in math functions that can be applied to arrays.

In [93]:
a = np.array([-4, -2, 1, 3, 5])

In [94]:
a.sum()

3

In [95]:
a.max()

5

In [96]:
a.min()

-4

In [97]:
a.mean()

0.60

In [98]:
a.std()

3.26

<br>
`argmax` and `argmin` return the index of the maximum and minimum values in the array.

In [99]:
a.argmax()

4

In [100]:
a.argmin()

0
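On a 2-dimensional array, `argmax` without arguments returns an index into the flattened array. The sketch below shows how `np.unravel_index` converts that into a (row, column) position, and how the `axis` argument gives per-column or per-row results. The array `m` is illustrative.

```python
import numpy as np

m = np.array([[3, 9, 1],
              [7, 2, 8]])

flat_index = m.argmax()                          # position in the flattened array
position = np.unravel_index(flat_index, m.shape)  # row 0, column 1

print(flat_index)
print(position)
print(m.argmax(axis=0))   # row index of the maximum in each column
print(m.argmax(axis=1))   # column index of the maximum in each row
```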

<br>
## Indexing / Slicing

In [101]:
s = np.arange(13)**2
s

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144])

<br>
Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

In [102]:
s[0], s[4], s[-1]

(0, 16, 144)

<br>
Use `:` to indicate a range. `array[start:stop]`


Leaving `start` or `stop` empty will default to the beginning/end of the array.

In [103]:
s[1:5]

array([ 1,  4,  9, 16])

<br>
Use negatives to count from the back.

In [104]:
s[-4:]

array([ 81, 100, 121, 144])

<br>
A second `:` can be used to indicate step-size. `array[start:stop:stepsize]`

Here we start at the 5th element from the end and count backwards by 2 until the beginning of the array is reached.

In [105]:
s[-5::-2]

array([64, 36, 16,  4,  0])

<br>
Let's look at a multidimensional array.

In [106]:
r = np.arange(36)
r.resize((6, 6))
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

<br>
Use bracket notation to index by row and column: `array[row, column]`

In [107]:
r[2, 2]

14

<br>
And use `:` to select a range of rows or columns.

In [108]:
r[3, 3:6]

array([21, 22, 23])

<br>
Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

In [109]:
r[:2, :-1]

array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])

<br>
This is a slice of the last row, and only every other element.

In [110]:
r[-1, ::2]

array([30, 32, 34])

<br>
We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see `np.where`)

In [111]:
r[r > 30]

array([31, 32, 33, 34, 35])
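As a sketch of the `np.where` alternative mentioned above: with one argument it returns the positions where the condition holds, and with three arguments it builds a new array by choosing between two values element by element. The array is rebuilt here so the block is self-contained.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

rows, cols = np.where(r > 30)      # index arrays of the matching positions
print(rows)                        # all matches are in row 5
print(cols)                        # columns 1 through 5

capped = np.where(r > 30, 30, r)   # new array; r itself is left unchanged
print(capped.max(), r.max())       # 30 35
```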

<br>
Here we set every value in the array that is greater than 30 to 30.

In [112]:
r[r > 30] = 30
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

<br>
## Copying Data

Be careful with copying and modifying arrays in NumPy!


`r2` is a slice of `r`

In [113]:
r2 = r[:3,:3]
r2

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])

<br>
Set this slice's values to zero (`[:]` selects the entire array).

In [114]:
r2[:] = 0
r2

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

<br>
`r` has also been changed!

In [115]:
r

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

<br>
To avoid this, use `r.copy()` to create a copy that will not affect the original array.

In [116]:
r_copy = r.copy()
r_copy

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

<br>
Now when `r_copy` is modified, `r` will not be changed.

In [117]:
r_copy[:] = 10
print(r_copy, '\n')
print(r)

[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]] 

[[ 0  0  0  3  4  5]
 [ 0  0  0  9 10 11]
 [ 0  0  0 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 30 30 30 30 30]]
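Whether two arrays share the same underlying data can be checked with `np.shares_memory`. The sketch below, on a freshly built array, suggests a rule of thumb: basic slicing returns a view, while `.copy()` and boolean indexing return independent data. The names `view`, `copied`, and `masked` are illustrative.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

view = r[:3, :3]            # basic slicing: a view onto r's data
copied = r[:3, :3].copy()   # explicit copy: independent data
masked = r[r > 30]          # boolean indexing: also independent data

print(np.shares_memory(r, view))     # True
print(np.shares_memory(r, copied))   # False
print(np.shares_memory(r, masked))   # False
```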


<br>
### Iterating Over Arrays

Let's create a new 4 by 3 array of random integers from 0 to 9.

In [118]:
test = np.random.randint(0, 10, (4,3))
test

array([[1, 8, 3],
       [6, 3, 9],
       [5, 5, 8],
       [2, 4, 8]])

<br>
Iterate by row:

In [119]:
for row in test:
    print(row)

[1 8 3]
[6 3 9]
[5 5 8]
[2 4 8]


<br>
Iterate by index:

In [120]:
for i in range(len(test)):
    print(test[i])

[1 8 3]
[6 3 9]
[5 5 8]
[2 4 8]


<br>
Iterate by row and index:

In [121]:
for i, row in enumerate(test):
    print('row', i, 'is', row)

row 0 is [1 8 3]
row 1 is [6 3 9]
row 2 is [5 5 8]
row 3 is [2 4 8]


<br>
Use `zip` to iterate over multiple iterables.

In [122]:
test2 = test**2
test2

array([[ 1, 64,  9],
       [36,  9, 81],
       [25, 25, 64],
       [ 4, 16, 64]])

In [123]:
for i, j in zip(test, test2):
    print(i,'+',j,'=',i+j)

[1 8 3] + [ 1 64  9] = [ 2 72 12]
[6 3 9] + [36  9 81] = [42 12 90]
[5 5 8] + [25 25 64] = [30 30 72]
[2 4 8] + [ 4 16 64] = [ 6 20 72]
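Row-by-row loops like the one above are useful for printing, but the same arithmetic can usually be done on whole arrays at once. The sketch below uses the fixed values from the output above (rather than new random numbers) and checks that both approaches agree.

```python
import numpy as np

test = np.array([[1, 8, 3], [6, 3, 9], [5, 5, 8], [2, 4, 8]])
test2 = test ** 2

# explicit loop over pairs of rows
looped = np.array([i + j for i, j in zip(test, test2)])

# element-wise addition of the whole arrays
vectorized = test + test2

print(np.array_equal(looped, vectorized))  # True
```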
