# NumPy
As described at https://numpy.org  
> NumPy is the fundamental package for scientific computing with Python. It contains among other things:
> - a powerful N-dimensional array object
> - sophisticated (broadcasting) functions
> - tools for integrating C/C++ and Fortran code
> - useful linear algebra, Fourier transform, and random number capabilities

If you are familiar with Matlab, this comparison might be useful:
https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html

## Resources
1. Ch 4 in Python for Data Analysis, 2nd Ed, Wes McKinney (UCalgary library and https://github.com/wesm/pydata-book)
2. Ch 2 in Python Data Science Handbook, Jake VanderPlas (UCalgary library and https://github.com/jakevdp/PythonDataScienceHandbook)


Let's explore some of the features. 

First, import NumPy:

In [1]:
import numpy as np

## Create numpy arrays
Here are several ways to create NumPy arrays:

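```python
>>> np.zeros((3, 4))                      # 3x4 array of zeros (float64)
>>> np.ones((2, 3, 4), dtype=np.int16)    # 2x3x4 array of ones, 16-bit integers
>>> d = np.arange(10, 25, 5)              # 10 up to 25 (excluded) in steps of 5
>>> np.linspace(0, 2, 9)                  # 9 evenly spaced values from 0 to 2
```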
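Each call above returns an `ndarray`, and its `shape` and `dtype` attributes show what was created. A quick standalone check (it re-imports NumPy so it runs on its own):

```python
import numpy as np

z = np.zeros((3, 4))                      # 3 rows x 4 columns, default dtype float64
o = np.ones((2, 3, 4), dtype=np.int16)    # 3-D array of 16-bit integer ones
d = np.arange(10, 25, 5)                  # stop value 25 is excluded
lin = np.linspace(0, 2, 9)                # stop value 2 is included

print(z.shape, z.dtype)           # (3, 4) float64
print(o.shape, o.dtype)           # (2, 3, 4) int16
print(d)                          # [10 15 20]
print(lin[0], lin[-1], lin.size)  # 0.0 2.0 9
```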

In [6]:
# start, stop (included) and number of evenly spaced values
np.linspace(0,2,9)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

In [3]:
# start, stop (excluded) and step
d = np.arange(10,25,5)
d

array([10, 15, 20])

Arrays can also be created from Python lists:

In [4]:
a = np.array([1, 2, 3, 4])
a

array([1, 2, 3, 4])
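Nested lists give multi-dimensional arrays, and NumPy infers one common `dtype` for all elements. A short sketch with illustrative values:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])       # list of lists -> 2-D array
print(M.shape)                  # (2, 3)

mixed = np.array([1, 2.5, 3])   # a single float upcasts every element to float
print(mixed.dtype)              # float64
```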

### Random arrays
Sometimes it is handy to generate arrays with random entries, just for testing.

The numpy.random module has many useful functions. To get help on numpy functions or modules you can use:
```python
np.info(np.random)
```

Of course, Python's `help()` or Jupyter's `?` work too. Another nice trick in Jupyter is `Tab` for tab completion, and `Shift-Tab` (pressed twice) to get info on a function's parameters.

In [5]:
# what is in random
np.info(np.random)

```
Random Number Generation

Use ``default_rng()`` to create a `Generator` and call its methods.

Generator
--------------- ---------------------------------------------------------
Generator       Class implementing all of the random number distributions
default_rng     Default constructor for ``Generator``

BitGenerator Streams that work with Generator
--------------------------------------------- ---
MT19937
PCG64
PCG64DXSM
Philox
SFC64

Getting entropy to initialize a BitGenerator
--------------------------------------------- ---
SeedSequence


Legacy
------

For backwards compatibility with previous versions of numpy before 1.17, the
various aliases to the global `RandomState` methods are left alone and do not
use the new `Generator` API.

Utility functions
-------------------- ---------------------------------------------------------
random               Uniformly distributed floats over ``[0, 1)``
bytes                Uniformly distributed random bytes.
permutation          Randoml
...
```

**Use a seed**. It is good practice to set the seed of the random number generator so that the same numbers are generated every time you run the cell. This makes debugging a lot easier.

Now let's generate a 5x4 matrix of random integers in [0, 10) with `randint()`, and an array of 10 random entries drawn from the strings 'low', 'medium' and 'high' with `choice()`. Another useful function is `shuffle()`; try it out too.

In [10]:
np.random.seed(1992)
A = np.random.randint(low=0, high=10, size=(5,4))
A

array([[7, 9, 8, 1],
       [5, 1, 2, 8],
       [8, 8, 4, 6],
       [9, 7, 2, 7],
       [6, 4, 6, 7]])

In [7]:
np.random.seed(1995)
labels = np.random.choice(['low', 'medium', 'high'], size=10)
labels

array(['low', 'high', 'high', 'medium', 'medium', 'medium', 'low', 'low',
       'low', 'high'], dtype='<U6')
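The `np.info(np.random)` output above recommends `default_rng()` for new code. Below is a sketch of the same two steps with a `Generator`, plus `shuffle()` and its non-mutating sibling `permutation()`. Note that a `Generator` uses a different algorithm from the legacy `np.random.seed()` functions, so seed 1992 will not reproduce the matrix shown above; the names `A2`, `labels2`, `x`, `y` are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(1992)                 # seeded Generator (PCG64 by default)
A2 = rng.integers(low=0, high=10, size=(5, 4))    # ints in [0, 10), like randint()
labels2 = rng.choice(['low', 'medium', 'high'], size=10)

x = np.arange(10)
rng.shuffle(x)                 # shuffles x in place and returns None
y = rng.permutation(10)        # returns a new shuffled array of 0..9

# Re-creating a Generator with the same seed reproduces the same draws
again = np.random.default_rng(1992).integers(0, 10, size=(5, 4))
print(np.array_equal(A2, again))   # True
```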

## Indexing
Get the shape of an array (rows x columns):

In [11]:
A.shape

(5, 4)

The first row (indexing starts at zero) is

In [12]:
A[0]

array([7, 9, 8, 1])

The first column is

In [17]:
A[:,0]

array([7, 5, 8, 9, 6])
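`A[:,0]` drops the column dimension and returns a 1-D array. If you need a 5 x 1 column instead, index with a list or a slice. A sketch using a hypothetical stand-in for `A`:

```python
import numpy as np

A = np.arange(20).reshape(5, 4)   # hypothetical stand-in 5x4 array

col_1d = A[:, 0]      # shape (5,)   : an integer index drops the axis
col_2d = A[:, [0]]    # shape (5, 1) : a list index keeps it
col_2d_b = A[:, 0:1]  # shape (5, 1) : a slice keeps it too
print(col_1d.shape, col_2d.shape, col_2d_b.shape)  # (5,) (5, 1) (5, 1)
```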

Slicing works too. How would you get the last two rows?

In [11]:
A[-2:,:]

array([[9, 7, 2, 7],
       [6, 4, 6, 7]])

## Array math
NumPy applies arithmetic element-wise, for example to convert Celsius to Fahrenheit (treating the values in `A` as temperatures in °C):

In [18]:
B = (9/5) * A + 32
B

array([[44.6, 48.2, 46.4, 33.8],
       [41. , 33.8, 35.6, 46.4],
       [46.4, 46.4, 39.2, 42.8],
       [48.2, 44.6, 35.6, 44.6],
       [42.8, 39.2, 42.8, 44.6]])

Two arrays of the same shape can be added, subtracted, multiplied, etc. element by element:

In [13]:
A + B

array([[51.6, 57.2, 54.4, 34.8],
       [46. , 34.8, 37.6, 54.4],
       [54.4, 54.4, 43.2, 48.8],
       [57.2, 51.6, 37.6, 51.6],
       [48.8, 43.2, 48.8, 51.6]])
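The quote at the top mentions broadcasting: arrays of different but compatible shapes can also be combined, with the smaller one stretched across the larger. For example, subtracting each column's mean pairs a shape `(4,)` vector with every row of a 5 x 4 array. A sketch with a hypothetical stand-in for `A`:

```python
import numpy as np

A = np.arange(20).reshape(5, 4).astype(float)  # hypothetical stand-in 5x4 array

col_means = A.mean(axis=0)     # shape (4,): one mean per column
centered = A - col_means       # (5, 4) - (4,) broadcasts over the rows
print(centered.mean(axis=0))   # [0. 0. 0. 0.]

row_means = A.mean(axis=1)     # shape (5,): one mean per row
# A - row_means would raise ValueError: trailing dims 4 and 5 do not match,
# so add an axis to make it a (5, 1) column that broadcasts over the columns
row_centered = A - row_means[:, np.newaxis]
```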

We can also apply a math function, such as `np.sin`, to each element:

In [14]:
np.sin(A)

array([[ 0.6569866 ,  0.41211849,  0.98935825,  0.84147098],
       [-0.95892427,  0.84147098,  0.90929743,  0.98935825],
       [ 0.98935825,  0.98935825, -0.7568025 , -0.2794155 ],
       [ 0.41211849,  0.6569866 ,  0.90929743,  0.6569866 ],
       [-0.2794155 , -0.7568025 , -0.2794155 ,  0.6569866 ]])

## Array manipulation
It is often useful to concatenate or stack arrays. This is similar to `[A, B]` (horizontal) and `[A; B]` (vertical) in Matlab.


In [15]:
np.info(np.hstack)

```
hstack(tup)

Stack arrays in sequence horizontally (column wise).

This is equivalent to concatenation along the second axis, except for 1-D
arrays where it concatenates along the first axis. Rebuilds arrays divided
by `hsplit`.

This function makes most sense for arrays with up to 3 dimensions. For
instance, for pixel-data with a height (first axis), width (second axis),
and r/g/b channels (third axis). The functions `concatenate`, `stack` and
`block` provide more general stacking and concatenation operations.

Parameters
----------
tup : sequence of ndarrays
    The arrays must have the same shape along all but the second axis,
    except 1-D arrays which can be any length.

Returns
-------
stacked : ndarray
    The array formed by stacking the given arrays.

See Also
--------
concatenate : Join a sequence of arrays along an existing axis.
stack : Join a sequence of arrays along a new axis.
block : Assemble an nd-array from nested lists of blocks.
vstack : Stack arrays in sequence v
...
```

In [16]:
C = np.hstack((A, B))
C

array([[ 7. ,  9. ,  8. ,  1. , 44.6, 48.2, 46.4, 33.8],
       [ 5. ,  1. ,  2. ,  8. , 41. , 33.8, 35.6, 46.4],
       [ 8. ,  8. ,  4. ,  6. , 46.4, 46.4, 39.2, 42.8],
       [ 9. ,  7. ,  2. ,  7. , 48.2, 44.6, 35.6, 44.6],
       [ 6. ,  4. ,  6. ,  7. , 42.8, 39.2, 42.8, 44.6]])

In [17]:
C.shape

(5, 8)

In [18]:
D = np.vstack((A, B))
D

array([[ 7. ,  9. ,  8. ,  1. ],
       [ 5. ,  1. ,  2. ,  8. ],
       [ 8. ,  8. ,  4. ,  6. ],
       [ 9. ,  7. ,  2. ,  7. ],
       [ 6. ,  4. ,  6. ,  7. ],
       [44.6, 48.2, 46.4, 33.8],
       [41. , 33.8, 35.6, 46.4],
       [46.4, 46.4, 39.2, 42.8],
       [48.2, 44.6, 35.6, 44.6],
       [42.8, 39.2, 42.8, 44.6]])
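As the `hstack` docstring says, stacking is concatenation along an axis. A sketch showing that `np.concatenate` with `axis=1` and `axis=0` reproduces `hstack` and `vstack` for 2-D arrays (hypothetical stand-ins for `A` and `B`):

```python
import numpy as np

A = np.arange(20).reshape(5, 4)   # hypothetical stand-in for A (ints)
B = (9 / 5) * A + 32              # hypothetical stand-in for B (floats)

C = np.concatenate((A, B), axis=1)  # side by side -> (5, 8), like np.hstack
D = np.concatenate((A, B), axis=0)  # stacked vertically -> (10, 4), like np.vstack

print(np.array_equal(C, np.hstack((A, B))))  # True
print(np.array_equal(D, np.vstack((A, B))))  # True
print(C.dtype)   # float64: the integer array is upcast to match B
```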

Elements of an array can be modified, either individually or as entire rows/columns.  
Let's work on a copy so that `A` stays unchanged.

In [21]:
A_t = np.copy(A)
A_t

array([[7, 9, 8, 1],
       [5, 1, 2, 8],
       [8, 8, 4, 6],
       [9, 7, 2, 7],
       [6, 4, 6, 7]])

In [22]:
A_t[0,0] = 10
A_t

array([[10,  9,  8,  1],
       [ 5,  1,  2,  8],
       [ 8,  8,  4,  6],
       [ 9,  7,  2,  7],
       [ 6,  4,  6,  7]])

In [25]:
A_t[-1, :] = -1
A_t

array([[10,  9,  8,  1],
       [ 5,  1,  2,  8],
       [ 8,  8,  4,  6],
       [ 9,  7,  2,  7],
       [-1, -1, -1, -1]])

In [26]:
A_t[-1, -2:] = [5, 5]
A_t

array([[10,  9,  8,  1],
       [ 5,  1,  2,  8],
       [ 8,  8,  4,  6],
       [ 9,  7,  2,  7],
       [-1, -1,  5,  5]])
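Why copy first? Basic slicing returns a *view* that shares memory with the original, so writing through a slice changes the original array, while `np.copy()` gives independent data. A small sketch on a hypothetical stand-in array:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)     # hypothetical stand-in: [[0 1 2] [3 4 5]]
row_view = A[0, :]                 # slicing returns a view, not a copy
row_view[:] = -1                   # writes through to A
print(A[0])                        # [-1 -1 -1]

A_t = np.copy(A)                   # independent copy
A_t[1, :] = 99
print(A[1])                        # [3 4 5]  (original unchanged)
print(np.shares_memory(A, row_view), np.shares_memory(A, A_t))  # True False
```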

## Read values from file (csv)
Values can be read from file with `np.genfromtxt()`


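We use the data in `heart-attack.csv`. Let's peek at the start of the file (`head` and PowerShell's `gc -head 10` show the first 10 lines, while `type` in cmd.exe prints the whole file):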

In [43]:
# Windows cmd.exe (prints the whole file)
!type heart-attack.csv
# Windows PowerShell (first 10 lines)
# !gc -head 10 heart-attack.csv

# macOS / Linux (first 10 lines)
#!head heart-attack.csv

```
age,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num
28,1,2,130,132,0,2,185,0,0,?,?,?,0
29,1,2,120,243,0,0,160,0,0,?,?,?,0
29,1,2,140,?,0,0,170,0,0,?,?,?,0
30,0,1,170,237,0,1,170,0,0,?,?,6,0
31,0,2,100,219,0,1,150,0,0,?,?,?,0
32,0,2,105,198,0,0,165,0,0,?,?,?,0
32,1,2,110,225,0,0,184,0,0,?,?,?,0
32,1,2,125,254,0,0,155,0,0,?,?,?,0
33,1,3,120,298,0,0,185,0,0,?,?,?,0
34,0,2,130,161,0,0,190,0,0,?,?,?,0
34,1,2,150,214,0,1,168,0,0,?,?,?,0
34,1,2,98,220,0,0,150,0,0,?,?,?,0
35,0,1,120,160,0,1,185,0,0,?,?,?,0
35,0,4,140,167,0,0,150,0,0,?,?,?,0
35,1,2,120,308,0,2,180,0,0,?,?,?,0
35,1,2,150,264,0,0,168,0,0,?,?,?,0
36,1,2,120,166,0,0,180,0,0,?,?,?,0
36,1,3,112,340,0,0,184,0,1,2,?,3,0
36,1,3,130,209,0,0,178,0,0,?,?,?,0
36,1,3,150,160,0,0,172,0,0,?,?,?,0
37,0,2,120,260,0,0,130,0,0,?,?,?,0
37,0,3,130,211,0,0,142,0,0,?,?,?,0
37,0,4,130,173,0,1,184,0,0,?,?,?,0
37,1,2,130,283,0,1,98,0,0,?,?,?,0
37,1,3,130,194,0,0,150,0,0,?,?,?,0
37,1,4,120,223,0,0,168,
...
```

Finally, read the file into an array, skipping the header row:

In [31]:
da = np.genfromtxt('heart-attack.csv', delimiter=',', skip_header=1)
da

array([[28.,  1.,  2., ..., nan, nan,  0.],
       [29.,  1.,  2., ..., nan, nan,  0.],
       [29.,  1.,  2., ..., nan, nan,  0.],
       ...,
       [54.,  0.,  3., ..., nan, nan,  1.],
       [56.,  1.,  4., ..., nan, nan,  1.],
       [58.,  0.,  2., ..., nan,  7.,  1.]])

Note that the missing values (the `?` entries in the file) are read as `np.nan` (not a number).
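`np.genfromtxt()` also accepts `missing_values` and `filling_values`, which make the handling of `?` explicit, and `np.isnan()` lets you count missing entries per column. A sketch on a hypothetical three-row CSV in the same style as `heart-attack.csv`, read from memory with `io.StringIO`:

```python
import io
import numpy as np

# Hypothetical sample; '?' marks a missing value as in heart-attack.csv
csv_text = "age,chol,thal\n28,132,?\n29,?,?\n30,237,6\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1,
                     missing_values='?', filling_values=np.nan)
print(data)
print(np.isnan(data).sum(axis=0))   # missing values per column: [0 1 2]
```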

## Summarize values
What are the mean, standard deviation, min and max of each column?

In [32]:
da.mean(axis=0)

array([47.76791809,  0.72354949,  2.97952218,         nan,         nan,
               nan,         nan,         nan,         nan,  0.58464164,
               nan,         nan,         nan,  0.35836177])

A column that contains `np.nan` has a mean of `nan`, because `nan` propagates through arithmetic. The nan-aware `np.nanmean()` ignores the missing values instead:

In [33]:
np.nanmean(da,axis=0)

array([4.77679181e+01, 7.23549488e-01, 2.97952218e+00, 1.32592466e+02,
       2.50759259e+02, 7.01754386e-02, 2.15753425e-01, 1.39212329e+02,
       3.01369863e-01, 5.84641638e-01, 1.89320388e+00, 0.00000000e+00,
       5.64285714e+00, 3.58361775e-01])

Wrapping the result in `print()` and rounding it with `np.round()` gives more readable output:

In [34]:
print(np.round(np.mean(da,axis=0), 2))

[47.77  0.72  2.98   nan   nan   nan   nan   nan   nan  0.58   nan   nan
   nan  0.36]


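Or use the `np.printoptions()` context manager, which changes the precision and suppresses scientific notation only inside the `with` block: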

In [35]:
with np.printoptions(precision=2, suppress=True):
    print(np.nanmean(da,axis=0))

[ 47.77   0.72   2.98 132.59 250.76   0.07   0.22 139.21   0.3    0.58
   1.89   0.     5.64   0.36]


Stack the column minima and maxima into a 2 x 14 min/max matrix. As with the mean, columns with missing values give `nan`:

In [36]:
da_min_max = np.vstack((da.min(axis=0), da.max(axis=0)))
print(da_min_max)

[[28.  0.  1. nan nan nan nan nan nan  0. nan nan nan  0.]
 [66.  1.  4. nan nan nan nan nan nan  5. nan nan nan  1.]]
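The nan-aware counterparts `np.nanmin()`, `np.nanmax()` and `np.nanstd()` skip missing values, which also covers the standard deviation asked for above. A sketch on a small hypothetical stand-in for `da` (with the file loaded, pass `da` instead of `x`):

```python
import numpy as np

# Hypothetical stand-in for da: the second column has a missing value
x = np.array([[28.0, 132.0],
              [29.0, np.nan],
              [30.0, 237.0]])

x_min_max = np.vstack((np.nanmin(x, axis=0), np.nanmax(x, axis=0)))
print(x_min_max)               # [[ 28. 132.]
                               #  [ 30. 237.]]
print(np.nanstd(x, axis=0))    # population std (ddof=0), nan ignored
```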


## Count unique values (a histogram)
Of course, we have to make a histogram :)  
First, take a view of the ages column. A view shares memory with `da`; it is not a copy.

In [38]:
da_ages = da[:,0]

Then, get the unique values and their counts

In [39]:
ages, age_cnt = np.unique(da_ages, return_counts=True)

print(ages)
print(age_cnt)


[28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45.
 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63.
 65. 66.]
[ 1  2  1  2  4  2  4  5  5  8  7 11  7 11  7 12  7  8 13 10 19 15 12  9
 17 12 25 15 10  5  9  8  2  2  2  1  2  1]
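`np.unique` counts each distinct age. To group ages into bins instead, `np.histogram()` returns the count per bin together with the bin edges. A sketch with hypothetical ages and 10-year bins (with the file loaded, pass `da_ages` instead):

```python
import numpy as np

ages_sample = np.array([28, 29, 29, 35, 41, 44, 47, 52, 54, 54, 61, 66])  # hypothetical

# Bins are [20, 30), [30, 40), [40, 50), [50, 60), [60, 70]
counts, edges = np.histogram(ages_sample, bins=[20, 30, 40, 50, 60, 70])
print(counts)   # [3 1 3 3 2]
print(edges)
```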


How about stacking these as column vectors side by side?  
The trick is to turn each 1-D array into a 2-D row vector with `np.newaxis`, then stack their transposes (column vectors) horizontally.

In [40]:
ages = ages[np.newaxis]
print(ages.T)

[[28.]
 [29.]
 [30.]
 [31.]
 [32.]
 [33.]
 [34.]
 [35.]
 [36.]
 [37.]
 [38.]
 [39.]
 [40.]
 [41.]
 [42.]
 [43.]
 [44.]
 [45.]
 [46.]
 [47.]
 [48.]
 [49.]
 [50.]
 [51.]
 [52.]
 [53.]
 [54.]
 [55.]
 [56.]
 [57.]
 [58.]
 [59.]
 [60.]
 [61.]
 [62.]
 [63.]
 [65.]
 [66.]]


In [41]:
age_cnt = age_cnt[np.newaxis]

In [42]:
print(np.hstack((age_cnt.T, ages.T)))

[[ 1. 28.]
 [ 2. 29.]
 [ 1. 30.]
 [ 2. 31.]
 [ 4. 32.]
 [ 2. 33.]
 [ 4. 34.]
 [ 5. 35.]
 [ 5. 36.]
 [ 8. 37.]
 [ 7. 38.]
 [11. 39.]
 [ 7. 40.]
 [11. 41.]
 [ 7. 42.]
 [12. 43.]
 [ 7. 44.]
 [ 8. 45.]
 [13. 46.]
 [10. 47.]
 [19. 48.]
 [15. 49.]
 [12. 50.]
 [ 9. 51.]
 [17. 52.]
 [12. 53.]
 [25. 54.]
 [15. 55.]
 [10. 56.]
 [ 5. 57.]
 [ 9. 58.]
 [ 8. 59.]
 [ 2. 60.]
 [ 2. 61.]
 [ 2. 62.]
 [ 1. 63.]
 [ 2. 65.]
 [ 1. 66.]]


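An alternative is `np.stack()` with `axis=-1`, which joins 1-D arrays along a new last axis, so no reshaping is needed. Because `ages` and `age_cnt` were reshaped above, recompute them first. In both versions the integer counts are upcast to float to match `ages`.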

In [35]:
ages, age_cnt = np.unique(da_ages, return_counts=True)
np.stack((age_cnt, ages), axis=-1)

array([[ 1., 28.],
       [ 2., 29.],
       [ 1., 30.],
       [ 2., 31.],
       [ 4., 32.],
       [ 2., 33.],
       [ 4., 34.],
       [ 5., 35.],
       [ 5., 36.],
       [ 8., 37.],
       [ 7., 38.],
       [11., 39.],
       [ 7., 40.],
       [11., 41.],
       [ 7., 42.],
       [12., 43.],
       [ 7., 44.],
       [ 8., 45.],
       [13., 46.],
       [10., 47.],
       [19., 48.],
       [15., 49.],
       [12., 50.],
       [ 9., 51.],
       [17., 52.],
       [12., 53.],
       [25., 54.],
       [15., 55.],
       [10., 56.],
       [ 5., 57.],
       [ 9., 58.],
       [ 8., 59.],
       [ 2., 60.],
       [ 2., 61.],
       [ 2., 62.],
       [ 1., 63.],
       [ 2., 65.],
       [ 1., 66.]])
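For 1-D inputs, `np.column_stack()` is one more option: it treats each 1-D array as a column and gives the same result as `np.stack(..., axis=-1)`. A sketch with hypothetical stand-in values:

```python
import numpy as np

ages = np.array([28., 29., 30.])   # hypothetical unique ages
age_cnt = np.array([1, 2, 1])      # hypothetical counts

table = np.column_stack((age_cnt, ages))   # shape (3, 2), counts upcast to float
print(np.array_equal(table, np.stack((age_cnt, ages), axis=-1)))  # True
print(table)
```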