# Introduction to NumPy

An 'ndarray' object, usually just called an 'array', stores multiple items of the same data type.
The facilities built around the array object are what make NumPy so convenient for math and data manipulation.

# How to create a numpy array?

## Create a 1d array from a list

In [1]:
import numpy as np

In [2]:
a = [0, 1, 2, 3, 4]

In [3]:
aa = np.array(a)

## Print the array and its type

In [4]:
print(type(aa))

<class 'numpy.ndarray'>


In [5]:
aa

array([0, 1, 2, 3, 4])

## Add 2 to each element of aa

In [6]:
aa + 2

array([2, 3, 4, 5, 6])

## Create a 2d array from a list of lists

In [7]:
b = [[0, 1, 2], [4, 5, 6], [7, 8, 9]]

In [8]:
bb = np.array(b)

In [9]:
bb

array([[0, 1, 2],
       [4, 5, 6],
       [7, 8, 9]])

You may also specify the datatype by setting the dtype argument. 
Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str' and 'object'.

## Create a float 2d array

In [10]:
bb_f = np.array(b, dtype="float")

In [11]:
bb_f

array([[0., 1., 2.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [12]:
bb_i = np.array(b, dtype="int")

In [13]:
bb_i

array([[0, 1, 2],
       [4, 5, 6],
       [7, 8, 9]])

## Convert to "int" datatype

In [14]:
bb_f.astype("int")

array([[0, 1, 2],
       [4, 5, 6],
       [7, 8, 9]])

## Convert to int then to str datatype

In [15]:
bb_f.astype("int").astype("str")

array([['0', '1', '2'],
       ['4', '5', '6'],
       ['7', '8', '9']], dtype='<U11')
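
One detail worth checking when converting floats to integers: astype drops the fractional part (truncation toward zero) rather than rounding. A minimal sketch, where the array vals is made up for illustration and np.rint is used as one possible way to round first:

```python
import numpy as np

# Hypothetical values chosen to show truncation vs. rounding
vals = np.array([1.7, -1.7, 2.5])

truncated = vals.astype("int")          # drops the fractional part
rounded = np.rint(vals).astype("int")   # rounds to nearest (ties go to even) first

print(truncated)
print(rounded)
```

If the fractional part matters, rounding explicitly before the conversion is usually the safer choice.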

If you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the dtype as 'object'.

## Create a boolean array

In [16]:
bb_b = np.array([1, 0, 10], dtype="bool")

In [19]:
bb_b

array([ True, False,  True])

## Create an object array to hold numbers as well as strings

In [20]:
bb_obj = np.array([1, "b"], dtype="object")

In [21]:
bb_obj

array([1, 'b'], dtype=object)

## Convert an array back to a list

In [22]:
bb_obj.tolist()

[1, 'b']

In [23]:
bb_b.tolist()

[True, False, True]

To summarise, the main differences with python lists are:

1. Arrays support vectorised operations, while lists don’t.
2. Once an array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
3. Every array has one and only one dtype. All items in it should be of that dtype.
4. An equivalent numpy array occupies much less space than a python list of lists.
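
The four points above can be checked with a short sketch. The data (values) is invented for illustration, and the memory figure for the list is only a rough estimate that adds the size of the list to the sizes of the int objects it refers to:

```python
import sys
import numpy as np

values = list(range(1000))   # illustrative data, not from the text above
arr = np.array(values)

# 1. Arithmetic is element-wise on an array; on a list, * means repetition
doubled = arr * 2
repeated = values * 2
print(doubled[:3], len(repeated))

# 2. np.append returns a new array; the original keeps its size
bigger = np.append(arr, 1000)
print(arr.size, bigger.size)

# 3. Mixed input is coerced to one common dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)

# 4. Rough memory estimate: array data buffer vs. list plus its int objects
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(arr.nbytes, list_bytes)
```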

# How to inspect the size and shape of a numpy array?

For any array, it helps to know:

- Whether it is a 1D, 2D or higher-dimensional array (ndim)
- How many items are present in each dimension (shape)
- What its datatype is (dtype)
- What the total number of items in it is (size)
- What the first few items look like (through indexing)

## Create a 2d array with 3 rows and 4 columns

In [24]:
c = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

In [25]:
cc = np.array(c, dtype="float")

In [26]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])

In [27]:
print("Shape: ", cc.shape)

Shape:  (3, 4)


In [28]:
print("Datatype: ", cc.dtype)

Datatype:  float64


In [29]:
print("Size: ", cc.size)

Size:  12


In [30]:
print("Num Dimensions: ", cc.ndim)

Num Dimensions:  2


# How to extract specific items from an array?

In [31]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])

## Extract the first 2 rows and columns

In [32]:
cc[:2, :2]

array([[1., 2.],
       [5., 6.]])

In [33]:
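# Rows from index 2 onward and columns from index 2 onward: only the last row qualifies, so the result has shape (1, 2)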
cc[2:, 2:]

array([[11., 12.]])

## Get the boolean output by applying the condition to each element

In [34]:
d = cc > 4

In [35]:
d

array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [36]:
cc[d]

array([ 5.,  6.,  7.,  8.,  9., 10., 11., 12.])
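
Conditions can also be combined before indexing. A small sketch, assuming the same starting values of cc as above; note that each comparison needs its own parentheses, because & and | bind more tightly than > or <:

```python
import numpy as np

cc = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype="float")

# Element-wise AND: values strictly between 4 and 10
between = cc[(cc > 4) & (cc < 10)]

# Element-wise OR: values at either end of the range
outside = cc[(cc <= 2) | (cc >= 11)]

print(between)
print(outside)
```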

# How to reverse the rows and the whole array?

## Reverse only the row positions

In [38]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])

In [37]:
cc[::-1, :]

array([[ 9., 10., 11., 12.],
       [ 5.,  6.,  7.,  8.],
       [ 1.,  2.,  3.,  4.]])

## Reverse the row and column positions

In [39]:
cc[::-1, ::-1]

array([[12., 11., 10.,  9.],
       [ 8.,  7.,  6.,  5.],
       [ 4.,  3.,  2.,  1.]])

# How to represent missing values and infinity?

Missing values can be represented with the np.nan object, while np.inf represents infinity.

Let's place some in cc.

In [40]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])

In [41]:
cc[1,1] = np.nan 

In [42]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5., nan,  7.,  8.],
       [ 9., 10., 11., 12.]])

In [43]:
cc[1,2] = np.inf

In [44]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5., nan, inf,  8.],
       [ 9., 10., 11., 12.]])

## Replace nan and inf with -1. Don't use cc == np.nan

In [52]:
# Combine both masks; assigning them one after the other would keep only the second
Missing_Bool = np.isnan(cc) | np.isinf(cc)

In [53]:
cc[Missing_Bool] = -1

In [54]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5., -1., -1.,  8.],
       [ 9., 10., 11., 12.]])
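
The heading above warns against cc == np.nan. The reason is that NaN never compares equal to anything, itself included, so an equality mask is always empty. A small sketch with a made-up array x; np.isfinite is used here as one way to catch nan and both infinities in a single mask:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf])   # illustrative values

print(np.nan == np.nan)   # NaN is not equal to itself
print(x == np.nan)        # so this mask selects nothing

# np.isfinite is False for nan, inf and -inf alike
bad = ~np.isfinite(x)
cleaned = np.where(bad, -1, x)   # returns a new array; x is left unchanged
print(cleaned)
```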

# How to compute mean, min, max on the ndarray?

In [55]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5., -1., -1.,  8.],
       [ 9., 10., 11., 12.]])

## Mean, max and min

In [56]:
print("Mean Value is: ", cc.mean())

Mean Value is:  5.25


In [57]:
print("Max Value is: ", cc.max())

Max Value is:  12.0


In [58]:
print("Min Value is: ", cc.min())

Min Value is:  -1.0


If you want to compute the minimum values row wise or column wise, use np.amin with the axis argument.

## Row wise and column wise min

In [59]:
print("Column wise minimum: ", np.amin(cc, axis=0))

Column wise minimum:  [ 1. -1. -1.  4.]


In [60]:
print("Row wise minimum: ", np.amin(cc, axis=1))

Row wise minimum:  [ 1. -1.  9.]
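
The axis argument names the dimension that gets collapsed, which is easy to mix up. The sketch below, assuming the same cc values as above, uses the array methods, which accept the same axis argument:

```python
import numpy as np

cc = np.array([[1, 2, 3, 4], [5, -1, -1, 8], [9, 10, 11, 12]], dtype="float")

# axis=0 collapses the rows, leaving one value per column
col_min = cc.min(axis=0)

# axis=1 collapses the columns, leaving one value per row
row_mean = cc.mean(axis=1)

print(col_min)
print(row_mean)
```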


## Cumulative Sum

In [71]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5., -1., -1.,  8.],
       [ 9., 10., 11., 12.]])

In [61]:
np.cumsum(cc)  # running total over the flattened array: [1, 1+2, 3+3, 6+4, 10+5, ...]

array([ 1.,  3.,  6., 10., 15., 14., 13., 21., 30., 40., 51., 63.])
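
Without an axis, np.cumsum works on the flattened array, as shown above. Passing axis keeps the 2d shape instead. A brief sketch using the same cc values:

```python
import numpy as np

cc = np.array([[1, 2, 3, 4], [5, -1, -1, 8], [9, 10, 11, 12]], dtype="float")

down_columns = np.cumsum(cc, axis=0)   # running total down each column
along_rows = np.cumsum(cc, axis=1)     # running total along each row

print(down_columns)
print(along_rows)
```

The last row of down_columns holds the column totals, and the last column of along_rows holds the row totals.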

# How to create a new array from an existing array?

If you assign a slice of an array to a new variable, the new array is a view that refers to the parent array's memory, so changes to it show up in the parent.

## Assign portion of cc to cc_a. Doesn't really create a new array.

In [72]:
cc

array([[ 1.,  2.,  3.,  4.],
       [ 5., -1., -1.,  8.],
       [ 9., 10., 11., 12.]])

In [73]:
cc_a = cc[:2, :2]

In [74]:
cc_a

array([[ 1.,  2.],
       [ 5., -1.]])

In [75]:
cc_a[:1, :1] = 100

In [76]:
cc_a

array([[100.,   2.],
       [  5.,  -1.]])

In [85]:
cc  ## the change is reflected in the parent array

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])

## Copy portion of cc to cc_b

In [79]:
cc

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])

In [80]:
cc_b = cc[:2, :2].copy()

In [81]:
cc_b

array([[100.,   2.],
       [  5.,  -1.]])

In [82]:
cc_b[:1, :1] = 102

In [83]:
cc_b

array([[102.,   2.],
       [  5.,  -1.]])

In [84]:
cc  ## 102 will not reflect in parent array

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])
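
When it is unclear whether two arrays share data, it can be checked directly instead of modifying one and looking at the other. A minimal sketch, assuming a fresh cc built from a list so that it owns its data:

```python
import numpy as np

cc = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype="float")

view = cc[:2, :2]            # plain slicing: refers to cc's memory
copied = cc[:2, :2].copy()   # explicit copy: independent memory

print(np.shares_memory(cc, view))
print(np.shares_memory(cc, copied))

# The base attribute points at the array that owns the data (None for an owner)
print(view.base is cc)
print(copied.base is None)
```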

# Reshaping and Flattening Multidimensional arrays

## Reshape a 3x4 array to 4x3 array

In [86]:
cc

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])

In [87]:
cc.reshape(4,3)

array([[100.,   2.,   3.],
       [  4.,   5.,  -1.],
       [ -1.,   8.,   9.],
       [ 10.,  11.,  12.]])

In [88]:
cc

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])
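
As the output above shows, reshape leaves cc itself unchanged. Two further details, sketched here with illustrative values: one dimension can be given as -1 so that NumPy infers it, and the total number of items must stay the same.

```python
import numpy as np

cc = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype="float")

# -1 lets NumPy infer that dimension from the total size (12 items here)
six_by_two = cc.reshape(-1, 2)
print(six_by_two.shape)

# The element count must match, otherwise reshape raises ValueError
try:
    cc.reshape(5, 3)
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```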

The difference between ravel and flatten is that flatten always returns a copy, while the array returned by ravel is, whenever possible, a view that refers to the parent array.

So any change to the raveled array affects the parent as well. In return, ravel is memory efficient since it does not create a copy.

## Flatten it to a 1d array

In [89]:
cc.flatten()

array([100.,   2.,   3.,   4.,   5.,  -1.,  -1.,   8.,   9.,  10.,  11.,
        12.])

In [91]:
cc

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])

## Changing the flatten array does not change parent array

In [92]:
d = cc.flatten()

In [98]:
d[0] = 111  # does not affect cc parent array

In [96]:
d

array([111.,   2.,   3.,   4.,   5.,  -1.,  -1.,   8.,   9.,  10.,  11.,
        12.])

In [97]:
cc

array([[100.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])

## Changing the raveled array changes the parent also

In [99]:
e = cc.ravel()

In [103]:
e[0] = 101  # also changes the parent array cc

In [104]:
e

array([101.,   2.,   3.,   4.,   5.,  -1.,  -1.,   8.,   9.,  10.,  11.,
        12.])

In [105]:
cc

array([[101.,   2.,   3.,   4.],
       [  5.,  -1.,  -1.,   8.],
       [  9.,  10.,  11.,  12.]])
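
One caveat: ravel returns a view only when the memory layout allows it. For a non-contiguous input, such as a transposed array, it has to copy. A small sketch with illustrative values, using np.shares_memory to tell the two cases apart:

```python
import numpy as np

cc = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype="float")

flat_view = cc.ravel()     # cc is C-contiguous, so this is a view
flat_copy = cc.T.ravel()   # the transpose is not, so ravel must copy

print(np.shares_memory(cc, flat_view))
print(np.shares_memory(cc, flat_copy))
```

So code that relies on ravel writing through to the parent is worth guarding with such a check.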

# How to create sequences, repetitions and random number using numpy?

## Lower limit is 0 by default

In [106]:
print(np.arange(10))

[0 1 2 3 4 5 6 7 8 9]


In [107]:
# 0 to 9
print(np.arange(0,10))

[0 1 2 3 4 5 6 7 8 9]


In [108]:
# 0 to 9 with step of 2
print(np.arange(0, 10, 2))

[0 2 4 6 8]


In [109]:
# 10 to 1, decreasing order
print(np.arange(10, 0, -1))

[10  9  8  7  6  5  4  3  2  1]


In [110]:
print(np.arange(10, 0, -2))

[10  8  6  4  2]


## Create an array of evenly spaced numbers

In [111]:
# start at 1 and end at 50
np.linspace(1,50,10,dtype=int)

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])

In [112]:
np.linspace(start=1, stop=50, num=10, dtype=int)

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])

In [115]:
np.linspace(0,2,9)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

Similar to np.linspace, there is also np.logspace, which spaces the values evenly on a logarithmic scale.

In np.logspace, the sequence actually starts at base^start and ends at base^stop, with a default base of 10.

## Limit the number of digits after the decimal to 2

In [116]:
np.set_printoptions(precision = 2)

In [117]:
# Start at 10^1 and end at 10^50. Pass base by keyword: the fourth positional argument is endpoint, not base
np.logspace(1, 50, 10, base=10)

array([1.00e+01, 2.78e+06, 7.74e+11, 2.15e+17, 5.99e+22, 1.67e+28,
       4.64e+33, 1.29e+39, 3.59e+44, 1.00e+50])

In [118]:
np.logspace(start = 1, stop = 50, num = 10, base = 10)

array([1.00e+01, 2.78e+06, 7.74e+11, 2.15e+17, 5.99e+22, 1.67e+28,
       4.64e+33, 1.29e+39, 3.59e+44, 1.00e+50])
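
The relationship between the two functions can be written out by hand: np.logspace gives the same values as raising the base to the output of np.linspace. A short sketch with small, made-up limits so the numbers are easy to read:

```python
import numpy as np

exponents = np.linspace(1, 5, 5)   # [1, 2, 3, 4, 5]
by_hand = 10.0 ** exponents
built_in = np.logspace(1, 5, num=5, base=10)

print(built_in)
print(np.allclose(by_hand, built_in))

# A different base, passed by keyword
powers_of_two = np.logspace(0, 4, num=5, base=2)
print(powers_of_two)
```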

The np.zeros and np.ones functions let you create arrays of a desired shape where all the items are either 0 or 1.

In [119]:
np.zeros([2,2])

array([[0., 0.],
       [0., 0.]])

In [120]:
np.ones([2,2])

array([[1., 1.],
       [1., 1.]])

# How to create repeating sequences?

np.tile repeats a whole list or array n times, whereas np.repeat repeats each item n times.

In [121]:
f = [1, 2, 3]

## Repeat whole of "f" two times

In [123]:
print("Tile: ", np.tile(f,2))

Tile:  [1 2 3 1 2 3]


## Repeat each element of "f" two times

In [126]:
print("Repeat: ", np.repeat(f, 2))

Repeat:  [1 1 2 2 3 3]
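
The same distinction carries over to 2d input, where the direction of repetition has to be stated. A sketch with a made-up 2x2 array m:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])   # illustrative 2d input

tiled = np.tile(m, (2, 1))            # stack the whole block twice, vertically
rep_rows = np.repeat(m, 2, axis=0)    # duplicate each row in place
rep_flat = np.repeat(m, 2)            # no axis given: the input is flattened first

print(tiled)
print(rep_rows)
print(rep_flat)
```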


# How to generate random numbers?

The random module provides nice functions to generate random numbers 
(and also statistical distributions) of any given shape.

## Random numbers in [0, 1) of shape 2,2

In [132]:
print(np.random.rand(2,2))

[[0.76 0.67]
 [0.12 0.1 ]]


## Normal distribution with mean = 0 and variance = 1 of shape 2,2

In [130]:
print(np.random.randn(2,2))

[[-1.63 -0.41]
 [ 0.13 -0.62]]


## Random integers in [0, 10) of shape 2,2

In [135]:
print(np.random.randint(0, 10, size=(2,2)))

[[1 1]
 [1 2]]


In [136]:
print(np.random.randint(0, 10, size=[2, 2]))

[[6 4]
 [0 6]]


## One random number in [0, 1)

In [138]:
print(np.random.random())

0.5252925596298051


## Random numbers in [0, 1) of shape 2,2

In [140]:
print(np.random.random(size=[2,2]))

[[0.11 0.69]
 [0.21 0.34]]


## Pick 10 items from a given list, with equal probability

In [141]:
print(np.random.choice(["a", "e", "i", "o", "u"], size=10))

['i' 'e' 'e' 'a' 'u' 'o' 'e' 'u' 'u' 'a']


## Pick 10 items from a given list with a predefined probability "p"

In [142]:
print(np.random.choice(["a", "e", "i", "o", "u"], size=10, p=[0.3, 0.1, 0.1, 0.4, 0.1]))
# the predefined probabilities in p must sum to 1

['o' 'o' 'a' 'o' 'e' 'o' 'u' 'a' 'o' 'a']


If you want to repeat the same set of random numbers every time, you need to set the seed or the random state. 

The seed can be any value.

The only requirement is you must set the seed to the same value every time you want to generate the same set of random numbers.

## Create the random state

In [159]:
g = np.random.RandomState(100)

## Create random numbers in [0, 1) of shape 2, 2

In [160]:
print(g.rand(2,2))

[[0.54 0.28]
 [0.42 0.84]]


In [152]:
print(g)

<mtrand.RandomState object at 0x00000000058EC900>


## Set the random seed

In [161]:
np.random.seed(100)

## Create random numbers in [0, 1) of shape 2, 2

In [162]:
print(np.random.rand(2,2))

[[0.54 0.28]
 [0.42 0.84]]


# How to get the unique items and the counts?

The np.unique function can be used to get the unique items.

If you want the repetition counts of each item, set the return_counts parameter to True.

## Create random integers of size 10 in [0, 10)

In [184]:
np.random.seed(100)

In [185]:
h = np.random.randint(0, 10, size= 10)

In [186]:
print(h)

[8 8 3 7 7 0 4 2 5 2]


## Get the unique items and their counts

In [187]:
uniqs, counts = np.unique(h, return_counts= True)

In [188]:
print("Unique items: ", uniqs)

Unique items:  [0 2 3 4 5 7 8]


In [189]:
print("Counts      : ", counts)

Counts      :  [1 2 1 1 1 2 2]
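
The two returned arrays line up position by position, so they can be paired into a lookup table. A small sketch, with h typed in by hand using the values printed above; the names freq and most_common are illustrative:

```python
import numpy as np

h = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])   # same values as printed above

uniqs, counts = np.unique(h, return_counts=True)

# Pair each unique value with its count
freq = {int(u): int(c) for u, c in zip(uniqs, counts)}
print(freq)

# argmax returns the first maximum, so ties resolve to the smallest value
most_common = uniqs[np.argmax(counts)]
print(most_common)
```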
