# Introduction to Numpy

#### Credit: https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/

## What does numpy provide?

At the core, numpy provides the excellent ndarray objects, short for n-dimensional arrays.

It is the facilities around the array object that make numpy so convenient for performing math and data manipulations.

## How to create a numpy array?

There are multiple ways to create a numpy array. One of the most common is to create one from a list, or a list-like object, by passing it to the np.array function.

In [2]:
import numpy as np  # import the numpy library

In [3]:
# create a 1d array from a list
list1 = [0,1,2,3,4]
arr1d = np.array(list1)

# print the array's type, then display the array
print(type(arr1d))
arr1d

<class 'numpy.ndarray'>


array([0, 1, 2, 3, 4])

The key difference between an array and a list is that arrays are designed to handle vectorized operations, while a Python list is not.
That means that if you apply a function or an arithmetic operation to an array, it is performed on every item in the array rather than on the array object as a whole.

In [4]:
# add 2 to every item in the array
arr1d + 2

array([2, 3, 4, 5, 6])
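
For comparison, here is a minimal sketch of the same operation on a plain Python list and on an array. The variable names `values`, `list_result` and `array_result` are only illustrative.

```python
import numpy as np

values = [0, 1, 2, 3, 4]

# With a plain list, adding 2 to every item needs an explicit loop or
# comprehension; `values + 2` would raise a TypeError.
list_result = [v + 2 for v in values]

# With an ndarray, the addition is applied to every item at once.
array_result = np.array(values) + 2

print(list_result)   # [2, 3, 4, 5, 6]
print(array_result)  # [2 3 4 5 6]
```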

Another characteristic is that, once a numpy array is created, you cannot increase its size. To do so, you will have to create a new array.
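
As a small illustration of the fixed size, the sketch below uses np.append, which returns a new, larger array instead of growing the original in place. The names `original` and `extended` are made up for this example.

```python
import numpy as np

original = np.array([0, 1, 2, 3, 4])

# np.append does not modify `original`; it builds and returns a new array
extended = np.append(original, [5, 6])

print(original.size)  # 5
print(extended.size)  # 7
print(np.shares_memory(original, extended))  # False
```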

In [5]:
# create a 2d array from a list of lists
list2 = [[0,1,2], [3,4,5], [6,7,8]]
arr2d = np.array(list2)
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

You may also specify the datatype by setting the dtype argument. Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str' and 'object'.

To control the memory allocations you may choose to use one of ‘float32’, ‘float64’, ‘int8’, ‘int16’ or ‘int32’.
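
To see what the choice of dtype means for memory, the following sketch compares the nbytes attribute of the same data stored with three different dtypes. The variable names are illustrative only.

```python
import numpy as np

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

arr_f64 = np.array(data, dtype='float64')
arr_f32 = np.array(data, dtype='float32')
arr_i8 = np.array(data, dtype='int8')

# nbytes = number of items * bytes per item
print(arr_f64.nbytes)  # 9 items * 8 bytes = 72
print(arr_f32.nbytes)  # 9 items * 4 bytes = 36
print(arr_i8.nbytes)   # 9 items * 1 byte  = 9
```

Smaller dtypes save memory, but they also narrow the range of values that can be stored. For example, int8 only holds -128 to 127.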


In [6]:
# create a float 2d array
arr2d_f = np.array(list2, 'float')
arr2d_f

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

You can also convert it to a different datatype using the astype method.

In [7]:
# astype returns a new array; arr2d_f itself is left unchanged
arr2d_f.astype('int')

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
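
Note that astype does not convert in place. The sketch below, using the illustrative names `arr_float` and `arr_int`, stores the converted result in a new variable and checks that the original keeps its dtype.

```python
import numpy as np

arr_float = np.array([[0, 1, 2], [3, 4, 5]], dtype='float')

# astype returns a converted copy, so keep the result in a variable
arr_int = arr_float.astype('int')

print(arr_float.dtype)  # float64, the original is untouched
print(arr_int.dtype)    # an integer dtype; the exact width depends on the platform
```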

Unlike a list, a numpy array requires all of its items to be of the same data type.

However, if you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the dtype as 'object'.

In [8]:
# create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')
arr1d_obj

array([1, 'a'], dtype=object)

You can always convert an array back to a Python list using tolist().

In [9]:
# convert an array back to a list
arr1d_obj.tolist()

[1, 'a']

##  Inspecting the size and shape of a numpy array

Let’s suppose you were handed a numpy vector that you didn’t create yourself. 
The things you would want to know about the array are:

Whether it is a 1D, a 2D or a higher-dimensional array (ndim)

How many items are present in each dimension (shape)

What is its datatype (dtype)

What is the total number of items in it (size)

Samples of first few items in the array (through indexing)

In [10]:
# create a 2d array with 3 rows and 4 columns 
list2 = [[1,2,3,4], [3,4,5,6,], [5,6,7,8]]
arr2 = np.array(list2, dtype='float')
arr2

array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])

In [11]:
# shape
print('Shape: ', arr2.shape)

# dimension 
print('Dimension: ', arr2.ndim)

# dtype
print('Datatype: ', arr2.dtype)

# size 
print('Size: ', arr2.size)

Shape:  (3, 4)
Dimension:  2
Datatype:  float64
Size:  12
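
The last item in the list above, sampling the first few items, can be done with plain indexing. This is a minimal sketch on an array with the same values as arr2, here called `arr` for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

print(arr[0])            # first row: [1. 2. 3. 4.]
print(arr[:2, 0])        # first column of the first two rows: [1. 3.]
print(arr.ravel()[:5])   # first five items in row-major order: [1. 2. 3. 4. 3.]
```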


## How to extract specific items from an array?

In [12]:
# extract the first 2 rows and columns
arr2[:2, :2]

array([[1., 2.],
       [3., 4.]])

Additionally, numpy arrays support boolean indexing.

In [13]:
# get the boolean output by applying the condition to each element
b = arr2 > 4
b

array([[False, False, False, False],
       [False, False,  True,  True],
       [ True,  True,  True,  True]])

In [14]:
arr2[arr2>4]

array([5., 6., 5., 6., 7., 8.])
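
Boolean conditions can also be combined. The sketch below is one way to do it, using `&` for "and", plus np.where for cases where the original shape should be kept. The names `arr`, `mask` and `replaced` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# combine conditions with & (and) or | (or); the parentheses are required
mask = (arr > 4) & (arr < 8)
print(arr[mask])  # [5. 6. 5. 6. 7.]

# np.where keeps the 2d shape: items > 4 become 0, the rest are kept
replaced = np.where(arr > 4, 0, arr)
print(replaced)
```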

### Reverse the rows and the whole array

In [15]:
# reverse only the row positions
arr2[::-1, ]

array([[5., 6., 7., 8.],
       [3., 4., 5., 6.],
       [1., 2., 3., 4.]])

In [16]:
# reverse the row and the column positions
arr2[::-1, ::-1]

array([[8., 7., 6., 5.],
       [6., 5., 4., 3.],
       [4., 3., 2., 1.]])

### How to represent missing values and infinity?

In [17]:
# insert a nan and an inf
arr2[1,1] = np.nan
arr2[1,2] = np.inf
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., nan, inf,  6.],
       [ 5.,  6.,  7.,  8.]])

In [18]:
# replace nan and inf with -1 
missing_bool = np.isnan(arr2) | np.isinf(arr2)
arr2[missing_bool] = -1
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., -1., -1.,  6.],
       [ 5.,  6.,  7.,  8.]])
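
An alternative sketch of the same cleanup uses np.isfinite, which is False for nan, inf and -inf alike. The array `arr` and the name `cleaned` are made up for this example.

```python
import numpy as np

arr = np.array([1.0, np.nan, np.inf, -np.inf, 4.0])

# work on a copy so the original values stay available
cleaned = arr.copy()
cleaned[~np.isfinite(cleaned)] = -1
print(cleaned)  # [ 1. -1. -1. -1.  4.]

# nan never compares equal to itself, so test with np.isnan rather than ==
print(np.nan == np.nan)  # False
```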

### How to compute mean, min and max on the ndarray?

In [20]:
# mean, min and max
print('Mean: ', arr2.mean())
print('Min: ', arr2.min())
print('Max: ', arr2.max())

Mean:  3.5833333333333335
Min:  -1.0
Max:  8.0


However, if you want to compute the minimum values row wise or column wise, pass the axis argument, for example to np.amin.

In [21]:
# row wise and column wise 
print('Column wise minimum: ', np.amin(arr2, axis=0))
print('Row wise minimum: ', np.amin(arr2, axis=1))

Column wise minimum:  [ 1. -1. -1.  4.]
Row wise minimum:  [ 1. -1.  5.]


In [22]:
# cumulative sum
np.cumsum(arr2)

array([ 1.,  3.,  6., 10., 13., 12., 11., 17., 22., 28., 35., 43.])
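
The axis argument is also accepted by the array methods and by np.cumsum. Here is a short sketch on an array with the same values as arr2 at this point, named `arr` for illustration.

```python
import numpy as np

arr = np.array([[1., 2., 3., 4.],
                [3., -1., -1., 6.],
                [5., 6., 7., 8.]])

# axis=0 collapses the rows, giving one result per column
print(arr.min(axis=0))  # [ 1. -1. -1.  4.]

# axis=1 collapses the columns, giving one result per row
print(arr.max(axis=1))  # [4. 6. 8.]

# running total down each column instead of over the flattened array
col_cumsum = np.cumsum(arr, axis=0)
print(col_cumsum)
```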

## How to create a new array from an existing array?

If you just assign a portion of an array to a new variable, the new array is a view: it refers to the same data in memory as the parent array.

That means, if you make any changes to the new array, it will reflect in the parent array as well.

So to avoid disturbing the parent array, you need to make a copy of it using copy(). All numpy arrays come with the copy() method.

In [24]:
# assign portion of arr2 to arr2a, which does
# not create a new array
arr2a = arr2[:2,:2]
arr2a[:1, :1] = 100  # 100 will reflect in arr2
arr2

array([[100.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])

In [29]:
# copy portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101
print(arr2)
print(arr2b)

[[100.   2.   3.   4.]
 [  3.  -1.  -1.   6.]
 [  5.   6.   7.   8.]]
[[101.   2.]
 [  3.  -1.]]
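
If you are unsure whether two arrays refer to the same data, np.shares_memory can check it. This is a minimal sketch with the illustrative names `parent`, `view` and `independent`.

```python
import numpy as np

parent = np.array([[1., 2., 3.], [4., 5., 6.]])

view = parent[:2, :2]                # slicing gives a view on the parent
independent = parent[:2, :2].copy()  # explicit copy with its own data

print(np.shares_memory(parent, view))         # True
print(np.shares_memory(parent, independent))  # False

independent[0, 0] = 101
print(parent[0, 0])  # 1.0, the parent is unaffected by the copy
```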


## Reshaping and Flattening Multidimensional arrays

Reshaping is changing the arrangement of items so that the shape of the array changes while the total number of items stays the same.

Flattening, however, will convert a multi-dimensional array to a flat 1d array. 

In [31]:
# reshape a 3x4 array to a 4x3 array
arr2.reshape(4,3)

array([[100.,   2.,   3.],
       [  4.,   3.,  -1.],
       [ -1.,   6.,   5.],
       [  6.,   7.,   8.]])
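
A short sketch of two further reshape behaviours: passing -1 lets numpy infer one dimension, and a shape whose total size does not match raises an error. The array `arr` here is only an example.

```python
import numpy as np

arr = np.arange(12)  # 12 items, shape (12,)

# -1 asks numpy to infer that dimension from the total size
print(arr.reshape(3, -1).shape)    # (3, 4)

# any shape works as long as the total number of items matches
print(arr.reshape(2, 2, 3).shape)  # (2, 2, 3)

# a shape with a different total size raises a ValueError
try:
    arr.reshape(5, 3)
except ValueError as err:
    print('Cannot reshape:', err)
```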

There are 2 popular ways to implement flattening: the flatten() method and the ravel() method.

The difference between ravel and flatten is that flatten always returns a copy, whereas the array returned by ravel is, whenever possible, a reference to the parent array. So, any changes to the raveled array will affect the parent as well. In return, ravel is memory efficient since it does not create a copy.



In [32]:
# flatten 2d array to a 1d array
arr2.flatten()

array([100.,   2.,   3.,   4.,   3.,  -1.,  -1.,   6.,   5.,   6.,   7.,
         8.])

In [35]:
# changing the flattened array does not change parent
b1 = arr2.flatten()
b1[0] = 672  # changing b1 does not affect arr2
arr2

array([[100.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])

In [39]:
# changing the raveled array changes the parent also
b2 = arr2.ravel()
b2[0] = 672  # changing b2 changes arr2 also
arr2

array([[672.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])
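
The view behaviour of ravel can be checked with np.shares_memory. The sketch below also shows a case where ravel has to fall back to a copy, because the data is not laid out contiguously in row-major order. The array `arr` is only an example.

```python
import numpy as np

arr = np.array([[1., 2.], [3., 4.]])

print(np.shares_memory(arr, arr.flatten()))  # False: flatten always copies
print(np.shares_memory(arr, arr.ravel()))    # True: a view on contiguous data

# The transpose is not contiguous in row-major order,
# so ravel returns a copy in this case.
print(np.shares_memory(arr, arr.T.ravel()))  # False
```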

## How to create sequences, repetitions and random numbers?

The np.arange function comes in handy to create customised number sequences as ndarray.

In [41]:
print(np.arange(5))

# 0 to 9
print(np.arange(0, 10))

[0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9]


In [43]:
# 0 to 9 with step of 2 
print(np.arange(0, 10, 2))

[0 2 4 6 8]


In [44]:
# 10 to 0, decreasing order
print(np.arange(10, 0, -1))

[10  9  8  7  6  5  4  3  2  1]


Say you want to create an array of exactly 10 numbers between 1 and 50. Can you compute what the step value would be?

Well, I am going to use the np.linspace instead.

In [45]:
# start at  1 and end at 50
np.linspace(start=1, stop=50, num=10, dtype=int)

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])

Since I explicitly forced the dtype to be int, the numbers are not equally spaced because of the rounding.
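
If you do want to know the step value, np.linspace can report it through its retstep parameter. This is a minimal sketch without the int conversion, so the spacing stays exact; `values` and `step` are illustrative names.

```python
import numpy as np

# retstep=True returns the samples together with the spacing between them
values, step = np.linspace(start=1, stop=50, num=10, retstep=True)

# the step is (stop - start) / (num - 1) = 49 / 9
print(step)
print(values)
```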

Similar to np.linspace, there is also np.logspace, which produces values spaced evenly on a logarithmic scale. In np.logspace, the sequence starts at base^start and ends at base^stop, with a default base of 10.

In [46]:
# limit the number of digits after the decimal to 2
np.set_printoptions(precision=2)

# start at 10^1 and end at 10^50
np.logspace(start=1, stop=50, num=10, base=10)

array([1.00e+01, 2.78e+06, 7.74e+11, 2.15e+17, 5.99e+22, 1.67e+28,
       4.64e+33, 1.29e+39, 3.59e+44, 1.00e+50])
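
One way to read np.logspace is as the base raised to the values of np.linspace. The sketch below checks that relationship on a smaller range. The names `exponents`, `via_logspace` and `via_power` are made up for this example.

```python
import numpy as np

exponents = np.linspace(1, 5, num=5)     # 1, 2, 3, 4, 5
via_logspace = np.logspace(1, 5, num=5)  # base defaults to 10
via_power = 10.0 ** exponents

print(np.allclose(via_logspace, via_power))  # True
```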

The np.zeros and np.ones functions let you create arrays of a desired shape where all the items are either 0s or 1s.

In [49]:
# create 2x2 array where all items are zeros
np.zeros([2,2])

array([[0., 0.],
       [0., 0.]])

In [50]:
# create 2x2 array where all items are ones
np.ones([2,2])

array([[1., 1.],
       [1., 1.]])

In [52]:
# create 2x2 array where all items are a number other than 0 or 1
np.full([2,2], 7)

array([[7, 7],
       [7, 7]])

## How to create repeating sequences?

np.tile will repeat a whole list or array n times. Whereas, np.repeat repeats each item n times.

In [65]:
a = [1,2,3]

# repeat whole of 'a' two times
print('Tile: ', np.tile(a,2))

# repeat each element of 'a' two times
print('Repeat: ', np.repeat(a,2))

Tile:  [1 2 3 1 2 3]
Repeat:  [1 1 2 2 3 3]
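
Both functions also work on 2d input. As a small sketch, np.repeat takes an axis argument and np.tile takes a tuple of repetitions per dimension. The array `a2d` is only an example.

```python
import numpy as np

a2d = np.array([[1, 2], [3, 4]])

# repeat each row twice (axis=0)
repeated = np.repeat(a2d, 2, axis=0)
print(repeated)

# tile the whole array once vertically and twice horizontally
tiled = np.tile(a2d, (1, 2))
print(tiled)
```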


## How to generate random numbers?

In [66]:
# random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))

[[0.3  0.78]
 [0.27 0.81]]


In [67]:
# normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2,2))

[[-0.78  0.71]
 [-0.03  0.92]]


In [68]:
# random integers between [0,10) of shape 2,2 
print(np.random.randint(0, 10, size=[2,2]))

[[9 5]
 [2 0]]


In [69]:
# one random number between [0, 1)
print(np.random.random())

0.6492422514315721


In [70]:
# random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2,2]))

[[0.03 0.81]
 [0.86 0.15]]


In [71]:
# pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))

['o' 'o' 'i' 'i' 'a' 'i' 'i' 'o' 'e' 'u']


In [72]:
# pick 10 items from a given list with a predefined probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))

['i' 'o' 'o' 'e' 'o' 'o' 'a' 'o' 'a' 'o']


If you want to repeat the same set of random numbers every time, you need to set the seed or the random state. The seed can be any value. The only requirement is that you set the seed to the same value every time you want to generate the same set of random numbers.
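
A minimal sketch of that behaviour: resetting the seed to the same value reproduces the same draws. The second half uses np.random.default_rng, a separate Generator-based interface that is not used elsewhere in this tutorial and is shown only as an alternative.

```python
import numpy as np

np.random.seed(100)
first = np.random.randint(0, 10, size=5)

np.random.seed(100)  # reset to the same seed
second = np.random.randint(0, 10, size=5)

print(np.array_equal(first, second))  # True

# Alternative: independent Generator objects created with the same seed
rng_a = np.random.default_rng(100)
rng_b = np.random.default_rng(100)
draw_a = rng_a.integers(0, 10, size=5)
draw_b = rng_b.integers(0, 10, size=5)
print(np.array_equal(draw_a, draw_b))  # True
```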

### How to get the unique items and the counts?

The np.unique function can be used to get the unique items. If you want the repetition counts of each item, set the return_counts parameter to True.


In [74]:
# create random integers of size 10 between [0, 10)
np.random.seed(100)
arr_rand = np.random.randint(0, 10, size=10)
arr_rand

array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

In [78]:
# get the unique items and their counts
uniques, counts = np.unique(arr_rand, return_counts=True)
print("Unique items: ", uniques)
print("Counts:       ", counts)

Unique items:  [0 2 3 4 5 7 8]
Counts:        [1 2 1 1 1 2 2]
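
As a possible follow-up, the two returned arrays can be paired into a dictionary, or used to look up the most frequent item. The sketch below hard-codes the values of arr_rand shown above so that it runs on its own; `frequency` and `most_common` are illustrative names.

```python
import numpy as np

arr = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
uniques, counts = np.unique(arr, return_counts=True)

# pair each unique value with its count
frequency = dict(zip(uniques.tolist(), counts.tolist()))
print(frequency)  # {0: 1, 2: 2, 3: 1, 4: 1, 5: 1, 7: 2, 8: 2}

# np.argmax returns the first position of the largest count,
# so ties resolve to the smallest value
most_common = uniques[np.argmax(counts)]
print(most_common)  # 2
```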
