<h1 style="font-size: 20pt">Python Notebook | Data Science | NumPy</h1><br/>

<b> Author: </b> Tamoghna Saha<br/> 
<b> Created: </b> December 2018<br/>
<b> Updated: </b> October 2022<br/>

![Python](../img/Python-programming.jpg)

# Table of Contents:

NumPy
* ```ndarray``` object & manipulation
* Array creation & operations
* Indexing, slicing & advanced indexing
* Arithmetic, Mathematical & Statistical operations
* Iteration over array
* Broadcasting
* I/O operations

# NumPy

NumPy (a.k.a. `Num`erical `Py`thon) is an open source Python library that aids in mathematical, scientific, engineering, and data science programming. It is built around the N-dimensional array and also provides routines for linear algebra, random number generation, Fourier transforms, etc.

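NumPy is well suited to mathematical and statistical operations because it is fast and memory efficient: array elements share one data type and sit in a contiguous block of memory, and operations on them run in compiled code rather than in Python loops.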

## ```ndarray``` object & manipulations

The most important object defined in NumPy is an N-dimensional array type called __ndarray__. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index. Every element in an ndarray occupies the same amount of memory and is described by a data-type object (called __dtype__).
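As a small illustration of the single-dtype rule, the sketch below shows how NumPy settles on one dtype for all elements and how `astype` converts between dtypes. The variable names are illustrative and not part of the notebook.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype (exact width depends on the platform)
mixed = np.array([1, 2.5, 3])     # a single float forces every element to float64
small = ints.astype(np.int8)      # explicit conversion returns a new array

print(ints.dtype.kind, mixed.dtype, small.dtype)
print(mixed)                      # the integers are now stored as floats
```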

The ndarray object consists of a __contiguous one-dimensional__ segment of computer memory, combined with an indexing scheme that maps each item to a location in the memory block. The memory block holds the elements in either __row-major order (C style)__ or __column-major order (FORTRAN or MATLAB style)__.
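The two memory layouts can be inspected through the `flags` and `strides` attributes. The following is a minimal sketch; the `int64` dtype is chosen only so that the stride values are the same on every platform.

```python
import numpy as np

c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)  # row-major (C) layout by default
f_arr = np.asfortranarray(c_arr)                    # same values, column-major layout

print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])
print(c_arr.strides)   # bytes to step along each axis: (24, 8)
print(f_arr.strides)   # (8, 16)

# Reading the same array out in either order
c_flat = c_arr.ravel(order='C')   # [0 1 2 3 4 5]
f_flat = c_arr.ravel(order='F')   # [0 3 1 4 2 5]
print(c_flat, f_flat)
```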

The basic ndarray is created using an array function in NumPy as follows:

```python
numpy.array(object, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0)
```

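`NOTE`: In a 2-D NumPy array, axis = 0 runs down the rows (vertically) and axis = 1 runs across the columns (horizontally). A reduction along axis = 0 therefore collapses the rows and returns one value per column, while a reduction along axis = 1 returns one value per row.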
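A quick way to check what an `axis` argument does on a 2-D array is to reduce along each axis and compare the results. This is a small sketch with an illustrative array `m`:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3): 2 rows, 3 columns

col_sums = m.sum(axis=0)        # collapses axis 0 (the rows)    -> one value per column
row_sums = m.sum(axis=1)        # collapses axis 1 (the columns) -> one value per row

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
```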

Let's visualize how an ndarray is represented:
![ndarray](../img/numpy_array_viz.png)

In [1]:
import numpy as np

var_1 = np.array([1, 2, 3])
var_2 = np.array([[1, 2], [3, 4]], dtype=float)
var_3 = np.array([1, 2, 3, 4, 5], ndmin = 2)
print(var_1)
print(var_2)
print(var_3)

[1 2 3]
[[1. 2.]
 [3. 4.]]
[[1 2 3 4 5]]


In [2]:
## ndarray manipulations
var_1 = np.array([[1,2,3],[4,5,6]])
var_1_shape = var_1.shape
print("Shape: {}".format(var_1_shape))

# reshaping
var_2 = var_1.reshape(3,2)
print('\n'+"="*25+'\n')
print(var_2)

# itemsize
var_3 = np.array([1,2,3,4,5], dtype = np.int8)
var_4 = np.array([1,2,3,4,5], dtype = np.float64)
print('\n'+"="*25+'\n')
print(var_3.itemsize)
print(var_4.itemsize)

# transpose
var_5 = var_1.T
print('\n'+"="*25+'\n')
print(var_5)

Shape: (2, 3)


[[1 2]
 [3 4]
 [5 6]]


1
8


[[1 4]
 [2 5]
 [3 6]]


Some other ndarray manipulation operations are:
* `flatten` - Returns a copy of the array collapsed into one dimension
* `swapaxes` - Interchanges the two axes of an array
* `expand_dims` - Expands the shape of an array
* `concatenate` - Joins a sequence of arrays along an existing axis
* `hstack` / `vstack` - Stacks arrays in sequence horizontally (column wise) / vertically (row wise)
* `hsplit` / `vsplit` - Splits arrays into multiple sub-arrays horizontally / vertically
* `resize`, `append`, `insert`, `delete`, `unique`
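A few of the operations listed above can be tried on two small arrays. This is only a sketch; the arrays `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

flat = a.flatten()                         # [1 2 3 4], always a copy
swapped = np.swapaxes(a, 0, 1)             # same as a.T for a 2-D array
expanded = np.expand_dims(flat, axis=0)    # shape (4,) -> (1, 4)
side = np.hstack((a, b))                   # shape (2, 4)
stacked = np.vstack((a, b))                # shape (4, 2)
joined = np.concatenate((a, b), axis=1)    # identical to hstack for these inputs
left, right = np.hsplit(side, 2)           # two (2, 2) halves: a and b again
uniq = np.unique(np.array([3, 1, 3, 2, 1]))  # sorted unique values: [1 2 3]

print(side)
print(stacked)
```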

The difference between `reshape` and `resize` is that `reshape` changes the shape of the array __provided the total number of elements remains the same__, whereas `numpy.resize` accepts a new shape of any size and fills a larger result with __repeated copies of the elements__. (The in-place method `ndarray.resize` pads with zeros instead of repeating.)
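The contrast can be seen on a six-element array. The sketch below uses the function form `np.resize`, which repeats elements to fill the new shape.

```python
import numpy as np

a = np.arange(6)

reshaped = a.reshape(2, 3)        # 6 elements -> 2 x 3, allowed
try:
    a.reshape(4, 2)               # 8 slots for 6 elements is rejected
except ValueError as err:
    print("reshape failed:", err)

resized = np.resize(a, (3, 4))    # 12 slots: the elements are repeated
print(resized)
# [[0 1 2 3]
#  [4 5 0 1]
#  [2 3 4 5]]
```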

## Array creation & operations

A new ndarray object can be constructed by any of the following array creation routines:

In [3]:
var_1 = np.zeros([2,3], dtype = float)
var_2 = np.ones([2,3], dtype = complex)

print(var_1)
print('\n'+"="*25+'\n')
print(var_2)

[[0. 0. 0.]
 [0. 0. 0.]]


[[1.+0.j 1.+0.j 1.+0.j]
 [1.+0.j 1.+0.j 1.+0.j]]


An `ndarray` can be created from numerical ranges in several ways:
* `numpy.arange(start, stop, step, dtype)` - Similar to Python's built-in `range` function
* `numpy.linspace(start, stop, num, endpoint, retstep, dtype)` - Unlike arange, it takes the number of samples instead of the step size and generates that many evenly spaced numbers over the interval, where num = 50 by default
* `numpy.logspace(start, stop, num, endpoint, base, dtype)` - It generates numbers evenly spaced on a log scale, where base = 10 by default

An `ndarray` can also be created from existing iterable data using `numpy.asarray(a, dtype, order)`.
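One practical point about `asarray` is that it does not copy its input when the input is already an ndarray of the requested dtype, whereas `numpy.array` copies by default. A short sketch with illustrative names:

```python
import numpy as np

source = np.array([1, 2, 3, 4])

copied = np.array(source)       # copies by default
aliased = np.asarray(source)    # already an ndarray of the right dtype: no copy is made

source[0] = 99
print(copied[0], aliased[0])    # 1 99
print(aliased is source)        # True

from_tuple = np.asarray((1.5, 2.5), dtype=np.float32)  # other iterables are converted
print(from_tuple.dtype)
```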

In [4]:
print("arange\n")
var_1 = np.arange(9,30,3,dtype = float)
var_2 = np.arange(18)
var_2.ndim
var_3 = var_2.reshape(3,2,3)
print(var_1)
print(var_2)
print(var_3)
print('\n'+"="*25+'\n')

arange

[ 9. 12. 15. 18. 21. 24. 27.]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17]
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]]]




In [5]:
print("linspace\n")
var_1 = np.linspace(10,50,5)
var_2 = np.linspace(10,30,5,endpoint=False)
var_3 = np.linspace(1,4,5,retstep=True)
print(var_1)
print(var_2)
print(var_3)

linspace

[10. 20. 30. 40. 50.]
[10. 14. 18. 22. 26.]
(array([1.  , 1.75, 2.5 , 3.25, 4.  ]), 0.75)


In [6]:
print("logspace\n")
var_1 = np.logspace(1.0, 2.0, 10)
var_2 = np.logspace(1.0, 10.0, 10, base=2)
print(var_1)
print(var_2)
print('\n'+"="*25+'\n')

print("asarray\n")
obj_1 = [1,2,3,4]
var_1 = np.asarray(obj_1)
print(var_1)

logspace

[ 10.          12.91549665  16.68100537  21.5443469   27.82559402
  35.93813664  46.41588834  59.94842503  77.42636827 100.        ]
[   2.    4.    8.   16.   32.   64.  128.  256.  512. 1024.]


asarray

[1 2 3 4]


An `ndarray` can also be filled with random values using the __random__ module in NumPy. For more details, check the [NumPy doc](https://docs.scipy.org/doc/numpy/reference/routines.random.html). Some examples are provided below:

In [7]:
print("\nRandom Int")
print(np.random.randint(1,5,5)) # 5-element array of random integers from 1 (inclusive) to 5 (exclusive)
print(np.random.randint(1,6,5)) # 5-element array of random integers from 1 to 5 (inclusive); replaces the deprecated np.random.random_integers(1,5,5)

print("\nRandom Choice")
# 3-element array
# chosen at random from np.arange(5), i.e. 0 to 4
# replace=False ensures no value is picked twice
# (the optional p argument, used in the next cell, gives the probability of each element; the values must sum to 1)
print(np.random.choice(5, 3, replace=False))


Random Int
[4 4 2 1 4]
[1 4 4 4 1]

Random Choice
[2 0 1]



In [8]:
list_1 = ['Rosstefer', 'Chanandler Bong', 'Ken Adams', 'Regina Phalange', 'Big Fat Golly', 'Backpacking through Western Europe']
print(np.random.choice(list_1, 3, p=[0.2, 0.4, 0.1, 0.15, 0.1, 0.05]))

['Chanandler Bong' 'Chanandler Bong' 'Chanandler Bong']
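NumPy 1.17 and later also provide a `Generator` interface through `np.random.default_rng`, which makes results reproducible when a seed is given. The sketch below mirrors the calls above. The seed value 42 and the labels are arbitrary choices for illustration, and no output is shown because it depends on the seed and NumPy version.

```python
import numpy as np

rng = np.random.default_rng(42)    # arbitrary seed; the same seed gives the same stream

ints_excl = rng.integers(1, 5, size=5)                  # integers from 1 to 4
ints_incl = rng.integers(1, 5, size=5, endpoint=True)   # integers from 1 to 5
picks = rng.choice(5, size=3, replace=False)            # 3 distinct values from 0 to 4
weighted = rng.choice(['a', 'b', 'c'], size=4, p=[0.2, 0.5, 0.3])  # probabilities sum to 1

print(ints_excl, ints_incl, picks, weighted)
```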


## Indexing, slicing & advanced indexing

Contents of an ```ndarray``` object can be accessed and modified by indexing or slicing, just like Python's built-in sequences. A __slice__ object with __start, stop,__ and __step__ parameters is passed to the array to extract a part of it.

Slicing is applicable to multi-dimensional ndarrays as well, but it needs one index or slice per axis rather than a single start and stop. The __ellipsis__, denoted as ```...```, shortens this: it stands for as many full slices (```:```) as are needed to cover the remaining axes.

In [9]:
var_1 = np.arange(20) 
sliced = slice(2,17,2)
print(var_1[sliced])
print('\n'+"="*25+'\n')

var_2 = var_1[2:17:2]
var_3 = var_2[3:]
print(var_2)
print(var_3)

[ 2  4  6  8 10 12 14 16]


[ 2  4  6  8 10 12 14 16]
[ 8 10 12 14 16]


In [10]:
# multi-dimensional
var_1 = np.array([[1,2,3],[3,4,5],[5,6,7]]) 
print(var_1)
print('\nNow we will slice the array from the index var_1[1:]\n') 
print(var_1[1:]) # slice all rows from index 1 onwards

[[1 2 3]
 [3 4 5]
 [5 6 7]]

Now we will slice the array from the index var_1[1:]

[[3 4 5]
 [5 6 7]]


In [11]:
## Ellipsis
var_1 = np.array([[1,2,3],[3,4,5],[4,5,6]])

print('Our array is:')
print(var_1)

# this returns array of items in the second column 
print('\nThe items in the second column are:')
print(var_1[...,1])

Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]

The items in the second column are:
[2 4 5]


In [12]:
# Now we will slice all items from the second row 
print('\nThe items in the second row are:')
print(var_1[1,...])

# Now we will slice all items from the second column (index 1) onwards
print('\nThe items from column 2 are:')
print(var_1[...,1:])


The items in the second row are:
[3 4 5]

The items from column 2 are:
[[2 3]
 [4 5]
 [5 6]]


In [13]:
var_1 = np.arange(1,17).reshape(2,2,2,2)
print('Our array is:')
print(var_1)

Our array is:
[[[[ 1  2]
   [ 3  4]]

  [[ 5  6]
   [ 7  8]]]


 [[[ 9 10]
   [11 12]]

  [[13 14]
   [15 16]]]]


In [14]:
print("\nItems from 1st row component of this 4D array")
print(var_1[0,...])

print("\nItems from 1st row of the 2nd row component of this 4D array")
print(var_1[1,0,...])

print("\nItems from 1st row of the 2nd row of the 2nd row component of this 4D array")
print(var_1[1,1,0,...])


Items from 1st row component of this 4D array
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

Items from 1st row of the 2nd row component of this 4D array
[[ 9 10]
 [11 12]]

Items from 1st row of the 2nd row of the 2nd row component of this 4D array
[13 14]


In [15]:
print("\nItems from 1st col component of this 4D array")
print(var_1[...,0])

print("\nItems from 2nd col of the 1st col component of this 4D array")
print(var_1[...,1,0])

print("\nItems from 1st col of the 2nd col of the 1st col component of this 4D array")
print(var_1[...,0,1,0])


Items from 1st col component of this 4D array
[[[ 1  3]
  [ 5  7]]

 [[ 9 11]
  [13 15]]]

Items from 2nd col of the 1st col component of this 4D array
[[ 3  7]
 [11 15]]

Items from 1st col of the 2nd col of the 1st col component of this 4D array
[ 3 11]
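The ellipsis examples above can be cross-checked against their fully spelled-out slices. This sketch rebuilds the same 4D array under the illustrative name `arr`:

```python
import numpy as np

arr = np.arange(1, 17).reshape(2, 2, 2, 2)

# An ellipsis expands to as many ':' as needed to index every axis
same_last = np.array_equal(arr[..., 0], arr[:, :, :, 0])
same_first = np.array_equal(arr[1, ...], arr[1, :, :, :])
same_middle = np.array_equal(arr[1, ..., 0], arr[1, :, :, 0])

print(same_last, same_first, same_middle)        # True True True
print(arr[..., 0].shape, arr[1, ..., 0].shape)   # (2, 2, 2) (2, 2)
```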


__Advanced indexing__ is triggered when the selection object is a non-tuple sequence, an ndarray of integer or Boolean data type, or a tuple containing at least one sequence or ndarray. There are 2 types of it:
* __Integer__: selects arbitrary items in an array based on their N-dimensional index
* __Boolean__: selects the items for which a Boolean expression (such as a comparison) evaluates to True

In [16]:
var_1 = np.array([[ 0,  1,  2],[ 3,  4,  5],[ 6,  7,  8],[ 9, 10, 11]])

print('Our array is:')
print(var_1)

rows = [1,3]
cols = [0,2]
var_2 = var_1[rows,cols]
print('\nAfter int indexing technique #1')
print(var_2)

Our array is:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

After int indexing technique #1
[ 3 11]


In [17]:
rows = [[0,0], [3,3]]
cols = [[0,2], [0,2]]
var_3 = var_1[rows,cols]
print('\nAfter int indexing technique #2')
print(var_3)

var_4 = var_1[1:, [0,1]]
print('\nAfter int indexing technique #3')
print(var_4)

var_5 = var_1[[0,1], :-1]
print('\nAfter int indexing technique #4')
print(var_5)


After int indexing technique #2
[[ 0  2]
 [ 9 11]]

After int indexing technique #3
[[ 3  4]
 [ 6  7]
 [ 9 10]]

After int indexing technique #4
[[0 1]
 [3 4]]


In [18]:
var_6 = var_1[var_1 > 6]
print('\nAfter boolean indexing technique #1')
print(var_6)

var_7 = var_1[np.iscomplex(var_1)]
print('\nAfter boolean indexing technique #2')
print(var_7)


After boolean indexing technique #1
[ 7  8  9 10 11]

After boolean indexing technique #2
[]
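One difference between basic slicing and the advanced indexing shown above is worth checking by hand: a basic slice returns a view of the original data, while integer and Boolean indexing return copies. A minimal sketch with illustrative names:

```python
import numpy as np

base = np.arange(12).reshape(4, 3)

view = base[1:3]         # basic slice: a view onto base
fancy = base[[1, 2]]     # integer-array index: an independent copy
mask = base[base > 6]    # Boolean index: also a copy

view[0, 0] = -1          # writes through to base
fancy[0, 1] = -2         # leaves base untouched
print(base[1])           # [-1  4  5]
print(np.shares_memory(base, view), np.shares_memory(base, fancy))  # True False

base[base > 6] = 0       # assigning through a Boolean mask does modify base in place
print(base)
```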


## Arithmetic, Mathematical & Statistical Operations

Apart from the usual arithmetic operations, NumPy can perform `dot`, `reciprocal`, `power`, `mod`, `matmul`, etc. Mathematical operations include trigonometric functions and rounding functions such as `around`, `floor`, and `ceil`.

In [19]:
# arithmetic
var_1 = np.array([0.25, 1.33, 12.5, 0, -10])
print('\nAfter reciprocal')
print(np.reciprocal(var_1))

var_2 = np.array([2,3,5])
var_3 = np.array([10,5,2])
print('\nAfter dot product')
print(np.dot(var_2, var_3))

print('\nAfter power with 2')
print(np.power(var_2, 2))
print('\nAfter power with var_3')
print(np.power(var_2, var_3))


After reciprocal
[ 4.         0.7518797  0.08             inf -0.1      ]

After dot product
45

After power with 2
[ 4  9 25]

After power with var_3
[1024  243   25]



In [20]:
var_4 = np.array([10,18,26])
var_5 = np.array([4,5,6])
print('\nAfter mod')
print(np.mod(var_4, var_5))

var_6 = np.array([[1,2],[3,4]])
var_7 = np.array([[5,6],[7,8]])
print('\nAfter matrix multiplication')
print(np.matmul(var_6, var_7))  # 1*5+2*7 = 19; 1*6+2*8 = 22


After mod
[2 3 2]

After matrix multiplication
[[19 22]
 [43 50]]
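It may help to contrast matrix multiplication with the element-wise `*` operator, since the two are easy to confuse. The sketch below reuses the same two matrices under the illustrative names `a` and `b`; the `@` operator is shorthand for `np.matmul`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b       # multiplies matching positions: [[ 5 12] [21 32]]
matrix = a @ b            # matrix product:                [[19 22] [43 50]]
via_dot = np.dot(a, b)    # for 2-D inputs, dot gives the same matrix product

print(elementwise)
print(matrix)
```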


In [21]:
# mathematical
var_1 = np.array([0,30,45,60,90]) 

print('Sine of different angles:')
print(np.sin(var_1 * np.pi/180))

print('\nCosine values for angles in array:')
print(np.cos(var_1 * np.pi/180))

Sine of different angles:
[0.         0.5        0.70710678 0.8660254  1.        ]

Cosine values for angles in array:
[1.00000000e+00 8.66025404e-01 7.07106781e-01 5.00000000e-01
 6.12323400e-17]


In [22]:
var_2 = np.array([1.0, -5.55, 12.302, -0.567, 25.532]) 

print('Original array:')
print(var_2) 

print('\nAfter rounding:')
print(np.around(var_2))
print(np.around(var_2, decimals = 1))

print('\nFloor')
print(np.floor(var_2))

print('\nCeil')
print(np.ceil(var_2))

Original array:
[ 1.    -5.55  12.302 -0.567 25.532]

After rounding:
[ 1. -6. 12. -1. 26.]
[ 1.  -5.6 12.3 -0.6 25.5]

Floor
[ 1. -6. 12. -1. 25.]

Ceil
[ 1. -5. 13. -0. 26.]


In [23]:
# statistical
var_1 = np.arange(1,17).reshape(2,4,2)
print('Original array:')
print(var_1)

print('\nApplying amin along axis 1')
print(np.amin(var_1, 1))

print('\nApplying amax along axis 0')
print(np.amax(var_1, 0))

Original array:
[[[ 1  2]
  [ 3  4]
  [ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]
  [13 14]
  [15 16]]]

Applying amin along axis 1
[[ 1  2]
 [ 9 10]]

Applying amax along axis 0
[[ 9 10]
 [11 12]
 [13 14]
 [15 16]]


Likewise, there are some other statistical functions, such as:
* `numpy.ptp` - returns the range (maximum-minimum) of values along an axis
* `numpy.mean` - the sum of elements along an axis divided by the number of elements
* `numpy.average` - the weighted average of elements in an array according to their respective weight given in another array
* `numpy.median` - value separating the higher half of a data sample from the lower half
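The functions listed above can be tried on a small array. This is a sketch; the values in `data` and the weights are made up for illustration.

```python
import numpy as np

data = np.array([[3, 7, 5],
                 [8, 4, 3],
                 [2, 4, 9]])

row_range = np.ptp(data, axis=1)       # max - min of each row: [4 5 7]
col_mean = np.mean(data, axis=0)       # mean of each column
weighted = np.average(data[0], weights=[1, 2, 1])  # (3*1 + 7*2 + 5*1) / 4 = 5.5
middle = np.median(data)               # median of all nine values: 4.0

print(row_range, col_mean, weighted, middle)
```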

## Iteration over arrays

The NumPy package contains an iterator object, `numpy.nditer`. It is an efficient multidimensional iterator for visiting every element of an array. By default it follows the memory layout of the array, but it can be forced to use a specific order through the `order` parameter.

We can __modify the elements__ of the array being iterated over by using the optional parameter `op_flags`. Its default value is read-only, but it can be set to read-write or write-only mode.

In [24]:
var_1 = np.arange(0,60,5).reshape(3,4) 

print('Original array is:')
print(var_1)

print('\nSorted in C-style order:')
for x in np.nditer(var_1, order = 'C'):
    print(x) 

print('\nSorted in F-style order:')
for x in np.nditer(var_1, order = 'F'): 
    print(x)

Original array is:
[[ 0  5 10 15]
 [20 25 30 35]
 [40 45 50 55]]

Sorted in C-style order:
0
5
10
15
20
25
30
35
40
45
50
55

Sorted in F-style order:
0
20
40
5
25
45
10
30
50
15
35
55
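The `op_flags` parameter described above can be sketched as follows. The array is rebuilt here under the illustrative name `arr`, and each element is doubled in place; using `nditer` as a context manager ensures the changes are written back when the loop ends.

```python
import numpy as np

arr = np.arange(0, 60, 5).reshape(3, 4)

# 'readwrite' lets the loop assign to each element; x[...] writes into the array itself
with np.nditer(arr, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x

print(arr)
# [[  0  10  20  30]
#  [ 40  50  60  70]
#  [ 80  90 100 110]]
```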


## Broadcasting

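Broadcasting refers to the ability of NumPy to handle __arrays of different shapes__ during arithmetic operations. The smaller array is __broadcast__ to the shape of the larger array so that they have compatible shapes. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1.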

![broadcasting](../img/broadcasting_numpy.jpg)

If two arrays are broadcastable, a combined __nditer__ object is able to iterate upon them concurrently.

In [25]:
var_1 = np.arange(0,40,4).reshape(2,5)
print(var_1)

var_2 = np.array([0,2,4,6,8])

print('\nAddition using broadcasting')
print(var_1 + var_2)

print('\nBroadcasting Iteration')
for x,y in np.nditer([var_1,var_2]):
    print("{}:{}".format(x,y))

[[ 0  4  8 12 16]
 [20 24 28 32 36]]

Addition using broadcasting
[[ 0  6 12 18 24]
 [20 26 32 38 44]]

Broadcasting Iteration
0:0
4:2
8:4
12:6
16:8
20:0
24:2
28:4
32:6
36:8
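Broadcasting also works when both operands need stretching, and it fails with a `ValueError` when the shapes cannot be matched. A small sketch with illustrative arrays:

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

grid = col + row                    # (3, 1) combined with (4,) gives (3, 4)
print(grid)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

try:
    np.ones((2, 3)) + np.ones(4)    # trailing sizes 3 and 4 differ and neither is 1
except ValueError as err:
    print("not broadcastable:", err)
```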


## I/O operations

The ndarray objects can be saved to and loaded from disk files. The I/O functions available are:
* `load()` and `save()` functions handle NumPy binary files with the __npy__ extension. The .npy file stores the data, shape, dtype and other information required to reconstruct the `ndarray`, so that the array is correctly retrieved even if the file is read on another machine with a different architecture. Both functions accept an additional Boolean parameter `allow_pickle`. A pickle in Python is used to serialize or de-serialize objects before saving to or reading from a disk file.
* `loadtxt()` and `savetxt()` functions handle plain text files. `savetxt()` accepts additional optional parameters such as header, footer, and delimiter; `loadtxt()` also accepts a delimiter.

In [27]:
var_1 = np.arange(0,60,5).reshape(3,4)
print(var_1)

print("\nSaving file as npy...")
np.save('../data/numpy_sample.npy', var_1)

print("\nLoading the file...")
npy_read = np.load('../data/numpy_sample.npy')
print("\nThe numpy array read after loading the file...")
print(npy_read)

[[ 0  5 10 15]
 [20 25 30 35]
 [40 45 50 55]]

Saving file as npy...

Loading the file...

The numpy array read after loading the file...
[[ 0  5 10 15]
 [20 25 30 35]
 [40 45 50 55]]
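The text-file functions follow the same pattern. The sketch below writes to a temporary directory so that it does not depend on the `../data` folder; the file name and header text are illustrative.

```python
import os
import tempfile

import numpy as np

arr = np.arange(0, 60, 5).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numpy_sample.csv")

    # The header is written as a comment line starting with '#', which loadtxt skips
    np.savetxt(path, arr, fmt="%d", delimiter=",", header="c0,c1,c2,c3")
    loaded = np.loadtxt(path, delimiter=",", dtype=int)

print(loaded)
```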
