# NumPy (Numerical Python)

NumPy works on **multidimensional homogeneous array objects** and provides a collection of functions to process those arrays. It is known for fast numerical computation, offers a high-level syntax, and has a vibrant ecosystem that interoperates across many application areas. Its n-dimensional array type is called **ndarray**, and it forms the basic building block of numerous Python libraries.

# Importing numpy

NumPy is available by default in Colab notebooks, so the library can be loaded directly with the *import* statement.

*PS: Follow the instructions in the [Installation section](https://numpy.org/install/) of the NumPy website to install it locally.*


In [2]:
import numpy as np
print("Loaded", np.__version__, "version of numpy!!!")

Loaded 1.21.5 version of numpy!!!


# Creating NumPy arrays

The basic ndarray is created using the **array** function, which takes any sequence as an input parameter. NumPy arrays can easily be created from lists.


In [3]:
# creating a 1 dimensional array
var1 = np.array([1, 2, 3, 4, 5])

# creating a 2 dimensional array
var2 = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 0]])

print("var1 =", var1)
print("var2 =", var2)

var1 = [1 2 3 4 5]
var2 = [[1 2 3 4 5]
 [6 7 8 9 0]]


In [4]:
# It is also possible to specify the type of the element during array creation
var3 = np.array([1, 2, 3], dtype=np.float32)
print("var3 =", var3)

# Notice the usage of numpy arrays to create another array with a different dtype
var4 = np.array(var3, dtype=np.int32)
print("var4 =", var4)

var5 = np.array(var4, dtype=complex)
print("var5 =", var5)

var3 = [1. 2. 3.]
var4 = [1 2 3]
var5 = [1.+0.j 2.+0.j 3.+0.j]


# Array Properties
The important attributes of the ndarray are:

*   shape: Dimensions of the array
*   size: Total number of elements in the array
*   ndim: Number of axes
*   dtype: Type of the elements of the array



In [5]:
print(var1.dtype, '\t', var1.shape, '\t', var1.size, '\t', var1.ndim)
print(var2.dtype, '\t', var2.shape, '\t', var2.size, '\t', var2.ndim)
print(var3.dtype, '\t', var3.shape, '\t', var3.size, '\t', var3.ndim)
print(var4.dtype, '\t', var4.shape, '\t', var4.size, '\t', var4.ndim)
print(var5.dtype, '\t', var5.shape, '\t', var5.size, '\t', var5.ndim)

int32 	 (5,) 	 5 	 1
int32 	 (2, 5) 	 10 	 2
float32 	 (3,) 	 3 	 1
int32 	 (3,) 	 3 	 1
complex128 	 (3,) 	 3 	 1


# Other array creation methods

It is not always necessary to define the elements when an array is created. Several NumPy functions allow you to create arrays that act as placeholders before the actual computations.

**Exercise #01:**

*   Create different array objects using the following functions: (1) zeros, (2) ones, (3) empty, (4) zeros_like, (5) ones_like, (6) empty_like 


In [22]:
# Solution
var6 = np.zeros((2,2))
print(var6)

var7 = np.ones((3,2))
print(var7)

var8 = np.empty((2,3))
print(var8)

var9 = np.zeros_like(var6)
print(var9)

var10 = np.ones_like(var7)
print(var10)

var11 = np.empty_like(var8)
print(var11)

[[0. 0.]
 [0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[1. 1. 1.]
 [1. 1. 1.]]
[[0. 0.]
 [0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[1. 1. 1.]
 [1. 1. 1.]]


**Exercise #02:**

*   What is the difference between **empty** and **zeros** methods?
*   What is the use of **arange** and **linspace** functions?



**Solution**

* `np.empty` returns a new array of the given shape and type without initializing its entries; `np.zeros` returns a new array of the given shape and type filled with zeros. Because `np.empty` does not set the array values to zero, it may be marginally faster. On the other hand, the array holds arbitrary values until the user sets every element, so it should be used with caution.

* Both functions return evenly spaced values within a given interval. `np.arange` takes a step size and generates values in the half-open interval [start, stop), while `np.linspace` takes the number of samples and, by default, includes the last value, covering [start, stop]. With a non-integer step, it is better to use `np.linspace`, because floating-point rounding can make the length of an `np.arange` result hard to predict.
  * For `np.arange`, the default `start` value is 0 if not specified.
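
The difference between the two functions can be seen in a minimal sketch. The variable names (`steps`, `points`, `open_points`) are chosen here for illustration and are not part of the original exercise.

```python
import numpy as np

# arange: fixed step size, half-open interval [start, stop)
steps = np.arange(0, 10, 2)
print(steps)        # [0 2 4 6 8]

# linspace: fixed number of samples, closed interval [start, stop] by default
points = np.linspace(0.0, 1.0, 5)
print(points)       # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace leave out the stop value as well
open_points = np.linspace(0.0, 1.0, 5, endpoint=False)
print(open_points)  # [0.  0.2 0.4 0.6 0.8]
```

Note that `np.linspace` is controlled by the number of samples rather than by the step, which is why it is usually the safer choice for floating-point ranges.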
  
...

# Shape Manipulations

In [23]:
# Creating a numpy array using arange function
var6 = np.arange(10, dtype=int)
print(var6)

# Reshaping the array into 2 x 5 array
var6 = var6.reshape(2,5)
print(var6)

# Reshaping the array in 3 dimensions
var6 = var6.reshape(1,2,5)
print(var6)

# Note that it is not always necessary to give all three dimensions
var6 = var6.reshape(1,2,-1)
print(var6)

[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[[0 1 2 3 4]
  [5 6 7 8 9]]]
[[[0 1 2 3 4]
  [5 6 7 8 9]]]


**Exercise #03:**

*   What is the difference between reshape and resize methods?



**Solution**

* `np.reshape` gives a new shape to an array without changing its data, so the total number of elements must stay the same. `np.resize` returns a new array with the requested shape; if the new array is larger than the original array `a`, it is filled with repeated copies of `a`.
  * Note that this behavior is different from `a.resize(new_shape)`, which modifies `a` in place and fills with zeros instead of repeated copies of `a`.
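
The three behaviors can be compared side by side in a short sketch. The array names are illustrative, and `refcheck=False` is passed only so that the in-place resize also works in environments (such as notebooks) that keep extra references to the array.

```python
import numpy as np

a = np.arange(4)                  # [0 1 2 3]

# reshape: same data, new shape; the element count must match
reshaped = a.reshape(2, 2)
print(reshaped)

# np.resize: returns a new array and repeats the data when the target is larger
repeated = np.resize(a, (2, 3))
print(repeated)                   # [[0 1 2]
                                  #  [3 0 1]]

# ndarray.resize: changes the array in place and pads with zeros
b = np.arange(4)
b.resize((2, 3), refcheck=False)
print(b)                          # [[0 1 2]
                                  #  [3 0 0]]
```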




...

# Indexing, Slicing and Iterating

Indexing and slicing are similar to Python lists. The **[ ]** operator accepts both indices and slices.

In [24]:
var7 = np.array([1, 2, 3, 4, 5])

# indexing the second element
# Remember array indices start with 0 in numpy
print(var7[1])

# slicing the first two elements of the array
print(var7[0:2])

2
[1 2]


In [25]:
# it is also possible to select elements based on a condition (boolean masking)
print(var7[var7 % 2 == 0])  # prints all the even numbers in the array

[2 4]


In [26]:
var8 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(var8)

# indexing the element at second row and second column
print(var8[1,1])

# it is possible to slice the entire row or column
print(var8[1])
print(var8[:,1])

# slice a particular range
print(var8[0:2, 0:2])

[[1 2 3]
 [4 5 6]
 [7 8 9]]
5
[4 5 6]
[2 5 8]
[[1 2]
 [4 5]]


A visual illustration of the slicing mechanism is shown below:

![numpy_indexing.png](https://drive.google.com/uc?export=view&id=1y6rrsrihu3QS5_NGdYMo9mBnbv0qHehq)

*Reference: http://scipy-lectures.org*

**Exercise #04:**

*   What is the use of the `nonzero` function? Provide an example of how it can be used.



In [None]:
# np.nonzero(a) needs an array argument; calling np.nonzero() without one raises a TypeError

**Solution**

`np.nonzero` returns the indices of the elements that are non-zero. It returns a tuple of arrays, **one for each dimension of the input array `a`**, containing the indices of the non-zero elements in that dimension. The values in `a` are always tested and returned in row-major, C-style order.

...

In [35]:
# example
var04 = np.array([[0, 1, 2], [3, 0, 5], [6, 7, 0]])
print(var04)
print("---")
print(np.nonzero(var04)) # returns the indices of the elements that are non-zero

[[0 1 2]
 [3 0 5]
 [6 7 0]]
---
(array([0, 0, 1, 1, 2, 2], dtype=int64), array([1, 2, 0, 2, 0, 1], dtype=int64))


A common use for `np.nonzero` is to find the indices of an array where a condition is True. Given an array `a`, the condition `a > 5` is a boolean array, and since False is interpreted as 0, `np.nonzero(a > 5)` yields the indices of `a` where the condition is true.



In [37]:
# solution
print(np.nonzero(var04 > 5)) # returns the indices of the elements that are greater than 5

(array([2, 2], dtype=int64), array([0, 1], dtype=int64))


Iterating over an array can be done either in the Python list style or with the `nditer` function.

In [38]:
var9 = np.array([[1, 2, 3], [4, 5, 6]])

for i in var9: # outer loop to access each row
    print(i) # prints the entire row

for i in var9: # outer loop to access each row
    for j in i:  # iterating over each element of the row
        print(j) # prints the element value

[1 2 3]
[4 5 6]
1
2
3
4
5
6


In [39]:
# an alternative to the nested for loops is the nditer function
var10 = np.array([[1, 2, 3], [4, 5, 6]])
for i in np.nditer(var10): # creates an iterator object that visits every element of the array
    print(i)

1
2
3
4
5
6


**Exercise #05:**

*   How can we enumerate a numpy array?



**Solution**

With the `np.ndenumerate` class. 
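
A minimal sketch of how `np.ndenumerate` can be used is given below. The names `grid` and `pairs` are illustrative. Each iteration yields a tuple with the multidimensional index and the value stored at that index.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# collect (index, value) pairs; int() converts the NumPy scalar to a plain Python int
pairs = [(index, int(value)) for index, value in np.ndenumerate(grid)]

for index, value in pairs:
    print(index, value)  # first line printed: (0, 0) 1
```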

...

Note that slices share memory with the original array.

In the example below, var12 is created by slicing var11. Notice the change in var11 after modifying var12.

In [41]:
var11 = np.array([1, 2, 3])

var12 = var11[0:1]

print('before changing:', var11)
var12[0] = 4
print('after changing:', var11)

before changing: [1 2 3]
after changing: [4 2 3]


This leaves us with the question of how array copying works in NumPy.

# Array Copy

Assigning an existing array to a new name with the '=' operator does not create a copy of the array, i.e., only a new name is created, and it refers to the same object.

In [42]:
var13 = np.array([1, 2, 3, 4, 5])

var14 = var13
print('Before modifying', var13)

var14[0] += 1
print('After modifying', var13)

Before modifying [1 2 3 4 5]
After modifying [2 2 3 4 5]


When the `view` method is used, or when the array is sliced, the returned array is only a shallow copy (a view) of the original array, i.e., a new array object is created, but it points to the same data.

In [43]:
var15 = np.array([1, 2, 3, 4, 5])

var16 = var15.view()
print('Before modifying (using view)', var15)

var16[0] += 1
print('After modifying (using view)', var15)

var17 = var15[0:3]
print('Before modifying (using slice)', var15)

var17[2] += 1
print('After modifying (using slice)', var15)

Before modifying (using view) [1 2 3 4 5]
After modifying (using view) [2 2 3 4 5]
Before modifying (using slice) [2 2 3 4 5]
After modifying (using slice) [2 2 4 4 5]


To create a deep copy of the array, use the `copy` method.

In [44]:
var18 = np.array([1, 2, 3, 4, 5])

var19 = var18.copy()
print('Before modifying (using copy)', var18)

var19[0] += 1
print('After modifying (using copy)', var18)

Before modifying (using copy) [1 2 3 4 5]
After modifying (using copy) [1 2 3 4 5]
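
The three cases above (assignment, view or slice, and copy) can also be told apart without modifying any data. The sketch below uses `np.shares_memory` and the `base` attribute for that purpose; the variable names are illustrative and not part of the original cells.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

alias = original           # plain assignment: same object
view = original.view()     # new array object, same data
sliced = original[0:3]     # slicing also returns a view
copied = original.copy()   # new array object with its own data

print(alias is original)                    # True
print(view is original)                     # False
print(np.shares_memory(original, view))     # True
print(np.shares_memory(original, sliced))   # True
print(np.shares_memory(original, copied))   # False

# a view keeps a reference to the array that owns the data
print(view.base is original)                # True
print(copied.base is None)                  # True
```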


# Adding new axis

It is also possible to increase the number of dimensions of a NumPy array. **np.newaxis** and **expand_dims** can be used for this. This is very handy when building convolutional neural networks, where you need uniform channel lengths *(will be used in the later exercises)*.

In [50]:
var20 = np.array([1, 2, 3, 4, 5])
print(var20.shape)

a = var20[np.newaxis, :]  # adding new axis to the first axis
print(a.shape)

b = a[np.newaxis, :]  # adding new axis to the first axis
print(b.shape)

c = np.expand_dims(var20, axis=1)  # adding new axis to the second axis
print(c.shape)

d = np.expand_dims(var20, axis=0)  # adding new axis to the first axis
print(d.shape)

(5,)
(1, 5)
(1, 1, 5)
(5, 1)
(1, 5)


# Broadcasting Rules

Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. In general, the smaller array is broadcast across the larger array so that the two have compatible shapes.

Read [basic broadcasting rules](https://numpy.org/doc/stable/user/basics.broadcasting.html) for basic knowledge about broadcasting.
Also read [Array broadcasting](https://numpy.org/doc/stable/user/theory.broadcasting.html#array-broadcasting-in-numpy).


In [52]:
var20 = np.array([[1.2, 2.3, 4.0],
                  [1.2, 3.4, 5.2],
                  [0.0, 1.0, 1.3],
                  [0.0, 1.0, 2e-1]])

print(var20)

print(var20 * 2)  # multiplying each element with 2

print(var20 * [1, 0, 1])  # multiplying each row element-wise with [1, 0, 1]

[[1.2 2.3 4. ]
 [1.2 3.4 5.2]
 [0.  1.  1.3]
 [0.  1.  0.2]]
[[ 2.4  4.6  8. ]
 [ 2.4  6.8 10.4]
 [ 0.   2.   2.6]
 [ 0.   2.   0.4]]
[[1.2 0.  4. ]
 [1.2 0.  5.2]
 [0.  0.  1.3]
 [0.  0.  0.2]]


Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

* they are equal, 
* or one of them is 1

If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.

Arrays do not need to have the same number of dimensions. 

> A set of arrays is called “broadcastable” to the same shape if the above rules produce a valid result.
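
The compatibility rules above can be checked without allocating any arrays. The sketch below assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available; the variable names are illustrative.

```python
import numpy as np

# Assumption: NumPy >= 1.20, which provides np.broadcast_shapes.

# trailing dimensions are equal: (4, 3) with (3,) gives (4, 3)
same_trailing = np.broadcast_shapes((4, 3), (3,))

# a dimension of size 1 is stretched: (4, 1) with (3,) gives (4, 3)
stretched = np.broadcast_shapes((4, 1), (3,))

# the arrays do not need the same number of dimensions
mixed = np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5))

print(same_trailing, stretched, mixed)  # (4, 3) (4, 3) (8, 7, 6, 5)

# trailing dimensions 3 and 4 differ and neither is 1, so this fails
try:
    np.broadcast_shapes((4, 3), (4,))
    compatible = True
except ValueError as err:
    compatible = False
    print("incompatible shapes:", err)
```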



Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. The following example shows an outer addition operation of two 1-d arrays:

In [53]:
a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])
a[:, np.newaxis] + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

> Here the newaxis index operator inserts a new axis into a, making it a two-dimensional 4x1 array. Combining the 4x1 array with b, which has shape (3,), yields a 4x3 array.

**Exercise #06:**

Normalizing values is an important step in image processing and machine learning problems. In this exercise, we apply normalization along different axes to understand the role of broadcasting.

`np.mean()` computes the arithmetic mean along the specified axis.
* a : array_like
    Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
* axis : None or int or tuple of ints, optional
    Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.



In [96]:
var21 = np.array([[1.2, 2.3, 4.0],
                  [1.2, 3.4, 5.2],
                  [0.0, 1.0, 1.3],
                  [0.0, 1.0, 2e-1]])

print(var21)

[[1.2 2.3 4. ]
 [1.2 3.4 5.2]
 [0.  1.  1.3]
 [0.  1.  0.2]]


In [97]:
# compute row wise mean
var21_mean_row = np.mean(var21, axis=1)
print(var21_mean_row)

[2.5        3.26666667 0.76666667 0.4       ]


In [94]:
# do row wise mean subtraction
print(var21 - var21_mean_row[:,np.newaxis])

[[-1.3        -0.2         1.5       ]
 [-2.06666667  0.13333333  1.93333333]
 [-0.76666667  0.23333333  0.53333333]
 [-0.4         0.6        -0.2       ]]


In [73]:
# compute column wise mean
var21_mean_clm = np.mean(var21, axis=0)
print(var21_mean_clm)

[0.6   1.925 2.675]


In [99]:
# column wise mean subtraction
print(var21 - var21_mean_clm[np.newaxis, :])

[[ 0.6    0.375  1.325]
 [ 0.6    1.475  2.525]
 [-0.6   -0.925 -1.375]
 [-0.6   -0.925 -2.475]]


In [79]:
# how do we normalize the entire array using the global mean?
# solution
print(var21 - np.mean(var21)) # subtract the global mean from every element

[[-0.53333333  0.56666667  2.26666667]
 [-0.53333333  1.66666667  3.46666667]
 [-1.73333333 -0.73333333 -0.43333333]
 [-1.73333333 -0.73333333 -1.53333333]]




*   Can you think of an example of row wise normalization and column wise normalization?



**Solution**

*(From Wikipedia)*

> In image processing, normalization is a process that changes the range of pixel intensity values. Applications include photographs with poor contrast due to glare, for example. Normalization is sometimes called contrast stretching or histogram stretching. In more general fields of data processing, such as digital signal processing, it is referred to as dynamic range expansion.[1]
> 
> The purpose of dynamic range expansion in the various applications is usually to bring the image, or other type of signal, into a range that is more familiar or normal to the senses, hence the term normalization. Often, the motivation is to achieve consistency in dynamic range for a set of data, signals, or images to avoid mental distraction or fatigue. For example, a newspaper will strive to make all of the images in an issue share a similar range of grayscale.
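
As one possible illustration (not taken from the original solution), the sketch below assumes a small hypothetical data matrix with samples in rows and features in columns. Column-wise normalization standardizes each feature to zero mean and unit standard deviation, while row-wise normalization rescales each sample so that its entries sum to 1. All names and values are made up for the example.

```python
import numpy as np

# hypothetical data: 3 samples (rows) x 2 features (columns)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 60.0]])

# column-wise: statistics have shape (2,) and broadcast across the rows
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)
standardized = (data - col_mean) / col_std
print(standardized)

# row-wise: keepdims=True keeps shape (3, 1) so it broadcasts across the columns
row_sum = data.sum(axis=1, keepdims=True)
proportions = data / row_sum
print(proportions)
```

Using `keepdims=True` is an alternative to indexing with `np.newaxis` as in the cells above; both produce a column-shaped array that broadcasts along the rows.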

...



---


**Note that this is not a complete tutorial covering every aspect of NumPy. It is provided as an introduction to NumPy and its ease of use in numerical computation.**