# Introduction

In this notebook, we will cover only the basic functionality of Numpy: array initialization, mathematical operations, and similar topics.

**What is Numpy and why should we use it?**

Before we get into the code, I would like to talk a little about this module. Numpy, which stands for Numerical Python, is an open-source Python module for working with numbers and multidimensional arrays. Compared to the standard Python list, it offers at least three advantages.

First, it provides much more flexibility for numerical computation and array manipulation.

Second, Numpy is faster, and third, it consumes less memory. 

These last two advantages come mainly from the fact that most of Numpy's backend is written in C, and that a Numpy array stores its elements in a contiguous block of memory with a single, fixed datatype.

# 1. Numpy Installation

In [1]:
!pip install numpy



In [2]:
import numpy as np
np.__version__

'1.26.4'

# 2. Array Initialization

In [3]:
# Codeblock 3
np.asarray([7,6,5,4,3,2])

array([7, 6, 5, 4, 3, 2])

Running the code above returns the array representation shown in the output.

If you pass that array to the `print()` function, the output looks somewhat different: as you can see below, both the text “array” and **the commas disappear**. Keep in mind, however, that this is only a matter of representation; the array itself is unchanged.

In [4]:
# Codeblock 4
print(np.asarray([7,6,5,4,3,2]))

[7 6 5 4 3 2]
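A quick way to confirm that only the representation differs is to call `repr()` and `str()` on the same array. As a sketch: a notebook cell displays the `repr()` of its last expression, whereas `print()` uses `str()`.

```python
import numpy as np

arr = np.asarray([7, 6, 5, 4, 3, 2])

# repr() is what a notebook cell shows when the array is the last expression.
print(repr(arr))  # array([7, 6, 5, 4, 3, 2])
# str() is what print() uses internally.
print(str(arr))   # [7 6 5 4 3 2]
```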


This behavior differs from that of a Python list. When we print a list, the result looks like the output below, with or without the `print()` function: all elements are separated by commas, and no “array” text is printed.

In [5]:
### Returns the exact same result.
print([7,6,5,4,3,2])

[7, 6, 5, 4, 3, 2]


Another difference between a `Numpy array` and a `Python list` shows up when we print a 2D array.

As you can see below, the `Numpy array` is automatically printed as **rows and columns**, while the Python list is not.

In [6]:
print(np.asarray([[7,6,5,4,3,2],
                [9,8,7,6,5,4]]), end='\n\n')
print([[7,6,5,4,3,2],
      [9,8,7,6,5,4]])

[[7 6 5 4 3 2]
 [9 8 7 6 5 4]]

[[7, 6, 5, 4, 3, 2], [9, 8, 7, 6, 5, 4]]
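The row-and-column layout is more than cosmetic: a 2D Numpy array knows its own dimensions. The sketch below (variable names are illustrative) reads the shape and indexes rows and columns directly, which a nested list cannot do without chained indexing.

```python
import numpy as np

matrix = np.asarray([[7, 6, 5, 4, 3, 2],
                     [9, 8, 7, 6, 5, 4]])

print(matrix.ndim)   # 2 (number of dimensions)
print(matrix.shape)  # (2, 6): 2 rows, 6 columns
print(matrix[1, 0])  # 9 (row 1, column 0)
print(matrix[:, 0])  # [7 9] (the whole first column)

# The nested Python list needs chained indexing and has no shape attribute.
nested = [[7, 6, 5, 4, 3, 2], [9, 8, 7, 6, 5, 4]]
print(nested[1][0])  # 9
```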


Both `np.array()` and `np.asarray()` can be used to convert a list into a Numpy array.

To do the reverse, we can use the `tolist()` method.

In [7]:
A = [2, 4, 6, 8]
B = np.array(A)  # Convert to Numpy array.
C = B.tolist()   # Convert to Python list.

print(B)
print(C)

[2 4 6 8]
[2, 4, 6, 8]
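The two conversion functions behave differently when the input is already a Numpy array. By default, `np.array()` always makes a copy, whereas `np.asarray()` returns the input itself when it is already an ndarray of a compatible dtype. The minimal sketch below, with illustrative variable names, shows the consequence: changes to the original are visible through the `np.asarray()` result but not through the copy.

```python
import numpy as np

original = np.array([2, 4, 6, 8])

copied = np.array(original)   # new array with its own data (copy=True by default)
same = np.asarray(original)   # no copy: returns the very same object

print(copied is original)  # False
print(same is original)    # True

original[0] = 100
print(copied[0])  # 2   (the copy is unaffected)
print(same[0])    # 100 (same object, so it sees the change)
```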


If you’re not sure whether a variable contains a list or a Numpy array, you can use the `type()` function to check without displaying the entire content of that variable. By the way, the term “ndarray” in the output below stands for N-dimensional array.

In [8]:
print('type(A):', type(A))
print('type(B):', type(B))
print('type(C):', type(C))

type(A): <class 'list'>
type(B): <class 'numpy.ndarray'>
type(C): <class 'list'>
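When the check has to happen inside code rather than by eye, `isinstance()` is the usual idiom: it returns a plain boolean and also accepts subclasses of `np.ndarray`. A short sketch:

```python
import numpy as np

A = [2, 4, 6, 8]
B = np.array(A)

# isinstance() returns a bool, which is handy inside if-statements.
print(isinstance(A, np.ndarray))  # False
print(isinstance(B, np.ndarray))  # True

# ndim reports the "N" in N-dimensional array.
print(B.ndim)  # 1
```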


The last thing I want to show you in this chapter is that we can `initialize a Numpy array automatically` from the data in a txt file.

The function we can use for this is `np.genfromtxt()`. 

<div style="text-align: center"><img src="https://miro.medium.com/v2/resize:fit:1032/format:webp/1*2j8bayfXvU9bXTgMi8csng.png" width="100%" alt="Loading data from a txt file with np.genfromtxt()"></div>
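Here is a minimal sketch of `np.genfromtxt()`. The file contents are an assumption, and to keep the example self-contained an in-memory `io.StringIO` stands in for a real file; in practice you would pass a path such as a hypothetical `"data.txt"`. The sketch also shows a convenient property of this function: an empty field is treated as missing and filled with `nan`.

```python
import io
import numpy as np

# Hypothetical file contents: two comma-separated rows, one value missing.
text = io.StringIO("1,2,3\n4,,6")

# delimiter tells genfromtxt how the columns are separated.
# Without a dtype argument, the values are parsed as float64.
data = np.genfromtxt(text, delimiter=",")

print(data.shape)              # (2, 3)
print(data.dtype)              # float64
print(np.isnan(data[1, 1]))    # True: the empty field became nan
```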

# 3. Numpy Array Limitation

Before we discuss everything that can be done with Numpy, let me show you what this module cannot do.

In Codeblock 10 below, I put multiple values of different datatypes into a Python list, and the list handles them without any problem.

In [9]:
# Codeblock 10
D = [2, 'Hello', True, 9.886]
D

[2, 'Hello', True, 9.886]

If we convert list D into a Numpy array, all of its values are automatically converted to strings, and they remain strings even after we convert the array back into a list.

In [10]:
# Codeblock 11
print(np.array(D))
print(np.array(D).tolist())

['2' 'Hello' 'True' '9.886']
['2', 'Hello', 'True', '9.886']


The demonstration above shows a limitation of Numpy arrays: by default, all elements of an array share a single datatype, so values of different types are converted to one common type (unless you explicitly use `dtype=object`). There is a reason behind this design, though: storing elements of one fixed type is exactly what makes array computations much faster and more memory efficient than operations on a Python list.
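To make the trade-off concrete, here is a small sketch comparing the default conversion of list D with `dtype=object`. The object array keeps each original Python value, but because it stores references to arbitrary Python objects, it loses most of the speed and memory benefits discussed in this notebook.

```python
import numpy as np

D = [2, 'Hello', True, 9.886]

# Default conversion: Numpy picks one common dtype, here a Unicode string type.
as_strings = np.array(D)
print(as_strings.dtype.kind)  # U (Unicode string)

# dtype=object stores references to the original Python objects instead.
as_objects = np.array(D, dtype=object)
print(as_objects.tolist())    # [2, 'Hello', True, 9.886]
print(type(as_objects[2]))    # <class 'bool'>
```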

# 4. Computational Speed and Memory Usage

To compare the time Numpy arrays and Python lists need to perform the exact same operation, we first import the `time` module and initialize the arrays to be used.

In [11]:
# Codeblock 12
import time

E = [123] * 9999999
F = np.array(E, dtype='int8')

The idea of this experiment is very simple: sum all values in list E and array F using `sum()` and `np.sum()`, then print out the computation time. Of the two functions below, `summation_python_sum()` uses Python’s built-in `sum()` function, while `summation_numpy_sum()` performs the operation using `np.sum()`.

In [12]:
# Codeblock 13
def summation_python_sum(arr):
    start_time = time.time()
    sum(arr)
    end_time = time.time()

    total_time = end_time - start_time
    return total_time
    
def summation_numpy_sum(arr):
    start_time = time.time()
    np.sum(arr)
    end_time = time.time()

    total_time = end_time - start_time
    return total_time

With the two functions defined, we can now start our experiment by running Codeblock 14 below.

In [13]:
# Codeblock 14
print('Python list with sum()\t\t: ', summation_python_sum(E), 'sec')
print('Python list with np.sum()\t: ', summation_numpy_sum(E), 'sec')
print('Numpy array with sum()\t\t: ', summation_python_sum(F), 'sec')
print('Numpy array with np.sum()\t: ', summation_numpy_sum(F), 'sec')

Python list with sum()		:  0.09216523170471191 sec
Python list with np.sum()	:  0.5546426773071289 sec
Numpy array with sum()		:  1.6321418285369873 sec
Numpy array with np.sum()	:  0.007001399993896484 sec


According to the results above, the Numpy array paired with the Numpy function `np.sum()` is by far the fastest combination (0.007 seconds). The second fastest result is obtained when the Python list is paired with the Python function `sum()` (0.09 seconds), which is still roughly 13 times slower than Numpy. The exact timings will vary from machine to machine. We should also avoid mixing Python functions with Numpy arrays, or Python lists with Numpy functions, because it makes processing even slower: `np.sum()` first has to convert the whole list into an array, while `sum()` iterates over the Numpy array one element at a time, wrapping each element in a Numpy scalar object.
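A single `time.time()` measurement can be noisy. As an alternative sketch (not the method used above), the standard `timeit` module runs each statement several times and lets us keep the best trial. The list length and the `number`/`repeat` values here are arbitrary assumptions chosen to keep the run short, so the printed timings will not match the ones above.

```python
import timeit
import numpy as np

E = [123] * 1_000_000            # smaller than in Codeblock 12 to keep the run short
F = np.array(E, dtype='int8')

# repeat() runs each statement `number` times per trial, for `repeat` trials;
# taking the minimum trial reduces noise from other processes.
python_time = min(timeit.repeat(lambda: sum(E), number=5, repeat=3)) / 5
numpy_time = min(timeit.repeat(lambda: np.sum(F), number=5, repeat=3)) / 5

print(f'Python list with sum()   : {python_time:.6f} sec')
print(f'Numpy array with np.sum(): {numpy_time:.6f} sec')
```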

## Memory Usage

That was all about computational speed. Now let’s see how much memory `list E` and `array F` take up by executing the codeblock below.

In [14]:
# Codeblock 15
import sys

print('E (Python list):', sys.getsizeof(E), 'bytes')
print('F (Numpy array):', sys.getsizeof(F), 'bytes')

E (Python list): 80000048 bytes
F (Numpy array): 10000111 bytes


As you can see in the output above, list E takes up 8 times as much memory as array F. The list stores an 8-byte pointer for each element (9,999,999 × 8 bytes, plus a small header), and `sys.getsizeof()` counts only these pointers, not the integer objects they refer to. Array F, on the other hand, stores each value in just 1 byte, because in Codeblock 12 I used int8 for the `dtype` parameter, which stands for 8-bit integer. I chose only 8 bits since the numbers in our array are small (123 fits within the int8 range of -128 to 127). Numpy lets us specify such details in the datatype, making it more efficient in terms of memory usage.
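The sketch below makes the arithmetic explicit with the `itemsize` (bytes per element) and `nbytes` (total data bytes) attributes, and uses `np.iinfo()` to show the value range of each integer type. It also illustrates the other side of the trade-off: if a value does not fit in the chosen dtype, integer array arithmetic silently wraps around.

```python
import numpy as np

# nbytes = itemsize * number of elements
for dtype in (np.int8, np.int16, np.int32, np.int64):
    arr = np.zeros(1000, dtype=dtype)
    info = np.iinfo(dtype)
    print(arr.dtype.name, arr.itemsize, arr.nbytes, info.min, info.max)

# A dtype that is too small overflows silently and wraps around.
small = np.array([127], dtype=np.int8)
print(small + 1)  # [-128]
```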

# 5. Datatypes
We can check the datatype of a Numpy array by printing out its `dtype` attribute. In the output below, we can see that the datatype of array F is int8, exactly what we set earlier. Remember that we did not specify the datatype for array B. In that case, Numpy uses its default integer type, which is platform dependent: int64 on most 64-bit Linux and macOS systems (as in the output below), and int32 on Windows for Numpy versions before 2.0.

In [15]:
# Codeblock 16
print(B.dtype)
print(F.dtype)

int64
int8
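Since the default depends on the platform, it can be safer to state the datatype explicitly. The sketch below (variable names are illustrative) sets the dtype at creation time and then converts with `astype()`, which returns a new array and leaves the original untouched.

```python
import numpy as np

A = [2, 4, 6, 8]

# The dtype can be fixed at creation time...
B32 = np.array(A, dtype=np.int32)
# ...or changed afterwards with astype(), which returns a new array.
B8 = B32.astype(np.int8)
Bf = B32.astype(np.float64)

print(B32.dtype, B8.dtype, Bf.dtype)  # int32 int8 float64
print(Bf)                             # [2. 4. 6. 8.]
```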


# Credit:

https://python.plainenglish.io/mastering-numpy-a-comprehensive-guide-to-efficient-array-processing-part-1-2-d55efd851234