
# Lesson 15: NumPy Arrays II


---

### Teacher-Student Tasks

In the previous lesson, we learned how to create Claude Shannon's Mind Reader game algorithm which can make accurate predictions. 

In this lesson, we are going to learn a few of the mathematical operations that can be done on a NumPy array. We will also compare the performance of a NumPy array with a Python list.

---

#### Problem Statement


Suppose you are a smartphone retailer and you have a few smartphones in your inventory.

|Smartphone Model|Price (INR)|Units Available|
|-|-|-|
|Samsung Galaxy M30S|13999|9|
|Realme C2|6298|8|
|Xiaomi Redmi Note 7 Pro|10999|9|
|Xiaomi Redmi Note 8 Pro|14999|9|
|Realme C2 3GB RAM|7298|8|
|Realme C2 2GB RAM|6385|8|
|Realme 5|8999|9|
|Xiaomi Redmi Note 7S 64GB|9999|6|
|Xiaomi Redmi Note 8|9999|5|
|Vivo Z1 Pro|13868|7|

Suppose you decided to do some analysis of your inventory. In the process, you want to find answers to the following questions:

1. What is the total monetary value of the inventory?

2. What is the average (or mean) price of a smartphone?

3. What is the price of the cheapest smartphone in the inventory?

4. What is the price of the most expensive smartphone in the inventory?

5. What is the median price of a smartphone? 

6. What is the most commonly occurring price of a smartphone?

---

#### Task 1: Descriptive Statistics

Using NumPy arrays, we can easily do some statistical calculations.


You can answer all questions mentioned in the above problem statement in a few seconds by creating NumPy arrays and by applying the `sum(), mean(), median(), min()`, and `max()` functions.

**Note**: The median is the middle value of an array when its values are arranged in increasing order. Consider the five numbers `6, 1, 5, 32`, and `13`.

**How to find the median value?**

To find the median value, follow two steps:

1. First, arrange all the numbers in increasing order, i.e., `1, 5, 6, 13, 32`.

2. Look for the middle value which in this case is `6`. So, the required median value is `6`.

In general, let $n$ be the number of numbers in a set. 

1. If $n$ is odd, the median value lies at the 
$\left(\frac{n + 1}{2}\right)^{th}$ 
position after arranging the numbers in ascending order.

2. If $n$ is even, the median value is the mean (or average) of the values at the 
$\left(\frac{n}{2}\right)^{th}$ 
and 
$\left(\frac{n}{2} + 1\right)^{th}$ 
positions.

Let's say we want to find the median of the numbers `34, 12, 8, 7, 21, 19`.

1. First, arrange the numbers in increasing order, i.e., `7, 8, 12, 19, 21, 34`.

2. There are 6 numbers, so the middle values are `12` and `19`. Their mean (or average) is 
$\frac{12+19}{2}$ 
`= 15.5`.
So, the required median value is `15.5`.
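
The two rules above can be checked with a small sketch. The helper name `manual_median` is our own and is not part of NumPy; it simply follows the position rule for odd and even $n$ and is compared against `np.median()`.

```python
import numpy as np

def manual_median(values):
    """Median via the position rule described above (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: value at position (n + 1) / 2, which is index (n - 1) // 2.
        return ordered[(n - 1) // 2]
    # Even n: mean of the values at positions n / 2 and n / 2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_example = [6, 1, 5, 32, 13]
even_example = [34, 12, 8, 7, 21, 19]

print(manual_median(odd_example))   # 6
print(manual_median(even_example))  # 15.5
print(np.median(even_example))      # 15.5
```

Both approaches agree on these two examples, which suggests the rule and `np.median()` describe the same calculation.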

Let's first create a NumPy array for the phone data given above, then find the answers to these questions one-by-one:


In [1]:
# S1.1: Create two NumPy arrays: one for the smartphone prices and another for the number of units available.
import numpy as np
prices = np.array([13999, 6298, 10999, 14999, 7298, 6385, 8999, 9999, 9999, 13868])
units_available = np.array([9, 8, 9, 9, 8, 8, 9, 6, 5, 7])

Now, let's answer the first question. To find the total monetary value of the inventory, you have to multiply each smartphone price by its corresponding number of units available and then add up all the products.

The total monetary value will be: 

$M = p_1 \times u_1 + p_2 \times u_2 + p_3 \times u_3 + \dots + p_n \times u_n$

Where,

- $M$: The total monetary value.
- $p$: The price of a smartphone.
- $u$: The number of units available.
- $n$: The number of smartphone models.

Therefore, we have to multiply the `prices` array values with the `units_available` array values to get a new array containing total prices for each smartphone. Then using the `sum()` function, we will add all the values of the new array:

In [4]:
# S1.2: Compute the total monetary value of the inventory.
# Create an array containing the total price for each smartphone model.
total_prices = prices * units_available
print(total_prices)

# Calculate the total monetary value.
total_monetary = np.sum(total_prices)
print(total_monetary)


[125991  50384  98991 134991  58384  51080  80991  59994  49995  97076]
807877
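
The formula for $M$ is a sum of pairwise products, which is also what a dot product computes. As a sketch of an alternative (not the approach used in this lesson), `np.dot()` should give the same total in one step; the variable names `total_by_sum` and `total_by_dot` are our own.

```python
import numpy as np

prices = np.array([13999, 6298, 10999, 14999, 7298, 6385, 8999, 9999, 9999, 13868])
units_available = np.array([9, 8, 9, 9, 8, 8, 9, 6, 5, 7])

# Element-wise product followed by a sum, as in the lesson.
total_by_sum = np.sum(prices * units_available)

# Dot product of the two 1D arrays: multiplies pairwise and adds in one call.
total_by_dot = np.dot(prices, units_available)

print(total_by_sum, total_by_dot)  # 807877 807877
```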


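**Note:** Unlike NumPy arrays, two Python lists cannot be added, subtracted, multiplied, or divided item by item. The `+` operator joins two lists end to end, and the `-`, `*`, and `/` operators between two lists raise a `TypeError`.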
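
To illustrate the note above, here is a minimal sketch with two small made-up lists (`list_a` and `list_b` are illustrative names and values, not part of the inventory data).

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]

# With lists, '+' joins the two lists instead of adding item by item.
joined = list_a + list_b
print(joined)  # [1, 2, 3, 10, 20, 30]

# Item-by-item addition on lists needs an explicit loop or comprehension.
sums = [a + b for a, b in zip(list_a, list_b)]
print(sums)  # [11, 22, 33]

# With NumPy arrays, the arithmetic operators work item by item.
array_a = np.array(list_a)
array_b = np.array(list_b)
print(array_a + array_b)  # [11 22 33]
print(array_a * array_b)  # [10 40 90]
```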

Now, using the `mean()` function, compute the average price of a smartphone:

In [5]:
# S1.3: Compute the average price of a smartphone.
avg_prices=np.mean(prices)
avg_prices

10284.3

Now, using the `min()` function, compute the price of the cheapest smartphone:

In [6]:
# S1.4: Using the 'min()' function, compute the lowest price of a smartphone.
mini=np.min(prices)
mini

6298

Now, using the `max()` function, compute the price of the most expensive smartphone:

In [7]:
# S1.5: Using the 'max()' function, compute the highest price of a smartphone.
maxa=np.max(prices)
maxa

14999

Now, using the `median()` function, compute the median price of a smartphone:

In [9]:
# S1.6: Using the 'median()' function, compute the median price of a smartphone.
medi=np.median(prices)
medi

9999.0

Now, let's compute the most commonly occurring price of a smartphone. If you look at the dataset, the most commonly occurring price is `9999` because it occurs twice. The rest of the prices occur only once.

The value that occurs most often is called the **modal** value or simply the **mode**. 

Unfortunately, the `numpy` module does not have a dedicated function to calculate the modal value. So, we can either write our own function, which takes a bit more work, or use the `mode()` function from the `scipy` library.

For the time being, we will choose the second option. At the end of the class, we will create our version of the `mode()` function.

In the `scipy` library, there is a module called `stats` which contains the `mode()` function. So we have to import the `stats` module from the `scipy` library.

In [10]:
# S1.7: Compute the modal value using the 'mode()' function from the 'scipy.stats' module.
from scipy import stats
modee=stats.mode(prices)
modee

ModeResult(mode=array([9999]), count=array([2]))

In the output, you can see that `9999` is the modal value and it occurs twice in the `prices` array.

**Note:** `from library_name import module_name` is another way of importing a module. It is also standard practice.

---

#### Task 2: Few More Operations on a NumPy Array

Performing mathematical operations on a NumPy array is easier compared to a Python list.

Let's say you have a NumPy array with radii of 20 circles and want to compute the area of every circle. Then you can simply use the double-asterisk (`**`) operator on the NumPy array to square the values. Then multiply the NumPy array with `pi`.

**Note:** The area of a circle with the radius 
$r$ 
is 
$\pi r^{2}$.

In [14]:
# S2.1: Square the values in a numpy array.
import random
# 1. First create a Python list having radii of 20 circles where each radius is a random integer from 1 to 10.
radii_list=[random.randint(1,10) for i in range(1,21)]
print(radii_list)
# 2. Convert the list into a NumPy array using the 'array()' function.
np_radii=np.array(radii_list)
print(np_radii)
# 3. Square the elements of the NumPy array using the exponent (**) operator. You can use the 'np.pi' constant to get the value of 'pi'.
areas=np.pi*(np_radii**2)
areas

[4, 6, 7, 7, 9, 2, 9, 10, 7, 10, 10, 4, 7, 10, 7, 7, 3, 6, 7, 4]
[ 4  6  7  7  9  2  9 10  7 10 10  4  7 10  7  7  3  6  7  4]


array([ 50.26548246, 113.09733553, 153.93804003, 153.93804003,
       254.46900494,  12.56637061, 254.46900494, 314.15926536,
       153.93804003, 314.15926536, 314.15926536,  50.26548246,
       153.93804003, 314.15926536, 153.93804003, 153.93804003,
        28.27433388, 113.09733553, 153.93804003,  50.26548246])

Notice that when you print the values of a NumPy array (in this case `np_radii`) using the `print()` function, the items of the NumPy array are not separated by a comma in the output. For all practical purposes, this is just a different behavior of a NumPy array. Do not worry about it.

If you try to square the radii values stored in a Python list using the same process, Python will raise a `TypeError`.

In [16]:
# S2.2: Directly apply the exponent (**) operator on a Python list. 
radii_list**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

Multiplying a list by an integer does not multiply its items either. Instead, Python repeats the list that many times, as the output below shows. (Multiplying a list by a floating-point number raises a `TypeError`.)

In [24]:
# S2.3: Directly multiply a Python list by an integer.
2*radii_list

[4,
 6,
 7,
 7,
 9,
 2,
 9,
 10,
 7,
 10,
 10,
 4,
 7,
 10,
 7,
 7,
 3,
 6,
 7,
 4,
 4,
 6,
 7,
 7,
 9,
 2,
 9,
 10,
 7,
 10,
 10,
 4,
 7,
 10,
 7,
 7,
 3,
 6,
 7,
 4]

To find the area of the circles whose radii are stored in a Python list, you will have to use a `for` loop:

In [27]:
# S2.4: Compute the area of each circle from the radii in a Python list (using 3.14 as an approximation of pi).
for i in radii_list:
    print(i**2*3.14)

50.24
113.04
153.86
153.86
254.34
12.56
254.34
314.0
153.86
314.0
314.0
50.24
153.86
314.0
153.86
153.86
28.26
113.04
153.86
50.24
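
The loop above uses `3.14`, so its results differ slightly from the NumPy areas computed with `np.pi`. As a sketch, a list comprehension with `math.pi` should match the NumPy result; the short `sample_radii` list is made up for illustration.

```python
import math
import numpy as np

sample_radii = [4, 6, 7, 10]  # illustrative values, not the random list above

# List version: a comprehension does the squaring item by item.
areas_list = [math.pi * r**2 for r in sample_radii]

# NumPy version: the whole array is squared in one expression.
areas_array = np.pi * np.array(sample_radii)**2

print(np.allclose(areas_list, areas_array))  # True
```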


Now, using the same approach, create two NumPy arrays: 
1. A NumPy array holding the radii (numbers from `1` to `10`) of `10` cylinders.
2. A NumPy array holding their corresponding heights (numbers from `11` to `20`).

**Note:** The volume of a cylinder is 
$\pi r^{2}h$
, where 
$h$ 
is height of the cylinder and 
$r$ 
is the radius of the cylinder.

In [None]:
# S2.5: Create two NumPy arrays. One having the radii of 10 cylinders and another having their corresponding heights.
# Compute the volume of the 10 cylinders by multiplying the NumPy arrays and store the new NumPy array in the new variable.


Once this cell is completed, you will get an array containing the volumes of the corresponding cylinders.
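
The S2.5 cell is left empty as an exercise. One possible solution is sketched below; the variable names `radii`, `heights`, and `volumes` are our own choices, and `np.arange()` is just one way to build the two arrays.

```python
import numpy as np

# Radii 1 to 10 and heights 11 to 20 (the stop value is excluded by arange).
radii = np.arange(1, 11)
heights = np.arange(11, 21)

# Volume of each cylinder: pi * r^2 * h, computed item by item.
volumes = np.pi * radii**2 * heights
print(volumes)
```

Each item of `volumes` pairs the radius and height at the same position, so the first value is $\pi \times 1^2 \times 11$ and the last is $\pi \times 10^2 \times 20$.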

---

#### Task 3: Python List and NumPy Array Performance Comparison

As we discussed earlier, operations on a NumPy array usually take less time than the same operations on a Python list. The difference is most noticeable when the lists and arrays hold thousands of items or more.

Let's first create a Python list and a NumPy array both having 100 thousand (or 1 lakh) items. Then let's compute how much time (in seconds) is taken to create the list and the array:

In [35]:
# S3.1: Run the code shown below to see that NumPy arrays are faster than Python lists.
# 1. Import the 'time' module ('numpy' was already imported as 'np' in S1.1).
import time

# 2. Record the time before creating a Python list.
list_t0 = time.time()

# 3. Create a Python list containing 100,000 items from 1 to 100,000.
list1 = [i for i in range(1, 100001)]

# 4. Record the time after creating the list.
list_t1 = time.time()

# 5. Calculate the time taken to create a Python list by computing the time difference.
list_time = list_t1 - list_t0
print("Time taken for List: ", list_time)

# 6. Record the time before creating a NumPy array.
np_t0 = time.time()

# 7. Create a one-dimensional NumPy array containing 100,000 items from 1 to 100,000.
np_array = np.arange(1, 100001)

# 8. Record the time after creating the NumPy array.
np_t1 = time.time()

# 9. Calculate the time taken to create a NumPy array by computing the time difference.
np_time = np_t1 - np_t0
print("Time taken for numpy :", np_time)

# 10. Calculate the factor (rounded down) by which a NumPy array creation is faster than a Python list creation.
print("Numpy array is : ", list_time // np_time, " times faster than python list.")

Time taken for List:  0.005701303482055664
Time taken for numpy : 0.0002040863037109375
Numpy array is :  27.0  times faster than python list.


If you run the above code several times, you will see that almost always NumPy arrays are faster than Python lists by a huge margin.
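
A single `time.time()` measurement can vary a lot from run to run. As a sketch of a steadier comparison, the standard-library `timeit` module can repeat each creation several times and average the result; the names `n_items`, `repeats`, `list_seconds`, and `array_seconds` are our own, and the exact timings depend on your machine.

```python
import timeit
import numpy as np

n_items = 100_000
repeats = 20

# Average time to build the Python list over several runs.
list_seconds = timeit.timeit(
    lambda: [i for i in range(1, n_items + 1)], number=repeats
) / repeats

# Average time to build the equivalent NumPy array over several runs.
array_seconds = timeit.timeit(
    lambda: np.arange(1, n_items + 1), number=repeats
) / repeats

print("Average list creation time :", list_seconds)
print("Average array creation time:", array_seconds)
print("Ratio (list / array)       :", list_seconds / array_seconds)
```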

---

#### Task 4: The User-Defined `mode()` Function

Now let's create our version of the `mode()` function. It should take a one-dimensional NumPy array (`input_array`) as an input and should return a pair of the modal value and its count as an output.

To create this function:

1. First, we will create an empty Python list to store the count of every item in the `input_array`:

  ```
  counts_list = []
  ```

2. Next, we will convert the `input_array` to a Python list and store it in the `input_list` variable:

  ```
  input_list = list(input_array)
  ```

3. We will iterate through each item in the `input_list` and count how many times it occurs. Then we will add the counts to the `counts_list` using the `append()` function:

  ```
  for item in input_list:
      item_count = input_list.count(item)
      counts_list.append(item_count)
  ```

4. Then we will convert the `counts_list` into a NumPy array:

  ```
  counts_array = np.array(counts_list)
  ```

5. Next, we will compute the maximum count value in the `counts_array` using the `np.max()` function:

  ```
  max_count = np.max(counts_array)
  ```

6. Then, we will find the index of the `max_count` value using the `index()` function:

  ```
  max_count_index = counts_list.index(max_count)
  ```

7. Finally, we will find the modal value using the list indexing method:
  
  ```
  mode = input_list[max_count_index]
  ```

**Note:** There could be other ways to create the `mode()` function. You are free to explore them in your own time.

In [36]:
# S4.1: Create the 'mode()' function which takes a 1D NumPy array as an input and returns the modal value and its count as an output.
def mode(input_array):
    count_list = []
    input_list = list(input_array)
    # Count how many times each item occurs in the list.
    for item in input_list:
        item_count = input_list.count(item)
        count_list.append(item_count)
    count_arr = np.array(count_list)
    max_count = np.max(count_arr)
    # Index of the first item that has the highest count.
    max_index = count_list.index(max_count)
    modal_value = input_list[max_index]
    return modal_value, max_count

mode(prices)


(9999, 2)

So, in this way, we can create the `mode()` function which takes a one-dimensional array as an input and returns a pair of the modal value and its count as an output.
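
As one of the other possible approaches, here is a sketch that relies on `np.unique()` with `return_counts=True` instead of counting in a loop. The function name `mode_with_unique` is our own. Note one difference: when several values tie for the highest count, this version returns the smallest of them (because `np.unique()` sorts the values), whereas the function above returns the one that appears first in the input.

```python
import numpy as np

def mode_with_unique(input_array):
    """Return (modal value, count) using np.unique (hypothetical helper)."""
    # 'values' holds the sorted distinct items; 'counts' holds their frequencies.
    values, counts = np.unique(input_array, return_counts=True)
    # argmax gives the index of the first (smallest-valued) highest count.
    max_index = np.argmax(counts)
    return values[max_index], counts[max_index]

prices = np.array([13999, 6298, 10999, 14999, 7298, 6385, 8999, 9999, 9999, 13868])
modal_value, modal_count = mode_with_unique(prices)
print(modal_value, modal_count)  # 9999 2
```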

In the next class, we will learn about the Pandas series and Pandas DataFrame.

---