# NumPy

NumPy stands for Numerical Python. It is a library for performing numerical operations on arrays. Compared with Python lists, NumPy arrays are more compact in memory, faster for numerical work, and offer more built-in functionality. For a detailed comparison, see https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

Before you can use NumPy, it must be installed on your system. You can install it with pip or Anaconda. The code below uses Python 3 with NumPy installed through Anaconda. See https://anaconda.org/ for how to install NumPy.

The first step is to import NumPy.

In [1]:
# Import the numpy library
import numpy as np

Here, we give NumPy a shorter name, np. This is known as aliasing. From here on, whenever we need a function from the NumPy library, we access it through 'np'. See https://www.digitalocean.com/community/tutorials/how-to-import-modules-in-python-3#aliasing-modules for more on aliasing.
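
With the alias in place, the difference in functionality between lists and arrays mentioned at the start can be sketched quickly. This is only an illustration; the names py_list and np_array are made up for this example.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# Multiplying a list by 2 repeats it; multiplying an array scales each element.
doubled_list = py_list * 2        # [1, 2, 3, 1, 2, 3]
doubled_array = np_array * 2      # array([2, 4, 6])

# Adding two lists concatenates them; adding two arrays sums element by element.
joined_list = py_list + py_list       # [1, 2, 3, 1, 2, 3]
summed_array = np_array + np_array    # array([2, 4, 6])

print(doubled_list, doubled_array)
print(joined_list, summed_array)
```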

Now that we have NumPy imported, let's define the problems that we are going to solve using NumPy.

# Problem 1

You are given 5 cylindrical containers with different radii and heights, each ranging between 5 and 25 cm. Find out  
a) the volume of water that each container can contain,  
b) the total volume of water that all containers can contain,  
c) which container can hold the highest volume and how much,  
d) which container can hold the least volume and how much,  
e) what is the mean, median and standard deviation of the volumes of water that can be contained in the containers?

# Solution 1

First we need the radii and heights of the 5 cylindrical containers. Both are defined to range between 5 and 25 cm. Let's first store these settings in variables.

In [2]:
# Define variables
no_of_items = 10 # we need 2 values for each container, one for radius and one for height
lower_limit = 5
upper_limit = 25

Now, using the above let us generate random values for the radius and heights of the cylindrical containers.

In [3]:
# Generate random values
np.random.seed(0)
values = np.random.randint(lower_limit, upper_limit, no_of_items)

The benefit of using a seed is that we always get the same set of random numbers. The numbers are still (pseudo-)random, but running the above cell will always result in the same set of random numbers. You can use any number as the argument to seed, but a different number will result in a different set of random numbers. Without the seed, each run of the program would give np.random.randint a different collection of numbers to return.
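
The effect of the seed can be checked directly. The sketch below is a small illustration with made-up variable names: it resets the seed and draws twice, then does the same with np.random.default_rng, the generator interface available in newer NumPy versions (1.17 and later), which is not used elsewhere in this tutorial.

```python
import numpy as np

# Same seed, same call: the two draws are identical.
np.random.seed(0)
first_draw = np.random.randint(5, 25, 10)
np.random.seed(0)
second_draw = np.random.randint(5, 25, 10)
print(np.array_equal(first_draw, second_draw))  # True

# Generator-based alternative: each generator carries its own seed.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
draw_a = rng_a.integers(5, 25, size=10)
draw_b = rng_b.integers(5, 25, size=10)
print(np.array_equal(draw_a, draw_b))  # True
```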

Now, let's check what values contains.

In [4]:
values

array([17, 20,  5,  8,  8, 12, 14, 24, 23,  9])

It gives us 10 random integers from 5 up to, but not including, 25. Note that lower_limit is inclusive and upper_limit is exclusive, so 25 never appears in the array.  
Side note: also check out np.random.rand() and np.random.randn().
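
A rough way to convince yourself of the bounds is to draw many samples and look at the extremes. The sketch below does that, and also shows the two functions from the side note; the sample size of 1000 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)
samples = np.random.randint(5, 25, 1000)

# The lower limit can occur, the upper limit cannot.
print(samples.min() >= 5)    # True
print(samples.max() <= 24)   # True

# rand draws floats uniformly from [0, 1);
# randn draws floats from the standard normal distribution.
uniform = np.random.rand(3)
normal = np.random.randn(3)
print(uniform.shape, normal.shape)  # (3,) (3,)
```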

Let us first explore some details regarding the values.

So, there are 10 items in the array. We can use the size attribute to verify this.

In [5]:
values.size

10

The output above has only a single pair of square brackets, so this is a one-dimensional array. Use the ndim attribute to verify this.

In [6]:
values.ndim

1

Let us also see how the values are organized using the shape attribute.

In [7]:
values.shape

(10,)

shape gives the number of items along each dimension. Since values is one-dimensional, its shape is the one-element tuple (10,).
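
The three attributes are related: ndim is the length of shape, and size is the product of the entries in shape. A small sketch with a hypothetical three-dimensional array named grid:

```python
import numpy as np

grid = np.zeros((2, 3, 4))   # hypothetical 3D array, used only for illustration

print(grid.ndim)    # 3
print(grid.shape)   # (2, 3, 4)
print(grid.size)    # 24, i.e. 2 * 3 * 4

print(grid.ndim == len(grid.shape))        # True
print(grid.size == np.prod(grid.shape))    # True
```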

To determine the data type of the entries in values, use the dtype attribute.

In [8]:
values.dtype

dtype('int64')

So, all values are 64-bit integers, and hence the array has a 64-bit integer type. (The default integer type depends on the platform and NumPy version, so you may see int32 instead.) If even one of the values were a float, the entire values array would be of type float.
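
The point about a single float changing the type of the whole array can be seen with two small arrays. The names ints, mixed and as_float are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.0])   # one float among the integers

print(ints.dtype)    # an integer type; int64 on most platforms
print(mixed.dtype)   # float64

# An existing integer array can be converted explicitly with astype.
as_float = ints.astype(np.float64)
print(as_float.dtype)  # float64
```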

So far, values is just a plain array of numbers. Let's reorganize it to represent the radii and heights of the containers. We are going to create a two-dimensional array in which each row represents a single container, with the first column holding the radius and the second column holding the height. To do this, use the reshape method.

In [9]:
no_of_rows = int(no_of_items/2) # the / operator returns a float, so convert the result to int
no_of_columns = 2
containers = values.reshape(no_of_rows, no_of_columns)
containers

array([[17, 20],
       [ 5,  8],
       [ 8, 12],
       [14, 24],
       [23,  9]])

This results in a two-dimensional array, as can be seen from the two levels of square brackets. Note that to get the number of containers, no_of_items (the random number count) is divided by 2. In Python 3 the / operator always returns a float, which the reshape method does not accept, so the result must be converted to int (integer division with // would also work). Also, note that for the reshape method to work, the product of no_of_rows and no_of_columns must equal no_of_items.
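
The size requirement can be illustrated with a stand-in array. In the sketch below, np.arange(10) replaces the random values, -1 asks reshape to work out the number of rows itself, and a shape whose product is not 10 raises a ValueError. The names pairs and reshape_failed are made up for this example.

```python
import numpy as np

values = np.arange(10)   # stand-in for the 10 random values

# -1 lets NumPy infer the row count from the total size: 10 / 2 = 5 rows.
pairs = values.reshape(-1, 2)
print(pairs.shape)  # (5, 2)

# 3 * 2 = 6 does not match the 10 items, so reshape refuses.
reshape_failed = False
try:
    values.reshape(3, 2)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```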

To verify that containers is a 2D array, let us again use ndim.

In [10]:
containers.ndim

2

The shape has also changed, so that we now have 5 rows and 2 columns.

In [11]:
containers.shape

(5, 2)

Next, we are going to collect all the radii in one array and all the heights in another. To do this, let's use slicing. We defined the first column to represent the radius and the second column to represent the height.

In [12]:
radius = containers[:,0]
radius

array([17,  5,  8, 14, 23])

containers[:,0] means that from the containers array, we want all the rows (:) of the first column (0).

And similarly for the heights

In [13]:
height = containers[:,1]
height

array([20,  8, 12, 24,  9])

Both of these are again one-dimensional:

In [14]:
print("Dimension for radius: ", radius.ndim)
print("Dimension for height: ", height.ndim)

Dimension for radius:  1
Dimension for height:  1
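
Other selections follow the same [rows, columns] pattern used above. The sketch below rebuilds the containers array by hand from the output shown earlier and tries a few variations; apart from containers, radius and height, the names are illustrative.

```python
import numpy as np

containers = np.array([[17, 20], [5, 8], [8, 12], [14, 24], [23, 9]])

first_container = containers[0]      # one row: array([17, 20])
first_two = containers[:2, :]        # rows 0 and 1, all columns
single_height = containers[3, 1]     # row 3, column 1: 24

# Indexing with a list keeps the column two-dimensional.
radius_column = containers[:, [0]]
print(radius_column.shape)           # (5, 1)

# Transposing turns the columns into rows, which can then be unpacked.
radius, height = containers.T
print(radius)   # [17  5  8 14 23]
print(height)   # [20  8 12 24  9]
```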


Now that we have our radii and heights in place, let's calculate the volume of each cylindrical container. For a cylinder of radius r and height h, the volume is V = πr²h:  

<img src="volume-cylinder.png"/>

So, using this formula

In [15]:
volume = np.pi*(radius**2)*height
volume

array([18158.40553775,   628.31853072,  2412.74315796, 14778.05184249,
       14957.12262374])

Here, each element in radius is first squared and scaled by pi (a constant that is also available from NumPy), and the result is then multiplied element by element with height. At the end, we have 5 values, one for the volume of each container in cubic centimeters, which solves our first sub-problem defined in (a).
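
To see that the array expression really works element by element, it can be compared with an explicit Python loop over the same numbers. This is only a cross-check, with the arrays typed in by hand from the outputs above; loop_volume is a name made up for this sketch.

```python
import math
import numpy as np

radius = np.array([17, 5, 8, 14, 23])
height = np.array([20, 8, 12, 24, 9])

# Vectorized: one expression for all containers.
volume = np.pi * (radius**2) * height

# Loop: the same formula applied to one (radius, height) pair at a time.
loop_volume = [math.pi * r**2 * h
               for r, h in zip(radius.tolist(), height.tolist())]

print(np.allclose(volume, loop_volume))  # True
```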

The second sub-problem defined in (b) can be solved by simply adding all the volumes.

In [16]:
total_volume = volume.sum()
total_volume

50934.64169265132

There is also an alternative way to compute the total volume without calculating the volumes of the individual containers: compute the dot product of the squared radii and the heights, then multiply by pi.

In [17]:
radius_squared = np.square(radius)
dot_product = np.dot(radius_squared, height)
total_volume_by_dot_product = np.pi*dot_product
total_volume_by_dot_product

50934.64169265132

The dot product works here because both radius_squared and height are one-dimensional arrays of the same length. For two-dimensional arrays (matrices), np.dot performs matrix multiplication, which requires the number of columns of the first matrix to equal the number of rows of the second matrix. So, if two non-square matrices have the same shape, one of them must be transposed before taking the dot product, for example with the np.transpose() function.

The last two steps above can also be combined into a single line of code:

In [18]:
np.pi*(np.dot(radius_squared,height))

50934.64169265132

This gives us the same total volume in cubic centimeters as above, which solves our second sub-problem.
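
The shape requirement for two-dimensional arrays mentioned earlier can be sketched with two small matrices of the same shape. The matrices a and b below are arbitrary examples and are not part of the container problem.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
b = np.array([[1, 0, 1], [0, 1, 0]])   # shape (2, 3)

# (2, 3) dot (2, 3): columns of a (3) do not match rows of b (2).
shapes_mismatch = False
try:
    np.dot(a, b)
except ValueError:
    shapes_mismatch = True
print(shapes_mismatch)  # True

# Transposing b gives shape (3, 2), so (2, 3) dot (3, 2) yields (2, 2).
product = np.dot(a, np.transpose(b))
print(product)
# [[ 4  2]
#  [10  5]]
```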

To find the maximum volume, simply use the max method:

In [19]:
volume.max()

18158.405537749004

And to find which container holds the maximum volume, use the argmax method, which returns the index of the largest value:

In [20]:
volume.argmax()

0

So, the first container (index 0) can hold the maximum volume of water, which solves our third sub-problem.

Similarly, for min volume, use the min method

In [21]:
volume.min()

628.3185307179587

And use the argmin method to find the index of the container holding the minimum volume of water:

In [22]:
volume.argmin()

1

So, the second container (index 1) can hold the least volume of water, which solves our fourth sub-problem.
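
Since argmax and argmin return positions, they can be used to look up the matching radius and height. The sketch below retypes the arrays from the outputs above, reports the largest and smallest containers, and also uses np.argsort to rank all five; the names largest, smallest and order are illustrative.

```python
import numpy as np

radius = np.array([17, 5, 8, 14, 23])
height = np.array([20, 8, 12, 24, 9])
volume = np.pi * (radius**2) * height

largest = volume.argmax()    # 0
smallest = volume.argmin()   # 1

# Indices start at 0, so add 1 for a human-friendly container number.
print(f"Largest: container {largest + 1}, "
      f"radius {radius[largest]} cm, height {height[largest]} cm")
print(f"Smallest: container {smallest + 1}, "
      f"radius {radius[smallest]} cm, height {height[smallest]} cm")

# argsort gives the indices ordered from smallest to largest volume.
order = np.argsort(volume)
print(order)  # [1 2 3 4 0]
```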

Finally, we compute the mean, median and standard deviation of the volumes of water held by the containers using the np.mean, np.median and np.std functions, respectively, which solves our fifth and final sub-problem.

In [23]:
np.mean(volume)

10186.928338530264

In [24]:
np.median(volume)

14778.051842486386

In [25]:
np.std(volume)

7199.758245451731
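
One detail worth knowing about np.std is that by default it computes the population standard deviation (dividing by N). The sketch below checks this against the formula written out by hand and shows the ddof=1 option, which divides by N - 1 to give the sample standard deviation. The arrays are retyped from the outputs above, and the variable names are made up for illustration.

```python
import numpy as np

radius = np.array([17, 5, 8, 14, 23])
height = np.array([20, 8, 12, 24, 9])
volume = np.pi * (radius**2) * height

population_std = np.std(volume)        # default ddof=0: divide by N
sample_std = np.std(volume, ddof=1)    # divide by N - 1

# Population standard deviation written out by hand.
manual_std = np.sqrt(np.mean((volume - volume.mean())**2))

print(np.isclose(population_std, manual_std))  # True
print(sample_std > population_std)             # True
```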