BIO-210: Applied software engineering for life sciences
# Python Introduction III - Numpy 2 and branching operations

## A deeper dive into Numpy
**Numpy** is a widely used Python library for scientific computing. During the last lesson you already learnt quite a few features of **Numpy**. Today, let's explore more features!

In [2]:
import numpy as np

### Slicing operations (refresh)

Let's review together how to index a multi-dimensional array using slicing.

In [None]:
a = np.arange(1,101).reshape(10,10)

print('By default, indexing with colon will return all rows and columns')
b = a[:,:]  #[all rows, all columns]
print(b)

print('We can define the start and the end of the indexed rows')
b = a[1:3,:]  #[start_idx : end_idx (excluded), all columns]
print(b)

print('or the start and the end of the indexed columns')
b = a[:,1:3]  #[all rows, start_idx : end_idx (excluded)]
print(b)

print('We can also specify the start and the end for both rows and columns')
b = a[4:7,1:3]  #[start_idx : end_idx, start_idx : end_idx]
print(b)

Sometimes, it can be useful to skip entries. This can be achieved by adding a second colon (:) followed by a step value: a step of k takes every k-th entry. We can therefore summarize all slicing operations with the notation [start_idx : end_idx : step], where end_idx is excluded and omitted values default to the start of the axis, the end of the axis and a step of 1.

In [None]:
print('Print every fourth row')
b = a[::4,:]
print(b)

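Slice bounds can also be negative, in which case they count from the end of the axis, and a negative step walks backwards. A minimal sketch on a smaller array (the name `m` is only for illustration):

```python
import numpy as np

m = np.arange(1, 13).reshape(3, 4)  # 3 rows, 4 columns: values 1..12

row_part = m[0, 1:3]         # end index excluded: columns 1 and 2 of row 0
last_row = m[-1, :]          # negative index counts from the end: last row
reversed_rows = m[::-1, :]   # negative step reverses the row order
every_other_col = m[:, ::2]  # step 2 keeps columns 0 and 2

print(row_part)         # [2 3]
print(last_row)         # [ 9 10 11 12]
print(reversed_rows)
print(every_other_col)
```
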
**Exercise 0.** Slice the matrix "a" and print all even numbers between 40 (excluded) and 70 (included).

In [None]:
b = a[4:7,1::2]  # rows 4-6 hold 41..70; every second column from column 1 keeps the even numbers
print(b)

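Slicing relies on knowing where the values sit in the matrix. An alternative that tests the values themselves is a boolean mask. Note that, unlike slicing, it returns a flat 1-D array instead of a 2-D block:

```python
import numpy as np

a = np.arange(1, 101).reshape(10, 10)

# Element-wise conditions combined with & (logical and); the parentheses are required
mask = (a > 40) & (a <= 70) & (a % 2 == 0)
evens = a[mask]  # 1-D array of the selected values, in row-major order
print(evens)

# Same values as the slicing solution, once the 2-D block is flattened
print(np.array_equal(evens, a[4:7, 1::2].ravel()))  # True
```
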
### Basic statistical functions
NumPy contains various statistical functions that are used for data analysis. These functions are useful, for example, to find the maximum or the minimum element of a vector, and to compute common statistics such as the standard deviation and the variance.

The functions <code>mean</code> and <code>std</code> calculate the mean and the standard deviation of the input data (e.g., of an array). Besides computing the result over the whole data, they can also compute it along a single axis, using the <code>axis</code> parameter.

In [None]:
a = np.array([[1, 2], [3, 4]])

print("The full matrix:\n", a)
print("The mean of the whole matrix is:", np.mean(a))
print("The standard deviation of the whole matrix is:", np.std(a))
print("The mean of each column is:", np.mean(a, axis=0))
print("The mean of each row is:", np.mean(a, axis=1))
print("The standard deviation of each column is:", np.std(a, axis=0))

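By default, <code>np.std</code> computes the population standard deviation, dividing by N. For the sample standard deviation, which divides by N - 1, pass `ddof=1`. A short check on the same 2x2 matrix:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

pop_std = np.std(a)             # divides by N = 4
sample_std = np.std(a, ddof=1)  # divides by N - 1 = 3
print(pop_std, sample_std)
```

Here the mean is 2.5 and the squared deviations sum to 5, so the two results are sqrt(5/4) and sqrt(5/3).
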
Now, let's generate a random array drawn from a Gaussian distribution N(3, 6.25), i.e., with mean 3 and variance 6.25. The function <code>random.randn</code> samples values from the standard Gaussian distribution N(0, 1). Therefore, to obtain samples from N(3, 6.25), we multiply them by the standard deviation (i.e., sqrt(6.25) = 2.5) and add the mean (i.e., 3).

In [None]:
a = 3 + 2.5 * np.random.randn(2, 4)

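The same kind of samples can also be drawn with NumPy's newer random Generator API (available since NumPy 1.17), where `normal` takes the mean (`loc`) and the standard deviation (`scale`, not the variance) directly. The seed 0 below is an arbitrary choice that only makes the draws reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=0)               # seeded generator
a = rng.normal(loc=3.0, scale=2.5, size=(2, 4))   # N(3, 6.25): scale = sqrt(6.25)
print(a.shape)  # (2, 4)

# Re-creating the generator with the same seed reproduces the same values
again = np.random.default_rng(seed=0).normal(loc=3.0, scale=2.5, size=(2, 4))
print(np.array_equal(a, again))  # True
```
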
**Exercise 1.** Calculate the mean and the standard deviation, first of the whole matrix <code>a</code> and then along axis 0 of the matrix <code>a</code>.

In [7]:
# Your code here
a = 3 + 2.5 * np.random.randn(2, 4)
print(a)
print("The mean of matrix a is", np.mean(a))
print("The mean of matrix a along axis 0 is", np.mean(a, axis=0))
print("The mean of matrix a along axis 1 is", np.mean(a, axis=1))
print("The standard deviation of matrix a is", np.std(a))
print("The standard deviation of matrix a along axis 0 is", np.std(a, axis=0))
print("The standard deviation of matrix a along axis 1 is", np.std(a, axis=1))

[[ 4.29002907 -0.63231583  0.63307483  4.43102626]
 [ 0.47199587 -0.86121914  0.02670929  4.73810919]]
The mean of matrix a is 1.6371761909541789
The mean of matrix a along axis 0 is [ 2.38101247 -0.74676749  0.32989206  4.58456772]
The mean of matrix a along axis 1 is [2.18045358 1.0938988 ]
The standard deviation of matrix a is 2.2586238533505734
The standard deviation of matrix a along axis 0 is [1.9090166  0.11445165 0.30318277 0.15354146]
The standard deviation of matrix a along axis 1 is [2.22606373 2.15803222]


Is it close to what you expect? How would you create another matrix <code>a</code>, in which the mean and the standard deviation are closer to the expected ones? 

In [None]:
# Your code here

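One way to get closer to the expected mean (3) and standard deviation (2.5) is to draw more samples: the empirical estimates fluctuate less as the sample size grows. A sketch using the Generator API with an arbitrary seed; the sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
results = {}
for n in (8, 800, 80_000):
    sample = 3 + 2.5 * rng.standard_normal(n)  # n samples from N(3, 6.25)
    results[n] = (sample.mean(), sample.std())
    print(f"n={n:>6}: mean={sample.mean():.3f}, std={sample.std():.3f}")
```

With 8 values (as in the 2x4 matrix) the estimates can be far off; with tens of thousands they typically agree with 3 and 2.5 to about two decimal places.
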

**Exercise 2.** Besides <code>mean</code> and <code>std</code>, **Numpy** also offers the functions <code>min</code>, <code>max</code>, <code>median</code>, <code>argmin</code> and <code>argmax</code>, which calculate the minimum, the maximum, the median, the index of the minimum and the index of the maximum of an array. Apply these functions to the matrix <code>a</code>, first over the whole matrix and then along its axis 0. Axes act like the coordinates of your array: axis 0 runs down the rows, so a reduction along axis 0 returns one value per column, while axis 1 runs across the columns and returns one value per row. Take a closer look at the example above to understand the importance of this parameter! If you still feel confused, check out [this article](https://www.sharpsightlabs.com/blog/numpy-axes-explained/#numpy-axes-quick-explanation).

In [11]:
# Your code here
a = 10 + 2.5 * np.random.randn(2, 3)
print(a)
print("The maximum of the matrix is", np.max(a))
print("The minimum of the matrix is", np.min(a))
print("The maximum of each row is", np.max(a, axis=1))
print("The minimum of each row is", np.min(a, axis=1))
print("The maximum of each column is", np.max(a, axis=0))
print("The minimum of each column is", np.min(a, axis=0))
# Without an axis, argmax/argmin return an index into the flattened array
print("The flat index of the maximum is", np.argmax(a))
print("The flat index of the minimum is", np.argmin(a))
print("The column index of the maximum of each row is", np.argmax(a, axis=1))
print("The column index of the minimum of each row is", np.argmin(a, axis=1))
print("The row index of the maximum of each column is", np.argmax(a, axis=0))
print("The row index of the minimum of each column is", np.argmin(a, axis=0))

[[ 5.15682241  8.3980997  10.87261743]
 [ 9.54077465 10.26787354  8.01260633]]
The maximum of the matrix is 10.872617433781269
The minimum of the matrix is 5.156822407597783
The maximum of each row is [10.87261743 10.26787354]
The minimum of each row is [5.15682241 8.01260633]
The maximum of each column is [ 9.54077465 10.26787354 10.87261743]
The minimum of each column is [5.15682241 8.3980997  8.01260633]
The flat index of the maximum is 2
The flat index of the minimum is 0
The column index of the maximum of each row is [2 1]
The column index of the minimum of each row is [0 2]
The row index of the maximum of each column is [1 1 0]
The row index of the minimum of each column is [0 0 1]

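When no axis is given, <code>argmax</code> and <code>argmin</code> return an index into the flattened array (2 above means the third element in row-major order). <code>np.unravel_index</code> converts such a flat index back into a (row, column) pair. The matrix below copies the rounded values printed above, purely for illustration:

```python
import numpy as np

a = np.array([[5.16, 8.40, 10.87],
              [9.54, 10.27, 8.01]])

flat_idx = np.argmax(a)                         # index into the flattened array
row, col = np.unravel_index(flat_idx, a.shape)  # convert to (row, column)
print(flat_idx, int(row), int(col), a[row, col])
```
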

**Numpy** also supports special floating-point values, such as **np.inf**, which represents infinity, and **np.nan**, which represents "not a number". These can result from operations such as division by 0:

In [12]:
a = np.array([0, 1, -4]) / 0
print("Dividing by 0 can generate np.nan or np.inf (also negative) as a result:", a)

Dividing by 0 can generate np.nan or np.inf (also negative) as a result: [ nan  inf -inf]



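Dividing by zero also makes NumPy emit a RuntimeWarning. When such results are expected, one option is to silence the warnings locally with <code>np.errstate</code> and then locate the special values with <code>np.isnan</code>, <code>np.isinf</code> and <code>np.isfinite</code>:

```python
import numpy as np

# Suppress the "divide by zero" and "invalid value" warnings inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    a = np.array([0.0, 1.0, -4.0]) / 0

print(np.isnan(a))     # [ True False False]
print(np.isinf(a))     # [False  True  True]
print(np.isfinite(a))  # [False False False]
```
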

Most operations, such as <code>mean</code>, also return **np.nan** when applied to data containing np.nan:

In [13]:
a = [0, np.nan, 1]
print("The mean of a vector with a NaN is: ", np.mean(a))

The mean of a vector with a NaN is:  nan


However, **Numpy** offers functions that ignore NaNs, such as <code>nanmax</code>, <code>nanmin</code> and <code>nanmean</code>. Let's create an array including NaN values and test these functions.

**Exercise 3.** Apply the following NumPy functions to the array a: <code>amax</code> and <code>amin</code> (aliases of <code>max</code> and <code>min</code>), then <code>nanmax</code> and <code>nanmin</code>.

In [14]:
a = np.array([1, 2, np.nan, np.inf])
# Your code here
print("amax:", np.amax(a), "| amin:", np.amin(a), "| nanmax:", np.nanmax(a), "| nanmin:", np.nanmin(a), "| nanmean:", np.nanmean(a))

amax: nan | amin: nan | nanmax: inf | nanmin: 1.0 | nanmean: inf

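<code>nanmean</code> ignores NaN but not infinity, which is why it still returns inf above. To ignore both, one option is to keep only the finite values with <code>np.isfinite</code> before computing the statistic:

```python
import numpy as np

a = np.array([1, 2, np.nan, np.inf])

finite = a[np.isfinite(a)]  # keep only values that are neither NaN nor +/-inf
print(finite)               # [1. 2.]
print(np.mean(finite))      # 1.5
```
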

**Exercise 4.** We want to write some code which, given a point, finds the closest one in a set of other points. Such a function is important, for example, in information theory, as it is the basic operation of the vector quantization (VQ) algorithm. In the simple, two-dimensional case shown below, the values refer to the weight and height of an athlete. Each [weight, height] pair in the set represents a different class of athletes, and we want to assign the athlete to the class it is closest to. Finding the closest point requires calculating the distance (e.g., the Euclidean distance) between the athlete's parameters and each of the classes.
Now, let's define an athlete with [weight, height] = [111.0, 188.0] and a list of 4 classes [[102.0, 203.0], [132.0, 193.0], [45.0, 155.0], [57.0, 173.0]]. In the next cell, write some code that returns the index of the class the athlete should be assigned to.

In [29]:
# Your code here
athlete = np.array([111.0, 188.0])  # [weight, height]
classes = np.array([[102.0, 203.0], [132.0, 193.0], [45.0, 155.0], [57.0, 173.0]])

# Euclidean distance from the athlete to every class
# (broadcasting subtracts the athlete from each row of classes)
distances = np.sqrt(np.sum((classes - athlete) ** 2, axis=1))

# The closest class is the one with the smallest distance
closest = np.argmin(distances)
print("The athlete belongs to class", closest)

The athlete belongs to class 0

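The same idea extends to several athletes at once, which is the assignment step of vector quantization. A sketch with a hypothetical helper `assign_to_closest` and two extra, made-up athletes, using broadcasting to compute all distances in one go:

```python
import numpy as np


def assign_to_closest(points, centroids):
    """Hypothetical helper: for each row of `points`, return the index of the
    nearest row of `centroids` (Euclidean distance)."""
    # points: (n, d), centroids: (k, d) -> differences: (n, k, d)
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)  # shape (n, k)
    return np.argmin(distances, axis=1)


classes = np.array([[102.0, 203.0], [132.0, 193.0], [45.0, 155.0], [57.0, 173.0]])
athletes = np.array([[111.0, 188.0], [50.0, 160.0], [130.0, 190.0]])  # illustrative
labels = assign_to_closest(athletes, classes)
print(labels)  # [0 2 1]
```
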
### Linear algebra examples
Linear algebra is at the core of data science, which is why **NumPy** offers dedicated array data structures, operations and methods for it. Let's first have a look together at the <code>dot</code> function, which computes the dot product of two vectors or the matrix product of two matrices.

In [32]:
a = np.array([[1,2,3],[2,0,3],[7,-5,1]])
b = np.array([[3,-1,5],[-2,-6,4], [0,4,4]])
print('b @ a: \n', np.dot(b,a))
print('b @ a: \n', b.dot(a))

b @ a: 
 [[ 36 -19  11]
 [ 14 -24 -20]
 [ 36 -20  16]]
b @ a: 
 [[ 36 -19  11]
 [ 14 -24 -20]
 [ 36 -20  16]]

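Since Python 3.5, the `@` operator performs matrix multiplication and gives the same result as <code>np.dot</code> for 2-D arrays. The order of the operands matters, as this check on the same matrices shows:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 0, 3], [7, -5, 1]])
b = np.array([[3, -1, 5], [-2, -6, 4], [0, 4, 4]])

print(np.array_equal(b @ a, np.dot(b, a)))  # True: same operation
print(np.array_equal(a @ b, b @ a))         # False: not commutative
```
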

**Exercise 5.** Define two random matrices, a and b, of size 4x2. Transpose b and save in c the matrix product of a and the transposed b.

In [37]:
# Your code here
a = np.random.rand(4, 2)
b = np.random.rand(4, 2)
b_t = np.transpose(b)  # equivalent to b.T, shape (2, 4)
c = a.dot(b_t)         # (4x2) @ (2x4) -> 4x4 matrix product
print(a, "", b, "", b_t)
print("c = a @ b.T", c)


[[0.27553879 0.24324438]
 [0.9949545  0.15818415]
 [0.583281   0.97727141]
 [0.11604473 0.94235866]]  [[8.52850117e-04 3.74447418e-01]
 [9.32978326e-01 1.92654505e-01]
 [9.62119586e-01 7.64698189e-01]
 [3.03515765e-01 9.58838356e-01]]  [[8.52850117e-04 9.32978326e-01 9.62119586e-01 3.03515765e-01]
 [3.74447418e-01 1.92654505e-01 7.64698189e-01 9.58838356e-01]]
c = a @ b.T [[0.09131722 0.30393384 0.4511098  0.31686241]
 [0.06008019 0.95874587 1.07822835 0.45365741]
 [0.36643421 0.73246427 1.30850375 1.11408029]
 [0.35296274 0.28981686 0.83226886 0.93879103]]


**Exercise 6.** Can the c matrix be inverted? Check by computing its determinant and, if the inverse exists, compute it.

In [3]:
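# Your code here
# Example of a singular 4x4 matrix (its rows are linearly dependent)
c = np.arange(0, 16).reshape(4, 4)
print(c)

# Compare with a tolerance: a floating-point determinant is rarely exactly 0
if not np.isclose(np.linalg.det(c), 0):
    print("The inverse is", np.linalg.inv(c))
else:
    print("The determinant is 0: c cannot be inverted")

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
The determinant is 0: c cannot be inverted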

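The matrix c = a @ b.T from Exercise 5 is a special case: it is the product of a 4x2 and a 2x4 matrix, so its rank is at most 2 and it can never be inverted, whatever random values a and b contain. A sketch that checks this with <code>np.linalg.matrix_rank</code> (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((4, 2))
b = rng.random((4, 2))
c = a @ b.T  # (4x2) @ (2x4) -> 4x4 matrix of rank at most 2

print("shape:", c.shape)
print("rank:", np.linalg.matrix_rank(c))
print("determinant:", np.linalg.det(c))  # numerically zero, up to rounding error
```
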

**Exercise 7.** Using the inverse matrix and the matrix-multiplication operator, you can now solve a matrix-vector equation. Find the vector x that solves the equation Ax = b, given A = [[2,1,-2],[3,0,1],[1,1,-1]] and b = [-3,5,-2].

In [8]:
# Your code here
A = np.array([[2,1,-2],[3,0,1],[1,1,-1]])
B = np.array([-3,5,-2])
print(A, B)

# Only invert A if its determinant is (numerically) non-zero
if not np.isclose(np.linalg.det(A), 0):
    A_inv = np.linalg.inv(A)
    print("The inverse is", A_inv)
    print("x is:", A_inv.dot(B))
else:
    print("The determinant is 0: A cannot be inverted")


[[ 2  1 -2]
 [ 3  0  1]
 [ 1  1 -1]] [-3  5 -2]
The inverse is [[ 2.50000000e-01  2.50000000e-01 -2.50000000e-01]
 [-1.00000000e+00  1.11022302e-16  2.00000000e+00]
 [-7.50000000e-01  2.50000000e-01  7.50000000e-01]]
x is: [ 1. -1.  2.]


**Exercise 8.** Computing the explicit inverse is slower and numerically less accurate than solving the system directly. Therefore, it is better to take advantage of the optimized **NumPy** solver. Solve the same exercise as before, but use <code>linalg.solve</code> to compute x.

In [9]:
# Your code here
A = np.array([[2,1,-2],[3,0,1],[1,1,-1]])
B= np.array([-3,5,-2])
print(A, B)
X = np.linalg.solve(A,B)
print(X)

[[ 2  1 -2]
 [ 3  0  1]
 [ 1  1 -1]] [-3  5 -2]
[ 1. -1.  2.]

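A quick way to verify a solution is to substitute it back into the equation. <code>np.allclose</code> compares with a floating-point tolerance, which is safer than `==` after numerical computations:

```python
import numpy as np

A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]], dtype=float)
b = np.array([-3, 5, -2], dtype=float)

x = np.linalg.solve(A, b)  # solves Ax = b without forming the inverse
print(x)
print(np.allclose(A @ x, b))                 # True: substituting x back gives b
print(np.allclose(x, np.linalg.inv(A) @ b))  # True: same answer as Exercise 7
```
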

## Branching operations

### *if*, *else* and *elif*
In Python, as in C-like languages, branching is implemented with the **if** keyword. If the condition is true, the indented block that follows it is executed. Optionally, the *else* keyword specifies the block to execute when the condition is false. Both the **if** and the **else** lines must end with a colon (:) and, unlike C, Python uses indentation (not braces) to delimit the blocks, as in the following example:

In [None]:
r = np.random.randn()
if r > 0:
    print("The random number is positive")
else:
    print("The random number is negative")

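Two related patterns can be handy. A short if/else that only picks a value can be written as a one-line conditional expression, and, since `if` needs a single True/False, NumPy's <code>np.where</code> applies a condition element-wise to a whole array. A minimal sketch with illustrative values:

```python
import numpy as np

r = np.random.randn()
# Conditional expression: value_if_true if condition else value_if_false
sign = "positive" if r > 0 else "negative"
print(f"The random number is {sign}")

# Element-wise branching on an array
values = np.array([-1.5, 0.3, 2.0, -0.1])
labels = np.where(values > 0, "positive", "negative")
print(labels)  # ['negative' 'positive' 'positive' 'negative']
```
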
In case you want to create multiple branches by applying more than one condition, you can use the keyword **elif** as in the following example:

In [11]:
animal = "crui"

if animal == "cat":
    print("meow")
elif animal == "dog":
    print("woof")
elif animal == "cow":
    print("moo")
else:
    print(f"I don't know the {animal}'s call, sorry :(")

I don't know the crui's call, sorry :(


**Exercise 9.** Let's implement a calculator using **if**, **else** and **elif**. The beginning of the calculator is already written below: when you run the cell, you will be asked to enter a, b and the option. Complete the calculation.

In [19]:
print("Welcome to CALCULATOR!")

a = float(input("Enter the first number: "))
b = float(input("Enter the second number: "))

print("Choose one of the following operations:")
print("1 - addition")
print("2 - subtraction")
print("3 - multiplication")
print("4 - division")
option = 0
# Keep asking until the user enters a valid option (1, 2, 3 or 4)
while option not in (1, 2, 3, 4):
    print("Your number must be selected between 1 and 4")
    option = int(input(""))

result = 0

if option == 1:
    result = a + b
    print(a, " + ", b, " is equal to :")
elif option == 2:
    result = a - b
    print(a, " - ", b, " is equal to :")
elif option == 3:
    result = a * b
    print(a, " * ", b, " is equal to :")
elif b != 0:
    result = a / b
    print(a, "/", b, " is equal to :")
else:
    result = float("nan")
    print("Division by zero is not defined")

# result now holds the value of the chosen operation
print("The result is ", result)

Welcome to CALCULATOR!
Choose one of the following operations:
1 - addition
2 - subtraction
3 - multiplication
4 - division
Your number must be selected between 1 and 4
28.0  *  18.0  is equal to :
The result is  504.0

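For comparison, a long if/elif chain that only selects an operation can also be written as a dictionary lookup. A sketch with a hypothetical `OPERATIONS` mapping and `calculate` function built on the standard <code>operator</code> module; note that, unlike the version above, it lets <code>operator.truediv</code> raise ZeroDivisionError when b is 0:

```python
import operator

# Hypothetical mapping from menu option to the corresponding binary function
OPERATIONS = {
    1: operator.add,
    2: operator.sub,
    3: operator.mul,
    4: operator.truediv,
}


def calculate(a, b, option):
    """Apply the operation selected by `option` (1-4) to a and b."""
    if option not in OPERATIONS:
        raise ValueError("option must be 1, 2, 3 or 4")
    return OPERATIONS[option](a, b)


print(calculate(28.0, 18.0, 3))  # 504.0
```
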

### Break and continue

The **break** statement in Python terminates the current loop and resumes execution at the first statement after the loop, just like the traditional *break* found in C. The **continue** statement, on the other hand, skips the remaining code in the current iteration and jumps to the next iteration of the loop.

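A small sketch with an illustrative list of numbers: `continue` skips the negative values, while `break` stops the loop entirely at the first zero, so the values after it (5 and -1) are never visited:

```python
numbers = [4, -2, 7, 0, 5, -1]
total = 0
for n in numbers:
    if n < 0:
        continue  # skip negative values and go to the next iteration
    if n == 0:
        break     # leave the loop at the first zero
    total += n
print(total)  # 11 (4 + 7)
```
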
**Exercise 10.** Use a for loop and continue to remove all the "h"s from the string "hello, haha, python".

In [29]:
# Your code here
word = "hello, haha, python"
new_word = ""
for letter in word:
    if letter == "h":
        continue  # skip this "h" and jump to the next letter
    new_word += letter
print(new_word)

ello, aa, pyton



**Exercise 11.** Use a for loop and break to keep only the letters before the first "p" in the string "hello, haha, python".

In [30]:
# Your code here
word = "hello, haha, python"
prefix = ""
for letter in word:
    if letter == "p":
        break  # stop the loop at the first "p"
    prefix += letter
print(prefix)

hello, haha, 

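For comparison, Python strings already provide methods that do both tasks without an explicit loop; the loops above are mainly an exercise in `continue` and `break`:

```python
text = "hello, haha, python"

# str.replace removes every occurrence of "h" in one call
print(text.replace("h", ""))  # ello, aa, pyton

# str.partition splits at the first "p"; element 0 is everything before it
before_p = text.partition("p")[0]
print(repr(before_p))         # 'hello, haha, '
```
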