# Lesson 5: 
## An Intro to Scientific Computing and Data Visualization 

## 5.5 Transforming data

In this notebook we learn different operations that we can apply to NumPy arrays.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### 5.5.1 Operations

#### Element-wise arithmetic

Element-wise arithmetic also extends to 2D arrays.

In [3]:
A=np.array(
    [[2,7,6],
     [9,5,1],
     [4,3,8]])
B=np.array(
    [[1,1,1],
     [2,2,2],
     [3,3,3]    
    ]
)

In [4]:
print(A+B)

[[ 3  8  7]
 [11  7  3]
 [ 7  6 11]]


In [5]:
print(A-B)

[[ 1  6  5]
 [ 7  3 -1]
 [ 1  0  5]]


In [6]:
print(A*B)

[[ 2  7  6]
 [18 10  2]
 [12  9 24]]


In [7]:
print(A/B)

[[2.         7.         6.        ]
 [4.5        2.5        0.5       ]
 [1.33333333 1.         2.66666667]]


#### Vectorized operations

Sometimes we would like to apply an operation to all the entries of an array. The plain Python way to do this is to run a `for` loop over all entries.
NumPy allows us to operate on all the entries more efficiently through a process called *vectorization*, which we illustrate below.
You can learn more about vectorization here: https://www.youtube.com/watch?v=qsIrQi0fzbYl.
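
As a rough illustration of the difference, the sketch below adds 3 to every entry of a small array twice: once with explicit `for` loops and once with a single vectorized expression. The names `looped` and `vectorized` are chosen here for illustration only.

```python
import numpy as np

A = np.array([[2, 7, 6],
              [9, 5, 1],
              [4, 3, 8]])

# Loop version: visit every entry one at a time.
looped = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        looped[i, j] = A[i, j] + 3

# Vectorized version: one expression acts on the whole array.
vectorized = A + 3

print(np.array_equal(looped, vectorized))  # True
```

Both versions give the same array; the vectorized one is shorter and, on large arrays, typically much faster because the loop runs inside NumPy's compiled code.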

In [8]:
# First we initialize an array
A=np.array(
    [[2,7,6],
     [9,5,1],
     [4,3,8]])
print(A)

[[2 7 6]
 [9 5 1]
 [4 3 8]]


Here are the vectorized ways to add, multiply and divide by a constant:

In [99]:
print(A+3)

[[ 5 10  9]
 [12  8  4]
 [ 7  6 11]]


In [100]:
print(A*2)

[[ 4 14 12]
 [18 10  2]
 [ 8  6 16]]


In [101]:
print(A/2)

[[1.  3.5 3. ]
 [4.5 2.5 0.5]
 [2.  1.5 4. ]]


To apply $\sqrt{x}$, $\sin(x)$, $\cos(x)$, $\log(x)$, $e^x$ to all the entries, we do the following:

In [103]:
np.sqrt(A)

array([[1.41421356, 2.64575131, 2.44948974],
       [3.        , 2.23606798, 1.        ],
       [2.        , 1.73205081, 2.82842712]])

In [104]:
np.sin(A)

array([[ 0.90929743,  0.6569866 , -0.2794155 ],
       [ 0.41211849, -0.95892427,  0.84147098],
       [-0.7568025 ,  0.14112001,  0.98935825]])

In [105]:
np.cos(A)

array([[-0.41614684,  0.75390225,  0.96017029],
       [-0.91113026,  0.28366219,  0.54030231],
       [-0.65364362, -0.9899925 , -0.14550003]])

In [106]:
np.log(A)

array([[0.69314718, 1.94591015, 1.79175947],
       [2.19722458, 1.60943791, 0.        ],
       [1.38629436, 1.09861229, 2.07944154]])

In [108]:
np.exp(A)

array([[7.38905610e+00, 1.09663316e+03, 4.03428793e+02],
       [8.10308393e+03, 1.48413159e+02, 2.71828183e+00],
       [5.45981500e+01, 2.00855369e+01, 2.98095799e+03]])

An extensive list of the vectorized operations that NumPy supports can be found here: https://numpy.org/doc/stable/reference/routines.math.html

#### Aggregates

Aggregates are functions that summarize some feature of our data. Examples of aggregates are the mean, total sum, median, maximum element, minimum element, variance and standard deviation. In NumPy these correspond, respectively, to the functions `np.mean()`, `np.sum()`, `np.median()`, `np.max()`, `np.min()`, `np.var()`, `np.std()`. An extended list can be found at https://jakevdp.github.io/PythonDataScienceHandbook/02.04-computation-on-arrays-aggregates.html. 

In [11]:
# Some 2D array
print(A)

[[2 7 6]
 [9 5 1]
 [4 3 8]]


In [111]:
np.mean(A)

5.0
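
The remaining aggregates listed above are called in the same way. The following sketch applies them to the same 3×3 array; the array is redefined so that the cell runs on its own.

```python
import numpy as np

A = np.array([[2, 7, 6],
              [9, 5, 1],
              [4, 3, 8]])

print(np.sum(A))     # 45
print(np.median(A))  # 5.0
print(np.max(A))     # 9
print(np.min(A))     # 1
print(np.var(A))     # about 6.667
print(np.std(A))     # about 2.582
```

Note that, by default, `np.var()` and `np.std()` divide by the number of entries (the population formulas) rather than by one less than that.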

In [3]:
# Here we load a NumPy array containing the price of the listings 
airbnb_df=pd.read_csv("airbnb_data.csv")
airbnb_prices=airbnb_df['price'].values 

In [6]:
print("Average value:", np.mean(airbnb_prices))

Average value: 103.36912286110683


In [8]:
print("Max value", np.max(airbnb_prices))

Max value 22400


In [7]:
print("Min value", np.min(airbnb_prices))

Min value 0


#### User defined operations

Using the previous operations we can define our own functions. For instance, we can write our own vectorized version of the mean function.

In [114]:
def my_mean(array):
    n=array.size
    return np.sum(array)/n

my_mean(A)
    

5.0

**Quiz:** Create a user-defined function that computes the standard deviation of an array.

### 5.5.2 Linear algebra operations

NumPy also has plenty of functions for matrix operations. We will only cover the basic ones; for a full list, see https://numpy.org/doc/stable/reference/routines.linalg.html.


#### Transpose

Let us first look at the transpose of a matrix, which swaps its rows and columns.

This can be achieved in NumPy in several ways, for example with `np.transpose`:

In [145]:
A=np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])
print(A)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [146]:
np.transpose(A)

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
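
Two other common spellings are the `transpose()` method and the `.T` attribute of an array. The sketch below checks that all three agree on the same matrix; the names `t1`, `t2` and `t3` are just illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Three equivalent ways of writing the transpose.
t1 = np.transpose(A)
t2 = A.transpose()
t3 = A.T

print(np.array_equal(t1, t2) and np.array_equal(t1, t3))  # True
```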

#### Matrix multiplication

Now let's look at matrix multiplication. Instead of defining it formally, we recall it through the following example:

$$
\begin{pmatrix}
1 & 2 & 3\\
2 & 3 & 4\\
3 & 4 & 5
\end{pmatrix}\cdot
\begin{pmatrix}
1 & 3 \\
2 & 4 \\
3 & 5
\end{pmatrix}
= 
\begin{pmatrix}
1\cdot 1+2\cdot 2+3\cdot 3 & 1\cdot 3+2\cdot 4+3\cdot 5\\
2\cdot 1+3\cdot 2+4\cdot 3 & 2\cdot 3+3\cdot 4+4\cdot 5\\
3\cdot 1+4\cdot 2+5\cdot 3 & 3\cdot 3+4\cdot 4+5\cdot 5
\end{pmatrix}
=\begin{pmatrix}
14 & 26\\
20 & 38\\
26 & 50
\end{pmatrix}
$$

We perform this in NumPy with the `np.matmul(A,B)` function.

In [3]:
A=np.array([
    [1,2,3],
    [2,3,4],
    [3,4,5]  
])
B=np.array([
    [1,3],
    [2,4],
    [3,5]
])

np.matmul(A,B)

array([[14, 26],
       [20, 38],
       [26, 50]])

One can also use the `@` operator to achieve the same result:

In [4]:
print(A@B)

[[14 26]
 [20 38]
 [26 50]]
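
Two points are easy to miss here: the inner dimensions of the two matrices must agree, and `*` is not matrix multiplication. The following sketch, using the same `A` and `B`, illustrates both; the variable names `product` and `reversed_ok` are only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])
B = np.array([[1, 3],
              [2, 4],
              [3, 5]])

# (3, 3) @ (3, 2): inner dimensions agree, so the result is (3, 2).
product = A @ B
print(product.shape)  # (3, 2)

# (3, 2) @ (3, 3): inner dimensions 2 and 3 differ, so NumPy raises an error.
try:
    B @ A
    reversed_ok = True
except ValueError as err:
    reversed_ok = False
    print("B @ A failed:", err)

# `*` multiplies entry by entry; it is not a matrix product.
print(np.array_equal(A * A, A @ A))  # False
```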


#### A note on 1D arrays vs $(n,1)$-vectors 

Since in linear algebra we often deal with matrix-vector multiplications, it is important to clarify how to work with vectors in NumPy. In mathematics, an $n$-dimensional vector is usually regarded as an $(n,1)$-array like the following:
$$
\begin{pmatrix}
v_1\\
v_2\\
\vdots\\
v_n
\end{pmatrix}
$$

Such a vector is also called a *column vector*. If it is instead written horizontally,
$$
\begin{pmatrix}
v_1 & v_2 & \cdots & v_n
\end{pmatrix}
$$

it is called a *row vector*.

*Are NumPy 1D arrays column or row vectors?*

The answer is neither. If we create three arrays...

In [5]:
a=np.array([1,2,3])
b=np.array([[1],
            [2],
            [3]
           ])
c=np.array([[1,2,3]])

... transposing them has the following effects:

In [6]:
np.transpose(a)

array([1, 2, 3])

In [7]:
np.transpose(b)

array([[1, 2, 3]])

In [8]:
np.transpose(c)

array([[1],
       [2],
       [3]])

Transposing changes the last two arrays but not the first one. Indeed, if we inspect the shape of a 1D array, we get the following, perhaps unexpected, answer:

In [134]:
a.shape

(3,)

compared to 

In [135]:
b.shape

(3, 1)

or

In [136]:
c.shape

(1, 3)
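
The difference in shape also affects matrix products. As a small sketch (the matrix `M` is a hypothetical example, not part of the cells above), compare the shapes that `@` returns for each kind of array:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])
a = np.array([1, 2, 3])        # shape (3,)
b = np.array([[1], [2], [3]])  # shape (3, 1), a column vector
c = np.array([[1, 2, 3]])      # shape (1, 3), a row vector

print((M @ a).shape)  # (3,)
print((M @ b).shape)  # (3, 1)
print((c @ b).shape)  # (1, 1)
print((b @ c).shape)  # (3, 3)
print(a @ a)          # 14
```

With 2D arrays, row times column and column times row give results of different shapes, as in linear algebra. With the 1D array, `a @ a` collapses to a single number, and there is no way to tell the two products apart.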

The moral here is that if you are doing linear algebra, do not use 1D arrays. Try to always convert them into 2D arrays. This can always be achieved with the `np.reshape` function:

In [141]:
a=np.reshape(a,(3,1))
print(a)

[[1]
 [2]
 [3]]
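
When the length of the array is not known in advance, passing `-1` for one dimension lets NumPy infer it. The sketch below shows this, together with indexing by `np.newaxis`, as alternative ways to get a column or row vector; the names `col`, `col2` and `row` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])  # 1D array with shape (3,)

col = np.reshape(a, (-1, 1))  # -1 lets NumPy infer the length: shape (3, 1)
col2 = a[:, np.newaxis]       # same result using indexing
row = np.reshape(a, (1, -1))  # shape (1, 3)

print(col.shape, col2.shape, row.shape)  # (3, 1) (3, 1) (1, 3)
print(np.array_equal(col.T, row))        # True
```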


For more information, you can watch Andrew Ng's discussion of this topic: https://www.youtube.com/watch?v=V2QlTmh6P2Y.