# Section 2 - NumPy
## Author: Gustavo Amarante

NumPy is a fundamental library for scientific computing. It is especially useful because it provides a **multidimensional array object** (essentially matrices and linear algebra), along with mathematical functions, statistical methods and random number generators, all of them optimized for speed.

Here, we will go through the most important features. You can find more details in this [quickstart tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html) or deep dive in the [full numpy manual](https://docs.scipy.org/doc/numpy/contents.html).

Since `numpy` is not part of Python's standard library, we have to import it. The `np` is the **alias** of the library, a way to shorten the code.

In [1]:
import numpy as np

## Mathematical Constants and Computations

In [2]:
np.pi

3.141592653589793

In [3]:
np.e

2.718281828459045

In [4]:
np.sin(np.pi / 2)

1.0

In [5]:
np.mean([2, 3, 4, 5])

3.5

## Array Creation
Arrays are the main advantage of the numpy library. Arrays can be created with the following structure.

In [6]:
np.array([2, 3, 4])

array([2, 3, 4])

Notice that `np.array()` takes the data as **one argument**: a single list of elements, not the elements passed separately.

In [7]:
np.array([3, 5, 7])  # RIGHT

np.array(3, 5, 7)  # WRONG

ValueError: only 2 non-keyword arguments accepted

To create a **multidimensional array**, you need to give a "list of lists".

In [8]:
my_array = np.array([[2, 3, 4] ,[6, 7, 8]])
my_array

array([[2, 3, 4],
       [6, 7, 8]])

After creating an array you can check its **attributes**.

In [9]:
my_array.shape

(2, 3)

In [10]:
my_array.ndim

2
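Two other attributes are often useful alongside `shape` and `ndim`. The sketch below assumes the same `my_array` as above and shows `size` and `dtype`, plus the optional `dtype` keyword of `np.array()`. The variable names are only illustrative.

```python
import numpy as np

my_array = np.array([[2, 3, 4], [6, 7, 8]])

n_elements = my_array.size      # total number of elements (2 * 3)
element_type = my_array.dtype   # data type shared by all elements

# A dtype can also be requested explicitly when the array is created.
float_array = np.array([[2, 3, 4], [6, 7, 8]], dtype=float)

print(n_elements)
print(element_type)
print(float_array.dtype)
```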

Although they are a bit counterintuitive at first, you can create arrays with more than 2 dimensions. For example, to create a 3-dimensional array you need a *"list of lists of lists"*.

In [11]:
my_3d_array = np.array([[[2, 3, 4],[6, 7, 8]],[[2, 3, 4],[6, 7, 8]],[[2, 3, 4] ,[6, 7, 8]]])
my_3d_array

array([[[2, 3, 4],
        [6, 7, 8]],

       [[2, 3, 4],
        [6, 7, 8]],

       [[2, 3, 4],
        [6, 7, 8]]])

In [12]:
my_3d_array.shape

(3, 2, 3)

In [13]:
my_3d_array.ndim

3

Of course, we are not going to create these multidimensional structures by hand. We will eventually have programs that create them for us.

## Special Matrices
NumPy has functions that create some important arrays. When specifying the shape of a multidimensional array, you should use a tuple.

In [14]:
np.ones((5, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [15]:
np.zeros((4, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [16]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [17]:
5 * np.eye(4)

array([[5., 0., 0., 0.],
       [0., 5., 0., 0.],
       [0., 0., 5., 0.],
       [0., 0., 0., 5.]])

## Linearly Spaced Values
The built-in `range` function only produces sequences of integers. NumPy has a similar function called `np.arange()` that accepts floats as arguments and can return floats. Its arguments follow the Pythonic interval convention: the first argument is included and the second is excluded.

In [18]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
np.arange(3, 10)

array([3, 4, 5, 6, 7, 8, 9])

In [20]:
np.arange(3, 10, 0.25)

array([3.  , 3.25, 3.5 , 3.75, 4.  , 4.25, 4.5 , 4.75, 5.  , 5.25, 5.5 ,
       5.75, 6.  , 6.25, 6.5 , 6.75, 7.  , 7.25, 7.5 , 7.75, 8.  , 8.25,
       8.5 , 8.75, 9.  , 9.25, 9.5 , 9.75])
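When the number of points matters more than the step size, `np.linspace()` is a common alternative. The sketch below contrasts it with `np.arange()`; unlike `arange`, `linspace` includes the end point by default.

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, both end points included
grid = np.linspace(0, 1, 5)
print(grid)   # [0.   0.25 0.5  0.75 1.  ]

# Same spacing with arange: the stop value 1 is excluded
steps = np.arange(0, 1, 0.25)
print(steps)  # [0.   0.25 0.5  0.75]
```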

## Reshaping Arrays
The `.reshape()` method takes the new shape, usually given as a **tuple**.

In [21]:
np.arange(1, 2, 0.3).reshape((2, 2))

array([[1. , 1.3],
       [1.6, 1.9]])

In [22]:
a = np.arange(30)
a.shape

(30,)

When reshaping an array, you must give a valid new shape, one where all values fit. If you do not know the correct size of one dimension, use `-1` for it and numpy will figure out the correct size for you.

In [23]:
a.shape = 5, -1      # -1 means "whatever size is needed"
a

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])
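Assigning to `a.shape` changes the array in place. As a minimal sketch of the alternative, `.reshape()` returns a reshaped result and leaves the original shape untouched. It also shows what happens when the requested shape cannot hold all the values.

```python
import numpy as np

a = np.arange(30)

b = a.reshape((5, -1))   # NumPy infers 6 columns, since 30 / 5 = 6
c = a.reshape(-1, 10)    # the shape may also be passed as separate integers

print(b.shape)  # (5, 6)
print(c.shape)  # (3, 10)
print(a.shape)  # (30,)  the original array keeps its shape

# A shape that cannot hold all 30 elements raises a ValueError
try:
    a.reshape((4, -1))
except ValueError as err:
    print("invalid shape:", err)
```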

## Basic Array Operations
Arithmetic operations apply **elementwise**.

In [24]:
a = np.array([20, 30, 40, 50])
print(a)

b = np.arange(4)
print(b)

[20 30 40 50]
[0 1 2 3]


In [25]:
a + b

array([20, 31, 42, 53])

In [26]:
a*b

array([  0,  30,  80, 150])

In [27]:
a**b

array([     1,     30,   1600, 125000])

## Linear Algebra

Linear algebra operations between arrays require specific commands.

In [28]:
A = np.array([[1, 1], 
              [0, 1]])

B = np.array([[2, 0], 
              [3, 4]])

Traditional matrix multiplication (dot product) can be done in two ways:

In [29]:
np.dot(A, B)

array([[5, 4],
       [3, 4]])

In [30]:
A.dot(B)

array([[5, 4],
       [3, 4]])
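Python also provides the `@` operator for matrix multiplication, which many people find easier to read in longer formulas. The sketch below uses the same `A` and `B` and contrasts it with the elementwise `*`.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiplies matching entries
matrix_prod = A @ B   # matrix product, same result as np.dot(A, B)

print(elementwise)
print(matrix_prod)
```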

The transpose of a matrix can be obtained in three ways:

In [31]:
A.transpose()

array([[1, 0],
       [1, 1]])

In [33]:
np.transpose(A)

array([[1, 0],
       [1, 1]])

In [34]:
A.T

array([[1, 0],
       [1, 1]])

The NumPy library contains more advanced linear algebra functions, like the Cholesky decomposition or finding eigenvalues and eigenvectors. They are inside a separate **sublibrary** called `linalg`.

In [35]:
Sigma = np.array([[1.0, 0.3, 0.7],
                  [0.3, 1.0, 0.5],
                  [0.7, 0.5, 1.0]])

np.linalg.cholesky(Sigma)

array([[1.        , 0.        , 0.        ],
       [0.3       , 0.9539392 , 0.        ],
       [0.7       , 0.3040026 , 0.64620617]])

In [36]:
values, vectors = np.linalg.eig(Sigma)

print(values)
print(' ') # prints a blank line
print(vectors)

[2.0179834  0.26126421 0.7207524 ]
 
[[-0.58739683 -0.60528479  0.53721065]
 [-0.4896118  -0.26275949 -0.83140708]
 [-0.64439526  0.75139056  0.14201048]]


In [37]:
print(np.diag(Sigma))
print(np.trace(Sigma))
print(np.linalg.det(Sigma))

[1. 1. 1.]
3.0
0.38000000000000006
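A quick way to build confidence in these results is to check the identities they should satisfy. The sketch below, using the same `Sigma`, verifies the Cholesky factorization, the eigenvalue equation, and the links between eigenvalues, trace and determinant. `np.allclose` and `np.isclose` are used because floating point results are rarely exactly equal.

```python
import numpy as np

Sigma = np.array([[1.0, 0.3, 0.7],
                  [0.3, 1.0, 0.5],
                  [0.7, 0.5, 1.0]])

# Cholesky: Sigma = L L^T, with L lower triangular
L = np.linalg.cholesky(Sigma)
print(np.allclose(L @ L.T, Sigma))

# Eigendecomposition: Sigma v = lambda v for each column v of `vectors`
values, vectors = np.linalg.eig(Sigma)
print(np.allclose(Sigma @ vectors, vectors * values))

# trace = sum of eigenvalues, determinant = product of eigenvalues
print(np.isclose(values.sum(), np.trace(Sigma)))
print(np.isclose(values.prod(), np.linalg.det(Sigma)))
```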


**OBS**: If you are using a lot of functions from the `linalg` sublibrary and want your code to be more concise, you can import the sublibrary with an alias as well.

```python
import numpy as np
import numpy.linalg as la
```

## Indexing and Slicing
The same indexing logic from Python lists is used with numpy arrays, with one index or slice per dimension, separated by commas.

In [38]:
A = np.arange(25).reshape((5, 5))
A

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [39]:
A.shape

(5, 5)

In [40]:
A[0:2, 0:2]

array([[0, 1],
       [5, 6]])

In [41]:
A[-3:-1, -3:-1]

array([[12, 13],
       [17, 18]])

In [42]:
A[:, -1]

array([ 4,  9, 14, 19, 24])
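A few more indexing patterns, sketched on the same 5 by 5 array: single elements, whole rows, and slices with a step.

```python
import numpy as np

A = np.arange(25).reshape((5, 5))

print(A[1, 2])      # 7, row 1 and column 2
print(A[1][2])      # 7, list-style chained indexing also works
print(A[2])         # [10 11 12 13 14], a single index selects a whole row
print(A[::2, ::2])  # every other row and every other column
```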

## Iterating on Arrays

Iterating over a multidimensional array is done with respect to the first axis.

In [43]:
for row in A:
    print(row)

[0 1 2 3 4]
[5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]


To iterate over all values, you can do it in two ways:

In [44]:
for row in A:
    for col in row:
        print(col)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


In [45]:
for elem in A.flat:
    print(elem)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
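Both approaches visit the elements in the same row-by-row order. When the position of each element is also needed, `np.ndenumerate` is one option. The sketch below uses a smaller, hypothetical array called `small` to keep the output short.

```python
import numpy as np

small = np.arange(6).reshape((2, 3))

# .flat visits the elements row by row, the same order as the nested loops
same_order = list(small.flat) == [col for row in small for col in row]
print(same_order)

# np.ndenumerate yields ((row, column), value) pairs
for index, value in np.ndenumerate(small):
    print(index, value)
```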


## Stacking

In [46]:
a = np.array([[1, 2],
              [3, 4]])

b = np.array([[5, 6],
              [7, 8]])

In [47]:
np.vstack((a, b))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [48]:
np.hstack((a, b))

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
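`np.vstack` stacks arrays on top of each other and `np.hstack` places them side by side. For 2-dimensional arrays like these, the same results can be obtained with `np.concatenate` by choosing the axis, as in this short sketch.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

rows = np.concatenate((a, b), axis=0)  # same as np.vstack((a, b))
cols = np.concatenate((a, b), axis=1)  # same as np.hstack((a, b))

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
```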

---
# Practical Example - Combining assets to build a portfolio.
Consider that you have 3 assets available. Their expected returns, risks (standard deviations) and betas are in the table below and $\rho$ is the correlation matrix of their returns.

| Asset | Return | Risk | Beta |
|-------|--------|------|------|
|A      |3%      | 10%  | 0.5  |
|B      |3.5%    | 11%  | 1.2  |
|C      |5%      | 15%  | 1.8  |

$$
\rho = 
\begin{bmatrix}
1 & 0.3 & -0.6 \\
0.3 & 1 & 0 \\
-0.6 & 0 & 1 
\end{bmatrix}
$$

Choose the weights of each asset of the portfolio and calculate the expected return, risk and beta of the resulting portfolio.

In [49]:
retu = np.array([0.03, 0.035, 0.05])
risk = np.array([0.10, 0.11, 0.15])
beta = np.array([0.5, 1.2, 1.8])

corr = np.array([[1, 0.3, -0.6], 
                 [0.3, 1, 0],
                 [-0.6, 0, 1]])

Select the weights of $A$, $B$ and $C$ in the portfolio.

In [50]:
weights = np.array([0.3, 0.6, 0.1])

Expected return and beta are simple weighted averages.

In [51]:
port_retu = retu.dot(weights)
print('Portfolio return is', port_retu)

port_beta = beta.dot(weights)
print('Portfolio beta is', port_beta)

Portfolio return is 0.035
Portfolio beta is 1.05


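The standard deviation requires a few more lines. First build the covariance matrix $\Sigma = D \rho D$, where $D$ is the diagonal matrix of the risks, then compute the portfolio risk as $\sigma_p = \sqrt{w^\top \Sigma w}$.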

In [52]:
covar = np.diag(risk).dot(corr).dot(np.diag(risk))
covar

array([[ 0.01  ,  0.0033, -0.009 ],
       [ 0.0033,  0.0121,  0.    ],
       [-0.009 ,  0.    ,  0.0225]])

In [53]:
port_risk = (weights.dot(covar).dot(weights))**0.5
print('portfolio risk is', port_risk)

portfolio risk is 0.07828793010419932
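As a cross-check on the covariance step, each entry of the covariance matrix is `risk[i] * risk[j] * corr[i, j]`, so it can also be built with an outer product instead of diagonal matrices. The sketch below compares both constructions and recomputes the portfolio risk with the `@` operator. The names `covar_outer` and `covar_diag` are only illustrative.

```python
import numpy as np

risk = np.array([0.10, 0.11, 0.15])
corr = np.array([[1, 0.3, -0.6],
                 [0.3, 1, 0],
                 [-0.6, 0, 1]])
weights = np.array([0.3, 0.6, 0.1])

# covar[i, j] = risk[i] * risk[j] * corr[i, j]
covar_outer = np.outer(risk, risk) * corr
covar_diag = np.diag(risk) @ corr @ np.diag(risk)
print(np.allclose(covar_outer, covar_diag))

# Portfolio variance is the quadratic form w' * covar * w
port_risk = np.sqrt(weights @ covar_outer @ weights)
print(port_risk)

# The weights should add up to 1
print(np.isclose(weights.sum(), 1.0))
```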


---
## Vectorizing Functions
`np.vectorize` converts an ordinary Python function that accepts scalars and returns scalars into a “vectorized function”, meaning that you can give the function arrays and it returns an array of outputs.

In [54]:
def addsubtract(a, b):
    # Scalar logic: subtract when a is larger, otherwise add
    if a > b:
        return a - b
    else:
        return a + b

vec_addsubtract = np.vectorize(addsubtract)

In [55]:
vec_addsubtract([0,3,6,9],[1,3,5,7])

array([1, 6, 1, 2])
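According to the NumPy documentation, `np.vectorize` is provided mainly for convenience rather than performance, since it still calls the Python function once per element. For a simple branch like this one, a possible alternative is `np.where`, which picks between two array expressions based on a condition. This is a sketch of that idea, not a general replacement for `np.vectorize`.

```python
import numpy as np

a = np.array([0, 3, 6, 9])
b = np.array([1, 3, 5, 7])

# Pick a - b where a > b, and a + b everywhere else
result = np.where(a > b, a - b, a + b)
print(result)  # [1 6 1 2]
```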

## Polynomials
The `np.poly1d` class accepts coefficients or polynomial roots to initialize a polynomial. The polynomial object can then be manipulated in algebraic expressions, integrated, differentiated, and evaluated. It even prints like a polynomial.

In [56]:
p = np.poly1d([3,4,5])
p

poly1d([3, 4, 5])

In [57]:
print(p)

   2
3 x + 4 x + 5


In [58]:
print(p*p)

   4      3      2
9 x + 24 x + 46 x + 40 x + 25


In [59]:
print(p.deriv())

 
6 x + 4


In [60]:
p(6)

137

In [61]:
p.roots

array([-0.66666667+1.1055416j, -0.66666667-1.1055416j])
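The cells above cover coefficients, products, derivatives, evaluation and roots. The sketch below fills in the two remaining features mentioned in the text: building a polynomial from its roots (with the `r=True` argument) and integrating it with `.integ()`. The names `q` and `P` are only illustrative.

```python
import numpy as np

p = np.poly1d([3, 4, 5])

# Build a polynomial from its roots: (x - 1)(x - 2) = x^2 - 3x + 2
q = np.poly1d([1, 2], r=True)
print(q.coeffs)

# Antiderivative of p with integration constant 0: x^3 + 2x^2 + 5x
P = p.integ()
print(P.coeffs)

# Differentiating the antiderivative recovers p
print(np.allclose(P.deriv().coeffs, p.coeffs))

# Evaluating p at its roots gives (numerically) zero
print(np.allclose(p(p.roots), 0))
```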

## Random Number Generator

Every major statistical distribution has a random number generator built into numpy. You can find the guide for all of them [here](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html).

In [62]:
import numpy.random as rnd

In [65]:
rnd.rand(10)

array([0.95069378, 0.1349457 , 0.42278928, 0.73183166, 0.8803338 ,
       0.46961804, 0.75399898, 0.77387444, 0.77861025, 0.54888694])

In [70]:
rnd.randint(1, 10)

2

In [135]:
rnd.poisson(2.2, (3,3))

array([[4, 3, 1],
       [2, 3, 2],
       [2, 2, 1]])

In [116]:
rnd.choice(['A', 'B', 'C'])

'C'

When working with simulations in the academic world, it is good practice to set the seed of the random number generator so that the results are replicable. The `numpy.random` functions used here rely on the [Mersenne Twister pseudo-random number generator](https://en.wikipedia.org/wiki/Mersenne_Twister), which is also used by many other software packages and programming languages.

In [1]:
import numpy.random as rnd  # the module must be imported before it is used

rnd.seed(123)
rnd.rand(1)

array([0.69646919])
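Recent NumPy versions (1.17 and later, which is an assumption about your installation) also offer a generator object created with `np.random.default_rng`. The sketch below shows the same seeding idea with that interface. It uses a different underlying generator than `rnd.seed`, so the numbers it draws will not match the ones above.

```python
import numpy as np

# Two generators created with the same seed produce the same draws
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)

draws_a = rng_a.random(3)
draws_b = rng_b.random(3)
print(np.array_equal(draws_a, draws_b))  # True

# The generator object offers the same kinds of draws used above
print(rng_a.integers(1, 10))          # integer from 1 to 9
print(rng_a.poisson(2.2, (3, 3)))     # 3x3 array of Poisson draws
print(rng_a.choice(['A', 'B', 'C']))  # one of the three labels
```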