# Basics of NumPy

The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.


## Content

- Basics
    - Creating arrays
    - Inspecting arrays
    - Array Mathematics
    - Arithmetic
    - Array Manipulation
- Working with matrices
    - Creating matrices
    - Inspecting matrices
    - Simple linear algebra

In [1]:
import numpy as np

### Basics
- Creating arrays
- Inspecting arrays
- Data types

#### Creating and Inspecting Arrays

In [2]:
# NPV calculation from the previous week: uses a loop (list comprehension) with enumerate

I = 100
CFs = [10, 20, 50, 70]
wacc = 0.12

DCFs = [CF/(1+wacc)**i for i, CF in enumerate(CFs, start=1)]
NPV = -I + sum(DCFs)

print(NPV)

4.947726858600561


#### Task: calculate NPV in one line

In [3]:
I = 100
CFs = [10, 20, 50, 70]
wacc = 0.12

NPV = -I + sum(np.array(CFs) / np.array([(1 + wacc) ** i for i in range(1, 5)]))

print(NPV)

4.947726858600561
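The list comprehension in the cell above can also be replaced by raising the discount base to a vector of exponents. A minimal sketch with the same inputs; the names `years` and `npv_vectorized` are introduced here only for illustration:

```python
import numpy as np

I = 100
CFs = np.array([10, 20, 50, 70])
wacc = 0.12

# exponents 1..n, one per cash flow (hypothetical helper name)
years = np.arange(1, len(CFs) + 1)

# elementwise: each cash flow is divided by (1 + wacc) ** year
npv_vectorized = -I + (CFs / (1 + wacc) ** years).sum()

print(npv_vectorized)
```

The result should agree with the loop-based NPV up to floating-point rounding.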


### Array Mathematics
- Arithmetic
- Comparisons
- Aggregating
- Sorting

In [4]:
# given revenues and costs, we want their difference to get profits (a simplification: profits stand in for cash flows)

I = 100

revenues = [50, 70, 42, 68, 100, 89]
costs = [20, 30, 12, 37, 50, 31]

wacc = 0.12

CFs = revenues - costs  # raises TypeError: Python lists do not support elementwise subtraction

TypeError: unsupported operand type(s) for -: 'list' and 'list'

#### Task: calculate CFs in one line without numpy

In [5]:
CFs = [r - c for r, c in zip(revenues, costs)]  # one possible solution

CFs

[30, 40, 30, 31, 50, 58]

#### Task: calculate CFs in one line with numpy

In [6]:
CFs = np.array(revenues) - np.array(costs)  # one possible solution

CFs

array([30, 40, 30, 31, 50, 58])
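The difference between lists and arrays goes beyond subtraction. A short sketch (shortened, made-up inputs; the names `joined` and `profits` are illustrative) of how the same operators behave on each type:

```python
import numpy as np

revenues = [50, 70, 42]
costs = [20, 30, 12]

# "+" on lists concatenates; it does not add element by element
joined = revenues + costs

# on arrays, arithmetic operators act elementwise
profits = np.array(revenues) - np.array(costs)

print(joined)
print(profits)
```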

In [7]:
# we again vectorize our WACCs

waccs = np.ones(CFs.shape) * wacc

waccs

array([0.12, 0.12, 0.12, 0.12, 0.12, 0.12])

In [8]:
# and here we can modify it using indexing

waccs[1] = 0.09
waccs[4:] = 0.14


waccs

array([0.12, 0.09, 0.12, 0.12, 0.14, 0.14])
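The indexing rules used above can be summarized in a small sketch. The array `rates` and its values are made up for illustration:

```python
import numpy as np

rates = np.full(6, 0.12)   # six equal rates, positions 0..5

rates[1] = 0.09            # single element: index 1 is the second position
rates[4:] = 0.14           # slice: index 4 through the end
rates[-1] = 0.2            # negative index counts from the end
first_two = rates[0:2]     # the stop index (2) is excluded

print(rates)
print(first_two)
```

Indices start at 0, so "year k" in the notebook's numbering corresponds to index k - 1.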

#### Task: assign 0.07 to years 3 and 4, and 0.17 to the last year

In [9]:
# one possible solution: years 3 and 4 are indices 2 and 3
waccs[2:4] = 0.07
waccs[-1] = 0.17

waccs

array([0.12, 0.09, 0.07, 0.07, 0.14, 0.17])

In [10]:
# we can calculate a vector of discounting factors using .cumprod()

discount_factors = 1 / (1+waccs).cumprod()

discount_factors

array([0.89285714, 0.81913499, 0.76554672, 0.71546423, 0.6276002 ,
       0.53641043])
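To see what `.cumprod()` does here, the sketch below compares it with an explicit loop on a shortened rate vector. The names `growth`, `manual`, and `running` are illustrative only:

```python
import numpy as np

waccs = np.array([0.12, 0.09, 0.07])

# running product: [(1+w1), (1+w1)(1+w2), (1+w1)(1+w2)(1+w3)]
growth = (1 + waccs).cumprod()

# the same numbers computed with an explicit loop, for comparison
manual = []
running = 1.0
for w in waccs:
    running *= 1 + w
    manual.append(running)

# the discount factor for year t is the inverse of the compounded growth up to t
discount_factors = 1 / growth

print(discount_factors)
```

Each year is discounted with all rates up to and including its own, which is why time-varying WACCs need a cumulative product rather than a single power.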

In [11]:
# now it looks prettier

DCFs = CFs * discount_factors

NPV = -I + DCFs.sum()

NPV

67.18872144030829

#### Task: calculate a vector of accumulated discounted CFs

In [12]:
# accumulated DCFs, net of the initial investment

accumulated_dcfs = -I + DCFs.cumsum()  # one possible solution

accumulated_dcfs

array([-73.21428571, -40.44888598, -17.48248429,   4.69690674,
        36.0769167 ,  67.18872144])

In [13]:
# and find a payback period (in a kinda weird way)

is_positive = accumulated_dcfs > 0

is_positive

array([False, False, False,  True,  True,  True])

In [14]:
np.where(is_positive)

(array([3, 4, 5]),)
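The output above is a tuple because `np.where` with a single argument returns one index array per dimension. A small sketch of how to unpack it, along with two related calls; the variable names are illustrative:

```python
import numpy as np

mask = np.array([False, False, False, True, True, True])

# one index array per dimension; a 1-D mask gives a 1-element tuple
indices = np.where(mask)[0]

# np.flatnonzero returns the same indices for a 1-D mask, without the tuple
same_indices = np.flatnonzero(mask)

# np.argmax on a boolean array gives the position of the first True
# (caution: it also returns 0 when the mask contains no True at all)
first_true = int(np.argmax(mask))

print(indices, same_indices, first_true)
```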

#### Task: find DPP using np.where

In [15]:
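DPP = np.where(is_positive)[0][0] + 1  # one possible solution: first positive index, converted to a 1-based year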

DPP

4

### Working with matrices
- Creating matrices
- Inspecting matrices
- Simple linear algebra

In [16]:
b = 0.42
weights = np.array([1., 0.5, 0.2])

n_points = 10 ** 3

In [17]:
np.random.uniform(low=-1, high=1, size=(n_points, 3))

array([[-0.75350653,  0.76585791,  0.94142908],
       [-0.76472776, -0.45625487, -0.70018228],
       [-0.62375478,  0.20373535,  0.61317283],
       ...,
       [-0.75556846, -0.35887221, -0.93829793],
       [-0.43657764,  0.94542502, -0.25469227],
       [-0.78160161,  0.43408745, -0.72608812]])

In [18]:
low = - np.ones((n_points, 3), 'float')
high = np.ones((n_points, 3), 'float')

np.random.seed(42)

X = np.random.uniform(low=low, high=high)

X

array([[-0.25091976,  0.90142861,  0.46398788],
       [ 0.19731697, -0.68796272, -0.68801096],
       [-0.88383278,  0.73235229,  0.20223002],
       ...,
       [ 0.60000696,  0.10541415, -0.20689264],
       [-0.73656994,  0.73059152, -0.68545358],
       [-0.38042428, -0.41990894,  0.74282807]])

In [19]:
low.shape

(1000, 3)
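The two cells above show two ways to request the same shape: scalar bounds with an explicit `size`, or array-valued bounds whose shape determines the output. A sketch comparing them, using the legacy global-seed API as in this notebook; the names `X_from_arrays` and `X_from_scalars` are illustrative:

```python
import numpy as np

n_points = 1000

# array bounds: the output takes the (broadcast) shape of low and high
np.random.seed(42)
X_from_arrays = np.random.uniform(low=-np.ones((n_points, 3)),
                                  high=np.ones((n_points, 3)))

# scalar bounds: the shape is given explicitly through size
np.random.seed(42)
X_from_scalars = np.random.uniform(low=-1, high=1, size=(n_points, 3))

print(X_from_arrays.shape, X_from_scalars.shape)
```

Array bounds are mainly useful when each column (or element) needs its own range; for a common range the scalar form is shorter.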

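#### Task: create a vector of random samples from a normal distribution with std=0.1, n_points=1000

In [20]:
noise_std = 0.1

# one possible solution, assuming zero mean
noise = np.random.normal(loc=0.0, scale=noise_std, size=n_points)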

In [21]:
noise.std()

0.1004012969357801

#### Task: calculate a random vector $$y = b + X\theta + \epsilon$$

In [22]:
bias = b * np.ones(n_points).reshape((-1,1))
theta = np.random.uniform(1, 3, size=(3, 1))

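# one possible solution; the noise is reshaped into a column so that Y has shape (n_points, 1)
Y = bias + np.matmul(X, theta) + noise.reshape(-1, 1)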

In [23]:
print(Y.shape)
print(X.shape)

(1000, 1)
(1000, 3)


In [24]:
X.mean(axis=0)

array([ 0.02395998, -0.01155026, -0.01305055])

In [25]:
X.max(axis=0)

array([0.99943535, 0.99669502, 0.99339371])

In [26]:
X.min(axis=0)

array([-0.99973061, -0.99997673, -0.99729275])

In [27]:
X.std(axis=0)

array([0.5733859 , 0.59225948, 0.58309619])

In [28]:
X_plus_ones = np.column_stack((X, np.ones(X.shape[0])))

In [29]:
X_plus_ones

array([[-0.25091976,  0.90142861,  0.46398788,  1.        ],
       [ 0.19731697, -0.68796272, -0.68801096,  1.        ],
       [-0.88383278,  0.73235229,  0.20223002,  1.        ],
       ...,
       [ 0.60000696,  0.10541415, -0.20689264,  1.        ],
       [-0.73656994,  0.73059152, -0.68545358,  1.        ],
       [-0.38042428, -0.41990894,  0.74282807,  1.        ]])

#### Task: implement OLS using numpy matrix operations.

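Recall the OLS method for a data matrix $X$, coefficients $\theta$ and a vector of output variables $y$. We look for $\hat{\theta}$ such that:
$$X \cdot \hat{\theta} = y$$
With more observations than coefficients this system generally has no exact solution, so we multiply both sides by $X^T$ to get the normal equations:
$$X^T X \hat{\theta} = X^T y$$
Finally, if $X^T X$ is invertible:
$$\hat{\theta} = (X^T X )^{-1} X^T y$$

Compute the coefficients using `np.matmul`, `np.linalg.inv` and the `.T` attribute (transpose).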

In [30]:
# not the most efficient way (using np.matmul)

theta_analytical = np.matmul(np.matmul(np.linalg.inv(np.matmul(X_plus_ones.T, X_plus_ones)), X_plus_ones.T), Y)

theta_analytical

array([[1.50494153],
       [1.24128807],
       [2.16527107],
       [0.42010717]])
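The normal equations can also be solved without forming the inverse explicitly. A minimal sketch on synthetic, noise-free data; the seed, the coefficients in `true_coefs`, and all `_demo` names are made up for illustration and are not the notebook's data:

```python
import numpy as np

rng = np.random.default_rng(0)                        # hypothetical seed
X_demo = rng.uniform(-1, 1, size=(200, 3))
A = np.column_stack((X_demo, np.ones(len(X_demo))))   # add the intercept column
true_coefs = np.array([[1.5], [-2.0], [0.5], [0.42]]) # made-up coefficients
y_demo = A @ true_coefs                               # noise-free, so the fit is exact

# solve (A^T A) theta = A^T y as a linear system instead of inverting A^T A
theta_solve = np.linalg.solve(A.T @ A, A.T @ y_demo)

print(theta_solve)
```

Here `@` is the operator form of `np.matmul`. Solving the system is usually preferred over `np.linalg.inv` because it does less work and tends to lose less precision.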

In [31]:
# more numerically stable way (using np.linalg.lstsq)

theta_analytical_v3 = np.linalg.lstsq(X_plus_ones, Y, rcond=None)[0]

theta_analytical_v3

array([[1.50494153],
       [1.24128807],
       [2.16527107],
       [0.42010717]])
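`np.linalg.lstsq` returns more than the coefficients, which is why the cell above takes element `[0]`. A sketch of the full return value on made-up data (the seed, coefficients, and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)                   # hypothetical seed
A = np.column_stack((rng.uniform(-1, 1, size=(100, 2)), np.ones(100)))
y = A @ np.array([2.0, -1.0, 0.3]) + rng.normal(0, 0.1, size=100)

# passing rcond=None explicitly avoids the FutureWarning that older NumPy versions emit
coefs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

print(coefs)      # estimated coefficients, close to the made-up ones
print(residuals)  # sum of squared residuals
print(rank)       # rank of A
```

The rank and singular values help diagnose collinear columns, which the explicit-inverse formula would not report.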