# Basics of NumPy

The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.


## Content

- Basics
    - Creating arrays
    - Inspecting arrays
    - Array Mathematics
    - Arithmetic
    - Array Manipulation
- Working with matrices
    - Creating matrices
    - Inspecting matrices
    - Simple linear algebra

In [2]:
import numpy as np

### Basics
- Creating arrays
- Inspecting arrays
- Data types

#### Creating and Inspecting Arrays

In [3]:
# NPV calculation from the previous week: it relies on a loop (list comprehension) and enumerate

I = 100
CFs = [10, 20, 50, 70]
wacc = 0.12

DCFs = [CF/(1+wacc)**i for i, CF in enumerate(CFs, start=1)]
NPV = -I + sum(DCFs)

print(NPV)

4.947726858600561


#### Task: calculate NPV in one line

In [4]:
I = 100
CFs = [10, 20, 50, 70]
wacc = 0.12

NPV = -I + sum(np.array(CFs) / np.array([(1 + wacc) ** i for i in range(1, 5)]))

print(NPV)

4.947726858600561
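The list comprehension that builds the discount factors can itself be replaced by array arithmetic. The sketch below is one possible variant, not the only solution to the task; the helper name `years` is introduced here for illustration.

```python
import numpy as np

I = 100                            # initial investment
CFs = np.array([10, 20, 50, 70])   # cash flows for years 1..4
wacc = 0.12

# years = [1, 2, 3, 4]; (1 + wacc) is raised to every year at once
years = np.arange(1, len(CFs) + 1)
NPV = -I + (CFs / (1 + wacc) ** years).sum()

print(NPV)
```

Using `np.arange` keeps the exponents in sync with the number of cash flows, so the hard-coded `range(1, 5)` is no longer needed.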


### Array Mathematics
- Arithmetic
- Comparisons
- Aggregating
- Sorting

In [5]:
# if we have revenues and costs, we probably want their difference to obtain profits (a simplification, sorry)

I = 100

revenues = [50, 70, 42, 68, 100, 89]
costs = [20, 30, 12, 37, 50, 31]

wacc = 0.12

CFs = revenues - costs

TypeError: unsupported operand type(s) for -: 'list' and 'list'

#### Task: calculate CFs in one line without numpy

In [22]:
# discounted (stored under a separate name so that it is not overwritten below)
DCFs = [(x - y) / (1 + wacc) ** (1 + t) for t, (x, y) in enumerate(zip(revenues, costs))]
# not discounted
CFs = [x - y for x, y in zip(revenues, costs)]
CFs

[30, 40, 30, 31, 50, 58]

#### Task: calculate CFs in one line with numpy

In [23]:
CFs = np.array(revenues) - np.array(costs)

CFs

array([30, 40, 30, 31, 50, 58])
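Array arithmetic is elementwise, and a scalar operand is broadcast to every element. A minimal sketch with the same revenues and costs; the names `margins` and `costs_up` are hypothetical and only serve the illustration.

```python
import numpy as np

revenues = np.array([50, 70, 42, 68, 100, 89])
costs = np.array([20, 30, 12, 37, 50, 31])

CFs = revenues - costs      # array - array: elementwise difference
margins = CFs / revenues    # array / array: elementwise, the result is float
costs_up = costs * 1.1      # array * scalar: the scalar is applied to every element

print(CFs)
print(margins.round(2))
```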

In [12]:
# we again vectorize our WACCs

waccs = np.ones(CFs.shape) * wacc

waccs

array([0.12, 0.12, 0.12, 0.12, 0.12, 0.12])

In [13]:
# and here we can modify it using indexing

waccs[1] = 0.09
waccs[4:] = 0.14


waccs

array([0.12, 0.09, 0.12, 0.12, 0.14, 0.14])

#### Task: assign the value 0.07 to years 3 and 4, and 0.17 to the last year

In [15]:
waccs[2:4] = 0.07

waccs[-1] = 0.17

waccs

array([0.12, 0.09, 0.07, 0.07, 0.14, 0.17])
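Two details of indexing are worth a short sketch: a slice is a view on the original array, and a boolean mask selects elements by condition. The 12% cap below is a made-up rule used only for illustration.

```python
import numpy as np

waccs = np.array([0.12, 0.09, 0.07, 0.07, 0.14, 0.17])

# a slice is a view: writing through it changes the original array
last_two = waccs[-2:]
last_two[:] = 0.15

# a boolean mask picks elements by condition; here rates above 12% are capped
capped = waccs.copy()           # copy, so that waccs itself stays untouched
capped[capped > 0.12] = 0.12

print(waccs)
print(capped)
```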

In [24]:
# we can calculate a vector of discounting factors using .cumprod()

discount_factors = 1 / (1+waccs).cumprod()

discount_factors

array([0.89285714, 0.81913499, 0.76554672, 0.71546423, 0.6276002 ,
       0.53641043])
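To see what `.cumprod()` does here, the sketch below compounds the yearly rates in an explicit loop and compares the result with the vectorized version. The names `factors_loop` and `factors_vec` are introduced only for this comparison.

```python
import numpy as np

waccs = np.array([0.12, 0.09, 0.07, 0.07, 0.14, 0.17])

# explicit loop: compound the yearly rates one by one
factors_loop = []
growth = 1.0
for r in waccs:
    growth *= 1 + r
    factors_loop.append(1 / growth)

# vectorized version, same expression as in the cell above
factors_vec = 1 / (1 + waccs).cumprod()

print(np.allclose(factors_loop, factors_vec))
```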

In [28]:
# now it looks prettier

DCFs = CFs * discount_factors

NPV = -I + DCFs.sum()

round(NPV, 2)

67.19

#### Task: calculate a vector of accumulated CFs

In [30]:
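# accumulated discounted CFs; the investment I is charged against the first year
print(DCFs)
net_DCFs = DCFs.copy()  # copy, so that re-running the cell does not subtract I twice
net_DCFs[0] -= I

accumulated_dcfs = net_DCFs.cumsum()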

accumulated_dcfs

[26.78571429 32.76539974 22.96640169 22.17939104 31.38000995 31.11180474]


array([-73.21428571, -40.44888598, -17.48248429,   4.69690674,
        36.0769167 ,  67.18872144])

In [31]:
# and find a payback period (in a kinda weird way)

is_positive = accumulated_dcfs > 0

is_positive

array([False, False, False,  True,  True,  True])

In [14]:
np.where(is_positive)

(array([3, 4, 5]),)

#### Task: find DPP using np.where

In [15]:
# first year (1-based) in which the accumulated DCFs turn positive
DPP = np.where(is_positive)[0][0] + 1

DPP

4
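As an alternative to `np.where`, `argmax` on a boolean array returns the index of the first `True`. The sketch below uses rounded values of the accumulated DCFs and adds a guard for projects that never pay back; returning `None` in that case is an assumption of this example.

```python
import numpy as np

# rounded accumulated DCFs, used here only for illustration
accumulated_dcfs = np.array([-73.21, -40.45, -17.48, 4.70, 36.08, 67.19])
is_positive = accumulated_dcfs > 0

if is_positive.any():
    # argmax gives the index of the first True; +1 turns it into a year number
    DPP = int(is_positive.argmax()) + 1
else:
    DPP = None   # no payback within the horizon

print(DPP)
```

The guard matters because `argmax` returns 0 when every entry is `False`, which would otherwise look like payback in year 1.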

### Working with matrices
- Creating matrices
- Inspecting matrices
- Simple linear algebra

In [16]:
b = 0.42
weights = np.array([1., 0.5, 0.2])

n_points = 10 ** 3

In [17]:
np.random.uniform(low=-1, high=1, size=(n_points, 3))

array([[-0.75350653,  0.76585791,  0.94142908],
       [-0.76472776, -0.45625487, -0.70018228],
       [-0.62375478,  0.20373535,  0.61317283],
       ...,
       [-0.75556846, -0.35887221, -0.93829793],
       [-0.43657764,  0.94542502, -0.25469227],
       [-0.78160161,  0.43408745, -0.72608812]])

In [18]:
low = - np.ones((n_points, 3), 'float')
high = np.ones((n_points, 3), 'float')

np.random.seed(42)

X = np.random.uniform(low=low, high=high)

X

array([[-0.25091976,  0.90142861,  0.46398788],
       [ 0.19731697, -0.68796272, -0.68801096],
       [-0.88383278,  0.73235229,  0.20223002],
       ...,
       [ 0.60000696,  0.10541415, -0.20689264],
       [-0.73656994,  0.73059152, -0.68545358],
       [-0.38042428, -0.41990894,  0.74282807]])

In [19]:
low.shape

(1000, 3)
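NumPy also offers a newer random API built around `np.random.default_rng` (available since NumPy 1.17). A minimal sketch of the same sampling step follows; the name `X_new` is hypothetical, and the values differ from those produced after `np.random.seed(42)` because the two APIs use different generators.

```python
import numpy as np

n_points = 10 ** 3

rng = np.random.default_rng(42)    # seeded generator, independent of np.random.seed
X_new = rng.uniform(low=-1, high=1, size=(n_points, 3))

print(X_new.shape)
print(X_new.min() >= -1, X_new.max() <= 1)
```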

#### Task: create a vector of random samples from a normal distribution with std=0.1, n_points=1000

In [37]:
mean = 0
noise_std = 0.1
n_points = 1000

noise = np.random.normal(loc=mean, scale=noise_std, size=n_points)


In [38]:
noise.std()

0.09956780397406875
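By default `.std()` divides by `n` (`ddof=0`); the sample estimate that divides by `n - 1` is obtained with `ddof=1`. A short sketch, with a seed chosen arbitrarily for this example:

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, local to this example
noise = rng.normal(loc=0, scale=0.1, size=1000)

population_std = noise.std()       # ddof=0: divides by n
sample_std = noise.std(ddof=1)     # ddof=1: divides by n - 1

print(population_std, sample_std)
```

With 1000 points the two values differ only slightly, but the gap grows for small samples.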

#### Task: calculate a random vector $y = b + X\theta + \epsilon$

In [57]:
b = 5

bias = b * np.ones(n_points).reshape((-1, 1))

theta = np.random.uniform(1, 3, size=(3, 1))

X = np.random.uniform(-1, 1, size=(n_points, 3))

# the noise term epsilon is left out here, so Y is an exact linear function of X
Y = X @ theta + bias

Y

array([[6.10468971],
       [3.7330815 ],
       [3.36286419],
       ...])

In [58]:
print(Y.shape)
print(X.shape)

(1000, 1)
(1000, 3)
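The formula in the task also contains a noise term $\epsilon$. The sketch below shows one way it could be added, and why the shape of the noise vector matters. The seed and the names `Y_noisy`, `flat_noise` and `wrong` are assumptions of this example.

```python
import numpy as np

np.random.seed(0)                  # arbitrary seed, only for this example
n_points = 1000
b = 5

X = np.random.uniform(-1, 1, size=(n_points, 3))
theta = np.random.uniform(1, 3, size=(3, 1))

# noise as a column vector, so that it lines up with X @ theta of shape (n_points, 1)
noise = np.random.normal(loc=0, scale=0.1, size=(n_points, 1))
Y_noisy = b + X @ theta + noise    # the scalar b is broadcast to every row

# pitfall: a flat vector of shape (n_points,) broadcasts to a square matrix
flat_noise = noise.ravel()
wrong = X @ theta + flat_noise

print(Y_noisy.shape)
print(wrong.shape)
```

Checking `.shape` after such an addition is a cheap way to catch the silent broadcast to `(1000, 1000)`.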


In [60]:
X.mean(axis=0)

array([-0.01169271, -0.0125757 , -0.01702304])

In [61]:
X.max(axis=0)

array([0.99854358, 0.99875338, 0.99566092])

In [62]:
X.min(axis=0)

array([-0.99929973, -0.9995564 , -0.99985517])

In [63]:
X.std(axis=0)

array([0.57405965, 0.56963362, 0.5642005 ])
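The `axis` argument decides which dimension is collapsed. A small sketch on a hypothetical 2x3 matrix `M` makes the convention easy to check by hand:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])       # 2 rows, 3 columns

col_means = M.mean(axis=0)      # collapse the rows: one value per column
row_means = M.mean(axis=1)      # collapse the columns: one value per row
total_mean = M.mean()           # no axis: a single number over all entries

print(col_means)
print(row_means)
print(total_mean)
```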

In [64]:
X_plus_ones = np.column_stack((X, np.ones(X.shape[0])))

In [65]:
X_plus_ones

array([[ 0.49863883,  0.83966622, -0.76593537,  1.        ],
       [-0.59302914, -0.89729386,  0.84229768,  1.        ],
       [-0.26045317, -0.90867848,  0.11509311,  1.        ],
       ...,
       [ 0.94826509,  0.83475758, -0.32731167,  1.        ],
       [-0.63655548,  0.27190486,  0.93379599,  1.        ],
       [-0.29027146,  0.90567242, -0.97806018,  1.        ]])

#### Task: implement OLS using numpy matrix operations.

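Recall the OLS method for a data matrix $X$, coefficients $\theta$ and a vector of output variables $y$. We look for $\hat{\theta}$ such that
$$X \cdot \hat{\theta} = y$$
Multiplying both sides by $X^T$ from the left gives the normal equations:
$$X^T X \hat{\theta} = X^T y$$
Finally, if $X^T X$ is invertible:
$$\hat{\theta} = (X^T X )^{-1} X^T y$$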

Compute the coefficients using `np.matmul`, `np.linalg.inv` and the `.T` operation.

In [67]:
# direct translation of the formula (using np.matmul); explicitly inverting X^T X is not the most efficient way

theta_analytical = np.matmul(np.matmul(np.linalg.inv(np.matmul(X_plus_ones.T, X_plus_ones)), X_plus_ones.T), Y)

theta_analytical

array([[1.98843482],
       [1.40859567],
       [1.396425  ],
       [5.        ]])
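The explicit inverse can be avoided by solving the normal equations as a linear system with `np.linalg.solve`. The sketch below uses synthetic, noise-free data; the seed, `true_theta` and the name `A` for the design matrix are assumptions of this example.

```python
import numpy as np

np.random.seed(0)                                     # arbitrary seed
n_points = 1000
X = np.random.uniform(-1, 1, size=(n_points, 3))
true_theta = np.array([[2.0], [1.5], [1.0], [5.0]])   # hypothetical; last entry is the intercept

A = np.column_stack((X, np.ones(n_points)))           # design matrix with a column of ones
Y = A @ true_theta                                    # noise-free targets

# solve (A^T A) theta = A^T Y directly instead of inverting A^T A
theta_solve = np.linalg.solve(A.T @ A, A.T @ Y)

print(theta_solve.ravel())
```

Here the `@` operator is used as shorthand for `np.matmul`.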

In [68]:
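# more efficient way (using np.linalg.lstsq)
# rcond=None selects the machine-precision default and silences the FutureWarning raised by older NumPy versions

theta_analytical_v3 = np.linalg.lstsq(X_plus_ones, Y, rcond=None)[0]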

theta_analytical_v3


array([[1.98843482],
       [1.40859567],
       [1.396425  ],
       [5.        ]])
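`np.linalg.lstsq` returns more than the coefficients: the full result is a tuple of solution, residual sum of squares, rank and singular values. A sketch on synthetic noisy data follows; the seed and `true_theta` are assumptions of this example.

```python
import numpy as np

np.random.seed(0)                                     # arbitrary seed
n_points = 1000
A = np.column_stack((np.random.uniform(-1, 1, size=(n_points, 3)),
                     np.ones(n_points)))              # features plus a column of ones
true_theta = np.array([[2.0], [1.5], [1.0], [5.0]])   # hypothetical coefficients
Y = A @ true_theta + np.random.normal(0, 0.1, size=(n_points, 1))

# unpack all four return values instead of taking only [0]
theta_hat, residuals, rank, singular_values = np.linalg.lstsq(A, Y, rcond=None)

print(theta_hat.ravel())   # close to true_theta, but not exact because of the noise
print(rank)                # number of linearly independent columns of A
```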