# Introduction

The main purpose of this file is to act as a reference for NumPy methods and techniques that are commonly used in Finance. The code is from Dr. Yves Hilpisch's book, <a href='https://www.amazon.co.uk/Python-Finance-Mastering-Data-Driven/dp/1492024333'>Python for Finance 2e: Mastering Data-Driven Finance</a>, with some minor adjustments. I highly recommend this book to anyone interested in Python and its applications in Finance.

# Arrays with Python Lists

One can easily create arrays in Python using list objects. It is also possible to nest lists inside a Python list, creating 'multi-dimensional arrays'.

In [1]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]

m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

In [2]:
# Accessing elements
m[1][0]

0.5

A problem that can arise when you combine objects this way is that the outer list stores references to the inner list rather than copies, so a change to one list shows up everywhere that list is referenced.

In [3]:
# Change the first element of v to be 'Python'
v[0] = 'Python'

# Check the output of m
m

[['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0]]

To avoid this problem, copy the list with the <i>deepcopy</i> function from the <i>copy</i> module.

In [4]:
v[0] = 0.5

from copy import deepcopy
m = 3 * [deepcopy(v), ]
m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

In [5]:
v[0] = 'Python'

m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

As you can see, changing <i>v</i> does not impact <i>m</i> anymore.
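
One caveat: `3 * [deepcopy(v), ]` makes a single copy and repeats a reference to it three times, so the rows of <i>m</i> are independent of <i>v</i> but not of each other. The following is a minimal sketch, assuming fully independent rows are wanted; the names `m_shared` and `m_independent` are illustrative only.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]

# One deep copy repeated three times: all rows are the same list object
m_shared = 3 * [deepcopy(v)]
m_shared[0][0] = 'Python'
shared_first_column = [row[0] for row in m_shared]

# One deep copy per row: rows are independent of v and of each other
m_independent = [deepcopy(v) for _ in range(3)]
m_independent[0][0] = 'Python'
independent_first_column = [row[0] for row in m_independent]

print(shared_first_column)       # the change appears in every row
print(independent_first_column)  # the change appears in the first row only
```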

## The Python array Class

Along with lists, there is also a dedicated <i>array</i> class available in Python. Let's try it out.

In [6]:
import array

v[0] = 0.5

# Instantiate an array with float as its type
a = array.array('f', v)

a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0])

In [7]:
# The array class has similar methods to those of the list object
a.append(0.5)

a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5])

In [8]:
a.extend([3.0, 6.75])

a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 3.0, 6.75])

In [9]:
2 * a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 3.0, 6.75, 0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 3.0, 6.75])

In [10]:
# Trying to append a value whose type differs from the array's type code
# will lead to an error

a.append('Python')

TypeError: must be real number, not str

In [11]:
# An array can be easily converted to a list object

a.tolist()

[0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 3.0, 6.75]
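
The type code passed to `array.array` also fixes the storage precision. As a small sketch (the values are chosen only for illustration), `'f'` stores C single-precision floats and `'d'` stores C doubles, so a value such as 0.1 survives a round trip only with `'d'`:

```python
import array

values = [0.1, 0.5]

single = array.array('f', values)  # C float, typically 4 bytes per element
double = array.array('d', values)  # C double, typically 8 bytes per element

print(single.itemsize, double.itemsize)
print(single[0] == 0.1)  # False: 0.1 is rounded to single precision
print(double[0] == 0.1)  # True: stored at the same precision as a Python float
print(single[1] == 0.5)  # True: 0.5 is exactly representable in both
```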

# Regular NumPy Arrays

Working with lists and arrays this way is neither convenient nor efficient for numerical work. Luckily, we have <i>numpy.ndarray</i>, which allows us to work with n-dimensional arrays in a convenient and efficient manner.

In [12]:
import numpy as np

In [13]:
a = np.array([0.5, 0.75, 1.0, 1.5, 2.0])
a

array([0.5 , 0.75, 1.  , 1.5 , 2.  ])

In [14]:
type(a)

numpy.ndarray

In [15]:
a = np.array(['a', 'b', 'c'])
a

array(['a', 'b', 'c'], dtype='<U1')

In [16]:
# We have the arange function in numpy
# which is very similar to the built-in range function in Python
a = np.arange(2, 20, 2)
a

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [17]:
# It can take as additional input the dtype
a = np.arange(8, dtype=np.float32)
a

array([0., 1., 2., 3., 4., 5., 6., 7.], dtype=float32)

In [18]:
# Slicing here is the same as with normal lists
a[5:]

array([5., 6., 7.], dtype=float32)

In [19]:
a[:2]

array([0., 1.], dtype=float32)

One of the best things about the <i>ndarray</i> class (and really all of NumPy) is the wide variety of built-in methods that let you accomplish sophisticated tasks in a single line of code.

In [20]:
# Getting the sum of the array is as simple as calling a method
a.sum()

28.0

In [21]:
# Getting the mean (average) of the array is as simple as calling a method
a.mean()

3.5

In [22]:
# Getting the standard deviation (volatility) of the array is as simple as calling a method
a.std()

2.291288

In [23]:
# Getting the cumulative sum of the array is as simple as calling a method
a.cumsum()

array([ 0.,  1.,  3.,  6., 10., 15., 21., 28.], dtype=float32)
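
When `std()` is used as a volatility estimate, it is worth knowing that NumPy divides by N by default (`ddof=0`, the population standard deviation). The sketch below, which reuses the same array, shows how `ddof=1` gives the sample standard deviation instead; which one is appropriate depends on the application.

```python
import numpy as np

a = np.arange(8, dtype=np.float32)

population_std = a.std()    # default ddof=0: divide by N
sample_std = a.std(ddof=1)  # ddof=1: divide by N - 1

print(population_std)  # square root of 5.25
print(sample_std)      # square root of 6.0
```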

Another great feature is the vectorised mathematical operations.

In [24]:
# Scalar multiplication with lists leads to repetition
l = [0.5, 0.75, 1.0, 1.5, 2.0]

2 * l

[0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 0.75, 1.0, 1.5, 2.0]

In [25]:
# Whereas scalar multiplication with ndarrays leads to proper scalar multiplication
2 * a

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14.], dtype=float32)

In [26]:
# Squaring elements in ndarray
a ** 2

array([ 0.,  1.,  4.,  9., 16., 25., 36., 49.], dtype=float32)

In [27]:
2 ** a

array([  1.,   2.,   4.,   8.,  16.,  32.,  64., 128.], dtype=float32)

In [28]:
a ** a

array([1.00000e+00, 1.00000e+00, 4.00000e+00, 2.70000e+01, 2.56000e+02,
       3.12500e+03, 4.66560e+04, 8.23543e+05], dtype=float32)
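
To illustrate why vectorisation matters in practice, here is a minimal sketch that computes log returns once with an explicit loop and once with array operations. The `prices` values are hypothetical and serve only as an example.

```python
import math
import numpy as np

# Hypothetical price series, for illustration only
prices = np.array([100.0, 101.5, 99.8, 102.3])

# Loop version: one log return per consecutive pair of prices
loop_returns = [math.log(prices[i] / prices[i - 1])
                for i in range(1, len(prices))]

# Vectorised version: the same calculation on whole slices at once
vec_returns = np.log(prices[1:] / prices[:-1])

print(loop_returns)
print(vec_returns)
```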

Universal functions are another important feature of the NumPy package. They are “universal” in the sense that they in general operate on ndarray objects as well as on basic Python data types. However, when applying universal functions to, say, a Python float object, one needs to be aware of the reduced performance compared to the same functionality found in the math module.

In [29]:
np.exp(a)

array([1.0000000e+00, 2.7182820e+00, 7.3890557e+00, 2.0085537e+01,
       5.4598148e+01, 1.4841316e+02, 4.0342877e+02, 1.0966332e+03],
      dtype=float32)

In [30]:
np.sqrt(a)

array([0.       , 1.       , 1.4142135, 1.7320508, 2.       , 2.236068 ,
       2.4494898, 2.6457512], dtype=float32)

In [31]:
np.sqrt(2.5)

1.5811388300841898

In [32]:
import math

math.sqrt(2.5)

1.5811388300841898

In [33]:
math.sqrt(a)

TypeError: only size-1 arrays can be converted to Python scalars

In [35]:
%timeit np.sqrt(2.5)

331 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [36]:
%timeit math.sqrt(2.5)

18.1 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
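
The picture reverses for whole arrays. Outside of IPython, a rough comparison can be made with the standard `timeit` module, as in this sketch; the array size and repetition count are arbitrary choices, and the timings will vary by machine.

```python
import math
import timeit
import numpy as np

x = np.arange(10_000, dtype=np.float64)

def sqrt_loop():
    # Calls math.sqrt once per element
    return [math.sqrt(value) for value in x]

def sqrt_ufunc():
    # A single call to the universal function on the whole array
    return np.sqrt(x)

t_loop = timeit.timeit(sqrt_loop, number=3)
t_ufunc = timeit.timeit(sqrt_ufunc, number=3)

print(f'loop:  {t_loop:.6f} s')
print(f'ufunc: {t_ufunc:.6f} s')
```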


The transition to more than one dimension is seamless, and all features presented so far carry over to the more general cases. In particular, the indexing system is made consistent across all dimensions.

In [37]:
b = np.array([a, a * 2])
b

array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
       [ 0.,  2.,  4.,  6.,  8., 10., 12., 14.]], dtype=float32)

In [38]:
b[0]

array([0., 1., 2., 3., 4., 5., 6., 7.], dtype=float32)

In [39]:
b[0, 2]

2.0

In [40]:
b[0][2]

2.0

In [41]:
b[:, 1]

array([1., 2.], dtype=float32)

In [42]:
b.sum() # sum of all values

84.0

In [43]:
b.sum(axis=0) # column-wise summation

array([ 0.,  3.,  6.,  9., 12., 15., 18., 21.], dtype=float32)

In [44]:
b.sum(axis=1) # row-wise summation

array([28., 56.], dtype=float32)
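
A way to read the `axis` argument is that it names the dimension that gets collapsed. The sketch below checks the resulting shapes and uses `keepdims=True` (an addition not used above) so that the row sums can be divided back into the array; the name `weights` is illustrative.

```python
import numpy as np

a = np.arange(8, dtype=np.float32)
b = np.array([a, a * 2])  # shape (2, 8)

col_sums = b.sum(axis=0)  # axis 0 (rows) is collapsed -> shape (8,)
row_sums = b.sum(axis=1)  # axis 1 (columns) is collapsed -> shape (2,)

# keepdims=True keeps the collapsed axis with length 1 -> shape (2, 1)
row_sums_2d = b.sum(axis=1, keepdims=True)
weights = b / row_sums_2d  # each row now sums to 1

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)
print(weights.sum(axis=1))
```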

There are a number of ways to initialize (instantiate) ndarray objects. One is as presented before, via np.array. However, this assumes that all elements of the array are already available. In contrast, one might like to have the ndarray objects instantiated first to populate them later with results generated during the execution of code.

In [45]:
c = np.zeros((2, 3), dtype='i', order='C')
c

array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [46]:
c = np.ones((2, 3), dtype='i', order='C')
c

array([[1, 1, 1],
       [1, 1, 1]], dtype=int32)

In [47]:
# Create two 3x4 matrices of ones
c = np.ones((2, 3, 4), dtype='i', order='C')
c

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)

In [48]:
# Create an array of zeroes, inferring the shape from another array
# Note: the 'f16' dtype (float128) is not available on every platform
d = np.zeros_like(c, dtype='f16', order='C')
d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float128)

In [49]:
# Create an array of ones, inferring the shape from another array
d = np.ones_like(c, dtype='f16', order='C')
d

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float128)

In [50]:
# Creates an empty array.
# Numbers in the array depend on the bits present in memory
e = np.empty((2, 3, 2))
e

array([[[1.36334752e-316, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [2.22809558e-312, 8.38733546e+169]],

       [[1.36257802e+161, 3.40186548e+175],
        [1.71597043e+185, 5.19919129e-062],
        [3.50609175e-033, 1.08516690e-042]]])

In [51]:
f = np.empty_like(c)
f

array([[[  27594461,          0,          0,          0],
        [         0,          0,          0,          0],
        [         0,        105,  828597816, 1664498022]],

       [[1684157283, 1633903153,  926507575, 1684091235],
        [1684431671, 1717842224,  859071286,  859136867],
        [ 862009657,  959591525,  842621793,  926430005]]], dtype=int32)

In [52]:
# Creates a square matrix array of zeroes with diagonal populated by ones
# A square matrix is a matrix whose number of rows = number of its columns
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [53]:
# Creates a one-dimensional ndarray object with evenly spaced numbers
# Parameters used are start, stop (included by default), and num (number of elements)
g = np.linspace(5, 15, 12)
g

array([ 5.        ,  5.90909091,  6.81818182,  7.72727273,  8.63636364,
        9.54545455, 10.45454545, 11.36363636, 12.27272727, 13.18181818,
       14.09090909, 15.        ])
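
The idea of instantiating an array first and populating it later can be sketched as follows. The growth factor, starting value, and number of steps are hypothetical values chosen only for illustration.

```python
import numpy as np

n_steps = 5    # hypothetical number of periods
growth = 1.02  # hypothetical growth factor per period

# Allocate the full array up front, then fill it step by step
path = np.zeros(n_steps + 1)
path[0] = 100.0
for t in range(1, n_steps + 1):
    path[t] = path[t - 1] * growth

print(path)
```

Allocating once and writing into the array avoids repeatedly growing a list and converting it afterwards.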

## Metainformation

Every ndarray object provides access to a number of useful attributes.

In [54]:
# Number of elements in the array
g.size 

12

In [55]:
# Number of bytes used to represent one element
g.itemsize

8

In [56]:
# Number of dimensions
g.ndim

1

In [57]:
# The shape of the array: the length along each dimension
# (for a 2-D array: number of rows, number of columns)
g.shape

(12,)

In [58]:
# The data type of the elements
g.dtype

dtype('float64')

In [59]:
# Total number of bytes used in memory
g.nbytes

96
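
These attributes are related: `nbytes` equals `size * itemsize`. A short sketch, using a `float32` copy (an assumption added for comparison), shows how the choice of dtype changes the memory footprint:

```python
import numpy as np

g64 = np.linspace(5, 15, 12)  # float64 by default
g32 = g64.astype(np.float32)  # same values at half the storage per element

print(g64.size, g64.itemsize, g64.nbytes)
print(g32.size, g32.itemsize, g32.nbytes)
```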

## Reshaping and Resizing

Although the size of an ndarray object is fixed once it is created (its elements remain mutable), there are multiple options to reshape and resize such an object. While reshaping in general just provides another view on the same data, resizing in general creates a new (temporary) object. First, some examples of reshaping.

In [60]:
g = np.arange(15)
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [61]:
g.shape

(15,)

In [62]:
np.shape(g) # Same as the above

(15,)

In [63]:
# Reshaping to 2 dimensions
# 3 rows and 5 columns
g.reshape((3, 5))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [64]:
# Binding the reshaped array (a view on the data of g) to a new name
h = g.reshape((5, 3))
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [65]:
# Transpose of the matrix
# Rows become columns and columns become rows
h.T

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])

In [66]:
h.transpose() # Same as the above

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])
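
Because reshaping usually returns a view, writing through the reshaped array also changes the original. The sketch below checks this with `np.shares_memory` and shows that an explicit `.copy()` breaks the link; the name `h_copy` is illustrative.

```python
import numpy as np

g = np.arange(15)
h = g.reshape((5, 3))

# The reshaped array is a view on the same memory as g
shares = np.shares_memory(g, h)

h[0, 0] = 99  # writing through the view changes g as well
first_of_g = g[0]

# An explicit copy owns its data, so changes no longer propagate
h_copy = g.reshape((5, 3)).copy()
h_copy[0, 0] = -1

print(shares, first_of_g, g[0])
```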

During a reshaping operation, the total number of elements in the ndarray object is unchanged. During a resizing operation, this number changes—it either decreases (“down-sizing”) or increases (“up-sizing”).

In [67]:
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [68]:
# 2 dimensions, down-sizing
np.resize(g, (3, 1))

array([[0],
       [1],
       [2]])

In [69]:
# 2 dimensions, down-sizing
np.resize(g, (1, 5))

array([[0, 1, 2, 3, 4]])

In [70]:
# 2 dimensions, down-sizing
np.resize(g, (2, 5))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [71]:
# 2 dimensions, up-sizing
np.resize(g, (5, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14,  0],
       [ 1,  2,  3,  4]])
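
As the up-sizing example shows, the function `np.resize` fills extra positions by repeating the input. The method `ndarray.resize` behaves differently: it changes the array in place and pads with zeros. The sketch below contrasts the two; `refcheck=False` is passed only so that the in-place call also works in interactive sessions where other references to the array may exist.

```python
import numpy as np

g = np.arange(15)

# Function: returns a new array and cycles through g to fill it
repeated = np.resize(g, (5, 4))

# Method: resizes in place and pads the new positions with zeros
x = np.arange(15)
x.resize((5, 4), refcheck=False)

print(repeated[-1])  # last row continues with values from the start of g
print(x[-1])         # last row is padded with zeros after 12, 13, 14
print(g.shape)       # g itself is unchanged by np.resize
```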

Stacking is a special operation that allows the horizontal or vertical combination of two ndarray objects. However, the size of the “connecting” dimension must be the same.

In [72]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [73]:
# Horizontal stacking of two ndarrays
# Both arrays must have the same number of rows
np.hstack((h, 2 * h))

array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16],
       [ 9, 10, 11, 18, 20, 22],
       [12, 13, 14, 24, 26, 28]])

In [74]:
# Vertical stacking of two ndarrays
# Both arrays must have the same number of columns
np.vstack((h, 0.5 * h))

array([[ 0. ,  1. ,  2. ],
       [ 3. ,  4. ,  5. ],
       [ 6. ,  7. ,  8. ],
       [ 9. , 10. , 11. ],
       [12. , 13. , 14. ],
       [ 0. ,  0.5,  1. ],
       [ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5],
       [ 6. ,  6.5,  7. ]])
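
For 2-D arrays, both stacking functions can also be expressed with `np.concatenate` and an explicit `axis`. The sketch below shows this equivalence and what happens when the connecting dimension does not match; the variable names are illustrative.

```python
import numpy as np

h = np.arange(15).reshape((5, 3))

# For 2-D arrays, hstack joins along axis 1 and vstack along axis 0
side_by_side = np.concatenate((h, 2 * h), axis=1)
on_top = np.concatenate((h, 0.5 * h), axis=0)

# Mismatched connecting dimension: h has 5 rows, h.T has 3
error_message = None
try:
    np.hstack((h, h.T))
except ValueError as err:
    error_message = str(err)

print(side_by_side.shape, on_top.shape)
print(error_message)
```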

Another special operation is the flattening of a multidimensional ndarray object to a one-dimensional one. One can choose whether the flattening happens row-by-row (C order) or column-by-column (F order).

In [75]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [76]:
h.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [77]:
h.flatten(order='C') # Same as the above

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [78]:
h.flatten(order='F') # Column-wise

array([ 0,  3,  6,  9, 12,  1,  4,  7, 10, 13,  2,  5,  8, 11, 14])

In [79]:
for i in h.flat:
    print(i, end=',')

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,

In [80]:
# ravel() is an alternative to flatten();
# it returns a view instead of a copy whenever possible
for i in h.ravel(order='C'):
    print(i, end=',')

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,

In [81]:
for i in h.ravel(order='F'):
    print(i, end=',')

0,3,6,9,12,1,4,7,10,13,2,5,8,11,14,
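
The practical difference between the two is memory: `flatten()` always returns a copy, while `ravel()` returns a view when the requested order matches the memory layout. A small sketch using `np.shares_memory` to check this:

```python
import numpy as np

h = np.arange(15).reshape((5, 3))  # C-ordered (row-major) layout

flat_copy = h.flatten()          # always a copy
flat_view = h.ravel()            # a view, since h is already in C order
col_major = h.ravel(order='F')   # a copy, since the data must be reordered

print(np.shares_memory(h, flat_copy))  # False
print(np.shares_memory(h, flat_view))  # True
print(np.shares_memory(h, col_major))  # False
```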

## Boolean Arrays