# Python for Algorithmic Trading
[Chapter 4. Mastering Vectorized Backtesting](https://learning.oreilly.com/library/view/python-for-algorithmic/9781492053347/ch04.html)

## Vectorization with NumPy
The NumPy package for numerical computing (cf. `NumPy` [home page](http://numpy.org/)) introduces vectorization to Python. The major class provided by `NumPy` is the `ndarray` class, which stands for _n-dimensional array_. An instance of such an object can be created, for example, on the basis of the `list` object `v`. Scalar multiplication, linear transformations, and similar operations from linear algebra then work as desired:

```python
import numpy as np

v = [1, 2, 3, 4, 5]
a = np.array(v)

type(a)
# numpy.ndarray

2 * a
# array([ 2,  4,  6,  8, 10])

0.5 * a + 2
# array([2.5, 3. , 3.5, 4. , 4.5])
```
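To make the contrast with plain Python concrete, here is a minimal sketch that computes `0.5 * a + 2` once with an explicit loop over the `list` and once in vectorized form. The helper `scale_and_shift_loop` is a hypothetical name used only for this illustration. The sketch also shows why the `list` object alone is not enough: on a list, `*` means repetition rather than scalar multiplication.

```python
import numpy as np

def scale_and_shift_loop(values, factor, shift):
    """Hypothetical helper: compute factor * x + shift element by element in a Python loop."""
    result = []
    for x in values:
        result.append(factor * x + shift)
    return result

v = [1, 2, 3, 4, 5]
a = np.array(v)

# On a plain list, * repeats the list instead of multiplying each element
print(2 * v)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Loop-based version: one Python-level operation per element
looped = scale_and_shift_loop(v, 0.5, 2)

# Vectorized version: a single expression on the whole ndarray
vectorized = 0.5 * a + 2

print(looped)  # [2.5, 3.0, 3.5, 4.0, 4.5]
assert np.allclose(looped, vectorized)
```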

The transition from a one-dimensional array (a vector) to a two-dimensional array (a matrix) is natural. The same holds true for higher dimensions:

```python
a = np.arange(12).reshape((4, 3))
a
# array([[ 0,  1,  2],
#        [ 3,  4,  5],
#        [ 6,  7,  8],
#        [ 9, 10, 11]])

2 * a
# array([[ 0,  2,  4],
#        [ 6,  8, 10],
#        [12, 14, 16],
#        [18, 20, 22]])

a ** 2
# array([[  0,   1,   4],
#        [  9,  16,  25],
#        [ 36,  49,  64],
#        [ 81, 100, 121]])
```
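As a sketch of the claim that the same holds for higher dimensions, the following reshapes 24 integers into a three-dimensional array and applies the same element-wise operations. The shape `(2, 3, 4)` is an arbitrary choice for illustration.

```python
import numpy as np

# 24 integers arranged as 2 blocks of 3 rows and 4 columns
c = np.arange(24).reshape((2, 3, 4))

print(c.ndim)   # 3
print(c.shape)  # (2, 3, 4)

# Element-wise operations work exactly as for vectors and matrices
doubled = 2 * c
squared = c ** 2

# c[1, 2, 3] is 1 * 12 + 2 * 4 + 3 = 23
print(doubled[1, 2, 3])  # 46
print(squared[1, 2, 3])  # 529
```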

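In addition, the `ndarray` class provides methods that perform vectorized operations such as aggregations. Many of them have counterparts among the top-level `NumPy` functions, for example `np.mean()` for the `mean()` method. The optional `axis` parameter restricts an aggregation to one dimension: `axis=0` yields one value per column, `axis=1` one value per row: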

```python
a.mean()
# 5.5

np.mean(a)
# 5.5

a.mean(axis=0)
# array([4.5, 5.5, 6.5])

a.mean(axis=1)
# array([ 1.,  4.,  7., 10.])
```
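A short sketch that checks the `axis` semantics against explicit sums divided by the number of rows or columns, and confirms that a few other `ndarray` methods return the same results as their `NumPy` function counterparts:

```python
import numpy as np

a = np.arange(12).reshape((4, 3))
n_rows, n_cols = a.shape

# axis=0 collapses the rows: one mean per column
col_means = a.mean(axis=0)
assert np.allclose(col_means, a.sum(axis=0) / n_rows)

# axis=1 collapses the columns: one mean per row
row_means = a.mean(axis=1)
assert np.allclose(row_means, a.sum(axis=1) / n_cols)

# Method and function forms agree for other aggregations as well
for method, func in [(a.sum, np.sum), (a.std, np.std), (a.max, np.max)]:
    assert np.allclose(method(axis=0), func(a, axis=0))

print(col_means)  # [4.5 5.5 6.5]
print(row_means)  # [ 1.  4.  7. 10.]
```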

## Vectorization with pandas

The pandas package builds on `NumPy`. Its `DataFrame` class combines a two-dimensional array with column labels and an index, here a `DatetimeIndex` of business days:

```python
b = np.arange(15).reshape(5, 3)
b
# array([[ 0,  1,  2],
#        [ 3,  4,  5],
#        [ 6,  7,  8],
#        [ 9, 10, 11],
#        [12, 13, 14]])

import pandas as pd

columns = list('abc')
columns
# ['a', 'b', 'c']

index = pd.date_range('2021-7-1', periods=5, freq='B')
index
# DatetimeIndex(['2021-07-01', '2021-07-02', '2021-07-05', '2021-07-06',
#                '2021-07-07'],
#               dtype='datetime64[ns]', freq='B')
```
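The frequency `'B'` generates business days, which is why the index jumps from Friday, July 2 to Monday, July 5. A quick check, comparing against the calendar-day frequency `'D'` over the same span (note that `'B'` skips weekends only, not public holidays):

```python
import pandas as pd

business = pd.date_range('2021-07-01', periods=5, freq='B')
calendar = pd.date_range('2021-07-01', '2021-07-07', freq='D')

# Monday=0 ... Sunday=6; business days never fall on 5 (Sat) or 6 (Sun)
print(business.dayofweek.tolist())  # [3, 4, 0, 1, 2]
print(len(calendar))                # 7

# Dropping weekend days from the calendar range recovers the same dates
weekdays_only = calendar[calendar.dayofweek < 5]
assert list(weekdays_only) == list(business)
```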

```python
df = pd.DataFrame(b, columns=columns, index=index)
df
#              a   b   c
# 2021-07-01   0   1   2
# 2021-07-02   3   4   5
# 2021-07-05   6   7   8
# 2021-07-06   9  10  11
# 2021-07-07  12  13  14
```

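In principle, vectorization works much as it does with `ndarray` objects. One difference is that aggregation methods such as `sum()` and `mean()` default to column-wise results, returning one value per column, whereas `ndarray.mean()` without an `axis` argument aggregates over all elements: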

```python
2 * df
#              a   b   c
# 2021-07-01   0   2   4
# 2021-07-02   6   8  10
# 2021-07-05  12  14  16
# 2021-07-06  18  20  22
# 2021-07-07  24  26  28

df.sum()
# a    30
# b    35
# c    40
# dtype: int64
```

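```python
# Column-wise mean, analogous to df.sum(); preferred over np.mean(df),
# whose behavior on a DataFrame depends on the pandas version
df.mean()
# a    6.0
# b    7.0
# c    8.0
# dtype: float64
```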
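Because pandas aggregates column-wise by default, row-wise results require `axis=1`, and an aggregate over all elements can be taken on the underlying array obtained via `to_numpy()`. A minimal sketch that rebuilds the same `df` so it runs on its own:

```python
import numpy as np
import pandas as pd

b = np.arange(15).reshape(5, 3)
index = pd.date_range('2021-07-01', periods=5, freq='B')
df = pd.DataFrame(b, columns=list('abc'), index=index)

col_sums = df.sum()                # default axis=0: one value per column
row_sums = df.sum(axis=1)          # axis=1: one value per row (date)
grand_mean = df.to_numpy().mean()  # plain ndarray: mean over all 15 elements

print(col_sums.tolist())  # [30, 35, 40]
print(row_sums.tolist())  # [3, 12, 21, 30, 39]
print(grand_mean)         # 7.0
```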

Column-wise operations can be implemented by referencing the respective column names, either by the bracket notation or the dot notation:

```python
df['a'] + df['c']
# 2021-07-01     2
# 2021-07-02     8
# 2021-07-05    14
# 2021-07-06    20
# 2021-07-07    26
# Freq: B, dtype: int64

0.5 * df.a + 2 * df.b - df.c
# 2021-07-01     0.0
# 2021-07-02     4.5
# 2021-07-05     9.0
# 2021-07-06    13.5
# 2021-07-07    18.0
# Freq: B, dtype: float64
```
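Both notations return the same `Series`, but dot notation only works when the column name is a valid Python identifier that does not clash with an existing `DataFrame` attribute or method. A small sketch; the column names `'sum'` and `'price change'` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 3, 6],
                   'sum': [1, 2, 3],
                   'price change': [0.1, -0.2, 0.3]})

# For an ordinary name, both notations select the same column
assert df['a'].equals(df.a)

# 'sum' clashes with the DataFrame.sum method, so df.sum is the method, not the column
print(callable(df.sum))    # True
print(df['sum'].tolist())  # [1, 2, 3]

# A name containing a space can only be reached with brackets
print(df['price change'].tolist())  # [0.1, -0.2, 0.3]
```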

Similarly, conditions that yield Boolean result vectors, as well as SQL-like selections based on such conditions, are straightforward to implement:

```python
df['a'] > 5
# 2021-07-01    False
# 2021-07-02    False
# 2021-07-05     True
# 2021-07-06     True
# 2021-07-07     True
# Freq: B, Name: a, dtype: bool
```

The following selects all rows in which the value in column `a` is greater than five:

```python
df[df['a'] > 5]
#              a   b   c
# 2021-07-05   6   7   8
# 2021-07-06   9  10  11
# 2021-07-07  12  13  14
```
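Several conditions can be combined element-wise with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons in Python. A sketch on the same data, rebuilt so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

index = pd.date_range('2021-07-01', periods=5, freq='B')
df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=list('abc'), index=index)

# Rows where a > 5 AND c < 14 (parentheses around each condition are required)
both = df[(df['a'] > 5) & (df['c'] < 14)]

# Rows where a < 3 OR a > 10
either = df[(df['a'] < 3) | (df['a'] > 10)]

# Rows where NOT a > 5
negated = df[~(df['a'] > 5)]

# Summing a Boolean Series counts the True values
n_selected = (df['a'] > 5).sum()

print(both['a'].tolist())     # [6, 9]
print(either['a'].tolist())   # [0, 12]
print(negated['a'].tolist())  # [0, 3]
print(n_selected)             # 3
```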


In vectorized backtesting of trading strategies, comparisons between two or more columns are typical:

```python
df['c'] > df['b']
# 2021-07-01    True
# 2021-07-02    True
# 2021-07-05    True
# 2021-07-06    True
# 2021-07-07    True
# Freq: B, dtype: bool
```

A condition can also compare a linear combination of columns `a` and `b` with column `c`:

```python
0.15 * df.a + df.b > df.c
# 2021-07-01    False
# 2021-07-02    False
# 2021-07-05    False
# 2021-07-06     True
# 2021-07-07     True
# Freq: B, dtype: bool
```
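To connect such column comparisons to backtesting, here is a minimal sketch of a signal derived from comparing two columns: a short and a long simple moving average of a price series, with a long position (+1) when the short average is above the long one and a short position (-1) otherwise. The price values, the column names `SMA1` and `SMA2`, and the window lengths are hypothetical choices for illustration, not data or parameters from the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical price series on business days
index = pd.date_range('2021-07-01', periods=8, freq='B')
data = pd.DataFrame({'price': [100, 101, 103, 102, 99, 97, 98, 101]}, index=index)

# Short and long simple moving averages (window lengths chosen arbitrarily)
data['SMA1'] = data['price'].rolling(2).mean()
data['SMA2'] = data['price'].rolling(4).mean()

# Vectorized comparison of two columns -> position: +1 long, -1 short.
# Comparisons involving NaN evaluate to False, so warm-up rows get -1 here.
data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)

# Drop the warm-up rows where the long average is not yet defined
data = data.dropna()

print(data[['SMA1', 'SMA2', 'position']])
print(data['position'].tolist())  # [1, -1, -1, -1, 1]
```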