## **4. NumPy and Basic Pandas**

### **NumPy**
NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. It also integrates closely with Pandas, another powerful tool for manipulating financial data.

Python packages like NumPy and Pandas contain classes and methods which we can use by importing the package:

In [97]:
import numpy as np

**Basic NumPy Arrays**

A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. Here we make an array by passing a list of Apple stock prices:

In [98]:
# QuantConnect Example
price_list = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
price_array = np.array(price_list)
print(price_array, type(price_array))

[143.73 145.83 143.68 144.02 143.5  142.62] <class 'numpy.ndarray'>


In [99]:
# My Example
aList: list = [1,2,3,4,5,6,7,8,9,10]
anArray = np.array(aList)

print(anArray)

[ 1  2  3  4  5  6  7  8  9 10]


Notice that the type of the array is "ndarray", short for n-dimensional array. If we pass np.array() a list of lists, it creates a 2-dimensional array.



In [100]:
# QuantConnect Example
Ar = np.array([[1,3], [2,4]])
print(Ar, type(Ar))

[[1 3]
 [2 4]] <class 'numpy.ndarray'>


In [101]:
# My Example
maybeMatrix = [
    [1,2,3],[4,5,6],[7,8,9]
  ]

matrix = np.array(maybeMatrix)
print(matrix)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


We get the dimensions of an ndarray using the .shape attribute:



In [102]:
# QuantConnect Example
print(Ar.shape)

(2, 2)


In [103]:
# My Example
print(matrix.shape)

(3, 3)
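Besides .shape, an ndarray exposes a few other descriptive attributes. The sketch below uses a hypothetical 2x3 array named `grid` (not part of the examples above) to show them, and also illustrates the "all of the same type" rule: mixing integers and floats upcasts every element to a float type.

```python
import numpy as np

# Hypothetical 2x3 array, used only for illustration
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.ndim)   # number of axes (dimensions)
print(grid.shape)  # (rows, columns)
print(grid.size)   # total number of elements
print(grid.dtype)  # the single element type shared by every entry

# Mixing ints and floats upcasts the whole array to a float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```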


If we create a 2-dimensional array (i.e. a matrix), each row can be accessed by index:



In [104]:
# QuantConnect Example
print(Ar[0])

print(Ar[1])

[1 3]
[2 4]


In [105]:
# My Example
print(f"row 1: {matrix[0,:]}")
print(f"row 2: {matrix[1,:]}")
print(f"row 3: {matrix[2,:]}")

row 1: [1 2 3]
row 2: [4 5 6]
row 3: [7 8 9]


If we want to access the matrix by column instead:



In [106]:
# QuantConnect Example
print('First column:', Ar[:, 0])

print('Second column:', Ar[:, 1])

First column: [1 2]
Second column: [3 4]


In [107]:
# My Example
print(f"column 1: {matrix[:,0]}")
print(f"column 2: {matrix[:,1]}")
print(f"column 3: {matrix[:,2]}")

column 1: [1 4 7]
column 2: [2 5 8]
column 3: [3 6 9]
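Row and column indexing can be combined in a single pair of brackets. As a minimal sketch, reusing the same 3x3 values as `matrix` above, the following picks out a single element, a sub-block, and the last row:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = matrix[1, 2]      # row index 1, column index 2
top_right = matrix[:2, 1:]  # first two rows, last two columns
last_row = matrix[-1]       # a negative index counts from the end

print(element)
print(top_right)
print(last_row)
```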


<br>

**Array Functions**

NumPy has built-in functions that let us perform calculations on arrays. For example, we can apply the natural logarithm to each element of an array:

In [108]:
# QuantConnect Example
np.log(price_array)

array([4.96793654, 4.98244156, 4.9675886 , 4.96995218, 4.96633504,
       4.96018375])

In [109]:
# My Example
np.log(matrix)

array([[0.        , 0.69314718, 1.09861229],
       [1.38629436, 1.60943791, 1.79175947],
       [1.94591015, 2.07944154, 2.19722458]])

Other functions return a single value:



In [110]:
# QuantConnect Example
np.mean(price_array)

print(np.std(price_array))

print(np.sum(price_array))

print(np.max(price_array))

0.9673790478515796
863.38
145.83


In [111]:
# My Example
print(np.mean(matrix))
print(np.std(matrix))
print(np.sum(matrix))
print(np.max(matrix))

5.0
2.581988897471611
45
9


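The functions above return the mean, standard deviation, total and maximum value of an array. In the QuantConnect cell, np.mean(price_array) is not wrapped in print() and is not the last expression in the cell, so its value does not appear in the output.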
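On a 2-dimensional array these functions reduce over every element by default. As a sketch, assuming we want per-column or per-row results instead, the `axis` argument selects the direction of the reduction (the variable names here are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, giving one value per column
col_means = np.mean(matrix, axis=0)
# axis=1 collapses the columns, giving one value per row
row_sums = np.sum(matrix, axis=1)

print(col_means)
print(row_sums)
```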

<br>



### **Pandas**

Pandas is one of the most powerful tools for dealing with financial data. First we need to import Pandas:

In [112]:
import pandas as pd

**Series**

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.).

We create a Series by calling pd.Series(data), where data can be a dictionary, an array or just a scalar value.

In [113]:
# QuantConnect Example
price_quant = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
s = pd.Series(price_quant)
print(s)

0    143.73
1    145.83
2    143.68
3    144.02
4    143.50
5    142.62
dtype: float64


In [114]:
# My Example
someWords: list = ["Hi","there","I","Love","ML!"]
ser = pd.Series(someWords)
print(ser)

0       Hi
1    there
2        I
3     Love
4      ML!
dtype: object
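The two examples above build a Series from a list. As a small sketch of the other inputs mentioned earlier, a dictionary supplies its keys as index labels, and a scalar is repeated for every label of an index we provide. The ticker names and the second price below are made up for illustration.

```python
import pandas as pd

# Hypothetical ticker -> price mapping; the dict keys become the index labels
from_dict = pd.Series({"AAPL": 143.73, "MSFT": 65.48})

# A scalar is broadcast to every label in the supplied index
from_scalar = pd.Series(0.0, index=["a", "b", "c"])

print(from_dict)
print(from_scalar)
```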


We can customize the indices of a new Series:



In [115]:
# QuantConnect Example
s = pd.Series(price_quant, index = ['a', 'b', 'c', 'd', 'e', 'f'])
print(s)

a    143.73
b    145.83
c    143.68
d    144.02
e    143.50
f    142.62
dtype: float64


In [116]:
# My Example
otherSer = pd.Series(someWords, index = ["a.","b.","c.","d.","e."])
print(otherSer)

a.       Hi
b.    there
c.        I
d.     Love
e.      ML!
dtype: object


Or we can change the indices of an existing Series:



In [117]:
# QuantConnect Example
s.index = [6,5,4,3,2,1]
print(s)

6    143.73
5    145.83
4    143.68
3    144.02
2    143.50
1    142.62
dtype: float64


In [118]:
# My Example
ser.index = [0.0, 0.1, 0.2, 0.3, 0.4]
print(ser)

0.0       Hi
0.1    there
0.2        I
0.3     Love
0.4      ML!
dtype: object


A Series is like a list in that it can be sliced by position:



In [119]:
# QuantConnect Example
print(s[1:])
print(s[:-2])

5    145.83
4    143.68
3    144.02
2    143.50
1    142.62
dtype: float64
6    143.73
5    145.83
4    143.68
3    144.02
dtype: float64


In [120]:
# My Example
ser = pd.Series(someWords)
print(ser[:2])

0       Hi
1    there
dtype: object


Series is also like a dictionary whose values can be set or fetched by index label:



In [121]:
# QuantConnect Example
print(s[4])
s[4] = 0
print(s)

143.68
6    143.73
5    145.83
4      0.00
3    144.02
2    143.50
1    142.62
dtype: float64


In [122]:
# My Example
print(ser[4])

ML!
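The dictionary analogy extends a little further. In this sketch (the Series `prices` and its labels are illustrative), the `in` operator tests index labels rather than values, and .get() returns a default instead of raising when a label is missing:

```python
import pandas as pd

prices = pd.Series([143.73, 145.83, 143.68], index=["a", "b", "c"])

has_b = "b" in prices             # membership checks the labels, not the values
missing = prices.get("z", -1.0)   # default returned because "z" is not a label
labels = list(prices.keys())      # keys() is the index

print(has_b, missing, labels)
```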


A Series can also have a name attribute, which is used as the column name when we build a Pandas DataFrame from several Series.



In [123]:
# QuantConnect Example
s = pd.Series(price_quant, name = 'Apple Prices')
print(s)
print(s.name)

0    143.73
1    145.83
2    143.68
3    144.02
4    143.50
5    142.62
Name: Apple Prices, dtype: float64
Apple Prices


In [124]:
# My Example
anotherSer = pd.Series(someWords, name = "A Confession!")
print(anotherSer)

0       Hi
1    there
2        I
3     Love
4      ML!
Name: A Confession!, dtype: object


We can get the statistical summaries of a Series:



In [125]:
# QuantConnect Example
print(s.describe())

count      6.000000
mean     143.896667
std        1.059711
min      142.620000
25%      143.545000
50%      143.705000
75%      143.947500
max      145.830000
Name: Apple Prices, dtype: float64


In [126]:
# My Example
print(ser.describe())

count      5
unique     5
top       Hi
freq       1
dtype: object
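The std reported by describe() for the Apple prices (1.059711) differs from the np.std(price_array) result printed earlier (0.9673790478515796). The sketch below shows the likely reason: NumPy defaults to the population standard deviation (ddof=0, dividing by n), while Pandas defaults to the sample standard deviation (ddof=1, dividing by n-1).

```python
import numpy as np
import pandas as pd

prices = pd.Series([143.73, 145.83, 143.68, 144.02, 143.5, 142.62])
values = prices.to_numpy()

population_std = np.std(values)            # NumPy default: ddof=0
sample_std = prices.std()                  # Pandas default: ddof=1
numpy_sample_std = np.std(values, ddof=1)  # NumPy made to match Pandas

print(population_std, sample_std, numpy_sample_std)
```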


**Time Index**

Pandas has a built-in function specifically for creating date indices: pd.date_range(). We use it to create a new index for our Series:

In [127]:
# QuantConnect Example
time_index = pd.date_range('2017-01-01', periods = len(s), freq = 'D')
print(time_index)
s.index = time_index
print(s)

DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06'],
              dtype='datetime64[ns]', freq='D')
2017-01-01    143.73
2017-01-02    145.83
2017-01-03    143.68
2017-01-04    144.02
2017-01-05    143.50
2017-01-06    142.62
Freq: D, Name: Apple Prices, dtype: float64


In [128]:
# My Example
timeIndexes = pd.date_range("01-01-2021", periods = len(ser), freq = "M")
anotherSer.index = timeIndexes

print(timeIndexes, "\n")
print(anotherSer)

DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
               '2021-05-31'],
              dtype='datetime64[ns]', freq='M') 

2021-01-31       Hi
2021-02-28    there
2021-03-31        I
2021-04-30     Love
2021-05-31      ML!
Freq: M, Name: A Confession!, dtype: object
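pd.date_range() accepts other combinations of arguments as well. As a sketch, the following builds one index from a start and end date and another using the business-day frequency "B", which skips weekends (the variable names are illustrative):

```python
import pandas as pd

# Start and end dates; the default frequency is daily
daily = pd.date_range("2017-01-01", "2017-01-06")

# Six business days starting on Monday 2017-01-02; the weekend is skipped
business_days = pd.date_range("2017-01-02", periods=6, freq="B")

print(daily)
print(business_days)
```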


Series are usually accessed using the iloc[] and loc[] indexers. iloc[] accesses elements by integer position, and loc[] accesses elements by index label.

iloc[] is necessary when the index labels of a Series are themselves integers. Take our previously defined Series as an example:

In [129]:
# QuantConnect Example
s.index = [6,5,4,3,2,1]
print(s)
print(s[1])

6    143.73
5    145.83
4    143.68
3    144.02
2    143.50
1    142.62
Name: Apple Prices, dtype: float64
142.62


In [130]:
# My Example

anotherSer.index = [1,2,3,4,5]
print(anotherSer, "\n")

print(anotherSer[1])

1       Hi
2    there
3        I
4     Love
5      ML!
Name: A Confession!, dtype: object 

Hi


If we intended to take the second element of the Series, we would make a mistake here: because the index labels are integers, s[1] looks up the label 1 rather than the second position. To access the element we want, we use iloc[]:



In [131]:
# QuantConnect Example
print(s.iloc[1])

145.83


In [132]:
# My Example
print(anotherSer.iloc[1])

there
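To make the distinction explicit, here is a minimal sketch that puts loc[] and iloc[] side by side on a small Series with integer labels (the Series is illustrative). Note also that a loc[] slice includes its end label, while an iloc[] slice excludes its end position.

```python
import pandas as pd

s = pd.Series([143.73, 145.83, 143.68], index=[3, 2, 1])

by_label = s.loc[1]       # the element whose label is 1
by_position = s.iloc[1]   # the second element, whatever its label

label_slice = s.loc[3:2]       # labels 3 through 2, end label included
position_slice = s.iloc[0:1]   # position 0 only, end position excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)
```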


While working with time series data, we often use time as the index. Pandas provides us with various methods to access the data by time index.



In [133]:
# QuantConnect Example
s.index = time_index
print(s['2017-01-03'])

143.68


In [140]:
# My Example
anotherSer.index = timeIndexes
print(anotherSer["2021-04-30"])

Love


We can even access a range of dates:



In [135]:
# QuantConnect Example
print(s['2017-01-02':'2017-01-05'])

2017-01-02    145.83
2017-01-03    143.68
2017-01-04    144.02
2017-01-05    143.50
Freq: D, Name: Apple Prices, dtype: float64


In [143]:
# My Example
print(anotherSer["2021-02-28":"2021-04-30"])

2021-02-28    there
2021-03-31        I
2021-04-30     Love
Freq: M, Name: A Confession!, dtype: object
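With a DatetimeIndex, loc[] also accepts partial date strings. As a sketch on an illustrative Series that spans the end of January, a year-month string selects every row in that month, and an open-ended slice selects everything from a date onward:

```python
import pandas as pd

# Four consecutive days: Jan 30, Jan 31, Feb 1, Feb 2
idx = pd.date_range("2017-01-30", periods=4, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

january = s.loc["2017-01"]        # every row dated in January 2017
from_feb = s.loc["2017-02-01":]   # everything from 1 February onward

print(january)
print(from_feb)
```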


Square-bracket indexing on a Series is very flexible. We can place a boolean condition inside the brackets to keep only the elements for which it is true:



In [137]:
# QuantConnect Example
print(s[s < np.mean(s)])
print(s[(s > np.mean(s)) & (s < np.mean(s) + 1.64*np.std(s))])

2017-01-01    143.73
2017-01-03    143.68
2017-01-05    143.50
2017-01-06    142.62
Name: Apple Prices, dtype: float64
2017-01-04    144.02
Freq: D, Name: Apple Prices, dtype: float64


In [150]:
# My Example
naturals: list = [1,2,3,4,5]
nSer = pd.Series(naturals)
print(nSer[nSer < np.mean(nSer)])

0    1
1    2
dtype: int64


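As the QuantConnect example shows, conditions can be combined with the element-wise logical operators & (and), | (or) and ~ (not). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as < and >.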
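The examples above only exercise &. A short sketch of | and ~ on an illustrative Series of the integers 1 to 5:

```python
import pandas as pd

n = pd.Series([1, 2, 3, 4, 5])

# | keeps elements that satisfy either condition
extremes = n[(n < 2) | (n > 4)]

# ~ inverts a boolean mask, keeping everything the mask rejected
middle = n[~((n < 2) | (n > 4))]

print(extremes)
print(middle)
```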

<br>

### **Summary**

Here we have introduced NumPy and Pandas for scientific computing in Python. In the next chapter, we will dive deeper into Pandas to learn about resampling and about manipulating Pandas DataFrames, both of which are commonly used in financial data analysis.