# **Numpy**
## **MSc in Mathematics and Finance 2024-2025**
---
<img src="Imperial_logo.png" align = "left" width=250>
 <br><br><br> 

# What is Numpy?

#### NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.


# Why is it useful for Math Finance?

#### In finance, Numpy's ability to handle vast arrays of numerical data efficiently makes it invaluable for tasks like portfolio optimization, where complex calculations on price data, correlations, and volatilities are essential.  Its linear algebra capabilities power algorithms for risk management, allowing analysts to quickly compute Value at Risk (VaR) and stress test portfolios under various market scenarios.  Furthermore, Numpy's speed and efficiency are crucial for pricing complex financial instruments like derivatives, where numerical methods such as Monte Carlo simulations rely heavily on its ability to perform rapid calculations on large datasets.

# What's behind Numpy?

#### Numpy exposes a set of libraries written in C/Fortran. These libraries have been optimized over many years to give the best possible CPU performance for linear algebra operations and mathematical functions.

#### **Hint:** When trying to speed up your code, translating loops into numpy array operations will often yield dramatic speed improvements.
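
As a minimal sketch of what such a translation can look like, the snippet below computes simple returns twice: once with a Python loop and once with whole-array operations. The `prices` array is hypothetical example data, not something taken from the course material.

```python
import numpy as np

# Hypothetical example data: 1,000 evenly spaced "prices" (illustrative only)
prices = np.linspace(100.0, 110.0, 1000)

# Pure-Python loop: one simple return per pair of consecutive prices
returns_loop = []
for i in range(1, len(prices)):
    returns_loop.append(prices[i] / prices[i - 1] - 1.0)

# Vectorized version: the same calculation applied to whole arrays at once
returns_vec = prices[1:] / prices[:-1] - 1.0

# Both approaches give the same numbers
print(np.allclose(returns_loop, returns_vec))
```

The vectorized line does the looping inside NumPy's compiled code, which is where the speed-up typically comes from.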

# NUMPY INSTALLATION

## Installing Python packages 

#### Run `pip install numpy` in the OS command line or Anaconda prompt.

In [1]:
import numpy as np
print(np.__version__)

2.2.2


#### When importing a library in python, one can use the library name, `numpy` in this case, or create an alias or nickname. The most common alias for numpy in the python community is `np`.

#### Most python libraries have a version number, which can be printed through the `__version__` attribute. It is important to be aware of the version currently running in order to be able to adapt the code accordingly.

## Standard Versioning (e.g., Semantic Versioning)

#### **Major Version** (e.g., 2 in NumPy 2.x): Indicates significant changes that might break compatibility with earlier versions. This could involve major feature additions, removal of old features, or changes to core functionality. Upgrading to a new major version often requires code modifications.
#### **Minor Version** (e.g., 0 in 2.0) : Represents smaller updates, often adding new features or improvements while maintaining backward compatibility. Generally, upgrading to a new minor version should not break existing code.
#### **Patch Version** (e.g., 1 in 2.0.1): Indicates bug fixes, security patches, and other minor updates that don't introduce new features. These are usually safe to upgrade without any code changes.
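
The three components can be read off the version string directly. The following is a rough sketch, assuming a plain `major.minor.patch` string; pre-release or development versions may need more careful parsing.

```python
import numpy as np

# Split a version string such as "2.2.2" into its first three components
major, minor, patch = np.__version__.split(".")[:3]
print("major:", major, "minor:", minor, "patch:", patch)

# Hypothetical guard: branch on the major version where behaviour differs
if int(major) >= 2:
    print("Running NumPy 2.x or later")
else:
    print("Running NumPy 1.x")
```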

# Numpy 2.X.X released in June 2024



#### Numpy 2.X was released very recently and is the first major release since 2006! This is an excellent example of how active the Python community is, constantly improving the set of libraries available to everyone. A summary of the enhancements is given here:

**Improved Performance** : Significant performance enhancements were introduced, particularly for certain array operations and linear algebra routines.

**New Data Type API** : A new public API for defining custom data types (dtypes) was introduced, enabling developers to extend NumPy's capabilities and create specialized data structures.

**Python Version Support** : Python 2 support had already been removed during the NumPy 1.x series; NumPy 2.0 requires Python 3.9 or later.

**Changes to the C API** : The C API underwent some modifications, mainly to improve internal consistency and facilitate future development. This might require updates for code that directly interacts with the NumPy C API.

**Enhanced Error Handling** : Improvements were made to error handling and reporting, providing more informative messages and making it easier to debug issues.

# The Building Blocks of Scientific Computing: Arrays

#### In the world of programming and scientific computing, arrays are often used to represent mathematical tensors.  Think of a vector as a list of numbers that has both magnitude and direction.  An array, which is essentially an ordered collection of elements, can perfectly capture this information. Each element in the array corresponds to a component of the vector.

#### For example, the vector (1, 5, 2, 0) can be represented as a one-dimensional array [1, 5, 2, 0]. This makes it easy to perform mathematical operations on vectors using arrays, such as:
![1D_ARRAY.png](attachment:bfd4207a-6b6f-4ce2-84df-d87a0743e061.png)

#### Likewise a matrix can be represented as nested 1d arrays or nested vectors. 

![2D_ARRAY.png](attachment:0f1236f3-6257-4104-94ba-0215d58d0117.png)

#### This matrix can be represented as [[1, 5, 2, 0],[8, 3, 6, 1],[1, 7, 2, 9]]


#### Higher dimensional tensors are harder to visualize but are often encountered in finance. A common use case for 3D tensors is, for example, a time series of covariance matrices.


# Array Fundamentals

#### The syntax to define an array in numpy is `np.array()` https://numpy.org/doc/stable/reference/generated/numpy.array.html . Typically a list object (nested as needed) is passed to the function to construct an array of the desired dimensionality.

#### A useful attribute of an array is `.shape`, which returns a tuple with the sizes of the different dimensions of the array

In [2]:
vector = np.array([1, 2, 3, 4, 5, 6])
print("Shape:",vector.shape)
print(type(vector.shape))
vector

Shape: (6,)
<class 'tuple'>


array([1, 2, 3, 4, 5, 6])

In [3]:
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:",matrix.shape)
matrix

Shape: (2, 3)


array([[1, 2, 3],
       [4, 5, 6]])

In [4]:
three_tensor = np.array([[[1, 2], [3, 4]],[[5, 6], [7, 8]]])
print("Shape:",three_tensor.shape)
three_tensor

Shape: (2, 2, 2)


array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
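
A 3D array is also how a time series of covariance matrices can be stored. The sketch below builds one from made-up data: the window count, number of days and number of assets are all hypothetical, and the returns are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used only to make example data

# Hypothetical data: 3 windows of 250 daily returns for 4 assets
n_windows, n_days, n_assets = 3, 250, 4
returns = rng.normal(0.0, 0.01, size=(n_windows, n_days, n_assets))

# rowvar=False tells np.cov that each column (not each row) is one asset
cov_stack = np.array([np.cov(window, rowvar=False) for window in returns])

print(cov_stack.shape)  # (3, 4, 4): one 4x4 covariance matrix per window
```

Indexing the first axis, for example `cov_stack[0]`, returns the covariance matrix of the first window.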

# Numpy Arrays vs. Python containers (Lists,Tuples,Sets,...)


#### The biggest difference between numpy arrays and python containers is that numpy arrays hold homogeneous types. This means that each element of a numpy array is of the same type. This homogeneity makes it possible to optimize how the data is stored and handled:



#### **NumPy Arrays** : Homogeneous, meaning they hold elements of the same data type (e.g., all integers or all floats). This allows for efficient storage and optimized operations.
#### **Python Lists**: Heterogeneous, meaning they can hold elements of different data types (e.g., a mix of integers, floats, and strings). This offers flexibility but comes with performance overhead.
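
A small illustration of the difference, using made-up values: the list keeps each element's own type, while `np.array` converts everything to a single common type.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a list keeps each element's own type
print([type(x).__name__ for x in mixed_list])   # ['int', 'float', 'int']

mixed_array = np.array(mixed_list)  # NumPy picks one dtype for all elements
print(mixed_array.dtype)            # float64
print(mixed_array)                  # [1.  2.5 3. ]
```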
  

# Crash course on data types and computer representation

#### Computers store numbers as sequences of bits (0s and 1s).  Different data types use varying numbers of bytes (8 bits each) to represent numbers with different ranges and precisions:

#### Integers: Whole numbers.

##### int8: 1 byte, -128 to 127

##### int32: 4 bytes, -2,147,483,648 to 2,147,483,647

#### Floating-Point: Numbers with decimal points.

##### float32: 4 bytes, approximately 7 decimal digits of precision

##### float64: 8 bytes, approximately 15 decimal digits of precision

#### The specific way bits represent a number depends on the chosen encoding (e.g., two's complement for signed integers, IEEE 754 for floating-point).
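
The ranges and sizes listed above can be checked from within NumPy. This is a short sketch using `np.iinfo` and `np.finfo`, which report the limits of integer and floating-point types respectively.

```python
import numpy as np

int8_info = np.iinfo(np.int8)
int32_info = np.iinfo(np.int32)
print(int8_info.min, int8_info.max)      # -128 127
print(int32_info.min, int32_info.max)    # -2147483648 2147483647

# Machine epsilon: the gap between 1.0 and the next representable float
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print(eps32, eps64)

# itemsize gives the number of bytes used per element
size32 = np.dtype(np.float32).itemsize
size64 = np.dtype(np.float64).itemsize
print(size32, size64)                    # 4 8
```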

![data_bytes.png](attachment:ccf2cd93-6f53-4a6f-a8d0-f63576fb8f84.png)

# How do computers store arrays in-memory (RAM)?

#### Computers store arrays in RAM (Random Access Memory) using a contiguous block of memory locations. Here's how it works:   

#### **Memory Allocation**: When you create an array, the computer allocates a chunk of consecutive memory locations large enough to hold all the elements of the array. The size of this block depends on the data type of the elements and the number of elements in the array.   

#### **Element Storage**: Each element in the array is stored in a specific memory location within this block. The elements are typically stored in order, with the first element at the beginning of the block and subsequent elements following sequentially.   

#### **Addressing**: The computer keeps track of the starting memory address of the array. To access any element in the array, it calculates the address of that element by adding an offset to the starting address. The offset is determined by the index of the element and the size of each element in bytes.   

#### NumPy efficiently stores arrays in contiguous memory blocks


![array_structure.png](attachment:95fea093-681a-4e07-8e6e-3f610bc28629.png)

![memory_layout.png](attachment:2c00d929-8c08-4b44-94b6-4b8d722c5fe4.png)

In [5]:
vector

array([1, 2, 3, 4, 5, 6])

In [6]:
vector.data

<memory at 0x000001AE5C734C40>

In [7]:
vector.strides

(8,)

In [8]:
vector.nbytes

48
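
The addressing rule described above (start address plus an offset) can be reproduced by hand from `.strides`. The sketch below uses a hypothetical 3x4 array called `grid`, stored in NumPy's default row-major order.

```python
import numpy as np

grid = np.arange(12, dtype=np.int64).reshape(3, 4)  # 3 rows, 4 columns

print(grid.itemsize)  # 8 bytes per int64 element
print(grid.strides)   # (32, 8): 32 bytes to the next row, 8 to the next column

# Byte offset of element [i, j] from the start of the block
i, j = 2, 1
offset = i * grid.strides[0] + j * grid.strides[1]
print(offset)         # 72

# Reading 8 bytes at that offset recovers the same element
raw = grid.tobytes()
value = np.frombuffer(raw[offset:offset + grid.itemsize], dtype=np.int64)[0]
print(value, grid[i, j])  # 9 9
```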

# Data Types and promotions when mixing types

#### As we discussed in the first session, the behaviour when performing operations on different data types needs to be carefully understood. With numpy arrays, the `.dtype` attribute returns the data type of the array (remember that arrays hold homogeneous data types)

In [9]:
vector.dtype

dtype('int64')

In [10]:
vector

array([1, 2, 3, 4, 5, 6])

In [11]:
vector_float=np.array([1.5,2.5,3.5,4.5,5.5,6.5])
vector_float.dtype

dtype('float64')

In [12]:
vector+vector_float

array([ 2.5,  4.5,  6.5,  8.5, 10.5, 12.5])

![data_types.png](attachment:eee963c4-e307-4187-81d1-99ac44ef87ec.png)

# To sum up
#### In general, numpy will behave as one expects, promoting lower-precision types to higher-precision ones
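
To see which type a combination will produce without building any arrays, `np.result_type` can be queried directly. The array names in this short sketch are illustrative.

```python
import numpy as np

print(np.result_type(np.int32, np.int64))      # int64
print(np.result_type(np.int64, np.float64))    # float64
print(np.result_type(np.float32, np.float64))  # float64

# The result can be wider than both inputs: int64 with float32 gives float64
ints = np.array([1, 2, 3], dtype=np.int64)
floats32 = np.array([0.5, 0.5, 0.5], dtype=np.float32)
mixed = ints + floats32
print(mixed.dtype)                             # float64
```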

# Array indexing and slicing

#### You can index and slice NumPy arrays in the same ways you can slice Python lists.

In [13]:
data = np.array([1, 2, 3])

print("data[0]:",data[0])

print("data[1]:",data[1])

print("data[0:2]:",data[0:2])

print("data[1:]:",data[1:])

print("data[-2:]:",data[-2:])


data[0]: 1
data[1]: 2
data[0:2]: [1 2]
data[1:]: [2 3]
data[-2:]: [2 3]


![indexing.png](attachment:d7e4d0e6-4f71-4e34-b8f4-298ee9926d4d.png)
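
The same ideas extend to several dimensions by separating the indices with commas. The sketch below uses a hypothetical 2x3 array called `table`. It also shows one difference from lists worth keeping in mind: a basic slice of a NumPy array is a view onto the same memory, not a copy.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(table[0, 1])    # 2: row 0, column 1
print(table[1])       # [4 5 6]: the whole second row
print(table[:, 0])    # [1 4]: the first column
print(table[:, 1:])   # all rows, columns 1 onwards

# Modifying a slice modifies the original array as well
row_view = table[0]
row_view[0] = 100
print(table[0, 0])    # 100
```

Calling `.copy()` on the slice gives an independent array when that is what is needed.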

# Useful functions for  array creation

#### `np.zeros(dim)` creates an array of the desired shape filled with zeros
#### `np.ones(dim)` creates an array of the desired shape filled with ones
#### `np.arange(start, stop,step)` creates an array of evenly spaced values with step size `step` within the half-open interval [start,stop) (integers when the arguments are integers).
#### `np.empty(dim)` creates an array without initializing its entries, so it holds arbitrary values --> Careful not to use uninitialized values
#### `np.linspace(start, stop, num)` creates an array of `num` evenly spaced values over the closed interval [start,stop].


In [14]:
np.zeros(3)
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [15]:
np.ones((4,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [16]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [17]:
np.empty((2,10))

array([[6.23042070e-307, 4.67296746e-307, 1.69121096e-306,
        1.86921007e-306, 1.86921686e-306, 1.89146896e-307,
        7.56571288e-307, 3.11525958e-307, 1.24610723e-306,
        1.37962320e-306],
       [1.29060871e-306, 2.22518251e-306, 1.33511969e-306,
        1.78022342e-306, 1.05700345e-307, 3.11525958e-307,
        2.13619585e-306, 1.42420209e-306, 8.34420522e-308,
        2.07507571e-322]])

In [18]:
np.linspace(0.0, 1.0,10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

# Adding, removing, and sorting elements

#### Numpy arrays are not immutable, as values can be replaced. **However, a numpy array can be understood as immutable when it comes to memory allocation and size. Note that the functions below return a brand new copy of the array.**

#### `np.sort(array)` returns a sorted copy of an array.
#### `np.insert(array,position,value)` inserts a value in the desired position -> very inefficient
#### `np.delete(array,position)` deletes the value in the specified position -> very inefficient
#### `np.concatenate((a, b))` concatenate arrays a,b (dimensions must be compatible)


In [19]:
my_array=np.array([5,4,3,2,1])

In [20]:
### Note that the original array stays unchanged
np.sort(my_array)

array([1, 2, 3, 4, 5])

In [21]:
np.insert(my_array,0,10)

array([10,  5,  4,  3,  2,  1])

In [22]:
np.delete(my_array,0)

array([4, 3, 2, 1])

In [23]:
### Note that the original array stays unchanged

In [24]:
my_array

array([5, 4, 3, 2, 1])

In [25]:
my_second_array=np.array([10,9,8,7,6])

In [26]:
np.concatenate((my_array,my_second_array))

array([ 5,  4,  3,  2,  1, 10,  9,  8,  7,  6])

In [27]:
np.delete(my_array,0)

array([4, 3, 2, 1])

# Strengths and Weaknesses of lists vs numpy arrays

Lists are useful when the data types of the contents are non-homogeneous, or when the size of the list is unknown beforehand or requires append and delete operations.

Numpy arrays are efficient for data types that are homogeneous and sizes that are known beforehand.
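
One way to combine the two, sketched below with a made-up stopping rule: grow a list while the size is unknown and convert it once at the end, or preallocate an array and fill it in place when the size is known.

```python
import numpy as np

# Size unknown in advance: grow a list, then convert once at the end
collected = []
value = 1.0
while value < 100.0:      # hypothetical stopping rule
    collected.append(value)
    value *= 2.0
as_array = np.array(collected)

# Size known in advance: preallocate and fill in place
n = 7
filled = np.empty(n)
for k in range(n):
    filled[k] = 2.0 ** k

print(as_array)
print(np.array_equal(as_array, filled))
```

Both routes avoid calling `np.insert` or `np.concatenate` inside a loop, which would copy the whole array on every iteration.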

# Array reductions

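#### Array reductions are operations applied along a given axis of an array such that the number of dimensions of the array is reduced by one. When no axis is given, the reduction runs over all elements and returns a single value.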

#### `np.sum(array,axis)`  Adds  all the elements in your array along a given axis
#### `np.prod(array,axis)`  Multiplies  all the elements in your array along a given axis
#### `np.max(array,axis) / np.min(array,axis)`  Finds max/min element in your array along a given axis
#### `np.mean(array,axis) / np.median(array,axis)` Computes mean/median in your array along a given axis
#### `np.std(array,axis) / np.var(array,axis)` Computes standard deviation/variance in your array along a given axis
#### `np.argmin(array,axis) / np.argmax(array,axis)` Computes the index of the min/max element in your array along a given axis

In [60]:
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3],[4, 5, 6]])

In [61]:
np.sum(arr)

np.int64(15)

In [62]:
np.sum(matrix,0)

array([5, 7, 9])

In [63]:
np.sum(matrix,1)

array([ 6, 15])

In [64]:
np.prod(arr)

np.int64(120)

In [65]:
np.max(matrix,1)

array([3, 6])

In [66]:
np.min(matrix,0)

array([1, 2, 3])

In [67]:
np.argmax(matrix,1)

array([2, 2])

In [68]:
np.argmin(matrix,0)

array([0, 0, 0])

In [69]:
random_sample=np.random.normal(loc=0.5,scale=5,size=20000) # this creates a sample from the normal distribution (see below)

In [70]:
np.mean(random_sample)

np.float64(0.5346763596597058)

In [71]:
np.std(random_sample)

np.float64(4.973974078076211)
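
By default `np.std` and `np.var` divide by the number of observations N. When a sample estimate dividing by N - 1 is wanted, the `ddof` argument can be set to 1. A short sketch with made-up return values:

```python
import numpy as np

sample = np.array([0.01, -0.02, 0.015, 0.005])  # hypothetical returns

population_std = np.std(sample)        # divides by N (default, ddof=0)
sample_std = np.std(sample, ddof=1)    # divides by N - 1

print(population_std, sample_std)
```

For a sample as large as `random_sample` above the two values are almost identical; the difference matters mostly for short samples.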

# Creating arrays of random variables

#### Random variables are a core part of financial computing. Numpy provides a simple interface to generate arrays of random variables

#### `np.random.rand(dim1,dim2,...)` $X\sim Uniform(0,1)$
#### `np.random.randn(dim1,dim2,...)` $X\sim N(0,1)$
#### `np.random.normal(loc,scale,size)` $X\sim N(\mu,\sigma)$ with `loc` $=\mu$ (mean) and `scale` $=\sigma$ (standard deviation); size is a tuple
#### `np.random.binomial(n,p,size)`  $X\sim Binomial(n,p)$  size is a tuple
#### `np.random.poisson(lam,size)` $X\sim Poisson(\lambda)$  size is a tuple
#### `np.random.exponential(scale,size)` $X\sim Exponential(\lambda)$ with `scale` $=1/\lambda$ (the mean, not the rate);  size is a tuple

In [72]:
np.random.randn(10)

array([-1.39349312,  0.84343336, -0.74838974,  0.69696246, -0.02318777,
        0.31478762, -0.3274646 ,  0.22589291, -0.01273708,  0.59368282])

In [73]:
np.random.randn(2,4)

array([[-0.22367506, -1.25800369,  1.24562282,  0.53223233],
       [-0.8918709 ,  0.70676774, -0.06016479, -2.26525337]])

In [74]:
np.random.binomial(10,0.5,(10,2))

array([[4, 4],
       [4, 3],
       [7, 3],
       [4, 8],
       [4, 4],
       [7, 5],
       [4, 3],
       [5, 5],
       [2, 5],
       [4, 5]], dtype=int32)

In [75]:
np.random.exponential(1,(10,2))

array([[0.01844066, 0.28035011],
       [0.97648061, 0.97387662],
       [0.35162146, 1.03883675],
       [1.29352203, 0.10715651],
       [0.83067969, 0.86452983],
       [0.03751537, 0.80676289],
       [1.10285924, 0.0794288 ],
       [1.31001252, 0.3161974 ],
       [0.43740673, 1.48607305],
       [2.41858559, 0.15517049]])
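
Because `np.random.exponential` takes the scale rather than the rate, a rate $\lambda$ has to be passed as `1 / lam`. The sketch below uses an illustrative rate of 0.5 and checks that the sample mean lands near $1/\lambda$.

```python
import numpy as np

lam = 0.5                      # rate parameter (illustrative value)
draws = np.random.exponential(scale=1.0 / lam, size=100_000)

print(draws.mean())            # should be close to 1 / lam = 2.0
```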

# Random seed and reproducibility

#### In NumPy's pseudorandom number generation, a seed acts as an initialization point for the underlying algorithm.  This integer value determines the sequence of numbers produced by the generator. While the sequence appears random, it's deterministic, meaning the same seed will always yield the same sequence. This property is crucial for reproducibility and debugging.

#### **Setting the Seed:** NumPy provides the `np.random.seed()` function for this purpose

#### Subsequent calls to random number generation functions (e.g., np.random.rand(), np.random.randn()) will then produce a predictable sequence based on the seed.

In [76]:
np.random.seed(1234)
np.random.randn(10)

array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])

In [77]:
np.random.seed(1234)
np.random.randn(10)

array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
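
NumPy also offers a generator-based interface, `np.random.default_rng`, in which each generator object carries its own seed instead of relying on one global state. A minimal sketch follows; the names `rng_a` and `rng_b` are illustrative.

```python
import numpy as np

rng_a = np.random.default_rng(1234)   # independent generator with its own seed
rng_b = np.random.default_rng(1234)

draws_a = rng_a.standard_normal(5)
draws_b = rng_b.standard_normal(5)

print(np.array_equal(draws_a, draws_b))  # True: same seed, same sequence
```

The values differ from those obtained with `np.random.seed(1234)` and `np.random.randn`, since the two interfaces use different underlying generators.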

# Numpy and Linear Algebra (Linalg subpackage)

#### NumPy is a powerful Python library for numerical computing, and it provides excellent support for linear algebra operations through its linalg module. Here's a breakdown of the basics:

In [78]:
import numpy as np
vector_a = np.array([1, 2, 3])

matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

#### **1)Scalar Multiplication:** Multiplying a vector or matrix by a scalar.

In [79]:
result = 2 * vector_a 
print(result)
result = 0.5 * matrix_A
print(result)

[2 4 6]
[[0.5 1.  1.5]
 [2.  2.5 3. ]
 [3.5 4.  4.5]]


##### **Careful** with type promotion: multiplying the integer matrix by 0.5 returns an array of floats.

#### **2) Addition and Subtraction:** Element-wise addition or subtraction between vectors or matrices of the same shape.

In [80]:
vector_b = np.array([4, 5, 6])
result = vector_a + vector_b  # result = [5, 7, 9]
result = matrix_A - matrix_A  # result = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

#### **3) Dot product:**  `np.dot(a,b)` computes the dot product for 1D arrays  $$\mathbf a \cdot \mathbf b = \sum_{i=1}^n a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$
For 2D arrays `np.dot(a,b)` computes the matrix multiplication $\mathbf a  \mathbf b$; this is equivalent to using `a@b` or `np.matmul(a,b)`



In [81]:
result = np.dot(vector_a, vector_b)  # result = 32
matrix_B = np.array([[10, 11, 12],
                     [13, 14, 15],
                     [16, 17, 18]])
result = np.dot(matrix_A, matrix_B) 
print(result)

[[ 84  90  96]
 [201 216 231]
 [318 342 366]]


In [82]:
matrix_A@matrix_B

array([[ 84,  90,  96],
       [201, 216, 231],
       [318, 342, 366]])

In [83]:
np.matmul(matrix_A, matrix_B)

array([[ 84,  90,  96],
       [201, 216, 231],
       [318, 342, 366]])
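
The `@` operator also covers the matrix-vector case, and it should not be confused with `*`, which multiplies element by element. A short sketch reusing the arrays defined earlier:

```python
import numpy as np

matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])
vector_a = np.array([1, 2, 3])

matvec = matrix_A @ vector_a
print(matvec)                 # [14 32 50]: matrix-vector product

elementwise = matrix_A * vector_a
print(elementwise)            # each row multiplied element by element with vector_a
```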

#### **4) Transpose:**  Flipping a matrix over its diagonal.

In [84]:
matrix_A.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

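#### **5) Norm:** `np.linalg.norm(a)` computes the euclidean norm of a vector a (for a matrix, the default is the Frobenius norm, i.e. the same formula applied to all entries):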

$$||\boldsymbol{a}||_2 := \sqrt{a_1^2 + \cdots + a_n^2}$$

In [85]:
np.linalg.norm(vector_a)

np.float64(3.7416573867739413)

In [86]:
np.linalg.norm(matrix_A)

np.float64(16.881943016134134)
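
The matrix result above can be reproduced by hand as the square root of the sum of squared entries. The sketch below also shows the `ord` argument, which selects other norms.

```python
import numpy as np

matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

frobenius = np.linalg.norm(matrix_A)        # default for 2D input
manual = np.sqrt(np.sum(matrix_A ** 2))     # square root of the sum of squares
print(np.isclose(frobenius, manual))        # True

# Other norms are selected with the `ord` argument
vector_a = np.array([1, 2, 3])
one_norm = np.linalg.norm(vector_a, ord=1)
inf_norm = np.linalg.norm(vector_a, ord=np.inf)
print(one_norm)                             # 6.0: sum of absolute values
print(inf_norm)                             # 3.0: largest absolute value
```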

#### **6) Linear system of equations:** `np.linalg.solve(A,b)` solves the linear set of equations $Ax=b$ where A is a matrix and b is a vector with compatible dimensions



In [87]:
A = np.array([[2, 4], [5, 8]])
b = np.array([5, 6])
x = np.linalg.solve(A, b)
x  # solution of Ax = b

array([-4.  ,  3.25])

#### **7) Matrix determinant, Eigenvalues and Inverse:** `np.linalg.det(A)`, `np.linalg.eig(A)`, `np.linalg.inv(A)` compute the determinant, the eigenvalues (together with the eigenvectors) and the inverse of a matrix respectively

In [88]:
np.linalg.det(A)

np.float64(-3.999999999999999)

In [89]:
inverse=np.linalg.inv(A)
inverse

array([[-2.  ,  1.  ],
       [ 1.25, -0.5 ]])

In [90]:
inverse@A

array([[ 1.00000000e+00,  0.00000000e+00],
       [-1.11022302e-16,  1.00000000e+00]])
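
The product above is the identity only up to floating-point rounding, so comparisons of this kind are usually done with a tolerance. The sketch below uses `np.allclose` for that, and also checks that `np.linalg.solve` and the explicit inverse agree on the system solved earlier.

```python
import numpy as np

A = np.array([[2, 4], [5, 8]])
b = np.array([5, 6])
inverse = np.linalg.inv(A)

# Exact equality may fail because of rounding; compare with a tolerance instead
print(np.allclose(inverse @ A, np.eye(2)))   # True

x_solve = np.linalg.solve(A, b)
x_inverse = inverse @ b
print(np.allclose(x_solve, x_inverse))       # True
```

When only the solution of $Ax=b$ is needed, calling `np.linalg.solve` is generally preferred over forming the inverse explicitly.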

In [91]:
np.linalg.eig(A)

EigResult(eigenvalues=array([-0.38516481, 10.38516481]), eigenvectors=array([[-0.85889508, -0.43055332],
       [ 0.51215158, -0.90256514]]))
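
The result can be unpacked into eigenvalues and eigenvectors, where column `k` of the eigenvector array belongs to eigenvalue `k`. As a sketch, the following checks $Av=\lambda v$ for each pair, together with two standard identities (sum of eigenvalues equals the trace, product equals the determinant).

```python
import numpy as np

A = np.array([[2, 4], [5, 8]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    print(np.allclose(A @ v, eigenvalues[k] * v))        # True for each pair

print(np.isclose(eigenvalues.sum(), np.trace(A)))        # True
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))  # True
```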