<a href="https://colab.research.google.com/github/aeyage/intraday_prices/blob/main/intraday_prices.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Import and setup

In addition to cuDF, we use CuPy, a GPU-accelerated array library with a NumPy-like interface.

In [2]:
import time
import numpy as np
import pandas as pd
import cudf
import cupy as cp

##Load and preprocess price data using pandas

First, we define a function to load our price data from a CSV file using pandas. The function will read the data, set the date as the index, and ensure all date indices are in the correct datetime format.

In [3]:
def get_prices_as_pandas(prices_file):
  d = pd.read_csv(prices_file)
  d.set_index("date_time", inplace=True)
  d.index = pd.to_datetime(d.index)

  return d.bfill().ffill()

Next, we define a similar function to load the price data using cuDF, a GPU-accelerated library similar to pandas. This will allow us to perform computations on the GPU.

In [4]:
def get_prices_as_cudf(prices_file):
    c = cudf.read_csv(prices_file)
    c.set_index("date_time", inplace=True)
    # Use cuDF's own converter so the index stays on the GPU
    c.index = cudf.to_datetime(c.index)

    return c.bfill().ffill()

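Note a few things. First, the second function uses cudf.read_csv in place of pd.read_csv. Second, both functions backfill and then forward fill the data: each NaN is replaced with the next available price, and any NaN left at the end of a series is replaced with the last available price.

In practice, we’d be more careful about how we fill missing prices to avoid sparse matrices.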
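
To make the fill behaviour concrete, here is a minimal sketch on a made-up price table. The tickers `AAA` and `BBB`, the timestamps, and the prices are all hypothetical and are not taken from `intraday_prices.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-asset price table with gaps (NaN) in different positions.
idx = pd.to_datetime([
    "2024-01-02 09:30", "2024-01-02 09:31",
    "2024-01-02 09:32", "2024-01-02 09:33",
])
prices = pd.DataFrame(
    {"AAA": [np.nan, 10.0, np.nan, 10.4], "BBB": [20.0, np.nan, 20.2, np.nan]},
    index=idx,
)

# bfill() copies the next valid price backwards into each gap;
# ffill() then fills what is left (trailing gaps) with the last valid price.
filled = prices.bfill().ffill()
print(filled)
```

Because backfilling copies a later price into an earlier timestamp, it introduces look-ahead into the series, which is one reason to treat gaps more carefully outside a benchmark like this one.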

##Compute optimal asset weights using pandas on the CPU

We will now compute the optimal asset weights using the classical Markowitz mean-variance optimisation method with pandas. This involves reading the price data, calculating returns, and deriving the portfolio weights that minimise risk.
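
As a sketch of what the cell below computes: the code does not use the mean returns in the weight calculation, so it appears to solve the minimum-variance special case of the Markowitz problem. The symbols here are our own notation and assume a fully invested portfolio and an invertible covariance matrix: $\Sigma$ is the covariance matrix of returns, $\mathbf{1}$ is a vector of ones, $w$ is the weight vector, and $\lambda$ is a Lagrange multiplier.

$$
\min_{w} \; w^{\top} \Sigma \, w \quad \text{subject to} \quad \mathbf{1}^{\top} w = 1
$$

Setting the gradient of the Lagrangian $w^{\top} \Sigma \, w - \lambda \, (\mathbf{1}^{\top} w - 1)$ to zero gives $2 \Sigma w = \lambda \mathbf{1}$, so $w$ is proportional to $\Sigma^{-1} \mathbf{1}$. Rescaling to satisfy the constraint yields

$$
w^{*} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}}
$$

In the code, `inv_cov_cpu.dot(ones_cpu)` is the numerator and `ones_cpu.T.dot(w_cpu)` is the denominator.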

In [5]:
print("=== Pandas (CPU) Computation ===")

start_cpu = time.time()

df_pd = get_prices_as_pandas("intraday_prices.csv")
n_assets = len(df_pd.columns)

df_returns_cpu = df_pd.pct_change().dropna()
mean_returns_cpu = df_returns_cpu.mean()
cov_matrix_cpu = df_returns_cpu.cov()

inv_cov_cpu = np.linalg.inv(cov_matrix_cpu.values)
ones_cpu = np.ones((n_assets, 1))
w_cpu = inv_cov_cpu.dot(ones_cpu)
w_cpu = w_cpu / (ones_cpu.T.dot(w_cpu))

end_cpu = time.time()
cpu_elapsed = end_cpu - start_cpu

print(f"CPU elapsed time: {cpu_elapsed} seconds")
print(f"Optimal weights (first 5):\n{w_cpu[:5].flatten()}")

=== Pandas (CPU) Computation ===
CPU elapsed time: 11.40806794166565 seconds
Optimal weights (first 5):
[-0.00056054 -0.00106743 -0.0014569  -0.00147924 -0.00150658]


In the preceding cell, we read the price data and calculate asset returns as percentage changes from one price observation to the next. After computing the mean returns and the covariance matrix, we use the covariance matrix to find the portfolio weights that minimise variance.
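
The return and covariance steps can be illustrated on a tiny table. This is only a sketch: the tickers and prices are hypothetical, chosen so the percentage changes are easy to check by hand.

```python
import pandas as pd

# Hypothetical prices for two assets over four consecutive observations.
prices = pd.DataFrame({
    "AAA": [100.0, 110.0, 99.0, 99.0],
    "BBB": [50.0, 50.0, 55.0, 49.5],
})

# pct_change() computes p_t / p_{t-1} - 1. The first row has no predecessor,
# so it is NaN and dropna() removes it.
returns = prices.pct_change().dropna()

mean_returns = returns.mean()  # average return per asset
cov_matrix = returns.cov()     # sample covariance, shape (n_assets, n_assets)

print(returns)
print(cov_matrix)
```

With four price rows there are three return rows, and the covariance matrix is 2 x 2 because there are two assets.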

The weights are computed using a closed-form solution involving the inverse of the covariance matrix. The elapsed time for these calculations is printed, along with the first few optimal weights.
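
As a small check of the closed-form step, the sketch below applies it to a made-up 3-asset covariance matrix. The matrix values are hypothetical. It also shows `np.linalg.solve` as an alternative to forming the inverse explicitly, which is a substitution on our part rather than what the notebook does.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix (symmetric, positive definite).
cov = np.array([
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.160],
])
ones = np.ones(cov.shape[0])

# Solve cov @ x = 1 rather than computing the inverse explicitly.
x = np.linalg.solve(cov, ones)
w = x / ones.dot(x)  # rescale so the weights sum to 1

# Same weights via the explicit inverse, as in the notebook cell above.
w_inv = np.linalg.inv(cov).dot(ones)
w_inv = w_inv / ones.dot(w_inv)

# Portfolio variance of the solution versus an equal-weight portfolio.
equal = ones / ones.size
print("weights:", w, "sum:", w.sum())
print("variance:", w @ cov @ w, "equal-weight variance:", equal @ cov @ equal)
```

Both routes give the same weights here. Solving the linear system is generally the numerically safer choice when the covariance matrix is large or poorly conditioned.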

##Perform the same computations using cuDF and CuPy on the GPU

In [None]:
print("=== cuDF (GPU) Computation ===")

start_gpu = time.time()

df_cudf = get_prices_as_cudf("intraday_prices.csv")
n_assets = len(df_cudf.columns)

df_returns_gpu = df_cudf.pct_change().dropna()
mean_returns_gpu = df_returns_gpu.mean()
cov_matrix_gpu = df_returns_gpu.cov()

inv_cov_gpu = cp.linalg.inv(cov_matrix_gpu.values)
ones_gpu = cp.ones((n_assets, 1))
w_gpu = cp.matmul(inv_cov_gpu, ones_gpu)