Skip to content

Covariance regression software project based on Hoff and Niu (2012).

Notifications You must be signed in to change notification settings

Cole-vJ/CovRegpy

Repository files navigation

Python CovRegpy Package

Regularised Covariance regression software project based on Hoff and Niu (2012). This package was developed out of research performed by Cole van Jaarsveldt, Prof. Gareth W. Peters, Dr Matthew Ames, and Prof. Mike Chantler. This package was built entirely using Python 3.11.5 - Python guarantees backwards compatibility which should ensure that this software package functions as expected on all future Python versions.

Disclaimer

Our team acknowledges the financial contributions and support of our benefactors in the production of this research. The authors, owners, and benefactors of this research reserve the right to request compensation from commercial entities seeking to use our software. All non-commercial usage of this software package is allowed with the associated permissions, limitations, and conditions as outlined in the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). For any commercial usage of this software, the authors, owners, and benefactors of this research respectfully request that the commercial entity in question consult the authors, owners, and benefactors.

Linux Installation

Create new environment (if needed) with specific version of Python. Python is backwards compatible such that packages working on Python version 3.11 will work on Python version 3.12 et cetera, but the reverse is not true. Python version 3.11 is recommended for the best user experience. A Docker version of the code will be made available in the subsequent major release.

conda create -n environment_name python=3.11.5

Proceed.

proceed ([y]/n)?

Activate new environment.

source activate environment_name

Install specific package versions from attached requirements file. If this is not done first, when pip installing 'CovRegpy', one can not guarantee that all of the required packages downloaded will be those exact versions specified in 'requirements.txt'.

pip install -r /path/to/requirements/file/requirements.txt

Install 'CovRegpy' package.

pip install CovRegpy

Windows Installation

Create new environment (if needed) with specific version of Python.

virtualenv --python="/usr/bin/python3.11.5" environment_name

Activate new environment. Navigate to environment location or adjust input accordingly.

C:\path\to\environments> environment_name\Scripts\activate

or

C:> path\to\environments\environment_name\Scripts\activate

Common Error

Requests (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")

https://stackoverflow.com/questions/54135206/requests-caused-by-sslerrorcant-connect-to-https-url-because-the-ssl-module

Solution

Windows 10 | Miniconda 3 | Python 3.11.5

Copy the following files:

  • libcrypto-1_1-x64.dll
  • libcrypto-1_1-x64.pdb
  • libssl-1_1-x64.dll
  • libssl-1_1-x64.pdb

from:

C:\Users\MyUser\Miniconda3\Library\bin

to:

C:\Users\MyUser\Miniconda3\DLLs

https://stackoverflow.com/questions/45954528/pip-is-configured-with-locations-that-require-tls-ssl-however-the-ssl-module-in/62264725#62264725

conda/conda#8273

Install specific package versions from attached requirements file. If this is not done first, when pip installing 'CovRegpy', one can not guarantee that all of the required packages downloaded will be those exact versions specified in 'requirements.txt'.

pip install -r path\to\requirements\file\requirements.txt

Install 'CovRegpy' package.

pip install CovRegpy

NEW USERS START HERE

Package Folders Content

  • 'aas_examples' - contains scripts for case studies in associated paper and supplement

    • Funded_&_Unfunded_UK_Pensions_example.py
    • Peter_Hoff_example_and_extended_example.py
  • 'aas_figures' - contains figures in associated paper and supplement

  • 'examples' - contains all case studies - explore for simple to complex usage examples

    • CovRegpy_PCA_example_2.py
  • 'FTSE_100_Data' - contains FTSE100 data used in Funded_&_Unfunded_UK_Pensions_example.py within 'aas_examples'

  • 'Jupyter_Notebooks' - contains all Jupyter Notebooks - includes link above for new users and others

  • 'Peter_Hoff_Data' - contains original data set from Hoff and Niu (2012) (see full citation below) used in Peter_Hoff_example_and_extended_example.py within 'aas_examples'

  • 'README_Images' - contains images for this README file as seen below

  • 'S&P500_Data' - contains S&P500 data used in CovRegpy_PCA_example_2.py within 'examples'

Replicate Output from Hoff and Niu (2012)

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from CovRegpy.CovRegpy_RCR import cubic_b_spline, cov_reg_given_mean

plt.rcParams['figure.figsize'] = [10, 8]  # resize figures
sns.set(style='darkgrid')  # style of plot (dark grey background, etc)

# load raw data
raw_data = pd.read_csv('../Peter_Hoff_Data/peter_hoff_data', header=0)
raw_data = np.asarray(raw_data)

# prepare data
peter_hoff_data = np.zeros((654, 3))

for row in range(654):

    if row < 309:
        peter_hoff_data[row, 0] = int(raw_data[row, 0][2])
    else:
        peter_hoff_data[row, 0] = int(raw_data[row, 0][1:3])

    if peter_hoff_data[row, 0] == 3:  # original paper groups those aged 3 into age 4
        peter_hoff_data[row, 0] = 4
    elif peter_hoff_data[row, 0] == 19:  # original paper groups those aged 19 into age 18
        peter_hoff_data[row, 0] = 18
    peter_hoff_data[row, 1] = float(raw_data[row, 0][4:10])  # fev values always 6 text values
    peter_hoff_data[row, 2] = float(raw_data[row, 0][11:15])  # height values always 4 text values

peter_hoff_data = pd.DataFrame(peter_hoff_data, columns=['age', 'fev', 'height'])

# knots and time used in original paper
spline_basis = cubic_b_spline(knots=np.linspace(-17, 39, 9), time=np.linspace(4, 18, 15))
spline_basis = np.vstack((spline_basis, np.linspace(4, 18, 15)))

age_vector = np.asarray(peter_hoff_data['age'])
spline_basis_transform = np.zeros((6, 654))
for col in range(len(age_vector)):
    spline_basis_transform[:, col] = spline_basis[:, int(age_vector[col] - 4)]

coef_fev = np.linalg.lstsq(spline_basis_transform.transpose(), np.asarray(peter_hoff_data['fev']), rcond=None)
coef_fev = coef_fev[0]
mean_fev = np.matmul(coef_fev, spline_basis)

coef_height = np.linalg.lstsq(spline_basis_transform.transpose(), np.asarray(peter_hoff_data['height']), rcond=None)
coef_height = coef_height[0]
mean_height = np.matmul(coef_height, spline_basis)

x_cov = np.vstack((np.ones((1, 654)), (age_vector ** (1 / 2)).reshape(1, 654), age_vector.reshape(1, 654)))
y = np.vstack((np.asarray(peter_hoff_data['fev']).reshape(1, 654),
               np.asarray(peter_hoff_data['height']).reshape(1, 654)))
# mean = np.vstack((np.matmul(coef_fev, spline_basis_transform), np.matmul(coef_height, spline_basis_transform)))
A_est = np.hstack((coef_fev.reshape(6, 1), coef_height.reshape(6, 1)))
B_est, Psi_est = cov_reg_given_mean(A_est=A_est, basis=spline_basis_transform, x=x_cov, y=y, iterations=100)

mod_x_cov = np.vstack((np.ones((1, 15)),
                       (np.linspace(4, 18, 15) ** (1 / 2)).reshape(1, 15),
                       np.linspace(4, 18, 15).reshape(1, 15)))

# mean and covariance plots

cov_3d = np.zeros((2, 2, 15))
for depth in range(np.shape(cov_3d)[2]):
    cov_3d[:, :, depth] = Psi_est + np.matmul(np.matmul(B_est.T, mod_x_cov[:, depth]).reshape(2, -1),
                                              np.matmul(mod_x_cov[:, depth].T, B_est).reshape(-1, 2))

fig, axs = plt.subplots(1, 2, figsize=(8, 5))
fig.suptitle('Rank 1 Figure 5 in Hoff and Niu (2012)')
axs[0].scatter(peter_hoff_data['age'], peter_hoff_data['fev'], facecolor='none', edgecolor='black')
axs[0].plot(np.linspace(4, 18, 15), mean_fev, linewidth=3, c='k')
axs[0].plot(np.linspace(4, 18, 15), mean_fev + 2 * np.sqrt(cov_3d[0, 0, :]), c='grey')
axs[0].plot(np.linspace(4, 18, 15), mean_fev - 2 * np.sqrt(cov_3d[0, 0, :]), c='grey')
axs[0].set_xlabel('age')
axs[0].set_ylabel('FEV')
axs[0].set_xticks([4, 6, 8, 10, 12, 14, 16, 18])
axs[0].set_yticks([1, 2, 3, 4, 5, 6])
axs[1].scatter(peter_hoff_data['age'], peter_hoff_data['height'], facecolor='none', edgecolor='black')
axs[1].plot(np.linspace(4, 18, 15), mean_height, linewidth=3, c='k')
axs[1].plot(np.linspace(4, 18, 15), mean_height + 2 * np.sqrt(cov_3d[1, 1, :]), c='grey')
axs[1].plot(np.linspace(4, 18, 15), mean_height - 2 * np.sqrt(cov_3d[1, 1, :]), c='grey')
axs[1].set_xlabel('age')
axs[1].set_ylabel('height')
axs[1].set_xticks([4, 6, 8, 10, 12, 14, 16, 18])
axs[1].set_yticks([45, 50, 55, 60, 65, 70, 75])
plt.show()

fig, axs = plt.subplots(1, 3, figsize=(8, 5))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.3, hspace=None)
fig.suptitle('Rank 1 Figure 6 in Hoff and Niu (2012)')
axs[0].plot(np.linspace(4, 18, 15), cov_3d[0, 0, :], c='grey')
fev_var = np.zeros_like(np.linspace(4, 18, 15))
for i, age in enumerate(range(4, 19)):
    fev_var[i] = np.var(np.asarray(peter_hoff_data['fev'])[np.asarray(peter_hoff_data['age']) == age])
axs[0].scatter(np.linspace(4, 18, 15), fev_var, facecolor='none', edgecolor='black')
axs[0].set_xlabel('age')
axs[0].set_ylabel('Var(FEV)')
axs[0].set_xticks([4, 6, 8, 10, 12, 14, 16, 18])
axs[0].set_yticks([0.2, 0.4, 0.6, 0.8, 1.0])
axs[1].plot(np.linspace(4, 18, 15), cov_3d[1, 1, :], c='grey')
height_var = np.zeros_like(np.linspace(4, 18, 15))
for i, age in enumerate(range(4, 19)):
    height_var[i] = np.var(np.asarray(peter_hoff_data['height'])[np.asarray(peter_hoff_data['age']) == age])
axs[1].scatter(np.linspace(4, 18, 15), height_var, facecolor='none', edgecolor='black')
axs[1].set_xlabel('age')
axs[1].set_ylabel('Var(height)')
axs[1].set_xticks([4, 6, 8, 10, 12, 14, 16, 18])
axs[1].set_yticks([4, 6, 8, 10, 12])
axs[2].plot(np.linspace(4, 18, 15), cov_3d[0, 1, :] / (np.sqrt(cov_3d[0, 0, :]) * np.sqrt(cov_3d[1, 1, :])), c='grey')
fev_height_cov = np.zeros_like(np.linspace(4, 18, 15))
for i, age in enumerate(range(4, 19)):
    fev_height_cov[i] = np.corrcoef(np.asarray(peter_hoff_data['fev'])[np.asarray(peter_hoff_data['age']) == age],
                                    np.asarray(peter_hoff_data['height'])[
                                        np.asarray(peter_hoff_data['age']) == age])[0, 1]
axs[2].scatter(np.linspace(4, 18, 15), fev_height_cov, facecolor='none', edgecolor='black')
axs[2].set_xlabel('age')
axs[2].set_ylabel('Cor(FEV,height)')
axs[2].set_xticks([4, 6, 8, 10, 12, 14, 16, 18])
axs[2].set_yticks([0.5, 0.6, 0.7, 0.8, 0.9])
plt.show()

Figures

Contributing

This project is by no means complete or exhaustive.

References

P. Hoff and X. Niu. 2012. A Covariance Regression Model. Statistica Sinica 22, 2 (2012), 729–753.

About

Covariance regression software project based on Hoff and Niu (2012).

Resources

Stars

Watchers

Forks

Packages

No packages published