aalfons/CGGM-replication

CGGM-replication

This repository contains a collection of R scripts to reproduce all examples, simulations, and figures of Touw, Alfons, Groenen & Wilms (2025).

Below, we explain in detail which results from the paper are reproduced by which R script(s). We also report the running time of each script when we conducted the analyses. We used the following two machines:

  • A desktop PC with an Intel i9 10-core CPU running Ubuntu 24.04 LTS.
  • A laptop with an Apple M3 8-core CPU (4 performance cores and 4 efficiency cores) running macOS Sequoia 15.5.

Both machines ran R version 4.5.1 and our package clusterGGM version 0.1.0. Rather than installing the latest version of our package from CRAN, we recommend installing this specific version with the following commands:

install.packages("remotes")
remotes::install_github("aalfons/clusterGGM", ref = "v0.1.0")

If you already have package remotes installed, you can skip the first line.
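After installation, it can be useful to confirm that the expected version is the one R will load. The following is a minimal sketch using only base R; the helper name has_version() is a hypothetical name introduced here for illustration and is not part of clusterGGM or this repository.

```r
# Hypothetical helper: TRUE if `pkg` is installed and its version equals `version`.
has_version <- function(pkg, version) {
  # requireNamespace() returns FALSE (without error) if the package is missing
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) == version
}

# Check for the version used in the replication
has_version("clusterGGM", "0.1.0")
```

If the call returns FALSE, rerunning the installation commands above should install the intended version.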

Illustrative figures

Figure 1 in Section 2 was drawn with the online diagram editor draw.io, so there is no replication script for it.

Folder illustration contains two scripts that produce the other illustrative figures from the paper:

  • simulation_designs.R produces Figure 2 in Section 3.
  • covariance_precision.R produces Figure 5 in Section 4.

Simulations

Folder simulations contains all scripts and output from our extensive simulations. Results are always saved in folder simulations/results and figures are always saved in simulations/figures.

Most scripts were run in parallel, with each script running on a single CPU core. However, some scripts use parallel computing themselves via package parallel.
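As a rough illustration of the second pattern, independent replications can be spread over several cores with package parallel. This is a generic sketch and is not taken from the repository's scripts: run_replication(), the number of replications, and the core count are hypothetical placeholders.

```r
library(parallel)

# Hypothetical stand-in for one simulation replication:
# draw a sample with a fixed seed and return a summary statistic.
run_replication <- function(seed) {
  set.seed(seed)
  mean(rnorm(100))
}

n_cores <- 2  # assumption: adjust to the number of cores available

# Start worker processes, distribute the replications, then shut down
cl <- makeCluster(n_cores)
results <- parLapply(cl, 1:4, run_replication)
stopCluster(cl)
```

Setting the seed inside each replication keeps the results reproducible regardless of how the work is split across cores.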

Baseline simulation designs

The following scripts produce the results for the four baseline simulation designs:

  • simulations_WB2022_random.R: running time was 7h17min using a single core on the desktop PC.
  • simulations_WB2022_chain.R: running time was 7h18min using a single core on the desktop PC.
  • simulations_WB2022_unbalanced.R: running time was 7h19min using a single core on the desktop PC.
  • simulations_WB2022_unstructured.R: running time was 4h24min using a single core on the desktop PC.

The script figure_WB2022_baseline.R then reads in the results and produces Figure 3 in Section 3.

Increasing the number of variables or clusters

Together with simulations_WB2022_chain.R from above, the following scripts produce the results for the chain design with varying number of variables or clusters:

  • simulations_WB2022_chain_p=30_K=3.R: running time was 1d7h40min using a single core on the desktop PC.
  • simulations_WB2022_chain_p=30_K=5.R: running time was 1d9h48min using a single core on the desktop PC.
  • simulations_WB2022_chain_p=30_K=6.R: running time was 1d9h40min using a single core on the desktop PC.
  • simulations_WB2022_chain_p=30_K=10.R: running time was 1d9h9min using a single core on the desktop PC.
  • simulations_WB2022_chain_p=60.R: running time was 1d17h26min using five cores on the desktop PC.
  • simulations_WB2022_chain_p=120.R: running time was 7d0h47min using ten cores on the desktop PC.

The following scripts then read in the relevant results and produce the following figures:

  • figure_WB2022_variables.R produces Figure 1 in online Appendix C.
  • figure_WB2022_clusters.R produces Figure 2 in online Appendix C.

Approximate block structure

The following scripts produce the results for the modification of the baseline simulation designs in which the block structure is not exact but only approximate:

  • simulations_approximate_random.R: running time was 7h30min using a single core on the desktop PC.
  • simulations_approximate_chain.R: running time was 7h35min using a single core on the desktop PC.
  • simulations_approximate_unbalanced.R: running time was 7h37min using a single core on the desktop PC.
  • simulations_approximate_unstructured.R: running time was 4h25min using a single core on the desktop PC.

The script figure_approximate.R then reads in the results and produces Figure 3 in online Appendix C.

Clustering structure on diagonal / Blockdiagonal structure

The following scripts produce the results for the two designs in which the relevant structure is on the diagonal of the precision matrix, as well as the two designs with a noisy blockdiagonal structure:

  • simulations_diagonal_balanced.R: running time was 9h26min using a single core on the desktop PC.
  • simulations_diagonal_unbalanced.R: running time was 10h2min using a single core on the desktop PC.
  • simulations_blockdiagonal_balanced.R: running time was 10h7min using a single core on the desktop PC.
  • simulations_blockdiagonal_unbalanced.R: running time was 10h21min using a single core on the desktop PC.

The script figure_diagonal_blockdiagonal.R then reads in the results and produces Figure 4 in Section 3.

Computation time

The script simulations_computation_time.R measures the computation time of the compared methods on simulated data sets. Running time of this script was 3h25min using a single core on the laptop.
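For readers unfamiliar with timing code in R, elapsed time can be measured with base R's system.time(). The sketch below is an assumption about the general approach, not a description of what simulations_computation_time.R does; fit_toy() is a hypothetical stand-in for an estimation method.

```r
# Hypothetical stand-in for a method being timed:
# invert the sample covariance matrix to get a precision matrix estimate.
fit_toy <- function(X) solve(cov(X))

# Simulated data set with 200 observations on 10 variables
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)

# system.time() returns user, system, and elapsed (wall-clock) time in seconds
timing <- system.time(fit <- fit_toy(X))
elapsed <- timing[["elapsed"]]
```

For very fast methods, a single measurement is noisy, so timings are typically repeated over many simulated data sets.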

The script figure_computation_time.R then reads in the results and produces Figure 4 in online Appendix C.

Estimation of a clustered covariance matrix

The following scripts produce the results for the two designs in which the structure of interest is on the covariance matrix:

  • simulations_Sigma_exact.R: running time was 17h36min using a single core on the desktop PC.
  • simulations_Sigma_approximate.R: running time was 19h44min using a single core on the desktop PC.

The script figure_Sigma.R then reads in the results and produces Figure 6 in Section 4.

Applications

Folder applications contains all scripts and output from our empirical applications. Some scripts use parallel computing via package parallel.

S&P 100 Stocks

The relevant files can be found in folder applications/finance:

  • The script data_preprocessing.R reads in the raw data in .csv format and stores the processed data in an .RData file. Both the raw data and the preprocessed data are stored in the subfolder data.
  • The script applications_finance.R produces the variable clustering results. It reads in the preprocessed data and stores the results in the subfolder output. It also produces Figures 5 and 6 in online Appendix D.1, which are stored in the subfolder figures. Running time was 14h15min using ten cores on the desktop PC.
  • The script applications_finance_oos.R produces the results on out-of-sample errors via double cross-validation. It reads in the preprocessed data and stores the results in the subfolder output. Running time was 6d17h using ten cores on the desktop PC.
  • The script plot_finance.R then reads in the results and produces Figure 7 in Section 5.1, which is stored in the subfolder figures.

OECD Well-Being Indicators

The relevant files can be found in folder applications/oecd:

  • The script data_preprocessing.R reads in the raw data in .ods format and stores the processed data in an .RData file. Both the raw data and the preprocessed data are stored in the subfolder data.
  • The script applications_oecd.R reads in the preprocessed data and stores the results in the subfolder output. Running time was 3 seconds using a single core on the laptop.
  • The script plot_oecd.R then reads in the results and produces Figure 8 in Section 5.2, which is stored in the subfolder figures.

Humor Styles Questionnaire

The relevant files can be found in folder applications/HSQ:

  • The script HSQ.R reads in the raw data in .csv format, then preprocesses and analyzes the data. Results from Table 1 in Section 5.3 are printed on the R console. The script also stores the results in .RData format and produces Figures 7 and 8 in online Appendix D.3. Running time was 14 minutes using four cores on the laptop.
