AEBilgrau/correlateR

correlateR

General purpose correlation and covariance estimation

The R package correlateR is planned to be a comprehensive resource of functions related to correlations and covariances. It features fast, robust, and efficient (as well as inefficient) marginal, partial, and semi-partial correlations and covariances of arbitrary conditional order. A good discussion and explanation of marginal (unconditioned), partial, and semi-partial (or part) correlations can be found here. Another good resource is found here.

The package is designed to perform well in both high and low dimensional cases as well as both on dense and sparse matrices.

The package is not being actively developed as of 2019. I might pick it up again in the future if there is a need.

Installation

If you wish to install the latest version of correlateR directly from the master branch here at GitHub, run

#install.packages("devtools")  # Uncomment if devtools is not installed
devtools::install_github("AEBilgrau/correlateR")

The package is still under heavy development and should be considered unstable. Be sure that you have the package development prerequisites if you wish to install the package from the source.

NOTE: The interface and function names may still see significant changes and modifications!

Features

Currently, the package is planned to feature:

  • cor/cov Marginal (unconditional) correlation/covariance. These basic functions can be prefixed to yield other correlation/covariance estimates. This covariance is also known as the auto-correlation, the variance-covariance, or simply the variance (in the generalized sense).
    • p-prefix: partial (arbitrary order) correlation and covariance.
    • x-prefix: cross correlation and covariance.
    • P-prefix: part (semi-partial) correlation and covariance.
    • s-prefix: sparse shrinkage estimation methods
    • r-prefix: robust estimation methods, e.g. minimum covariance determinant, biweight midcorrelation, etc.
    • S-prefix: Shrinkage estimation. (Or, d for dense shrinkage?)
  • Interface using formulas ~.
  • Conversion between cov and cor and pcor functions.
    • cov2cor cor2cov cor2pcor pcor2cor
  • Conditional and unconditional independence test
    • cor.test pcor.test xcor.test pxcor.test
    • Also with cross, sparse, shrinkage, robust, etc., versions
  • Canonical correlation analysis (CCA)
    • Also with cross, sparse, shrinkage, robust, etc., versions
  • pre (alternative to cov) direct estimation of the precision matrix or concentration matrix.
    • Also with cross, sparse, robust, etc., versions
  • ... and more! (??)
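
The conversions between covariance, correlation, and partial correlation listed above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the helper names cor_to_pcor and pcor_to_cor are hypothetical (chosen so they are not mistaken for the planned cor2pcor and pcor2cor), and the simulated data are made up for illustration. Only cov, cov2cor, and solve are existing base R functions. The sketch relies on the standard identity that the partial correlations given all remaining variables are the negated, standardized off-diagonal entries of the inverse correlation matrix.

```r
# Hypothetical helpers (NOT part of correlateR); a base-R sketch only.

# Correlation matrix -> full-order partial correlation matrix.
cor_to_pcor <- function(R) {
  P <- solve(R)        # precision (inverse correlation) matrix
  PC <- -cov2cor(P)    # standardize to unit diagonal and flip the sign
  diag(PC) <- 1        # by convention the diagonal is set to 1
  PC
}

# Full-order partial correlation matrix -> correlation matrix.
pcor_to_cor <- function(PC) {
  M <- -PC             # undo the sign flip on the off-diagonal
  diag(M) <- 1         # standardized precision matrix
  cov2cor(solve(M))    # invert and rescale to unit diagonal
}

# Simulated data: rows are observations, columns are variables.
set.seed(1)
X <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
X[, 2] <- X[, 2] + X[, 1]   # induce some dependence

S <- cov(X)                 # empirical (marginal) covariance matrix
R <- cov2cor(S)             # covariance -> correlation
PC <- cor_to_pcor(R)        # correlation -> partial correlation
R_back <- pcor_to_cor(PC)   # and back again
```

Up to floating-point error, R_back should agree with R, which can be checked with all.equal(R_back, R).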

Hence, the following core functions are available:

  • xcor Cross-correlation
  • xcov Cross-covariance
  • pcor Partial correlation (arbitrary order)
  • pcov Partial covariance (arbitrary order)
  • pxcor Partial cross-correlation (arbitrary order)
  • pxcov Partial cross-covariance (arbitrary order)
  • scor Sparse correlation
  • scov Sparse covariance
  • sxcor Sparse cross-correlation
  • sxcov Sparse cross-covariance
  • spcor Sparse partial correlation (arbitrary order)
  • spcov Sparse partial covariance (arbitrary order)
  • spxcor Sparse partial cross-correlation (arbitrary order)
  • spxcov Sparse partial cross-covariance (arbitrary order)
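
To make the partial cross-covariance and partial cross-correlation of arbitrary order concrete, here is a minimal base-R sketch. It is an assumption-laden illustration rather than the package's code: the function names partial_xcov and partial_xcor and the simulated matrices are hypothetical, and the interface of the actual pxcov and pxcor may differ. The sketch uses the Schur complement S_XY - S_XZ S_ZZ^{-1} S_ZY of the empirical covariance, where Z holds the variables conditioned on.

```r
# Hypothetical helpers (NOT the correlateR API); a base-R sketch only.

# Partial cross-covariance of the columns of X and Y given the columns of Z.
partial_xcov <- function(X, Y, Z) {
  # Solve S_ZZ %*% B = S_ZY instead of forming the inverse explicitly.
  B <- solve(cov(Z), cov(Z, Y))
  cov(X, Y) - cov(X, Z) %*% B
}

# Partial cross-correlation: scale by the partial standard deviations.
partial_xcor <- function(X, Y, Z) {
  sd_x <- sqrt(diag(partial_xcov(X, X, Z)))
  sd_y <- sqrt(diag(partial_xcov(Y, Y, Z)))
  partial_xcov(X, Y, Z) / outer(sd_x, sd_y)
}

# Simulated data: observations in rows, variables in columns.
set.seed(1)
n <- 50
Z <- matrix(rnorm(n * 2), n, 2)
X <- Z %*% matrix(c(1, 0.5, -1, 0.2, 0.3, 0.7), 2, 3) +
  matrix(rnorm(n * 3), n, 3)
Y <- Z %*% matrix(c(0.4, -0.6, 1, 1), 2, 2) +
  matrix(rnorm(n * 2), n, 2)

pxcov_hat <- partial_xcov(X, Y, Z)   # 3 x 2 matrix
pxcor_hat <- partial_xcor(X, Y, Z)   # 3 x 2 matrix with entries in [-1, 1]
```

The same quantities can be obtained by regressing X and Y on Z (with an intercept) and taking the covariance or correlation of the residuals, which makes a convenient sanity check.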

Naming conventions and interface

To easily navigate the package, some naming conventions have been decided upon.

Lower-case x, y, and z always denote numeric vectors, while the upper-case counterparts X, Y, and Z denote numeric matrices where observations correspond to rows and variables/features to columns. Z and z always denote the variables conditioned on. Furthermore, S is used to denote the empirical (marginal) covariance matrix.
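
The notation can be illustrated with a small base-R sketch that also contrasts the partial and the part (semi-partial) correlation. Everything here is an illustrative assumption: the simulated data and the variable names pcor_xy_z and part_x_yz are made up, and no correlateR function is called. The sketch uses the textbook first-order formulas, with z as the single variable conditioned on.

```r
# Illustration of the notation with base R only (no correlateR calls).
set.seed(42)
n <- 200
z <- rnorm(n)              # numeric vector: the variable conditioned on
x <- z + rnorm(n)          # numeric vector
y <- z + rnorm(n)          # numeric vector

X <- cbind(x = x, y = y, z = z)  # matrix: observations in rows, variables in columns
S <- cov(X)                      # empirical (marginal) covariance matrix
R <- cov2cor(S)                  # marginal correlation matrix

num <- R["x", "y"] - R["x", "z"] * R["y", "z"]

# Partial correlation of x and y given z: z is removed from both x and y.
pcor_xy_z <- num / sqrt((1 - R["x", "z"]^2) * (1 - R["y", "z"]^2))

# Part (semi-partial) correlation: z is removed from y only.
part_x_yz <- num / sqrt(1 - R["y", "z"]^2)

# Equivalent residual-based view.
e_x <- resid(lm(x ~ z))
e_y <- resid(lm(y ~ z))
pcor_resid <- cor(e_x, e_y)  # matches pcor_xy_z
part_resid <- cor(x, e_y)    # matches part_x_yz
```

Because the part correlation divides by a larger denominator, its absolute value never exceeds that of the corresponding partial correlation.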

Function names are in camelCase except for some special cases. Otherwise, cor stands for correlation and cov for covariance. These are prefixed with x or p (or both) to denote cross or partial correlations/covariances, respectively. For example, pcor is the partial correlation and pxcov is the partial cross-covariance.

Alternative packages

There are some alternative packages on CRAN from which some inspiration has been drawn.

  • corpcor: Only features estimation of the full partial correlations.
  • ppcor: Partial and semi-partial correlations