Skip to content
An R package for optimal partitioning algorithm(Google Summer of Code 2019).
C R
Branch: master
Clone or download
Latest commit 7e48ed6 Aug 26, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
R removed unused vectors from output Aug 11, 2019
man fixed issues in documentation and updated manual Aug 16, 2019
src
tests updated test cases Aug 19, 2019
vignettes updated runtime vs penalty plot with a large dataset Aug 19, 2019
.RData modified C code Jun 3, 2019
.Rbuildignore Added code coverage Jun 8, 2019
.gitignore Added vignette section and modified R code May 21, 2019
.travis.yml modified travis.yml to include code coverage using coveralls Aug 14, 2019
DESCRIPTION updated Description Aug 19, 2019
NAMESPACE
NEWS updated NEWS Aug 19, 2019
Opart.Rproj Initial commit May 17, 2019
README.md
codecov.yml Added code coverage Jun 8, 2019

README.md

opart

Foo Coverage Status

This repository is for Google Summer of Code 2019 project Optimal partitioning algorithm for changepoint detection under organization R project for statistical computing.

Installation

The package can be installed from github as follows:

install.packages("devtools") #if not already installed

devtools::install_github("as4378/opart")

Alternatively, you can use remotes which is a lighter-weight package that has the install_github functionality that devtools provides, but not all the “development” parts of devtools.

install.packages("remotes") #if not already installed

remotes::install_github("as4378/opart")

This package requires R version (>= 3.5.0). Also, make sure that you have a stable version of Rtools installed.

Introduction

There are several applications where we need to work with ordered data (e.g. Time-series). This includes financial data, climate data, radio signals, genome data in bioinformatics etc. Often it is of interest to find patterns or structure in this data and to detect abrupt changes in structure to model the data effectively. There are a class of dynamic programming algorithms that can solve the optimal partitioning problem and find optimal segments given a statistical criteria.

A quadratic time algorithm described in https://link.springer.com/article/10.1007/s11222-016-9636-3 is one such algorithm that solves the optimal partitioning problem given square error loss function. Currently there are no R packages available that provide a reference implementation to this algorithm.

This project provides an efficient C reference implementation to the standard optimal partitioning algorithm in an R package (opart) using square error loss function and poisson loss. Its simplicity makes it easy to modify (which is not true of other algorithms such as FPOP), which will be useful when developing other changepoint models/algorithms.

Features

This package provides 2 functions:

(i) opart_gaussian: This function computes the optimal changepoint model for a vector of real-valued data and a non-negative real-valued penalty, given the square loss (to minimize) / gaussian likelihood (to maximize).

library(opart)
sample_data <- rnorm(100, mean=50)
opart::opart_gaussian(data=sample_data, penalty=1)

The execution returns a cost vector (cost.vec) which contains the optimal cost values for partitioning the data and end vector (end.vec) which contains the optimal segment ends.

Complete documentation including results with neuroblastoma dataset in package neuroblastoma and model and runtime comparison with other algorithms(fpop and cpt.mean) is described in detail in vignettes.

(ii) opart_poisson: This function computes the optimal changepoint model for a vector of count data and a non-negative real-valued penalty, given the poisson loss (to minimize) / log likelihood (to maximize).

library(opart)
sample_data <- rpois(100, 10.5)
opart::opart_poisson(data=sample_data, penalty=1)

The execution returns a cost vector (cost.vec) which contains the optimal cost values for partitioning the data and end vector (end.vec) which contains the optimal segment ends.

The vignettes describe in detail about poisson loss and model comparison with Segmentor3IsBack which provides several options for loss functions including poisson loss.

Documentation

All the vignettes can be accessed in R envirnment after package installation. Use build_opts argument as follows with install_github to install vignettes as by default manuals are not installed.

devtools::install_github("as4378/opart", build_opts = c("--no-resave-data"))

A quick reference to all vignettes in pdf format can be found here

Future Work

In future we would like to extend this package by supporting different loss functions and move this to CRAN so that it is available to wide community of R users. If possible, extend this further to provide a generic option for supporting user specified loss function as an input argument.

Acknowledgements

  • The Google Summer of Code team for funding and motivating open source development

  • My mentors Guillem Rigaill, Rebecca Killick and Toby Hocking

  • Members of R Project for Statistical Computing community

  • A special thanks to Toby Hocking for his support and insights throughout the entire GSoC. His experience working with R packages helped in efficient implementation of this package and making it a reality

Link to commits

All the commits made throughout the entire GSoC program can be accessed by this link

You can’t perform that action at this time.