`kerndisc` is a library for automated kernel structure discovery in univariate data. It aims to find the composition of kernels that best represents a time series. `kerndisc` currently has a test coverage of over 90 %. Still, no claim to correctness is made; contributions and corrections are very welcome.
It is intended to be useful in either of two ways:
- Used repeatedly on different time series of the same variable, to discover some recurring, stable structure
- Used once, to find the composite kernel that best describes a single time series of some variable
The search over and description of kernels are heavily inspired by the PhD thesis of David Duvenaud, the Automated Statistician project, and Lloyd et al.
In the future, it is planned to bring evaluation cost down to O(n^2) by employing the upper and lower bound estimation introduced by Kim et al.
Development of this library is currently idle; this is expected to change if the community shows interest.
`kerndisc` can be installed on macOS in the following way:
> git clone https://github.com/BracketJohn/kernDisc
> cd kernDisc
> brew install pipenv
> pipenv install
`brew` can be substituted with another package manager on non-Mac systems. Afterwards, you can start an interactive `kerndisc` session by executing:
> pipenv shell
> cd src
> python
This enters the project's virtual environment and starts a Python interpreter inside `src`. From there, one can start to develop, although I would recommend `ipython` or a similar enhanced environment instead.
A usage example can be found below.
`kerndisc` can be used in the following way:
> import numpy as np
> from kerndisc import discover
> X, Y = np.array([0, 1, 2, 3]), np.array([-1, 1, -1, 1])
> discover(X, Y)
...
Depth `2`: Empty search space, no new asts found.
{'periodic': {'ast': Node("/<class 'gpflow.kernels.Periodic'>", full_name='Periodic'),
'depth': 0,
'params': {'GPR/kern/variance': array(1.00037322),
'GPR/kern/lengthscales': array(0.09897968),
'GPR/kern/period': array(0.66666667),
'GPR/likelihood/variance': array(1.00000004e-06)},
'score': -11.34804081379194},
'highscore_progression': [inf, -11.34804081379194, -11.34804081379194],
'termination_reason': 'Depth `2`: Empty search space, no new asts found.'}
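The returned dictionary mixes per-kernel entries with bookkeeping keys. As a minimal sketch, assuming the result always has the shape printed above, the best-scoring entry could be pulled out as follows. The helper `best_kernel` is hypothetical and not part of `kerndisc`; the `result` literal is a trimmed, hand-copied stand-in for the output.

```python
from math import inf

# Hand-copied, trimmed stand-in for the dictionary printed above.
result = {
    'periodic': {'depth': 0, 'score': -11.34804081379194},
    'highscore_progression': [inf, -11.34804081379194, -11.34804081379194],
    'termination_reason': 'Depth `2`: Empty search space, no new asts found.',
}


def best_kernel(result):
    """Return `(name, entry)` of the lowest-scoring kernel entry (hypothetical helper)."""
    # Assumption: kernel entries are exactly the dict-valued items carrying a 'score'.
    kernels = {name: entry for name, entry in result.items()
               if isinstance(entry, dict) and 'score' in entry}
    # Scores are minimized, so the best kernel has the smallest score.
    return min(kernels.items(), key=lambda item: item[1]['score'])


name, entry = best_kernel(result)
print(name, entry['score'])
```

Filtering on the presence of `'score'` keeps the helper independent of the exact set of bookkeeping keys.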
For scoring, the following metrics are available:
- Negative log likelihood (`negative_log_likelihood`),
- Bayesian information criterion (BIC, `bayesian_information_criterion`),
- BIC modified to not take "irrelevant" parameters into account (Duvenaud et al., `bayesian_information_criterion_duvenaud`).
BIC is the default. A different metric can be selected by setting the environment variable `METRIC` to its name; this also works for custom metrics.
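To make the relationship between the first two metrics concrete, here is a minimal sketch of the textbook BIC, written as a quantity to minimize. It is not taken from `kerndisc`'s source; the function name `bic_from_nll` and its arguments are illustrative assumptions.

```python
import math


def bic_from_nll(nll, n_params, n_points):
    """Textbook BIC: 2 * NLL + k * ln(n); lower is better."""
    return 2.0 * nll + n_params * math.log(n_points)


# Example: a model with 4 hyperparameters fitted to 100 observations.
print(bic_from_nll(nll=50.0, n_params=4, n_points=100))
```

Compared to the plain negative log likelihood, the `k * ln(n)` term penalizes every additional parameter, which discourages needlessly deep kernel compositions.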
To populate the search space, i.e., the possible combinations of kernels that are explored, `kerndisc` uses a grammar from `kerndisc.expansion.grammars`.
It is also possible to define your own grammar for discovery and search space population.
A new metric can be implemented in the `kerndisc.evaluation.scoring._metrics` module. Afterwards, it can be imported and added to the `_METRICS` dictionary in the package's `__init__`. It can then be selected for training by setting the environment variable `METRIC` to its name.
All metrics MUST be minimization problems, i.e., be better when lower.
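The registration pattern described above can be sketched in isolation. The snippet below is a stand-alone illustration, not `kerndisc`'s code: the metric signature `(nll, n_params, n_points)`, the `akaike_information_criterion` metric, and the `select_metric` helper are all assumptions made for the example.

```python
import os


def negative_log_likelihood(nll, n_params, n_points):
    return nll


def akaike_information_criterion(nll, n_params, n_points):
    # Hypothetical custom metric: 2k + 2 * NLL, lower is better.
    return 2.0 * n_params + 2.0 * nll


# Stand-in for the `_METRICS` dictionary: metric name -> callable.
_METRICS = {
    'negative_log_likelihood': negative_log_likelihood,
    'akaike_information_criterion': akaike_information_criterion,
}


def select_metric(default='negative_log_likelihood'):
    """Look up the metric named by the METRIC environment variable."""
    name = os.environ.get('METRIC', default)
    if name not in _METRICS:
        raise KeyError(f'Unknown metric `{name}`, choose from {sorted(_METRICS)}.')
    return _METRICS[name]


os.environ['METRIC'] = 'akaike_information_criterion'
metric = select_metric()
print(metric(nll=50.0, n_params=4, n_points=100))
```

Failing loudly on an unknown name avoids silently scoring with a metric other than the one requested.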
To define a new grammar, please create a new module in `kerndisc.expansion.grammars` called `_grammar_*.py`. This new module MUST offer:
- `expand_kernel`: A method that takes a single gpflow kernel and applies the desired alterations to it.
- `IMPLEMENTED_BASE_KERNEL_NAMES`: A global `List[str]`, which contains only `BASE_KERNELS.keys()` from `_kernels.py`. The base kernels in this list represent all kernels implemented by the respective grammar.
Once your custom grammar is created, you can select it by adding it to the `_GRAMMARS` dictionary in `kerndisc.expansion.grammars.__init__.py` and then setting the environment variable `GRAMMAR` to your grammar's name.
See:
- `kerndisc.expansion.grammars.__init__.py` for the general concept and description,
- `kerndisc.expansion.grammars._grammar_duvenaud.py` for an example of a grammar.
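To illustrate what such a grammar does, here is a toy sketch that works on kernel names (strings) rather than gpflow kernels, so it runs without gpflow installed. It is not the interface `kerndisc` requires: the base kernel names and the return type of `expand_kernel` are assumptions, and the two expansion rules (add a base kernel, multiply by a base kernel) are a simplified take on a Duvenaud-style search.

```python
from typing import List

# Assumed base kernel names; in kerndisc these must be keys of `BASE_KERNELS`.
IMPLEMENTED_BASE_KERNEL_NAMES: List[str] = ['linear', 'periodic', 'rbf']


def expand_kernel(kernel: str) -> List[str]:
    """Return all expressions reachable from `kernel` by one sum or one product."""
    expansions = []
    for base in IMPLEMENTED_BASE_KERNEL_NAMES:
        expansions.append(f'({kernel}) + {base}')
        expansions.append(f'({kernel}) * {base}')
    return expansions


print(expand_kernel('periodic'))
```

Applying `expand_kernel` to every candidate at each depth is what grows the search space from one level to the next.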
`pipenv` is used for development. Please install it via `pip` if necessary. Usage:
> git clone https://github.com/BracketJohn/kernDisc
> cd kernDisc
> pipenv install --dev
> pipenv shell
This will install all necessary packages, create a new virtual environment and enter it. From there one can start to develop and test.
Tests can be executed by running the following:
> pytest
Depending on your environment, it might be necessary to do this in a `pipenv shell`.