Relational Machine Learning Library (RMLLib)

The Relational Machine Learning Library (rmllib) aims to provide scalable relational machine learning solutions in Python.

Features

Collective inference over relational data
Semi-supervised learning utilizing estimates of labels from previous rounds
Scalable solutions for single-box machines
Additional implementations of state-of-the-art generative graph models for synthetic experimentation

Getting started

RMLLib uses APIs inspired by sklearn and relies heavily on numpy, scipy and pandas for data wrangling and optimizations. However, sklearn learners are generally not compatible with RMLLib, largely because of the interconnectedness between labeled and unlabeled data. The RMLLib data format hides most of this problem from the user by providing and using masking functions in the dataset, which ensure that held-out labels remain unobserved during training.
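
To make the masking idea concrete, here is a minimal sketch in plain numpy. It is not RMLLib's API: the function name `mask_labels` and its parameters are hypothetical, and the only point is to show how a boolean mask can keep unlabeled rows hidden from a learner while the full graph stays intact.

```python
import numpy as np

def mask_labels(labels, train_fraction=0.3, seed=0):
    """Hypothetical helper (not part of RMLLib): hide labels of unlabeled nodes.

    `labels` is an (n_nodes, n_classes) one-hot array. Returns a boolean mask
    of the labeled nodes and a copy of `labels` whose other rows are NaN.
    """
    rng = np.random.default_rng(seed)
    n_nodes = labels.shape[0]
    n_labeled = int(round(train_fraction * n_nodes))

    # Pick which nodes keep their labels during training.
    labeled = np.zeros(n_nodes, dtype=bool)
    labeled[rng.choice(n_nodes, size=n_labeled, replace=False)] = True

    observed = labels.astype(float)  # astype returns a copy by default
    observed[~labeled] = np.nan      # hidden rows carry no label information
    return labeled, observed

# Ten nodes, two classes, one-hot encoded.
labels = np.eye(2)[[0, 1, 1, 0, 1, 0, 0, 1, 1, 0]]
labeled, observed = mask_labels(labels, train_fraction=0.3)
```

A learner that only ever receives `observed` cannot peek at the hidden labels, yet every node remains in the graph so its edges can still be used during inference.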

For a simple example of building data and running methods, please see the provided notebook under docs/notebooks.

Learning and Inference

The crux of RMLLib is a relational dependency network (RDN) representation, where a set of conditional distributions (e.g., Relational Naive Bayes) of a label given its neighbors is laced together via a collective inference algorithm (e.g., Variational Inference). On top of this, RMLLib provides semi-supervised learning and inference methods that perform well in sparsely labeled data scenarios.

For the optimization step, RMLLib follows RDNs by maximizing the pseudolikelihood, which allows for faster optimization over the parameter space. For collective inference, RMLLib diverges slightly from most implementations: it performs inference largely through a single (potentially sparse) matrix multiply, rather than updating each instance one at a time. This allows for considerably faster inference than previously reported, since it can use existing BLAS (or alternative) implementations.
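
The matrix-multiply formulation can be illustrated with a small numpy sketch. This is an assumption-laden toy, not RMLLib's implementation: the function `collective_inference`, the homophily table `compat`, and the naive-Bayes-style update with soft neighbor counts are all chosen here for illustration. The point is that one `adj @ beliefs` product aggregates neighbor beliefs for every node at once.

```python
import numpy as np

def collective_inference(adj, labels, labeled, prior, compat, n_iter=10):
    """Toy collective inference (illustrative only, not RMLLib's code).

    adj     : (n, n) edge weights, dense numpy array (a scipy sparse
              matrix would also support the `@` product used below)
    labels  : (n, k) one-hot labels; rows of unlabeled nodes are ignored
    labeled : (n,) boolean mask of nodes with known labels
    prior   : (k,) class prior
    compat  : (k, k) compat[c, d] = P(neighbor has class d | own class c)
    """
    n, k = labels.shape
    beliefs = np.tile(prior, (n, 1)).astype(float)
    beliefs[labeled] = labels[labeled]          # clamp known labels
    log_prior, log_compat = np.log(prior), np.log(compat)

    for _ in range(n_iter):
        # One matrix multiply gives every node its soft neighbor class counts.
        neighbor_mass = adj @ beliefs           # shape (n, k)
        scores = log_prior + neighbor_mass @ log_compat.T
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        beliefs = np.exp(scores)
        beliefs /= beliefs.sum(axis=1, keepdims=True)
        beliefs[labeled] = labels[labeled]      # keep known labels fixed
    return beliefs

# Chain graph 0 - 1 - 2 - 3 with the two end nodes labeled.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
labeled = np.array([True, False, False, True])
prior = np.array([0.5, 0.5])
compat = np.array([[0.9, 0.1], [0.1, 0.9]])    # neighbors tend to agree

beliefs = collective_inference(adj, labels, labeled, prior, compat)
```

In this toy chain, node 1 leans toward the class of node 0 and node 2 toward the class of node 3, because each update pulls a node toward the weighted beliefs of its neighbors.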

RMLLib also aims to provide alternative learning/inference algorithms to RDNs, although this is not yet implemented.

Data Format

RMLLib is intended to run from the ground up on large, potentially multi-class datasets. To facilitate this, the generic dataset class wraps four basic data structures:

labels: a pandas DataFrame with rows indicating sample labels and columns as a multiindex with level 0 being the "Y" label and class values being level 1
features: either a pandas DataFrame or SparseDataFrame, with columns as a multiindex with level 0 being the feature name and level 1 being the feature values. Categorical features are assumed to have a one-hot-encoding representation, allowing for simple slicing and sparse matrix multiplication (see Boston Medians for a simple example).
edges: either dense or sparse matrix containing the weight values between nodes.
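
As a rough illustration of the structures listed above, the following sketch builds tiny versions of the labels, features and edges with pandas and numpy. It does not use RMLLib's dataset class; the node names, feature names and the way the frames are assembled are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

nodes = ["a", "b", "c", "d"]

# labels: column multiindex with level 0 = "Y" and level 1 = class value.
labels = pd.DataFrame(
    [[1, 0], [0, 1], [0, 1], [1, 0]],
    index=nodes,
    columns=pd.MultiIndex.from_product([["Y"], [0, 1]]),
)

# features: level 0 = feature name, level 1 = one-hot encoded feature value.
raw = pd.DataFrame(
    {"color": ["red", "blue", "red", "blue"], "size": ["s", "s", "l", "l"]},
    index=nodes,
)
features = pd.concat(
    {name: pd.get_dummies(raw[name]).astype(int) for name in raw.columns},
    axis=1,  # dict keys become level 0 of the column multiindex
)

# edges: dense weight matrix for the chain a - b - c - d.
edges = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

# Slicing by feature name yields its one-hot block; multiplying by the
# edge matrix counts each neighbor value per node in one product.
color = features["color"]
neighbor_colors = pd.DataFrame(
    edges @ color.to_numpy(), index=nodes, columns=color.columns
)
```

Because each categorical feature is a contiguous one-hot block under its level-0 name, selecting `features["color"]` and multiplying by `edges` is all that is needed to summarize the neighborhood of every node.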

In addition, the dataset module provides helpers such as masks for defining a training/test split, and helpers for creating training sets that obscure unlabeled parts of the graph.

Installation

Currently, installation is only from source, i.e.:

git clone https://github.com/jpfeiffe/rmllib
cd rmllib
pip install .

Blame

Currently the project is maintained by me, Joel Pfeiffer. I'm always looking for help with new methods.

If you find the library useful for your work, please consider citing:

@misc{rmllib,
title = {Relational Machine Learning Library (RMLLib)},
author = {Joseph J. {Pfeiffer III}},
howpublished = {\url{https://github.com/jpfeiffe/rmllib}},
note = {Accessed: 2010-09-30}
}

Additionally, please be sure to cite the relevant articles for the corresponding methods, algorithms and/or datasets.
