POME: Learning partially observed mixed-type data embeddings

POME is a graph-based representation-learning method for heterogeneous datasets that incorporates missingness structure into its low-dimensional embeddings. It is applicable to any tabular dataset consisting of both numeric and categorical features.

Installation

POME is implemented as a Python package and can be installed from a local clone of this repository by running the following command in the repository root:

pip install -e .

Input format

POME expects the input data as a pandas DataFrame in which rows represent variables (features) and columns represent samples. Missing values must be encoded by a single numerical value that does not occur among the observed values. In addition, POME expects one column that stores the data type of each variable. An example dataset could have the following structure, with the value -99 encoding missing data:

	Sample1	Sample2	Sample3	Type
VariableA	0	1	-99	cat
VariableB	3.14	-0.1	2.5	numerical
VariableC	0.3	1.2	-99	numerical
VariableD	1	0	2	cat
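Tabular data is often stored the other way round, with samples as rows and missing entries as NaN. The following is a minimal sketch of how such a frame could be brought into the layout shown above using pandas only. The helper `to_pome_layout` is hypothetical and not part of POME. The column name `Type` and the labels `cat` and `numerical` are copied from the example table; whether POME requires exactly these spellings is an assumption.

```python
import numpy as np
import pandas as pd

NA_ENCODING = -99.0  # sentinel for missing data; must not occur as a real value


def to_pome_layout(samples_df, categorical, na_encoding=NA_ENCODING):
    """Hypothetical helper (not part of POME): samples-as-rows -> variables-as-rows."""
    # Guard against a sentinel that collides with an observed value.
    if (samples_df == na_encoding).any().any():
        raise ValueError("na_encoding collides with an observed value")
    # Replace NaN by the sentinel, then transpose so that variables become rows.
    out = samples_df.fillna(na_encoding).T
    # Assumed type labels, taken from the example table above.
    out["Type"] = ["cat" if name in categorical else "numerical" for name in out.index]
    return out


# Conventional layout: one row per sample, NaN for missing entries.
samples = pd.DataFrame(
    {
        "VariableA": [0, 1, np.nan],
        "VariableB": [3.14, -0.1, 2.5],
        "VariableC": [0.3, 1.2, np.nan],
        "VariableD": [1, 0, 2],
    },
    index=["Sample1", "Sample2", "Sample3"],
)

pome_df = to_pome_layout(samples, categorical={"VariableA", "VariableD"})
print(pome_df)
```

A frame built this way can be written with `pome_df.to_csv(...)` and read back with `pd.read_csv(..., index_col=0)`, which keeps the variable names as the row index.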

Minimal working example

POME's core functionality is integrated into its Embedder class, which handles input transformation, training, and output generation. To provide a dataset-specific choice of parameters, POME includes an automated architecture search whose output can be passed to the Embedder class. A typical workflow looks as follows:

from pome import Embedder, run_architecture_search
import pandas as pd

if __name__ == "__main__":
    # Load data and set parameters.
    example_df = pd.read_csv("data/example.csv", index_col=0)
    NA_ENCODING = -99.0
    device = "cpu"
    # Run architecture search for optimal embedding dimension.
    opt_params = run_architecture_search(example_df, non_informative_na=NA_ENCODING)
    input_params = {"non_informative_na": NA_ENCODING, "device": device}
    # Merge search results with user settings (dict union requires Python 3.9+).
    embedder_params = opt_params | input_params
    # Run final embedding with optimal architecture and parameters.
    embedder = Embedder(**embedder_params, epochs=1000)
    embedder.fit(example_df)
    # Output stores low-dimensional embeddings for samples, variables, and bins of discretized numeric variables.
    sample_df, variable_df, bin_df, _ = embedder.get_embeddings()
    print("Computed sample embeddings: ", sample_df)
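The example combines the search output with user settings through a dictionary merge and then adds `epochs` as an explicit keyword. The sketch below illustrates the plain Python semantics involved, using stand-in dictionaries. The keys `embedding_dim` and the values shown are hypothetical, since the actual keys returned by `run_architecture_search` are not listed here, and `fake_embedder` is only a placeholder for a constructor that accepts keyword arguments.

```python
# Hypothetical stand-ins for the search output and the user settings.
opt_params = {"embedding_dim": 16, "device": "cuda"}
input_params = {"non_informative_na": -99.0, "device": "cpu"}

# Dict union (Python 3.9+): on duplicate keys the right-hand operand wins.
merged = opt_params | input_params

# Equivalent merge that also works on older Python versions.
merged_legacy = {**opt_params, **input_params}


def fake_embedder(**kwargs):
    """Placeholder for any constructor that takes keyword arguments."""
    return kwargs


# Passing a key both inside the unpacked dict and as an explicit keyword
# raises a TypeError, so the merged dict should not already contain it.
try:
    fake_embedder(**{"epochs": 10}, epochs=1000)
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

print(merged)
print(duplicate_error)
```

Because the right-hand operand takes precedence, the user-supplied `device` overrides any value of the same name coming from the search output.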
