This package provides a database schema and Python wrapper for storing the embeddings generated through various representation learning packages.
Currently, this package focuses on using a SQL database with SQLAlchemy, but might be extended to use a NoSQL database as an alternative.
Install embeddingdb from PyPI with:
$ pip install embeddingdb
Alternatively, install the latest development version of embeddingdb directly from GitHub with:
$ pip install git+https://github.com/cthoyt/embeddingdb
For developers, install embeddingdb in development mode from GitHub with:
$ git clone https://github.com/cthoyt/embeddingdb.git
$ cd embeddingdb
$ pip install -e .
Set the environment variable EMBEDDINGDB_CONNECTION to a valid SQLAlchemy connection string for a PostgreSQL instance, as this package uses the PostgreSQL-specific ARRAY type.
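As a minimal sketch of what such a connection string can look like, the snippet below assembles a PostgreSQL URL in SQLAlchemy's `dialect://user:password@host:port/database` form and sets it for the current process. The user, password, host, port, and database name are placeholders rather than values the package prescribes, and the helper `build_connection_string` is hypothetical.

```python
import os
from urllib.parse import quote


def build_connection_string(user, password, host, port, database):
    """Assemble a SQLAlchemy-style PostgreSQL URL (hypothetical helper)."""
    # Percent-encode the credentials so characters such as '@' or '/' in a
    # password do not break the URL structure.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Placeholder credentials for a local PostgreSQL instance (assumptions).
connection = build_connection_string(
    "postgres", "p@ss/word", "localhost", 5432, "embeddingdb"
)

# Make the setting visible to code that runs later in this process.
os.environ["EMBEDDINGDB_CONNECTION"] = connection
```

Setting the variable in the shell profile instead makes it available to the command line entry point as well.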
This package installs an entry point, embeddingdb, that can be used directly from the shell.
Entities can be embedded and stored from various types of representation learning, including network representation learning, knowledge graph embedding, and textual learning.
Upload embeddings generated by word2vec by specifying the file path with:
$ embeddingdb upload --fmt word2vec --path ~/path/to/file.txt
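For reference, the plain-text format written by the original word2vec tool starts with a header line giving the vocabulary size and the vector dimension, followed by one line per token with its space-separated vector components. The sketch below writes and re-reads such a file using only the standard library. The helper names and example tokens are made up for illustration, and whether embeddingdb requires the header line is an assumption to check against the package.

```python
import os
import tempfile


def write_word2vec_text(path, vectors):
    """Write {token: [floats]} in word2vec text format (hypothetical helper)."""
    dimension = len(next(iter(vectors.values())))
    with open(path, "w", encoding="utf-8") as file:
        # Header line: number of tokens and vector dimension.
        file.write(f"{len(vectors)} {dimension}\n")
        for token, vector in vectors.items():
            file.write(token + " " + " ".join(str(x) for x in vector) + "\n")


def read_word2vec_text(path):
    """Read a word2vec text file back into {token: [floats]}."""
    with open(path, encoding="utf-8") as file:
        count, dimension = (int(part) for part in file.readline().split())
        vectors = {}
        for line in file:
            token, *components = line.split()
            vectors[token] = [float(x) for x in components]
    # Check the body against what the header promised.
    if len(vectors) != count:
        raise ValueError("token count does not match header")
    if any(len(v) != dimension for v in vectors.values()):
        raise ValueError("vector dimension does not match header")
    return vectors


# Made-up example entities and vectors.
example = {"protein_a": [0.1, 0.2, 0.3], "protein_b": [0.4, 0.5, 0.6]}
path = os.path.join(tempfile.mkdtemp(), "example.txt")
write_word2vec_text(path, example)
loaded = read_word2vec_text(path)
```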
Upload embeddings generated by pykeen by specifying the output directory with:
$ embeddingdb upload --fmt keen --path ~/path/to/directory/
After uploading, the collections can be listed with:
$ embeddingdb ls
One of the motivations for building this repository was to provide a convenient way to compare the embeddings for entities generated through orthogonal embedding techniques. For example, we wanted to know to what extent the embeddings for proteins generated from their sequences with ratvec contained the same information as the embeddings generated from protein-protein interaction networks with pykeen or nrl.
The two positional arguments correspond to the collection identifiers in the database.
$ embeddingdb analyze 1 2
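The text above does not spell out what the analysis computes. One common way to ask whether two embedding spaces carry similar information is to compute the pairwise cosine similarities of the shared entities within each space and then correlate the two lists. The snippet below is purely an illustrative sketch of that idea, not the method behind `embeddingdb analyze`. All function names and vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def similarity_agreement(first, second):
    """Correlate within-space pairwise similarities over shared entities."""
    shared = sorted(set(first) & set(second))
    pairs = list(combinations(shared, 2))
    sims_first = [cosine(first[a], first[b]) for a, b in pairs]
    sims_second = [cosine(second[a], second[b]) for a, b in pairs]
    return pearson(sims_first, sims_second)


# Made-up collections: the second is the first with every vector doubled,
# so the within-space cosine similarities are the same in both.
collection_1 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
collection_2 = {key: [2.0 * x for x in vec] for key, vec in collection_1.items()}
score = similarity_agreement(collection_1, collection_2)
```

Because similarities are computed within each space before being compared, this kind of check does not require the two collections to share a vector dimension.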
After installing Docker, the entire web application can be instantiated with:
$ docker-compose up
Then send a GET request to the /test endpoint to instantiate the database and add a test collection.