Bio4j abstract model and general entry point to the project

Bio4j bioinformatics graph data platform

Bio4j is a bioinformatics graph data platform integrating most of the data available in UniProtKB (Swiss-Prot + TrEMBL), Gene Ontology (GO), UniRef (50, 90, 100), NCBI Taxonomy, and Expasy Enzyme DB.

Bio4j provides a new framework for querying and managing protein-related information. A graph-based data model makes it possible to store and query data in a way that semantically represents its own structure. In contrast, traditional relational models and databases must flatten the data they represent into tables, creating artificial ids to connect the different tuples. In some cases this leads to domain models that have almost nothing to do with the actual structure of the data.

Project structure and overview

Bio4j can look a bit intimidating at first, with several repositories that have similar names. Here is a guided tour:

bio4j/bio4j

This repository, bio4j/bio4j, contains the generic Bio4j model and API. Entities, relationships and their properties are modeled using a typed property graph model. For example, there are vertex types for Protein and GoTerm, and a GoAnnotation edge type going from Protein to GoTerm. The graph schema is split into different graphs, corresponding to the different data sources (UniProt, Go, UniRef, ...) and the connections between them (UniProtGo, UniProtUniRef, ...).

The API, based on bio4j/angulillos, lets you write generic typed traversals over this graph schema:

protein.uniref50Member_outV()
  .map(
    UniRef50Cluster::uniRef50Member_inV
  )
  .map(
    prts -> prts.map(
      Protein::goAnnotation_outV
    )
  );

which can later be executed on a particular backend. Generic data import code is also here, which can be used to load the data using any implementation of angulillos.
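The idea of writing a traversal once and running it later on a particular backend can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `Backend`, `InMemory`, `Triple` and the string edge labels are made up for illustration and are not part of Bio4j or angulillos, which use typed vertices and edges rather than strings. The traversal has the same shape as the one above: from a protein to its UniRef50 cluster, to all members of that cluster, to their GO terms.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from Bio4j or angulillos.
public class BackendSketch {

    // What a traversal needs from a storage backend, for some vertex representation V.
    interface Backend<V> {
        Stream<V> out(V vertex, String edgeType); // targets of outgoing edges
        Stream<V> in(V vertex, String edgeType);  // sources of incoming edges
    }

    // Written once against the interface:
    // protein -> its UniRef50 cluster -> all cluster members -> their GO terms.
    static <V> List<V> goTermsOfClusterMembers(Backend<V> g, V protein) {
        return g.out(protein, "uniRef50Member")
                .flatMap(cluster -> g.in(cluster, "uniRef50Member"))
                .flatMap(member -> g.out(member, "goAnnotation"))
                .distinct()
                .collect(Collectors.toList());
    }

    // One edge of the toy graph: source vertex, edge type, target vertex.
    static final class Triple {
        final String source, type, target;
        Triple(String source, String type, String target) {
            this.source = source; this.type = type; this.target = target;
        }
    }

    // Toy in-memory backend where vertices are plain strings.
    static final class InMemory implements Backend<String> {
        private final List<Triple> edges;
        InMemory(List<Triple> edges) { this.edges = edges; }

        public Stream<String> out(String v, String type) {
            return edges.stream()
                    .filter(e -> e.source.equals(v) && e.type.equals(type))
                    .map(e -> e.target);
        }

        public Stream<String> in(String v, String type) {
            return edges.stream()
                    .filter(e -> e.target.equals(v) && e.type.equals(type))
                    .map(e -> e.source);
        }
    }

    // Made-up sample data: two proteins in one cluster, annotated with two GO terms.
    static InMemory sample() {
        return new InMemory(List.of(
                new Triple("p1", "uniRef50Member", "c1"),
                new Triple("p2", "uniRef50Member", "c1"),
                new Triple("p1", "goAnnotation", "go1"),
                new Triple("p2", "goAnnotation", "go2"),
                new Triple("p2", "goAnnotation", "go1")));
    }

    public static void main(String[] args) {
        // The same generic traversal, executed on the in-memory backend.
        System.out.println(goTermsOfClusterMembers(sample(), "p1")); // prints [go1, go2]
    }
}
```

Only `InMemory` knows how edges are stored; swapping in another implementation of the hypothetical `Backend` interface would leave `goTermsOfClusterMembers` unchanged.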

bio4j/angulillos

You can think of bio4j/angulillos as a strongly typed version of the property graph model. You can describe graph schemas and write generic traversals over them which are guaranteed to be well-typed, so that for example

  • you cannot retrieve the outgoing edges of an edge
  • and you can get the tweets that a user tweeted, but not the users that a tweet follows!
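
The two guarantees above can be illustrated with a small Java sketch in which each edge type fixes its source and target vertex types as type parameters. This is not the angulillos API: `Vertex`, `Edge`, `User`, `Tweet`, `Tweeted`, `Follows` and `outV` are hypothetical names chosen for this example, and the sketch assumes Java 9 or later for `List.of`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the angulillos API.
public class TypedGraphSketch {

    // Marker for vertex types. Edges do not implement it.
    interface Vertex {}

    static final class User implements Vertex {
        final String name;
        User(String name) { this.name = name; }
    }

    static final class Tweet implements Vertex {
        final String text;
        Tweet(String text) { this.text = text; }
    }

    // An edge type fixes its source and target vertex types as type parameters.
    static class Edge<S extends Vertex, T extends Vertex> {
        final S source;
        final T target;
        Edge(S source, T target) { this.source = source; this.target = target; }
    }

    static final class Tweeted extends Edge<User, Tweet> {
        Tweeted(User user, Tweet tweet) { super(user, tweet); }
    }

    static final class Follows extends Edge<User, User> {
        Follows(User follower, User followed) { super(follower, followed); }
    }

    // Targets of the given edges whose source is the vertex `from`.
    static <S extends Vertex, T extends Vertex> List<T> outV(S from, List<? extends Edge<S, T>> edges) {
        List<T> result = new ArrayList<>();
        for (Edge<S, T> e : edges) {
            if (e.source == from) result.add(e.target);
        }
        return result;
    }

    public static void main(String[] args) {
        User ada = new User("ada");
        User bob = new User("bob");
        List<Tweeted> tweeted = List.of(new Tweeted(ada, new Tweet("hello")),
                                        new Tweeted(bob, new Tweet("hi")));
        List<Follows> follows = List.of(new Follows(ada, bob));

        List<Tweet> tweets = outV(ada, tweeted);  // the tweets that ada tweeted
        List<User> followed = outV(ada, follows); // the users that ada follows
        System.out.println(tweets.get(0).text + " " + followed.get(0).name); // prints "hello bob"

        // Neither line below compiles:
        // outV(tweets.get(0), follows); // Follows starts at a User, so a Tweet follows no one
        // outV(follows.get(0), follows); // an edge is not a Vertex, so it has no outgoing edges
    }
}
```

In this sketch the ill-typed traversals are rejected by the Java compiler rather than failing at query time, which is the kind of guarantee described above.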

bio4j/bio4j-titan

In bio4j/bio4j-titan you will find a Titan-based Bio4j distribution. This is the default distribution, and we also provide database binaries with all data already loaded through AWS S3. Go there if you want to stop reading and use Bio4j now!

bio4j/angulillos-titan

bio4j/angulillos-titan is an implementation of the angulillos API using Titan.

Documentation

Community and contact

Licensing

Bio4j is an open source platform released under the AGPLv3 license.