cloud based query benchmarking suite
- difficulty medium
- technologies scala, java, titan, neo4j, blueprints, gremlin, cypher, aws, biology, datavis
Bio4j has one of the most richly typed graph data models available, which makes it a good fit for testing and comparing the performance of different engines and technologies on real-world, biologically meaningful queries and traversals. In this project we will design and implement a set of queries that takes the specifics of each engine into account (at least Titan and Neo4j, using Blueprints, Gremlin, Cypher, etc.). We will then develop an automated AWS-based testing system on top of the existing AWS deployment infrastructure, together with graphical output for the results.
An automated performance testing system that runs a set of biologically meaningful traversals/queries on Bio4j instances and displays the results graphically.
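As a rough illustration of what the core of such a system might look like, here is a minimal, engine-agnostic timing harness in Java. It is only a sketch under stated assumptions: the class `QueryBenchmark`, its `Result` type, and the `run` and `toCsv` methods are hypothetical names made up for this example, not part of Bio4j, Titan, Neo4j, or Blueprints. Each query is passed in as a plain `Supplier`, so the engine-specific code (a Gremlin traversal, a Cypher query, etc.) would live inside that lambda.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical harness: times a query body and summarizes the samples.
final class QueryBenchmark {

    /** Timing summary for one query on one engine (nanoseconds). */
    static final class Result {
        final String engine;
        final String query;
        final long minNanos;
        final long medianNanos;
        final long maxNanos;

        Result(String engine, String query, long minNanos, long medianNanos, long maxNanos) {
            this.engine = engine;
            this.query = query;
            this.minNanos = minNanos;
            this.medianNanos = medianNanos;
            this.maxNanos = maxNanos;
        }
    }

    /**
     * Runs the query body `warmups` times without measuring (to let caches
     * and the JIT settle), then `runs` times with wall-clock measurement.
     */
    static Result run(String engine, String query, Supplier<?> body, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmups; i++) {
            body.get();
        }
        List<Long> samples = new ArrayList<>(runs);
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            body.get();
            samples.add(System.nanoTime() - start);
        }
        Collections.sort(samples);
        // For an even number of samples this picks the upper median.
        long median = samples.get(samples.size() / 2);
        return new Result(engine, query, samples.get(0), median, samples.get(samples.size() - 1));
    }

    /** Serializes results as CSV so that a plotting tool can consume them. */
    static String toCsv(List<Result> results) {
        StringBuilder sb = new StringBuilder("engine,query,min_ns,median_ns,max_ns\n");
        for (Result r : results) {
            sb.append(r.engine).append(',').append(r.query).append(',')
              .append(r.minNanos).append(',').append(r.medianNanos).append(',')
              .append(r.maxNanos).append('\n');
        }
        return sb.toString();
    }
}
```

A call such as `QueryBenchmark.run("neo4j", "protein-neighbours", () -> runMyQuery(), 5, 30)` (where `runMyQuery` is a placeholder for the engine-specific code) would yield one row per engine and query; the CSV output could then feed the graphical reports. A real suite would also need to control for JVM and storage-cache effects, which this sketch only addresses through simple warm-up runs.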
- @pablopareja (mailto:ppareja@ohnosequences.com)
- @eparejatobes (mailto:eparejatobes@ohnosequences.com)
Both have 4+ years of experience with Neo4j and Blueprints and, more recently, with Titan. @pablopareja in particular is a well-known, active participant in the graph database community.
- your own idea!
- DynamoDB backed bio4j prototype
- AWS based bio4j specific CI platform
- incorporate range based data into bio4j
- apply advanced Scala techniques to bio4j
- integrate sequence data into bio4j
- cloud based query benchmarking suite
- OrientDB based bio4j distribution
- graphical browser for bio4j model
- Cytoscape app/plugin for bio4j
- graphml/gexf exporter
- Bio4j Gephi toolkit