Old Roadmap #35
Modular and universal bioinformatics
Bionode.io is a community that aims to build highly reusable tools and code for bioinformatics by leveraging the Node.js ecosystem and community.
Genomic data is flooding the web, and we need tools that can scale to analyse it quickly and in real time to:
Short term - what we're working on now
Medium term - what we're working on next!
Longer term items - working on this soon!
Of course @Wandalen! I think it is safe to say one of bionode's goals is to gather up contributors. One of the contributing factors to this would be to have extremely well documented+explained core APIs (e.g. watermill task API) - so that a plugin ecosystem for bioinformatics can emerge.
One of the most immediate + substantial things to work on would be bringing streaming tasks to bionode watermill. Right now tasks declare input and output as glob patterns (in the future, regex, other options). However, you cannot stream between tasks. This should be a matter of the task declaring its input/output as
This might actually be less overhead than the way I have already done it. It became clear very early that bioinformatics workflows currently consist, for the most part, of reading and writing files; once we can handle that perfectly, and researchers are happy that they can implement what they are used to, it will be time to lift the curtain on streams and their pros and cons in particular pipeline situations. The current implementation does a number of things involving files:
For streams, it's basically a bunch of functions that use read and write pipes. It would also be good to observe the streams and write them to file, in case a task needs to be rerun, or to store intermediate results. This is probably a good point for discussion: someone might ask why use streams if you are writing files anyway, and it is an interesting place to look for pros and cons. For example, by using streams we can check stderr for messages indicative of failure for a specific tool despite a 0 exit code. Another interesting aspect of streams is that, instead of doing this, we can spawn three child processes and pass streams between them, enabling storage of intermediate results and experimentation with parameters on a per-tool basis. E.g. for
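To make the idea of passing streams between child processes more concrete, here is a rough sketch using only Node's built-in `child_process` and `fs` modules. It is not watermill's API: `runPipeline`, `looksFailed`, and `demoTools` are hypothetical names, the three "tools" are tiny Node one-liners standing in for real binaries, and the stderr pattern is just an assumed heuristic. Error handling is deliberately minimal.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Stand-ins for real tools (assumption): tiny Node one-liners so the sketch
// runs anywhere. In a real pipeline these would be bioinformatics binaries.
const node = process.execPath;
const demoTools = [
  // producer: emits two lowercase lines
  [node, ['-e', "process.stdout.write('acgt\\nttga\\n')"]],
  // transformer: uppercases whatever arrives on stdin
  [node, ['-e', "process.stdin.on('data', d => process.stdout.write(String(d).toUpperCase()))"]],
  // consumer: counts bytes, complains on stderr, but still exits with code 0
  [node, ['-e', "let n = 0; process.stdin.on('data', d => { n += d.length; }); process.stdin.on('end', () => { console.error('error: simulated problem'); console.log(n); });"]]
];

// Hypothetical heuristic: treat a step as failed if it exited non-zero
// or if its stderr mentions an error, even when the exit code is 0.
function looksFailed(code, stderrText) {
  return code !== 0 || /\b(error|fail(ed|ure)?)\b/i.test(stderrText);
}

// Spawn every command, pipe stdout of step i into stdin of step i + 1,
// and also copy each intermediate stream to `<dir>/step-<i>.out`.
function runPipeline(commands, dir) {
  const children = commands.map(([cmd, args]) => spawn(cmd, args));
  const stderr = children.map(() => '');
  const filesDone = [];

  children.forEach((child, i) => {
    child.stderr.setEncoding('utf8');
    child.stderr.on('data', chunk => { stderr[i] += chunk; });
    const next = children[i + 1];
    if (next) {
      const file = fs.createWriteStream(path.join(dir, `step-${i}.out`));
      child.stdout.pipe(next.stdin); // stream to the next tool
      child.stdout.pipe(file);       // keep the intermediate result on disk
      filesDone.push(new Promise((resolve, reject) => {
        file.on('finish', resolve).on('error', reject);
      }));
    }
  });
  children[0].stdin.end(); // the first step gets no input

  // Collect the final step's stdout in memory.
  let stdout = '';
  const last = children[children.length - 1];
  last.stdout.setEncoding('utf8');
  last.stdout.on('data', chunk => { stdout += chunk; });

  const exits = children.map(child => new Promise((resolve, reject) => {
    child.on('error', reject);
    child.on('close', code => resolve(code));
  }));

  // Resolve once every process has closed and every file is flushed.
  return Promise.all([Promise.all(exits), Promise.all(filesDone)])
    .then(([codes]) => ({ stdout, stderr, codes }));
}

// Usage: run the demo pipeline in a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pipeline-'));
const done = runPipeline(demoTools, dir).then(result => {
  result.flagged = result.codes.map((code, i) => looksFailed(code, result.stderr[i]));
  console.log(result);
  return result;
});
```

Because each step is its own process, one tool's arguments can be changed without touching the others, and the `step-<i>.out` files allow a later step to be rerun from stored intermediate output. The trade-off mentioned above still applies: once intermediate files are written anyway, the benefit of streaming is mostly the early, per-tool visibility into stdout and stderr.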
I see you've made BufferFromFile, very cool! Perhaps a simple watermill pipeline using that module can be a place to get started for you.
Other than watermill, our modules are built for performing bioinformatics tasks via JS APIs, Node streams, and proxying to tools (binaries). Our "wrapper modules" should probably be deprecated and replaced with a watermill task module once watermill matures. Too much work to maintain a wrapper module for every tool.
Hopefully that gives you a taste of what I see in the roadmap for bionode, and introduces you to topics you might be interested in working on!