Old Roadmap #35
Comments
Any chance to get involved from outside?
Of course @Wandalen! I think it is safe to say one of bionode's goals is to gather contributors. One thing that would help is having extremely well documented and explained core APIs (e.g. the watermill task API), so that a plugin ecosystem for bioinformatics can emerge.

One of the most immediate and substantial things to work on would be bringing streaming tasks to bionode watermill. Right now tasks declare input and output as glob patterns (in the future: regex and other options). However, you cannot stream between tasks. This should be a matter of the task declaring its input/output as […]

This might actually be less overhead than the way I have already done it, which does a number of things involving files. (It became clear very soon that bioinformatics workflows today consist, for the most part, of reading and writing files. Once we can handle that perfectly, and researchers are happy that they can implement what they are used to, it will be time to unveil the curtain of streams and their pros/cons in certain pipeline situations):
For streams, it's basically a bunch of functions that use read and write pipes. It would also be good to observe streams and write them to a file, in case a task needs to be rerun, or for storage of intermediate results. This is probably a good point for discussion: someone might ask why use streams if we are writing files anyway. It is an interesting place to look for pros/cons. For example, by using streams we can check stderr for messages indicative of failure for a specific tool despite a 0 exit code.

Another interesting aspect of streams is that, instead of doing this […], we can spawn three child processes and pass streams between them, enabling storage of intermediate results and experimentation with parameters on a per-tool basis. E.g. for […]

I see you've made BufferFromFile, very cool! Perhaps a simple watermill pipeline using that module can be a place to get started for you.

Other than watermill, our modules are built for performing bioinformatics tasks via JS APIs, Node streams, and proxying to tools (binaries). Our "wrapper modules" should probably be deprecated and replaced with a watermill task module once watermill matures. It is too much work to maintain a wrapper module for every tool.

We are also trying to ease the use of somewhat undocumented web APIs. For example, bionode-ncbi provides access to NCBI, but not for all databases, and it needs to parse FTP/HTML responses. In bionode-blast I wrote a helper function that pulls down the "json" response, which is actually a zip (malformed, but it still extracts with less spec-following modules), and returns JSON. I hope to improve that module by strictly documenting all parameter types and their validations, and by providing a documented REST API, perhaps using things like JSON schema, schema-salad (my JS version does nothing atm), typed JavaScript with Flow and TypeScript, or swagger/raml. Imagine having a REST API which documents its response schema (array of objects): that response could then enter an object streaming pipeline, and we could operate on its shape confidently (IDEs pick up on types).
Overall - some way to create ultra documented APIs that work as CLI tools, JS modules, and REST API proxies, all from one main definition, would be very cool. Hopefully that gives you a taste of what I see in the roadmap for bionode, and introduces you to topics you might be interested in working on!
Interesting, @thejmazz. I want to play with the project. Thank you for such a deep excursus. It is a good starting point.
Sorry, but I'm moving this discussion to #42 just because it's a more meaningful number (see what I did there?) and easier to remember when I'm making a link to this issue! 😎
Roadmap
Modular and universal bioinformatics
Bionode.io is a community that aims at building highly reusable tools and code for bioinformatics by leveraging the Node.JS ecosystem and community.
Why
Genomic data is flooding the web, and we need tools that can scale to analyse it in realtime and fast to:
Core features
Short term - what we're working on now
Medium term - what we're working on next!
Longer term items - working on this soon!
Achievements