Some thoughts on a GitHub of Science
Trevor Bedford

Lately, I've been thinking more about issues surrounding Open Science and scientific publishing. This post is in part a response to posts by Scott Chamberlain and Marcio von Muhlen. Marcio's idea represents a major call to arms for innovation in how science is conducted and communicated. He states that "we need a social network of science, meaning scientific bundles of knowledge must be structured and accessible by API, with the connections among those bundles and appropriate utility metrics being what connects and prioritizes scientists." I completely agree. As a small step in this direction, I chose to post my latest paper to the arXiv and to GitHub itself.

Scott questions whether GitHub could be useful as a scientific publishing platform, which I think is a very different thing from Marcio's GitHub of Science. As a publishing platform, I think the primary advantage of GitHub is the versioning system at its heart. This would allow an audience to follow a scientific story as it progresses, but would also allow the history of a project to be queried and individual contributions to be easily assessed (at least in terms of writing and coding). If we want to move towards a system of post-publication peer review, there needs to be a good way of continually updating a manuscript and making it obvious what each new version brings. A nice open source analogy here (that Scott originally mentioned on Twitter) is the idea of peer review as opening issues. Right now, in Google Code or GitHub, you can open an issue on a project to document a bug or other sort of problem. Developers can then respond to this issue and make the appropriate changes to their project (which are then linked to the issue, making it straightforward to track specific revisions). Peer review acts in a very similar fashion, documenting inadequacies with the approach taken in a scientific manuscript. So, please, please, open an issue with the canalization paper. I would be happy to try to attend to it.

However, I think the potential for something like a GitHub of Science goes much further than just a publishing platform. In the current paradigm, manuscripts are built on top of manuscripts, but there is a lot of replicated effort. Let's say someone thinks of a small, but highly relevant, addition to a paper. For example, in the case of the canalization paper, what is the effect of vaccination on the antigenic evolution of influenza? This could be a one-figure addition to the present paper. However, in the current system, doing this research would entail rerunning much of the basic model and writing a new paper, with a new introduction and a new discussion, all centered around vaccination. This one-figure vaccination addendum may not make a paper by itself, but it would be great if it could somehow be integrated into the literature.

The basic paradigm of GitHub is the forking of a software project. I write some code, you take what I've done and make some additions. I then have the option of folding your changes back into my version, or if I'm not happy with your changes, the two versions go their separate ways. With something like a GitHub of Science, someone else could fork my canalization paper, code and all, and append a short section and figure on vaccination. I could choose to pull this addition, integrating it as part of the paper, or the forked version could exist on its own. Here, I'm imagining a scenario where most collaboration manifests as a network of forks and pull requests between co-authors, where a story emerges by combining a number of individual contributions.
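To make the fork-and-pull idea a bit more concrete, here is a minimal Python sketch of a manuscript that can be forked and pulled. It is only a toy model: the `Manuscript` class, its `commit`, `fork`, and `pull` methods, and the section names are all hypothetical and invented for illustration. It is not how Git or GitHub work internally, and it ignores merge conflicts entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Manuscript:
    """Toy stand-in for a paper repository (hypothetical, for illustration)."""

    owner: str
    sections: dict[str, str] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)

    def commit(self, name: str, text: str, author: str) -> None:
        """Add or update a section and record who made the change."""
        self.sections[name] = text
        self.history.append(f"{author}: edit '{name}'")

    def fork(self, new_owner: str) -> Manuscript:
        """Copy the whole manuscript, history included, for a new owner."""
        return Manuscript(new_owner, dict(self.sections), list(self.history))

    def pull(self, other: Manuscript) -> list[str]:
        """Fold in sections that are new or changed in a fork.

        Returns the names of the sections that were taken from the fork.
        Assumes no conflicting edits; the fork's text simply wins.
        """
        changed = [
            name
            for name, text in other.sections.items()
            if self.sections.get(name) != text
        ]
        for name in changed:
            self.sections[name] = other.sections[name]
        # Keep the fork's log entries so individual contributions stay visible.
        self.history.extend(h for h in other.history if h not in self.history)
        return changed


# The original paper, written by one author.
paper = Manuscript("trevor")
paper.commit("results", "Canalization model results.", "trevor")

# Someone else forks it and appends a short vaccination section.
addendum = paper.fork("reader")
addendum.commit("vaccination", "One-figure addendum on vaccination.", "reader")

# The original author chooses to pull the addition into the paper.
merged = paper.pull(addendum)
```

If the pull is never made, `addendum` simply continues to exist as its own version, which is the "separate ways" case described above.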

In a conversation with Ben Fry about this, he commented that the most beneficial aspect of peer review is that it forces scientists to work in such a way that their research can be reviewed, and, at least in theory, replicated. Working in such a way that research could be forked would set a much higher, and better, bar in terms of documentation and reproducibility. There is continual innovation in terms of models for Open Science (Stack Exchange, arXiv, GitHub, etc.). I'm hopeful that we can eventually come up with something that gains some traction. However, I'm sure that whatever we start with has to produce a publishable end product, so that both old and new systems can continue forward, existing side by side.