Comments on 'Full Stack Science' #9

Open
BillMills opened this Issue Jan 1, 2016 · 7 comments

BillMills commented Jan 1, 2016

What infrastructure do you use for doing open, reproducible science? Discuss here, and post comments on the blog post, Full Stack Science.

DamienIrving Jan 2, 2016

@BillMills A fantastic and MUCH needed post! Couple of comments:

  • In this episode of the Talk Python podcast, Travis Oliphant suggests Anaconda Cloud as a lightweight (i.e. easier to learn for researchers) alternative to Docker.
  • I recently published an essay that takes the opposite approach to your post. Rather than describe the complete set of tools and practices one can use, I looked at the minimum you'd need to do in order to make your computational research reproducible. This post provides a quick summary of the essay (reassuringly, there's lots of overlap with your post).

BillMills Jan 2, 2016

Owner

That's really interesting - you're right, there is a lot of overlap there, but we're coming at the problem from different directions; what do we need, in your case, and what do we have, in mine. I think my stack hits your points (hooray!) but I would not miss a docker-like entity on both ends here; I've been involved a bunch lately with trying to re-run third party analyses, and it's been... a pretty magical experience in trying to recreate completely undocumented dependency stacks.

Re: Anaconda vs. Docker: whatever gets the job done! I'm mostly concerned with transferable dependency stacks, and if Anaconda Cloud delivers, then by all means. Another easy(ish?) way to do this is with spin-uppable instances on services like Heroku or Digital Ocean. That said, Docker is a bit like Git in that it's a beast in full generality, but you really only need a tiny subset of its power to get over the finish line. Particularly seeing as it's less of a collaborative exercise than a GitHub repo (which is where the real Git gong show begins, with forks / branches / PRs / conflicts), I wouldn't be too put off by it.

jspauld Jan 7, 2016

@BillMills Cool post, great work.

If people are interested in sharing their entire research projects openly, my startup, Thinklab, aims to be of value. We want to be the best possible platform for openly sharing research updates in real time, while engaging and rewarding participation from the community.

A few features I'd like to highlight:

  • Users can publish research proposals, and there's a cool system for open review of those proposals.
  • Users can reward feedback from peers by rating the value of their contributions. This allows us to publish a "contributor impact leaderboard". See @dhimmel's Thinklab project for an example.
  • Each discussion page is published with a DOI.
  • To cite an article users can add markdown like so [@10.1038/msb.2009.98]. This links to the article, while also creating a page where users can track citations or comment on the article directly.
  • The longer-term vision is to create a system that intelligently directs researcher attention to the ideas, questions, and problems that most need their expertise. Basically what Michael Nielsen describes in Reinventing Discovery. (This part requires more users.)

Would be great to get feedback. Cheers

kidpixo Mar 12, 2016

Really interesting piece!

I'm trying to use the "full stack" in my work, but as a physicist without formal training in informatics / data science, it is quite hard.

The link at the top here leads to a nice 404 File not found, due to the trailing backslash. It could be linked to GitHub's switch to Jekyll 3; see jekyll/jekyll/issues/4440.

Cheers.

BillMills Mar 12, 2016

Owner

@kidpixo thanks for pointing out that broken link! I definitely feel you when it comes to how undertrained we are in physics for this kind of thing - I originally got interested in scientific computing as a result of eternal frustration with how computationally underprepared I was for my research in grad school at CERN. Stick with it and you'll get there!

timstaley Mar 14, 2016

Hi Bill, nice post! Have you considered configuration-management languages + virtual machines as opposed to Docker? Docker is probably quicker and more efficient if you just want to reproduce something statically, but my feeling is that actually 'Docker is to dependencies what a pre-built tarball release is to code', and if you want a proper source-code equivalent then you should be using configuration management (Ansible is my personal favourite, and of course Puppet and Chef are the big names). That said, I'm not totally up to speed on Docker, so I may be missing some subtleties.

The downside is that learning a configuration-management tool is a job of magnitude akin to learning Git, but if you set it up with a 'Vagrant' box configuration, then all you need for a static reproduction is:

cd vagrant
vagrant up
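
For readers who have not seen this setup, here is a minimal sketch of what the Vagrantfile inside that `vagrant` directory might look like, assuming Ansible is installed on the host and is used as the provisioner. The box name and the playbook path are placeholders for illustration, not taken from any real project in this thread.

```ruby
# vagrant/Vagrantfile (hypothetical example)
Vagrant.configure("2") do |config|
  # Base image for the VM; swap in whichever box your analysis targets.
  config.vm.box = "ubuntu/trusty64"

  # Hand dependency installation over to an Ansible playbook, so the
  # "source code" of the environment lives in version control.
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "playbook.yml"  # assumed path, relative to this file
  end
end
```

With a file like this in place, `vagrant up` boots the VM and runs the playbook the first time the machine is created.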

BillMills Mar 14, 2016

Owner

@timstaley nice suggestions, thanks! The things I really like about Docker are (1) it makes CI on Travis pretty trivially easy (which I find a huge headache otherwise), and (2) Docker images are shareable in a way that only really requires one maintainer to know what they're doing - everyone else just grabs the image and is happily off to the races. Particularly in an archival context, I think this makes sense.

BUT, Ansible looks like the Right Tool for devops management at a lab, for example, particularly if you've got a lot of cluster-y type things to do. Now if only we actually hired devops...
