
Our Reproducible Repository Guidelines 😃

Seth Russell edited this page Apr 16, 2019 · 3 revisions

Making PheKnowVec a Reproducible Research Repository


"Computational science has led to exciting new developments, but the nature of the work has exposed limitations in our ability to evaluate published findings. Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible ... The field of science will not change overnight, but simply bringing the notion of reproducibility to the forefront and making it routine will make a difference." -Roger Peng


PheKnowVec Reproducibility Guidelines

Now that you have an idea of which GitHub tools are available, we'd like to provide some additional guidelines for making our projects reproducible. Since there are no universally agreed-upon standards for developing reproducible research repositories, consider what follows a working draft of our best attempt at defining a new GitHub-based standard:


Create Issues for Everything:

  • Issues should represent a single idea, task, bug, question, or suggestion
  • All correspondence about an issue, whenever possible, should be conducted within that issue
  • When an issue has been addressed, close it, but don't be afraid to re-open it if more work is needed
  • Specific labels have been created to help track issues; use them!

Track Progress:

  • We have created specific Project Boards in an effort to keep things organized. These boards have been automated, meaning issues will be automatically added to specific boards if a project is assigned at the time of creation.
  • Please remember to assign labels when making new issues. If you are unsure which project to assign an issue to, let me know by mentioning @callahantiff in the issue description.
  • If you have started working on a new issue, go to the project it's connected to and drag the issue to the "In Progress" column.
  • If you create an issue that is related to an existing Milestone, make sure you assign it at the time of creation.
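The issue and tracking guidelines above also apply when opening issues from a script. The sketch below is a hypothetical helper, not part of PheKnowVec: it builds (but does not send) a request to the GitHub REST endpoint `POST /repos/{owner}/{repo}/issues`, attaching labels and a milestone at creation time and mentioning @callahantiff when the right project is unclear. The repository owner and name, the `bug` label, the milestone number, and the token placeholder are all assumptions.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_issue_request(owner, repo, token, title, body, labels,
                        milestone=None, unsure_of_project=False):
    """Build, but do not send, a request that opens one GitHub issue."""
    if unsure_of_project:
        # Mention the maintainer so the issue gets assigned to a project.
        body += "\n\n@callahantiff: I'm not sure which project this belongs to."
    payload = {"title": title, "body": body, "labels": list(labels)}
    if milestone is not None:
        payload["milestone"] = milestone  # the milestone *number*, not its title
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; replace with the real repository, label, and token.
    req = build_issue_request(
        "callahantiff", "PheKnowVec", "<token>",
        title="Example: one task, bug, or question per issue",
        body="Describe the single idea this issue tracks.",
        labels=["bug"], milestone=1, unsure_of_project=True,
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually create the issue.
```

Keeping request construction separate from sending makes it easy to review exactly what will be posted before creating anything on GitHub.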

Communicate Openly:

  • The README.md should be used for basic documentation (think of it as the abstract of a paper).
  • The Wiki is intended for extended documentation and for keeping collaborators up to date on the project.
  • Provide detailed documentation of current analyses and work being performed.
  • Update meeting minutes religiously, referencing or linking to current issues when needed.
  • Add evidence of success to the homepage. Providing links to the products of this work is an important aspect of reproducible science.
  • Provide links to collaborators via hyperlinking their GitHub username when referenced in meetings or other documentation -- share the love!
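To illustrate the meeting-minutes and hyperlinking points, the hypothetical helpers below format one meeting-minutes entry in Markdown, linking each attendee's GitHub username to their profile and linking notes to the issues they concern. The function names and any repository, usernames, or issue numbers passed to them are illustrative assumptions.

```python
def link_user(username):
    """Return a Markdown link to a collaborator's GitHub profile."""
    handle = username.lstrip("@")
    return f"[@{handle}](https://github.com/{handle})"


def link_issue(owner, repo, number):
    """Return a Markdown link to an issue in the given repository."""
    return f"[#{number}](https://github.com/{owner}/{repo}/issues/{number})"


def minutes_entry(date, attendees, notes, owner, repo):
    """Format one meeting-minutes entry as Markdown.

    `notes` is a list of (text, issue_number_or_None) pairs.
    """
    lines = [
        f"## {date}",
        "",
        "Attendees: " + ", ".join(link_user(a) for a in attendees),
        "",
    ]
    for text, issue in notes:
        # Append an issue link only when the note refers to an issue.
        suffix = f" ({link_issue(owner, repo, issue)})" if issue is not None else ""
        lines.append(f"- {text}{suffix}")
    return "\n".join(lines)
```

Using full URLs rather than relying on autolinking keeps the links working wherever the Markdown is rendered.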

If the Code Changes a Bit, You Must Commit!:

  • Commit code often (at least daily); smaller changes are much easier for people to understand, validate, or even revert if there is a problem. Multi-day breaking changes should be done in a branch, still committed at least daily, and then merged in when ready.
  • Only tested code should be merged into the master branch; all other commits should be made to the development branch.
  • Write commit messages explaining what has changed.
  • Document code releases with Zenodo; a DOI can be obtained after the initial release.
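The commit workflow above might be scripted along the following lines. This is a minimal sketch, assuming the branches are named `development` and `master` and that tests run with `pytest`; the function names are hypothetical. Each helper returns git commands as argument lists, and `run` executes them in order, stopping at the first failure, so a failing test suite prevents the merge.

```python
import subprocess

# Assumptions: the branch name and test command are placeholders.
DEV_BRANCH = "development"
TEST_CMD = ["python", "-m", "pytest"]


def start_branch(branch, base=DEV_BRANCH):
    """Create a branch for multi-day or breaking work, starting from `base`."""
    return [["git", "checkout", "-b", branch, base]]


def daily_commit(message):
    """Stage all changes, commit with a descriptive message, and push."""
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push", "-u", "origin", "HEAD"],  # sets the upstream on first push
    ]


def merge_when_tested(branch, target=DEV_BRANCH):
    """Test the checked-out `branch`, then merge it into `target`."""
    return [
        list(TEST_CMD),  # run while `branch` is still checked out
        ["git", "checkout", target],
        ["git", "merge", "--no-ff", branch],
    ]


def tag_release(version, message):
    """Create and push an annotated tag for a release."""
    return [
        ["git", "tag", "-a", version, "-m", message],
        ["git", "push", "origin", version],
    ]


def run(commands):
    """Execute commands in order; check=True raises on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(merge_when_tested("development", target="master"))`, run while `development` is checked out, merges into master only if the tests pass. Zenodo's GitHub integration archives each new GitHub release (not a bare tag), so after pushing a tag, publish a release from it on GitHub to obtain a DOI.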

Conduct Honest and Transparent Analyses:

  • The data processing pipeline should be in git. See the recommendation in the reproducibility spectrum image here.
  • When working with HIPAA-protected data, you can't save the data to a public git repo, but you can and should save everything you do with your data.
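One way to keep the pipeline code in git while keeping protected data out is a pre-commit check. The sketch below is hypothetical: the file patterns are placeholders for wherever your data actually lives, and the check complements (rather than replaces) listing those locations in `.gitignore`. It lists staged files with `git diff --cached --name-only` and flags any that match.

```python
import fnmatch
import subprocess
import sys

# Hypothetical patterns: adjust to wherever protected data is stored locally.
PROTECTED_PATTERNS = ["data/*", "*.csv", "*.tsv", "*.parquet"]


def protected_paths(paths, patterns=PROTECTED_PATTERNS):
    """Return the paths that match any protected-data pattern."""
    # fnmatchcase is case-sensitive on every OS; '*' also matches '/'.
    return [p for p in paths if any(fnmatch.fnmatchcase(p, pat) for pat in patterns)]


def staged_paths():
    """List the files staged for the next commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def check_staged():
    """Return 1 (block the commit) if any staged file looks like protected data."""
    blocked = protected_paths(staged_paths())
    if blocked:
        print("Refusing to commit possible protected data:", file=sys.stderr)
        for path in blocked:
            print(f"  {path}", file=sys.stderr)
        return 1
    return 0
```

To use it as a hook, save it as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` shebang, make it executable, and end the file with `sys.exit(check_staged())`; git aborts the commit when the hook exits with a non-zero status.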

If you have any suggestions for additional guidelines, please let us know here.