Data Carpentry for Biologists - Semester Course
This is a forkable set of materials for teaching biologists how to work with data through programming, database management, and computing more generally.
This repository contains the complete teaching materials (excluding exams and answers to assignments) and website for a university-style and self-guided course teaching computational data skills to biologists. The course is designed to work primarily as a flipped classroom: students read and watch videos before coming to class, then spend the bulk of class time working on exercises while the teacher answers questions and demonstrates the concepts.
Helpful information is available regarding the structure and function of the course and website materials for customized development and delivery of the course.
We encourage collaborative development. This repository was used by @ethanwhite to teach a version of this course (Fall 2016) at the University of Florida. The course remains under active development. We welcome contributions to all aspects of the course/site and are especially seeking exercises and assignments for a range of disciplines. Key site and course materials are available as templates for contributing new materials, and materials that are specific to the course (e.g., the syllabus) are developed in a way that makes them easy to customize.
Here are some examples of courses using the infrastructure and material from this course:
- Data Science for Biologists at Virginia Commonwealth University
- Data Science for Agriculture at Oklahoma State University
- Data Visualization for Plant Pathologists at the University of Florida
- Data Science for SAFS at the University of Washington
- Data Carpentry for Pharmacists at the University of Health Sciences and Pharmacy in St. Louis
- R Programming for Biologists at Stonehill College
- Data Carpentry for Ecologists at the University of Georgia
- Introduction to Data Analysis for Aquatic Sciences at the University of Washington
- Data Science in Omics Introduction at Oklahoma State University
- Ecoinformatics at Kenyon College
- Data Management for Biologists at the University of Minnesota
- Introducing Agroecology: The Basics of Agroecology for Practitioners at the University of Florida
- Data Science with R
Where is everything
Core teaching materials are stored in
Class specific materials are stored in the
Most of the other folders and files support creating the course website using Jekyll.
How to contribute
We use standard GitHub flow, so fork the repository, add or change material, and submit a pull request.
The goal of making this course forkable is to facilitate collaboration on developing this kind of material for university courses. The central component of a flipped computing course is the exercises, so one of the primary forms of contribution will be adding exercises to the pool. Individual instructors can then select, from a rich pool, the exercises that best fit the topics, languages, and scientific domains they want to cover in the course.
There are lots of great resources for being introduced to the individual concepts being taught in courses like this. Our philosophy is to use and improve these external resources when available instead of creating new versions of the same content. In particular, we actively use Data Carpentry and Software Carpentry workshop materials. However, in cases where the necessary material doesn't exist elsewhere, it can certainly be added here.
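As a rough illustration of how an instructor might pick exercises from such a pool, here is a minimal Python sketch. It is not part of the course infrastructure: the `Exercise` class and `select_exercises` function are hypothetical helpers, and the only assumption drawn from this repository is that each exercise carries a title, a topic, and a list of languages.

```python
from dataclasses import dataclass, field


@dataclass
class Exercise:
    """Hypothetical record holding the metadata of one exercise."""
    title: str
    topic: str
    languages: list = field(default_factory=list)


def select_exercises(pool, language=None, topic=None):
    """Return the exercises in `pool` that match the given filters.

    A filter left as None is ignored, so calling the function with no
    filters returns every exercise in the pool.
    """
    selected = []
    for exercise in pool:
        if language is not None and language not in exercise.languages:
            continue
        if topic is not None and exercise.topic != topic:
            continue
        selected.append(exercise)
    return selected


# Placeholder data for illustration only; these are not real course exercises.
example_pool = [
    Exercise("Example exercise A", "Example topic 1", ["R", "Python"]),
    Exercise("Example exercise B", "Example topic 1", ["SQL"]),
    Exercise("Example exercise C", "Example topic 2", ["R"]),
]

r_exercises = select_exercises(example_pool, language="R")
```

Filtering on metadata rather than on file names keeps the selection independent of how an individual fork organizes its directories.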
New pull requests to this site are scanned using pa11y and pa11y-ci to ensure that additions to the site follow best practices for accessibility. If you discover any accessibility issues with the site please open an issue and we'll get them fixed.
Using Jekyll to build your own course website
The website is set up to be easy to run automatically through GitHub:
or import the repository to
Update the # Setup information in _config.yml in the main directory for proper site rendering.
- You must push this change to your repository to build and browse your forked version.
- In a few minutes you should be able to see the site at:
- Edit any of the markdown (.md) files
- Commit and push the changes
- The changes should now be reflected on the website
- If you want to use a custom domain name instead of github.io, follow GitHub's instructions for setting up a custom domain.
If you have any problems please let us know and we'll be happy to help.
Previewing changes locally
If you want to view your changes locally before pushing them to the live website, you'll need to set up Jekyll locally. GitHub provides a good introduction on how to do this.
If you have Jekyll properly installed, you can then run
bundle exec jekyll serve --baseurl ''
from the command line and navigate to http://localhost:4000/ in your browser to preview the current state of the website.
Creating new pages
To add new exercises, lecture notes, etc., create a markdown file in the appropriate directory. Each markdown file needs to start with some information that tells Jekyll what the page is. This is done using YAML front matter, and the standard YAML for a new exercise looks like this:
---
layout: exercise
topic: Topic group of exercise
title: Name of exercise
language: [R, Python, SQL]
---
This is placed at the very beginning of the markdown file and provides information on what kind of content it is (e.g., exercise, page, etc.), the title of the page, and what language it applies to.
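If you create many pages, generating this header with a script can help keep it consistent. The following is a minimal Python sketch, not a tool shipped with this repository: `build_front_matter` and `new_page_text` are hypothetical helper names, and the sketch assumes the four fields shown above are the only ones needed.

```python
def build_front_matter(layout, topic, title, languages):
    """Return a YAML front matter block as a string.

    Assumption: values are plain text. A value containing characters
    that are special in YAML (such as ':' or '#') would need quoting,
    which this sketch does not handle.
    """
    language_list = ", ".join(languages)
    lines = [
        "---",
        f"layout: {layout}",
        f"topic: {topic}",
        f"title: {title}",
        f"language: [{language_list}]",
        "---",
    ]
    return "\n".join(lines) + "\n"


def new_page_text(layout, topic, title, languages, body=""):
    """Return the full text of a new page: front matter, blank line, body."""
    header = build_front_matter(layout, topic, title, languages)
    return header + "\n" + body


page = new_page_text(
    layout="exercise",
    topic="Topic group of exercise",
    title="Name of exercise",
    languages=["R", "Python", "SQL"],
    body="Write the exercise text here.\n",
)
```

The resulting string can be written to a markdown file in the appropriate directory; the front matter must remain the first thing in the file for Jekyll to recognize it.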
The page should then be available at a URL based on where the file is located
and what the file name is. So if you created a new exercise in the
my_awesome_exercise.md it would be located at:
After pushing to GitHub:
Building the site locally requires a local Ruby installation with 3 packages (gems):
For help with installation see:
- Installing Ruby Documentation
- Installing Jekyll Documentation
- Using GitHub Pages with Jekyll Documentation
Once you have installed Ruby and the jekyll gem, go to the root of the site repository and run:
to install the rest of the dependencies.
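Before building locally, it can be useful to confirm that the required command-line tools are visible on your PATH. The following Python sketch is one hedged way to do that; `missing_tools` is a hypothetical helper, and the executable names `ruby`, `bundle`, and `jekyll` are assumptions based on the commands mentioned in this README rather than a definitive list of requirements.

```python
import shutil


def missing_tools(tools=("ruby", "bundle", "jekyll")):
    """Return the names in `tools` that cannot be found on the PATH.

    Assumption: each tool is installed as an executable with exactly
    this name. Depending on how the gems were installed, jekyll may
    only be reachable through `bundle exec`, so a missing `jekyll`
    entry is a hint to investigate rather than proof of a problem.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected tools were found on PATH.")
```

This only checks that the executables can be located; it does not verify versions or that the site's dependencies are installed.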
Development of this material is funded by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.