blaserlab/datascience.curriculum

Welcome to the Computational Data Analysis Workshop

This course is designed to help you improve your data management and analysis skills. The skills you will learn can be used for everything from straightforward measurements of biological data to high-dimensional data like scRNA-seq.

Did you ever see a figure you made last year but can't remember how you made it? Do you need to write a Data Sharing Plan for an NIH grant but don't know where to begin? Understanding the principles we cover here will help you and others reproduce and extend your work.

The 2023 course has already started, but you can still join in by emailing bradley.blaser@osumc.edu.

Dates for the 2023 workshop:

  • April 12
  • April 19
  • April 26
  • May 3
  • May 10
  • May 17

All sessions are on Wednesday mornings from 10 to 11 AM ET. The format is a Zoom webinar.

Goals

The goals of this workshop are for you to become comfortable with the R statistical computing language and associated computational technology:

  • RStudio for interacting with R
  • git for version control
  • GitHub to share your work
  • R package development to keep your code and data separate and organized

Specific Objectives

Once you have completed this course you should be able to:

  • properly format data for efficient computation
  • generate a table of descriptive statistics for data from a typical biological experiment
  • perform statistical testing as appropriate
  • generate publication-quality plots using ggplot
  • perform basic analysis of single cell RNA sequencing data
  • compile processed source data into an R data package
  • understand the difference between analysis code and source data
  • use basic version control functions to track and document changes to your analysis
  • publish your code so reviewers can understand how you arrived at your results

Prerequisites

The course assumes no prior knowledge of R. It is designed for biologists with an interest in analyzing high-dimensional and/or computationally-intensive data. The only prerequisites are a basic understanding of biological experimental design (controls, biological replicates, technical replicates, etc.) and a computer.

Before the first class you should make sure your computer is ready to go. There are several computing options to choose from.

  • You can use your personal computer. Most or all of the software we use in the course is available for Mac and PC and will run directly on your machine. If you own the computer, just install the programs below. If it is a lab computer, have your IT admin install the programs.
  • You may have access to a lab server or cloud service running RStudio Server. This will be Linux-based and will run everything we will be using in the course.
  • You can register for the course at the Ohio Supercomputing Center. This will be free for you to use for the duration of the course. Access will be terminated at the end of the course. If you would like to use this option, please email bradley.blaser@osumc.edu and I will send you an invitation.
  • You can use an existing academic account at the Ohio Supercomputing Center or your institution's equivalent. The cost for what we will be doing will be tiny, and this option has the advantage of being a computing environment you may already know and may use for your own work.

RStudio Cloud (now Posit Cloud) is not a great option for the course or for your academic research. It is subscription-based, and the rates are exorbitant compared to what you will pay at a supercomputing center at your academic institution.

The best computing option for this class will be what you wish to use for your own research projects.

If you wish to work on your local computer, here are links to the programs we will be using:

Windows and Mac users: see this note on installing Rtools and Xcode.

You should also register for a free GitHub account. Choose a username that you would be OK with putting in a publication.

Course structure

This workshop starts from the basics and moves through somewhat advanced topics. There will be no formal homework or assignments, but you will want to be comfortable with the topics previously presented by the time the next class arrives. It will help to read the course material in advance. If there are things that don't make sense or are causing you trouble at first, I encourage you to try to figure them out using Google and Stack Overflow. This is the best way to learn. Your questions or problems will have been encountered before. Try running the code in the class notes or R script as we go through the lecture. Then expand your horizon and work on your own data.

Although I will try to address all conceptual questions, we will be unable to troubleshoot individual technical issues during class. If you have questions, comments, or problems getting things to work, and we don't get to them by the end of class, you can post these issues here. You probably aren't the only person with that question or problem, so posting it in this forum will benefit others.

Each lesson will be structured in the following way:

  • 5 min for people to log in and enter any pre-existing questions they have in the Q&A. These can be questions from the prior week or questions about the current day's material. I will cover what I can during the lecture period.
  • 45 min for me to demonstrate the concepts in the day's lecture
  • 10 min for discussion and any additional questions that arise

Follow along with the workshop project on GitHub

2023 Curriculum

This course covers a relatively wide range of topics which may be intimidating for new R users. Don't worry if you don't get it all the first time through. The lectures will be recorded and the code and notes will be published for your reference, so you can go back and review what you may have missed.

The first three lectures will present some basics in using R. The last three will be more advanced.

Even intermediate-level users with some pre-existing experience using R will likely learn some helpful information in the early lectures.

Week 3: More advanced concepts in R - April 26, 10-11 AM ET

  • Thanks for attending
  • I appreciate feedback on things that are or are not working.
  • Your comments will help improve the course for next year
  • Please email bradley.blaser@osumc.edu

About

Curriculum for the Spring 2023 Series

License

MIT (LICENSE.md). A second LICENSE file of unrecognized type is also present.
