Skip to content

ID529/lectures

Repository files navigation

Course Content for ID529 Data Management and Analytic Workflows in R

What follows are the time-tables and resources for each course day in ID529, a 2-week short-course being offered in January 2023.

The website for the course is online at https://id529.github.io/.

Monday January 9th

  • 1:30-1:40 Welcome to ID529
  • 1:40-2:10 Introductory Lecture: Course Details
  • 2:10-2:40 Live Demonstration: Modern Data Science Practices in an R Project Based Workflow
  • 2:40-2:55 Activity: Q&A + Group reflection on principles shown in the demonstration
  • 2:55-3:10 Lecture: Introduce the Final Presentations Format
  • 3:10-3:15 Break
  • 3:15-3:50: Intro to RStudio and R
  • 3:50-4:05: Discussion and Self-Introductions
  • 4:05-4:45: Lecture: Intro to Git and GitHub
  • 4:45-5:00: Demo of how to do the homework
  • 5:00-5:05: Positive Affirmations
  • 5:05-5:30 Activity: Setup GitHub accounts + work on the homework + chat with classmates

Recommended materials:

Homework:

  • Write a bio for yourself and include a picture!
  • Complete the intro to course survey
  • Check that your R and RStudio installations are working on your computer

Tuesday January 10th

  • 1:30-2:00 Lecture: Intro to R Programming (including conditionals, control flow, etc.)
  • 2:00-2:15 Activity: learnr Tutorial on Conditionals and Control Flow
  • 2:15-2:45 Lecture: Data Dictionaries and Documentation
  • 2:45-3:00 Activity: Q&A + Discussion
  • 3:00-3:05 Break
  • 3:05-3:35 Lecture: Reading in data of various formats
  • 3:35-4:30 Lecture: Intro to ggplot2 (common types of figures, faceting, legends, patchwork, and saving figures)
  • 4:30-4:50 Discussion: What are the ingredients to a ggplot? What makes an effective data visualization?
  • 4:50-5:10 Demonstration of how to do the homework
  • 5:10-5:30 Time to work on homework, chat with classmates, peruse recommended materials, engage in self-affirmation

Recommended materials:

Truly extra reading for those interested in advancing their conceptual understanding of ggplot2 and the possibilities in data visualization:

Homework:

  • Read in a dataset of your choice [we will give you some example datasets you can use] and create a figure using ggplot2. We want to see students include titles, subtitles, captions, data sources, legends, etc.

What you can expect from the instructional team:

  • We will be reading your bios and working on sorting you into groups for the final presentation topics.

Wednesday January 11th

  • 1:30-2:00 Lecture: R Projects
  • 2:00-2:50 Lecture: Intro to dplyr
  • 2:50-3:10 Activity: Work on dplyr learnr tutorial in groups
  • 3:10-3:15 Break
  • 3:15-3:35 Lecture: Cleaning Text Data
  • 3:35-4:05 Lecture: Writing Functions
  • 4:05-4:30 Activity: Write functions together
  • 4:30-4:45 Activity: Discussion Q&A from ggplot2 Homework
  • 4:45-4:55 Survey: Checking in on Pacing
  • 4:55-5:10 Demonstration of how to do the homework
  • 5:10-5:30 Time to work on homework, chat with classmates, peruse recommended materials

Recommended reading:

Homework:

  • No homework

Thursday January 12th

  • 1:30-2:00 Lecture: Diverse Data Sources (APIs [tidycensus, WHO, World Bank, qualtRics], scraping web data, tabulizer, tesseract, datapasta)
  • 2:00-2:20 Discussion: What kind of sources are students interested in using in their research or future work?
  • 2:20-2:50 Lecture: How to handle factors and date-times
  • 2:50-3:00 Break
  • 3:00-3:30 Lecture: Regression Modeling: How to fit regression models, extracting model summaries from fit model objects, visualizing model estimates
  • 3:30-4:15 Activity: Working with Regression Models in R
  • 4:15-4:45 Lecture: Creating maps in R
  • 4:45-5:00 Lecture: Reproducible Examples for Getting Help
  • 5:00-5:30 Time to work on final presentation materials together, peruse recommended materials, chat with classmates

Homework 4:

  • Choose one of the prespecified geospatial datasets and create a map in R
  • Fit and report on a regression model including categorical (factor) variables
  • Peer Review for Homework 3

Available/Recommended Reading (Remember! You don’t have to read all of this! Just focus on what’s most useful to you):

Friday January 13th

  • 1:30-2:00 Lecture: Why reproducibility and robustness are important principles in science and data analysis and acknowledging the pressures in academia that push people away from reproducible science
  • 2:00-2:15 Discussion
  • 2:15-2:45 Lecture: Visualizing and Reporting on Regression Models
  • 2:45-3:05 Lecture: Data Linkage Methods
  • 3:05-3:25 Activity: Working with Joins
  • 3:25-3:35 Break
  • 3:35-4:00 Activity: Hallway QR Code Challenges
  • 4:00-4:15 Lecture: Introduction of the Brown et al (partial) reproduction of Onikye et al’s results
  • 4:15-4:30 Homework demonstration
  • 4:30-5:25 Time to work on homework + chat together + work on final projects
  • 5:25-5:30 Giveaway for a Copy of R for Data Science
  • 5:30 Meet Hodu!

Homework 5

Available/Recommended Reading:

Tuesday January 17th

  • 1:30-1:50 Activity: Discussion of Onikye et al reproduction article
  • 1:50-2:10 Lecture: Introduction to R Packages
  • 2:10-2:30 Demonstration of how to create R packages that standardize data loading and cleaning processes
  • 2:30-3:00 Lecture: How to use R Markdown to produce reproducible reports including tables, visualizations, and inline-quantitative statements.
  • 3:00-3:10 Break
  • 3:10-3:30 Activity: Experiment with different R Markdown features
  • 3:30-3:50 Lecture: Advice for Debugging
  • 3:50-4:10 Activity: Debugging
  • 4:10-4:30 Activity: Getting Help Online and Debugging
  • 4:30-4:45 Demonstration of how to do the homework
  • 4:45-5:30 Time to do the homework, work on the final project together, peruse recommended materials

Recommended Reading:

  • R Packages:
    • Read Karl Broman’s Why write an R package?
    • Familiarize yourself with what’s in the R Packages book: https://r-pkgs.org/ — having a rough familiarity with the different parts will be helpful. Our suggestion here is to try and approach this more in terms of “what are the ingredients in a good R Package?” rather than trying to learn how to craft all of those ingredients from the ground up immediately.
  • R Markdown:

Homework:

  • Use R Markdown to document some exploratory data analysis.

Wednesday January 18th

  • 1:30-2:00 Lecture: A Data Analysis from Start to Finish
  • 2:00-2:30 Lecture: Longitudinal Data Analysis
  • 2:30-3:00 Lecture: Best practices for reporting on missing data
  • 3:00-3:30 Lecture: Intro to accessible exploratory data analysis methods: Correlation, principal components analysis, variable importance
  • 3:30-3:40 Break
  • 3:40-4:00 Discussion: What are the ethical principles involved in data analysis? What are the risks involved?
  • 4:00-4:30 Lecture: Clean Code and Considerate Coding
  • 4:30-5:30 Free time to work together on the final project, chat with classmates, peruse recommended materials

Recommended reading:

Thursday January 19th

  • 1:30-2:00 Lecture/Demo: Principles for Data Analysis from Start to Finish
  • 2:00-2:30 Lecture: Functional Programming
  • 2:30-3:00 Discussion + Demo: When does it make sense to use functional programming?
  • 3:00-3:10 Break
  • 3:10-3:40 Lecture: [Students’ Choice]
  • 3:40-4:10 Lecture: How to Keep Growing as a Programmer (and stay up to date)
  • 4:10-4:20 Details about turning in your final presentations
  • 4:20-5:30 Positive affirmations, free time to work together on final projects

Recommended materials:

Friday January 20th

  • 1:30-4:30 Final Presentations:
    • Students will be divided into 10 groups with 10-12 minutes presentation time and 3-5 minutes for feedback from the instructional team and Q&A from the audience.
    • 15 minutes × 10 groups = ~2.5 hours
    • we’ll make sure to take some breaks between every few groups
  • 4:30-5:00 Lecture: Recap of Key Takeaways
  • 5:00-5:30:
    • Make sure you’ve uploaded your presentation!
    • Feedback and Course Evaluations

Enjoy being done with the class and go on to do great things with your newly learned R skills!