
STAT 547M: Basic Training for Data Science

This is the course syllabus for the Fall 2018 edition of STAT 547M (see the STAT 545 syllabus for the course that precedes this one). You should use this syllabus to:

  1. find information about the course, and
  2. navigate the course.

About STAT 547M

Course Description

STAT 547M is "Part II" of learning how to

  • explore, groom, visualize, and analyze data
  • make all of that reproducible, reusable, and shareable
  • using R

Part I was STAT 545.

Selected Topics

  • Be the boss of non-numeric data, esp. character and factor
  • Interactive pages, apps, and graphics with Shiny
  • Get data off the web and expose data, code, results on the web
  • Distribute data and code via an R package
  • Automate an analytical pipeline, e.g. via Make

Timing and Location

This course runs from October 23 until November 29, 2018.

We'll meet as a class every Tuesday and Thursday, 09:30-11:00, in ESB 2012.

I'll aim to end class at 10:45.


Jenny Bryan deserves a huge amount of credit for founding and developing both STAT 545 and 547M over many years, along with her TAs. Some of their content is even used in this very syllabus. Thank you!

Course Destinations

Both STAT 545 and STAT 547M make use of the following tools:

  1. This "Classroom" website.
    • Think of this as the course "home" -- and this syllabus as your launch pad to other destinations.
    • Contains lecture notes, assignments, and course information.
  2. Discussion-Internal GitHub repository.
    • For internal discussion. The world cannot see this.
  3. Discussion GitHub repository.
    • For public discussion. The world can see this.
  4. STAT545-UBC-students GitHub Organization.
    • This will contain one GitHub repository per student, for you to submit homework to and give peer reviews.
  5. UBC Canvas
    • This is for grade management. You'll be interacting with it by submitting a link to your homework.
    • This holds course content, such as tutorials. Think of this as a textbook. We'll point you there when needed.
    • Previously held the information that now lives in Classroom. That eventually became confusing. Some headers there are being deprecated.

| Meeting No. | Date | TAs | Topic | Resources |
|---|---|---|---|---|
| 01 | oct-23 tues | Chad, Sherrie | Regular expressions and character data | |
| 02 | oct-25 thurs | Chad, Hossam | Writing your own R functions | |
| 03 | oct-30 tues | Hossam, Sherrie | purrr, list-columns, nested data frames | |
| 04 | nov-01 thurs | Hossam, Rashedul | Part II | |
| 05 | nov-06 tues | Chad, Sherrie | Automate tasks and pipelines | |
| 06 | nov-08 thurs | Chad, Rashedul | Part II | |
| 07 | nov-13 tues | Rashedul, Sherrie | Build your first R package | |
| 08 | nov-15 thurs | Rashedul, Chad | Part II | |
| 09 | nov-20 tues | Hossam, Sherrie | Build your first R package | |
| 10 | nov-22 thurs | Hossam, Chad | Part II | |
| 11 | nov-27 tues | Chad, Sherrie | Get data from the web | |
| 12 | nov-29 thurs | Hossam, Rashedul | Part II | |


To gain marks in this course, you'll be completing five assignments, and submitting two peer reviews for each assignment. Participation counts too!

Here's the breakdown of your course grade:

| Assessment | Weight |
|---|---|
| 5 Assignments | 75% (15% per assignment) |
| 10 Peer Reviews | 15% (3% per assignment) |
| Participation | 10% |
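
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The weights come from the breakdown; the 0-100 score scale, the idea of one peer-review score per assignment (covering both reviews), and the function name `course_grade` are assumptions made for this example, not official grading code.

```python
# Weights taken from the grade breakdown; everything else (the 0-100 scale,
# one peer-review score per assignment, the function name) is hypothetical.
ASSIGNMENT_WEIGHT = 0.15     # per assignment; 5 assignments -> 75%
PEER_REVIEW_WEIGHT = 0.03    # per assignment's pair of reviews; 5 -> 15%
PARTICIPATION_WEIGHT = 0.10  # single participation score -> 10%


def course_grade(assignments, peer_reviews, participation):
    """Return the weighted course grade on a 0-100 scale.

    assignments   -- five scores (0-100), one per assignment
    peer_reviews  -- five scores (0-100), one per assignment's pair of reviews
    participation -- a single score (0-100)
    """
    if len(assignments) != 5 or len(peer_reviews) != 5:
        raise ValueError("expected five assignment and five peer-review scores")
    return (
        ASSIGNMENT_WEIGHT * sum(assignments)
        + PEER_REVIEW_WEIGHT * sum(peer_reviews)
        + PARTICIPATION_WEIGHT * participation
    )


# Example with made-up scores.
print(course_grade([80, 90, 85, 70, 95], [100, 100, 90, 100, 80], 90))
```

Under these assumptions, a single assignment moves the final grade by at most 15 points, while the two peer reviews for that assignment together move it by at most 3.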

There is no final exam.

Auditing students must still complete and submit all assessments, to be graded on a pass/fail basis.

Assignments and peer review: For information about and links to assignments and peer reviews, go to the assignments folder.

Participation will be evaluated by:

  • Participation in class discussions
  • Submitting work on in-class activities to your STAT 545 home GitHub repository.
    • Be sure to submit these in class, not at some other time!
  • Activity in the Issues of the two Discussion repositories -- with preference given to Discussion (public) over Discussion-Internal (private).

Teaching Team

Here is your dedicated teaching team!

| Teaching Member | Position | Contact |
|---|---|---|
| Vincenzo Coia | Instructor | GitHub: @vincenzocoia; Twitter: @VincenzoCoia; LinkedIn: vincenzocoia; Email: |
| Chad Fibke | Teaching Assistant | GitHub: @ChadFibke |
| Hossameldin Mohammed | Teaching Assistant | GitHub: @hsmohammed |
| Rashedul Islam | Teaching Assistant | GitHub: @rashedul |
| Sherrie Wang | Teaching Assistant | GitHub: @sherrie9 |

Please see the "Conversation" section below to determine who to get in touch with for what, and how.

Office hours: Want to talk about the course outside of lecture? Let's talk during these dedicated times (generally, 11:00-12:00 every Monday, Tuesday, Wednesday). You're always welcome to schedule alternative times, too.

| Teaching Member | Date | Time | Place |
|---|---|---|---|
| Rashedul | Tue, Oct 23 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Oct 24 | 11:00 - 12:00 | ESB 1043 |
| Chad | Mon, Oct 29 | 11:00 - 12:00 | ESB 3174 |
| Hossam | Tue, Oct 30 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Oct 31 | 11:00 - 12:00 | ESB 1043 |
| Chad | Mon, Nov 05 | 11:00 - 12:00 | ESB 3174 |
| Rashedul | Tue, Nov 06 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Nov 07 | 11:00 - 12:00 | ESB 1043 |
| Sherrie | Mon, Nov 12 | 11:00 - 12:00 | ESB 3174 |
| Rashedul | Tue, Nov 13 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Nov 14 | 11:00 - 12:00 | ESB 1043 |
| Hossam | Mon, Nov 19 | 11:00 - 12:00 | ESB 3174 |
| Chad | Tue, Nov 20 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Nov 21 | 11:00 - 12:00 | ESB 1043 |
| Sherrie | Mon, Nov 26 | 11:00 - 12:00 | ESB 3174 |
| Hossam | Tue, Nov 27 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Nov 28 | 11:00 - 12:00 | ESB 1043 |
| Sherrie | Mon, Dec 03 | 11:00 - 12:00 | ESB 3174 |
| Rashedul | Tue, Dec 04 | 11:00 - 12:00 | ESB 3174 |
| Vincenzo | Wed, Dec 05 | 11:00 - 12:00 | ESB 1043 |

Conversation


We strongly encourage you to communicate with the teaching team and your classmates. The best way to do this is by posting an Issue in one of the two Discussion GitHub repositories:

  1. Want to talk about content/coding issues? Post an Issue in the Discussion (public) repository.
  2. Want to talk about the course? Post an Issue in the Discussion-Internal (private) repository.
  3. Want to privately contact Vincenzo? Feel free to send me an email.
    • I look forward to receiving your email, though I do encourage you to post in one of the Discussion repositories unless it's really not appropriate for either platform.


  • To get the attention of the teaching team, add the @2018_teaching_team tag to notify all five of us.
  • To get the attention of your fellow students, add the @2018_students tag to notify them.
  • Don't just create Issues -- also respond to them! Think about this in terms of adding to the conversation, not in terms of "correctness".

Annotated Resources

Here are the resources we will be referring to throughout the course, along with a brief description of the resource.

Overarching resources:

    • As mentioned earlier, this website can be thought of as a textbook for STAT 545/547.
  2. R for Data Science (aka "r4ds"), by Garrett Grolemund and Hadley Wickham.
    • STAT 545/547 closely mirrors the topics of this book, making this book more of a true "textbook" for the course.

Resources for more specific topics:

  1. R packages
  2. Advanced R, by Hadley Wickham
    • If you want to learn more about R as a programming language, this is a very readable and concise way of doing so.