Repository for UBC's Introduction to Data Science course (DSCI 100)

DSCI 100: Introduction to Data Science

Jan-Apr 2019, Tues/Thurs 8:00 - 9:30 am, ORCH 4074

Use of Data Science tools to summarize, visualize, and analyze data. Sensible workflows and clear interpretations are emphasized. Prerequisite: MATH 12. UBC course calendar entry.

Expanded Course Description

In recent years, virtually all areas of inquiry have seen an uptake in the use of Data Science tools. Skills in assembling, analyzing, and interpreting data are more critical than ever. This course is designed as a first experience in honing such skills. Students who have completed this course will be able to implement a Data Science workflow in the R programming language by “scraping” (downloading) data from the internet, “wrangling” (managing) the data intelligently, and creating tables and/or figures that convey a justifiable story based on the data. They will be adept at using tools for finding patterns in data and making predictions about future data. There will be an emphasis on intelligent and reproducible workflows and clear communication of findings.

Course Software Platforms

Students will learn to perform their analysis using the R programming language. Worksheets and tutorial problem sets, as well as the final project analysis, development, and reports, will be done using Jupyter Notebooks. Students will work on their own devices in lectures and tutorials (if students do not have a laptop, Chromebook, or tablet of their own, the UBC library has a technology lending program).

Learning Outcomes

By the end of the course, students will be able to:

  • Download and scrape data from the World Wide Web.
  • Wrangle data from their original format into a fit-for-purpose format.
  • Create, and interpret, meaningful tables from wrangled data.
  • Create, and interpret, impactful figures from wrangled data.
  • Apply, and interpret the output of, a simple classifier.
  • Make and evaluate predictions using a simple classifier.
  • Apply, and interpret the output of, a simple clustering algorithm.
  • Apply, and interpret the output of, a regression model.
  • Make and evaluate predictions using a regression model.
  • Distinguish between in-sample prediction, out-of-sample prediction, and cross-validation.
  • Apply and interpret a bootstrap analysis in a regression context.
  • Accomplish all of the above using workflows and communication strategies that are sensible, clear, reproducible, and shareable.
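
To make the distinction between in-sample prediction, out-of-sample prediction, and cross-validation concrete, here is a minimal sketch. It is written in plain Python only to stay dependency-free (the course itself works in R), and the synthetic two-class data, the helper names (`knn_predict`, `accuracy`, `cross_val_accuracy`), and the choices of k are all assumptions made for illustration, not course material.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Return the majority label among the k training rows nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of rows in `test` whose label is predicted correctly from `train`."""
    hits = sum(knn_predict(train, (x, y), k) == label for x, y, label in test)
    return hits / len(test)


def cross_val_accuracy(data, k, folds=5):
    """Average accuracy over `folds` splits, each holding out a different slice."""
    scores = []
    for i in range(folds):
        held_out = data[i::folds]
        rest = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds


# Hypothetical data: two overlapping clouds of points labelled "a" and "b".
rng = random.Random(100)
data = [(rng.gauss(0, 1), rng.gauss(0, 1), "a") for _ in range(60)]
data += [(rng.gauss(2, 1), rng.gauss(2, 1), "b") for _ in range(60)]
rng.shuffle(data)
train, test = data[:90], data[90:]

# In-sample: evaluate on the same rows the classifier was built from.
in_sample = accuracy(train, train, k=1)
# Out-of-sample: evaluate on rows the classifier has never seen.
out_of_sample = accuracy(train, test, k=1)
# Cross-validation: estimate out-of-sample accuracy using only the training rows.
cv = cross_val_accuracy(train, k=5)

print(in_sample, out_of_sample, cv)
```

With k = 1, every training point is its own nearest neighbour, so the in-sample accuracy is perfect by construction. That is why in-sample results say little about how a classifier will behave on new data.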

Teaching Team

| Position | Name | Email | Office hours | Office location |
|---|---|---|---|---|
| Instructor | Tiffany Timbers | tiffany.timbers@stat.ubc.ca | TBD | ESB 3152 |
| Teaching Assistant | TBD | | | |
| Teaching Assistant | TBD | | | |

Assessment

| Deliverable | % grade |
|---|---|
| Lecture worksheets | 5 |
| Tutorial problem sets | 15 |
| Group project | 15 |
| Peer review of other groups' projects | 5 |
| Two quizzes | 20 |
| Final exam | 40 |

Schedule

| Week | Topic | Description |
|---|---|---|
| 1 | Chapter 1: Introduction to Data Science | Learn to use the R programming language and Jupyter notebooks as you walk through a real-world Data Science application that includes downloading data from the web, wrangling the data into a usable format, and creating an effective data visualization. |
| 2 | Chapter 2: Reading in data locally and from the web | Learn to read in various kinds of data sets locally and from the web. Once read in, these data sets will be used to walk through a real-world Data Science application that includes wrangling the data into a usable format and creating an effective data visualization. |
| 3 | Chapter 3: Cleaning and wrangling data | This week will be centered around tools for cleaning and wrangling data. Again, this will be in the context of a real-world Data Science application, and we will continue to practice working through a whole case study that includes downloading data from the web, wrangling the data into a usable format, and creating an effective data visualization. |
| 4 | Chapter 4: Effective data visualization | Expand your data visualization knowledge and tool set beyond what we have seen and practiced so far. We will move beyond scatter plots and learn other effective ways to visualize data, as well as some general rules of thumb to follow when creating visualizations. All visualization tasks this week will be applied to real-world data sets. Again, this will be in the context of a real-world Data Science application, and we will continue to practice working through a whole case study that includes downloading data from the web, wrangling the data into a usable format, and creating an effective data visualization. |
| 5 | Transition week | Quiz #1 |
| 6 | Chapter 5: Classification | Introduction to classification using K-nearest neighbours (k-nn). |
| 7 | Chapter 6: Classification, continued | Classification, continued. |
| 8 | Chapter 7: Clustering | Introduction to clustering using K-means. |
| 9 | Transition week | Quiz #2 |
| 10 | Chapter 8: Regression | Introduction to regression using K-nearest neighbours (k-nn). We will focus on prediction in cases where there is a response variable of interest and a single explanatory variable. |
| 11 | Chapter 9: Regression, continued | Continued exploration of k-nn regression in higher dimensions. We will also begin to compare k-nn to linear models in the context of regression. |
| 12 | Chapter 10: Bootstrap applied to regression | This week will introduce the bootstrap, first by visualizing bootstrap samples and their fitted regression lines for cases where there is a response variable of interest and a single explanatory variable. An intuitive case will be made for what the ensemble of slopes represents. Then we will work through examples from multiple regression, emphasizing the scientific interpretation and relevance of the mix of negative/positive slopes. We will emphasize that this is a jumping-off point for the study of statistical inference. |
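
As a rough illustration of the Week 12 idea, the sketch below resamples a small data set with replacement, refits a least-squares line to each resample, and looks at the spread of the resulting slopes. It is written in plain Python only to stay dependency-free (the course itself works in R). The synthetic data, the true slope of 1.5, the number of resamples, and the helper names (`slope`, `bootstrap_slopes`) are all assumptions chosen for this example.

```python
import random


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def bootstrap_slopes(points, n_boot, rng):
    """Refit the slope on n_boot resamples drawn with replacement."""
    n = len(points)
    return [slope(rng.choices(points, k=n)) for _ in range(n_boot)]


# Hypothetical data: one explanatory variable, a linear trend, and noise.
rng = random.Random(2019)
xs = [i / 2 for i in range(40)]
points = [(x, 1.5 * x + rng.gauss(0, 2)) for x in xs]

observed = slope(points)
slopes = sorted(bootstrap_slopes(points, n_boot=1000, rng=rng))

# The middle 95% of the bootstrap slopes gives a rough sense of how much
# the fitted slope could vary from sample to sample.
lower, upper = slopes[25], slopes[974]

print(observed, lower, upper)
```

The ensemble of slopes stands in for the sampling variability of the fitted slope: a narrow spread suggests the slope is well determined by the data, while a wide spread, or one that straddles zero, suggests caution in interpreting it.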