Commit

Merge pull request #1 from data-8/dsep-changes
chrispyles committed Jun 11, 2020
2 parents 3993567 + 3bbd7d2 commit 38b329d
Showing 18 changed files with 1,230 additions and 31 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -4,7 +4,7 @@ This repository hosts the content and website for Zero to Data 8,
the guide to deploying your own Data 8 course infrastructure.

All of the content resides in `content/` and is hosted online
at `data8.github.io/zero-to-data-8`.
at `data8.org/zero-to-data-8`.

## To make changes to this book

@@ -13,4 +13,4 @@ at `data8.github.io/zero-to-data-8`.
2. Run `make book`. This will update the book's built markdown files.
3. Push to github.

That's it!
That's it!
Binary file added content/v2/.DS_Store
Binary file not shown.
20 changes: 20 additions & 0 deletions content/v2/pedagogy/connectors.md
@@ -0,0 +1,20 @@
<!-- #region -->
# Connector courses


Connectors are one- or two-unit courses that give students an in-depth introduction to the data science toolkit and its concepts as applied in a particular field. Connector courses vary every semester, but are usually within the domain of the social sciences. Connectors are not an official part of Data 8, though they play a crucial role in the data science education ecosystem of the DSEP (Data Science Education Program). Students are encouraged to take them together with Data 8 or in the semester after they take Data 8.

For more information about connector courses, check out [The Data Science Connector Courses Page](https://data.berkeley.edu/education/connectors).



# Modules

Modules are short explorations into data science that can be integrated into (almost) any class on campus. Modules allow students to explore a data set relevant to their course with the help of the data science toolkit.

Like connectors, modules vary widely and can be customized to fit each instructor's needs and objectives for the course. A module might consist of one or two lectures on analyzing course-relevant data and learning how to interpret the graphs, or it might be a set of labs in which students learn the basics of programming and statistics and apply them to their field of study. Some modules also include a final class project run completely within Jupyter Notebooks.

Modules allow students from different fields of study to become familiar with the foundations of data analysis.

You can learn more about the modules the DSEP Team has created over the years on the [Modules webpage](https://data.berkeley.edu/education/modules).
<!-- #endregion -->
26 changes: 26 additions & 0 deletions content/v2/pedagogy/course_structure.md
@@ -0,0 +1,26 @@
# Data 8 Course Staff Structure

## Motivation

Data 8 has three hours of lecture a week and a two-hour weekly lab section, 12 lab assignments and 12 homework assignments (one of each per week), and three larger projects. These are reflected in the course resource allocation diagram below. Weekly lab sections have a ratio of 27 students to one graduate student instructor and allow students to make personal connections with instructors in a large course. With large enrollment numbers, UC Berkeley had to address how to make class sizes in the thousands feel small.

![](https://i.imgur.com/e421NaU.png)


The answer is Data 8’s support resource strategy, shown in the figure above. While the lecture halls seat hundreds, the teaching staff create a personal experience for students in the course by:

- Hosting over **25 weekly office hours** for one-on-one help, staffed by anywhere between 2 and 10+ people an hour (with 2 separate professor office hours).
- Hosting **weekly small group tutoring sessions** (5 students and 1 tutor) and tailored exam review lectures.

In the past, Data 8 courses have also hosted:
- **Guerrilla sections**: Optional, small, topical, worksheet-based review sessions offered by tutors once major course topics are covered.
- **Project "parties"**: Sessions that allow students to work in the same room and ask staff members questions close to deadlines.

With this wide array of support resources, students in a 1500-person computing course can connect with their individual staff members.

The scale of staffing for these support resources is only possible through the enthusiasm of former students in the class. An undergraduate who has performed well in Data 8 and enjoys the course material can join the course staff as an **academic intern (AI)**. Academic interns assist teaching assistants (TAs) by answering student questions in a weekly lab section in exchange for 1 academic Pass/No Pass unit; an end-of-semester reflection on their experience is also required for credit. Undergraduates who prove to be engaged and talented academic interns are then selected to be tutors for the class.

**Tutors** assist with grading written responses on assignments, holding office hours, and conducting small group tutoring sections. Experienced tutors can eventually advance to positions as **Undergraduate Student Instructors (uGSIs)**, holding a weekly lab section and helping out with various pedagogical and logistical aspects of running the course. This hierarchical course staff structure is enabled by the careful leadership of Head GSIs and course instructors.


You can read more on the [Roles and Responsibilities](roles_data8) page.
21 changes: 21 additions & 0 deletions content/v2/pedagogy/intro_pedagogy.md
@@ -0,0 +1,21 @@
<!-- #region -->
# The Data 8 Pedagogy Guide

Data 8, “The Foundations of Data Science,” is a course taught to first-year students at UC Berkeley. It combines principles and skills from statistics and computer science, such as inference, modeling, hypothesis testing, and visualization. It provides a foundation in the many disciplines encompassed by “data science” and gives students a practical introduction to the field.

Teaching Data Science requires a shift in the way we traditionally teach each of the individual concepts. What were once introductory classes in statistics, computer science, and ethics (among others) are now combined into a single introductory course.


This book covers many of the pedagogical decisions that were made in Data 8 and should be seen as a reference and background for it.

All of the tools that Data 8 uses are open source, and available for the community to use (either as broader community-run projects, or as Berkeley projects). The course material can be accessed at the following online resources:

- [The Data 8 Course Site](http://data8.org)
- [The Data 8 Course Textbook](https://www.inferentialthinking.com/chapters/intro)

To explore the guide, select a section to the left!

### Contacting Us

If you would like to learn more about any of the tools used in Data 8 or are interested in deploying your own Data 8 course, please fill out our Data 8 [Instructor Interest form](https://forms.gle/y5aQBi816xDXKe7r7) or send us an email at [ds-help@berkeley.edu](mailto:ds-help@berkeley.edu).
<!-- #endregion -->
126 changes: 126 additions & 0 deletions content/v2/pedagogy/roles_data8.md
@@ -0,0 +1,126 @@
# Roles and Responsibilities
Each semester, around 45 teaching assistants (TAs), 45 tutors, and 150 academic interns (AIs) are hired to be part of Data 8 course staff. Hourly commitments and responsibilities differ both across and within roles, depending on the staff member’s experience and ability to contribute to course staff.

### Academic Interns (AIs)

##### Lab assisting (2 hrs/week)
- 10 out of the 13 weeks (20 hours) → required for a Pass (P)
- Actively participating in the worksheet discussion time and answering questions during the lab notebook time
##### Lab prep (0.5 hr/week)
- Work through the lab assignment
- Discussion worksheet solutions provided

##### Pedagogy Practice/Reflection (0.5 hr/week)
- Weekly written check-ins
- General meetings/professional development
##### Meetings
- Meetings with TA
- Pedagogy lessons for the upcoming weeks

### Tutor

#### Returning Tutors (8 hours)
##### Grading (2)
- Read grading rubrics
- Grading assignments
- Handling regrades
##### Two tutoring sections (2)
##### Preparation (1)
- Review material being covered in lectures that week
- Tutoring section worksheet
- Weekly homework/projects
##### Office hours (1)
##### Flexible hour (1)
- Midterm proctoring, prep, etc.
##### Staff meeting (1)


#### New Tutors (8 hours)

##### Grading (2)
- Read grading rubrics
- Grading assignments
- Handling regrades

##### Preparation (1)
- Review material being covered in lectures that week
- Tutoring section worksheet
- Weekly homework/projects

##### Office hours (2)
##### GSI-shadowed tutoring section (1)
##### Flexible hour (1)
- Midterm proctoring, prep, etc.
##### Staff meeting (1)

### uGSIs
#### Lead uGSI (20)
##### One lab section (2)
##### Tutor mentoring/Office hours (1)
##### Preparation (2)
- Review material being covered in lectures that week
- Lab discussion worksheet
- Work through/familiarize self with the lab assignment
- Weekly homework/project
##### Staff meeting (1)
##### Other (2, spread out through the semester)
- Proctoring for exams
- Helping at review sessions or guerrilla sections
##### Primary team (12)
- Delegating/managing their team
- Varies per role

#### uGSI, returning (8)
##### Lab section (2)
##### Office hours (1)
##### Preparation (2)
- Review material being covered in lectures that week
- Lab discussion worksheet
- Work through/familiarize self with the lab assignment
- Weekly homework/project
##### Staff meeting (1)
##### Additional role (1)
- Lead TA will delegate roles, should work out to roughly 1 hour a week, but varies per team
##### Other (1, spread out through the semester)
- Proctoring and grading for exams
- Lab assistant management

#### uGSI, new (8)
New uGSIs are only hired in the Fall semester to be part of Data 8 course staff. In the Spring semester, all GSIs have roles and responsibilities equal to those for returning uGSIs.

##### Lab section (2)
##### Tutor mentoring/Office hours (1)
##### Staff meeting (1)
##### Pedagogy/New GSI meeting (1)
##### Preparation (2)
- Review material being covered in lectures that week
- Lab discussion worksheet
- Work through/familiarize self with the lab assignment
- Tutoring section worksheet
- Weekly homework/project
##### Other (1, spread out through the semester)
- Proctoring and grading for exams
- Lab assistant management

### uGSI Teams
Each uGSI (new or returning) is assigned to one of the following teams:

##### New GSI Support/GSI Support
- Run weekly new GSI meetings, check-ins with new GSIs
- "Lost student" support like connecting students to GSIs
##### Content & Infrastructure
- HW/Project/Lab Development and Release
- Infrastructure (Website + Jupyter support)
##### Grading
- Autograder wrangling
- Maintaining student grades, handling regrades
##### Lab Logistics
- LA Hiring/evaluations
- Lab assignments, lab shadowing
##### Tutor Support
- Tutor management and logistics
- Lead weekly tutor meetings
##### Pedagogy
- Assignment creation, exam writing
- Help new GSIs/tutors teach
140 changes: 140 additions & 0 deletions content/v2/pedagogy/syllabus.md
@@ -0,0 +1,140 @@
# Course structure

The following is a general description of the course structure for Data 8.
It covers how to build your course syllabus and is a reflection of the content that is covered in the Data 8
textbook, which is freely available at the following link:

https://www.inferentialthinking.com/

The following sections describe some of the major takeaways that students
should learn in the class. Data 8 assumes *no programming*,
*no statistics*, and *no math* beyond a standard high-school level.

## Conceptual understanding of uncertainty and causality

A lot of the technical pieces in the course focus on enabling students to practice
specific technical skills (like programming). It is crucial that these skills
be learned in order to solidify a high-level understanding of how data, statistics,
and inference are inter-related. For example, see
[Chapter 2: Causality and Experiments](https://www.inferentialthinking.com/chapters/02/causality-and-experiments.html)
from the Data 8 textbook.

Below are a few high-level concepts that students should come away with:

* A solid understanding of causality on a conceptual level.
* Where did you get your data from?
* Is it from a randomized controlled study or from observational data?
* What do cause and effect mean in the context of data?

## Programming fundamentals

Scripting and interactive computing are the primary ways that we operationalize
the data science methods covered in the course. While it is possible to find
programs that let you carry out various techniques with user-interfaces, Data 8
stresses that programming fundamentals will facilitate learning the analytic
topics and provide a more useful and generic skillset in computational methods.

In Data 8, programming fundamentals are taught *alongside* statistical concepts.
For example, [iteration is taught alongside random sampling](https://www.inferentialthinking.com/chapters/09/2/iteration.html).

Below are some programming fundamentals that students come away with:

* Data types, structures, functions, tables, etc.
* Programming fundamentals *in the context of data*.
* How to answer data science questions with the tables you have.
* Complex operations with tables (grouping, joining, etc.).
* How to create visualizations (charts, histograms, etc.) and understand what is being represented.


## Statistics, sampling, and hypothesis testing

Randomness and statistics are core components of data science. Data 8
has a heavy emphasis on both. It is particularly important that students come
away with an appreciation for how a sampling method is
used to generate data, as well as an understanding for how statistics can
be used (and mis-used) to understand a dataset given a limited number
of data points.

Below are some statistics fundamentals that students come away with:

* Iteration in the context of the probability of an event (e.g., the probability of an outcome when flipping a coin).
* `for` loops and related constructs are covered in the context of a coin-flipping simulation.
* How to manipulate tables to calculate probability.
* Sampling and empirical distributions - how to make conclusions based on random samples?
* How to compare two samples.
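
As an illustration of how iteration and probability can be taught together, here is a minimal sketch of a coin-flipping simulation. It uses only the Python standard library rather than the course's own tooling, and the function names `simulate_heads` and `empirical_probability`, the seed, and the chosen numbers (10 flips, at least 8 heads) are assumptions made for this example.

```python
import random


def simulate_heads(num_flips, rng):
    """Flip a fair coin num_flips times and count the heads."""
    heads = 0
    for _ in range(num_flips):
        if rng.choice(["heads", "tails"]) == "heads":
            heads += 1
    return heads


def empirical_probability(num_flips, threshold, repetitions, seed=0):
    """Estimate the chance of at least `threshold` heads by repeated simulation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(repetitions):
        if simulate_heads(num_flips, rng) >= threshold:
            hits += 1
    return hits / repetitions


estimate = empirical_probability(num_flips=10, threshold=8, repetitions=10_000)
print(f"Estimated chance of at least 8 heads in 10 flips: {estimate:.3f}")
```

The exact answer is 56/1024, roughly 0.055, so students can compare the simulated estimate against a value they can also work out by counting.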

## Inference, prediction, and models

While statistics describe a dataset, they do not inherently make *predictions*
about the underlying distribution from which the data are drawn. Data 8 relies
heavily on bootstrapping and permutation methods in order to make estimations
of error/confidence in parameters derived from the data.
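
The bootstrap idea mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative percentile bootstrap, not the course's own implementation; the helper name `bootstrap_interval`, the sample values, and the defaults (2000 resamples, 95% level) are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_interval(sample, statistic, repetitions=2000, level=0.95, seed=0):
    """Approximate a confidence interval for `statistic` with the percentile bootstrap."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(sample, k=len(sample))
        estimates.append(statistic(resample))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * repetitions)]
    upper = estimates[int((1 - tail) * repetitions) - 1]
    return lower, upper


sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]  # made-up measurements
lower, upper = bootstrap_interval(sample, statistics.mean)
print(f"Approximate 95% interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

Passing the statistic as a function keeps the resampling loop unchanged when the quantity of interest is a median or another summary instead of the mean.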

Beyond estimating the value of a model's parameter given limited data, models
are also used to generate *predictions* about the world given a new set of
data. Data 8 treats prediction as an extension of inference. In the same
sense that inference quantifies uncertainty in a model's parameter, we can also
generate uncertainty in predictions given a data point that the model has not
seen before. This is given treatment in the case of regression (models with quantitative outputs)
as well as classification (models with qualitative outputs).

Below are some inference, prediction, and modeling fundamentals that students come away with:

* Correlation in the context of modeling.
* Linear regression and least squares.
* Regression inference - what does a regression line really mean?
* How to compute confidence intervals of regression lines.
* How to train classifiers with simple algorithms such as k-nearest neighbors.
* Brief introduction to machine learning.
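
To make the link between correlation and the least squares line concrete, here is a small sketch that computes the correlation in standard units and derives the slope and intercept from it. The data and the function names `correlation` and `regression_line` are invented for illustration, and the code uses Python's `statistics` module rather than the course's table library.

```python
import statistics


def correlation(xs, ys):
    """Correlation coefficient: the mean of the products of standard units."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return statistics.fmean(products)


def regression_line(xs, ys):
    """Slope and intercept of the least squares line, derived from r."""
    r = correlation(xs, ys)
    slope = r * statistics.pstdev(ys) / statistics.pstdev(xs)
    intercept = statistics.fmean(ys) - slope * statistics.fmean(xs)
    return slope, intercept


xs = [1, 2, 3, 4, 5]  # made-up predictor values
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # made-up responses
slope, intercept = regression_line(xs, ys)
print(f"r = {correlation(xs, ys):.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```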

## Comparing distributions

Once students learn the various steps that go into statistically describing a
single dataset, Data 8 covers how to make comparisons *between* datasets. This
is a crucial part of most scientific analysis, as well as in industry data
analytics (e.g., in A/B testing). Data 8 covers comparisons between distributions
as an advanced case of the material that has been covered above.

Below are some fundamentals for comparing two distributions that students come away with:

* A/B testing - with what confidence can we say if two numerical samples come from the same underlying distribution or not?
* Bootstrapping and confidence intervals
* Importance of bias and variance of the sample mean
* Statistics about statistics (e.g. variability of sample mean)
* How to use sample means effectively for inference.
* How to design experiments and interpret the distributions with the Central Limit Theorem, Chebyshev's inequality, etc.
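
An A/B test of the kind described above can be sketched as a permutation test on the difference of group means. The following is a minimal standard-library version with made-up data; the function name `permutation_test`, the number of shuffles, and the seed are assumptions for the example, not values taken from the course.

```python
import random
import statistics


def permutation_test(group_a, group_b, repetitions=5000, seed=0):
    """Approximate p-value for the absolute difference between group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(repetitions):
        # Under the null hypothesis the labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        shuffled_a = pooled[: len(group_a)]
        shuffled_b = pooled[len(group_a):]
        difference = abs(statistics.fmean(shuffled_a) - statistics.fmean(shuffled_b))
        if difference >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / repetitions


group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7]  # made-up control values
group_b = [6.4, 6.9, 7.1, 6.6, 7.3, 6.8, 7.0, 6.7]  # made-up treatment values
p_value = permutation_test(group_a, group_b)
print(f"Approximate p-value: {p_value:.4f}")
```

A small p-value says the observed difference would be unusual if both samples came from the same underlying distribution.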


# Building a course syllabus page

You may use whatever technology you prefer for managing your course and
distributing content. However, we recommend setting up a syllabus page that
is used for distributing interact links and course materials. For an example,
see the structure of the Spring 2020 course syllabus: http://data8.org/sp20/

The syllabus has the following structure:

| Date | Topic | Lecture | Reading | Assignment
| ------------- | ----- | ------- | ------- | ----------
| Fri 01/24 | Cause and Effect | [Slides](https://docs.google.com/presentation/d/1lSwG_uGwQRL3oGQnmn7aphxYZlzX0G0KHeyNM-cNZb4/edit?usp=sharing) | [Chapter 2](https://www.inferentialthinking.com/chapters/02/causality-and-experiments.html) | [Homework 01](http://datahub.berkeley.edu/hub/user-redirect/git-sync?repo=https://github.com/data-8/materials-sp20&subPath=materials/sp20/hw/hw01/hw01.ipynb)
| Mon 01/27 | Tables | [Slides](https://docs.google.com/presentation/d/1jn2X5JtbOqOfiBa_QdV2ITw491dhRdxIvKR0bR1tWUk/edit?usp=sharing), [Demos](http://datahub.berkeley.edu/hub/user-redirect/git-sync?repo=https://github.com/data-8/materials-sp20&subPath=lec/lec03.ipynb), [Video](https://www.youtube.com/watch?v=BW9XcOG8jag) | [Chapter 3](https://www.inferentialthinking.com/chapters/03/programming-in-python.html)| |
| Wed 01/29 | Data Types | [Slides](https://docs.google.com/presentation/d/1TjU8ismB9qSrWAivDq2HvuYy68sRTE_Q_V3gPcHP0qQ/edit?usp=sharing), [Demos](http://datahub.berkeley.edu/hub/user-redirect/git-sync?repo=https://github.com/data-8/materials-sp20&subPath=lec/lec04.ipynb), [Video](http://youtube.com/watch?v=F3krVj7GkI0) | [Chapters 4](https://www.inferentialthinking.com/chapters/04/data-types.html), [5](https://www.inferentialthinking.com/chapters/05/sequences.html)| [Lab 02: Table Operations](http://datahub.berkeley.edu/hub/user-redirect/git-sync?repo=https://github.com/data-8/materials-sp20&subPath=materials/sp20/lab/lab02/lab02.ipynb)

Each row is a lecture, and each column is a type of material you can distribute.
The links in the columns point either to pages in the [course textbook](https://inferentialthinking.com)
or to interact links that connect students with the course JupyterHub for distribution of homeworks and labs.
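
The interact links in the table follow a regular pattern: a hub address, a `git-sync` redirect path, and `repo` and `subPath` query parameters. The helper below is a sketch that rebuilds that pattern. The function name `interact_link` is hypothetical, and the endpoint path is copied from the example rows above; other JupyterHub deployments may expose a different endpoint, so treat the path as an assumption to verify against your own hub.

```python
from urllib.parse import urlencode


def interact_link(hub_url, repo_url, sub_path):
    """Build an interact link in the same shape as the syllabus examples."""
    # Keep ':' and '/' unescaped so the result matches the links in the table.
    query = urlencode({"repo": repo_url, "subPath": sub_path}, safe=":/")
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-sync?{query}"


link = interact_link(
    hub_url="http://datahub.berkeley.edu",
    repo_url="https://github.com/data-8/materials-sp20",
    sub_path="materials/sp20/hw/hw01/hw01.ipynb",
)
print(link)
```

Generating the links from a function keeps every row of the syllabus consistent when the repository changes from one semester to the next.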

The videos and slides listed above and on the [Data 8 website](http://data8.org/sp20/) are restricted to berkeley.edu addresses.


## Assignments

Alongside the textbook are several computational homeworks, labs, and projects that let students interact with the
ideas covered in class. They can all be run interactively in the Data 8 environment.

These homework, lab, and project materials are freely available in the semester course repository.
There is a new repository created for each semester of the course, as the materials
tend to evolve over time. For example, here is the repository for the Spring 2020 course: https://github.com/data-8/materials-sp20



6 changes: 6 additions & 0 deletions content/v2/pedagogy/syllabus_intro.md
@@ -0,0 +1,6 @@
# Syllabus & Lectures
Teaching Data Science requires a shift in the way we traditionally teach each of the individual concepts in the course. What were once several introductory courses in statistics, computer science, writing, and ethics (among others) are now combined into a single introductory course.

In Data 8, touching on these topics in a single course is a feature, not a bug. For example, by teaching programming with the fundamentals of statistics, students gain an appreciation for the close interplay between these two fields and learn a valuable set of skills at the same time. Through this course, students are able to learn how to interpret and ask questions about their data with the support of inferential and computational tools.

This section covers many of the pedagogical decisions that were made in Data 8. It should be seen as a reference and background for the [Data 8 textbook](https://inferentialthinking.com), which is freely available online.