# Teaching Strategy

## Ordinary Least-Squares Estimation

G. Alexi Rodríguez-Arelis, **May 2022**

# Outline

1. How to holistically teach ordinary least-squares estimation?
2. Student Engagement and Learning
3. Assessment
4. Technology
5. Student Difficulties

## 1. How to holistically teach the chosen topic?

In a Data Science context, teaching a statistical model involves these steps:

1. Main statistical inquiry.
2. Data collection.
3. Exploratory data analysis (EDA).
4. Data modelling (assumptions involved!).
5. Estimation.
6. Inference or prediction **(not covered in the sample lecture)**.
7. Data storytelling.

This process generalizes to models other than ordinary least-squares (OLS), and it might also apply in a Bayesian framework.

We need to highlight the following:

- Students need to keep in mind the main statistical inquiry **all the time**. This is the "big picture" of the whole Data Science problem.
- Statistical modelling involves a well-defined process. Thus, it is crucial to keep track of the seven steps.
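
As an illustration of the data modelling and estimation steps, a simple linear regression with one regressor can be written as below. The notation ($Y_i$, $X_i$, $\beta_0$, $\beta_1$, $\varepsilon_i$, $\sigma^2$) is an assumption made for this sketch and may differ from the notation used in the sample lecture.

$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n,
$$

where the errors $\varepsilon_i$ are assumed to be independent with mean zero and constant variance $\sigma^2$. Given a sample $(x_1, y_1), \dots, (x_n, y_n)$, the OLS estimates minimize the sum of squared residuals:

$$
(\hat{\beta}_0, \hat{\beta}_1) = \underset{b_0,\, b_1}{\arg\min} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2,
$$

which yields the closed-form solution

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
$$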

## 2. Student Engagement and Learning

From previous experience, using datasets involving **Data Science-related applications** has improved student engagement in statistical courses.

In this case, the OLS lecture is more interesting if we set up a scenario where the student is in **the shoes of a "Data Scientist at Tidal"** with a concrete statistical objective.

A Master of Data Science (MDS) program requires applied Statistics and not just abstract modelling. It is essential to cater to a diverse cohort of students.

Recall the sample lecture's high-level learning outcomes:

>- **Define linear regression models.**
>- **Estimate their terms using `R` via a sample and interpret them.**

The first learning outcome is approached by setting up this **hypothetical scenario** where a linear regression allows the student to solve the main inquiry. We aim to build a practical intuition as follows:

- Posing specific **in-class EDA questions**.
- Identifying the model's components in the context of the problem as **in-class modelling questions**.

The second learning outcome heavily relies on **code cells** distributed throughout the lecture's presentation. Specifically, the lecture notes involve three types of code cells:

1. Data wrangling.
2. Data plotting.
3. Model estimation.

Finally, besides in-class activities, platforms such as Slack allow the whole cohort to work together on their learning goals.

## 3. Assessment

Assessing the topic's learning outcomes in applied statistical models (in a Data Science context) **DOES NOT NECESSARILY** involve abstract mathematical proofs. As instructors, we need to clarify this point from the very beginning of the course.

Students' attainment of the learning outcomes for the chosen topic can be assessed as follows:

1. **Formative Assessment.** We can do it via in-class questions and discussions. In-class discussions are especially constructive as models get more complex, e.g., students can discuss in groups the overall goal of using a given statistical model (this is especially useful in subsequent stages of the program, such as the Capstone Project).

2. **Summative Assessment.** A non-project-based MDS course typically has two summative assessments:

- **Labs.** The chosen topic is evaluated via another Data Science-related problem involving an interesting dataset (e.g., the Facebook data in the sample lab). This lab should cover exactly the concepts and practice discussed during lecture time. Moreover, we can include **optional and more challenging questions**.

- **Quizzes.** These assessments **SHOULD NOT INVOLVE** purely abstract statistical concepts. Ideally, one would embed these concepts into brief case studies that are doable within quiz time.

## 4. Technology

Teaching these statistical topics would involve the following learning technologies:

- **Jupyter notebook and book.** The Jupyter notebook combines equations and code cells. Furthermore, as the course progresses, the Jupyter Book knits all the notebooks into a single website for easy access when reviewing all contents.

- **RISE.** This extension turns the Jupyter notebook into a live reveal.js-based presentation, which is practical when preparing the lecture.

- **`R` Markdown.** This tool for reproducible reports is handy when working on the lab assignment.

- **Otter Grader.** This tool allows us to build auto-graded questions via different test functions. This is especially helpful in introductory courses since it provides instant feedback to the students.

- **Slack.** This messaging platform allows the whole cohort to help each other when solving assignments and reviewing lecture content.

## 5. Student Difficulties

MDS cohorts are diverse in terms of academic backgrounds. Thus, teaching statistical topics needs to be flexible enough to provide a great learning experience for everybody.

Nevertheless, students could face the following difficulties:

- They could find mathematical notation **overwhelming at first**. Therefore, we need to decompose these equations into understandable concepts.
- They could struggle to identify the **right types of variables** (continuous or discrete).
- They could be unsure how the **loss function** is used in this estimation process (and how it relates to Machine Learning).
- They could have trouble using the **proper coding functions and arguments**.
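
To make the role of the loss function more concrete, the following is a minimal sketch of how OLS estimates minimize the sum of squared residuals in a simple linear regression. It is written in Python purely for illustration (the lecture itself uses `R`, where `lm()` performs the estimation). The toy data and the helper names `squared_error_loss` and `ols_fit` are hypothetical and do not come from the sample lecture.

```python
# Hypothetical toy data, made up for this illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def squared_error_loss(b0, b1, x, y):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def ols_fit(x, y):
    """Closed-form OLS estimates (intercept, slope) for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0_hat, b1_hat = ols_fit(x, y)
loss_at_ols = squared_error_loss(b0_hat, b1_hat, x, y)

# Any other candidate line should give a larger (or equal) loss.
loss_shifted_intercept = squared_error_loss(b0_hat + 0.5, b1_hat, x, y)
loss_shifted_slope = squared_error_loss(b0_hat, b1_hat - 0.3, x, y)

print("intercept:", round(b0_hat, 4), "slope:", round(b1_hat, 4))
print("loss at OLS estimates:", round(loss_at_ols, 4))
print("loss with shifted intercept:", round(loss_shifted_intercept, 4))
print("loss with shifted slope:", round(loss_shifted_slope, 4))
```

Comparing the loss at the OLS estimates with the loss at nearby candidate values shows students that estimation is an optimization problem, which is the same idea behind minimizing a training loss in Machine Learning.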