<a href="https://colab.research.google.com/github/brendanpshea/data-science/blob/main/Data_Science_07_InferentialStats.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install pydataset -q # Install required packages
from pydataset import data # Import required modules
import pandas as pd # More on this below

initiated datasets repo at: /root/.pydataset/


## Background to the Case Study

Imagine you're learning to read better, and your teacher says, "Let's try something new today." She breaks the class into smaller groups and gives each a different task. You're curious: Will this new method actually help you understand what you're reading? This is the crux of the Baumann experiment. It sought to find out whether particular teaching methods could improve how well fourth-grade students understood their reading material.

Sixty-six fourth-grade students were randomly assigned to one of three experimental groups: (a) a Think-Aloud (**TA, or "Strategy"**, labeled `Strat` in the data) group, in which students were taught various comprehension monitoring strategies for reading stories (e.g., self-questioning, prediction, retelling, rereading) through the medium of thinking aloud; (b) a Directed Reading-Thinking Activity (**DRTA**) group, in which students were taught a predict-verify strategy for reading and responding to stories; or (c) a Directed Reading Activity (**DRA or Basal**) group, an instructed control, in which students engaged in a noninteractive, guided reading of stories.

This is what we call a controlled experiment, a cornerstone of scientific research. In a controlled experiment, one or more groups receive a special treatment (TA and DRTA in this case), while a control group does not (here, the DRA or **Basal** group, which receives standard instruction instead). This setup allows researchers to compare results and draw conclusions about the effectiveness of the methods being tested.
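Random assignment is easy to express in code. The sketch below is a minimal illustration, not the study's actual procedure (the description above doesn't say how the randomization was done): it assumes hypothetical student IDs 1 to 66, shuffles them with NumPy, and deals them into three equal groups using the group labels from the data set.

```python
import numpy as np
import pandas as pd


def randomly_assign(n_students, groups, seed=None):
    """Shuffle hypothetical student IDs and split them into equal-sized groups."""
    if n_students % len(groups) != 0:
        raise ValueError("n_students must divide evenly among the groups")
    rng = np.random.default_rng(seed)
    # A random ordering of the IDs 1..n_students.
    shuffled_ids = rng.permutation(np.arange(1, n_students + 1))
    per_group = n_students // len(groups)
    # The first per_group shuffled IDs go to the first group, the next to the second, etc.
    labels = np.repeat(groups, per_group)
    return pd.DataFrame({"student_id": shuffled_ids, "group": labels})


assignment = randomly_assign(66, ["Basal", "DRTA", "Strat"], seed=0)
print(assignment["group"].value_counts())
```

Because the ordering is random, each student is equally likely to end up in any group, so differences in prior reading ability tend to spread across the groups rather than piling up in one of them.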

So, why should you care? Well, the results showed that students in the Strat and DRTA groups were better at understanding their reading than those in the DRA/Basal group. They were more skilled at monitoring their comprehension, as shown by tests and questionnaires. Interestingly, Strat students were particularly good at being aware of their own understanding, while DRTA students were sometimes even better at spotting errors. This is crucial because it shows that teaching methods can significantly affect how well students understand what they read, a vital skill in almost every area of life.

In essence, the Baumann experiment shows us that the way we're taught can make a difference in how well we understand information. That's not just useful for teachers wanting to improve their methods; it's valuable knowledge for anyone who cares about learning, at school or beyond.

### Loading the Baumann Data
Let's get started by loading the Baumann data and taking a look at its first few rows (the "head").

In [None]:
read_df = data('Baumann') # Load the Baumann dataset
read_df.head()

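| | group | pretest.1 | pretest.2 | post.test.1 | post.test.2 | post.test.3 |
|---|---|---|---|---|---|---|
| 1 | Basal | 4 | 3 | 5 | 4 | 41 |
| 2 | Basal | 6 | 5 | 9 | 5 | 41 |
| 3 | Basal | 9 | 4 | 5 | 3 | 43 |
| 4 | Basal | 12 | 6 | 8 | 5 | 46 |
| 5 | Basal | 16 | 5 | 10 | 9 | 46 |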


It looks like this contains students in the "Basal" (control) group. Now, let's look at the middle of the data.

In [None]:
read_df.iloc[21:26] # Rows at positions 21-25 (index labels 22-26)

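| | group | pretest.1 | pretest.2 | post.test.1 | post.test.2 | post.test.3 |
|---|---|---|---|---|---|---|
| 22 | Basal | 9 | 6 | 7 | 8 | 32 |
| 23 | DRTA | 7 | 2 | 7 | 6 | 31 |
| 24 | DRTA | 7 | 6 | 5 | 6 | 40 |
| 25 | DRTA | 12 | 4 | 13 | 3 | 48 |
| 26 | DRTA | 10 | 1 | 5 | 7 | 30 |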


Here we see the last Basal student (row 22) followed by the first students in the DRTA group. Finally, we can take a look at the tail of the data:

In [None]:
read_df.tail()

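| | group | pretest.1 | pretest.2 | post.test.1 | post.test.2 | post.test.3 |
|---|---|---|---|---|---|---|
| 62 | Strat | 11 | 4 | 11 | 7 | 48 |
| 63 | Strat | 14 | 4 | 15 | 7 | 49 |
| 64 | Strat | 8 | 2 | 9 | 5 | 33 |
| 65 | Strat | 5 | 3 | 6 | 8 | 45 |
| 66 | Strat | 8 | 3 | 4 | 6 | 42 |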


This appears to contain students in the "Strat" group. If we look closer, we'll find that there are exactly 22 students in each group. Now, let's get a summary of the data:
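Rather than eyeballing the rows, the group sizes can be counted with `value_counts()`. So that this sketch runs without downloading anything, it rebuilds only the group labels of the 15 rows printed above; in the notebook you would run the same two lines on the full `read_df`, where you should see 22 students per group and a balanced design.

```python
import pandas as pd

# Group labels of the 15 rows printed above (head, middle, and tail only).
excerpt = pd.DataFrame({"group": ["Basal"] * 6 + ["DRTA"] * 4 + ["Strat"] * 5})

# Number of students in each group, sorted alphabetically by group name.
counts = excerpt["group"].value_counts().sort_index()
# True only if every group has the same number of students.
balanced = counts.nunique() == 1

print(counts)
print("balanced design:", balanced)
```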

In [None]:
read_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66 entries, 1 to 66
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   group        66 non-null     object
 1   pretest.1    66 non-null     int64 
 2   pretest.2    66 non-null     int64 
 3   post.test.1  66 non-null     int64 
 4   post.test.2  66 non-null     int64 
 5   post.test.3  66 non-null     int64 
dtypes: int64(5), object(1)
memory usage: 3.6+ KB


## A Brief Review of Descriptive Statistics
Before moving on to inferential statistics, let's briefly review what we learned earlier about descriptive statistics. We can retrieve many of these statistics at once with `describe()`:

In [None]:
read_df.describe()

| | pretest.1 | pretest.2 | post.test.1 | post.test.2 | post.test.3 |
|---|---|---|---|---|---|
| count | 66.0 | 66.0 | 66.0 | 66.0 | 66.0 |
| mean | 9.787879 | 5.106061 | 8.075758 | 6.712121 | 44.015152 |
| std | 3.02052 | 2.212752 | 3.393707 | 2.635644 | 6.643661 |
| min | 4.0 | 1.0 | 1.0 | 0.0 | 30.0 |
| 25% | 8.0 | 3.25 | 5.0 | 5.0 | 40.0 |
| 50% | 9.0 | 5.0 | 8.0 | 6.0 | 45.0 |
| 75% | 12.0 | 6.0 | 11.0 | 8.0 | 49.0 |
| max | 16.0 | 13.0 | 15.0 | 13.0 | 57.0 |


As you can see, descriptive statistics give us a snapshot of our data, summarizing its main features without drawing conclusions beyond it. These statistics can be like a magnifying glass, bringing into focus the central tendency, dispersion, and shape of the data distribution. Let's go through some of the key terms based on the `read_df.describe()` output.

1. The **count** tells us the number of non-missing values in each column. Here, every column has 66 entries, one per student (22 students in each of the three groups), so every student has a score on each pretest and post-test. The count is an essential starting point because it tells us the size of our data set.
2. The **mean** is the average of all the scores. For example, the average score for `pretest.1` is approximately 9.79. This gives us a general idea of how the students performed as a whole but doesn't tell us much about individual performance or variability within the group.
3. The **standard deviation** measures how spread out the scores are around the mean. A small standard deviation means the scores are closely clustered around the mean, while a large one indicates a wider spread. For example, `post.test.3` has a standard deviation of approximately 6.64, so, roughly speaking, its scores typically fall within several points of the mean of about 44. Keep in mind that `post.test.3` is scored on a larger scale than the other tests (30 to 57 here), so its standard deviation isn't directly comparable to theirs.
4. The **minimum** and **maximum** values tell us the range of the scores. In `pretest.1`, the scores ranged from a minimum of 4 to a maximum of 16. Knowing the range can help us understand the breadth of performance among the students.
5. **Quartiles** divide the sorted data into four parts containing roughly equal numbers of scores. The 25% (first quartile), 50% (**median**), and 75% (third quartile) values give us a sense of the data's distribution. For example, in `post.test.2`, about 25% of students scored 5 or below, about 50% scored 6 or below (the median), and about 75% scored 8 or below.
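To see where the `describe()` numbers come from, the sketch below recomputes them by hand for one column. So that it runs without downloading anything, it uses only the 15 `post.test.2` scores printed earlier (head, middle, and tail), so its numbers will differ from the full-data table above. Two details worth knowing: pandas' `std()` divides by n - 1 (the sample standard deviation), and its quartiles use linear interpolation, which is also NumPy's default for `np.percentile`.

```python
import numpy as np
import pandas as pd

# post.test.2 scores for the 15 rows printed earlier (not the full data set).
scores = pd.Series([4, 5, 3, 5, 9, 8, 6, 6, 3, 7, 7, 7, 5, 8, 6], name="post.test.2")

n = scores.count()            # number of non-missing scores
mean = scores.sum() / n       # arithmetic mean
# Sample standard deviation: squared deviations summed, divided by n - 1.
std = np.sqrt(((scores - mean) ** 2).sum() / (n - 1))
# Quartiles with linear interpolation between sorted values.
q1, median, q3 = np.percentile(scores, [25, 50, 75])

summary = scores.describe()   # pandas' version, for comparison
print(summary)
print(n, mean, std, scores.min(), q1, median, q3, scores.max())
```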

By exploring these key terms, we can better understand the landscape of our data. And remember, while descriptive statistics are insightful, they're just the tip of the iceberg. They set the stage for inferential statistics, where we'll use this data to make more general conclusions.

## Samples and Populations
Inferential statistics allows us to "learn" things about the wider world (outside of our study) from our data. As a first step to understanding how this works, we should distinguish between a **sample**, which is a subset of individuals from a larger group, and a **population**, the entire group we're interested in. In the Baumann experiment, are the 66 fourth-grade students a sample or a population?

In the realm of statistics, the terms sample and population carry weighty significance. A sample is like a snapshot: a smaller group pulled from a broader context. A population, on the other hand, is the entire movie reel, the full context from which the snapshot is taken.

In the Baumann experiment, the 66 fourth-grade students would most likely be considered a sample. Why? Because they stand in for a much larger group, such as all fourth-grade students in a district, state, or even the country. Researchers often use samples because studying an entire population would be impractical or impossible due to time, resources, and logistical constraints.

The purpose of using a sample is to make inferences about the larger population. In the Baumann experiment, we're not just interested in whether these specific 66 students improve their reading comprehension with different teaching methods. Instead, the ultimate goal is to generalize these findings to a broader population of fourth-grade students. We want to answer a bigger question: Can different teaching methods improve reading comprehension for all fourth-graders, or at least a significant subset of them?
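A small simulation makes the sample/population relationship concrete. Everything here is hypothetical: the "population" is 100,000 simulated scores (normal, with mean and spread loosely modeled on `post.test.3`), not real data. The sketch draws one study-sized sample of 66 and compares its mean with the population mean, then repeats the draw many times to show how much sample means vary.

```python
import numpy as np

# Hypothetical population: simulated scores for 100,000 fourth-graders.
rng = np.random.default_rng(7)
population = rng.normal(loc=44, scale=6.6, size=100_000)

# One study-sized sample of 66 students, drawn without replacement.
sample = rng.choice(population, size=66, replace=False)

print(f"population mean: {population.mean():.2f}")
print(f"sample mean:     {sample.mean():.2f}")

# Repeat the draw 1,000 times to see how much the sample mean varies.
sample_means = [rng.choice(population, size=66, replace=False).mean() for _ in range(1000)]
print(f"spread of sample means (std): {np.std(sample_means):.2f}")
```

The sample mean lands near, but not exactly on, the population mean; quantifying that gap is precisely what inferential statistics is for.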


### Sampling Methods
**Sampling methods** are the various techniques used to select a subset of individuals from a larger group for study. The choice of sampling method is crucial because it affects how well the sample represents the population, which in turn influences the validity of the study's conclusions. One key concern here is **bias**, a tendency to systematically favor certain outcomes over others. Bias can creep in through poorly designed surveys, non-representative samples, or even subtle wording in questions. It's a bit like taking a photograph at a strange angle; what you capture won't accurately reflect the whole scene. Common sampling methods include:

1.  **Simple Random Sampling.** Every member of the population has an equal chance of being selected. It's the statistical equivalent of drawing names out of a hat. This method is excellent for reducing bias because it doesn't favor any group.

2.  **Stratified Random Sampling.** The population is divided into smaller groups based on a particular characteristic, like age or income. Then, a random sample is drawn from each group. This ensures that the sample represents all the strata in the population.

3.  **Cluster Sampling.** The population is divided into clusters, often geographically, and a random sample of clusters is chosen. Then, all members, or a random sample of members from those clusters, are surveyed. This method is often used when the population is spread out over a large area.

4.  **Systematic Sampling.** Every nth member of the population is selected, starting from a random point. For example, you might survey every 10th person on a customer list.

5.  **Convenience Sampling.** The sample consists of easily accessible members of the population. This method is the least rigorous and most prone to bias.
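Each of the methods above can be sketched in a few lines of pandas. The population here is entirely hypothetical (1,200 students in 40 schools of 30, with a made-up `region` column to stratify on), and the sample size of 60 is arbitrary; the point is only to show how each selection rule works.

```python
import numpy as np
import pandas as pd

# Hypothetical population: 1,200 students, 30 in each of 40 schools.
rng = np.random.default_rng(42)
population = pd.DataFrame({
    "student_id": np.arange(1200),
    "school": np.repeat(np.arange(40), 30),
})
population["region"] = population["school"] % 4  # hypothetical stratifying variable (4 regions)

n = 60

# 1. Simple random sampling: every student equally likely to be chosen.
simple = population.sample(n=n, random_state=1)

# 2. Stratified random sampling: an equal-sized random sample from each region.
stratified = population.groupby("region").sample(n=n // 4, random_state=1)

# 3. Cluster sampling: pick whole schools at random and keep all their students.
chosen_schools = rng.choice(population["school"].unique(), size=2, replace=False)
cluster = population[population["school"].isin(chosen_schools)]

# 4. Systematic sampling: every k-th student, starting from a random offset below k.
k = len(population) // n
start = int(rng.integers(k))
systematic = population.iloc[start::k]

# 5. Convenience sampling: whoever is easiest to reach, here simply the first 60 rows.
convenience = population.head(n)

for name, s in [("simple", simple), ("stratified", stratified), ("cluster", cluster),
                ("systematic", systematic), ("convenience", convenience)]:
    print(f"{name:12s} n={len(s):3d} regions covered={s['region'].nunique()}")
```

Notice that the convenience sample covers only two schools and two of the four regions, which is exactly the kind of bias described above.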

In all of these methods, the aim is to select a sample that is as similar as possible to the population in all respects that are relevant to the study. When bias is reduced, the results of the study are more generalizable to the larger population. For example, if you're studying voter behavior and only sample from one neighborhood, you may miss broader trends affecting other areas.

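In the context of the Baumann experiment, simple random sampling is the ideal to aim for, for several reasons. First, it minimizes bias by giving every fourth-grade student in the population an equal chance of being part of the experiment, which makes it likely that the sample is representative of the population. Second, simple random sampling is straightforward to understand and implement, making it a practical choice for many types of research. Lastly, because the Baumann experiment aims to make general claims about teaching methods and reading comprehension for all fourth-graders, a sample that is as unbiased as possible is crucial. Keep in mind, though, that the randomization described in the study itself is random *assignment*: the 66 participating students were randomly divided among the three groups. Random assignment makes the groups comparable to one another, which supports comparisons between teaching methods; random sampling is what supports generalizing from the sample to the wider population.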


### Control Groups
The concept of a **control group** is central to scientific research, acting as a sort of yardstick against which other experimental changes are measured. In an experiment, you have groups that receive some sort of treatment; these are your experimental groups. The control group, however, doesn't receive this special treatment or gets a neutral, standard one. It's like running a race where some runners get a high-tech pair of shoes designed to boost speed, while one group wears ordinary sneakers. That group in the regular footwear? That's your control.

In the Baumann experiment, the Basal group serves as this control. They engage in a "non-interactive, guided reading of stories," which we can consider the standard or traditional method of teaching reading comprehension. This control group is essential for several reasons:

1. The control group provides a **baseline level** of performance against which the effects of the different teaching methods (Think-Aloud and Directed Reading-Thinking Activity) can be compared. It's the "default setting" of the experiment.

2. By comparing the control group with the experimental groups, we can better **isolate** the effects of the specific teaching methods under scrutiny. If the experimental groups score noticeably higher than the control group, by more than chance variation alone would plausibly produce, we have evidence that the teaching methods are effective.

3.  In any experiment, various factors could potentially influence the outcome. The control group helps to mitigate the effects of these **confounding variables.** If both the control and experimental groups are subjected to the same conditions apart from the variable being tested, we can be more confident that any differences in outcomes are due to the variable itself.

4. Sometimes, it's not ethical to deprive a group of a standard treatment, especially in medical studies. In educational settings like the Baumann experiment, however, using a control group that receives the standard teaching method is generally considered ethical and helps validate the results. (For example, it would NOT be ethical to simply have one group of fourth graders receive NO reading instruction!)
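Using the control group as a baseline amounts to subtracting its mean from each group's mean. The helper below, `difference_from_control`, is a hypothetical name for this sketch, not part of pandas. To keep it self-contained it uses only the `post.test.1` scores from the 15 rows printed earlier, so the numbers are purely illustrative and say nothing about the study's actual results; in the notebook you would pass the full `read_df`.

```python
import pandas as pd


def difference_from_control(df, outcome, group_col="group", control="Basal"):
    """Each group's mean `outcome` minus the control group's mean (control maps to 0)."""
    group_means = df.groupby(group_col)[outcome].mean()
    return group_means - group_means[control]


# post.test.1 scores for the 15 rows printed earlier (illustration only).
excerpt = pd.DataFrame({
    "group": ["Basal"] * 6 + ["DRTA"] * 4 + ["Strat"] * 5,
    "post.test.1": [5, 9, 5, 8, 10, 7, 7, 5, 13, 5, 11, 15, 9, 6, 4],
})

diffs = difference_from_control(excerpt, "post.test.1")
print(diffs)
```

A raw difference like this is only the starting point; deciding whether it is larger than chance variation would produce is the job of the inferential tools that build on this comparison.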

In summary, the Basal control group in the Baumann experiment acts as a critical anchor, grounding the study and enabling researchers to measure the effectiveness of the new teaching methods with greater confidence. Without this control group, distinguishing the impact of the teaching methods from other factors would be much more challenging.