# Imports

In [1]:
import pandas as pd
import numpy as np

# Introduction

A fictional company, MBAs R Us, advertises that its program increases a person’s score on the GMAT by an average of forty points. To check the validity of that claim, a consumer watchdog group randomly selected fifteen students to take both the review course and the GMAT. Before starting the course, the fifteen students were given a diagnostic test that predicted how well they would do on the GMAT in the absence of any special training. The watchdog group recorded the fifteen students' predicted scores and their scores after taking the review course. Is MBAs R Us' claim believable?

In [8]:
df = pd.read_csv("../datasets/gmat_scores/gmat_scores.csv")

In [9]:
df

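| Unnamed: 0 | subject | prior | post |
|---|---|---|---|
| 0 | SA | 494 | 529 |
| 1 | LG | 608 | 645 |
| 2 | SH | 575 | 608 |
| 3 | KN | 460 | 494 |
| 4 | DF | 715 | 753 |
| 5 | SH | 473 | 513 |
| 6 | ML | 544 | 579 |
| 7 | JG | 595 | 631 |
| 8 | KH | 386 | 424 |
| 9 | HS | 537 | 570 |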
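
The `Unnamed: 0` column in the output above usually appears when a CSV file was written with its row index and then read back without saying so. A minimal sketch of one way to avoid it, assuming the file's first column really is a saved index with an empty header: pass `index_col=0` to `pd.read_csv`. The in-memory `csv_text` below is a stand-in for the real file and contains only the ten rows displayed above; the name `scores` is hypothetical.

```python
import io

import pandas as pd

# Stand-in for gmat_scores.csv: the ten displayed rows, with an empty
# header for the first column (assumed to be the saved row index).
csv_text = """,subject,prior,post
0,SA,494,529
1,LG,608,645
2,SH,575,608
3,KN,460,494
4,DF,715,753
5,SH,473,513
6,ML,544,579
7,JG,595,631
8,KH,386,424
9,HS,537,570
"""

# index_col=0 tells pandas to use the first column as the row index
# instead of loading it as a data column named "Unnamed: 0".
scores = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(scores.columns.tolist())  # ['subject', 'prior', 'post']
```

With the real file, the same argument would be added to the existing `pd.read_csv` call, provided the assumption about the first column holds.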


# Experimental Design

Before proceeding to analyze the data, let's review the experimental design the consumer watchdog group used to assess MBAs R Us' claim. The experimental design had the following basic parts: 

- The **subject** of the experimental design was a student.
- The single **factor** or **treatment** applied to the students was "Review Course", and it had two **levels**: "Took the Review Course" and "Did not Take the Review Course". Also note that every factor in an experimental design is either a **fixed** effect or a **random** effect. A fixed effect is a factor whose levels were preselected before the experiment, and a random effect is a factor whose levels were not. The factor in this experiment was fixed.
- We know the consumer watchdog group randomly selected the students, but we do not know where the students took the GMAT or at what time of day. If the students had been divided into two distinct test groups that used different test sites, then the experimental design would have had two **blocks**, because each test site would have had different environmental conditions that could have affected the students' scores. For example, one site could have been noisy while the other was quiet. However, any blocks that may have been present in the experimental design were irrelevant, because the consumer watchdog group wanted to compare students against themselves: did a student's GMAT score improve?
- The prior and post GMAT scores are **dependent** because they depend not only on the review course but also on the students' innate abilities: both scores in a pair come from the same student. The experimental design used repeated measurements.
- The measurements, i.e. the GMAT scores, are **similar** because they have the same units, i.e. score points.
- Lastly, the measurements are **quantitative** and not qualitative. (Qualitative measurements can come from grouping subjects by factor/level.)
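
Because each student is compared against themselves, the quantity of interest is the per-student difference between the post and prior scores. The sketch below computes it for the ten rows displayed in the Introduction only, so it illustrates the bookkeeping rather than settling the claim. The column name `gain` and the variable names are hypothetical.

```python
import pandas as pd

# The ten rows displayed in the Introduction, typed in by hand so the
# example does not depend on the CSV file being available.
scores = pd.DataFrame(
    {
        "subject": ["SA", "LG", "SH", "KN", "DF", "SH", "ML", "JG", "KH", "HS"],
        "prior": [494, 608, 575, 460, 715, 473, 544, 595, 386, 537],
        "post": [529, 645, 608, 494, 753, 513, 579, 631, 424, 570],
    }
)

# Paired design: subtract each student's prior score from their own post score.
scores["gain"] = scores["post"] - scores["prior"]

mean_gain = scores["gain"].mean()

print(scores)
print(f"mean gain over the displayed rows: {mean_gain:.1f}")  # 35.9
```

The advertised figure is forty points, so a formal analysis would compare the observed gains against that value while accounting for their variability.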

The most common experimental designs are one-sample, two-sample, *k*-sample, paired, randomized block, regression/correlation, categorical, and factorial. The answers to five questions can be used to determine the experimental design. We list the five questions and their answers for the data given in the Introduction.

1. Are the observations quantitative or qualitative? Quantitative.
2. Are the units similar or dissimilar? Similar.
3. Is there one factor or more than one? One.
4. How many factor levels are involved? Two.
5. Are the observations dependent or independent? Dependent.

Using the flowchart in Figure 1, we see that the experimental design was a **paired data model** design. Furthermore, the model equations for a paired data model, which express an arbitrary measurement as a sum of fixed and random effects, are the following:

\begin{align}
X_i &= \mu_X + P_i + \epsilon_i, \quad i = 1,2,\dots, n \\
Y_i &= \mu_Y + P_i + \epsilon'_i, \quad i = 1,2,\dots, n.
\end{align}

For this case study, $X_i$ is the $i$-th student's score before taking the review course, $Y_i$ is the $i$-th student's score after taking the review course,
$P_i$ is the effect of the $i$-th student's individual conditions, which uniquely affects that student's scores, and $\epsilon_i, \epsilon'_i$ are the random effects that could affect a student's score, such as noise, hunger, etc.
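
One consequence of these equations is that the student effect $P_i$ cancels in the paired difference: $Y_i - X_i = (\mu_Y - \mu_X) + (\epsilon'_i - \epsilon_i)$. The simulation below is a minimal sketch of that cancellation. Every numeric value (the means, the standard deviations, the seed) and the choice of normal distributions are assumptions made for illustration; none of them come from the case study.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

n = 15                      # number of students, as in the case study
mu_x, mu_y = 540.0, 580.0   # hypothetical population means (prior, post)

# Hypothetical spreads: students differ a lot from one another (P_i),
# while the per-sitting noise terms are comparatively small.
p = rng.normal(0.0, 90.0, size=n)      # P_i, shared by both scores
eps_x = rng.normal(0.0, 5.0, size=n)   # epsilon_i
eps_y = rng.normal(0.0, 5.0, size=n)   # epsilon'_i

x = mu_x + p + eps_x   # X_i = mu_X + P_i + epsilon_i
y = mu_y + p + eps_y   # Y_i = mu_Y + P_i + epsilon'_i
d = y - x              # paired differences

# The student effect drops out of the difference.
assert np.allclose(d, (mu_y - mu_x) + (eps_y - eps_x))

print("sample std of prior scores:", x.std(ddof=1))
print("sample std of differences: ", d.std(ddof=1))
```

Under these assumed values, the differences vary far less than the raw scores, which is the practical reason for comparing each student against themselves.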

<center><b>Figure 1: Determining the Experimental Design</b></center>

![](../images/determining_experimental_design_flow_chart.png) 

As soon as a design has been chosen, a second question immediately follows: how large should the sample size (or sample sizes) be?