# Change runtime type to R

Remember, the first step when opening a Google Colab notebook is to change the runtime type from Python to R. Our code will not work otherwise!

# Research question

We're turning to another sample from the original Cleary data to check for bias in a different variable: High School Rank (i.e., a student's rank in their graduating high school class).

**RQ**: Is selecting on High School Rank biased against a particular group?  

# 1) Load packages

Install the `easystats` package then load it along with the `tidyverse` package.

In [None]:
## Install packages


In [None]:
## Load packages
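One way to fill in the two cells above is sketched here. It assumes the Colab R runtime, where `tidyverse` is typically preinstalled and a CRAN mirror is already configured; the `requireNamespace()` check is an optional convenience that skips the install when `easystats` is already present.

```r
## Install easystats only if it is missing (needs internet access)
if (!requireNamespace("easystats", quietly = TRUE)) {
  install.packages("easystats")
}

## Load packages
library(easystats)
library(tidyverse)
```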


# Load the data

Today's data were simulated from the correlation matrices in Table 3 (groups 1 & 3) of Cleary (1968). These data *perfectly* reproduce the correlation matrix, within rounding error, but the individual observations are computer generated (ask me for that code if you wish).  

These data are based on the original paper that introduced the "Cleary Model" (i.e., using multiple regression with 1 continuous and 1 categorical predictor of a continuous outcome variable to detect discrimination).

The following variables are used in our dataset:

- **HSR**: Continuous variable transformed to a percentile score (i.e., ranges from 0 to 100 with 100 representing the top student in their HS class).

- **Group**: A categorical variable designating the racial identity of the hypothetical student. Levels are "Black" and "White".

- **GPA**: A continuous variable representing the college GPA (range from 0 to 4).

Data originated from the following paper: Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. *Journal of Educational Measurement, 5*(2), 115-124.

In [None]:
## Set the URL to Casey's GitHub page where the dataset is located
FileURL <- "https://raw.githubusercontent.com/CaseyGio/Psyc6290/refs/heads/main/Datasets/Cleary2.csv"

## Read the csv file from GitHub and create a new object
Cleary <- read_csv(url(FileURL))

## Check out the dataset
head(Cleary, n = 10)

# 2) Data cleaning

Convert the `Group` variable to a factor in R. It may be helpful to create a new object (e.g., ClearyClean) to store this change.

In [5]:
## Convert group to factor
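A minimal sketch of the conversion, shown on a small made-up stand-in (`Toy`, not the real Cleary data) so that it runs on its own. The same `mutate()` call should carry over to `Cleary`, assuming its column is named `Group` as described above.

```r
library(tidyverse)

## Hypothetical stand-in with the same column names (values are made up)
Toy <- tibble(
  HSR   = c(35, 62, 80, 47, 91, 58),
  Group = c("Black", "White", "Black", "White", "Black", "White"),
  GPA   = c(2.1, 2.6, 3.0, 2.2, 3.4, 2.5)
)

## Store the cleaned version in a new object, leaving the original untouched
ToyClean <- Toy %>%
  mutate(Group = factor(Group))

## Check the levels; the first level is the reference group in a regression
levels(ToyClean$Group)
```

By default `factor()` orders levels alphabetically, so "Black" becomes the reference level here.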


# 3) Create scatterplot visualization for the Cleary regression model

Create a scatterplot to show the relationship between HSR, Group, and GPA.

In [None]:
## Create scatterplot
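One possible plot, sketched with `ggplot2` on simulated stand-in data (`Toy` is hypothetical and its values are arbitrary). Coloring by group and adding a fitted line per group is one reasonable design choice, not the only one.

```r
library(tidyverse)

## Hypothetical stand-in data (simulated, not the Cleary sample)
set.seed(6290)
Toy <- tibble(
  HSR   = runif(100, min = 0, max = 100),
  Group = factor(rep(c("Black", "White"), each = 50)),
  GPA   = 1 + 0.02 * HSR + rnorm(100, sd = 0.3)
)

## Scatterplot of GPA against HSR, with one color and one fitted line per group
ScatterPlot <- ggplot(Toy, aes(x = HSR, y = GPA, color = Group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "High School Rank (percentile)", y = "College GPA", color = "Group")

ScatterPlot
```

Lines that are roughly parallel but vertically offset point toward a group main effect, while lines with visibly different slopes point toward an interaction.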


# 4) Main effect model estimation

Estimate the main-effects regression model to predict GPA from the predictors in our Research Question.

In [7]:
## Estimate main effect model
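A sketch of the main-effects model, $GPA = \beta_0 + \beta_1 HSR + \beta_2 Group + \varepsilon$, fit with `lm()` on simulated stand-in data (`Toy` and `MainModel` are hypothetical names, and the simulated values are arbitrary).

```r
## Hypothetical stand-in data (simulated, not the Cleary sample)
set.seed(6290)
Toy <- data.frame(
  HSR   = runif(100, min = 0, max = 100),
  Group = factor(rep(c("Black", "White"), each = 50))
)
Toy$GPA <- 1 + 0.02 * Toy$HSR + rnorm(100, sd = 0.3)

## Main-effects model: one common HSR slope, with an intercept shift for Group
MainModel <- lm(GPA ~ HSR + Group, data = Toy)

## Three coefficients: the intercept, the HSR slope, and the Group difference
coef(MainModel)
```

With `Group` stored as a factor, R dummy-codes it automatically, so the `GroupWhite` coefficient is the difference in predicted GPA between groups at the same HSR.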


## 5) Interpret the main effect model results

Examine the estimated parameters in our regression model then address both prompts below:

(a) State the null and alternative hypotheses for each parameter estimate

(b) Identify which parameters, if any, are statistically significant given an $\alpha$ level of 0.05.

In [None]:
## Show parameters
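The parameter table can be produced with `model_parameters()` from easystats. The sketch below refits a model on simulated stand-in data so that it runs on its own; `Toy`, `MainModel`, `Params`, and `Significant` are hypothetical names.

```r
library(easystats)

## Hypothetical stand-in data and model (simulated, not the Cleary sample)
set.seed(6290)
Toy <- data.frame(
  HSR   = runif(100, min = 0, max = 100),
  Group = factor(rep(c("Black", "White"), each = 50))
)
Toy$GPA <- 1 + 0.02 * Toy$HSR + rnorm(100, sd = 0.3)
MainModel <- lm(GPA ~ HSR + Group, data = Toy)

## One row per parameter, each testing H0: beta = 0 against Ha: beta != 0
Params <- model_parameters(MainModel)
Params

## Flag the parameters whose p-value falls below alpha = 0.05
Significant <- data.frame(Parameter = Params$Parameter, Significant = Params$p < 0.05)
Significant
```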


(a) H0: $\beta_i = 0$ \\
 Ha: $\beta_i \neq 0$

(b) Each parameter is statistically significant.

# 6) Interaction effect model

Estimate the interaction-effects regression model to predict GPA from the predictors in our Research Question.

In [None]:
## Estimate interaction model
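A sketch of the interaction model, which adds a product term so that the HSR slope can differ by group: $GPA = \beta_0 + \beta_1 HSR + \beta_2 Group + \beta_3 (HSR \times Group) + \varepsilon$. As before, `Toy` and `InteractionModel` are hypothetical names and the data are simulated.

```r
## Hypothetical stand-in data (simulated, not the Cleary sample)
set.seed(6290)
Toy <- data.frame(
  HSR   = runif(100, min = 0, max = 100),
  Group = factor(rep(c("Black", "White"), each = 50))
)
Toy$GPA <- 1 + 0.02 * Toy$HSR + rnorm(100, sd = 0.3)

## HSR * Group expands to HSR + Group + HSR:Group
InteractionModel <- lm(GPA ~ HSR * Group, data = Toy)

## Four coefficients: intercept, HSR slope, Group shift, and the slope difference
coef(InteractionModel)
```

In this parameterization the `HSR` coefficient is the slope for the reference group only, and `HSR:GroupWhite` is how much the slope changes for the other group.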


## 7) Interpret the interaction effect model results

Examine the estimated parameters in our regression model then address both prompts below:

(a) State the null and alternative hypotheses for each parameter estimate

(b) Identify which parameters, if any, are statistically significant given an $\alpha$ level of 0.05.

(a) H0: $\beta_i = 0$ \\
 Ha: $\beta_i \neq 0$

(b) HSR is not statistically significant.

# 8) Compare the main- and interaction-effects models

Use the `anova()` function to examine these nested models. Based on the output and an $\alpha$ significance level of 0.05, determine which model is a better representation of the data.
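A sketch of the comparison on simulated stand-in data (all object names are hypothetical). `anova()` expects the simpler model first, and it tests whether the extra interaction term reduces the residual sum of squares by more than chance would predict.

```r
## Hypothetical stand-in data (simulated, not the Cleary sample)
set.seed(6290)
Toy <- data.frame(
  HSR   = runif(100, min = 0, max = 100),
  Group = factor(rep(c("Black", "White"), each = 50))
)
Toy$GPA <- 1 + 0.02 * Toy$HSR + rnorm(100, sd = 0.3)

## Fit the two nested models
MainModel        <- lm(GPA ~ HSR + Group, data = Toy)
InteractionModel <- lm(GPA ~ HSR * Group, data = Toy)

## F-test for the one extra parameter in the interaction model
Comparison <- anova(MainModel, InteractionModel)
Comparison
```

If `Pr(>F)` in the second row is below 0.05, the interaction model fits significantly better; otherwise the simpler main-effects model is preferred.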

In [13]:
## Compare models


| Model | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|---|
| 1 | 197 | 109.5055 | | | | |
| 2 | 196 | 105.4679 | 1 | 4.037589 | 7.503397 | 0.006726323 |
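To see where the F statistic in this output comes from, the arithmetic can be reproduced by hand from the two residual sums of squares. The sketch below copies the (rounded) values from the output above, so its results match only to rounding error; the variable names are my own.

```r
## Values copied from the model comparison output (rounded)
RSS_main  <- 109.5055   # residual sum of squares, main-effects model
RSS_inter <- 105.4679   # residual sum of squares, interaction model
Df_main   <- 197        # residual df, main-effects model
Df_inter  <- 196        # residual df, interaction model

## Reduction in RSS from adding the interaction term, and the df it costs
SumSq  <- RSS_main - RSS_inter
DfDiff <- Df_main - Df_inter

## F = (reduction in RSS per extra parameter) / (residual variance of the larger model)
F_stat <- (SumSq / DfDiff) / (RSS_inter / Df_inter)

## Upper-tail p-value from the F distribution with (DfDiff, Df_inter) df
p_value <- pf(F_stat, DfDiff, Df_inter, lower.tail = FALSE)

F_stat          # approximately 7.50
p_value         # approximately 0.0067
p_value < 0.05  # TRUE, so the interaction term significantly improves fit
```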


# 9) Respond to our research question

**RQ**: Is selecting on High School Rank biased against a particular group?

Defend your answer with the data (i.e., tell me how you are using our models to make your conclusion).