# Exploratory Factor Analysis in R

In this assignment, we are going to use a simulated dataset. It builds on the Google Colab demonstration and the HTML demonstration posted to Canvas. You will be asked to identify the number of latent factors, assess latent factor influences on items, and interpret the results as they relate to revising our scales and investigating the construct validity of our scales.

## Learning objectives

By the completion of this assignment, you should be able to:

*   Identify and justify the number of latent factors in a dataset using scree plots and parallel analyses
*   Conduct exploratory factor analysis on a dataset
*   Interpret factor loadings, including identifying weak items and cross-loadings
*   Evaluate the internal (latent factor) structure and construct validity of our assessments

## Load packages

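As usual, we will need to install the `psych` package, then load the `psych` and `tidyverse` packages. We also install `GPArotation`, which `psych` calls for oblique rotations such as "oblimin" and "geominQ" (step 7).

In [None]:
## Install psych, plus GPArotation (required by psych for oblique rotations)
install.packages(c("psych", "GPArotation"))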

In [None]:
## Load packages
library(tidyverse)
library(psych)

## Load the data

These data were simulated. They include 100 responses (rows) to 10 items (columns). These data are continuous and close to *z*-scores (i.e., mean of 0, standard deviation ~1.0).

In [None]:
## Load data
## Set the URL to Casey's GitHub page where the dataset is located
FileURL <- "https://raw.githubusercontent.com/CaseyGio/Psyc6263/refs/heads/main/Datasets/EFA%20Assignment%20Data.csv"

## Read the csv file from GitHub and create a new object
EFAData <- read_csv(url(FileURL)) %>% select(starts_with("Item"))

## Check out the dataset
head(EFAData, n = 10)

## 1) Compute the descriptive statistics

Use the `describe()` function from the `psych` package to evaluate the means, standard deviations (SDs), and skew of our variables.
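As a minimal sketch of what `describe()` returns, the block below runs it on a small simulated dataset. The `sim_items` data frame and its values are illustrative assumptions, not the assignment data. The `mean`, `sd`, and `skew` columns are the ones this step asks about.

```r
library(psych)

# Illustrative data only: 200 simulated responses to 4 roughly z-scored items
set.seed(123)
sim_items <- data.frame(matrix(rnorm(200 * 4), ncol = 4))
names(sim_items) <- paste0("Item", 1:4)

# describe() returns one row per variable (n, mean, sd, median, skew, kurtosis, ...)
desc <- describe(sim_items)
desc

# Keep only the columns this step focuses on, as a plain data frame
key_stats <- as.data.frame(desc)[, c("mean", "sd", "skew")]
round(key_stats, 2)
```

For roughly *z*-scored items, means near 0, SDs near 1, and skew near 0 are what we would expect.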



In [None]:
## Calculate descriptive statistics


## 2) Conduct reliability analyses

Use the `alpha()` function from the `psych` package to conduct reliability analyses. Be sure to identify the following:

  A) Overall (standardized) alpha ($\alpha$)

  B) Items that might be dropped to improve the reliability of our assessment
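A minimal sketch of where these two quantities live in the `alpha()` output, using simulated data in which four items share one factor and a hypothetical `Item5` is pure noise (illustrative only, not the assignment data). `$total$std.alpha` holds the overall standardized alpha, and the `std.alpha` column of `$alpha.drop` gives alpha if each item were dropped; an item whose "if dropped" value exceeds the overall alpha is a candidate for removal. The call is written as `psych::alpha()` because `tidyverse` also loads `ggplot2`, which exports an unrelated `alpha()` function.

```r
library(psych)

# Illustrative data: Item1-Item4 load on one factor; Item5 is unrelated noise
set.seed(1)
n <- 200
eta <- rnorm(n)
sim_items <- data.frame(
  Item1 = 0.7 * eta + rnorm(n, sd = 0.7),
  Item2 = 0.7 * eta + rnorm(n, sd = 0.7),
  Item3 = 0.7 * eta + rnorm(n, sd = 0.7),
  Item4 = 0.7 * eta + rnorm(n, sd = 0.7),
  Item5 = rnorm(n)
)

rel <- psych::alpha(sim_items)

# A) Overall standardized alpha
rel$total$std.alpha

# B) Alpha if each item were dropped, compared with the overall value
drop_tab <- as.data.frame(rel$alpha.drop)[, c("raw_alpha", "std.alpha")]
drop_tab$improves <- drop_tab$std.alpha > rel$total$std.alpha
drop_tab
```

In this simulation only the noise item should show `improves = TRUE`; `psych` may also warn that it correlates negatively with the total.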

In [None]:
## Conduct alpha reliability analyses


## 3) Create the correlation matrix

Exploratory factor analyses use a correlation matrix as the main input. Create an object to store the correlation matrix for our analyses.
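A minimal sketch using an illustrative simulation (six items, with Item1-Item3 driven by one latent factor and Item4-Item6 by a second, uncorrelated factor; not the assignment data). Base R's `cor()` returns the item correlation matrix; with the assignment data the analogous call would be `cor(EFAData)`. The object name `R` is just a choice for this sketch.

```r
# Illustrative two-factor data
set.seed(2024)
n <- 300
eta <- matrix(rnorm(n * 2), ncol = 2)            # two uncorrelated latent factors
Lambda <- rbind(c(.7, 0), c(.7, 0), c(.7, 0),    # Item1-Item3 load on factor 1
                c(0, .7), c(0, .7), c(0, .7))    # Item4-Item6 load on factor 2
X <- eta %*% t(Lambda) + matrix(rnorm(n * 6, sd = 0.7), ncol = 6)
colnames(X) <- paste0("Item", 1:6)

# Pearson correlation matrix (use = "pairwise.complete.obs" would handle NAs)
R <- cor(X)
round(R, 2)
```

Items that share a factor should correlate around .50 here, while items on different factors should correlate near zero.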


In [None]:
## Create correlation matrix


## 4) Create the scree plot

We will examine two methods to determine the number of factors to retain. The first is the scree plot; create it with the `scree()` function from the `psych` package.
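A minimal sketch on an illustrative simulation of six items driven by two uncorrelated factors (not the assignment data). `scree()` accepts raw data or a correlation matrix and, by default, plots both the principal-component and the common-factor eigenvalues. The eigenvalues of the correlation matrix itself (the PC values) are also printed with `eigen()` so the "elbow" can be checked numerically; they sum to the number of items, because the trace of a correlation matrix equals its dimension.

```r
library(psych)

# Illustrative two-factor data (Item1-Item3 on factor 1, Item4-Item6 on factor 2)
set.seed(2024)
n <- 300
eta <- matrix(rnorm(n * 2), ncol = 2)
Lambda <- rbind(c(.7, 0), c(.7, 0), c(.7, 0),
                c(0, .7), c(0, .7), c(0, .7))
X <- eta %*% t(Lambda) + matrix(rnorm(n * 6, sd = 0.7), ncol = 6)
colnames(X) <- paste0("Item", 1:6)
R <- cor(X)

# Scree plot of PC and FA eigenvalues
scree(R)

# Eigenvalues of the correlation matrix (the PC line), largest first
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
round(ev, 2)
```

Here two eigenvalues should stand well above the rest, with a clear drop (the elbow) after the second.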


In [None]:
## Create the scree plot


## 5) Conduct parallel analysis

The second method is parallel analysis. Use the `fa.parallel()` function from the `psych` package. Remember to examine the FA Actual and FA Simulated data, **not** the PC Actual/Simulated data.
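A minimal sketch on an illustrative two-factor simulation (not the assignment data). Setting `fa = "fa"` (the options are "both", "pc", and "fa") limits the plot and the printed suggestion to the factor-analysis results, which are the ones this step asks about. The returned object's `nfact` element holds the suggested number of factors, and `fa.values` holds the actual FA eigenvalues. Because the comparison data are random, `set.seed()` makes the result reproducible. If a correlation matrix is passed instead of raw data, `n.obs` must also be supplied.

```r
library(psych)

# Illustrative two-factor data (Item1-Item3 on factor 1, Item4-Item6 on factor 2)
set.seed(2024)
n <- 300
eta <- matrix(rnorm(n * 2), ncol = 2)
Lambda <- rbind(c(.7, 0), c(.7, 0), c(.7, 0),
                c(0, .7), c(0, .7), c(0, .7))
X <- eta %*% t(Lambda) + matrix(rnorm(n * 6, sd = 0.7), ncol = 6)
colnames(X) <- paste0("Item", 1:6)

# Compare actual FA eigenvalues with those from random data of the same size
pa <- fa.parallel(X, fa = "fa")
pa$nfact                 # suggested number of factors
round(pa$fa.values, 2)   # actual FA eigenvalues
```

Factors are retained while the FA Actual eigenvalue stays above the corresponding FA Simulated eigenvalue.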

In [None]:
## Conduct parallel analysis


## 6) Determine the number of factors

Based on both methods above, determine how many factors we should retain in our dataset. Briefly defend your stance.

[You may edit this box to answer the prompt]

## 7) Conduct exploratory factor analysis

Using the `fa()` function from the `psych` package, conduct exploratory factor analysis. Remember to specify the following arguments:
*   `nfactors`: Set this argument to the number of factors identified above.
*   `rotate`: *If we need a factor rotation*, start with an oblique rotation, such as "oblimin" or "geominQ".
*   `fm`: Set the factoring method to "ols" (ordinary least squares)

Please print the factor **loadings** and, if we applied an oblique rotation, the factor correlation matrix (the Phi matrix; see the demonstration).
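A minimal sketch with an illustrative two-factor simulation (not the assignment data), assuming `GPArotation` is installed, since `psych` relies on it for "oblimin". `print(..., cutoff = 0.3)` blanks small loadings to make the pattern easier to read (0.3 is a display choice here, not a rule), and `$Phi` is the factor correlation matrix, which is only produced for oblique rotations with more than one factor.

```r
library(psych)   # rotate = "oblimin" also requires the GPArotation package

# Illustrative two-factor data (Item1-Item3 on factor 1, Item4-Item6 on factor 2)
set.seed(2024)
n <- 300
eta <- matrix(rnorm(n * 2), ncol = 2)
Lambda <- rbind(c(.7, 0), c(.7, 0), c(.7, 0),
                c(0, .7), c(0, .7), c(0, .7))
X <- eta %*% t(Lambda) + matrix(rnorm(n * 6, sd = 0.7), ncol = 6)
colnames(X) <- paste0("Item", 1:6)

# Two-factor EFA with an oblique rotation and OLS estimation
efa <- fa(X, nfactors = 2, rotate = "oblimin", fm = "ols")

print(efa$loadings, cutoff = 0.3)  # pattern loadings, small values blanked
round(efa$Phi, 2)                  # factor correlation (Phi) matrix
```

Because the simulated factors are uncorrelated, the off-diagonal Phi value should come out near zero here, which is the kind of result that would suggest an orthogonal solution is adequate.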

In [None]:
## Conduct EFA


## 8) Interpret our results

Based on all of our factor analysis results, interpret them by answering the following prompts:

* How many latent factors were found? If multiple factors were found, was the factor correlation sufficiently large to suggest that an oblique rotation was necessary?

* Which factor does each item best represent? Stated differently, which factor has the largest factor loading for each item?

* Are there any "bad" items (e.g., a weak primary factor loading and/or cross-loadings)? If so, identify them and briefly state how each is "bad."
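One way to make the last prompt concrete is to tabulate each item's primary factor and flag weak or cross-loading items. The sketch below uses a hypothetical helper, `flag_items()`, and a made-up loading matrix. Its thresholds are common rules of thumb assumed for illustration, not fixed standards: a primary loading below .40 counts as weak, and loadings of at least .30 on more than one factor count as a cross-loading. With real results, the input would be the `$loadings` element of the `fa()` output.

```r
# Hypothetical helper (not part of psych): summarize each item's primary
# factor and flag weak primary loadings and cross-loadings.
# Assumed thresholds: weak if |primary loading| < 0.40; cross-loading if
# |loading| >= 0.30 on more than one factor.
flag_items <- function(L, weak = 0.40, cross = 0.30) {
  L <- abs(unclass(L))                # works for a matrix or a psych loadings object
  primary <- apply(L, 1, which.max)   # column index of the largest loading
  primary_loading <- apply(L, 1, max)
  n_salient <- rowSums(L >= cross)    # number of factors at or above the cutoff
  data.frame(
    item = rownames(L),
    primary_factor = colnames(L)[primary],
    primary_loading = round(primary_loading, 2),
    weak = primary_loading < weak,
    cross_loading = n_salient > 1,
    row.names = NULL
  )
}

# Made-up loadings for illustration only
L <- matrix(c(0.72, 0.05,
              0.68, 0.10,
              0.25, 0.12,
              0.45, 0.41,
              0.03, 0.80),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("Item", 1:5), c("F1", "F2")))
flag_items(L)
```

With these made-up values, `Item3` is flagged as weak (its largest loading is .25) and `Item4` as cross-loading (.45 and .41).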

[You may edit this textbox to answer the prompts]