# Exploratory Factor Analysis in *Your* Assessments

This assignment continues our previous one. Instead of using simulated data, we will investigate the latent structure of our own psychometric assessments!

## Learning objectives

By the end of this assignment, you should be able to:

* Identify and justify the number of latent factors in a dataset using scree plots and parallel analysis
* Conduct exploratory factor analysis on a dataset
* Interpret factor loadings, including identifying weak items and cross-loadings
* Evaluate the internal (latent factor) structure and construct validity of our assessments

## Load packages

As usual, we need to install the `psych` package and then load the `psych` and `tidyverse` packages.

In [None]:
## Install psych
install.packages("psych")

In [None]:
## Load packages
library(tidyverse)
library(psych)

## Load the data

Remember that each group was given a unique identifier for its items; we will use this identifier to extract *your* group's data (i.e., items). The first chunk below loads all of the data from our class survey.

**Data Structure**
- These data are in "wide" format: each respondent occupies one row, and the columns hold that respondent's responses to each question/variable.

- All items have been recoded into numeric values for you. The lowest response option (e.g., "Strongly disagree") was coded as 1 and the highest (e.g., "Strongly agree") as the top value of the scale (e.g., 5 or 7).

- Columns represent individual items/variables.

In [None]:
## Load data
## Set the URL to Casey's GitHub page where the dataset is located
FileURL <- "https://raw.githubusercontent.com/CaseyGio/Psyc6263/refs/heads/main/Datasets/QualtricsData.csv"

## Read the csv file from GitHub and create a new object
QualtricsData <- read_csv(url(FileURL))

## View the first few rows of data
QualtricsData %>% head(n = 10)
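
A minimal sketch of pulling out one group's items, assuming (hypothetically) that each group's item columns share a name prefix such as `GRP1_`. Check `names(QualtricsData)` for your group's actual identifier. The small `toy_survey` data frame is made up and stands in for `QualtricsData` so the sketch runs on its own.

```r
## Hypothetical example: extract one group's items by a shared column-name prefix.
## The prefix "GRP1_" and the toy data are illustrative assumptions, not the real
## identifiers used in QualtricsData.
toy_survey <- data.frame(
  ResponseId = c("R1", "R2", "R3"),
  GRP1_Q1 = c(1, 4, 5),
  GRP1_Q2 = c(2, 5, 4),
  GRP2_Q1 = c(3, 3, 1)
)

extract_group_items <- function(data, prefix) {
  # Keep only the columns whose names start with the group's prefix
  data[, startsWith(names(data), prefix), drop = FALSE]
}

MyItems <- extract_group_items(toy_survey, "GRP1_")
names(MyItems)            # -> "GRP1_Q1" "GRP1_Q2"

## Quick check that every extracted item is numeric, as described above
sapply(MyItems, is.numeric)
```

With the tidyverse loaded, `QualtricsData %>% select(starts_with("GRP1_"))` does the same job once you substitute your group's real prefix.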

## 1) Review: Which items were previously identified as *potentially problematic*?

We have done several rounds of analyses on this dataset by now. Please draw upon past submissions (i.e., look back at your previous work) to remind us which items, across analyses, were identified as potentially problematic.

1a) Which items from the item analyses (i.e., descriptive statistics, item difficulty/discrimination) were identified as potentially problematic?

1b) Which items from the reliability analyses were identified as potentially problematic?

[You may edit this textbox to include your answers here]

## 2) Create a correlation matrix (without a total score)

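Exploratory factor analysis works from the correlation matrix of the observed items. Do not include a total/overall score in this matrix: it is not an item we collected but a composite computed from the items, so including it would add a redundant variable that correlates with the items by construction.

Create an object to store your correlation matrix.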
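
A minimal sketch on simulated responses: the four item columns, the `Total` column, and the single missing value are all made up for illustration. It drops the total score before calling base R's `cor()`; using `use = "pairwise.complete.obs"` to handle missing responses is an assumed choice, not a course requirement.

```r
## Simulated 5-point responses for four hypothetical items plus a total score
set.seed(123)
n <- 200
items <- as.data.frame(matrix(sample(1:5, n * 4, replace = TRUE), ncol = 4))
names(items) <- paste0("Item", 1:4)
items$Item2[1] <- NA                       # one missing response, for illustration
items$Total <- rowSums(items[, paste0("Item", 1:4)], na.rm = TRUE)

## Drop the total score: it is computed from the items, not an observed item
item_data <- items[, setdiff(names(items), "Total")]

## Correlation matrix of the observed items only
ItemCor <- cor(item_data, use = "pairwise.complete.obs")
round(ItemCor, 2)
```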

In [None]:
## Create correlation matrix


## 3) Create a Scree Plot

The first method we'll use to determine the number of latent factors is the scree plot.
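
A scree plot graphs the eigenvalues of the correlation matrix in descending order; we look for the "elbow" where the curve levels off and retain the factors before it. The sketch below computes the eigenvalues with base R's `eigen()` on simulated two-factor data (the loadings of 0.7 and the sample size of 300 are arbitrary assumptions) and plots them.

```r
## Illustrative data: 6 items driven by two uncorrelated latent factors
set.seed(42)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
noise_sd <- sqrt(1 - 0.7^2)                # keeps each item's variance near 1
sim_items <- cbind(
  sapply(1:3, function(i) 0.7 * f1 + rnorm(n, sd = noise_sd)),
  sapply(1:3, function(i) 0.7 * f2 + rnorm(n, sd = noise_sd))
)
colnames(sim_items) <- paste0("Item", 1:6)
SimCor <- cor(sim_items)

## Eigenvalues of the correlation matrix, largest first
scree_values <- eigen(SimCor, symmetric = TRUE, only.values = TRUE)$values

## Scree plot: look for the elbow where the eigenvalues level off
plot(scree_values, type = "b", pch = 19,
     xlab = "Factor number", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)                     # common reference line at eigenvalue = 1
```

Here the first two eigenvalues should stand well above the rest, mirroring the two simulated factors. The `psych` package also provides a `scree()` function that draws this kind of plot directly from a correlation matrix or data.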

In [None]:
## Create the scree plot


## 4) Conduct parallel analysis

The second method is parallel analysis. Use the `fa.parallel()` function from the `psych` package. Remember to examine the FA Actual and FA Simulated data, **not** the PC Actual/Simulated data.
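
Parallel analysis retains a factor only while its observed eigenvalue exceeds the eigenvalue expected from random data of the same size. The base R sketch below spells out that logic; it is a simplified stand-in, not `fa.parallel()` itself. As assumptions, it puts squared multiple correlations (SMCs) on the diagonal of the correlation matrix (a "reduced" matrix, roughly analogous to the FA lines), averages 20 random normal datasets, and uses simulated one-factor data. `fa.parallel()` estimates communalities from a factor solution and also resamples the data, so its numbers will differ somewhat.

```r
## Eigenvalues of the reduced correlation matrix (SMCs on the diagonal)
reduced_eigen <- function(R) {
  smc <- 1 - 1 / diag(solve(R))
  diag(R) <- smc
  eigen(R, symmetric = TRUE, only.values = TRUE)$values
}

## Simplified parallel analysis: compare observed eigenvalues with the mean
## eigenvalues from random normal data with the same n and number of items
simple_parallel <- function(data, n_iter = 20) {
  n <- nrow(data)
  p <- ncol(data)
  observed <- reduced_eigen(cor(data))
  simulated <- rowMeans(sapply(seq_len(n_iter), function(i) {
    reduced_eigen(cor(matrix(rnorm(n * p), nrow = n, ncol = p)))
  }))
  exceeds <- observed > simulated
  # Retain factors up to the first observed eigenvalue that falls below random
  n_factors <- if (all(exceeds)) p else which(!exceeds)[1] - 1
  list(observed = observed, simulated = simulated, n_factors = n_factors)
}

## Illustrative data: 6 items driven by a single latent factor (loadings of 0.7)
set.seed(2024)
n <- 1000
f <- rnorm(n)
one_factor_items <- sapply(1:6, function(i) 0.7 * f + rnorm(n, sd = sqrt(1 - 0.7^2)))

pa <- simple_parallel(one_factor_items)
round(cbind(Observed = pa$observed, Simulated = pa$simulated), 2)
pa$n_factors
```

When you run `fa.parallel()` itself, `fa = "fa"` limits the output to the factor-analysis lines, and `n.obs` must be supplied if you pass a correlation matrix instead of the raw item data.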

In [None]:
## Conduct parallel analysis


## 5) Determine the number of factors

Based on both methods above, determine how many factors to retain for your dataset. Briefly justify your choice.

[You may edit this text box for your answer]

## 6) Conduct exploratory factor analysis

Using the `fa()` function from the `psych` package, conduct exploratory factor analysis. Remember to specify the following arguments:
*   `nfactors`: Set this argument to the number of factors identified above.
*   `rotate`: *If a rotation is needed* (i.e., you retained two or more factors), start with an oblique rotation, such as "oblimin" or "geominQ".
*   `fm`: Set the factoring method to "ols".

Print the factor **loadings** and, if you applied an oblique rotation, the factor correlation matrix (the Phi matrix; see the demonstration).
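
A minimal sketch of the `fa()` call on simulated two-factor data; the loadings of 0.7, the factor correlation of about 0.4, and the sample size are arbitrary assumptions, not properties of our survey. In `psych`, the "oblimin" (and "geominQ") rotations rely on the `GPArotation` package, so install it too if `fa()` reports that it is missing. The `cutoff = 0.3` argument only hides small loadings to make the table easier to scan; it is a display choice, not a decision rule.

```r
## Sketch of an oblique EFA with psych::fa() on simulated two-factor data
## (run install.packages("GPArotation") first if oblimin reports it is missing)
library(psych)

set.seed(7)
n <- 500
f1 <- rnorm(n)
f2 <- 0.4 * f1 + sqrt(1 - 0.4^2) * rnorm(n)   # factors correlate at about 0.4
make_item <- function(f) 0.7 * f + rnorm(n, sd = sqrt(1 - 0.7^2))
efa_data <- data.frame(
  A1 = make_item(f1), A2 = make_item(f1), A3 = make_item(f1),
  B1 = make_item(f2), B2 = make_item(f2), B3 = make_item(f2)
)

## EFA on the correlation matrix; n.obs lets fa() report fit statistics
efa_fit <- fa(cor(efa_data), nfactors = 2, rotate = "oblimin",
              fm = "ols", n.obs = n)

## Factor loadings (pattern matrix), with small values hidden for readability
print(efa_fit$loadings, cutoff = 0.3)

## Factor correlation matrix (Phi), available after an oblique rotation
round(efa_fit$Phi, 2)
```

In this simulation the A items should load on one factor and the B items on the other, with a factor correlation near 0.4.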

In [None]:
## Conduct EFA


## 7) Interpret our results

Based on all of your factor analysis results, interpret them by answering the following prompts:

* How many latent factors did you find? If you found multiple factors, were the factor correlations large enough to suggest that an oblique rotation was necessary?

* Which factor does each item best represent? In other words, which factor has the largest loading for each item?

* Are there any "bad" items (e.g., a weak primary factor loading and/or cross-loadings)? If so, list those items and briefly explain how each one is "bad."
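
A minimal sketch for the last two prompts: given a loadings matrix, it finds each item's primary factor and flags weak primary loadings and cross-loadings. The loadings `L` are made-up numbers for illustration only, and the cutoffs (primary loading below 0.40 is weak; a second loading of 0.30 or more is a cross-loading) are assumed rules of thumb, so substitute the cutoffs used in class if they differ. `flag_items()` is a hypothetical helper, not a `psych` function.

```r
## Hypothetical pattern loadings (made-up numbers for illustration only)
L <- matrix(c( 0.72, 0.05,
               0.65, 0.10,
               0.28, 0.15,
               0.45, 0.38,
               0.02, 0.81,
              -0.08, 0.69),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("Item", 1:6), c("F1", "F2")))

## Hypothetical helper; the cutoffs are assumptions, adjust to the course's rules
flag_items <- function(loadings, weak_cut = 0.40, cross_cut = 0.30) {
  abs_l <- abs(loadings)
  # Primary factor = column with the largest absolute loading for each item
  primary <- apply(abs_l, 1, which.max)
  primary_load <- apply(abs_l, 1, max)
  # Second-largest absolute loading (0 when there is only one factor)
  second_load <- apply(abs_l, 1, function(x) {
    if (length(x) > 1) sort(x, decreasing = TRUE)[2] else 0
  })
  data.frame(
    primary_factor = colnames(loadings)[primary],
    primary_loading = round(primary_load, 2),
    weak = primary_load < weak_cut,
    cross_loading = second_load >= cross_cut,
    row.names = rownames(loadings),
    stringsAsFactors = FALSE
  )
}

flag_items(L)
```

With these made-up values, Item3 is flagged as weak and Item4 as cross-loading. For a real `fa()` result, pass `unclass()` of its `$loadings` element; the Phi matrix from the same result addresses the first prompt about factor correlations.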

[You may edit this textbox to answer the prompts]