# PRIMMDebug Student Survey Analysis

This notebook walks through the code used to analyse responses to the student survey taken after using PRIMMDebug. The goal of the survey was to understand students' perspectives on PRIMMDebug, which could then be triangulated with their actions within the PRIMMDebug tool.

The analysis is split into five sections:
1. Summary statistics across the whole dataset.
2. Establishing the variables we are looking to measure.
3. Factor analysis.
4. Correlation between variables.
5. An appendix of visualisations for each individual question response.

First, let us import the necessary libraries and data.

In [None]:
library(ltm)

survey_response_data <- read.table("pilot_survey_responses.csv", header=TRUE, sep=",")
all_quant_responses <- names(survey_response_data)[sapply(survey_response_data, is.numeric)]

## 1. Summary Statistics
We first report on overall statistics of the survey to give context to the responses.

In [None]:
cat("Number of student respondents:", nrow(survey_response_data), "\n")

cat("- Gender split (self-reported):\n")
cat("- Year group split (self-reported):\n")
cat("- Number of students per school:\n")
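
As a minimal sketch of how these splits could be tabulated, the snippet below uses a small made-up data frame. The column names `gender`, `year_group`, and `school` are hypothetical, since the actual column names in `pilot_survey_responses.csv` are not shown in this notebook.

```r
# Toy data standing in for the survey responses (hypothetical column names)
toy_responses <- data.frame(
    gender     = c("Female", "Male", "Female", "Prefer not to say", "Male"),
    year_group = c(7, 8, 8, 9, 7),
    school     = c("A", "A", "B", "B", "B")
)

# The number of respondents is the number of rows
n_respondents <- nrow(toy_responses)
cat("Number of student respondents:", n_respondents, "\n")

# table() counts how often each category occurs;
# useNA = "ifany" keeps unanswered questions visible in the counts
gender_split <- table(toy_responses$gender, useNA = "ifany")
year_split   <- table(toy_responses$year_group, useNA = "ifany")
school_split <- table(toy_responses$school, useNA = "ifany")

print(gender_split)
print(year_split)
print(school_split)
```

Swapping `toy_responses` for `survey_response_data` and the hypothetical names for the real ones should give the figures reported in this section.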

## 2. Establishment of Variables

We group the survey items into four variables based on their content:
- **Usability**: The usability of PRIMMDebug, borrowing items from the [System Usability Scale](https://en.wikipedia.org/wiki/System_usability_scale).
- **Utility: Restrictive Factors**: Perceived utility of the parts of the PRIMMDebug tool that restricted students' "programming autonomy".
- **Utility: PRIMMDebug Challenges**: Perceived utility of the PRIMMDebug challenges they attempted.
- **Utility: SIFFT**: Perceived utility of the SIFFT process that teachers taught for debugging.

Each of these has several Likert scale items associated with it. The internal consistency of each group of items, as well as of all the Likert scale responses together, is now calculated using Cronbach's alpha.
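
To make the statistic concrete, here is a minimal base R sketch of Cronbach's alpha computed by hand on made-up data, using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name `manual_cronbach_alpha` and the item names are illustrative only. This sketch drops incomplete rows, so its result may differ slightly from `cronbach.alpha` in `ltm` when responses are missing.

```r
# Toy Likert responses (1-5) for three items; values are made up for illustration
toy_items <- data.frame(
    item_a = c(4, 5, 3, 4, 2, 5),
    item_b = c(4, 4, 3, 5, 2, 5),
    item_c = c(3, 5, 2, 4, 1, 4)
)

manual_cronbach_alpha <- function(items) {
    # Keep only complete rows so that every variance uses the same respondents
    items <- items[complete.cases(items), ]
    k <- ncol(items)                        # number of items
    item_variances <- sapply(items, var)    # variance of each item
    total_variance <- var(rowSums(items))   # variance of the summed score
    (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

alpha_value <- manual_cronbach_alpha(toy_items)
print(alpha_value)
```

Values closer to 1 indicate that the items vary together, which is what we would expect if they measure the same underlying variable.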

In [None]:
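internal_consistency <- function(columns) {
    #' Calculates the internal consistency (Cronbach's alpha) of a set of columns in the survey data
    #' @param columns A character vector of column names to calculate the internal consistency of
    #' @return The result of cronbach.alpha, or NULL if fewer than two columns are given
    if (length(columns) < 2) {
        message("Skipping: at least two items are needed to calculate Cronbach's alpha.")
        return(NULL)
    }
    return(cronbach.alpha(survey_response_data[, columns], na.rm = TRUE))
}

# Survey items belonging to each variable (empty vectors are still to be filled in)
primmdebug_usability_questions <- c("Q1_1","Q1_2","Q1_3","Q1_4","Q1_5")
restrictive_factors_questions <- c()
primmdebug_challenge_questions <- c()
sifft_questions <- c()
sifft_utility_questions <- c("Pilot3","Q4","Q5")

print(internal_consistency(all_quant_responses))
print(internal_consistency(primmdebug_usability_questions))
print(internal_consistency(restrictive_factors_questions))
print(internal_consistency(primmdebug_challenge_questions))
print(internal_consistency(sifft_utility_questions))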

## 3. Factor Analysis
Each of these factors is now confirmed, and the relationships between them modelled, using confirmatory factor analysis.

*Are these attributes suitable for factor analysis?*
*If not, could I do exploratory --> confirmatory factor analysis?*
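
One possible way to approach the suitability question is sketched below in base R on simulated data. It computes Bartlett's test of sphericity by hand (a significant result suggests the items are correlated enough to factor) and then fits an exploratory model with `factanal` from the `stats` package. The item names, sample size, and two-factor structure are assumptions made purely for illustration, and the simulated scores are continuous rather than Likert responses. The confirmatory step is not shown, as it would need a dedicated package such as `lavaan`.

```r
set.seed(42)  # reproducible simulated data
n <- 200

# Two hypothetical latent traits, each driving three simulated items
usability <- rnorm(n)
utility   <- rnorm(n)
sim_items <- data.frame(
    use_1  = usability + rnorm(n, sd = 0.6),
    use_2  = usability + rnorm(n, sd = 0.6),
    use_3  = usability + rnorm(n, sd = 0.6),
    util_1 = utility + rnorm(n, sd = 0.6),
    util_2 = utility + rnorm(n, sd = 0.6),
    util_3 = utility + rnorm(n, sd = 0.6)
)

# Bartlett's test of sphericity: is the correlation matrix distinguishable
# from an identity matrix (i.e. are the items correlated at all)?
bartlett_sphericity <- function(items) {
    n_obs <- nrow(items)
    p <- ncol(items)
    cor_matrix <- cor(items)
    chi_sq <- -(n_obs - 1 - (2 * p + 5) / 6) * log(det(cor_matrix))
    df <- p * (p - 1) / 2
    list(chi_sq = chi_sq, df = df,
         p_value = pchisq(chi_sq, df, lower.tail = FALSE))
}

sphericity <- bartlett_sphericity(sim_items)
print(sphericity)

# Exploratory factor analysis with two factors (maximum likelihood estimation)
efa_fit <- factanal(sim_items, factors = 2, rotation = "varimax")
print(efa_fit$loadings)
```

If the loadings from an exploratory fit on the real data separate cleanly along the groups defined in Section 2, that would support carrying those groups forward into a confirmatory model.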

## 4. Correlation Between Variables
Can I perform correlation between self-created factors rather than individual Likert scale response items? How do I go about doing this if so?
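
One common approach, sketched here on made-up data, is to build a composite score per respondent for each variable (for example, the mean of its items) and then correlate the composites. The item names and groupings below are hypothetical. Spearman's rank correlation is used as an assumption-light choice for ordinal-derived scores; it is one option among several, not a prescribed method.

```r
# Made-up Likert responses (1-5) for two hypothetical groups of items
toy_survey <- data.frame(
    use_1  = c(4, 5, 3, 4, 2, 5, 3, 4),
    use_2  = c(4, 4, 3, 5, 2, 5, 2, 4),
    util_1 = c(3, 5, 2, 4, 1, 4, 3, 5),
    util_2 = c(4, 5, 3, 4, 2, 5, 2, 4)
)

# Composite score per respondent: the mean of the items in each group
usability_score <- rowMeans(toy_survey[, c("use_1", "use_2")], na.rm = TRUE)
utility_score   <- rowMeans(toy_survey[, c("util_1", "util_2")], na.rm = TRUE)

# Spearman's rank correlation between the two composites;
# exact = FALSE avoids warnings about tied ranks
composite_cor <- cor.test(usability_score, utility_score,
                          method = "spearman", exact = FALSE)
print(composite_cor)
```

Composite scores of this kind are only meaningful if the items in each group show acceptable internal consistency, which is what Section 2 checks.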



## 5. Visualisation of Individual Questions
We now provide some basic plots of the responses to individual Likert response items for completeness. Each question contains information on:
- The normality of the distribution.
- The skewness and kurtosis of the distribution.
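
As a minimal sketch of how these measures could be produced for one item, the base R snippet below uses made-up responses. The helper name `describe_distribution` is illustrative, and skewness and kurtosis are computed from their moment-based definitions rather than with an add-on package.

```r
# Made-up responses to a single Likert item (1-5)
toy_item <- c(4, 5, 3, 4, 2, 5, 4, 4, 3, 5, 1, 4)

describe_distribution <- function(x) {
    x <- x[!is.na(x)]
    n <- length(x)
    centred <- x - mean(x)
    m2 <- sum(centred^2) / n   # second central moment (population variance)
    m3 <- sum(centred^3) / n   # third central moment
    m4 <- sum(centred^4) / n   # fourth central moment
    list(
        skewness = m3 / m2^1.5,              # 0 for a symmetric distribution
        excess_kurtosis = m4 / m2^2 - 3,     # 0 for a normal distribution
        shapiro_p = shapiro.test(x)$p.value  # Shapiro-Wilk test of normality
    )
}

item_summary <- describe_distribution(toy_item)
print(item_summary)

# Bar plot of the response counts across the full 1-5 scale
barplot(table(factor(toy_item, levels = 1:5)),
        xlab = "Response", ylab = "Count")
```

Applying the helper to each column named in `all_quant_responses`, for example with `lapply`, would give one summary per question.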