# LACA for the Written Responses
This notebook documents the LLM-Assisted Content Analysis (LACA) procedure used to analyse the written responses in the PRIMMDebug data.

One research question in the initial study focused on students' engagement with the reflective aspect of debugging. We collected a large number of written responses from students' interactions with the tool, and we wanted to analyse these to answer the research question "*What patterns/behaviours emerge from forced articulation during students' debugging process?*"

LLM-Assisted Content Analysis (LACA) was applied using .... .

From a data perspective:
1. Filter empty responses (report number of these)
2. Filter out responses containing no valid words
    - Problem: this may discard almost-correct answers that contain typos.
    - Problem: this doesn't remove responses that contain valid words but are still complete rubbish.
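
The two filtering steps above could be sketched as follows. This is only an illustration under stated assumptions: `knownWords` is a tiny hypothetical stand-in for a real dictionary, `filterResponses` and its helpers are names invented here, and treating whitespace-only text as empty is an assumption rather than something the notebook specifies.

```typescript
// Hypothetical stand-in for a real dictionary; a real run would load a full word list.
const knownWords: Set<string> = new Set(["the", "loop", "variable", "is", "wrong", "print", "error"]);

// Summary of what the two filtering steps removed and kept.
interface FilterResult {
    emptyCount: number;        // responses removed in step 1
    noValidWordCount: number;  // responses removed in step 2
    kept: string[];            // responses that survive both steps
}

// Step 1: a response is empty if it has no characters other than whitespace (assumption).
function isEmptyResponse(text: string): boolean {
    return text.trim() === "";
}

// Step 2: a response is kept if at least one of its tokens is in the word list.
function containsValidWord(text: string, words: Set<string>): boolean {
    const tokens: string[] = text.toLowerCase().match(/[a-z']+/g) ?? [];
    return tokens.some((token) => words.has(token));
}

// Apply both steps in order and count how many responses each step removes.
function filterResponses(texts: string[], words: Set<string>): FilterResult {
    const nonEmpty = texts.filter((text) => !isEmptyResponse(text));
    const kept = nonEmpty.filter((text) => containsValidWord(text, words));
    return {
        emptyCount: texts.length - nonEmpty.length,
        noValidWordCount: nonEmpty.length - kept.length,
        kept,
    };
}
```

Note that this sketch shows both problems listed above: a response such as "teh lopp" is dropped because neither token matches the word list, while a string of unrelated valid words would be kept.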

Criteria for categories:
- Cannot be exercise-specific
- Definitely something for null responses (although I have a feeling this can be further categorised)

Things to bear in mind:
- Remember that some data will include incomplete responses due to accidentally pressing enter.

## Loading and importing
Before starting any content analysis, we import the necessary packages and summarise the scale of the data.

In [None]:
import { parse } from "csv-parse/sync";
import fs from "node:fs";
import path from "node:path";
import aitomics from "npm:aitomics";

import { WrittenResponse } from "../written_response_analysis/written-response.ts";
import { DebuggingStage } from "../written_response_analysis/debugging-stage.ts";

// Load data
// Load any custom types
// Load LACA
// Load codebook (previously generated in Atlas.ti)

// Read the CSV file
const responses: WrittenResponse[] = [];

const csvData = fs.readFileSync("../data/written_responses.csv", "utf8");
const records = parse(csvData, { delimiter: ",", from_line: 1 });
// Start at index 1 to skip the first record (assumed to be the header row)
for (let i: number = 1; i < records.length; i++) {
    const record = records[i];
    // Columns 1 to 3 are passed to WrittenResponse; column 0 is not used
    responses.push(new WrittenResponse(record[1], record[2], record[3]));
}


## Breaking Down and Sampling
The written responses can be categorised by stage. Although the content of each relates to reflective debugging, each stage asks a particular question. As a result, I should look over a certain proportion of each stage's reflections when constructing the codebook. This is *stratified sampling*, where the responses to each PRIMMDebug stage form a separate stratum. Random sampling is then performed within each stratum.
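
As a rough illustration of the sampling idea, the sketch below groups items by stratum and draws a random sample of a fixed proportion from each group. The function name `stratifiedSample`, the `proportion` parameter, and the choice to round sample sizes up are all assumptions made for this example; the notebook does not state the proportion actually used.

```typescript
// Group items into strata by key, then draw a simple random sample
// of the given proportion from each stratum (sizes rounded up).
// `rng` must return numbers in [0, 1); it is injectable so a seeded generator can be used.
function stratifiedSample<T>(
    items: T[],
    stratumOf: (item: T) => string,
    proportion: number,
    rng: () => number = Math.random,
): Map<string, T[]> {
    if (proportion <= 0 || proportion > 1) {
        throw new RangeError("proportion must be in the range (0, 1]");
    }

    // Build the strata: one array of items per key.
    const strata = new Map<string, T[]>();
    for (const item of items) {
        const key = stratumOf(item);
        const group = strata.get(key) ?? [];
        group.push(item);
        strata.set(key, group);
    }

    // Shuffle a copy of each stratum (Fisher-Yates) and keep the first `size` items.
    const sample = new Map<string, T[]>();
    for (const [key, group] of strata) {
        const shuffled = [...group];
        for (let i = shuffled.length - 1; i > 0; i--) {
            const j = Math.floor(rng() * (i + 1));
            [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
        }
        const size = Math.ceil(group.length * proportion);
        sample.set(key, shuffled.slice(0, size));
    }
    return sample;
}
```

With the notebook's own types, a call might look like `stratifiedSample(nonEmptyResponses, (r) => String(r.getDebuggingStage()), 0.5)`, where `0.5` is a placeholder proportion. Sampling within each stratum separately ensures that stages with few responses are still represented in the coded subset.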

This cell divides up the responses by PRIMMDebug stage to work out how many responses to code for each stratum.

In [None]:
console.log(`Total number of log data responses: ${responses.length}`);
for (const stage of [DebuggingStage.predict, DebuggingStage.spot_the_defect, DebuggingStage.inspect_the_code, DebuggingStage.fix_the_error]) {
    // Both filters return arrays of responses; the counts come from their lengths
    const stageResponses: WrittenResponse[] = responses.filter((response) => response.getDebuggingStage() === stage);
    const emptyResponses: WrittenResponse[] = stageResponses.filter((response) => response.getResponse() === "");
    if (emptyResponses.length > 0) {
        console.log(`- Number of responses for ${stage}: ${stageResponses.length} (${emptyResponses.length} empty)`);
    }
    else {
        console.log(`- Number of responses for ${stage}: ${stageResponses.length}`);
    }
}

const nonEmptyResponses: WrittenResponse[] = responses.filter(response => response.getResponse() !== "");