# PRIMMDebug Log Data Analysis Notebook
This notebook displays all of the analysis of the log data that took place in the PRIMMDebug initial research paper.

The log data was collected from five schools between December 2024 and February 2025. The notebook is divided into the following sections:
1. **Summary statistics:** ...
2. **Establishing variables:**...
3. **Visualisation of variables:**...
4. **Students' written responses:**...

All you need to do is run the notebooks in order and the statistics that appear in the paper will be displayed. If there are any issues, please report them in the [Issues section of the GitHub repository](https://github.com/LaurieGale10/primmdebug-log-data-analysis/issues).

Before running anything else, we import the necessary modules and load the exercise, stage log, exercise log, and student ID data from the `data` directory.

In [None]:
from classes import ExerciseLog, StageLog, StudentId
from classes.exercise_classes import Exercise
from classes.processors.ExerciseLogProcessor import ExerciseLogProcessor
from classes.processors.StageLogProcessor import StageLogProcessor

from loading_services.fetch_logs_from_file import fetch_data_from_json

from constants import *
from notebook_utils import *
from loading_services.parse_logs import *

import plotly.express as px
import datetime
from collections import Counter

exercises: list[Exercise] = parse_exercises(fetch_data_from_json("data/exercises"))
stage_logs: list[StageLog] = parse_stage_logs(fetch_data_from_json("data/stage_logs"))
exercise_logs: list[ExerciseLog] = parse_exercise_logs(stage_logs, fetch_data_from_json("data/exercise_logs"))
student_ids: list[StudentId] = parse_student_ids(fetch_data_from_json("data/student_ids"))

## Summary Statistics

### Exercise/Stage Logs
This section reports summary statistics that indicate the scale of the data we collected. We report below on:
- Number of exercises (that contain at least one completed PRIMMDebug stage)
  - Successful
  - Unsuccessful
  - Completed
  - Per each PRIMMDebug challenge
- Number of PRIMMDebug stages.
- Time of data collection


In [4]:
print(f"Number of attempted PRIMMDebug challenges: {len(exercise_logs)}")

#Placeholder: the resolved/unresolved counts are hard-coded to 0 in this cell, so both lines below print 0
number_successful_exercises: int = 0
print(f"- Number of resolved PRIMMDebug challenges: {display_percentage_string(number_successful_exercises, len(exercise_logs))}")
number_unsuccessful_exercises: int = 0
print(f"- Number of unresolved PRIMMDebug challenges: {display_percentage_string(number_unsuccessful_exercises, len(exercise_logs))}")

number_completed_exercises: int = len([exercise_log for exercise_log in exercise_logs if ExerciseLogProcessor.get_last_stage(exercise_log).stage_name == DebuggingStage.modify])
print(f"- Number of entirely completed PRIMMDebug challenges (where students reached the Make stage of PRIMMDebug): {display_percentage_string(number_completed_exercises, len(exercise_logs))}")

final_program_states: list[bool] = [ExerciseLogProcessor.is_final_program_erroneous(exercise) for exercise in exercise_logs]
number_successful_final_program_states: int = len([final_program_state for final_program_state in final_program_states if final_program_state])
print(f"- Proportion of PRIMMDebug challenges where last program run successfully executed: {display_percentage_string(number_successful_final_program_states, len(exercise_logs))}\n")

total_time: float = sum([ExerciseLogProcessor.get_time_on_exercise(exercise_log) for exercise_log in exercise_logs])
print(f"Total time on PRIMMDebug challenges: {datetime.timedelta(seconds=total_time)}\n")

print(f"Number of completed PRIMMDebug stages: {len(stage_logs)}")

#Number of attempts at each PRIMMDebug challenge
challenge_attempts: dict[str, int] = {}
for exercise_log in exercise_logs:
    challenge_attempts[exercise_log.exercise_name] = challenge_attempts.get(exercise_log.exercise_name, 0) + 1
challenge_attempts = dict(sorted(challenge_attempts.items(), key=lambda item: item[1], reverse=True)) #Sort by frequency
challenge_attempts_fig = px.bar(x = list(challenge_attempts.keys()), y = list(challenge_attempts.values()), labels = {"x": "Challenge Name", "y": "Frequency"})
challenge_attempts_fig.show()

#Number of challenges attempted by each student
challenges_per_student: dict[str, int] = {}
for exercise in exercise_logs:
    student_id: str = exercise.student_id
    challenges_per_student[student_id] = challenges_per_student.get(student_id, 0) + 1
challenges_per_student_fig = px.histogram(list(challenges_per_student.values()), marginal="box", labels = {"value": "Attempted challenges per student", "count": "Frequency"})
challenges_per_student_fig.show()

Number of attempted PRIMMDebug challenges: 336
- Number of resolved PRIMMDebug challenges: 0/336 (0.00%)
- Number of unresolved PRIMMDebug challenges: 0/336 (0.00%)
- Number of entirely completed PRIMMDebug challenges (where students reached the Make stage of PRIMMDebug): 47/336 (13.99%)
- Proportion of PRIMMDebug challenges where last program run successfully executed: 245/336 (72.92%)

Total time on PRIMMDebug challenges: 1 day, 20:36:56.082000

Number of completed PRIMMDebug stages: 3891
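
The counts above are formatted as `count/total (percentage%)` by `display_percentage_string`, which is imported from `notebook_utils` and not shown in this notebook. A minimal sketch of a helper that produces the same format is given below. The function name, the two-decimal rounding, and the guard for an empty denominator are assumptions for illustration, not the repository's confirmed implementation.

```python
def format_count_with_percentage(count: int, total: int) -> str:
    """Hypothetical stand-in for display_percentage_string from notebook_utils."""
    if total == 0:
        # Assumption: report 0.00% rather than raising ZeroDivisionError
        return f"{count}/{total} (0.00%)"
    return f"{count}/{total} ({count / total * 100:.2f}%)"


# Reproduces the format of the "entirely completed" line in the output above
print(format_count_with_percentage(47, 336))  # 47/336 (13.99%)
```

Returning a ready-made string keeps each `print` call short, at the cost of the caller no longer having the raw percentage available for further calculation.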


### Student Demographics

Number of students:
- By gender
- By year group
- By school


In [None]:
print(f"Number of participating students: {len(student_ids)}")

gender_split = get_gender_split()
gender_split_fig = px.bar(x = list(gender_split.keys()), y = list(gender_split.values()), labels = {"x": "Gender", "y": "Frequency"})
gender_split_fig.show()

year_group_split = get_year_group_split()
year_group_split_fig = px.bar(x = list(year_group_split.keys()), y = list(year_group_split.values()), labels={"x": "Year Group", "y": "Frequency"})
year_group_split_fig.show()

school_split = get_school_split()
school_split_fig = px.bar(x = list(school_split.keys()), y = list(school_split.values()), labels={"x": "School", "y": "Frequency"})
school_split_fig.show()

## Establishing Variables
Now we move on to introducing the variables that underpin our log data analysis. These include:

### Time Taken
- Per challenge attempt
- Per stage

In [5]:
#Time taken per PRIMMDebug challenge attempt
time_per_challenge_attempt: list[float] = [ExerciseLogProcessor.get_time_on_exercise(exercise) for exercise in exercise_logs if hasattr(exercise,"end_time")]
time_per_challenge_fig = px.histogram(time_per_challenge_attempt, marginal="box", labels={"value": "Time taken (seconds)", "count": "Count"})
time_per_challenge_fig.show()

#Time taken per PRIMMDebug stage
time_per_stage: list[float] = [time_on_stage for stage in stage_logs if (time_on_stage := StageLogProcessor.get_time_on_stage(stage)) is not None]
time_per_stage_fig = px.histogram(time_per_stage, marginal="box", labels={"value": "Time taken (seconds)", "count": "Count"})
time_per_stage_fig.show()

AttributeError: module 'classes.processors.StageLogProcessor' has no attribute 'get_time_on_stage'
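
`ExerciseLogProcessor.get_time_on_exercise` and `StageLogProcessor.get_time_on_stage` are defined in the repository and not shown here. As an illustration of the underlying idea only, the sketch below computes elapsed seconds from a pair of timestamps and returns `None` when no end time was recorded, which is the kind of case the cell above filters out. The `TimedLog` class, its field names, and `elapsed_seconds` are hypothetical; the real `ExerciseLog` and `StageLog` classes may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TimedLog:
    """Hypothetical log record with a start time and an optional end time."""
    start_time: datetime
    end_time: Optional[datetime] = None


def elapsed_seconds(log: TimedLog) -> Optional[float]:
    """Return the duration in seconds, or None if the log was never closed."""
    if log.end_time is None:
        return None
    return (log.end_time - log.start_time).total_seconds()


logs = [
    TimedLog(datetime(2025, 1, 10, 9, 0, 0), datetime(2025, 1, 10, 9, 2, 30)),
    TimedLog(datetime(2025, 1, 10, 9, 5, 0)),  # no end time recorded
]

# Keep only the durations that could be computed
durations = [t for log in logs if (t := elapsed_seconds(log)) is not None]
print(durations)  # [150.0]
```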

### Correctness of exercise
- Per challenge
- Per student

In [None]:
print("Correctness of PRIMMDebug challenges:")
print("- Per PRIMMDebug challenge")
print("- Per student")

### Number of stages taken for a PRIMMDebug challenge
- Per exercise
- Per student

In [None]:
from statistics import median

#Number of stages per PRIMMDebug challenge attempt
stages_per_challenge_attempt: list[int] = [len(exercise.stage_logs) for exercise in exercise_logs]
stages_per_challenge_fig = px.histogram(stages_per_challenge_attempt, marginal="box", labels={"value": "Number of stages"})
stages_per_challenge_fig.show()

#Median number of stages that each student took on the PRIMMDebug challenges they attempted
average_stages_per_student: list[float] = [] #median() returns a float when a student has an even number of attempts
for student in student_ids:
    student_exercise_logs: list[ExerciseLog] = [exercise for exercise in exercise_logs if exercise.student_id == student.id]
    if len(student_exercise_logs) > 0:
        average_stages_per_student.append(median([len(exercise.stage_logs) for exercise in student_exercise_logs]))
average_stages_per_student_fig = px.histogram(average_stages_per_student, marginal="box", labels={"value": "Median number of stages per student", "count": "Count"})
average_stages_per_student_fig.show()
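
The per-student loop above rescans `exercise_logs` once for every student. A single-pass grouping that yields the same medians is sketched below on made-up data. The `(student_id, number_of_stages)` tuples stand in for `ExerciseLog` objects and are purely illustrative.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: one (student_id, number_of_stages) pair per challenge attempt
attempts = [("s1", 4), ("s1", 8), ("s2", 5), ("s1", 6), ("s2", 7)]

# Group stage counts by student in one pass over the attempts
stages_by_student: dict[str, list[int]] = defaultdict(list)
for student_id, number_of_stages in attempts:
    stages_by_student[student_id].append(number_of_stages)

# Median number of stages per student
median_stages = {student_id: median(counts) for student_id, counts in stages_by_student.items()}
print(median_stages)  # {'s1': 6, 's2': 6.0}
```

One difference from the cell above: this grouping includes every student ID that appears in the logs, whereas the original loop only considers students listed in `student_ids`.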

## Exercise Log Stats
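This section reports the final stage reached in each challenge attempt, the time spent focused on the PRIMMDebug window, and how often the test case panes were viewed.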

In [None]:
#Final stage of PRIMMDebug challenge attempts
challenge_end_stages: dict[str, int] = dict(Counter([ExerciseLogProcessor.get_last_stage(exercise_log).stage_name.name for exercise_log in exercise_logs]))
final_stage_fig = px.bar(x = list(challenge_end_stages.keys()), y = list(challenge_end_stages.values()), labels = {"x": "Final stage of PRIMMDebug", "y": "Frequency"})
final_stage_fig.show()

#Time spent focused on PRIMMDebug window per exercise
time_spent_focused: list[float] = [ExerciseLogProcessor.get_time_focused(exercise) for exercise in exercise_logs]
time_spent_focused_fig = px.histogram(time_spent_focused, marginal="box", labels={"value": "100% Time spent focused on PRIMMDebug window"})
time_spent_focused_fig.show()

#Challenge attempts where test case panes were viewed
exercises_with_test_case_views: float = len([exercise_log for exercise_log in exercise_logs if ExerciseLogProcessor.were_test_cases_viewed(exercise_log)]) / len(exercise_logs) * 100
print(f"Percentage of exercises where test cases were viewed at some point: {exercises_with_test_case_views:.2f}%")
inspect_the_code_test_case_views: int = len([exercise_log for exercise_log in exercise_logs if ExerciseLogProcessor.were_test_cases_viewed(exercise_log, [DebuggingStage.inspect_code])])
print(f"- In the Inspect the Code stage: {display_percentage_string(inspect_the_code_test_case_views, len(exercise_logs))}") #display_percentage_string already appends the percentage
test_stage_test_case_views: int = len([exercise_log for exercise_log in exercise_logs if ExerciseLogProcessor.were_test_cases_viewed(exercise_log, [DebuggingStage.test])])
print(f"- In the Test stage: {display_percentage_string(test_stage_test_case_views, len(exercise_logs))}")
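
The final-stage tally at the top of the cell above relies on `Counter` applied to enum member names. The sketch below shows that pattern in isolation. The `Stage` enum is a hypothetical subset of `DebuggingStage` that reuses member names appearing in this notebook; its values and the sample data are made up.

```python
from collections import Counter
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical subset of DebuggingStage, for illustration only."""
    inspect_code = auto()
    find_error = auto()
    test = auto()
    modify = auto()


# Made-up final stages for four challenge attempts
last_stages = [Stage.test, Stage.modify, Stage.test, Stage.inspect_code]

# Count how often each stage was the last one reached, keyed by member name
end_stage_counts = dict(Counter(stage.name for stage in last_stages))
print(end_stage_counts)  # {'test': 2, 'modify': 1, 'inspect_code': 1}
```

Stages that never appear as a final stage are simply absent from the result, so a bar chart built from it will not show zero-height bars for them.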

## Stage Log Stats
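This section reports how many Inspect the Code stages contain no response and how many Find the Error stages contain a correct response.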

In [None]:
number_inspect_code_stages: int = len([stage_log for stage_log in stage_logs if stage_log.stage_name == DebuggingStage.inspect_code])
number_no_response_inspect_code_stages: int = len([stage_log for stage_log in stage_logs if stage_log.stage_name == DebuggingStage.inspect_code and StageLogProcessor.does_inspect_the_code_contain_response(stage_log) is False])
print(f"Number of inspect the code stages which contain no response: {display_percentage_string(number_no_response_inspect_code_stages, number_inspect_code_stages)}")

find_error_stages_with_correct_field: list[StageLog] = [stage_log for stage_log in stage_logs if stage_log.stage_name == DebuggingStage.find_error and stage_log.correct is not None]
correct_find_error_stages: int = len([stage_log for stage_log in find_error_stages_with_correct_field if stage_log.correct])
print(f"Number of find the error stages where the correct response was entered (for challenges where students had to pinpoint a line): {display_percentage_string(correct_find_error_stages, len(find_error_stages_with_correct_field))}")


## Program Log Stats
For relevant PRIMMDebug stages that contain program logs

In [None]:
number_of_runs_inspect_the_code_and_test: list[int] = [StageLogProcessor.get_number_of_runs(stage_log) for stage_log in stage_logs if stage_log.stage_name in [DebuggingStage.inspect_code, DebuggingStage.test] and StageLogProcessor.get_number_of_runs(stage_log) > 0] #Remove stages where there's 0 runs
number_of_runs_inspect_the_code: list[int] = [StageLogProcessor.get_number_of_runs(stage_log) for stage_log in stage_logs if stage_log.stage_name == DebuggingStage.inspect_code and StageLogProcessor.get_number_of_runs(stage_log) > 0]
number_of_runs_test: list[int] = [StageLogProcessor.get_number_of_runs(stage_log) for stage_log in stage_logs if stage_log.stage_name == DebuggingStage.test and StageLogProcessor.get_number_of_runs(stage_log) > 0]
number_of_runs_fig = px.histogram(number_of_runs_inspect_the_code_and_test, marginal="box", labels={"value": "Number of runs"})
number_of_runs_fig.show()

time_between_runs: list[float] = [time for stage_log in stage_logs if stage_log.stage_name in [DebuggingStage.inspect_code, DebuggingStage.test] for time in StageLogProcessor.get_time_between_runs(stage_log)] #Stages with an empty list of gaps contribute nothing
time_between_runs_fig = px.histogram(time_between_runs, marginal="box", labels={"value": "Time between runs (seconds)"})
time_between_runs_fig.show()

runs_per_minute: list[float] = [round(StageLogProcessor.get_runs_per_minute(stage_log), 2) for stage_log in stage_logs if stage_log.stage_name in [DebuggingStage.inspect_code, DebuggingStage.test]]
print(f"Runs per minute for inspect the code/test stages: {runs_per_minute}")

number_of_inputs: list[list[int]] = [StageLogProcessor.get_number_of_inputs_from_runs(stage_log) for stage_log in stage_logs if stage_log.stage_name in [DebuggingStage.inspect_code, DebuggingStage.test]]
print(f"Number of inputs per stage for inspect the code/test stages: {number_of_inputs}")
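
`StageLogProcessor.get_time_between_runs` and `StageLogProcessor.get_runs_per_minute` are defined in the repository and not shown here. The sketch below illustrates one plausible way such measures can be derived from run timestamps. The function names, the sorting step, and the zero-duration guard are assumptions, not the repository's implementation.

```python
from datetime import datetime


def gaps_between_runs(run_times: list[datetime]) -> list[float]:
    """Seconds between consecutive program runs; empty if fewer than two runs."""
    ordered = sorted(run_times)
    return [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]


def runs_per_minute(number_of_runs: int, stage_seconds: float) -> float:
    """Run rate over a stage; assumption: a non-positive duration yields 0.0."""
    if stage_seconds <= 0:
        return 0.0
    return number_of_runs / (stage_seconds / 60)


# Made-up timestamps for three runs within a 90-second stage
runs = [
    datetime(2025, 1, 10, 9, 0, 0),
    datetime(2025, 1, 10, 9, 0, 20),
    datetime(2025, 1, 10, 9, 1, 5),
]
print(gaps_between_runs(runs))         # [20.0, 45.0]
print(runs_per_minute(len(runs), 90))  # 2.0
```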

## Written Responses

For now, we group the written responses by stage name for inspection and report some summary statistics on them for context.

In [None]:
from save_logs import *

import nltk
from nltk.corpus import words
from nltk.tokenize import word_tokenize

nltk.download("words", quiet=True)
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

#save_written_responses(exercise_logs)

english_words = set(words.words("en"))  # Load English words into a set for fast lookup
written_responses: list[str] = [response for exercise_responses in [ExerciseLogProcessor.get_written_responses(exercise_log) for exercise_log in exercise_logs] for response in exercise_responses]
print(f"Number of written responses: {len(written_responses)}")

responses_with_valid_words: list[str] = []
responses_with_invalid_words: list[str] = []

for response in written_responses:
    tokens = word_tokenize(response.lower())  # Convert to lowercase for case-insensitive matching
    # Check if any token is a valid English word
    if any(token in english_words for token in tokens):
        responses_with_valid_words.append(response)
    else:
        responses_with_invalid_words.append(response)

print(f"Number of written responses that contain at least one valid English word: {len(responses_with_valid_words)}/{len(written_responses)} ({(len(responses_with_valid_words) / len(written_responses)) * 100:.2f}%)")
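
The check above counts a response as valid if any one of its tokens is a dictionary word. The same logic is sketched below using only the standard library, with a regular expression standing in for `word_tokenize` and a tiny hand-picked vocabulary standing in for the NLTK `words` corpus. Both substitutions, and the sample responses, are simplifications for illustration.

```python
import re


def contains_known_word(response: str, vocabulary: set[str]) -> bool:
    """True if at least one lowercase token of the response is in the vocabulary."""
    tokens = re.findall(r"[a-z']+", response.lower())
    return any(token in vocabulary for token in tokens)


# Made-up vocabulary and responses
vocabulary = {"the", "loop", "is", "wrong", "variable"}
responses = ["The loop is wrong", "asdfgh", "idk lol", "variable!!"]

valid = [response for response in responses if contains_known_word(response, vocabulary)]
print(f"{len(valid)}/{len(responses)}")  # 2/4
```

This criterion is lenient: a response made up mostly of filler still passes if it contains a single dictionary word, so the resulting proportion is best read as an upper bound on meaningful responses.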

Stage-specific analysis:
- Success rate of students who didn't write anything for inspect the code