# PRIMMDebug Log Data Analysis Notebook
This notebook contains all of the log data analysis reported in the initial PRIMMDebug research paper.

The log data was collected from five schools between December 2024 and February 2025. The notebook is divided into the following sections:
1. **Summary statistics:** ...
2. **Establishing variables:**...
3. **Visualisation of variables:**...
4. **Students' written responses:**...

All you need to do is run the cells in order, and the statistics that appear in the paper will be displayed. If there are any issues, please report them in the [Issues section of the GitHub repository](https://github.com/LaurieGale10/primmdebug-log-data-analysis/issues).

Before running anything else, we import the necessary modules and load the log data.

In [1]:
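# Classes representing the three kinds of log record, plus a helper for exercise logs
from classes.ExerciseLog import ExerciseLog
from classes.StageLog import StageLog
from classes.StudentId import StudentId
from classes.processors.ExerciseLogProcessor import ExerciseLogProcessor

# Wildcard import: supplies parse_stage_logs, parse_exercise_logs and parse_student_ids used below
from fetch_log_from_firebase import *
from fetch_logs_from_file import fetch_data_from_json

# Load the raw JSON exports and parse them into typed log objects
stage_logs: list[StageLog] = parse_stage_logs(fetch_data_from_json("data/stage_logs"))
# Exercise logs are parsed together with the stage logs.
# TODO: data cleaning should be done during parsing rather than here.
exercise_logs: list[ExerciseLog] = parse_exercise_logs(stage_logs, fetch_data_from_json("data/exercise_logs"))
student_ids: list[StudentId] = parse_student_ids(fetch_data_from_json("data/student_ids"))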


## Summary Statistics

This section reports the following summary statistics, which indicate the scale of the data we collected:
- Number of PRIMMDebug challenge attempts (those containing at least one completed PRIMMDebug stage)
  - Successful
  - Unsuccessful
  - Completed
- Number of completed PRIMMDebug stages
- Number of participating students
- Time of data collection
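The Successful and Unsuccessful counts are hard-coded to 0 in the cell that follows. A minimal sketch of how they could be derived, assuming (hypothetically) that each exercise log records whether the student reported fixing the error in a field called `reported_success`. Neither `ChallengeAttempt` nor that field comes from the notebook's `classes` package; they are illustrative stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChallengeAttempt:
    """Hypothetical stand-in for ExerciseLog; `reported_success` is an assumed field."""
    student_id: str
    reported_success: Optional[bool]  # None means the student gave no answer


def count_reported_outcomes(attempts: list[ChallengeAttempt]) -> tuple[int, int]:
    """Return (successful, unsuccessful) counts, following the notebook's wording:
    'unsuccessful' is every attempt where success was NOT reported, including no answer."""
    successful = sum(1 for attempt in attempts if attempt.reported_success is True)
    return successful, len(attempts) - successful


# Toy data for illustration only, not study data
attempts = [
    ChallengeAttempt("s1", True),
    ChallengeAttempt("s1", False),
    ChallengeAttempt("s2", None),
    ChallengeAttempt("s3", True),
]
print(count_reported_outcomes(attempts))  # (2, 2)
```

Counting "did not report success" as the complement keeps the two figures summing to the total number of attempts; a three-way split (reported success, reported failure, no answer) would be a straightforward extension.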


In [2]:
from collections import Counter

from constants import *  # get_gender_split, get_year_group_split, get_school_split

print(f"Number of attempted PRIMMDebug challenges: {len(exercise_logs)}")

# Placeholder: not yet computed from the logs
number_successful_exercises: int = 0
print(f"- Number of PRIMMDebug challenges where students reported successfully resolving the error they contained: {number_successful_exercises}")

# Placeholder: not yet computed from the logs
number_unsuccessful_exercises: int = 0
print(f"- Number of PRIMMDebug challenges where students did not report successfully resolving the error they contained: {number_unsuccessful_exercises}")

# Look up each attempt's final stage once; attempts without a final stage are skipped
last_stage_names: list[str] = [
    last_stage.stage_name
    for last_stage in (ExerciseLogProcessor.get_last_stage(exercise_log) for exercise_log in exercise_logs)
    if last_stage is not None
]

# An attempt counts as completed if its final logged stage is "modify"
number_completed_exercises: int = last_stage_names.count("modify")
print(f"- Number of entirely completed PRIMMDebug challenges (where students reached the Make stage of PRIMMDebug): {number_completed_exercises}\n")

print(f"Final stage of challenge attempts:\n{dict(Counter(last_stage_names))}\n")

print(f"Number of completed PRIMMDebug stages: {len(stage_logs)}")
print("- Number of these containing written responses from students: \n")  # Placeholder: not yet computed

print(f"Number of participating students: {len(student_ids)}")
print(f"- Gender split (self-reported):\n{get_gender_split()}")
print(f"- Year group split (self-reported):\n{get_year_group_split()}")
print(f"- Number of students per school:\n{get_school_split()}\n")

# Count challenge attempts per student
exercises_per_student: Counter[str] = Counter(exercise.student_id for exercise in exercise_logs)
print(f"Attempted challenges per student:\n{list(exercises_per_student.values())}")  # To be tabulated or visualised


Number of attempted PRIMMDebug challenges: 426
- Number of PRIMMDebug challenges where students reported successfully resolving the error they contained: 0
- Number of PRIMMDebug challenges where students did not report successfully resolving the error they contained: 0
- Number of entirely completed PRIMMDebug challenges (where students reached the Make stage of PRIMMDebug): 5

Final stage of challenge attempts:
{'find': 65, 'test': 202, 'fix': 18, 'spot': 13, 'run': 25, 'predict': 5, 'inspect': 3, 'modify': 5}

Number of completed PRIMMDebug stages: 3892
- Number of these containing written responses from students: 

Number of participating students: 139
- Gender split (self-reported):
{'Female': 0, 'Male': 0, 'Non-binary': 0, 'Other': 0, 'Prefer not to say': 0}
- Year group split (self-reported):
{'Year 7': 0, 'Year 8': 0, 'Year 9': 0, 'Year 10': 0, 'Year 11': 0}
- Number of students per school:
{'School A': 0, 'School B': 0, 'School C': 0, 'School D': 0, 'School E': 0}

Attempted c
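The per-student attempt counts are marked "To be tabulated or visualised". One possible tabulation, sketched on hypothetical toy IDs rather than the study data, is a frequency table (how many students made k attempts) plus a min/median/max summary:

```python
from collections import Counter
from statistics import median

# Hypothetical student IDs, one entry per challenge attempt (toy data, not study data)
attempt_student_ids = ["s1", "s1", "s1", "s2", "s3", "s3", "s4"]

attempts_per_student = Counter(attempt_student_ids)      # student -> number of attempts
distribution = Counter(attempts_per_student.values())    # number of attempts -> number of students

counts = list(attempts_per_student.values())
print(f"min={min(counts)}, median={median(counts)}, max={max(counts)}")
for n_attempts in sorted(distribution):
    print(f"{n_attempts} attempt(s): {distribution[n_attempts]} student(s)")
# min=1, median=1.5, max=3
# 1 attempt(s): 2 student(s)
# 2 attempt(s): 1 student(s)
# 3 attempt(s): 1 student(s)
```

The `distribution` counter is also a convenient input for a bar chart if a visualisation is preferred.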

## Establishing Variables
Next, we introduce the variables that underpin our log data analysis:
- Time taken
  - Per challenge attempt
  - Per stage
- Correctness of challenges
  - Per challenge
  - Per student
- Number of stages taken on a PRIMMDebug challenge
  - Per challenge
  - Per student
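Time taken could be computed along the following lines. This is a sketch only: `TimedStage`, `exercise_id`, `start_time` and `end_time` are hypothetical names, not fields confirmed to exist on `StageLog`. Here, time per challenge attempt is the sum of its stage durations, which ignores gaps between stages; subtracting the first stage's start from the last stage's end would give wall-clock time instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimedStage:
    """Hypothetical stand-in for StageLog; all field names are assumptions."""
    exercise_id: str
    stage_name: str
    start_time: datetime
    end_time: datetime

    @property
    def seconds(self) -> float:
        """Time spent on this stage, in seconds."""
        return (self.end_time - self.start_time).total_seconds()


def time_per_attempt(stages: list[TimedStage]) -> dict[str, float]:
    """Sum stage durations per challenge attempt (gaps between stages are ignored)."""
    totals: dict[str, float] = defaultdict(float)
    for stage in stages:
        totals[stage.exercise_id] += stage.seconds
    return dict(totals)


# Toy timestamps for illustration only
t = datetime(2025, 1, 15, 10, 0, 0)
stages = [
    TimedStage("e1", "predict", t, t.replace(second=40)),
    TimedStage("e1", "run", t.replace(minute=1), t.replace(minute=1, second=20)),
    TimedStage("e2", "predict", t.replace(minute=5), t.replace(minute=6)),
]
print([stage.seconds for stage in stages])  # [40.0, 20.0, 60.0]
print(time_per_attempt(stages))             # {'e1': 60.0, 'e2': 60.0}
```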

In [None]:
# Placeholders: these variables are not yet computed
print("Time taken:")
print("- Per PRIMMDebug challenge")
print("- Per PRIMMDebug stage")

print("Correctness of PRIMMDebug challenges:")
print("- Per PRIMMDebug challenge")
print("- Per student")

print("Number of stages taken on a PRIMMDebug challenge:")
print("- Per PRIMMDebug challenge")
print("- Per student")
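The stage-count variables could be derived by counting stage records per challenge attempt and then averaging those counts per student. A minimal sketch, using hypothetical `(student_id, exercise_id, stage_name)` tuples as a stand-in for the parsed logs:

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (student_id, exercise_id, stage_name) tuples, one per completed stage (toy data)
stage_records = [
    ("s1", "e1", "predict"), ("s1", "e1", "run"), ("s1", "e1", "spot"),
    ("s1", "e2", "predict"), ("s1", "e2", "run"),
    ("s2", "e3", "predict"), ("s2", "e3", "run"),
]

# Number of stages per challenge attempt, remembering which student made each attempt
stages_per_challenge: dict[str, int] = defaultdict(int)
student_of_challenge: dict[str, str] = {}
for student_id, exercise_id, _stage_name in stage_records:
    stages_per_challenge[exercise_id] += 1
    student_of_challenge[exercise_id] = student_id

# Group the per-attempt counts by student, then take each student's mean
counts_by_student: dict[str, list[int]] = defaultdict(list)
for exercise_id, n_stages in stages_per_challenge.items():
    counts_by_student[student_of_challenge[exercise_id]].append(n_stages)
mean_stages_per_student = {student: fmean(c) for student, c in counts_by_student.items()}

print(dict(stages_per_challenge))  # {'e1': 3, 'e2': 2, 'e3': 2}
print(mean_stages_per_student)     # {'s1': 2.5, 's2': 2.0}
```

Averaging per student first keeps a student with many attempts from dominating the per-student figure.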

Other ideas:



"""
Some basic exercise log data to get:
- Number of exercise logs
- Number of null exercise logs
- Number of non-null exercise logs
- Number of exercise logs for a certain PRIMMDebug challenge

Slightly more advanced:
- List/count of exercises that do/don't end in erroneous state
- Order of PRIMMDebug stages for a given log
- Exercises containing a modify stage
- Exercises containing a make stage

Others from Lucidchart:
- Exercises where end state of program is erroneous
- Exercises where end state of program is not erroneous
- Exercises/stages where user entered correct answer
- Exercises/stages where user entered incorrect answer
- Exercises/stages where user didn't enter response
- Stages where user's response was less than x characters
- Exercises containing a modify stage
- Exercises containing a make stage
- Exercises/stages where user focused out
- Exercises/stages where user viewed test cases
- Exercises/stages where user viewed hints
- Total time on log data

"""

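For "Order of PRIMMDebug stages for a given log" and "Exercises containing a modify/make stage", a sketch assuming each attempt's stage names are already sorted by time. The sequences are invented for illustration and are not meant to reflect the real PRIMMDebug stage order; `attempts_containing` is a hypothetical helper.

```python
from collections import Counter

# Made-up stage sequences per challenge attempt, assumed sorted by start time
stage_sequences: dict[str, list[str]] = {
    "e1": ["predict", "run", "spot", "fix", "test", "modify"],
    "e2": ["predict", "run"],
    "e3": ["predict", "run", "spot", "fix", "test", "fix", "test"],
}


def attempts_containing(stage_name: str, sequences: dict[str, list[str]]) -> list[str]:
    """IDs of attempts in which the given stage appears at least once."""
    return [exercise_id for exercise_id, seq in sequences.items() if stage_name in seq]


# Count consecutive stage-to-stage transitions across all attempts
transitions = Counter(pair for seq in stage_sequences.values() for pair in zip(seq, seq[1:]))

for exercise_id, seq in stage_sequences.items():
    print(exercise_id, " -> ".join(seq))
print(attempts_containing("modify", stage_sequences))  # ['e1']
print(transitions[("fix", "test")])                     # 3
```

Transition counts make revisits visible (here, `e3` returns from `test` to `fix`), which a simple "contains stage" check would miss.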

"""
Some basic stage log data to get:
- Number of stage logs
- Number of stage logs for a certain PRIMMDebug stage

Some slightly more advanced:
- Number of correct responses to the Find the Error stage.
- Inputs entered into run stages (for certain exercise)
- Number of Inspect the Code and/or Test stage logs where test case pane is viewed
- Number of Inspect the Code stages where response isn't empty
- Average time spent on PRIMMDebug stage (for every stage and particular stages)
- Stages where user's response was less than x characters

And more advanced:
- Overall % of time spent focused on PRIMMDebug window
"""

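Two of the stage-level ideas above, average time per PRIMMDebug stage and responses shorter than x characters, could be sketched as follows. The `(stage_name, seconds_spent, response_text)` tuples are hypothetical toy data, and `MIN_CHARS` is an illustrative value for the threshold the notes leave open.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (stage_name, seconds_spent, response_text) tuples; toy values only
stage_rows = [
    ("predict", 30.0, "It prints 5"),
    ("predict", 50.0, ""),
    ("find", 90.0, "line 3"),
    ("find", 150.0, "The loop never ends because i is not incremented"),
]

# Mean time spent per stage name
durations: dict[str, list[float]] = defaultdict(list)
for name, seconds, _response in stage_rows:
    durations[name].append(seconds)
mean_seconds = {name: fmean(values) for name, values in durations.items()}

# Responses shorter than the threshold (surrounding whitespace ignored)
MIN_CHARS = 10  # illustrative stand-in for "x characters"
short_responses = [(name, text) for name, _s, text in stage_rows if len(text.strip()) < MIN_CHARS]

print(mean_seconds)     # {'predict': 40.0, 'find': 120.0}
print(short_responses)  # [('predict', ''), ('find', 'line 3')]
```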
"""
Some student id data to get:
- Number of participating students
- Number of students from certain school
- Number of students who've logged in
- Number of students who haven't logged in"""
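The student ID counts could be sketched as below, assuming (hypothetically) that each student record carries a `school` name and a `has_logged_in` flag. `StudentRecord` is an illustrative stand-in, not the notebook's `StudentId` class.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """Hypothetical stand-in for StudentId; `school` and `has_logged_in` are assumed fields."""
    student_id: str
    school: str
    has_logged_in: bool


# Toy records for illustration only
students = [
    StudentRecord("s1", "School A", True),
    StudentRecord("s2", "School A", False),
    StudentRecord("s3", "School B", True),
]

per_school = Counter(student.school for student in students)
logged_in = sum(student.has_logged_in for student in students)  # True counts as 1

print(len(students), dict(per_school))       # 3 {'School A': 2, 'School B': 1}
print(logged_in, len(students) - logged_in)  # 2 1
```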