This notebook was created during Vincent's Lab RETTL Project, Summer 2022, by Tianze (Steven) Shou.

In this notebook, we take `detector_results.csv` and aggregate the data by `studentID`, `periodID`, and `dayID` to summarize each student's struggle, system misuse, gaming, and idleness status in each day/period. We also summarize, from `observation_events.tsv`, the teacher's help offered to each student and each student's help-seeking pattern (hand raises) during each day/period. Finally, we combine these summaries to see whether the variables are correlated.

In [2]:
# python setup 
import pandas as pd

# 1. Data Aggregation 

We are going to aggregate all extracted data into `analyticsDF` for further analysis. 

In [7]:
# file loading 
detectorDF = pd.read_csv("output_files/detector_results.csv", index_col=False) 
obsEventsDF = pd.read_csv("output_files/event_master_file.csv", index_col=False)
obsEventsDF = obsEventsDF.loc[obsEventsDF["modality"] == "observation"]

## 1.1 Extracting Hand-Raises and Teacher-Visits From Observation

In [10]:
def isValidStudID(studID) -> bool: 
    # non-string values (e.g. NaN for a missing actor/subject) are not valid student IDs
    return isinstance(studID, str) and studID.startswith("Stu_")

# extract number of teacher visits and number of hand raises by studentID, periodID, and dayID from observation events 
if __name__ == "__main__": 

    # structure of these mappings: (dayID, periodID, studID) -> number of 
    # teacher visits/hand raises during the day/period for the given student 
    teacherVisitMapping = dict() 
    handRaisesMapping = dict() 

    for i in obsEventsDF.index: 

        event = obsEventsDF.loc[i, "event"] 
        isVisitEvent = (event == "Talking to student: ON-task" or 
                        event == "Talking to small group: ON-task") 
        isHandRaiseEvent = (event == "Raising hand")
        subject = obsEventsDF.loc[i, "subject"] 
        actor = obsEventsDF.loc[i, "actor"] 
        periodID, dayID = obsEventsDF.loc[i, "periodID"], obsEventsDF.loc[i, "dayID"]

        # valid visit event with valid student as subject, add 1 to count 
        if isVisitEvent and isValidStudID(subject): 
            teacherVisitMapping[(dayID, periodID, subject)] = teacherVisitMapping.get((dayID, periodID, subject), 0) + 1 

        # valid hand-raising event with valid student as actor, add 1 to count
        elif isHandRaiseEvent and isValidStudID(actor): 
            handRaisesMapping[(dayID, periodID, actor)] = handRaisesMapping.get((dayID, periodID, actor), 0) + 1 

    # transform mappings into dataframes 
    dayID = [ key[0] for key in teacherVisitMapping ] 
    periodID = [ key[1] for key in teacherVisitMapping ] 
    studentID = [ key[2] for key in teacherVisitMapping ] 
    totalTeacherVisits = [ teacherVisitMapping[key] for key in teacherVisitMapping ] 
    teacherVisitDF = pd.DataFrame({"dayID": dayID, 
                                   "periodID": periodID, 
                                   "studentID": studentID, 
                                   "totalTeacherVisits": totalTeacherVisits}) 

    # do the same to hand raises 
    dayID = [ key[0] for key in handRaisesMapping ] 
    periodID = [ key[1] for key in handRaisesMapping ] 
    studentID = [ key[2] for key in handRaisesMapping ] 
    totalHandRaises = [ handRaisesMapping[key] for key in handRaisesMapping ] 
    handRaisesDF = pd.DataFrame({"dayID": dayID, 
                                 "periodID": periodID, 
                                 "studentID": studentID, 
                                 "totalHandRaises": totalHandRaises}) 

    # outer merge teacher-visits and hand-raises dataframes 
    analyticsDF = pd.merge(teacherVisitDF, handRaisesDF, 
                           on=["dayID", "periodID", "studentID"], how="outer")
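
The counting loop and outer merge above can also be written with pandas `groupby`. The following is a minimal sketch on made-up toy data, not the notebook's actual pipeline: `toyEventsDF`, `countEvents`, and all row values are hypothetical, and only the column names and event labels are taken from the cell above. It also fills missing counts with 0 after the merge, which assumes that a missing (day, period, student) key means no such event was observed.

```python
import pandas as pd

# Toy observation log; rows and values are invented for illustration only
toyEventsDF = pd.DataFrame({
    "dayID":    [1, 1, 1, 1, 2],
    "periodID": [1, 1, 1, 1, 1],
    "event":    ["Raising hand", "Talking to student: ON-task",
                 "Raising hand", "Raising hand",
                 "Talking to small group: ON-task"],
    "actor":    ["Stu_a", "teacher", "Stu_a", "Stu_b", "teacher"],
    "subject":  [None, "Stu_a", None, None, "Stu_b"],
})

keys = ["dayID", "periodID", "studentID"]
visitEvents = ["Talking to student: ON-task",
               "Talking to small group: ON-task"]

def countEvents(df, eventNames, idCol, countCol):
    """Count matching events per (dayID, periodID, student in idCol)."""
    # keep only rows whose idCol holds a student ID (prefix "Stu_")
    isStudent = df[idCol].astype(str).str.startswith("Stu_")
    matched = df.loc[df["event"].isin(eventNames) & isStudent]
    return (matched.rename(columns={idCol: "studentID"})
                   .groupby(keys).size()
                   .reset_index(name=countCol))

toyVisitsDF = countEvents(toyEventsDF, visitEvents,
                          "subject", "totalTeacherVisits")
toyRaisesDF = countEvents(toyEventsDF, ["Raising hand"],
                          "actor", "totalHandRaises")

# outer merge keeps students that appear in only one of the two tables
toyAnalyticsDF = pd.merge(toyVisitsDF, toyRaisesDF, on=keys, how="outer")

# assumption: a missing count means zero observed events
countCols = ["totalTeacherVisits", "totalHandRaises"]
toyAnalyticsDF[countCols] = toyAnalyticsDF[countCols].fillna(0).astype(int)

print(toyAnalyticsDF)
```

Without the `fillna(0)` step, a student who raised a hand but was never visited would have `NaN` in `totalTeacherVisits`. Whether 0 or `NaN` is the right reading depends on whether every student was actually observed in that day/period.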

