## Welcome

Request data set access through:<br>
https://pslcdatashop.web.cmu.edu/Project?id=879<br><br>
Then go to this link to download the relevant files:<br>
https://pslcdatashop.web.cmu.edu/Files?datasetId=5833

## Reading in Data

In [None]:
import pandas as pd

## Event Data (fine-grained position logs of teachers)

In [None]:
# Explanation: The teacher's position is logged once per second. Each action is classified in the "content" column,
# and actions directed at a target have that target recorded in the "subject" column.
df = pd.read_csv('event_master_file_D10_R500_RNG1000_sprint2_shou (1).csv')

In [None]:
# Example aggregation: How many seconds of stop proximity did each student receive?
df2 = df[df['content'].map(lambda s: False if not isinstance(s, str) else 'Stopping' in s)].copy() # Keep only stop events
df2 = df2[df2['subject'].map(lambda s: False if not isinstance(s, str) else 'no student seated' not in s)].copy() # Remove empty seats
df2.groupby('subject').size()
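
The same filter can also be written with pandas' vectorized string methods. The sketch below runs on a small made-up frame so the result is easy to check by hand; only the `content` and `subject` column names and the `'Stopping'` / `'no student seated'` markers come from the cell above, all values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the event file (values are illustrative, not real data).
# Each row represents one logged second.
toy = pd.DataFrame({
    'content': ['Stopping near seat', 'Moving', 'Stopping near seat', None, 'Stopping near seat'],
    'subject': ['seat1', 'seat1', 'seat1', 'seat2', 'no student seated'],
})

# Keep only stop events; missing 'content' values count as "not a stop".
is_stop = toy['content'].str.contains('Stopping', regex=False, na=False)
# Drop empty seats; missing 'subject' values are treated as empty.
is_seated = ~toy['subject'].str.contains('no student seated', regex=False, na=True)

# One row per second, so the group size is the number of seconds per subject.
stop_seconds = toy[is_stop & is_seated].groupby('subject').size()
print(stop_seconds)
```

Passing `na=False` or `na=True` plays the role of the `isinstance(s, str)` guard in the lambda version for missing values.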

## Observation Events

In [None]:
# Time-stamped events from the observation notes of humans present in the classroom. Includes visits to students
# and other relevant teacher behaviors. The subject is often a specific seat.
df = pd.read_csv('observation_events_anonymized (1).tsv', sep='\t')

In [None]:
# To merge with other student data, student IDs first need to be joined to seat numbers.
df_seats = pd.read_csv('student_position_sprint1_shou (1).csv')
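
One possible shape for that join is sketched below on toy frames. The column names `seat` and `student_id` are hypothetical placeholders, as are all values; check the actual headers in the seat file before adapting this. If seating differs between periods, the merge would likely also need a period key.

```python
import pandas as pd

# Toy stand-in for the observation events ('subject' holds the seat).
obs = pd.DataFrame({
    'event': ['visit', 'visit', 'monitoring'],
    'subject': ['seat3', 'seat7', 'seat3'],
})
# Toy stand-in for the seat file; 'seat' and 'student_id' are assumed names.
seats = pd.DataFrame({
    'seat': ['seat3', 'seat7'],
    'student_id': ['stu_a', 'stu_b'],
})

# Left join keeps every observation; validate raises if a seat maps to
# more than one student, which would silently duplicate rows otherwise.
obs_with_students = obs.merge(
    seats, left_on='subject', right_on='seat',
    how='left', validate='many_to_one',
)
print(obs_with_students[['event', 'subject', 'student_id']])
```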

In [None]:
# Example aggregation: What were the most common activities in different periods?
import matplotlib.pyplot as plt
grouped = df.groupby(['periodID', 'event']).size().unstack()
grouped.plot(kind='bar', stacked=True, figsize=(10, 6))

plt.title("Most Common Activities in Different Periods")
plt.xlabel("Period ID")
plt.ylabel("Count")
plt.legend(title="Event", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

## Meta Data

In [None]:
# Several outcomes of interest on the student level. Typical outcomes are "ck_lg": conceptual learning gain
# and "pk_lg": procedural learning gain
df = pd.read_csv('meta-data-aied (1).csv')

In [None]:
# To merge this with the teacher position file, first join it with the following crosswalk.
df_crosswalk = pd.read_csv('crosswalk.csv')
# The same mapping is also available in 'student_position_sprint1_shou (1).csv'
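
As a rough sketch of that two-step join, the toy example below attaches per-seat stop seconds to student outcomes via a crosswalk. Only `ck_lg`, `pk_lg`, and `subject` are names taken from this notebook; `student`, `seat`, and `stop_seconds`, along with every value, are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins; real column names in the files may differ.
meta = pd.DataFrame({
    'student': ['s1', 's2'],
    'ck_lg': [0.2, -0.1],
    'pk_lg': [0.0, 0.3],
})
crosswalk = pd.DataFrame({
    'student': ['s1', 's2'],
    'seat': ['seat1', 'seat2'],
})
# In practice this could come from the event-data aggregation, e.g.
# df2.groupby('subject').size().rename('stop_seconds').reset_index()
stops = pd.DataFrame({
    'subject': ['seat1', 'seat2'],
    'stop_seconds': [12, 4],
})

# Step 1: student -> seat. Step 2: seat -> teacher stop seconds.
outcomes = (
    meta.merge(crosswalk, on='student', how='left')
        .merge(stops, left_on='seat', right_on='subject', how='left')
)
print(outcomes[['student', 'ck_lg', 'pk_lg', 'stop_seconds']])
```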

## Log Data from Tutoring System

In [None]:
df = pd.read_csv('tutor_log_anonymized (1).tsv', sep='\t')

In [None]:
df.head()

In [None]:
# Documentation: https://pslcdatashop.web.cmu.edu/help?page=importFormatTd
# Best to work with me if you are interested in analyzing this data set.
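
A first look at the tutor log could be a per-student tally of outcomes, sketched here on toy data. It assumes the file uses the standard DataShop transaction column names `Anon Student Id` and `Outcome`; verify these against `df.columns`, since the anonymized export may differ. All values below are made up.

```python
import pandas as pd

# Toy stand-in for the tutor log (column names assumed, values illustrative).
log = pd.DataFrame({
    'Anon Student Id': ['stu_a', 'stu_a', 'stu_a', 'stu_b', 'stu_b'],
    'Outcome': ['CORRECT', 'INCORRECT', 'CORRECT', 'HINT', 'CORRECT'],
})

# Rows: students; columns: outcome types; cells: number of transactions.
outcome_counts = pd.crosstab(log['Anon Student Id'], log['Outcome'])

# Share of each student's transactions that were correct.
share_correct = outcome_counts['CORRECT'] / outcome_counts.sum(axis=1)
print(outcome_counts)
print(share_correct)
```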