# Robomind Academy

Robomind Academy was created to teach (young) persons Computational Thinking. Hand in hand
with this primary goal, we as teachers can find out *how* these persons learn Computational Thinking. This will allow us to improve our teaching methods and to do research in the field of coding education.

Robomind Academy collects data about the activity of its users and stores this data for a short period of time. This data is used in the normal operation of the site, but an anonymized version of it is stored for later analysis and research. The anonymized data is stored for a longer period of time and includes things like the number of runs per person per Challenge and when they were run, the Solution used in each run and the performance of the final Solutions.

## The Data

The core anonymized data is in a compressed JSON format and contains details about the timing, scripts and results of the activities of the persons in the Robomind Academy. The drawback of this detailed, compressed format is that it makes analyzing the data difficult. Therefore the Robomind Academy data was aggregated into a CSV format that only describes the cumulative time a person works on a storyline item and the number of times he or she runs a program before solving the Challenge in that storyline item.

We will use the data from about 5000 students who took the Basis1 course in Robomind Academy. This is an introductory course in Computational Thinking. The aggregated data was created by the 'sitting2csv.py' script in this repository. The following code reads the data and displays the first five rows.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv("basis1.gz")

dataset.head()

,storyline,person,cumtime,count
0,Basis_1/Getting started/1,IPupil000126,74.0,1
1,Basis_1/Getting started/1,IPupil000127,183.0,1
2,Basis_1/Getting started/1,IPupil000128,124.0,1
3,Basis_1/Getting started/1,IPupil000129,147.0,3
4,Basis_1/Getting started/1,IPupil000130,56.0,3


As the first five lines show, the aggregated data contains the name/id of the storyline item, the anonymized name/id of the person, the total time the person took before coming up with the solution (cumtime) and the number of runs (count) it took.
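The layout described above can be checked with a few one-liners. The sketch below rebuilds only the five rows shown in the output above, so it runs without the 'basis1.gz' file; the name `first_rows` is introduced here for illustration, and the check for one row per person per storyline item reflects an assumption about the aggregated file, not a documented guarantee.

```python
import pandas as pd

# The five rows displayed by dataset.head() above, rebuilt by hand.
first_rows = pd.DataFrame({
    "storyline": ["Basis_1/Getting started/1"] * 5,
    "person": ["IPupil000126", "IPupil000127", "IPupil000128",
               "IPupil000129", "IPupil000130"],
    "cumtime": [74.0, 183.0, 124.0, 147.0, 56.0],
    "count": [1, 1, 1, 3, 3],
})

# Number of distinct persons and storyline items in this small sample.
n_persons = first_rows["person"].nunique()
n_items = first_rows["storyline"].nunique()

# Assumption: each (storyline, person) pair occurs at most once.
has_duplicates = first_rows.duplicated(["storyline", "person"]).any()

# Mean cumulative time over the sample rows.
mean_cumtime = first_rows["cumtime"].mean()

print(n_persons, n_items, has_duplicates, mean_cumtime)
```

The same expressions can be applied to the full `dataset` frame once it has been loaded.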

## Grading Pupils

In this notebook a grading system for pupils is developed using the aggregated data.
As a first step we calculate summary statistics (mean, standard deviation and quartiles) of 'cumtime' and 'count' for each storyline item.

In [2]:
slis = dataset.groupby(['storyline']).describe()
slis.head()

,count,count,count,count,count,count,count,count,cumtime,cumtime,cumtime,cumtime,cumtime,cumtime,cumtime,cumtime
storyline,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Basis_1/Factories/1,5432.0,7.659978,8.298906,1.0,2.0,5.0,10.0,85.0,5432.0,475.006996,505.635286,0.0,193.0,315.0,553.0,7438.0
Basis_1/Factories/2,5374.0,1.551544,3.034427,1.0,1.0,1.0,1.0,68.0,5374.0,62.799963,165.458768,1.0,17.0,28.0,48.0,4255.0
Basis_1/Factories/3,5288.0,4.873109,5.956526,1.0,1.0,3.0,6.0,74.0,5288.0,201.389372,257.189134,0.0,78.0,128.0,217.0,4473.0
Basis_1/Factories/4,5211.0,2.962771,6.139122,1.0,1.0,1.0,2.0,87.0,5211.0,115.531568,255.370727,0.0,24.0,39.0,73.0,4491.0
Basis_1/Factories/5,4079.0,10.517038,12.317869,1.0,3.0,7.0,13.0,158.0,4079.0,430.912724,557.539334,0.0,140.0,257.0,510.0,10877.0
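
If only the quartiles of 'cumtime' are of interest, they can also be computed directly instead of through the full `describe()` table. The following is a minimal sketch on a tiny made-up sample in the same column layout; the storyline names, person ids and values are purely illustrative, and `q25`/`q75` are names chosen here.

```python
import pandas as pd

# Tiny made-up sample in the layout of the aggregated data (illustrative values only).
sample = pd.DataFrame({
    "storyline": ["A/1"] * 5 + ["A/2"] * 5,
    "person": [f"P{i}" for i in range(5)] * 2,
    "cumtime": [10.0, 20.0, 30.0, 40.0, 50.0, 5.0, 5.0, 10.0, 15.0, 15.0],
    "count": [1, 2, 3, 4, 5, 1, 1, 1, 2, 2],
})

# 25% and 75% quantiles of cumtime per storyline item.
# quantile() with a list returns a Series indexed by (storyline, quantile);
# unstack() turns the quantile level into columns.
quartiles = (
    sample.groupby("storyline")["cumtime"]
    .quantile([0.25, 0.75])
    .unstack()
)
quartiles.columns = ["q25", "q75"]

print(quartiles)
```

With pandas' default linear interpolation these values agree with the '25%' and '75%' columns that `describe()` reports.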


Here we grade persons based on 'cumtime', the cumulative time a person has spent on a storyline item before executing a correct Solution. The score for a person/storyline item is 1 point if the person took longer than the 75th percentile of 'cumtime' for that item (not so good) and 3 points if the person took less than the 25th percentile (fast). In all other cases a score of 2 is assigned. The following code adds an extra 'score' column to our dataset.

In [3]:
# create two python dictionaries to quickly determine scoring lines
twentyfives = {}
seventyfives = {}
for index, row in slis.iterrows():
    twentyfive = row['cumtime']['25%']
    seventyfive = row['cumtime']['75%']
    twentyfives[index] = twentyfive
    seventyfives[index] = seventyfive
# create a new score series
scores = []
for index, row in dataset.iterrows():
    cumtime = row['cumtime']
    if cumtime < twentyfives[row['storyline']]:
        score = 3
    elif cumtime > seventyfives[row['storyline']]:
        score = 1
    else:
        score = 2
    scores.append(score)
# add 'score' column to dataset
dataset['score'] = pd.Series(scores)
dataset.head()

,storyline,person,cumtime,count,score
0,Basis_1/Getting started/1,IPupil000126,74.0,1,3
1,Basis_1/Getting started/1,IPupil000127,183.0,1,2
2,Basis_1/Getting started/1,IPupil000128,124.0,1,2
3,Basis_1/Getting started/1,IPupil000129,147.0,3,2
4,Basis_1/Getting started/1,IPupil000130,56.0,3,3
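
The row-by-row loops above can be slow on a large dataset. As an alternative, the same scoring rule can be sketched in a vectorized way with `groupby(...).transform` and `numpy.select`. This is not the notebook's own implementation: the helper name `add_scores` and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

def add_scores(df):
    """Return a copy of df with a 'score' column: 3 = fast, 2 = typical, 1 = slow."""
    grouped = df.groupby("storyline")["cumtime"]
    # Broadcast each storyline item's quartiles back onto its rows.
    q25 = grouped.transform(lambda s: s.quantile(0.25))
    q75 = grouped.transform(lambda s: s.quantile(0.75))
    out = df.copy()
    # Conditions are checked in order; rows matching neither get the default of 2.
    out["score"] = np.select(
        [df["cumtime"] < q25, df["cumtime"] > q75],
        [3, 1],
        default=2,
    )
    return out

# Made-up sample: one storyline item, five persons (illustrative values only).
sample = pd.DataFrame({
    "storyline": ["A/1"] * 5,
    "person": ["P0", "P1", "P2", "P3", "P4"],
    "cumtime": [10.0, 20.0, 30.0, 40.0, 50.0],
})
scored = add_scores(sample)
print(scored)
```

Because the result is aligned on the frame's own index, this variant does not rely on the dataset having a default 0..n-1 index, which the `pd.Series(scores)` assignment implicitly does.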


Now that we have scored every storyline item for every person, we can
calculate an overall score for each person who has completed the Basis1 course, that is, each person who has a score for all 43 storyline items.

In [11]:
scoresbasis1 = dataset[['person','score']].groupby(['person']).describe()
scoresbasis1[scoresbasis1['score']['count'] == 43].head(10)

,score,score,score,score,score,score,score,score
person,count,mean,std,min,25%,50%,75%,max
IPupil000038,43.0,2.348837,0.686042,1.0,2.0,2.0,3.0,3.0
IPupil000126,43.0,2.046512,0.615417,1.0,2.0,2.0,2.0,3.0
IPupil000127,43.0,2.395349,0.622573,1.0,2.0,2.0,3.0,3.0
IPupil000128,43.0,2.232558,0.527221,1.0,2.0,2.0,3.0,3.0
IPupil000130,43.0,2.186047,0.627004,1.0,2.0,2.0,3.0,3.0
IPupil000135,43.0,2.093023,0.609994,1.0,2.0,2.0,2.0,3.0
IPupil000160,43.0,2.162791,0.652111,1.0,2.0,2.0,3.0,3.0
IPupil000647,43.0,2.465116,0.701998,1.0,2.0,3.0,3.0,3.0
IPupil000860,43.0,2.046512,0.65296,1.0,2.0,2.0,2.0,3.0
IPupil000866,43.0,2.069767,0.551852,1.0,2.0,2.0,2.0,3.0
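
The mean of the per-item scores can serve as the overall score. The sketch below computes it with a named aggregation and derives the number of storyline items from the data instead of hard-coding it. It runs on a small made-up sample; the column names `items` and `mean_score` are chosen here, and the completion check assumes at most one row per person per storyline item.

```python
import pandas as pd

# Made-up scored sample: three storyline items, three persons (illustrative only).
scored = pd.DataFrame({
    "storyline": ["A/1", "A/2", "A/3", "A/1", "A/2", "A/1", "A/2", "A/3"],
    "person": ["P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3"],
    "score": [3, 2, 1, 3, 3, 2, 2, 3],
})

# Number of storyline items in the course, derived from the data.
n_items = scored["storyline"].nunique()

# Per person: how many items were scored and the mean score over them.
summary = scored.groupby("person")["score"].agg(items="count", mean_score="mean")

# Keep only persons who have a score for every item, best overall score first.
completed = summary[summary["items"] == n_items].sort_values(
    "mean_score", ascending=False
)
print(completed)
```

In this sample P2 is dropped because only two of the three items were scored, which mirrors the `count == 43` filter used above.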
