# Robomind Academy

Robomind Academy was created to teach Computational Thinking to (young) people. 
Hand in hand with this primary goal, we as teachers can find 
out *how* these people learn Computational Thinking. 
This allows us to improve our teaching methods and to research the field of coding education.

Robomind Academy collects data about the activity of its users and stores this data for a short period of time. 
This data is used in the normal operation of the site, but an anonymized version of it 
is stored for later analysis and research. The anonymized data is kept for a longer period of 
time and includes, for example, the number of runs per person per Challenge and when they were made, 
the Solution used in each run, and the performance of the final Solutions. 

## The Data

The core anonymized data is stored in a compressed JSON format and contains details about the timing, 
scripts and results of the activities of persons in Robomind Academy. 
The drawback of this detailed, compressed format is that it makes the data hard to analyze. 
Therefore the Robomind Academy data was aggregated into a CSV format that only describes, per person and 
storyline item, the cumulative time the person worked on the item ('cumtime') and the number of times 
he or she ran a program before solving the Challenge in that item.
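To make the aggregation concrete, here is a minimal sketch of how per-run records could be collapsed into one row per person and storyline item. It is not the logic of 'sitting2csv.py': the input table `runs` and its columns `worktime` (seconds of work before a run) and `solved` are hypothetical, the rows are assumed to be in chronological order, and the run count is assumed to include the run that solves the Challenge.

```python
import pandas as pd

# Hypothetical per-run log (NOT the real JSON format); rows assumed chronological
runs = pd.DataFrame({
    'person':    [1, 1, 1, 2, 2],
    'storyline': ['Basis_1/Factories/1'] * 5,
    'worktime':  [120, 60, 90, 30, 45],   # assumed: seconds worked before each run
    'solved':    [False, False, True, False, True],
})

def aggregate(runs: pd.DataFrame) -> pd.DataFrame:
    """Sum work time and count runs up to and including the first solving run."""
    rows = []
    for (person, storyline), g in runs.groupby(['person', 'storyline'], sort=False):
        solved = g['solved'].to_numpy()
        if not solved.any():
            continue  # assumption: items that were never solved are skipped
        first = solved.argmax()        # position of the first solving run
        upto = g.iloc[: first + 1]     # runs up to and including that run
        rows.append({'person': person, 'storyline': storyline,
                     'cumtime': upto['worktime'].sum(), 'count': len(upto)})
    return pd.DataFrame(rows)

print(aggregate(runs))
```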

We use the data of about 5000 students who took the Basis1 course in Robomind Academy, 
an introductory course in Computational Thinking. 
The aggregated data was created by the 'sitting2csv.py' script in this repository. 
The following cell reads the data and displays the first five rows.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# gzip compression is inferred from the '.gz' extension
dataset = pd.read_csv("/data/robodata/csv/perf_basis1.csv.gz")
dataset.head()

## Grading Pupils

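In this notebook a grading system for pupils is developed from the aggregated data. 
As a first step we compute summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) of 'cumtime' and 'count' for each storyline item. The quartiles of 'cumtime' are what the grading is based on.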

In [21]:
slis = dataset.groupby(['storyline']).describe()

In [23]:
print(slis)

Statistics of 'cumtime' per storyline item (the printed output is truncated: only these columns and rows were captured):

| storyline | count | mean | std | min | 25% |
|---|---:|---:|---:|---:|---:|
| Basis_1/Factories/1 | 8131.0 | 542.552208 | 599.339663 | 0.0 | 207.00 |
| Basis_1/Factories/2 | 8026.0 | 74.602417 | 170.654963 | 0.0 | 20.00 |
| Basis_1/Factories/3 | 7907.0 | 236.120526 | 343.491212 | 0.0 | 84.00 |
| Basis_1/Factories/4 | 7811.0 | 143.957368 | 375.005978 | 0.0 | 26.00 |
| Basis_1/Factories/5 | 6388.0 | 475.633845 | 631.013866 | 1.0 | 148.75 |
| Basis_1/Factories/6 | 5615.0 | 516.036687 | 649.892357 | 2.0 | 168.00 |
| Basis_1/Getting started/1 | 14468.0 | 581.792715 | 772.342343 | 0.0 | 136.00 |
| Basis_1/Getting started/2 | 10999.0 | 311.186017 | 593.363857 | 0.0 | 33.00 |
| Basis_1/Getting started/3 | 9629.0 | 295.573891 | 498.898593 | 0.0 | 48.00 |
| Basis_1/Getting started/4 | … | … | … | … | … |
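The '25%' and '75%' statistics that `describe()` reports are quantiles, which pandas computes by default with linear interpolation between sorted values. A small sketch on hypothetical toy data shows the values and how they come about:

```python
import pandas as pd

# Hypothetical toy data with the notebook's 'storyline' and 'cumtime' columns
toy = pd.DataFrame({
    'storyline': ['A'] * 5 + ['B'] * 4,
    'cumtime': [10, 20, 30, 40, 100, 5, 15, 25, 35],
})

stats = toy.groupby('storyline').describe()
# describe() yields a two-level column index, e.g. ('cumtime', '25%')
print(stats[[('cumtime', '25%'), ('cumtime', '75%')]])

# The same quartiles via quantile(), which interpolates linearly by default
q = toy.groupby('storyline')['cumtime'].quantile([0.25, 0.75]).unstack()
print(q)
```

For storyline 'A' (five values) the 25% position is 0.25 * (5 - 1) = 1, exactly the second sorted value, 20. For 'B' (four values) it is 0.25 * 3 = 0.75, three quarters of the way from 5 to 15, which gives 12.5.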

Here we grade persons based on 'cumtime', the cumulative time a person has 
spent on a storyline item before executing a correct Solution.
A person scores 1 point for a storyline item if he or she took longer 
than the 75th percentile of 'cumtime' for that item (slow), and 3 points if he or she took less 
than the 25th percentile (fast). 
In all other cases a score of 2 is assigned. The following code adds an extra 'score' column to our dataset.

In [None]:
# create two python dictionaries that map each storyline item to its
# score thresholds (25th and 75th percentile of 'cumtime')
twentyfives = {}
seventyfives = {}
for index, row in slis.iterrows():
    twentyfives[index] = row['cumtime']['25%']
    seventyfives[index] = row['cumtime']['75%']
# compute a score for every person/storyline-item row
scores = []
for index, row in dataset.iterrows():
    cumtime = row['cumtime']
    if cumtime < twentyfives[row['storyline']]:
        score = 3   # faster than the 25th percentile
    elif cumtime > seventyfives[row['storyline']]:
        score = 1   # slower than the 75th percentile
    else:
        score = 2
    scores.append(score)
# add 'score' column to dataset; a plain list is assigned by position,
# so this does not depend on the dataset's index labels
dataset['score'] = scores
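The row-by-row loop above is easy to follow but slow on a dataset of this size. A vectorized sketch of the same scoring rule, using `groupby(...).transform` for the per-item thresholds and `numpy.select` for the three cases, is shown here on hypothetical toy data; it is an alternative formulation, not the notebook's original code:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the notebook's column names
dataset = pd.DataFrame({
    'person': [1, 2, 3, 4, 5, 1, 2, 3, 4],
    'storyline': ['A'] * 5 + ['B'] * 4,
    'cumtime': [10, 20, 30, 40, 100, 5, 15, 25, 35],
})

# Per-storyline thresholds, broadcast back to every row of that storyline
by_item = dataset.groupby('storyline')['cumtime']
q25 = by_item.transform(lambda s: s.quantile(0.25))
q75 = by_item.transform(lambda s: s.quantile(0.75))

# 3 = faster than the 25th percentile, 1 = slower than the 75th, 2 otherwise
dataset['score'] = np.select(
    [dataset['cumtime'] < q25, dataset['cumtime'] > q75],
    [3, 1],
    default=2,
)
print(dataset)
```

Rows with a missing 'cumtime' fail both comparisons and get the default score of 2, just as in the loop.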

Now that we have scored every storyline item for every person, we can 
calculate an overall score, the mean of the item scores, for each person who has completed the Basis1 course (here: has a score for more than 40 storyline items).

In [None]:
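# summary statistics of the item scores per person
scoresbasis1 = dataset[['person', 'score']].groupby(['person']).describe()
# keep only persons scored on more than 40 storyline items (completed Basis1)
scoresbasis1 = scoresbasis1[scoresbasis1['score']['count'] > 40]
scoresmean = scoresbasis1['score']['mean']
plt.hist(scoresmean)
plt.xlabel('mean score per person')
plt.ylabel('number of persons')
plt.show()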
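The same per-person aggregation can be written with `agg`, which computes only the two statistics that are actually used. A sketch on hypothetical toy data, with the completion threshold (`MIN_ITEMS`, a hypothetical name) lowered to suit it:

```python
import pandas as pd

# Hypothetical scored rows: one row per person/storyline item
scored = pd.DataFrame({
    'person': [1, 1, 1, 2, 2, 3, 3, 3],
    'score':  [3, 2, 1, 3, 3, 3, 3, 3],
})

MIN_ITEMS = 2  # hypothetical; the notebook uses "more than 40" for Basis1

# number of scored items and mean score per person
per_person = scored.groupby('person')['score'].agg(['count', 'mean'])
completed = per_person[per_person['count'] > MIN_ITEMS]
print(completed)
```

Person 2 has only two scored items and is dropped; persons 1 and 3 keep mean scores of 2.0 and 3.0.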

This looks reasonable, except for the odd peak at the perfect score of 3. It turns out to be caused by some kids getting hold of the answer book.
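To look at the peak more closely, a sketch that lists the persons whose mean score is exactly 3 (faster than the 25th percentile on every storyline item) and their share of all persons. The `scoresmean` values here are hypothetical:

```python
import pandas as pd

# Hypothetical per-person mean scores, indexed by person id
scoresmean = pd.Series([2.1, 1.8, 3.0, 2.4, 3.0, 2.0],
                       index=[11, 12, 13, 14, 15, 16])

# persons at the perfect score and their share of all persons
perfect = scoresmean[scoresmean == 3.0]
share = len(perfect) / len(scoresmean)
print(perfect.index.tolist(), f"{share:.0%}")
```

Comparing with 3.0 exactly is safe here, because the mean of a set of integer 3s is exactly 3.0.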

In [None]:
import os
os.system("jupyter nbconvert --to pdf Basis1.ipynb")
