# Automatic Lab Evaluator

## Assessment based on student-provided results

* Jerónimo Arenas García
* Jesús Cid Sueiro

Version History:

    Version 0.1 (Dec. 2016)
        - First Python 2 version and Python 3 adaptation
    Version 0.2 (Dec. 2017) 
        - All configurable parameters in the first and second code cell.
        - Managing multiple mat files in students' zip files.
        - Corrected bug in readdatafiles (new student variables were not properly added to the dataframe)
        - Managing multiple class lists in Spanish and English.
        - External evaluation functions
        - New format of students report.
        - Uses ExamProject class.
        - Integrated student groups

In [1]:
import numpy as np
import pandas as pd
import os
from os.path import isfile, join
import scipy.io as sio
import scipy
import zipfile as zp
import shutil
import difflib
import csv
import glob

# ###############
# Local Libraries

# Evaluation libraries
import lib.dbEvaluator as dbeval

# Import all available solvers
import lib.dbSolverB12 as dbsB12
import lib.dbSolverB3 as dbsB3
import lib.dbSolverTD as dbsTD

# Import all available ExamProject classes
import lib.examProject as exBase
import lib.examProjectB12 as exB12
import lib.examProjectB3 as exB3
import lib.examProjectTD as exTD


## 1. Opening an evaluation project.

In order to use this software you must first run the `mainExamManager.py` script, and follow the instructions to complete the main steps:

   * Step 1. Create an evaluation project.
   * Step 2. Generate the exam statements
   * Step 3. Place the student lists in the required folders.
   * Step 4. Generate the data for the students.

After that, you can hold the exam session. Once it is over:

   * Step 5. Place the student responses in the required folders.
   
Now, you are ready to evaluate the student exams.

### 1.1. Mandatory configurable variables:

There are two mandatory variables that must be configured:

* The path to the evaluation project, and 
* The exam to be evaluated (this is because a single evaluation project may contain several exams)

In [2]:
# ############
# Project path

# Write the path to the evaluation project
# project_path = '../../TestBed/TestB12/'
# project_path = '../../TestBed/TestB3/'
project_path = '../../TestBed/TestTD'

# #########
# Exam_name

# A single evaluation project may contain several exams.
# Write here which one of them you want to evaluate.
# exam_label = 'ExLabB12_0'
# exam_label = 'ExLabB12_1'
# exam_label = 'ExLabB12_2'
# exam_label = 'ExLabB3_00'
exam_label = 'ExLabMLTD'

The following variables can be modified, but you can leave them at their default values unless you have a specific reason to change them.

In [3]:
# Expected name of the students' results file. 
# This is used to disambiguate situations where the student uploaded multiple mat files
# (e.g. the input data file provided with the exam statement, or .mat files in .DS_STORE folders)
results_fname = 'results.mat'

# Name of the output file with the final student grades
finalnotes_fname = 'student_notes.xlsx'

# #####################
# Evaluation parameters

# Penalties:
p_nocode = 0.75
p_noresults = 0.75
p_delay = 0.25      # score reduction per minute.
p_email = 0.85      # score reduction to students delivering by email

### 1.2. Exam project object

Now we create the object of the project according to the class of the exam. The class of the exam is read from the metadata contained in the project folder.

In [4]:
# This is a trick to get the class of the exams in an existing evaluation project
exc = exBase.ExamProject(project_path)
exc.load()
exam_class = exc.exam_class

# Create the object of the project according to the class of the exam
if exam_class == 'B12':
    exam = exB12.ExamProjectB12(project_path)
    solveQuestion = dbsB12.solveQuestion
elif exam_class == 'B3':
    exam = exB3.ExamProjectB3(project_path)
    solveQuestion = dbsB3.solveQuestion
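elif exam_class == 'TD':
    exam = exTD.ExamProjectTD(project_path)
    solveQuestion = dbsTD.solveQuestion
else:
    # Fail early instead of hitting a NameError on `exam` further down
    raise ValueError("Unknown exam class: {0}".format(exam_class))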

# #######################
# Other configurable data
# #######################
# If the file structure of the evaluation project was generated with the evaluator code, 
# you should not modify anything from the following lines.

# Paths to input and output files
exam.load()
class_list_path = exam.f_struct['class_list']
all_students_path = exam.f_struct['all_students']
data4st_path = exam.f_struct['data4students']
results_path = exam.f_struct['student_results'] + exam_label + '/'
output_path = exam.f_struct['eval_results'] + exam_label + '/'
csv_questions_list = exam.f_struct['exam_statement'] + exam_label + '/' + exam_label + '.csv'

print("Loaded project of type {0}".format(exam_class))

Loaded project of type TD
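
As an alternative to the `if`/`elif` chain above, the mapping from exam class to project class and solver can be kept in a lookup table, so that an unknown class is reported explicitly. The following is only a sketch: `ProjectB12`, `ProjectTD`, `solve_b12`, `solve_td`, `REGISTRY` and `build_exam` are hypothetical stand-ins for the classes and functions in `lib`, not part of the evaluator.

```python
# Hypothetical stand-ins for the project classes and solvers in lib/.
class ProjectB12:
    def __init__(self, path):
        self.path = path


class ProjectTD:
    def __init__(self, path):
        self.path = path


def solve_b12(question, data):
    return {}


def solve_td(question, data):
    return {}


# One entry per exam class: (project class, solver function)
REGISTRY = {
    'B12': (ProjectB12, solve_b12),
    'TD': (ProjectTD, solve_td),
}


def build_exam(exam_class, project_path):
    """Return (project object, solver) for a known exam class."""
    try:
        project_cls, solver = REGISTRY[exam_class]
    except KeyError:
        raise ValueError('Unknown exam class: {0}'.format(exam_class))
    return project_cls(project_path), solver


exam_demo, solver_demo = build_exam('TD', '../../TestBed/TestTD')
print(type(exam_demo).__name__)
```

Adding a new exam class then only requires a new entry in the table.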


In [5]:
# List of exam questions from the database
print(csv_questions_list)
with open(csv_questions_list, 'r') as f:
    reader = csv.reader(f)
    questions = list(reader)[0]

# If the file is not available, you can write the list of questions by hand
# questions = ['F0_estimate_06', 'F1_model_01', 'F2_predict_03', 'F4_lms_02']
print("Questions in the exam: {0}".format(questions))

../../TestBed/TestTD/exam_statement/ExLabMLTD/ExLabMLTD.csv
Questions in the exam: ['R1', 'R2', 'R3', 'R4', 'C1', 'C2', 'C3', 'C4']


## 2. Read datafiles for all students

Student datafiles can be in any of the following formats:

   * `'.zip'`: When uncompressed, the zip may contain one or several MATLAB files. All MATLAB files are read and incorporated into a pandas DataFrame where each column is a student and each index entry is a variable available for the exam solution.
   * `'.mat'`: All data variables for the students are given in a single matlab file
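
The layout described above (one column per student, one index entry per variable) can be illustrated with a small sketch. The tags and values below are made up; in the notebook they come from the `.mat` files.

```python
import pandas as pd

# Made-up variables for two students, keyed by a tag (e.g. the NIA).
per_student = {
    '100000001': {'xtrR': [0.5, 0.52], 'ytrC': [1, 0]},
    '100000002': {'xtrR': [0.1, 0.2], 'ytrC': [0, 0], 'extra': 3.0},
}

columns = []
for tag, variables in per_student.items():
    # One Series per student: index = variable names, values = data
    columns.append(pd.Series(list(variables.values()),
                             index=list(variables.keys()), name=tag))

# Align on variable names; variables missing for a student become NaN
df_demo = pd.concat(columns, axis=1).sort_index(axis=1)
print(df_demo.shape)
```

A variable that only some students have (here `extra`) still gets its own row, which is what later makes misnamed variables visible as rows that are mostly empty.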

In [6]:
def getFileName(fpath):
    return fpath.split('/')[-1]

def readData4st(datafiles_path):
    '''
    This function is used for reading the matlab data files provided to students 
    '''

    # Read matlab files in the input directory tree
    datafiles = glob.glob(datafiles_path + '**/*.mat', recursive=True)

    df = pd.DataFrame()
    
    # Read files
    print('Processing {0} files in {1} ...'.format(len(datafiles), datafiles_path))
    for dtfile in sorted(datafiles):
        
        # The tag can be the NIA, the student's name or just the beginning of some other file
        tag = getFileName(dtfile).split('.')[0]

        # Load matlab data file
        data = sio.loadmat(dtfile, squeeze_me=True)
        
        # Read all variable names and the corresponding data values
        idx = []
        val = []
        for var in [el for el in data.keys() if not el.startswith('_')]:
            idx.append(var)
            val.append(data[var])
          
        # Add to dataframe
        df2 = pd.DataFrame()
        df2[tag] = pd.Series(val, index = idx)
        df = pd.concat([df, df2], axis=1)
        df.sort_index(axis=1, inplace=True)
    return df

In [7]:
# Read students' data.
print(data4st_path)
student_data = readData4st(data4st_path)

print('')
print('Number of students in dataframe:', str(student_data.shape[1]))
print('Number of variables read:', str(student_data.shape[0]))

print('Displaying data for first students ... ')
student_data[student_data.columns[:7]]

student_data[student_data.keys()[0]]


../../TestBed/TestTD/data4students/
Processing 67 files in ../../TestBed/TestTD/data4students/ ...

Number of students in dataframe: 67
Number of variables read: 7
Displaying data for first students ... 


xtrC     [[-1.4174456254, -2.1166985168, -1.44323579018...
xvalC    [[-1.03592029683, 2.41591360208, -1.1338101973...
ytrC     [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, ...
yvalC    [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, ...
xtrR     [0.5, 0.519038076152, 0.538076152305, 0.557114...
strR     [-0.564708690269, -0.0167090155288, 1.06903564...
state    (100278104, array([[-1.41744563, -2.11669852, ...
Name: 100278104, dtype: object

## 2. Read answers provided by students

In [8]:
def readdatafiles(datafiles_path, splitsymbol):
    '''
    This function is used for reading both the data files provided to students and the response
    files provided by students
    '''

    # Read file paths
    datafiles = glob.glob(datafiles_path + '**/*.*', recursive=True)
    # datafiles = [f for f in os.listdir(datafiles_path) if isfile(join(datafiles_path, f))]

    temporary_dir = './tmp'
    df = pd.DataFrame()
    
    # Read files
    print('Processing {0} files in {1} ...'.format(len(datafiles), datafiles_path))
    for dtfile in sorted(datafiles):
        idx = []
        val = []
        makedf = True      # This is a default flag. If it remains True, a new column will be added to the df

        # The tag can be the NIA, the student's name or just the beginning of some other file
        tag = getFileName(dtfile).split(splitsymbol)[0]
        # tag = dtfile.split(splitsymbol)[0]

        # if dtfile.endswith('.zip'):
        # if dtfile.endswith('.7z'):
        # if dtfile.endswith('.rar'):
                        
        if dtfile.endswith('.zip'):

            # Read names of .mat files
            zpobj = zp.ZipFile(dtfile)            
                
            mat_fnames = [f for f in zpobj.namelist() if f.endswith('.mat')]
            
            # mat file selection. This is to disambiguate cases with multiple files
            n = len(mat_fnames)
            if n == 0:
                print ('    WARNING: {} has not delivered any mat file'.format(tag))
                fname = None
            else:
                if n > 1:
                    print('    WARNING: {} has provided multiple mat files:'.format(tag))
                    print('        {0}'.format(mat_fnames))                  

                # Define a nested set of criteria to select a single mat file from multiple options:
                criteria = [mat_fnames]
                criteria.append([f for f in criteria[0] if '.ipynb_checkpoints' not in f])
                criteria.append([f for f in criteria[1] if f[0].isalnum()])
                criteria.append([f for f in criteria[2] if getFileName(f)[0].isalnum()])
                criteria.append([f for f in criteria[3] if getFileName(f)[0].isalpha()])
                criteria.append([f for f in criteria[4] if f.endswith(results_fname)])

                # Select the file according to the most restrictive criterion that still has members.
                for c in reversed(criteria):
                    if len(c) > 0:
                        # We take the first file in the list (an arbitrary choice)
                        fname = c[0]
                        break
                if n > 1:
                    print('        Selected file: {}'.format(fname))

            # Read the selected mat file, if any
            if fname is not None:
                # Matlab files are extracted to a temporary subfolder
                zpobj.extract(fname, temporary_dir)
                data = sio.loadmat(join(temporary_dir, fname), squeeze_me=True)
                
                # Read all variable names and the corresponding data values
                for var in [el for el in data.keys() if not el.startswith('_')]:
                    idx.append(var)
                    val.append(data[var])

            # Remove temporary directory, if it has been created
            if os.path.exists(temporary_dir):
                shutil.rmtree(temporary_dir)

        elif dtfile.endswith('.mat'):

            # This block of code was removed from the original notebook.
            # I have rescued it from another notebook
            data = sio.loadmat(dtfile, squeeze_me=True)
            
            # Read all variable names and the corresponding data values
            for var in [el for el in data.keys() if not el.startswith('_')]:
                idx.append(var)
                val.append(data[var])

        elif dtfile.endswith(('.m', '.py', '.ipynb')):
            print('    WARNING: {} has provided a code file only:'.format(tag))
            print('        {0}'.format(dtfile))
        else:
            makedf = False
            if os.path.isfile(dtfile):
                print('    File ignored: {0}'.format(dtfile))
            
        if makedf:
            df2 = pd.DataFrame()
            df2[tag] = pd.Series(val, index = idx)
            df = pd.concat([df, df2], axis=1)
            df.sort_index(axis=1, inplace=True)
    return df
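
The nested selection criteria used inside `readdatafiles` can be isolated in a small helper, which makes them easier to check on their own. This is a sketch, not a drop-in replacement: `select_mat_file` is a hypothetical name, and it assumes that the names come from a zip archive, where `/` is always the path separator.

```python
from posixpath import basename


def select_mat_file(mat_fnames, results_fname='results.mat'):
    """Pick one .mat file name from a zip listing, or None if there is none.

    Each filter is applied to the survivors of the previous one; the
    narrowing stops as soon as a filter would leave no candidates.
    """
    filters = [
        lambda f: '.ipynb_checkpoints' not in f,   # skip notebook checkpoints
        lambda f: f[0].isalnum(),                  # skip '__MACOSX/...' entries
        lambda f: basename(f)[0].isalnum(),        # skip hidden files like '._x.mat'
        lambda f: basename(f)[0].isalpha(),        # prefer names that are not a NIA
        lambda f: f.endswith(results_fname),       # prefer the expected file name
    ]
    candidates = list(mat_fnames)
    for keep in filters:
        narrowed = [f for f in candidates if keep(f)]
        if not narrowed:
            break
        candidates = narrowed
    # As in the notebook, the first surviving file is an arbitrary choice
    return candidates[0] if candidates else None


listing = ['100376747.mat', '__MACOSX/._100376747.mat',
           'results.mat', '__MACOSX/._results.mat']
print(select_mat_file(listing))
```

Because every filter works on the output of the previous one, stopping at the first empty result is equivalent to picking the most restrictive non-empty list.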
        

### 2.1. Requested variable names.

In order to get the names of the requested variables, we solve the exam with an arbitrary set of variables.

In [9]:
data = student_data[student_data.columns[0]].to_dict()
print(questions)
solution, scoring_ex = dbeval.solveExam(questions, data, solveQuestion)
truenames = list(solution.keys())

print(truenames)

['R1', 'R2', 'R3', 'R4', 'C1', 'C2', 'C3', 'C4']
    Solving question R1
    Solving question R2
    Solving question R3
    Solving question R4
    Solving question C1
    Solving question C2
    Solving question C3
    Solving question C4
['wML', 'EAP', 'LwML', 'wmean', 'covw', 'xn_tr', 'xn_val', 'Eknn', 'wMAP', 'rho', 'Dtest']


### 2.2. Read student results into a pandas dataframe

In [10]:
# Read student results
student_results = readdatafiles(results_path, splitsymbol='_')

# Build a set of indices containing the expected variable names and all other variables provided by students
newindex = truenames + [el for el in student_results.index.tolist() if el not in truenames]

student_results = student_results.reindex(newindex)

print('')
print('Number of students in dataframe:', str(student_results.shape[1]))
print('Number of variables read:', str(student_results.shape[0]))

print('Displaying data for first students ... ')
student_results[student_results.columns[0:7]]

Processing 64 files in ../../TestBed/TestTD/student_results/ExLabMLTD/ ...
        ['ALBA MINGUEZ SANCHEZ_1205458_assignsubmission_file_examen_100315338/results.mat', '__MACOSX/ALBA MINGUEZ SANCHEZ_1205458_assignsubmission_file_examen_100315338/._results.mat']
        Selected file: ALBA MINGUEZ SANCHEZ_1205458_assignsubmission_file_examen_100315338/results.mat
        ['100376747.mat', '__MACOSX/._100376747.mat', 'results.mat', '__MACOSX/._results.mat']
        Selected file: results.mat
        ['100292668.mat', '__MACOSX/._100292668.mat', 'results.mat', '__MACOSX/._results.mat']
        Selected file: results.mat
        ['CARLOS FERNANDEZ DEL CERRO_1205465_assignsubmission_file_EXAMEN/results.mat', '__MACOSX/CARLOS FERNANDEZ DEL CERRO_1205465_assignsubmission_file_EXAMEN/._results.mat']
        Selected file: CARLOS FERNANDEZ DEL CERRO_1205465_assignsubmission_file_EXAMEN/results.mat
        ['EXAMEN/100318138.mat', 'EXAMEN/results.mat']
        Selected file: EXAMEN/results.mat
  

Unnamed: 0,ADRIAN VAZQUEZ ROMERO,ADRIANO AUGUSTO OSSES RODRIGUEZ,ADRIÁN DE LA TORRE MORENO,ALBA MINGUEZ SANCHEZ,ALEJANDRO BARON CUEVAS,ALEJANDRO CUADRADO TORRE,ALEJANDRO GARCIA DE LA SANTA RAMOS
wML,"[2.30292363628, 0.271085662512, 2.90598914493,...","[-0.53944840703, -0.217647615695, -0.050404483...","[[-0.000544786321363, 0.000564440837288, 0.000...","[1.30830939694, -0.848400369932, -0.3837580903...",-1.07976,"[-0.568574128469, -0.266302170381, -0.70041326...","[0.0728515666397, -0.180961489375, 0.624004405..."
EAP,0.495645,0.439803,5.81524e-15,0.68433,0.657657,0.370199,0.494791
LwML,477.213,0,230.397,695.72,231.242,-2.85395e-201,192.337
wmean,"[3.93735917925, 2.51024170732, 5.47119031071, ...","[0.00349230583626, -0.759079015424, 0.47584653...","[[-0.000544621138808, 0.000564269695362, 0.000...","[0.921889240399, -0.479483696734, -0.717430293...",,"[-0.44588212728, -0.390238177787, -0.577691343...","[-0.0828427205366, -0.02023787398, 0.463617385..."
covw,"[[0.108989997833, -0.103786263808, 0.094518459...","[[0.14000739641, -0.134754229403, 0.1237355493...","[[0.999241985015, 0.000525415949917, 0.0004970...","[[0.108989997833, -0.103786263808, 0.094518459...",,"[[0.108989997833, -0.103786263809, 0.094518459...","[[0.108989997833, -0.103786263808, 0.094518459..."
xn_tr,"[[0.960554162125, -0.451993334693, -0.75895366...","[[0.0608662353937, -0.0225735973485, -1.040690...","[[1.8425440258, -0.560506251002, -1.6956533905...","[[-2.03322946785, 0.377005524986, -0.293756629...","[[1.49856447065, 1.25028290536, 1.02460033963,...","[[-2.11816258205, -0.214607465844, 1.966794433...","[[-0.00303258712656, -0.771515556246, 0.678334..."
xn_val,"[[1.80155636087, -0.386260615084, 0.3421792003...","[[-0.277639123685, -0.839197744305, -0.1603261...","[[0.291576635963, -0.91361436165, 0.4705574898...","[[-0.31989939621, -0.36980464288, 0.0933371213...","[[-0.844137233003, 0.639479455465, 1.332136356...","[[-1.06988655341, 0.64605051043, 0.25339899468...","[[-0.536672639358, -0.0964773788667, 0.2999216..."
Eknn,0.104167,0.0854167,0.123958,0.117708,0.0291667,0.177203,0.496875
wMAP,"[-0.340900259533, 3.80708509814, 1.59508408318...","[1.75752997088, -0.103317286064, -2.6460849065...","[14.3740017692, 5.62852282732, 4.10860710287, ...","[1.31155077104, -1.49131656587, 0.125518966422...","[[-2.97946517318, -2.87600071481, -3.162986489...","[2.41137715136, -0.816724741338, 0.30251518758...","[-0.0672033785302, 0.0400192537695, -0.0401808..."
rho,,0.001,0.001,0.001,"[0.0301973834223, 0.0319224918349, 0.033746151...",0.001,0.001


### 2.3. Common Mistakes on variable names

In view of all the variable names provided by the students, we may decide to accept alternative names for some variables without any penalty.
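
A toy example may help to see how this works. The sketch below uses made-up students and values; only the variable names `wML` and `wMl` come from this exam. It counts the missing entries per variable name and then fills the expected name from its accepted alternative.

```python
import numpy as np
import pandas as pd

# Made-up results: 'bob' saved the variable under the wrong name 'wMl'
results_demo = pd.DataFrame(
    {'alice': [1.0, np.nan], 'bob': [np.nan, 2.0]},
    index=['wML', 'wMl'])

# Rows that are missing for most students are candidate misspellings
missing = results_demo.isnull().sum(axis=1)

# Expected name -> list of accepted alternative names
accepted = {'wML': ['wMl']}
for expected, aliases in accepted.items():
    for alias in aliases:
        if alias in results_demo.index:
            # Only empty cells of the expected row are filled
            results_demo.loc[expected] = results_demo.loc[expected].fillna(
                results_demo.loc[alias])

# Keep the expected variable names only
results_demo = results_demo.loc[list(accepted)]
print(results_demo)
```

Since `fillna` only touches empty cells, a student who provided both the expected name and an alternative keeps the value stored under the expected name.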

In [11]:
# print(student_results)

In [12]:
print('Number of students in dataframe:', str(student_results.shape[1]))

print('\nDisplaying number of missing data per variable name.')
print('Those with a large number are potential common mistakes for a variable name')

student_results.isnull().sum(axis=1)

Number of students in dataframe: 61

Displaying number of missing data per variable name.
Those with a large number are potential common mistakes for a variable name


wML        3
EAP        2
LwML      13
wmean     11
covw      10
xn_tr      2
xn_val     3
Eknn       4
wMAP      12
rho       12
Dtest     16
Cov_w     60
Dval      57
lwML      60
mean_w    60
nll_tr    60
state     59
wMl       60
w_ML      60
dtype: int64

In [13]:
###########################################
# EXAM DEPENDENT VARIABLE

# Dictionary with accepted mistakes in the following format
#     Expected variable name : Accepted mistake
# If there are several wrong names for the same variable, write them in a list.
#     Expected variable name : List of Accepted mistakes
if exam_label == 'ExLabB12_0':
    Mistakes = {'xnVal': 'xnTest',
                'wp': 'w', 
                'EAPval':'EAP'}
elif exam_label == 'ExLabB12_1':
    Mistakes = {'sEst': 'sTest',
                'xnVal': ['xnTest', 'xnTest.mat'],
                'w5': ['w2', 'we'],
                'w': 'w1',
                'PFAx1': 'PFAX1',
                'etaNPx1': ['uNP', 'etaNPX1', 'etanNPx1']}
elif exam_label == 'ExLabB12_2':
    Mistakes = {'xmVal': ['xnTest', 'xnTest.mat'],
                'xmTrain': ['xnTrain', 'xnTrain.mat'],
                'we': 'we3', 
                'w3': 'we4',
                'm0': 'mo'}
elif exam_label == 'ExLabB3_00':
    Mistakes = {'xMSE10': 'XMSE10'}
elif exam_label == 'ExLabMLTD':
    Mistakes = {'wML': ['wMl', 'w_ML'], 
                'LwML': ['nll_tr', 'lwML'],
                'wmean': 'mean_w',
                'Dtest': 'Dval',
                'covw': 'Cov_w',
               }
print(exam_label)    
##########################################

# Fill an empty variable with the value of its accepted mistake.
for el, mk in Mistakes.items():
    if not isinstance(mk, list):
        mk = [mk]

    # Correct all mistakes related to variable el
    for x in mk:
        # The following 'if' is necessary because some of the mistakes in the dictionary may not happen.
        if x in student_results.index.tolist():
            # print(student_results.loc[Mistakes[el]])
            student_results.loc[el] = student_results.loc[el].fillna(student_results.loc[x])

# Remove rows with the wrong variables.
for el in student_results.index.tolist():
    if el not in truenames:
        student_results.drop(el, inplace=True)
        
student_results.head(40)

ExLabMLTD


Unnamed: 0,ADRIAN VAZQUEZ ROMERO,ADRIANO AUGUSTO OSSES RODRIGUEZ,ADRIÁN DE LA TORRE MORENO,ALBA MINGUEZ SANCHEZ,ALEJANDRO BARON CUEVAS,ALEJANDRO CUADRADO TORRE,ALEJANDRO GARCIA DE LA SANTA RAMOS,ALVARO GONZALEZ CABALLERO,ANA MARIA SANCHEZ DE LA NAVA,ASUNCION CAYUELA HIDALGO,...,MIGUEL ANGEL MARTINEZ SANCHEZ,MÓNICA BORGOÑÓS GARCÍA,ÓSCAR CONDE PAPÍN,PABLO FERNÁNDEZ RODRÍGUEZ,PABLO GÓNGORA LUQUE,PABLO MATEOS MASA,PAULA FERNANDEZ BLAZQUEZ,PAULA FERNÁNDEZ MARTÍNEZ,SERGIO SANCHEZ RODRIGUEZ,VICTOR VALBUENA MARTINEZ
wML,"[2.30292363628, 0.271085662512, 2.90598914493,...","[-0.53944840703, -0.217647615695, -0.050404483...","[[-0.000544786321363, 0.000564440837288, 0.000...","[1.30830939694, -0.848400369932, -0.3837580903...",-1.07976,"[-0.568574128469, -0.266302170381, -0.70041326...","[0.0728515666397, -0.180961489375, 0.624004405...","[1.09825427332, 1.25284508653, 0.267647702649,...","[0.0679221647752, -1.34402364357, -0.154632707...","[0.500976359976, -0.718239883651, 0.5018553962...",...,,"[-2.27256303519, 0.0352802143279, -0.963258624...",2.13015,1.0,"[-1.77195461419, 0.423784127254, -1.7617649245...","[1.64973907564, 2.57469651093, 1.4672808521, 1...","[1.39003722621, 0.025798509223, 1.53119412296,...","[-0.233803297194, 0.102755905566, -1.069415041...","[-1.25197081015, -0.0636970397847, -2.35885456...","[0.100417255465, -1.10793686269, -0.6040336129..."
EAP,0.495645,0.439803,5.81524e-15,0.68433,0.657657,0.370199,0.494791,A MEDIAS,0.507816,0.686864,...,,0.51855,0.591159,2.0,0.480893,0.389455,0.49257,0.49574,0.506304,0.500513
LwML,477.213,0,230.397,695.72,231.242,-2.85395e-201,192.337,0,476.668,468.022,...,,214.021,"[[8.7103394156e-101, 8.7103394156e-101, 8.7103...",3.0,,957.292,-26.7628,,241.008,0.717861
wmean,"[3.93735917925, 2.51024170732, 5.47119031071, ...","[0.00349230583626, -0.759079015424, 0.47584653...","[[-0.000544621138808, 0.000564269695362, 0.000...","[0.921889240399, -0.479483696734, -0.717430293...",,"[-0.44588212728, -0.390238177787, -0.577691343...","[-0.0828427205366, -0.02023787398, 0.463617385...",0,"[-0.2683107527, -1.01032678818, -0.47355419513...","[0.165149052415, -0.382607191984, 0.1764250874...",...,,"[-1.75707266367, -0.466787906808, -0.491480175...",2.13006,5.0,,"[1.51321253942, 2.70939382501, 1.33508318585, ...","[0.813758897299, 0.602529235211, 0.96797946565...","[-0.0377378922192, -0.0972323801894, -0.869771...","[-0.719975321357, -0.605911997188, -1.82073941...","[-0.108017188272, -0.906406372844, -0.79122928..."
covw,"[[0.108989997833, -0.103786263808, 0.094518459...","[[0.14000739641, -0.134754229403, 0.1237355493...","[[0.999241985015, 0.000525415949917, 0.0004970...","[[0.108989997833, -0.103786263808, 0.094518459...",,"[[0.108989997833, -0.103786263809, 0.094518459...","[[0.108989997833, -0.103786263808, 0.094518459...",0,"[[0.108989997833, -0.103786263808, 0.094518459...","[[0.108989997833, -0.103786263808, 0.094518459...",...,,"[[0.108989997833, -0.103786263808, 0.094518459...",2.27822e-05,4.0,,"[[0.143083299195, -0.137779736558, 0.127238800...","[[0.102542230376, -0.0973663385115, 0.08823400...","[[0.143083299196, -0.137779736558, 0.127238800...","[[0.108989997833, -0.103786263809, 0.094518459...","[[0.108989997833, -0.103786263808, 0.094518459..."
xn_tr,"[[0.960554162125, -0.451993334693, -0.75895366...","[[0.0608662353937, -0.0225735973485, -1.040690...","[[1.8425440258, -0.560506251002, -1.6956533905...","[[-2.03322946785, 0.377005524986, -0.293756629...","[[1.49856447065, 1.25028290536, 1.02460033963,...","[[-2.11816258205, -0.214607465844, 1.966794433...","[[-0.00303258712656, -0.771515556246, 0.678334...","[[1.07110131315, -1.72318496794, 0.61442510286...","[[-0.452759939961, -0.533110075188, -0.0609214...","[[0.350499532728, -0.961270014358, 1.289820784...",...,,"[[-0.0665834085868, -0.943275973243, -1.441695...","[[2.19480745183, -0.747949218223, -1.667160782...",6.0,"[[-0.150331670869, -2.8558879539, -0.552891455...","[[0.810033268935, -1.64815631735, -0.239351187...","[[-0.240451384981, -1.27833120226, 0.858554914...","[[-0.213223398664, 1.86706947213, -0.201349083...","[[1.36746890224, 0.316814915882, 1.02717241321...","[[-0.170635082321, -0.397654448551, 0.30095995..."
xn_val,"[[1.80155636087, -0.386260615084, 0.3421792003...","[[-0.277639123685, -0.839197744305, -0.1603261...","[[0.291576635963, -0.91361436165, 0.4705574898...","[[-0.31989939621, -0.36980464288, 0.0933371213...","[[-0.844137233003, 0.639479455465, 1.332136356...","[[-1.06988655341, 0.64605051043, 0.25339899468...","[[-0.536672639358, -0.0964773788667, 0.2999216...","[[-0.512004495011, -2.24040576301, -1.37658838...","[[0.128810485079, -0.774148457079, 0.044947806...","[[0.0474710278126, -0.0506325686116, 0.7511350...",...,,"[[-0.321747267002, -0.404831518322, -0.6117101...","[[2.25906168117, 0.184232542496, -2.0141545191...",7.0,"[[0.715217286715, -0.417503412077, -0.14519233...","[[0.28691913411, 1.88144116727, 0.804723124756...","[[-1.20559442053, -0.642635190335, 2.135773008...","[[-0.192718772687, 0.0509048192707, 0.07619718...","[[0.833836962437, -0.686492379704, 1.328706280...","[[-1.40639414616, 0.403700195493, -0.389980632..."
Eknn,0.104167,0.0854167,0.123958,0.117708,0.0291667,0.177203,0.496875,0.1,0.402083,0.0125,...,,0.00520833,0.116,,0.11875,0.009375,0.0854167,0.0354167,0.0645833,0.05
wMAP,"[-0.340900259533, 3.80708509814, 1.59508408318...","[1.75752997088, -0.103317286064, -2.6460849065...","[14.3740017692, 5.62852282732, 4.10860710287, ...","[1.31155077104, -1.49131656587, 0.125518966422...","[[-2.97946517318, -2.87600071481, -3.162986489...","[2.41137715136, -0.816724741338, 0.30251518758...","[-0.0672033785302, 0.0400192537695, -0.0401808...","[-1.23092781638, 3.34396941691, -1.18729145025...","[-0.437257041398, 1.00741116448, -0.5808148881...","[-3.05478413905, -0.1846220915, -0.10045435245...",...,,,"[-2.03278783374, 0.22871173926, -1.2929051215,...",,"[-1.47377608036, 2.48416457989, 0.146074223735...","[3.66867800819, -0.0960162528693, -0.346717957...","[-7.87048771229e+29, 6.51045112296e+29, -1.748...","[0.0891678111807, -0.0939274967945, 0.17027377...","[-2.62445108271, 0.362519316028, -0.9276025332...","[1.73302224359, 10.5876831092, 2.48572654306, ..."
rho,,0.001,0.001,0.001,"[0.0301973834223, 0.0319224918349, 0.033746151...",0.001,0.001,0.0001,0.001,0.001,...,,0.001,0.001,,0.001,0.001,0.001,0.001,0.001,0.001


### 2.4. Name to NIA dictionary

Finally, since datafiles are created by NIA and results are available per student name, we need to create a dictionary connecting them.

Student names are taken from one or several student lists. Using multiple lists is useful when the same exam is given to several groups, or in the frequent situation where students from one group take the exam of another group.

In [14]:
# Select xls file names in the class list folder
print("Reading class lists...")
xls_files = [f for f in os.listdir(class_list_path) if f.endswith('.xls') or f.endswith('.xlsx')]
if len(xls_files) > 1:
    print("    There are {} excel files in the class_list folder.".format(len(xls_files)))
    print("    All students will be merged in a single list.")

# Load all xls files into dataframes
groups = []
for g in xls_files:
    df = pd.read_excel(class_list_path + g)
    # Translate column names from Spanish to English.
    # This is required to concatenate student lists in different languages.
    df.rename(columns={'Dirección de correo': 'Email address',
                       'Apellido(s)': 'Surname', 
                       'Nombre': 'First name'}, inplace=True)
    groups.append(df)

# Concatenate class lists (we do not expect duplicated NIU's in different lists)
student_NIA_names = pd.concat(groups)
print("Done. {0} students in the lists".format(len(student_NIA_names)))
student_NIA_names.sort_values('Surname')     #.head()

Reading class lists...
    There are 3 excel files in the class_list folder.
    All students will be merged in a single list.
Done. 67 students in the lists


|   | NIU | Surname | First name | Email address |
|---|---|---|---|---|
| 0 | 100305653 | ABREGU MIRABAL | KEVIN | 100305653@alumnos.uc3m.es |
| 0 | 100376222 | ANTONA SANTAMARIA | GORKA | 100376222@alumnos.uc3m.es |
| 1 | 100379372 | ARNEDO MARTINEZ | LUIS | 100379372@alumnos.uc3m.es |
| 1 | 100374222 | AVIT FERRERO | GASPAR HUGO | gavit@pa.uc3m.es |
| 2 | 100375893 | BARBÓN GARCÍA | LAURA | 100375893@alumnos.uc3m.es |
| 3 | 100315427 | BARON CUEVAS | ALEJANDRO | 100315427@alumnos.uc3m.es |
| 4 | 100375569 | BEZOS URUEÑA | JESÚS | 100375569@alumnos.uc3m.es |
| 2 | 100292214 | BLANCO VERA | GUILLERMO | 100292214@alumnos.uc3m.es |
| 3 | 100375681 | BORGOÑÓS GARCÍA | MÓNICA | 100375681@alumnos.uc3m.es |
| 5 | 100318138 | BOTA | CLAUDIU EUGEN | 100318138@alumnos.uc3m.es |


In [15]:
# Build dictionary NIA: name
NIA_name = {}
for el in student_results.columns.tolist():

    # Find the student name in student_NIA_names that is most similar to el
    sim_list = []
    for idx, NIA in enumerate(student_NIA_names['NIU'].values):
        std_name = str(student_NIA_names['First name'].values.tolist()[idx]) + ' ' + \
                   str(student_NIA_names['Surname'].values.tolist()[idx])
        sim_list.append(difflib.SequenceMatcher(a=el.lower(), b=std_name.lower()).ratio())

    max_sim = max(sim_list)
    max_idx = sim_list.index(max_sim)
    NIA_name[student_NIA_names['NIU'].values.tolist()[max_idx]] = el

# Build reverse dictionary name: NIA
name_NIA = {NIA_name[el]: el for el in NIA_name}
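
The loop above always keeps the most similar name, even when the best similarity is low. A possible refinement, sketched here under the assumption that a weak match should be reviewed by hand, is to return the similarity together with the NIA. The roster, the names, the 0.8 threshold and the `best_match` helper are made up for illustration.

```python
import difflib


def best_match(name, roster):
    """Return (nia, similarity) of the roster entry most similar to name.

    roster maps NIA -> full name as written in the class list.
    """
    scored = [
        (difflib.SequenceMatcher(a=name.lower(), b=full.lower()).ratio(), nia)
        for nia, full in roster.items()
    ]
    score, nia = max(scored)
    return nia, score


# Made-up roster and submission name
roster_demo = {'100000001': 'ANA LOPEZ GARCIA',
               '100000002': 'LUIS PEREZ RUIZ'}
submitted = 'Ana López García'

nia, score = best_match(submitted, roster_demo)
if score < 0.8:
    # Weak matches are only reported; the decision is left to the teacher
    print('Check by hand: {0} -> {1} ({2:.2f})'.format(submitted, nia, score))
```

Accents and capitalization lower the similarity only slightly, so a threshold mainly catches students who are missing from every class list.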

### 2.5. Group of each student

We will include the information about the group in the final dataframe of results so as to make the separation of evaluation reports easier.

In [16]:
NIA_group = pd.read_csv(all_students_path)[['NIA', 'group']]
NIA_group.sort_values(['NIA']).head()

|   | NIA | group |
|---|---|---|
| 29 | 100278104 | [201718] C28.227.14311-1 |
| 55 | 100282701 | [201718] C28.227.14311-2 |
| 62 | 100290677 | [201718] C28.227.14311-2 |
| 35 | 100291097 | [201718] C28.227.14311-1 |
| 44 | 100292214 | [201718] C28.227.14311-2 |


At this point we have:

   * student_data: dataframe with data given to the students. Each index is a variable, and each column a NIA
   * student_results: dataframe with student results. Each index is a variable, and each column a name
   * NIA_name: NIA to name dictionary
   * name_NIA: name to NIA dictionary
   * NIA_group: dataframe with the group of each NIA

## 3. Exam evaluation

To carry out the evaluation of the exam, we use the external evaluation libraries.

Function evaluateExam computes the correct solutions for the given data and compares them with the responses provided by the students.
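
The internals of `dbeval.evaluateExam` are not shown in this notebook. Purely as an illustration of what comparing a response with a solution involves, the sketch below checks a student value against a reference with a relative tolerance. The function `matches_solution` and the tolerance are assumptions, not the evaluator's actual rule.

```python
import numpy as np


def matches_solution(response, reference, rtol=1e-3):
    """Return True if response is numerically close to reference."""
    try:
        resp = np.asarray(response, dtype=float)
    except (TypeError, ValueError):
        # Non-numeric answers (e.g. free text) cannot be correct
        return False
    ref = np.asarray(reference, dtype=float)
    # Ignore singleton dimensions, so a 1xN row matches a length-N vector
    resp, ref = np.squeeze(resp), np.squeeze(ref)
    if resp.shape != ref.shape:
        return False
    # NaN never compares as close, so missing answers are rejected
    return bool(np.allclose(resp, ref, rtol=rtol))


print(matches_solution([[1.0, 2.0]], [1.0, 2.0]))
print(matches_solution('A MEDIAS', 0.5))
```

The shape and type checks matter in practice: the results table above contains text answers, scalars where a vector was requested, and missing values.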

In [17]:
df = pd.DataFrame()

print('Evaluating all students... ')
for NIA in NIA_name:

    name = NIA_name[NIA]

    # Evaluate the exam from the data provided to the student and the student response
    dataex = student_data[str(NIA)].to_dict()
    response = student_results[name].to_dict()
    exam_report = dbeval.evaluateExam(questions, dataex, response, solveQuestion)
    
    # Convert exam_report, which is a nested dictionary, into a pandas dataframe
    # Note that all this conversion to and from dictionaries can be avoided if evaluateExam 
    # worked with dataframes. This is a pending task.
    ex = {}
    # Note that the slice [:] keeps the full group name (use [-2:] to keep its last 2 characters only).
    ex[('', 'Group')] = NIA_group[NIA_group['NIA'] == NIA]['group'].tolist()[0][:]   
    for v  in exam_report:
        for w in exam_report[v]:
            ex[(v,w)] = exam_report[v][w]
    
    df[NIA_name[NIA]] = pd.Series(ex)

# Take the transpose to place students in rows, and restore the original variable ordering.
# This is because pd.Series does not preserve the order.
cols = list(ex.keys())
df = df.T[cols]

# Pretty print results
df[df.columns[:]].head(100)

Evaluating all students... 
    Solving question R1
    Solving question R2
    Solving question R3
    Solving question R4
    Solving question C1
    Solving question C2
    Solving question C3
    Solving question C4
    ...

,,wML,wML,wML,wML,EAP,EAP,EAP,EAP,LwML,...,Eknn,wMAP,wMAP,wMAP,wMAP,Dtest,Dtest,Dtest,Dtest,Exam
,Group,Dim,w,s,w·s,Dim,w,s,w·s,Dim,...,w·s,Dim,w,s,w·s,Dim,w,s,w·s,Score
ADRIAN VAZQUEZ ROMERO,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,Error,1,0,0,OK,1,0,0,1
ADRIANO AUGUSTO OSSES RODRIGUEZ,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,OK,1,0,0,0
ADRIÁN DE LA TORRE MORENO,[201718] C28.227.14311-2,Error,1,0,0,OK,1,0,0,OK,...,0,Error,1,0,0,OK,1,0,0,0
ALBA MINGUEZ SANCHEZ,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,OK,0.9,1,0.9,1.9
ALEJANDRO BARON CUEVAS,[201718] C28.227.14311-1,Error,1,0,0,OK,1,0,0,OK,...,0,Error,1,0,0,Error,1,0,0,0
ALEJANDRO CUADRADO TORRE,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,OK,0.9,1,0.9,1.9
ALEJANDRO GARCIA DE LA SANTA RAMOS,[201718] C4.278.12994-1,OK,1,1,1,OK,1,1,1,OK,...,1,OK,1,0,0,OK,0.9,1,0.9,7.9
ALVARO GONZALEZ CABALLERO,[201718] C28.227.14311-1,OK,1,0,0,Error,1,0,0,OK,...,0,OK,1,0,0,OK,0.9,1,0.9,0.9
ANA MARIA SANCHEZ DE LA NAVA,[201718] C4.278.12994-1,OK,1,1,1,OK,1,1,1,OK,...,1,OK,1,1,1,OK,1,1,1,9
ASUNCION CAYUELA HIDALGO,[201718] C28.227.14311-2,OK,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,OK,0.9,1,0.9,1.9
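
The conversion from nested report dictionaries to a table with two-level columns can be sketched in isolation. The example below is only an illustration under assumptions: the student names and report contents are made up, and the helper `flatten_report` is hypothetical (it is not part of `dbEvaluator`). It builds the table with an explicit column list, so the column order does not depend on how pandas aligns the individual rows.

```python
import pandas as pd


def flatten_report(report):
    """Flatten {variable: {field: value}} into {(variable, field): value}."""
    return {(var, field): value
            for var, fields in report.items()
            for field, value in fields.items()}


# Hypothetical reports for two students (invented values, for illustration only)
reports = {
    'STUDENT A': {'wML': {'Dim': 'OK', 'w': 1, 's': 1, 'w·s': 1},
                  'Exam': {'Score': 1}},
    'STUDENT B': {'wML': {'Dim': 'Error', 'w': 1, 's': 0, 'w·s': 0},
                  'Exam': {'Score': 0}},
}

flat = {name: flatten_report(rep) for name, rep in reports.items()}

# Column order is taken from the first report and stated explicitly
cols = list(next(iter(flat.values())))
table = pd.DataFrame(
    [[row[c] for c in cols] for row in flat.values()],
    index=list(flat),
    columns=pd.MultiIndex.from_tuples(cols))

print(table)
```

Each row of `table` is a student and each column is a `(variable, field)` pair, which is the same layout as the evaluation dataframe shown above.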


### 3.1. Penalties

In addition to the evaluation of the results file provided by the student, the final mark depends on other factors:

1. Whether the student uploaded the code files.
2. Delays in delivering the files during the exam.
3. Errors in the delivery process (use of e-mail, incorrect file types, etc.).

The following function is used to identify the code uploaded by the student.

In [18]:
def detectCode(datafiles_path, splitsymbol):
    '''
    Check whether each student has uploaded a Python or a Matlab code file.

    Returns a dataframe indexed by student tag with a single column,
    'Code', taking the values 'Py', 'Mat' or 'None'.
    '''

    # Read file paths (recursively, so files inside subfolders are included)
    datafiles = glob.glob(datafiles_path + '**/*.*', recursive=True)

    # Read files
    df = pd.DataFrame()
    print('Processing {0} files in {1} ...'.format(len(datafiles), datafiles_path))
    for dtfile in datafiles:

        # The tag can be the NIA, the student's name or just the beginning of some other file
        tag = getFileName(dtfile).split(splitsymbol)[0]

        if tag in name_NIA:

            if dtfile.endswith('.zip'):

                # Read the names of the files inside the zip
                files_in_zip = zp.ZipFile(dtfile).namelist()

                # Count code files. This is to disambiguate cases with multiple files
                n_mat = len([f for f in files_in_zip if f.endswith('.m')])
                n_py = len([f for f in files_in_zip if f.endswith('.py') or f.endswith('.ipynb')])

                if n_py * n_mat > 0:
                    print('WARNING: {} has delivered both matlab and python code'.format(tag))

                # Python code takes precedence when both types are present
                if n_py > 0:
                    code = 'Py'
                elif n_mat > 0:
                    code = 'Mat'
                else:
                    code = 'None'

            elif dtfile.endswith('.py') or dtfile.endswith('.ipynb'):
                code = 'Py'
            elif dtfile.endswith('.m'):
                code = 'Mat'
            else:
                code = 'None'

            # Add a new column for this tag
            df2 = pd.DataFrame()
            df2[tag] = pd.Series(code, index=['Code'])
            df = pd.concat([df, df2], axis=1)
        elif os.path.isfile(dtfile):
            print('    File ignored: {0}'.format(dtfile))
    return df.T
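
The extension-based classification used for zip files can be tried without any student data. The following is a minimal sketch, assuming an in-memory zip archive with made-up file names; the helper `classify_code` is hypothetical and only mirrors the precedence used in `detectCode` (Python first, then Matlab, otherwise none).

```python
import io
import zipfile


def classify_code(filenames):
    """Return 'Py', 'Mat' or 'None' from a list of file names."""
    has_py = any(f.endswith(('.py', '.ipynb')) for f in filenames)
    has_mat = any(f.endswith('.m') for f in filenames)
    if has_py:
        return 'Py'
    if has_mat:
        return 'Mat'
    return 'None'


# Build a small zip archive in memory (file names are invented)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('results.mat', b'')
    zf.writestr('lab.ipynb', '{}')

# Read the archive back and classify its contents
with zipfile.ZipFile(buffer) as zf:
    label = classify_code(zf.namelist())

print(label)
```

Note that a results file such as `results.mat` does not count as Matlab code, because only names ending in `.m` are matched.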

In [19]:
# Identify the code delivered by the students
code_data = detectCode(results_path, splitsymbol='_')
code_data[code_data.columns][:].head()

# Add the code data to the evaluation dataframe
df['Delivery', 'Code'] = code_data
df['Delivery', 'Delay'] = 0.0
df['Delivery', 'Factor'] = 1.0

# Penalties for students that did not deliver any code.
df.loc[df['Delivery', 'Code'] == 'None', ('Delivery', 'Factor')] = 0.5 

Processing 64 files in ../../TestBed/TestTD/student_results/ExLabMLTD/ ...
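
The penalty assignment relies on selecting rows with a boolean mask and addressing a two-level column with a tuple. A minimal sketch on a toy dataframe follows; the student labels and scores are invented, and the factor 0.5 is simply the value used in the cell above.

```python
import pandas as pd

# Toy evaluation table with two-level columns (invented data)
cols = pd.MultiIndex.from_tuples([('Exam', 'Score'), ('Delivery', 'Code')])
toy = pd.DataFrame([[7.0, 'Py'], [5.0, 'None'], [9.0, 'Mat']],
                   index=['STUDENT A', 'STUDENT B', 'STUDENT C'],
                   columns=cols)

# Every student starts with no penalty
toy['Delivery', 'Factor'] = 1.0

# Rows with no code delivered get the reduced factor
no_code = toy['Delivery', 'Code'] == 'None'
toy.loc[no_code, ('Delivery', 'Factor')] = 0.5

print(toy)
```

Individual corrections, such as those in the project-specific cell, follow the same pattern with a single row label instead of a mask.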


In [20]:
# This cell contains project specific instructions.

# PENALTIES:
# if project_path == '../LabEvaluationProjects/ProjectB3_1718/':
if project_path == '../../EvaluationProjects/TestB3/':

    # STUDENTS THAT DID NOT DELIVER ANY RESULTS.
    #     ALEJANDRO GOMEZ RODENAS: (no e-mail) Delivers code only. Results generated with penalty
    df.at['ALEJANDRO GOMEZ RODENAS', ('Delivery', 'Factor')] = p_noresults
    #     ANDONI TAJUELO MUÑOZ: (no e-mail) Does not deliver results file. However, code computes some variables.
    #         Results generated with penalty
    df.at['ANDONI TAJUELO MUÑOZ', ('Delivery', 'Factor')] = p_noresults
    #     HAMZA EL HAMDAOUI ABOUEL ABBES: (e-mail) His computer got blocked and he could not generate the results file.
    #         savemat command incorrect. Code generated without penalty.
    df.at['HAMZA EL HAMDAOUI ABOUEL ABBES', ('Delivery', 'Factor')] = 1.0
    #     ROCIO BARTOLOME FERNANDEZ: (no e-mail) Delivers a file Lab12.7z, but renames it to Lab12zip.
    #         Results generated with penalty.
    df.at['ROCIO BARTOLOME FERNANDEZ', ('Delivery', 'Factor')] = p_noresults
    #     CRISTINA GARCIA GARCIA: (e-mail) Does not deliver results file. Code does not compute any of the variables 
    #     NEREA MERIDA QUERO: (no e-mail) Delivers multiple code versions.
    #     RAQUEL CARMONA LOPEZ (no e-mail) No results file. The code is completely wrong.
    #     MICHAEL UMENDU RIOS: compressed files with .7z. Changed without penalty.
    
elif project_path == '../LabEvaluationProjects/ProjectB3_1718_Gbil/':
    # NO INCIDENTS IN THIS GROUP
    pass
elif project_path == 'prb12':    
    # ADRIAN LOPEZ RUIZ: 
    #      (1) python does not recognize the delivered file as zip. However, I could decompress with
    #          the unarchiver. zip file re-generated without penalty
    #      (2) the mat file is actually a .ipynb with the extension changed.
    # FRANCISCO JAVIER VICENTE LASO: the .zip file cannot be read in any way. I have changed the extension to .unk.
    # MIGUEL RODRIGUEZ TALAVERON: delivers a .7z file. File .zip generated without penalty
    # ESTEFANIA FUENTES FERNANDEZ delivers a .7z file. File .zip generated without penalty
    pass
elif project_path == '../../examLabMLTD_201801/':    
    # ESTHER RITUERTO GONZALEZ: delivers a .rar file. File .zip generated without penalty
    # MARCOS RUBIO RUBIO: delivers a .rar file. File .zip generated without penalty
    # ALBA MINGUEZ SANCHEZ: delivers a .rar file. File .zip generated without penalty
    # CARLOS FERNANDEZ DEL CERRO: delivers a .rar file. File .zip generated without penalty
    # JAVIER SOLA MORRÁS: delivers a .rar file. File .zip generated without penalty
    # JUAN JIMÉNEZ GARCÍA:  delivers a .rar file. File .zip generated without penalty
    # ADRIÁN DE LA TORRE MORENO: delivers a .7z file. File .zip generated without penalty
    # IGNACIO SÁNCHEZ LARRÁYOZ: delivers a .rar file. File .zip generated without penalty
    # PAULA FERNÁNDEZ MARTÍNEZ: delivers a .rar file. File .zip generated without penalty
    # JOSE SERRANO FERNANDEZ: delivers by e-mail
    df.at['JOSE SERRANO FERNANDEZ', ('Delivery', 'Factor')] = p_email
# if exam_label == 'ExLabB12_0':
#    df.drop('mTrain', axis=1, inplace=True)

Now we are ready to compute the final score: the exam score minus the delay penalty, multiplied by the delivery factor.

In [21]:
df['Final', 'Score'] = (df['Exam', 'Score'] - p_delay * df['Delivery', 'Delay']) * df['Delivery', 'Factor']
df[df.columns]    # .head()

,,wML,wML,wML,wML,EAP,EAP,EAP,EAP,LwML,...,wMAP,Dtest,Dtest,Dtest,Dtest,Exam,Delivery,Delivery,Delivery,Final
,Group,Dim,w,s,w·s,Dim,w,s,w·s,Dim,...,w·s,Dim,w,s,w·s,Score,Code,Delay,Factor,Score
ADRIAN VAZQUEZ ROMERO,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,1,Py,0.0,1.0,1
ADRIANO AUGUSTO OSSES RODRIGUEZ,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,0,Py,0.0,1.0,0
ADRIÁN DE LA TORRE MORENO,[201718] C28.227.14311-2,Error,1,0,0,OK,1,0,0,OK,...,0,OK,1,0,0,0,Py,0.0,1.0,0
ALBA MINGUEZ SANCHEZ,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,0.9,1,0.9,1.9,Py,0.0,1.0,1.9
ALEJANDRO BARON CUEVAS,[201718] C28.227.14311-1,Error,1,0,0,OK,1,0,0,OK,...,0,Error,1,0,0,0,Py,0.0,1.0,0
ALEJANDRO CUADRADO TORRE,[201718] C28.227.14311-1,OK,1,0,0,OK,1,0,0,OK,...,0,OK,0.9,1,0.9,1.9,Py,0.0,1.0,1.9
ALEJANDRO GARCIA DE LA SANTA RAMOS,[201718] C4.278.12994-1,OK,1,1,1,OK,1,1,1,OK,...,0,OK,0.9,1,0.9,7.9,Py,0.0,1.0,7.9
ALVARO GONZALEZ CABALLERO,[201718] C28.227.14311-1,OK,1,0,0,Error,1,0,0,OK,...,0,OK,0.9,1,0.9,0.9,Py,0.0,1.0,0.9
ANA MARIA SANCHEZ DE LA NAVA,[201718] C4.278.12994-1,OK,1,1,1,OK,1,1,1,OK,...,1,OK,1,1,1,9,Py,0.0,1.0,9
ASUNCION CAYUELA HIDALGO,[201718] C28.227.14311-2,OK,1,0,0,OK,1,0,0,OK,...,0,OK,0.9,1,0.9,1.9,Py,0.0,1.0,1.9
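
The effect of the delay and the delivery factor on the final score can be checked on a toy example. The values below are assumptions chosen for illustration only: `p_delay = 0.5` is not the value configured in the project, and the three students are invented.

```python
import pandas as pd

p_delay = 0.5  # hypothetical penalty per unit of delay

cols = pd.MultiIndex.from_tuples(
    [('Exam', 'Score'), ('Delivery', 'Delay'), ('Delivery', 'Factor')])
toy = pd.DataFrame(
    [[8.0, 0.0, 1.0],    # on time, code delivered
     [8.0, 2.0, 1.0],    # delayed by 2 units
     [8.0, 0.0, 0.5]],   # on time, reduced delivery factor
    index=['STUDENT A', 'STUDENT B', 'STUDENT C'],
    columns=cols)

# Same expression as in the cell above
toy['Final', 'Score'] = ((toy['Exam', 'Score']
                          - p_delay * toy['Delivery', 'Delay'])
                         * toy['Delivery', 'Factor'])

print(toy['Final', 'Score'])
```

With these assumed values, the delay is subtracted before the factor is applied, so a student with both a delay and a reduced factor would have the delay penalty scaled down as well.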


## 4. Save results

In [22]:
# Save to excel file.
if not os.path.exists(output_path):
    os.makedirs(output_path)
df.to_excel(output_path + finalnotes_fname, columns=df.columns)

## 5. Summary statistics

This is not necessary for the evaluation. Some summary statistics are shown in order to check whether code modifications unexpectedly affect the evaluation results.

In [23]:
for var in truenames:
    if var in df.columns:
        print('Score sum for variable {0}:    {1}'.format(var, df[(var, 'w·s')].sum()))
print('\nFinal score sum:    {0}'.format(df[('Final', 'Score')].sum()))
print('Average final score:    {0}'.format(df[('Final', 'Score')].mean()))


Score sum for variable wML:    4
Score sum for variable EAP:    4
Score sum for variable LwML:    1
Score sum for variable wmean:    4
Score sum for variable covw:    26.400000000000002
Score sum for variable xn_tr:    4
Score sum for variable xn_val:    4
Score sum for variable Eknn:    4
Score sum for variable wMAP:    3
Score sum for variable Dtest:    21.899999999999995

Final score sum:    76.3
Average final score:    1.2508196721311475
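
As an alternative to looping over the variable names, the per-variable sums can be obtained in one step by taking a cross-section of the second column level. This is only a sketch on invented data, assuming the same `(variable, field)` column layout as the evaluation dataframe.

```python
import pandas as pd

# Toy table with the (variable, field) column layout (invented data)
cols = pd.MultiIndex.from_tuples(
    [('wML', 'w'), ('wML', 'w·s'), ('EAP', 'w'), ('EAP', 'w·s')])
toy = pd.DataFrame(
    [[1, 1.0, 1, 0.0],
     [1, 0.0, 1, 1.0],
     [1, 0.0, 1, 1.0]],
    index=['STUDENT A', 'STUDENT B', 'STUDENT C'],
    columns=cols)

# Select every 'w·s' column, whatever the variable, and sum over students
sums = toy.xs('w·s', axis=1, level=1).sum()

print(sums)
```

Storing such sums from a trusted run and comparing them after a code change is one simple way to detect unexpected differences in the evaluation.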
