# Lab 8 - Combining Attendance and Practice Quiz Attempts

## Background

In all of my classes, I use an attendance quiz to track student attendance.  Students take multiple attempts at the same quiz, one per class, so the number of attempts a student makes on this quiz represents the number of class sessions that student has attended.

In some, but not all, of my courses I also provide practice quizzes that students can use to prepare for actual quizzes and tests.  These quizzes pull questions randomly from a bank of questions, allow students unlimited attempts, and are not used as part of the student's grade.

In this lab, you will collect simulated data from mock classes into one table; in the next lab, you will summarize these data.

## Tasks 

The files found in `attendance_example.zip` contain (made-up and random) examples of the D2L files that I use to summarize my attendance quizzes and practice quizzes.
Make sure you download `attendance_example.zip` to the `data` folder inside the course repository, then unzip the file.

1. Use `glob` to find the path to all csv files.
2. Write functions that use regular expressions to extract the class name, quiz type (`Attendance` or `Practice`), and the module number (if the file is a practice quiz).
3. Write a function that takes a path as an argument and returns a dataframe that contains:
    * All of the original columns
    * A Class column that holds the class identifier
    * A Category column that contains the quiz type
    * A Module column that (a) contains the module number for a practice quiz, or (b) is otherwise empty.
4. Use a loop, `union`, and the accumulator pattern to load all of the data into a single table.
5. Write the resulting table to a csv file.

In [1]:
import pandas as pd
from dfply import *
from glob import glob
import re

In [24]:
files = glob('./data/attendance_example/*/*.csv')
files

['./data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 1 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 2 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 3 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 4 - User Attempts.csv',
 './data/attendance_example/stat180s18/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/stat491s1/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 1 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 2 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 3 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 4 - User Attempts.csv']

In [60]:
# Captures the course folder (group 1) and the first word of the file name (group 2).
FILE_NAME_RE = re.compile(r'^\./data/attendance_example/([a-z\d]+)/(\w+).*\.csv$')
# Captures the module number; anchoring on "Module" also handles multi-digit modules.
FILE_NAME_RE2 = re.compile(r'^\./data/attendance_example/[a-z\d]+/.*Module (\d+).*\.csv$')

course = lambda p: FILE_NAME_RE.match(p).group(1)
get_courses = lambda files: [course(p) for p in files]

quiz_type = lambda p: FILE_NAME_RE.match(p).group(2)
get_quiz_types = lambda files: [quiz_type(p) for p in files]

# Attendance files have no module number, so module() falls back to 0.
module = lambda p: FILE_NAME_RE2.match(p).group(1) if quiz_type(p)=="Practice" else 0
get_module = lambda files: [module(p) for p in files]
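As an alternative to two separate patterns, the three pieces can be pulled out in one pass with named groups. This is only a sketch: the pattern `QUIZ_PATH_RE` and the helper `parse_quiz_path` are hypothetical names, and the pattern assumes every file follows the `... Quiz[ - Module N] - User Attempts.csv` naming seen in the listing above.

```python
import re

# Hypothetical single-pattern alternative; the group names are illustrative.
QUIZ_PATH_RE = re.compile(
    r'^\./data/attendance_example/(?P<course>[a-z\d]+)/'
    r'(?P<quiz_type>Attendance|Practice) Quiz'
    r'(?: - Module (?P<module>\d+))?'   # optional: only practice quizzes have a module
    r' - User Attempts\.csv$'
)

def parse_quiz_path(path):
    """Return (course, quiz_type, module); module is None for attendance files."""
    m = QUIZ_PATH_RE.match(path)
    if m is None:
        raise ValueError(f"Unrecognized quiz file path: {path!r}")
    module = m.group('module')
    return m.group('course'), m.group('quiz_type'), int(module) if module else None

print(parse_quiz_path('./data/attendance_example/dsci494s7/Practice Quiz - Module 2 - User Attempts.csv'))
# ('dsci494s7', 'Practice', 2)
print(parse_quiz_path('./data/attendance_example/stat180s18/Attendance Quiz - User Attempts.csv'))
# ('stat180s18', 'Attendance', None)
```

Raising on a non-matching path, rather than letting `.group` fail on `None`, makes a misnamed file easier to spot.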

In [46]:
courses = get_courses(files)
courses

['dsci494s7',
 'dsci494s7',
 'dsci494s7',
 'dsci494s7',
 'dsci494s7',
 'stat180s18',
 'stat491s1',
 'stat491s1',
 'stat491s1',
 'stat491s1',
 'stat491s1']

In [47]:
quiz_types = get_quiz_types(files)
quiz_types

['Attendance',
 'Practice',
 'Practice',
 'Practice',
 'Practice',
 'Attendance',
 'Attendance',
 'Practice',
 'Practice',
 'Practice',
 'Practice']

In [52]:
modules = get_module(files)
modules

[0, '1', '2', '3', '4', 0, 0, '1', '2', '3', '4']

In [84]:
def add_to_file(path):
    """Read one quiz export and add Course, Quiz_type, and Module columns."""
    df = pd.read_csv(path)
    df2 = (df >>
           mutate(Course = course(path),
                  Quiz_type = quiz_type(path),
                  Module = "Module_{}".format(module(path))))
    return df2
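For comparison, the same labeling step can be sketched in plain pandas with `DataFrame.assign`, leaving `Module` empty for attendance quizzes as the task list describes. The function `tag_attempts` and the sample data are hypothetical and only illustrate the idea; they are not part of the lab files.

```python
import io
import pandas as pd

def tag_attempts(df, course, quiz_type, module=None):
    """Return a copy of df with Course, Quiz_type, and Module columns added.

    module=None leaves the Module column empty (missing), e.g. for attendance quizzes.
    """
    return df.assign(Course=course, Quiz_type=quiz_type, Module=module)

# Made-up stand-in for one D2L export.
raw = pd.read_csv(io.StringIO("UserName,Score\nxx0000yy,1\nzz1111ww,1\n"))

attendance = tag_attempts(raw, "demo101", "Attendance")
practice = tag_attempts(raw, "demo101", "Practice", module=2)
print(attendance)
print(practice)
```

Because `assign` returns a new dataframe, `raw` itself is left unchanged.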

In [85]:
@dfpipe
def union_all(left_df, right_df, ignore_index=True):
    return pd.concat([left_df, right_df], ignore_index=ignore_index)

In [86]:
df_union = pd.DataFrame(columns=['Org Defined ID', 'UserName', 'FirstName', 'LastName', 'Attempt #',
       'Score', 'Out Of', 'Attempt_Start', 'Attempt_End', 'Percent', 'Course',
       'Quiz_type', 'Module'])
df_union

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Course,Quiz_type,Module


In [87]:
for file in files:
    df = add_to_file(file)
    df_union = df_union >> union_all(df)
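The loop above grows the dataframe on every pass, which copies the rows accumulated so far each time. A common variation, sketched here in plain pandas with made-up in-memory data, accumulates the pieces in a list and concatenates once at the end. The `sources` dictionary and course names are assumptions for illustration only.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the files on disk.
sources = {
    "demo101": "UserName,Score\nxx0000yy,1\nzz1111ww,1\n",
    "demo102": "UserName,Score\nxx0000yy,18\n",
}

frames = []  # the accumulator is a plain list here
for course_name, text in sources.items():
    frame = pd.read_csv(io.StringIO(text)).assign(Course=course_name)
    frames.append(frame)

# One concatenation at the end; ignore_index renumbers the rows 0..n-1.
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (3, 3)
```

For eleven small files the difference is negligible, so either form of the accumulator pattern works for this lab.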

In [88]:
df_union.shape

(3359, 13)

In [89]:
df_union.sample(20)

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Course,Quiz_type,Module
111,14924788,ju8890nw,Christendom,Montmartre,4,1,1,2019-01-25 14:01:00,2019-01-25 14:10:00,100 %,dsci494s7,Attendance,Module_0
2409,16077924,mg1452fz,Dorset,Nibelung,4,1,1,2019-01-24 09:05:00,2019-01-24 09:08:00,100 %,stat491s1,Attendance,Module_0
2921,16745015,gf4756td,Jane,Sommerfeld,14,1,1,2019-02-20 11:01:00,2019-02-20 11:08:00,100 %,stat491s1,Attendance,Module_0
2653,15227236,pc3278ep,Cyprus,Errol,12,1,1,2019-02-13 15:59:00,2019-02-13 16:01:00,100 %,stat491s1,Attendance,Module_0
427,14924788,ju8890nw,Christendom,Montmartre,11,1,1,2019-02-08 15:02:00,2019-02-08 15:07:00,100 %,dsci494s7,Attendance,Module_0
2908,12376671,nz6014ah,Edmonds,Tuesday,1,1,1,2019-01-14 11:03:00,2019-01-14 11:09:00,100 %,stat491s1,Attendance,Module_0
763,11702889,yw2911sz,Philippine,Gaylord,5,5,20,2019-01-28 15:51:00,2019-01-28 15:54:00,25 %,dsci494s7,Practice,Module_1
519,13114642,qe3386ba,Willard,Kafka,3,1,1,2019-01-23 11:00:00,2019-01-23 11:08:00,100 %,dsci494s7,Attendance,Module_0
3041,10744592,cv5264qt,McDonnell,Arizona,20,18,20,2019-01-28 15:32:00,2019-01-28 15:40:00,90 %,stat491s1,Practice,Module_1
2171,17647838,tb2740kh,Tiffany,Schmitt,5,1,1,2019-01-25 11:01:00,2019-01-25 11:06:00,100 %,stat180s18,Attendance,Module_0


In [90]:
df_union.to_csv("./attendance_and_quiz_data_combined.csv")
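Note that `to_csv` writes the row index as an unnamed first column by default, and that column comes back as `Unnamed: 0` when the file is read again. The sketch below uses a small made-up dataframe and in-memory buffers to show the difference that `index=False` makes; whether to keep the index is a design choice for the next lab.

```python
import io
import pandas as pd

df = pd.DataFrame({'UserName': ['xx0000yy'], 'Score': [1]})

with_index = io.StringIO()
df.to_csv(with_index)                   # default: index written as an unnamed first column

without_index = io.StringIO()
df.to_csv(without_index, index=False)   # index omitted from the file

cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()
print(cols_with)     # ['Unnamed: 0', 'UserName', 'Score']
print(cols_without)  # ['UserName', 'Score']
```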