# Lab 8 - Combining Attendance and Practice Quiz Attempts

## Background

In all of my classes, I use an attendance quiz to track student attendance.  Students take multiple attempts at the same quiz, one per class session, so the number of attempts a student makes on this quiz equals the number of class sessions that student has attended.

In some, but not all, of my courses I also provide practice quizzes that students can use to prepare for actual quizzes and tests.  These quizzes pull questions randomly from a bank of questions, allow students unlimited attempts, and do not count toward the students' grades.

In this lab, you will collect simulated data from mock classes into one table; in the next lab, you will summarize these data.

## Tasks 

The files in `attendance_example.zip` contain (made-up and random) examples of the D2L files that I use to summarize my attendance quizzes and practice quizzes.
Download `attendance_example.zip` to the `data` folder inside the course repository, then unzip the file.

1. Use `glob` to find the path to all csv files.
2. Write functions that use regular expressions to extract the class name, the quiz type (`Attendance` or `Practice`), and the module number (if the file is a practice quiz).
3. Write a function that takes a path as an argument and returns a dataframe that contains:
    * All of the original columns
    * A Class column that holds the class identifier
    * A Category column that contains the quiz type
    * A Module column that (a) contains the module number for a practice quiz, or (b) is otherwise empty.
4. Use a loop, `union`, and the accumulator pattern to load all of the data into a single table.
5. Write the resulting table to a csv file.

In [1]:
%reset -f

In [2]:
import pandas as pd
from dfply import *

In [3]:
from glob import glob

In [4]:
files = glob('./data/attendance_example/*/*.csv')
files

['./data/attendance_example/stat491s1/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 4 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 2 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 3 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 1 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 4 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 2 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 3 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 1 - User Attempts.csv',
 './data/attendance_example/stat180s18/Attendance Quiz - User Attempts.csv']
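
The order in which `glob` returns paths is not guaranteed, which is likely why the modules above are listed as 4, 2, 3, 1. The following is a minimal sketch of wrapping the call in `sorted` to get a reproducible order. It assumes a throwaway directory tree built only for illustration; the temporary folder and the empty placeholder files are hypothetical stand-ins for the unzipped lab data.

```python
import os
import tempfile
from glob import glob

# Build a small, hypothetical directory tree that mimics the lab's layout.
root = tempfile.mkdtemp()
layout = {
    "stat491s1": ["Practice Quiz - Module 2 - User Attempts.csv",
                  "Attendance Quiz - User Attempts.csv"],
    "dsci494s7": ["Attendance Quiz - User Attempts.csv"],
}
for class_dir, names in layout.items():
    os.makedirs(os.path.join(root, class_dir))
    for name in names:
        # Create empty placeholder files; only the names matter here.
        open(os.path.join(root, class_dir, name), "w").close()

# One wildcard for the class folder, one for the csv file name.
found = sorted(glob(os.path.join(root, "*", "*.csv")))

# Show paths relative to the temporary root, with forward slashes.
relative = [os.path.relpath(p, root).replace(os.sep, "/") for p in found]
print(relative)
```

Sorting does not change which files are found, only the order in which later steps see them.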

In [5]:
import re

In [6]:
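# Group 1 is the class folder; group 2 is the quiz type (Attendance or Practice).
pattern = re.compile(r'\./data/attendance_example/(\w+\d+)/([A-Za-z]*) Quiz.*')
# Only practice files match this pattern; group 4 is the module number.
module_pattern = re.compile(r'\./data/attendance_example/(\w+\d+)/([A-Za-z]*) Quiz - (\w+) (\d+).*')

class_name = lambda file: pattern.match(file).group(1)
quiz_type = lambda file: pattern.match(file).group(2)
# Returns a match object for practice files and None for attendance files.
module_num = lambda file: module_pattern.match(file)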

module_num(files[0])
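
The call above displays nothing because the first file is an attendance quiz, so the module pattern does not match and the result is `None`. As a rough illustration of what each capture group holds, the sketch below runs patterns of the same shape against two literal paths copied from the file listing. It is written without the `\/` escapes, which Python does not require; the variable names are chosen only for this example.

```python
import re

# Same shape as the lab's patterns, with "/" left unescaped and the leading "." escaped.
pattern = re.compile(r'\./data/attendance_example/(\w+\d+)/([A-Za-z]*) Quiz.*')
module_pattern = re.compile(
    r'\./data/attendance_example/(\w+\d+)/([A-Za-z]*) Quiz - (\w+) (\d+).*'
)

attendance_path = './data/attendance_example/stat180s18/Attendance Quiz - User Attempts.csv'
practice_path = './data/attendance_example/dsci494s7/Practice Quiz - Module 3 - User Attempts.csv'

# Groups 1 and 2 of the general pattern are the class and the quiz type.
class_and_type = pattern.match(practice_path).groups()

# The module pattern only matches practice files; group 4 is the module number.
practice_module = module_pattern.match(practice_path).group(4)
attendance_match = module_pattern.match(attendance_path)

print(class_and_type, practice_module, attendance_match)
```

Note that `group(4)` returns the module number as a string, so it would need `int(...)` if a numeric column is wanted.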

In [7]:
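def get_df(file_path):
    """Read one quiz file and add the Class, Category, and Module columns."""
    df = pd.read_csv(file_path)
    df['Class'] = class_name(file_path)
    df['Category'] = quiz_type(file_path)
    # module is a match object for practice quizzes and None for attendance quizzes.
    module = module_num(file_path)
    if module is None:
        df['Module'] = None
    else:
        df['Module'] = module.group(4)
    return df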


In [8]:
df = get_df(files[1])
df

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Class,Category,Module
0,15135961,wd8670of,McKinley,Sabina,6,9,20,2019-02-24 11:01:00,2019-02-24 11:04:00,45 %,stat491s1,Practice,4
1,15135961,wd8670of,McKinley,Sabina,7,13,20,2019-02-25 11:59:00,2019-02-25 12:06:00,65 %,stat491s1,Practice,4
2,15135961,wd8670of,McKinley,Sabina,8,18,20,2019-02-26 11:15:00,2019-02-26 11:23:00,90 %,stat491s1,Practice,4
3,15135961,wd8670of,McKinley,Sabina,9,8,20,2019-02-27 11:40:00,2019-02-27 11:50:00,40 %,stat491s1,Practice,4
4,15135961,wd8670of,McKinley,Sabina,10,6,20,2019-02-27 11:55:00,2019-02-27 12:04:00,30 %,stat491s1,Practice,4
5,15135961,wd8670of,McKinley,Sabina,11,6,20,2019-02-28 11:56:00,2019-02-28 12:02:00,30 %,stat491s1,Practice,4
6,15135961,wd8670of,McKinley,Sabina,12,7,20,2019-02-28 11:24:00,2019-02-28 11:29:00,35 %,stat491s1,Practice,4
7,15135961,wd8670of,McKinley,Sabina,13,15,20,2019-03-01 11:02:00,2019-03-01 11:11:00,75 %,stat491s1,Practice,4
8,15135961,wd8670of,McKinley,Sabina,14,12,20,2019-03-01 11:58:00,2019-03-01 12:02:00,60 %,stat491s1,Practice,4
9,15135961,wd8670of,McKinley,Sabina,1,18,20,2019-02-22 11:37:00,2019-02-22 11:38:00,90 %,stat491s1,Practice,4


In [9]:
@dfpipe
def union_all(left_df, right_df, ignore_index=True):
    return pd.concat([left_df, right_df], ignore_index=ignore_index)

In [10]:
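attendance_final = get_df(files[0])
len_count = len(attendance_final)  # running row total, used as a sanity check

for file in files[1:]:
    new_df = get_df(file)
    len_count += len(new_df)
    attendance_final = attendance_final >> union_all(new_df)

# The combined table should have exactly as many rows as the files it came from.
assert len_count == len(attendance_final)
attendance_final.shape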


(3359, 13)
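
The loop above concatenates once per file, as the task asks. A common alternative, shown here only as a sketch and not as the lab's required approach, is to accumulate the dataframes in a list and call `pd.concat` once at the end. The in-memory csv strings and their column names are hypothetical stand-ins for the real files.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for three csv files.
fake_files = [
    io.StringIO("UserName,Score\na,1\nb,2\n"),
    io.StringIO("UserName,Score\nc,3\n"),
    io.StringIO("UserName,Score\nd,4\ne,5\n"),
]

frames = []          # the accumulator is a plain list of dataframes
expected_rows = 0    # running row total, used as a sanity check
for f in fake_files:
    df = pd.read_csv(f)
    expected_rows += len(df)
    frames.append(df)

# Concatenate once at the end instead of once per iteration.
combined = pd.concat(frames, ignore_index=True)
assert len(combined) == expected_rows
print(combined.shape)
```

Both versions produce the same rows; the list-based version avoids copying the growing table on every iteration.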

In [11]:
attendance_final.to_csv('./data/lab8_attendance_final.csv', index=False)