# (8) Cleaning Weekly Report

* **author** = Diego Sapunar-Opazo
* **copyright** = Copyright 2019, Thesis M.Sc. Diego Sapunar - Pontificia Universidad Católica de Chile
* **credits** = Diego Sapunar-Opazo, Ronald Perez, Mar Perez-Sanagustin, Jorge Maldonado-Mahauad
* **maintainer** = Diego Sapunar-Opazo
* **email** = dasapunar@uc.cl
* **status** = Dev

This script takes the raw weekly report for each group and cleans it, creating the following columns for section 1:

(1) **num_alumno**, which corresponds to the internal ID of the face-to-face student

(2) **last_week**, which corresponds to the number of the previous week, i.e. the week being reported on

(3) **dedicated_prepare_lecture_hours**, which corresponds to the self-reported hours the student spent preparing lectures

(4) **dedicated_watch_videos_hours**, which corresponds to the self-reported hours the student spent watching videos

(5) **dedicated_coursera_quiz_hours**, which corresponds to the self-reported hours the student spent on Coursera quizzes

(6) **dedicated_prepare_exam_hours**, which corresponds to the self-reported hours the student spent preparing for an exam

(7) **dedicated_other_hours**, which corresponds to the self-reported hours the student spent on other activities

(8) **perception_lecture_score**, which corresponds to the student's perception of last week's lectures

(9) **perception_comprehension_score**, which corresponds to the student's comprehension of last week's topic

(10) **next_week**, which corresponds to the number of the upcoming week, i.e. the week being planned

(11) **planning_hours**, which corresponds to the number of hours the student plans to invest next week

(12) **planning_number_videos**, which corresponds to the number of videos the student plans to watch next week

(13) **planning_number_quiz**, which corresponds to the number of quizzes the student plans to take next week

(14) **planning_Mon**, which indicates whether the student planned to work on Monday (1 if so, 0 otherwise).

(15) **planning_Tue**, which indicates whether the student planned to work on Tuesday (1 if so, 0 otherwise).

(16) **planning_Wed**, which indicates whether the student planned to work on Wednesday (1 if so, 0 otherwise).

(17) **planning_Thu**, which indicates whether the student planned to work on Thursday (1 if so, 0 otherwise).

(18) **planning_Fri**, which indicates whether the student planned to work on Friday (1 if so, 0 otherwise).

(19) **planning_Sat**, which indicates whether the student planned to work on Saturday (1 if so, 0 otherwise).

(20) **planning_Sun**, which indicates whether the student planned to work on Sunday (1 if so, 0 otherwise).


And for section 2:

(1) **num_alumno**, which corresponds to the internal ID of the face-to-face student

(2) **last_week**, which corresponds to the number of the previous week, i.e. the week being reported on

(3) **dedicated_prepare_lecture_hours**, which corresponds to the self-reported hours the student spent preparing lectures

(4) **dedicated_watch_videos_hours**, which corresponds to the self-reported hours the student spent watching videos

(5) **dedicated_coursera_quiz_hours**, which corresponds to the self-reported hours the student spent on Coursera quizzes

(6) **dedicated_prepare_exam_hours**, which corresponds to the self-reported hours the student spent preparing for an exam

(7) **dedicated_other_hours**, which corresponds to the self-reported hours the student spent on other activities

(8) **perception_lecture_score**, which corresponds to the student's perception of last week's lectures

(9) **perception_comprehension_score**, which corresponds to the student's comprehension of last week's topic


## Part 0: Import Packages

In [1]:
# data analysis and wrangling
import pandas as pd
import numpy as np

## Part 1: Getting the Data

In [2]:
def read_data(path):
    '''
    Read a .csv file and convert it into a Pandas DataFrame.
    
    Input:
    path - String: path where the .csv is located.
    
    Output:
    Pandas DataFrame: .csv in the Pandas DataFrame format.
    '''
    
    return pd.read_csv(path, header=1)
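
The `header=1` argument above tells pandas to use the second row of the file as column names and to skip the first one. The sketch below illustrates that behaviour on a made-up two-row header; the CSV content is hypothetical and only loosely modelled on a survey export, not taken from the real raw files. It also shows where names such as `Response.1` come from: pandas de-duplicates repeated labels by appending a numeric suffix.

```python
import io

import pandas as pd

# Hypothetical two-row header: row 0 holds the question text,
# row 1 holds the answer-type labels (assumed layout, for illustration only).
raw_csv = io.StringIO(
    "Student id,Week,Lecture perception\n"
    "Open-Ended Response,Response,Response\n"
    "12345,Semana 1,Beneficiosas\n"
)

# header=1 takes the second row as column names and skips the first one
df = pd.read_csv(raw_csv, header=1)

# repeated labels are de-duplicated as Response, Response.1, ...
print(list(df.columns))
```

If the real export has a different header layout, the `header` value and the keys of the rename dictionary would have to change together.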

## Part 2: Data Preprocessing & Wrangling

In [3]:
def preprocc_data(df, slices=False, columns_to_rename=False, datetime=False):
    '''
    Clean a dataframe: (1) keep the necessary columns; (2) rename columns; and (3) parse datetime columns.
    
    Input: 
    df - Pandas DataFrame: dataframe to be cleaned.
    slices - List of Ints: positions of the columns to keep (all columns are kept if False).
    columns_to_rename - Dict: Columns to rename, Key: original name, Value: new name.
    datetime - List of Strings: List of the names of the columns to be datetime.
    
    Output:
    df_cleaned - Pandas DataFrame: the cleaned dataframe.
    '''
    
    df_cleaned = df.copy()
    
    # keep only the columns that are needed, selected by position
    if slices:
        df_cleaned = df_cleaned.iloc[:, slices]
    
    del df  # drop the local reference to the original dataframe
    
    # rename columns
    if columns_to_rename:
        df_cleaned.rename(columns_to_rename, 
                          inplace=True, 
                          axis=1)
    
    # parse the listed columns as datetimes
    if datetime:
        for cat in datetime:
            df_cleaned[cat] = pd.to_datetime(df_cleaned[cat], 
                                             format='%Y-%m-%d %H:%M:%S.%f')
    
    return df_cleaned
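
As a rough illustration of the first two steps of `preprocc_data`, the sketch below selects columns by position and renames them with a dictionary. The toy frame and its column names (`Respondent ID`, `IP Address`) are invented for the example. It also shows why one rename dictionary can serve both sections: keys that are not present in the frame are ignored by default.

```python
import pandas as pd

# Toy frame with made-up columns; only positions 1 and 2 are of interest
df = pd.DataFrame({
    'Respondent ID': [1, 2],
    'Open-Ended Response': [111, 222],
    'Response.1': ['Semana 1', 'Semana 2'],
    'IP Address': ['x', 'y'],
})

# step 1: keep columns by position
df_small = df.iloc[:, [1, 2]]

# step 2: rename with a dict; keys missing from the frame are skipped
rename_map = {
    'Open-Ended Response': 'num_alumno',
    'Response.1': 'last_week',
    'Lunes': 'planning_Mon',  # not in this frame, silently ignored
}
df_small = df_small.rename(columns=rename_map)

print(list(df_small.columns))  # ['num_alumno', 'last_week']
```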

## Part 3: Export Data

In [4]:
def export_data(df, path):
    '''
    Export df as a .csv file to the given path.
    
    Input:
    df - Pandas DataFrame: dataframe to be exported.
    path - String: path where the .csv will be exported.
    '''
    
    df.to_csv(path, index=False)

## Part 4: Main

In [5]:
_columns_to_rename = {
    'Open-Ended Response': 'num_alumno',
    'Response.1': 'last_week',
    'Preparar las clases (leer las lecturas y estudiar los casos)': 'dedicated_prepare_lecture_hours',
    'Ver los videos correspondientes al tema de las clases': 'dedicated_watch_videos_hours',
    'Realizar evaluaciones online': 'dedicated_coursera_quiz_hours',
    'Preparar una interrogación o examen': 'dedicated_prepare_exam_hours',
    'Otro (asistencia a una charla, etc.)': 'dedicated_other_hours',
    'Response.2': 'perception_lecture_score',
    'Response.3': 'perception_comprehension_score',
    'Response.5': 'next_week',
    'Horas que quieres invertir': 'planning_hours',
    'Número de videos\xa0que quieres ver': 'planning_number_videos',
    'Número de evaluaciones que quieres realizar': 'planning_number_quiz',
    'Lunes': 'planning_Mon',
    'Martes': 'planning_Tue',
    'Miércoles': 'planning_Wed',
    'Jueves': 'planning_Thu',
    'Viernes': 'planning_Fri',
    'Sábado': 'planning_Sat',
    'Domingo': 'planning_Sun'
}

_values_to_replace = {
    'Semana 15 (26-11-18 al 02-12-18)': 15,
    'Semana 14 (19-11-18 al 25-11-18)': 14,
    'Semana 13 (12-11-18 al 18-11-18)': 13,
    'Semana 12 (05-11-18 al 11-11-18)': 12,
    'Semana 11 (29-10-18 al 04-11-18)': 11,
    'Semana 10 (22-10-18 al 28-10-18)': 10,
    'Semana 9 (15-10-18 al 21-10-18)': 9,
    'Semana 8 (08-10-18 al 14-10-18)': 8,
    'Semana 7 (01-10-18 al 07-10-18)': 7,
    'Semana 6 (24-09-18 al 30-09-18)': 6,
    'Semana 5 (17-09-18 al 23-09-18)': 5,
    'Semana 4 (10-09-18 al 16-09-18)': 4,
    'Semana 3 (03-09-18 al 09-09-18)': 3,
    'Semana 2 (27-08-18 al 02-09-18)': 2,
    'Semana 1 (20-08-18 al 26-08-18)': 1,
    'Semana 0 (13-08-18 al 19-08-18)': 0,
    'Totalmente beneficiosas': 4,
    'Muy beneficiosas': 3,
    'Beneficiosas': 2,
    'Algo beneficiosas': 1,
    'Nada beneficiosas': 0,
    'Tengo una idea clara de lo que es y sé cómo aplicarlo.': 3,
    'Tengo una idea clara de lo que es y sé como aplicarlo.': 3,
    'Tengo una idea clara de lo que es, pero no sé cómo aplicarlo.': 2,
    'Tengo una idea de lo que es, pero no sé cómo aplicarlo.': 1,
    'No sé lo qué es o lo he escuchado, pero no sé a que se refiere.': 0,
    'No sé lo que es o lo he escuchado, pero no sé a que se refiere.': 0,
    'Lunes': 1,
    'Martes': 1,
    'Miércoles': 1,
    'Jueves': 1,
    'Viernes': 1,
    'Sábado': 1,
    'Domingo': 1
}
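
To make the role of `_values_to_replace` concrete, here is a minimal sketch of how a dictionary passed to `DataFrame.replace` turns survey answers into numbers once missing answers have been filled with 0. The two-row frame and the shortened `mapping` are assumptions made for the example; the final `astype(int)` is an addition of this sketch and is not part of the notebook.

```python
import pandas as pd

# Toy data: one week label per row, and a day column that is either
# the day name (selected) or missing (not selected).
df = pd.DataFrame({
    'last_week': ['Semana 1 (20-08-18 al 26-08-18)',
                  'Semana 2 (27-08-18 al 02-09-18)'],
    'planning_Mon': ['Lunes', None],
}, dtype=object)

# Shortened, illustrative version of the mapping
mapping = {
    'Semana 1 (20-08-18 al 26-08-18)': 1,
    'Semana 2 (27-08-18 al 02-09-18)': 2,
    'Lunes': 1,
}

df = df.fillna(0)        # unselected day -> 0
df = df.replace(mapping)  # whole-cell matches -> numbers
df = df.astype(int)       # extra step in this sketch: force integer columns

print(df['last_week'].tolist())     # [1, 2]
print(df['planning_Mon'].tolist())  # [1, 0]
```

Because a value-to-value dictionary only matches whole cell contents, an answer that differs by a single accent is left untouched, which is presumably why the dictionary lists two spellings of some answers.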

In [6]:
_report_sec1_path = '../data/raw_data/weekly_report/weekly_report_sec1.csv'
df_report_1 = read_data(_report_sec1_path)

# keep columns 10-18 (last week's report) and 22-32 (next week's planning)
df_report_1 = preprocc_data(df_report_1, 
                            slices=list(range(10, 19)) + list(range(22, 33)), 
                            columns_to_rename=_columns_to_rename)

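# missing answers (e.g. days that were not selected) become 0
df_report_1.fillna(0, inplace=True)

# map survey answers (weeks, scores, days) to numbers
df_report_1.replace(_values_to_replace, inplace=True)
# keep only the last row for each student and week
df_report_1.drop_duplicates(subset=['num_alumno', 'last_week'], keep='last', inplace=True)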

_report_sec1_export_path = '../data/clean_data/week_report_sec1.csv'
export_data(df_report_1, _report_sec1_export_path)
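
The de-duplication step can be sketched on toy data as follows. With `keep='last'`, when a student has several rows for the same week, only the row that appears last in the frame survives. The values are invented, and "last" means last in row order, which matches the most recent submission only if the raw file is sorted chronologically (an assumption this sketch does not check).

```python
import pandas as pd

# Student 111 has two rows for week 1; student 222 has one
df = pd.DataFrame({
    'num_alumno': [111, 111, 222],
    'last_week': [1, 1, 1],
    'planning_hours': [2, 5, 3],
})

# one row per (student, week), keeping the last occurrence
deduped = df.drop_duplicates(subset=['num_alumno', 'last_week'], keep='last')

print(deduped['planning_hours'].tolist())  # [5, 3]
```

Since `fillna(0)` runs before this step, rows with a missing `num_alumno` would all share the id 0 and collapse to one row per week, so it may be worth checking the raw data for empty ids.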

In [7]:
_report_sec2_path = '../data/raw_data/weekly_report/weekly_report_sec2.csv'

df_report_2 = read_data(_report_sec2_path)

# keep columns 10-18 (last week's report only)
df_report_2 = preprocc_data(df_report_2, 
                            slices=list(range(10, 19)), 
                            columns_to_rename=_columns_to_rename)

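# missing answers become 0
df_report_2.fillna(0, inplace=True)
# map survey answers (weeks, scores) to numbers
df_report_2.replace(_values_to_replace, inplace=True)

# keep only the last row for each student and week
df_report_2.drop_duplicates(subset=['num_alumno', 'last_week'], keep='last', inplace=True)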
_report_sec2_export_path = '../data/clean_data/week_report_sec2.csv'
export_data(df_report_2, _report_sec2_export_path)