# (9) Getting Students' Self-Reported Dedicated Time

* **author** = Diego Sapunar-Opazo
* **copyright** = Copyright 2019, Thesis M.Sc. Diego Sapunar - Pontificia Universidad Católica de Chile
* **credits** = Diego Sapunar-Opazo, Ronald Perez, Mar Perez-Sanagustin, Jorge Maldonado-Mahauad
* **maintainer** = Diego Sapunar-Opazo
* **email** = dasapunar@uc.cl
* **status** = Dev

This script reads the clean weekly report for each section and creates a CSV file with the following columns:

(1) **num_alumno**, which corresponds to the internal ID of the face-to-face student

(2) **week**, which corresponds to the week the report refers to (renamed from `last_week`)

(3) **dedicated_prepare_lecture_hours**, which corresponds to the time the student self-reported for preparing lectures

(4) **dedicated_watch_videos_hours**, which corresponds to the time the student self-reported for watching videos

(5) **dedicated_coursera_quiz_hours**, which corresponds to the time the student self-reported for Coursera quizzes

(6) **dedicated_prepare_exam_hours**, which corresponds to the time the student self-reported for preparing an exam

(7) **dedicated_other_hours**, which corresponds to the time the student self-reported for other activities

(8) **dedicated_total_hours**, which corresponds to the total time self-reported by the student. It is the sum of columns (3) to (7).
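As a rough illustration of the output layout, the sketch below builds a tiny frame with the columns listed above and derives `dedicated_total_hours` as the row-wise sum of the five activity columns. The values are made up, and the spelling `dedicated_prepare_lecture_hours` is an assumption; only the column order mirrors the list.

```python
import pandas as pd

# Hypothetical values; only the column layout mirrors the list above.
example = pd.DataFrame({
    'num_alumno': [101, 102],
    'week': [1, 1],
    'dedicated_prepare_lecture_hours': [1.0, 0.5],
    'dedicated_watch_videos_hours': [2.0, 1.0],
    'dedicated_coursera_quiz_hours': [0.5, 0.0],
    'dedicated_prepare_exam_hours': [0.0, 3.0],
    'dedicated_other_hours': [1.0, 0.5],
})

# Total = row-wise sum of every column after num_alumno and week.
example['dedicated_total_hours'] = example.iloc[:, 2:].sum(axis=1)
print(example['dedicated_total_hours'].tolist())
```

Because the total is computed positionally (everything from the third column onward), it only stays correct as long as `num_alumno` and `week` remain the first two columns.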

## Part 0: Import Packages

In [1]:
# data analysis and wrangling
import pandas as pd
import numpy as np

## Part 1: Getting the Data

In [2]:
def read_data(path):
    '''
    Read a .csv file and convert it into a Pandas DataFrame.
    
    Input:
    path - String: path where the .csv is located.
    
    Output:
    Pandas DataFrame: .csv in the Pandas DataFrame format.
    '''
    
    return pd.read_csv(path)

## Part 2: Export Data

In [3]:
def export_data(df, path):
    '''
    Export df as a .csv file to the given path.
    
    Input:
    df - Pandas DataFrame: dataframe to be exported.
    path - String: path where the .csv will be exported.
    '''
    
    df.to_csv(path, index=False)
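A minimal round-trip sketch, assuming the two helpers behave as defined in Parts 1 and 2 (they are repeated here so the snippet runs on its own). The frame contents and the temporary file name are hypothetical. It shows that writing with `index=False` keeps the pandas index out of the file, so reading it back yields the same columns.

```python
import os
import tempfile

import pandas as pd


def read_data(path):
    # Same one-liner as in Part 1.
    return pd.read_csv(path)


def export_data(df, path):
    # Same one-liner as in Part 2; index=False omits the row index.
    df.to_csv(path, index=False)


# Hypothetical data, used only to exercise the helpers.
df_in = pd.DataFrame({'num_alumno': [101, 102], 'week': [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, 'roundtrip.csv')
    export_data(df_in, tmp_path)
    df_out = read_data(tmp_path)

# No extra unnamed index column appears after the round trip.
print(list(df_out.columns))
```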

## Part 3: Main

In [5]:
_report_sec1_path = '../../data/clean_data/week_report_sec1.csv'
_report_sec2_path = '../../data/clean_data/week_report_sec2.csv'

# getting data, keeping the first 7 columns and renaming 'last_week' to 'week'
df_report_sec1 = read_data(_report_sec1_path).iloc[:, :7].rename({'last_week': 'week'}, axis=1)
df_report_sec2 = read_data(_report_sec2_path).iloc[:, :7].rename({'last_week': 'week'}, axis=1)

# keeping only the rows (weeks) used in the experiment.
# (source: https://thispointer.com/python-pandas-how-to-drop-rows-in-dataframe-by-conditions-on-column-values/)

# dropping week 5 and week 9 for both sections
# dropping week 12 for sec1
# dropping week 13 for sec2
indexDrop = df_report_sec1[df_report_sec1['week'].isin([5, 9, 12])].index
df_report_sec1.drop(indexDrop, inplace=True)
indexDrop = df_report_sec2[df_report_sec2['week'].isin([5, 9, 13])].index
df_report_sec2.drop(indexDrop, inplace=True)

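# renumbering the remaining weeks so they are consecutive after the drops above
# (only the 'week' column is remapped; a DataFrame-wide replace would also
# rewrite hour values such as 6 or 10)
_cols_to_replace = {
    6: 5,
    7: 6,
    8: 7,
    10: 8,
    11: 9,
    13: 10,
    12: 10,
    14: 11
}
df_report_sec1.replace({'week': _cols_to_replace}, inplace=True)
df_report_sec2.replace({'week': _cols_to_replace}, inplace=True)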

# concat
df_dedicacated_time_report = pd.concat([df_report_sec1, df_report_sec2], ignore_index=True)

# adding column 'dedicated_total_hours': row-wise sum of every column after num_alumno and week
df_dedicacated_time_report['dedicated_total_hours'] = df_dedicacated_time_report.iloc[:, 2:].sum(axis=1)

# export data
export_data(df_dedicacated_time_report, '../../data/final_data/dedicated_time_report.csv')

# cleaning memory
del df_report_sec1
del df_report_sec2
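To make the week handling in this cell easier to follow, here is a small sketch on a toy frame: it drops the excluded weeks and then renumbers the remaining ones with the same mapping. The toy data and the names `toy`, `weeks_to_drop`, `week_map` and `kept` are hypothetical. Restricting the remapping to the `week` column is a choice made in this sketch, so that hour values that happen to equal 6 or 10 are left alone.

```python
import pandas as pd

# Hypothetical rows for a single student in sec1.
toy = pd.DataFrame({
    'num_alumno': [1] * 6,
    'week': [4, 5, 6, 9, 10, 12],
    'dedicated_other_hours': [6, 1, 2, 3, 10, 4],
})

weeks_to_drop = [5, 9, 12]  # weeks excluded for sec1 in the cell above
week_map = {6: 5, 7: 6, 8: 7, 10: 8, 11: 9, 13: 10, 12: 10, 14: 11}

# Keep only the experiment weeks, then renumber them consecutively.
kept = toy[~toy['week'].isin(weeks_to_drop)].copy()
kept['week'] = kept['week'].replace(week_map)

print(kept['week'].tolist())
print(kept['dedicated_other_hours'].tolist())
```

Weeks 4, 6 and 10 survive the filter and end up numbered 4, 5 and 8, while the hours column is unchanged.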