# (11) Getting Planning Report

* **author** = Diego Sapunar-Opazo
* **copyright** = Copyright 2019, Thesis M.Sc. Diego Sapunar - Pontificia Universidad Católica de Chile
* **credits** = Diego Sapunar-Opazo, Ronald Perez, Mar Perez-Sanagustin, Jorge Maldonado-Mahauad
* **maintainer** = Diego Sapunar-Opazo
* **email** = dasapunar@uc.cl
* **status** = Dev

This script takes the clean weekly report of section 1, students_NMP_id and the clean NMP_goals, and creates a csv file with the following columns:

(1) **num_alumno**, which corresponds to the internal face-to-face students' id

(2) **week**, which corresponds to the week

(3) **planning_hours**, which corresponds to the number of hours the student plans for the next week

(4) **planning_number_videos**, which corresponds to the number of videos the student plans for the next week

(5) **planning_number_quiz**, which corresponds to the number of quizzes the student plans for the next week
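
As a rough illustration of the layout described above, the following sketch builds a tiny frame with the same five columns. The rows are invented, and the dtypes (for example ids stored as strings) are an assumption, since the real types depend on the input files.

```python
import pandas as pd

# Hypothetical rows, invented only to illustrate the output layout.
example_report = pd.DataFrame({
    'num_alumno': ['1001', '1001', '1002'],
    'week': [1, 2, 1],
    'planning_hours': [4.0, 3.5, 6.0],
    'planning_number_videos': [5, 4, 8],
    'planning_number_quiz': [2, 1, 3],
})

# The script aims for at most one row per (num_alumno, week) pair.
has_duplicates = example_report.duplicated(subset=['num_alumno', 'week']).any()
print(example_report.to_csv(index=False))
```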

## Part 0: Import Packages

In [1]:
# data analysis and wrangling
import pandas as pd
import numpy as np

## Part 1: Getting the Data

In [2]:
def read_data(path):
    '''
    Read a .csv file and convert it into a Pandas DataFrame.
    
    Input:
    path - String: path where the .csv is located.
    
    Output:
    Pandas DataFrame: .csv in the Pandas DataFrame format.
    '''
    
    return pd.read_csv(path)

## Part 2: Data Preprocessing & Wrangling

In [3]:
def merging(df1, df2, variable1, variable2):
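    '''
    Merge df1 and df2 on the given key columns (inner join).
    
    Rows with missing values are dropped from df1 and df2 in place, and
    both key columns are cast to str so that their types match.
    
    Input:
    df1 - Pandas DataFrame
    df2 - Pandas DataFrame
    variable1 - String: name of the key column in df1.
    variable2 - String: name of the key column in df2.
    
    Output:
    Pandas DataFrame
    '''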

    df1.dropna(inplace=True)
    df2.dropna(inplace=True)
    
    # getting same types
    df1[variable1] = df1[variable1].astype('str')
    df2[variable2] = df2[variable2].astype('str')
    
    return pd.merge(left=df1, right=df2, left_on=variable1, right_on=variable2)
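
The type harmonisation in `merging` can be tried on toy data. The sketch below uses a hypothetical variant, `merging_copy`, that works on copies so the caller's frames are not modified; the frames and their values are invented for illustration.

```python
import pandas as pd

def merging_copy(df1, df2, variable1, variable2):
    # Hypothetical non-mutating variant of merging(): same steps, applied to copies.
    left = df1.dropna().copy()
    right = df2.dropna().copy()
    # Cast both key columns to str so that integer and string ids can be matched.
    left[variable1] = left[variable1].astype('str')
    right[variable2] = right[variable2].astype('str')
    return pd.merge(left=left, right=right, left_on=variable1, right_on=variable2)

# Invented data: ids stored as integers on one side and as strings on the other.
ids = pd.DataFrame({'NMP_user_id': [10, 11, 12],
                    'num_alumno': [1001, 1002, 1003]})
goals = pd.DataFrame({'NMP_user_id': ['10', '12'],
                      'week': [1, 1],
                      'planning_hours': [4.0, 2.5]})

merged = merging_copy(ids, goals, 'NMP_user_id', 'NMP_user_id')
print(merged)
```

One caveat worth checking on real data: a key column read as float would be cast to strings such as `'10.0'`, which would not match `'10'`.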
    

## Part 3: Export Data

In [4]:
def export_data(df, path):
    '''
    Export df as a .csv file to the path.
    
    Input:
    df - Pandas DataFrame: dataframe to be exported.
    path - String: path where the .csv will be exported.
    '''
    
    df.to_csv(path, index=False)
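
A quick way to check what `export_data` writes is a round trip through an in-memory buffer instead of a file on disk. This is only a sketch: the frame is invented, and passing an `io.StringIO` object in place of a path is a choice made for the example, not something the notebook does.

```python
import io
import pandas as pd

def export_data(df, path):
    # Same body as the notebook's function; index=False omits the row index column.
    df.to_csv(path, index=False)

# Invented data for illustration.
report = pd.DataFrame({'num_alumno': [1001, 1002],
                       'week': [1, 1],
                       'planning_hours': [4.0, 2.5]})

buffer = io.StringIO()
export_data(report, buffer)

# Read the text back to confirm that no extra index column was written.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```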

## Part 4: Main

In [5]:
# getting data and slicing
report_columns = ['num_alumno', 'next_week', 'planning_hours',
                  'planning_number_videos', 'planning_number_quiz']
df_report_sec1 = (read_data('../../data/clean_data/week_report_sec1.csv')
                  .loc[:, report_columns]
                  .rename({'next_week': 'week'}, axis=1))
df_NMP_planning = read_data('../../data/clean_data/NMP_goals.csv')
df_NMP_id = read_data('../../data/clean_data/students_NMP_id.csv')

# keeping only the rows used in the experiment:
# dropping weeks 5, 9 and 15 for both sections
# dropping week 12 for sec1
# dropping weeks 13 and 16 for sec2
indexDrop = df_report_sec1[df_report_sec1['week'].isin([5, 9, 12, 15])].index
df_report_sec1.drop(indexDrop, inplace=True)
indexDrop = df_NMP_planning[df_NMP_planning['week'].isin([5, 9, 13, 15, 16])].index
df_NMP_planning.drop(indexDrop, inplace=True)

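# renumbering the remaining weeks (sec1 week 12 and sec2 week 13 both become week 10)
# only the week column is replaced, so planning values that equal a key stay untouched
_cols_to_replace = {
    6: 5,
    7: 6,
    8: 7,
    10: 8,
    11: 9,
    13: 10,
    12: 10,
    14: 11
}
df_report_sec1['week'] = df_report_sec1['week'].replace(_cols_to_replace)
df_NMP_planning['week'] = df_NMP_planning['week'].replace(_cols_to_replace)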

# Adding num_alumno
df_report_sec2 = merging(df_NMP_id, df_NMP_planning, 
                         'NMP_user_id', 'NMP_user_id').drop('NMP_user_id', axis=1)

# cleaning memory
del df_NMP_planning
del df_NMP_id

# concat
df_planning_report = pd.concat([df_report_sec1, df_report_sec2], ignore_index=True)

# keeping the first row for each (num_alumno, week) pair
df_planning_report = df_planning_report.drop_duplicates(subset=['num_alumno', 'week'], keep='first')

# export data
export_data(df_planning_report, '../../data/final_data/planning_report.csv')

del df_report_sec1
del df_report_sec2
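
The main steps of this cell (dropping weeks, renumbering the rest, concatenating both sections and keeping one row per student and week) can be sketched on toy data. Everything below is hypothetical: the frames, the shortened `week_map` and `dropped_weeks`, and the choice to renumber only the `week` column. The last point matters because `DataFrame.replace` with a dict is applied to every column, so a planning value equal to one of the keys would be rewritten as well.

```python
import pandas as pd

# Invented data for illustration only.
sec1 = pd.DataFrame({
    'num_alumno': ['1001', '1001', '1002'],
    'week': [6, 9, 7],
    'planning_hours': [6.0, 2.0, 3.0],
})
sec2 = pd.DataFrame({
    'num_alumno': ['2001', '2001'],
    'week': [6, 6],                 # same (student, week) pair twice
    'planning_hours': [1.0, 8.0],
})

week_map = {6: 5, 7: 6}             # shortened, hypothetical version of the mapping
dropped_weeks = [5, 9]              # shortened, hypothetical list of excluded weeks

frames = []
for frame in (sec1, sec2):
    # Drop the excluded weeks, then renumber only the week column.
    kept = frame[~frame['week'].isin(dropped_weeks)].copy()
    kept['week'] = kept['week'].replace(week_map)
    frames.append(kept)

report = pd.concat(frames, ignore_index=True)

# Keep the first row for each (num_alumno, week) pair.
report = report.drop_duplicates(subset=['num_alumno', 'week'], keep='first')
print(report)
```

In this toy run the `planning_hours` value of 6.0 is preserved even though 6 is a key of `week_map`, because the replacement is restricted to `week`.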