# (11) Cleaning NMP Planning

* **author** = Diego Sapunar-Opazo
* **copyright** = Copyright 2019, Thesis M.Sc. Diego Sapunar - Pontificia Universidad Católica de Chile
* **credits** = Diego Sapunar-Opazo, Ronald Perez, Mar Perez-Sanagustin, Jorge Maldonado-Mahauad
* **maintainer** = Diego Sapunar-Opazo
* **email** = dasapunar@uc.cl
* **status** = Dev

This script reads the raw NMP goal-setting logs for the experimental group and cleans them, producing the CSV file NMP_goals with the following columns:

(1) **NMP_user_id**, which corresponds to NMP's internal id

(2) **week**, which corresponds to the week

(3) **planning_hours**, which corresponds to the number of hours the student plans for the next week

(4) **planning_number_videos**, which corresponds to the number of videos the student plans for the next week

(5) **planning_number_quiz**, which corresponds to the number of quizzes the student plans for the next week
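
As a minimal sketch of how the schema listed above could be checked, the snippet below compares a DataFrame's columns against the five expected names. The helper `missing_columns` and the toy values are hypothetical and are not part of the original script.

```python
import pandas as pd

# Column names taken from the list above.
EXPECTED_COLUMNS = [
    "NMP_user_id",
    "week",
    "planning_hours",
    "planning_number_videos",
    "planning_number_quiz",
]


def missing_columns(df):
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Toy frame with made-up values, only to exercise the check.
toy = pd.DataFrame({"NMP_user_id": [1], "week": [2], "planning_hours": [3.0]})
missing = missing_columns(toy)
print(missing)
```

An empty list would indicate that every expected column is present.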

## Part 0: Import Packages

In [1]:
# data analysis and wrangling
import pandas as pd
import numpy as np

## Part 1: Getting the Data

In [2]:
def read_data(path):
    '''
    Read a .csv file and convert it into a Pandas DataFrame.
    
    Input:
    path - String: path where the .csv is located.
    
    Output:
    Pandas DataFrame: .csv in the Pandas DataFrame format.
    '''
    
    return pd.read_csv(path)

## Part 2: Data Preprocessing & Wrangling

In [3]:
def preprocc_data(df, slices=False, columns_to_rename=False):
    '''
    Given a dataframe, (1) keep only the necessary columns and (2) rename columns.
    
    Input: 
    df - Pandas DataFrame: dataframe to be cleaned.
    slices - List of Ints: positions of the columns to keep, in the desired order. If False, all columns are kept.
    columns_to_rename - Dict: Columns to rename, Key: original name, Value: new name. If False, no column is renamed.
    
    Output:
    df_cleaned - Pandas DataFrame: a cleaned copy of the dataframe.
    '''
    
    df_cleaned = df.copy()
    
    # slicing the columns by position, keeping only the ones that are needed
    if slices:
        df_cleaned = df_cleaned.iloc[:,slices]
    
    del df  # clean memory
    
    # rename columns
    if columns_to_rename:
        df_cleaned.rename(columns_to_rename, 
                          inplace=True, 
                          axis=1)
    
    
    return df_cleaned
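
The two steps in `preprocc_data` (positional selection, then renaming) can be illustrated on a toy frame. This is only a sketch: the column names, their positions, and the values below are assumptions for illustration and do not reflect the layout of the real raw file.

```python
import pandas as pd

# Toy frame; column names and positions are assumptions for illustration only.
raw = pd.DataFrame({
    "user_id": [7, 8],
    "created_at": ["a", "b"],
    "goal_hours": [2.0, None],
    "week": [1, 1],
})

# Positions are applied in the order given, so column 3 (week)
# ends up before column 2 (goal_hours) in the result.
selected = raw.iloc[:, [0, 3, 2]]

# rename returns a new DataFrame; names absent from the mapping are left unchanged.
renamed = selected.rename(
    columns={"user_id": "NMP_user_id", "goal_hours": "planning_hours"}
)
print(list(renamed.columns))
```

Because the selection is positional, a change in the column order of the raw file would silently pick different columns; selecting by name would be a more defensive alternative.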

## Part 3: Export Data

In [4]:
def export_data(df, path):
    '''
    Export df as a .csv file to the given path.
    
    Input:
    df - Pandas DataFrame: dataframe to be exported.
    path - String: path where the .csv will be exported.
    '''
    
    df.to_csv(path, index=False)

## Part 4: Main

In [6]:
_columns_to_rename = {
    'user_id': 'NMP_user_id',
    'goal_hours': 'planning_hours',
    'goal_videos': 'planning_number_videos',
    'goal_evaluations': 'planning_number_quiz'
} 

df = read_data('../../data/raw_data/NMP/NMP_planning.csv')

df = preprocc_data(df, slices=[0,10,4,6,8], columns_to_rename=_columns_to_rename)

df.fillna(0, inplace=True)

# exporting
export_data(df, '../../data/clean_data/NMP_goals.csv')
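
A minimal sketch of the fill-and-export step, using an in-memory buffer instead of a file on disk so that it runs without the project data. The values are made up; only the `fillna(0)` and `to_csv(..., index=False)` calls mirror the cell above.

```python
import io

import pandas as pd

# Made-up values; NaN stands for a goal the student left unset.
goals = pd.DataFrame({
    "NMP_user_id": [1, 2],
    "week": [1, 1],
    "planning_hours": [3.0, float("nan")],
})
goals = goals.fillna(0)

# Same call as in export_data, but writing to an in-memory target.
buffer = io.StringIO()
goals.to_csv(buffer, index=False)
buffer.seek(0)

# Reading the CSV back shows what a downstream script would see.
reloaded = pd.read_csv(buffer)
print(reloaded)
```

After `fillna(0)`, a goal that was never set and a goal explicitly set to 0 can no longer be told apart, which is worth keeping in mind when interpreting NMP_goals.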