# (15) Cleaning Students GPA

* **author** = Diego Sapunar-Opazo
* **copyright** = Copyright 2019, Thesis M.Sc. Diego Sapunar - Pontificia Universidad Católica de Chile
* **credits** = Diego Sapunar-Opazo, Ronald Perez, Mar Perez-Sanagustin, Jorge Maldonado-Mahauad
* **maintainer** = Diego Sapunar-Opazo
* **email** = dasapunar@uc.cl
* **status** = Dev

This script reads the students' raw background academic performance (GPA) and cleans it, creating a CSV file, students_GPA, with:

(1) **num_alumno**, which corresponds to the internal id of the face-to-face students

(2) **GPA**, which corresponds to the student's GPA from 1 to 7
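
The two columns described above can be sanity-checked with a few lines of pandas. This is only a minimal sketch: the helper name `check_gpa_frame` and the toy values are hypothetical and not part of the original script; only the column names and the 1 to 7 range come from the description above.

```python
import pandas as pd

def check_gpa_frame(df, low=1.0, high=7.0):
    """Hypothetical helper: return rows whose GPA is missing or outside [low, high]."""
    # Coerce to numeric so that non-numeric entries become NaN instead of raising
    gpa = pd.to_numeric(df['GPA'], errors='coerce')
    bad = gpa.isna() | (gpa < low) | (gpa > high)
    return df[bad]

# Toy data (made up for illustration): one valid row, one out of range, one missing
toy = pd.DataFrame({'num_alumno': [101, 102, 103], 'GPA': [5.4, 7.8, None]})
invalid = check_gpa_frame(toy)
print(invalid)
```

Running such a check on the exported file would flag rows that need attention before any later analysis.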

## Part 0: Import Packages

In [1]:
# data analysis and wrangling
import pandas as pd
import numpy as np

## Part 1: Getting the Data

In [2]:
def read_data(path):
    '''
    Read a .csv file and convert it into a Pandas DataFrame.
    
    Input:
    path - String: path where the .csv is located.
    
    Output:
    Pandas DataFrame: .csv in the Pandas DataFrame format.
    '''
    
    return pd.read_csv(path)

## Part 2: Data Preprocessing & Wrangling

In [3]:
def preprocc_data(df, slices=None, columns_to_rename=None):
    '''
    From a dataframe, (1) keep only the necessary columns and (2) rename them.
    Rows with missing values are dropped later, in Part 4.
    
    Input: 
    df - Pandas DataFrame: dataframe to be cleaned.
    slices - List of Ints: positions of the columns to keep (all columns if omitted).
    columns_to_rename - Dict: Columns to rename, Key: original name, Value: new name.
    
    Output:
    df_cleaned - Pandas DataFrame: a cleaned copy of the dataframe.
    '''
    
    # work on a copy so the caller's dataframe is not modified
    df_cleaned = df.copy()
    
    # slicing the columns, getting only the ones that I need (num_alumno and GPA)
    if slices:
        df_cleaned = df_cleaned.iloc[:, slices]
    
    # rename columns using the argument, not the module-level dict
    if columns_to_rename:
        df_cleaned.rename(columns_to_rename, 
                          inplace=True, 
                          axis=1)
    
    return df_cleaned
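
The slice-then-rename step can be tried on a small frame before running it on the real file. The sketch below is self-contained and uses made-up rows plus a hypothetical `extra` column; only the two source column names and their new names are taken from this notebook.

```python
import pandas as pd

# Toy raw frame (values are invented); the last two columns mimic the raw export
raw = pd.DataFrame({
    'extra': ['a', 'b'],
    'Alumno.NroAlumno': [101, 102],
    'AlumnoVigentePeriodo.PPA': [5.4, 6.1],
})

# Keep the last two columns by position, then rename them
n = len(raw.columns)
cleaned = raw.iloc[:, [n - 2, n - 1]].rename(columns={
    'Alumno.NroAlumno': 'num_alumno',
    'AlumnoVigentePeriodo.PPA': 'GPA',
})
print(cleaned)
```

Selecting by position assumes the raw export always places the id and GPA in its last two columns; selecting by name would be more robust if the column order can change.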

## Part 3: Export Data

In [4]:
def export_data(df, path):
    '''
    Export df as a .csv file to the given path.
    
    Input:
    df - Pandas DataFrame: dataframe to be exported.
    path - String: path where the .csv will be exported.
    '''
    
    df.to_csv(path, index=False)

## Part 4: Main

In [5]:
_columns_to_rename = {
    'Alumno.NroAlumno': 'num_alumno',
    'AlumnoVigentePeriodo.PPA': 'GPA'
}

df = read_data('../data/raw_data/background/students_GPA.csv')

df = preprocc_data(df, slices=[len(df.columns) - 2, len(df.columns) - 1], columns_to_rename=_columns_to_rename)

df.dropna(inplace=True)



# Exporting
export_data(df, '../data/clean_data/students_GPA.csv')

In [7]:
len(df)

216

In [8]:
df_sec = pd.read_csv('../data/clean_data/students_sec.csv')

In [9]:
aux = pd.merge(left=df_sec, right=df, left_on='num_alumno', right_on='num_alumno')

In [11]:
len(aux)

211

In [12]:
len(df_sec)

242

In [15]:
len(aux[aux['sec'] == 2])

117
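
The counts above show that the inner merge keeps 211 rows, fewer than both the 216 GPA rows and the 242 section rows. If `num_alumno` is unique in both tables (an assumption, not verified here), that leaves 5 GPA rows and 31 section rows without a match. One way to list the unmatched ids is an outer merge with `indicator=True`. The sketch below uses toy frames with invented values rather than the real files.

```python
import pandas as pd

# Toy stand-ins for students_sec.csv and students_GPA.csv (values are invented)
sec = pd.DataFrame({'num_alumno': [1, 2, 3, 4], 'sec': [1, 2, 2, 1]})
gpa = pd.DataFrame({'num_alumno': [2, 3, 5], 'GPA': [5.0, 6.2, 4.8]})

# The outer merge keeps every id; the _merge column records where each row came from
audit = pd.merge(sec, gpa, on='num_alumno', how='outer', indicator=True)
counts = audit['_merge'].value_counts()

# Ids present in only one of the two tables
only_sec = audit.loc[audit['_merge'] == 'left_only', 'num_alumno'].tolist()
only_gpa = audit.loc[audit['_merge'] == 'right_only', 'num_alumno'].tolist()
print(counts)
print(only_sec, only_gpa)
```

Applied to `df_sec` and `df`, the `left_only` rows would be students with a section but no GPA, and the `right_only` rows would be GPA records with no section.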