##Neural network - student performance prediction

This is a machine learning project intended to practice the different skills this kind of task requires. The goal is to build a neural network that analyzes certain student characteristics provided in a dataset and can then predict the likely future performance of other students.

Created by Luis Felipe Sánchez Sánchez

Import the libraries

In [14]:
import tensorflow as tf
import numpy as np
import logging
logger = tf.get_logger()
logger.setLevel(logging.ERROR)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score

Connect to Drive and load the dataset

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load the dataset from Drive
students_df = pd.read_csv('/content/drive/MyDrive/Ejecicios de practica/Red-Neuronal-Rendimiento-Estudiantes/study_performance.csv')

Inspect the dataset

In [4]:
students_df

Unnamed: 0,gender,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,math_score,reading_score,writing_score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75
...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95
996,male,group C,high school,free/reduced,none,62,55,55
997,female,group C,high school,free/reduced,completed,59,71,65
998,female,group D,some college,standard,completed,68,78,77


Check data cleanliness and missing values

In [5]:
students_df.isnull().sum()

Unnamed: 0,0
gender,0
race_ethnicity,0
parental_level_of_education,0
lunch,0
test_preparation_course,0
math_score,0
reading_score,0
writing_score,0


In [6]:
students_df.isna()

Unnamed: 0,gender,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,math_score,reading_score,writing_score
0,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...
995,False,False,False,False,False,False,False,False
996,False,False,False,False,False,False,False,False
997,False,False,False,False,False,False,False,False
998,False,False,False,False,False,False,False,False
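The full boolean table from `isna()` is hard to read at a glance. As a minimal sketch, the same check can be reduced to a per-column count, a single yes/no answer, and a duplicate-row count. The frame `sample_df` and its values are hypothetical stand-ins for `students_df`, with one gap introduced on purpose.

```python
import pandas as pd

# Toy frame standing in for students_df (hypothetical values; one gap on purpose).
sample_df = pd.DataFrame({
    "gender": ["female", "male", None],
    "math_score": [72, 47, 76],
})

# Number of missing values per column, as isnull().sum() reports.
missing_per_column = sample_df.isna().sum()

# Single yes/no answer for the whole frame.
has_missing = bool(sample_df.isna().any().any())

# Fully duplicated rows are another common cleanliness check.
duplicate_rows = int(sample_df.duplicated().sum())

print(missing_per_column)
print(has_missing, duplicate_rows)
```

On the real dataset, `has_missing` would be expected to be `False`, given that every column reports zero missing values.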


##Data analysis section

Plot some information to get a better understanding of the dataset
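The notebook does not show which plots were drawn, so the following is only a sketch of two plots that could serve this purpose, assuming matplotlib is available. It uses the five rows displayed in the dataset preview rather than the full file, and the names `sample_df`, `gender_counts`, and `mean_scores` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The five rows displayed in the dataset preview.
sample_df = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "test_preparation_course": ["none", "completed", "none", "none", "none"],
    "math_score": [72, 69, 90, 47, 76],
    "reading_score": [72, 90, 95, 57, 78],
    "writing_score": [74, 88, 93, 44, 75],
})

fig, (ax_counts, ax_means) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: how many students fall in each gender category.
gender_counts = sample_df["gender"].value_counts()
gender_counts.plot(kind="bar", ax=ax_counts, title="Students per gender")

# Right panel: mean score per subject, split by test preparation.
mean_scores = sample_df.groupby("test_preparation_course")[
    ["math_score", "reading_score", "writing_score"]
].mean()
mean_scores.plot(kind="bar", ax=ax_means, title="Mean score by test preparation")

fig.tight_layout()
```

With the full `students_df` in place of `sample_df`, the same two calls would give a first impression of class balance and of how test preparation relates to the scores.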

##Network processing

Variable to predict (dependent variable): "score"

In [7]:
# Collect every column whose name ends with "score" into a list
scrore_columns = [columna for columna in students_df.columns if columna.endswith('score')]
scrore_columns

['math_score', 'reading_score', 'writing_score']
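One thing to keep in mind with this suffix match: once a derived column named `score` exists, `endswith('score')` matches it as well, so re-running the selection cell later would return four columns. The sketch below illustrates the difference on a hypothetical one-row frame. The stricter `'_score'` suffix is a suggestion, not what the notebook uses.

```python
import pandas as pd

# Toy frame with the dataset's column names plus a derived 'score' column.
sample_df = pd.DataFrame({
    "gender": ["female"],
    "math_score": [72],
    "reading_score": [72],
    "writing_score": [74],
    "score": [7.0],
})

# endswith('score') also picks up the derived column once it exists.
loose_match = [col for col in sample_df.columns if col.endswith("score")]

# Requiring the underscore keeps only the three subject columns.
strict_match = [col for col in sample_df.columns if col.endswith("_score")]

print(loose_match)
print(strict_match)
```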

In [8]:
# Add the score column: the mean of the three scores rescaled from 0-100 to 0-10 (sum / 30), rounded to the nearest integer
students_df['score'] = round(students_df[scrore_columns].sum(axis=1)/30)
students_df

Unnamed: 0,gender,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,math_score,reading_score,writing_score,score
0,female,group B,bachelor's degree,standard,none,72,72,74,7.0
1,female,group C,some college,standard,completed,69,90,88,8.0
2,female,group B,master's degree,standard,none,90,95,93,9.0
3,male,group A,associate's degree,free/reduced,none,47,57,44,5.0
4,male,group C,some college,standard,none,76,78,75,8.0
...,...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95,9.0
996,male,group C,high school,free/reduced,none,62,55,55,6.0
997,female,group C,high school,free/reduced,completed,59,71,65,6.0
998,female,group D,some college,standard,completed,68,78,77,7.0
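To make the arithmetic explicit: dividing the sum of three 0-100 scores by 30 is the same as taking their mean and dividing by 10, which puts the target on a 0-10 scale. The sketch below reproduces the calculation for the first five rows shown in the output. The names `scores`, `by_sum`, and `by_mean` are illustrative.

```python
import pandas as pd

# Score columns of the first five rows shown in the output.
scores = pd.DataFrame({
    "math_score": [72, 69, 90, 47, 76],
    "reading_score": [72, 90, 95, 57, 78],
    "writing_score": [74, 88, 93, 44, 75],
})

# Sum of three 0-100 scores divided by 30 equals the mean score divided by 10.
by_sum = scores.sum(axis=1) / 30
by_mean = scores.mean(axis=1) / 10

# Rounding turns the 0-10 value into an integer-valued level.
score = by_sum.round()
print(score.tolist())  # [7.0, 8.0, 9.0, 5.0, 8.0]
```

Note that pandas rounds exact halves to the nearest even value, so a student whose three scores sum to 195 (6.5 on this scale) would get 6.0 rather than 7.0.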


In [9]:
# Work on a copy of the dataset so the original DataFrame is not modified
students_df_cp = students_df.copy()

In [10]:
students_df_cp

Unnamed: 0,gender,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,math_score,reading_score,writing_score,score
0,female,group B,bachelor's degree,standard,none,72,72,74,7.0
1,female,group C,some college,standard,completed,69,90,88,8.0
2,female,group B,master's degree,standard,none,90,95,93,9.0
3,male,group A,associate's degree,free/reduced,none,47,57,44,5.0
4,male,group C,some college,standard,none,76,78,75,8.0
...,...,...,...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,88,99,95,9.0
996,male,group C,high school,free/reduced,none,62,55,55,6.0
997,female,group C,high school,free/reduced,completed,59,71,65,6.0
998,female,group D,some college,standard,completed,68,78,77,7.0


In [11]:
# Drop the three individual score columns listed in scrore_columns
students_df_cp.drop(scrore_columns, axis=1, inplace=True)
students_df_cp

Unnamed: 0,gender,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,score
0,female,group B,bachelor's degree,standard,none,7.0
1,female,group C,some college,standard,completed,8.0
2,female,group B,master's degree,standard,none,9.0
3,male,group A,associate's degree,free/reduced,none,5.0
4,male,group C,some college,standard,none,8.0
...,...,...,...,...,...,...
995,female,group E,master's degree,standard,completed,9.0
996,male,group C,high school,free/reduced,none,6.0
997,female,group C,high school,free/reduced,completed,6.0
998,female,group D,some college,standard,completed,7.0


Handling categorical variables

In [16]:
# Encode the categorical variables with OneHotEncoder so that the network receives numeric inputs
encoder = OneHotEncoder(sparse_output=False)

In [20]:
# A row of [1, 0] means female; a row of [0, 1] means male
encoded_data = encoder.fit_transform(students_df_cp[['gender']])
encoded_data

array([[1., 0.],
       [1., 0.],
       [1., 0.],
       ...,
       [1., 0.],
       [1., 0.],
       [1., 0.]])

In [21]:
# gender_female == 1.0 means female; gender_male == 1.0 means male
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(['gender']))
encoded_df

Unnamed: 0,gender_female,gender_male
0,1.0,0.0
1,1.0,0.0
2,1.0,0.0
3,0.0,1.0
4,0.0,1.0
...,...,...
995,1.0,0.0
996,0.0,1.0
997,1.0,0.0
998,1.0,0.0


In [22]:
# Drop the gender column from students_df_cp and append the two encoded columns
students_df_cp.drop('gender', axis=1, inplace=True)
students_df_cp = pd.concat([students_df_cp, encoded_df], axis=1)
students_df_cp

Unnamed: 0,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,score,gender_female,gender_male
0,group B,bachelor's degree,standard,none,7.0,1.0,0.0
1,group C,some college,standard,completed,8.0,1.0,0.0
2,group B,master's degree,standard,none,9.0,1.0,0.0
3,group A,associate's degree,free/reduced,none,5.0,0.0,1.0
4,group C,some college,standard,none,8.0,0.0,1.0
...,...,...,...,...,...,...,...
995,group E,master's degree,standard,completed,9.0,1.0,0.0
996,group C,high school,free/reduced,none,6.0,0.0,1.0
997,group C,high school,free/reduced,completed,6.0,1.0,0.0
998,group D,some college,standard,completed,7.0,1.0,0.0
