# Challenge 2: School performance

## Milestone 1

### Preliminaries

The goal of this challenge is to build a model that identifies students with low academic performance, measured by their final grade average for the school year. In addition, a battery of questions about the student's environment (from famrel to health) must be inspected to determine whether they can be abstracted into latent categories.

The challenge is a **regression** problem, for which I will use ***linear regression***. To validate the model I will rely on the mean squared error (MSE) and R² metrics, comparing the models developed.
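Both metrics can be written out explicitly. MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The following NumPy sketch uses hypothetical grades rather than values from the dataset. `mean_squared_error` and `r2_score` from `sklearn.metrics` compute the same quantities.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical final grades (G3) and predictions, for illustration only
y_true = [10, 12, 8, 15]
y_pred = [11, 12, 7, 14]
print(mse(y_true, y_pred))  # 0.75
print(r2(y_true, y_pred))
```

A lower MSE is better. An R² of 1 means the predictions are perfect. A model that always predicts the mean scores 0, and worse models can go negative.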

### Computational Aspects

The libraries used are:

- warnings: keeps the notebook clean by hiding deprecation warnings.
- pandas: data analysis and file reading.
- numpy: optimized numerical data analysis.
- scipy `stats` module: simulation.
- matplotlib: plotting.
- `preprocessing` (scikit-learn): data preprocessing.
- seaborn: simple, high-level plotting.
- `linear_model` (scikit-learn): the linear regression model.
- `mean_squared_error` and `r2_score` (scikit-learn): the metrics used to validate the model.
- `train_test_split` (scikit-learn): splits the data into train and test sets.
- func_analitics: my own helper functions for data manipulation.

In [1]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import func_analitics as fa  # my own analysis helper functions

In [2]:
df = pd.read_csv('students.csv')

In [3]:
df

|  | \|school\|sex\|age\|address\|famsize\|Pstatus\|Medu\|Fedu\|Mjob\|Fjob\|reason\|guardian\|traveltime\|studytime\|failures\|schoolsup\|famsup\|paid\|activities\|nursery\|higher\|internet\|romantic\|famrel\|freetime\|goout\|Dalc\|Walc\|health\|absences\|G1\|G2\|G3 |
|---|---|
| 0 | 0\|GP\|F\|nulidade\|U\|GT3\|A\|4\|4\|at_home\|teacher\|co... |
| 1 | 1\|GP\|F\|"""17"""\|U\|GT3\|T\|1\|1\|at_home\|other\|cour... |
| 2 | 2\|GP\|F\|"""15"""\|U\|LE3\|T\|1\|1\|at_home\|other\|othe... |
| 3 | 3\|GP\|F\|"""15"""\|U\|GT3\|T\|4\|2\|health\|services\|ho... |
| 4 | 4\|GP\|F\|sem validade\|U\|GT3\|T\|3\|3\|other\|other\|ho... |
| ... | ... |
| 390 | 390\|MS\|M\|"""20"""\|U\|LE3\|A\|2\|2\|services\|service... |
| 391 | 391\|MS\|M\|"""17"""\|U\|LE3\|T\|3\|1\|services\|service... |
| 392 | 392\|MS\|M\|"""21"""\|R\|GT3\|T\|1\|1\|other\|other\|cour... |
| 393 | 393\|MS\|M\|"""18"""\|R\|LE3\|T\|3\|2\|services\|other\|c... |


In [4]:
df.columns

Index(['|school|sex|age|address|famsize|Pstatus|Medu|Fedu|Mjob|Fjob|reason|guardian|traveltime|studytime|failures|schoolsup|famsup|paid|activities|nursery|higher|internet|romantic|famrel|freetime|goout|Dalc|Walc|health|absences|G1|G2|G3'], dtype='object')

In [5]:
df['|school|sex|age|address|famsize|Pstatus|Medu|Fedu|Mjob|Fjob|reason|guardian|traveltime|studytime|failures|schoolsup|famsup|paid|activities|nursery|higher|internet|romantic|famrel|freetime|goout|Dalc|Walc|health|absences|G1|G2|G3'].head(1)

0    0|GP|F|nulidade|U|GT3|A|4|4|at_home|teacher|co...
Name: |school|sex|age|address|famsize|Pstatus|Medu|Fedu|Mjob|Fjob|reason|guardian|traveltime|studytime|failures|schoolsup|famsup|paid|activities|nursery|higher|internet|romantic|famrel|freetime|goout|Dalc|Walc|health|absences|G1|G2|G3, dtype: object

In [6]:
df['|school|sex|age|address|famsize|Pstatus|Medu|Fedu|Mjob|Fjob|reason|guardian|traveltime|studytime|failures|schoolsup|famsup|paid|activities|nursery|higher|internet|romantic|famrel|freetime|goout|Dalc|Walc|health|absences|G1|G2|G3'].head(1).values

array(['0|GP|F|nulidade|U|GT3|A|4|4|at_home|teacher|course|mother|2|2|0|yes|no|no|no|yes|yes|no|no|4|3|"""4"""|1|1|"""3"""|6|5|6|6'],
      dtype=object)

In [7]:
len(df['|school|sex|age|address|famsize|Pstatus|Medu|Fedu|Mjob|Fjob|reason|guardian|traveltime|studytime|failures|schoolsup|famsup|paid|activities|nursery|higher|internet|romantic|famrel|freetime|goout|Dalc|Walc|health|absences|G1|G2|G3'].head(1).values[0].split('|'))

34

In [8]:
df['|school|sex|age|address|famsize|Pstatus|Medu|Fedu|Mjob|Fjob|reason|guardian|traveltime|studytime|failures|schoolsup|famsup|paid|activities|nursery|higher|internet|romantic|famrel|freetime|goout|Dalc|Walc|health|absences|G1|G2|G3'].index

RangeIndex(start=0, stop=395, step=1)

In [9]:
df.columns[0].split('|')[1]

'school'

In [10]:
sort_data = pd.DataFrame()

In [11]:
for categoria in df.columns[0].split('|'):
    sort_data[categoria]=None

In [12]:
df.columns[0].split('|')

['',
 'school',
 'sex',
 'age',
 'address',
 'famsize',
 'Pstatus',
 'Medu',
 'Fedu',
 'Mjob',
 'Fjob',
 'reason',
 'guardian',
 'traveltime',
 'studytime',
 'failures',
 'schoolsup',
 'famsup',
 'paid',
 'activities',
 'nursery',
 'higher',
 'internet',
 'romantic',
 'famrel',
 'freetime',
 'goout',
 'Dalc',
 'Walc',
 'health',
 'absences',
 'G1',
 'G2',
 'G3']

In [13]:
sort_data

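|  |  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|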


In [14]:
df.iloc[2, 0].split('|')  # row 2, first (and only) column; positional Series[0] is deprecated in pandas 2.x


['2',
 'GP',
 'F',
 '"""15"""',
 'U',
 'LE3',
 'T',
 '1',
 '1',
 'at_home',
 'other',
 'other',
 'mother',
 '1',
 '2',
 '3',
 'yes',
 'no',
 'yes',
 'no',
 'yes',
 'yes',
 'yes',
 'no',
 '4',
 '3',
 '"""2"""',
 '2',
 '3',
 '"""3"""',
 '10',
 'zero',
 '8',
 '10']

In [15]:
# DataFrame.append was removed in pandas 2.0: split every raw row first, then build the frame once
filas = [df.iloc[num, 0].split('|') for num in range(len(df))]
sort_data = pd.DataFrame(filas, columns=sort_data.columns)
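The row-by-row split above can also be done in one vectorized call with `Series.str.split(..., expand=True)`. The following sketch runs on a hypothetical three-row, five-field miniature of the file, not on the real data.

```python
import pandas as pd

# Hypothetical miniature of the raw file: a single column whose values are pipe-delimited strings
raw = pd.DataFrame({
    "|school|sex|age|G3": ["0|GP|F|nulidade|6", '1|GP|F|"""17"""|6', "2|GP|F|15|10"]
})

# Column names come from splitting the header; the first piece is the empty name of the index field
columnas = raw.columns[0].split("|")

# str.split with expand=True splits every row at once into separate columns
parsed = raw.iloc[:, 0].str.split("|", expand=True)
parsed.columns = columnas

# Drop the field that only repeats the row number
parsed = parsed.drop(columns=[""])
print(parsed)
```

If a row contained an extra `|`, `expand=True` would produce an additional column instead of raising an error. Checking the field count first is therefore worthwhile, as the notebook does with `len(...split('|'))`, which returns 34.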


In [16]:
sort_data

|  |  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | GP | F | nulidade | U | GT3 | A | 4 | 4 | at_home | ... | 4 | 3 | """4""" | 1 | 1 | """3""" | 6 | 5 | 6 | 6 |
| 1 | 1 | GP | F | """17""" | U | GT3 | T | 1 | 1 | at_home | ... | 5 | 3 | """3""" | 1 | 1 | """3""" | 4 | 5 | 5 | 6 |
| 2 | 2 | GP | F | """15""" | U | LE3 | T | 1 | 1 | at_home | ... | 4 | 3 | """2""" | 2 | 3 | """3""" | 10 | zero | 8 | 10 |
| 3 | 3 | GP | F | """15""" | U | GT3 | T | 4 | 2 | health | ... | 3 | 2 | """2""" | 1 | 1 | """5""" | 2 | 15 | 14 | 15 |
| 4 | 4 | GP | F | sem validade | U | GT3 | T | 3 | 3 | other | ... | 4 | 3 | """2""" | 1 | 2 | """5""" | 4 | 6 | 10 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | 390 | MS | M | """20""" | U | LE3 | A | 2 | 2 | services | ... | 5 | 5 | """4""" | 4 | 5 | """4""" | 11 | 9 | 9 | 9 |
| 391 | 391 | MS | M | """17""" | U | LE3 | T | 3 | 1 | services | ... | 2 | 4 | """5""" | 3 | 4 | """2""" | 3 | 14 | 16 | 16 |
| 392 | 392 | MS | M | """21""" | R | GT3 | T | 1 | 1 | other | ... | 5 | 5 | """3""" | 3 | 3 | """3""" | 3 | 10 | 8 | 7 |
| 393 | 393 | MS | M | """18""" | R | LE3 | T | 3 | 2 | services | ... | 4 | 4 | """1""" | 3 | 4 | """5""" | 0 | 11 | 12 | 10 |


In [17]:
sort_data = sort_data.drop([''], axis=1)
    

In [18]:
sort_data

|  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | nulidade | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | """4""" | 1 | 1 | """3""" | 6 | 5 | 6 | 6 |
| 1 | GP | F | """17""" | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | """3""" | 1 | 1 | """3""" | 4 | 5 | 5 | 6 |
| 2 | GP | F | """15""" | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | """2""" | 2 | 3 | """3""" | 10 | zero | 8 | 10 |
| 3 | GP | F | """15""" | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | """2""" | 1 | 1 | """5""" | 2 | 15 | 14 | 15 |
| 4 | GP | F | sem validade | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | """2""" | 1 | 2 | """5""" | 4 | 6 | 10 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | MS | M | """20""" | U | LE3 | A | 2 | 2 | services | services | ... | 5 | 5 | """4""" | 4 | 5 | """4""" | 11 | 9 | 9 | 9 |
| 391 | MS | M | """17""" | U | LE3 | T | 3 | 1 | services | services | ... | 2 | 4 | """5""" | 3 | 4 | """2""" | 3 | 14 | 16 | 16 |
| 392 | MS | M | """21""" | R | GT3 | T | 1 | 1 | other | other | ... | 5 | 5 | """3""" | 3 | 3 | """3""" | 3 | 10 | 8 | 7 |
| 393 | MS | M | """18""" | R | LE3 | T | 3 | 2 | services | other | ... | 4 | 4 | """1""" | 3 | 4 | """5""" | 0 | 11 | 12 | 10 |


In [19]:
for columna in sort_data.columns:
    sort_data[columna]=np.where((sort_data[columna] =='nulidade')|(sort_data[columna] =='sem validade')| 
                                (sort_data[columna] =='zero'), np.nan, sort_data[columna])
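The loop above turns the placeholder strings `nulidade`, `sem validade` and `zero` into NaN. The same substitution can be written with `DataFrame.replace`. Counting the resulting missing values per column helps decide whether to drop or impute them. The following sketch uses a hypothetical sample, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing the placeholder strings seen in the data
muestra = pd.DataFrame({
    "age": ["nulidade", '"""17"""', "sem validade"],
    "G1": ["5", "zero", "15"],
})

# Replace every placeholder with NaN in all columns at once
placeholders = ["nulidade", "sem validade", "zero"]
limpia = muestra.replace(placeholders, np.nan)

# Number of missing values per column
faltantes = limpia.isna().sum()
print(faltantes)
```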

In [20]:
sort_data

|  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | NaN | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | """4""" | 1 | 1 | """3""" | 6 | 5 | 6 | 6 |
| 1 | GP | F | """17""" | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | """3""" | 1 | 1 | """3""" | 4 | 5 | 5 | 6 |
| 2 | GP | F | """15""" | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | """2""" | 2 | 3 | """3""" | 10 | NaN | 8 | 10 |
| 3 | GP | F | """15""" | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | """2""" | 1 | 1 | """5""" | 2 | 15 | 14 | 15 |
| 4 | GP | F | NaN | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | """2""" | 1 | 2 | """5""" | 4 | 6 | 10 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | MS | M | """20""" | U | LE3 | A | 2 | 2 | services | services | ... | 5 | 5 | """4""" | 4 | 5 | """4""" | 11 | 9 | 9 | 9 |
| 391 | MS | M | """17""" | U | LE3 | T | 3 | 1 | services | services | ... | 2 | 4 | """5""" | 3 | 4 | """2""" | 3 | 14 | 16 | 16 |
| 392 | MS | M | """21""" | R | GT3 | T | 1 | 1 | other | other | ... | 5 | 5 | """3""" | 3 | 3 | """3""" | 3 | 10 | 8 | 7 |
| 393 | MS | M | """18""" | R | LE3 | T | 3 | 2 | services | other | ... | 4 | 4 | """1""" | 3 | 4 | """5""" | 0 | 11 | 12 | 10 |


In [21]:
sort_data[['Medu','Fedu','traveltime','studytime','failures','famrel', 'freetime','Dalc', 'Walc', 'absences',
           'G1','G2','G3']] = sort_data[['Medu','Fedu','traveltime','studytime','failures','famrel', 'freetime',
                                         'Dalc', 'Walc', 'absences','G1','G2','G3']].astype(float)

In [22]:
p= '"""15"""'
caracteres = '"""'
p =int(''.join(x for x in p if x not in caracteres))
print(p)

15


In [23]:
def limpieza(p):
    # Remove the double quotes around values such as """15""" and convert the result to int
    caracteres = '"'
    new = int(''.join(x for x in p if x not in caracteres))
    return new

In [24]:
for colname in ['age', 'goout', 'health']:
    new_valor = []
    for valor in sort_data[colname].values:
        # NaN (a float) is kept as-is; quoted strings such as '"""15"""' are cleaned and converted
        if pd.isna(valor) or isinstance(valor, float):
            new_valor.append(valor)
        else:
            new_valor.append(limpieza(valor))
    sort_data[colname] = new_valor
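The cleaning loop can also be vectorized. The following sketch uses hypothetical values. `str.strip('"')` removes the surrounding quotes and passes NaN through, and `pd.to_numeric(..., errors="coerce")` converts the result to numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical values as stored in the age, goout and health columns
s = pd.Series(['"""15"""', '"""17"""', np.nan, "18"])

# Strip the quotes, then convert; anything that cannot be parsed becomes NaN instead of raising
limpio = pd.to_numeric(s.str.strip('"'), errors="coerce")
print(limpio.tolist())
```

Unlike `limpieza`, which raises a `ValueError` on a string such as `sem validade`, this version silently returns NaN. Which behavior is preferable depends on whether unexpected values should surface as errors.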

In [25]:
sort_data

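|  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | NaN | U | GT3 | A | 4.0 | 4.0 | at_home | teacher | ... | 4.0 | 3.0 | 4.0 | 1.0 | 1.0 | 3.0 | 6.0 | 5.0 | 6.0 | 6.0 |
| 1 | GP | F | 17.0 | U | GT3 | T | 1.0 | 1.0 | at_home | other | ... | 5.0 | 3.0 | 3.0 | 1.0 | 1.0 | 3.0 | 4.0 | 5.0 | 5.0 | 6.0 |
| 2 | GP | F | 15.0 | U | LE3 | T | 1.0 | 1.0 | at_home | other | ... | 4.0 | 3.0 | 2.0 | 2.0 | 3.0 | 3.0 | 10.0 | NaN | 8.0 | 10.0 |
| 3 | GP | F | 15.0 | U | GT3 | T | 4.0 | 2.0 | health | services | ... | 3.0 | 2.0 | 2.0 | 1.0 | 1.0 | 5.0 | 2.0 | 15.0 | 14.0 | 15.0 |
| 4 | GP | F | NaN | U | GT3 | T | 3.0 | 3.0 | other | other | ... | 4.0 | 3.0 | 2.0 | 1.0 | 2.0 | 5.0 | 4.0 | 6.0 | 10.0 | 10.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | MS | M | 20.0 | U | LE3 | A | 2.0 | 2.0 | services | services | ... | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 4.0 | 11.0 | 9.0 | 9.0 | 9.0 |
| 391 | MS | M | 17.0 | U | LE3 | T | 3.0 | 1.0 | services | services | ... | 2.0 | 4.0 | 5.0 | 3.0 | 4.0 | 2.0 | 3.0 | 14.0 | 16.0 | 16.0 |
| 392 | MS | M | 21.0 | R | GT3 | T | 1.0 | 1.0 | other | other | ... | 5.0 | 5.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 10.0 | 8.0 | 7.0 |
| 393 | MS | M | 18.0 | R | LE3 | T | 3.0 | 2.0 | services | other | ... | 4.0 | 4.0 | 1.0 | 3.0 | 4.0 | 5.0 | 0.0 | 11.0 | 12.0 | 10.0 |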


In [26]:
rdf = sort_data.copy()

In [27]:
rdf.sex = np.where(rdf.sex=='M', 1, 0)

Males are coded as 1 and females as 0.

In [28]:
rdf.school = np.where(rdf.school=='MS', 1, 0)

The school MS (Mousinho da Silveira) was recoded as 1 and the school GP (Gabriel Pereira) as 0.

In [29]:
rdf.address = np.where(rdf.address=='R', 1, 0)

The location of the student's home was binarized: 1 for R (rural) and 0 for U (urban).

In [30]:
rdf.famsize = np.where(rdf.famsize=='LE3', 1, 0)

The famsize variable was binarized: 1 for LE3 (less than or equal to 3) and 0 for GT3 (greater than 3).

In [31]:
rdf.Pstatus = np.where(rdf.Pstatus=='A', 1, 0)

The Pstatus variable was recoded: 1 for A (parents living apart) and 0 for T (parents living together).

In [32]:
rdf.schoolsup = np.where(rdf.schoolsup=='yes', 1, 0)

The schoolsup variable was recoded: 1 for yes and 0 for no.

In [33]:
rdf.famsup = np.where(rdf.famsup=='no', 1, 0)

The famsup variable was recoded: 0 for yes and 1 for no.

In [34]:
rdf.nursery = np.where(rdf.nursery=='no', 1, 0)

The nursery variable was recoded: 1 for no and 0 for yes.

In [35]:
rdf.higher = np.where(rdf.higher=='no', 1, 0)

The higher variable was recoded: 1 for no and 0 for yes.

In [36]:
rdf.internet = np.where(rdf.internet=='no', 1, 0)

The internet variable was recoded: 1 for no and 0 for yes.

In [37]:
rdf.romantic = np.where(rdf.romantic=='yes', 1, 0)

The romantic variable was recoded: 1 for yes and 0 for no.
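The `np.where(condition, 1, 0)` pattern used above sends every value that does not match the condition to 0, including NaN. A missing answer therefore silently becomes "no", or "yes" for the columns coded on `'no'`. This section does not check whether these binary columns contain missing values. If they do, a dictionary-based `Series.map` keeps them visible. The following minimal sketch uses a hypothetical column.

```python
import numpy as np
import pandas as pd

# Hypothetical yes/no column with one missing answer
col = pd.Series(["yes", "no", np.nan, "yes"])

# np.where: NaN == "yes" is False, so the missing answer becomes 0
con_where = np.where(col == "yes", 1, 0)

# Series.map with an explicit dictionary: values not in the dictionary (here NaN) stay NaN
con_map = col.map({"yes": 1, "no": 0})

print(con_where)         # [1 0 0 1]
print(con_map.tolist())  # [1.0, 0.0, nan, 1.0]
```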

In [38]:
rdf

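|  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | NaN | 0 | 0 | 1 | 4.0 | 4.0 | at_home | teacher | ... | 4.0 | 3.0 | 4.0 | 1.0 | 1.0 | 3.0 | 6.0 | 5.0 | 6.0 | 6.0 |
| 1 | 0 | 0 | 17.0 | 0 | 0 | 0 | 1.0 | 1.0 | at_home | other | ... | 5.0 | 3.0 | 3.0 | 1.0 | 1.0 | 3.0 | 4.0 | 5.0 | 5.0 | 6.0 |
| 2 | 0 | 0 | 15.0 | 0 | 1 | 0 | 1.0 | 1.0 | at_home | other | ... | 4.0 | 3.0 | 2.0 | 2.0 | 3.0 | 3.0 | 10.0 | NaN | 8.0 | 10.0 |
| 3 | 0 | 0 | 15.0 | 0 | 0 | 0 | 4.0 | 2.0 | health | services | ... | 3.0 | 2.0 | 2.0 | 1.0 | 1.0 | 5.0 | 2.0 | 15.0 | 14.0 | 15.0 |
| 4 | 0 | 0 | NaN | 0 | 0 | 0 | 3.0 | 3.0 | other | other | ... | 4.0 | 3.0 | 2.0 | 1.0 | 2.0 | 5.0 | 4.0 | 6.0 | 10.0 | 10.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | 1 | 1 | 20.0 | 0 | 1 | 1 | 2.0 | 2.0 | services | services | ... | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 4.0 | 11.0 | 9.0 | 9.0 | 9.0 |
| 391 | 1 | 1 | 17.0 | 0 | 1 | 0 | 3.0 | 1.0 | services | services | ... | 2.0 | 4.0 | 5.0 | 3.0 | 4.0 | 2.0 | 3.0 | 14.0 | 16.0 | 16.0 |
| 392 | 1 | 1 | 21.0 | 1 | 0 | 0 | 1.0 | 1.0 | other | other | ... | 5.0 | 5.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 10.0 | 8.0 | 7.0 |
| 393 | 1 | 1 | 18.0 | 1 | 1 | 0 | 3.0 | 2.0 | services | other | ... | 4.0 | 4.0 | 1.0 | 3.0 | 4.0 | 5.0 | 0.0 | 11.0 | 12.0 | 10.0 |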


Next, I will convert the categorical variables to numeric values.

In [39]:
rdf.Mjob.values

array(['at_home', 'at_home', 'at_home', 'health', 'other', 'services',
       'other', 'other', 'services', 'other', nan, 'services', 'health',
       'teacher', 'other', 'health', 'services', 'other', 'services',
       'health', 'teacher', 'health', 'teacher', 'other', 'services',
       'services', 'other', 'health', 'services', 'teacher', 'health',
       'services', 'teacher', 'other', 'other', 'other', 'teacher',
       'other', 'services', 'at_home', 'other', 'teacher', 'services',
       'services', 'other', 'other', 'other', 'health', 'teacher',
       'services', 'services', 'health', 'health', 'services', 'other',
       'other', 'services', 'teacher', 'other', 'services', 'health',
       'services', 'other', 'teacher', 'services', 'teacher', 'other',
       'services', 'health', 'other', 'other', 'other', 'other', 'other',
       'other', 'teacher', 'teacher', 'other', 'other', 'at_home',
       'other', 'other', 'services', 'services', 'other', 'services',
       'at_home

In [40]:
mjob = preprocessing.LabelEncoder()
mjob.fit(rdf.Mjob.values)

LabelEncoder()

In [41]:
mjob.classes_

array(['at_home', 'health', 'other', 'services', 'teacher', nan],
      dtype=object)
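`mjob.classes_` lists `nan` as a sixth class, so `LabelEncoder` codes a missing mother's job as 5, as if "missing" were an occupation. The `classes_` of `reason` and `guardian` also end in `nan`. One way to keep missingness explicit is to use `pd.Categorical`, which assigns the code -1 to missing entries, and then turn that code back into NaN. The following sketch uses hypothetical values.

```python
import numpy as np
import pandas as pd

# Hypothetical Mjob-like values with one missing entry
mjob_vals = pd.Series(["at_home", "health", np.nan, "teacher", "health"])

# Categories are the sorted non-missing values; missing entries get code -1
cat = pd.Categorical(mjob_vals)
codigos = pd.Series(cat.codes, dtype="float").replace(-1, np.nan)

print(list(cat.categories))  # ['at_home', 'health', 'teacher']
print(codigos.tolist())      # [0.0, 1.0, nan, 2.0, 1.0]
```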

In [42]:
mjob.transform(rdf.Mjob.values)

array([0, 0, 0, 1, 2, 3, 2, 2, 3, 2, 5, 3, 1, 4, 2, 1, 3, 2, 3, 1, 4, 1,
       4, 2, 3, 3, 2, 1, 3, 4, 1, 3, 4, 2, 2, 2, 4, 2, 3, 0, 2, 4, 3, 3,
       2, 2, 2, 1, 4, 3, 3, 1, 1, 3, 2, 2, 3, 4, 2, 3, 1, 3, 2, 4, 3, 4,
       2, 3, 1, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 0, 2, 2, 3, 3, 2, 3, 0, 2,
       3, 4, 2, 3, 5, 4, 3, 0, 3, 2, 2, 2, 3, 3, 3, 3, 3, 2, 2, 3, 2, 1,
       4, 3, 0, 4, 1, 4, 2, 2, 2, 2, 0, 3, 5, 1, 2, 3, 2, 0, 3, 4, 3, 0,
       2, 3, 0, 3, 0, 2, 3, 4, 4, 3, 4, 0, 2, 2, 1, 0, 4, 3, 2, 0, 3, 3,
       2, 0, 2, 0, 0, 2, 0, 2, 2, 0, 2, 3, 2, 1, 2, 1, 2, 2, 4, 0, 2, 4,
       3, 2, 4, 2, 4, 3, 3, 2, 2, 3, 3, 2, 1, 0, 3, 0, 0, 3, 2, 3, 3, 4,
       3, 4, 1, 2, 2, 2, 3, 0, 3, 4, 0, 4, 2, 3, 2, 3, 3, 2, 2, 3, 0, 0,
       0, 0, 3, 2, 4, 2, 2, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 2, 0, 2, 1, 4,
       4, 3, 2, 2, 2, 3, 2, 2, 3, 0, 3, 2, 2, 1, 4, 3, 2, 3, 3, 4, 2, 2,
       0, 2, 3, 4, 1, 2, 2, 2, 2, 0, 0, 3, 2, 4, 1, 4, 3, 4, 0, 2, 2, 2,
       0, 3, 3, 4, 4, 1, 3, 3, 3, 1, 1, 2, 2, 4, 1,

In [43]:
rdf['Mjob'] = mjob.transform(rdf['Mjob'].values)  # assign the integer codes directly

In [44]:
rdf

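|  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | NaN | 0 | 0 | 1 | 4.0 | 4.0 | 0 | teacher | ... | 4.0 | 3.0 | 4.0 | 1.0 | 1.0 | 3.0 | 6.0 | 5.0 | 6.0 | 6.0 |
| 1 | 0 | 0 | 17.0 | 0 | 0 | 0 | 1.0 | 1.0 | 0 | other | ... | 5.0 | 3.0 | 3.0 | 1.0 | 1.0 | 3.0 | 4.0 | 5.0 | 5.0 | 6.0 |
| 2 | 0 | 0 | 15.0 | 0 | 1 | 0 | 1.0 | 1.0 | 0 | other | ... | 4.0 | 3.0 | 2.0 | 2.0 | 3.0 | 3.0 | 10.0 | NaN | 8.0 | 10.0 |
| 3 | 0 | 0 | 15.0 | 0 | 0 | 0 | 4.0 | 2.0 | 1 | services | ... | 3.0 | 2.0 | 2.0 | 1.0 | 1.0 | 5.0 | 2.0 | 15.0 | 14.0 | 15.0 |
| 4 | 0 | 0 | NaN | 0 | 0 | 0 | 3.0 | 3.0 | 2 | other | ... | 4.0 | 3.0 | 2.0 | 1.0 | 2.0 | 5.0 | 4.0 | 6.0 | 10.0 | 10.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | 1 | 1 | 20.0 | 0 | 1 | 1 | 2.0 | 2.0 | 3 | services | ... | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 4.0 | 11.0 | 9.0 | 9.0 | 9.0 |
| 391 | 1 | 1 | 17.0 | 0 | 1 | 0 | 3.0 | 1.0 | 3 | services | ... | 2.0 | 4.0 | 5.0 | 3.0 | 4.0 | 2.0 | 3.0 | 14.0 | 16.0 | 16.0 |
| 392 | 1 | 1 | 21.0 | 1 | 0 | 0 | 1.0 | 1.0 | 2 | other | ... | 5.0 | 5.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 10.0 | 8.0 | 7.0 |
| 393 | 1 | 1 | 18.0 | 1 | 1 | 0 | 3.0 | 2.0 | 3 | other | ... | 4.0 | 4.0 | 1.0 | 3.0 | 4.0 | 5.0 | 0.0 | 11.0 | 12.0 | 10.0 |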


In [45]:
fjob = preprocessing.LabelEncoder()
fjob.fit(rdf.Fjob.values)
fjob.classes_

array(['at_home', 'health', 'other', 'services', 'teacher'], dtype=object)

In [46]:
fjob.transform(rdf.Fjob.values)

array([4, 2, 2, 3, 2, 2, 2, 4, 2, 2, 1, 2, 3, 2, 2, 2, 3, 2, 3, 2, 2, 1,
       2, 2, 1, 3, 2, 3, 2, 4, 3, 3, 0, 2, 2, 2, 3, 4, 1, 2, 2, 2, 4, 3,
       0, 2, 3, 3, 2, 4, 3, 2, 1, 3, 2, 2, 3, 1, 0, 2, 4, 3, 3, 1, 3, 3,
       3, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 3, 2, 3, 2, 2,
       2, 1, 2, 2, 2, 3, 1, 2, 2, 2, 2, 0, 3, 4, 2, 2, 2, 1, 2, 2, 2, 1,
       4, 2, 2, 2, 3, 4, 4, 3, 2, 2, 3, 3, 1, 2, 2, 3, 2, 2, 2, 4, 4, 2,
       2, 2, 4, 0, 2, 2, 2, 4, 3, 3, 3, 0, 2, 3, 3, 2, 4, 2, 2, 2, 3, 0,
       4, 3, 2, 2, 2, 3, 2, 2, 2, 3, 3, 3, 2, 3, 2, 1, 2, 2, 2, 3, 2, 2,
       2, 2, 3, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3, 2, 3, 4, 2,
       4, 4, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 0,
       3, 2, 4, 2, 3, 2, 2, 3, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 3, 3, 2, 2,
       2, 3, 2, 2, 2, 3, 3, 2, 2, 2, 3, 2, 2, 2, 3, 0, 2, 3, 2, 2, 2, 2,
       3, 2, 2, 2, 2, 2, 3, 3, 2, 0, 1, 2, 3, 3, 1, 3, 2, 3, 2, 2, 2, 2,
       0, 4, 3, 4, 2, 3, 0, 2, 2, 2, 2, 2, 2, 4, 2,

In [47]:
rdf['Fjob'] = fjob.transform(rdf['Fjob'].values)  # assign the integer codes directly

In [48]:
reason = preprocessing.LabelEncoder()
reason.fit(rdf.reason.values)
reason.classes_

array(['course', 'home', 'other', 'reputation', nan], dtype=object)

In [49]:
rdf['reason'] = reason.transform(rdf['reason'].values)

In [50]:
guardian = preprocessing.LabelEncoder()
guardian.fit(rdf.guardian.values)
guardian.classes_

array(['father', 'mother', 'other', nan], dtype=object)

In [51]:
rdf['guardian'] = guardian.transform(rdf['guardian'].values)
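scikit-learn documents `LabelEncoder` as a tool for encoding target labels (y), not input features. When it is used on nominal predictors such as Mjob, Fjob, reason or guardian, the integer codes make a linear regression treat the categories as ordered and equally spaced. For example, `teacher` = 4 counts as "twice" `other` = 2. One-hot encoding is a common alternative. The following sketch applies `pd.get_dummies` to hypothetical values.

```python
import pandas as pd

# Hypothetical nominal column with no natural order between categories
datos = pd.DataFrame({"reason": ["course", "home", "reputation", "course"]})

# One 0/1 indicator column per category; drop_first removes one of them to avoid
# perfect collinearity with the regression intercept
dummies = pd.get_dummies(datos["reason"], prefix="reason", drop_first=True, dtype=int)
print(dummies)
```

In this sketch the dropped category (`reason_course`) becomes the reference level. Each remaining coefficient then measures the difference from that reference.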

### Table with all values recoded as numeric variables

In [52]:
rdf

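|  | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | NaN | 0 | 0 | 1 | 4.0 | 4.0 | 0 | 4 | ... | 4.0 | 3.0 | 4.0 | 1.0 | 1.0 | 3.0 | 6.0 | 5.0 | 6.0 | 6.0 |
| 1 | 0 | 0 | 17.0 | 0 | 0 | 0 | 1.0 | 1.0 | 0 | 2 | ... | 5.0 | 3.0 | 3.0 | 1.0 | 1.0 | 3.0 | 4.0 | 5.0 | 5.0 | 6.0 |
| 2 | 0 | 0 | 15.0 | 0 | 1 | 0 | 1.0 | 1.0 | 0 | 2 | ... | 4.0 | 3.0 | 2.0 | 2.0 | 3.0 | 3.0 | 10.0 | NaN | 8.0 | 10.0 |
| 3 | 0 | 0 | 15.0 | 0 | 0 | 0 | 4.0 | 2.0 | 1 | 3 | ... | 3.0 | 2.0 | 2.0 | 1.0 | 1.0 | 5.0 | 2.0 | 15.0 | 14.0 | 15.0 |
| 4 | 0 | 0 | NaN | 0 | 0 | 0 | 3.0 | 3.0 | 2 | 2 | ... | 4.0 | 3.0 | 2.0 | 1.0 | 2.0 | 5.0 | 4.0 | 6.0 | 10.0 | 10.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 390 | 1 | 1 | 20.0 | 0 | 1 | 1 | 2.0 | 2.0 | 3 | 3 | ... | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 4.0 | 11.0 | 9.0 | 9.0 | 9.0 |
| 391 | 1 | 1 | 17.0 | 0 | 1 | 0 | 3.0 | 1.0 | 3 | 3 | ... | 2.0 | 4.0 | 5.0 | 3.0 | 4.0 | 2.0 | 3.0 | 14.0 | 16.0 | 16.0 |
| 392 | 1 | 1 | 21.0 | 1 | 0 | 0 | 1.0 | 1.0 | 2 | 2 | ... | 5.0 | 5.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 10.0 | 8.0 | 7.0 |
| 393 | 1 | 1 | 18.0 | 1 | 1 | 0 | 3.0 | 2.0 | 3 | 2 | ... | 4.0 | 4.0 | 1.0 | 3.0 | 4.0 | 5.0 | 0.0 | 11.0 | 12.0 | 10.0 |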
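With the columns in numeric form, the regression described in the preliminaries can be fit and validated with the metrics listed there. The table still contains NaN (for example in age and G1), and `LinearRegression` raises a `ValueError` on NaN input, so those rows must be dropped or imputed first. The following sketch uses a synthetic stand-in for `rdf`. The column subset, coefficients, noise level and split size are all assumptions for illustration, not results from this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for rdf (hypothetical columns and coefficients) with target G3
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "studytime": rng.integers(1, 5, n).astype(float),
    "failures": rng.integers(0, 4, n).astype(float),
    "absences": rng.integers(0, 30, n).astype(float),
})
demo["G3"] = (10 + 1.5 * demo["studytime"] - 2.0 * demo["failures"]
              - 0.1 * demo["absences"] + rng.normal(0, 1, n))
demo.loc[::25, "absences"] = np.nan  # simulate lost values, as with age and G1

# Drop incomplete rows (imputation would be an alternative)
demo = demo.dropna()

X = demo.drop(columns="G3")
y = demo["G3"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Fit on the training split, evaluate on the held-out test split
modelo = LinearRegression().fit(X_train, y_train)
y_pred = modelo.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred)
r2_test = r2_score(y_test, y_pred)
print(f"MSE: {mse_test:.3f}  R2: {r2_test:.3f}")
```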
