# 5. FEATURE ENGINEERING

### 1. Handling missing values

- **Numeric**: impute with the median (robust to outliers), KNN, or iterative methods (MICE).
- **Categorical**: impute with the mode or create a new `"missing"` category.
- **Imputation indicators**: create binary variables that flag whether a value was imputed.
- **Drop criterion**: drop variables with more than 30–40% missing values when they are not reliable.
- **Important**: fit the imputation on the training set only and apply it unchanged to the test set.
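A minimal sketch of these rules with scikit-learn, assuming small toy splits (the column `ingreso` and the 40% threshold are hypothetical): the missing share is measured on train, the median and the imputation indicator are learned on train, and the same fitted transformer is then applied to test.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy splits; 'ingreso' is an illustrative, mostly-missing column
train = pd.DataFrame({
    "edad": [16.0, 17.0, np.nan, 19.0],
    "region": ["Andina", np.nan, "Caribe", "Andina"],
    "ingreso": [np.nan, np.nan, np.nan, 1.0],
})
test = pd.DataFrame({
    "edad": [np.nan, 18.0],
    "region": [np.nan, "Caribe"],
    "ingreso": [2.0, np.nan],
})

# Drop criterion: missing share measured on TRAIN only (threshold assumed at 40%)
missing_share = train.isna().mean()
cols_to_drop = list(missing_share[missing_share > 0.4].index)
train = train.drop(columns=cols_to_drop)
test = test.drop(columns=cols_to_drop)

imputer = ColumnTransformer([
    # Median imputation plus a binary "was imputed" indicator column
    ("num", SimpleImputer(strategy="median", add_indicator=True), ["edad"]),
    # Explicit "missing" category for categorical columns
    ("cat", SimpleImputer(strategy="constant", fill_value="missing"), ["region"]),
])

imputer.fit(train)                      # the median (17.0) is learned from train only
test_imputed = imputer.transform(test)  # and reused unchanged on test
print(cols_to_drop)
print(test_imputed)
```

The output columns are, in order: imputed `edad`, its missing-value indicator, and the imputed `region`.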

### 2. Categorical variables

- Keep variables as categorical when possible (if the Random Forest implementation supports categorical splits natively)
- Ordinal (label) encoding for ordinal variables, respecting the natural order of the levels
- One-hot encoding only if it improves performance
- Target encoding for high-cardinality categoricals, computed out-of-fold (cross-validation) to avoid target leakage
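A sketch of the two encodings above, under stated assumptions: ordinal encoding declares the level order explicitly (using the `num_libros` bins as an example), and `target_encode_oof` is a hypothetical helper that encodes each row with category means computed on the other folds, so a row's own target never feeds its feature.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Ordinal encoding: the level order is declared explicitly (here, the num_libros bins)
orden_libros = ["0-10", "11-25", "26-100", ">100"]
libros = pd.Series(["11-25", ">100", "0-10"])
libros_ord = pd.Categorical(libros, categories=orden_libros, ordered=True).codes


def target_encode_oof(x, y, n_splits=5, seed=0):
    """Hypothetical helper: out-of-fold mean target encoding.

    Each row is encoded with category means computed on the OTHER folds.
    """
    encoded = pd.Series(np.nan, index=x.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(x):
        x_fit, y_fit = x.iloc[fit_idx], y.iloc[fit_idx]
        means = y_fit.groupby(x_fit).mean()
        # Categories unseen in the fitting folds fall back to those folds' mean
        fold_enc = x.iloc[enc_idx].map(means).fillna(y_fit.mean())
        encoded.iloc[enc_idx] = fold_enc.to_numpy()
    return encoded


# Toy data: 'z' appears once, so it can only receive the fallback mean,
# never its own (extreme) target value of 100
x = pd.Series(["a", "a", "a", "a", "z"])
y = pd.Series([1.0, 1.0, 1.0, 1.0, 100.0])
print(libros_ord.tolist())
print(target_encode_oof(x, y).tolist())
```

For the test set, the means would instead be learned once on the full training split, e.g. `x_test.map(y_train.groupby(x_train).mean())`, with the same fallback for unseen categories.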

### 3. Numeric variables

- No scaling required (tree splits are invariant to monotonic transformations of each feature)
- Targeted binning to capture specific non-linear patterns
- Transformations for highly skewed distributions (log, sqrt, Box-Cox); since they are monotonic, they do not change Random Forest splits and mainly matter for other model families
- Polynomial features if quadratic patterns are detected
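The invariance claim can be checked on a toy problem (synthetic data, assumed for illustration): a decision tree, the building block of a Random Forest, fitted on `x` and on `log(x)` partitions the training rows identically, so its predictions match even though the split thresholds differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic positive feature (distinct integers 1..200) and a noisy target
x = rng.permutation(np.arange(1, 201)).astype(float).reshape(-1, 1)
y = np.sqrt(x.ravel()) + rng.normal(scale=0.5, size=200)

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(x, y)
tree_log = DecisionTreeRegressor(max_depth=4, random_state=0).fit(np.log(x), y)

# Same row ordering -> same candidate partitions -> same fitted leaves
same = np.allclose(tree_raw.predict(x), tree_log.predict(np.log(x)))
print(same)
```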

### 4. Derived variables

- Ratios and differences with business meaning
- Group aggregations (by relevant categories): means, medians, standard deviations, computed on the training set only
- Domain-specific interaction variables
- Lags for time series
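A small sketch of the three derived-variable patterns, assuming hypothetical columns (`colegio_id`, `personas`, and the yearly `panel` are illustrative, not part of this dataset): a guarded ratio, a group mean learned on train and mapped onto test, and a within-group lag.

```python
import numpy as np
import pandas as pd

# Hypothetical toy splits; 'colegio_id' and 'personas' are illustrative columns
train = pd.DataFrame({
    "colegio_id": ["A", "A", "B", "B"],
    "estrato_casa": [1.0, 3.0, 4.0, 4.0],
    "personas": [4, 4, 6, 2],
    "num_cuartos_casa": [2.0, 4.0, 3.0, 1.0],
})
test = pd.DataFrame({"colegio_id": ["A", "C"], "estrato_casa": [2.0, 1.0]})

# Ratio with a business meaning: people per room (0 rooms -> NaN instead of inf)
train["personas_por_cuarto"] = train["personas"] / train["num_cuartos_casa"].replace(0, np.nan)

# Group aggregate learned on train only, then mapped onto test;
# groups unseen in train fall back to the train-wide mean
media_estrato = train.groupby("colegio_id")["estrato_casa"].mean()
fallback = train["estrato_casa"].mean()
test["estrato_medio_colegio"] = test["colegio_id"].map(media_estrato).fillna(fallback)

# Lag for a hypothetical yearly panel: previous year's value within each school
panel = pd.DataFrame({
    "colegio_id": ["A", "A", "A", "B", "B"],
    "anio": [2019, 2020, 2021, 2019, 2020],
    "puntaje_medio": [250.0, 260.0, 255.0, 240.0, 245.0],
}).sort_values(["colegio_id", "anio"])
panel["puntaje_lag1"] = panel.groupby("colegio_id")["puntaje_medio"].shift(1)

print(test)
print(panel)
```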

### 5. Temporal variables

- Extraction of day, month, quarter, day of week, etc.
- Cyclical encoding (sin, cos) for periodic time components
- Trends and seasonality
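A sketch of date-part extraction and cyclical encoding with pandas and NumPy; the dates and the helper `codificar_ciclico` are illustrative. Mapping a month onto the unit circle makes December and January as close as any two consecutive months, which a plain 1..12 integer does not.

```python
import numpy as np
import pandas as pd


def codificar_ciclico(valores, periodo):
    """Hypothetical helper: map a periodic value onto the unit circle."""
    angulo = 2 * np.pi * np.asarray(valores, dtype=float) / periodo
    return np.sin(angulo), np.cos(angulo)


fechas = pd.Series(pd.to_datetime(["2023-01-15", "2023-06-30", "2023-12-31"]))
temporal = pd.DataFrame({
    "mes": fechas.dt.month,
    "trimestre": fechas.dt.quarter,
    "dia_semana": fechas.dt.dayofweek,  # Monday=0 ... Sunday=6
})
temporal["mes_sin"], temporal["mes_cos"] = codificar_ciclico(temporal["mes"], 12)

# December (12) and January (1) end up as close as January and February
s, c = codificar_ciclico([12, 1, 2], 12)
dist_dic_ene = np.hypot(s[0] - s[1], c[0] - c[1])
dist_ene_feb = np.hypot(s[1] - s[2], c[1] - c[2])
print(np.isclose(dist_dic_ene, dist_ene_feb))
```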

### 6. Text handling

- Create numeric variables from text (length, counts, sentiment scores, etc.)
- Feature engineering validation: fit transformations on the training set only and apply them to test
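The ICFES features loaded below contain no free text, so the following is only an illustrative sketch on a hypothetical column. These counts are computed row by row and learn nothing from the data, so they can be applied to train and test alike without leakage.

```python
import pandas as pd

# Hypothetical free-text column (not part of this dataset)
textos = pd.Series(["Me gusta leer", "", None])
limpio = textos.fillna("")  # treat missing text as empty

feats_texto = pd.DataFrame({
    "n_caracteres": limpio.str.len(),
    "n_palabras": limpio.str.split().str.len(),
    "n_mayusculas": limpio.str.count(r"[A-ZÁÉÍÓÚÑ]"),
})
print(feats_texto)
```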

## 1. LOADING FILES

In [37]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 500)
pd.set_option('display.float_format', '{:.2f}'.format)

In [38]:
# Load the train/test splits from Parquet
X_train = pd.read_parquet("X_train.parquet")
X_test = pd.read_parquet("X_test.parquet")

# Extract the target as a Series from its column 'puntaje_global'
y_train = pd.read_parquet("y_train.parquet")["puntaje_global"]
y_test = pd.read_parquet("y_test.parquet")["puntaje_global"]

print("✅ Splits cargados en Parquet")

✅ Splits cargados en Parquet


In [39]:
icfesx = X_train.copy()
icfesx.sample(5)

Unnamed: 0,nacionalidad,region,etnia_estudiante,edad,presento_fuera_edad,num_personas_casa,num_cuartos_casa,estrato_casa,tiempo_internet,internet,tv,computador,lavadora,microndas,carro,moto,consola,num_libros,tiempo_lectura,freq_leche_derivados,freq_carne_pescado_similares,freq_cereales_frutos_legumbres,nivel_edu_padre,actividad_padre,nivel_edu_madre,actividad_madre,horas_trabajo_semanal,colegio_genero,colegio_publico,colegio_area,colegio_jornada,nse_estudiante,nse_colegio,educacion_padres,perfil_lector,actividad_padres
284074,Colombia,Caribe,No,18,0,Más de 1 hogar,4.0,3.0,Poco,1,1,1,1,1,1,1,0,0-10,No lee,Aceptable,Aceptable,Insuficiente,Tecnico/Tecnologo,Trabajadores Operativos,Tecnico/Tecnologo,Trabajadores Operativos,No Trabaja,Mixto,Oficial,Urbano,Tradicional,3.0,2.0,Educación Técnica,"Poco Apoyo, Poco Habito",Trabajadores Operativos
277814,Colombia,Andina,No,16,0,Hogar tradicional,2.0,3.0,Promedio,1,1,1,1,0,0,0,0,11-25,0-30 min,Aceptable,Aceptable,Insuficiente,Profesional,Trabajadores Operativos,Tecnico/Tecnologo,Profesionales,No Trabaja,Femenino,Oficial,Urbano,Tradicional,3.0,3.0,Educación Superior,"Poco Apoyo, Poco Habito",Profesionales/Directivos
221985,Colombia,Caribe,No,17,0,Hogar grande,3.0,1.0,Promedio,1,1,0,1,1,0,0,0,26-100,1-2h,Óptimo,Insuficiente,Muy Insuficiente,Bachiller,Sin Actividad Remunerada,Bachiller,Trabajadores Operativos,No Trabaja,Mixto,Oficial,Urbano,Tradicional,2.0,2.0,Bachillerato Completo,"Buen Apoyo, Buen Habito",Trabajadores Operativos
291785,Colombia,Caribe,No,17,0,Hogar tradicional,3.0,1.0,Promedio,0,1,0,1,0,0,1,0,26-100,30-60 min,Aceptable,Aceptable,Insuficiente,Bachiller Inc,Sin Actividad Remunerada,Bachiller Inc,Microempresario,Tiempo Completo,Mixto,Oficial,Rural,Tradicional,2.0,2.0,Educación Secundaria Incompleta,"Buen Apoyo, Buen Habito",Empresarios/Independientes
243445,Colombia,Andina,No,17,0,Hogar grande,3.0,3.0,Mucho,1,1,1,1,1,1,0,0,0-10,No lee,Óptimo,Óptimo,Óptimo,Bachiller,Sin Información,Bachiller,Sin Información,No Trabaja,Femenino,Oficial,Urbano,Tradicional,3.0,3.0,Bachillerato Completo,"Poco Apoyo, Poco Habito",Sin Información


## 2. CATEGORICAL VARIABLES

### 2.1 BINARY ENCODING

#### 2.1.1 NATIONALITY AND ETHNICITY ENCODING

In [40]:
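### NATIONALITY AND ETHNICITY ENCODING

def codificacion_dummy(df):
    """Binarize nationality and ethnicity in place and return the same DataFrame."""
    # Rename the columns to reflect their new binary meaning
    df.rename(columns={'nacionalidad': 'colombiano', 'etnia_estudiante': 'etnia'}, inplace=True)
    # colombiano = 1 if the nationality is 'Colombia', else 0
    df['colombiano'] = (df['colombiano'] == 'Colombia').astype(int)
    # etnia = 1 if the student reports any ethnic group (value other than 'No');
    # note that NaN != 'No' is True, so a missing answer would also become 1
    df['etnia'] = (df['etnia'] != 'No').astype(int)
    return df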

codificacion_dummy(icfesx)
icfesx.sample(5)

Unnamed: 0,colombiano,region,etnia,edad,presento_fuera_edad,num_personas_casa,num_cuartos_casa,estrato_casa,tiempo_internet,internet,tv,computador,lavadora,microndas,carro,moto,consola,num_libros,tiempo_lectura,freq_leche_derivados,freq_carne_pescado_similares,freq_cereales_frutos_legumbres,nivel_edu_padre,actividad_padre,nivel_edu_madre,actividad_madre,horas_trabajo_semanal,colegio_genero,colegio_publico,colegio_area,colegio_jornada,nse_estudiante,nse_colegio,educacion_padres,perfil_lector,actividad_padres
68827,1,Caribe,0,18,0,Hogar grande,2.0,1.0,Promedio,1,1,1,0,0,0,0,0,11-25,1-2h,Aceptable,Óptimo,Aceptable,Ninguna,Sin Actividad Remunerada,Ninguna,Trabajador Independiente,No Trabaja,Mixto,Oficial,Urbano,Unica,2.0,2.0,Sin Educación Formal,"Poco Apoyo, Buen Habito",Empresarios/Independientes
85005,1,Andina,0,55,1,Hogar tradicional,1.0,2.0,Ninguno,1,1,0,0,0,0,0,0,11-25,0-30 min,Aceptable,Aceptable,Insuficiente,Bachiller Inc,No Aplica,Bachiller Inc,Sin Actividad Remunerada,Tiempo Completo,Mixto,No Oficial,Urbano,Tradicional,2.0,2.0,Educación Secundaria Incompleta,"Poco Apoyo, Poco Habito",No Aplica
168033,1,Andina,0,16,0,Hogar tradicional,2.0,3.0,Promedio,1,1,1,1,1,0,0,0,>100,0-30 min,Óptimo,Óptimo,Insuficiente,Primaria,Sin Actividad Remunerada,Tecnico/Tecnologo Inc,Trabajadores Operativos,No Trabaja,Mixto,No Oficial,Urbano,Completa,3.0,4.0,Educación Primaria,"Buen Apoyo, Poco Habito",Trabajadores Operativos
252776,1,Andina,0,16,0,Hogar grande,3.0,1.0,Moderado,1,1,1,1,0,0,1,0,11-25,0-30 min,Óptimo,Óptimo,Aceptable,Bachiller,Sin Actividad Remunerada,Bachiller,Trabajador Independiente,No Trabaja,Mixto,Oficial,Urbano,Unica,3.0,2.0,Bachillerato Completo,"Poco Apoyo, Poco Habito",Empresarios/Independientes
162070,1,Andina,0,17,0,Hogar tradicional,2.0,3.0,Promedio,1,0,1,1,1,0,1,1,11-25,0-30 min,Aceptable,Aceptable,Aceptable,Bachiller,Trabajadores Operativos,Tecnico/Tecnologo,Trabajadores Operativos,No Trabaja,Mixto,Oficial,Urbano,Tradicional,3.0,3.0,Educación Técnica,"Poco Apoyo, Poco Habito",Trabajadores Operativos
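A defensive variant of the ethnicity flag, sketched on a hypothetical toy Series (the category labels are illustrative): because `NaN != 'No'` evaluates to `True`, the comparison used above turns a missing answer into 1, whereas `np.where` on `isna()` keeps it missing so it can be imputed or flagged explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical answers, including a missing one
etnia = pd.Series(["No", "Indígena", np.nan, "Afrodescendiente"])

# As in codificacion_dummy: the missing answer silently becomes 1
ingenua = (etnia != "No").astype(int)

# Defensive variant: missing answers stay NaN
defensiva = pd.Series(
    np.where(etnia.isna(), np.nan, (etnia != "No").astype(float)),
    index=etnia.index,
)
print(ingenua.tolist())
print(defensiva.tolist())
```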


#### 2.1.2 CREATING THE WORKING-STUDENT VARIABLE

In [41]:
icfesx['horas_trabajo_semanal'].value_counts(dropna=False)

horas_trabajo_semanal
No Trabaja                 229426
Trabajo Ocasional           61097
Tiempo Parcial Reducido     26075
Tiempo Completo             11713
Medio Tiempo                 9395
Name: count, dtype: int64

In [42]:
icfesx.loc[icfesx['edad'] < 20, 'horas_trabajo_semanal'].value_counts(dropna=False)

horas_trabajo_semanal
No Trabaja                 222198
Trabajo Ocasional           55063
Tiempo Parcial Reducido     22911
Medio Tiempo                 7659
Tiempo Completo              7178
Name: count, dtype: int64

In [43]:
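def crear_estudiante_trabajador(df):
    """
    Create the binary 'estudiante_trabajador' (working student) variable,
    equal to 1 when all of the following conditions hold:
    - edad < 20
    - colegio_jornada is neither 'Sabatina' nor 'Noche'
    - horas_trabajo_semanal != 'No Trabaja'
    """
    # Combine the conditions element-wise with & (each one in parentheses).
    # Note: in this training split colegio_jornada only takes the values Completa,
    # Tradicional, Unica and Validación, so the jornada condition removes no rows:
    # the 92,811 ones equal the working students under 20 (55063 + 22911 + 7659 + 7178).
    df['estudiante_trabajador'] = (
        (df['edad'] < 20) &
        (~df['colegio_jornada'].isin(['Sabatina', 'Noche'])) &
        (df['horas_trabajo_semanal'] != 'No Trabaja')
    ).astype(int)

    return df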

crear_estudiante_trabajador(icfesx)
icfesx['estudiante_trabajador'].value_counts(dropna=False)

estudiante_trabajador
0    244895
1     92811
Name: count, dtype: int64

#### 2.1.3 PUBLIC SCHOOL AND URBAN AREA ENCODING

In [44]:
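def binarizar_colegio_area_publico(df):
    """Binarize school sector and area in place and return the same DataFrame."""
    # colegio_publico = 1 for public ('Oficial') schools, 0 otherwise
    df['colegio_publico'] = np.where(df['colegio_publico'] == 'Oficial', 1, 0)
    # colegio_area = 1 for urban ('Urbano') schools, 0 otherwise
    df['colegio_area'] = np.where(df['colegio_area'] == 'Urbano', 1, 0)

    return df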

binarizar_colegio_area_publico(icfesx)
icfesx[['colegio_publico', 'colegio_area']].head()

Unnamed: 0,colegio_publico,colegio_area
0,1,1
1,1,1
2,1,1
3,0,1
4,0,1


In [45]:
icfesx.sample(10)

Unnamed: 0,colombiano,region,etnia,edad,presento_fuera_edad,num_personas_casa,num_cuartos_casa,estrato_casa,tiempo_internet,internet,tv,computador,lavadora,microndas,carro,moto,consola,num_libros,tiempo_lectura,freq_leche_derivados,freq_carne_pescado_similares,freq_cereales_frutos_legumbres,nivel_edu_padre,actividad_padre,nivel_edu_madre,actividad_madre,horas_trabajo_semanal,colegio_genero,colegio_publico,colegio_area,colegio_jornada,nse_estudiante,nse_colegio,educacion_padres,perfil_lector,actividad_padres,estudiante_trabajador
121456,1,Andina,0,17,0,Hogar tradicional,4.0,3.0,Moderado,1,1,1,1,1,1,0,0,26-100,30-60 min,Óptimo,Óptimo,Óptimo,Profesional,Profesionales,Profesional,Profesionales,No Trabaja,Mixto,1,1,Tradicional,4.0,3.0,Educación Superior,"Buen Apoyo, Buen Habito",Profesionales/Directivos,0
111564,1,Pacífica,0,18,0,Hogar grande,3.0,1.0,Moderado,0,1,0,0,0,0,1,1,0-10,0-30 min,Insuficiente,Aceptable,Insuficiente,Primaria,Microempresario,Bachiller Inc,Microempresario,Tiempo Completo,Mixto,1,0,Tradicional,2.0,1.0,Educación Primaria,"Poco Apoyo, Poco Habito",Empresarios/Independientes,1
42906,1,Caribe,0,18,0,Hogar grande,3.0,2.0,Promedio,1,1,1,1,0,0,0,0,11-25,30-60 min,Óptimo,Aceptable,Aceptable,Primaria,Sin Actividad Remunerada,Primaria,Trabajadores Operativos,No Trabaja,Mixto,0,1,Tradicional,2.0,2.0,Educación Primaria,"Poco Apoyo, Buen Habito",Trabajadores Operativos,0
78782,1,Caribe,0,17,0,Hogar tradicional,2.0,1.0,Mucho,0,0,0,1,1,0,1,0,26-100,>2h,Insuficiente,Aceptable,Aceptable,Profesional,Sin Actividad Remunerada,Tecnico/Tecnologo,Trabajadores Operativos,No Trabaja,Mixto,1,1,Tradicional,2.0,2.0,Educación Superior,"Buen Apoyo, Buen Habito",Trabajadores Operativos,0
139685,1,Andina,0,16,0,Hogar tradicional,3.0,2.0,Moderado,1,1,1,1,1,1,1,1,11-25,No lee,Aceptable,Aceptable,Insuficiente,Profesional,Profesionales,Profesional,Profesionales,No Trabaja,Mixto,1,1,Tradicional,4.0,2.0,Educación Superior,"Poco Apoyo, Poco Habito",Profesionales/Directivos,0
193490,1,Caribe,0,18,0,Hogar grande,3.0,2.0,Moderado,0,1,0,1,1,0,1,0,0-10,0-30 min,Insuficiente,Óptimo,Muy Insuficiente,Bachiller,Sin Actividad Remunerada,Bachiller Inc,Trabajadores Operativos,No Trabaja,Mixto,1,1,Tradicional,2.0,2.0,Al Menos Un Bachiller,"Poco Apoyo, Poco Habito",Trabajadores Operativos,0
147257,1,Andina,0,18,0,Más de 1 hogar,3.0,1.0,Promedio,1,0,1,1,0,0,0,0,26-100,1-2h,Insuficiente,Óptimo,Óptimo,Primaria Inc,Microempresario,Primaria Inc,Trabajador Independiente,No Trabaja,Mixto,1,0,Tradicional,2.0,2.0,Educación Primaria Incompleta,"Buen Apoyo, Buen Habito",Empresarios/Independientes,0
116099,1,Pacífica,0,18,0,Hogar tradicional,3.0,3.0,Moderado,1,1,1,1,1,0,0,0,11-25,1-2h,Muy Insuficiente,Insuficiente,Insuficiente,Bachiller Inc,Sin Actividad Remunerada,Bachiller,Trabajador Independiente,Trabajo Ocasional,Mixto,0,1,Tradicional,2.0,2.0,Al Menos Un Bachiller,"Poco Apoyo, Buen Habito",Empresarios/Independientes,1
126020,1,Pacífica,0,17,0,Hogar grande,3.0,1.0,Promedio,1,1,1,1,0,0,0,0,26-100,>2h,Insuficiente,Aceptable,Óptimo,Bachiller,Trabajadores Operativos,Bachiller,Sin Información,No Trabaja,Mixto,1,1,Tradicional,3.0,2.0,Bachillerato Completo,"Buen Apoyo, Buen Habito",Trabajadores Operativos,0
143763,1,Andina,0,18,0,Hogar tradicional,2.0,3.0,Moderado,1,1,1,1,1,1,1,0,11-25,0-30 min,Óptimo,Óptimo,Insuficiente,Bachiller,Trabajadores Operativos,Tecnico/Tecnologo,Trabajador Independiente,No Trabaja,Mixto,0,1,Completa,3.0,3.0,Educación Técnica,"Poco Apoyo, Poco Habito",Empresarios/Independientes,0


### 2.2 ONE-HOT ENCODING

In [46]:
icfesx.sample(7)

Unnamed: 0,colombiano,region,etnia,edad,presento_fuera_edad,num_personas_casa,num_cuartos_casa,estrato_casa,tiempo_internet,internet,tv,computador,lavadora,microndas,carro,moto,consola,num_libros,tiempo_lectura,freq_leche_derivados,freq_carne_pescado_similares,freq_cereales_frutos_legumbres,nivel_edu_padre,actividad_padre,nivel_edu_madre,actividad_madre,horas_trabajo_semanal,colegio_genero,colegio_publico,colegio_area,colegio_jornada,nse_estudiante,nse_colegio,educacion_padres,perfil_lector,actividad_padres,estudiante_trabajador
219578,1,Andina,0,21,1,Hogar grande,4.0,3.0,Mucho,1,1,1,1,1,1,1,1,>100,0-30 min,Óptimo,Aceptable,Óptimo,Primaria Inc,No Aplica,Primaria,No Aplica,Tiempo Parcial Reducido,Mixto,0,1,Validación,3.0,3.0,Educación Primaria,"Buen Apoyo, Poco Habito",No Aplica,0
141303,1,Andina,0,19,0,Hogar grande,3.0,2.0,Mucho,1,1,1,1,1,0,0,1,11-25,0-30 min,Muy Insuficiente,Aceptable,Muy Insuficiente,Bachiller,Sin Actividad Remunerada,Bachiller Inc,Trabajadores Operativos,No Trabaja,Mixto,1,1,Unica,3.0,3.0,Al Menos Un Bachiller,"Poco Apoyo, Poco Habito",Trabajadores Operativos,0
287800,1,Caribe,0,16,0,Hogar tradicional,2.0,1.0,Moderado,0,1,0,1,0,0,0,0,0-10,30-60 min,Aceptable,Aceptable,Insuficiente,Bachiller,Trabajadores Operativos,Bachiller,Trabajadores Operativos,No Trabaja,Mixto,1,1,Tradicional,2.0,2.0,Bachillerato Completo,"Poco Apoyo, Buen Habito",Trabajadores Operativos,0
45471,1,Caribe,1,19,0,Más de 1 hogar,5.0,2.0,Moderado,0,1,0,1,0,0,1,0,26-100,0-30 min,Insuficiente,Óptimo,Aceptable,Primaria Inc,Microempresario,Ninguna,Sector Primario,Tiempo Parcial Reducido,Mixto,1,0,Tradicional,1.0,1.0,Educación Primaria Incompleta,"Buen Apoyo, Poco Habito",Empresarios/Independientes,1
313171,1,Andina,0,17,0,Hogar tradicional,2.0,3.0,Promedio,1,1,1,1,1,0,0,0,11-25,No lee,Óptimo,Óptimo,Óptimo,Bachiller,Trabajadores Operativos,Bachiller,Microempresario,Tiempo Parcial Reducido,Mixto,1,1,Unica,3.0,3.0,Bachillerato Completo,"Poco Apoyo, Poco Habito",Empresarios/Independientes,1
47977,1,Andina,0,24,1,Hogar tradicional,3.0,1.0,Moderado,1,1,0,1,0,1,1,0,0-10,30-60 min,Insuficiente,Aceptable,Muy Insuficiente,Bachiller Inc,Trabajadores Operativos,No aplica,Trabajadores Operativos,No Trabaja,Mixto,0,1,Validación,2.0,2.0,No Aplica,"Poco Apoyo, Buen Habito",Trabajadores Operativos,0
277408,1,Andina,0,19,0,Hogar tradicional,2.0,2.0,Promedio,1,0,0,1,0,0,1,1,0-10,No lee,Aceptable,Aceptable,Aceptable,Bachiller Inc,Sin Información,Primaria,Trabajadores Operativos,Trabajo Ocasional,Mixto,1,1,Validación,2.0,2.0,Educación Primaria,"Poco Apoyo, Poco Habito",Trabajadores Operativos,1


In [47]:
from sklearn.preprocessing import OneHotEncoder

def aplicar_onehot(df, columnas, sparseout=False, drop=None):
    """
    One-hot encode `columnas` in place: drop the original columns, append the
    dummy columns, and return the fitted encoder so the same mapping can be
    reused on other data (e.g. the test set). The DataFrame construction below
    assumes a dense result, i.e. sparseout=False.
    """
    if isinstance(columnas, str):
        columnas = [columnas]  # accept a single column name as well as a list

    for col in columnas:
        if col not in df.columns:
            raise ValueError(f"Column '{col}' does not exist in the DataFrame.")

    encoder = OneHotEncoder(sparse_output=sparseout, drop=drop)

    # Fit the encoder and transform
    encoded = encoder.fit_transform(df[columnas])

    # Build a DataFrame with the encoded columns, aligned on the original index
    feature_names = encoder.get_feature_names_out(columnas)
    encoded_df = pd.DataFrame(encoded, columns=feature_names, index=df.index)

    # Drop the original columns
    df.drop(columns=columnas, inplace=True)

    # Append the new columns to the original DataFrame (in place)
    for col in encoded_df.columns:
        df[col] = encoded_df[col]

    return encoder

In [48]:
# Keep the fitted encoder so the same categories can later be applied to X_test
encoder_ohe = aplicar_onehot(icfesx, ['region', 'tiempo_internet', 'num_personas_casa', 'num_libros', 'tiempo_lectura',
                                      'nivel_edu_padre', 'nivel_edu_madre', 'actividad_padre', 'actividad_madre',
                                      'horas_trabajo_semanal', 'colegio_genero', 'colegio_jornada', 'freq_leche_derivados',
                                      'freq_carne_pescado_similares', 'freq_cereales_frutos_legumbres'],
                             sparseout=False, drop=None)
icfesx.sample(5)

Unnamed: 0,colombiano,etnia,edad,presento_fuera_edad,num_cuartos_casa,estrato_casa,internet,tv,computador,lavadora,microndas,carro,moto,consola,colegio_publico,colegio_area,nse_estudiante,nse_colegio,educacion_padres,perfil_lector,actividad_padres,estudiante_trabajador,region_Amazónica,region_Andina,region_Caribe,region_Orinoquía,region_Pacífica,tiempo_internet_Moderado,tiempo_internet_Mucho,tiempo_internet_Ninguno,tiempo_internet_Poco,tiempo_internet_Promedio,num_personas_casa_Hogar grande,num_personas_casa_Hogar tradicional,num_personas_casa_Más de 1 hogar,num_libros_0-10,num_libros_11-25,num_libros_26-100,num_libros_>100,tiempo_lectura_0-30 min,tiempo_lectura_1-2h,tiempo_lectura_30-60 min,tiempo_lectura_>2h,tiempo_lectura_No lee,nivel_edu_padre_Bachiller,nivel_edu_padre_Bachiller Inc,nivel_edu_padre_Ninguna,nivel_edu_padre_No aplica,nivel_edu_padre_Postgrado,nivel_edu_padre_Primaria,nivel_edu_padre_Primaria Inc,nivel_edu_padre_Profesional,nivel_edu_padre_Profesional Inc,nivel_edu_padre_Tecnico/Tecnologo,nivel_edu_padre_Tecnico/Tecnologo Inc,nivel_edu_madre_Bachiller,nivel_edu_madre_Bachiller Inc,nivel_edu_madre_Ninguna,nivel_edu_madre_No aplica,nivel_edu_madre_Postgrado,nivel_edu_madre_Primaria,nivel_edu_madre_Primaria Inc,nivel_edu_madre_Profesional,nivel_edu_madre_Profesional Inc,nivel_edu_madre_Tecnico/Tecnologo,nivel_edu_madre_Tecnico/Tecnologo Inc,actividad_padre_Directivos,actividad_padre_Microempresario,actividad_padre_No Aplica,actividad_padre_Pensionado,actividad_padre_Profesionales,actividad_padre_Sector Primario,actividad_padre_Sin Actividad Remunerada,actividad_padre_Sin Información,actividad_padre_Trabajador Independiente,actividad_padre_Trabajadores Operativos,actividad_madre_Directivos,actividad_madre_Microempresario,actividad_madre_No Aplica,actividad_madre_Pensionado,actividad_madre_Profesionales,actividad_madre_Sector Primario,actividad_madre_Sin Actividad Remunerada,actividad_madre_Sin Información,actividad_madre_Trabajador Independiente,actividad_madre_Trabajadores Operativos,horas_trabajo_semanal_Medio Tiempo,horas_trabajo_semanal_No Trabaja,horas_trabajo_semanal_Tiempo Completo,horas_trabajo_semanal_Tiempo Parcial Reducido,horas_trabajo_semanal_Trabajo Ocasional,colegio_genero_Femenino,colegio_genero_Masculino,colegio_genero_Mixto,colegio_jornada_Completa,colegio_jornada_Tradicional,colegio_jornada_Unica,colegio_jornada_Validación,freq_leche_derivados_Aceptable,freq_leche_derivados_Insuficiente,freq_leche_derivados_Muy Insuficiente,freq_leche_derivados_Óptimo,freq_carne_pescado_similares_Aceptable,freq_carne_pescado_similares_Insuficiente,freq_carne_pescado_similares_Muy Insuficiente,freq_carne_pescado_similares_Óptimo,freq_cereales_frutos_legumbres_Aceptable,freq_cereales_frutos_legumbres_Insuficiente,freq_cereales_frutos_legumbres_Muy Insuficiente,freq_cereales_frutos_legumbres_Óptimo
1786,1,0,17,0,2.0,1.0,0,0,1,0,1,0,1,0,1,1,2.0,3.0,Educación Primaria Incompleta,"Poco Apoyo, Poco Habito",Empresarios/Independientes,1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0
2839,1,0,18,0,2.0,2.0,1,1,0,0,0,0,1,0,1,1,2.0,3.0,Bachillerato Completo,"Poco Apoyo, Poco Habito",Trabajadores Operativos,1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0
329436,1,0,17,0,4.0,2.0,0,0,0,1,0,0,0,0,1,1,1.0,2.0,Educación Primaria Incompleta,"Poco Apoyo, Buen Habito",No Aplica,0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
99086,1,0,23,1,3.0,3.0,1,0,1,1,0,0,0,0,1,1,2.0,3.0,Educación Primaria,"Poco Apoyo, Poco Habito",Sector Primario,0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
132788,1,0,17,0,3.0,2.0,0,1,0,1,0,0,1,1,1,0,2.0,1.0,Bachillerato Completo,"Poco Apoyo, Poco Habito",Trabajadores Operativos,1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0


In [49]:
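# Left out of this baseline set: three variables that are still unencoded text
# and the new estudiante_trabajador flag
variables_excluidas = ['educacion_padres', 'perfil_lector', 'actividad_padres', 'estudiante_trabajador']

icfesx_original = icfesx.drop(columns=variables_excluidas)   # all columns except the excluded ones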
icfesx_original.sample(5)

Unnamed: 0,colombiano,etnia,edad,presento_fuera_edad,num_cuartos_casa,estrato_casa,internet,tv,computador,lavadora,microndas,carro,moto,consola,colegio_publico,colegio_area,nse_estudiante,nse_colegio,region_Amazónica,region_Andina,region_Caribe,region_Orinoquía,region_Pacífica,tiempo_internet_Moderado,tiempo_internet_Mucho,tiempo_internet_Ninguno,tiempo_internet_Poco,tiempo_internet_Promedio,num_personas_casa_Hogar grande,num_personas_casa_Hogar tradicional,num_personas_casa_Más de 1 hogar,num_libros_0-10,num_libros_11-25,num_libros_26-100,num_libros_>100,tiempo_lectura_0-30 min,tiempo_lectura_1-2h,tiempo_lectura_30-60 min,tiempo_lectura_>2h,tiempo_lectura_No lee,nivel_edu_padre_Bachiller,nivel_edu_padre_Bachiller Inc,nivel_edu_padre_Ninguna,nivel_edu_padre_No aplica,nivel_edu_padre_Postgrado,nivel_edu_padre_Primaria,nivel_edu_padre_Primaria Inc,nivel_edu_padre_Profesional,nivel_edu_padre_Profesional Inc,nivel_edu_padre_Tecnico/Tecnologo,nivel_edu_padre_Tecnico/Tecnologo Inc,nivel_edu_madre_Bachiller,nivel_edu_madre_Bachiller Inc,nivel_edu_madre_Ninguna,nivel_edu_madre_No aplica,nivel_edu_madre_Postgrado,nivel_edu_madre_Primaria,nivel_edu_madre_Primaria Inc,nivel_edu_madre_Profesional,nivel_edu_madre_Profesional Inc,nivel_edu_madre_Tecnico/Tecnologo,nivel_edu_madre_Tecnico/Tecnologo Inc,actividad_padre_Directivos,actividad_padre_Microempresario,actividad_padre_No Aplica,actividad_padre_Pensionado,actividad_padre_Profesionales,actividad_padre_Sector Primario,actividad_padre_Sin Actividad Remunerada,actividad_padre_Sin Información,actividad_padre_Trabajador Independiente,actividad_padre_Trabajadores Operativos,actividad_madre_Directivos,actividad_madre_Microempresario,actividad_madre_No Aplica,actividad_madre_Pensionado,actividad_madre_Profesionales,actividad_madre_Sector Primario,actividad_madre_Sin Actividad Remunerada,actividad_madre_Sin Información,actividad_madre_Trabajador Independiente,actividad_madre_Trabajadores Operativos,horas_trabajo_semanal_Medio Tiempo,horas_trabajo_semanal_No Trabaja,horas_trabajo_semanal_Tiempo Completo,horas_trabajo_semanal_Tiempo Parcial Reducido,horas_trabajo_semanal_Trabajo Ocasional,colegio_genero_Femenino,colegio_genero_Masculino,colegio_genero_Mixto,colegio_jornada_Completa,colegio_jornada_Tradicional,colegio_jornada_Unica,colegio_jornada_Validación,freq_leche_derivados_Aceptable,freq_leche_derivados_Insuficiente,freq_leche_derivados_Muy Insuficiente,freq_leche_derivados_Óptimo,freq_carne_pescado_similares_Aceptable,freq_carne_pescado_similares_Insuficiente,freq_carne_pescado_similares_Muy Insuficiente,freq_carne_pescado_similares_Óptimo,freq_cereales_frutos_legumbres_Aceptable,freq_cereales_frutos_legumbres_Insuficiente,freq_cereales_frutos_legumbres_Muy Insuficiente,freq_cereales_frutos_legumbres_Óptimo
314244,1,0,18,0,2.0,1.0,0,0,0,0,0,0,0,0,1,1,1.0,2.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
60675,1,0,16,0,2.0,1.0,0,1,1,1,1,0,1,0,1,1,2.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
178436,1,0,16,0,2.0,1.0,0,0,0,1,0,1,0,0,1,1,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0
164200,1,0,16,0,3.0,1.0,1,1,1,1,1,1,0,0,1,1,3.0,3.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
206320,1,0,18,0,2.0,2.0,0,1,1,1,1,0,0,0,1,1,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


### 2.3 RENAMING VARIABLES

In [50]:
icfesx_original.columns.to_list()

['colombiano',
 'etnia',
 'edad',
 'presento_fuera_edad',
 'num_cuartos_casa',
 'estrato_casa',
 'internet',
 'tv',
 'computador',
 'lavadora',
 'microndas',
 'carro',
 'moto',
 'consola',
 'colegio_publico',
 'colegio_area',
 'nse_estudiante',
 'nse_colegio',
 'region_Amazónica',
 'region_Andina',
 'region_Caribe',
 'region_Orinoquía',
 'region_Pacífica',
 'tiempo_internet_Moderado',
 'tiempo_internet_Mucho',
 'tiempo_internet_Ninguno',
 'tiempo_internet_Poco',
 'tiempo_internet_Promedio',
 'num_personas_casa_Hogar grande',
 'num_personas_casa_Hogar tradicional',
 'num_personas_casa_Más de 1 hogar',
 'num_libros_0-10',
 'num_libros_11-25',
 'num_libros_26-100',
 'num_libros_>100',
 'tiempo_lectura_0-30 min',
 'tiempo_lectura_1-2h',
 'tiempo_lectura_30-60 min',
 'tiempo_lectura_>2h',
 'tiempo_lectura_No lee',
 'nivel_edu_padre_Bachiller',
 'nivel_edu_padre_Bachiller Inc',
 'nivel_edu_padre_Ninguna',
 'nivel_edu_padre_No aplica',
 'nivel_edu_padre_Postgrado',
 'nivel_edu_padre_Primaria'

In [51]:
rename_dict = {
    # Regions
    "region_Amazónica": "region_amazonica",
    "region_Andina": "region_andina",
    "region_Caribe": "region_caribe",
    "region_Orinoquía": "region_orinoquia",
    "region_Pacífica": "region_pacifica",

    # Time on the internet
    "tiempo_internet_Moderado": "internet_moderado",
    "tiempo_internet_Mucho": "internet_mucho",
    "tiempo_internet_Ninguno": "internet_ninguno",
    "tiempo_internet_Poco": "internet_poco",
    "tiempo_internet_Promedio": "internet_promedio",

    # People in the household
    "num_personas_casa_Hogar grande": "hogar_grande",
    "num_personas_casa_Hogar tradicional": "hogar_tradicional",
    "num_personas_casa_Más de 1 hogar": "hogar_multiple",

    # Books
    "num_libros_0-10": "libros_0_10",
    "num_libros_11-25": "libros_11_25",
    "num_libros_26-100": "libros_26_100",
    "num_libros_>100": "libros_mas_100",

    # Reading time
    "tiempo_lectura_0-30 min": "lectura_0_30min",
    "tiempo_lectura_30-60 min": "lectura_30_60min",
    "tiempo_lectura_1-2h": "lectura_1_2h",
    "tiempo_lectura_>2h": "lectura_mas_2h",
    "tiempo_lectura_No lee": "lectura_nula",

    # Father's education level
    "nivel_edu_padre_Bachiller": "edu_padre_bachiller",
    "nivel_edu_padre_Bachiller Inc": "edu_padre_bachiller_inc",
    "nivel_edu_padre_Ninguna": "edu_padre_ninguna",
    "nivel_edu_padre_No aplica": "edu_padre_no_aplica",
    "nivel_edu_padre_Postgrado": "edu_padre_postgrado",
    "nivel_edu_padre_Primaria": "edu_padre_primaria",
    "nivel_edu_padre_Primaria Inc": "edu_padre_primaria_inc",
    "nivel_edu_padre_Profesional": "edu_padre_profesional",
    "nivel_edu_padre_Profesional Inc": "edu_padre_profesional_inc",
    "nivel_edu_padre_Tecnico/Tecnologo": "edu_padre_tecnico",
    "nivel_edu_padre_Tecnico/Tecnologo Inc": "edu_padre_tecnico_inc",

    # Mother's education level
    "nivel_edu_madre_Bachiller": "edu_madre_bachiller",
    "nivel_edu_madre_Bachiller Inc": "edu_madre_bachiller_inc",
    "nivel_edu_madre_Ninguna": "edu_madre_ninguna",
    "nivel_edu_madre_No aplica": "edu_madre_no_aplica",
    "nivel_edu_madre_Postgrado": "edu_madre_postgrado",
    "nivel_edu_madre_Primaria": "edu_madre_primaria",
    "nivel_edu_madre_Primaria Inc": "edu_madre_primaria_inc",
    "nivel_edu_madre_Profesional": "edu_madre_profesional",
    "nivel_edu_madre_Profesional Inc": "edu_madre_profesional_inc",
    "nivel_edu_madre_Tecnico/Tecnologo": "edu_madre_tecnico",
    "nivel_edu_madre_Tecnico/Tecnologo Inc": "edu_madre_tecnico_inc",

    # Father's occupation
    "actividad_padre_Directivos": "act_padre_directivos",
    "actividad_padre_Microempresario": "act_padre_microempresario",
    "actividad_padre_No Aplica": "act_padre_no_aplica",
    "actividad_padre_Pensionado": "act_padre_pensionado",
    "actividad_padre_Profesionales": "act_padre_profesionales",
    "actividad_padre_Sector Primario": "act_padre_sector_primario",
    "actividad_padre_Sin Actividad Remunerada": "act_padre_sin_trabajo",
    "actividad_padre_Sin Información": "act_padre_sin_info",
    "actividad_padre_Trabajador Independiente": "act_padre_independiente",
    "actividad_padre_Trabajadores Operativos": "act_padre_operativos",

    # Mother's occupation
    "actividad_madre_Directivos": "act_madre_directivos",
    "actividad_madre_Microempresario": "act_madre_microempresario",
    "actividad_madre_No Aplica": "act_madre_no_aplica",
    "actividad_madre_Pensionado": "act_madre_pensionado",
    "actividad_madre_Profesionales": "act_madre_profesionales",
    "actividad_madre_Sector Primario": "act_madre_sector_primario",
    "actividad_madre_Sin Actividad Remunerada": "act_madre_sin_trabajo",
    "actividad_madre_Sin Información": "act_madre_sin_info",
    "actividad_madre_Trabajador Independiente": "act_madre_independiente",
    "actividad_madre_Trabajadores Operativos": "act_madre_operativos",

    # Weekly working hours
    "horas_trabajo_semanal_Medio Tiempo": "trabajo_medio_tiempo",
    "horas_trabajo_semanal_No Trabaja": "trabajo_no",
    "horas_trabajo_semanal_Tiempo Completo": "trabajo_tiempo_completo",
    "horas_trabajo_semanal_Tiempo Parcial Reducido": "trabajo_parcial_reducido",
    "horas_trabajo_semanal_Trabajo Ocasional": "trabajo_ocacional",

    # School gender type and school schedule (jornada)
    "colegio_genero_Femenino": "colegio_femenino",
    "colegio_genero_Masculino": "colegio_masculino",
    "colegio_genero_Mixto": "colegio_mixto",
    "colegio_jornada_Completa": "jornada_completa",
    "colegio_jornada_Tradicional": "jornada_tradicional",
    "colegio_jornada_Unica": "jornada_unica",
    "colegio_jornada_Validación": "jornada_validacion",

    # Food consumption frequency
    "freq_leche_derivados_Aceptable": "leche_aceptable",
    "freq_leche_derivados_Insuficiente": "leche_insuficiente",
    "freq_leche_derivados_Muy Insuficiente": "leche_muy_insuficiente",
    "freq_leche_derivados_Óptimo": "leche_optimo",

    "freq_carne_pescado_similares_Aceptable": "carne_aceptable",
    "freq_carne_pescado_similares_Insuficiente": "carne_insuficiente",
    "freq_carne_pescado_similares_Muy Insuficiente": "carne_muy_insuficiente",
    "freq_carne_pescado_similares_Óptimo": "carne_optimo",

    "freq_cereales_frutos_legumbres_Aceptable": "cereales_aceptable",
    "freq_cereales_frutos_legumbres_Insuficiente": "cereales_insuficiente",
    "freq_cereales_frutos_legumbres_Muy Insuficiente": "cereales_muy_insuficiente",
    "freq_cereales_frutos_legumbres_Óptimo": "cereales_optimo"
}

icfesx_original.rename(columns=rename_dict, inplace=True)
icfesx_original.sample(5)

Unnamed: 0,colombiano,etnia,edad,presento_fuera_edad,num_cuartos_casa,estrato_casa,internet,tv,computador,lavadora,microndas,carro,moto,consola,colegio_publico,colegio_area,nse_estudiante,nse_colegio,region_amazonica,region_andina,region_caribe,region_orinoquia,region_pacifica,internet_moderado,internet_mucho,internet_ninguno,internet_poco,internet_promedio,hogar_grande,hogar_tradicional,hogar_multiple,libros_0_10,libros_11_25,libros_26_100,libros_mas_100,lectura_0_30min,lectura_1_2h,lectura_30_60min,lectura_mas_2h,lectura_nula,edu_padre_bachiller,edu_padre_bachiller_inc,edu_padre_ninguna,edu_padre_no_aplica,edu_padre_postgrado,edu_padre_primaria,edu_padre_primaria_inc,edu_padre_profesional,edu_padre_profesional_inc,edu_padre_tecnico,edu_padre_tecnico_inc,edu_madre_bachiller,edu_madre_bachiller_inc,edu_madre_ninguna,edu_madre_no_aplica,edu_madre_postgrado,edu_madre_primaria,edu_madre_primaria_inc,edu_madre_profesional,edu_madre_profesional_inc,edu_madre_tecnico,edu_madre_tecnico_inc,act_padre_directivos,act_padre_microempresario,act_padre_no_aplica,act_padre_pensionado,act_padre_profesionales,act_padre_sector_primario,act_padre_sin_trabajo,act_padre_sin_info,act_padre_independiente,act_padre_operativos,act_madre_directivos,act_madre_microempresario,act_madre_no_aplica,act_madre_pensionado,act_madre_profesionales,act_madre_sector_primario,act_madre_sin_trabajo,act_madre_sin_info,act_madre_independiente,act_madre_operativos,trabajo_medio_tiempo,trabajo_no,trabajo_tiempo_completo,trabajo_parcial_reducido,trabajo_ocacional,colegio_femenino,colegio_masculino,colegio_mixto,jornada_completa,jornada_tradicional,jornada_unica,jornada_validacion,leche_aceptable,leche_insuficiente,leche_muy_insuficiente,leche_optimo,carne_aceptable,carne_insuficiente,carne_muy_insuficiente,carne_optimo,cereales_aceptable,cereales_insuficiente,cereales_muy_insuficiente,cereales_optimo
231169,1,1,18,0,2.0,2.0,1,1,0,1,1,1,0,0,1,1,3.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
293475,1,1,16,0,3.0,1.0,0,0,1,1,0,0,0,0,1,1,2.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
13708,1,0,19,0,4.0,3.0,1,1,1,1,1,0,1,1,1,1,3.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
79384,1,0,17,0,3.0,1.0,1,1,1,0,0,0,1,0,1,1,2.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0
302185,1,0,18,0,1.0,1.0,0,0,0,0,0,0,0,0,1,1,1.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
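`DataFrame.rename` silently ignores keys that do not match any existing column, so a typo in any key of `rename_dict` (an accent in `Óptimo`, a missing space in `Primaria Inc`) would leave the original name in place without warning. Below is a minimal sketch of two sanity checks. It assumes the `prefix_Category` columns were produced by one-hot encoding (most likely `pd.get_dummies`, which is not shown in this section), so within each group at most one column should be 1 per row; all-zero rows are tolerated because they can come from a missing category. `rename_strict` and `check_one_hot_groups` are hypothetical helpers, not part of the notebook, and the toy frame is illustrative only.

```python
import pandas as pd


def rename_strict(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Rename columns, failing loudly instead of silently skipping keys."""
    # Two source columns mapped to the same new name would collide
    targets = list(mapping.values())
    dupes = sorted({t for t in targets if targets.count(t) > 1})
    if dupes:
        raise ValueError(f"Duplicate target names: {dupes}")
    # errors="raise" turns any key that is not an existing column into a KeyError
    renamed = df.rename(columns=mapping, errors="raise")
    # A new name could still clash with a column that was not renamed
    if renamed.columns.duplicated().any():
        raise ValueError("Renaming produced duplicate column names")
    return renamed


def check_one_hot_groups(df: pd.DataFrame, groups: dict) -> dict:
    """Count, per group, the rows with more than one active dummy.

    Assumes each group holds mutually exclusive 0/1 columns from one-hot
    encoding; all-zero rows (e.g. a missing category) are not flagged.
    """
    return {
        name: int((df[cols].sum(axis=1) > 1).sum())
        for name, cols in groups.items()
    }


# Illustrative toy frame (not the ICFES data)
raw = pd.DataFrame({
    "region_Andina": [1.0, 0.0, 0.0],
    "region_Caribe": [0.0, 1.0, 0.0],
    "estrato": [2, 3, 1],
})
df = rename_strict(raw, {"region_Andina": "region_andina",
                         "region_Caribe": "region_caribe"})
groups = {"region": [c for c in df.columns if c.startswith("region_")]}
print(check_one_hot_groups(df, groups))  # {'region': 0}
```

In this notebook the strict version would replace the `inplace=True` call as `icfesx_original = rename_strict(icfesx_original, rename_dict)`, and groups could be built from prefixes such as `region_`, `libros_` or `act_padre_`. `colegio_` is too broad as a prefix because it also matches `colegio_publico` and `colegio_area`, so the school-gender dummies (`colegio_femenino`, `colegio_masculino`, `colegio_mixto`) are better listed explicitly.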


## 3. EXPORT FILE FOR TRAINING

In [52]:
icfesx_original.to_parquet("icfesx_original.parquet", index=False, engine="pyarrow")