# Assignment 06 - Processing the MPG data

## 1. Importing libraries and scripts

To make this notebook easier to read, the work is split across three separate scripts:

1. Data processing: [`data_processor.py`](https://github.com/RicardxJMG/deep-learning-diplomado/blob/main/Tareas/src/data_processor.py)
2. Saving the processed data to a zip file: [`save_to_zip.py`](https://github.com/RicardxJMG/deep-learning-diplomado/blob/main/Tareas/src/save_to_zip.py)
3. Loading the zip file for later use: [`load_from_zip.py`](https://github.com/RicardxJMG/deep-learning-diplomado/blob/main/Tareas/src/load_from_zip.py)


In [9]:
import pandas as pd
from pathlib import Path
from src.data_processor import DataProcessor, process_new_data_with_artifacts
from src.save_to_zip import save_processed_data_to_zip
from src.load_from_zip import load_processed_data_from_zip

## 2. Transforming the data for training

In [10]:
datapath = Path().resolve().parent / 'datasets' / 'mpg' 

mpg_path = datapath / 'mpg.csv'

config_mpg = {
    "cols_num": ['displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'cylinders'],
    "cols_cat": ['origin'],
    "cols_onehot": ['origin'],
    "target": "mpg"
}

procesar_mpg = DataProcessor(**config_mpg)
resultados_mpg = procesar_mpg.process(mpg_path)



Resumen del preprocesamiento:
X_train_final: (251, 9)
X_val_final  : (63, 9)
X_test_final : (79, 9)
Total de características: 9
  - Numéricas: 6
  - Categóricas procesadas: 3
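
The shapes in the summary are consistent with two successive 80/20 splits of 393 rows (first train+validation versus test, then train versus validation), with the held-out part rounded up. That ratio is an inference from the numbers above, not something this notebook confirms about `data_processor.py`; the helper name below is hypothetical. The arithmetic can be sketched as:

```python
import math


def holdout_split_sizes(n_rows: int, holdout_fraction: float) -> tuple[int, int]:
    """Return (kept, held_out) sizes, rounding the held-out part up."""
    held_out = math.ceil(n_rows * holdout_fraction)
    return n_rows - held_out, held_out


# Assumption: 393 usable rows and a 0.2 hold-out fraction at each step.
n_total = 251 + 63 + 79
train_val, test = holdout_split_sizes(n_total, 0.2)
train, val = holdout_split_sizes(train_val, 0.2)
print(train, val, test)

# Feature count: 6 numeric columns plus one one-hot column per `origin` level.
n_features = 6 + 3
print(n_features)
```
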


In [11]:
# Inspect a random sample of the processed test features
resultados_mpg['data']['X_test_df'].sample(7)

| | displacement | horsepower | weight | acceleration | model_year | cylinders | origin___europe | origin___japan | origin___usa |
|---|---|---|---|---|---|---|---|---|---|
| 213 | 1.434508 | 1.047762 | 1.231053 | -1.223153 | 0.058451 | 1.462788 | 0.0 | 0.0 | 1.0 |
| 111 | -1.2041 | -0.389475 | -1.01614 | -0.694066 | -0.756618 | -1.48392 | 0.0 | 1.0 | 0.0 |
| 139 | 0.982176 | 0.917104 | 1.909516 | 0.187745 | -0.484928 | 1.462788 | 0.0 | 0.0 | 1.0 |
| 80 | -0.714072 | -0.494002 | -0.700765 | 0.187745 | -1.028307 | -0.894578 | 0.0 | 0.0 | 1.0 |
| 163 | 0.256558 | -0.258817 | 0.916841 | 1.245918 | -0.213238 | 0.284105 | 0.0 | 0.0 | 1.0 |
| 53 | -1.194676 | -1.042765 | -1.424615 | 1.245918 | -1.299997 | -0.894578 | 0.0 | 1.0 | 0.0 |
| 346 | -0.949662 | -0.990502 | -1.084801 | 0.822649 | 1.416899 | -0.894578 | 0.0 | 1.0 | 0.0 |


## 3. Saving the data for future use

In [12]:
# Save to a zip file
zip_path = datapath / 'procesados' / 'mpg_procesado.zip'

save_processed_data_to_zip(zip_path, resultados_mpg)

Datos procesados guardados
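
How `save_processed_data_to_zip` lays out the archive is not shown in this notebook. As a rough sketch of the general idea, assuming one CSV member for a tabular split and one JSON member for the metadata (the member names `train.csv` and `metadata.json` and both helper functions are hypothetical), a bundle can be written and read back with the standard library alone:

```python
import csv
import io
import json
import zipfile


def write_bundle(header, rows, metadata) -> bytes:
    """Pack a table and its metadata into an in-memory zip (hypothetical layout)."""
    table = io.StringIO()
    writer = csv.writer(table)
    writer.writerow(header)
    writer.writerows(rows)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("train.csv", table.getvalue())
        zf.writestr("metadata.json", json.dumps(metadata))
    return buffer.getvalue()


def read_bundle(payload: bytes):
    """Unpack the header, the rows (as strings), and the metadata dict."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        text = zf.read("train.csv").decode("utf-8")
        metadata = json.loads(zf.read("metadata.json"))
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:], metadata


# Illustrative values only; a real bundle would hold every split and artifact.
header = ["displacement", "origin___usa"]
rows = [[1.434508, 1.0], [-1.2041, 0.0]]
payload = write_bundle(header, rows, {"target": "mpg"})
loaded_header, loaded_rows, loaded_meta = read_bundle(payload)
print(loaded_header, loaded_meta)
```

CSV stores values as text, so a loader built this way has to convert numeric columns back to floats; fitted transformers would need a separate serialization step that this sketch leaves out.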


## 4. Processing new data

### 4.1 Processing new data with the artifacts from the processed data

#### 1. Loading the processed data

This step only verifies that the `load_processed_data_from_zip` function works correctly.

In [13]:
train_loaded, val_loaded, test_loaded, artifacts, metadata = load_processed_data_from_zip(zip_path)

print("Train shape:", train_loaded.shape)

Datos extraidos correctamente
Train shape: (251, 10)


In [14]:
# Print the contents of metadata
metadata

{'feature_names': ['displacement',
  'horsepower',
  'weight',
  'acceleration',
  'model_year',
  'cylinders',
  'origin___europe',
  'origin___japan',
  'origin___usa'],
 'target': 'mpg',
 'cols_num': ['displacement',
  'horsepower',
  'weight',
  'acceleration',
  'model_year',
  'cylinders'],
 'cols_cat': ['origin'],
 'cols_onehot': ['origin'],
 'cols_ordinal': [],
 'cat_out_cols': ['origin___europe', 'origin___japan', 'origin___usa']}
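
The `cols_num` and `cols_cat` entries of `metadata` list the raw columns the pipeline expects as input. A small check along these lines could catch a malformed file before it is processed; `missing_input_columns` is a hypothetical helper, not part of the `src` package:

```python
def missing_input_columns(metadata: dict, available: list[str]) -> list[str]:
    """Return the raw input columns required by `metadata` that are absent."""
    required = metadata["cols_num"] + metadata["cols_cat"]
    present = set(available)
    return [col for col in required if col not in present]


# Subset of the metadata printed above.
metadata = {
    "cols_num": ["displacement", "horsepower", "weight",
                 "acceleration", "model_year", "cylinders"],
    "cols_cat": ["origin"],
}

# Column names of a complete raw file, then of an incomplete one.
complete = ["mpg", "cylinders", "displacement", "horsepower",
            "weight", "acceleration", "model_year", "origin"]
incomplete = ["mpg", "cylinders"]

print(missing_input_columns(metadata, complete))
print(missing_input_columns(metadata, incomplete))
```
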

#### 2. Loading and processing the new data

In [15]:
mpg_nuevos = pd.read_csv(datapath / 'mpg_nuevos.csv')
mpg_nuevos

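| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin |
|---|---|---|---|---|---|---|---|---|
| 0 | 27 | 4 | 140 | 86 | 2790 | 15.6 | 82 | usa |
| 1 | 44 | 4 | 97 | 52 | 2130 | 24.6 | 82 | europe |
| 2 | 32 | 4 | 135 | 84 | 2295 | 11.6 | 82 | usa |
| 3 | 28 | 4 | 120 | 79 | 2625 | 18.6 | 82 | usa |
| 4 | 31 | 4 | 119 | 82 | 2720 | 19.4 | 82 | usa |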


In [16]:
# Process the new data using the loaded artifacts
new_processed = process_new_data_with_artifacts(mpg_nuevos, artifacts)

# Preview the result
new_processed.head()

# Save the newly processed data if needed
new_processed.to_csv(datapath / 'procesados' / 'mpg_nuevos_procesados.csv', index=False)

print("Nuevos datos procesados y guardados.")

Nuevos datos procesados: (5, 9)
Nuevos datos procesados y guardados.
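
The point of reusing `artifacts` is that new rows are transformed with the statistics learned from the training split and are never re-fitted on the new rows. A minimal sketch of that idea for a single numeric column, assuming z-score standardization (the internals of `process_new_data_with_artifacts` are not shown in this notebook, and the training values below are made up):

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class ScalerArtifact:
    """Statistics learned once, on the training split only."""
    mean: float
    std: float


def fit_scaler(train_values: list[float]) -> ScalerArtifact:
    return ScalerArtifact(mean=fmean(train_values), std=pstdev(train_values))


def apply_scaler(artifact: ScalerArtifact, values: list[float]) -> list[float]:
    """Scale values with the stored training statistics, without re-fitting."""
    return [(v - artifact.mean) / artifact.std for v in values]


# Illustrative training column (not the real MPG data).
train_years = [70.0, 73.0, 76.0, 79.0, 82.0]
artifact = fit_scaler(train_years)

# New rows that all share one value, like `model_year` in mpg_nuevos.csv.
new_years = [82.0, 82.0, 82.0]
scaled = apply_scaler(artifact, new_years)
print(scaled)
```

Every row of `mpg_nuevos.csv` has `model_year` 82, so re-fitting on those rows alone would give a standard deviation of zero and an undefined scaling; using the training statistics instead maps them all to the same finite value.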


In [17]:
new_processed

| | displacement | horsepower | weight | acceleration | model_year | cylinders | origin___europe | origin___japan | origin___usa |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.544448 | -0.494002 | -0.241085 | 0.046655 | 1.688589 | -0.894578 | 0.0 | 0.0 | 1.0 |
| 1 | -0.949662 | -1.382476 | -1.009157 | 3.221176 | 1.688589 | -0.894578 | 1.0 | 0.0 | 0.0 |
| 2 | -0.591566 | -0.546265 | -0.817139 | -1.364243 | 1.688589 | -0.894578 | 0.0 | 0.0 | 1.0 |
| 3 | -0.73292 | -0.676923 | -0.433103 | 1.104829 | 1.688589 | -0.894578 | 0.0 | 0.0 | 1.0 |
| 4 | -0.742343 | -0.598528 | -0.322548 | 1.387008 | 1.688589 | -0.894578 | 0.0 | 0.0 | 1.0 |
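
The new file contains only `usa` and `europe` rows, yet the output still has an `origin___japan` column. That is what one would expect when the category list is fixed at training time and reused for new data. A sketch of that behavior, assuming the `origin___<level>` naming seen in `cat_out_cols` (the helper `one_hot_fixed` is hypothetical):

```python
def one_hot_fixed(values: list[str], categories: list[str], prefix: str):
    """One-hot encode `values` against a fixed, training-time category list.

    A value outside `categories` becomes an all-zero row in this sketch.
    """
    columns = [f"{prefix}___{cat}" for cat in categories]
    rows = [[1.0 if value == cat else 0.0 for cat in categories] for value in values]
    return columns, rows


# Category order taken from the training metadata, not from the new file.
categories = ["europe", "japan", "usa"]
new_origin = ["usa", "europe", "usa", "usa", "usa"]

columns, encoded = one_hot_fixed(new_origin, categories, "origin")
print(columns)
for row in encoded:
    print(row)
```

Mapping unseen levels to all zeros is one possible design choice (scikit-learn's `OneHotEncoder` offers it through `handle_unknown="ignore"`); raising an error is the stricter alternative.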
