# Assignment 06 - Processing the MPG data

## 1. Importing libraries and scripts

To make this notebook easier to read, the following processes have been split into three separate files:

1. Processing the data: [`procesamiento.py`](./src/procesamiento.py)
2. Saving the processed data to a zip file: [`save_to_zip.py`](./src/save_to_zip.py)
3. Loading the zip file for later use: [`load_from_zip.py`](./src/load_from_zip.py)


In [2]:
import pandas as pd
from pathlib import Path
from src.procesamiento import process_mpg_data, process_new_data_with_artifacts
from src.save_to_zip import save_processed_data_to_zip
from src.load_from_zip import load_processed_data_from_zip

## 2. Transforming the data for training

In [3]:
datapath = Path().resolve().parent / 'datasets'
mpg_data = datapath / 'mpg.csv'

# Process the data
processed_data = process_mpg_data(mpg_data)


X_train_final: (251, 9)
X_val_final  : (63, 9)
X_test_final : (79, 9)
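
The three shapes above add up to 393 rows, and they are consistent with two successive 20% hold-out splits (first test, then validation). The splitting code lives in `procesamiento.py` and is not shown here, so the fractions and the helper `split_sizes` below are assumptions used only to check the arithmetic. The rounding follows scikit-learn's `train_test_split`, which takes the ceiling of `fraction * n` for the held-out part.

```python
import math


def split_sizes(n_rows, test_frac=0.2, val_frac=0.2):
    """Return (train, val, test) sizes for two successive hold-out splits.

    Hypothetical helper: the fractions are assumed, not read from the source.
    """
    # Hold out the test set first, rounding up.
    n_test = math.ceil(test_frac * n_rows)
    remaining = n_rows - n_test
    # Then hold out the validation set from what is left, rounding up again.
    n_val = math.ceil(val_frac * remaining)
    n_train = remaining - n_val
    return n_train, n_val, n_test


print(split_sizes(393))  # (251, 63, 79)
```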


In [4]:
# Inspect the contents of processed_data
processed_data['data']['X_test_df']

| | displacement | horsepower | weight | acceleration | model_year | cylinders | origin___europe | origin___japan | origin___usa |
|---|---|---|---|---|---|---|---|---|---|
| 238 | -0.940239 | -0.572396 | -1.073163 | 0.152473 | 0.330141 | -0.894578 | 0.0 | 0.0 | 1.0 |
| 270 | -0.600989 | -0.258817 | -0.561115 | -0.235524 | 0.601830 | -0.894578 | 0.0 | 1.0 | 0.0 |
| 213 | 1.434508 | 1.047762 | 1.231053 | -1.223153 | 0.058451 | 1.462788 | 0.0 | 0.0 | 1.0 |
| 358 | -0.732920 | -0.807581 | -0.421466 | 0.999011 | 1.416899 | -0.894578 | 0.0 | 1.0 | 0.0 |
| 57 | -0.798885 | -0.258817 | -0.836923 | 0.011383 | -1.028307 | -0.894578 | 0.0 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 346 | -0.949662 | -0.990502 | -1.084801 | 0.822649 | 1.416899 | -0.894578 | 0.0 | 1.0 | 0.0 |
| 132 | -0.544448 | -0.781449 | -0.529694 | 0.540469 | -0.484928 | -0.894578 | 0.0 | 0.0 | 1.0 |
| 353 | -0.874274 | -0.807581 | -0.939333 | -0.447159 | 1.416899 | -0.894578 | 1.0 | 0.0 | 0.0 |
| 177 | -0.780038 | -0.258817 | -0.352805 | -0.164980 | -0.213238 | -0.894578 | 1.0 | 0.0 | 0.0 |
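
The numeric columns above look like standard scores: they are centered near zero, and the same value (for example -0.894578 in `cylinders`) repeats across rows, as it would if equal raw values were mapped through one fixed mean and standard deviation. The sketch below shows that idea in plain Python. It assumes the statistics are fitted on the training split only and then reused on other splits. The actual pipeline is in `procesamiento.py`, which is not shown, and the helper names and sample values here are made up for illustration.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Compute the mean and population standard deviation of the training values."""
    return fmean(train_values), pstdev(train_values)


def standardize(values, mean, std):
    """Apply already fitted statistics; nothing is re-estimated here."""
    return [(v - mean) / std for v in values]


# Made-up raw values, not taken from mpg.csv.
train_cylinders = [4, 4, 4, 6, 8, 8]
test_cylinders = [4, 8]

mean, std = fit_standardizer(train_cylinders)
scaled_train = standardize(train_cylinders, mean, std)
scaled_test = standardize(test_cylinders, mean, std)

# Equal raw values always receive the same scaled value.
print(scaled_train[0] == scaled_test[0])
```

Reusing the training statistics on validation and test data keeps information from those splits out of the fitted transformation.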


## 3. Saving the data for future use

In [5]:
# Save to a zip file
zip_path = datapath / 'procesados' / 'mpg_procesado.zip'


save_processed_data_to_zip(zip_path, processed_data)

Datos procesados guardados


### 3.1 Loading the processed data from the zip

In [7]:
train_loaded, val_loaded, test_loaded, num_pipe_loaded, preprocessor_cat_loaded, feature_names_loaded, metadata_loaded = load_processed_data_from_zip(zip_path)

print("Train shape:", train_loaded.shape)

Datos extraidos correctamente
Train shape: (251, 10)
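
The save and load steps above form a round trip: tables, fitted artifacts, and metadata go into one archive and come back out unchanged. The sketch below shows one way such a round trip could be built with the standard library only. The functions `save_bundle` and `load_bundle`, the member names inside the archive, and the sample contents are all hypothetical; they are not the implementation in `save_to_zip.py` or `load_from_zip.py`, which is not shown.

```python
import json
import pickle
import tempfile
import zipfile
from pathlib import Path


def save_bundle(zip_path, tables, artifacts, metadata):
    """Write CSV text tables, pickled artifacts, and JSON metadata into one zip."""
    zip_path = Path(zip_path)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, csv_text in tables.items():
            zf.writestr(f"{name}.csv", csv_text)
        zf.writestr("artifacts.pkl", pickle.dumps(artifacts))
        zf.writestr("metadata.json", json.dumps(metadata))


def load_bundle(zip_path):
    """Read back what save_bundle wrote. Only unpickle archives you trust."""
    with zipfile.ZipFile(zip_path) as zf:
        tables = {
            Path(member).stem: zf.read(member).decode("utf-8")
            for member in zf.namelist()
            if member.endswith(".csv")
        }
        artifacts = pickle.loads(zf.read("artifacts.pkl"))
        metadata = json.loads(zf.read("metadata.json"))
    return tables, artifacts, metadata


# Round trip in a temporary directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "procesados" / "bundle.zip"
    save_bundle(path, {"train": "mpg,weight\n27,2790\n"}, {"mean": 5.0}, {"target": "mpg"})
    tables, artifacts, metadata = load_bundle(path)

print(sorted(tables), artifacts, metadata)
```

Keeping the fitted artifacts next to the data means new rows can later be transformed with exactly the statistics used for training.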


## 4. Processing new data

### 4.1 Processing the new data using the already processed data

#### 1. Loading the processed data

In [9]:
train_final, val_final, test_final, num_pipe, preprocessor_cat, feature_names, metadata = load_processed_data_from_zip(zip_path)

print("Train shape:", train_final.shape)

Datos extraidos correctamente
Train shape: (251, 10)


In [10]:
# Display the contents of metadata
metadata

{'cols_num': ['displacement',
  'horsepower',
  'weight',
  'acceleration',
  'model_year',
  'cylinders'],
 'cols_cat': ['origin'],
 'cols_onehot': ['origin'],
 'cols_ordinal': [],
 'cat_out_cols': ['origin___europe', 'origin___japan', 'origin___usa'],
 'feature_names': ['displacement',
  'horsepower',
  'weight',
  'acceleration',
  'model_year',
  'cylinders',
  'origin___europe',
  'origin___japan',
  'origin___usa'],
 'target': 'mpg'}
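
Since `metadata` records which columns the preprocessing expects, it can be used to check a new file before transforming it. The sketch below is a minimal check along those lines. The helper `missing_columns` is hypothetical and is not part of `procesamiento.py`. The dictionary is a trimmed copy of the output above, and the column list matches the header of `mpg_nuevos.csv` shown in the next step.

```python
def missing_columns(columns, metadata):
    """Return the required input columns that are absent from `columns`."""
    required = metadata["cols_num"] + metadata["cols_cat"]
    return [col for col in required if col not in columns]


# Trimmed copy of the metadata printed above.
metadata = {
    "cols_num": ["displacement", "horsepower", "weight",
                 "acceleration", "model_year", "cylinders"],
    "cols_cat": ["origin"],
    "target": "mpg",
}

# Column names of the new data file.
new_columns = ["mpg", "cylinders", "displacement", "horsepower",
               "weight", "acceleration", "model_year", "origin"]

print(missing_columns(new_columns, metadata))  # []
```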

#### 2. Loading and processing the new data

In [11]:
mpg_nuevos = pd.read_csv(datapath / 'mpg_nuevos.csv')
mpg_nuevos

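| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin |
|---|---|---|---|---|---|---|---|---|
| 0 | 27 | 4 | 140 | 86 | 2790 | 15.6 | 82 | usa |
| 1 | 44 | 4 | 97 | 52 | 2130 | 24.6 | 82 | europe |
| 2 | 32 | 4 | 135 | 84 | 2295 | 11.6 | 82 | usa |
| 3 | 28 | 4 | 120 | 79 | 2625 | 18.6 | 82 | usa |
| 4 | 31 | 4 | 119 | 82 | 2720 | 19.4 | 82 | usa |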


In [12]:
# Process the new data using the loaded artifacts
new_processed = process_new_data_with_artifacts(mpg_nuevos, num_pipe, preprocessor_cat, metadata, feature_names)

# Show the result
new_processed.head()

# Save the processed new data if needed
new_processed.to_csv(datapath / 'procesados' / 'mpg_nuevos_procesados.csv', index=False)

print("Nuevos datos procesados y guardados.")

Nuevos datos procesados: (5, 9)
Nuevos datos procesados y guardados.
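
The new file contains only `usa` and `europe` rows, yet the processed result has 9 columns, the same as the training features. One way to get that behavior is to encode against the category list learned during training instead of the categories present in the new batch. The sketch below illustrates this in plain Python. The function `one_hot` is hypothetical and is not the code in `procesamiento.py`; the category order and the `___` separator are taken from `cat_out_cols` in the metadata.

```python
def one_hot(values, column, categories, sep="___"):
    """Encode `values` against a fixed category list.

    Categories missing from `values` still get a column (all zeros), and a
    value outside `categories` becomes an all-zero row.
    """
    names = [f"{column}{sep}{cat}" for cat in categories]
    rows = [[1.0 if value == cat else 0.0 for cat in categories] for value in values]
    return names, rows


# Categories as listed in metadata['cat_out_cols'].
categories = ["europe", "japan", "usa"]
# The origin column of mpg_nuevos.csv.
new_origin = ["usa", "europe", "usa", "usa", "usa"]

names, rows = one_hot(new_origin, "origin", categories)
print(names)
print(rows[:2])
```

Because the column list is fixed in advance, the output width does not depend on which categories happen to appear in a given batch.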
