# TFG Leplag Fumigaciones - Multiple Regression Demo

**Author**: Ricardo Landa  
**Date**: 2025

This notebook reproduces the complete multiple linear regression pipeline used to predict `monto_mensual_ars`.

---

## 📦 Installing dependencies

If you are running in Colab, uncomment the following cell to install the dependencies:

In [None]:
# Uncomment if you are running in Colab
# !pip install pandas numpy scikit-learn matplotlib

---

## 📂 Downloading the dataset

If you are running in Colab, download the CSV from the repo or upload it manually:

In [None]:
# Option A: Download from GitHub
!wget https://raw.githubusercontent.com/Pipoxsj/tfg-leplag-regresion-demo/main/data/dataset_complementario_regresion_anonimizado-2.csv -P data/ -q

# Option B: Upload manually (uncomment if you prefer)
# from google.colab import files
# import os
# os.makedirs('data', exist_ok=True)  # create data/ only if it is missing
# uploaded = files.upload()           # always prompt, even if data/ already exists
# for fn in uploaded.keys():
#     os.rename(fn, f"data/{fn}")
#     print(f"✓ File '{fn}' uploaded successfully")

---

## 1️⃣ Import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print("✓ Libraries imported successfully")

---

## 2️⃣ Load the dataset

In [None]:
data_path = 'data/dataset_complementario_regresion_anonimizado-2.csv'
df = pd.read_csv(data_path)

print(f"✓ Dataset loaded: {df.shape[0]} rows, {df.shape[1]} columns\n")
print("First 5 rows:")
df.head()

---

## 3️⃣ Initial exploration

In [None]:
print("=== Dataset information ===")
df.info()

print("\n=== Descriptive statistics ===")
df.describe()

---

## 4️⃣ Prepare features and target

In [None]:
target = 'monto_mensual_ars'
numeric_features = ['mes', 'superficie_m2', 'distancia_km', 'tecnico_id', 'servicios_mes']
categorical_features = ['zona', 'tipo_cliente', 'tipo_servicio']
features = numeric_features + categorical_features

X = df[features]
y = df[target]

print(f"✓ Features (X): {X.shape}")
print(f"✓ Target (y): {y.shape}")

---

## 5️⃣ Train/test split (80/20)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"✓ Train set: {X_train.shape[0]} samples")
print(f"✓ Test set: {X_test.shape[0]} samples")

---

## 6️⃣ Configure preprocessing

In [None]:
numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

print("✓ Preprocessor configured: StandardScaler + OneHotEncoder")

---

## 7️⃣ Train the model

In [None]:
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression())
])

model.fit(X_train, y_train)
print("✓ Model trained successfully")

---

## 8️⃣ Evaluate the model

In [None]:
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print("=" * 60)
print("🎯 MODEL METRICS")
print("=" * 60)
print(f"MAE (Mean Absolute Error):           {mae:,.2f} ARS")
print(f"RMSE (Root Mean Squared Error):      {rmse:,.2f} ARS")
print(f"R² (Coefficient of Determination):   {r2:.4f}")
print("=" * 60)
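
As a sanity check on what these three metrics measure, the sketch below computes them by hand with NumPy on four made-up values and compares the results against the scikit-learn functions. The `*_demo` and `*_manual` names and the numbers are illustrative assumptions, not results from the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted amounts (ARS)
y_true_demo = np.array([100.0, 200.0, 300.0, 400.0])
y_pred_demo = np.array([110.0, 190.0, 320.0, 380.0])

errors = y_true_demo - y_pred_demo

# MAE: average size of the error, in the units of the target
mae_manual = np.mean(np.abs(errors))

# RMSE: square root of the mean squared error; penalizes large errors more
rmse_manual = np.sqrt(np.mean(errors ** 2))

# R²: 1 minus the ratio of residual variation to total variation around the mean
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE:  {mae_manual:.2f}")
print(f"RMSE: {rmse_manual:.2f}")
print(f"R²:   {r2_manual:.4f}")

# The manual values agree with scikit-learn
assert np.isclose(mae_manual, mean_absolute_error(y_true_demo, y_pred_demo))
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true_demo, y_pred_demo)))
assert np.isclose(r2_manual, r2_score(y_true_demo, y_pred_demo))
```

MAE and RMSE are both expressed in ARS, so they can be compared directly with typical monthly amounts; R² is unitless and describes the share of variance in the target that the predictions account for.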

---

## 9️⃣ Visualization: actual vs. predicted values

In [None]:
plt.figure(figsize=(8, 8))
plt.scatter(y_test, y_pred, alpha=0.4, edgecolors='k', linewidth=0.5)
plt.xlabel('Actual amount (ARS)', fontsize=12)
plt.ylabel('Predicted amount (ARS)', fontsize=12)
plt.title('Actual vs. predicted values - Multiple regression', fontsize=14, fontweight='bold')

# Diagonal reference line (points on it are predicted exactly)
min_val = min(y_test.min(), y_pred.min())
max_val = max(y_test.max(), y_pred.max())
plt.plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect prediction')

plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---

## ✅ Pipeline complete

The model has been trained and evaluated successfully. To reproduce this analysis in a local environment, see the [repository README](https://github.com/Pipoxsj/tfg-leplag-regresion-demo).