# 📊 Activity: EDA and Modeling with Linear and Logistic Regression
**Student:** YOUR NAME HERE  
**Date:** YYYY-MM-DD  
**Dataset:** House Prices - Advanced Regression Techniques

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression, LogisticRegression, SGDRegressor, SGDClassifier
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, accuracy_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data
df = pd.read_csv('data/train.csv')
df.head()

## 🔍 1. Basic EDA

In [None]:
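df.info()
# A notebook cell only renders its last expression, so display the summary explicitly
display(df.describe())
# Ten columns with the most missing values
df.isnull().sum().sort_values(ascending=False).head(10)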
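Raw null counts are easier to act on when paired with the share of rows they represent. The following is a minimal sketch, assuming a hypothetical helper named `missing_summary` and a tiny made-up frame standing in for `df`; the column names mirror the real dataset but the values are invented.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column (hypothetical helper)."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns that actually have gaps, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "GrLivArea": [1710, 1262, 1786, 1717],
})
summary = missing_summary(toy)
print(summary)
```

On the real data the call would be `missing_summary(df)`. Columns that are mostly empty are candidates for dropping, while lightly affected ones may be worth imputing.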

In [None]:
plt.figure(figsize=(12,8))
sns.heatmap(df.corr(numeric_only=True), cmap='coolwarm')
plt.title('Correlation heatmap')
plt.show()
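A heatmap over every numeric column can be hard to read, so it may help to rank features by their correlation with the target. This is a minimal sketch, assuming a hypothetical helper `top_correlations` and synthetic data in place of `df`.

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, target: str, n: int = 10) -> pd.Series:
    """Rank numeric columns by absolute Pearson correlation with `target` (hypothetical helper)."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Sort by strength, but return the signed values so direction stays visible
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(n)


# Synthetic data for illustration only
rng = np.random.default_rng(0)
area = rng.normal(1500, 400, size=200)
toy = pd.DataFrame({
    "GrLivArea": area,
    "Noise": rng.normal(0, 1, size=200),
    "SalePrice": 100 * area + rng.normal(0, 5000, size=200),
})
ranked = top_correlations(toy, "SalePrice")
print(ranked)
```

With the real data, `top_correlations(df, "SalePrice")` would give a shortlist of candidate features. Correlation only captures linear association, so treat it as a starting point.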

## 📈 2. Linear Regression

In [None]:
features = ['GrLivArea', 'OverallQual', 'YearBuilt', 'TotalBsmtSF', 'GarageCars']
X = df[features]
y = df['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print("R²:", r2_score(y_test, y_pred_lr))
print("MAE:", mean_absolute_error(y_test, y_pred_lr))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred_lr)))
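# Coefficients are in price units per unit of each feature, so their raw magnitudes are not directly comparable
pd.Series(lr.coef_, index=features)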
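To make the three reported metrics concrete, they can be computed directly from their definitions. This is a sketch with made-up numbers, and `regression_metrics` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², MAE and RMSE from their definitions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": np.mean(np.abs(residuals)),
        "rmse": np.sqrt(np.mean(residuals ** 2)),
    }


# Made-up prices for illustration only
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(metrics)
```

MAE and RMSE are in the same units as the target (here, sale price), while R² is unitless. RMSE penalizes large errors more heavily than MAE because residuals are squared before averaging.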

## ⚙️ 3. Linear Regression with Gradient Descent

In [None]:
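# SGD is sensitive to feature scale, so standardize first (fit the scaler on the training split only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42)
sgd_reg.fit(X_train_scaled, y_train)
y_pred_sgd = sgd_reg.predict(X_test_scaled)
print("SGD - R²:", r2_score(y_test, y_pred_sgd))
print("SGD - MAE:", mean_absolute_error(y_test, y_pred_sgd))
print("SGD - RMSE:", np.sqrt(mean_squared_error(y_test, y_pred_sgd)))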
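The idea behind gradient-based fitting can be sketched in plain NumPy. Note that this uses full-batch gradient descent on mean squared error, not the per-sample stochastic updates that `SGDRegressor` performs. The function name, learning rate, epoch count and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def gradient_descent(X, y, lr=0.1, epochs=500):
    """Fit y ≈ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        # Gradients of (1/n) * sum(error²) with respect to w and b
        w -= lr * (2.0 / n_samples) * (X.T @ error)
        b -= lr * (2.0 / n_samples) * error.sum()
    return w, b


# Synthetic data for illustration only: two features on very different scales
rng = np.random.default_rng(42)
X_raw = np.column_stack([rng.normal(1500, 500, 200), rng.normal(6, 1.5, 200)])
y = 100.0 * X_raw[:, 0] + 20000.0 * X_raw[:, 1] + rng.normal(0, 1000, 200)

# Standardize so every feature has mean 0 and standard deviation 1
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
w, b = gradient_descent(X_scaled, y)

# Closed-form least squares on the same scaled data, for comparison
A = np.column_stack([X_scaled, np.ones(len(y))])
w_exact = np.linalg.lstsq(A, y, rcond=None)[0]
print("gradient descent:", w, b)
print("least squares:   ", w_exact)
```

On standardized inputs the iterative solution approaches the closed-form one. With the raw columns, the same learning rate would make the updates grow without bound, which is why scaling matters for gradient-based training.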

## 🧪 4. Logistic Regression

In [None]:
# Binary target: 1 if the sale price is above the dataset mean, 0 otherwise
df['HighPrice'] = (df['SalePrice'] > df['SalePrice'].mean()).astype(int)
X_cls = df[features]
y_cls = df['HighPrice']
X_train_cls, X_test_cls, y_train_cls, y_test_cls = train_test_split(X_cls, y_cls, test_size=0.2, random_state=42)
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train_cls, y_train_cls)
y_pred_log = logreg.predict(X_test_cls)
print("Accuracy:", accuracy_score(y_test_cls, y_pred_log))
print("F1 Score:", f1_score(y_test_cls, y_pred_log))
print("Confusion Matrix:\n", confusion_matrix(y_test_cls, y_pred_log))
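The accuracy and F1 values follow directly from the confusion matrix. This is a sketch with made-up counts, assuming scikit-learn's layout for a binary problem (rows are true labels, columns are predictions); `binary_metrics` is a hypothetical helper.

```python
import numpy as np


def binary_metrics(cm):
    """Derive accuracy, precision, recall and F1 from a 2x2 confusion matrix.

    Assumes scikit-learn's layout: rows are true labels, columns are predictions,
    so cm = [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = np.asarray(cm)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only
scores = binary_metrics([[150, 10], [20, 80]])
print(scores)
```

F1 is the harmonic mean of precision and recall, so it ignores true negatives. That makes it more informative than accuracy when the two classes are unevenly represented.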

## 🔁 5. Classifier with Gradient Descent

In [None]:
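# SGD is sensitive to feature scale, so standardize first (fit the scaler on the training split only)
scaler_cls = StandardScaler()
X_train_cls_scaled = scaler_cls.fit_transform(X_train_cls)
X_test_cls_scaled = scaler_cls.transform(X_test_cls)
sgd_cls = SGDClassifier(max_iter=1000, tol=1e-3, random_state=42)
sgd_cls.fit(X_train_cls_scaled, y_train_cls)
y_pred_sgd_cls = sgd_cls.predict(X_test_cls_scaled)
print("SGD - Accuracy:", accuracy_score(y_test_cls, y_pred_sgd_cls))
print("SGD - F1 Score:", f1_score(y_test_cls, y_pred_sgd_cls))
print("SGD - Confusion Matrix:\n", confusion_matrix(y_test_cls, y_pred_sgd_cls))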

## 🧠 6. Reflective Conclusion

- Which variables influenced the models most, and why?
- What differences did you observe between the classic models and those trained with gradient descent?
- What would you improve in a future analysis?

In [None]:
# If you use Otter-Grader
# import otter
# otter.Notebook().export('eda_modelado_regresion.ipynb')