# Regularization

## Load the data

In [1]:
import pandas as pd

df = pd.read_csv("Recursos/06-clientes_bancarios.csv")

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   edad          1500 non-null   int64
 1   ingresos      1500 non-null   int64
 2   antiguedad    1500 non-null   int64
 3   saldo         1500 non-null   int64
 4   productos     1500 non-null   int64
 5   credit_score  1500 non-null   int64
 6   deuda         1500 non-null   int64
 7   churn         1500 non-null   int64
dtypes: int64(8)
memory usage: 93.9 KB


In [2]:
df.head()

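|   | edad | ingresos | antiguedad | saldo | productos | credit_score | deuda | churn |
|---|---|---|---|---|---|---|---|---|
| 0 | 56 | 1304023 | 25 | 5198253 | 1 | 308 | 117829 | 0 |
| 1 | 69 | 938037 | 7 | 3712938 | 4 | 743 | 635719 | 0 |
| 2 | 46 | 672108 | 4 | 2686928 | 3 | 412 | 2211295 | 0 |
| 3 | 32 | 898994 | 20 | 6211771 | 1 | 504 | 3239330 | 0 |
| 4 | 60 | 879305 | 13 | 7255424 | 2 | 704 | 4123732 | 0 |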


# Split X and y

In [3]:
X = df.drop("churn", axis=1)
y = df["churn"]

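# Scaling (mandatory)
* Regularization without scaling is a beginner's mistake: the penalty acts on the size of the coefficients, and a coefficient's size depends on the units of its feature, so unscaled features would be penalized unequally.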
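
What `StandardScaler` does can be checked by hand: for each column it subtracts the mean and divides by the standard deviation, z = (x - mean) / std. The sketch below uses a small hypothetical array (not the churn data), with one column in years and one in currency units, and compares the scaler's output with the manual formula.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (not the churn dataset): an age-like column
# in years and an income-like column in currency units.
X_toy = np.array([
    [25.0, 300_000.0],
    [40.0, 900_000.0],
    [60.0, 1_500_000.0],
    [35.0, 650_000.0],
])

X_std = StandardScaler().fit_transform(X_toy)

# Manual standardization: z = (x - mean) / std, per column.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for .std().
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_std, X_manual))  # True
print(X_std.mean(axis=0))            # ~0 for every column
print(X_std.std(axis=0))             # ~1 for every column
```

After scaling, both columns live on the same scale, so the penalty treats a coefficient on "years" and one on "currency units" comparably.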

In [4]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Train / Test

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled,
    y,
    test_size=0.25,
    random_state=42
)
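
In the cells above the scaler is fitted on the full `X` before splitting, so the test rows contribute to the mean and standard deviation used for scaling (a mild form of data leakage). A minimal sketch of the alternative order, split first and fit the scaler on the training part only, follows; it uses synthetic data from `make_classification` as a stand-in for the churn table, so the names `X_demo`, `X_tr` and `X_te` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption: 1500 rows, 7 features).
X_demo, y_demo = make_classification(n_samples=1500, n_features=7, random_state=42)

# 1) Split first...
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

# 2) ...then learn mean/std from the training rows only...
scaler = StandardScaler().fit(X_tr)

# 3) ...and apply that same transformation to both parts.
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)

print(X_tr_s.shape, X_te_s.shape)                    # (1125, 7) (375, 7)
print(np.allclose(scaler.mean_, X_tr.mean(axis=0)))  # True
```

Wrapping the scaler and the model in a scikit-learn `Pipeline` achieves the same thing automatically, including inside cross-validation.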

# Logistic regression WITHOUT regularization

In [6]:
from sklearn.linear_model import LogisticRegression

lr_sin = LogisticRegression(
    penalty=None,
    max_iter=2000
)

lr_sin.fit(X_train, y_train)

print("Train:", lr_sin.score(X_train, y_train))
print("Test :", lr_sin.score(X_test, y_test))

Train: 0.904
Test : 0.888


# L2 regularization (Ridge)
* It does not eliminate variables; it only shrinks their coefficients toward zero.
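
The claim that L2 shrinks coefficients without removing them can be checked by refitting over a grid of C values and tracking the size of the coefficient vector. A sketch on synthetic data (an assumption; the churn CSV is not used here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption: 7 features, binary target).
X_demo, y_demo = make_classification(n_samples=1500, n_features=7, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

norms, zeros = {}, {}
for c in [0.01, 0.1, 1, 10, 100]:
    model = LogisticRegression(penalty="l2", C=c, max_iter=2000).fit(X_demo, y_demo)
    norms[c] = np.linalg.norm(model.coef_)    # overall size of the coefficients
    zeros[c] = int(np.sum(model.coef_ == 0))  # coefficients that are exactly 0
    print(f"C={c:<6} ||coef||_2={norms[c]:.3f}  exact zeros={zeros[c]}")
```

Expect the norm to grow as C grows and the count of exact zeros to stay at 0: the L2 penalty pulls every coefficient toward zero but, in practice, sets none of them exactly to zero.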

In [11]:
lr_l2 = LogisticRegression(
    penalty="l2",
    C=1.0,
    max_iter=2000
)

lr_l2.fit(X_train, y_train)  # train with the L2 penalty

print("Train:", lr_l2.score(X_train, y_train))
print("Test :", lr_l2.score(X_test, y_test))

Train: 0.9013333333333333
Test : 0.8933333333333333


    - penalty="l2" → Ridge
    - C → inverse of the penalty strength
        * small C = stronger penalty
        * large C = weaker penalty
    - max_iter → maximum number of solver iterations (raise it if the solver does not converge)
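
For reference, a sketch of the objective as written in the scikit-learn user guide (sample weights omitted): C multiplies the data term rather than the penalty, which is why a smaller C means a stronger penalty. With $\hat p_i = \sigma(x_i^\top w + b)$ and $y_i \in \{0, 1\}$:

$$
\min_{w,\,b}\; r(w) \;+\; C \sum_{i=1}^{n} \Big[ -y_i \log \hat p_i \;-\; (1 - y_i) \log\big(1 - \hat p_i\big) \Big],
\qquad
r(w) =
\begin{cases}
\tfrac{1}{2}\lVert w \rVert_2^2 & \text{L2 (Ridge)} \\
\lVert w \rVert_1 & \text{L1 (Lasso)}
\end{cases}
$$

Dividing the whole objective by C gives the equivalent form "loss + (1/C)·r(w)", so C acts as the inverse of the penalty strength.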

In [8]:
valores_C = [0.01, 0.1, 1, 10, 100]

for c in valores_C:
    modelo = LogisticRegression(
        penalty="l2",
        C=c,
        max_iter=2000
    )
    
    modelo.fit(X_train, y_train)
    
    print(
        "C =", c,
        "Train =", modelo.score(X_train, y_train),
        "Test =", modelo.score(X_test, y_test)
    )


C = 0.01 Train = 0.8844444444444445 Test = 0.8853333333333333
C = 0.1 Train = 0.9013333333333333 Test = 0.9013333333333333
C = 1 Train = 0.9013333333333333 Test = 0.8933333333333333
C = 10 Train = 0.9048888888888889 Test = 0.8906666666666667
C = 100 Train = 0.904 Test = 0.888


    * C=0.1   Train 0.901  Test 0.901 ⭐ (Perfect balance: the best test score of the grid, with no train/test gap.)
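
Picking C from the test column of the loop output above reuses the test set for tuning, so its 0.901 can be a slightly optimistic estimate. A common alternative is to choose C by cross-validation on the training data and use the test set only once at the end. A sketch with `GridSearchCV` and a `Pipeline`, on synthetic stand-in data (an assumption; the step names `scaler` and `clf` are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption: 1500 rows, 7 features).
X_demo, y_demo = make_classification(n_samples=1500, n_features=7, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# The scaler is refitted inside every CV fold, so no fold leaks into another.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(penalty="l2", max_iter=2000)),
])

# C is chosen by 5-fold cross-validation on the training data only.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X_tr, y_tr)

print("Best C :", search.best_params_["clf__C"])
print("CV acc :", round(search.best_score_, 3))
# The test set is used once, after C has been fixed.
print("Test   :", round(search.score(X_te, y_te), 3))
```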

# L1 (Lasso)

In [9]:
lr_l1 = LogisticRegression(
    penalty="l1",
    solver="liblinear",
    C=0.1,
    max_iter=2000
)

lr_l1.fit(X_train, y_train)

print("Train:", lr_l1.score(X_train, y_train))
print("Test :", lr_l1.score(X_test, y_test))

Train: 0.896
Test : 0.904


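* penalty="l1" → Lasso
* solver → must support L1: "liblinear" does (so does "saga"); the default "lbfgs" does not
* C → inverse of the penalty strength, as with L2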
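
The practical difference from L2 is that L1 can set coefficients exactly to zero. A sketch comparing both penalties at the same small C, on synthetic data where, by construction, only 3 of 7 features are informative (an assumption chosen to make the effect visible):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 3 informative features, 4 pure-noise features.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=7, n_informative=3, n_redundant=0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Same solver and C for both models; only the penalty changes.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.01, max_iter=2000).fit(X_demo, y_demo)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.01, max_iter=2000).fit(X_demo, y_demo)

zeros_l1 = int(np.sum(l1.coef_ == 0))
zeros_l2 = int(np.sum(l2.coef_ == 0))
print("L1 coefficients:", l1.coef_.round(3))
print("L2 coefficients:", l2.coef_.round(3))
print("Exact zeros -> L1:", zeros_l1, "| L2:", zeros_l2)
```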

# Inspect the coefficients

In [10]:
coef = pd.DataFrame({
    "variable": X.columns,
    "coeficiente": lr_l1.coef_[0]
})

coef.sort_values("coeficiente")

|   | variable | coeficiente |
|---|---|---|
| 1 | ingresos | -1.03768 |
| 3 | saldo | -0.750361 |
| 5 | credit_score | -0.471832 |
| 0 | edad | -0.042605 |
| 2 | antiguedad | -0.002259 |
| 4 | productos | 0.0 |
| 6 | deuda | 0.484713 |


- positive → increase the probability of churn
- negative → reduce it
- 0 → ignored (L1 removed the variable)

#### Interpretation:
- Income (`ingresos`, -1.04): more income → less churn
- Balance (`saldo`, -0.75): higher balance → more stable customer, less churn
- Credit score (`credit_score`, -0.47): good credit history → less churn
- Age (`edad`, -0.04): almost irrelevant
- Tenure (`antiguedad`, -0.002): contributes very little
- Products (`productos`, 0.00) ❌: L1 set it to zero. 👉 A useless variable according to this model.
- Debt (`deuda`, +0.48): more debt → more churn
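
Because the inputs were standardized, each coefficient is the change in the log-odds of churn for a one-standard-deviation increase in that variable, holding the others fixed. Exponentiating turns it into an odds ratio, which is often easier to read: for `ingresos`, exp(-1.03768) ≈ 0.354, i.e. roughly 65% lower odds of churn per standard deviation of income. The sketch below recomputes this for the coefficients in the table above (values copied from that output):

```python
import numpy as np
import pandas as pd

# L1 coefficients copied from the table above (log-odds per 1 standard deviation).
coef = pd.DataFrame({
    "variable": ["ingresos", "saldo", "credit_score", "edad",
                 "antiguedad", "productos", "deuda"],
    "coeficiente": [-1.03768, -0.750361, -0.471832, -0.042605,
                    -0.002259, 0.0, 0.484713],
})

# Odds ratio: factor by which the odds of churn change per +1 std of the variable.
coef["odds_ratio"] = np.exp(coef["coeficiente"])
print(coef.round(3))
```

Values below 1 lower the odds of churn, values above 1 raise them, and exactly 1 (here `productos`) means no effect in this model.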

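## KEY TAKEAWAYS:

    * Without regularization, the model tends to memorize the training data
    * L2 smooths it (shrinks every coefficient)
    * L1 prunes it (sets some coefficients exactly to zero)
    * C controls how hard you penalize (smaller C = stronger penalty)
    * Scaling is mandatory
    * The coefficients explain the model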