# Telecom X – Part 2: Churn Prediction

# 1. Data Preparation

### 1.1 Loading the Cleaned File

In [1]:
import pandas as pd

df = pd.read_csv('telecom_churn_cleaned.csv')
df.head()

Unnamed: 0,customerID,Churn,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,Charges.Monthly,Charges.Total,Cuentas_Diarias
0,0002-ORFBO,0,Female,0,1,1,9,1,No,DSL,...,No,Yes,Yes,No,One year,1,Mailed check,65.6,593.3,2.186667
1,0003-MKNFE,0,Male,0,0,0,9,1,Yes,DSL,...,No,No,No,Yes,Month-to-month,0,Mailed check,59.9,542.4,1.996667
2,0004-TLHLJ,1,Male,0,0,0,4,1,No,Fiber optic,...,Yes,No,No,No,Month-to-month,1,Electronic check,73.9,280.85,2.463333
3,0011-IGKFF,1,Male,1,1,0,13,1,No,Fiber optic,...,Yes,No,Yes,Yes,Month-to-month,1,Electronic check,98.0,1237.85,3.266667
4,0013-EXCHZ,1,Female,1,1,0,3,1,No,Fiber optic,...,No,Yes,Yes,No,Month-to-month,1,Mailed check,83.9,267.4,2.796667


### 1.2 Note on the File

We use the same file that was cleaned and organized in Part 1 of the Telecom X challenge. It should contain only the relevant columns, with the data already corrected and standardized.
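
Before modeling, it can help to confirm that the file really meets those expectations. The following is a minimal sketch, assuming a small toy frame in place of the real data; the helper name `check_cleaned` and the three checks it runs are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the cleaned file (hypothetical rows shaped like the real columns).
sample = pd.DataFrame({
    "Churn": [0, 1, 0, 1],
    "tenure": [9, 4, 13, 3],
    "Contract": ["One year", "Month-to-month", "Month-to-month", "Month-to-month"],
    "Charges.Total": [593.3, 280.85, 1237.85, 267.4],
})

def check_cleaned(frame: pd.DataFrame, target: str = "Churn") -> dict:
    """Collect a few basic facts about a DataFrame that is supposed to be clean."""
    return {
        # Total number of missing cells across all columns.
        "missing_values": int(frame.isna().sum().sum()),
        # Rows that are exact copies of an earlier row.
        "duplicate_rows": int(frame.duplicated().sum()),
        # True only if every target value is 0 or 1.
        "target_is_binary": bool(frame[target].isin([0, 1]).all()),
    }

report = check_cleaned(sample)
print(report)
```

On the real data the same call would be `check_cleaned(df)`; any nonzero count is a signal to revisit the Part 1 cleaning.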

### 1.3 Removing Irrelevant Columns

In [None]:
df = df.drop(['customerID', 'Cuentas_Diarias'], axis=1)
df.shape

(7043, 20)

In [3]:
df.head()


Unnamed: 0,Churn,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,Charges.Monthly,Charges.Total
0,0,Female,0,1,1,9,1,No,DSL,No,Yes,No,Yes,Yes,No,One year,1,Mailed check,65.6,593.3
1,0,Male,0,0,0,9,1,Yes,DSL,No,No,No,No,No,Yes,Month-to-month,0,Mailed check,59.9,542.4
2,1,Male,0,0,0,4,1,No,Fiber optic,No,No,Yes,No,No,No,Month-to-month,1,Electronic check,73.9,280.85
3,1,Male,1,1,0,13,1,No,Fiber optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,1,Electronic check,98.0,1237.85
4,1,Female,1,1,0,3,1,No,Fiber optic,No,No,No,Yes,Yes,No,Month-to-month,1,Mailed check,83.9,267.4


### 1.4 Encoding

In [None]:
df_encoded = pd.get_dummies(df, drop_first=True)
df_encoded.head()

### 1.5 Checking the Churn Proportion

In [None]:
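# Churn is already encoded as 0/1 in the cleaned file, so get_dummies leaves the column name unchanged.
df_encoded['Churn'].value_counts(normalize=True)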

### 1.6 Class Balancing (optional)
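
The notebook leaves this step empty. One simple option is random oversampling of the minority class, sketched below with pandas only. The helper name `oversample_minority`, the toy data, and the choice of oversampling itself are assumptions made for illustration; techniques such as SMOTE or undersampling would be equally valid here.

```python
import pandas as pd

def oversample_minority(frame: pd.DataFrame, target: str = "Churn", seed: int = 42) -> pd.DataFrame:
    """Randomly duplicate rows of smaller classes until every class matches the largest one."""
    counts = frame[target].value_counts()
    largest = int(counts.max())
    parts = []
    for label, n in counts.items():
        rows = frame[frame[target] == label]
        if n < largest:
            # Sample with replacement to top up the smaller class.
            extra = rows.sample(n=largest - int(n), replace=True, random_state=seed)
            rows = pd.concat([rows, extra])
        parts.append(rows)
    # Shuffle so the classes are not grouped together, then rebuild the index.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Hypothetical imbalanced toy data: four non-churners, two churners.
toy = pd.DataFrame({"tenure": [9, 9, 4, 13, 3, 20], "Churn": [0, 0, 0, 0, 1, 1]})
balanced = oversample_minority(toy)
print(balanced["Churn"].value_counts())
```

If this is used, it should be applied only to the training split, after the train/test split, so that duplicated rows do not leak into the evaluation data.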

### 1.7 Normalization or Standardization (if needed)
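
This step is also left empty in the notebook. As a rough sketch, z-score standardization of the numeric columns can be done directly in pandas. The helper name `standardize` and the toy values are assumptions for illustration; scikit-learn's `StandardScaler` performs the same rescaling and is the more common choice inside a modeling pipeline.

```python
import pandas as pd

def standardize(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy with the given columns rescaled to mean 0 and standard deviation 1."""
    out = frame.copy()
    for col in columns:
        mean = out[col].mean()
        # Population standard deviation (ddof=0), the convention StandardScaler also uses.
        std = out[col].std(ddof=0)
        out[col] = (out[col] - mean) / std
    return out

# Hypothetical toy rows using column names from the dataset.
toy = pd.DataFrame({
    "tenure": [9, 9, 4, 13, 3],
    "Charges.Monthly": [65.6, 59.9, 73.9, 98.0, 83.9],
    "Churn": [0, 0, 1, 1, 1],
})
scaled = standardize(toy, ["tenure", "Charges.Monthly"])
print(scaled.round(3))
```

Scaling mainly matters for distance-based models and regularized linear models; tree-based models are insensitive to it. In a real workflow the mean and standard deviation should be computed on the training split only and then reused for the test split.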

# 2. Correlation and Feature Selection

### 2.1 Correlation Analysis

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(20, 15))
sns.heatmap(df_encoded.corr(), annot=True, cmap='coolwarm')
plt.show()
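
A full annotated heatmap can be hard to read once the encoded frame has many columns. As a complementary sketch, the correlations with the target alone can be ranked by absolute value. The helper name `rank_by_target_correlation` and the toy data are assumptions for illustration.

```python
import pandas as pd

def rank_by_target_correlation(frame: pd.DataFrame, target: str = "Churn") -> pd.Series:
    """Return each feature's Pearson correlation with the target, strongest first."""
    corr = frame.corr()[target].drop(target)
    # Order by absolute value but keep the sign so the direction stays visible.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Hypothetical toy rows using column names from the dataset.
toy = pd.DataFrame({
    "Churn": [0, 0, 1, 1, 1, 0],
    "tenure": [9, 9, 4, 13, 3, 40],
    "Charges.Monthly": [65.6, 59.9, 73.9, 98.0, 83.9, 50.0],
})
ranking = rank_by_target_correlation(toy)
print(ranking)
```

On the real data the call would be `rank_by_target_correlation(df_encoded)`. Correlation only captures linear association, so a low value does not by itself justify dropping a feature.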

### 2.2 Targeted Analysis

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

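# Churn is already 0/1 in the cleaned file, so the column keeps its original name after get_dummies.
sns.boxplot(x='Churn', y='tenure', data=df_encoded, ax=axes[0])
axes[0].set_title('Tenure vs. Churn')

# The total-charges column is named 'Charges.Total' in this dataset.
sns.boxplot(x='Churn', y='Charges.Total', data=df_encoded, ax=axes[1])
axes[1].set_title('Total Charges vs. Churn')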

plt.show()