# **TelecomX Challenge - Churn Prediction**
## 🎯 **Mission**

The new mission is to develop predictive models that can anticipate which customers are most likely to cancel their services.  
TelecomX wants to get ahead of the churn problem, so the goal is to build a robust pipeline for this initial modeling stage.

## 🧠 **Challenge Objectives**

- Prepare the data for modeling (cleaning, encoding, normalization).
- Perform correlation analysis and feature selection.
- Train two or more classification models.
- Evaluate model performance with metrics.
- Interpret the results, including feature importance.
- Write a strategic conclusion that highlights the main factors driving churn.

## **Data Preparation**

### **Extraction**

In [24]:
import pandas as pd
# Load the data
datos = pd.read_csv('DataBase/TelecomXData.csv')
datos.head()

Unnamed: 0,customerID,Churn,customer.gender,customer.SeniorCitizen,customer.Partner,customer.Dependents,customer.tenure,phone.PhoneService,phone.MultipleLines,internet.InternetService,...,internet.DeviceProtection,internet.TechSupport,internet.StreamingTV,internet.StreamingMovies,account.Contract,account.PaperlessBilling,account.PaymentMethod,account.Charges.Monthly,account.Charges.daily,account.Charges.Total
0,0002-ORFBO,0,Female,0,1,1,9,1,No,DSL,...,No,Yes,Yes,No,One year,1,Mailed check,65.6,2.186667,593.3
1,0003-MKNFE,0,Male,0,0,0,9,1,Yes,DSL,...,No,No,No,Yes,Month-to-month,0,Mailed check,59.9,1.996667,542.4
2,0004-TLHLJ,1,Male,0,0,0,4,1,No,Fiber optic,...,Yes,No,No,No,Month-to-month,1,Electronic check,73.9,2.463333,280.85
3,0011-IGKFF,1,Male,1,1,0,13,1,No,Fiber optic,...,Yes,No,Yes,Yes,Month-to-month,1,Electronic check,98.0,3.266667,1237.85
4,0013-EXCHZ,1,Female,1,1,0,3,1,No,Fiber optic,...,No,Yes,Yes,No,Month-to-month,1,Mailed check,83.9,2.796667,267.4


In [25]:
datos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7256 entries, 0 to 7255
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   customerID                 7256 non-null   object 
 1   Churn                      7256 non-null   int64  
 2   customer.gender            7256 non-null   object 
 3   customer.SeniorCitizen     7256 non-null   int64  
 4   customer.Partner           7256 non-null   int64  
 5   customer.Dependents        7256 non-null   int64  
 6   customer.tenure            7256 non-null   int64  
 7   phone.PhoneService         7256 non-null   int64  
 8   phone.MultipleLines        7256 non-null   object 
 9   internet.InternetService   7256 non-null   object 
 10  internet.OnlineSecurity    7256 non-null   object 
 11  internet.OnlineBackup      7256 non-null   object 
 12  internet.DeviceProtection  7256 non-null   object 
 13  internet.TechSupport       7256 non-null   objec

### **Dropping Irrelevant Columns**

In [None]:
# We drop the customer ID because it carries no information about whether a customer is likely to leave the service.
# We also drop the daily charge: it is derived from the monthly charge, which has fewer decimal places, so keeping only the monthly charge reduces the risk of errors later in the model.
datos = datos.drop(columns=['customerID', 'account.Charges.daily'])
datos.head()

Unnamed: 0,Churn,customer.gender,customer.SeniorCitizen,customer.Partner,customer.Dependents,customer.tenure,phone.PhoneService,phone.MultipleLines,internet.InternetService,internet.OnlineSecurity,internet.OnlineBackup,internet.DeviceProtection,internet.TechSupport,internet.StreamingTV,internet.StreamingMovies,account.Contract,account.PaperlessBilling,account.PaymentMethod,account.Charges.Monthly,account.Charges.Total
0,0,Female,0,1,1,9,1,No,DSL,No,Yes,No,Yes,Yes,No,One year,1,Mailed check,65.6,593.3
1,0,Male,0,0,0,9,1,Yes,DSL,No,No,No,No,No,Yes,Month-to-month,0,Mailed check,59.9,542.4
2,1,Male,0,0,0,4,1,No,Fiber optic,No,No,Yes,No,No,No,Month-to-month,1,Electronic check,73.9,280.85
3,1,Male,1,1,0,13,1,No,Fiber optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,1,Electronic check,98.0,1237.85
4,1,Female,1,1,0,3,1,No,Fiber optic,No,No,No,Yes,Yes,No,Month-to-month,1,Mailed check,83.9,267.4
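
The claim that the daily charge is derived from the monthly charge can be checked before dropping the column. The sketch below is only an illustration: it copies the five rows shown in the first `datos.head()` output, and it assumes the daily value is the monthly value divided by 30, which the notebook does not state. The names `muestra`, `reconstruido`, `es_redundante` and `correlacion` are hypothetical.

```python
import numpy as np
import pandas as pd

# Values copied from the five rows displayed by the first datos.head() call.
muestra = pd.DataFrame({
    'account.Charges.Monthly': [65.6, 59.9, 73.9, 98.0, 83.9],
    'account.Charges.daily': [2.186667, 1.996667, 2.463333, 3.266667, 2.796667],
})

# Assumption (not stated in the notebook): daily charge = monthly charge / 30.
reconstruido = muestra['account.Charges.Monthly'] / 30

# True when the reconstructed values match the stored ones up to rounding.
es_redundante = bool(np.allclose(reconstruido, muestra['account.Charges.daily'], atol=1e-6))

# A correlation close to 1 also signals that one of the two columns is redundant.
correlacion = muestra['account.Charges.Monthly'].corr(muestra['account.Charges.daily'])
print(es_redundante, correlacion)
```

On the full dataset, the same comparison would be run on `datos` before the `drop` call, since the daily column no longer exists afterwards.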


In [29]:
datos.sample(10)

Unnamed: 0,Churn,customer.gender,customer.SeniorCitizen,customer.Partner,customer.Dependents,customer.tenure,phone.PhoneService,phone.MultipleLines,internet.InternetService,internet.OnlineSecurity,internet.OnlineBackup,internet.DeviceProtection,internet.TechSupport,internet.StreamingTV,internet.StreamingMovies,account.Contract,account.PaperlessBilling,account.PaymentMethod,account.Charges.Monthly,account.Charges.Total
2163,1,Male,0,0,0,9,1,No,DSL,No,No,No,No,No,No,Month-to-month,0,Mailed check,44.95,431.0
1789,0,Male,0,1,0,72,1,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,0,Bank transfer (automatic),20.7,1492.1
6771,1,Female,1,1,0,43,1,Yes,Fiber optic,Yes,No,Yes,Yes,Yes,Yes,One year,1,Mailed check,109.55,4830.25
2518,0,Female,0,1,0,32,1,No,Fiber optic,Yes,No,No,No,No,No,Month-to-month,1,Bank transfer (automatic),75.5,2324.7
2653,0,Female,0,1,1,72,0,No phone service,DSL,No,Yes,No,Yes,Yes,Yes,Two year,0,Credit card (automatic),53.65,3784.0
4459,0,Male,1,1,0,41,0,No phone service,DSL,No,Yes,Yes,No,Yes,Yes,Month-to-month,1,Electronic check,53.95,2215.4
2915,0,Male,0,0,0,3,1,No,DSL,No,No,No,No,No,No,Month-to-month,0,Mailed check,45.45,141.7
4036,0,Female,0,1,0,20,1,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,0,Bank transfer (automatic),19.25,375.25
805,0,Female,0,1,0,30,1,No,DSL,No,Yes,Yes,Yes,No,Yes,One year,0,Bank transfer (automatic),70.25,2198.9
1503,0,Male,0,1,1,63,0,No phone service,DSL,No,Yes,Yes,Yes,No,No,Two year,0,Bank transfer (automatic),39.35,2395.05


### **Transforming the Binary Columns**

In [30]:
datos['customer.gender'] = datos['customer.gender'].map({'Male': 1, 'Female': 0})
datos.head()

Unnamed: 0,Churn,customer.gender,customer.SeniorCitizen,customer.Partner,customer.Dependents,customer.tenure,phone.PhoneService,phone.MultipleLines,internet.InternetService,internet.OnlineSecurity,internet.OnlineBackup,internet.DeviceProtection,internet.TechSupport,internet.StreamingTV,internet.StreamingMovies,account.Contract,account.PaperlessBilling,account.PaymentMethod,account.Charges.Monthly,account.Charges.Total
0,0,0,0,1,1,9,1,No,DSL,No,Yes,No,Yes,Yes,No,One year,1,Mailed check,65.6,593.3
1,0,1,0,0,0,9,1,Yes,DSL,No,No,No,No,No,Yes,Month-to-month,0,Mailed check,59.9,542.4
2,1,1,0,0,0,4,1,No,Fiber optic,No,No,Yes,No,No,No,Month-to-month,1,Electronic check,73.9,280.85
3,1,1,1,1,0,13,1,No,Fiber optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,1,Electronic check,98.0,1237.85
4,1,0,1,1,0,3,1,No,Fiber optic,No,No,No,Yes,Yes,No,Month-to-month,1,Mailed check,83.9,267.4
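
One detail of `Series.map` with a dictionary is worth keeping in mind: any label that is not a key of the dictionary becomes `NaN` instead of raising an error. The sketch below shows that behavior on a made-up column; the `'Other'` label and the names `genero`, `codificado` and `sin_mapear` are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Hypothetical column with one unexpected label ('Other') to show the failure mode.
genero = pd.Series(['Male', 'Female', 'Female', 'Other'], name='customer.gender')

# Same mapping as in the notebook cell above.
codificado = genero.map({'Male': 1, 'Female': 0})

# Labels missing from the dictionary are turned into NaN by Series.map,
# so listing them is a cheap check to run right after the mapping.
sin_mapear = genero[codificado.isna()].unique().tolist()
print(sin_mapear)
```

In this notebook the `datos.info()` output that follows reports `customer.gender` as `int64` with 7256 non-null values, which indicates that every row was mapped.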


In [31]:
datos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7256 entries, 0 to 7255
Data columns (total 20 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Churn                      7256 non-null   int64  
 1   customer.gender            7256 non-null   int64  
 2   customer.SeniorCitizen     7256 non-null   int64  
 3   customer.Partner           7256 non-null   int64  
 4   customer.Dependents        7256 non-null   int64  
 5   customer.tenure            7256 non-null   int64  
 6   phone.PhoneService         7256 non-null   int64  
 7   phone.MultipleLines        7256 non-null   object 
 8   internet.InternetService   7256 non-null   object 
 9   internet.OnlineSecurity    7256 non-null   object 
 10  internet.OnlineBackup      7256 non-null   object 
 11  internet.DeviceProtection  7256 non-null   object 
 12  internet.TechSupport       7256 non-null   object 
 13  internet.StreamingTV       7256 non-null   objec

### **Encoding**

In [32]:
categoricas = [
    'phone.MultipleLines', 'internet.InternetService', 'internet.OnlineSecurity', 'internet.OnlineBackup', 'internet.DeviceProtection',
    'internet.TechSupport', 'internet.StreamingTV', 'internet.StreamingMovies', 'account.Contract', 'account.PaymentMethod']
pd.get_dummies(data=datos,columns=categoricas, dtype=int).head()

Unnamed: 0,Churn,customer.gender,customer.SeniorCitizen,customer.Partner,customer.Dependents,customer.tenure,phone.PhoneService,account.PaperlessBilling,account.Charges.Monthly,account.Charges.Total,...,internet.StreamingMovies_No,internet.StreamingMovies_No internet service,internet.StreamingMovies_Yes,account.Contract_Month-to-month,account.Contract_One year,account.Contract_Two year,account.PaymentMethod_Bank transfer (automatic),account.PaymentMethod_Credit card (automatic),account.PaymentMethod_Electronic check,account.PaymentMethod_Mailed check
0,0,0,0,1,1,9,1,1,65.6,593.3,...,1,0,0,0,1,0,0,0,0,1
1,0,1,0,0,0,9,1,0,59.9,542.4,...,0,0,1,1,0,0,0,0,0,1
2,1,1,0,0,0,4,1,1,73.9,280.85,...,1,0,0,1,0,0,0,0,1,0
3,1,1,1,1,0,13,1,1,98.0,1237.85,...,0,0,1,1,0,0,0,0,1,0
4,1,0,1,1,0,3,1,1,83.9,267.4,...,1,0,0,1,0,0,0,0,0,1


In [33]:
datos_codificados = pd.get_dummies(data=datos,columns=categoricas, dtype=int)
datos_codificados.sample(5)

Unnamed: 0,Churn,customer.gender,customer.SeniorCitizen,customer.Partner,customer.Dependents,customer.tenure,phone.PhoneService,account.PaperlessBilling,account.Charges.Monthly,account.Charges.Total,...,internet.StreamingMovies_No,internet.StreamingMovies_No internet service,internet.StreamingMovies_Yes,account.Contract_Month-to-month,account.Contract_One year,account.Contract_Two year,account.PaymentMethod_Bank transfer (automatic),account.PaymentMethod_Credit card (automatic),account.PaymentMethod_Electronic check,account.PaymentMethod_Mailed check
5276,0,1,1,1,0,17,1,1,94.8,1563.9,...,0,0,1,1,0,0,0,0,1,0
2615,0,0,0,0,0,61,1,0,88.65,5321.25,...,0,0,1,0,0,1,0,1,0,0
2261,0,1,0,1,1,53,1,1,92.55,4779.45,...,1,0,0,1,0,0,0,0,1,0
7183,0,0,0,0,0,1,0,0,24.4,24.4,...,1,0,0,1,0,0,0,0,0,1
3419,0,0,0,0,0,4,1,1,75.65,302.35,...,1,0,0,1,0,0,0,0,1,0
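
Because `pd.get_dummies` keeps one column per category by default, the dummies of each original variable always add up to 1, so one of them is implied by the others. This matters mainly for linear models, and much less for tree-based ones. A minimal sketch of the alternative, using the `drop_first=True` option of `pd.get_dummies` on a toy frame, follows. The rows are made up and the names `ejemplo`, `columnas`, `completo`, `reducido` and `suma_contrato` are hypothetical; the notebook itself keeps all levels.

```python
import pandas as pd

# Toy frame with made-up rows; the column names follow the notebook.
ejemplo = pd.DataFrame({
    'account.Contract': ['Month-to-month', 'One year', 'Two year', 'Month-to-month'],
    'internet.InternetService': ['DSL', 'Fiber optic', 'No', 'DSL'],
})
columnas = ['account.Contract', 'internet.InternetService']

# Encoding as in the notebook: every category gets its own column.
completo = pd.get_dummies(data=ejemplo, columns=columnas, dtype=int)

# Alternative: drop the first level of each variable to remove the redundancy.
reducido = pd.get_dummies(data=ejemplo, columns=columnas, dtype=int, drop_first=True)

# With every level kept, the dummies of one variable sum to 1 in every row.
suma_contrato = completo.filter(like='account.Contract_').sum(axis=1)
print(completo.shape[1], reducido.shape[1])
```

Whether to drop a level is a modeling decision: keeping every column makes feature importances easier to read, while dropping one avoids perfectly collinear inputs.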


In [35]:
datos_codificados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7256 entries, 0 to 7255
Data columns (total 41 columns):
 #   Column                                           Non-Null Count  Dtype  
---  ------                                           --------------  -----  
 0   Churn                                            7256 non-null   int64  
 1   customer.gender                                  7256 non-null   int64  
 2   customer.SeniorCitizen                           7256 non-null   int64  
 3   customer.Partner                                 7256 non-null   int64  
 4   customer.Dependents                              7256 non-null   int64  
 5   customer.tenure                                  7256 non-null   int64  
 6   phone.PhoneService                               7256 non-null   int64  
 7   account.PaperlessBilling                         7256 non-null   int64  
 8   account.Charges.Monthly                          7256 non-null   float64
 9   account.Charges.Total         

### **Class Balance**

We check the proportion of cancellations (Churn) to make sure there is no large imbalance between the classes, which could affect the model.

In [None]:
datos_codificados['Churn'].value_counts(normalize=True)

Churn
0    0.74242
1    0.25758
Name: proportion, dtype: float64
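
To make the imbalance more concrete, the proportions above can be turned back into approximate customer counts. The sketch below uses only the figures already reported (7256 rows and the two proportions). The names `total_filas`, `proporciones`, `conteos`, `razon` and `faltantes` are hypothetical, and the counts are rounded estimates rather than values read from the data.

```python
import pandas as pd

# Figures reported above: 7256 rows and the normalized value counts of Churn.
total_filas = 7256
proporciones = pd.Series({0: 0.74242, 1: 0.25758}, name='proportion')

# Approximate absolute counts per class, rounded to whole customers.
conteos = (proporciones * total_filas).round().astype(int)

# Majority-to-minority ratio.
razon = conteos.loc[0] / conteos.loc[1]

# Rows the minority class would need to gain to reach a 1:1 balance
# if the whole dataset were resampled.
faltantes = int(conteos.loc[0] - conteos.loc[1])
print(conteos.to_dict(), round(razon, 2), faltantes)
```

Roughly three customers stay for each one who cancels, which is a moderate imbalance. A model that always predicts "no churn" would already reach about 74% accuracy, so accuracy alone is a weak metric here.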

#### **Balancing with Oversampling**

In [None]:
# Import the library
from imblearn.over_sampling import SMOTE