# Telecom X Challenge, Part 2

## Challenge objectives
- Prepare the data for modeling (cleaning, encoding, normalization).
- Perform correlation analysis and feature selection.
- Train two or more classification models.
- Evaluate model performance with metrics.
- Interpret the results, including feature importance.
- Write a strategic conclusion identifying the main factors that drive cancellation (churn).

## Skills practiced
- Data preprocessing for Machine Learning
- Building and evaluating predictive models
- Interpreting results and delivering insights
- Technical communication with a strategic focus

# Removing irrelevant columns

```python
import pandas as pd

# Load the already-processed dataset
datos = pd.read_csv('data/datos_tratados.csv')
datos.head()
```

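| | customerID | Churn | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | ... | PhoneService_bin | MultipleLines_bin | InternetService_bin | OnlineSecurity_bin | OnlineBackup_bin | DeviceProtection_bin | TechSupport_bin | StreamingTV_bin | StreamingMovies_bin | servicios_contratados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0002-ORFBO | 0 | Female | 0 | 1 | 1 | 9 | 1 | No | DSL | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 3 |
| 1 | 0003-MKNFE | 0 | Male | 0 | 0 | 0 | 9 | 1 | Yes | DSL | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 2 | 0004-TLHLJ | 1 | Male | 0 | 0 | 0 | 4 | 1 | No | Fiber optic | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 3 | 0011-IGKFF | 1 | Male | 1 | 1 | 0 | 13 | 1 | No | Fiber optic | ... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 4 |
| 4 | 0013-EXCHZ | 1 | Female | 1 | 1 | 0 | 3 | 1 | No | Fiber optic | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |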
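
Since `Churn` is already coded as 0/1, its mean is the share of customers who cancelled. That is worth checking before modeling, because an imbalanced target changes which evaluation metrics are informative. A minimal sketch, using a small hypothetical frame `toy` in place of `datos`:

```python
import pandas as pd

# Hypothetical toy data standing in for `datos`; only the 0/1 Churn column is used.
toy = pd.DataFrame({"Churn": [0, 0, 1, 1, 1, 0, 0, 0]})

# Proportion of each class (normalize=True returns shares instead of counts).
balance = toy["Churn"].value_counts(normalize=True)

# For a 0/1 column the mean equals the proportion of 1s, i.e. the churn rate.
churn_rate = toy["Churn"].mean()

print(balance)
print(f"Churn rate: {churn_rate:.1%}")
```

On the real data, `datos["Churn"].mean()` corresponds to the 0.26537 reported by `describe()` further down, i.e. roughly 26.5% of customers churned.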


```python
datos.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 33 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   customerID             7043 non-null   object
 1   Churn                  7043 non-null   int64
 2   gender                 7043 non-null   object
 3   SeniorCitizen          7043 non-null   int64
 4   Partner                7043 non-null   int64
 5   Dependents             7043 non-null   int64
 6   tenure                 7043 non-null   int64
 7   PhoneService           7043 non-null   int64
 8   MultipleLines          7043 non-null   object
 9   InternetService        7043 non-null   object
 10  OnlineSecurity         7043 non-null   object
 11  OnlineBackup           7043 non-null   object
 12  DeviceProtection       7043 non-null   object
 13  TechSupport            7043 non-null   object
 14  StreamingTV            7043 non-null   object
 15  Stre
```
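
The `object` columns in this listing are the categorical variables that will need encoding. Instead of typing them by hand, they can be collected by dtype. A minimal sketch on a hypothetical frame; selecting everything that is *not* numeric (rather than only `include="object"`) also catches text columns stored with pandas' dedicated string dtype:

```python
import pandas as pd

# Hypothetical sample rows mimicking a few columns of `datos`.
toy = pd.DataFrame({
    "customerID": ["0002-ORFBO", "0003-MKNFE"],
    "Churn": [0, 0],
    "gender": ["Female", "Male"],
    "tenure": [9, 9],
    "Contract": ["One year", "Month-to-month"],
})

# Every column that is not numeric (object or string dtype) is a candidate for encoding.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()

# The identifier is text too, but it is dropped rather than encoded.
to_encode = [col for col in non_numeric if col != "customerID"]
print(to_encode)  # ['gender', 'Contract']
```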

```python
datos.describe()
```

| | Churn | SeniorCitizen | Partner | Dependents | tenure | PhoneService | PaperlessBilling | Charges.Monthly | Charges.Total | Cuentas_Diarias | ... | PhoneService_bin | MultipleLines_bin | InternetService_bin | OnlineSecurity_bin | OnlineBackup_bin | DeviceProtection_bin | TechSupport_bin | StreamingTV_bin | StreamingMovies_bin | servicios_contratados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | ... | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.26537 | 0.162147 | 0.483033 | 0.299588 | 32.371149 | 0.903166 | 0.592219 | 64.761692 | 2279.734304 | 2.158675 | ... | 0.0 | 0.421837 | 0.0 | 0.286668 | 0.344881 | 0.343888 | 0.290217 | 0.384353 | 0.387903 | 2.459747 |
| std | 0.441561 | 0.368612 | 0.499748 | 0.45811 | 24.559481 | 0.295752 | 0.491457 | 30.090047 | 2266.79447 | 1.003088 | ... | 0.0 | 0.493888 | 0.0 | 0.452237 | 0.475363 | 0.475038 | 0.453895 | 0.486477 | 0.487307 | 2.045539 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 18.25 | 0.0 | 0.61 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | 1.0 | 0.0 | 35.5 | 398.55 | 1.18 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 29.0 | 1.0 | 1.0 | 70.35 | 1394.55 | 2.34 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 75% | 1.0 | 0.0 | 1.0 | 1.0 | 55.0 | 1.0 | 1.0 | 89.85 | 3786.6 | 2.99 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 72.0 | 1.0 | 1.0 | 118.75 | 8684.8 | 3.96 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 7.0 |
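
Two columns in this summary stand out: `PhoneService_bin` and `InternetService_bin` have a standard deviation of 0.0 and min = max = 0, so they are constant and carry no information for a model. `PhoneService_bin` is all zeros while `PhoneService` has a mean of about 0.90, which may point to a problem in how these binary columns were derived. Because `describe()` hides some columns behind `...`, checking every column in code is safer. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical data: two informative columns and one constant column.
toy = pd.DataFrame({
    "PhoneService": [1, 1, 0, 1],
    "PhoneService_bin": [0, 0, 0, 0],
    "tenure": [9, 4, 13, 3],
})

# nunique() counts distinct values per column; 1 (or 0) means the column is constant.
constant_cols = toy.columns[toy.nunique(dropna=False) <= 1].tolist()
print(constant_cols)  # ['PhoneService_bin']

# Drop them before correlation analysis and modeling.
reduced = toy.drop(columns=constant_cols)
```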


```python
datos.describe(include='O')
```

| | gender | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaymentMethod |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7043 | 7043 | 7043 | 7043 | 7043 | 7043 | 7043 | 7043 | 7043 | 7043 | 7043 |
| unique | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
| top | Male | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Electronic check |
| freq | 3555 | 3390 | 3096 | 3498 | 3088 | 3095 | 3473 | 2810 | 2785 | 3875 | 2365 |
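
The `unique` row also predicts the size of the encoded table: with no missing values, `get_dummies` creates one column per distinct value. The 11 categorical columns have 2 + 9 × 3 + 4 = 33 distinct values in total, so starting from the 32 columns that remain after dropping `customerID` (33 in `info()` minus the identifier), the encoded frame should have 32 - 11 + 33 = 54 columns. A minimal sketch of the same check on a hypothetical frame:

```python
import pandas as pd

# Hypothetical data with one numeric and two categorical columns.
toy = pd.DataFrame({
    "Churn": [0, 1, 1, 0],
    "gender": ["Female", "Male", "Male", "Female"],
    "Contract": ["One year", "Month-to-month", "Two year", "Month-to-month"],
})
cats = ["gender", "Contract"]

# Expected width: remove the categorical columns, add one dummy per distinct value.
expected = toy.shape[1] - len(cats) + int(toy[cats].nunique().sum())

encoded = pd.get_dummies(toy, columns=cats, dtype=int)
print(expected, encoded.shape[1])  # 6 6
```

On the real data, the same expression with `datos` and `categoricas` can be compared against `datos_encoded.shape[1]`.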


```python
# customerID only labels each customer, so it is not a useful feature
datos = datos.drop(columns='customerID')
datos.head()
```

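| | Churn | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | PhoneService_bin | MultipleLines_bin | InternetService_bin | OnlineSecurity_bin | OnlineBackup_bin | DeviceProtection_bin | TechSupport_bin | StreamingTV_bin | StreamingMovies_bin | servicios_contratados |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Female | 0 | 1 | 1 | 9 | 1 | No | DSL | No | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 3 |
| 1 | 0 | Male | 0 | 0 | 0 | 9 | 1 | Yes | DSL | No | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 2 | 1 | Male | 0 | 0 | 0 | 4 | 1 | No | Fiber optic | No | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 3 | 1 | Male | 1 | 1 | 0 | 13 | 1 | No | Fiber optic | No | ... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 4 |
| 4 | 1 | Female | 1 | 1 | 0 | 3 | 1 | No | Fiber optic | No | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |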
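
`customerID` is removed because it only labels each customer: a value that differs on every row gives a model nothing that generalizes to new customers. Before dropping such a column it can be worth confirming that it really behaves like an identifier. A minimal sketch with a hypothetical helper `is_identifier` (not part of pandas):

```python
import pandas as pd

def is_identifier(df: pd.DataFrame, col: str) -> bool:
    """Hypothetical helper: True when `col` is non-missing and distinct on every row."""
    return bool(df[col].notna().all() and df[col].is_unique)

# Hypothetical sample rows.
toy = pd.DataFrame({
    "customerID": ["0002-ORFBO", "0003-MKNFE", "0004-TLHLJ"],
    "gender": ["Female", "Male", "Male"],
})

print(is_identifier(toy, "customerID"))  # True
print(is_identifier(toy, "gender"))      # False
```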


# Encoding
> Converting categorical variables to numeric format with one-hot encoding (`pd.get_dummies`)
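
One-hot encoding replaces each categorical column with one 0/1 column per category, so a model does not read an artificial order into labels such as contract types. A small illustration with `pd.get_dummies` on a hypothetical `Contract` column:

```python
import pandas as pd

# Hypothetical values using the Contract categories seen in the data.
toy = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"]})

# Each distinct value becomes a column named "<column>_<value>"; dtype=int gives 0/1.
dummies = pd.get_dummies(toy, columns=["Contract"], dtype=int)
print(dummies)
```

Each row has exactly one 1, in the column matching its original category.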

```python
# Categorical (object-dtype) columns to one-hot encode
categoricas = [
    "gender",
    "MultipleLines",
    "InternetService",
    "OnlineSecurity",
    "OnlineBackup",
    "DeviceProtection",
    "TechSupport",
    "StreamingTV",
    "StreamingMovies",
    "Contract",
    "PaymentMethod",
]

# One 0/1 column per category; dtype=int yields 0/1 instead of True/False
datos_encoded = pd.get_dummies(data=datos, columns=categoricas, dtype=int)
datos_encoded.head()
```

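| | Churn | SeniorCitizen | Partner | Dependents | tenure | PhoneService | PaperlessBilling | Charges.Monthly | Charges.Total | Cuentas_Diarias | ... | StreamingMovies_No | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_Month-to-month | Contract_One year | Contract_Two year | PaymentMethod_Bank transfer (automatic) | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 9 | 1 | 1 | 65.6 | 593.3 | 2.19 | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 9 | 1 | 0 | 59.9 | 542.4 | 2.0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 0 | 0 | 0 | 4 | 1 | 1 | 73.9 | 280.85 | 2.46 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 1 | 1 | 1 | 0 | 13 | 1 | 1 | 98.0 | 1237.85 | 3.27 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 1 | 1 | 1 | 0 | 3 | 1 | 1 | 83.9 | 267.4 | 2.8 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |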
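
One-hot encoding can produce columns that repeat the same information. Presumably, the `No internet service` level of each add-on service (for example `StreamingMovies_No internet service`) is 1 exactly for the customers whose `InternetService` is `No`, which would make those dummies identical to each other. Exact duplicates add nothing to a model and show up as perfect correlations in the correlation analysis, so they can be detected and dropped. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical encoded data: three dummies with identical values, plus tenure.
toy = pd.DataFrame({
    "InternetService_No": [1, 0, 0, 1],
    "OnlineSecurity_No internet service": [1, 0, 0, 1],
    "StreamingMovies_No internet service": [1, 0, 0, 1],
    "tenure": [9, 4, 13, 3],
})

# Transpose so each column becomes a row; duplicated() marks rows equal to an earlier one.
dup_mask = toy.T.duplicated()
redundant = dup_mask[dup_mask].index.tolist()
print(redundant)

# Keep the first occurrence of each duplicated column.
deduplicated = toy.drop(columns=redundant)
```

Transposing is inexpensive at this size (about 7,000 rows by a few dozen columns).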
