# Customer Churn Prediction System

##### Problem statement

"Churn" refers to a customer who has decided to end their relationship with a company and no longer uses its products or services. The reasons range from defective products to inadequate after-sales service.
To counter customer churn, many companies are starting to predict it with machine learning and to take measures to reverse the trend.

Using Big Data technologies and distributed processing on a cloud computing cluster, this project aims to:
* Develop a churn prediction approach that identifies, ahead of time, the customers with a high probability of leaving the company.
  
##### About the dataset

The dataset is published by IBM and describes the customers of a telecommunications company. Each record holds the information of one subscriber, including:

- Tenure: Number of months the customer has been with the company.
- Churn: Indicates whether the customer has left the company (Yes) or not (No).
- Customer information (contracted services, gender, payment method, tenure, among others).

### Step 1: Create the Spark session object

In [0]:
# Load libraries
# SparkSession: entry point to the DataFrame API
# types: module containing the PySpark data type classes
from pyspark.sql import SparkSession  
from pyspark.sql.types import *

In [0]:
# Create the SparkSession object
spark = SparkSession.builder.appName('log_reg').getOrCreate()

### Step 2: Read the dataset

In [0]:
# df: Spark DataFrame created from the values in the sample data file
df = spark.read.csv('/FileStore/tables/WA_Fn_UseC__Telco_Customer_Churn-5.csv', inferSchema=True, header=True)

### Step 3: Exploratory Data Analysis

In [0]:
# df.count(): number of records
# len(df.columns): number of columns
print((df.count(), len(df.columns)))

(7043, 21)


In [0]:
# Show the DataFrame's columns and their types
df.printSchema()  

root
 |-- customerID: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- SeniorCitizen: integer (nullable = true)
 |-- Partner: string (nullable = true)
 |-- Dependents: string (nullable = true)
 |-- tenure: integer (nullable = true)
 |-- PhoneService: string (nullable = true)
 |-- MultipleLines: string (nullable = true)
 |-- InternetService: string (nullable = true)
 |-- OnlineSecurity: string (nullable = true)
 |-- OnlineBackup: string (nullable = true)
 |-- DeviceProtection: string (nullable = true)
 |-- TechSupport: string (nullable = true)
 |-- StreamingTV: string (nullable = true)
 |-- StreamingMovies: string (nullable = true)
 |-- Contract: string (nullable = true)
 |-- PaperlessBilling: string (nullable = true)
 |-- PaymentMethod: string (nullable = true)
 |-- MonthlyCharges: double (nullable = true)
 |-- TotalCharges: string (nullable = true)
 |-- Churn: string (nullable = true)



In [0]:
display(df.limit(20))

customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
9305-CDSKC,Female,0,No,No,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,Yes
1452-KIOVK,Male,0,No,Yes,22,Yes,Yes,Fiber optic,No,Yes,No,No,Yes,No,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No
6713-OKOMC,Female,0,No,No,10,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,No,Mailed check,29.75,301.9,No
7892-POOKP,Female,0,Yes,No,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes
6388-TABGU,Male,0,No,Yes,62,Yes,No,DSL,Yes,Yes,No,No,No,No,One year,No,Bank transfer (automatic),56.15,3487.95,No


In [0]:
# describe(): computes summary statistics (count, mean, stddev, min, max) for the DataFrame's columns.
# Columns with non-numeric text show no mean or stddev; their min and max are compared as text.
df.describe().display()

summary,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
count,7043,7043,7043.0,7043,7043,7043.0,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043.0,7043.0,7043
mean,,,0.1621468124378816,,,32.37114865824223,,,,,,,,,,,,,64.76169246059922,2283.3004408418697,
stddev,,,0.3686116056100135,,,24.55948102309444,,,,,,,,,,,,,30.09004709767848,2266.771361883145,
min,0002-ORFBO,Female,0.0,No,No,0.0,No,No,DSL,No,No,No,No,No,No,Month-to-month,No,Bank transfer (automatic),18.25,,No
max,9995-HOTOH,Male,1.0,Yes,Yes,72.0,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Mailed check,118.75,999.9,Yes
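
One detail in the summary above deserves a check: `TotalCharges` was inferred as a string column (see the schema), so its `min` and `max` are compared as text rather than as numbers, which is why `max` shows `999.9` and `min` is blank. The following is a minimal pure-Python sketch of that effect, not Spark code. `to_float_or_none` is a hypothetical helper, and the sample values are taken from the outputs above plus one blank entry assumed for illustration.

```python
def to_float_or_none(raw):
    """Parse a numeric string; return None for blank entries."""
    text = raw.strip()
    return float(text) if text else None

# Values as they would appear in a string-typed column (the blank is assumed).
total_charges = ["29.85", "1889.5", "108.15", "999.9", " "]

string_max = max(total_charges)  # text comparison, character by character
string_min = min(total_charges)

parsed = [to_float_or_none(value) for value in total_charges]
numeric_max = max(value for value in parsed if value is not None)

print(repr(string_min), string_max, numeric_max)
```

In PySpark, the analogous fix would be to cast the column to `double` before computing statistics; blank strings would then typically become null.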


In [0]:
df.groupBy('internetService').count().show()

+---------------+-----+
|internetService|count|
+---------------+-----+
|    Fiber optic| 3096|
|             No| 1526|
|            DSL| 2421|
+---------------+-----+



In [0]:
# groupBy(): splits the data into groups based on some criterion.
# mean(): computes the average of every numeric column within each group
df.groupBy('internetService').mean().show()

+---------------+-------------------+-----------------+-------------------+
|internetService| avg(SeniorCitizen)|      avg(tenure)|avg(MonthlyCharges)|
+---------------+-------------------+-----------------+-------------------+
|    Fiber optic|0.26841085271317827|32.91795865633075|  91.50012919896615|
|             No|0.03407601572739188|30.54718217562254| 21.079193971166454|
|            DSL|0.10698058653448989|32.82156133828996|  58.10216852540261|
+---------------+-------------------+-----------------+-------------------+
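
As a sanity check on the grouped averages, the overall mean of `MonthlyCharges` reported by `describe()` (about 64.76) should equal the average of the group means weighted by group size. The sketch below redoes that arithmetic in plain Python with the numbers printed above; the dictionary layout is only an illustration.

```python
# (row count, avg MonthlyCharges) per internetService group, copied from the outputs above
groups = {
    "Fiber optic": (3096, 91.50012919896615),
    "No": (1526, 21.079193971166454),
    "DSL": (2421, 58.10216852540261),
}

total_rows = sum(count for count, _ in groups.values())
weighted_mean = sum(count * mean for count, mean in groups.values()) / total_rows

print(total_rows, round(weighted_mean, 4))
```

Note that `TotalCharges` is absent from the grouped output because it is a string column, so `mean()` skips it.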



In [0]:
# display(): renders the result as a table
display(df.groupBy("gender").count())

gender,count
Female,3488
Male,3555


In [0]:
df.groupBy("Churn").count().show()

+-----+-----+
|Churn|count|
+-----+-----+
|   No| 5174|
|  Yes| 1869|
+-----+-----+



In [0]:
df.filter(df["Churn"]=="Yes").groupBy('gender').count().display()

gender,count
Female,939
Male,930
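
Raw counts are easier to compare as rates. The sketch below combines the gender totals with the churned counts shown above to get a churn rate per gender and overall. It is plain Python arithmetic on the printed numbers, and the variable names are illustrative.

```python
customers_by_gender = {"Female": 3488, "Male": 3555}  # from the gender count
churned_by_gender = {"Female": 939, "Male": 930}      # from the Churn == "Yes" filter

churn_rate = {
    gender: churned_by_gender[gender] / total
    for gender, total in customers_by_gender.items()
}
overall_rate = sum(churned_by_gender.values()) / sum(customers_by_gender.values())

for gender, rate in churn_rate.items():
    print(f"{gender}: {rate:.1%}")
print(f"Overall: {overall_rate:.1%}")
```

Both genders churn at roughly 26 to 27 percent, so gender alone does not appear to separate churners from non-churners in this dataset.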



In [0]:
# Customers who churn versus those who do not
df.groupBy('churn').count().display()

churn,count
No,5174
Yes,1869


In [0]:
display(df.groupBy('churn').mean())

churn,avg(SeniorCitizen),avg(tenure),avg(MonthlyCharges)
No,0.1287205257054503,37.56996521066873,61.2651236953999
Yes,0.2546816479400749,17.979133226324237,74.4413322632423


In [0]:
display(df.select('internetService', 'gender').groupBy('internetService').count())

internetService,count
Fiber optic,3096
No,1526
DSL,2421


In [0]:
df.groupBy('contract').count().display()

contract,count
Month-to-month,3875
One year,1473
Two year,1695


In [0]:
df.groupBy('churn').mean().display()

churn,avg(SeniorCitizen),avg(tenure),avg(MonthlyCharges)
No,0.1287205257054503,37.56996521066873,61.2651236953999
Yes,0.2546816479400749,17.979133226324237,74.4413322632423


In [0]:
df.groupBy('contract').mean().display()

contract,avg(SeniorCitizen),avg(tenure),avg(MonthlyCharges)
Month-to-month,0.208258064516129,18.036645161290323,66.39849032258037
One year,0.1289884589273591,42.044806517311606,65.04860828241674
Two year,0.0855457227138643,56.73510324483776,60.770412979351


In [0]:
display(df.groupBy("partner").count())

partner,count
No,3641
Yes,3402


### Step 4: Feature Engineering

In [0]:
from pyspark.ml.feature import StringIndexer, IndexToString
from pyspark.ml.feature import VectorAssembler

In [0]:
# First, map the following string columns to numeric label indices
churn_index = StringIndexer(inputCol='Churn', outputCol='Churn_Num').fit(df)
Partner_index = StringIndexer(inputCol='Partner', outputCol='Partner_Num', handleInvalid='skip').fit(df)
Dependents_index = StringIndexer(inputCol='Dependents', outputCol='Dependents_Num', handleInvalid='skip').fit(df)
PaperlessBilling_index = StringIndexer(inputCol='PaperlessBilling', outputCol='PaperlessBilling_Num', handleInvalid='skip').fit(df)
PhoneService_index = StringIndexer(inputCol='PhoneService', outputCol='PhoneService_Num', handleInvalid='skip').fit(df)
OnlineSecurity_index = StringIndexer(inputCol='OnlineSecurity', outputCol='OnlineSecurity_Num', handleInvalid='skip').fit(df)
OnlineBackup_index = StringIndexer(inputCol='OnlineBackup', outputCol='OnlineBackup_Num', handleInvalid='skip').fit(df)
DeviceProtection_index = StringIndexer(inputCol='DeviceProtection', outputCol='DeviceProtection_Num', handleInvalid='skip').fit(df)
TechSupport_index = StringIndexer(inputCol='TechSupport', outputCol='TechSupport_Num', handleInvalid='skip').fit(df)
StreamingTV_index = StringIndexer(inputCol='StreamingTV', outputCol='StreamingTV_Num', handleInvalid='skip').fit(df)
StreamingMovies_index = StringIndexer(inputCol='StreamingMovies', outputCol='StreamingMovies_Num', handleInvalid='skip').fit(df)

In [0]:
df = churn_index.transform(df)
df = Partner_index.transform(df)
df = Dependents_index.transform(df)
df = PaperlessBilling_index.transform(df)
df = PhoneService_index.transform(df)
df = OnlineSecurity_index.transform(df)
df = OnlineBackup_index.transform(df)
df = DeviceProtection_index.transform(df)
df = TechSupport_index.transform(df)
df = StreamingTV_index.transform(df)
df = StreamingMovies_index.transform(df)

In [0]:
display(df.limit(20))

customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,Churn_Num,Partner_Num,Dependents_Num,PaperlessBilling_Num,PhoneService_Num,OnlineSecurity_Num,OnlineBackup_Num,DeviceProtection_Num,TechSupport_Num,StreamingTV_Num,StreamingMovies_Num
7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0
9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9305-CDSKC,Female,0,No,No,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,Yes,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0
1452-KIOVK,Male,0,No,Yes,22,Yes,Yes,Fiber optic,No,Yes,No,No,Yes,No,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
6713-OKOMC,Female,0,No,No,10,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,No,Mailed check,29.75,301.9,No,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
7892-POOKP,Female,0,Yes,No,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0
6388-TABGU,Male,0,No,Yes,62,Yes,No,DSL,Yes,Yes,No,No,No,No,One year,No,Bank transfer (automatic),56.15,3487.95,No,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0


In [0]:
df.describe()

Out[111]: DataFrame[summary: string, customerID: string, gender: string, SeniorCitizen: string, Partner: string, Dependents: string, tenure: string, PhoneService: string, MultipleLines: string, InternetService: string, OnlineSecurity: string, OnlineBackup: string, DeviceProtection: string, TechSupport: string, StreamingTV: string, StreamingMovies: string, Contract: string, PaperlessBilling: string, PaymentMethod: string, MonthlyCharges: string, TotalCharges: string, Churn: string, Churn_Num: string, Partner_Num: string, Dependents_Num: string, PaperlessBilling_Num: string, PhoneService_Num: string, OnlineSecurity_Num: string, OnlineBackup_Num: string, DeviceProtection_Num: string, TechSupport_Num: string, StreamingTV_Num: string, StreamingMovies_Num: string]

In [0]:
df.groupBy("Churn").count().show()

+-----+-----+
|Churn|count|
+-----+-----+
|   No| 5174|
|  Yes| 1869|
+-----+-----+



In [0]:
df.groupBy("Churn_Num").count().show()

+---------+-----+
|Churn_Num|count|
+---------+-----+
|      0.0| 5174|
|      1.0| 1869|
+---------+-----+
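
The two counts above show that `No` (5174 rows) became `0.0` and `Yes` (1869 rows) became `1.0`. This matches `StringIndexer`'s default ordering, which assigns index 0 to the most frequent label. The sketch below imitates that ordering in plain Python to make the mapping explicit. `fit_string_index` is a hypothetical helper, not part of PySpark, and the alphabetical tie-break is an assumption about the default behaviour.

```python
from collections import Counter

def fit_string_index(values):
    """Map each label to a float index, most frequent label first."""
    counts = Counter(values)
    # Descending frequency; ties broken alphabetically (assumed).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(position) for position, label in enumerate(ordered)}

# Label frequencies taken from the outputs in this notebook.
churn_mapping = fit_string_index(["No"] * 5174 + ["Yes"] * 1869)
phone_mapping = fit_string_index(["Yes"] * 6361 + ["No"] * 682)

print(churn_mapping)
print(phone_mapping)
```

One consequence is that the same label can get different indices in different columns: `Yes` maps to `1.0` for `Churn` but to `0.0` for `PhoneService`, where it is the majority value.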



In [0]:
from pyspark.ml.feature import OneHotEncoder

In [0]:
partner_encoder = OneHotEncoder(inputCol='Partner_Num', outputCol='Partner_Vector').fit(df)
Dependents_encoder = OneHotEncoder(inputCol='Dependents_Num', outputCol='Dependents_Vector').fit(df)
PaperlessBilling_encoder = OneHotEncoder(inputCol='PaperlessBilling_Num', outputCol='PaperlessBilling_Vector').fit(df)
PhoneService_encoder = OneHotEncoder(inputCol='PhoneService_Num', outputCol='PhoneService_Vector').fit(df)
OnlineSecurity_encoder = OneHotEncoder(inputCol='OnlineSecurity_Num', outputCol='OnlineSecurity_Vector').fit(df)
OnlineBackup_encoder = OneHotEncoder(inputCol='OnlineBackup_Num', outputCol='OnlineBackup_Vector').fit(df)
DeviceProtection_encoder = OneHotEncoder(inputCol='DeviceProtection_Num', outputCol='DeviceProtection_Vector').fit(df)
TechSupport_encoder = OneHotEncoder(inputCol='TechSupport_Num', outputCol='TechSupport_Vector').fit(df)
StreamingTV_encoder = OneHotEncoder(inputCol='StreamingTV_Num', outputCol='StreamingTV_Vector').fit(df)
StreamingMovies_encoder = OneHotEncoder(inputCol='StreamingMovies_Num', outputCol='StreamingMovies_Vector').fit(df)

In [0]:
df = partner_encoder.transform(df)
df = Dependents_encoder.transform(df)
df = PaperlessBilling_encoder.transform(df)
df = PhoneService_encoder.transform(df)
df = OnlineSecurity_encoder.transform(df)
df = OnlineBackup_encoder.transform(df)
df = DeviceProtection_encoder.transform(df)
df = TechSupport_encoder.transform(df)
df = StreamingTV_encoder.transform(df)
df = StreamingMovies_encoder.transform(df)

In [0]:
df.groupBy('PhoneService_Vector').count().show(5,False)

+-------------------+-----+
|PhoneService_Vector|count|
+-------------------+-----+
|(1,[0],[1.0])      |6361 |
|(1,[],[])          |682  |
+-------------------+-----+
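
The output above uses Spark's sparse-vector notation `(size, [indices], [values])`. Because `OneHotEncoder` drops the last category by default, a two-category column yields a vector of size 1: index 0 becomes `(1,[0],[1.0])` and index 1 becomes the all-zero vector `(1,[],[])`. The sketch below reproduces that rule in plain Python. `one_hot_drop_last` is a hypothetical helper written for illustration, not a PySpark function.

```python
def one_hot_drop_last(index, num_categories):
    """Return (size, indices, values) in sparse-vector notation.

    The vector has num_categories - 1 slots, and the last category
    index is encoded as all zeros.
    """
    size = num_categories - 1
    if index < size:
        return (size, [index], [1.0])
    return (size, [], [])

print(one_hot_drop_last(0, 2))  # majority PhoneService value (6361 rows)
print(one_hot_drop_last(1, 2))  # minority PhoneService value (682 rows)
```

Dropping one category avoids a redundant column, since the last category is implied when every other slot is zero.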



In [0]:
df_assembler = VectorAssembler(inputCols = ["Partner_Vector", "Dependents_Vector", 
                                            "PaperlessBilling_Vector", "PhoneService_Vector", "OnlineSecurity_Vector", 
                                            "OnlineBackup_Vector", "DeviceProtection_Vector", "TechSupport_Vector",
                                            "StreamingTV_Vector", "StreamingMovies_Vector"], 
                               
                               outputCol="features")
df = df_assembler.transform(df)
df.printSchema()


root
 |-- customerID: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- SeniorCitizen: integer (nullable = true)
 |-- Partner: string (nullable = true)
 |-- Dependents: string (nullable = true)
 |-- tenure: integer (nullable = true)
 |-- PhoneService: string (nullable = true)
 |-- MultipleLines: string (nullable = true)
 |-- InternetService: string (nullable = true)
 |-- OnlineSecurity: string (nullable = true)
 |-- OnlineBackup: string (nullable = true)
 |-- DeviceProtection: string (nullable = true)
 |-- TechSupport: string (nullable = true)
 |-- StreamingTV: string (nullable = true)
 |-- StreamingMovies: string (nullable = true)
 |-- Contract: string (nullable = true)
 |-- PaperlessBilling: string (nullable = true)
 |-- PaymentMethod: string (nullable = true)
 |-- MonthlyCharges: double (nullable = true)
 |-- TotalCharges: string (nullable = true)
 |-- Churn: string (nullable = true)
 |-- Churn_Num: double (nullable = false)
 |-- Partner_Num: double (nullable = fal

In [0]:
df.select(['features', "Churn_Num"]).show(10,False)

+-----------------------------------------------------------------+---------+
|features                                                         |Churn_Num|
+-----------------------------------------------------------------+---------+
|(16,[1,2,4,7,8,10,12,14],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])      |0.0      |
|(16,[0,1,3,5,6,9,10,12,14],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])|0.0      |
|[1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0]|1.0      |
|(16,[0,1,5,6,9,11,12,14],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])      |0.0      |
|[1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0]|1.0      |
|[1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0]|1.0      |
|(16,[0,2,3,4,7,8,10,13,14],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])|0.0      |
|(16,[0,1,5,6,8,10,12,14],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])      |0.0      |
|(16,[1,2,3,4,6,9,11,13,15],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])|1.0      |
|(16,[0,3,5,7,8,10,12,14],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])    

In [0]:
model_df = df.select(['features', "Churn_Num"])
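
The `features` column printed earlier mixes two notations: sparse `(16,[indices],[values])` and dense `[v0, v1, ...]`. Both describe a 16-slot vector, and Spark seems to print whichever form is more compact for a given row. The width of 16 is consistent with four two-category columns contributing one slot each and six columns contributing two slots each, which suggests those six have three categories. The sketch below expands the first row into dense form; `to_dense` is a hypothetical helper.

```python
def to_dense(size, indices, values):
    """Expand sparse (size, indices, values) notation into a plain list."""
    dense = [0.0] * size
    for position, value in zip(indices, values):
        dense[position] = value
    return dense

# First row of the features output, in sparse notation.
first_row = (16, [1, 2, 4, 7, 8, 10, 12, 14], [1.0] * 8)
dense_first_row = to_dense(*first_row)

# Third row of the same output, already printed in dense form.
third_row = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
             1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

print(dense_first_row)
print(len(third_row), int(sum(third_row)))
```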

### Step 5: Split the dataset

In [0]:
training_df, test_df = model_df.randomSplit([0.85,0.15])
print(training_df.count())

5971


In [0]:
training_df.groupBy('Churn_Num').count().show()

+---------+-----+
|Churn_Num|count|
+---------+-----+
|      0.0| 4402|
|      1.0| 1569|
+---------+-----+



In [0]:
print(test_df.count())

1072


In [0]:
test_df.groupBy('Churn_Num').count().show()

+---------+-----+
|Churn_Num|count|
+---------+-----+
|      0.0|  772|
|      1.0|  300|
+---------+-----+
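
The split sizes (5971 and 1072) add up to 7043, but the training share is about 84.8% rather than exactly 85%, because `randomSplit` assigns rows at random. It also takes an optional `seed` argument; without one, the counts above will change from run to run. The sketch below illustrates both points in plain Python. It is not Spark's implementation, and `random_split` is a hypothetical helper.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to a split with an independent random draw."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative upper bounds of each split, e.g. [0.85, 1.0].
    bounds, running = [], 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for position, bound in enumerate(bounds):
            if draw < bound or position == len(bounds) - 1:
                splits[position].append(row)
                break
    return splits

rows = list(range(7043))
train_a, test_a = random_split(rows, [0.85, 0.15], seed=42)
train_b, test_b = random_split(rows, [0.85, 0.15], seed=42)

print(len(train_a), len(test_a), train_a == train_b)
```

Fixing the seed makes the split repeatable, which helps when comparing models across runs.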



### Step 6: Build and Train the Logistic Regression Model

In [0]:
# LogisticRegression supports both binomial and multinomial logistic regression.
from pyspark.ml.classification import LogisticRegression

In [0]:
log_reg = LogisticRegression(labelCol='Churn_Num').fit(training_df)

#### Training results

In [0]:
train_results = log_reg.evaluate(training_df).predictions

In [0]:
train_results.filter(train_results['Churn_Num'] == 1).filter(train_results['prediction']==1).select(['Churn_Num','prediction','probability']).show(10,False)

+---------+----------+---------------------------------------+
|Churn_Num|prediction|probability                            |
+---------+----------+---------------------------------------+
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
|1.0      |1.0       |[0.4593646814312283,0.5406353185687717]|
+---------+----------+---------------------------------------+
only showing top 10 rows
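
Every row above has the same probability vector, which is expected when the rows share identical feature vectors, since all features here are binary indicators. The `probability` column holds `[P(class 0), P(class 1)]`, and with the default threshold of 0.5 a row is predicted as churn when `P(class 1)` exceeds it. The sketch below shows the sigmoid and the thresholding step in plain Python. `sigmoid` and `predict` are illustrative helpers, not PySpark functions.

```python
import math

def sigmoid(z):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(probability, threshold=0.5):
    """probability is [P(class 0), P(class 1)]."""
    return 1.0 if probability[1] > threshold else 0.0

probability = [0.4593646814312283, 0.5406353185687717]  # from the table above
log_odds = math.log(probability[1] / probability[0])

print(predict(probability))
print(predict(probability, threshold=0.6))
print(round(log_odds, 4))
```

A probability of about 0.54 is only just above the threshold, so these rows would flip to "no churn" under a slightly stricter cutoff.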



### Step 7: Evaluate the logistic regression model on the test data

In [0]:
results = log_reg.evaluate(test_df).predictions
results.printSchema()

root
 |-- features: vector (nullable = true)
 |-- Churn_Num: double (nullable = false)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = false)



In [0]:
results.select(['Churn_Num','prediction']).show(10,False)

+---------+----------+
|Churn_Num|prediction|
+---------+----------+
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
|0.0      |0.0       |
+---------+----------+
only showing top 10 rows



##### Confusion matrix

In [0]:
verdadero_positivo = results[(results.Churn_Num == 1) & (results.prediction == 1)].count()
verdadero_negativo = results[(results.Churn_Num == 0) & (results.prediction == 0)].count()
falso_positivo = results[(results.Churn_Num == 0) & (results.prediction == 1)].count()
falso_negativo = results[(results.Churn_Num == 1) & (results.prediction == 0)].count()
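
The four filters above each run a separate count over `results`. As a sketch of the same bookkeeping in a single pass, the snippet below tallies `(label, prediction)` pairs with a `Counter`. It is plain Python on a small made-up list of pairs, not the notebook's data, and `confusion_counts` is a hypothetical helper.

```python
from collections import Counter

def confusion_counts(pairs):
    """Tally (label, prediction) pairs into the four confusion-matrix cells."""
    tally = Counter(pairs)
    return {
        "verdadero_positivo": tally[(1, 1)],
        "verdadero_negativo": tally[(0, 0)],
        "falso_positivo": tally[(0, 1)],
        "falso_negativo": tally[(1, 0)],
    }

# Toy (Churn_Num, prediction) pairs, invented for illustration only.
pairs = [(1, 1), (0, 0), (0, 0), (0, 1), (1, 0), (0, 0)]
cells = confusion_counts(pairs)
print(cells)
```

In PySpark, grouping `results` by `Churn_Num` and `prediction` and counting would likely give the same four cells in one job.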

##### Accuracy

In [0]:
accuracy = float((verdadero_positivo + verdadero_negativo) / (results.count()))
print(accuracy)

0.7565298507462687


##### Recall

In [0]:
recall = float(verdadero_positivo)/(verdadero_positivo + falso_negativo)
print(recall)

0.3433333333333333


##### Precision

In [0]:
precision = float(verdadero_positivo) / (verdadero_positivo + falso_positivo)
print(precision)

0.6167664670658682
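
The notebook never prints the four confusion-matrix counts, but they can be recovered from the reported recall and precision together with the test-set class counts (300 churners, 772 non-churners). The sketch below does that arithmetic, then adds an F1 score and a majority-class baseline for context. These extra figures are derived here from the printed outputs and are not part of the original analysis.

```python
positives, negatives = 300, 772   # test_df class counts shown earlier
recall = 0.3433333333333333       # printed above
precision = 0.6167664670658682    # printed above

tp = round(recall * positives)    # true positives: 103
fn = positives - tp               # false negatives: 197
fp = round(tp / precision) - tp   # false positives: 64
tn = negatives - fp               # true negatives: 708

accuracy = (tp + tn) / (positives + negatives)
f1 = 2 * tp / (2 * tp + fp + fn)
baseline_accuracy = negatives / (positives + negatives)  # always predict "No"

print(tp, fp, fn, tn)
print(round(accuracy, 4), round(f1, 4), round(baseline_accuracy, 4))
```

The recovered counts reproduce the reported accuracy of about 0.7565. Always predicting "No" would already score about 0.72 on this test set, so the low recall (roughly one churner in three detected) is the more informative figure here.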
