<a href="https://colab.research.google.com/github/awildt01/Predicting-Churn/blob/main/churn_de.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Churn Prediction

*Churn rate*, or simply *churn*, is the rate at which a company loses customers from its customer base. For services such as Spotify or Netflix, it corresponds to the subscription cancellation rate.

<p align=center>
<img src="https://raw.githubusercontent.com/carlosfab/dsnp2/master/img/churnrate.jpg" width="60%"></p>

It is extremely important to management, and tracking it over time can reveal a problem that needs to be addressed.

*Churn* analysis can also be used to identify likely cancellations ahead of time and to launch targeted actions to retain those customers. This metric deserves attention because the Customer Acquisition Cost (CAC) is normally higher than the cost of retaining existing customers. In other words, a high *churn rate* is undesirable.
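As a small illustration of the metric described above, the sketch below computes a churn rate as the share of customers flagged as churned in a period. The helper name `churn_rate` and the toy data are hypothetical and only assume a column of "Yes"/"No" flags; this is not necessarily how the rate is computed elsewhere in this notebook.

```python
import pandas as pd


def churn_rate(churn_flags: pd.Series) -> float:
    """Share of customers flagged as churned ("Yes") in the given period."""
    return float((churn_flags == "Yes").mean())


# Hypothetical toy data: 2 of 8 customers cancelled in the period.
toy = pd.Series(["No", "Yes", "No", "No", "Yes", "No", "No", "No"])
rate = churn_rate(toy)
print(f"Churn rate: {rate:.1%}")  # Churn rate: 25.0%
```

Tracking this value per month or per quarter is one simple way to see whether churn is trending upward.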

## Data Acquisition

The Telco customer churn data contains information about a fictional telecommunications company that provided home phone and Internet services to 7043 customers in California in the third quarter. It indicates which customers have left, stayed, or signed up for the service. Several important attributes are included for each customer, along with a satisfaction score, a churn score, and a Customer Lifetime Value (CLTV) index.

* *Source: [this link](https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113).*



### The Challenge of the Analysis

Analyzing a new disease, such as a global pandemic, is extremely challenging. The lack of prior knowledge about the disease makes it harder to understand how it spreads and how it affects public health. In addition, the data provided by federal and state governments and by unofficial sources often conflict. Governments can have political and economic reasons for presenting data in a particular way, which leads to discrepancies in the official statistics. Differences in data collection methods and in case definitions also contribute to these discrepancies. In this situation, it is essential for analysts and researchers to use rigorous methods, reliable sources, and transparent approaches when analyzing data on new diseases. International cooperation, transparency in data collection processes, and effective communication are essential for understanding a new disease and responding to it appropriately.

### Variable Dictionary
* *Variable dictionary.*
    * *The column descriptions are available at [this link](https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113).*


**customerID** **->** Customer ID.

**gender** **->** Whether the customer is a male or a female.

**SeniorCitizen** **->** Indicates if the customer is 65 or older (1, 0).

**Partner** **->** Whether the customer has a partner or not (Yes, No).

**Dependents** **->** Indicates if the customer lives with any dependents: Yes, No. Dependents could be children, parents, grandparents, etc.

**tenure** **->** Number of months the customer has stayed with the company.

**PhoneService**  **->** Whether the customer has a phone service or not (Yes, No).

**MultipleLines** **->** Indicates if the customer subscribes to multiple telephone lines with the company: (Yes, No, No phone service).

**InternetService**  **->** Indicates if the customer subscribes to Internet service with the company: No, DSL, Fiber Optic, Cable.

**OnlineSecurity**  **->** Indicates if the customer subscribes to an additional online security service provided by the company: Yes, No.

**OnlineBackup**  **->** Indicates if the customer subscribes to an additional online backup service provided by the company: Yes, No.

**DeviceProtection**  **->** Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: Yes, No.

**TechSupport**  **->** Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: Yes, No.

**StreamingTV**  **->** Indicates if the customer uses their Internet service to stream television programming from a third party provider: Yes, No. The company does not charge an additional fee for this service.

**StreamingMovies**  **->** Indicates if the customer uses their Internet service to stream movies from a third party provider: Yes, No. The company does not charge an additional fee for this service.

**Contract**  **->** Indicates the customer’s current contract type: Month-to-Month, One Year, Two Year.

**PaperlessBilling**  **->** Indicates if the customer has chosen paperless billing: Yes, No.

**PaymentMethod**  **->** Indicates how the customer pays their bill: Bank Withdrawal, Credit Card, Mailed Check.

**MonthlyCharges**  **->** Indicates the customer’s current total monthly charge for all their services from the company.

**TotalCharges**  **->** Indicates the customer’s total charges, calculated to the end of the quarter specified above.

**Churn**  **->** Indicates whether the customer churned: Yes, No.
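Because the dictionary above lists expected categories for each column, it can be useful to compare them with the values that actually occur in the data. The sketch below is one way to do that; the helper `observed_categories` and the small `sample` frame are hypothetical stand-ins, and the same call could be applied to the full DataFrame once it is loaded.

```python
import pandas as pd


def observed_categories(frame: pd.DataFrame, columns: list) -> dict:
    """Return the sorted distinct non-null values found in each listed column."""
    return {col: sorted(frame[col].dropna().unique()) for col in columns}


# Hypothetical sample; replace with the real DataFrame when available.
sample = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Mailed check"],
})

for col, values in observed_categories(sample, ["Contract", "PaymentMethod"]).items():
    print(f"{col}: {values}")
```

Any value that appears in the data but not in the dictionary (or the other way around) is worth noting before encoding the categorical features.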






In [7]:
# import the required packages
# !pip install bar_chart_race -q
# !pip install plotly

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import bokeh.io
import bokeh.plotting
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show
from bokeh.transform import factor_cmap
from bokeh.models import Range1d
from bokeh.models import LabelSet

# do not limit the number of columns displayed
pd.set_option('display.max_columns', None)

# renderers
import plotly.io as pio
pio.renderers.default = 'colab'

sns.set_style()
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

# load the data
DATA_PATH = "https://raw.githubusercontent.com/carlosfab/dsnp2/master/datasets/WA_Fn-UseC_-Telco-Customer-Churn.csv"
df = pd.read_csv(DATA_PATH)

In [8]:
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [9]:
print("Rows:\t\t{}".format(df.shape[0]))
print("Columns:\t{}".format(df.shape[1]))

Rows:		7043
Columns:	21
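After checking the shape, a common next step is to look at column types and missing values. The sketch below uses a hypothetical toy frame (not the real data) to show how a numeric column stored as text can be converted with `pd.to_numeric(..., errors="coerce")`, which turns unparseable entries into `NaN`. Whether any column in this dataset needs this treatment is an assumption that should be verified with `df.dtypes`.

```python
import pandas as pd

# Hypothetical toy frame: "TotalCharges" arrives as text and one entry is blank.
toy = pd.DataFrame({
    "tenure": [1, 34, 0],
    "TotalCharges": ["29.85", "1889.5", " "],
})

# Convert text to numbers; entries that cannot be parsed become NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Count missing values per column after the conversion.
missing_per_column = toy.isna().sum()

print(toy.dtypes)
print(missing_per_column)
```

Rows with missing values can then be dropped or imputed, depending on how many there are and what they represent.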
