# Data Analysis with Python

### Challenge:

You work at a telecom company that has customers across several different services, the main ones being internet and telephone.

The problem is that, analyzing the customer history of the last few years, you noticed that the company has a churn of more than 26% of its customers.

This represents a loss of millions for the company.

What does the company need to do to solve this?

Database: https://drive.google.com/drive/folders/1T7D0BlWkNuy_MDpUHuBG44kT80EmRYIs?usp=sharing <br>
Original Kaggle link: https://www.kaggle.com/radmirzosimov/telecom-users-dataset

In [24]:
# Planning the steps is always the first step in any project.
"""
pandas
numpy
openpyxl

These are the libraries needed for the data work.
"""
# Step 1: import the database

import pandas as pd  # data-handling library

# When the file sits in the same directory as this notebook, the file name alone is enough; no full path is needed.
table = pd.read_csv("telecom_users.csv")

# Unhelpful information only gets in the way, so columns that add nothing can be removed.

# Step 2: visualize the database
# Look at the table to understand what information is available
# and to find the problems in the data.
# "Churn" is the term for a customer cancelling.

# display(table)

# Remove unwanted columns
# axis=0 -> rows
# axis=1 -> columns
table = table.drop("Unnamed: 0", axis=1)  # drop() takes an axis because it can remove rows as well as columns

display(table)





| Unnamed: 0 | IDCliente | Genero | Aposentado | Casado | Dependentes | MesesComoCliente | ServicoTelefone | MultiplasLinhas | ServicoInternet | ServicoSegurancaOnline | ... | ServicoSuporteTecnico | ServicoStreamingTV | ServicoFilmes | TipoContrato | FaturaDigital | FormaPagamento | ValorMensal | TotalGasto | Churn | Codigo |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 7010-BRBUU | Masculino | 0 | Sim | Sim | 72 | Sim | Sim | Nao | SemInternet | ... | SemInternet | SemInternet | SemInternet | 2 anos | Nao | CartaoCredito | 24.10 | 1734.65 | Nao | |
| 1 | 9688-YGXVR | Feminino | 0 | Nao | Nao | 44 | Sim | Nao | Fibra | Nao | ... | Nao | Sim | Nao | Mensal | Sim | CartaoCredito | 88.15 | 3973.2 | Nao | |
| 2 | 9286-DOJGF | Feminino | 1 | Sim | Nao | 38 | Sim | Sim | Fibra | Nao | ... | Nao | Nao | Nao | Mensal | Sim | DebitoAutomatico | 74.95 | 2869.85 | Sim | |
| 3 | 6994-KERXL | Masculino | 0 | Nao | Nao | 4 | Sim | Nao | DSL | Nao | ... | Nao | Nao | Sim | Mensal | Sim | BoletoEletronico | 55.90 | 238.5 | Nao | |
| 4 | 2181-UAESM | Masculino | 0 | Nao | Nao | 2 | Sim | Nao | DSL | Sim | ... | Nao | Nao | Nao | Mensal | Nao | BoletoEletronico | 53.45 | 119.5 | Nao | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5981 | 0684-AOSIH | Masculino | 0 | Sim | Nao | 1 | Sim | Nao | Fibra | Sim | ... | Nao | Sim | Sim | Mensal | Sim | BoletoEletronico | 95.00 | 95 | Sim | |
| 5982 | 5982-PSMKW | Feminino | 0 | Sim | Sim | 23 | Sim | Sim | DSL | Sim | ... | Sim | Sim | Sim | 2 anos | Sim | CartaoCredito | 91.10 | 2198.3 | Nao | |
| 5983 | 8044-BGWPI | Masculino | 0 | Sim | Sim | 12 | Sim | Nao | Nao | SemInternet | ... | SemInternet | SemInternet | SemInternet | Mensal | Sim | BoletoEletronico | 21.15 | 306.05 | Nao | |
| 5984 | 7450-NWRTR | Masculino | 1 | Nao | Nao | 12 | Sim | Sim | Fibra | Nao | ... | Nao | Sim | Sim | Mensal | Sim | BoletoEletronico | 99.45 | 1200.15 | Sim | |
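
The load-and-drop step can be tried without the real file. The sketch below is only an illustration: it assumes a tiny made-up CSV held in memory (the three rows, the IDs, and the name `sample` are invented for the example and are not taken from `telecom_users.csv`).

```python
import io

import pandas as pd

# Hypothetical miniature of the dataset: a leftover index column
# ("Unnamed: 0") plus three real columns. All values are made up.
csv_text = """Unnamed: 0,IDCliente,MesesComoCliente,Churn
0,AAAA-00001,72,Nao
1,AAAA-00002,4,Sim
2,AAAA-00003,12,Nao
"""

sample = pd.read_csv(io.StringIO(csv_text))

# axis=1 targets columns; axis=0 would target rows by index label.
sample = sample.drop("Unnamed: 0", axis=1)

# An equivalent spelling that some find easier to read:
# sample = sample.drop(columns=["Unnamed: 0"])

print(sample.columns.tolist())
print(sample.shape)
```

Passing `columns=[...]` instead of `axis=1` does the same thing and avoids having to remember which axis number means what.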


In [6]:
# Step 3: data treatment (solving the problems found)
# Check that each column has the right data type; this is the first problem to fix.
# Fix "TotalGasto" ("total spent" in Portuguese).
# A column is selected with the table variable followed by the column name in [].
# The pd.to_* functions (here pd.to_numeric) convert the value type;
# errors="coerce" turns values that cannot be converted into NaN.
table["TotalGasto"] = pd.to_numeric(table["TotalGasto"], errors="coerce")

# Missing information can be handled in a few ways: replace the values,
# fill them with the mean, or drop them.
# Here the empty values show up in only about 12 rows, which would not affect the final result, so those rows are simply dropped.

# Empty columns
# dropna() removes NaN values along rows or columns.
# how="all" drops only the columns (or rows) in which every value is empty.
table = table.dropna(how="all", axis=1)

# Empty rows
# how="any" drops every row (or column) that has at least one empty value.
table = table.dropna(how="any", axis=0)

print(table.info())




<class 'pandas.core.frame.DataFrame'>
Index: 5974 entries, 0 to 5985
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   IDCliente               5974 non-null   object 
 1   Genero                  5974 non-null   object 
 2   Aposentado              5974 non-null   int64  
 3   Casado                  5974 non-null   object 
 4   Dependentes             5974 non-null   object 
 5   MesesComoCliente        5974 non-null   int64  
 6   ServicoTelefone         5974 non-null   object 
 7   MultiplasLinhas         5974 non-null   object 
 8   ServicoInternet         5974 non-null   object 
 9   ServicoSegurancaOnline  5974 non-null   object 
 10  ServicoBackupOnline     5974 non-null   object 
 11  ProtecaoEquipamento     5974 non-null   object 
 12  ServicoSuporteTecnico   5974 non-null   object 
 13  ServicoStreamingTV      5974 non-null   object 
 14  ServicoFilmes           5974 non-null   object
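
To see what the treatment step does to individual values, here is a minimal sketch on invented data. The frame `sample`, its four rows, and the variables `dropped` and `filled` are assumptions made for the illustration. The notebook itself only drops rows; the mean fill is shown as the alternative mentioned in the comments, not as what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "TotalGasto" arrives as text, one value is blank,
# and "Codigo" is a column with no data at all.
sample = pd.DataFrame(
    {
        "IDCliente": ["A", "B", "C", "D"],
        "TotalGasto": ["100.5", " ", "300", "50.25"],
        "Codigo": [np.nan, np.nan, np.nan, np.nan],
    }
)

# Text that cannot be parsed as a number becomes NaN instead of raising.
sample["TotalGasto"] = pd.to_numeric(sample["TotalGasto"], errors="coerce")

# Drop columns in which every value is missing ("Codigo" here).
sample = sample.dropna(how="all", axis=1)

# Option 1: drop rows that have at least one missing value.
dropped = sample.dropna(how="any", axis=0)

# Option 2: keep every row and fill the gap with the column mean.
filled = sample.fillna({"TotalGasto": sample["TotalGasto"].mean()})

print(dropped)
print(filled)
```

Dropping is the simpler choice when only a handful of rows are affected; filling keeps the row count intact at the cost of introducing an estimated value.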

In [25]:
# Step 4: initial analysis of the available data
# How are the cancellations doing? Is it really 26%?

print(table["Churn"].value_counts())  # count how many customers churned and how many stayed

print(table["Churn"].value_counts(normalize=True))  # .map("{:.1%}".format)  # proportions
# Without the .map(...) call the values are shown as fractions (0.x); with it they are shown as percentages.
# The digit in "{:.1%}" sets the number of decimal places: 1, 2 or 3 gives one, two or three digits after the decimal point.

# display(table["MesesComoCliente"].value_counts())  # an idea of mine




Churn
Nao    4398
Sim    1587
Name: count, dtype: int64
Churn
Nao    0.734837
Sim    0.265163
Name: proportion, dtype: float64
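
The commented-out `.map("{:.1%}".format)` call can be seen in action on a small example. This is a sketch on invented labels (the series `churn` with four made-up customers is an assumption for the illustration), not output from the real dataset.

```python
import pandas as pd

# Hypothetical labels: 3 customers stayed, 1 cancelled.
churn = pd.Series(["Nao", "Nao", "Sim", "Nao"], name="Churn")

counts = churn.value_counts()                # absolute counts per label
shares = churn.value_counts(normalize=True)  # fractions that sum to 1

# "{:.1%}" multiplies by 100 and keeps one decimal place;
# "{:.2%}" would keep two.
formatted = shares.map("{:.1%}".format)

print(counts)
print(formatted)
```

Note that `formatted` holds strings, so it is meant for display only; any further arithmetic should use `shares`.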


In [29]:
# Step 5: find out the reasons for cancelling

import plotly.express as px

# Every graph is made in 2 steps: create and show.

# 1
graph = px.histogram(table, x="MesesComoCliente", color="Churn")

graph.show()
# display(graph)

In [18]:
graph = px.histogram(table, x="Aposentado", color="Churn")
graph.show()

In [19]:
graph = px.histogram(table, x="TipoContrato", color="Churn")
graph.show()

In [20]:
graph = px.histogram(table, x="Genero", color="Churn")
graph.show()

In [21]:
graph = px.histogram(table, x="ServicoInternet", color="Churn")
graph.show()

In [31]:
graph = px.histogram(table, x="Dependentes", color="Churn")
graph.show()

In [32]:
graph = px.histogram(table, x="ValorMensal", color="Churn")
graph.show()

In [33]:
graph = px.histogram(table, x="ServicoStreamingTV", color="Churn")
graph.show()
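
The histograms show counts per category. As a possible complement (this is not part of the original notebook), the same comparison can be expressed as a churn rate per category with `pd.crosstab`. The sketch below uses invented rows; the frame `sample` and its values are assumptions for the illustration only.

```python
import pandas as pd

# Hypothetical sample using two of the columns plotted in the histograms.
sample = pd.DataFrame(
    {
        "TipoContrato": ["Mensal", "Mensal", "Mensal", "Mensal", "2 anos", "2 anos"],
        "Churn": ["Sim", "Sim", "Nao", "Nao", "Nao", "Nao"],
    }
)

# normalize="index" makes each row (each contract type) sum to 1,
# so the "Sim" column reads as the churn rate for that category.
rates = pd.crosstab(sample["TipoContrato"], sample["Churn"], normalize="index")

print(rates)
```

On the real data, the same call with `table` in place of `sample` would give one rate per category for whichever column is being examined.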

In [13]:
!pip install plotly



### Conclusions and Actions

Write your conclusions here: