# Data Analysis with Python

### Case - Customer Cancellation

You have been hired by a company with over 800,000 customers for a data project. Recently, the company noticed that most of its customer base is inactive, meaning these customers have already canceled the service.

Needing to improve its results, it wants to understand the main reasons for these cancellations and the most effective actions to reduce this number.

In [None]:
import pandas as pd
import plotly.express as px

table = pd.read_csv("cancellations.csv") # Loads the dataset from the CSV file

table = table.drop(columns="CustomerID") # Removes the CustomerID column, since it does not affect the analysis
display(table) # Displays the table

In [None]:
table.info() # Prints the table summary (info() prints directly and returns None, so display() is not needed)

table = table.dropna() # Deletes the rows with empty values

table.info() # Prints the summary of the cleaned table
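
Before dropping rows, it can help to see how many empty values each column holds and how many rows the cleanup removes. The following is a minimal sketch on a hypothetical toy DataFrame (the values are invented for illustration; only the column names come from this notebook):

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the ones used in this notebook
toy = pd.DataFrame({
    "contract_type": ["Monthly", "Annual", None, "Quarterly"],
    "days_late": [25, 3, 10, None],
    "canceled": [1, 0, 1, 0],
})

missing_per_column = toy.isna().sum()  # number of empty values in each column
rows_before = len(toy)
toy_clean = toy.dropna()               # drops every row with at least one empty value
rows_dropped = rows_before - len(toy_clean)

print(missing_per_column)
print(f"Rows dropped: {rows_dropped} of {rows_before}")
```

If a large share of rows disappears, it may be worth checking which column is responsible before accepting the loss.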

In [None]:
display(table["canceled"].value_counts()) # Displays the count of people who canceled and who did not cancel

display(table["canceled"].value_counts(normalize=True).map("{:.1%}".format)) # Displays the percentage of people who canceled and who did not cancel
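
The count and the percentage can also be shown side by side in a single table. This is a small sketch on invented values, assuming `canceled` is encoded as 0 and 1 (the notebook does not state the encoding):

```python
import pandas as pd

# Hypothetical values; assumes "canceled" is encoded as 0 (active) and 1 (canceled)
canceled = pd.Series([1, 1, 1, 0, 0], name="canceled")

summary = pd.DataFrame({
    "count": canceled.value_counts(),                # absolute counts
    "share": canceled.value_counts(normalize=True),  # fractions that sum to 1
})
summary["share"] = summary["share"].map("{:.1%}".format)  # format as percentages

print(summary)
```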

In [None]:
# Generate and display a histogram with Plotly Express for each column, colored by the "canceled" column
for column in table.columns:
    if column != "canceled":
        grafico = px.histogram(table, x=column, color="canceled", text_auto=True)
        grafico.update_layout(bargap=0.2)
        grafico.show()
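
The histograms show the pattern visually. A numeric complement is the cancellation rate per category. The sketch below uses invented data and assumes `canceled` is encoded as 0 and 1, so that the mean of the column equals the cancellation rate:

```python
import pandas as pd

# Hypothetical toy data; assumes "canceled" is 0/1 so its mean is the cancellation rate
toy = pd.DataFrame({
    "contract_type": ["Monthly", "Monthly", "Annual", "Annual", "Quarterly", "Quarterly"],
    "canceled": [1, 1, 0, 1, 0, 0],
})

rate_by_contract = (
    toy.groupby("contract_type")["canceled"]
    .mean()                        # share of canceled customers in each group
    .sort_values(ascending=False)  # highest cancellation rate first
)

print(rate_by_contract.map("{:.1%}".format))
```

The same pattern works for any categorical column by changing the `groupby` key.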

# Data Analysis
### Customers on monthly contracts cancel
Solution: offer discounts on annual and quarterly plans
### Customers who call the call center more than 4 times cancel
Solution: create a process to solve the customer's problem in a maximum of 3 calls
### Customers who are more than 20 days late cancel
Solution: create a policy to resolve delays within 10 days (finance team)
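
Thresholds like these can be checked by comparing the cancellation rate above and below each cutoff. The following is a minimal sketch on invented data. The helper `rate_by_threshold` is hypothetical, and `canceled` is assumed to be encoded as 0 and 1:

```python
import pandas as pd

# Hypothetical toy data; assumes "canceled" is encoded as 0/1
toy = pd.DataFrame({
    "callcenter_calls": [0, 2, 4, 5, 7, 9],
    "days_late": [5, 30, 10, 25, 0, 21],
    "canceled": [0, 1, 0, 1, 1, 1],
})

def rate_by_threshold(df, column, threshold):
    """Cancellation rate for rows above the threshold vs. at or below it."""
    group = (df[column] > threshold).map({True: "above", False: "at_or_below"})
    return df.groupby(group)["canceled"].mean()

calls_rates = rate_by_threshold(toy, "callcenter_calls", 4)
late_rates = rate_by_threshold(toy, "days_late", 20)

print(calls_rates)
print(late_rates)
```

A large gap between the two groups supports the chosen cutoff. A small gap suggests the threshold deserves another look.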

In [None]:
# What if these people hadn't canceled?
# Create filters to simulate that scenario
table = table[table["contract_type"] != "Monthly"]
table = table[table["callcenter_calls"] <= 4]
table = table[table["days_late"] <= 20]

display(table["canceled"].value_counts()) # Displays the new count of people who canceled and who did not cancel

display(table["canceled"].value_counts(normalize=True).map("{:.1%}".format)) # Displays the new percentage of people who canceled and who did not cancel
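
The three filters can also be combined into one boolean mask that leaves the original table untouched, which makes a before/after comparison straightforward. This is a sketch on invented data, assuming `canceled` is encoded as 0 and 1. The names `keep` and `simulated` are illustrative:

```python
import pandas as pd

# Hypothetical toy data; assumes "canceled" is encoded as 0/1
toy = pd.DataFrame({
    "contract_type": ["Monthly", "Annual", "Annual", "Quarterly", "Annual", "Quarterly"],
    "callcenter_calls": [1, 6, 2, 0, 3, 1],
    "days_late": [3, 2, 25, 5, 10, 0],
    "canceled": [1, 1, 1, 0, 0, 1],
})

# One mask with the same three conditions used in the simulation
keep = (
    (toy["contract_type"] != "Monthly")
    & (toy["callcenter_calls"] <= 4)
    & (toy["days_late"] <= 20)
)
simulated = toy[keep]  # the original DataFrame is not overwritten

rate_before = toy["canceled"].mean()
rate_after = simulated["canceled"].mean()

print(f"Cancellation rate before: {rate_before:.1%}")
print(f"Cancellation rate after:  {rate_after:.1%}")
```

Keeping the unfiltered table available also lets the cell be re-run with different thresholds without reloading the CSV.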