### LOAD DATASET

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('BankCustomers.csv', encoding='latin-1')
df.head()

### DELETING USELESS COLUMNS

In [None]:
df.drop(['Unnamed: 0', 'CLIENTNUM'], axis=1, inplace=True)
df.head()

### TREATING NaN VALUES

In [None]:
df.info()

The **'Card Category'** column has one NaN value.
So, let's use **'Card Category'**, **'Annual Salary Range'** and **'Limit'** to predict the missing credit card category.

In [None]:
# Locate the row whose 'Card Category' is NaN. isna() is reliable here,
# whereas list.index(np.nan) only works if the NaN is the very same object.
row_idx = df.index[df['Card Category'].isna()][0]
df.loc[row_idx]

In [None]:
df_to_predict_limit = df[['Card Category', 'Annual Salary Range', 'Limit']].groupby(['Card Category', 'Annual Salary Range']).mean().round(2)
df_to_predict_limit

We see that with an Annual Salary Range equal to $60K - $80K and Limit equal to $8547, the closest category is **_Blue_**. So, let's update this value.

In [None]:
df.at[row_idx, 'Card Category'] = 'Blue'
df.loc[row_idx]
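
The "closest category" lookup above was done by reading the grouped table. As a minimal sketch, the same comparison can be automated. The sample data and the helper name `closest_category` are assumptions made for illustration; they are not taken from `BankCustomers.csv`.

```python
import pandas as pd

# Tiny synthetic sample (illustrative values, NOT from BankCustomers.csv)
sample = pd.DataFrame({
    'Card Category': ['Blue', 'Blue', 'Silver', 'Silver', 'Gold', None],
    'Annual Salary Range': ['$60K - $80K'] * 6,
    'Limit': [8000.0, 9000.0, 24000.0, 26000.0, 30000.0, 8547.0],
})

def closest_category(frame, salary_range, limit):
    """Hypothetical helper: return the card category whose mean limit,
    within the given salary range, is nearest to `limit`."""
    known = frame.dropna(subset=['Card Category'])
    in_range = known[known['Annual Salary Range'] == salary_range]
    mean_limit = in_range.groupby('Card Category')['Limit'].mean()
    # Smallest absolute distance between the group mean and the target limit
    return (mean_limit - limit).abs().idxmin()

# Take the row with the missing category and look up its nearest group
missing = sample[sample['Card Category'].isna()].iloc[0]
guess = closest_category(sample,
                         missing['Annual Salary Range'],
                         missing['Limit'])
print(guess)
```

With a single missing value, reading the table by eye is enough; a helper like this only pays off if several rows need filling.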

### GRAPHICAL ANALYSIS

In [None]:
import matplotlib.pyplot as plt
import plotly.express as px

Customers vs Canceled

Let's see the ratio between active customers and canceled accounts.

In [None]:
serie_status = df['Category'].value_counts(normalize=True)
serie_status

In [None]:
plt.bar(x=serie_status.index,
        height=serie_status*100)
plt.title('Customers vs Canceled Ratio')
plt.yticks(range(0, 110, 10))
# Place each percentage label 5 points above its own bar
for i, v in enumerate(serie_status):
    plt.text(x=i, y=v * 100 + 5, s=f"{np.round(v * 100, 2)}%",
             horizontalalignment='center',
             verticalalignment='center',
             fontdict=dict(fontsize=12))
plt.show()

1st ANALYSIS - Limit comparison

We grouped the **limits** and **consumed limits** by **customer type**.

In [None]:
df_client_canceled = df[['Category', 'Limit', 'Consumed Limit']].groupby('Category').mean().round(2).reset_index()
df_client_canceled

In [None]:
idx_limit = np.arange(2)
width = 0.35
idx_consumed_limit = idx_limit + width
fig = plt.figure(figsize=(10, 5))
ax = fig.add_axes([0, 0.2, 1, 1])
# One bar group per series; the labels feed the legend directly
ax.bar(idx_limit,
       df_client_canceled['Limit'],
       width=width,
       align='center',
       color='gray',
       label='Limit')
ax.bar(idx_consumed_limit,
       df_client_canceled['Consumed Limit'],
       width=width,
       align='center',
       color='blue',
       label='Consumed Limit')

ax.set_ylabel('Limit ($)')
ax.set_xlabel('Current Status')
ax.set_xticks(idx_limit + width / 2, df_client_canceled.Category)

# Annotate every bar with its value
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2,
            height + 100,
            f'${height:1.2f}',
            ha='center',
            fontdict=dict(fontsize=12))

ax.legend()
plt.show()

We can see that the limit is probably not a reason to cancel the account. So let's look at the other parameters.

In [None]:
for col in df.columns: 
    graph = px.histogram(df,
                         x=col,
                         color='Category')
    graph.show()

# CONCLUSION

**1st Insight - Focus on the Blue Card**

As we can see in the chart, most customers have a Blue Card.

In [None]:
graph = px.histogram(df,
                     x='Card Category',
                     color='Category')
graph.show()

**2nd Insight - The more contacts, the higher the chance of canceling the card**

- When a customer makes no contact, the chance of canceling the card is practically 0%.
- With 3+ contacts, the chance of canceling the card starts at 25%.
- With 6+ contacts, the chance of canceling the card is 100%.

In [None]:
graph = px.histogram(df,
                     x='Contacts 12m',
                     color='Category')
graph.show()
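
The percentages above are read off the histogram. As a rough sketch, the cancellation rate per contact count can also be computed directly with `pd.crosstab`. The sample below is synthetic and only illustrates the calculation; the real rates come from running the same call on `df`.

```python
import pandas as pd

# Synthetic sample (illustrative only). 'Canceled' and 'Customers' are the
# two values of the 'Category' column used in this notebook.
sample = pd.DataFrame({
    'Contacts 12m': [0, 0, 3, 3, 3, 3, 6],
    'Category': ['Customers', 'Customers', 'Canceled', 'Customers',
                 'Customers', 'Customers', 'Canceled'],
})

# Share of each category within every contact count (each row sums to 1)
rate_by_contacts = pd.crosstab(sample['Contacts 12m'],
                               sample['Category'],
                               normalize='index')

# Cancellation rate in percent, indexed by number of contacts
cancel_rate = (rate_by_contacts['Canceled'] * 100).round(2)
print(cancel_rate)
```

Normalizing by row (`normalize='index'`) matters here: raw counts would be dominated by the most common contact values, while row shares show the rate within each group.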

**3rd Insight - Customers with a low transaction count or low transaction value tend to cancel their card**

- Customers who transacted less than $3,000.00 in the last 12 months are more likely to cancel their card.
- Customers who have made fewer than 60 transactions in the last 12 months are more likely to cancel their card.


In [None]:
for col in ['Valor Transactions 12m', 'Qty of Transactions 12m']:
    graph = px.histogram(df,
                        x=col,
                        color='Category')
    graph.show()
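
To put numbers on the two thresholds above, one option is to compare the cancellation rate below and above each cut-off. This is a minimal sketch on synthetic data; the helper name `cancel_rate_by_threshold` is hypothetical, and the values are not taken from `BankCustomers.csv`.

```python
import pandas as pd

# Synthetic sample (illustrative only), using the notebook's column names
sample = pd.DataFrame({
    'Valor Transactions 12m': [1500, 2500, 2800, 4000, 5200, 8000],
    'Qty of Transactions 12m': [30, 45, 55, 70, 80, 95],
    'Category': ['Canceled', 'Canceled', 'Customers',
                 'Customers', 'Customers', 'Customers'],
})

def cancel_rate_by_threshold(frame, column, threshold):
    """Hypothetical helper: cancellation rate (%) for rows below the
    threshold versus rows at or above it."""
    is_canceled = frame['Category'].eq('Canceled')
    side = (frame[column] < threshold).map({True: 'below',
                                            False: 'at or above'})
    # Mean of a boolean Series is the share of True values in each group
    return (is_canceled.groupby(side).mean() * 100).round(2)

by_value = cancel_rate_by_threshold(sample, 'Valor Transactions 12m', 3000)
by_count = cancel_rate_by_threshold(sample, 'Qty of Transactions 12m', 60)
print(by_value)
print(by_count)
```

Calling the helper on `df` instead of `sample` would give the actual rates for the $3,000.00 and 60-transaction cut-offs.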