Customer loyalty is an important driver of revenue for any company, and predicting it is arguably one of the most valuable applications of Data Science. The question is simple: will a customer leave the company's services or not? This matters for a few reasons:
- It is harder to acquire new clients than to keep existing ones.
- The chance that a client who left the financial institution will return in the future is negligible.
- A client who left the institution is less likely to recommend it to other people and may become a detractor of the company's services.

The goal of this project is to predict whether a customer will leave or stay. For this task we analyze a dataset with features describing customers of ACME Bank Corporation, listed below.

Column | Description | Data Type |
---|---|---|
CustomerId | The customer's unique identifying number | ID |
Surname | The customer's surname | String |
CreditScore | The customer's credit score at the bank | Continuous variable |
Geography | Country of residence | Discrete variable |
Gender | The customer's gender | Binary category (string) |
Age | Age in years | Continuous variable |
Tenure | Number of years the customer has been with the bank | Discrete variable |
Balance | Account balance | Continuous variable |
NumOfProducts | The number of financial products used by the customer | Discrete variable |
HasCard | Whether the customer has a credit card | Binary variable |
IsActiveMember | Whether the customer is an active member | Binary variable |
EstimatedSalary | Estimated salary | Continuous variable |
Exited | Whether the customer left the bank | Target of the classification model |

Of the ten thousand customer records in this dataset, 20.4% had left the bank's services. The analysis showed that Germany was the country with the highest churn rate. The proportion of genders and other features was also examined, and one interesting observation stands out: every customer who held all four products had left the bank.
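
The churn rates described above can be computed with a simple group-by. The snippet below is a minimal sketch using pandas on a tiny, made-up sample that only mimics the `Geography`, `NumOfProducts` and `Exited` columns; the rows, the `sample` frame and the `churn_rate_by` helper are illustrative assumptions, not the project's actual data or code.

```python
import pandas as pd

# Hypothetical six-row sample with the same column names as the dataset.
# These values are invented for illustration and are NOT the real records.
sample = pd.DataFrame(
    {
        "Geography": ["France", "Germany", "Spain", "Germany", "France", "Germany"],
        "NumOfProducts": [1, 4, 2, 1, 2, 4],
        "Exited": [0, 1, 0, 1, 0, 1],
    }
)


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of churned customers per group, highest first.

    Because Exited is 0/1, its mean within a group is the churn rate.
    """
    return df.groupby(column)["Exited"].mean().sort_values(ascending=False)


# Overall churn rate: the mean of the 0/1 target.
overall_rate = sample["Exited"].mean()

# Churn rate per country and per number of products held.
rate_by_country = churn_rate_by(sample, "Geography")
rate_by_products = churn_rate_by(sample, "NumOfProducts")

print(f"Overall churn rate: {overall_rate:.1%}")
print(rate_by_country)
print(rate_by_products)
```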
During the Exploratory Data Analysis, it was possible to identify which features could be split to improve the performance of the Machine Learning models.
In the Machine Learning stage, several models were trained. One was then selected and its parameters were tuned for the best performance, using accuracy as the main metric while also taking other metrics into account.
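
The workflow above (train several models, pick one by accuracy, then tune its parameters) could look roughly like the following scikit-learn sketch. Everything specific here is an assumption made for illustration: the synthetic data from `make_classification` stands in for the prepared features, and the two candidate models and their parameter grids are placeholders rather than the ones used in this project. Recall is reported next to accuracy because, with only about one in five customers churning, accuracy alone can look good even when few churners are found.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the prepared features (assumption): about 20% positives.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical candidate models; the project may have used different ones.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Step 1: compare candidates by mean 5-fold cross-validated accuracy.
cv_accuracy = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_accuracy, key=cv_accuracy.get)

# Step 2: tune the selected model over a small, illustrative parameter grid.
param_grids = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "random_forest": {"max_depth": [3, 5, None]},
}
search = GridSearchCV(
    candidates[best_name], param_grids[best_name], cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Step 3: evaluate on held-out data with accuracy and one additional metric.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_recall = recall_score(y_test, y_pred)

print(f"Selected model: {best_name}, best params: {search.best_params_}")
print(f"Test accuracy: {test_accuracy:.3f}, test recall: {test_recall:.3f}")
```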
Finally, the best model was improved and deployed. Because the dataset is fairly large, this required writing some additional code and working with each parameter separately. Check the code!