Final paper for the Specialization Course in Data Science and Big Data at the Federal University of Paraná (UFPR)
Developing customer retention strategies has become common practice among companies in different segments, since long-term customer relationships are associated with a company's economic survival and success. With the goal of predicting customer churn at a Brazilian startup, this article presents a predictive model that classifies customers by churn and allows the factors that drive the outcome to be interpreted. After an extensive data wrangling process, logistic regression was applied using K-fold cross-validation and a stepwise algorithm for covariate selection. The final model, composed of 14 covariates, underwent a diagnostic analysis based on randomized quantile residuals, and its predictive power was evaluated using the ROC curve, the confusion matrix, and evaluation metrics. The model was considered appropriate for the business at all stages of the study: it not only showed good predictive power but also proved capable of providing insights for optimized, customized marketing actions aimed at retaining customers likely to churn.
Keywords: Customer churn; Customer retention; CRM; Logistic regression; Stepwise.
Full paper: https://github.com/juniorssz/dsbd-churn-analysis/blob/master/tex/article/article.pdf
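
The evaluation loop summarized in the abstract (logistic regression, K-fold cross-validation, confusion matrix, ROC curve) can be illustrated with a minimal sketch. This is not the paper's implementation: the data are synthetic, the two covariates and their coefficients are made up, and all function names (`fit_logistic`, `cross_validate`, `auc`) are hypothetical. Stepwise covariate selection and the randomized quantile residual diagnostics are left out, and the model is fitted by plain gradient descent only to keep the example dependency-free.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def auc(y_true, scores):
    """Area under the ROC curve: probability that a churner outranks a non-churner."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k=5, threshold=0.5):
    """Return the pooled out-of-fold confusion matrix and AUC."""
    idx = list(range(len(X)))
    folds = [idx[i::k] for i in range(k)]  # every k-th row goes to the same fold
    oof = [0.0] * len(X)  # out-of-fold predicted churn probabilities
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, predict_proba(w, [X[i] for i in fold])):
            oof[i] = p
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, t in zip(oof, y):
        pred = 1 if p >= threshold else 0
        key = ("t" if pred == t else "f") + ("p" if pred == 1 else "n")
        cm[key] += 1
    return cm, auc(y, oof)


# Synthetic data (assumption): churn depends on two made-up standardized covariates.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if random.random() < sigmoid(-0.5 + 2.0 * a - 1.0 * b) else 0 for a, b in X]

cm, roc_auc = cross_validate(X, y)
accuracy = (cm["tp"] + cm["tn"]) / len(y)
print(cm, round(accuracy, 3), round(roc_auc, 3))
```

Because every customer is scored by a model that never saw that customer during fitting, the pooled confusion matrix and AUC estimate out-of-sample performance. The 0.5 threshold is an arbitrary choice for this sketch; in a retention setting it would normally be tuned to the relative cost of missing a churner versus contacting a loyal customer.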