The analysis involves processing a dataset of consumer complaints to predict whether a consumer will dispute a complaint, employing an XGBoost Classifier for the prediction task. The steps include preprocessing the data (filling missing values, encoding categorical variables, and splitting the dataset), balancing the classes via undersampling, and training the model. The model's performance is evaluated using a classification report and a confusion matrix.
The results indicate that without a model, the cost of handling complaints is $8,619,200. However, when the model's predictions are utilized, two cost scenarios are considered:
Without Extra Diligence: The cost remains the same as the base case ($8,619,200) because the model's predictions did not effectively reduce the cost. With Extra Diligence: The cost is reduced to $7,695,780 when incorporating a cost of $190 for each instance where extra diligence is applied (either true positive or false positive cases). To further optimize, the threshold for classifying a complaint as disputed was varied, and the threshold that minimized the cost was determined to be 0.46. Using this optimal threshold, the minimum cost with the model was further reduced to $7,589,140.
The model's recall for 'Consumer disputed?' = 'Yes' cases is 0.63, indicating its sensitivity in identifying the positive class (disputed cases). Finally, calculations for total costs with and without the model highlight the financial impact of applying the predictive model with an optimized decision threshold, showing a significant cost reduction when using the model with extra diligence and the optimal threshold.