## Evaluation using XGBoost Classifier

In [1]:
from tabulate import tabulate

# Define the metrics and corresponding scores for the initial XGBoost classifier
metrics_original = ['Precision', 'Recall', 'F1 Score', 'ROC AUC']
scores_original = [0.21882224942200462, 0.3711221312420713, 0.27531334217393166, 0.7635]

# Define the metrics and corresponding scores for the classifier tuned with Random Search CV
metrics_new = ['Precision', 'Recall', 'F1 Score', 'ROC AUC']
scores_new = [0.5819672131147541, 0.01637642717102987, 0.031856421761076836, 0.81]

# Build the table rows: a header row, then one (metric, baseline, tuned) row per metric.
# The lists are reversed so that ROC AUC is listed first.
table_data = [
    ['Metric', 'Score with XGBoost Classifier', 'Score with XGBoost Classifier using Random Search CV'],
    *zip(metrics_original[::-1], scores_original[::-1], scores_new[::-1])
]

# Print the comparison table using tabulate
table_str = tabulate(table_data, headers='firstrow', tablefmt='grid')

print(table_str)

+-----------+---------------------------------+--------------------------------------------------------+
| Metric    |   Score with XGBoost Classifier |   Score with XGBoost Classifier using Random Search CV |
+===========+=================================+========================================================+
| ROC AUC   |                        0.7635   |                                              0.81      |
+-----------+---------------------------------+--------------------------------------------------------+
| F1 Score  |                        0.275313 |                                              0.0318564 |
+-----------+---------------------------------+--------------------------------------------------------+
| Recall    |                        0.371122 |                                              0.0163764 |
+-----------+---------------------------------+--------------------------------------------------------+
| Precision |                        0.218822 |                                              0.581967  |
+-----------+---------------------------------+--------------------------------------------------------+

>The baseline XGBoost classifier appears to be the better choice for predicting customer churn if the priority is to identify churned customers: its recall (0.371) is far higher than the tuned model's (0.016), at the cost of lower precision (0.219 vs. 0.582).

>The XGBoost classifier tuned with Random Search CV has higher precision (0.582) and a larger area under the ROC curve (0.81 vs. 0.7635), but its recall of 0.016 means it misses roughly 98% of actual churn cases, which is why its F1 score drops to 0.032.
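
The F1 scores in the table follow directly from the precision and recall values, since F1 is their harmonic mean. As a quick sanity check, the sketch below recomputes both F1 scores from the hard-coded numbers above. The helper name `f1_from_precision_recall` is introduced here for illustration only and is not part of the original notebook.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the XGBoost comparison above
baseline = {"precision": 0.21882224942200462, "recall": 0.3711221312420713}
tuned = {"precision": 0.5819672131147541, "recall": 0.01637642717102987}

f1_baseline = f1_from_precision_recall(**baseline)
f1_tuned = f1_from_precision_recall(**tuned)

print(f"baseline F1: {f1_baseline:.4f}")
print(f"tuned F1:    {f1_tuned:.4f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, the tuned model's very low recall pulls its F1 score down even though its precision is much higher.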

## Evaluation using Linear SVC Model

In [2]:
from tabulate import tabulate

# Define the metrics and corresponding scores for the LinearSVC model
metrics = ['Precision', 'Recall', 'F1 Score', 'AUC-ROC']
scores = [0.1278, 0.7146, 0.2168, 0.6412]

# Create a list of lists containing metric and score pairs
table_data = list(zip(metrics, scores))

# Print the table using tabulate
table_str = tabulate(table_data, headers=['Metric', 'Score With SVM Model using LinearSVC'], tablefmt='grid')

print(table_str)


+-----------+----------------------------------------+
| Metric    |   Score With SVM Model using LinearSVC |
+===========+========================================+
| Precision |                                 0.1278 |
+-----------+----------------------------------------+
| Recall    |                                 0.7146 |
+-----------+----------------------------------------+
| F1 Score  |                                 0.2168 |
+-----------+----------------------------------------+
| AUC-ROC   |                                 0.6412 |
+-----------+----------------------------------------+


## Evaluation using Random Forest

In [3]:
from tabulate import tabulate

# Define the metrics and corresponding scores for the first Random Forest model
metrics_rf1 = ['Precision', 'Recall', 'F1 Score', 'ROC AUC']
scores_rf1 = [1.0, 0.0006919617114519663, 0.0013829664630632707, 0.500345980855726]

# Define the metrics and corresponding scores for the second Random Forest model
metrics_rf2 = ['Precision', 'Recall', 'F1 Score', 'ROC AUC']
scores_rf2 = [0.9847461892829396, 0.9177527585714577, 0.9500699411438676, 0.9517423683748888]

# Create a list of lists containing metric and score pairs for both Random Forest models
table_data = [
    ['Metric', 'Random Forest (Random Search CV)', 'Random Forest (Random Search CV in SageMaker)'],
    *zip(metrics_rf1, scores_rf1, scores_rf2)
]

# Print the table using tabulate
table_str = tabulate(table_data, headers='firstrow', tablefmt='grid')

print(table_str)


+-----------+------------------------------------+-------------------------------------------------+
| Metric    |   Random Forest (Random Search CV) |   Random Forest (Random Search CV in SageMaker) |
+===========+====================================+=================================================+
| Precision |                        1           |                                        0.984746 |
+-----------+------------------------------------+-------------------------------------------------+
| Recall    |                        0.000691962 |                                        0.917753 |
+-----------+------------------------------------+-------------------------------------------------+
| F1 Score  |                        0.00138297  |                                        0.95007  |
+-----------+------------------------------------+-------------------------------------------------+
| ROC AUC   |                        0.500346    |                                        0.951742 |
+-----------+------------------------------------+-------------------------------------------------+

### Analysis:
The Random Forest model tuned with Random Search CV in SageMaker performs far better than the other model on every metric except precision:

- Its precision is very high (0.9847), indicating a low rate of false positives. The other model's perfect precision (1.0) comes with a recall of only 0.0007, meaning it flags almost no customers as churners.
- Its recall is also high (0.9178), so it captures a large share of actual churners.
- Its F1 Score (0.9501) reflects a good balance between precision and recall.
- Its ROC AUC (0.9517) is high, suggesting strong discriminative ability, whereas the other model's ROC AUC (0.5003) is essentially at chance level.
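
One detail in the table can be checked by hand. When ROC AUC is computed from hard 0/1 predictions rather than from predicted probabilities, it reduces to the average of the true-positive rate (recall) and the true-negative rate. A precision of exactly 1.0 means there are no false positives, so the true-negative rate is 1. The sketch below assumes the score was obtained this way (the source does not show the evaluation code) and reproduces the first model's reported ROC AUC from its recall alone. The helper name `auc_from_hard_labels` is illustrative.

```python
def auc_from_hard_labels(recall: float, true_negative_rate: float) -> float:
    """ROC AUC of a classifier that outputs only 0/1 labels.

    With a single operating point the ROC curve consists of two line
    segments, and the area under it is the mean of TPR (recall) and TNR.
    """
    return (recall + true_negative_rate) / 2


# First Random Forest model: precision 1.0 implies zero false positives, so TNR = 1.0
recall_rf1 = 0.0006919617114519663
auc_rf1 = auc_from_hard_labels(recall_rf1, true_negative_rate=1.0)

print(f"reconstructed ROC AUC: {auc_rf1:.6f}")  # reported value in the table: 0.500346
```

If the scores were indeed computed from hard labels, passing predicted probabilities to the AUC computation instead would give a more informative measure of how well each model ranks customers.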

### Conclusion:
Based on these metrics, the Random Forest model tuned with Random Search CV in SageMaker is preferable for customer churn prediction. It combines high precision with high recall, and its ROC AUC indicates strong overall performance. Identifying most likely churners while raising few false alarms is what makes a model useful for churn prediction and retention strategies.

## OVERALL CONCLUSION:
>For churn prediction, the Random Forest model tuned with Random Search CV in SageMaker is the best choice among the models evaluated. Its F1 Score (0.950) and ROC AUC (0.952) are well above those of the baseline XGBoost classifier (0.275 and 0.7635) and the LinearSVC model (0.217 and 0.641). It pairs high precision (0.985) with high recall (0.918), so it identifies most churn cases while producing few false positives, which makes it well suited for practical use.
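
To make the cross-model comparison explicit, the reported scores can be gathered in one structure and ranked. The sketch below uses only the values from the tables above, rounded to four decimals. The dictionary layout and the choice to rank by F1 are illustrative assumptions rather than part of the original analysis.

```python
# Scores copied from the tables above (rounded to four decimals)
results = {
    "XGBoost": {"precision": 0.2188, "recall": 0.3711, "f1": 0.2753, "roc_auc": 0.7635},
    "XGBoost (Random Search CV)": {"precision": 0.5820, "recall": 0.0164, "f1": 0.0319, "roc_auc": 0.81},
    "LinearSVC": {"precision": 0.1278, "recall": 0.7146, "f1": 0.2168, "roc_auc": 0.6412},
    "Random Forest (Random Search CV)": {"precision": 1.0, "recall": 0.0007, "f1": 0.0014, "roc_auc": 0.5003},
    "Random Forest (Random Search CV in SageMaker)": {"precision": 0.9847, "recall": 0.9178, "f1": 0.9501, "roc_auc": 0.9517},
}

# Rank models from best to worst F1 score
ranked = sorted(results.items(), key=lambda item: item[1]["f1"], reverse=True)

for name, scores in ranked:
    print(
        f"{name:<46} F1={scores['f1']:.4f}  "
        f"recall={scores['recall']:.4f}  precision={scores['precision']:.4f}"
    )

best_model = ranked[0][0]
```

Ranking by F1 rather than by precision or recall alone avoids rewarding models that score well on one of the two only by sacrificing the other.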