# Defining Causation of Churn with CausalML

# Business problem
Before jumping into the data, let's define the problem. Our project manager reports that users are suddenly churning at a specific point: Level 28. The manager asks a broad question: "Why do users churn at Level 28?" Looking at the level data, difficulty stands out as a suspect, but Level 28 is not the only hard level in the 25-35 range. With this in mind, we use CausalML to test whether the idea holds.

# Data Generation
Assume our dataset includes the following features for each user:

- user_id: Unique identifier for the user.
- level: The highest level reached by the user (not included in the synthetic sample below).
- attempts_to_pass: Number of attempts the user made to pass Level 28.
- engagement_score: A composite score representing the user's engagement with the game up to Level 28.
- passed_level_28: Binary indicator (1 if the user passed Level 28, 0 otherwise).
- churned_at_28: Binary indicator (1 if the user churned at Level 28, 0 otherwise).
- rewarded_shows_watched: Number of rewarded ad shows watched by the user up to Level 28.
- is_payer: Binary indicator (1 if the user has made any in-game purchases, 0 otherwise).
- boosters_used: Number of boosters used by the user up to Level 28.
- level_play_time: Total playtime in minutes the user spent on Level 28.
- is_crown_enabled: Binary indicator (1 if the crown feature is enabled for the user, 0 otherwise).
- is_battle_pass_active: Binary indicator (1 if the user has an active battle pass, 0 otherwise).
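
To catch data problems before modeling, a minimal validation sketch can check that a frame carries the columns listed above and that the binary indicators contain only 0 and 1. The helper `validate_schema` and the column constants are hypothetical names chosen for illustration; run against the synthetic frame below, it would report the missing `level` column.

```python
import pandas as pd

# Columns described in the feature list (hypothetical constant names).
EXPECTED_COLUMNS = [
    "user_id", "level", "attempts_to_pass", "engagement_score",
    "passed_level_28", "churned_at_28", "rewarded_shows_watched",
    "is_payer", "boosters_used", "level_play_time",
    "is_crown_enabled", "is_battle_pass_active",
]
BINARY_COLUMNS = [
    "passed_level_28", "churned_at_28", "is_payer",
    "is_crown_enabled", "is_battle_pass_active",
]


def validate_schema(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems; an empty list means the frame looks OK."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in BINARY_COLUMNS:
        if col in df.columns and not df[col].isin([0, 1]).all():
            problems.append(f"{col} has values other than 0/1")
    if "user_id" in df.columns and df["user_id"].duplicated().any():
        problems.append("user_id is not unique")
    return problems


# Usage: a tiny hand-made frame with one deliberately bad indicator value.
sample = pd.DataFrame({c: [0, 1] for c in EXPECTED_COLUMNS})
sample.loc[1, "is_payer"] = 2
print(validate_schema(sample))  # ['is_payer has values other than 0/1']
```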

```python
import pandas as pd
from causalml.inference.meta import XGBTRegressor
from sklearn.model_selection import train_test_split
```

```python
# Synthetic toy dataset: 100 users built from deterministic modulo patterns
# (illustration only; the `level` column from the feature list is omitted)
data = {
    'user_id': range(1, 101),
    'attempts_to_pass': [1, 2, 3, 4, 5] * 20,
    'engagement_score': [i % 10 + 1 for i in range(100)],
    'passed_level_28': [1 if i % 2 == 0 else 0 for i in range(100)],
    'churned_at_28': [1 if i % 5 == 0 else 0 for i in range(100)],
    'rewarded_shows_watched': [i % 3 for i in range(100)],
    'is_payer': [1 if i % 4 == 0 else 0 for i in range(100)],
    'boosters_used': [i % 5 for i in range(100)],
    'level_play_time': [20 + i % 10 for i in range(100)],
    'is_crown_enabled': [1 if i % 6 == 0 else 0 for i in range(100)],
    'is_battle_pass_active': [1 if i % 7 == 0 else 0 for i in range(100)],
}
df = pd.DataFrame(data)
```
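
Before fitting any model, it helps to look at the raw, unadjusted difference in churn rates between users who passed Level 28 and users who did not. The sketch below rebuilds only the two columns it needs from the synthetic data above. By construction, `churned_at_28` (`i % 5 == 0`) fires on 10 even and 10 odd values of `i`, so both groups churn at exactly 20% and the naive difference is zero; any model estimate near zero is consistent with that.

```python
import pandas as pd

# Rebuild only the treatment and outcome columns of the synthetic data.
df = pd.DataFrame({
    "passed_level_28": [1 if i % 2 == 0 else 0 for i in range(100)],
    "churned_at_28": [1 if i % 5 == 0 else 0 for i in range(100)],
})

# Churn rate within each treatment group (0 = did not pass, 1 = passed).
churn_by_group = df.groupby("passed_level_28")["churned_at_28"].mean()
print(churn_by_group)

# Naive (unadjusted) difference: treated minus untreated.
naive_diff = churn_by_group[1] - churn_by_group[0]
print(f"Naive difference in churn rate: {naive_diff:.4f}")  # 0.0000
```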

```python
# Define the treatment and the outcome
treatment = 'passed_level_28'
outcome = 'churned_at_28'

# Define covariates, including the new metrics
covariates = ['attempts_to_pass', 'engagement_score', 'rewarded_shows_watched', 'is_payer',
              'boosters_used', 'level_play_time', 'is_crown_enabled', 'is_battle_pass_active']
```
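
Meta-learners such as the T-learner assume overlap (positivity): users with similar covariates should appear in both treatment arms. A cross-tabulation of a covariate against the treatment is a quick way to check this. In the synthetic data, `engagement_score` (`i % 10 + 1`) is odd exactly when `i` is even, which is exactly when `passed_level_28` = 1, so every score value falls entirely into one arm; `level_play_time` (`20 + i % 10`) has the same problem. The sketch below rebuilds just the two columns it needs.

```python
import pandas as pd

df = pd.DataFrame({
    "engagement_score": [i % 10 + 1 for i in range(100)],
    "passed_level_28": [1 if i % 2 == 0 else 0 for i in range(100)],
})

# Count untreated (0) and treated (1) users at each covariate value.
table = pd.crosstab(df["engagement_score"], df["passed_level_28"])
print(table)

# A row with a zero in either column has no overlap: every user with
# that covariate value is either always treated or never treated.
no_overlap = (table == 0).any(axis=1)
print(f"Values with no overlap: {no_overlap.sum()} of {len(table)}")
```

When overlap fails like this, the model has to extrapolate each arm's outcome into regions where it saw no data, so the resulting effect estimate rests on modeling assumptions rather than observed comparisons.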


```python
# Split data into features (X), outcome (y), and treatment first
X = df[covariates]
y = df[outcome]
treatment_data = df[treatment]

# Then split into training and testing sets (80/20)
X_train, X_test, y_train, y_test, treatment_train, treatment_test = train_test_split(
    X, y, treatment_data, test_size=0.2, random_state=42
)

# Initialize and fit the T-learner (one XGBoost outcome model per treatment group)
model = XGBTRegressor()
model.fit(X=X_train, treatment=treatment_train, y=y_train)
```
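
CausalML documents `XGBTRegressor` as a T-learner with XGBoost base models: one outcome model is fitted on treated users, one on untreated users, and the per-user effect is the difference between their predictions. The sketch below reproduces only that core idea, using scikit-learn's `LinearRegression` as a swapped-in base model on synthetic data with a known effect of 0.3, so the estimate can be checked. It is not CausalML's implementation (which also handles multiple treatment groups and confidence intervals), and `t_learner_ate` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def t_learner_ate(X, treatment, y, base_model=LinearRegression):
    """Two-model (T-learner) ATE: fit one outcome model per arm, average the gap."""
    X = np.asarray(X, dtype=float)
    treatment = np.asarray(treatment)
    y = np.asarray(y, dtype=float)

    treated = treatment == 1
    model_t = base_model().fit(X[treated], y[treated])    # mu_1(x)
    model_c = base_model().fit(X[~treated], y[~treated])  # mu_0(x)

    # Per-user effect estimate (CATE), then its mean over all users (ATE).
    cate = model_t.predict(X) - model_c.predict(X)
    return cate.mean(), cate


# Synthetic check with a known effect: y = 2*x0 - x1 + 0.3*t, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = rng.integers(0, 2, size=200)
y = 2 * X[:, 0] - X[:, 1] + 0.3 * t

ate, cate = t_learner_ate(X, t, y)
print(round(ate, 6))  # 0.3
```

Because the outcome is exactly linear in each arm, both linear models fit perfectly and every per-user difference equals 0.3; with a nonlinear outcome, a flexible base model such as XGBoost is what makes the T-learner useful.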


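```python
# Estimate the ATE on the held-out test set.
# estimate_ate returns (ate, ate_lower, ate_upper); index 0 keeps the ATE array
# (one entry per treatment group, here just passed_level_28 = 1).
# Note: in recent CausalML versions, the default pretrain=False refits the learner
# on the data passed in (the 20 test rows); pretrain=True reuses the fitted model.
causal_effects = model.estimate_ate(X=X_test, treatment=treatment_test, y=y_test)
ate = causal_effects[0]

print(f"Average Treatment Effect of passing Level 28 on churn: {ate}")
```

```
Average Treatment Effect of passing Level 28 on churn: [0.00038255]
```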
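
A point estimate alone does not show whether 0.0004 is distinguishable from zero. `estimate_ate` already returns lower and upper bounds as its second and third elements (discarded above by taking index 0). As an independent, library-free cross-check, the sketch below computes a percentile bootstrap confidence interval for the unadjusted difference in churn rates on the synthetic columns. This is a simpler, swapped-in estimator with no covariate adjustment, and `diff_in_means` and `bootstrap_ci` are hypothetical helper names.

```python
import numpy as np

# Synthetic treatment and outcome columns from the data-generation cell.
treatment = np.array([1 if i % 2 == 0 else 0 for i in range(100)])
outcome = np.array([1 if i % 5 == 0 else 0 for i in range(100)])


def diff_in_means(t, y):
    """Churn rate of treated users minus churn rate of untreated users."""
    return y[t == 1].mean() - y[t == 0].mean()


def bootstrap_ci(t, y, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample users with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = diff_in_means(t[idx], y[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper


point = diff_in_means(treatment, outcome)
lower, upper = bootstrap_ci(treatment, outcome)
print(f"difference = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

An interval that comfortably straddles zero, as it does here, means the data give no evidence of an effect in either direction.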


# Result
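The printed value is the estimated average treatment effect (ATE) of passing Level 28 (`passed_level_28` = 1 versus 0) on the probability of churning at Level 28 (`churned_at_28`), computed on the 20 test users. The estimate is approximately 0.0004: passing Level 28 is associated with a change of about 0.04 percentage points in churn probability, which is effectively zero. Note that `attempts_to_pass` is only a covariate here, so this number is not the effect of one additional attempt; estimating that would require making the number of attempts (or a binarized version of it) the treatment. The near-zero estimate is also what the synthetic data implies: `churned_at_28` equals 1 for 20% of users whether or not they passed Level 28, so churn does not depend on the treatment by construction.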
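
For reference, the quantity being estimated and the T-learner estimator can be written as follows, with $Y$ = `churned_at_28`, $T$ = `passed_level_28`, and $\hat\mu_1$, $\hat\mu_0$ the outcome models fitted on treated and untreated users (our notation, not CausalML's). Because $Y$ is binary, the ATE is a difference in probabilities, so multiplying by 100 converts it to percentage points.

$$
\mathrm{ATE} = \mathbb{E}\left[Y(1) - Y(0)\right],
\qquad
\widehat{\mathrm{ATE}} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat\mu_1(x_i) - \hat\mu_0(x_i)\right)
$$

$$
0.00038255 \times 100 \approx 0.038 \text{ percentage points}
$$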

# Interpretation
The estimated effect is close to zero: in this data, passing Level 28 by itself barely changes the likelihood of churning at Level 28. Before drawing conclusions, it is important to place this result in the broader context of game design and user experience:

* Game Difficulty: If Level 28 is a significant difficulty spike, even a small increase in churn probability could indicate that users are getting frustrated. Reviewing the level's design, or offering additional support or hints, may help users overcome this challenge.
* User Engagement: Highly engaged users may be more willing to retry difficult levels, whereas less engaged users may churn more easily. Segmenting users by engagement level could provide further insight.
* Other Factors: Although the analysis controls for several covariates, factors not included in the model may also influence churn. The dataset here is also synthetic and small (100 users, 20 in the test set), and `engagement_score` and `level_play_time` determine `passed_level_28` exactly, which violates the overlap assumption that meta-learners rely on. Continuous monitoring and analysis on real data are essential to fully understand the multifaceted nature of user churn.

In summary, the causal analysis estimates a negligible effect of passing Level 28 on churn at that level. Whether difficulty (for example, the number of attempts a user needs) drives churn would require treating it as the treatment variable; level difficulty and user support remain important levers for retention that are worth investigating further.
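
The "User Engagement" point suggests segmenting users. A minimal sketch is a stratified comparison: compute the treated-minus-untreated churn difference separately within engagement tiers. The tier cut-off (scores 1-5 as "low", 6-10 as "high") and the tier names are hypothetical choices. On this synthetic data the tiers show +1/3 and -1/3, which average back to the overall zero; that pattern is an artifact of how churn is generated (it fires only when `i % 10` is 0 or 5), not a finding. On real data, a natural extension is to average per-user T-learner effect estimates within each tier.

```python
import pandas as pd

# Synthetic columns from the data-generation cell.
df = pd.DataFrame({
    "engagement_score": [i % 10 + 1 for i in range(100)],
    "passed_level_28": [1 if i % 2 == 0 else 0 for i in range(100)],
    "churned_at_28": [1 if i % 5 == 0 else 0 for i in range(100)],
})

# Hypothetical split: (0, 5] -> "low" engagement, (5, 10] -> "high".
df["engagement_tier"] = pd.cut(
    df["engagement_score"], bins=[0, 5, 10], labels=["low", "high"]
)

# Churn rate per tier and treatment arm, then treated minus untreated per tier.
rates = (
    df.groupby(["engagement_tier", "passed_level_28"], observed=True)["churned_at_28"]
    .mean()
    .unstack()
)
tier_diff = rates[1] - rates[0]
print(tier_diff)
```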