# Defining Causation of Churn with CausalML

# Business problem
Before jumping into the data, let's define the problem. Our project manager brings us one: users are suddenly churning at a specific point, Level 28. Having spotted this, the manager asks a broad question: "Why do users churn at Level 28?" Looking at the level data, difficulty seems like the obvious suspect, but Level 28 is not the only hard level in the 25–35 range. With that in mind, we run a causal ML analysis to test whether the difficulty hypothesis holds up.

# Data Generation
Assume our dataset includes the following features for each user (the simulation below generates all of them except `level` and `passed_level_28`):

- user_id: Unique identifier for the user.
- level: The highest level reached by the user.
- attempts_to_pass_28: Number of attempts the user made to pass Level 28.
- engagement_score: A composite score representing the user's engagement with the game up to Level 28.
- passed_level_28: Binary indicator (1 if the user passed Level 28, 0 otherwise).
- churned_at_28: Binary indicator (1 if the user churned at Level 28, 0 otherwise).
- rewarded_shows_watched: Number of rewarded ad shows watched by the user up to Level 28.
- is_payer: Binary indicator (1 if the user has made any in-game purchases, 0 otherwise).
- boosters_used: Number of boosters used by the user up to Level 28.
- level_play_time: Total playtime in minutes the user spent on Level 28.
- is_crown_enabled: Binary indicator (1 if the crown feature is enabled for the user, 0 otherwise).
- is_battle_pass_active: Binary indicator (1 if the user has an active battle pass, 0 otherwise).

```python
# First, import the necessary libraries
import numpy as np
import pandas as pd
from dowhy import CausalModel
```

```python
# Simulating data for demonstration
np.random.seed(42)
data = {
    'user_id': np.arange(100),
    'attempts_to_pass_28': np.random.poisson(3, 100),
    'engagement_score': np.random.normal(50, 10, 100),
    'rewarded_shows_watched': np.random.randint(0, 5, 100),
    'is_payer': np.random.binomial(1, 0.3, 100),
    'boosters_used': np.random.randint(0, 10, 100),
    'level_play_time': np.random.normal(30, 5, 100),
    'is_crown_enabled': np.random.binomial(1, 0.5, 100),
    'is_battle_pass_active': np.random.binomial(1, 0.4, 100),
    # Note: churn is drawn independently of every other column,
    # so the true effect of attempts on churn in this data is zero
    'churned_at_28': np.random.binomial(1, 0.25, 100)
}
df = pd.DataFrame(data)
```
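In the simulation above, `churned_at_28` does not depend on any other column, so there is no real effect for an estimator to find. A data-generating process with a planted effect gives a ground truth to check against. The sketch below is hypothetical: the function name `simulate_churn_with_effect`, its coefficients, and the choice of `engagement_score` as the only confounder are illustrative assumptions, not properties of real player data.

```python
import numpy as np
import pandas as pd


def simulate_churn_with_effect(n=5000, true_effect=0.03, seed=0):
    """Hypothetical data-generating process with a known effect of attempts on churn.

    engagement_score is a confounder: engaged users retry Level 28 more often
    AND churn less, which hides the true effect in a naive comparison.
    All coefficients are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    engagement = rng.normal(50, 10, n)
    # Engaged users make more attempts (confounding path: engagement -> attempts)
    attempts = rng.poisson(np.clip(3 + 0.1 * (engagement - 50), 0.1, None))
    # Churn probability: planted linear effect of attempts, lowered by engagement
    # (confounding path: engagement -> churn), clipped to a valid probability
    p_churn = np.clip(
        0.25 + true_effect * (attempts - 3) - 0.01 * (engagement - 50), 0.0, 1.0
    )
    churned = rng.binomial(1, p_churn)
    return pd.DataFrame({
        "engagement_score": engagement,
        "attempts_to_pass_28": attempts,
        "churned_at_28": churned,
    })


sim = simulate_churn_with_effect()
# Naive slope of churn on attempts, ignoring the confounder
naive = np.polyfit(sim["attempts_to_pass_28"], sim["churned_at_28"], 1)[0]
print(f"planted effect: 0.03, naive slope: {naive:.3f}")
```

Under these assumptions the naive slope lands well below the planted 0.03, because engaged users both retry more and churn less. Adjusting for `engagement_score` is exactly what listing it in `common_causes` asks the backdoor estimator to do.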

```python
# Define the causal model
model = CausalModel(
    data=df,
    treatment='attempts_to_pass_28',
    outcome='churned_at_28',
    common_causes=['engagement_score', 'rewarded_shows_watched', 'is_payer', 'boosters_used',
                   'level_play_time', 'is_crown_enabled', 'is_battle_pass_active']
)

# Identify the causal effect using the backdoor criterion
identified_estimand = model.identify_effect()
```

```python
# Estimate the causal effect
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.linear_regression")

print(estimate)
```
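As we understand it, DoWhy's `backdoor.linear_regression` estimator fits an ordinary least squares model of the outcome on the treatment plus the common causes and reports the treatment coefficient as the average effect (for a binary outcome like churn, this is a linear probability model). The hypothetical helper `backdoor_linear_ate` below re-implements that idea with NumPy so the mechanics are visible. It is a sketch, not DoWhy's code, and it is only valid if the outcome is linear in the treatment and covariates and there is no unobserved confounding. The toy data uses a made-up confounder `z` and a true effect of 0.5.

```python
import numpy as np
import pandas as pd


def backdoor_linear_ate(df, treatment, outcome, common_causes):
    """Hypothetical helper: ATE as the treatment coefficient of
    OLS(outcome ~ 1 + treatment + common_causes)."""
    X = np.column_stack([
        np.ones(len(df)),                           # intercept
        df[treatment].to_numpy(dtype=float),        # treatment column
        df[common_causes].to_numpy(dtype=float),    # adjustment set (may be empty)
    ])
    y = df[outcome].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # coefficient on the treatment


# Toy data: z confounds t and y; the true effect of t on y is 0.5
rng = np.random.default_rng(1)
n = 20_000
z = rng.normal(size=n)
t = 2 * z + rng.normal(size=n)
y = 0.5 * t - 1.5 * z + rng.normal(size=n)
demo = pd.DataFrame({"t": t, "z": z, "y": y})

print(backdoor_linear_ate(demo, "t", "y", ["z"]))  # close to 0.5
print(backdoor_linear_ate(demo, "t", "y", []))     # biased: ignores z
```

Dropping the confounder flips the sign of the estimate (about -0.1 here), which is the bias the `common_causes` list is meant to remove.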

```text
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
d/d[attempts_to_pass_28] (E[churned_at_28|is_payer,boosters_used,is_battle_pass_active,engagement_score,level_play_time,is_crown_enabled,rewarded_shows_watched])

Estimand assumption 1, Unconfoundedness: If U→{attempts_to_pass_28} and U→churned_at_28 then P(churned_at_28|attempts_to_pass_28,is_payer,boosters_used,is_battle_pass_active,engagement_score,level_play_time,is_crown_enabled,rewarded_shows_watched,U) = P(churned_at_28|attempts_to_pass_28,is_payer,boosters_used,is_battle_pass_active,engagement_score,level_play_
```

# Result
The mean value reported by DoWhy is the estimated average causal effect of one additional attempt to pass Level 28 on the probability of churning. Here the estimate is approximately 0.0022: on average across all users in the dataset, each extra attempt raises the probability of churning by about 0.22 percentage points. Keep in mind that in this simulation `churned_at_28` is drawn independently of every other column, so the true effect is zero by construction; with only 100 users, an estimate this small is consistent with sampling noise.
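With only 100 simulated users, a point estimate of 0.0022 says little without a measure of uncertainty. One way to get one, sketched here as an assumption rather than the notebook's method, is a nonparametric bootstrap: resample users with replacement, refit the same linear adjustment, and read off percentiles. The block rebuilds the simulated data with the same seed and draw order as the notebook; the helper `ols_treatment_coef` and the 2,000 resamples are illustrative choices.

```python
import numpy as np
import pandas as pd

# Rebuild the notebook's simulated data: same seed, same draw order
np.random.seed(42)
df = pd.DataFrame({
    'user_id': np.arange(100),
    'attempts_to_pass_28': np.random.poisson(3, 100),
    'engagement_score': np.random.normal(50, 10, 100),
    'rewarded_shows_watched': np.random.randint(0, 5, 100),
    'is_payer': np.random.binomial(1, 0.3, 100),
    'boosters_used': np.random.randint(0, 10, 100),
    'level_play_time': np.random.normal(30, 5, 100),
    'is_crown_enabled': np.random.binomial(1, 0.5, 100),
    'is_battle_pass_active': np.random.binomial(1, 0.4, 100),
    'churned_at_28': np.random.binomial(1, 0.25, 100),
})

covariates = ['engagement_score', 'rewarded_shows_watched', 'is_payer', 'boosters_used',
              'level_play_time', 'is_crown_enabled', 'is_battle_pass_active']


def ols_treatment_coef(frame):
    """Hypothetical helper: coefficient on attempts_to_pass_28 from
    OLS of churn on an intercept, attempts, and the covariates."""
    X = np.column_stack([
        np.ones(len(frame)),
        frame['attempts_to_pass_28'].to_numpy(dtype=float),
        frame[covariates].to_numpy(dtype=float),
    ])
    y = frame['churned_at_28'].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


point = ols_treatment_coef(df)

# Nonparametric bootstrap: resample users (rows) with replacement and refit
rng = np.random.default_rng(0)
boot = np.array([
    ols_treatment_coef(df.iloc[rng.integers(0, len(df), size=len(df))])
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"point estimate: {point:.4f}, 95% bootstrap CI: [{lo:.4f}, {hi:.4f}]")
```

If the interval comfortably straddles zero, the data do not support a nonzero effect, which is what you would expect given that churn was simulated independently of attempts.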

# Interpretation
The estimated causal effect is small: on its own, the number of attempts to pass Level 28 barely moves the likelihood of churn. Even so, the result should be read in the broader context of game design and user experience:

* Game Difficulty: If Level 28 is a significant difficulty spike, even a small increase in churn probability could indicate that users are getting frustrated. It might be beneficial to review the level's design or provide additional support or hints to help users overcome this challenge.
* User Engagement: Highly engaged users may be more willing to retry a difficult level several times, whereas less engaged users may churn more easily. Segmenting users by engagement level could provide further insight.
* Other Factors: Although the analysis controls for several covariates, factors missing from the model may also influence churn. The estimate is causal only under the unconfoundedness assumption printed in the output, that is, no unobserved factor may drive both the number of attempts and churn. Continuous monitoring and analysis are needed to fully understand the many drivers of churn.
In summary, the causal analysis suggests at most a slight causal effect of the number of attempts to pass Level 28 on user churn, which still highlights the importance of level difficulty and player support in game design for retention.
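To act on the engagement point above, one hedged approach is to split users into engagement quantiles and fit the same linear adjustment within each segment, giving a rough per-segment (conditional) effect. The helper `effect_by_segment`, the tercile split, and the toy data, in which each extra attempt is assumed to hurt low-engagement users four times as much, are all hypothetical. With only 100 users, as in the notebook's simulation, each segment would be too small to be informative, so the demo uses 30,000 synthetic users; dedicated heterogeneous-effect estimators such as those in EconML or CausalML are the more principled route on real data.

```python
import numpy as np
import pandas as pd


def effect_by_segment(df, treatment, outcome, common_causes, segment_col, q=3):
    """Hypothetical helper: OLS treatment coefficient within each quantile bin of segment_col."""
    bins = pd.qcut(df[segment_col], q=q, labels=[f"q{i + 1}" for i in range(q)])
    results = {}
    for label, part in df.groupby(bins, observed=True):
        X = np.column_stack([
            np.ones(len(part)),
            part[treatment].to_numpy(dtype=float),
            part[common_causes].to_numpy(dtype=float),
        ])
        y = part[outcome].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        results[label] = coef[1]
    return pd.Series(results, name=f"effect_of_{treatment}")


# Toy data (assumption): extra attempts hurt low-engagement users more
rng = np.random.default_rng(7)
n = 30_000
engagement = rng.normal(50, 10, n)
attempts = rng.poisson(3, n)
per_attempt = np.where(engagement < 45, 0.04, 0.01)
p_churn = np.clip(0.2 + per_attempt * (attempts - 3), 0.0, 1.0)
demo = pd.DataFrame({
    "engagement_score": engagement,
    "attempts_to_pass_28": attempts,
    "churned_at_28": rng.binomial(1, p_churn),
})

segment_effects = effect_by_segment(
    demo, "attempts_to_pass_28", "churned_at_28", ["engagement_score"], "engagement_score"
)
print(segment_effects)
```

In this toy setup the lowest-engagement tercile (`q1`) shows a clearly larger per-attempt effect than the others; on real data, a pattern like that would argue for targeting hints or boosters at less engaged players rather than retuning Level 28 for everyone.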