# Predicting Monthly User Churn for Waze

This is an **advanced data analytics** project aimed at developing a machine learning model to predict **user churn**. Churn refers to users who have uninstalled the Waze app or stopped using it. This project focuses on monthly user churn. An accurate model will help prevent churn, improve user retention, and grow Waze’s business.

An accurate model can also help identify specific factors that contribute to churn and answer questions such as: 
- Who are the users most likely to churn?
- Why do users churn? 
- When do users churn?

For example, if Waze can identify a segment of users who are at high risk of churning, it can proactively engage these users with special offers to try to retain them. Otherwise, Waze may lose these users without knowing why.

Ultimately, the insights generated will help Waze leadership optimize the company’s retention strategy, enhance user experience, and make data-driven decisions about product development.

This is **Part 3** of the project.

## 3. Hypothesis testing and confidence intervals

**The purpose** of this notebook is to perform two-sample hypothesis tests and compute the corresponding confidence intervals to determine whether there is a statistically significant difference in the **mean number of drives** and in **churn rates** between iPhone and Android users.

**The goal** is to apply descriptive statistics along with inferential methods—hypothesis testing and confidence intervals—to draw meaningful conclusions from the data.

### 3a. Imports and data loading

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

In [2]:
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

### 3b. Descriptive statistics

**Note:** In the dataset, `device` is a categorical variable with the labels `iPhone` and `Android`.

In [3]:
df.groupby('device')['drives'].mean().sort_values(ascending=False)

device
iPhone     67.859078
Android    66.231838
Name: drives, dtype: float64

Based on these averages, iPhone users appear to take slightly more drives on average than Android users (about 67.9 versus 66.2). However, this difference might reflect random sampling variability rather than a true difference in the number of drives. To assess whether the difference is statistically significant, we can conduct a two-sample t-test.

In [4]:
df.groupby('device')['label'].value_counts(normalize=True)

device   label   
Android  retained    0.824399
         churned     0.175601
iPhone   retained    0.821680
         churned     0.178320
Name: proportion, dtype: float64

The proportions show similar retention across devices (≈82% retained, ≈18% churned). iPhone users have a slightly higher churn rate (17.8% versus 17.6%), but this small difference may be due to chance. A two-proportion z-test can assess whether it is statistically significant.

### 3c. Mean number of drives

#### Hypothesis testing

**Hypotheses:**

$H_0$: There is no difference in average number of drives between drivers who use iPhone devices and drivers who use Androids.

$H_A$: There is a difference in average number of drives between drivers who use iPhone devices and drivers who use Androids.

**Significance level:**

5% is the significance level.

In [5]:
# Isolate the `drives` column for iPhone users
iPhone = df[df['device'] == 'iPhone']['drives']

# Isolate the `drives` column for Android users
Android = df[df['device'] == 'Android']['drives']

# Perform the two-sample t-test
stats.ttest_ind(a=iPhone, b=Android, equal_var=False)

TtestResult(statistic=1.463523206885235, pvalue=0.143351972680206, df=11345.066049381952)

Based on the p-value we got above, do we reject or fail to reject the null hypothesis?

> *Since the p-value (0.14) is larger than the chosen significance level (0.05), we fail to reject the null hypothesis. We conclude that there is **not** a statistically significant difference in the average number of drives between drivers who use iPhones and drivers who use Androids.*
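
To make explicit what `equal_var=False` asks for, here is a minimal sketch of Welch's t statistic and its degrees of freedom, written with the standard library only. The helper name `welch_t` and the two toy samples are hypothetical and are not taken from the Waze dataset; the formulas mirror the ones used in the confidence interval cell of this notebook.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(a), len(b)
    # Variance of each sample mean; statistics.variance divides by n - 1
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    # Difference in means scaled by its (unpooled) standard error
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    # Degrees of freedom that account for unequal variances
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


# Hypothetical toy samples (NOT from the Waze dataset)
group_a = [12, 15, 11, 19, 14, 16]
group_b = [10, 13, 9, 12, 11]

t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

The p-value then comes from a t distribution with `dof` degrees of freedom, which is the part this notebook delegates to `scipy.stats`.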

#### Confidence interval

**Confidence level:**

95% is the confidence level.

In [6]:
# Compute sample statistics
mean1, mean2 = np.mean(iPhone), np.mean(Android)
n1, n2 = len(iPhone), len(Android)
s1, s2 = np.std(iPhone, ddof=1), np.std(Android, ddof=1)

# Calculate standard error of the difference
se = np.sqrt(s1**2/n1 + s2**2/n2)

# Compute degrees of freedom (Welch–Satterthwaite)
dof = (s1**2/n1 + s2**2/n2)**2 / ((s1**2/n1)**2/(n1-1) + (s2**2/n2)**2/(n2-1))

# Calculate t critical value for 95% CI
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha/2, dof)

# Compute confidence interval for difference in means
diff = mean1 - mean2
ci_lower = diff - t_crit*se
ci_upper = diff + t_crit*se
(ci_lower, ci_upper)

(-0.5522075447568255, 3.8066874303778064)

Based on the confidence interval we computed above, what can we conclude?

> *Since the 95% confidence interval for the difference in means (-0.55, 3.81) includes 0, we cannot rule out the possibility that the population difference is zero. We conclude that there is **not** a statistically significant difference in the average number of drives between drivers who use iPhones and drivers who use Androids, which is consistent with the hypothesis test result above.*
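
As a quick arithmetic check that the test and the interval tell the same story, the sketch below recovers the standard error and the critical value from the numbers printed in this notebook. It assumes only that the t statistic equals the difference in means divided by the standard error, and that the interval is the difference plus or minus the critical value times that same standard error. The variable names are illustrative.

```python
# Values copied from the outputs shown in this notebook
t_stat = 1.463523206885235
ci_lower, ci_upper = -0.5522075447568255, 3.8066874303778064

diff = (ci_lower + ci_upper) / 2        # interval is centred on the point estimate
half_width = (ci_upper - ci_lower) / 2  # equals t_crit * se
se = diff / t_stat                      # because t = diff / se
t_crit = half_width / se                # implied critical value

# A two-sided test at level alpha rejects exactly when the
# (1 - alpha) confidence interval excludes zero.
rejects_by_test = abs(t_stat) > t_crit
rejects_by_ci = not (ci_lower <= 0 <= ci_upper)
print(diff, se, t_crit, rejects_by_test, rejects_by_ci)
```

The implied critical value is close to 1.96, as expected for a t distribution with more than eleven thousand degrees of freedom, and both decision rules agree.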

### 3d. Churn rates

#### Hypothesis testing

**Hypotheses:**

$H_0$: There is no difference in churn rates between iPhone and Android users.

$H_A$: There is a difference in churn rates between iPhone and Android users.

**Significance level:**

5% is the significance level.

In [7]:
# Create contingency table
contingency = pd.crosstab(df['device'], df['label'])

# Extract churn counts and totals
churn_counts = contingency['churned'].values
n_obs = contingency.sum(axis=1).values

# Perform the two-proportion z-test
_, pvalue = proportions_ztest(churn_counts, n_obs)
pvalue

0.6838419397722314

Based on the p-value we got above, do we reject or fail to reject the null hypothesis?

> *Since the p-value (0.68) is larger than the chosen significance level (0.05), we fail to reject the null hypothesis. We conclude that there is **not** a statistically significant difference in churn rates between iPhone and Android users.*
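
For readers who want to see what a two-proportion z-test computes, here is a minimal standard-library sketch. It assumes the pooled-variance form of the test, which, as far as I know, is what `proportions_ztest` uses by default. The function name `two_proportion_ztest` and the counts are hypothetical and are not the Waze figures.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(count1, nobs1, count2, nobs2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = count1 / nobs1, count2 / nobs2
    # Under H0 both groups share one proportion, estimated from all the data
    p_pool = (count1 + count2) / (nobs1 + nobs2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts (NOT the Waze data): 180 of 1000 vs. 170 of 1000 churned
z, p_value = two_proportion_ztest(180, 1000, 170, 1000)
print(z, p_value)
```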

#### Confidence interval

**Confidence level:**

95% is the confidence level.

In [8]:
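# Compute 95% confidence interval for difference in proportions
# `pd.crosstab` sorts the index alphabetically, so position 0 is Android
# and position 1 is iPhone; the interval is for (iPhone - Android).
ci_lower, ci_upper = confint_proportions_2indep(
    count1=churn_counts[1], nobs1=n_obs[1],
    count2=churn_counts[0], nobs2=n_obs[0],
    method="wald"
)
(ci_lower, ci_upper)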

(-0.010343262371848204, 0.015780621436018293)

Based on the confidence interval we computed above, what can we conclude?

> *Since the 95% confidence interval for the difference in proportions (-0.01, 0.02) includes 0, we cannot rule out the possibility that the population difference is zero. We conclude that there is **not** a statistically significant difference in churn rates between iPhone and Android users, which is consistent with the hypothesis test result above.*
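
The Wald interval requested with `method="wald"` can be sketched by hand as follows. This is a simplified illustration under the usual normal approximation, not the statsmodels implementation; the function name `wald_ci_diff` and the counts are hypothetical. Note that the interval uses an unpooled standard error, unlike the pooled one in the z-test.

```python
import math
from statistics import NormalDist


def wald_ci_diff(count1, nobs1, count2, nobs2, confidence=0.95):
    """Wald confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = count1 / nobs1, count2 / nobs2
    # Each group contributes its own binomial variance
    se = math.sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
    # Two-sided critical value, about 1.96 for a 95% interval
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts (NOT the Waze data): 180 of 1000 vs. 170 of 1000 churned
low, high = wald_ci_diff(180, 1000, 170, 1000)
print(low, high)
```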

### Conclusion

What business insight(s) can we draw from the result of our statistical analysis?

> *The analysis indicates that drivers using iPhone and Android devices show no statistically significant differences in either their average number of drives or their churn rates. The data therefore provide no evidence that device type is a meaningful factor in driver activity or retention.*

# END OF PART 3