# **Waze User Churn: Statistical Analysis**
**This notebook uses descriptive statistics and hypothesis testing to explore whether Waze ride activity differs between iPhone and Android users. The analysis focuses on comparing mean ride counts across device types and interpreting the results in a business context.**

The objective of this analysis is to apply descriptive statistics and a two-sample hypothesis test to assess whether average ride usage differs between iPhone and Android users.

This notebook is organized into four sections:
- Imports and data loading
- Descriptive statistics and EDA
- Hypothesis formulation and testing
- Interpretation and business implications

# **Data exploration and hypothesis testing**

**Research Question**<br>
The key statistical question in this notebook is whether there is a statistically significant difference in the mean number of rides between iPhone and Android users.

### **Imports and data loading**




The analysis uses pandas and NumPy for data manipulation and SciPy for statistical testing. 

In [None]:
# Import core libraries for data manipulation and statistical testing
import pandas as pd
import numpy as np
from scipy import stats

The Waze churn dataset is loaded from a CSV file into a pandas DataFrame for analysis.

In [None]:
df = pd.read_csv('waze_dataset.csv')
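Before any analysis, it can help to confirm that the columns this notebook relies on are present and to count missing values in them. The sketch below assumes only the `device` and `drives` columns used later in this notebook; the helper name `check_columns` and the small stand-in DataFrame are hypothetical, used so the example runs without the CSV file.

```python
import pandas as pd

def check_columns(frame: pd.DataFrame, required: list) -> pd.Series:
    """Raise KeyError if any required column is missing; return per-column null counts."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    # Count missing values in each required column
    return frame[required].isna().sum()

# Hypothetical stand-in for the loaded Waze data (not real values)
sample = pd.DataFrame({
    "device": ["iPhone", "Android", "iPhone", None],
    "drives": [10, 25, 7, 3],
})

null_counts = check_columns(sample, ["device", "drives"])
print(null_counts)
```

In the notebook, the same check would be run as `check_columns(df, ['device', 'drives'])`.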

## **Descriptive statistics and EDA**

Descriptive statistics provide an initial understanding of the distribution, central tendency, and variability of key variables. This helps identify typical usage patterns and informs the choice of appropriate statistical tests.
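As a minimal sketch, the distribution of ride counts can be summarized with pandas' `describe()`, together with the median and skewness. The values below are hypothetical toy data; in the notebook the same calls would be applied to `df['drives']`. Comparing the mean with the median is a quick check for skew, which is worth knowing before relying on a comparison of means.

```python
import pandas as pd

# Hypothetical ride counts standing in for df['drives']
drives = pd.Series([2, 5, 8, 8, 12, 15, 40, 95])

summary = drives.describe()   # count, mean, std, min, quartiles, max
median = drives.median()
skewness = drives.skew()      # a positive value suggests a long right tail

print(summary)
print(f"median={median}, skew={skewness:.2f}")
```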


To compare ride activity by device type, the categorical `device` variable is encoded as a numeric `device_type` indicator: iPhone users are mapped to 1 and Android users to 2. The codes are stored in a new column so that the original labels are preserved.

In [None]:
# Map device labels to numeric codes (iPhone: 1, Android: 2)
map_dictionary = {'Android': 2, 'iPhone': 1}

# Create a numeric device_type column while preserving the original device labels
df['device_type'] = df['device'].map(map_dictionary)
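`Series.map` with a dictionary returns `NaN` for any label that is not a key, so a label with unexpected spelling or a missing `device` value would silently drop out of both device groups. A minimal check, assuming the mapping used above, is to list the unmapped rows after the conversion. The small DataFrame here is hypothetical.

```python
import pandas as pd

map_dictionary = {"Android": 2, "iPhone": 1}

# Hypothetical rows; 'iphone' (lowercase) is deliberately absent from the mapping
frame = pd.DataFrame({"device": ["iPhone", "Android", "iphone"]})
frame["device_type"] = frame["device"].map(map_dictionary)

# Rows whose label did not match a dictionary key end up as NaN
unmapped = frame.loc[frame["device_type"].isna(), "device"]
print(f"{len(unmapped)} unmapped row(s):", unmapped.tolist())
```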

Next, average ride counts are computed by device to compare typical usage across iPhone and Android users. 

In [None]:
df.groupby('device')['drives'].mean()
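The mean alone can hide differences in spread and group size. A sketch that extends the `groupby` above with several aggregations in a single `agg` call is shown below; it uses hypothetical toy data, and in the notebook the same call would be applied to `df`.

```python
import pandas as pd

# Hypothetical toy data standing in for the Waze DataFrame
frame = pd.DataFrame({
    "device": ["iPhone", "iPhone", "iPhone", "Android", "Android"],
    "drives": [10, 20, 30, 5, 15],
})

# One row per device: group size, mean, median, and sample standard deviation
by_device = frame.groupby("device")["drives"].agg(["count", "mean", "median", "std"])
print(by_device)
```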

The summary statistics indicate that iPhone users have a higher average number of rides than Android users. However, this observed difference may be due to random variation, so a formal hypothesis test is used to determine whether the difference is statistically significant.
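One way to see how much a group mean can vary by chance is the standard error of the mean, $s/\sqrt{n}$, which shrinks as the sample grows. The sketch below computes it per device with pandas' `sem()` (which uses the sample standard deviation, `ddof=1`) and checks it against the formula by hand, on hypothetical toy data. A gap between means that is small relative to these standard errors can easily arise from random variation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Waze DataFrame
frame = pd.DataFrame({
    "device": ["iPhone", "iPhone", "iPhone", "Android", "Android"],
    "drives": [10, 20, 30, 5, 15],
})

grouped = frame.groupby("device")["drives"]
sem = grouped.sem()  # standard error of the mean, s / sqrt(n), per device

# The same quantity computed by hand for comparison
manual = grouped.std() / np.sqrt(grouped.count())
print(sem)
```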


### **Hypothesis test: iPhone vs Android ride counts**

A two-sample t-test for independent groups is used to assess whether the mean number of rides differs between iPhone and Android users. A 5% significance level is adopted for the test.
- Null Hypothesis ($H_0$): There is no difference in the mean number of rides between iPhone and Android users.
- Alternative Hypothesis ($H_A$): There is a difference in the mean number of rides between iPhone and Android users.
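Written formally, with $\mu_{\text{iPhone}}$ and $\mu_{\text{Android}}$ denoting the population mean number of rides in each device group, the two-sided test and its decision rule at $\alpha = 0.05$ are:

$$
H_0:\ \mu_{\text{iPhone}} = \mu_{\text{Android}}
\qquad
H_A:\ \mu_{\text{iPhone}} \neq \mu_{\text{Android}}
\qquad
\text{reject } H_0 \text{ if } p < \alpha = 0.05
$$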

The unequal variance (Welch) t-test is applied to avoid assuming equal population variances between the two device groups. 
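For reference, Welch's statistic is $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$, evaluated against a t distribution whose degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below computes both by hand on hypothetical samples and compares the result with `scipy.stats.ttest_ind(..., equal_var=False)`; the helper name `welch_t` is illustrative only.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return t, dof

# Hypothetical samples with visibly different spreads
a = [10, 20, 30, 40, 50]
b = [12, 14, 16, 18]

t_manual, dof = welch_t(a, b)
p_manual = 2 * stats.t.sf(abs(t_manual), dof)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Because the group variances enter separately, the test does not rely on the two device groups having equal spread.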

In [None]:
# Isolate ride counts by device type
iphone = df.loc[df['device_type'] == 1, 'drives']
android = df.loc[df['device_type'] == 2, 'drives']

# Perform Welch's two-sample t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(a=iphone, b=android, equal_var=False)
t_stat, p_value
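To make the decision rule explicit and reusable, the p-value can be compared with the chosen significance level in a small helper. The function name `interpret_test` is hypothetical; in the notebook it would be called as `interpret_test(p_value)`.

```python
def interpret_test(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return f"p = {p_value:.3f} < {alpha}: reject H0"
    return f"p = {p_value:.3f} >= {alpha}: fail to reject H0"

print(interpret_test(0.143))  # prints "p = 0.143 >= 0.05: fail to reject H0"
```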

With a p-value of approximately 0.143, which is greater than the 0.05 significance level, there is insufficient evidence to conclude that the mean number of rides differs between iPhone and Android users. In other words, the analysis fails to reject the null hypothesis and does not support a statistically significant difference in ride counts between device types.
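A p-value alone does not show how large the difference between groups might be. A complementary view, sketched below under the same Welch assumptions, is a 95% confidence interval for the difference in means; when the interval contains 0, this agrees with failing to reject $H_0$ at the 5% level. The helper `welch_ci` and the toy samples are hypothetical; in the notebook it would be called as `welch_ci(iphone, android)`.

```python
import numpy as np
from scipy import stats

def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal variances."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)  # two-sided critical value
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical samples, not Waze data
low, high = welch_ci([10, 20, 30, 40, 50], [12, 14, 16, 18])
print(f"95% CI for the difference in means: ({low:.2f}, {high:.2f})")
```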

### **Business implications and next steps**
The hypothesis test indicates no statistically significant difference in average ride usage between iPhone and Android users at the 5% significance level. This suggests that, based on ride counts alone, device type is not a primary driver of user engagement, and there is no clear justification for prioritizing one platform over the other.

Future analysis can focus on other behavioral or profile variables that may better explain user churn, such as usage frequency, trip distance, cancellation patterns, or tenure. Applying similar statistical tests and modeling techniques to these features can help identify more actionable churn risk factors.
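As a sketch of how the same Welch test could be applied to other features, the helper below loops over a list of numeric columns and compares two groups defined by a label column. The column and group names (`label`, `retained`, `churned`, `sessions`) are assumptions for illustration and may not match the dataset's actual schema; `welch_by_group` is a hypothetical helper. When many features are tested at once, the chance of at least one false positive grows, so the significance level may need adjusting.

```python
import pandas as pd
from scipy import stats

def welch_by_group(frame, group_col, group_a, group_b, features):
    """Run Welch's t-test on each feature, comparing group_a rows with group_b rows."""
    rows = []
    for feature in features:
        a = frame.loc[frame[group_col] == group_a, feature].dropna()
        b = frame.loc[frame[group_col] == group_b, feature].dropna()
        t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
        rows.append({"feature": feature, "t_stat": t_stat, "p_value": p_value})
    # Smallest p-values first
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)

# Hypothetical toy data; column names are assumptions, not the confirmed schema
toy = pd.DataFrame({
    "label": ["retained"] * 4 + ["churned"] * 4,
    "drives": [40, 42, 38, 41, 10, 12, 9, 11],
    "sessions": [30, 25, 35, 28, 29, 31, 24, 33],
})

results = welch_by_group(toy, "label", "retained", "churned", ["drives", "sessions"])
print(results)
```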