# **Waze Project**
**Milestone 4 - Hypothesis Testing on variables**

**The purpose** of this project is to demonstrate how to conduct a two-sample hypothesis test.

(A two-sample hypothesis test is a common statistical approach in A/B testing for comparing two groups and determining whether there is a significant difference between them. Although this dataset is not the result of an actual A/B test, it is used here to **demonstrate how A/B testing can be conducted** with Python and statistical methods.)

**The goal** is to analyze whether there is a **statistically significant** difference in the mean number of drives between iPhone® users and Android™ users.
<br/>

*This analysis has four parts:*

**Part 1:** Decide effect size, power, and sample size (before data collection in A/B testing)

**Part 2:** Imports and data loading

**Part 3:** Conduct hypothesis testing (a two-sample t-test)

**Part 4:** Communicate insights with stakeholders

<br/>



### **Part 1. Decide effect size, power, and sample size**

**This part applies only to A/B testing.**

It shows how to approach an A/B test before running the experiment.

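Decide the **effect size** for this A/B test. `TTestIndPower` expects a standardized effect size (the difference between the two means divided by the standard deviation, also known as Cohen's d). Smaller effects require larger sample sizes.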

**Power** is the probability of detecting an effect of the chosen size when that effect truly exists. A power of 0.8 is a common convention because it balances sensitivity (the chance of detecting a true effect) against the sample size required.
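As a rough cross-check on the power calculation, a standard normal-approximation formula for the per-group sample size of a two-sided, two-sample test with equal group sizes is shown below. Here $d$ is the standardized effect size, $1-\beta$ is the power, and $z_q$ is the $q$ quantile of the standard normal distribution. This is an approximation, not the exact t-based calculation that `TTestIndPower` performs, so it comes out about one user smaller than the `solve_power` result of 4361.4.

$$
n \approx \frac{2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{d^2}
= \frac{2\,(1.95996 + 0.84162)^2}{0.06^2} \approx 4360.5
$$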

The required **sample size** per group is computed by the code below.

**Key metric:** the mean number of drives.

**Control group:** users who get the current feature.

**Treatment group:** users who get the new feature.
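In a real A/B test, users are usually assigned to the control and treatment groups at random, so that the groups differ only in the feature being tested. Below is a minimal sketch of such an assignment, assuming a plain sequence of user IDs, a fixed seed, and a 50/50 split. The helper `assign_groups` is hypothetical and not part of the original analysis.

```python
import numpy as np
import pandas as pd

def assign_groups(user_ids, seed=42):
    """Randomly split user IDs into 'control' and 'treatment' halves (hypothetical helper)."""
    rng = np.random.default_rng(seed)          # seeded so the split is reproducible
    shuffled = rng.permutation(np.asarray(user_ids))
    half = len(shuffled) // 2
    # The first half of the shuffled IDs goes to control, the rest to treatment
    groups = np.where(np.arange(len(shuffled)) < half, "control", "treatment")
    return pd.DataFrame({"ID": shuffled, "group": groups})

# Hypothetical example with 10 user IDs
assignment = assign_groups(range(10))
print(assignment["group"].value_counts())
```

Shuffling first and then cutting the list in half guarantees balanced group sizes, which keeps the test's power close to the planned value.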

```python
from statsmodels.stats.power import TTestIndPower

# Define parameters
alpha = 0.05        # significance level
power = 0.8         # desired statistical power
effect_size = 0.06  # small standardized effect size (Cohen's d)

# Calculate the required sample size per group
analysis = TTestIndPower()
sample_size = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha, alternative='two-sided')
print(f"Sample size: {sample_size}")
```

```
Sample size: 4361.438697203437
```
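Sample sizes must be whole users, so the result above is rounded up in practice. The sketch below also shows how the requirement scales with the effect size and how much power unequal groups would give. The group sizes 9,672 and 5,327 are the device counts found later in this notebook and serve here only as an illustrative split. `solve_power` and `power` are existing `TTestIndPower` methods. The other effect sizes are arbitrary examples.

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Users needed per group for d = 0.06, alpha = 0.05, power = 0.8 (two-sided)
n_raw = analysis.solve_power(effect_size=0.06, power=0.8, alpha=0.05, alternative='two-sided')
n_per_group = math.ceil(n_raw)  # round up: a fraction of a user is not possible
print(f"Users needed per group: {n_per_group}")

# Smaller effects need more users; larger effects need fewer
for d in (0.03, 0.06, 0.12):
    n = analysis.solve_power(effect_size=d, power=0.8, alpha=0.05, alternative='two-sided')
    print(f"d = {d:.2f}: {math.ceil(n)} users per group")

# Power achieved with unequal groups (illustrative: 9,672 vs. 5,327 users)
achieved = analysis.power(effect_size=0.06, nobs1=9672, alpha=0.05,
                          ratio=5327 / 9672, alternative='two-sided')
print(f"Achieved power: {achieved:.3f}")
```

Because both illustrative groups exceed the required per-group size, the achieved power is above the 0.8 target.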


### **Part 2. Imports and data loading**




Assume the data collection is finished.

Import packages and libraries needed to compute descriptive statistics and conduct a hypothesis test.

```python
# Import relevant packages and libraries
import pandas as pd
import numpy as np
from scipy import stats
```

Import the dataset.

```python
# Load the dataset into a DataFrame
df = pd.read_csv('waze_dataset.csv')
```

**Data exploration**

Use descriptive statistics to conduct exploratory data analysis (EDA).

```python
df.describe(include='all')
```

| | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 14999.0 | 14299 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999 |
| unique | | 2 | | | | | | | | | | | 2 |
| top | | retained | | | | | | | | | | | iPhone |
| freq | | 11763 | | | | | | | | | | | 9672 |
| mean | 7499.0 | | 80.633776 | 67.281152 | 189.964447 | 1749.837789 | 121.605974 | 29.672512 | 4039.340921 | 1860.976012 | 15.537102 | 12.179879 | |
| std | 4329.982679 | | 80.699065 | 65.913872 | 136.405128 | 1008.513876 | 148.121544 | 45.394651 | 2502.149334 | 1446.702288 | 9.004655 | 7.824036 | |
| min | 0.0 | | 0.0 | 0.0 | 0.220211 | 4.0 | 0.0 | 0.0 | 60.44125 | 18.282082 | 0.0 | 0.0 | |
| 25% | 3749.5 | | 23.0 | 20.0 | 90.661156 | 878.0 | 9.0 | 0.0 | 2212.600607 | 835.99626 | 8.0 | 5.0 | |
| 50% | 7499.0 | | 56.0 | 48.0 | 159.568115 | 1741.0 | 71.0 | 9.0 | 3493.858085 | 1478.249859 | 16.0 | 12.0 | |
| 75% | 11248.5 | | 112.0 | 93.0 | 254.192341 | 2623.5 | 178.0 | 43.0 | 5289.861262 | 2464.362632 | 23.0 | 19.0 | |


**Findings:** 
In the dataset, `device` is a categorical variable with the labels `iPhone` and `Android`. `drives` is numerical data with mean = 67.28 and standard deviation = 65.91.

For this analysis, each label is encoded as an integer: `1` for an `iPhone` user and `2` for an `Android` user. The encoded values are stored in a new column, `device_type`, so the original `device` column is not overwritten.



```python
# 1. Create `map_dictionary`
map_dictionary = {'iPhone': 1, 'Android': 2}

# 2. Create new `device_type` column
df['device_type'] = df['device']

# 3. Map the new column to the dictionary
df['device_type'] = df['device_type'].map(map_dictionary)

# 4. Verify data
df.head(5)
```

| | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device | device_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | retained | 283 | 226 | 296.748273 | 2276 | 208 | 0 | 2628.845068 | 1985.775061 | 28 | 19 | Android | 2 |
| 1 | 1 | retained | 133 | 107 | 326.896596 | 1225 | 19 | 64 | 13715.92055 | 3160.472914 | 13 | 11 | iPhone | 1 |
| 2 | 2 | retained | 114 | 95 | 135.522926 | 2651 | 0 | 0 | 3059.148818 | 1610.735904 | 14 | 8 | Android | 2 |
| 3 | 3 | retained | 49 | 40 | 67.589221 | 15 | 322 | 7 | 913.591123 | 587.196542 | 7 | 3 | iPhone | 1 |
| 4 | 4 | retained | 84 | 68 | 168.24702 | 1562 | 166 | 5 | 3950.202008 | 1219.555924 | 27 | 18 | Android | 2 |
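`Series.map()` with a dictionary returns `NaN` for any label that is not a key in the dictionary, so a quick check after the mapping step can catch unexpected spellings before they silently drop out of the test. Below is a minimal sketch on a small made-up DataFrame. `df_demo` and the lowercase `"android"` label are hypothetical.

```python
import pandas as pd

# Hypothetical miniature version of the data, including one unexpected label
df_demo = pd.DataFrame({"device": ["iPhone", "Android", "iPhone", "android"]})

map_dictionary = {"iPhone": 1, "Android": 2}
df_demo["device_type"] = df_demo["device"].map(map_dictionary)

# Any label missing from map_dictionary becomes NaN after .map()
unmapped = df_demo.loc[df_demo["device_type"].isna(), "device"].unique()
print("Unmapped labels:", list(unmapped))
```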


Since we are interested in the relationship between device type and the number of drives, one approach is to compare the average number of drives for each device type. First count the users on each device type, then calculate these averages.

```python
# Count users on each device type
count_device = df['device_type'].value_counts()
count_device
```

```
1    9672
2    5327
Name: device_type, dtype: int64
```

```python
# Calculate the average number of drives for each device type
mean_device = df.groupby('device_type')['drives'].mean()
mean_device
```

```
device_type
1    67.859078
2    66.231838
Name: drives, dtype: float64
```

Based on these averages, drivers who use an iPhone to interact with the application appear to have slightly more drives on average (about 67.9 vs. 66.2). However, this difference might be due to random sampling variation rather than a true difference in the number of drives. To assess whether the difference is statistically significant, we need to conduct a hypothesis test.
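Comparing means alone does not show how large a difference is relative to the spread of the data. The sketch below reports the count, mean, and standard deviation per group in a single `groupby().agg()` call, plus the standard error of each mean. It uses hypothetical numbers rather than the Waze data.

```python
import pandas as pd

# Hypothetical drive counts for six users
df_demo = pd.DataFrame({
    "device_type": [1, 1, 1, 2, 2, 2],
    "drives": [10, 20, 30, 15, 25, 35],
})

summary = df_demo.groupby("device_type")["drives"].agg(["count", "mean", "std"])
# Standard error of each group mean: sample std / sqrt(group size)
summary["sem"] = summary["std"] / summary["count"] ** 0.5
print(summary)
```

With the real data, a difference in means that is small compared with these standard errors is a first hint that the t-test may not find it significant.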


### **Part 3. Hypothesis testing**

Our goal is to conduct a **two-sample t-test**. 

**Steps:**
1.   State the null hypothesis and the alternative hypothesis
2.   Choose a significance level
3.   Find the p-value
4.   Reject or fail to reject the null hypothesis

**Note:** This is a t-test for two **independent** samples. It is the appropriate test because the two groups are independent: each user belongs to exactly one group. (Cases in which one user has both an iPhone and an Android phone were handled before this test.)
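Independence requires that no user appears in both groups. The original analysis does not show how overlapping users were handled, so the following is only a minimal sketch of one way to detect them, on a made-up DataFrame (`df_demo` is hypothetical).

```python
import pandas as pd

# Hypothetical rows; user 3 appears with two different devices
df_demo = pd.DataFrame({
    "ID": [1, 2, 3, 3, 4],
    "device": ["iPhone", "Android", "iPhone", "Android", "iPhone"],
})

# A user linked to more than one device would fall into both groups
devices_per_user = df_demo.groupby("ID")["device"].nunique()
overlap_ids = devices_per_user[devices_per_user > 1].index.tolist()
print("Users in both groups:", overlap_ids)
```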

**1. State the null hypothesis and the alternative hypothesis**

$H_0$: There is no difference in the mean number of drives between iPhone users and Android users ($\mu_{\text{iPhone}} = \mu_{\text{Android}}$).

$H_A$: There is a difference in the mean number of drives between iPhone users and Android users ($\mu_{\text{iPhone}} \neq \mu_{\text{Android}}$).


**2. Choose a significance level**

Choose 5% as the significance level (a threshold commonly used at the company) and proceed with a two-sample t-test.


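**3. Find the p-value**

Use `ttest_ind()` from `scipy.stats`. Passing `equal_var=False` runs Welch's t-test, which does not assume that the two groups have equal variances.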
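For reference, Welch's t-statistic and its Welch-Satterthwaite degrees of freedom are the standard ones below, where $\bar{x}_i$, $s_i^2$, and $n_i$ are the sample mean, sample variance, and size of group $i$ (group 1 = iPhone, group 2 = Android). The two-sided p-value is $2\,P(T_\nu \ge |t|)$ for a t distribution with $\nu$ degrees of freedom.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$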


```python
# 1. Isolate iPhone users
df_iphone = df[df['device_type'] == 1]

# 2. Isolate Android users
df_android = df[df['device_type'] == 2]

# 3. Perform Welch's t-test (equal_var=False) on the 'drives' column
tstats, pvalue = stats.ttest_ind(a=df_iphone['drives'], b=df_android['drives'], equal_var=False)
tstats, pvalue
```

```
(1.4635232068852353, 0.1433519726802059)
```
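To see what `ttest_ind(..., equal_var=False)` computes, the sketch below implements Welch's formulas by hand and compares the result with SciPy. It runs on hypothetical Poisson-distributed data standing in for the two device groups, not on the Waze dataset. `welch_t_test` is an illustrative helper.

```python
import numpy as np
from scipy import stats

def welch_t_test(a, b):
    """Two-sided Welch's t-test computed by hand (illustrative helper)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    v1, v2 = a.var(ddof=1) / n1, b.var(ddof=1) / n2   # squared standard errors
    t = (a.mean() - b.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)                    # two-sided p-value
    return t, p

# Hypothetical data standing in for the two device groups
rng = np.random.default_rng(0)
group_a = rng.poisson(lam=68, size=500)
group_b = rng.poisson(lam=66, size=300)

manual_t, manual_p = welch_t_test(group_a, group_b)
scipy_t, scipy_p = stats.ttest_ind(group_a, group_b, equal_var=False)
print(manual_t, scipy_t)
print(manual_p, scipy_p)
```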

**4. Reject or fail to reject the null hypothesis**

The p-value is about 0.143 (14.33%), which is greater than the 5% significance level, so we fail to reject the null hypothesis. In other words, the data do not provide sufficient evidence of a difference in the mean number of drives between iPhone users and Android users.
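A confidence interval for the difference in means leads to the same decision as the p-value and also shows which differences remain plausible. For a two-sided Welch test at significance level alpha, the (1 - alpha) interval excludes 0 exactly when p < alpha. Below is a sketch on hypothetical data. `welch_ci` is an illustrative helper, not a SciPy function.

```python
import numpy as np
from scipy import stats

def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) with Welch's degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    v1, v2 = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical groups drawn from the same distribution
rng = np.random.default_rng(1)
group_a = rng.poisson(lam=67, size=400)
group_b = rng.poisson(lam=67, size=400)

alpha = 0.05
low, high = welch_ci(group_a, group_b, confidence=1 - alpha)
_, p = stats.ttest_ind(group_a, group_b, equal_var=False)

# The interval excludes 0 exactly when p < alpha
print(f"95% CI: ({low:.2f}, {high:.2f}), p = {p:.3f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```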

**Conclusion:**

Since we did not find a statistically significant difference in the mean number of drives between iPhone users and Android users, the data provide no evidence of a relationship between device type and the mean number of drives.

(If this were an A/B test, we would conclude that the new feature tested on the two groups does not lead to a statistically significant difference in the mean number of drives between them.)