# **Waze Project**

**Scenario** 

Conduct a two-sample hypothesis test (t-test) to analyze the difference in the mean number of drives between iPhone users and Android users.

**The purpose** of this project is to demonstrate knowledge of how to conduct a two-sample hypothesis test.

**The goal** is to apply descriptive statistics and hypothesis testing in Python.
<br/>

*This activity has three parts:*

**Part 1:** Imports and data loading

**Part 2:** Conduct hypothesis testing

**Part 3:** Communicate insights with stakeholders


### Imports and data loading




In [1]:
# Import any relevant packages or libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


In [2]:
# Load dataset into dataframe
df = pd.read_csv('/Volumes/Lenovo PS8/Data analytics/Data/waze_dataset.csv')

### Data exploration

In [3]:
df.head()

| Unnamed: 0 | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | retained | 283 | 226 | 296.748273 | 2276 | 208 | 0 | 2628.845068 | 1985.775061 | 28 | 19 | Android |
| 1 | 1 | retained | 133 | 107 | 326.896596 | 1225 | 19 | 64 | 13715.92055 | 3160.472914 | 13 | 11 | iPhone |
| 2 | 2 | retained | 114 | 95 | 135.522926 | 2651 | 0 | 0 | 3059.148818 | 1610.735904 | 14 | 8 | Android |
| 3 | 3 | retained | 49 | 40 | 67.589221 | 15 | 322 | 7 | 913.591123 | 587.196542 | 7 | 3 | iPhone |
| 4 | 4 | retained | 84 | 68 | 168.24702 | 1562 | 166 | 5 | 3950.202008 | 1219.555924 | 27 | 18 | Android |


**Note:** In the dataset, `device` is a categorical variable with the labels `iPhone` and `Android`.

To perform this analysis, we map each label to an integer.



In [5]:
# 1. Create `map_dictionary`
map_dictionary = {'Android': 2, 'iPhone': 1}

# 2. Create new `device_type` column
df['device_type'] = df['device']

# 3. Map the new column to the dictionary
df['device_type'] = df['device_type'].map(map_dictionary)

df['device_type'].head()

0    2
1    1
2    2
3    1
4    2
Name: device_type, dtype: int64
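
After a mapping like this, it is worth checking that every label was covered, because `Series.map` with a dictionary returns `NaN` for any value missing from the dictionary. Below is a minimal sketch on a small made-up frame. The `toy` data and the extra `'Windows'` label are illustrative assumptions, not part of the Waze dataset.

```python
import pandas as pd

# Hypothetical toy data; 'Windows' is an invented label to show the failure mode.
toy = pd.DataFrame({'device': ['Android', 'iPhone', 'Windows', 'iPhone']})
map_dictionary = {'Android': 2, 'iPhone': 1}

toy['device_type'] = toy['device'].map(map_dictionary)

# Labels missing from the dictionary become NaN, so count them explicitly.
n_unmapped = int(toy['device_type'].isna().sum())
print(n_unmapped)
```

Running the same `isna().sum()` check on `df['device_type']` would confirm that only the two expected labels occur in the real data.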

We are interested in the relationship between device type and the number of drives. One approach is to look at the average number of drives for each device type.

In [9]:
df.groupby('device_type')['drives'].mean()

device_type
1    67.859078
2    66.231838
Name: drives, dtype: float64

Based on the averages shown, drivers who use an iPhone to interact with the application appear to have a slightly higher number of drives on average. However, this difference could be due to random sampling variability rather than a true difference in the number of drives. To assess whether the difference is statistically significant, we will conduct a hypothesis test.
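
A gap in means is easier to judge next to each group's size and spread. The sketch below shows one way to get all three in a single call. It uses synthetic stand-in data (an assumption for illustration), so the numbers it prints say nothing about the Waze dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption for illustration, not the Waze dataset).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'device_type': np.repeat([1, 2], 500),
    'drives': rng.poisson(lam=67, size=1000),
})

# Count, mean, and standard deviation per group give context for the gap in means.
summary = toy.groupby('device_type')['drives'].agg(['count', 'mean', 'std'])
print(summary)
```

The same `.agg(['count', 'mean', 'std'])` call could be applied to `df.groupby('device_type')['drives']`.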


### Hypothesis testing

The goal is to conduct a two-sample t-test. Steps for conducting a hypothesis test:


1.   State the null hypothesis and the alternative hypothesis
2.   Choose a significance level
3.   Find the p-value
4.   Reject or fail to reject the null hypothesis

**Note:** This is a t-test for two independent samples, which is appropriate because the two groups (Android users and iPhone users) are independent.

$H_0$: Average number of drives among iPhone users = Average number of drives among Android users

$H_A$: Average number of drives among iPhone users $\neq$ Average number of drives among Android users

We choose 5% as the significance level and proceed with a two-sample t-test.

In [10]:
# 1. Isolate the `drives` column for iPhone users.
iphone_drives = df[df['device_type'] == 1]['drives']

# 2. Isolate the `drives` column for Android users.
android_drives = df[df['device_type'] == 2]['drives']

# 3. Perform the t-test
stats.ttest_ind(a=iphone_drives, b=android_drives, equal_var=False)

Ttest_indResult(statistic=1.4635232068852353, pvalue=0.1433519726802059)
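
Passing `equal_var=False` to `stats.ttest_ind` requests what is commonly called Welch's t-test, which does not assume the two groups share a variance. To make the statistic less of a black box, the sketch below recomputes it by hand and compares it with SciPy's result. The samples are synthetic (an illustrative assumption, not the Waze data), so only the agreement between the two computations is meaningful.

```python
import numpy as np
from scipy import stats

# Synthetic samples (illustrative assumption, not the Waze data).
rng = np.random.default_rng(42)
a = rng.normal(loc=68, scale=65, size=400)
b = rng.normal(loc=66, scale=60, size=300)

# Welch's t statistic: difference in means over the unpooled standard error.
var_a, var_b = a.var(ddof=1), b.var(ddof=1)
se = np.sqrt(var_a / a.size + var_b / b.size)
t_manual = (a.mean() - b.mean()) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
dof = (var_a / a.size + var_b / b.size) ** 2 / (
    (var_a / a.size) ** 2 / (a.size - 1) + (var_b / b.size) ** 2 / (b.size - 1)
)

# Two-sided p-value from the t distribution.
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

result = stats.ttest_ind(a=a, b=b, equal_var=False)
print(np.isclose(t_manual, result.statistic), np.isclose(p_manual, result.pvalue))
```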

Since the p-value (about 14%) is higher than the chosen significance level (5%), we fail to reject the null hypothesis. At the 5% significance level, there is not enough evidence to conclude that the average number of drives differs between iPhone and Android users.
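
The decision rule can also be written out explicitly. This is a minimal sketch using the p-value reported in the t-test output and the 5% significance level chosen earlier. The variable names are illustrative.

```python
# Value copied from the t-test output reported above.
p_value = 0.1433519726802059
alpha = 0.05  # significance level chosen earlier

# Reject H0 only when the p-value falls below the significance level.
reject_null = p_value < alpha
print('Reject H0' if reject_null else 'Fail to reject H0')
```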

- Because the test found no statistically significant difference in the number of drives between iPhone and Android users, the analysis so far provides no evidence to support a product development strategy focused on specific devices.

- However, other factors could influence the number of drives and may warrant further exploration.