# Hypothesis Test
Research question:

"Do drivers who open the application using an iPhone have the same number of drives on average as drivers who use Android devices?"

In [2]:
# Import any relevant packages or libraries
import pandas as pd
import numpy as np
from scipy import stats

In [4]:
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

In [6]:
# 1. Create `map_dictionary`
map_dictionary = {'Android': 2, 'iPhone': 1}

# 2. Create new `device_type` column
df['device_type'] = df['device']

# 3. Map the new column to the dictionary
df['device_type'] = df['device_type'].map(map_dictionary)

df['device_type'].head()

0    2
1    1
2    2
3    1
4    2
Name: device_type, dtype: int64

In [8]:
# Mean number of drives per device type (1 = iPhone, 2 = Android)
df.groupby('device_type')['drives'].mean()

device_type
1    67.859078
2    66.231838
Name: drives, dtype: float64

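Based on the averages shown, drivers who use an iPhone (`device_type` 1) appear to have slightly more drives on average than drivers who use an Android device (about 67.86 versus 66.23). However, this difference might arise from random sampling rather than reflect a true difference in the number of drives. To assess whether the difference is statistically significant, you can conduct a two-sample t-test at a 5% significance level with the following hypotheses:

Null hypothesis (H0): There is no difference in the average number of drives between drivers who use iPhones and drivers who use Android devices.

Alternative hypothesis (HA): There is a difference in the average number of drives between drivers who use iPhones and drivers who use Android devices.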
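The role of random sampling can be illustrated with a small simulation. The sketch below is not part of the original analysis: it assumes a hypothetical population (normally distributed with mean 67 and standard deviation 65, values chosen only for illustration) and repeatedly draws two groups from that same population, so any gap between the group means is due to sampling alone.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_mean_difference(n_per_group=500):
    """Draw two groups from the SAME hypothetical population and
    return the difference between their sample means."""
    # Assumption: mean 67 and standard deviation 65 are illustrative values,
    # not estimates taken from the Waze dataset.
    group_a = [rng.gauss(67, 65) for _ in range(n_per_group)]
    group_b = [rng.gauss(67, 65) for _ in range(n_per_group)]
    return statistics.fmean(group_a) - statistics.fmean(group_b)

# Repeat the draw many times; the true difference is zero every time.
differences = [sample_mean_difference() for _ in range(200)]
largest_gap = max(abs(d) for d in differences)
print(f"largest gap between group means across 200 draws: {largest_gap:.2f}")
```

Even though both groups share one population, the sample means rarely match exactly. A hypothesis test asks whether the observed gap is larger than this kind of sampling noise would plausibly produce.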

In [11]:
# Hypothesis Test
# 1. Isolate the `drives` column for iPhone users.
iPhone = df[df['device_type'] == 1]['drives']

# 2. Isolate the `drives` column for Android users.
Android = df[df['device_type'] == 2]['drives']

# 3. Perform the two-sample t-test
#    (equal_var=False runs Welch's t-test, which does not assume equal variances)
stats.ttest_ind(a=iPhone, b=Android, equal_var=False)

TtestResult(statistic=1.463523206885235, pvalue=0.143351972680206, df=11345.066049381952)

Since the p-value (about 0.143) is larger than the chosen significance level (5%), you fail to reject the null hypothesis. There is not a statistically significant difference in the average number of drives between drivers who use iPhones and drivers who use Android devices.
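
Passing `equal_var=False` to `stats.ttest_ind` requests Welch's t-test, which does not assume that the two groups have equal variances. As a rough sketch of what goes into that result, the code below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The helper `welch_t` and the toy samples are hypothetical and are not taken from the notebook or the Waze data. Converting the statistic to a p-value is left to SciPy.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Sample variance (n - 1 in the denominator) divided by sample size
    # gives each group's squared standard error.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof

# Hypothetical toy samples, used only to exercise the function
toy_iphone = [10, 12, 14, 16, 18]
toy_android = [8, 10, 12, 14, 16, 18]

t_stat, dof = welch_t(toy_iphone, toy_android)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

The non-integer degrees of freedom here are the same kind of quantity as the `df=11345.066...` value in the `TtestResult` output above: Welch's approximation generally does not produce a whole number.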