# **Waze Project**
**The Power of Statistics**

Your team is nearing the midpoint of its user churn project. So far, you’ve completed a project proposal and used Python to explore and analyze Waze’s user data. You’ve also used Python to create data visualizations. The next step is to use statistical methods to analyze and interpret your data.

You receive a new email from Sylvester Esperanza, your project manager. Sylvester tells your team about a new request from leadership: to analyze the relationship between the mean number of rides and device type. You also discover follow-up emails from three other team members: May Santner, Chidi Ga, and Harriet Hadzic. These emails discuss the details of the request, a statistical analysis of ride data based on device type. In particular, leadership wants to know whether there is a statistically significant difference in the mean number of rides between iPhone® users and Android™ users. A final email from Chidi includes your specific assignment: to conduct a two-sample hypothesis test (t-test) to analyze the difference in the mean number of rides between iPhone users and Android users.


# **Data exploration and hypothesis testing**

<img src="images/Pace.png" width="100" height="100" align=left>

# **PACE stages**





## **PACE: Plan**



### **Imports and data loading**




In [1]:
# Importing any relevant packages or libraries
import pandas as pd
from scipy import stats

In [2]:
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')



## **PACE: Analyze and Construct**



### **Data exploration**


In the dataset, `device` is a categorical variable with the labels `iPhone` and `Android`.

To perform this analysis, each label must be turned into an integer. The following code assigns `1` to `iPhone` users and `2` to `Android` users.

In [3]:
# 1. Creating `map_dictionary`
map_dictionary = {'Android': 2, 'iPhone': 1}

# 2. Creating new `device_type` column
df['device_type'] = df['device']

# 3. Mapping the new column to the dictionary
df['device_type'] = df['device_type'].map(map_dictionary)

df['device_type'].head()

0    2
1    1
2    2
3    1
4    2
Name: device_type, dtype: int64
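
One thing worth checking after a dictionary mapping like this is whether any label was left out: `Series.map` returns `NaN` for values that are missing from the dictionary. The sketch below illustrates this on a small, hypothetical toy frame (the `Tablet` label and the `toy` DataFrame are made up for illustration and are not part of the Waze dataset).

```python
import pandas as pd

# Hypothetical toy data standing in for the `device` column (not the Waze data).
# 'Tablet' is an invented label that the dictionary does not cover.
toy = pd.DataFrame({'device': ['Android', 'iPhone', 'Android', 'iPhone', 'Tablet']})

map_dictionary = {'Android': 2, 'iPhone': 1}

# Labels missing from the dictionary become NaN after .map()
toy['device_type'] = toy['device'].map(map_dictionary)

# Count rows whose label was not covered by the dictionary
n_unmapped = int(toy['device_type'].isna().sum())
print(n_unmapped)
```

If the same check on `df['device_type']` returned zero, every row would carry one of the two expected codes.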

Next, calculate the average number of drives for each device type.

In [4]:
df.groupby('device_type')['drives'].mean()

device_type
1    67.859078
2    66.231838
Name: drives, dtype: float64

Based on the averages shown, it appears that drivers who use an iPhone device to interact with the application have a higher number of drives on average. However, this difference might arise from random sampling, rather than being a true difference in the number of drives. To assess whether the difference is statistically significant, you can conduct a hypothesis test.
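
Whether a gap between two means could plausibly come from sampling noise depends on the spread and size of each group, not only on the means. As a minimal sketch, assuming a toy frame with the same column names (`device_type`, `drives`) but invented values, the group means can be reported alongside the standard deviation and count:

```python
import pandas as pd

# Invented values for illustration only (not the Waze data)
toy = pd.DataFrame({
    'device_type': [1, 1, 1, 2, 2, 2],
    'drives': [60, 70, 80, 50, 65, 80],
})

# Mean, sample standard deviation, and group size per device type
summary = toy.groupby('device_type')['drives'].agg(['mean', 'std', 'count'])
print(summary)
```

A small difference in means next to a large standard deviation is a hint that a formal test is needed before drawing conclusions.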


### **Hypothesis testing**

This is a t-test for two independent samples. 

**Hypotheses:**

$H_0$: There is no difference in average number of drives between drivers who use iPhone devices and drivers who use Androids.

$H_A$: There is a difference in average number of drives between drivers who use iPhone devices and drivers who use Androids.

The significance level for this test is 5%.

In [6]:
# 1. Isolating the `drives` column for iPhone users.
iPhone = df[df['device_type'] == 1]['drives']

# 2. Isolating the `drives` column for Android users.
Android = df[df['device_type'] == 2]['drives']

# 3. Performing the t-test
stats.ttest_ind(a=iPhone, b=Android, equal_var=False)

Ttest_indResult(statistic=1.4635232068852353, pvalue=0.1433519726802059)

*Since the p-value is larger than the chosen significance level (5%), I fail to reject the null hypothesis. Therefore, there is **not** a statistically significant difference in the average number of drives between drivers who use iPhones and drivers who use Androids.*
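
Passing `equal_var=False` to `stats.ttest_ind` requests Welch's t-test, which does not assume equal variances in the two groups. To make the calculation behind the result more concrete, the sketch below recomputes the statistic and two-sided p-value by hand and compares them with SciPy's output. It uses synthetic, randomly generated samples (the names `group_a` and `group_b`, their sizes, and their distribution parameters are assumptions for illustration, not values from the Waze dataset).

```python
import numpy as np
from scipy import stats

# Synthetic samples (NOT the Waze data); seed fixed for reproducibility
rng = np.random.default_rng(0)
group_a = rng.normal(loc=68, scale=65, size=500)
group_b = rng.normal(loc=66, scale=64, size=300)

# Sample means, unbiased variances (ddof=1), and sample sizes
m_a, m_b = group_a.mean(), group_b.mean()
v_a, v_b = group_a.var(ddof=1), group_b.var(ddof=1)
n_a, n_b = group_a.size, group_b.size

# Welch's t-statistic: difference in means over its standard error
se = np.sqrt(v_a / n_a + v_b / n_b)
t_manual = (m_a - m_b) / se

# Welch-Satterthwaite approximation for the degrees of freedom
dof = (v_a / n_a + v_b / n_b) ** 2 / (
    (v_a / n_a) ** 2 / (n_a - 1) + (v_b / n_b) ** 2 / (n_b - 1)
)

# Two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

# SciPy's Welch test on the same samples, for comparison
result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)

# Decision rule at the 5% significance level
alpha = 0.05
reject_null = p_manual < alpha

print(t_manual, result.statistic)
print(p_manual, result.pvalue)
print(reject_null)
```

The same decision rule applies to the notebook's result: a p-value at or above `alpha` means failing to reject the null hypothesis.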



## **PACE: Execute**


### **Communicate insights with stakeholders**

The main business takeaway is that there is a comparable number of drives, on average, between users of iPhone and Android devices.

A possible follow-up action would involve delving into additional factors affecting variations in the number of drives and conducting further hypothesis tests to gain insights into user behavior. Furthermore, any short-term alterations in marketing or the user interface of the Waze app could offer an opportunity to gather more data for churn investigation.
