# Data exploration and hypothesis testing

 
<br/>

**The purpose** of this project is to demonstrate knowledge of how to conduct a two-sample hypothesis test.

**The goal** is to apply descriptive statistics and hypothesis testing in Python.
<br/>

*This activity has three parts:*

**Part 1:** Imports and data loading
* What data packages will be necessary for hypothesis testing?

**Part 2:** Conduct hypothesis testing
* How did computing descriptive statistics help you analyze your data?

* How did you formulate your null hypothesis and alternative hypothesis?

**Part 3:** Communicate insights with stakeholders

* What key business insight(s) emerged from your hypothesis test?

* What business recommendations do you propose based on your results?

<br/>

**Research question:**

* "Do drivers who open the application using an iPhone have the same number of drives on average as drivers who use Android devices?"

# Task 1. Imports and data loading
* Import packages and libraries needed to compute descriptive statistics and conduct a hypothesis test.

```python
# Import packages and libraries
import pandas as pd
from scipy import stats
```

```python
# Load the dataset into a DataFrame (local path from the original analysis; adjust to your own copy)
df = pd.read_csv(r"C:\Users\akhil\OneDrive\Desktop\projects\waze\waze_dataset.csv", encoding="unicode_escape")
```

# Task 2. Data exploration
* Use descriptive statistics to conduct exploratory data analysis (EDA).



```python
# 1. Create `map_dictionary` (iPhone -> 1, Android -> 2)
map_dictionary = {'Android': 2, 'iPhone': 1}

# 2. Create new `device_type` column
df['device_type'] = df['device']

# 3. Map the new column to the dictionary
df['device_type'] = df['device_type'].map(map_dictionary)

df['device_type'].head()
```

```
0    2
1    1
2    2
3    1
4    2
Name: device_type, dtype: int64
```
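One detail worth checking after step 3: `Series.map` with a dictionary returns `NaN` for any value that is not a key, so an unexpected label (for example, a different capitalization) would silently drop out of the later group comparisons. A minimal sketch on a small DataFrame whose rows are hypothetical (not taken from the Waze data) shows the behavior and a simple check for unmapped labels.

```python
import pandas as pd

# Hypothetical toy data; the real analysis uses the Waze dataset loaded above
toy = pd.DataFrame({'device': ['Android', 'iPhone', 'iphone', 'Android'],
                    'drives': [10, 12, 7, 9]})

map_dictionary = {'Android': 2, 'iPhone': 1}
toy['device_type'] = toy['device'].map(map_dictionary)

# Labels missing from the dictionary become NaN (and the column becomes float)
unmapped = toy.loc[toy['device_type'].isna(), 'device'].unique()
print(toy)
print("Unmapped labels:", list(unmapped))
```

Rows with `NaN` in `device_type` are excluded by both `groupby` (which drops `NaN` keys by default) and boolean filters such as `df['device_type'] == 1`, so checking for unmapped labels before testing avoids losing data unnoticed.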

```python
# Compute the mean number of drives for each device type (1 = iPhone, 2 = Android)
df.groupby('device_type')['drives'].mean()
```

```
device_type
1    67.859078
2    66.231838
Name: drives, dtype: float64
```

* Based on the averages shown, drivers who use an iPhone (`device_type` 1) appear to have slightly more drives on average (about 67.86) than drivers who use an Android device (`device_type` 2, about 66.23). However, this difference might arise from random sampling rather than from a true difference in the number of drives. To assess whether the difference is statistically significant, conduct a hypothesis test.
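Means alone hide how spread out each group is, which matters when judging whether a gap of about 1.6 drives could plausibly be noise. A minimal sketch, using hypothetical toy data (`toy` and its values are made up for illustration), that reports the count, mean, and standard deviation per device type with `groupby().agg()`:

```python
import pandas as pd

# Hypothetical toy data standing in for the Waze `df`; values are illustrative only
toy = pd.DataFrame({'device_type': [1, 1, 1, 2, 2, 2],
                    'drives': [60, 70, 74, 58, 66, 71]})

# Per-group sample size, mean, and sample standard deviation (pandas uses ddof=1)
summary = toy.groupby('device_type')['drives'].agg(['count', 'mean', 'std'])
print(summary)
```

Running the same `agg` call on the real `df` gives the group sizes and spreads that the t-test below relies on.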

# Hypotheses:

* $H_0$: There is no difference in the average number of drives between drivers who use iPhone devices and drivers who use Androids.

* $H_A$: There is a difference in the average number of drives between drivers who use iPhone devices and drivers who use Androids.
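Stated with $\mu_{\text{iPhone}}$ and $\mu_{\text{Android}}$ as the population mean number of drives for each group (symbols introduced here for clarity), the test is two-sided:

$$
H_0:\ \mu_{\text{iPhone}} = \mu_{\text{Android}} \qquad H_A:\ \mu_{\text{iPhone}} \neq \mu_{\text{Android}}
$$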

* Next, choose 5% as the significance level and proceed with a two-sample t-test.
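Passing `equal_var=False` to `stats.ttest_ind` runs Welch's t-test, which does not assume the two groups share a variance. Its statistic is $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, with degrees of freedom from the Welch-Satterthwaite approximation. The sketch below computes both by hand and checks them against SciPy; `welch_t` is a hypothetical helper, and the two samples are simulated Poisson counts standing in for the real `drives` columns, not Waze data.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Squared standard error of each sample mean, using sample variances (ddof=1)
    se1 = a.var(ddof=1) / n1
    se2 = b.var(ddof=1) / n2
    t = (a.mean() - b.mean()) / np.sqrt(se1 + se2)
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, dof

# Simulated samples standing in for the iPhone and Android `drives` columns
rng = np.random.default_rng(0)
iphone_drives = rng.poisson(68, size=200)
android_drives = rng.poisson(66, size=300)

t_manual, dof_manual = welch_t(iphone_drives, android_drives)
# Two-sided p-value from the t distribution's survival function
p_manual = 2 * stats.t.sf(abs(t_manual), dof_manual)

result = stats.ttest_ind(iphone_drives, android_drives, equal_var=False)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The hand-computed values should agree with SciPy's, which makes it easier to read the `statistic`, `pvalue`, and `df` fields in the test output below.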

```python
# 1. Isolate the `drives` column for iPhone users.
iPhone = df.loc[df['device_type'] == 1, 'drives']

# 2. Isolate the `drives` column for Android users.
Android = df.loc[df['device_type'] == 2, 'drives']

# 3. Perform the t-test (equal_var=False: variances are not assumed equal)
stats.ttest_ind(a=iPhone, b=Android, equal_var=False)
```

```
TtestResult(statistic=np.float64(1.463523206885235), pvalue=np.float64(0.14335197268020597), df=np.float64(11345.066049381952))
```

* Since the p-value (about 0.143) is larger than the chosen significance level (0.05), you fail to reject the null hypothesis. You conclude that there is not a statistically significant difference in the average number of drives between drivers who use iPhones and drivers who use Androids.
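The decision rule in the bullet above can be written as a tiny helper. `decide` is a hypothetical name used only for illustration; the p-value plugged in is the one reported by the t-test output above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value against the significance level alpha."""
    # Reject H0 only when the p-value is at or below alpha
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# p-value reported by stats.ttest_ind for iPhone vs Android drives
p_value = 0.14335197268020597
print(decide(p_value))  # fail to reject H0
```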

# Insights:
* The key business insight is that drivers who use iPhone devices have, on average, a similar number of drives to those who use Androids; the observed difference is not statistically significant at the 5% level.
* One potential next step is to explore what other factors influence the variation in the number of drives, and to run additional hypothesis tests to learn more about user behavior. Further, temporary changes to marketing or to the user interface of the Waze app could generate more data for investigating churn.

* Thank You.