✰ Google Play vs Apple Store:

A company that designs operating systems, wants to build a major apps store into their user interface. To this end, they want to know whether Google Play apps have higher reviews on average than Apple Store apps (or vice versa) , as they’re intending to strike a deal with just one of these companies.

To help this company, we’ll work through a series of steps known as the Data Science Pipeline, or DSP. The DSP is a general and fairly broad sequence of steps data scientists use to tackle and solve business problems.

✰ Datasets source:

Apple Store: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps
Google Play: https://www.kaggle.com/lava18/google-play-store-apps

✰ Stages of the project

Sourcing and loading
- Load the two datasets
- Pick the columns that we are going to work with
- Subsetting the data on this basis
Cleaning, transforming and visualizing
- Check the data types and fix them
- Add a platform column to both the Apple and the Google dataframes
- Changing the column names to prepare for a join
- Join the two data sets
- Eliminate the NaN values
- Filter only those apps that have been reviewed at least once
- Summarize the data visually and analytically (by the column platform)
Modelling:
- Hypothesis formulation:
  - Null hypothesis H_null: the observed difference in the mean rating of Apple Store and Google Play apps is due to chance (and thus not due to the platform).
  - Alternate hypothesis H_alternative: the observed difference in the average ratings of apple and google users is not due to chance (and is actually due to platform)
  - We're also going to pick a significance level of 0.05.
- Getting the distribution of the data
- Permutation test:
  - Since the data aren't normally distributed, we're using a non-parametric test here. This is simply a label for statistical tests used when the data aren't normally distributed. These tests are extraordinarily powerful due to how few assumptions we need to make.
  - The difference in the means for Permutation1 (0.001103) now looks hugely different to our observed difference of 0.14206.
  - It's sure starting to look like our observed difference is significant, and that the Null is false.
  - platform does impact on ratings but to be sure, let's create 10,000 permutations, calculate the mean ratings for Google and Apple apps and the difference between these for each one, and then take the average of all of these differences.
  - Let's create a vector with the differences - that will be the distibution of the Null.
Evaluating and concluding
- What is our conclusion?
  - The p-value of our observed data is just the proportion of the data given the null that's at least as extreme as that observed data.
  - As a result, we're going to count how many of the differences in our difference list are at least as extreme as our observed difference.
  - If less than or equal to 5% of them are, then we will reject the Null.
  - Zero differences are at least as extreme as our observed difference!
  - p-value of the observed data is 0.
- What is our decision?
  - It doesn't matter which significance level we pick; our observed data is statistically significant, and we reject the Null.
  - We conclude that platform does impact on ratings. Specifically, we should advise our client to integrate only Google Play into their operating system interface.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
Notebooks		Notebooks
raw_data		raw_data
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Notebooks

Notebooks

raw_data

raw_data

README.md

README.md

Repository files navigation

✰ Google Play vs Apple Store:

✰ Datasets source:

✰ Stages of the project

About

Releases

Packages

Languages

akthammomani/Google-vs-Apple-Store-Hypothesis-Testing

Folders and files

Latest commit

History

Repository files navigation

✰ Google Play vs Apple Store:

✰ Datasets source:

✰ Stages of the project

About

Resources

Stars

Watchers

Forks

Languages