# Mobile App Data Analysis Project
This project explores mobile app trends using datasets from Google Play and the Apple App Store. By analyzing genres, categories, and user ratings, it aims to identify the kinds of apps that attract the most users.

## Introduction

The objective of this project is to analyze datasets of apps available on the Google Play Store and the Apple App Store. Because the apps are free to download and earn revenue through in-app ads, the focus is on identifying the characteristics that attract more users. The goal is to help developers determine which types of apps are most likely to drive user engagement and, consequently, ad revenue.

## Key Points

1. **Platform Diversity:** Analyze data from both Google Play and the App Store to gain a comprehensive understanding of the mobile app landscape.

2. **Revenue Model:** Explore the relationship between app success and in-app advertisements, focusing on factors that drive user engagement.

3. **User Preferences:** Investigate app genres, categories, and user ratings to identify patterns that can guide the development team in creating apps tailored to user preferences.

4. **Decision-Making Support:** Provide actionable insights derived from data analysis to aid developers in making strategic decisions that maximize user reach and enhance ad revenue.

5. **Competitive Edge:** Utilize data-driven decision-making as a powerful tool to stay competitive in the dynamic and evolving field of mobile app development.

## Project Relevance

In the dynamic landscape of mobile app development, understanding user behavior and preferences is crucial for success. By leveraging data analysis techniques, this project aims to equip our development team with valuable insights. These insights will empower them to make informed decisions, ultimately leading to the creation of apps that have the potential to attract a larger user base and maximize ad revenue. This project serves as a comprehensive exploration of the intersection between data analytics and strategic decision-making in the context of mobile app development.

### Data Extraction

This project uses two datasets, one from the Google Play Store and one from the Apple App Store.

They are stored in the CSV files `googleplaystore.csv` and `AppleStore.csv`.

A script was written to load these datasets:

1. **Setting up the file paths**: The script starts by storing the location of each dataset (its file path) in a variable.

2. **Preparing for the data**: The script then initializes variables to hold each dataset and its header. The header is the first row of a CSV file and contains the column names.

3. **Opening the datasets**: The `open_csv` function reads each file and returns its data rows and its header, which are stored in the variables above.

4. **Displaying the data**: Finally, the `print_headers_and_datasets` function prints each header, the first three rows, and the number of rows and columns, giving a structured first look at the data.

This process loads the Google Play Store and Apple App Store datasets and displays a preview of each.
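To make the header/data split concrete, here is a small self-contained sketch of what `open_csv` does, run on an in-memory sample instead of the real files. The sample rows are made up for illustration (they only mimic a few Google Play columns), and `split_header` is a hypothetical helper, not part of the project's code.

```python
import csv
import io

# Hypothetical sample mimicking a few columns of googleplaystore.csv.
# Quoted fields may contain commas, which csv.reader handles correctly.
SAMPLE = (
    "App,Category,Rating,Installs\n"
    "Sample Notes,PRODUCTIVITY,4.2,\"10,000+\"\n"
    "Sample Chess,GAME,4.6,\"1,000,000+\"\n"
)


def split_header(csv_text):
    """Parse CSV text and return (data_rows, header), as open_csv does."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[1:], rows[0]


data, header = split_header(SAMPLE)
print(header)   # ['App', 'Category', 'Rating', 'Installs']
print(data[0])  # ['Sample Notes', 'PRODUCTIVITY', '4.2', '10,000+']
print(len(data), "rows,", len(header), "columns")  # 2 rows, 4 columns
```

Quoted fields such as `"10,000+"` keep their comma, which is why the files are parsed with the `csv` module rather than split naively on commas.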

In [22]:
# csv is a built-in module in python that allows us to read and write to csv 
# files
import csv

# File paths for the datasets
google_play_store_file = "googleplaystore.csv";
apple_store_file = "AppleStore.csv";

# initialize the datasets and headers
google_play_store_data = None;
google_play_store_header = None;
apple_store_data = None;
apple_store_header = None;

In [23]:
# print_separator() function: prints a separator
# return: nothing, just prints a separator
def print_separator():
    print("\n");
    print("----------------------------------------\n");
    print("\n");

In [24]:
# explore_data() function: prints a slice of a dataset and, optionally, its
# number of rows and columns
# dataset: list of lists
# start and end: integers that delimit the slice
# rows_and_columns: if True, also print the number of rows and columns
# return: nothing, just prints
def explore_data(dataset, start, end, rows_and_columns=False):
    for row in dataset[start:end]:
        print(row)
        print("\n")

    if rows_and_columns:
        print("Number of rows: ", len(dataset))
        print("Number of columns: ", len(dataset[0]))

# open_csv() function: opens a CSV file and splits it into data rows and header
# file_name: string that contains the name of the file, including the extension
# return: a (data, header) tuple, or (None, None) if the file can't be read, so
# that callers unpacking two values don't crash
def open_csv(file_name):
    if file_name is None:
        print("Error: no file name provided\n")
        return None, None

    try:
        # newline="" is what the csv module expects; "with" closes the file
        with open(file_name, encoding="utf-8", newline="") as csv_file:
            data = list(csv.reader(csv_file))
    except FileNotFoundError:
        print(f"Error: {file_name} not found")
        return None, None
    except Exception as e:
        print(f"Error: {e}")
        return None, None

    if not data:
        print(f"Error: {file_name} is empty")
        return None, None

    data_header = data[0]  # the first row holds the column names
    return data[1:], data_header

# print_headers_and_datasets() function: prints a dataset's header, its first
# three rows, and its number of rows and columns
# header: list of strings
# dataset: list of lists
# dataset_name: string used as a title (optional)
# return: nothing, just prints
def print_headers_and_datasets(header, dataset, dataset_name=None):
    if dataset_name is not None:
        print(dataset_name + ": \n")

    if header:
        print(header)
        print("\n")

    if dataset:
        explore_data(dataset, 0, 3, True)
        print_separator()

In [25]:
# Open the datasets
google_play_store_data, google_play_store_header = open_csv(google_play_store_file);
apple_store_data, apple_store_header = open_csv(apple_store_file);

# print the headers and datasets
print_headers_and_datasets(google_play_store_header, google_play_store_data, "Google Play Store");
print_headers_and_datasets(apple_store_header, apple_store_data, "Apple Store");

Google Play Store: 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  10841
Number of columns:  13


----------------------------------------



Apple Store: 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prim

# App Store Datasets Overview

This output provides a glimpse into the initial rows of data from both the Google Play Store and Apple App Store datasets.

## Google Play Store Dataset

The Google Play Store dataset encompasses information regarding various apps, including:

- **Name**
- **Category**
- **Rating**
- **Number of Reviews**
- **Size**
- **Number of Installs**
- **Type** (Free or Paid)
- **Price**
- **Content Rating**
- **Genre**
- **Last Updated Date**
- **Current Version**
- **Android Version Required**

In total, there are 10,841 rows of data and 13 columns.

## Apple App Store Dataset

The Apple App Store dataset contains information about apps, including:

- **App ID**
- **Name**
- **Size (in bytes)**
- **Currency**
- **Price**
- **Total Number of Ratings**
- **Number of Ratings for the Current Version**
- **User Rating**
- **User Rating for the Current Version**
- **Version**
- **Content Rating**
- **Primary Genre**
- **Number of Supporting Devices**
- **Number of Screenshots Displayed**
- **Number of Supported Languages**
- **VPP License Availability**

There are a total of 7,197 rows of data and 16 columns in the Apple App Store dataset.

## Analysis Opportunities

Together, these datasets make it possible to analyze app popularity (installs and rating counts), user satisfaction (ratings), and the categories in which successful apps cluster.


### Data Cleaning Process

After extraction, the data must be cleaned to ensure accuracy and relevance. The following steps were taken during the cleaning process:

1. **Removed Inaccurate Data:** Rows whose number of fields does not match the number of columns in the header were removed, since their values may be shifted into the wrong columns.

2. **Eliminated Duplicate App Entries:** When an app name appears more than once, only the entry with the highest number of reviews (`Reviews` for Google Play, `rating_count_tot` for the App Store) is kept, to avoid skewing the analysis with redundant information.

3. **Excluded Non-English Apps:** To focus the analysis on English-speaking markets, any row containing a character outside the ASCII range (code point above 127) in any field was treated as non-English and removed. This heuristic is strict: it also removes English apps whose names contain an emoji or a symbol such as `–` or `™`.

4. **Isolated Free Apps:** Since the goal is to find profitable profiles for free, ad-supported apps, only apps with a price of zero were kept.
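The non-English rule above rejects a row at the first non-ASCII character. A more lenient alternative (an assumption here, not what this notebook does) checks only the app name and tolerates a few non-ASCII characters, so names with an emoji or a `™` sign survive. A minimal sketch; `is_probably_english` is a hypothetical name and the threshold of 3 is arbitrary:

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters above code point 127.

    Hypothetical lenient alternative to the notebook's is_english(), which rejects
    a row as soon as any field contains a single non-ASCII character.
    """
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


# Dropped by the strict rule because of the en dash, kept here (1 non-ASCII char)
print(is_probably_english("U Launcher Lite – FREE Live Cool Themes, Hide Apps"))  # True
print(is_probably_english("Sample Notes™"))  # True
print(is_probably_english("示例应用程序"))  # False (6 non-ASCII characters)
```

Switching to a rule like this would change the row counts reported after cleaning, so the cleaning would need to be rerun.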

In [26]:
# is_free_app() function: checks if the app is free
# row: list of strings
# header: list of strings
# index: integer position of the row in the dataset (only used when printing)
# printing_non_free_apps: if True, print the apps that are not free
# return: True if the price is "0", "0.0" or "free", False otherwise
def is_free_app(row, header, index, printing_non_free_apps=False):
    # find the price column regardless of case ("Price" or "price")
    index_price = [column.lower() for column in header].index("price")
    price = row[index_price].lower()
    if price in ("0", "0.0", "free"):
        return True
    if printing_non_free_apps:
        print(f"Not free app at index {index}: {row}\n")
    return False

# match_number_of_columns() function: checks if the row has the correct number
# of columns by comparing the length of the row with the length of the header
# row: list of strings
# header_length: integer that represents the length of the header
# index: integer position of the row in the dataset (only used when printing)
# printing_wrong_number_of_columns: if True, print the rejected rows
# return: True if the row has the correct number of columns, False otherwise
def match_number_of_columns(row, header_length, index, printing_wrong_number_of_columns=False):
    if len(row) != header_length:
        if printing_wrong_number_of_columns:
            print(f"Wrong number of columns at index {index}: {row}\n")
        return False
    return True

# is_english() function: checks if every field of the row contains only ASCII
# characters (code points 0-127), used here as a proxy for "English"
# row: list of strings
# printing_non_english: if True, print the first non-English field found
# return: True if the row is in English, False otherwise
def is_english(row, printing_non_english=False):
    for column_index, field in enumerate(row):
        for character in field:
            if ord(character) > 127:
                if printing_non_english:
                    print(f"Not English in column {column_index}: {field}\n")
                return False
    return True

# has_duplicates() function: checks if an app with the same name is already in
# cleaned_dataset; if so, keeps whichever of the two has more reviews
# cleaned_dataset: list of lists (modified in place)
# row: list of strings
# header: list of strings
# index: integer position of the row in the dataset (only used when printing)
# printing_duplicates: if True, print the duplicates
# return: True if the row is a duplicate, False otherwise
def has_duplicates(cleaned_dataset, row, header, index, printing_duplicates=False):
    # the app name is in "App" (Google Play) or "track_name" (App Store)
    if "App" in header:
        index_app_name = header.index("App")
    elif "track_name" in header:
        index_app_name = header.index("track_name")
    else:
        return False  # no name column: duplicates can't be detected

    # the review count is in "Reviews" (Google Play) or "rating_count_tot" (App Store)
    if "Reviews" in header:
        index_reviews = header.index("Reviews")
    elif "rating_count_tot" in header:
        index_reviews = header.index("rating_count_tot")
    else:
        index_reviews = None

    for position, kept_row in enumerate(cleaned_dataset):
        if row[index_app_name] == kept_row[index_app_name]:
            if printing_duplicates:
                print(f"Duplicate at index {index}: {row}\n")
            # if the new row has more reviews, it replaces the kept one; note
            # that it is appended without going through the other filters
            if index_reviews is not None:
                try:
                    if int(row[index_reviews]) > int(kept_row[index_reviews]):
                        del cleaned_dataset[position]
                        cleaned_dataset.append(row)
                except (ValueError, IndexError):
                    pass  # non-numeric review count: keep the existing entry
            return True
    return False

# check_row_and_delete() function: keeps only the rows that are not duplicates,
# are in English, have the correct number of columns and are free apps
# dataset: list of lists
# header: list of strings
# name_of_dataset: string that represents the name of the dataset (optional)
# return: the cleaned dataset as a list of lists
def check_row_and_delete(dataset, header, name_of_dataset=None):
    if dataset is None:
        print("Error: no dataset provided\n")
        return None

    if header is None:
        print("Error: no header provided\n")
        return None

    header_length = len(header)
    cleaned_dataset = []
    duplicates = 0

    # enumerate gives each row's position directly; dataset.index(row) would
    # rescan the list and return the first match for repeated rows
    for index, row in enumerate(dataset):
        if has_duplicates(cleaned_dataset, row, header, index, False):
            duplicates += 1
            continue
        if not is_english(row, False):
            continue
        if not match_number_of_columns(row, header_length, index, False):
            continue
        if not is_free_app(row, header, index, False):
            continue
        cleaned_dataset.append(row)

    # print the number of rows before and after cleaning
    print(f"Number of rows before of {name_of_dataset}: {len(dataset)}")
    print(f"Number of rows after of {name_of_dataset}: {len(cleaned_dataset)}")
    print(f"Number of duplicates of {name_of_dataset}: {duplicates}")
    return cleaned_dataset
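`has_duplicates` scans the whole cleaned list for every row, so deduplication takes quadratic time. As a sketch of an alternative (not the code that produced the counts below), a dictionary keyed on the app name finds the entry with the most reviews in a single pass. The function name `deduplicate_by_max_reviews` and the sample rows are hypothetical, and the sketch assumes every review count is an integer string, so malformed rows should be filtered out first.

```python
def deduplicate_by_max_reviews(rows, index_name, index_reviews):
    """Keep one row per app name: the one with the highest review count.

    rows: list of lists (header excluded); review counts are integer strings.
    Returns the kept rows in the order their names first appear.
    """
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[index_name]
        reviews = int(row[index_reviews])
        if name not in best or reviews > int(best[name][index_reviews]):
            best[name] = row  # updating an existing key keeps its position
    return list(best.values())


# Hypothetical rows: [name, category, reviews]
rows = [
    ["Sample Notes", "PRODUCTIVITY", "967"],
    ["Sample Chess", "GAME", "120"],
    ["Sample Notes", "PRODUCTIVITY", "974"],
]
cleaned = deduplicate_by_max_reviews(rows, index_name=0, index_reviews=2)
print(cleaned)
# [['Sample Notes', 'PRODUCTIVITY', '974'], ['Sample Chess', 'GAME', '120']]
```

Unlike `has_duplicates`, which moves a replacing row to the end of the list, this keeps each app at the position of its first occurrence.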

In [28]:
# clean the datasets
google_play_store_data_cleaned = check_row_and_delete(google_play_store_data, google_play_store_header, "Google Play Store");
apple_store_data_cleaned = check_row_and_delete(apple_store_data, apple_store_header, "Apple Store");

# print the headers and datasets
print_headers_and_datasets(google_play_store_header, google_play_store_data_cleaned, "Google Play Store");
print_headers_and_datasets(apple_store_header, apple_store_data_cleaned, "Apple Store");

Number of rows before of Google Play Store: 10841
Number of rows after of Google Play Store: 8406
Number of duplicates of Google Play Store: 1081
Number of rows before of Apple Store: 7197
Number of rows after of Apple Store: 2920
Number of duplicates of Apple Store: 2
Google Play Store: 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number 

# Cleaned Datasets Overview

This output presents the outcomes of the data cleaning procedures applied to the Google Play Store and Apple App Store datasets.

## Data Cleaning Process

Cleaning removed duplicate entries, rows with the wrong number of columns, non-English apps, and paid apps. The number of rows before and after cleaning is shown for each dataset.

### Google Play Store Dataset

- Initial Rows: 10,841
- Cleaned Rows: 8,406
- Entries Removed: 2,435
  - Duplicate Entries: 1,081

### Apple App Store Dataset

- Initial Rows: 7,197
- Cleaned Rows: 2,920
- Entries Removed: 4,277
  - Duplicate Entries: 2
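The removal counts can be broken down with simple arithmetic: every row removed beyond the duplicates was dropped by one of the other three filters (non-English, wrong number of columns, or paid). The figures below are the ones reported by `check_row_and_delete()` above.

```python
# Row counts reported by check_row_and_delete() above
counts = {
    "Google Play Store": {"before": 10841, "after": 8406, "duplicates": 1081},
    "Apple Store": {"before": 7197, "after": 2920, "duplicates": 2},
}

for name, c in counts.items():
    removed = c["before"] - c["after"]
    other = removed - c["duplicates"]  # non-English, malformed or paid rows
    print(f"{name}: removed {removed}, of which {other} by the other filters")
# Google Play Store: removed 2435, of which 1354 by the other filters
# Apple Store: removed 4277, of which 4275 by the other filters
```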

## Cleaned Datasets

After cleaning, the datasets have the following dimensions:

### Cleaned Google Play Store Dataset

- Rows: 8,406
- Columns: 13

### Cleaned Apple App Store Dataset

- Rows: 2,920
- Columns: 16

## Analysis Ready

With cleaning complete, the datasets (8,406 rows for the Google Play Store and 2,920 rows for the Apple App Store) are ready for analysis.


## Validation Strategy

In this scenario, the strategy for the app ideas consists of a three-step process to minimize risks and overhead:

1. **Build a Minimal Android Version:** Develop a basic Android version of the app and release it on Google Play.

2. **User Response Evaluation:** Assess how users respond to the app. If the feedback is positive, proceed to the next step.

3. **Further Development and iOS Version:** If the app proves to be profitable after six months, further development is undertaken, and an iOS version is built for inclusion in the App Store.

In [31]:
# Data analysis

# Look up the indexes of the columns used in the analysis: "Genres", "Category"
# and "Installs" (Google Play Store), and "prime_genre" and "rating_count_tot"
# (App Store).
# The App Store dataset has no install counts, so we assume that the apps with
# the most ratings are also the apps with the most installs.
index_genres = google_play_store_header.index("Genres")
index_prime_genre = apple_store_header.index("prime_genre")
index_category = google_play_store_header.index("Category")
index_installs = google_play_store_header.index("Installs")
index_rating_count_tot = apple_store_header.index("rating_count_tot")

In [32]:
# freq_table() function: builds a frequency table for any column of a dataset
# dataset: list of lists
# index: integer index of the column to count
# return: a dictionary mapping each value to its percentage of the rows
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    # convert the counts to percentages
    for value in table:
        table[value] = (table[value] / total) * 100
    return table


# display_table() function: prints a frequency table sorted by percentage, in
# descending order
# dataset: list of lists
# index: integer index of the column to count
# return: nothing, just prints the frequency table
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for value in table:
        table_display.append((table[value], value))
    for percentage, value in sorted(table_display, reverse=True):
        print(value, ":", percentage)


# avg_nbr_of_user_ratings_per_genre() function: calculates the average number of
# user ratings ("rating_count_tot") per genre
# dataset: list of lists
# header: list of strings (must contain "rating_count_tot")
# index: integer index of the genre column
# return: a dictionary mapping each genre to its average number of user ratings
def avg_nbr_of_user_ratings_per_genre(dataset, header, index):
    index_ratings = header.index("rating_count_tot")
    averages = {}
    for genre in freq_table(dataset, index):
        total = 0
        len_genre = 0
        for row in dataset:
            if row[index] == genre:
                total += float(row[index_ratings])
                len_genre += 1
        averages[genre] = total / len_genre
    return averages


# parse_number() function: converts strings such as "10,000+" to a float
# value: string
# return: float
def parse_number(value):
    return float(value.replace(",", "").replace("+", ""))


# avg_installs_per_category() function: calculates the average number of
# installs per category. Google Play install counts are lower bounds
# ("10,000+" means at least 10,000), so the averages are approximate.
# dataset: list of lists
# index_category: integer index of the category column
# index_installs: integer index of the installs column
# return: a dictionary mapping each category to its average number of installs
def avg_installs_per_category(dataset, index_category, index_installs):
    category_avg_nbr_installs = {}
    for category in freq_table(dataset, index_category):
        total = 0  # sum of installs in this category
        len_category = 0  # number of apps in this category
        for app in dataset:
            if app[index_category] == category:
                total += parse_number(app[index_installs])
                len_category += 1
        category_avg_nbr_installs[category] = total / len_category
    return category_avg_nbr_installs


# split_by_column_value() function: groups the rows of a dataset by the value
# of one column
# data: list of lists
# column_index: integer index of the column to group by
# return: a dictionary mapping each column value to the list of its rows
def split_by_column_value(data, column_index):
    splitted_data = {}
    for row in data:
        column_value = row[column_index]
        if column_value in splitted_data:
            splitted_data[column_value].append(row)
        else:
            splitted_data[column_value] = [row]
    return splitted_data


# calculate_total_per_category() function: sums a numeric column for each group
# data_split_by_category: dictionary returned by split_by_column_value()
# index: integer index of the numeric column (e.g. installs)
# return: a dictionary mapping each category to its total
def calculate_total_per_category(data_split_by_category, index):
    total_per_category = {}
    for category in data_split_by_category:
        total = 0
        for item in data_split_by_category[category]:
            total += parse_number(item[index])
        total_per_category[category] = total
    return total_per_category


# calculate_percentage_of_value_per_category() function: computes each app's
# share (in %) of its category's total for a numeric column
# data_split_by_category: dictionary returned by split_by_column_value()
# index_value: integer index of the numeric column
# index_name: integer index of the app name (0 for "App" in Google Play; use 1
# for "track_name" in the App Store, where column 0 is "id")
# return: a dictionary mapping each category to [name, value, percentage] lists
def calculate_percentage_of_value_per_category(data_split_by_category, index_value, index_name=0):
    total_per_category = calculate_total_per_category(data_split_by_category, index_value)
    percentage_of_value_per_category = {}
    for category in data_split_by_category:
        total = total_per_category[category]
        percentage_of_value_per_category[category] = []
        for item in data_split_by_category[category]:
            # a category whose total is 0 gets 0% instead of a ZeroDivisionError
            percentage = round(parse_number(item[index_value]) / total * 100, 2) if total else 0.0
            percentage_of_value_per_category[category].append([item[index_name], item[index_value], percentage])
    return percentage_of_value_per_category


# add_percentage_column() function: returns a copy of the grouped data with each
# row's share (in %) of its category's total appended as a new last column
# data_split_by_category: dictionary returned by split_by_column_value()
# index_value: integer index of the numeric column
# return: a dictionary mapping each category to its rows plus the new column
def add_percentage_column(data_split_by_category, index_value):
    total_per_category = calculate_total_per_category(data_split_by_category, index_value)
    data_with_percentage = {}
    for category in data_split_by_category:
        total = total_per_category[category]
        data_with_percentage[category] = []
        for item in data_split_by_category[category]:
            percentage = round(parse_number(item[index_value]) / total * 100, 2) if total else 0.0
            data_with_percentage[category].append(item + [percentage])
    return data_with_percentage
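For comparison, the standard library's `collections.Counter` can produce the same percentage table as `freq_table` and the descending view of `display_table` in a few lines. This is a sketch with made-up rows; `freq_table_counter` is a hypothetical name.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows}, like freq_table()."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


# Hypothetical rows: [name, genre]
rows = [["A", "Games"], ["B", "Games"], ["C", "Tools"], ["D", "Games"]]
table = freq_table_counter(rows, 1)

# Sort by percentage, highest first, as display_table() does
for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(value, ":", percentage)
# Games : 75.0
# Tools : 25.0
```

One difference: `display_table` sorts `(percentage, value)` tuples in reverse, so tied values appear in reverse alphabetical order (as with Dating and Arcade in the output below), whereas sorting with `key=` keeps tied entries in the order they were first counted.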


In [33]:
# display the frequency tables of the genres and prime genres of the cleaned 
# datasets
print("Google Play Store - Genres: \n");
display_table(google_play_store_data_cleaned, index_genres);
print("\n");
display_table(google_play_store_data_cleaned, index_category);
print("\n");
print("Apple Store - Prime Genres: \n");
display_table(apple_store_data_cleaned, index_prime_genre);     


Google Play Store - Genres: 

Tools : 8.565310492505352
Entertainment : 6.09088746133714
Education : 5.389007851534618
Business : 4.710920770877944
Productivity : 3.97335236735665
Lifestyle : 3.866285986200333
Finance : 3.735427075898168
Medical : 3.628360694741851
Sports : 3.3309540804187487
Personalization : 3.3071615512729005
Communication : 3.2238876992624315
Health & Fitness : 3.1287175826790388
Action : 3.1168213181061146
Photography : 3.009754936949798
News & Magazines : 2.7956221746371637
Social : 2.664763264334999
Travel & Local : 2.3078753271472756
Shopping : 2.2483940042826553
Books & Reference : 2.1889126814180346
Simulation : 2.081846300261718
Dating : 1.8320247442303115
Arcade : 1.8320247442303115
Casual : 1.7725434213656914
Video Players & Editors : 1.736854627646919
Maps & Navigation : 1.3561741613133478
Food & Drink : 1.2015227218653344
Puzzle : 1.1301451344277897
Racing : 1.023078753271473
Role Playing : 0.939804901261004
Auto & Vehicles : 0.939804901261004
Strategy :

# Genre Analysis: App Store vs. Google Play

## App Store Dataset

In the App Store dataset, the most common genre is **Games** (59.14%), followed by **Entertainment** (7.53%). This suggests that a large share of the apps are designed for entertainment rather than practical purposes. However, these frequencies only measure how many apps exist in each genre, not how many people use them: a genre can contain many apps and still have few users, so genre prevalence alone does not reveal user preferences.

## Google Play Dataset

For the Google Play dataset, the most common genres are:

- **Genres Column:**
  - **Tools** (8.57%)
  - **Entertainment** (6.09%)

- **Category Column:**
  - **Family** (18.81%)
  - **Game** (9.61%)

This suggests a more balanced distribution of app genres on the Google Play market, encompassing both practical and entertainment apps.

## Market Comparison

Comparing the two markets:

- **App Store:** Dominated by **Games** and **Entertainment** apps.
- **Google Play:** Exhibits a more balanced landscape with a mix of practical and entertainment apps, represented by **Tools**, **Entertainment**, **Family**, and **Game**.

However, it's crucial to note that these frequency tables alone don't provide insights into user engagement, popularity, or revenue. Additional information such as user ratings, number of installs, and revenue data is essential for a comprehensive analysis.

## Recommendations and Considerations

Based on the current data, making app profile recommendations is challenging. The frequency tables reveal the most common genres but do not reflect user preferences or profitability. To make informed recommendations, further data analysis, including user engagement and revenue metrics, is necessary.


### Most Popular App Genres on the App Store

To determine the most popular genres on the App Store, we would ideally calculate the average number of installs per genre. However, the App Store dataset does not include install counts. As a workaround, we use the total number of user ratings, stored in the `rating_count_tot` column, as a proxy.

Below, we outline the process of calculating the average number of user ratings per app genre on the App Store:

1. **App Store Data Set:**

   - **Column Used:** `rating_count_tot`
   - **Calculation:** Average number of user ratings per app genre.

This approach provides a proxy for popularity based on user engagement, allowing us to identify the genres with the most user interactions on the App Store.

Keep in mind that this method assumes a correlation between user ratings and popularity, but it may not capture other factors influencing app success, such as user satisfaction, retention, or revenue.
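The averaging described above can be done in a single pass over the rows by accumulating a running sum and count per genre, instead of rescanning the dataset once per genre as `avg_nbr_of_user_ratings_per_genre` does. A sketch with hypothetical rows in the form `[name, prime_genre, rating_count_tot]`; the function name `avg_per_group` is also hypothetical.

```python
from collections import defaultdict


def avg_per_group(rows, index_group, index_value):
    """Return {group: mean of float(row[index_value])}, computed in one pass."""
    totals = defaultdict(float)  # group -> sum of values
    counts = defaultdict(int)    # group -> number of rows
    for row in rows:
        group = row[index_group]
        totals[group] += float(row[index_value])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [
    ["App A", "Navigation", "200000"],
    ["App B", "Navigation", "1000"],
    ["App C", "Games", "5000"],
]
averages = avg_per_group(rows, index_group=1, index_value=2)
print(sorted(averages.items(), key=lambda item: item[1], reverse=True))
# [('Navigation', 100500.0), ('Games', 5000.0)]
```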



In [34]:
print("\n");
print("Average number of user ratings per genre in AppStore: \n");
apple_store_data_avg_nbr_rating_per_genre = sorted(avg_nbr_of_user_ratings_per_genre(apple_store_data_cleaned, apple_store_header, index_prime_genre).items(), key=lambda x: x[1], reverse=True);
print(apple_store_data_avg_nbr_rating_per_genre);
print_separator();



Average number of user ratings per genre in AppStore: 

[('Navigation', 125037.25), ('Reference', 89562.6), ('Social Networking', 78567.30769230769), ('Music', 55396.01587301587), ('Weather', 48275.57692307692), ('Travel', 34115.57575757576), ('Food & Drink', 33333.92307692308), ('Photo & Video', 29249.766666666666), ('Shopping', 28877.575342465752), ('Finance', 26038.6875), ('Sports', 25791.666666666668), ('News', 23382.17948717949), ('Productivity', 22842.22), ('Games', 21585.620150550087), ('Health & Fitness', 19418.620689655174), ('Lifestyle', 17260.53488372093), ('Book', 16671.0), ('Entertainment', 15006.227272727272), ('Utilities', 11571.69696969697), ('Business', 6839.6), ('Education', 6103.464285714285), ('Catalogs', 5195.0), ('Medical', 612.0)]


----------------------------------------





In [37]:
# print the average number of installs per category in Google Play Store
print("\n")
print("Average number of installs per category in Google Play Store: \n")
google_play_store_data_avg_nbr_rating_per_genre = sorted(avg_installs_per_category(google_play_store_data_cleaned, index_category, index_installs).items(), key=lambda x: x[1], reverse=True)
print(google_play_store_data_avg_nbr_rating_per_genre)
print_separator()



Average number of installs per category in Google Play Store: 

[('COMMUNICATION', 36106662.328413285), ('VIDEO_PLAYERS', 25234606.216216218), ('SOCIAL', 24441088.17857143), ('PHOTOGRAPHY', 18099283.85375494), ('PRODUCTIVITY', 16972497.946107786), ('GAME', 15434835.816831684), ('TRAVEL_AND_LOCAL', 14487541.68041237), ('ENTERTAINMENT', 12346329.11392405), ('TOOLS', 11084333.292649098), ('NEWS_AND_MAGAZINES', 10006311.10638298), ('BOOKS_AND_REFERENCE', 8504745.97826087), ('SHOPPING', 7307823.2010582015), ('WEATHER', 5219216.7164179105), ('PERSONALIZATION', 5027006.791366907), ('MAPS_AND_NAVIGATION', 4304432.280701755), ('HEALTH_AND_FITNESS', 4263642.1749049425), ('SPORTS', 3647640.208029197), ('FAMILY', 3633707.342820999), ('FOOD_AND_DRINK', 1974937.1386138613), ('ART_AND_DESIGN', 1932519.642857143), ('EDUCATION', 1844897.9591836734), ('BUSINESS', 1602958.308080808), ('HOUSE_AND_HOME', 1391211.1911764706), ('LIFESTYLE', 1379485.3343558281), ('FINANCE', 1348224.9426751593), ('COMICS', 

# App Popularity Analysis: Google Play Store and Apple App Store

On the App Store, navigation apps have the highest average number of user ratings (about 125,037 per app). However, this figure may be skewed by a few apps with hundreds of thousands of ratings, while others in the genre do not reach 10,000.
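A quick way to check whether a genre's average is driven by a few outliers is to compare it with the median, which a handful of very large values cannot pull up. The rating counts below are made up for illustration, not taken from the dataset.

```python
from statistics import mean, median

# Hypothetical rating counts for five apps in one genre
rating_counts = [300000, 150000, 9000, 6000, 5000]

print(mean(rating_counts))    # 94000
print(median(rating_counts))  # 9000
```

Here the mean (94,000) is more than ten times the median (9,000), a sign that two very popular apps dominate the average.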

## Python Script Operations Overview

The following Python script performs several operations on the Google Play Store and Apple App Store datasets:

1. **Splitting by Category/Prime Genre:**
   - Datasets are divided into groups based on the app's category (Google Play Store) or prime genre (Apple App Store).
   - The first 3 apps of each category or prime genre (in dataset order, before any sorting) are printed.

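2. **Ordering by Number of Installs:**
   - Within each category or prime genre, apps are ordered by number of installs (Google Play Store) or by total number of user ratings (Apple App Store, whose data has no install counts, so ratings serve as a proxy).
   - The top 3 apps from each category or prime genre are printed.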

3. **Sum of Installs per Category/Prime Genre:**
   - The total number of installs (Google Play Store) or user ratings (Apple App Store) is calculated and printed for each category or prime genre.

4. **Adding a Column for Percentage of Installs per Category/Prime Genre:**
   - A new column is added to the datasets giving each app's share (in %) of the total installs, or of total ratings for the App Store, within its category or prime genre.
   - The top 3 apps from each category or prime genre are printed along with their percentage.
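The first two steps can be sketched as follows. This is a minimal, hypothetical version of what the notebook's `split_by_column_value` helper and the install-based sort appear to do; the names `group_by_column`, `parse_installs`, `INDEX_CATEGORY`, and `INDEX_INSTALLS` are illustrative, and the rows are trimmed to three columns (name, category, installs) copied from sample rows in this notebook's output.

```python
from collections import defaultdict

# Hypothetical column positions for the trimmed rows [name, category, installs];
# the notebook's own index_category / index_installs refer to the full rows.
INDEX_CATEGORY = 1
INDEX_INSTALLS = 2

def parse_installs(value):
    """Convert a Google Play install bin such as '10,000,000+' to its int lower bound."""
    return int(value.replace(",", "").replace("+", ""))

def group_by_column(rows, index):
    """Group rows into a dict keyed by the value at position `index` (insertion order kept)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return dict(groups)

rows = [
    ["Photo Editor & Candy Camera & Grid & ScrapBook", "ART_AND_DESIGN", "10,000+"],
    ["Sketch - Draw & Paint", "ART_AND_DESIGN", "50,000,000+"],
    ["Pixel Draw - Number Art Coloring Book", "ART_AND_DESIGN", "100,000+"],
    ["Monster Truck Stunt 3D 2019", "AUTO_AND_VEHICLES", "100,000+"],
    ["Real Tractor Farming", "AUTO_AND_VEHICLES", "1,000,000+"],
]

by_category = group_by_column(rows, INDEX_CATEGORY)

# Sort each group by installs, largest first.
ordered = {
    category: sorted(apps, key=lambda r: parse_installs(r[INDEX_INSTALLS]), reverse=True)
    for category, apps in by_category.items()
}

for category, apps in ordered.items():
    print(category, [app[0] for app in apps[:3]])
```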

These operations show which categories or prime genres are most popular, identify the apps with the most installs in each one, and illustrate how installs are distributed within each category or prime genre.
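The last two steps, totals per category and each app's share of its category, can be sketched as below. `total_per_category` and `with_share_column` are hypothetical stand-ins for the notebook's `calculate_total_per_category` and `add_percentage_column`. The install figures are the lower bounds of a few apps' install bins, so the resulting percentages differ from the notebook's, which divide by the full category totals.

```python
# Toy subset in the shape category -> list of [name, installs].
grouped = {
    "ART_AND_DESIGN": [
        ["Sketch - Draw & Paint", 50_000_000],
        ["ibis Paint X", 10_000_000],
        ["Pixel Draw - Number Art Coloring Book", 100_000],
    ],
    "AUTO_AND_VEHICLES": [
        ["Real Tractor Farming", 1_000_000],
        ["Monster Truck Stunt 3D 2019", 100_000],
    ],
}

def total_per_category(groups, index):
    """Sum the numeric column at `index` for every category."""
    return {cat: sum(row[index] for row in rows) for cat, rows in groups.items()}

def with_share_column(groups, index):
    """Return new rows with each app's share (%) of its category total appended."""
    totals = total_per_category(groups, index)
    return {
        cat: [row + [round(100 * row[index] / totals[cat], 2)] for row in rows]
        for cat, rows in groups.items()
    }

totals = total_per_category(grouped, 1)
with_share = with_share_column(grouped, 1)
print(totals)
print(with_share["ART_AND_DESIGN"][0])
```

Building new rows with `row + [...]` leaves the input lists unchanged, so the grouped data can be reused for other calculations.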

Note that these analyses depend on the available data and its assumptions, and their interpretation may be distorted by outliers in the number of user ratings or installs.


In [39]:
# split by category
print("\n")
print("Google Play Store - split by category: \n")
google_play_store_data_split_by_category = split_by_column_value(google_play_store_data_cleaned, index_category)
# only print the first 3 apps of each category (not sorted yet)
for i in google_play_store_data_split_by_category:
    print(i, ":\n")
    print(google_play_store_data_split_by_category[i][:3])
    print("\n")
print_separator()
print("\n")
print("Apple Store - split by prime genre: \n")
apple_store_data_split_by_prime_genre = split_by_column_value(apple_store_data_cleaned, index_prime_genre)
# only print the first 3 apps of each prime genre (not sorted yet)
for i in apple_store_data_split_by_prime_genre:
    print(i, ":\n")
    print(apple_store_data_split_by_prime_genre[i][:3])
    print("\n")




Google Play Store - split by category: 

ART_AND_DESIGN :

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']]


AUTO_AND_VEHICLES :

[['Monster Truck Stunt 3D 2019', 'AUTO_AND_VEHICLES', '4.2', '367', '25M', '100,000+', 'Free', '0', 'Everyone', 'Auto & Vehicles', 'May 10, 2018', '1.0', '4.0.3 and up'], ['Real Tractor Farming', 'AUTO_AND_VEHICLES', '4.0', '1598', '56M', '1,000,000+', 'Free', '0', 'Everyone', 'Auto & Vehicles', 'July 26, 2018', '11.0', '4.1 and up'], ['Ultimate F1 Racing Championship

In [41]:
# order each category by number of installs
print("\n")
print("Google Play Store - order each category by number of installs: \n")
google_play_store_data_ordered_by_installs = {}
for i in google_play_store_data_split_by_category:
    # installs are strings such as "10,000+": strip "+" and "," before converting
    google_play_store_data_ordered_by_installs[i] = sorted(google_play_store_data_split_by_category[i], key=lambda x: float(x[index_installs].replace("+", "").replace(",", "")), reverse=True)
    print(i, ":\n")
    print(google_play_store_data_ordered_by_installs[i][:3])
    print("\n")
print_separator()
print("\n")
print("Apple Store - order each prime genre by total number of user ratings: \n")
# the App Store data has no install counts, so rating_count_tot is used as a proxy
apple_store_data_ordered_by_installs = {}
for i in apple_store_data_split_by_prime_genre:
    apple_store_data_ordered_by_installs[i] = sorted(apple_store_data_split_by_prime_genre[i], key=lambda x: float(x[index_rating_count_tot].replace("+", "").replace(",", "")), reverse=True)
    print(i, ":\n")
    print(apple_store_data_ordered_by_installs[i][:3])
    print("\n")



Google Play Store - order each category by number of installs: 

ART_AND_DESIGN :

[['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Tattoo Name On My Photo Editor', 'ART_AND_DESIGN', '4.2', '44829', '20M', '10,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'April 2, 2018', '3.8', '4.1 and up'], ['ibis Paint X', 'ART_AND_DESIGN', '4.6', '224399', '31M', '10,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'July 30, 2018', '5.5.4', '4.1 and up']]


AUTO_AND_VEHICLES :

[['Android Auto - Maps, Media, Messaging & Voice', 'AUTO_AND_VEHICLES', '4.2', '271920', '16M', '10,000,000+', 'Free', '0', 'Teen', 'Auto & Vehicles', 'July 11, 2018', 'Varies with device', '5.0 and up'], ['AutoScout24 - used car finder', 'AUTO_AND_VEHICLES', '4.4', '186648', '42M', '10,000,000+', 'Free', '0', 'Everyone', 'Auto & Vehicles', 'August 2, 2018', '9.3.52', '4.2 and up'], ['Ulysse 

In [42]:
# sum of installs per category
print("\n")
print("Google Play Store - sum of installs per category: \n")
google_play_store_data_sum_of_installs_per_category = calculate_total_per_category(google_play_store_data_ordered_by_installs, index_installs)
for i in google_play_store_data_sum_of_installs_per_category:
    print(i, ":\n")
    print(google_play_store_data_sum_of_installs_per_category[i])
    print("\n")
print_separator()
print("\n")
# App Store: sum of user ratings (rating_count_tot) per prime genre, used as a proxy for installs
print("Apple Store - sum of user ratings per prime genre: \n")
apple_store_data_sum_of_installs_per_category = calculate_total_per_category(apple_store_data_ordered_by_installs, index_rating_count_tot)
for i in apple_store_data_sum_of_installs_per_category:
    print(i, ":\n")
    print(apple_store_data_sum_of_installs_per_category[i])
    print("\n")



Google Play Store - sum of installs per category: 

ART_AND_DESIGN :

108221100.0


AUTO_AND_VEHICLES :

50980061.0


BEAUTY :

27197050.0


BOOKS_AND_REFERENCE :

1564873260.0


BUSINESS :

634771490.0


COMICS :

42261150.0


COMMUNICATION :

9784905491.0


DATING :

117803757.0


EDUCATION :

180800000.0


ENTERTAINMENT :

975360000.0


EVENTS :

13973150.0


FINANCE :

423342632.0


FOOD_AND_DRINK :

199468651.0


HEALTH_AND_FITNESS :

1121337892.0


HOUSE_AND_HOME :

94602361.0


LIBRARIES_AND_DEMO :

51293710.0


LIFESTYLE :

449712219.0


GAME :

12471347340.0


FAMILY :

5744891309.0


MEDICAL :

36470344.0


SOCIAL :

5474803752.0


SHOPPING :

1381178585.0


PHOTOGRAPHY :

4579118815.0


SPORTS :

999453417.0


TRAVEL_AND_LOCAL :

2810583086.0


TOOLS :

7991804304.0


PERSONALIZATION :

1397507888.0


PRODUCTIVITY :

5668814314.0


PARENTING :

29961010.0


WEATHER :

349687520.0


VIDEO_PLAYERS :

3734721720.0


NEWS_AND_MAGAZINES :

2351483110.0


MAPS_AND_NAVIGATION :



In [43]:
# add a column with each app's percentage of installs within its category (installs / sum of installs per category)
print("\n")
print("Google Play Store - add a column with the app percentage of installs per category: \n")
google_play_store_data_add_percentage = add_percentage_column(google_play_store_data_ordered_by_installs, index_installs)
for i in google_play_store_data_add_percentage:
    print(i, ":\n")
    print(google_play_store_data_add_percentage[i][:3])
    print("\n")
print_separator()
print("\n")
# App Store: percentage of user ratings per prime genre (ratings used as a proxy for installs)
print("Apple Store - add a column with the app percentage of user ratings per prime genre: \n")
apple_store_data_add_percentage = add_percentage_column(apple_store_data_ordered_by_installs, index_rating_count_tot)
for i in apple_store_data_add_percentage:
    print(i, ":\n")
    print(apple_store_data_add_percentage[i][:3])
    print("\n")



Google Play Store - add a column with the app percentage of installs per category: 

ART_AND_DESIGN :

[['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up', 46.2], ['Tattoo Name On My Photo Editor', 'ART_AND_DESIGN', '4.2', '44829', '20M', '10,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'April 2, 2018', '3.8', '4.1 and up', 9.24], ['ibis Paint X', 'ART_AND_DESIGN', '4.6', '224399', '31M', '10,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'July 30, 2018', '5.5.4', '4.1 and up', 9.24]]


AUTO_AND_VEHICLES :

[['Android Auto - Maps, Media, Messaging & Voice', 'AUTO_AND_VEHICLES', '4.2', '271920', '16M', '10,000,000+', 'Free', '0', 'Teen', 'Auto & Vehicles', 'July 11, 2018', 'Varies with device', '5.0 and up', 19.62], ['AutoScout24 - used car finder', 'AUTO_AND_VEHICLES', '4.4', '186648', '42M', '10,000,000+', 'Free', '0', 'Everyone', 'Auto & Vehicles', 'August

# Mobile App Analysis: App Store and Google Play

## App Store Genres and Popularity

The goal is to identify popular genres, but genres such as Navigation, Social Networking, or Music may look more popular than they are because a few dominant apps inflate their averages. Since the average number of ratings is skewed by a handful of giants, it is hard to recommend an app profile based on this data alone.

The genres with the highest average number of user ratings ('Navigation', 'Reference', and 'Social Networking') may be dominated by a few big players, such as Google Maps and Waze in Navigation. Genres like 'Reference', 'Weather', 'Food & Drink', or 'Finance' could be more interesting, as they leave room for practical apps rather than entertainment or social networking.
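A simple way to quantify how strongly a genre is dominated by a few big players is the share of its total ratings held by its top k apps. The sketch below uses a hypothetical `top_share` helper and two invented genres, one giant-dominated and one evenly spread; the figures are illustrative and do not come from the App Store data.

```python
def top_share(values, k=1):
    """Return the share (%) of the total held by the k largest values."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0  # empty or all-zero genre: nothing to measure
    return 100 * sum(ordered[:k]) / total

# Invented per-app rating counts for two hypothetical genres.
genres = {
    "giant_dominated": [345_000, 158_000, 12_000, 3_000, 1_500, 500],
    "evenly_spread": [60_000, 50_000, 45_000, 40_000, 30_000, 25_000],
}

for name, counts in genres.items():
    print(f"{name}: top-2 share = {top_share(counts, k=2):.1f}%")
```

A genre where the top two apps hold almost all ratings is harder to enter than one where attention is spread across many apps, even if both have similar averages.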

## Most Popular Apps by Genre on Google Play

For the Google Play market, install numbers give a clearer picture of genre popularity. However, they are not precise: installs are reported as open-ended bins such as '10,000+' or '50,000,000+', and category averages may be driven by a few giants. Communication apps have the highest average number of installs (about 36 million per app), heavily influenced by apps like WhatsApp and Facebook Messenger. Video players, social apps, and productivity apps follow the same pattern, dominated by a few major players.

Game genres seem popular, but that market may be saturated. The 'Books and Reference' genre looks promising, with potential for profitability on both the App Store and Google Play. A closer look shows that its popularity is concentrated in a few apps, which still leaves room for new apps with distinctive features.

## App Recommendations

Based on the analysis, building an app around a popular book, with extra features such as daily quotes, an audio version, quizzes, and discussion forums, could be profitable in both markets. 'Books and Reference' is therefore a candidate category.

## Category Analysis on Google Play

Looking at the average number of installs per category in the Google Play Store, categories like 'COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL', 'PHOTOGRAPHY', and 'PRODUCTIVITY' have the highest averages. However, these averages may be heavily influenced by a few major apps.
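To test how much a few major apps drive a category's average, the average can be recomputed after dropping apps above an install cutoff. The sketch below is hypothetical: `mean_installs_by_category`, its `cap` parameter, the 100,000,000 cutoff, and the sample rows are all illustrative, and installs are read as the lower bound of each bin.

```python
from collections import defaultdict

def parse_installs(value):
    """Turn an install bin such as '1,000,000,000+' into its int lower bound."""
    return int(value.replace(",", "").replace("+", ""))

def mean_installs_by_category(rows, index_category, index_installs, cap=None):
    """Average install lower bound per category, optionally skipping apps above `cap`."""
    totals, counts = defaultdict(int), defaultdict(int)
    for row in rows:
        installs = parse_installs(row[index_installs])
        if cap is not None and installs > cap:
            continue  # leave out the giants so they cannot dominate the average
        totals[row[index_category]] += installs
        counts[row[index_category]] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

# Hypothetical rows [name, category, installs]; names and numbers are invented.
rows = [
    ["Giant Messenger", "COMMUNICATION", "1,000,000,000+"],
    ["Small Chat A", "COMMUNICATION", "100,000+"],
    ["Small Chat B", "COMMUNICATION", "50,000+"],
    ["Reader A", "BOOKS_AND_REFERENCE", "1,000,000+"],
    ["Reader B", "BOOKS_AND_REFERENCE", "500,000+"],
]

print(mean_installs_by_category(rows, 1, 2))
print(mean_installs_by_category(rows, 1, 2, cap=100_000_000))
```

In this toy data the communication category leads only while the single giant is included; once it is excluded, the books category has the higher average.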

For a less dominated category, exploring 'ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY', 'BOOKS_AND_REFERENCE', and 'EDUCATION' could be worthwhile. 'BOOKS_AND_REFERENCE' stands out as a promising category with potential for a wide audience.

## Conclusion

Turning a popular book into an app enriched with additional features is the recommended option for profitability on both the App Store and Google Play. The 'Books and Reference' category shows potential, but a more detailed analysis may be needed before making a final decision.
