# Analyzing Mobile App Data Project Summary

In this project, our focus is on analyzing mobile app data for a company that specializes in developing Android and iOS applications. These apps are distributed through two prominent platforms, Google Play and the App Store, and generate revenue through in-app advertisements. The success of these apps is directly tied to user engagement, as higher engagement translates into increased revenue for the company.

Our main objective is to conduct a comprehensive analysis of the available data to identify the key factors that contribute to app popularity among users. By examining patterns and trends within the data, we will gain valuable insights that can help the company's developers enhance their apps and attract a larger user base. This analysis will provide us with practical experience in real-world data analysis techniques and strengthen our ability to effectively communicate our findings.

Completing this project will not only allow us to gain hands-on experience in analyzing mobile app data, but it will also enable us to showcase our skills as software engineers in interpreting and leveraging data to drive business decision-making. By providing valuable insights and recommendations to the company, we will demonstrate our ability to make a tangible impact on app development strategies and contribute to the company's success in the competitive mobile app market.


# Opening and Exploring the Data


The objective of this section is to help developers understand which types of apps attract more users on Google Play and the App Store. Rather than collecting new data, we analyze two existing datasets as a representative sample: Dataset 1, which contains information on around 10,000 Android apps from Google Play (collected in August 2018), and Dataset 2, which includes data on approximately 7,000 iOS apps from the App Store (collected in July 2017).

We first load each CSV file and separate the header row from the data rows, so that the header is not counted as an app. The explore_data() function then prints a slice of rows in a readable format and, optionally, the number of rows and columns.

In [1]:
from csv import reader

opened_file = open('googleplaystore.csv', encoding='utf8')  # Open the file
read_file = reader(opened_file)  # Read the file
android = list(read_file)  # Convert the file into a list of lists
header_android = android[0]  # Extract the header row
android = android[1:]  # Extract the data rows

# Open the App Store dataset
opened_file = open('AppleStore.csv', encoding='utf8')  # Open the file
read_file = reader(opened_file)  # Read the file
ios = list(read_file)  # Convert the file into a list of lists
header_ios = ios[0]  # Extract the header row
ios_data = ios[1:]  # Extract the data rows




The code above opens and reads two CSV files, 'googleplaystore.csv' and 'AppleStore.csv', and uses csv.reader to convert each file into a list of lists. The first inner list is the header row with the column names; the remaining lists are the data rows. The header rows are stored in 'header_android' and 'header_ios'. Note that the two datasets are handled differently: 'android' is reassigned so that it holds only the data rows, whereas 'ios' still contains the header row and the data rows are stored separately in 'ios_data'.
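The loading cell leaves both files open. A minimal sketch of the same step with a context manager, which closes the file automatically, could look like this. The helper name `load_csv` is hypothetical and not part of the original notebook, and the demo writes a tiny throwaway file with made-up rows so that it runs without the real datasets.

```python
import csv
import os
import tempfile


def load_csv(path):
    """Return (header, rows) for a CSV file. Hypothetical helper."""
    # newline='' is the setting the csv module documentation recommends
    with open(path, encoding='utf8', newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Demo on a temporary file (made-up rows, not the real data)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.csv')
    with open(demo_path, 'w', encoding='utf8', newline='') as f:
        csv.writer(f).writerows([['App', 'Price'], ['Demo A', '0'], ['Demo B', '0']])
    demo_header, demo_rows = load_csv(demo_path)

print(demo_header)
print(len(demo_rows))
```

Returning the header and the data rows as a pair would also keep the handling of both datasets consistent.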



In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


print('Google Play dataset:')

# Call the explore_data() function for the Google Play dataset
explore_data(android, 0, 5, True) 

print('App Store dataset:')

# Call the explore_data() function for the App Store dataset
explore_data(ios_data, 0, 5, True)  

# Print the column names for the Google Play dataset
print('Google Play columns:', header_android)

# Print the column names for the App Store dataset
print('App Store columns:', header_ios)


Google Play dataset:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13
App Store data

The code defines a function called explore_data(), which prints a slice of a dataset in a readable format. The start and end arguments select the rows to print, and each row is followed by an empty line. When rows_and_columns is True, the function also prints the total number of rows and the number of columns; for this to reflect the number of apps, the dataset passed in must not include the header row. Printing the headers alongside the sample rows lets us match each value to its column index for later analysis.

# Deleting Wrong Data

In [3]:
print(android[10472])
print(header_android)
print(len(android))
del android[10472]
print(len(android))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
10841
10840


The row at index 10472 of the 'android' dataset is malformed: it contains 12 values, whereas the header has 13 columns. The 'Category' value is missing, so every later value is shifted one column to the left (for example, '1.9' sits in the 'Category' position and '19' in the 'Rating' position). The code prints the row and the header for comparison, prints the number of rows, deletes the row with del, and prints the number of rows again to confirm that exactly one row was removed (10841 to 10840). The del statement should be run only once, because running it again would delete whichever row now occupies index 10472.
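Rather than looking up a known index, rows like this one can also be found by comparing each row's length with the header's. The sketch below assumes that a wrong length is the only symptom; the helper name `find_malformed_rows` and the miniature dataset are made up for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Made-up miniature dataset: the second row is missing one field
demo_header = ['App', 'Category', 'Rating']
demo_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

bad_indices = find_malformed_rows(demo_rows, demo_header)
print(bad_indices)  # [1]

# Delete from the end so that earlier indices stay valid
for i in reversed(bad_indices):
    del demo_rows[i]
print(len(demo_rows))  # 2
```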

# Removing Duplicate Entries

# Part One

While cleaning the data, we found that some apps appear more than once in the Google Play dataset: 1,181 rows are repeat occurrences of an app name that has already been seen. To avoid counting apps multiple times, we keep only one entry per app. The rows for the Instagram app, for example, differ in the number of reviews, which indicates that the data was collected at different times. Rather than removing duplicates at random, we keep the entry with the highest number of reviews, since a higher review count suggests the most recent data.
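The inspection described above can be sketched as follows. The helper name `rows_for_app` and the three rows are made up for illustration; on the real data one would pass `android` and `'Instagram'` instead.

```python
def rows_for_app(dataset, app_name, name_index=0):
    """Return every row whose name column equals app_name."""
    return [row for row in dataset if row[name_index] == app_name]


# Made-up rows: name, category, rating, number of reviews
demo_rows = [
    ['Demo App', 'SOCIAL', '4.5', '100'],
    ['Other App', 'TOOLS', '4.0', '50'],
    ['Demo App', 'SOCIAL', '4.5', '120'],
]

matches = rows_for_app(demo_rows, 'Demo App')
for row in matches:
    print(row)

# The row to keep is the one with the highest review count (index 3)
best = max(matches, key=lambda row: float(row[3]))
print(best)
```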

In [4]:
unique_apps = []
duplicate_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Number of Unique apps:', len(unique_apps))

Number of duplicate apps: 1181


Number of Unique apps: 9659


The code initializes two lists, 'unique_apps' and 'duplicate_apps', and then iterates over every app in the 'android' dataset. For each app, it extracts the name and checks whether it is already in 'unique_apps'. If it is, the row is a repeat occurrence and the name is appended to 'duplicate_apps'; otherwise the name is appended to 'unique_apps'. After the loop, the code prints the length of each list: 1181 duplicate occurrences and 9659 unique app names, which together account for all 10840 rows.
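The same two counts can be obtained with `collections.Counter` from the standard library, which avoids the repeated list lookups. This is a sketch on made-up names rather than the notebook's own method.

```python
from collections import Counter

# Made-up rows; only the name column (index 0) matters here
demo_rows = [['App A'], ['App B'], ['App A'], ['App C'], ['App A']]

name_counts = Counter(row[0] for row in demo_rows)

n_unique = len(name_counts)
# Every occurrence after the first one counts as a duplicate
n_duplicates = sum(count - 1 for count in name_counts.values())

print('Number of duplicate apps:', n_duplicates)  # 2
print('Number of unique apps:', n_unique)  # 3
```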

# Part Two

To remove duplicate entries from the Google Play dataset, we took the following steps. First, we created a dictionary where each unique app name was a key, and the corresponding value was the highest number of reviews for that app. This helped us identify the most reliable entry for each app. Then, using this dictionary, we constructed a new dataset that included only one entry per app, specifically the entry with the highest number of reviews. By doing so, we eliminated duplicate rows and ensured that our dataset was clean and accurate for further analysis.






In [5]:
reviews_max = {}  # Initialize an empty dictionary to store the maximum reviews for each app

for app in android:  # Loop through each app in the android dataset
    name = app[0]  # Extract the app name
    n_reviews = float(app[3])  # Extract the number of reviews and convert it to a float
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        # If the app name is already in reviews_max and the current number of reviews is greater than the existing value,
        # update the maximum reviews for that app in reviews_max
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        # If the app name is not in reviews_max, add a new entry with the app name as the key and the number of reviews as the value
        reviews_max[name] = n_reviews

print('Expected length:', len(android) - 1181)  # Print the expected length of the cleaned dataset (original length - number of duplicates)
print('Actual length:', len(reviews_max))  # Print the actual length of the reviews_max dictionary
print(header_android)  # Print the header row of the android dataset


Expected length: 9659
Actual length: 9659
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


The code initializes an empty dictionary called reviews_max and loops through the android dataset. For each app, it extracts the name and the number of reviews. If the name is already in reviews_max and the current review count is higher than the stored value, the stored value is updated; if the name is not yet in reviews_max, a new entry is added. After the loop, the code prints the expected length of the cleaned dataset (the original length minus the 1181 duplicates) and the actual length of reviews_max, which match at 9659, followed by the header row. The dictionary therefore holds the highest review count for each app name, which the next cell uses to remove the duplicate entries.

In [6]:
android_clean = []    # Create an empty list to store the cleaned Android dataset
already_added = []    # Create an empty list to keep track of already added app names

# Iterate through each app in the Android dataset
for app in android:
    name = app[0]    # Extract the app name
    n_reviews = float(app[3])    # Extract the number of reviews and convert it to a float

    # Check if the number of reviews is equal to the highest number of reviews for that app and the app name is not already added
    if n_reviews == reviews_max[name] and (name not in already_added):
        android_clean.append(app)    # Append the app to the cleaned dataset
        already_added.append(name)    # Add the app name to the already_added list to avoid duplicates

# Display information about the cleaned Android dataset
explore_data(android_clean, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


The code builds a cleaned dataset called android_clean by keeping one row per app. For each row in the Android dataset, it extracts the app name and the number of reviews, then checks two conditions: the number of reviews equals the highest review count recorded for that app (reviews_max[name]), and the name is not yet in already_added. The second condition guards against duplicate rows that share the same maximum review count; without it, such an app would be added more than once. If both conditions hold, the row is appended to android_clean and the name to already_added. Finally, explore_data() prints the first three rows of the cleaned dataset and its dimensions: 9659 rows and 13 columns, as expected.
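As an alternative, the two passes can be folded into one by storing the best row for each name in a dictionary. The helper name `keep_most_reviewed` and the rows are hypothetical. Note one difference: the result is ordered by each app's first appearance, not by the position of the row that is kept.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best_rows = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when review counts tie
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    return list(best_rows.values())


# Made-up rows: name, category, rating, number of reviews
demo_rows = [
    ['App A', 'TOOLS', '4.0', '10'],
    ['App B', 'GAME', '4.2', '7'],
    ['App A', 'TOOLS', '4.0', '25'],
]

deduped = keep_most_reviewed(demo_rows)
for row in deduped:
    print(row)
```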

# Removing Non-English Apps


# Part One


Duplicate App Removal:

* Removed duplicate entries from the Google Play dataset.
* Achieved a cleaned dataset with unique app entries.

Non-English App Filtering:

* Detected non-English apps by checking app names for characters with code points greater than 127, which fall outside the ASCII range.
* Developed a function to identify non-English characters in app names.
* Filtered out non-English apps from both the Google Play and App Store datasets.

Focus on English Apps:

* Ensured that the analysis is limited to English apps only.
* Prepared the datasets for further analysis and insights.

These steps helped us clean the data and narrow down our focus to English apps, facilitating more accurate and relevant analysis.

In [11]:
# Define a function that returns True only if every character is ASCII
def comm_english(string):
    for character in string:
        if ord(character) > 127:  # Code points above 127 fall outside ASCII
            return False
    return True

# Test the function on different strings
blue = comm_english('Instagram')
yell = comm_english('爱奇艺PPS -《欢乐颂2》电视剧热播')
gray = comm_english('Docs To Go™ Free Office Suite')  # English name, but '™' is not ASCII
purple = comm_english('Instachat 😜')  # English name, but the emoji is not ASCII

# Print the results
print(blue)  # True
print(yell)  # False
print(gray)  # False (an English name wrongly rejected)
print(purple)  # False (an English name wrongly rejected)

# Another way to check multiple strings at once
app_names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']

# Iterate over the list of app names and store the results in a new list
results = []
for name in app_names:
    result = comm_english(name)
    results.append(result)

# Print the results for all app names
print(results)  # [True, False, False, False]


True
False
False
False
[True, False, False, False]


The code defines a function called comm_english that checks whether a string consists only of common English characters. It iterates over each character, obtains its code point with ord(), and returns False as soon as it finds one above 127, which is the upper end of the ASCII range; if no such character is found, it returns True. The code then calls the function on four app names and prints the results, and repeats the check with a loop over a list of names. The results expose a weakness: 'Docs To Go™ Free Office Suite' and 'Instachat 😜' are English names, yet both are rejected because '™' and the emoji fall outside the ASCII range.
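To see why the trademark sign and the emoji fail the check, it helps to print a few code points. The sketch below also uses `str.isascii()`, a built-in method available from Python 3.7 that performs the same all-ASCII test; using it here is a suggestion and not something the notebook does.

```python
samples = ['a', 'Z', '~', '™', '😜']

# ord() returns the Unicode code point of a single character
code_points = [ord(character) for character in samples]
print(code_points)  # [97, 90, 126, 8482, 128540]

# Only code points up to 127 belong to ASCII
print([code_point <= 127 for code_point in code_points])  # [True, True, True, False, False]

# str.isascii() gives the same answer for a whole string
print('Instagram'.isascii())  # True
print('Docs To Go™ Free Office Suite'.isascii())  # False
```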

# Part Two

In the previous step, the function rejected English app names that contain a symbol such as '™' or an emoji. To address this, we revised the function to allow up to three non-ASCII characters in an app name. This rule is not perfect, since a non-English name with three or fewer non-ASCII characters still passes, but it minimizes the loss of useful data while still filtering out most non-English apps. In the next step, we use the updated function to filter the non-English apps out of our datasets.

In [12]:
# Define a function to check if a string is English or non-English
def comm_english(string):
    # Initialize a counter for non-ASCII characters
    non_ascii = 0
    
    # Loop through each character in the string
    for character in string:
        # Check if the character's code point is greater than 127 (outside ASCII)
        if ord(character) > 127:
            # If it is, increment the counter by 1
            non_ascii += 1
    
    # Check the count of non-ASCII characters
    if non_ascii > 3:
        # If the count is greater than 3, consider the string as non-English and return False
        return False
    else:
        # If the count is 3 or less, consider the string as English and return True
        return True

# Check the English/non-English status of the first app name
pure = comm_english('Docs To Go™ Free Office Suite')

# Check the English/non-English status of the second app name
girl = comm_english('Instachat 😜')

# Print the English/non-English status of the first app name
print(pure)

# Print the English/non-English status of the second app name
print(girl)


True
True


The code defines a function called comm_english that determines whether a given string is in English or not. It checks each character in the string and counts the number of non-ASCII characters. If the count exceeds 3, the function considers the string as non-English and returns False; otherwise, it considers the string as English and returns True. The code then applies this function to two sample app names and prints the results, indicating whether each app name is in English or not.
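The behaviour at the threshold is easiest to see with a compact variant of the function. The name `is_english_name` and the `max_non_ascii` parameter are hypothetical additions for this sketch, and the two 'Demo' strings are made up to hit exactly three and four non-ASCII characters.

```python
def is_english_name(string, max_non_ascii=3):
    """Return True if the string has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_english_name('Docs To Go™ Free Office Suite'))  # True (1 non-ASCII character)
print(is_english_name('Instachat 😜'))  # True (1 non-ASCII character)
print(is_english_name('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False (13 non-ASCII characters)
print(is_english_name('Demo ™™™'))  # True (exactly 3)
print(is_english_name('Demo ™™™™'))  # False (4)
```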

# Isolating the Free Apps

The final step of the data cleaning process is to isolate the free apps, because our revenue model relies on in-app ads and we are therefore interested only in apps that are free to download. Before doing so, we apply the updated comm_english() function to both datasets to filter out the non-English apps; the free apps are then isolated in cell In [14].

In [13]:
android_english = []
ios_english = []

# Filtering non-English apps from the android dataset
for app in android:
    name = app[0]
    if comm_english(name):  # Check if app name is in English
        android_english.append(app)  # Add app to the android_english list if it's in English

# Filtering non-English apps from the ios dataset       
for app in ios:
    name = app[1]
    if comm_english(name):  # Check if app name is in English
        ios_english.append(app)  # Add app to the ios_english list if it's in English

explore_data(android_english, 0, 3, True)  # Explore a sample of the filtered android_english dataset
explore_data(ios_english, 0, 3, True)  # Explore a sample of the filtered ios_english dataset


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10795
Number of columns: 13
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instag

This code filters out non-English apps from both datasets. It creates two new lists, android_english and ios_english, and appends every app whose name comm_english() classifies as English (the name is at index 0 in the Google Play data and index 1 in the App Store data). The explore_data() function then prints a sample of each filtered list. Two details of the output are worth noting. First, the loops read from android and ios rather than from android_clean and ios_data, so the Google Play result (10795 rows) still includes the duplicate entries removed earlier, and the App Store result begins with the header row. Iterating over android_clean and ios_data instead would carry the earlier cleaning steps forward. Second, filtering by name only approximates the app's language, but it keeps the analysis focused on apps aimed at an English-speaking audience.

# Most Common Apps by Genre

# Part One

So far, we have cleaned the data by removing inaccurate rows, duplicate app entries, and non-English apps, and we now isolate the free apps. We can then turn to the analysis itself: determining which types of apps are most likely to attract users and generate revenue. The validation strategy for an app idea involves these steps:

* Build a basic Android version of the app and release it on Google Play.
* Evaluate the user response.
* Develop the app further if it shows promise.
* Finally, if the app remains profitable over a period of six months, develop an iOS version and add it to the App Store.

Because the end goal is to publish on both platforms, the objective is to identify app profiles that can succeed in both the Google Play and App Store markets. The analysis starts with frequency tables that show the most common app genres in each market.

In [14]:
# Initialize empty lists to store the free apps
android_final = []
ios_final = []

# Loop through each app in the Android dataset
for app in android_english:
    # Extract the price value from the app
    price = app[7]
    
    # Check if the price is '0', indicating it's a free app
    if price == '0':
        # If it's a free app, append it to the android_final list
        android_final.append(app)

# Loop through each app in the iOS dataset
for app in ios_english:
    # Extract the price value from the app
    price = app[4]
    
    # Check if the price is '0.0', indicating it's a free app
    if price == '0.0':
        # If it's a free app, append it to the ios_final list
        ios_final.append(app)

# Print the number of free apps remaining in each dataset
print(len(android_final))
print(len(ios_final))


9999
3222


The code starts by creating two empty lists, android_final and ios_final, to store the free apps from the Android and iOS datasets. It then loops over each app in android_english and appends the app to android_final if its price (index 7) is the string '0'. It does the same for ios_english, where the price is at index 4 and a free app has the price '0.0'. Both comparisons are string comparisons, because the values read from the CSV files have not been converted to numbers. Finally, the code prints the number of free apps that remain in each dataset.
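Because the two datasets spell a free price differently ('0' and '0.0'), a single numeric check may be less error-prone than two string comparisons. The sketch below is not the notebook's method: the helper name `is_free` is hypothetical, and it assumes that paid prices may carry a leading '$', which the excerpts shown here do not confirm.

```python
def is_free(price_text):
    """Return True if the price string represents zero."""
    # Assumption: paid prices may look like '$4.99'; strip the symbol first
    return float(price_text.strip().lstrip('$')) == 0.0


print(is_free('0'))  # True (Google Play style)
print(is_free('0.0'))  # True (App Store style)
print(is_free('$4.99'))  # False
```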

# Part Two


Frequency tables are a useful tool for analyzing data and finding patterns. Here we generate frequency tables to determine the most common genres in each app market. The freq_table() function counts how often each value occurs in a column and converts the counts to percentages. Because calling sorted() on a dictionary sorts only its keys, the helper function display_table() converts the dictionary into a list of (percentage, value) tuples, sorts the list in descending order, and prints each value with its percentage. By examining these tables, we can gain insights into the popularity of different app genres in each market.

In [15]:
def freq_table(dataset, index):
    table = {}  # Initialize an empty dictionary to store value counts
    total = 0  # Initialize a variable to keep track of the total number of rows
    
    # Iterate over each row in the dataset
    for row in dataset:
        total += 1  # Increment the total count for each row
        value = row[index]  # Get the value at the given index
        
        # Check if the value already exists as a key in the dictionary
        if value in table:
            table[value] += 1  # Increment the count if the value exists
        else:
            table[value] = 1  # Add the value as a new key and set its count to 1
    
    table_percentages = {}  # Initialize an empty dictionary to store percentages
    
    # Calculate the percentage of each value in the table
    for key in table:
        percentage = (table[key] / total) * 100  # Calculate the percentage
        table_percentages[key] = percentage  # Store the percentage with the corresponding key
    
    return table_percentages  # Return the dictionary of value frequencies as percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)  # Get the frequency table as a dictionary
    table_display = []  # Initialize an empty list to store the formatted table
    
    # Iterate over each key in the table dictionary
    for key in table:
        key_val_as_tuple = (table[key], key)  # Create a (percentage, value) tuple
        table_display.append(key_val_as_tuple)  # Append the tuple to the table_display list

    table_sorted = sorted(table_display, reverse=True)  # Sort by percentage in descending order

    # Iterate over each entry in the sorted table_display list
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  # Print the value and its percentage



The given code consists of two functions. The first function, freq_table, builds a frequency table for one column of a dataset. It takes two inputs, the dataset and the index of the column, and works as follows:

* Initialize an empty dictionary called table to store the count of each value, and a variable called total to keep track of the number of rows.
* Iterate over each row in the dataset, incrementing total by 1 and reading the value at the given index.
* If the value already exists as a key in table, increase its count by 1; otherwise, add it as a new key with an initial count of 1.
* Create an empty dictionary called table_percentages. For each key in table, divide its count by total, multiply by 100, and store the result under the same key.
* Return table_percentages, which contains the frequency of each value as a percentage.

The second function, display_table, calls freq_table, converts the resulting dictionary into a list of (percentage, value) tuples, sorts the list in descending order, and prints each value with its percentage.
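A small worked example makes the arithmetic concrete. The function below is a compact restatement of the same logic under the hypothetical name `freq_table_compact`, and the four rows are made up: three of the four apps are games, so the expected share is 3 / 4 * 100 = 75%.

```python
def freq_table_compact(dataset, index):
    """Return {value: percentage of rows holding that value at index}."""
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


# Made-up rows: name, genre
demo_rows = [
    ['App A', 'GAME'],
    ['App B', 'GAME'],
    ['App C', 'TOOLS'],
    ['App D', 'GAME'],
]

percentages = freq_table_compact(demo_rows, 1)
print(percentages)  # {'GAME': 75.0, 'TOOLS': 25.0}

# Sorting the items by percentage avoids building (percentage, value) tuples by hand
ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
for genre, share in ranked:
    print(f'{genre} : {share:.2f}')
```

Sorting with a key function is one alternative to the tuple conversion; both approaches produce the same ranking when the percentages differ.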

# Part Three


The analysis of the free English apps in the App Store dataset reveals a clear pattern. More than half of these apps (59.17%) are games. Entertainment apps hold the second-largest share at about 7.5%, followed by photo and video apps at about 5%. Educational apps make up only 3.83% of the apps, and social networking apps about 3.11%.

The overall impression gained from this analysis is that the App Store, particularly its collection of free English apps, is heavily saturated with apps designed for entertainment purposes. Genres like games, entertainment, photo and video, social networking, sports, and music dominate the market. On the other hand, apps serving practical purposes such as education, shopping, utilities, productivity, and lifestyle are less prevalent. It is important to note that although there is a high number of apps in the entertainment categories, it does not necessarily correlate with having the largest user base. User demand may not always align with the abundance of apps available in each genre.



In [7]:
print(header_android)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [11]:
# Call the display_table function with the ios_final dataset and -5 as the index
display_table(ios_final, -5)

Games : 59.171800136892536
Entertainment : 7.529089664613278
Photo & Video : 5.133470225872689
Education : 3.8329911019849416
Social Networking : 3.1143052703627654
Shopping : 2.4982888432580426
Utilities : 2.2587268993839835
Music : 2.1560574948665296
Sports : 2.0533880903490758
Health & Fitness : 1.9849418206707734
Productivity : 1.7111567419575633
Lifestyle : 1.4715947980835045
News : 1.3347022587268993
Travel : 1.1293634496919918
Finance : 1.0951403148528405
Weather : 0.8898015058179329
Food & Drink : 0.8898015058179329
Reference : 0.5133470225872689
Business : 0.5133470225872689
Book : 0.2737850787132101
Medical : 0.20533880903490762
Navigation : 0.13689253935660506
Catalogs : 0.10266940451745381


Based on the frequency table generated for the "prime_genre" column of the App Store dataset, we can observe the following:

* The most common genre is "Games" with a percentage of 59.17%.
* The next most common genres are "Entertainment" (7.53%) and "Photo & Video" (5.13%).
* Practical genres appear further down the table, for example "Education" (3.83%), "Shopping" (2.50%), and "Utilities" (2.26%), with "Social Networking" (3.11%) in between.
* The general impression is that the majority of apps in the App Store are designed for entertainment purposes, including games, entertainment, photo and video, and social networking. Practical app genres like education, shopping, and utilities have a smaller presence.
* Based on this frequency table alone, the gaming genre looks like the obvious app profile to recommend, since it is by far the most common genre in the App Store. However, a large number of apps in a genre does not imply that apps of that genre have a large number of users. Further analysis is required to understand user engagement and popularity.
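
As a rough check on the "entertainment versus practical" impression, the percentages printed above can be summed per group. The sketch below hard-codes rounded values from the frequency table; the split into `fun_genres` and `practical_genres` is an assumption made for illustration, not a classification taken from the dataset.

```python
# Percentages copied (rounded to 4 decimals) from the prime_genre frequency table.
ios_genre_pct = {
    'Games': 59.1718, 'Entertainment': 7.5291, 'Photo & Video': 5.1335,
    'Education': 3.8330, 'Social Networking': 3.1143, 'Shopping': 2.4983,
    'Utilities': 2.2587, 'Music': 2.1561, 'Sports': 2.0534,
    'Productivity': 1.7112, 'Lifestyle': 1.4716,
}

# Assumed grouping, for illustration only.
fun_genres = ['Games', 'Entertainment', 'Photo & Video',
              'Social Networking', 'Sports', 'Music']
practical_genres = ['Education', 'Shopping', 'Utilities',
                    'Productivity', 'Lifestyle']

# Add up the share of apps in each group.
fun_share = sum(ios_genre_pct[genre] for genre in fun_genres)
practical_share = sum(ios_genre_pct[genre] for genre in practical_genres)

print('Entertainment-oriented genres:', round(fun_share, 2))
print('Practical genres:', round(practical_share, 2))
```

Under this grouping, the entertainment-oriented genres account for roughly 79% of the free English apps and the practical ones for roughly 12%.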

In [16]:
display_table(android_final, 1)  # 'Category' column

FAMILY : 17.67176717671767
GAME : 10.591059105910592
TOOLS : 7.640764076407641
BUSINESS : 4.45044504450445
PRODUCTIVITY : 3.95039503950395
SPORTS : 3.6003600360036003
LIFESTYLE : 3.5903590359035906
COMMUNICATION : 3.5903590359035906
MEDICAL : 3.5403540354035403
FINANCE : 3.49034903490349
HEALTH_AND_FITNESS : 3.2503250325032504
PHOTOGRAPHY : 3.1203120312031203
PERSONALIZATION : 3.08030803080308
SOCIAL : 2.9202920292029204
NEWS_AND_MAGAZINES : 2.7702770277027704
SHOPPING : 2.5702570257025705
TRAVEL_AND_LOCAL : 2.4602460246024602
DATING : 2.2702270227022705
BOOKS_AND_REFERENCE : 1.9901990199019903
VIDEO_PLAYERS : 1.7001700170017002
EDUCATION : 1.5101510151015103
ENTERTAINMENT : 1.4701470147014701
MAPS_AND_NAVIGATION : 1.3001300130013
FOOD_AND_DRINK : 1.25012501250125
HOUSE_AND_HOME : 0.88008800880088
LIBRARIES_AND_DEMO : 0.8400840084008401
AUTO_AND_VEHICLES : 0.8200820082008201
WEATHER : 0.7400740074007401
EVENTS : 0.6300630063006301
ART_AND_DESIGN : 0.6100610061006101
COMICS : 0.59005900

Based on the frequency table generated for the "Category" column of the Google Play dataset, we can observe the following:

* The most common category is "Family" with a percentage of 17.67%.
* The next most common categories are "Game" (10.59%) and "Tools" (7.64%).
* Other notable categories include "Business" (4.45%), "Productivity" (3.95%), and "Sports" (3.60%).
* Several mid-sized practical categories follow closely: "Communication" (3.59%), "Medical" (3.54%), "Finance" (3.49%), and "Health & Fitness" (3.25%).
* Compared with the App Store, Google Play is spread across a broader range of categories: its largest category, "Family", holds 17.67% of the apps, whereas "Games" alone holds 59.17% on the App Store. Family and games still have a significant presence.
* Based on the frequency table, it is recommended to consider developing apps in popular categories such as family, games, tools, and productivity, as they have a significant presence in the Google Play market.
* The frequency table indicates the most common app genres in terms of their frequency in the dataset, but it doesn't provide direct information about the number of users or popularity. Further analysis is required to determine which genres have the most users.
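
One way to put a number on how concentrated each market is: add up the shares of the three largest categories on each platform. This is a minimal sketch using rounded values copied from the two frequency tables; the helper name `cumulative_share` is hypothetical.

```python
# Top-three shares copied (rounded to 4 decimals) from the two frequency tables.
ios_top3 = {'Games': 59.1718, 'Entertainment': 7.5291, 'Photo & Video': 5.1335}
android_top3 = {'FAMILY': 17.6718, 'GAME': 10.5911, 'TOOLS': 7.6408}


def cumulative_share(table):
    """Sum the percentage shares in a {category: percentage} table."""
    return sum(table.values())


ios_share = cumulative_share(ios_top3)
android_share = cumulative_share(android_top3)

print('App Store, top 3 genres:', round(ios_share, 1))
print('Google Play, top 3 categories:', round(android_share, 1))
```

The top three genres cover about 72% of the App Store apps, against about 36% for the top three Google Play categories, which is consistent with Google Play being spread over more categories.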


# Most Popular Apps by Genre on the App Store


To find the genres that attract the most users, we calculate an average popularity measure for each genre: the number of user ratings for the App Store (column index 5 in the code below) and the number of installs for Google Play. Comparing these averages shows which types of apps are in high demand, which helps in developing apps with greater potential for user engagement and revenue.

In [17]:
# Generate a frequency table for the 'prime_genre' column
genres_ios = freq_table(ios_final, -5)

# Loop over each unique genre in the dataset
for genre in genres_ios:
    total = 0  # Initialize a variable to keep track of the total number of ratings for the genre
    len_genre = 0  # Initialize a variable to count the number of apps belonging to the genre
    
    # Loop over each app in the dataset
    for app in ios_final:
        genre_app = app[-5]  # Get the genre of the current app
        
        # If the genre of the app matches the current genre being analyzed
        if genre_app == genre:            
            n_ratings = float(app[5])  # Get the number of ratings for the app and convert it to a float
            total += n_ratings  # Add the number of ratings to the total
            len_genre += 1  # Increment the count of apps belonging to the genre
    
    avg_n_ratings = total / len_genre  # Calculate the average number of ratings for the genre
    print(genre, ':', avg_n_ratings)  # Print the genre and the average number of ratings


# Print the names and number of ratings for apps in the 'Navigation' genre
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])


# Print the names and number of ratings for apps in the 'Reference' genre
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])



Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus 

The code uses the frequency table generated for the 'prime_genre' column to obtain the unique genres in the iOS dataset.
* It loops over each genre and, for each one, over every app in the dataset, keeping only the apps whose genre matches.
* The variable 'total' accumulates the number of ratings for those apps, and 'len_genre' counts how many apps belong to the genre.
* The average number of ratings is 'total' divided by 'len_genre'. The code prints each genre with its average.
* Two further loops then iterate over the iOS dataset and print the name (index 1) and number of ratings (index 5) of every app in the 'Navigation' genre and the 'Reference' genre.
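
The nested loop scans the whole dataset once per genre. A single pass with two accumulators gives the same averages; the sketch below is one possible way to write it, not the notebook's own method. The helper name `average_by_key` and the three-column row layout are assumptions for illustration; the six rows are the Navigation apps printed above.

```python
from collections import defaultdict


def average_by_key(rows, key_index, value_index):
    """Return {key: mean of float(row[value_index])}, computed in one pass."""
    totals = defaultdict(float)  # running sum of values per key
    counts = defaultdict(int)    # number of rows seen per key
    for row in rows:
        key = row[key_index]
        totals[key] += float(row[value_index])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Simplified rows: [name, genre, rating count]. In ios_final the genre sits
# at index -5 and the rating count at index 5 instead.
navigation_rows = [
    ['Waze - GPS Navigation, Maps & Real-time Traffic', 'Navigation', '345046'],
    ['Google Maps - Navigation & Transit', 'Navigation', '154911'],
    ['Geocaching®', 'Navigation', '12811'],
    ['CoPilot GPS – Car Navigation & Offline Maps', 'Navigation', '3582'],
    ['ImmobilienScout24: Real Estate Search in Germany', 'Navigation', '187'],
    ['Railway Route Search', 'Navigation', '5'],
]

averages = average_by_key(navigation_rows, key_index=1, value_index=2)
print('Navigation :', averages['Navigation'])

# Share of all Navigation ratings that belongs to the two largest apps.
total_ratings = sum(float(row[2]) for row in navigation_rows)
top_two_share = 100 * (345046 + 154911) / total_ratings
print('Top two apps, % of ratings:', round(top_two_share, 1))
```

The Navigation average matches the value in the output above (about 86,090), and almost all of it comes from Waze and Google Maps, so a high genre average can be driven by a couple of very large apps. With the full dataset, the equivalent call would be `average_by_key(ios_final, -5, 5)`, assuming the column positions used in the loop above.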

In [14]:
# Display the percentage frequency table for the 'Installs' column (index 5) of android_final
display_table(android_final, 5)

1,000,000+ : 15.407313731689323
10,000,000+ : 12.319527874380862
100,000+ : 10.770365686584467
10,000+ : 9.379281273053008
1,000+ : 7.587733164717041
5,000,000+ : 7.2926546527558225
100+ : 6.354726525450522
500,000+ : 5.227105069027295
50,000+ : 4.257561386869006
100,000,000+ : 4.099483612604068
5,000+ : 4.0889450943197385
10+ : 3.1510169670144377
500+ : 2.8980925281905363
50,000,000+ : 2.8348614184845613
50+ : 1.7599325534829804
500,000,000+ : 0.7587733164717041
5+ : 0.7271577616187164
1,000,000,000+ : 0.5796185056381074
1+ : 0.4531562862261566
0+ : 0.042154073137316894
0 : 0.010538518284329224


# Most Popular Apps by Genre on Google Play


Here we analyze the Android dataset. We start by generating a frequency table for the 'Category' column, which gives us the unique categories, and then iterate over each category. For each one, we initialize two variables: 'total', which accumulates the number of installs, and 'len_category', which counts the apps belonging to that category. Inside the loop, we go through every app in the dataset and read its category. If it matches the category being analyzed, we take the app's install count, remove the commas and plus signs so that the string can be converted to a number, add the result to 'total', and increment 'len_category' by 1. After all the apps have been processed, dividing 'total' by 'len_category' gives the average number of installs, which we use as a measure of each category's average popularity.

In [12]:
# Generate a frequency table for the 'Category' column
categories_android = freq_table(android_final, 1)

# Loop over each unique category in the dataset
for category in categories_android:
    total = 0  # Initialize a variable to keep track of the total number of installs
    len_category = 0  # Initialize a variable to count the number of apps in each category
    
    # Loop over each app in the dataset
    for app in android_final:
        category_app = app[1]  # Get the category of the current app
        
        # If the category of the app matches the current category being analyzed
        if category_app == category:            
            n_installs = app[5]  # Get the number of installs for the app
            
            # Clean the install numbers by removing commas and plus signs
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            
            total += float(n_installs)  # Add the number of installs to the total
            len_category += 1  # Increment the count of apps in the category
    
    avg_n_installs = total / len_category  # Calculate the average number of installs for the category
    print(category, ':', avg_n_installs)  # Print the category and the average number of installs


The code **computes the average number of installs** for each category in the Android dataset. It does this by **generating a frequency table** for the 'Category' column and then looping through each unique category. For each category, it sums the installs and counts the apps in that category, then divides the total by the count and prints the category along with its average number of installs. The code helps us understand which categories have the **highest number** of installs on average in the Android app market.
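
The cleaning step can be pulled out into a small helper so it is easy to check on its own. This is a minimal sketch: the helper name `parse_installs` and the `sample` rows are hypothetical, while the labels come from the 'Installs' frequency table. Because labels such as '1,000,000+' appear to be open-ended buckets, the resulting averages are best read as rough lower-bound estimates.

```python
def parse_installs(label):
    """Convert an install label such as '1,000,000+' to a float."""
    return float(label.replace(',', '').replace('+', ''))


# Labels taken from the 'Installs' frequency table.
for label in ['1,000,000,000+', '10,000+', '0+', '0']:
    print(label, '->', parse_installs(label))

# Hypothetical (category, installs) rows, for illustration only.
sample = [
    ('GAME', '1,000,000+'),
    ('GAME', '500,000+'),
    ('TOOLS', '10,000+'),
    ('TOOLS', '0'),
]

# Accumulate the cleaned install counts and app counts per category.
totals, counts = {}, {}
for category, installs in sample:
    totals[category] = totals.get(category, 0.0) + parse_installs(installs)
    counts[category] = counts.get(category, 0) + 1

sample_averages = {category: totals[category] / counts[category]
                   for category in totals}
print(sample_averages)
```

Keeping the conversion in one function also means any unexpected label raises a `ValueError` in a single, easy-to-find place.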