# **Education Apps: Market Trends, Monetization, and Growth Opportunities**


## Introduction

The education app market has experienced rapid growth in recent years, driven by increased mobile device adoption, digital learning trends, and demand for accessible educational content. This analysis leverages a dataset of over 2 million apps to uncover key trends, revenue strategies, and growth opportunities within the education category.

**Objectives of this analysis:**
1. Identify the distribution of education apps by type (free vs paid) and monetization strategy (ads, in-app purchases, freemium models).  
2. Explore trends in app downloads and user engagement to determine which strategies correlate with higher reach.  
3. Provide actionable insights for companies and app developers to optimize revenue, improve user acquisition, and prioritize app development focus areas.

**Dataset Overview:**
- Size: 2,000,000+ apps  
- Features: app category, pricing model, average installs, ratings, revenue indicators, and more  
- Scope: Analysis focuses specifically on apps within the *Education* category

**Key Value for Stakeholders:**  
By analyzing market patterns and monetization strategies, companies can make informed decisions about app development, marketing, and pricing, targeting segments with the highest growth and revenue potential.



---

### Packages & Setup

We’ll use these packages for data cleaning, analysis, and visualization.

In [None]:
# Import the required packages before running the rest of the notebook
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px

### Data Import

In [None]:
import os

def load_dataset(file_path):
    """Load CSV dataset with error handling."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"The file {file_path} was not found. Please check the path.")
    try:
        df = pd.read_csv(file_path)
        print(f"Dataset loaded successfully: {df.shape[0]} rows, {df.shape[1]} columns")
        return df
    except Exception as e:
        # Chain the original exception so the underlying cause stays visible in the traceback
        raise RuntimeError(f"Error while loading dataset: {e}") from e


In [None]:
file_path = r"C:\Users\A\Desktop\playstore_app_market_insights\dataset\Google-Playstore.csv"
df = load_dataset(file_path)

The dataset contains a comprehensive set of app features useful for revenue and user behavior analysis.

---

# **Data Cleaning & Transformation**

In [None]:
def clean_dataset(df):
    
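    # 1. Handle Missing Values
    # Copy after dropping rows so the assignments below do not trigger SettingWithCopyWarning
    df = df.dropna(subset=['App Name']).copy()
    # Fill missing ratings with the median rating of the app's category
    df['Rating'] = df.groupby('Category')['Rating'].transform(lambda x: x.fillna(x.median()))
    # Flag imputed release dates, then fall back to the last update date
    df['Released_missing'] = df['Released'].isna().astype(int)
    df['Released'] = df['Released'].fillna(df['Last Updated'])
    df['Developer Id'] = df['Developer Id'].fillna("N/A")
    # Flag imputed minimum installs, then fall back to the maximum installs
    df['min_inst_miss'] = df['Minimum Installs'].isna().astype(int)
    df['Minimum Installs'] = df['Minimum Installs'].fillna(df['Maximum Installs'])
    df['Currency'] = df['Currency'].fillna("N/A")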

    # 2. Drop Useless Columns
    df = df.drop([
        'Developer Website', 'Developer Email', 'Privacy Policy', 'Scraped Time', 
        'App Id', 'Installs', 'Rating Count', 'Minimum Android'
    ], axis=1, errors='ignore')

    # 3. Normalize Size
    df["Size"] = df["Size"].astype(str).str.replace(",", "").str.replace(" ", "")
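    def convert_size(value):
        """Convert a size string such as '12M' or '512k' to kilobytes; return NaN if it cannot be parsed."""
        try:
            val = str(value).strip()
            if val.lower() in {"varieswithdevice", "na", "n/a", ""}:
                return np.nan
            if val[-1].lower() == "m":
                return float(val[:-1]) * 1000  # megabytes to kilobytes
            elif val[-1].lower() == "k":
                return float(val[:-1])  # already in kilobytes
            else:
                return float(val)
        except (ValueError, IndexError):
            # Unparseable values (unexpected suffixes, stray text) become NaN
            return np.nan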
    df["size"] = df["Size"].apply(convert_size)
    df = df.drop(['Size'], axis=1, errors='ignore')

    # 4. Convert Boolean to Int
    df['Free'] = df['Free'].astype(int)
    df['Ad Supported'] = df['Ad Supported'].astype(int)
    df['In App Purchases'] = df['In App Purchases'].astype(int)
    df['Editors Choice'] = df['Editors Choice'].astype(int)

    # 5. Derived Columns
    df['avg_installs'] = ((df['Minimum Installs'] + df['Maximum Installs']) / 2).round(0)
    df['Released'] = pd.to_datetime(df['Released'], errors='coerce')
    df['released_year'] = df['Released'].dt.year

    # 6. Rename Columns (snake_case)
    df = df.rename(columns={
        "App Name": "app_name",
        "Category": "category",
        "Rating": "rating",
        "Free": "app_status",
        "Currency": "currency",
        "Developer Id": "developer_name",
        "Released": "released_date",
        "Last Updated": "last_update",
        "Content Rating": "content_target",
        "Ad Supported": "ads_flag",
        "In App Purchases": "in_app_purchases_flag",
        "Editors Choice": "play_store_recommend"
    })

    # Ensure consistency between Price and app_status
    df.loc[df['Price'] > 0, 'app_status'] = 0  # Paid
    df.loc[df['Price'] == 0, 'app_status'] = 1  # Free

    # 7. Remove Duplicates
    df = df.drop_duplicates(['app_name'], keep='first')

    print(f"Cleaning complete: {df.shape[0]} rows, {df.shape[1]} columns remain.")
    return df


In [None]:
df = clean_dataset(df)

## **Data Validation**

In [None]:
def validate_dataset(df):
    errors = []

    # Ratings between 0–5
    if not df['rating'].between(0, 5).all():
        errors.append("Invalid ratings detected outside range 0–5.")

    # Released Year check
    if df['released_year'].isna().any():
        errors.append("Missing release years detected.")
    invalid_years = df[~df['released_year'].between(2008, 2025)]
    if not invalid_years.empty:
        errors.append(f"{len(invalid_years)} apps have invalid release years.")

    # Boolean flags check
    for col in ['app_status', 'ads_flag', 'in_app_purchases_flag', 'play_store_recommend']:
        if not df[col].isin([0, 1]).all():
            errors.append(f"Invalid values in {col} (should be 0 or 1).")

    if errors:
        print("Validation Issues Found:")
        for e in errors:
            print("-", e)
    else:
        print("Dataset validation passed.")

In [None]:
validate_dataset(df)

All app categories now conform to the standard taxonomy, and no negative install counts were found.
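
The statement above concerns checks that `validate_dataset` does not perform itself. A minimal sketch of how they could be verified, assuming the column names produced by `clean_dataset` (`category`, `Minimum Installs`, `Maximum Installs`, `avg_installs`). The helper name `check_installs_and_categories` and the toy frame are illustrative only and are not part of the original notebook:

```python
import pandas as pd

def check_installs_and_categories(
    df, install_cols=("Minimum Installs", "Maximum Installs", "avg_installs")
):
    """Return a list of human-readable issues (an empty list means none were found)."""
    issues = []
    # Negative install counts are impossible and point to a parsing problem.
    for col in install_cols:
        if col in df.columns:
            n_negative = int((df[col] < 0).sum())
            if n_negative:
                issues.append(f"{n_negative} rows have negative values in '{col}'.")
    # Category labels should be present and free of stray whitespace.
    if "category" in df.columns:
        cats = df["category"]
        n_missing = int(cats.isna().sum())
        if n_missing:
            issues.append(f"{n_missing} rows have a missing category.")
        as_text = cats.dropna().astype(str)
        n_untrimmed = int((as_text != as_text.str.strip()).sum())
        if n_untrimmed:
            issues.append(f"{n_untrimmed} category labels have leading or trailing whitespace.")
    return issues

# Toy data (hypothetical values) with one problem of each kind
toy = pd.DataFrame({
    "category": ["Education", " Tools", None],
    "Minimum Installs": [10, -5, 100],
    "Maximum Installs": [20, 15, 250],
    "avg_installs": [15.0, 5.0, 175.0],
})
toy_issues = check_installs_and_categories(toy)
for issue in toy_issues:
    print("-", issue)
```

Calling `check_installs_and_categories(df)` on the cleaned dataset and getting an empty list back would support the claim; comparing `df['category'].unique()` against a reference list of Play Store categories would be a stricter taxonomy check.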

# **Exploratory Data Analysis (EDA)**

## Google Play Apps Overview

In this section, we perform an initial exploration of the Google Play dataset to understand the overall app market, including:
* Total number of apps and category distribution.
* Pricing and monetization strategies (Free vs Paid, Ads/IAP)
* Ratings and user engagement.
* Installs and popularity metrics.

These insights will help identify market trends and guide strategic recommendations before focusing on a deep dive into Education apps.


**General summary and counts**

In [None]:
# 1. General summary
def analyze_total_apps(df):
    total_apps = len(df)
    print(f"The Total number of apps: {total_apps}")
    return total_apps


# 2. Category analysis
def analyze_category_distribution(df):
    category_counts = df['category'].value_counts().head(10).reset_index()
    category_counts.columns = ['Category', 'Count']
    fig = px.bar(
        category_counts, 
        x='Category', 
        y='Count',
        title="Top 10 Categories by Number of Apps",
        text='Count'
    )
    fig.update_traces(textposition='outside')
    fig.show()

In [None]:
analyze_total_apps(df)

In [None]:
# Category distribution
analyze_category_distribution(df)

The dataset contains 2,177,943 apps, showing a very large and diverse market on Google Play.  

The most populated categories are Education, Music & Audio, Business, Tools, Entertainment, and Lifestyle, indicating that these segments dominate the app ecosystem.  

Education being the top category by app count shows that developers publish heavily in learning and skill-building, which likely reflects strong user demand. Music & Audio and Business show that entertainment and productivity remain major focuses as well.

**Pricing and Monetization**

In [None]:
# 3. Free vs Paid distribution

def plot_free_apps_financing(df):
    
    # Classify free apps
    free_apps = df[df['Price'] == 0].copy()
    free_apps['financing_type'] = 'Nothing'
    
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    free_apps.loc[(free_apps['ads_flag'] == 0) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # Count and percentage
    counts = free_apps['financing_type'].value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    
    # Plot pie chart
    fig = px.pie(
        names=counts.index,
        values=counts.values,
        title='Distribution of Free Apps by Financing Type',
        hole=0.3  # donut chart style
    )
    fig.show()


# Financing Trends
def plot_all_apps_trend(df):
    """
    Plots the trend of financing methods among all apps (Free and Paid) over the years.
    For paid apps, the financing type is considered 'Paid' since they are directly monetized.
    """
    
    apps = df.copy()

    # Ensure 'released_year' is numeric
    apps['released_year'] = pd.to_numeric(apps['released_year'], errors='coerce')

    # Classify financing type
    apps['financing_type'] = 'Nothing'
    
    # For Free apps
    free_mask = apps['Price'] == 0
    apps.loc[free_mask & (apps['ads_flag'] == 1) & (apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    apps.loc[free_mask & (apps['ads_flag'] == 1) & (apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    apps.loc[free_mask & (apps['ads_flag'] == 0) & (apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # For Paid apps
    apps.loc[apps['Price'] > 0, 'financing_type'] = 'Paid'

    # Group by year and financing type
    yearly_counts = apps.groupby(['released_year', 'financing_type']).size().reset_index(name='count')

    # Calculate percentage per year
    yearly_counts['percentage'] = yearly_counts.groupby('released_year')['count'].transform(lambda x: x / x.sum() * 100)

    # Plot trend
    fig = px.line(yearly_counts, x='released_year', y='percentage', color='financing_type',
                  markers=True,
                  title='Trend of Financing Methods Among All Apps Over Years',
                  labels={'percentage': 'Percentage of Apps', 'released_year': 'Year'})
    fig.show()


In [None]:
# Financing breakdown of free apps
plot_free_apps_financing(df)

Almost half of free apps (48.4%) fall under "Nothing," meaning they use neither ads nor in-app purchases (IAP). These apps may be entirely free with no direct revenue model, possibly relying on external funding or serving promotional purposes.

42.97% of free apps monetize through ads only, the most common strategy among monetized free apps.

Only 6.27% of free apps combine ads and IAP, so dual monetization is relatively rare, though potentially more effective for maximizing revenue.

2.33% of free apps rely solely on IAP, making this the least common approach.
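
The four groups above follow from two binary flags. A minimal sketch of that classification as one reusable helper built on `numpy.select`: the name `classify_financing` and the toy rows are illustrative assumptions, while the column names and labels mirror those used in this notebook.

```python
import numpy as np
import pandas as pd

def classify_financing(df):
    """Label each row by its ads/IAP flags; rows with neither flag get 'Nothing'."""
    has_ads = df["ads_flag"] == 1
    has_iap = df["in_app_purchases_flag"] == 1
    conditions = [has_ads & has_iap, has_ads & ~has_iap, ~has_ads & has_iap]
    labels = ["Ads + IAP", "Ads only", "IAP only"]
    # np.select picks the label of the first matching condition per row
    return pd.Series(np.select(conditions, labels, default="Nothing"), index=df.index)

# Toy data (hypothetical): one free app per flag combination
toy = pd.DataFrame({
    "Price": [0, 0, 0, 0],
    "ads_flag": [1, 1, 0, 0],
    "in_app_purchases_flag": [1, 0, 1, 0],
})
free = toy[toy["Price"] == 0].copy()
free["financing_type"] = classify_financing(free)

# Percentage share of each financing type among free apps
shares = free["financing_type"].value_counts(normalize=True).mul(100).round(2)
print(shares)
```

Keeping the rule in one function would avoid repeating the same three `.loc` assignments in every plotting function that needs the labels.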

In [None]:
plot_all_apps_trend(df)

**Ratings & user engagement**

In [None]:
# 4. Rating distribution
def unrated_app(df):
    counts = (df['rating'] == 0).sum()
    print("The number of unrated apps:", counts)

    return counts

def analyze_rating_app(df):
    rating_counts = df.loc[df['rating']!= 0, 'rating'].value_counts().sort_index()
    print("The frequency of the ratings by app" , rating_counts)
    return rating_counts
    
def analyze_rating_distribution(df):
    df_filtered = df[df['rating'] > 0 ]
    fig = px.histogram(df_filtered, x="rating", nbins=20, title="Ratings Distribution")
    fig.show()

In [None]:
analyze_rating_distribution(df)

**Installs & popularity metrics**

In [None]:
def get_category_installs(df):
    category_installs = df.groupby('category')['avg_installs'].mean().sort_values(ascending=False).head()
    category_installs = category_installs.round(0).reset_index()
    top3 = df.sort_values(by='avg_installs', ascending=False).head(3)
    return category_installs, top3

def plot_category_installs(category_installs, top3):
    fig1 = px.bar(category_installs, x='category', y='avg_installs',
            title='Top Categories by Average Installs',
            labels={'avg_installs': 'Average Installs', 'category': 'Category'},
            text='avg_installs')
    fig1.show()
    # Top 3 apps (already computed by get_category_installs and passed in as top3)
    fig2 = px.bar(top3, x='app_name', y='avg_installs', color='category',
                  title='Top 3 Apps by Installs',
                  text='avg_installs')
    fig2.show()



In [None]:
category_installs,top3 = get_category_installs(df)
plot_category_installs(category_installs, top3)

---

# Deep Dive: Education Apps

This section focuses on Education apps, with the goal of providing insights and recommendations directly relevant to Xpertbot's Education app strategy.

Objectives:
- Understand the Education app market, user engagement, and monetization trends.
- Identify high-performing apps and successful strategies.
- Provide actionable recommendations to improve Xpertbot's app downloads, ratings, and revenue.


**General summary & key metrics**

In [None]:
def education_count(df):
    edu_app =  df[df['category'].isin(['Educational','Education']) ]
    edu_count = edu_app.shape[0]
    avg_installs = edu_app['avg_installs'].mean().round(0)
    subset_nonzero = edu_app[edu_app['rating'] != 0]
    avg_rating_edu = subset_nonzero['rating'].mean().round(2)  # no need to groupby, only Education category
    
    print("The number of education apps is:", edu_count)
    print("\nThe average installs of education apps is:", avg_installs)
    print("\nThe average rating of the education apps is:", avg_rating_edu)

In [None]:
education_count(df)


*Education apps are numerous (~228k), reach a sizeable audience (about 33.8k installs per app on average), and are well rated (average rating 4.19 among rated apps), indicating both high demand and a generally positive user experience.*
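
Install counts are typically heavy-tailed, so a mean can be pulled upward by a handful of very large apps. A minimal sketch, on made-up numbers, of reporting the median and the concentration of installs next to the mean. The helper name `installs_summary` is hypothetical, and the figures below are not taken from the dataset:

```python
import pandas as pd

def installs_summary(installs):
    """Return mean, median, and the share of total installs held by the top 1% of apps."""
    s = pd.Series(installs, dtype="float64").dropna()
    top_n = max(1, int(len(s) * 0.01))  # at least one app in the "top 1%"
    top_share = s.nlargest(top_n).sum() / s.sum() * 100
    return {
        "mean": s.mean(),
        "median": s.median(),
        "top_1pct_share": round(top_share, 2),
    }

# Toy data (hypothetical): 99 small apps and one very large app
toy_installs = [100] * 99 + [1_000_000]
summary = installs_summary(toy_installs)
print(summary)
```

Applied to the Education subset, for example `installs_summary(edu_app['avg_installs'])`, this would show how representative the average install figure is of a typical app.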


**Pricing & Monetization**

In [None]:
def education_free_paid_stats(df):
    # Filter Education apps
    edu_app =  df[df['category'].isin(['Educational','Education']) ]
    
    # Count of Free and Paid apps
    free_paid_count = edu_app['app_status'].map({1: 'Free', 0: 'Paid'}).value_counts()
    
    # Percentage of Free and Paid apps
    free_paid_percentage = edu_app['app_status'].map({1: 'Free', 0: 'Paid'}).value_counts(normalize=True) * 100
    
    # Return values if you want to reuse them
    return free_paid_count, free_paid_percentage

def plot_education_free_paid(free_paid_count, free_paid_percentage):
    # Prepare data
    df_plot = free_paid_count.reset_index()
    df_plot.columns = ['Status', 'Count']
    df_plot['Percentage'] = free_paid_percentage.values.round(2)
    
    # Bar chart
    fig = px.bar(df_plot, x='Status', y='Count', text='Percentage',
                 title='Free vs Paid Education Apps',
                 labels={'Count':'Number of Apps', 'Status':'App Status'})
    
    fig.update_traces(texttemplate='%{text}%', textposition='outside')
    fig.show()

def avg_paid_education_price(df):
    # Filter Education apps
    education_app =  df[df['category'].isin(['Educational','Education']) ]
    
    # Filter only paid apps
    paid_education_apps = education_app[education_app['Price'] > 0]
    
    # Compute average price
    avg_price = paid_education_apps['Price'].mean().round(3)
    
    # Print result
    print(f'The average price of paid Education apps is ${avg_price}')
    
    # Return value for reuse
    return avg_price

In [None]:
# Compute stats first
free_paid_count, free_paid_percentage = education_free_paid_stats(df)
plot_education_free_paid(free_paid_count, free_paid_percentage)

In [None]:
avg_price = avg_paid_education_price(df)

*Paid Education apps are moderately priced on average (~$5.72), suggesting a low-cost barrier that aligns with accessibility and mass adoption strategies.*


**Installs & Revenue Metrics**

In [None]:
# Define function
def revenue_summary(df):
    """Estimate revenue (price x average installs) for paid education apps vs. all paid apps."""
    # Paid apps only
    paid_apps = df[(df['Price'] > 0) & (df['avg_installs'] > 0)].copy()
    paid_apps['revenue'] = paid_apps['Price'] * paid_apps['avg_installs']
    
    # Education revenue
    edu_revenue = paid_apps[paid_apps['category'].isin(['Educational','Education'])]['revenue'].sum()
    total_revenue = paid_apps['revenue'].sum()
    edu_share = (edu_revenue / total_revenue) * 100
    
    print("Total Estimated Revenue Across All Paid Apps: ${:,.0f}".format(total_revenue))
    print("Education Apps Revenue: ${:,.0f}".format(edu_revenue))
    print("Education Share of Total Paid Revenue: {:.2f}%".format(edu_share))

    # Paid education apps, filtered with the same category rule used for edu_revenue
    # (the 'revenue' column is already present on paid_apps)
    edu_paid = paid_apps[paid_apps['category'].isin(['Educational','Education'])].copy()
    
    return edu_revenue, total_revenue, edu_share, edu_paid



In [None]:

# Run function first to get edu_paid
edu_revenue, total_revenue, edu_share, edu_paid = revenue_summary(df)

# Format revenue with commas and round
top5_revenue_apps = edu_paid.sort_values(by="revenue", ascending=False).head(5).copy()
top5_revenue_apps['revenue'] = top5_revenue_apps['revenue'].apply(lambda x: f"${x:,.0f}")
top5_revenue_apps['avg_installs'] = top5_revenue_apps['avg_installs'].apply(lambda x: f"{x:,.0f}")

# Show as table
from IPython.display import display
display(top5_revenue_apps[['app_name', 'Price', 'avg_installs', 'revenue']])


**Developers Analysis**

In [None]:

def top_education_developers(df, top_n=10):
    # Filter Education apps (same category rule as the rest of the notebook)
    education_app = df[df['category'].isin(['Educational','Education'])]
    
    # Count apps per developer
    dev_by_app = education_app['developer_name'].value_counts().head(top_n)
    
    return dev_by_app

def plot_top_education_developers(dev_by_app):
    df_plot = dev_by_app.reset_index()
    df_plot.columns = ['Developer', 'Number of Apps']
    
    fig = px.bar(df_plot, x='Number of Apps', y='Developer', orientation='h',
                 title='Top 10 Education App Developers',
                 text='Number of Apps')
    
    fig.update_layout(yaxis={'categoryorder':'total ascending'})  # largest on top
    fig.show()


In [None]:
# Compute top developers
dev_by_app = top_education_developers(df)

# Plot top developers
plot_top_education_developers(dev_by_app)

**Financing models analysis**

In [None]:
def iap_stats(df):
    # Overall apps with/without IAP
    app_with_iap = df['in_app_purchases_flag'].value_counts()
    p_app_with_iap = df['in_app_purchases_flag'].value_counts(normalize=True) * 100
    
    # Education apps
    education_app =  df[df['category'].isin(['Educational','Education']) ]
    edu_with_iap = education_app['in_app_purchases_flag'].value_counts()
    edu_with_iap_percentage = education_app['in_app_purchases_flag'].value_counts(normalize=True) * 100
    
    # Education apps: IAP vs recommendation
    edu_iap_recommend = education_app.groupby('play_store_recommend')['in_app_purchases_flag'].sum()
    
    # Return values for reuse (optional)
    return app_with_iap, p_app_with_iap, edu_with_iap, edu_with_iap_percentage, edu_iap_recommend

def plot_iap_stats(edu_with_iap, edu_iap_recommend):
    # Education apps: with vs. without IAP (edu_iap_recommend is accepted but not plotted)
    df_plot1 = edu_with_iap.reset_index()
    df_plot1.columns = ['IAP', 'Count']
    df_plot1['IAP'] = df_plot1['IAP'].map({0:'No IAP', 1:'Has IAP'})
    
    fig1 = px.pie(df_plot1, names='IAP', values='Count',
                  title='Education Apps: With vs Without IAP')
    fig1.show()

def ads_distribution_education(df):
    
    edu_apps =  df[df['category'].isin(['Educational','Education']) ]
    
    # Count ads vs no ads
    ads_counts = edu_apps['ads_flag'].map({1: "With Ads", 0: "No Ads"}).value_counts().reset_index()
    ads_counts.columns = ["Ads Status", "Count"]
    
    # Percentage
    ads_counts["Percentage"] = (ads_counts["Count"] / ads_counts["Count"].sum()) * 100
    
    # Plot interactive pie chart
    fig = px.pie(
        ads_counts,
        names="Ads Status",
        values="Count",
        title="Ads Distribution in Education Apps",
        hole=0.3
    )
    fig.show()


def plot_free_apps_financing(df):
    #filter education apps
    education_app = df[df['category'].isin(['Education','Educational'])]
    
    # Classify free apps
    free_apps = education_app[education_app['Price'] == 0].copy()
    free_apps['financing_type'] = 'Nothing'
    
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    free_apps.loc[(free_apps['ads_flag'] == 0) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # Count and percentage
    counts = free_apps['financing_type'].value_counts()
    percent = (counts / counts.sum() * 100).round(2)

    # Paid apps are those with a price above zero (not only those priced at exactly 1)
    paid_apps = education_app[education_app['Price'] > 0].copy()
    paid_apps['financing_type'] = 'Paid'

    paid_apps.loc[(paid_apps['ads_flag'] == 1) & (paid_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    paid_apps.loc[(paid_apps['ads_flag'] == 1) & (paid_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    paid_apps.loc[(paid_apps['ads_flag'] == 0) & (paid_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # Count and percentage
    paid_counts = paid_apps['financing_type'].value_counts()
    paid_percent = (paid_counts / paid_counts.sum() * 100).round(2)

    print("Distribution of paid Education apps by financing type (%):")
    print(paid_percent)
    
    # Plot pie chart
    fig = px.pie(
        names=counts.index,
        values=counts.values,
        title='Distribution of Free Education Apps by Financing Type',
        hole=0.3  # donut chart style
    )
    fig.show()

In [None]:
# Compute stats
app_with_iap, p_app_with_iap, edu_with_iap, edu_with_iap_percentage, edu_iap_recommend = iap_stats(df)

# Plot
plot_iap_stats(edu_with_iap, edu_iap_recommend)

Only 7.6% of Education apps use in-app purchases (IAP). The vast majority (92.4%) rely on other revenue models or none at all.

In [None]:
ads_distribution_education(df)

Ads are a common but not dominant monetization strategy. Too many ads could hurt user experience.

In [None]:
plot_free_apps_financing(df)

The free segment is split: a majority simply offer free access (possibly funded externally), while others lean towards ads. Very few free apps use IAP, which suggests that monetizing educational content directly is less common.

Paid Education apps rarely include ads, which suggests that paying users expect an ad-free premium experience. Mixing ads into paid apps is potentially risky.

### **Recommendations**

The dominant and most accepted model in Education apps is Free (with optional ads or IAP).

Launching as a free app will align with user expectations and maximize reach.

Ads can be used moderately, but IAP (premium features, certificates, or advanced content) could be Xpertbot’s main monetization path, since apps combining Free + Ads + IAP tend to capture both installs and revenue streams (as we’ll confirm in the final analysis).


---

# **Final Comparative Analysis**

To better understand how Education apps position themselves in the wider Play Store ecosystem, we compare different financing strategies, adoption levels, and user ratings. This section evaluates the performance of free vs. paid models, the effectiveness of ads and in-app purchases, and the strategies used by top-performing apps. The goal is to highlight which approaches drive the highest installs and ratings, and to draw practical lessons for Xpertbot’s own education app.

**Education apps & financing strategies**

In [None]:
def top10_educational_apps(df):
    """Show top 10 education apps by installs and their financing strategy"""
    
    # Filter education apps
    edu_apps = df[df['category'].isin(['Educational','Education']) ]
    
    # Sort by installs
    top10_edu = edu_apps.sort_values(by="avg_installs", ascending=False).head(10)
    
    # Define financing strategy
    def financing(row):
        if row['app_status'] == 0:  # Paid app
            return "Paid App"
        elif row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Ads Only"
        elif row['in_app_purchases_flag'] == 1:
            return "IAP Only"
        else:
            return "Free (No Revenue)"
    
    # Apply strategy
    top10_edu["financing_strategy"] = top10_edu.apply(financing, axis=1)
    
    # Select relevant columns
    top10_edu = top10_edu[["app_name", "avg_installs", "financing_strategy", "category","rating"]]
    
    return top10_edu

def top_paid_edu_apps(df, n=10):
    # Filter only education + paid apps
    paid_edu = df[(df['category'].isin(['Education','Educational'])) & (df['app_status'] == 0)].copy()  # 0 = Paid
    
    # Define financing strategy
    def financing(row):
        if row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Paid + Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Paid + Ads"
        elif row['in_app_purchases_flag'] == 1:
            return "Paid + IAP"
        else:
            return "Paid only"
    
    paid_edu['financing_strategy'] = paid_edu.apply(financing, axis=1)
    
    # Sort by installs and select top N
    top_paid_edu = paid_edu.sort_values(by="avg_installs", ascending=False).head(n)
    
    return top_paid_edu[['app_name', 'avg_installs', 'Price', 'financing_strategy']]


In [None]:
top10_educational_apps(df)

Top-performing apps use a hybrid monetization strategy, not just ads or just IAP.

In [None]:
top_paid_edu_apps(df)

Paid apps can still succeed, but the market ceiling is much lower than free apps. Paid is more niche (parents buying games for kids, test prep apps, etc.).

In [None]:
def free_vs_paid_performance(df):
    """Compare installs, ratings, and financing strategies between free and paid education apps."""
    edu_apps = df[df['category'].isin(['Educational','Education'])].copy()
    
    # Exclude invalid ratings
    edu_apps = edu_apps[edu_apps['rating'] > 0]
    
    # Split datasets
    free_apps = edu_apps[edu_apps['app_status'] == 1]   # Free
    paid_apps = edu_apps[edu_apps['app_status'] == 0]   # Paid
    
    # --- Summary stats ---
    summary = pd.DataFrame({
        "Avg Installs": [free_apps['avg_installs'].mean(), paid_apps['avg_installs'].mean()],
        "Median Installs": [free_apps['avg_installs'].median(), paid_apps['avg_installs'].median()],
        "Avg Rating": [free_apps['rating'].mean(), paid_apps['rating'].mean()],
        "Median Rating": [free_apps['rating'].median(), paid_apps['rating'].median()],
        "App Count": [len(free_apps), len(paid_apps)]
    }, index=["Free", "Paid"])
    
    
    # --- Financing strategies ---
    def financing_breakdown(subset):
        no_financing = ((subset['ads_flag'] == 0) & (subset['in_app_purchases_flag'] == 0)).sum()
        ads_only = ((subset['ads_flag'] == 1) & (subset['in_app_purchases_flag'] == 0)).sum()
        iap_only = ((subset['ads_flag'] == 0) & (subset['in_app_purchases_flag'] == 1)).sum()
        both = ((subset['ads_flag'] == 1) & (subset['in_app_purchases_flag'] == 1)).sum()
        return pd.Series({
            "No Financing": no_financing,
            "Ads only": ads_only,
            "In-App Purchases only": iap_only,
            "Ads & IAP": both
        })
    
    financing = pd.DataFrame({
        "Free": financing_breakdown(free_apps),
        "Paid": financing_breakdown(paid_apps)
    }).T
    
    # --- Visualization: installs ---
    fig1 = px.box(
        edu_apps,
        x="app_status",
        y="avg_installs",
        title="Distribution of Installs: Free vs Paid Education Apps",
        labels={"app_status":"App Type (1=Free, 0=Paid)", "avg_installs":"Average Installs"},
        log_y=True
    )
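    fig1.show()

    # Return both tables so the notebook displays the figures quoted in the text
    return summary, financing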

In [None]:
free_vs_paid_performance(df)

Ratings are about the same for free and paid apps (both around 4.2).

Users overwhelmingly prefer free apps. Paid apps are not better rated, so they show no quality edge in this data; charging upfront mainly limits adoption.

**Financing strategy effectiveness**

In [None]:
def financing_strategy_effectiveness(df):
    edu_apps = df[df['category'].isin(['Educational','Education'])].copy()
    edu_apps = edu_apps[edu_apps['rating'] > 0]  # drop invalid ratings
    
    # Define monetization type
    def get_strategy(row):
        if row['app_status'] == 0:  # Paid
            return "Paid"
        elif row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Free + Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Free + Ads"
        elif row['in_app_purchases_flag'] == 1:
            return "Free + IAP"
        else:
            return "Free Only"
    
    edu_apps['monetization'] = edu_apps.apply(get_strategy, axis=1)
    
    # Summary stats
    summary = edu_apps.groupby('monetization').agg(
        Avg_Installs=('avg_installs', 'mean'),
        Median_Installs=('avg_installs', 'median'),
        Avg_Rating=('rating', 'mean'),
        Median_Rating=('rating', 'median'),
        App_Count=('app_name', 'count')
    ).sort_values(by='Avg_Installs', ascending=False)
    
    # Visualization: Installs
    fig1 = px.bar(
        summary.reset_index(),
        x="monetization", y="Avg_Installs",
        color="monetization",
        title="Average Installs by Financing Strategy (Education Apps)",
        log_y=True,
        labels={"Avg_Installs": "Average Installs (log scale)"}
    )
    fig1.show()
    
    # Visualization: Ratings
    fig2 = px.bar(
        summary.reset_index(),
        x="monetization", y="Avg_Rating",
        color="monetization",
        title="Average Ratings by Financing Strategy (Education Apps)",
        labels={"Avg_Rating": "Average Rating (0–5)"}
    )
    fig2.show()

    return summary


In [None]:
# Run
financing_summary = financing_strategy_effectiveness(df)

The most effective strategies are Free + IAP and Free + Ads + IAP. Ads-only apps underperform, while pure Free or Paid models miss out on monetization potential.

---

### **Recommendation for Xpertbot**

**Go Free at Launch:** To gain traction, Xpertbot should launch its app for free.

**Adopt a Freemium Model (Free + IAP, optionally Ads):**

Offer core features for free, but lock **advanced features, certifications, or premium content** behind in-app purchases.

Ads can be included in the free version but must be limited to avoid hurting ratings.

**Avoid Paid-only strategy:** It drastically reduces adoption with no rating advantage.

**Position Against Competitors:** Apps like Duolingo and Photomath suggest that Free + Ads + IAP can be scalable, sustainable, and well accepted by users.

---



***Best Strategy for Xpertbot:***

Adopt a Free + IAP (with optional ads) monetization model. Focus on strong user experience to secure high ratings, while gradually monetizing through advanced features or premium tiers.

----

### Limitations & Next Steps

This analysis offers useful insights into education apps on the Play Store but has some limitations. The dataset is a snapshot in time and may not reflect the newest apps or removals. Some fields, such as installs, were reported in ranges and averaged (the midpoint of the minimum and maximum values), which can distort results, especially for very large apps. Missing values required imputation, and revenue was estimated as price multiplied by average installs for paid apps only, not taken from actual earnings; income from ads and in-app purchases is not captured. Results should therefore be seen as indicative rather than exact.
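
To make the revenue estimate concrete: the notebook takes the midpoint of the minimum and maximum install figures and, for paid apps, multiplies it by the listed price. A small worked sketch with made-up values shows how wide the implied range can be. The function name `estimate_revenue` and the numbers are hypothetical:

```python
def estimate_revenue(price, min_installs, max_installs):
    """Return low, midpoint, and high revenue proxies (price x installs)."""
    mid_installs = (min_installs + max_installs) / 2
    return {
        "low": price * min_installs,    # every install at the lower bound
        "mid": price * mid_installs,    # the midpoint, as used for avg_installs
        "high": price * max_installs,   # every install at the upper bound
    }

# Hypothetical paid app: $5.00, between 10,000 and 50,000 installs
est = estimate_revenue(price=5.0, min_installs=10_000, max_installs=50_000)
for bound, value in est.items():
    print(f"{bound:>4}: ${value:,.0f}")
```

The proxy also assumes every install was a purchase at the current list price, so even the low bound may overstate actual earnings.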

For next steps, the analysis could be deepened by segmenting education apps into subcategories (e.g., language learning, test prep, kids’ games) and tracking trends over time. Benchmarking top competitors would reveal best practices, while analyzing user reviews could highlight needs and pain points. Building an interactive dashboard would give Xpertbot decision-makers a dynamic view of the market, and once the app is live, A/B testing different monetization models would confirm which strategies work best in practice.
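
As an illustration of the A/B testing step, one common way to compare two monetization variants is a two-proportion z-test on conversion rates. The sketch below uses only the standard library; the function name, the variant labels, and all counts are hypothetical and are not results from this analysis:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via the error function
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: variant A (ads only) vs. variant B (ads + IAP offer),
# counting users who made any purchase out of users exposed to each variant
z, p = two_proportion_z_test(conv_a=120, n_a=4000, conv_b=165, n_b=4000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

In practice the sample size and the success metric (purchase rate, retention, rating) would need to be fixed before the test starts, and ratings should be monitored alongside revenue.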



---

In summary, the education app market is both promising and competitive. Free apps dominate adoption, while hybrid models (Free + IAP + Ads) drive the strongest performance. For Xpertbot, success will depend on offering a high-quality free app with thoughtful monetization through in-app purchases and, where appropriate, ads. Looking forward, continuous monitoring and A/B testing will help refine this strategy, ensuring sustainable growth and user satisfaction in the evolving education market.