In [79]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display

In [142]:
apps = pd.read_csv("/content/cleaned_googleplaystore.csv")

In [143]:
apps

Unnamed: 0,App,Category,Rating,Reviews,Size in MBs,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_category
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.10000,159,19.0,10000,Free,0.0,Everyone,Art & Design,2018-01-07,1.0.0,4.0.3 and up,Moderate
1,Coloring book moana,ART_AND_DESIGN,3.90000,967,14.0,500000,Free,0.0,Everyone,Art & Design;Pretend Play,2018-01-15,2.0.0,4.0.3 and up,High
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.70000,87510,8.7,5000000,Free,0.0,Everyone,Art & Design,2018-08-01,1.2.4,4.0.3 and up,Very High
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.50000,215644,25.0,50000000,Free,0.0,Teen,Art & Design,2018-06-08,Varies with device,4.2 and up,Top Notch
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.30000,967,2.8,100000,Free,0.0,Everyone,Art & Design;Creativity,2018-06-20,1.1,4.4 and up,More than moderate
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10341,Sya9a Maroc - FR,FAMILY,4.50000,38,53.0,5000,Free,0.0,Everyone,Education,2017-07-25,1.48,4.1 and up,Moderate
10342,Fr. Mike Schmitz Audio Teachings,FAMILY,5.00000,4,3.6,100,Free,0.0,Everyone,Education,2018-07-06,1.0,4.1 and up,Low
10343,Parkinson Exercices FR,MEDICAL,4.17004,3,9.5,1000,Free,0.0,Everyone,Medical,2017-01-20,1.0,2.2 and up,Low
10344,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.50000,114,,1000,Free,0.0,Mature 17+,Books & Reference,2015-01-19,Varies with device,Varies with device,Low


In [144]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10346 entries, 0 to 10345
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   App                10346 non-null  object 
 1   Category           10346 non-null  object 
 2   Rating             10346 non-null  float64
 3   Reviews            10346 non-null  int64  
 4   Size in MBs        8821 non-null   float64
 5   Installs           10346 non-null  int64  
 6   Type               10346 non-null  object 
 7   Price              10346 non-null  float64
 8   Content Rating     10346 non-null  object 
 9   Genres             10346 non-null  object 
 10  Last Updated       10346 non-null  object 
 11  Current Ver        10346 non-null  object 
 12  Android Ver        10346 non-null  object 
 13  Installs_category  10346 non-null  object 
dtypes: float64(3), int64(2), object(9)
memory usage: 1.1+ MB


In [105]:
# Number of Apps per Category
num_categories = apps['Category'].nunique()
print(f"Number of categories = {num_categories}")

cat_counts = apps['Category'].value_counts().reset_index()
cat_counts.columns = ['Category', 'Count']


fig = px.bar(cat_counts,
             x='Category',
             y='Count',
             title='Number of Apps per Category')

fig.update_layout(
    xaxis_title='Category',
    yaxis_title='Number of Apps',
    xaxis_tickangle=-45,
    bargap=0.1,
    template="simple_white",
    width=1000,
    height=600
)

fig.show()

Number of categories = 33


The Google Play Store is dominated by Family and Game apps, which together make up a significant portion of all available applications. Beyond entertainment, practical categories like Tools, Business, and Medical also show strong presence, reflecting demand for utility-driven apps.

In contrast, categories like Comics, Beauty, and Parenting have a much smaller footprint. This distribution highlights the market saturation in some areas and suggests opportunities for growth in underrepresented categories.
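
The phrase "a significant portion" can be made concrete by computing each category's share of all apps. The sketch below is a minimal illustration on a made-up stand-in for `apps` (the `toy_apps` frame and its rows are assumptions, not values from the dataset); the same two lines should apply to the real `Category` column.

```python
import pandas as pd

# Toy stand-in for the `apps` DataFrame; rows are illustrative, not real data.
toy_apps = pd.DataFrame({
    "Category": ["FAMILY", "FAMILY", "FAMILY", "GAME", "GAME",
                 "TOOLS", "COMICS", "BEAUTY"]
})

# Fraction of all apps that falls in each category, largest first.
category_share = toy_apps["Category"].value_counts(normalize=True)

# Combined share of the two categories called out in the text.
family_game_share = category_share.loc[["FAMILY", "GAME"]].sum()

print(category_share.round(3))
print(f"FAMILY + GAME share: {family_game_share:.1%}")
```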

In [100]:
# Distribution of App Ratings
hist_data = apps["Rating"].dropna()
mean_rating = hist_data.mean()
counts, bins = np.histogram(hist_data, bins=60)
max_y = counts.max()

fig = px.histogram(hist_data, x="Rating", nbins=60, title="Distribution of App Ratings")

fig.add_shape(
    type="line",
    x0=mean_rating, x1=mean_rating,
    y0=0, y1=max_y * 1.5,
    line=dict(color="red", width=3, dash="dash"),
    xref="x", yref="y"
)

fig.add_annotation(
    x=mean_rating,
    y=max_y * 1.53,
    text=f"Mean: {mean_rating:.2f}",
    showarrow=False,
    font=dict(color="red", size=12),
    xanchor="left"
)

fig.update_layout(
    bargap=0.05,
    xaxis_title='Rating',
    yaxis_title='Number of Apps',
    width=800,
    height=800,
    template="simple_white"
)

fig.show()

The distribution of app ratings is strongly left-skewed: most apps cluster between 4.0 and 4.5, and a thin tail extends toward the low ratings. The mean rating is 4.2, marked by the red dashed line. Most apps on the Play Store are therefore well received, and poorly rated apps are rare exceptions.
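
The skew can also be checked numerically rather than only read off the histogram. This is a minimal sketch on invented ratings (`toy_ratings` is an assumption for illustration); a negative value from `Series.skew()` and a mean below the median are both consistent with a left-skewed distribution.

```python
import pandas as pd

# Invented ratings that mimic the shape described above:
# a cluster around 4.0-4.6 plus a couple of low outliers.
toy_ratings = pd.Series(
    [4.5, 4.4, 4.3, 4.3, 4.2, 4.1, 4.0, 4.6, 3.0, 1.5], name="Rating"
)

skewness = toy_ratings.skew()      # sample skewness; negative means left skew
toy_mean = toy_ratings.mean()
toy_median = toy_ratings.median()

print(f"skewness={skewness:.2f}, mean={toy_mean:.2f}, median={toy_median:.2f}")
```

On the real data, `apps["Rating"].skew()` would be the equivalent call.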

In [116]:
# App Size vs. Rating with Marginal Histograms
size_rating = apps.dropna(subset=['Size in MBs', 'Rating'])

fig = px.scatter(
    size_rating,
    x='Size in MBs',
    y='Rating',
    hover_data=['App', 'Category'],
    marginal_x='histogram',
    marginal_y='histogram',
    template="simple_white",
    title='App Size vs. Rating with Histograms'
)

fig.update_layout(height=650, width=650)
fig.show()

# Price vs. Rating for Paid Apps with Marginal Histograms
paid_apps = apps.dropna(subset=['Price', 'Rating'])
paid_apps = paid_apps[paid_apps['Type'] == 'Paid']

fig = px.scatter(
    paid_apps,
    x='Price',
    y='Rating',
    hover_data=['App', 'Category'],
    marginal_x='histogram',
    marginal_y='histogram',
    template="simple_white",
    title='Price vs Rating for Paid Apps with Histograms'
)

fig.update_layout(height=650, width=650)
fig.show()


*1st plot.*

**Observations:**
* Most apps are small, with a heavy concentration under 50 MB.
* High ratings (4.0 - 5.0) dominate across all app sizes.
* Lower-rated apps (below 3.0) are sparse but occur across all size ranges.

**Conclusions:**
* App size does not strongly affect rating: high-rated apps exist at every size.
* Smaller apps are more common, but their rating distribution is similar to that of larger apps.
* Low-rated apps appear in all size ranges, so size alone does not determine user satisfaction.

*2nd plot.*

**Observations:**
* Most paid apps are cheap (roughly 0-10 dollars).
* Few apps cost more than 50 dollars, and those show a wide spread of ratings.
* High ratings are concentrated among lower-priced apps, though some high-priced apps also hold good ratings.
* A few expensive apps have low ratings.

**Conclusions:**
* Most paid apps are low-cost or moderately priced.
* High-priced apps (above 100 dollars) exist but are rare and show mixed ratings.
* There is no strong correlation between price and rating: both low- and high-priced apps can have high or low ratings.
* Cheaper apps account for most of the rated paid apps, likely because they are more accessible.
* Some very expensive apps get poor ratings, which may reflect high user expectations not being met.

So, most apps are small and cheap, but users still rate them highly.
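
The "no strong correlation" reading from the scatter plots can be backed with a correlation coefficient. The sketch below assumes a small hand-made frame (`toy`) in place of `apps`. It computes the Pearson coefficient and a rank-based (Spearman-style) coefficient obtained by correlating ranks, which is less sensitive to the extreme sizes and prices seen in the plots.

```python
import pandas as pd

# Small hand-made sample standing in for `apps`; one size is missing on purpose.
toy = pd.DataFrame({
    "Size in MBs": [2.8, 8.7, 14.0, 19.0, 25.0, 53.0, None],
    "Rating":      [4.3, 4.7, 3.9, 4.1, 4.5, 4.5, 4.0],
})

# Keep only rows where both values are present, as in the scatter plot cell.
pairs = toy.dropna(subset=["Size in MBs", "Rating"])

# Pearson correlation (linear association).
pearson_r = pairs["Size in MBs"].corr(pairs["Rating"])

# Rank correlation: Pearson correlation of the ranks (Spearman's rho).
spearman_r = pairs["Size in MBs"].rank().corr(pairs["Rating"].rank())

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_r:.3f}")
```

The same pattern would apply to `Price` vs. `Rating` on the paid subset; values near zero would support the conclusion above.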


In [120]:
# Distribution of app prices across popular categories
popular_cats = ['GAME', 'FAMILY', 'PHOTOGRAPHY', 'MEDICAL', 'TOOLS', 'FINANCE', 'LIFESTYLE','BUSINESS']
filtered_apps = apps[apps['Category'].isin(popular_cats)]

fig = px.strip(filtered_apps, x='Price', y='Category',
               hover_data=['App'],
               title='App Pricing Trend Across Categories')
fig.show()

In [121]:
# Filter for apps over $200
expensive_apps = filtered_apps[filtered_apps['Price'] > 200]

# Select and display relevant columns
expensive_apps_table = expensive_apps[['Category', 'App', 'Price']].sort_values(by='Price', ascending=False)
display(expensive_apps_table)

Unnamed: 0,Category,App,Price
4367,LIFESTYLE,I'm Rich - Trump Edition,400.0
4197,FAMILY,most expensive app (H),399.99
4362,LIFESTYLE,💎 I'm rich,399.99
5351,LIFESTYLE,I am rich,399.99
5354,FAMILY,I am Rich Plus,399.99
5356,FINANCE,I Am Rich Premium,399.99
5358,FINANCE,I am Rich!,399.99
5362,FAMILY,I Am Rich Pro,399.99
5359,FINANCE,I am rich(premium),399.99
9934,LIFESTYLE,I'm Rich/Eu sou Rico/أنا غني/我很有錢,399.99


*1st plot:*

Most apps cluster at low prices (0-10 dollars) across all categories.
A few high-priced outliers (around 400 dollars) exist, concentrated in Finance, Lifestyle, and Family.
There are almost no high-priced apps in Game, Tools, or Photography, indicating that these categories tend to have more affordable apps.


*Table:*

The table lists the apps priced above 200 dollars; all of them cost 399.99 or 400 dollars.
All of these apps belong to the LIFESTYLE, FINANCE, and FAMILY categories.
Many of these apps have names like "I'm Rich" or "I Am Rich Pro", suggesting that they are novelty or status-symbol apps rather than functional applications.

In [49]:
# Number of downloads of Paid vs. Free apps
fig = px.box(apps,
             x='Type',
             y='Installs',
             color='Type',
             log_y=True,
             title="Number of downloads of Paid vs. Free apps")

fig.show(renderer="colab")

**Free Apps Have More Downloads:**
The median number of downloads for free apps is around 100k, significantly higher than paid apps (~1,000 downloads).
The upper quartile (Q3) for free apps is around 5M downloads, whereas for paid apps, it's only around 10k.

**Free Apps Have Extreme Outliers:**
Some free apps have reached over 1 billion downloads.
Outliers extend well beyond the upper fence (10M), meaning a subset of free apps is downloaded at a much larger scale than the rest.

**Paid Apps Have Lower and More Consistent Downloads:**
Most paid apps have install counts below 10,000.
A few outliers reach 10M installs, but these are rare.
The majority of paid apps fall within 100 - 10k installs, suggesting that people are far less likely to download paid apps compared to free ones.
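
The medians and quartiles quoted above are read from the box plot; they can also be computed directly. The following is a sketch on invented install counts (`toy_apps` is an assumption), using `groupby(...).quantile(...)` to get Q1, the median, and Q3 per app type.

```python
import pandas as pd

# Invented install counts for illustration; not taken from the dataset.
toy_apps = pd.DataFrame({
    "Type": ["Free"] * 5 + ["Paid"] * 5,
    "Installs": [1_000, 100_000, 1_000_000, 5_000_000, 1_000_000_000,
                 100, 1_000, 1_000, 10_000, 10_000_000],
})

# Q1, median, and Q3 of installs for each app type (one row per type).
install_quartiles = (
    toy_apps.groupby("Type")["Installs"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

print(install_quartiles)
```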

In [117]:
# Plot top 5 Free apps with highest number of reviews
top_free = apps[apps['Type'] == 'Free'].sort_values(by='Reviews', ascending=False).head(5)

fig = px.bar(top_free, x='App', y='Reviews',
             title='Top 5 Free Apps by Reviews',
             hover_data=['Installs', 'Category'])
fig.show()

In [51]:
apps[apps['Type'] == 'Free'].sort_values(by='Reviews', ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size in MBs,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_category
2544,Facebook,SOCIAL,4.1,78158306,,1000000000,Free,0.0,Teen,Social,2018-08-03,Varies with device,Varies with device,Top Notch
3943,Facebook,SOCIAL,4.1,78128208,,1000000000,Free,0.0,Teen,Social,2018-08-03,Varies with device,Varies with device,Top Notch
336,WhatsApp Messenger,COMMUNICATION,4.4,69119316,,1000000000,Free,0.0,Everyone,Communication,2018-08-03,Varies with device,Varies with device,Top Notch
3904,WhatsApp Messenger,COMMUNICATION,4.4,69109672,,1000000000,Free,0.0,Everyone,Communication,2018-08-03,Varies with device,Varies with device,Top Notch
2604,Instagram,SOCIAL,4.5,66577446,,1000000000,Free,0.0,Teen,Social,2018-07-31,Varies with device,Varies with device,Top Notch


**Observations:**
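The chart shows that Facebook has the highest number of reviews (~78 million), followed closely by WhatsApp Messenger and Instagram, each exceeding 65 million reviews. Facebook and WhatsApp Messenger each appear twice in the table because the dataset contains duplicate rows with slightly different review counts, so the top five rows cover only three distinct apps. All three apps are listed with 1 billion installs, indicating an exceptionally high level of user engagement and global reach.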

**Conclusions:**
These apps are all owned by Meta, revealing a strong dominance of a single company in the free mobile app market. The combination of massive install numbers and high review counts suggests that these apps are not only widely downloaded but are also actively used and discussed by their user base.
This emphasizes a pattern in the Play Store: apps that serve core communication and social functions tend to attract both scale and interaction, reinforcing their position at the top of the ecosystem.
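
Since the table above lists Facebook and WhatsApp Messenger twice, one way to get five distinct apps is to drop duplicate app names after sorting. This is a minimal sketch, not the notebook's method: the review counts are copied from the table, while the `"Other App"` row is a hypothetical filler.

```python
import pandas as pd

# Review counts copied from the table above; "Other App" is a made-up row.
toy_apps = pd.DataFrame({
    "App": ["Facebook", "Facebook", "WhatsApp Messenger",
            "WhatsApp Messenger", "Instagram", "Other App"],
    "Type": ["Free"] * 6,
    "Reviews": [78158306, 78128208, 69119316, 69109672, 66577446, 1000],
})

# Sort first, then keep only the highest-review row for each app name.
top_free_unique = (
    toy_apps[toy_apps["Type"] == "Free"]
    .sort_values(by="Reviews", ascending=False)
    .drop_duplicates(subset="App", keep="first")
    .head(3)
)

print(top_free_unique[["App", "Reviews"]])
```

Keeping the first row after a descending sort retains the larger of the two review counts for each duplicated app; whether that is the right row to keep is a judgment call.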

In [118]:
# Plot top 5 Paid apps with highest number of reviews
top_paid = apps[apps['Type'] == 'Paid'].sort_values(by='Reviews', ascending=False).head(5)

fig = px.bar(top_paid, x='App', y='Reviews',
             title='Top 5 Paid Apps by Reviews',
             hover_data=['Installs', 'Category', 'Price'])
fig.show()

In [53]:
apps[apps['Type'] == 'Paid'].sort_values(by='Reviews', ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size in MBs,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_category
2241,Minecraft,FAMILY,4.5,2376564,,10000000,Paid,6.99,Everyone 10+,Arcade;Action & Adventure,2018-07-24,1.5.2.1,Varies with device,Very High
4347,Minecraft,FAMILY,4.5,2375336,,10000000,Paid,6.99,Everyone 10+,Arcade;Action & Adventure,2018-07-24,1.5.2.1,Varies with device,Very High
4034,Hitman Sniper,GAME,4.6,408292,29.0,10000000,Paid,0.99,Mature 17+,Action,2018-07-12,1.7.110758,4.1 and up,Very High
7417,Grand Theft Auto: San Andreas,GAME,4.4,348962,26.0,1000000,Paid,6.99,Mature 17+,Action,2015-03-21,1.08,3.0 and up,High
8860,Bloons TD 5,FAMILY,4.6,190086,94.0,1000000,Paid,2.99,Everyone,Strategy,2018-07-13,3.16,2.3.3 and up,High


**Observations:**
Among the top paid apps, Minecraft stands out with over 2.3 million reviews, far more than any other paid app (it appears twice in the table because of duplicate rows). The next most reviewed paid apps, Hitman Sniper, Grand Theft Auto: San Andreas, and Bloons TD 5, trail far behind, each under the 500k mark. Despite the difference in review counts, all of these apps have at least 1 million installs, reflecting consistent popularity.

**Conclusions:**
This suggests that Minecraft is not only among the most-installed paid apps but also leads in user engagement, a rare trait for a paid app. Its enduring brand, cross-platform success, and regular updates may contribute to users being more likely to leave reviews. The sharp drop-off after Minecraft implies that few paid apps reach this level of community involvement, even when successful in downloads.
This plot highlights exceptional dominance by a single title in the paid market, showing that while many apps can be popular, only a few become cultural icons that inspire users to interact and share feedback at scale.

In [128]:
# Plot top 5 Free apps with highest number of installs
top_free_apps = apps[apps['Type'] == 'Free'].sort_values(by='Installs', ascending=False).head(5)

fig = px.bar(
    top_free_apps,
    x='App',
    y='Installs',
    title='Top 5 Free Apps by Installs',
    text='Installs'
)
fig.update_layout(showlegend=False)
fig.show()

In [55]:
apps[apps['Type'] == 'Free'].sort_values(by='Installs', ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size in MBs,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_category
5395,Google Photos,PHOTOGRAPHY,4.5,10847682,,1000000000,Free,0.0,Everyone,Photography,2018-08-01,Varies with device,Varies with device,Top Notch
5856,Google Play Games,FAMILY,4.3,7168735,,1000000000,Free,0.0,Teen,Entertainment,2018-07-16,Varies with device,Varies with device,Top Notch
2544,Facebook,SOCIAL,4.1,78158306,,1000000000,Free,0.0,Teen,Social,2018-08-03,Varies with device,Varies with device,Top Notch
2604,Instagram,SOCIAL,4.5,66577446,,1000000000,Free,0.0,Teen,Social,2018-07-31,Varies with device,Varies with device,Top Notch
4150,Google,TOOLS,4.4,8021623,,1000000000,Free,0.0,Everyone,Tools,2018-08-03,Varies with device,Varies with device,Top Notch


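Google Photos, Google Play Games, Facebook, Instagram, and Google are each listed with 1 billion installs, the highest install value in the dataset. Other apps, such as WhatsApp Messenger, share the same value, so the five rows shown are not a unique top five; they are simply the first five returned by the sort.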

These apps are extremely popular, often due to being pre-installed or offering essential services.

Even with similar install counts, review activity varies, showing that high installs don't always mean high user engagement.
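
Because several apps share the same install value, a secondary sort key makes the ranking reproducible, and a reviews-to-installs ratio gives a rough view of how review activity differs at equal install counts. The sketch below uses review counts copied from the tables above plus one hypothetical row (`"Small App"`); the column name `reviews_per_1k_installs` is an assumption introduced here.

```python
import pandas as pd

# Review counts copied from the tables above; "Small App" is a made-up row.
toy_free = pd.DataFrame({
    "App": ["Google Photos", "Facebook", "Instagram",
            "WhatsApp Messenger", "Small App"],
    "Installs": [1_000_000_000] * 4 + [1_000],
    "Reviews": [10847682, 78158306, 66577446, 69119316, 12],
})

# How many apps are tied at the top install value?
top_installs = toy_free["Installs"].max()
tied_count = int((toy_free["Installs"] == top_installs).sum())

# Break ties on Installs by Reviews so the order is deterministic.
ranked = toy_free.sort_values(["Installs", "Reviews"], ascending=[False, False])

# Rough engagement proxy: reviews per 1,000 installs.
ranked["reviews_per_1k_installs"] = ranked["Reviews"] / ranked["Installs"] * 1000

print(f"{tied_count} apps tied at {top_installs:,} installs")
print(ranked[["App", "Reviews", "reviews_per_1k_installs"]])
```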

In [145]:
# Plot top 5 Paid apps with highest number of installs
top_paid_apps = apps[apps['Type'] == 'Paid'].sort_values(by='Installs', ascending=False).head(5)

fig = px.bar(
    top_paid_apps,
    x='App',
    y='Installs',
    title='Top 5 Paid Apps by Installs',
    text='Installs'
)
fig.update_layout(showlegend=False)
fig.show()


In [57]:
apps[apps['Type'] == 'Paid'].sort_values(by='Installs', ascending=False).head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size in MBs,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_category
2241,Minecraft,FAMILY,4.5,2376564,,10000000,Paid,6.99,Everyone 10+,Arcade;Action & Adventure,2018-07-24,1.5.2.1,Varies with device,Very High
4034,Hitman Sniper,GAME,4.6,408292,29.0,10000000,Paid,0.99,Mature 17+,Action,2018-07-12,1.7.110758,4.1 and up,Very High
4347,Minecraft,FAMILY,4.5,2375336,,10000000,Paid,6.99,Everyone 10+,Arcade;Action & Adventure,2018-07-24,1.5.2.1,Varies with device,Very High
9941,Tasker,TOOLS,4.6,43045,,1000000,Paid,2.99,Everyone,Tools,2018-06-25,Varies with device,Varies with device,High
7355,Threema,COMMUNICATION,4.5,51110,,1000000,Paid,2.99,Everyone,Communication,2018-08-03,Varies with device,Varies with device,High


Minecraft and Hitman Sniper lead the chart with 10 million+ installs, far ahead of others like Tasker and Threema.

These apps show that users are willing to pay for well-known or high-quality content.

Paid apps can achieve massive reach—if they deliver clear value or strong brand recognition.

In [119]:
# Category with highest number of Installs
top_installs = apps.groupby('Category')['Installs'].sum().sort_values(ascending=False).head(10).reset_index()

fig = px.bar(top_installs,
             x='Category',
             y='Installs',
             title='Top 10 Categories with Highest Total Installs')

fig.update_layout(
    xaxis_tickangle=-45,
    xaxis_title='Category',
    yaxis_title='Total Installs',
    template='simple_white',
    width=1000,
    height=750
)

fig.show()

The Game category leads all others in terms of total installs, exceeding 30 billion downloads — reflecting its massive popularity and engagement on the Play Store. Communication apps follow closely at around 25 billion, likely due to the essential nature of messaging, calling, and video conferencing tools.

While categories like Productivity, Tools, and Family do not lead in total installs, they still reach significant volumes, each above 10 billion, showing that users rely heavily on a few key utility apps.

This contrast suggests that while some categories (like Family) have many apps, others (Productivity, Communication) achieve success with fewer but more impactful apps — revealing a distinction between quantity vs. quality in user demand across categories.
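
The quantity-versus-impact contrast can be put side by side by aggregating app count, total installs, and average installs per app for each category. This is a sketch on invented numbers (`toy_apps` and its values are assumptions), using named aggregation on the grouped `Installs` column.

```python
import pandas as pd

# Invented data: one category with many modest apps, one with few large apps.
toy_apps = pd.DataFrame({
    "Category": ["FAMILY"] * 4 + ["COMMUNICATION"] * 2,
    "Installs": [1_000_000, 1_000_000, 1_000_000, 1_000_000,
                 1_000_000_000, 1_000_000],
})

# App count, total installs, and mean installs per app for each category.
category_summary = (
    toy_apps.groupby("Category")["Installs"]
    .agg(app_count="count", total_installs="sum", installs_per_app="mean")
    .sort_values("installs_per_app", ascending=False)
)

print(category_summary)
```

A category with a low `app_count` but a high `installs_per_app` would match the "fewer but more impactful apps" pattern described above.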

In [133]:
# Category with highest number of Reviews
top_categories = apps.groupby('Category')['Reviews'].sum().sort_values(ascending=False).head(10).reset_index()

fig = px.bar(
    top_categories,
    x='Category',
    y='Reviews',
    title='Top 10 Categories with Highest Total Reviews',
    labels={'Reviews': 'Total Reviews'}
)

fig.update_layout(
    xaxis_tickangle=-45,
    height=500,
    template='simple_white'
)

fig.show()

The most downloaded categories also tend to receive the most reviews.

In [148]:
user_reviews = pd.read_csv("/content/cleaned_googleplaystore_user_reviews.csv")

In [149]:
user_reviews

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.000000,0.533333
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.250000,0.288462
2,10 Best Foods for You,Works great especially going grocery store,Positive,0.400000,0.875000
3,10 Best Foods for You,Best idea us,Positive,1.000000,0.300000
4,10 Best Foods for You,Best way,Positive,1.000000,0.300000
...,...,...,...,...,...
37422,Housing-Real Estate & Property,Most ads older many agents ..not much owner po...,Positive,0.173333,0.486667
37423,Housing-Real Estate & Property,"If photos posted portal load, fit purpose. I'm...",Positive,0.225000,0.447222
37424,Housing-Real Estate & Property,"Dumb app, I wanted post property rent give opt...",Negative,-0.287500,0.250000
37425,Housing-Real Estate & Property,I property business got link SMS happy perform...,Positive,0.800000,1.000000


In [150]:
user_reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37427 entries, 0 to 37426
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   App                     37427 non-null  object 
 1   Translated_Review       37427 non-null  object 
 2   Sentiment               37427 non-null  object 
 3   Sentiment_Polarity      37427 non-null  float64
 4   Sentiment_Subjectivity  37427 non-null  float64
dtypes: float64(2), object(3)
memory usage: 1.4+ MB


In [70]:
merged_df = pd.merge(apps, user_reviews, on = "App")
fig = px.box(merged_df,
             x='Type',
             y='Sentiment_Polarity',
             color='Type',
             title="Sentiment Polarity Distribution")

fig.show(renderer="colab")

We observe that free apps receive many harsh comments, as indicated by the outliers on the negative y-axis. Reviews for paid apps never appear to be extremely negative, which may indicate that paid apps are of higher quality on average.

The median polarity score for paid apps is slightly higher (around 0.23) than for free apps (around 0.125), suggesting that paid apps tend to receive more favorable reviews on average.

Paid apps appear to be generally perceived as higher quality than free apps. However, free apps attract a larger and more varied user base, which could contribute to more extreme and diverse feedback.
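
One caveat for the merge: the `apps` table contains duplicate app names (Facebook and Minecraft appear twice in earlier outputs), and merging on `App` repeats every review once per duplicate row. The sketch below, on toy frames that are assumptions for illustration, shows the row inflation and one possible guard using `drop_duplicates` together with the `validate` argument of `pd.merge`.

```python
import pandas as pd

# Toy frames: app "A" is duplicated on the apps side.
toy_apps = pd.DataFrame({"App": ["A", "A", "B"],
                         "Type": ["Free", "Free", "Paid"]})
toy_reviews = pd.DataFrame({"App": ["A", "A", "B"],
                            "Sentiment_Polarity": [0.5, -0.2, 0.3]})

# Naive merge: each review of "A" is matched to both duplicate rows.
naive = pd.merge(toy_apps, toy_reviews, on="App")

# Guarded merge: one row per app, and pandas raises MergeError
# if the left side still has duplicate keys.
unique_apps = toy_apps.drop_duplicates(subset="App")
merged = pd.merge(unique_apps, toy_reviews, on="App", validate="one_to_many")

print(len(toy_reviews), len(naive), len(merged))
```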

In [126]:
# Compare sentiment distribution across different app categories
category_order = (
    merged_df.groupby('Category')
    .size()
    .sort_values(ascending=False)
    .index.tolist()
)

fig = px.histogram(
    merged_df,
    x='Category',
    color='Sentiment',
    barmode='group',
    category_orders={'Category': category_order},
    title='Sentiment Distribution by App Category (Sorted by Total Reviews)'
)

fig.update_layout(xaxis_tickangle=-45)
fig.show()

The sentiment analysis reveals that most app categories receive predominantly positive feedback, with GAME and FAMILY leading in overall review volume. Notably, GAME apps accumulate the highest number of negative reviews as well—likely a result of their massive user base and high expectations from players.

Despite the variation in total reviews, the overall positive sentiment dominates across nearly all categories, indicating general user satisfaction. Categories such as HEALTH AND FITNESS, SPORTS, and PRODUCTIVITY maintain a high ratio of positive reviews, suggesting users find significant value in these types of apps.

This insight highlights how review sentiment often correlates with app utility and engagement, while also showing that popular categories may face more scrutiny, resulting in higher negative review counts.
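
Raw counts favor large categories, so the "high ratio of positive reviews" is easier to compare as a per-category share. A minimal sketch with `pd.crosstab` on invented rows (`toy_merged` stands in for `merged_df` and is an assumption):

```python
import pandas as pd

# Invented review rows standing in for `merged_df`.
toy_merged = pd.DataFrame({
    "Category":  ["GAME", "GAME", "GAME", "GAME",
                  "SPORTS", "SPORTS", "SPORTS", "SPORTS"],
    "Sentiment": ["Positive", "Positive", "Negative", "Negative",
                  "Positive", "Positive", "Positive", "Neutral"],
})

# Share of each sentiment label within a category (each row sums to 1).
sentiment_share = pd.crosstab(
    toy_merged["Category"], toy_merged["Sentiment"], normalize="index"
)

print(sentiment_share.round(2))
```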

In [135]:
# Check whether apps with higher star ratings have more positive sentiment in reviews
rating_sentiment = merged_df.groupby('Rating')['Sentiment_Polarity'].mean().reset_index()

fig = px.line(
    rating_sentiment,
    x='Rating',
    y='Sentiment_Polarity',
    markers=True,
    title='Average Sentiment Polarity by App Rating',
    labels={'Rating': 'App Rating', 'Sentiment_Polarity': 'Average Sentiment Polarity'},
    template='simple_white'
)

fig.update_layout(height=500)
fig.show()

**Observations:**
* Apps with a rating below 3.0 tend to receive reviews with negative sentiment polarity.

* Apps with ratings above 4.0 generally attract more positive sentiment, showing a gradual upward trend in polarity.

* The plot shows some fluctuations, but the overall direction is clear: better-rated apps tend to be reviewed more positively.

**Conclusion:** This plot illustrates how user satisfaction is reflected in sentiment—as app ratings increase, so does the positivity in review language. Developers aiming for higher ratings can expect more favorable sentiment from users, reinforcing the importance of quality and user experience.
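
The trend can also be summarized with a single number by aggregating to one row per app and correlating the star rating with the mean review polarity. This is a sketch under assumptions: `toy_merged` is an invented stand-in for `merged_df`, and per-app aggregation is one possible choice rather than the approach used in the cell above, which groups by rating value.

```python
import pandas as pd

# Invented rows: each app has one star rating and several review polarities.
toy_merged = pd.DataFrame({
    "App":    ["A", "A", "B", "B", "C", "C"],
    "Rating": [4.5, 4.5, 3.0, 3.0, 4.0, 4.0],
    "Sentiment_Polarity": [0.5, 0.7, -0.2, 0.0, 0.3, 0.1],
})

# One row per app: its rating and the mean polarity of its reviews.
per_app = toy_merged.groupby("App").agg(
    Rating=("Rating", "first"),
    mean_polarity=("Sentiment_Polarity", "mean"),
)

# Pearson correlation between rating and mean polarity across apps.
rating_polarity_r = per_app["Rating"].corr(per_app["mean_polarity"])

print(per_app)
print(f"r = {rating_polarity_r:.3f}")
```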

In [136]:
# Compare average sentiment polarity per app category
category_sentiment = merged_df.groupby('Category')['Sentiment_Polarity'].mean().sort_values().reset_index()

fig = px.bar(
    category_sentiment,
    x='Category',
    y='Sentiment_Polarity',
    title="Average Sentiment Polarity by App Category",
    labels={'Sentiment_Polarity': 'Average Sentiment Polarity'},
    template="simple_white"
)

fig.update_layout(xaxis_tickangle=90)
fig.show()

* Categories like COMICS, EVENTS, and AUTO_AND_VEHICLES show the highest average sentiment polarity, suggesting users are highly satisfied with apps in these categories.

* The GAME category ranks lowest, reinforcing earlier findings that games receive more mixed or critical reviews, possibly due to high user expectations or varying game quality.

* Overall, average sentiment polarity varies noticeably across categories, indicating that app type influences user satisfaction in distinct ways.