Now, I could do word frequency analysis, word clouds and such, and find which words correlate with the lowest scores. But I'm sure "bug" and "crash" will go with low scores and "good" and "awesome" with high scores, and that doesn't really give any interesting insights.

However, the BERTopic framework has all the bells and whistles I'm hoping for. First, it produces an embedding for each document, then applies dimensionality reduction (UMAP), runs a clustering algorithm (HDBSCAN), and finally extracts the word importance for each cluster. That last step uses class-based TF-IDF, which works like normal TF-IDF but is applied at the cluster level instead of the document level.
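To make the class-based TF-IDF step concrete, here is a minimal pure-Python sketch. It assumes the weighting described in the BERTopic docs as I understand it: the term frequency inside a cluster times log(1 + A / f(t)), where A is the average number of words per cluster and f(t) is the term's frequency across all clusters. The toy reviews, the whitespace tokenizer and the function names are my own placeholders, not what the library does internally.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Score each term per cluster: tf(t, c) * log(1 + A / f(t))."""
    # Merge all documents of a cluster into one bag of words.
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(doc.lower().split())

    # f(t): frequency of term t across all clusters.
    total = Counter()
    for bag in counts.values():
        total.update(bag)

    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for label, bag in counts.items():
        n_words = sum(bag.values())
        scores[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in bag.items()
        }
    return scores


def top_terms(scores, label, k=3):
    """Return the k highest-scoring terms of one cluster (ties broken alphabetically)."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up reviews, two per cluster, purely for illustration.
docs = [
    "app crashes when loading slideshow",
    "slideshow bug makes app freeze",
    "too many ads in the app",
    "ads interrupt every video in the app",
]
labels = [0, 0, 1, 1]
scores = class_tfidf(docs, labels)
print(top_terms(scores, 0))
print(top_terms(scores, 1))
```

A word like "app" appears in both clusters, so it gets a lower score than "slideshow" or "ads", which are concentrated in one cluster. A real pipeline would also strip stop words first, which this sketch skips.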

In [44]:
import pickle
import pandas as pd
import numpy as np
import ast
from pprint import pprint

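ver = '29.8.4'
# Per-review topic assignments for this version (one entry per row of df below)
with open(f'data/{ver}/topics.pkl', 'rb') as f:
    topics = pickle.load(f)

# Keep only the reviews for the selected version
df = pd.read_csv('data/UserFeedbackData.csv', index_col=0)
df = df[df['RC_ver'] == ver]

# Per-topic summary: count, name, keywords and representative docs
sample_df = pd.read_csv(f'data/{ver}/sample_result.csv', index_col=0)
sample_df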

Topic,Count,Name,Representation,Representative_Docs
-1,11,-1_settings_preferences_icons_phone,"['settings', 'preferences', 'icons', 'phone', ...","[""My phone is in English, settings and prefere..."
0,15,0_pros_content_platform_youtube,"['pros', 'content', 'platform', 'youtube', 'pe...","[""Great app. Too many ads. If you saw a video ..."
1,14,1_youtube_bug_slideshow_glitch,"['youtube', 'bug', 'slideshow', 'glitch', 'ios...","[""Good app, but there's a glitch that I've had..."
2,13,2_hacked_account_permissions_privacy,"['hacked', 'account', 'permissions', 'privacy'...",['Tiktok is sketchy w/ collecting data. You ca...
3,12,3_android_glitching_screen_phone,"['android', 'glitching', 'screen', 'phone', 'f...","['app used to be entertaining, now it\'s just ..."
4,9,4_issues_issue_loading_wifi,"['issues', 'issue', 'loading', 'wifi', 'networ...","[""Request! It's a really great app! I have usi..."
5,6,5_followers_unfollows_profile_account,"['followers', 'unfollows', 'profile', 'account...","[""new account. can't set up a name, can't set ..."


In [None]:
for topic in sample_df.index:
    print('=======')
    print(sample_df.loc[topic]['Name'])
    print(sample_df.loc[topic]['Representation'])
    for doc in ast.literal_eval(sample_df.loc[topic]['Representative_Docs']):
        # pprint(doc, width=80)
        print(doc)
    print('=======================')


-1_settings_preferences_icons_phone
['settings', 'preferences', 'icons', 'phone', 'annoy', 'annoying', 'text', 'messages', 'inbox', 'language']
('My phone is in English, settings and preferences in app are in English, just '
 'why the heck are filters and text to voice transtaled to Spanish? I see the '
 "same filters in English, and I don't know what to do anymore. Please make it "
 'that if my preferences are in English, if there is an English version, to '
 'switch to the preferred language.')
('Tik tok best and nice app 100 workings amazing app great this app is a game '
 'changer tik tok has truly revolutionized the way we share and consume '
 "contects. Whether you're looking for hilarious skits impressive dance "
 'routines.or jaw-dropping talents this app has it all the user interface is '
 'sleek.intuitive and easy to navigate from the moment you open the app you '
 'are greeted with and endless stream of captivating videos that will keep you '
 'hooked for hours video that wi

With the reviews grouped into 6 clusters (plus an outlier group), we can use the keywords and the most representative docs to get a sense of what each cluster represents.
* -1 is the outliers, so you can see their content diverges a lot from each other.
* 0 has a lot of mentions of ads and of the content of the TikTok shorts themselves.
* 1 is more about technical bugs, especially in the slideshow feature. If you sort the reviews of this cluster by their Thumbs Up, these are also the most upvoted issues!
* 2 is about permissions, privacy settings and related concerns.
* 3 is various concerns about the UX. This is the most varied cluster, just from vibes.
* 4 is about issues with network connections.
* 5 is about the follow and favorite buttons and setting up user accounts.

I'll admit, this clustering-into-topics algorithm is more useful than I thought. Intuitively, people visit the app's page on the app store mostly to complain or (if they're paid to) praise, and if they see another person already making the same complaint, they would just upvote it and not submit their own review. Which is to say, each problem would be summarized by 1 review, with a lot of people voting and nodding their heads, and it wouldn't make sense to do clustering. However, it seems that people discover different bugs in the same features, and since the contents of those reviews are similar, they can be neatly classified into clusters.

For practical purposes, instead of having to read all the hundreds of reviews, or ONLY reading the most upvoted ones and potentially missing critical feedback, these tools can be used to categorize reviews! The curator would be able to pass each cluster to the team responsible for the feature (UX to the design team, ads to the recommendation team).

"Neat" here is not perfect, of course. As you can see by ranking the reviews by upvotes within a cluster, there is a lot of variation between the top reviews. And the most distinctive keywords by TF-IDF are flawed too, since they rely on word frequency and not semantics to determine representativeness.

Given more time and compute, one can definitely improve the results. One can use the extracted hierarchy to merge clusters when it makes sense. The thumbs-up counts are an informative signal, which could be integrated into various modules of BERTopic. For example, the weighted sub-graphs created between points in the earlier stage of HDBSCAN could incorporate information about upvotes, and not just use cosine similarity. One can also pass the most representative docs through an LLM to get a better summary and list of keywords. While we're on the topic of LLMs, you can ask one to rewrite the reviews in a specific format, like {user, feature, issue, keyword}, which would help BERTopic even more. Hell, with Gemini's 1M-token context window, you should be able to fit all 150 or so reviews of one version in a single prompt. Its performance? You'd have to try it and tell me. Needle-in-a-haystack evaluation is kinda wack for how silly it is, but that's just my humble opinion. Idk if it is the best representation of long-context capabilities.
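As a rough sketch of the upvote idea, here is one way the thumbs-up counts could feed into the keyword step: weight each review's words by its upvotes before aggregating per topic. This is not something BERTopic does out of the box as far as I know. The weight `1 + log1p(thumbs_up)`, the function name and the sample rows are all my own assumptions.

```python
import math
from collections import Counter, defaultdict


def weighted_term_counts(reviews):
    """Aggregate per-topic term counts, weighting each review by its upvotes.

    `reviews` is a list of (topic, text, thumbs_up) tuples. The weight
    1 + log1p(thumbs_up) is an assumption: a review with zero upvotes
    still counts once, and heavily upvoted reviews are damped by the log.
    """
    counts = defaultdict(Counter)
    for topic, text, thumbs_up in reviews:
        weight = 1.0 + math.log1p(thumbs_up)
        for term in text.lower().split():
            counts[topic][term] += weight
    return counts


# Made-up rows for illustration: (topic id, review text, thumbs-up count).
reviews = [
    (1, "slideshow glitch", 87),
    (1, "video glitch", 0),
    (1, "audio bug", 0),
]
counts = weighted_term_counts(reviews)
print(counts[1].most_common(3))
```

These weighted counts could then replace the raw counts in a class-based TF-IDF, so that words from the reviews many people agreed with carry more weight. Whether that actually improves the topics is something you'd have to test.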

That leads us to the elephant in the room, which is evaluation. Every application needs to be evaluated to understand its performance and how to improve it. However, this dataset comes with no classification labels, so there's no way for us to quantitatively test our clustering. One can argue that LLMs are coming close to human intelligence and use one to create pseudo-labels for this task, but, again, that assumption itself would require testing. As for me, I just wanted to explore a dataset and try out some tech for a simple application. This is too much excitement for one day already, so Imma leave the rest of the work as homework for my dear reader(s). Peace!
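For whoever takes up that homework: if you do get labels from somewhere (an LLM, or a patient human), one simple score to start with is cluster purity, the fraction of reviews whose label matches the majority label of their cluster. The sketch below is only an illustration. The cluster ids and pseudo-labels are invented, and purity is just one of several possible metrics.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, pseudo_labels):
    """Fraction of items whose pseudo-label matches their cluster's majority label."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, pseudo_labels):
        by_cluster[cluster][label] += 1
    # For each cluster, count how many items carry its most common label.
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical data: cluster ids from the topic model, labels from an LLM or a human.
cluster_ids = [0, 0, 0, 1, 1, 2]
pseudo_labels = ["ads", "ads", "bug", "bug", "bug", "privacy"]
print(cluster_purity(cluster_ids, pseudo_labels))
```

Keep in mind that purity rewards having many tiny clusters (one review per cluster scores a perfect 1.0), so it only makes sense when comparing runs with a similar number of clusters.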

In [48]:
# Attach each review's topic id, then show the three most upvoted reviews per topic
df['topics'] = topics
unique_topics = sorted(df['topics'].unique())
for topic in unique_topics:
    print('=======')
    cur_df = df[df['topics'] == topic]
    print(sample_df.loc[topic]['Name'])
    print(sample_df.loc[topic]['Representation'])

    # Sort by thumbs-up (TU) count, highest first, and keep the top 3
    cur_df = cur_df.sort_values('TU_count', ascending=False).head(3)

    for review, tu_count in zip(cur_df['content'], cur_df['TU_count']):
        pprint(review, width=80)
        print('ThumbsUp counts: ', tu_count)

-1_settings_preferences_icons_phone
['settings', 'preferences', 'icons', 'phone', 'annoy', 'annoying', 'text', 'messages', 'inbox', 'language']
('My phone is in English, settings and preferences in app are in English, just '
 'why the heck are filters and text to voice transtaled to Spanish? I see the '
 "same filters in English, and I don't know what to do anymore. Please make it "
 'that if my preferences are in English, if there is an English version, to '
 'switch to the preferred language.')
ThumbsUp counts:  87
('The worst app. The contents is amazing 😍. But the experience🤬😡 Your app is '
 'terrible. Whenever I open it, I know something will irritate or annoy me. '
 'Your too much notifications. At least there are somewhat settings for this '
 'one. Whenever I log in, it tells me about two inbox messages I can never '
 'locate. Forever tells me about my contacts following me(no problem with '
 'that), the issue start when I click on that thing and the only option I have '
 'is to
