In [33]:
from tensorflow.keras.datasets import reuters
import pandas as pd
pd.set_option('display.max_colwidth', None)
import numpy as np
import time

import os
from dotenv import load_dotenv

In [2]:
path = "./data/"

In [3]:
df_r = pd.read_csv(path + "2.crypto_app_reviews_tagged.csv")

In [4]:
df_r = df_r.drop("Unnamed: 0", axis=1)

## Convert topics into a list

In [7]:
with open(path+"1.crypto_category.txt", "r") as f:
    topics = f.readlines()

In [8]:
topics = [t.strip() for t in topics[0].split(",")]

In [9]:
topics

['Security',
 'Usability/UI/UX',
 'Transaction Fees/Speed',
 'Customer Support',
 'Features/Functionality',
 'Account Management',
 'Educational Resources/Onboarding',
 'Wallet Security/Integration',
 'Privacy',
 'Reliability/Stability',
 'Customer Service',
 'Verification/KYC/AML Processes']

## Check topic stability

LLMs tend to hallucinate, so as a first check we will compare the number of topics in the list that was fed to the prompt with the number of distinct topics tagged by Gemini.

In [16]:
print("The list fed to the LLM through the 2nd prompt contains:", len(topics), "topics.")
print("13 possible topics if we consider 'Generic Feedback'.")

The list fed to the LLM through the 2nd prompt contains: 12 topics.
13 possible topics if we consider 'Generic Feedback'.
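
The comparison described above can also be made in both directions with set differences. The following is a minimal sketch on toy data; `expected_topics` and `tagged` are illustrative stand-ins for the `topics` list and the `gemini_llm_topic` column, not the real values.

```python
import pandas as pd

# Toy stand-ins (assumptions for illustration only).
expected_topics = ["Security", "Privacy", "Generic feedback"]
tagged = pd.Series(
    ["Security", "Security", "Generic feedback", "Please provide the user review."]
)

observed = set(tagged.unique())

# Tags the model produced that were never offered in the prompt.
unexpected = sorted(observed - set(expected_topics))
# Topics that were offered but never used by the model.
unused = sorted(set(expected_topics) - observed)

print("distinct tags:", len(observed), "| expected topics:", len(expected_topics))
print("unexpected:", unexpected)
print("unused:", unused)
```

Looking at the unused topics as well as the unexpected ones helps spot categories in the prompt that the model never picked.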


In [19]:
#adding generic feedback to the topics list
topics.append("Generic feedback")

The distribution below contains 16 topics.

In [20]:
topic_distribution = df_r["gemini_llm_topic"].value_counts().reset_index()

In [21]:
topic_distribution

Unnamed: 0,gemini_llm_topic,count
0,Generic feedback,543
1,gemini failed to respond,277
2,Usability/UI/UX,170
3,Reliability/Stability,160
4,Account Management,140
5,Transaction Fees/Speed,132
6,Customer Support,122
7,Features/Functionality,118
8,Verification/KYC/AML Processes,78
9,Security,25


Let's review which topics were "made up" (i.e. hallucinations).

In [22]:
[t for t in topic_distribution["gemini_llm_topic"].values if t not in topics]

['gemini failed to respond',
 'Please provide the user review.',
 'Usability/UI/UX\nFeatures/Functionality\nCustomer Support\nVerification/KYC/AML Processes\nVerification/KYC/AML Processes\nVerification/KYC/AML Processes\nAccount Management\nGeneric feedback']

The results don't look bad:

* "Generic feedback" was mentioned in our prompt as the default topic for comments that are just an expression of sentiment, e.g. "Everything is great!", "Horrible Product!", etc.
* "gemini failed to respond" means the API call failed.
* "Please provide the user review." might be caused by reviews with "blank" text.

So it seems the only hallucination is the concatenation of several topics into a single topic tag for one review (see below).

In [40]:
print(df_r[df_r["gemini_llm_topic"].apply(lambda x: len(x) >50)]["content"])

832    1- the app is clear and can be used easily 2- it has a wide variety of currencies to trade. 3- one of the worst support systems, they don't afford a hot line, only chat option 4- the slowest verification process that I ever used 5- I waited more than three months to verify my address and it didn't work, with no explanation 6- they don't support non latim alphabet languages (for verification documents) 7- I couldn't creat a cash wallet, so I couldn't withdraw my money 8- very bad experience
Name: content, dtype: object
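
A tag like the one above could be detected and split by structure rather than by length. This is a minimal sketch on toy data, assuming that concatenated tags are always newline-separated (as in the single case found here); `split_tag` is a hypothetical helper, not part of the original pipeline.

```python
import pandas as pd

# Toy data standing in for df_r["gemini_llm_topic"]; values are illustrative.
tags = pd.Series([
    "Security",
    "Usability/UI/UX\nCustomer Support\nCustomer Support",
    "Generic feedback",
])

def split_tag(tag: str) -> list[str]:
    """Split a raw tag on newlines and drop duplicates, keeping first-seen order."""
    parts = [p.strip() for p in tag.split("\n") if p.strip()]
    return list(dict.fromkeys(parts))

split_tags = tags.apply(split_tag)
is_multi = split_tags.apply(len) > 1  # True where the model returned several topics

print(split_tags[is_multi].tolist())
```

Whether to keep the first topic, keep all of them, or discard such rows is a modelling decision that this sketch does not make.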


Let's quantify the topic distribution

In [41]:
topic_distribution["percentage"] = (topic_distribution["count"]/ topic_distribution["count"].sum()).round(2) * 100

We see that 30% of the reviews are tagged "Generic feedback" (a topic that is not useful for extracting insights), and Gemini failed to respond 15% of the time (this tagging was done before implementing the recursive retry in the Gemini call, which calls itself on failure).

The distribution also shows that the topic list could have been improved: low-prevalence topics such as "Privacy" and "Educational Resources/Onboarding" could have been grouped into other categories or simply labeled as "other". I don't believe this was wrongdoing on the LLM's part; it simply shows the nuances of topic modeling.

"Customer Service" (only 3 reviews were tagged with it) should have been grouped under "Customer Support", so this is probably a failure, or at least an area for improvement, in the LLM response.
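
The retry-on-failure idea mentioned above is not shown in this notebook. As a rough illustration only, here is a sketch that uses a bounded loop instead of recursion; `tag_with_retry`, `call_model`, and `flaky_model` are hypothetical names, and the stub simulates failures rather than calling any real API.

```python
import time
from typing import Callable

FAILED_TAG = "gemini failed to respond"

def tag_with_retry(call_model: Callable[[str], str], review: str,
                   max_attempts: int = 3, wait_seconds: float = 0.0) -> str:
    """Call `call_model` until it succeeds or the attempts run out."""
    for attempt in range(max_attempts):
        try:
            return call_model(review)
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts - 1:
                time.sleep(wait_seconds)
    return FAILED_TAG

# Stub that fails twice and then answers (stands in for the real API call).
calls = {"n": 0}

def flaky_model(review: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated API failure")
    return "Security"

print(tag_with_retry(flaky_model, "my funds were stolen"))
```

Capping the number of attempts keeps a persistent outage from looping forever, and the fallback tag keeps failed rows easy to filter out later.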

In [44]:
topic_distribution

Unnamed: 0,gemini_llm_topic,count,percentage
0,Generic feedback,543,30.0
1,gemini failed to respond,277,15.0
2,Usability/UI/UX,170,9.0
3,Reliability/Stability,160,9.0
4,Account Management,140,8.0
5,Transaction Fees/Speed,132,7.0
6,Customer Support,122,7.0
7,Features/Functionality,118,7.0
8,Verification/KYC/AML Processes,78,4.0
9,Security,25,1.0
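
The "Customer Service" / "Customer Support" overlap noted above could be resolved after tagging with a small alias map. This is a sketch on toy data; `TOPIC_ALIASES` is an assumed name and the mapping reflects only the one merge suggested in the text.

```python
import pandas as pd

# Toy stand-in for df_r["gemini_llm_topic"].
tags = pd.Series(
    ["Customer Service", "Customer Support", "Security", "Customer Service"]
)

# Map near-duplicate labels onto a single canonical topic.
TOPIC_ALIASES = {"Customer Service": "Customer Support"}

merged = tags.replace(TOPIC_ALIASES)
print(merged.value_counts())
```

Removing the duplicate from the topic list before prompting would avoid the issue at the source; the alias map is only a post-hoc fix.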


Many of these "Generic feedback" instances come from low-scoring reviews (scores of 1 and 2).

In [45]:
df_r[df_r["gemini_llm_topic"] == "Generic feedback"]["score"].value_counts()

score
2    188
1    167
5    165
4     15
3      8
Name: count, dtype: int64

Let's explore a few of them.

In [48]:
generic_feedback_index = df_r["gemini_llm_topic"] == "Generic feedback"
df_generic = df_r[generic_feedback_index].copy()

Based on the few reviews below, the use of "Generic feedback" seems mostly correct.

Some Binance reviews with a score of 1 mention "Listing pi"; this seems to be about Binance not listing a coin (it also comes up in one Bybit review: "Bybit, list PI network....").

Since this might be too niche for an LLM, it's an error that I can live with, since my assumption is that most reviews will be about commonly discussed topics.

We also see emojis and other languages, where using "Generic feedback" makes sense.

Crypto.com and Bybit show a lot of mentions of "scam". I would have picked the "Security" category for these.

In [53]:
df_generic[["app","gemini_llm_topic","score","content"]].groupby(["app","score"]).head()

Unnamed: 0,app,gemini_llm_topic,score,content
0,com.binance.dev,Generic feedback,1,Listing to pi network
1,com.binance.dev,Generic feedback,1,Binance is acting like a mafia who does not want its users to make money pathetic platform
3,com.binance.dev,Generic feedback,1,I hate this app so much. What did Pi Network do to you that you cannot list it? The most Terrible and Fraustrating Exchange in the world ❌️.
5,com.binance.dev,Generic feedback,1,List Pi Coin
11,com.binance.dev,Generic feedback,1,"Today uninstall my Binance app, and i am shifting in bitget, for trade on pi"
200,com.binance.dev,Generic feedback,2,nice
201,com.binance.dev,Generic feedback,2,Best app of the crypto in the world
202,com.binance.dev,Generic feedback,2,nice binance program jist fain
203,com.binance.dev,Generic feedback,2,Great
207,com.binance.dev,Generic feedback,2,good
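
The "scam" mentions noted above could be surfaced with a simple keyword filter to collect candidates for manual re-tagging. This is a sketch on toy data; the column names match the notebook, but the rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for df_r (rows are illustrative, not real reviews).
reviews = pd.DataFrame({
    "gemini_llm_topic": ["Generic feedback", "Generic feedback", "Security"],
    "content": ["This app is a SCAM", "nice", "scammers stole my coins"],
})

is_generic = reviews["gemini_llm_topic"] == "Generic feedback"
# Case-insensitive substring match; missing content counts as no match.
mentions_scam = reviews["content"].str.contains("scam", case=False, na=False)

candidates = reviews[is_generic & mentions_scam]
print(candidates)
```

A keyword match is crude (it would also catch "not a scam"), so the result is better treated as a review queue than as an automatic relabel.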


It's clear that a more rigorous review is needed to assess the quality of the LLM tagging.

I will proceed to take a random sample, and manually check the topic assignment.

## Oversampling "non" generic feedback

Since generic feedback comments don't contain much insight and "Generic feedback" is the most numerous topic, any model trained on the current dataset might simply learn to label comments as "Generic feedback" to achieve higher accuracy.

So I will take an "oversampling" approach and extract more of the "non" generic feedback reviews.
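
The concern about a model defaulting to the majority label can be made concrete with a majority-class baseline. This sketch uses toy labels, not the real distribution: it computes the accuracy a model would get by always predicting the most frequent topic.

```python
import pandas as pd

# Toy labels (assumption): 6 of 10 reviews carry the majority topic.
labels = pd.Series(["Generic feedback"] * 6 + ["Security"] * 2 + ["Privacy"] * 2)

majority_label = labels.value_counts().idxmax()
# Accuracy of a "model" that predicts the majority label for every review.
baseline_accuracy = (labels == majority_label).mean()

print(majority_label, baseline_accuracy)
```

Any trained model has to beat this number to show it learned more than the class imbalance.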

In [54]:
generic = df_r["gemini_llm_topic"] == "Generic feedback"
failed= df_r["gemini_llm_topic"] == "gemini failed to respond"

I will sample based on the following distribution:
* 5/6 (250) will be a sample from reviews labeled neither "Generic feedback" nor "gemini failed to respond".
* 1/6 (50) will be a sample from reviews tagged with "Generic feedback".

In [55]:
df_s_non_generic = df_r[~generic & ~failed].sample(250)

In [56]:
df_s_generic = df_r[generic].sample(50)

In [57]:
full_sample = pd.concat([df_s_non_generic, df_s_generic])

**Notes**: 

The code to save the sample locally is commented to avoid replacing the **3.tagged_reviews_sample.csv** file. 

This is done because sampling has a random component, and even though a "seed" could be used, different operating systems might still show different results.

The file **"/data/3.tagged_reviews_sample.csv"** can be used to reproduce the upcoming notebooks.
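
For repeated runs in the same environment, the sampling step could still be seeded through the `random_state` argument of `DataFrame.sample`. This is a sketch on toy data; `draw_review_sample` is a hypothetical helper, and the seed value is arbitrary.

```python
import pandas as pd

def draw_review_sample(df: pd.DataFrame, n_non_generic: int,
                       n_generic: int, seed: int) -> pd.DataFrame:
    """Sample non-generic and generic reviews separately, with a fixed seed."""
    generic = df["gemini_llm_topic"] == "Generic feedback"
    failed = df["gemini_llm_topic"] == "gemini failed to respond"
    non_generic_part = df[~generic & ~failed].sample(n_non_generic, random_state=seed)
    generic_part = df[generic].sample(n_generic, random_state=seed)
    return pd.concat([non_generic_part, generic_part])

# Toy stand-in for df_r (assumption for illustration).
toy = pd.DataFrame({
    "gemini_llm_topic": ["Generic feedback"] * 4
    + ["Security"] * 6
    + ["gemini failed to respond"] * 2
})

first = draw_review_sample(toy, n_non_generic=5, n_generic=1, seed=0)
second = draw_review_sample(toy, n_non_generic=5, n_generic=1, seed=0)
print(first.index.equals(second.index))
```

The saved CSV remains the reference for reproducing the upcoming notebooks; the seed only makes reruns on one machine repeatable.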

In [59]:
#full_sample.to_csv(path + "3.tagged_reviews_sample_n.csv", index=False)