In [1]:
import pandas as pd
import numpy as np
import time
from dotenv import load_dotenv
import os
import yaml
from utils import utils

## Import sample Data

Import the data that was produced by the "0-Download Google Play Store Reviews.ipynb" notebook.

In [2]:
df = pd.read_csv("./data/1.crypto_apps_reviews_raw.csv")

In [3]:
df.shape

(1800, 12)

In [4]:
load_dotenv()

True

In [5]:
GEMINI_API_KEY = os.getenv("GEMINI_NEW")  # None if GEMINI_NEW is missing from .env

## Import prompts

In [6]:
with open("prompts/prompts.yaml", encoding="utf-8") as file:
    try:
        prompts = yaml.safe_load(file)
    except yaml.YAMLError as exc:
        # Report and re-raise: every cell below needs `prompts`
        print(exc)
        raise
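`yaml.safe_load` returns the prompts as a plain dict of template strings, and each `{placeholder}` is filled with `str.format`. The actual contents of `prompts/prompts.yaml` are not shown in this notebook, so the sketch below uses two hypothetical templates. It also shows a side effect worth knowing: passing a Python list to `.format` inserts its repr, brackets and quotes included, which is what appears in the sample tagging prompt printed later.

```python
# Hypothetical templates standing in for prompts/prompts.yaml entries.
PROMPTS = {
    "topics": ("You are an expert Customer Success Manager working in the {industry} "
               "industry. Provide your answer as a list, eg: [category1, category2,..]"),
    "tagging": ("Please assign this review: {input_text}\n"
                "To one of the following categories: {categories}"),
}

categories = ["Security", "Privacy", "Customer Support"]

# A list passed straight to .format is rendered with str(), i.e. its repr.
raw = PROMPTS["tagging"].format(input_text="good", categories=categories)
# Joining first gives the model a plain comma-separated list.
clean = PROMPTS["tagging"].format(input_text="good", categories=", ".join(categories))

print(PROMPTS["topics"].format(industry="Crypto"))
print(raw)    # ... categories: ['Security', 'Privacy', 'Customer Support']
print(clean)  # ... categories: Security, Privacy, Customer Support
```

Literal curly braces in a template would need to be doubled (`{{` and `}}`); the square brackets used in these prompts are safe.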

## Prompt Engineering

Prompt Engineering allows PENDING.

The tagging works in two steps, both using the "persona pattern" (asking the LLM to adopt a specific role), one of the most powerful patterns we can leverage to tap into interesting behavior in a large language model.
1. The first prompt asks the LLM to act as a Customer Success Manager for a company in a given industry and generate a list of the most common review categories.
2. The second prompt gives the LLM the list generated by the first prompt and asks it, again acting as a Customer Success Manager, to tag a review with one of the topics in that list.
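The two steps can be sketched as a small pipeline. This is a minimal illustration, not the notebook's implementation: the templates are hypothetical simplifications of the YAML prompts, `two_step_tagging` is a hypothetical helper, and the model is passed in as any `str -> str` callable so the sketch runs offline with a fake LLM.

```python
from typing import Callable, List

# Hypothetical templates for illustration; the real ones live in prompts/prompts.yaml.
TOPIC_PROMPT = (
    "You are an expert Customer Success Manager working in the {industry} industry. "
    "List the most common review categories as [category1, category2, ...]."
)
TAG_PROMPT = (
    "You are an expert Customer Success Manager working in the {industry} industry. "
    "Please assign this review: {input_text}\n"
    "To one of the following categories: {categories}\n"
    "Your answer should just be the category name."
)


def two_step_tagging(reviews: List[str], industry: str,
                     query_llm: Callable[[str], str]) -> List[str]:
    """Step 1: generate the category list once. Step 2: tag each review with it."""
    # Step 1: one call that should return something like "[cat1, cat2, ...]"
    raw_list = query_llm(TOPIC_PROMPT.format(industry=industry))
    categories = [c.strip() for c in raw_list.strip().strip("[]").split(",") if c.strip()]

    # Step 2: one call per review, reusing the same persona and category list
    tags = []
    for review in reviews:
        prompt = TAG_PROMPT.format(industry=industry, input_text=review,
                                   categories=", ".join(categories))
        tags.append(query_llm(prompt).strip())
    return tags


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    if "List the most common" in prompt:
        return "[Security, Customer Support]"
    return "Customer Support"


print(two_step_tagging(["Support never answered my ticket"], "Crypto", fake_llm))
# → ['Customer Support']
```

Keeping the model behind a plain callable makes it easy to swap Gemini for a local Ollama model without touching the pipeline.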

## Prompt Testing: Topic List Generation

Since LLMs are generative, it's not guaranteed that they will generate the same list on every run.
See the examples below.

### Gemini Output

For the LLM, I settled on a state-of-the-art (SOTA) model.
[Gemini 1.5 Flash](https://ai.google.dev/gemini-api/docs/pricing#gemini-1.5-flash) was my choice due to its speed, performance, and flexible free tier.

As of this writing (March 26, 2025), this model's free tier is free of charge, including context caching of up to 1 million tokens of storage per hour.

Below is Gemini's output for the first prompt, run three times:

In [7]:
prompt = prompts["prompt_v9a"].format(industry="Crypto")

In [8]:
print("First Prompt:", prompt)
print()
for i in range(3):
    print(f"*********Category list #{i+1}**********")
    res = utils.gemini_query(prompt, gemini_key = GEMINI_API_KEY)
    print(res)

First Prompt: You are an expert Customer Success Manager working int the Crypto industry. Can you list the most common categories that could exist for a mobile app within this industry?Group similar categories into one, i.e: Customer Support and Customer Services should be grouped under Customer Support.Provide your answer as a list, eg: [category1, category2,..]

*********Category list #1**********
[Wallet Management & Security, Trading & Investing,  DeFi & Yield Farming,  NFT Management & Marketplace,  Staking & Governance,  News & Market Data,  Education & Learning,  Analytics & Portfolio Tracking,  Compliance & Regulatory Tools, Customer Support, Social Features & Community,  Security Audits & Certifications]
*********Category list #2**********
[Wallet Management & Security, Trading & Investing,  DeFi & Lending, NFT Marketplace & Management,  Staking & Yield Farming,  Analytics & Portfolio Tracking,  News & Research,  Education & Learning, Customer Support, Regulatory Compliance & 
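The implementation of `utils.gemini_query` is not shown in this notebook. As a rough idea of what such a helper might do, here is a hypothetical stand-in built on the public Gemini REST `generateContent` endpoint using only the standard library; the request builder is split out so it can be inspected without a network call.

```python
import json
import urllib.request

# Hypothetical stand-in for utils.gemini_query; the real helper is not shown.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "{model}:generateContent?key={key}"
)


def build_gemini_request(prompt: str, gemini_key: str,
                         model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build a generateContent POST request carrying a single text part."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        GEMINI_URL.format(model=model, key=gemini_key),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def gemini_query(prompt: str, gemini_key: str, model: str = "gemini-1.5-flash") -> str:
    """Send the prompt and return the text of the first candidate."""
    req = build_gemini_request(prompt, gemini_key, model)
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]
```

Each call is independent and samples a fresh response, which is why the three runs above return different lists for the same prompt.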

### Ollama output

I also tried local, open-source LLMs served with Ollama.
If smaller local models produced similar results, I could use a cheaper solution and keep more control over privacy (i.e., over the data being fed to the LLM).

Unfortunately, the responses of the two models I tried were far from optimal.

I ran them on a MacBook Pro with an M1 chip (a computer with decent power).
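`utils.ollama_query` is not shown either. A minimal hypothetical stand-in, assuming Ollama's default local server on port 11434 and its `/api/generate` endpoint with streaming turned off, could look like this. The `strip_think` helper is also hypothetical: it removes the `<think>...</think>` block that deepseek-r1 prepends to its answer, as seen in the output below.

```python
import json
import re
import urllib.request

# Assumption: Ollama's default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt: str, model: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_query(prompt: str, model: str = "llama3.2:1b") -> str:
    """Return the generated text; with stream=False it arrives as one JSON object."""
    with urllib.request.urlopen(build_ollama_request(prompt, model), timeout=300) as resp:
        return json.load(resp)["response"]


def strip_think(text: str) -> str:
    """Drop the <think>...</think> reasoning block that deepseek-r1 emits first."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```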

In [9]:
print("First Prompt:", prompt)
print()
for i in range(3):
    print(f"*********Category list #{i+1}**********")
    res = utils.ollama_query(prompt, model="llama3.2:1b")
    print(res)

First Prompt: You are an expert Customer Success Manager working int the Crypto industry. Can you list the most common categories that could exist for a mobile app within this industry?Group similar categories into one, i.e: Customer Support and Customer Services should be grouped under Customer Support.Provide your answer as a list, eg: [category1, category2,..]

*********Category list #1**********
As a Customer Success Manager in the crypto industry, I've identified common categories for mobile apps that cater to users. Here's a list of common categories:

**Customer Support:**

[Support, Maintenance, Updates]

* Support:
	+ FAQs
	+ Knowledge Base
	+ Self-service portals
	+ Customer support tickets and phone support

* Maintenance:
	+ Software updates and patches
	+ Bug fixes and release notes
	+ Performance optimization
	+ Security patching

* Updates:
	+ New feature releases
	+ Compatibility updates
	+ Changes to existing features
	+ Removal of outdated or deprecated features

**Cu

In [10]:
print("First Prompt:", prompt)
print()
for i in range(3):
    print(f"*********Category list #{i+1}**********")
    res = utils.ollama_query(prompt, model="deepseek-r1:7b")
    print(res)

First Prompt: You are an expert Customer Success Manager working int the Crypto industry. Can you list the most common categories that could exist for a mobile app within this industry?Group similar categories into one, i.e: Customer Support and Customer Services should be grouped under Customer Support.Provide your answer as a list, eg: [category1, category2,..]

*********Category list #1**********
<think>
Okay, so I'm trying to figure out the most common categories that a mobile app in the crypto industry might fall under. The user wants me to group similar categories together, like putting both Customer Support and Customer Services into one. They also provided an example list, which is helpful.

First, I'll start by brainstorming what aspects are typical for any mobile app, especially ones dealing with cryptocurrency. Then, I'll see how they fit into existing categories or if they need to be grouped together.

1. **App Development & Updates**: This seems straightforward—app buildin

### Topic List Generation Summary

Both **llama3.2:1b** and **deepseek-r1:7b** failed to provide the desired output.

The first prompt asks for the output in a list format (e.g., [category1, category2,..]) so that it can be easily parsed in Python and fed to the second prompt.
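A minimal sketch of that parsing step, assuming the reply contains one bracketed, comma-separated list (`parse_category_list` is a hypothetical helper, not part of `utils`). It collapses the double spaces Gemini emits, drops duplicates, and raises instead of guessing when no list is present. A regex only checks shape, though: the llama3.2:1b reply above contains "[Support, Maintenance, Updates]" and would parse, so the result still needs a sanity check.

```python
import re
from typing import List


def parse_category_list(text: str) -> List[str]:
    """Extract the first '[cat1, cat2, ...]' from an LLM reply as a clean list."""
    match = re.search(r"\[([^\[\]]+)\]", text)
    if match is None:
        raise ValueError("no [category, ...] list found in the response")
    # Collapse internal whitespace, e.g. "Trading,  DeFi" -> "DeFi"
    names = [" ".join(item.split()) for item in match.group(1).split(",")]
    # dict.fromkeys keeps first-seen order while removing duplicates
    return [name for name in dict.fromkeys(names) if name]


print(parse_category_list("[Wallet Management & Security, Trading & Investing,  DeFi & Lending]"))
```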

## The "generative" problem
Even though Gemini did a good job, I ran the prompt three times to show that even a SOTA model generates different outputs from the same prompt.

So how do we solve for this? We will rely on one-shot prompting [citation needed], which here simply means going with the first result returned by the LLM.

In [11]:
topic_list = utils.gemini_query(prompt, gemini_key=GEMINI_API_KEY)
print("Example list:")
print()
print(topic_list)

Example list:

[Wallet Management & Security, Trading & Investing,  DeFi & Lending, NFT Management & Marketplace,  Staking & Yield Farming,  News & Information,  Analytics & Portfolio Tracking,  Education & Learning,  Social & Community, Regulatory Compliance & Reporting,  Customer Support]
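To put a number on how much two runs differ, one option is the Jaccard similarity of the two category sets (shared categories divided by all distinct categories). The sketch below compares category list #1 from the three-run cell with the example list just above, using exact names after lower-casing and whitespace normalization; the helper names are hypothetical.

```python
def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so formatting quirks don't count as differences."""
    return " ".join(name.lower().split())


def jaccard(a: list, b: list) -> float:
    """Categories shared by both lists divided by all distinct categories."""
    set_a = {normalize(x) for x in a}
    set_b = {normalize(x) for x in b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Category list #1 from the three-run cell, and the example list above
run_1 = ["Wallet Management & Security", "Trading & Investing", "DeFi & Yield Farming",
         "NFT Management & Marketplace", "Staking & Governance", "News & Market Data",
         "Education & Learning", "Analytics & Portfolio Tracking",
         "Compliance & Regulatory Tools", "Customer Support",
         "Social Features & Community", "Security Audits & Certifications"]
example = ["Wallet Management & Security", "Trading & Investing", "DeFi & Lending",
           "NFT Management & Marketplace", "Staking & Yield Farming", "News & Information",
           "Analytics & Portfolio Tracking", "Education & Learning", "Social & Community",
           "Regulatory Compliance & Reporting", "Customer Support"]

print(f"{jaccard(run_1, example):.2f}")  # → 0.35 (6 shared out of 17 distinct)
```

Exact matching is deliberately strict: near-synonyms such as "News & Market Data" and "News & Information" count as different, which is exactly the run-to-run drift that fixing a single list avoids.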


The list used in this experiment lives in the "./data/2.crypto_category.txt" file.
We won't overwrite it, so that the work remains reproducible.

Note that although it differs from the one generated above, it is still good enough.
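One way to enforce "never overwrite" in code is to open the file in exclusive-creation mode (`"x"`), which raises `FileExistsError` if the file already exists. This is a minimal sketch with hypothetical helper names; it assumes the list is stored as a single comma-separated line, as in the file read below.

```python
from pathlib import Path
from typing import List


def save_topic_list_once(path: str, categories: List[str]) -> bool:
    """Write the list only if the file does not exist yet.

    Returns True if the list was written, False if one was already saved.
    """
    try:
        # "x" = exclusive creation: fails instead of overwriting an existing file
        with open(path, "x", encoding="utf-8") as f:
            f.write(", ".join(categories) + "\n")
        return True
    except FileExistsError:
        return False


def load_topic_list(path: str) -> List[str]:
    """Read the saved line back as a list of stripped category names."""
    text = Path(path).read_text(encoding="utf-8")
    return [c.strip() for c in text.split(",") if c.strip()]
```

Loading the file this way yields one entry per category, whereas `f.readlines()` returns the whole line as a single string.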

In [12]:
with open("./data/2.crypto_category.txt", "r") as f:
    topic_list = f.readlines()
print(topic_list)

['Security,  Usability/UI/UX,  Transaction Fees/Speed,  Customer Support,  Features/Functionality,  Account Management,  Educational Resources/Onboarding,  Wallet Security/Integration,  Privacy,  Reliability/Stability,  Customer Service,  Verification/KYC/AML Processes']


## Prompt Testing: Topic Tagging

In [13]:
samples = df.content.sample(3)

In [14]:
samples

56     Community votes doesn't make sense on this pla...
246    Hi dear admin My information is encrypted in y...
422                                                 good
Name: content, dtype: object

In [20]:

print("Category List from Prompt 1:")
print()
print(topic_list)
print()
print("Sample prompt: ")
print()
# `sample` is only defined inside the loop below, so show the last sample explicitly
sample_prompt = prompts["prompt_v9b"].format(industry="Crypto",
                                             categories=topic_list,
                                             input_text=samples.iloc[-1])
print(sample_prompt)
print()

for i, sample in enumerate(samples):
    print(f"********************* Sample {i+1} ***************")
    prompt2 = prompts["prompt_v9b"].format(industry="Crypto",
                                           categories=topic_list,
                                           input_text=sample)
    print("Sample Review: ", sample)
    print()
    print("Gemini Results: ", utils.gemini_query(prompt2, gemini_key=GEMINI_API_KEY))
    print()
    print("Ollama 'llama3.2:1b' Results: ", utils.ollama_query(prompt2, model="llama3.2:1b"))

Category List from Prompt 1:

['Security,  Usability/UI/UX,  Transaction Fees/Speed,  Customer Support,  Features/Functionality,  Account Management,  Educational Resources/Onboarding,  Wallet Security/Integration,  Privacy,  Reliability/Stability,  Customer Service,  Verification/KYC/AML Processes']

Sample prompt: 

You are an expert Customer Success Manager working in the Crypto Industry.You are tasked with categorizing a list of user reviews for further analysis. Please assign this review: good
To one of the following categories: ['Security,  Usability/UI/UX,  Transaction Fees/Speed,  Customer Support,  Features/Functionality,  Account Management,  Educational Resources/Onboarding,  Wallet Security/Integration,  Privacy,  Reliability/Stability,  Customer Service,  Verification/KYC/AML Processes']If the review is just an expression of sentiment (eg: Great!, Bad!, etc). Please use the 'Generic feedback' category.
Your answer should just be the category name.

********************* Sa
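Because the tagging prompt asks for "just the category name", the raw answer can be checked against the allowed list before it is stored. A minimal sketch, assuming tags should match a category ignoring case, surrounding quotes, and a trailing period; `validate_tag` and the `"Unmapped"` label are hypothetical, while "Generic feedback" comes from the prompt itself.

```python
from typing import Iterable

FALLBACK = "Generic feedback"  # extra category allowed by the tagging prompt


def validate_tag(answer: str, categories: Iterable[str], unknown: str = "Unmapped") -> str:
    """Map a raw LLM answer onto an allowed category name, or `unknown`."""
    allowed = {c.strip().lower(): c.strip() for c in categories}
    allowed[FALLBACK.lower()] = FALLBACK
    # Normalize: trim whitespace, surrounding quotes and a trailing period
    key = answer.strip().strip("'\"").rstrip(".").strip().lower()
    return allowed.get(key, unknown)


print(validate_tag(" customer support.\n", ["Customer Support", "Privacy"]))  # → Customer Support
```

Counting how often a model's answers land in `"Unmapped"` gives a simple reliability measure when comparing Gemini with a local model.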

### Topic Tagging Summary

Once again, Gemini does a very good job, whereas **llama3.2:1b** fails to deliver useful or reliable results.

## Conclusions

Based on the sample results, it seems feasible to use a closed-source SOTA model such as Gemini to generate topics for corpora of unlabeled app reviews.
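Scaling the tagging step to a full corpus, such as the 1,800 reviews loaded at the top, means one model call per review, so request-rate limits matter. A minimal sketch of a sequential, rate-limited loop: `tag_corpus` is a hypothetical helper, `tag_review` is any function that tags one review, and the 4-second default (15 requests per minute) is an assumption to adjust to the provider's current limits.

```python
import time
from typing import Callable, List


def tag_corpus(reviews: List[str], tag_review: Callable[[str], str],
               min_interval_s: float = 4.0) -> List[str]:
    """Tag every review in order, spacing calls at least min_interval_s apart."""
    tags = []
    last_call = float("-inf")  # no wait before the first call
    for review in reviews:
        wait = min_interval_s - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        tags.append(tag_review(review))
    return tags


print(tag_corpus(["good", "bad"], str.upper, min_interval_s=0.0))  # → ['GOOD', 'BAD']
```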