# Workshop: App Review Analysis with Few-Shot Learning

In this workshop, we'll explore how to acquire, clean, and analyze app reviews using various techniques, including Few-Shot Learning with Large Language Models (LLMs).

## Objectives:
1. Acquire and analyze reviews for your own app
2. Explore a benchmark dataset of app reviews
3. Clean and preprocess review data
4. Build baseline classifiers for sentiment analysis
5. Apply Few-Shot Learning techniques using LLMs

Let's get started!

## 1. Data Acquisition

### Step 1: Acquire reviews for your own app

We'll use the Google Play Scraper library to collect reviews for your app.

In [None]:
from google_play_scraper import app, reviews
import pandas as pd

# 'com.whatsapp' is used as an example; replace it with your own app's ID
app_id = 'com.whatsapp'
# reviews() returns the list of reviews and a continuation token; we request up to 1000 reviews
app_reviews, _ = reviews(app_id, count=1000)

# Convert to DataFrame
app_reviews_df = pd.DataFrame(app_reviews)
print(app_reviews_df.head())
print(f"Total reviews collected: {len(app_reviews_df)}")

### Step 2: Load the Hugging Face "google-play-review" dataset

We'll use this dataset as a benchmark for positive/negative review classification.

In [None]:
from datasets import load_dataset

# Load Google Play Review dataset from Hugging Face
hf_reviews = load_dataset("jakartaresearch/google-play-review")
hf_reviews_df = hf_reviews['train'].to_pandas()
print(hf_reviews_df.head())
print(f"Total reviews in the dataset: {len(hf_reviews_df)}")

## 2. Data Analysis and Cleaning

### Step 3: Clean the datasets

We'll filter languages, remove short reviews, and handle unbalanced data.

In [None]:
import langdetect

def detect_language(text):
    # langdetect raises on empty or non-linguistic input (e.g. emoji-only reviews)
    try:
        return langdetect.detect(text)
    except Exception:
        return 'unknown'

def clean_dataset(df, text_column, min_words=3, target_languages=('en',)):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Detect language
    df['detected_language'] = df[text_column].apply(detect_language)

    # Filter by language
    df = df[df['detected_language'].isin(target_languages)].copy()

    # Remove short reviews
    df['word_count'] = df[text_column].apply(lambda x: len(str(x).split()))
    df = df[df['word_count'] >= min_words].copy()

    return df

In [None]:
# Clean your app reviews
clean_app_reviews = clean_dataset(app_reviews_df, 'content')
# Rename the 'score' column to 'stars' so both datasets use the same rating column
clean_app_reviews = clean_app_reviews.rename(columns={'score':'stars'})
print("Your app reviews after cleaning:")
print(clean_app_reviews.head())
print(f"Reviews remaining: {len(clean_app_reviews)}")

In [None]:
# Clean Hugging Face dataset
clean_hf_reviews = clean_dataset(hf_reviews_df, 'text')
print("\nHugging Face reviews after cleaning:")
print(clean_hf_reviews.head())
print(f"Reviews remaining: {len(clean_hf_reviews)}")

In [None]:
# Check class balance in Hugging Face dataset
print("\nClass balance in Hugging Face dataset:")
print(clean_hf_reviews['stars'].value_counts(normalize=True))
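
The cleaning step mentions handling unbalanced data, but the cells above only measure the imbalance. One common option is to downsample every class to the size of the rarest one. The sketch below assumes a DataFrame with a `stars` column; the helper name `downsample_to_balance` and the toy data are illustrative and not part of the workshop code.

```python
import pandas as pd

def downsample_to_balance(df, label_column, random_state=42):
    """Return a copy of df in which every class has as many rows as the rarest class."""
    smallest_class_size = int(df[label_column].value_counts().min())
    return (
        df.groupby(label_column)
          .sample(n=smallest_class_size, random_state=random_state)
          .reset_index(drop=True)
    )

# Toy data for illustration only (not real reviews)
toy_reviews = pd.DataFrame({
    "text": ["love it", "great app", "works well", "crashes a lot", "too many ads", "fine"],
    "stars": [5, 5, 5, 1, 1, 3],
})
balanced_reviews = downsample_to_balance(toy_reviews, "stars")
print(balanced_reviews["stars"].value_counts().sort_index())
```

Downsampling discards data, so it is only one possible choice; class weights or stratified sampling may suit small datasets better.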

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(5, 3))

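hf_counts = clean_hf_reviews['stars'].value_counts(normalize=True).sort_index()
app_counts = clean_app_reviews['stars'].value_counts(normalize=True).sort_index()

# Align both series on the same ratings so the bars line up even if a rating is missing from one dataset
all_ratings = hf_counts.index.union(app_counts.index)
hf_counts = hf_counts.reindex(all_ratings, fill_value=0)
app_counts = app_counts.reindex(all_ratings, fill_value=0)

indices = np.arange(len(all_ratings))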

bar_width = 0.35

ax.bar(indices, hf_counts, width=bar_width, label="Hugging Face dataset")
ax.bar(indices + bar_width, app_counts, width=bar_width, label="Your app reviews")

ax.set_title("Class Balance Comparison: Hugging Face vs Your App Reviews")
ax.set_xlabel("Rating")
ax.set_ylabel("Proportion of reviews")

ax.set_xticks(indices + bar_width / 2)
ax.set_xticklabels(hf_counts.index)

ax.legend()

plt.tight_layout()
plt.show()

## 3. Feature Engineering

### Step 4: Create categorical variables and clean textual data

In [None]:
import re
from sklearn.feature_extraction.text import CountVectorizer

def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters and digits
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text

# Preprocess text in both datasets
clean_app_reviews['processed_text'] = clean_app_reviews['content'].apply(preprocess_text)
clean_hf_reviews['processed_text'] = clean_hf_reviews['text'].apply(preprocess_text)
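
To see what `preprocess_text` keeps and drops, it can help to run it on a single made-up review. The sketch below repeats the function so that it runs on its own. The `collapse_whitespace` helper is an optional addition (not part of the workshop code) that removes the extra spaces left behind when digits and punctuation are deleted.

```python
import re

def preprocess_text(text):
    # Same logic as the cell above: lowercase, then keep only letters and whitespace
    text = text.lower()
    return re.sub(r'[^a-zA-Z\s]', '', text)

def collapse_whitespace(text):
    # Optional extra step (an assumption, not in the original cell): squeeze runs of whitespace
    return re.sub(r'\s+', ' ', text).strip()

# Made-up review for illustration only
sample_review = "Great app!!! 10/10, LOVE it"
processed = preprocess_text(sample_review)
tidy = collapse_whitespace(processed)
print(repr(processed))
print(repr(tidy))
```

Note that numeric ratings written inside the text, such as "10/10", disappear entirely, which may or may not be desirable for sentiment analysis.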

In [None]:
# Create bag-of-words features for Hugging Face dataset
vectorizer = CountVectorizer(max_features=1000, stop_words='english',
                             min_df=5
                             )
# Fit the vocabulary on both datasets so that they share the same feature columns
vectorizer.fit(pd.concat([clean_hf_reviews['processed_text'], clean_app_reviews['processed_text']]))
bow_features_app = vectorizer.transform(clean_app_reviews['processed_text'])
bow_features_hf = vectorizer.transform(clean_hf_reviews['processed_text'])

print("Bag-of-words features shape (app reviews):", bow_features_app.shape)
print("Bag-of-words features shape (HF dataset):", bow_features_hf.shape)
print("\nSample processed text:")
print(clean_hf_reviews['processed_text'].iloc[0])

In [None]:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import numpy as np

# Get the feature names (words) from the vectorizer
feature_names = np.array(vectorizer.get_feature_names_out())

# Function to generate word clouds for a dataset grouped by star ratings and display them side by side
def generate_wordcloud_side_by_side(bow_features_hf, bow_features_app, reviews_hf_df, reviews_app_df, rating_column, dataset_hf_name, dataset_app_name):
    # Reset the indices to ensure alignment between the dataframe and bow_features
    reviews_hf_df = reviews_hf_df.reset_index(drop=True)
    reviews_app_df = reviews_app_df.reset_index(drop=True)
    
    # Get unique rating values for alignment
    ratings = sorted(reviews_hf_df[rating_column].unique())
    
    # Create subplots, one row per rating; squeeze=False keeps axes 2-D even with a single rating
    fig, axes = plt.subplots(len(ratings), 2, figsize=(15, 5 * len(ratings)), squeeze=False)
    
    for i, rating_value in enumerate(ratings):
        # Hugging Face Dataset - Get the reviews for this rating
        hf_indices = reviews_hf_df[reviews_hf_df[rating_column] == rating_value].index.tolist()
        hf_bow = bow_features_hf[hf_indices].sum(axis=0).A1
        hf_word_freq = dict(zip(feature_names, hf_bow))

        # Generate word cloud for Hugging Face dataset
        wordcloud_hf = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(hf_word_freq)
        
        # Plot the Hugging Face word cloud
        axes[i, 0].imshow(wordcloud_hf, interpolation='bilinear')
        axes[i, 0].set_title(f'{rating_value} Stars in {dataset_hf_name}')
        axes[i, 0].axis('off')

        # App Reviews Dataset - Get the reviews for this rating
        app_indices = reviews_app_df[reviews_app_df[rating_column] == rating_value].index.tolist()
        app_bow = bow_features_app[app_indices].sum(axis=0).A1
        app_word_freq = dict(zip(feature_names, app_bow))

        # Generate word cloud for App Reviews dataset
        wordcloud_app = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(app_word_freq)
        
        # Plot the App Reviews word cloud
        axes[i, 1].imshow(wordcloud_app, interpolation='bilinear')
        axes[i, 1].set_title(f'{rating_value} Stars in {dataset_app_name}')
        axes[i, 1].axis('off')

    plt.tight_layout()
    plt.show()

# Generate word clouds side by side for Hugging Face dataset and App Reviews dataset
generate_wordcloud_side_by_side(bow_features_hf, bow_features_app, clean_hf_reviews, clean_app_reviews, 'stars', 'Hugging Face Dataset', 'Your App Reviews Dataset')


In [None]:
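# Let's do two things: remove uninformative words (stopwords plus 'whatsapp' and 'app')
# and lemmatize the remaining tokens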

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import re

# Download required NLTK data if not already downloaded
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Custom stopwords: extend the default English list with 'whatsapp' and 'app'
stop_words = sorted(set(stopwords.words('english')) | {'whatsapp', 'app'})

# Initialize the lemmatizer
lemmatizer = WordNetLemmatizer()

# Function to clean and lemmatize text
def preprocess_text(text):
    # Remove non-alphabetic characters and lowercase the text
    text = re.sub(r'[^a-zA-Z\s]', '', text.lower())
    
    # Tokenize the text, remove stopwords, and lemmatize
    tokens = [lemmatizer.lemmatize(word) for word in text.split() if word not in stop_words]
    
    # Join tokens back into a string
    return ' '.join(tokens)

# Apply preprocessing to the datasets
clean_hf_reviews['processed_text'] = clean_hf_reviews['processed_text'].apply(preprocess_text)
clean_app_reviews['processed_text'] = clean_app_reviews['processed_text'].apply(preprocess_text)

# Vectorization after preprocessing: fit the vocabulary on both datasets, then transform each one
vectorizer = CountVectorizer(max_features=1000, stop_words=list(stop_words), min_df=5)
vectorizer.fit(pd.concat([clean_hf_reviews['processed_text'], clean_app_reviews['processed_text']]))
bow_features_app = vectorizer.transform(clean_app_reviews['processed_text'])
bow_features_hf = vectorizer.transform(clean_hf_reviews['processed_text'])

# Print the shapes of the resulting bag-of-words features
print("Bag-of-words features shape (app reviews):", bow_features_app.shape)
print("Bag-of-words features shape (HF dataset):", bow_features_hf.shape)
print("\nSample processed text:")
print(clean_hf_reviews['processed_text'].iloc[0])

In [None]:
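# The vectorizer was refit above, so refresh the feature names used inside the word cloud function
feature_names = np.array(vectorizer.get_feature_names_out())
generate_wordcloud_side_by_side(bow_features_hf, bow_features_app, clean_hf_reviews, clean_app_reviews, 'stars', 'Hugging Face Dataset', 'Your App Reviews Dataset')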

## 4. Baseline Creation

### Step 5: Create a baseline classifier

### Steps:
1. **Extract the top 50 n-grams** (unigrams, bigrams, and trigrams combined) for each star rating.
2. **Assign a weight** to each n-gram, based on the star rating whose top list it appears in.
3. **Predict a rating** from the weights of the n-grams found in a review. If a review does not contain any of the top n-grams, we will assign a **default rating of 3** (neutral).

### Why This is a Good First Baseline:
1. **Simplicity**: 
   - This baseline is simple and interpretable, making it an excellent teaching tool for students to understand basic classification techniques.
   
2. **Real-World Relevance**:
   - The use of **n-grams** mimics a common practice in Natural Language Processing (NLP), where short sequences of words often carry meaning in sentiment analysis.
   
3. **Performance Benchmark**:
   - This baseline classifier will likely have modest performance, which sets the stage for more complex models (e.g., logistic regression, neural networks) to improve upon.
   - It helps students understand the importance of **starting simple** and using baselines to track progress in model development.

4. **Driven by Frequency**:
   - Because the n-grams are selected using **frequency statistics** for each rating, students can see how **statistical approaches** can offer a quick, lightweight solution before diving into more computationally expensive models.
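
The baseline described above can be sketched on a toy dataset using only the standard library. This is a simplified illustration under stated assumptions, not the workshop implementation: the function names, the `top_k` parameter, and the four labeled reviews are all made up, and each matched n-gram simply votes for the rating it belongs to.

```python
from collections import Counter

def extract_ngrams(text, max_n=3):
    """Return all word n-grams of length 1..max_n as space-joined strings."""
    words = text.split()
    return [
        " ".join(words[start:start + n])
        for n in range(1, max_n + 1)
        for start in range(len(words) - n + 1)
    ]

def top_ngrams_per_rating(labeled_reviews, top_k=2):
    """Map each rating to its top_k most frequent n-grams."""
    counts_by_rating = {}
    for text, rating in labeled_reviews:
        counts_by_rating.setdefault(rating, Counter()).update(extract_ngrams(text))
    return {
        rating: [ngram for ngram, _ in counts.most_common(top_k)]
        for rating, counts in counts_by_rating.items()
    }

def predict_rating(text, ngrams_by_rating, default_rating=3):
    """Average the ratings whose top n-grams occur in the text; fall back to the default."""
    review_ngrams = set(extract_ngrams(text))
    votes = [
        rating
        for rating, ngrams in ngrams_by_rating.items()
        for ngram in ngrams
        if ngram in review_ngrams
    ]
    return round(sum(votes) / len(votes)) if votes else default_rating

# Toy labeled reviews for illustration only
toy_reviews = [
    ("love love this", 5),
    ("love it", 5),
    ("hate hate ads", 1),
    ("hate crashes", 1),
]
ngrams_by_rating = top_ngrams_per_rating(toy_reviews, top_k=1)
print(ngrams_by_rating)
print(predict_rating("i love the design", ngrams_by_rating))
print(predict_rating("nothing special", ngrams_by_rating))
```

Matching on whole n-grams rather than substrings avoids false hits such as "app" inside "happy".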

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from collections import defaultdict, Counter

# Step 1: Count unigrams, bigrams, and trigrams in every review

# Initialize the CountVectorizer to extract unigrams, bigrams, and trigrams
vectorizer = CountVectorizer(ngram_range=(1, 3), max_features=1000, stop_words='english')

# Fit the vectorizer on the entire dataset
vectorizer.fit(clean_hf_reviews['processed_text'])

# Create bag-of-words features
bow_features = vectorizer.transform(clean_hf_reviews['processed_text'])

# Convert the bag-of-words matrix to a dataframe for easier manipulation
bow_df = pd.DataFrame(bow_features.toarray(), columns=vectorizer.get_feature_names_out())

# Add the 'stars' column to the dataframe
bow_df['stars'] = clean_hf_reviews['stars'].values

In [None]:
# Step 2: For each rating, get the top 50 n-grams
n = 50
top_ngrams_by_rating = defaultdict(list)
for rating in sorted(bow_df['stars'].unique()):
    # Filter reviews by rating
    rating_reviews = bow_df[bow_df['stars'] == rating]
    
    # Sum up the n-gram counts for all reviews with this rating
    rating_ngram_counts = rating_reviews.drop('stars', axis=1).sum()
    
    # Get the top n n-grams for this rating
    top_ngrams = rating_ngram_counts.nlargest(n)
    
    # Store the top n-grams in the dictionary
    top_ngrams_by_rating[rating] = top_ngrams.index.tolist()

In [None]:
# Step 3: Assign a weight to each selected n-gram
ngram_weights = {}
for rating, ngrams in top_ngrams_by_rating.items():
    for ngram in ngrams:
        # Use the rating itself as the weight. Ratings are visited in ascending order,
        # so an n-gram in the top list of several ratings keeps the highest rating.
        ngram_weights[ngram] = rating

In [None]:
# Step 4: Create the baseline classifier
def baseline_classifier(review):
    # Preprocess the review
    processed_review = preprocess_text(review)

    # Pad with spaces so that n-grams only match on whole-word boundaries
    padded_review = f" {processed_review} "

    # Collect the weights of all n-grams that occur in the review
    matched_weights = [
        weight for ngram, weight in ngram_weights.items()
        if f" {ngram} " in padded_review
    ]

    # If the review contains matching n-grams, predict the average of their weights
    if matched_weights:
        return int(round(float(sum(matched_weights)) / len(matched_weights)))

    # If no n-grams match, return a neutral rating of 3
    return 3

In [None]:
# Apply the baseline classifier to the same reviews the n-grams were extracted from,
# so this accuracy is an optimistic estimate rather than a held-out test score
clean_hf_reviews['predicted_rating'] = clean_hf_reviews['processed_text'].apply(baseline_classifier)

# Evaluate the baseline classifier
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(clean_hf_reviews['stars'], clean_hf_reviews['predicted_rating'])
print(f'Baseline classifier accuracy: {accuracy * 100:.2f}%')

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Prepare data
X = bow_features_hf
y = clean_hf_reviews['stars']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naive Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

# Evaluate
y_pred = nb_classifier.predict(X_test)
print(classification_report(y_test, y_pred))
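
On unbalanced rating data, a classification report is easier to interpret next to a trivial reference point. The sketch below computes the accuracy of always predicting the most frequent label; the helper name and the toy labels are illustrative assumptions.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    label_counts = Counter(labels)
    _, majority_count = label_counts.most_common(1)[0]
    return majority_count / len(labels)

# Toy labels for illustration only
toy_test_labels = [5, 5, 5, 4, 1, 1, 5, 3]
print(f"Majority-class accuracy: {majority_class_accuracy(toy_test_labels):.2f}")
```

In the notebook, this could be called as `majority_class_accuracy(list(y_test))`; a model that does not clearly beat this number has learned little beyond the class distribution.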

## 5. Logistic Regression Classifier

### Step 6: Train and evaluate a logistic regression classifier

In [None]:
from sklearn.linear_model import LogisticRegression

# Train Logistic Regression classifier (a higher max_iter helps the solver converge on sparse count features)
lr_classifier = LogisticRegression(random_state=42, max_iter=1000)
lr_classifier.fit(X_train, y_train)

# Evaluate
y_pred_lr = lr_classifier.predict(X_test)
print(classification_report(y_test, y_pred_lr))

## 6. Few-Shot Learning and LLM

### Step 7: Apply Few-Shot Learning using an LLM

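We'll use a pre-trained sentiment analysis model as a stand-in for the LLM. Note that this model is applied without being shown any labeled examples from our data, which is closer to zero-shot transfer than to Few-Shot Learning in the strict sense.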
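
In Few-Shot Learning with an instruction-following LLM, a handful of labeled examples is usually placed directly in the prompt, followed by the item to classify. The sketch below only builds such a prompt as a string; the prompt wording, the helper name `build_few_shot_prompt`, and the example reviews are assumptions for illustration, and sending the prompt to a model is left out.

```python
def build_few_shot_prompt(examples, new_review):
    """Build a prompt with labeled example reviews followed by the review to classify."""
    lines = ["Rate each app review from 1 to 5 stars.", ""]
    for review_text, stars in examples:
        lines.append(f"Review: {review_text}")
        lines.append(f"Stars: {stars}")
        lines.append("")
    # The final review is left unlabeled so that the model completes the rating
    lines.append(f"Review: {new_review}")
    lines.append("Stars:")
    return "\n".join(lines)

# Hypothetical labeled examples, written for illustration only
few_shot_examples = [
    ("Crashes every time I open it.", 1),
    ("Does the job, nothing special.", 3),
    ("Fast, reliable, and easy to use.", 5),
]
prompt = build_few_shot_prompt(few_shot_examples, "Too many ads, but it works.")
print(prompt)
```

In practice, the examples would be drawn from the labeled benchmark reviews, ideally covering every star rating.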

In [None]:
from transformers import pipeline

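# Use a pre-trained model for text classification.
# With no model argument, the pipeline falls back to a default English sentiment model
# that returns a POSITIVE or NEGATIVE label together with a confidence score.
classifier = pipeline('text-classification')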

In [None]:
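def classify_review(review_text):
    # The pipeline returns the predicted label and the confidence for that label
    result = classifier(review_text, truncation=True)[0]

    # Convert the result into the probability that the review is positive
    # (assumes the model's labels are 'POSITIVE' and 'NEGATIVE')
    positive_probability = result['score'] if result['label'] == 'POSITIVE' else 1 - result['score']

    # Convert the probability to a star rating (1 to 5):
    # if 0.0 <= p < 0.2: 1, if 0.2 <= p < 0.4: 2, and so on
    predicted_rating = min(5, 1 + int(positive_probability * 5))
    return predicted_rating

# Apply the classifier to the original review text, since the pre-trained model expects natural sentences
clean_hf_reviews['predicted_rating'] = clean_hf_reviews['text'].apply(classify_review)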

In [None]:
# Assessing the Results

# Calculate accuracy
accuracy = accuracy_score(clean_hf_reviews['stars'], clean_hf_reviews['predicted_rating'])
print(f"Accuracy: {accuracy * 100:.2f}%")

# Detailed classification report
print(classification_report(clean_hf_reviews['stars'], clean_hf_reviews['predicted_rating']))
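
Accuracy treats every wrong prediction equally, yet predicting 4 stars for a 5-star review is a smaller mistake than predicting 1 star. As a complementary view, the sketch below computes the mean absolute error in stars and the share of predictions within one star of the truth. The function names and toy values are illustrative assumptions.

```python
def mean_absolute_error_in_stars(true_ratings, predicted_ratings):
    """Average distance, in stars, between the true and predicted ratings."""
    if len(true_ratings) != len(predicted_ratings):
        raise ValueError("Both rating lists must have the same length")
    errors = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    return sum(errors) / len(errors)

def within_one_star_rate(true_ratings, predicted_ratings):
    """Share of predictions that are at most one star away from the true rating."""
    hits = [abs(t - p) <= 1 for t, p in zip(true_ratings, predicted_ratings)]
    return sum(hits) / len(hits)

# Toy values for illustration only
toy_true = [5, 4, 1, 3]
toy_predicted = [4, 4, 5, 3]
print(mean_absolute_error_in_stars(toy_true, toy_predicted))
print(within_one_star_rate(toy_true, toy_predicted))
```

In the notebook, both functions could be called with `list(clean_hf_reviews['stars'])` and `list(clean_hf_reviews['predicted_rating'])`.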

## Conclusion

In this workshop, we've covered:

1. Acquiring and cleaning app review data
2. Exploring a benchmark dataset
3. Building baseline classifiers for sentiment analysis
4. Applying Few-Shot Learning techniques using pre-trained LLMs

These techniques can be applied to various natural language processing tasks, especially when dealing with limited labeled data.