# Synthetic Fake Review Generation (Ollama + Gemma 3)

This notebook generates synthetic fake reviews with three sizes of Gemma 3 models (1B, 4B, and 12B) run through Ollama, and saves them to a CSV in which every row is labeled `fake`.

## Models & Roles

1.  **Small Models (`gemma3:1b`)**
    *   **Role:** "Spam Bot" Simulator
    *   **Characteristics:** Low coherence, repetitive wording, rigid patterns, loose grammar.
    *   **Goal:** Simulate mass-generated low-effort spam.

2.  **Medium/Large Models (`gemma3:4b`, `gemma3:12b`)**
    *   **Role:** "Paid Reviewer" Simulator
    *   **Characteristics:** Stronger reasoning, more nuanced output, able to follow complex instructions (e.g., "mixed" sentiment).
    *   **Goal:** Simulate sophisticated attacks, competitor sabotage, or paid boosting.

## Prerequisites

1.  Ensure [Ollama](https://ollama.com/) is installed and running.
2.  Pull the required models:
    ```bash
    ollama pull gemma3:1b
    ollama pull gemma3:4b
    ollama pull gemma3:12b
    ```
    *(Note: Adjust the model tags if the exact names differ in your registry.)*
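
Before starting a long run, it can help to confirm that every tag the notebook expects has actually been pulled. The helper below is a minimal sketch and not part of the original notebook: `missing_models` is a hypothetical name, and it takes the installed tags as a plain list so that it does not depend on any particular Ollama client call.

```python
def missing_models(required, installed):
    """Return the required model tags that are not installed, preserving order."""
    installed_set = set(installed)
    return [tag for tag in required if tag not in installed_set]


required_tags = ["gemma3:1b", "gemma3:4b", "gemma3:12b"]

# Hypothetical example: pretend only the two smaller models were pulled.
installed_tags = ["gemma3:1b", "gemma3:4b"]

print(missing_models(required_tags, installed_tags))  # ['gemma3:12b']
```

The installed tags can be copied from the `NAME` column printed by `ollama list`.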


In [None]:
# %pip install pandas ollama tqdm

In [1]:
import os

import ollama
import pandas as pd
from tqdm import tqdm

In [None]:
MODELS = {
    "small": ["gemma3:1b"],
    "medium": ["gemma3:4b"],
    "large": ["gemma3:12b"]
}

OUTPUT_DIR = "../dataset/synthetic"
os.makedirs(OUTPUT_DIR, exist_ok=True)

PRODUCTS = [
    "Wireless Noise Cancelling Headphones",
    "Organic Vitamin C Serum",
    "Smart Home Security Camera",
    "Ergonomic Office Chair",
    "Gaming Laptop 15-inch"
]

In [3]:
SPAM_BOT_PROMPTS = [
    "Write a very short, generic positive review for {product}. Use poor grammar and repetitive words like 'good' or 'nice'.",
    "Generate a spammy looking review for {product}. It should look like a bot wrote it. Keep it under 10 words.",
    "Write a review for {product} that says 'Great project' or 'Nice sir' even though it is a product.",
    "Write a 5-star review for {product} using only broken English and emojis."
]
PAID_REVIEWER_PROMPTS = [
    "Write a detailed positive review for {product}. Mention specific features like battery life or build quality. Make it sound enthusiastic but slightly overly salesy.",
    "Write a negative review for {product} claiming the delivery driver was rude, but admit the product itself is okay. This is to damage the seller's reputation subtly.",
    "Write a mixed review for {product}. Praise the design but complain about a specific made-up defect to make it sound authentic.",
    "You are a paid reviewer. Write a convincing 5-star review for {product} that addresses common complaints found in other reviews to neutralize them."
]

In [4]:
def generate_reviews(model_name, role_type, prompts, count_per_prompt=5):
    """Generate `count_per_prompt` reviews for every (product, prompt) pair.

    Returns a list of dicts, one per generated review, all labeled "fake".
    """
    generated_data = []

    print(f"Generating with {model_name} ({role_type})")

    for product in PRODUCTS:
        for prompt_template in prompts:
            # Fill the {product} placeholder in the template.
            prompt = prompt_template.format(product=product)

            for _ in tqdm(range(count_per_prompt), desc=f"{product[:15]}...", leave=False):
                try:
                    response = ollama.generate(model=model_name, prompt=prompt)
                    review_text = response['response'].strip()

                    generated_data.append({
                        "model": model_name,
                        "role_type": role_type,
                        "product": product,
                        "prompt_used": prompt,
                        "review_text": review_text,
                        "label": "fake"
                    })
                except Exception as e:
                    # Log the failure and move on to the next generation.
                    print(f"Error generating with {model_name}: {e}")

    return generated_data
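
As a check on run size, the nested loops in `generate_reviews` make one model call per product, per prompt, per repetition. The sketch below works through that arithmetic for the notebook's settings and shows how a template is filled in. `expand_prompts` and `planned_calls` are hypothetical helper names introduced only for illustration.

```python
def expand_prompts(products, templates):
    """Pair every product with every filled-in prompt template."""
    return [(p, t.format(product=p)) for p in products for t in templates]


def planned_calls(n_products, n_prompts, count_per_prompt, n_models=1):
    """Number of model calls the nested loops will make."""
    return n_products * n_prompts * count_per_prompt * n_models


pairs = expand_prompts(
    ["Ergonomic Office Chair", "Gaming Laptop 15-inch"],
    ["Write a 5-star review for {product} using only broken English and emojis."],
)
print(pairs[0][1])

# Notebook settings: 5 products, 4 prompts per role, 5 repetitions per prompt.
spam_calls = planned_calls(5, 4, 5, n_models=1)  # gemma3:1b
paid_calls = planned_calls(5, 4, 5, n_models=2)  # gemma3:4b and gemma3:12b
print(spam_calls, paid_calls, spam_calls + paid_calls)  # 100 200 300
```

A full run therefore requests 300 reviews, two thirds of them from the larger models.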

In [7]:
small_model_data = []

for model in MODELS["small"]:
    try:
        data = generate_reviews(
            model_name=model,
            role_type="spam_bot",
            prompts=SPAM_BOT_PROMPTS,
            count_per_prompt=5
        )
        small_model_data.extend(data)
    except Exception as e:
        print(f"Skipping {model}: {e}")

Generating with gemma3:1b (spam_bot)

In [8]:
complex_model_data = []

target_models = MODELS["medium"] + MODELS["large"]

for model in target_models:
    try:
        data = generate_reviews(
            model_name=model,
            role_type="paid_reviewer",
            prompts=PAID_REVIEWER_PROMPTS,
            count_per_prompt=5
        )
        complex_model_data.extend(data)
    except Exception as e:
        print(f"Skipping {model}: {e}")

Generating with gemma3:4b (paid_reviewer)
Generating with gemma3:12b (paid_reviewer)

KeyboardInterrupt: 
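
Because `generate_reviews` only returns its list at the end, and `KeyboardInterrupt` is not caught by `except Exception`, an interrupted run loses whatever the in-progress model had produced. One possible mitigation, sketched here under assumptions, is to append each review to a JSON Lines file as soon as it is generated. `generate_with_checkpoint`, `load_checkpoint`, and the `generate_fn` parameter are hypothetical names, and a stand-in generator replaces Ollama so the sketch runs on its own.

```python
import json
import os
import tempfile


def generate_with_checkpoint(generate_fn, jobs, out_path):
    """Append each finished review to a JSON Lines file as it is produced.

    generate_fn: hypothetical callable (model_name, prompt) -> review text.
    jobs: iterable of dicts with "model", "role_type", "product", "prompt".
    Returns the number of records written.
    """
    written = 0
    with open(out_path, "a", encoding="utf-8") as fh:
        for job in jobs:
            try:
                text = generate_fn(job["model"], job["prompt"])
            except Exception as exc:
                print(f"Error generating with {job['model']}: {exc}")
                continue
            record = {
                "model": job["model"],
                "role_type": job["role_type"],
                "product": job["product"],
                "prompt_used": job["prompt"],
                "review_text": text.strip(),
                "label": "fake",
            }
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            fh.flush()  # push the line out of Python's buffer right away
            written += 1
    return written


def load_checkpoint(path):
    """Read back every record written so far."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def fake_generate(model_name, prompt):
    """Stand-in for the model call so the sketch runs without Ollama."""
    return f"  [{model_name}] nice product  "


demo_jobs = [{
    "model": "gemma3:12b",
    "role_type": "paid_reviewer",
    "product": "Ergonomic Office Chair",
    "prompt": "Write a mixed review for Ergonomic Office Chair.",
}]
demo_path = os.path.join(tempfile.mkdtemp(), "reviews.jsonl")
n = generate_with_checkpoint(fake_generate, demo_jobs, demo_path)
rows = load_checkpoint(demo_path)
print(n, rows[0]["review_text"])
```

In the notebook, `generate_fn` could wrap the same call that `generate_reviews` already makes, for example `lambda m, p: ollama.generate(model=m, prompt=p)['response']`, and the resulting file can be loaded with `pd.read_json(path, lines=True)`.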

In [None]:
all_data = small_model_data + complex_model_data
df = pd.DataFrame(all_data)

if not df.empty:
    csv_path = os.path.join(OUTPUT_DIR, "generated_fake_reviews_gemma3.csv")
    df.to_csv(csv_path, index=False)
    print(f"Saved {len(df)} generated reviews to {csv_path}")
    
    print("\nSample Spam Bot Reviews:")
    print(df[df['role_type'] == 'spam_bot'][['model', 'review_text']].head(3))
    
    print("\nSample Paid Reviewer Reviews:")
    print(df[df['role_type'] == 'paid_reviewer'][['model', 'review_text']].head(3))
else:
    print("No data generated. Check if Ollama is running and models are available.")

In [None]:
if not df.empty:
    print(df.groupby(['model', 'role_type']).size())
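
The "Spam Bot" role is described above as repetitive, and one rough way to check whether the generated text reflects that is to measure how many reviews repeat an earlier one. The sketch below is an illustrative assumption, not part of the original notebook: `duplicate_rate` is a hypothetical helper, and the sample strings are made up rather than taken from model output.

```python
from collections import Counter


def duplicate_rate(texts):
    """Fraction of texts that repeat another text after lowercasing
    and collapsing whitespace."""
    normalized = [" ".join(t.lower().split()) for t in texts]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(normalized)


# Made-up sample: the first two strings differ only in case and spacing.
sample = ["Good good nice.", "good good  nice.", "Nice sir", "Great product"]
print(duplicate_rate(sample))  # 0.25
```

Applied to the notebook's data, the input could be `df[df['role_type'] == 'spam_bot']['review_text'].tolist()`, with the same call on the `paid_reviewer` rows for comparison.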