<a href="https://colab.research.google.com/github/bala2tech/Business-Reputation-Insights-Analyzer-using-Google-Maps-Reviews-LLM/blob/main/Google_Maps_Reviews_LLM_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain==0.2.15 langchain_community transformers torch gradio textblob nltk requests pandas

Collecting langchain==0.2.15
  Downloading langchain-0.2.15-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain_community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-core<0.3.0,>=0.2.35 (from langchain==0.2.15)
  Downloading langchain_core-0.2.43-py3-none-any.whl.metadata (6.2 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain==0.2.15)
  Downloading langchain_text_splitters-0.2.4-py3-none-any.whl.metadata (2.3 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain==0.2.15)
  Downloading langsmith-0.1.147-py3-none-any.whl.metadata (14 kB)
Collecting numpy<2.0.0,>=1.26.0 (from langchain==0.2.15)
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.0/61.0 kB[0m [31m3.3 MB/s[0m eta [36m0:00:00[0m
Collecting tenacity!=8.4.0,<9.0.0,>=8.1.0 (from langchain==0.2.15)
  Downloading tenacity-8.5.0-py3-
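
The next cell imports `langchain_huggingface`, which the install command above does not list, so it can help to confirm what pip actually resolved before running it. A minimal sketch using only the standard library; the helper name `installed_version` and the package list are illustrative assumptions, not part of the original notebook:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical list: distributions the analysis cell imports from.
for name in ("langchain", "langchain-community", "langchain-huggingface", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A missing entry here means the analysis cell will fall back to its own `pip install` call at import time.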

In [1]:
# ==========================================
# 🧠 Google Reviews Sentiment & Insights Dashboard (LangChain + HuggingFace)
# ✅ Free + Secure + Compatible with LangChain ≥ 0.2
# ==========================================

# --- Import core libraries ---
import os, re, getpass, requests, pandas as pd  # OS for env vars, re for regex, getpass for secure input, requests for HTTP, pandas for data handling
from datetime import datetime, timedelta       # Used for parsing and handling date/time
import gradio as gr                            # Gradio for interactive dashboard interface
from textblob import TextBlob                  # TextBlob for sentiment polarity
from nltk.sentiment import SentimentIntensityAnalyzer  # NLTK VADER sentiment analyzer
from transformers import pipeline              # Hugging Face transformers for text summarization
import nltk                                    # NLTK for downloading sentiment lexicons

# --- Ensure dependencies (LangChain + HuggingFace integration) ---
try:
    # Import LangChain components (Chains, Prompts, and LLM interface)
    from langchain.chains import LLMChain
    from langchain.prompts import PromptTemplate
    from langchain_huggingface import HuggingFacePipeline
except ImportError:
    # If a package is missing, install it and retry (other errors should surface, not trigger an install)
    import subprocess, sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "langchain", "langchain_huggingface"])
    from langchain.chains import LLMChain
    from langchain.prompts import PromptTemplate
    from langchain_huggingface import HuggingFacePipeline

# ==========================================
# 1️⃣ SETUP
# ==========================================

# Download sentiment lexicon for NLTK VADER (used in sentiment scoring)
nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()  # Initialize VADER analyzer

# --- Securely fetch SerpAPI key (hidden input) ---
SERPAPI_API_KEY = os.getenv("SERPAPI_API_KEY")  # Try to get key from environment
if not SERPAPI_API_KEY:
    print("🔐 Please enter your SerpAPI key (hidden):")
    SERPAPI_API_KEY = getpass.getpass("Enter SerpAPI key: ").strip()  # Prompt securely (hidden input)
if not SERPAPI_API_KEY:
    # Abort if key not provided
    raise ValueError("❌ Missing SERPAPI_API_KEY. Get one at https://serpapi.com/manage-api-key")

# --- Initialize Hugging Face Summarization Pipeline ---
summarizer_pipe = pipeline("summarization", model="facebook/bart-large-cnn", device_map="auto")  # Load pre-trained summarization model
hf_llm = HuggingFacePipeline(pipeline=summarizer_pipe)  # Wrap into LangChain-compatible pipeline

# --- Define LangChain prompt templates for two tasks ---
summary_prompt = PromptTemplate(
    input_variables=["text"],
    template="You are a customer experience analyst. Summarize the customer feedback and highlight key insights:\n{text}"
)
summary_chain = LLMChain(llm=hf_llm, prompt=summary_prompt)  # Chain for generating review summaries

suggestions_prompt = PromptTemplate(
    input_variables=["text"],
    template="You are a business improvement expert. Based on the following negative reviews, list 3 actionable improvement ideas:\n{text}"
)
suggestions_chain = LLMChain(llm=hf_llm, prompt=suggestions_prompt)  # Chain for generating improvement suggestions

# ==========================================
# 2️⃣ FETCH GOOGLE REVIEWS VIA SERPAPI
# ==========================================
def fetch_google_reviews(place_id: str) -> pd.DataFrame:
    """Fetch live Google Maps reviews using SerpAPI."""
    params = {
        "engine": "google_maps_reviews",  # API engine
        "place_id": place_id,              # Google Maps Place ID
        "hl": "en",                        # Language
        "api_key": SERPAPI_API_KEY,        # SerpAPI key
    }
    try:
        # Make GET request to SerpAPI
        res = requests.get("https://serpapi.com/search.json", params=params, timeout=30)
        res.raise_for_status()  # Raise error for HTTP failures
        data = res.json()       # Parse JSON response
        reviews = data.get("reviews", [])  # Extract reviews list
        if not reviews:
            return pd.DataFrame()  # Return empty if no reviews found
        # Construct DataFrame from JSON fields
        return pd.DataFrame({
            "user": [r.get("user", {}).get("name", "Anonymous") for r in reviews],
            "rating": [r.get("rating") for r in reviews],
            "date": [r.get("date") for r in reviews],
            "review_text": [r.get("snippet", "") for r in reviews],
        })
    except Exception as e:
        # Handle any network or parsing errors
        print(f"❌ Error fetching reviews: {e}")
        return pd.DataFrame()

# ==========================================
# 3️⃣ CLEANING & PREPROCESSING
# ==========================================
def parse_relative_date(text):
    """Convert relative dates like '2 weeks ago' or 'a month ago' into absolute datetime."""
    now = datetime.now()  # Current timestamp
    if not isinstance(text, str):
        return now
    text = text.lower()
    if "yesterday" in text:
        return now - timedelta(days=1)
    # Match patterns like '2 weeks ago' and 'a week ago' (an article counts as 1)
    match = re.search(r'\b(\d+|an?)\s+(day|week|month|year)s?\s+ago', text)
    if match:
        num, unit = match.group(1), match.group(2)
        val = int(num) if num.isdigit() else 1
        # Approximate conversion of relative time units to days
        mult = {"day": 1, "week": 7, "month": 30, "year": 365}[unit]
        return now - timedelta(days=val * mult)
    return now  # Fallback: unrecognized formats (e.g. hours or minutes ago) are treated as now

def preprocess_reviews(df):
    """Clean text, parse dates, and create additional metadata columns."""
    if df.empty:
        return df
    df["review_text"] = df["review_text"].fillna("").str.lower()  # Convert to lowercase, fill NaN
    df["date"] = df["date"].apply(parse_relative_date)            # Parse human-readable dates
    df["date"] = pd.to_datetime(df["date"], errors="coerce")      # Convert to datetime type
    df["year_month"] = df["date"].dt.to_period("M")               # Extract year-month for grouping
    return df

# ==========================================
# 4️⃣ SENTIMENT & TOPIC ANALYSIS
# ==========================================
def analyze_sentiment(text):
    """Determine sentiment using TextBlob polarity score."""
    p = TextBlob(text).sentiment.polarity  # Returns score between -1 and 1
    if p > 0.1:
        return "Positive 😊"
    elif p < -0.1:
        return "Negative 😠"
    return "Neutral 😐"

def extract_topics(text):
    """Identify major topics using keyword-based matching."""
    text = text.lower()
    # Predefined keyword-to-topic mapping
    mapping = {
        "service": ["service", "staff", "waiter"],
        "pricing": ["price", "cheap", "expensive"],
        "food": ["food", "menu", "dish"],
        "cleanliness": ["clean", "dirty"],
        "ambience": ["ambience", "atmosphere"],
    }
    # Check if any topic keywords appear in text
    topics = [k for k, v in mapping.items() if any(w in text for w in v)]
    return topics or ["general"]  # Default to 'general' if no match

# ==========================================
# 5️⃣ SUMMARIZATION & SUGGESTIONS
# ==========================================
def summarize_feedback(df):
    """Summarize all reviews into a short insight summary."""
    if df.empty:
        return "No reviews available."
    joined = " ".join(df["review_text"].tolist())[:2500]  # Combine all text, limit input length
    return summary_chain.run(text=joined)  # Run LangChain summarization

def generate_suggestions(df):
    """Generate business improvement suggestions from negative feedback."""
    neg = df[df["sentiment"] == "Negative 😠"]  # Filter negative reviews
    if neg.empty:
        return "✅ No major negative feedback detected!"
    joined = " ".join(neg["review_text"].tolist())[:2500]
    return suggestions_chain.run(text=joined)  # Run suggestion generation chain

# ==========================================
# 6️⃣ MAIN ANALYSIS FUNCTION FOR GRADIO
# ==========================================
def run_analysis(place_id):
    """Main workflow: fetch → preprocess → analyze → summarize → suggest."""
    df = fetch_google_reviews(place_id)  # Step 1: Fetch reviews
    if df.empty:
        return "❌ No reviews found.", {}, "No data available.", "N/A", "N/A"

    df = preprocess_reviews(df)  # Step 2: Clean data
    df["sentiment"] = df["review_text"].apply(analyze_sentiment)  # Step 3: Sentiment detection
    df["topics"] = df["review_text"].apply(extract_topics)        # Step 4: Topic extraction

    sentiment_counts = df["sentiment"].value_counts().to_dict()  # Step 5: Count sentiment classes
    summary = summarize_feedback(df)                             # Step 6: Summarize overall reviews
    suggestions = generate_suggestions(df)                       # Step 7: Generate improvement advice

    # Compute overall sentiment trend
    pos = sentiment_counts.get("Positive 😊", 0)
    neg = sentiment_counts.get("Negative 😠", 0)
    if pos > neg:
        overall = "Predominantly Positive 😊"
    elif neg > pos:
        overall = "Predominantly Negative 😠"
    else:
        overall = "Mixed 😐"

    # Return all processed data for Gradio UI
    return summary, sentiment_counts, df.to_html(index=False), overall, suggestions

# ==========================================
# 7️⃣ BUILD GRADIO DASHBOARD
# ==========================================
with gr.Blocks(title="Google Reviews Sentiment Dashboard") as app:
    # Dashboard header
    gr.Markdown("## 🧠 Google Reviews Sentiment & Insights Dashboard (LangChain + HuggingFace)")
    gr.Markdown("Enter a **Google Place ID** to fetch and analyze live reviews. Uses **Hugging Face**, **TextBlob**, and **LangChain** for NLP insights — free and secure.")

    # --- User Input Section ---
    place_id_input = gr.Textbox(label="🔍 Google Place ID", placeholder="e.g. ChIJN1t_tDeuEmsRUsoyG83frY4")  # Textbox for inputting Place ID
    analyze_button = gr.Button("🚀 Analyze Reviews")  # Button to start analysis

    # --- Output Display Components ---
    summary_output = gr.Textbox(label="📝 Summary", lines=5)               # Shows summarized insights
    sentiment_json = gr.JSON(label="📊 Sentiment Distribution")            # Displays sentiment counts
    reviews_html = gr.HTML(label="📋 Detailed Reviews")                    # Shows full reviews in table form
    overall_text = gr.Textbox(label="💬 Overall Sentiment")                # Displays dominant sentiment
    suggestions_text = gr.Textbox(label="💡 Suggestions", lines=5)         # Shows business recommendations

    # --- Connect button to processing function ---
    analyze_button.click(
        fn=run_analysis,   # Function to execute
        inputs=place_id_input,  # Input from textbox
        outputs=[summary_output, sentiment_json, reviews_html, overall_text, suggestions_text],  # Outputs
    )

# Launch the interactive Gradio app in browser
app.launch()
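
`extract_topics` tests keywords with `in`, so a keyword also matches inside unrelated words (for example, `dish` inside `dishonest`, or `price` inside `priceless`). One possible refinement, sketched here under the assumption that whole-word matches with an optional plural ending are what is wanted, anchors each keyword with regex word boundaries. The name `extract_topics_strict` is hypothetical; the keyword table is copied from the cell above:

```python
import re

TOPIC_KEYWORDS = {
    "service": ["service", "staff", "waiter"],
    "pricing": ["price", "cheap", "expensive"],
    "food": ["food", "menu", "dish"],
    "cleanliness": ["clean", "dirty"],
    "ambience": ["ambience", "atmosphere"],
}

# One compiled pattern per topic: \b anchors both ends of the word, and the
# optional (?:e?s)? suffix lets "dish" also match "dishes" and "price" match "prices".
TOPIC_PATTERNS = {
    topic: re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")(?:e?s)?\b",
        re.IGNORECASE,
    )
    for topic, words in TOPIC_KEYWORDS.items()
}


def extract_topics_strict(text: str) -> list:
    """Return topics whose keywords appear as whole words; default to 'general'."""
    topics = [topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(text)]
    return topics or ["general"]


print(extract_topics_strict("The dishes were cheap"))
print(extract_topics_strict("A priceless view"))
```

The trade-off is that inflected forms such as "cleaning" no longer match unless they are added to the keyword lists explicitly.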


🔐 Please enter your SerpAPI key (hidden):
Enter SerpAPI key: ··········


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Device set to use cpu


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://377976e8361fb361ba.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


