# 🗺️ Google Maps Review Summarizer

This Python app automates the process of fetching and summarizing Google Maps reviews for any business or location.

## 🚀 Overview
The app performs two main tasks:
1. **Scrape Reviews** – Uses a web scraping script to extract reviews directly from Google Maps.
2. **Summarize Content** – Uses an OpenAI language model to produce a concise summary of the collected reviews and assess their overall sentiment.

## 🧠 Tech Stack
- **Python** – Core language
- **Playwright** – For scraping reviews
- **OpenAI API** – For natural language summarization
- **Jupyter Notebook** – For exploration, testing, and demonstration

## 🙏 Credits
The web scraping logic is **inspired by [Antonello Zanini’s blog post](https://blog.apify.com/how-to-scrape-google-reviews/)** on building a Google Reviews scraper. Special thanks for the valuable insights on **structuring and automating the scraping workflow**, which greatly informed the development of this improved scraper.

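This app, however, uses an **enhanced version of the scraper** that keeps scrolling the reviews panel to lazy-load more reviews. Scrolling stops when the target count (`max_reviews`) is reached, when no new reviews appear after three consecutive scrolls, or when the scroll limit (`max_scrolls`) is hit. Set `max_reviews` to 1,000 to collect **at least 1,000 reviews**; note that the function's default in this notebook is 10. If fewer reviews are available, the scraper stops scrolling earlier.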
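
The stopping rule can be illustrated without a browser. The sketch below is a simplified simulation, not the scraper itself: `simulate_scroll_loop` and its `patience` parameter are hypothetical names, and `observed_counts` stands in for the review counts that Playwright would report before each scroll.

```python
def simulate_scroll_loop(observed_counts, max_reviews=1000, patience=3):
    """Replay a sequence of review counts and report why scrolling would stop.

    observed_counts: counts seen before each scroll (an assumption that replaces
    the live page). Returns (scrolls_performed, last_count, reason).
    """
    previous = 0
    stalled = 0
    scrolls = 0
    last_count = 0
    for current in observed_counts:
        last_count = current
        # Stop as soon as the target number of reviews is loaded
        if current >= max_reviews:
            return scrolls, current, "target reached"
        scrolls += 1  # the real scraper scrolls the panel here
        if current == previous:
            stalled += 1
            # No growth for `patience` consecutive scrolls: assume the list is exhausted
            if stalled >= patience:
                return scrolls, current, "no new reviews"
        else:
            stalled = 0
        previous = current
    # The supplied counts ran out, which models hitting the scroll limit
    return scrolls, last_count, "scroll limit reached"


# A location with only 30 reviews stops early; a busy one stops at the target
print(simulate_scroll_loop([10, 20, 30, 30, 30, 30]))
print(simulate_scroll_loop([10, 600, 1200]))
```

Keeping the rule in a pure function like this makes it easy to check edge cases, such as an empty panel, before running a slow browser session.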

## ✅ Sample Output
Here is a summary of reviews for a restaurant, generated by the app.

![Sample review summary generated by the app](google-map-review-summary.jpg)


---

**Note:** This project is intended for educational and research purposes. Please ensure compliance with Google’s [Terms of Service](https://policies.google.com/terms) when scraping or using their data.


In [None]:
# Activate the llm_engineering virtual environment.
# Note: each `!` command runs in its own subshell, so this activation does not
# carry over to the commands below; start Jupyter from the venv (or select its kernel) instead.
!source ../../../.venv/bin/activate

# Make sure pip is available and up to date inside the venv
!python3 -m ensurepip --upgrade

# Verify that pip3 points to the venv path (should end with /.venv/bin/pip3)
!which pip3

# Install Playwright inside the venv
!pip3 install playwright

# Download the required browser binaries
!python3 -m playwright install

In [2]:
import asyncio
from playwright.async_api import async_playwright
from IPython.display import Markdown, display
import os
from dotenv import load_dotenv
from openai import OpenAI


In [3]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [4]:
async def scroll_reviews_panel(page, max_scrolls=50, max_reviews=10):
    """
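    Scrolls through the reviews panel to lazy-load reviews.

    Scrolling stops when max_reviews are loaded, when the review count has not
    changed for 3 consecutive scrolls, or when max_scrolls is reached.

    Args:
        page: Playwright page object
        max_scrolls: Maximum number of scroll attempts to prevent infinite loops
        max_reviews: Stop scrolling once at least this many reviews are loaded

    Returns:
        Number of reviews loaded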
    """
    # Find the scrollable reviews container
    # Google Maps reviews are in a specific scrollable div
    scrollable_div = page.locator('div[role="main"] div[jslog$="mutable:true;"]').first
    
    previous_review_count = 0
    scroll_attempts = 0
    no_change_count = 0

    print("Starting to scroll and load reviews...")
    
    while scroll_attempts < max_scrolls:
        # Get current count of reviews
        review_elements = page.locator("div[data-review-id][jsaction]")
        current_review_count = await review_elements.count()
        
        # Stop scrolling once at least max_reviews have been loaded
        if current_review_count >= max_reviews:
            break

        print(f"Scroll attempt {scroll_attempts + 1}: Found {current_review_count} reviews")
        
        # Scroll to the bottom of the reviews panel
        await scrollable_div.evaluate("""
            (element) => {
                element.scrollTo(0, element.scrollHeight + 100);
            }
        """)
        
        # Wait for potential new content to load
        await asyncio.sleep(2)
        
        # Check if new reviews were loaded
        if current_review_count == previous_review_count:
            no_change_count += 1
            # If count hasn't changed for 3 consecutive scrolls, we've likely reached the end
            if no_change_count >= 3:
                print(f"No new reviews loaded after {no_change_count} attempts. Finished loading.")
                break
        else:
            no_change_count = 0
        
        previous_review_count = current_review_count
        scroll_attempts += 1
    
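    # Re-query here so the count is defined even if the loop body never ran
    final_count = await page.locator("div[data-review-id][jsaction]").count()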
    print(f"Finished scrolling. Total reviews loaded: {final_count}")
    return final_count

In [5]:
async def scrape_google_reviews(url):
    # Where to store the scraped data
    reviews = []

    async with async_playwright() as p:
        # Launch a Chromium browser through Playwright
        browser = await p.chromium.launch(
            headless=True  # Set to False if you want to see the browser in action
        )
        context = await browser.new_context()
        page = await context.new_page()

        # Navigate to the target Google Maps reviews page
        print("Navigating to Google Maps page...")
        await page.goto(url)

        # Wait for initial reviews to load
        print("Waiting for initial reviews to load...")
        review_html_elements = page.locator("div[data-review-id][jsaction]")
        await review_html_elements.first.wait_for(state="visible", timeout=10000)
        
        # Scroll through the reviews panel to lazy load all reviews
        total_reviews = await scroll_reviews_panel(page, max_scrolls=100)
        
        print(f"\nStarting to scrape {total_reviews} reviews...")

        # Get all review elements after scrolling
        review_html_elements = page.locator("div[data-review-id][jsaction]")
        all_reviews = await review_html_elements.all()
        
        # Iterate over the elements and scrape data from each of them
        for idx, review_html_element in enumerate(all_reviews, 1):
            try:
                # Scraping logic

                stars_element = review_html_element.locator("[aria-label*=\"star\"]")
                stars_label = await stars_element.get_attribute("aria-label")

                # Extract the review score from the stars label
                stars = None
                for i in range(1, 6):
                    if stars_label and str(i) in stars_label:
                        stars = i
                        break

                # Get the next sibling of the previous element with an XPath expression
                time_sibling = stars_element.locator("xpath=following-sibling::span")
                time = await time_sibling.text_content()

                # Expand truncated review text by clicking the "See more" button, if present
                more_element = review_html_element.locator("button[aria-label=\"See more\"]").first
                if await more_element.is_visible():
                    await more_element.click()
                    await asyncio.sleep(0.3)  # Brief wait for text expansion

                text_element = review_html_element.locator("div[tabindex=\"-1\"][id][lang]")
                text = await text_element.text_content()

                reviews.append(f"{stars} Stars:\nReviewed On: {time}\n{text}")
                
                if idx % 10 == 0:
                    print(f"Scraped {idx}/{total_reviews} reviews...")
                    
            except Exception as e:
                print(f"Error scraping review {idx}: {str(e)}")
                continue

        print(f"\nSuccessfully scraped {len(reviews)} reviews!")

        # Close the browser and release its resources
        await browser.close()

        return "\n".join(reviews)
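
The star-rating loop in `scrape_google_reviews` returns the smallest digit from 1 to 5 that appears anywhere in the `aria-label`, which could misread a label containing more than one number. A possible alternative, sketched below, takes the first standalone digit instead. It assumes labels shaped like `"4 stars"`; `parse_star_rating` is a hypothetical helper, not part of the original notebook.

```python
import re


def parse_star_rating(label):
    """Return the first standalone digit 1-5 in an aria-label, or None.

    Assumes labels such as "5 stars" or "1 star"; other formats may need
    a different pattern.
    """
    if not label:
        return None
    # \b ensures the digit is not part of a longer number such as "15"
    match = re.search(r"\b([1-5])\b", label)
    return int(match.group(1)) if match else None


print(parse_star_rating("5 stars"))
print(parse_star_rating("Rated 4 stars out of 5"))
print(parse_star_rating("no rating"))
```

To try it in the notebook, the `for i in range(1, 6)` loop could be replaced with `stars = parse_star_rating(stars_label)`.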

In [6]:
system_prompt = """
You are an expert assistant that analyzes Google reviews
and provides a summary and the overall sentiment of the reviews.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

In [7]:
# Define our user prompt

user_prompt_prefix = """
Here are the reviews of a Google Maps location or business.
Provide a short summary of the reviews and the sentiment of the reviews.

"""

In [8]:

def prepare_message(reviews):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_prefix + reviews}
    ]

In [9]:
async def summarize(url):
    openai = OpenAI()
    reviews = await scrape_google_reviews(url)
    response = openai.chat.completions.create(
        model="gpt-4.1-mini",
        messages=prepare_message(reviews),
    )
    return response.choices[0].message.content

In [10]:
async def display_summary(url):
    summary = await summarize(url)
    display(Markdown(summary))

In [None]:
url = "https://www.google.com/maps/place/Grace+Home+Nursing+%26+Assisted+Living/@12.32184,75.0853037,17z/data=!4m8!3m7!1s0x3ba47da1be6a0279:0x9e73181ab0827f7e!8m2!3d12.32184!4d75.0853037!9m1!1b1!16s%2Fg%2F11qjl430n_?entry=ttu&g_ep=EgoyMDI1MTAyMC4wIKXMDSoASAFQAw%3D%3D"
await display_summary(url)
