# Building a Product Review Analyzer with Hyperbrowser and GPT-4o

In this cookbook, we'll create an intelligent review analyzer that can automatically extract and summarize product reviews from e-commerce websites. This agent will:

1. Visit any product review page
2. Extract review content using web scraping
3. Analyze sentiment, pros, cons, and common themes
4. Generate a comprehensive summary with actionable insights
5. Answer specific questions about customer feedback

This approach combines:
- **[Hyperbrowser](https://hyperbrowser.ai)** for web scraping and content extraction
- **OpenAI's GPT-4o** for intelligent analysis and insight generation

By the end of this cookbook, you'll have a tool that helps businesses understand customer sentiment and identify opportunities for product improvement.

## Prerequisites

Before starting, you'll need:

1. A Hyperbrowser API key (sign up at [hyperbrowser.ai](https://hyperbrowser.ai) if you don't have one)
2. An OpenAI API key with access to GPT-4o

Store these API keys in a `.env` file in the same directory as this notebook:

```
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
```
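
Before running the rest of the notebook, it can help to confirm that both keys were actually picked up. The following is a minimal sketch: `missing_keys` and `REQUIRED_KEYS` are hypothetical names, not part of either SDK, and the helper only inspects a mapping such as `os.environ`.

```python
import os
from typing import Mapping

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required key names that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

One way to use it is to call `missing_keys()` right after `load_dotenv()` and stop early if the returned list is not empty.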

## Step 1: Set up imports and load environment variables

We start by importing the necessary packages and initializing our environment variables. The key libraries we'll use include:
- `asyncio` for handling asynchronous operations
- `hyperbrowser` for web scraping and content extraction
- `openai` for intelligent analysis and insight generation
- `IPython.display` for rendering markdown output in the notebook

```python
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)
from IPython.display import Markdown, display

load_dotenv()
```

```
True
```

## Step 2: Initialize API clients

Here we create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web scraping and intelligent analysis respectively.

```python
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()
```

## Step 3: Implement the tool handler

The tool handler function processes requests from the LLM to interact with our web scraping functionality. It:

1. Receives tool call parameters from the LLM
2. Validates that the requested tool is available
3. Configures advanced scraping options like proxy usage and CAPTCHA solving
4. Executes the web scraping operation
5. Returns the scraped content or handles any errors that occur

This function is crucial for enabling the LLM to access web content dynamically.

```python
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")

    try:
        if (
            tc.function.name
            != WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            raise ValueError(f"Tool not found: {tc.function.name}")

        args = json.loads(tc.function.arguments)
        print(args)

        content = await WebsiteScrapeTool.async_runnable(
            hb=hb,
            params=dict(
                **args,
                session_options={"use_proxy": True, "solve_captchas": True},
            ),
        )

        return {"role": "tool", "tool_call_id": tc.id, "content": content}

    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,  # type: ignore
        }
```
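
The argument handling inside the handler can be examined on its own, without calling Hyperbrowser. The sketch below is an illustration under stated assumptions: `build_scrape_params` and `DEFAULT_SESSION_OPTIONS` are hypothetical names, and the session option values are simply copied from the handler.

```python
import json
from typing import Any

# Session options copied from the handler; the constant name is hypothetical.
DEFAULT_SESSION_OPTIONS = {"use_proxy": True, "solve_captchas": True}


def build_scrape_params(raw_arguments: str) -> dict[str, Any]:
    """Parse a tool call's JSON argument string and attach session options."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict) or "url" not in args:
        raise ValueError("tool arguments must be a JSON object with a 'url' field")
    # Merge with a dict literal so our session options replace any that the
    # model supplied, instead of raising on a duplicate key.
    return {**args, "session_options": dict(DEFAULT_SESSION_OPTIONS)}
```

The design difference is small but worth knowing: `dict(**args, session_options=...)` raises a `TypeError` if the model's arguments already contain a `session_options` key, whereas the dict-literal merge silently overrides it.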

## Step 4: Create the agent loop

Now we implement the core agent loop that orchestrates the conversation between:
1. The user (who asks about product reviews)
2. The LLM (which analyzes the request and determines what information is needed)
3. Our tool (which fetches review content from websites)

This looping pattern lets the agent gather information iteratively: each pass sends the conversation to the model, runs any tool calls it requests, appends the results, and repeats until the model returns a final answer. The agent can therefore make several scraping requests, if necessary, before generating insights.

```python
async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await llm.chat.completions.create(
            messages=messages,
            model="gpt-4o",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )

        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)

        elif choice.finish_reason == "stop" and choice.message.content is not None:
            return choice.message.content

        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
```

## Step 5: Design the system prompt

The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the agent as an expert review analyzer that can:

1. Extract review content from product pages
2. Analyze overall sentiment and rating distribution
3. Identify common pros and cons mentioned by customers
4. Detect any issues with the company or service
5. Answer specific questions about the reviews

This structured approach ensures that the analysis is comprehensive and actionable.

```python
SYSTEM_PROMPT = """
You are an expert review analyzer. You have access to a 'scrape_webpage' tool which can be used to get markdown data from a webpage.

This is the link to the review page {link}. You are required to analyze the markdown content from the page, and provide a summary of the reviews. You will provide the following info:
1. The overall sentiment towards the product
2. The number of reviews
3. [Optional] The number of reviews with 1 star, 2 stars, 3 stars, 4 stars, 5 stars
4. The cons of the product
5. The pros of the product
6. Any issues with the company or service

If the user provides you with a question regarding the reviews, provide that information as well.

Provide the total info in markdown format.
""".strip()
```

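The `{link}` placeholder in the prompt is filled with `str.format`. The sketch below uses a shortened stand-in template (`TEMPLATE` and `render_prompt` are hypothetical names) to show one thing to watch for if you extend the prompt: any literal braces in the template text must be doubled.

```python
# Shortened stand-in for SYSTEM_PROMPT; only the {link} placeholder matters.
TEMPLATE = (
    "This is the link to the review page {link}. "
    "Literal braces look like {{this}}."
)


def render_prompt(link: str) -> str:
    """Fill the {link} placeholder; doubled braces become single braces."""
    return TEMPLATE.format(link=link)
```

Braces inside the substituted URL itself are safe, because `str.format` does not rescan the values it inserts.
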
## Step 6: Create a factory function for generating review analyzers

Now we'll create a factory function that generates a specialized review analyzer for any product page. This function:

1. Takes a URL to a review page as input
2. Ensures the URL has the proper format (adding https:// if needed)
3. Formats the system prompt with this URL
4. Returns a function that can answer questions about the reviews on that page

This approach makes our solution reusable for analyzing reviews of any product across different e-commerce platforms.

```python
from typing import Coroutine, Any, Callable


def make_review_analyzer(
    link_to_review: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Make sure the link carries a scheme; default to https:// when none is given.
    if not (
        link_to_review.startswith("http://") or link_to_review.startswith("https://")
    ):
        link_to_review = f"https://{link_to_review}"

    # Bake the review page URL into the system prompt.
    sysprompt = SYSTEM_PROMPT.format(
        link=link_to_review,
    )

    async def review_analyzer(question: str) -> str:
        messages: list[ChatCompletionMessageParam] = [
            {"role": "system", "content": sysprompt}
        ]

        # The question is optional; without one the agent returns the summary only.
        if question:
            messages.append({"role": "user", "content": question})

        return await agent_loop(messages)

    return review_analyzer
```
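
The URL check in the factory can also be written as a small standalone helper, which makes it easy to test. This is a sketch with a hypothetical name (`ensure_scheme`); unlike the factory, it also trims surrounding whitespace and compares the scheme case-insensitively, which are assumptions about what you might want rather than behavior of the original code.

```python
def ensure_scheme(link: str, default_scheme: str = "https") -> str:
    """Prefix `link` with a scheme unless it already has http:// or https://."""
    link = link.strip()
    # str.startswith accepts a tuple, so both schemes are checked in one call.
    if link.lower().startswith(("http://", "https://")):
        return link
    return f"{default_scheme}://{link}"
```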

## Step 7: Test the review analyzer

Let's test our agent by analyzing reviews for a MacBook Air on Best Buy. We'll ask a specific question about how to improve the product and what customers like most about it. This demonstrates the agent's ability to not just summarize reviews but also extract actionable insights.

```python
question = "How can I improve this product? and what do people like the most about it ?"
link = "https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A"
```

## Step 8: Run the analysis and display results

Now we'll create an instance of our review analyzer for the MacBook Air page, run the analysis with our specific question, and display the results in a nicely formatted markdown output. The agent will scrape the review page, analyze the content, and provide insights about potential improvements and customer preferences.

```python
review_analyzer = make_review_analyzer(link)
response = await review_analyzer(question)

if response is not None:
    display(Markdown(response))
else:
    print("Could not process response")
```

```
Handling tool call: scrape_webpage
{'url': 'https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A', 'scrape_options': {'include_tags': ['.review-section'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A', 'scrape_options': {'include_tags': ['.reviews-list', 'h1', 'h2', '.review', 'p', 'li'], 'exclude_tags': ['.header', 'nav', 'footer'], 'only_main_content': True}}
```


### Summary of Reviews for Apple MacBook Air 13-inch M2

1. **Overall Sentiment Towards the Product:**
   - The overall sentiment is overwhelmingly positive with a user rating of 4.9 out of 5 stars based on 936 reviews.

2. **Number of Reviews:**
   - There are a total of 936 reviews.

3. **Distribution of Reviews by Star Rating:** (Optional)
   - Most ratings are 5 stars, with a small number of 4-star reviews. There are hardly any 3-star or lower ratings mentioned in the content.

4. **Cons of the Product:**
   - Limited number of ports (only two Thunderbolt/USB-C ports).
   - Base storage of 256GB may be limiting for power users.
   - Midnight color variant tends to show fingerprints.
   - Some users had issues loading Gmail on the Google browser.

5. **Pros of the Product:**
   - **Performance:** The M2 chip provides exceptional speed and efficiency, handling demanding applications and multitasking seamlessly.
   - **Battery Life:** Exceptional battery life, easily lasting a full day even with heavy usage.
   - **Design:** Sleek, lightweight, compact, and aesthetically pleasing design. It's also described as having a premium build quality.
   - **Display:** Vibrant Retina display with excellent color accuracy and sharpness.
   - **Portability:** Quite compact and lightweight, making it perfect for travel.
   - **Other Features:** Comfortable keyboard and responsive trackpad, impressive speakers for its size.

6. **Any Issues with the Company or Service:**
   - No significant issues with the company or service were mentioned in the reviews. Some reviews praised Best Buy's service experience as hassle-free and satisfying.

The reviews highlight the MacBook Air 13-inch M2 as a powerful, efficient, and portable laptop, making it highly recommended for students, professionals, and anyone needing a reliable, everyday laptop. Possible improvements could include increasing the base storage capacity and adding more port options, but overall the product stands out positively due to its performance and battery life.
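
The run cell in Step 8 relies on the notebook's support for top-level `await`. In a plain Python script you would wrap the call in a coroutine and start it with `asyncio.run`. The sketch below shows the shape of that wrapper; `stub_analyzer` is a hypothetical stand-in for the function returned by `make_review_analyzer`, so the example runs without API keys or network access.

```python
import asyncio
from typing import Awaitable, Callable


async def main(analyzer: Callable[[str], Awaitable[str]], question: str) -> str:
    """Await the analyzer once and return its markdown answer."""
    return await analyzer(question)


async def stub_analyzer(question: str) -> str:
    # Hypothetical stand-in for make_review_analyzer(link); no API calls.
    return f"(stub) analysis for: {question}"


if __name__ == "__main__":
    # asyncio.run creates the event loop that the notebook normally provides.
    print(asyncio.run(main(stub_analyzer, "What do people like the most?")))
```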

## Future Explorations

There are many exciting ways to extend and enhance this review analyzer. Here are some possibilities for developers and users to explore:

### Advanced Analysis Features
- **Demographic Segmentation**: Identify if different user groups have different experiences.
- **Comparative Analysis**: Compare reviews across multiple products.
- **Interactive Dashboards**: Build visualization dashboards for review insights.
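
As a starting point for comparative analysis, several analyzers could be queried concurrently with the same question. This is only a sketch: `compare_products` and `make_stub` are hypothetical, and the stubs stand in for analyzers built with `make_review_analyzer` so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable

Analyzer = Callable[[str], Awaitable[str]]


async def compare_products(
    analyzers: dict[str, Analyzer], question: str
) -> dict[str, str]:
    """Run every analyzer concurrently and map product name to its answer."""
    names = list(analyzers)
    # asyncio.gather returns results in the order the awaitables were passed.
    answers = await asyncio.gather(*(analyzers[name](question) for name in names))
    return dict(zip(names, answers))


def make_stub(label: str) -> Analyzer:
    # Hypothetical stand-in for make_review_analyzer(url); no network access.
    async def stub(question: str) -> str:
        return f"{label}: {question}"

    return stub
```

The per-product answers could then be passed to a final LLM call that writes the side-by-side comparison.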

### Technical Enhancements
- **Multi-platform Integration**: Analyze reviews from multiple sources.
- **Real-time Monitoring**: Continuously monitor new reviews and alert on significant deviations.
- **Automatic Customer Support**: A review analysis agent could help customers with common issues they may face, improving the sentiment towards the product.

Implementing all of these features, or even a few of them, could turn the review analyzer from a useful tool into a comprehensive customer-intelligence agent. Treat them as starting points for where such an agent could go next.

## Conclusion

In this cookbook, we built a powerful review analyzer using Hyperbrowser and GPT-4o. This agent can:

1. Automatically extract review content from any product page
2. Analyze sentiment and identify common themes
3. Summarize pros, cons, and customer experiences
4. Answer specific questions about customer feedback
5. Provide actionable insights for product improvement

This pattern can be extended to create more sophisticated review analysis tools, such as:
- Competitive analysis by comparing reviews across similar products
- Trend analysis by tracking sentiment changes over time
- Feature prioritization based on customer feedback
- Automated customer support response generation

Happy analyzing! 📊

## Relevant Links
- [Hyperbrowser](https://hyperbrowser.ai)
- [OpenAI Docs](https://platform.openai.com/docs/introduction)