# Intro
##Summarizing Yelp Reviews with Google Gemini AI (Summarization Notebook)
                                        By MATTA (Matteo Meier, Adrian Sanchez, Timothy Phan, Ty Hershberger, Aaron Tu)

## Project Goal

The aim of this project is to transform raw Yelp data from a multitude of files into an organized, streamlined table via data wrangling and thorough analysis, which can be summarized by AI to provide key insights to a user. In this notebook, we focus on building foundational datasets and tables through data wrangling and strategic merging to support meaningful summarization of large quantities of data. This involves manually structuring multiple tables, along with loading, cleaning, and organizing data from various sources, including restaurant categories, user reviews, and metro area information. We utilize the Google Gemini API to aid in summarization, tailoring its parameters to take into consideration the pros and cons listed by Yelp reviewers  to create summaries that are digestible and engaging the reader. By refining and structuring Yelp’s data, we generate authentic and truthful insights to streamline the process of sorting through reviews and get the user to the dining experience faster. Our project aims to make the process of sorting through restaurant recommendations efficient, informative, and impactful for users. 

This notebook will focus on the final two stages of our product, the summarization using an LLM and the identifications of our project limitations.


## Initial Imports
Before we start -- run these first. These are all of the modules we need associated with Gemini, Langchain, and Pydantic.

In [0]:
pip install -Uq google-generativeai

Python interpreter will be restarted.
Python interpreter will be restarted.


In [0]:
dbutils.library.restartPython()

In [0]:
pip install -q langchain

Python interpreter will be restarted.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scipy 1.7.3 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.26.4 which is incompatible.
Python interpreter will be restarted.


In [0]:
pip install -q langchain-google-genai

Python interpreter will be restarted.
Python interpreter will be restarted.


In [0]:
from pydantic import BaseModel, Field
from typing import List
from pydantic.dataclasses import *
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.runnables import RunnableLambda
from langchain_core.runnables.base import RunnableEach
from langchain_core.prompts import PromptTemplate
import json
import pprint
from IPython.display import display as python_display
from IPython.display import Markdown

In [0]:
from IPython.display import display as python_display
from IPython.display import Markdown

In [0]:
from langchain_google_genai import ChatGoogleGenerativeAI
import os

# Summarization

## Creating the Model

Our LLM for this project will be Google Gemini. We will be aliasing this as `model` for simplicity's sake.


In [0]:

model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

## Creating the Functions

In order to create our chain, we will be needing a series of functions, and those functions will require setup and explanation as well. Below you will find the functions we used and their explanations.

In [0]:
#this function rebuilds our final table from the wrangling notebook from parquet files to be called at will
def rebuild_tables(tablename):
  if spark.catalog._jcatalog.tableExists(tablename):
    print(f'{tablename} already loaded in memory')
  else:
    spark.sql(f"""
      CREATE TABLE {tablename}
      USING PARQUET 
      LOCATION '/user/hive/warehouse/{tablename}'
      """)
    print(f'{tablename} rebuilt from parquet files.')



In [0]:
rebuild_tables('restaurant_reviews_for_summarization_table')

restaurant_reviews_for_summarization_table rebuilt from parquet files.


In [0]:
spark.sql('''
DESCRIBE TABLE restaurant_reviews_for_summarization_table
''').show()

+--------------------+-------------+-------+
|            col_name|    data_type|comment|
+--------------------+-------------+-------+
|         business_id|       string|   null|
|     restaurant_name|       string|   null|
|          metro_area|       string|   null|
|               stars|       double|   null|
|          categories|array<string>|   null|
|             user_id|       string|   null|
|                text|       string|   null|
|                date|       string|   null|
|restaurant_review...|       bigint|   null|
|        review_stars|       double|   null|
|           user_name|       string|   null|
|       average_stars|       double|   null|
|   user_review_count|       bigint|   null|
|    compliment_count|       bigint|   null|
+--------------------+-------------+-------+



### Determining the Criteria for the Review-Sorting Function
We will need a function for sorting our reviews into positive our negative sentiments so that our LLM can identify the themes of the reviews and generate a summary based on them. Below we cover the different criteria we used for creating the function that serves that need.

#### Positive or Negative

In order to sort the reviews into either positive or negative for our summary, it is important to have a clear distinction of what qualifies as either sentiment. In the data wrangling notebook, we found that for all major metro areas, the distribution of reviews was top heavy, skewing heavily to 3, 4, and 5-star reviews, with very few 1 or 2-star reviews. This is likely due to the fact that unless a restaurant is outright horrible, most reviewers will be at least middling in their take. But middling is middling, not good, which is why our first criteria for whether a review is positive is whether it is greater than 3 stars. 


Within the world of Yelp, however, not all reviewers review equally. Many times a consumer will not find it worth reviewing a restaurant if the food is good, because good quality is to be expected, and bad quality is the exception. This means it is far more likely for a user to go out of their way to write a review if their experience was negative, to serve as a warning for others to avoid the establishment. This type of user typically has a low average star rating, and so if a user who primarily writes negative reviews has reviewed a place greater than their average, then it's fair to say that for them, it was impressive. However, it cannot only be 1 point greater than average. If a negative-skewing user has an average of 1 stars, than a 3 star review can be considered good to them, and so we will also allow 3 starred reviews so long as they are greater than 2 points above a user's average stars, although this will only help in few edge cases. We found that in some of the reviews meeting this criteria did have good things to say, and so they should be considered for positive reviews.

In [0]:
spark.sql(f'''
  SELECT text, restaurant_name, review_stars, restaurant_review_count, 
         average_stars, date, compliment_count
  FROM restaurant_reviews_for_summarization_table
  WHERE (review_stars > 3 OR review_stars >  (average_stars + 1))
  GROUP BY review_stars, date, text, restaurant_name, 
           restaurant_review_count, average_stars, compliment_count
  ORDER BY review_stars

''').show(vertical=True, truncate=False)

-RECORD 0-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#### Relevancy

People change, and so do their opinions about restaurants. In the early 2000's, Dominos Pizza was thought of as low quality, but after a change in their recipe in 2010, the pizza chain saw a great uptick in positive sentiment regarding their product, and they kept pace with technological advancement with their famous pizza tracking technology. If we were to include reviews from both eras of Dominos, we would end up with a confusing summary that doesn't reflect the true sentiment of a restaurant. Therefore we decided to limit our data by date.

Using the below queries we found that the most recent review for all of the data was very early on in 2022, and the oldest reviews go all the way back to 2005. Therefore, we decided that we should include 2022 along with the previous two years in order for our summarization to reflect the most recent states of the restaurants being summarized. The majority of the reviews were in the late 2010s, however, after the massive societal upheaval that was COVID, especially in the restaurant industry, we decided 2020 to be the most relevant early cutoff point for our summarization, even though it contained less reviews than the previous years. 2-3 years is more than enough time for sentiment to change in response to criticism, and we want our readers to have the clearest picture of a consensus based on the current opinions of the reviewers, in addition to the overall state of society.

In [0]:
year_distribution = spark.sql(f'''
  SELECT YEAR(date) AS year, COUNT(*) AS entry_count
  FROM restaurant_reviews_for_summarization_table
  WHERE YEAR(date) BETWEEN 2010 AND 2022
  GROUP BY YEAR(date)
  ORDER BY year
''')

display(year_distribution)


year,entry_count
2010,59857
2011,98779
2012,123225
2013,167108
2014,240829
2015,335344
2016,380065
2017,427915
2018,497362
2019,520380


Databricks visualization. Run in Databricks to view.

In [0]:
#This query shows the earliest reviews.
spark.sql(f'''
  SELECT date
  FROM restaurant_reviews_for_summarization_table
  GROUP BY YEAR(date), date
  ORDER BY date
''').show(vertical=True)

-RECORD 0-------------------
 date | 2005-02-16 04:06:26 
-RECORD 1-------------------
 date | 2005-03-01 16:57:17 
-RECORD 2-------------------
 date | 2005-03-01 16:59:37 
-RECORD 3-------------------
 date | 2005-03-01 17:25:13 
-RECORD 4-------------------
 date | 2005-03-01 17:59:26 
-RECORD 5-------------------
 date | 2005-03-01 19:33:35 
-RECORD 6-------------------
 date | 2005-03-02 22:13:59 
-RECORD 7-------------------
 date | 2005-03-04 02:06:36 
-RECORD 8-------------------
 date | 2005-03-04 02:40:25 
-RECORD 9-------------------
 date | 2005-03-09 06:37:47 
-RECORD 10------------------
 date | 2005-03-09 07:11:28 
-RECORD 11------------------
 date | 2005-03-09 07:14:58 
-RECORD 12------------------
 date | 2005-03-09 07:18:21 
-RECORD 13------------------
 date | 2005-03-09 07:20:08 
-RECORD 14------------------
 date | 2005-03-09 07:23:26 
-RECORD 15------------------
 date | 2005-03-10 05:39:45 
-RECORD 16------------------
 date | 2005-03-10 05:53:34 
-RECORD 17----

In [0]:
#This query shows the latest reviews.
spark.sql(f'''
  SELECT date
  FROM restaurant_reviews_for_summarization_table
  GROUP BY YEAR(date), date
  ORDER BY date DESC
''').show(vertical=True)

-RECORD 0-------------------
 date | 2022-01-19 19:48:25 
-RECORD 1-------------------
 date | 2022-01-19 19:48:13 
-RECORD 2-------------------
 date | 2022-01-19 19:47:59 
-RECORD 3-------------------
 date | 2022-01-19 19:46:34 
-RECORD 4-------------------
 date | 2022-01-19 19:45:56 
-RECORD 5-------------------
 date | 2022-01-19 19:45:43 
-RECORD 6-------------------
 date | 2022-01-19 19:44:03 
-RECORD 7-------------------
 date | 2022-01-19 19:42:22 
-RECORD 8-------------------
 date | 2022-01-19 19:39:46 
-RECORD 9-------------------
 date | 2022-01-19 19:35:57 
-RECORD 10------------------
 date | 2022-01-19 19:31:50 
-RECORD 11------------------
 date | 2022-01-19 19:30:14 
-RECORD 12------------------
 date | 2022-01-19 19:29:56 
-RECORD 13------------------
 date | 2022-01-19 19:29:13 
-RECORD 14------------------
 date | 2022-01-19 19:28:27 
-RECORD 15------------------
 date | 2022-01-19 19:28:07 
-RECORD 16------------------
 date | 2022-01-19 19:27:53 
-RECORD 17----

#### Credibility

In order to make sure the LLM is using the best material for it's summarization, we decided to create our own measure of credibility. Our decision was to make an aggregate field based off of the total number of compliments a Yelp user has received on their profile. A Yelp user can receive compliments for a variety of reasons, such as being cool, funny, a good writer, or including a lot of photos on their reviews. Regardless, a compliment is a compliment, and we have tallied the compliments a user received as a compliment count, and we will be using that as our measure for credibility, at least in positive reviews, by ordering the reviews in a way to where the most complimented reviewers are at the top of the list, and then later instructing the LLM to weigh the lists opinions with the top having more sway.

However, in the case of the above mentioned negative users, there are a few cases where this measure might be counterintuitive. A user might decide to leave a single negative review, signifying a uniquely bad experience, and never use Yelp again. If an experience was so bad it inspired this type of review, it should not be suppressed. Therefore, we will not be using the weighing system with the compliment count on our negative review collection.

In [0]:
spark.sql(f'''
  SELECT text, restaurant_name, review_stars, restaurant_review_count, 
         average_stars, date, compliment_count
  FROM restaurant_reviews_for_summarization_table
  WHERE business_id = 'YNgX5_SYHCXSoL9IMdVboA'
    AND (review_stars > 3 OR (review_stars >= average_stars)) 
    AND (YEAR(date) == 2021 OR YEAR(date) == 2020 OR YEAR(date) == 2022)
  GROUP BY YEAR(date), date, text, restaurant_name, review_stars, 
           restaurant_review_count, average_stars, compliment_count
  ORDER BY YEAR(date) DESC, compliment_count DESC
''').show(vertical=True)

-RECORD 0---------------------------------------
 text                    | Its been a long t... 
 restaurant_name         | Nu Yalk Pizza        
 review_stars            | 4.0                  
 restaurant_review_count | 448                  
 average_stars           | 3.55                 
 date                    | 2021-06-11 17:39:07  
 compliment_count        | 58                   
-RECORD 1---------------------------------------
 text                    | Whoa! So we've be... 
 restaurant_name         | Nu Yalk Pizza        
 review_stars            | 5.0                  
 restaurant_review_count | 448                  
 average_stars           | 3.78                 
 date                    | 2021-01-23 16:19:06  
 compliment_count        | 43                   
-RECORD 2---------------------------------------
 text                    | Ordered a large p... 
 restaurant_name         | Nu Yalk Pizza        
 review_stars            | 5.0                  
 restaurant_review_c

In [0]:
spark.sql(f'''
  SELECT text, restaurant_name, review_stars, restaurant_review_count, average_stars, date, compliment_count
  FROM restaurant_reviews_for_summarization_table
  WHERE business_id = 'YNgX5_SYHCXSoL9IMdVboA'
  AND review_stars > 3 OR review_stars >= average_stars
              ''').show(vertical=True)

-RECORD 0---------------------------------------
 text                    | After eating at B... 
 restaurant_name         | Mr. B's Bistro       
 review_stars            | 5.0                  
 restaurant_review_count | 2064                 
 average_stars           | 4.42                 
 date                    | 2020-01-19 01:30:05  
 compliment_count        | 13                   
-RECORD 1---------------------------------------
 text                    | Real treasure! To... 
 restaurant_name         | Moxie's Spirits &... 
 review_stars            | 5.0                  
 restaurant_review_count | 72                   
 average_stars           | 3.86                 
 date                    | 2021-08-14 01:28:12  
 compliment_count        | 0                    
-RECORD 2---------------------------------------
 text                    | Great pizza,lots ... 
 restaurant_name         | Peavine Taphouse ... 
 review_stars            | 5.0                  
 restaurant_review_c

#### Amount of Reviews
Normally, it shouldn't be too relevant how many reviews a user has made on Yelp, and we even went over a case on how a single bad review can be more telling than many good reviews. However, a single good review may be cause for suspicion. Some businesses employ a practice to incentivize a user to leave a good review for a business with the promise of discounts, benefits, or in my case, I have been given a Starbucks gift card. The users who leave a review in this case will generally be positive, because they view it as a quid pro quo for a reward. They may not use Yelp again. So we can safely devalue their opinion for integrity's sake.


There is also the issue of botted reviews in our digital age. If a business has a lot of 5-star reviews which are the only reviews on that user's account, it may be cause for suspicion. We already are using AI for this project, so it's best to eliminate suspected silicon life from other aspects of our process. 


So for positive reviews, we will be filtering out those who leave a positive review with only 1 review on their account.

In [0]:
spark.sql(f'''
  SELECT text, restaurant_name, review_stars, restaurant_review_count, 
         average_stars, date, compliment_count, user_review_count
  FROM restaurant_reviews_for_summarization_table
  WHERE business_id = 'YNgX5_SYHCXSoL9IMdVboA'
    AND (review_stars > 3 OR (review_stars >= average_stars)) 
    AND (YEAR(date) == 2021 OR YEAR(date) == 2020 OR YEAR(date) == 2022)
  GROUP BY YEAR(date), date, text, restaurant_name, review_stars, 
           restaurant_review_count, average_stars, compliment_count, user_review_count
  ORDER BY user_review_count
''').show(3, vertical=True)

-RECORD 0---------------------------------------
 text                    | Extremely unprofe... 
 restaurant_name         | Nu Yalk Pizza        
 review_stars            | 1.0                  
 restaurant_review_count | 448                  
 average_stars           | 1.0                  
 date                    | 2021-02-08 05:17:07  
 compliment_count        | 0                    
 user_review_count       | 1                    
-RECORD 1---------------------------------------
 text                    | The size of this ... 
 restaurant_name         | Nu Yalk Pizza        
 review_stars            | 4.0                  
 restaurant_review_count | 448                  
 average_stars           | 3.5                  
 date                    | 2021-11-05 03:34:33  
 compliment_count        | 6                    
 user_review_count       | 4                    
-RECORD 2---------------------------------------
 text                    | The second worst ... 
 restaurant_name    

The last consideration we have to make is for establishments which have an insufficient amount of reviews to generate a summary. We have decided that if a restaurants has less than three reviews, it should not generate a summary, both because it would be hard to identify general trends in a establishment with such paltry data, and because it would simply be more intuitive tfor the user to read all of the reviews. In this case, it would realistic to expect a user to read every review and make a decision based on their own judgment if they only had to read two. Three is also the earliest number it would take to break a tie if there is a split consensus. 

However, after looking at the data, within the restaurants we selected through wrangling, there are none with less than 5, only less than 6. We will still include the filter for less than 3 as a contingency, should a business meeting the criteria be added to the database. In today's digitial media environment, tweets can receive comments of too long; didn't read. With all the vying sources of attention, it's unrealistic to expect a hungry Yelp user to truly sit through reading all five reviews when they can get a summary elsewhere.

In [0]:
spark.sql('''
SELECT restaurant_review_count
FROM restaurant_reviews_for_summarization_table
WHERE restaurant_review_count < 5
''').show()
spark.sql('''
SELECT restaurant_review_count
FROM restaurant_reviews_for_summarization_table
WHERE restaurant_review_count < 6
''').show(5)

+-----------------------+
|restaurant_review_count|
+-----------------------+
+-----------------------+

+-----------------------+
|restaurant_review_count|
+-----------------------+
|                      5|
|                      5|
|                      5|
|                      5|
|                      5|
+-----------------------+
only showing top 5 rows



In [0]:
def get_positive_negative_reviews_together(inputs):
    if not inputs or len(inputs) < 1 or not inputs[0]:
        raise ValueError("Input list is empty or input1 is not provided.")
    
    input1 = inputs[0]
    # this checks whether of not the business is located in our table. If not, it was filtered out in the wrangling stage
    business_check_query = f"""
    SELECT COUNT(*) AS count
    FROM restaurant_reviews_for_summarization_table
    WHERE business_id = '{input1}'
    """
    business_check_df = spark.sql(business_check_query)
    business_count = business_check_df.collect()[0][0]  # Get the count of rows (if any)

    if business_count == 0:
        print(f"Business ID '{input1}' not found in the Restaurants Table. Business ID must be in the restaurants table to qualify.")
        pos_review_list = ['No reviews available']
        neg_review_list = ['No reviews available']
        pos_review_dict = {'sentiment': 'not a restaurant', 'review_list': pos_review_list}
        neg_review_dict = {'sentiment': 'not a restaurant', 'review_list': neg_review_list}
        return pos_review_dict, neg_review_dict


    # Review collector: Positive reviews
    query_pos = f"""
    SELECT text, restaurant_name, review_stars, restaurant_review_count, 
        average_stars, date, compliment_count, user_review_count
    FROM restaurant_reviews_for_summarization_table
    WHERE business_id = '{input1}'
    AND (review_stars > 3 OR (review_stars >= average_stars)) 
    AND YEAR(date) IN (2020, 2021, 2022)
    AND user_review_count > 1
    ORDER BY YEAR(date) DESC, compliment_count DESC
    """
    
    df_restaurant_data_pos = spark.sql(query_pos)
    pos_reviews = df_restaurant_data_pos.collect()

    pos_review_list = [row.text for row in pos_reviews]

    # Attaches the restaurant name for LLM to identify
    restaurant_name = [row.restaurant_name for row in pos_reviews]
    pos_review_list.insert(0, restaurant_name)

    # Review collector: Negative reviews
    query_neg = f"""
    SELECT text, restaurant_name, review_stars, restaurant_review_count, 
        average_stars, date, compliment_count
    FROM restaurant_reviews_for_summarization_table
    WHERE business_id = '{input1}'
        AND review_stars <= 3
        AND YEAR(date) IN (2020, 2021, 2022)
    ORDER BY YEAR(date) DESC
    """

    df_restaurant_data_neg = spark.sql(query_neg)
    neg_reviews = df_restaurant_data_neg.collect()

    neg_review_list = [row.text for row in neg_reviews]
   
    pos_review_dict = {'sentiment': 'positive', 'review_list': pos_review_list}
    neg_review_dict = {'sentiment': 'negative', 'review_list': neg_review_list}

    return pos_review_dict, neg_review_dict


The function creates the output below, two dictionaries which are sorted by positive and negative, and all of the reviews which were determined to fit into those categories. This is what we will be feeding into our first prompt.

In [0]:
get_positive_negative_reviews_together(['Hp3Ony7yW60VPuWHQFIIHA'])

Business ID 'Hp3Ony7yW60VPuWHQFIIHA' not found in the Restaurants Table. Business ID must be in the restaurants table to qualify.
Out[17]: ({'sentiment': 'not a restaurant', 'review_list': ['No reviews available']},
 {'sentiment': 'not a restaurant', 'review_list': ['No reviews available']})

In [0]:
get_positive_negative_reviews_together(['YNgX5_SYHCXSoL9IMdVboA'])

Out[18]: ({'sentiment': 'positive',
  'review_list': [['Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza',
    'Nu Yalk Pizza'],
   'Its been a long time since ive been here, and seeing as i have never reviewed them i felt i needed to do that. I ordered 2 pepperoni pizza slices and an ice tea for pick up. I arrived 10 minutes after ordering and when i arrived, the really nice gal behind the counter informed me that it was coming out soon. I  soon received a medium sized box and my dri

#### Dictionary Converter Function

The next function we need is something that converts the output from one part of our chain into two dictionaries so that it can be read by the next part.

In [0]:
def get_dicts(output):
    """
    Extracts the positive and negative sentiment blocks from the given output.
    Returns a dictionary with keys 'positive' and 'negative' for further processing.
    """
    # Ensure output is in the expected format
    if not isinstance(output, list) or len(output) != 2:
        raise ValueError("Expected a list of two dictionaries for positive and negative sentiments.")

    # Extract positive and negative blocks
    positive = next((item for item in output if item["sentiment"] == "positive"), None)
    negative = next((item for item in output if item["sentiment"] == "negative"), None)

    # Validate that both blocks exist
    if not positive or not negative:
        raise ValueError("Output must contain both 'positive' and 'negative' sentiment blocks.")

    # Return the structured dictionary
    return {"positive": positive, "negative": negative}

## Building the Chain for Summarizations

### Prompt Construction

#### First Prompt: Identifying the Sentiment and Themes

We want to identify common themes in our reviews before passing them over for summarization. This prompt allows us to take the result of our function, the positive and negative reviews for a restaurant, and identify what reviewers are saying. We want to identify the broad themes that keep occuring across many restaurants and synthesize them into universal categories that can be judged as positive or negative, so that our AI can pass judgement on whether the positives outweigh the negatives within that category, and ultimately overall, for the good of the user's Yelp experience. We have used PyDantic in order to create the format for our json output, giving us a way of effieciently creating new fields for our generation. It follows this schema:

`[
  {
    "restaurant_name": positive ,
    "sentiment": ,
    "categories":
  },
   {
    "restaurant_name": negative,
    "sentiment": ,
    "categories":
  }
]`

Within the instructions, we make sure to alert the LLM to the criteria we determined above so it can consider it during it's processing, and the user input will be the list of reviews that we collected with our function.

As for the model, we have altered the temperature and max output tokens. The temperature is a measure of response variability, with 0 being consistent in it's response from input to input, and 1 being wildly variable and creative. We set ours to 0.3, as we want the AI to have some variation on response, but that should be kept to an appropriate minimum, and the most variation should be based on the inputs, not the AI. If we give it too much liberty, it might end up changing it's mind on the same data if consulted multiple times, which would end up lowering a user's trust on the credibility of the summary if it keeps flip-flopping. As for the tokens, we found that the default amount only allowed for a short sentence for each of the fields. By increasing the tokens to 1500, we are able to generate more a few sentences per point. We don't want to clog our summaries with walls of text for the sake of our readers, so 1500 is the limit. Our summaries will still read as concise, but not overly-scarce in information.

In [0]:
from langchain_core.prompts import PromptTemplate


class ReviewTheme(BaseModel):
  """theme found in multiple reviews."""
  theme: str = Field(description="name of the theme identified")
  description: str = Field(description="provide a description of the theme")
  reviews: int = Field(description="Provide the number of reviews the theme was found in")

class ReviewCategory(BaseModel):
    """Broad category encompassing multiple review themes."""
    category: str = Field(description="list the broad category under which the themes identified fall under")
    themes: List[ReviewTheme] = Field(description="theme found in multiple reviews.")

class SentimentCategories(BaseModel):
    """List of ReviewCategory instances for positive or negative reviews."""
    restaurant_name: str = Field(description="The name of the restaurant as detected from the beginning of the positive reviews list.")
    sentiment: str = Field(desc="can be positive, negative, or not a restaurant")
    categories: List[ReviewCategory] = Field(description="Broad category encompassing multiple review themes.")




parser = JsonOutputParser(pydantic_object=SentimentCategories)

format_instructions = parser.get_format_instructions



template = """You are a data analyst summarizing the main themes in a list of {sentiment} reviews for a single restaurant. If the sentiment is 'not a restaurant', ignore all instructions and output a message notifying that the selected establishment was determined not to be a restaurant. The order in which you receive reviews will be ordered by reviewer credibility if the reviews are positive, so weigh their opinions more when summarizing without completely disregarding the lower reviews. At the beginning of the list of positive reviews, you will receive the restaurant name, which should be identified as such. Then, identify the main themes of {sentiment} aspects of the restaurant that are discussed in multiple reviews. Next, summarize each theme as a description using a detailed multiple paragraph format. Next, group common themes into broader categories. Then, as output provide the sentiment as to whether the reviews are negative or positive, a bullet point list of the categories, the theme descriptions within each category in a detailed paragraph format, and a count of the number of reviews that were {sentiment} about that theme. If the overall sentiment is positive, the order of the reviews received will be in order of credibility, so you should weigh the review's opinions in that order. This does not apply if the sentiment is negative. Additionally, if a review is included with the positives, but has negative aspects, it is likely an inclusion based on the score in relation to the average of that user's reviews. You should focus on the positive aspects of that review in that case."

{format_instructions}

% USER INPUT:
{review_list}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(
    input_variables=["sentiment", "review_list"],
    partial_variables={"format_instructions": format_instructions},
    template=template,
    temperature=0.3,
    max_output_tokens=1500
)


#### Second Prompt: Deciding the Consensus and Summarizing Based on the Categories

For our second prompt, it's time for our LLM to make decision based on the data it has received. By taking the output of the previous prompt, it will weigh based on these five categories:

- Food Quality
- Service
- Atmosphere
- Pricing
- Cleanliness
- Restaurant-Specific Quirks

The first four categories are what we identified as categories that could apply to any food-serving establishment, and would be the most important things that could vary in between reviews that would affect peoples opinions. Food Quality speaks for itself, the restaurant's primary purpose is to serve food, so it's quality will be the most consistent thing in a review.

Service is also an aspect that every establishment will offer, whether as a sit down restaurant or a drive-thru, the response people get from the employees working the job will be the at least the secondmost thing on every reviewer's mind.

Atmosphere is less important in some cases, but we also decided that it could apply to every restaurant. Even in drive-thru restaurants, you get a different sense of atmosphere from say, McDonalds to Chick-Fil-A. One gives off a vibe of laziness a lot of times, think 'broken icecream machine", while the other resembles a well-oiled military operation being executed with machine precision.

All restaurants exchange goods for services, and so pricing is another universal metric through which we can consistently collect reviews mentioning it. Whether or not a restaurants is overpriced or good value for your money will likely influence reviews.

Finally, with Cleanliness, we choose to include this because the general hygiene of a restaurant can swing a review quite far, even if the baseline people expect and find at the majority of establishments is good, by regulation. You can have a great tasting dinner with prompt service and a comfy atmosphere, but if a cockroach crawls out of your dinner, suddenly cleanliness of the establishment has rocketed to the top of your review points. Hygiene is a practice that every establishment will have to practice to varying degrees of success, so this earns its spot as a universal category.

In addition to the above five universal categories, we have decided to include restaurant-specific quirks as it's own section as an efficient way of grouping all other themes that don't fit into the categories we decided on, which will appear as niche themes based on what restaurant is being analyzed by the AI. This allows for us to catch any key insights by the reviewers that don't fall into the above categories.

In [0]:

class ReviewTheme(BaseModel):
  """theme found in multiple reviews."""
  theme: str = Field(description="list the title of the identified theme. Use 'Counterpoint:' if the theme contrasts with the overall sentiment, whether positive or negative, but only if there are agreeing points to counter. Skip 'Counterpoint:'' if all points are contrary to the overall sentiment.")
  description: str = Field(description="provide a description of the theme. if contrary to the overall sentiment, do not contradict the sentiment in your description")
  reviews: int = Field(description="Provide the number of reviews ")

class ReviewCategory(BaseModel):
    """Broad category encompassing multiple review themes."""
    category: str = Field(description="list the broad category under which the themes identified fall under")
    themes: List[ReviewTheme] = Field(description="theme found in multiple reviews.")


class SentimentCategories(BaseModel):
    """Structured output for restaurant reviews."""
    restaurant_name: str = Field(description="The name of the restaurant as detected from the previous input.")
    total_reviews: int = Field(description="the total amount of reviews analyzed")
    sentiment: str = Field(description="The overall sentiment: either 'positive' or 'negative'.")
    categories: List[ReviewCategory] = Field(description="""A structured list of review categories, always including:
        - Food Quality
        - Service
        - Atmosphere
        - Pricing
        - Cleanliness
        - Restaurant-Specific Quirks.
    """)
    summary: str = Field(description="The overall summary of the restaurant review analysis.")

    

parser = JsonOutputParser(pydantic_object=SentimentCategories)

sum_format_instructions = parser.get_format_instructions



#### Instructions for the LLM
Now that we have defined the format, we can give the LLM specific directions.

We are primarily asking the LLM to come to a decision on the sentiment based on the previous output, while following the format instructions. However, there are additional instructions that we have decided to give the LLM. We want the LLM to deliver a message if the business selected by the user is not a restaurant according to our criteria decided upon in the wrangling notebook. We allow for flexibility on the category names, for example, instead of listing "Food Quality" as a category, if the LLM is analyzing a pizza restaurant, it can name the category "Pizza Quality", to make the summary more contextual. We have, however, set the order of listing the categories in stone. We have also asked the LLM to omit the category sections if the reviews do not mention any of the universal categories. 

While we agree that every business will have these five aspects, that is not a guarantee that the reviews for that restaurant will contain mention of those themes. This signifies either that it wasn't worth commenting on either way, or that the other aspects heavily outweighed those omitted in the minds of reviewers that they forgot to mention them, being unremarkable. It is also key for the LLM to omit any mention of specific workers in our summary to avoid targetted harassment, even if the Yelp reviewer includes them in their review. Finally, we have instructed the LLM to leave it's own views out of the summary. Despite using AI, we want to deliver as human a product as possible by preserving the thoughts and opinions of the Yelp reviewers in our summaries. It is important to differentiate ourselves from just asking an LLM to do the thinking for us.

We also realize the potential for mixed reviews. If there are positive aspects in an otherwise negative consensus, we still want those to be listed, but we want them to be listed after the negative points. The reason this is is because if the AI has made a decision on a consensus, we don't want confused messaging by immediately following up with a contradictory statement to the main sentiment. The counterargument doesn't generally come first in a debate.

Our adjustments to the temperature and max tokens are the same as above, but our user input has changed to the positive and negative themes collected by the previous prompt. These will be the variables on which our AI makes its ultimate judgement of positive vs. negative.

In [0]:
sum_template = """You are a data analyst receiving a list of the main themes in a list of positive and negative reviews for a single restaurant. 

If the input notifies you that the selected establishment is not a restaurant, print a message that notifies the user that the business id they have selected for analysis is not a restaurant and ignore all of the directions I have listed.

You will be receiving a list of positive and negative categories and common themes within those categories. If a restaurant has more of a reviews of a certain sentiment, the sentiment should be positive or negative accordingly, although if the split is very close, the sentiment can be 'mixed'.
The categories you will be looking at are these five broad ones: Food Quality, Service, Atmosphere, Pricing, Cleanliness, and an additional category for unique quirks specific to the restaurant. Do not deviate from or combine these categories. The categories should always be listed in this order within the summary, with restaurant specific things at the end. The names can change slightly based on the context, but fit the same theme. If a restaurant had more reviews of a certain sentiment within a category, reflect that in your summary of that category and it's themes. For themes that are contrary to the overall sentiment, whether positive or negative make sure they are listed last in your summary of that category. In your description of these counterarguments, do not use words that contradict the overall sentiment, or your previous descriptions of the prevailing sentiment. Emphasize that these are counterpoints, but not the prevailing sentiment.

If there are no reviews that fit within a certain category, do not output that category. Avoid mentioning the names of employees in your summary. Your analysis should contain only views and opinions of the reviewers, none of your personal opinions as an LLM should be taken into consideration. After summarizing the themes, print an overall summary of the consensus of the restarant, written in a concise and formal style. Detect the name of the restaurant through the context of the reviews. Finally, collect a total count of every review you analyzed.

{format_instructions}

% USER INPUT:
{positive}, {negative}

YOUR RESPONSE:
"""
sum_prompt_template = PromptTemplate(
    input_variables=["positive", "negative"],
    partial_variables={"format_instructions": sum_format_instructions},
    template=sum_template,
    temperature=0.3,
    max_output_tokens=1500
)

### Chain Construction

#### The First Chain
The first chain utilizes a business ID and a list of reviews to extract sentiment and emotion-related information, invoking a function that fetches both positive and negative reviews.

This first chain starts by turning our function to collect the positive and negative restaurants together into a runnable, `runnable_restaurants`, using the RunnableLamba function of langchain, which runs the function after it is invoked later in the chain. Then we construct a 3-part chain utilizing the earlier `prompt_template` to establish the input structure for the Google Gemini model, which is signified by `model`, and then passed to the parser, `parser`. The RunnableLambda is wrapped in a runnable each to ensure that each dictionary returned by the function is ran through our three-part chain, and finally we invoke a business ID through the RunnableEach function, and then dump the output as json.

This step returns raw, detailed insights on both positive and negative themes identified by the LLM in a semi-readable format.


In [0]:
runnable_restaurants = RunnableLambda(get_positive_negative_reviews_together)

parser = JsonOutputParser()

chain = prompt_template | model | parser

runnable_each = runnable_restaurants | RunnableEach(bound=chain)

output = runnable_each.invoke(['mNw3UU6PPUAeS31VKgM-qw'])
print(json.dumps(output, indent=2))


[
  {
    "restaurant_name": "Claim Jumper Steakhouse & Bar",
    "sentiment": "positive",
    "categories": [
      {
        "category": "Food Quality",
        "themes": [
          {
            "theme": "Large Portions",
            "description": "Multiple reviews emphasize the generous portion sizes served at Claim Jumper.  Reviewers consistently mention that the portions are so large that they often have leftovers, sometimes enough for another meal. This aspect is highlighted as a significant positive, particularly for those seeking value for their money or those who enjoy taking food home. The sheer quantity of food received is repeatedly praised, indicating that it is a key factor in customer satisfaction and a major draw for many diners.  Even when some dishes are not perfect, the large portion size is still noted as a positive.",
            "reviews": 11
          },
          {
            "theme": "Delicious Entrees",
            "description": "Many reviewers rave about

#### The Full Chain
The full chain builds upon the first chain by processing the themes and categories identified and produces conclusive insights for better understanding of customer sentiment by users and actionable improvements for restaurant owners. In our summarization chain, `sum chain`, we incorporate the dictionary conversion function in order to convert the first output into a format readable by our next prompt, `sum_prompt_template` before passing it again through our model and parser. The final chain counts the total number of reviews and sums insights across positive and negative and decides based on the data. The chain has an additional summarization step compared to the first link where insights are counted and aggregated into an overall summary. The output is still structured in a json format.

In [0]:
runnable_restaurants = RunnableLambda(get_positive_negative_reviews_together)

parser = JsonOutputParser()

chain = prompt_template | model | parser

runnable_each = runnable_restaurants | RunnableEach(bound=chain)

output = runnable_each.invoke(['mNw3UU6PPUAeS31VKgM-qw'])

sum_chain = RunnableLambda(get_dicts) | sum_prompt_template | model | parser

final_output = sum_chain.invoke(output)
print(json.dumps(final_output, indent=2)) 

{
  "restaurant_name": "Claim Jumper Steakhouse & Bar",
  "total_reviews": 168,
  "sentiment": "mixed",
  "categories": [
    {
      "category": "Food Quality",
      "themes": [
        {
          "theme": "Large Portions",
          "description": "Multiple reviews highlight the generous portion sizes served at Claim Jumper. Reviewers consistently mention that the portions are so large that they often have leftovers, sometimes enough for another meal. This is frequently cited as a positive aspect, especially given the generally positive feedback on the taste and quality of the food.",
          "reviews": 12
        },
        {
          "theme": "Delicious Entrees",
          "description": "A significant number of reviews praise the taste and quality of the main courses. Specific dishes like the meatloaf, steaks (porterhouse, ribeye), chicken pot pie, salmon, and various sandwiches receive particular acclaim. Reviewers describe the food as \"to die for,\" \"phenomenal,\" and \"d

### Formatting the Output
Although for people acquainted with tech could probably comprehend the json output as it is, there are a lot of laymen who will assume upon seeing plain text in brackets that something has glitched out on the website or app they are reading and attempt to refresh the page. 

For our purposes, semi-readable is not enough, and so we will be converting the final output from our chain into Markdown, which turns our json into a nicely formatted report. Our title will be a definitive statement on the consensus followed by an invitation to read more, and then disclosing the amount of reviews the summary is based on, for transparancy that the following statements are a summary and not original thought. Next we list out all of our categories and themes and conclude with an overall general statement on the restaurant marked in importance by a higher-level header than the categories. This should draw the eyes of a reader who chooses to skim through the categories to the end of the summary.

In [0]:
output_list = []
line_break = '<br/>'
#So we can insert it in our f-strings
# We add a level 1 header based on the sentiment.
title = f"#Customers are feeling {final_output['sentiment'].title()} about {final_output['restaurant_name'].title()}. Here are the reasons why:"
output_list.append(title)
summary_header = f"## Summary Based on {final_output['total_reviews']} Reviews"
output_list.append(summary_header)
# We loop through each category, adding the category label as a header 2.
for category in final_output['categories']:
    output_list.append(f"###{category['category'].title()}")
    # We loop through each theme in the category.
    for theme in category['themes']:
        output_list.append(f"* **{theme['theme'].title()}:**{line_break}{theme['description']}")
overall_header = "##Overall"
output_list.append(overall_header)
overall_text = final_output['summary']
output_list.append(overall_text)
# The output is all now in the output_list and we use the
# display and Markdown functions from IPython to display it as markdown.
python_display(Markdown('\n'.join(output_list)))

#Customers are feeling Mixed about Claim Jumper Steakhouse & Bar. Here are the reasons why:
## Summary Based on 168 Reviews
###Food Quality
* **Large Portions:**<br/>Multiple reviews highlight the generous portion sizes served at Claim Jumper. Reviewers consistently mention that the portions are so large that they often have leftovers, sometimes enough for another meal. This is frequently cited as a positive aspect, especially given the generally positive feedback on the taste and quality of the food.
* **Delicious Entrees:**<br/>A significant number of reviews praise the taste and quality of the main courses. Specific dishes like the meatloaf, steaks (porterhouse, ribeye), chicken pot pie, salmon, and various sandwiches receive particular acclaim. Reviewers describe the food as "to die for," "phenomenal," and "delicious," indicating a high level of satisfaction with the flavor and preparation of the entrees.
* **Tasty Appetizers:**<br/>Several reviews mention the positive experience with appetizers, with specific praise for items such as the lemon zucchini appetizer, pretzel bites, potato skins, and mozzarella sticks. Reviewers describe these appetizers as "wow," "so good," and "super yummy."
* **High-Quality Desserts:**<br/>The restaurant's desserts, particularly the chocolate cake, receive significant praise. Reviewers describe the chocolate cake as "huge," "super yummy," and "fabulous."
* **Counterpoint: Subpar Food Quality:**<br/>Multiple reviews cite the decline in food quality since the Landry's acquisition. Dishes are described as bland, overcooked, undercooked, or otherwise not meeting expectations for a restaurant of this price point. Specific examples include burnt or overcooked meats (steak, salmon, scallops, lobster), under-seasoned dishes, old or stale ingredients (salad, fries, baked potato), and small portions for the price.
* **Counterpoint: Inconsistent Food Preparation:**<br/>A recurring issue highlighted in many reviews is the inconsistency in food preparation. Orders are frequently reported as arriving with incorrect items, incorrect temperatures (cold food, undercooked or overcooked dishes), or with missing components.
###Service Quality
* **Excellent Service:**<br/>Many reviews emphasize the exceptional service provided by the staff. Reviewers frequently mention the friendliness, attentiveness, and efficiency of the servers.
* **Quick Service:**<br/>Several reviews mention the speed and efficiency of service, both for seating and food delivery. Reviewers frequently note that they were seated quickly, especially when the restaurant wasn't busy.
* **Counterpoint: Slow Service:**<br/>Numerous reviews mention excessively long wait times for food and drinks, even when the restaurant is not busy. This slow service is reported as impacting both dine-in and takeout orders.
* **Counterpoint: Inattentive Or Rude Staff:**<br/>Several reviews mention negative interactions with staff members. Waitstaff is described as inattentive, rude, or unhelpful. Issues range from slow service and forgetting orders to outright disrespectful behavior.
###Ambiance
* **Pleasant Atmosphere:**<br/>The restaurant's ambiance is described as pleasant in several reviews. Reviewers mention the comfortable seating and the overall atmosphere as contributing to a positive dining experience.
###Other Issues
* **Counterpoint: Restaurant Condition:**<br/>At least one review highlights concerning issues related to the restaurant's physical condition, specifically a strong smell of mold and sewage emanating from the restrooms.
* **Counterpoint: Menu Changes And Value:**<br/>Many reviews express disappointment with changes to the menu, specifically the reduction of menu items and smaller portion sizes. The perception is that the restaurant is prioritizing profit over quality and customer satisfaction.
* **Counterpoint: Pricing:**<br/>The high prices are a common complaint across many reviews. Customers feel that the cost of the food does not justify its quality, portion size, or overall experience.
##Overall
Claim Jumper Steakhouse & Bar receives mixed reviews. While many praise the large portions, delicious entrees, tasty appetizers, high-quality desserts, and excellent service, significant concerns exist regarding inconsistent food preparation, slow service, inattentive or rude staff, and a decline in food quality since the Landry's acquisition.  High prices and a perceived decrease in value are also common complaints.  A reported issue of unsanitary restroom conditions further detracts from the overall experience.

#### Running Multiple IDs
In order to generate multiple summaries at once, we have created the function `run_multiple_ids(id_list)`. The parameter takes a list of business IDs and runs them through our chain and prints the formatted markdown for each summary, but first checks for whether a business is not a restaurant or has too few reviews for a summary. 

In order to notify the user of when a new generation has begun, we made sure to start each loop with a message notifying them of the new generation. Then the error checks are made before generating the summary if it passes the checks. If the error checks hit, we made sure to generate markdown explaining why the summary was not generated as normal. This function simplifies the process of running different IDs by standardizing the method of changing them out without messing with the chain, allowing for less error potential. 

We can finally call the function below to showcase the final product of our work.

In [0]:
def run_multiple_ids(id_list):
  for business_id in id_list:
    print(f'Generating summary for business_id: {business_id}...')
    pos_review_dict, neg_review_dict = get_positive_negative_reviews_together([business_id])
    # displays error for invalid business id
    if pos_review_dict['sentiment'] == 'not a restaurant' or neg_review_dict['sentiment'] == 'not a restaurant':
            output_list = []
            line_break = '<br/>'
            title = f"#Error: Invalid Business ID"
            output_list.append(title)
            output_list.append(f"###{business_id} is not a restaurant according to our criteria")
            python_display(Markdown('\n'.join(output_list)))
            continue
    # displays error message for insufficient review count
    elif(len(pos_review_dict['review_list']) + len(neg_review_dict['review_list']) < 3):
            output_list = []
            line_break = '<br/>'
            title = f"#Error: Not Enough Reviews to Generate Summary"
            output_list.append(title)
            output_list.append(f"###{business_id} does not have enough user reviews to generate a useful summary. Displaying User Reviews:")
            for review in pos_review_dict['review_list']:
              output_list.append(review)
            for review in neg_review_dict['review_list']:
              output_list.append(review)
            python_display(Markdown('\n'.join(output_list)))
            continue
    else:

    
      runnable_restaurants = RunnableLambda(get_positive_negative_reviews_together)

      parser = JsonOutputParser()

      chain = prompt_template | model | parser

      runnable_each = runnable_restaurants | RunnableEach(bound=chain)

      output = runnable_each.invoke([business_id])

      sum_chain = RunnableLambda(get_dicts) | sum_prompt_template | model | parser

      final_output = sum_chain.invoke(output)
      json.dumps(final_output, indent=2)
      output_list = []
      line_break = '<br/>'
      #So we can insert it in our f-strings
      # We add a level 1 header based on the sentiment.
      title = f"#Customers are feeling {final_output['sentiment'].title()} about {final_output['restaurant_name'].title()}. Here are the reasons why:"
      output_list.append(title)
      summary_header = f"## Summary Based on {final_output['total_reviews']} Reviews"
      output_list.append(summary_header)
      # We loop through each category, adding the category label as a header 2.
      for category in final_output['categories']:
          output_list.append(f"###{category['category'].title()}")
          # We loop through each theme in the category.
          for theme in category['themes']:
              output_list.append(f"* **{theme['theme'].title()}:**{line_break}{theme['description']}")
      overall_header = "##Overall"
      output_list.append(overall_header)
      overall_text = final_output['summary']
      output_list.append(overall_text)
      # The output is all now in the output_list and we use the
      # display and Markdown functions from IPython to display it as markdown.
      python_display(Markdown('\n'.join(output_list)))

In [0]:
run_multiple_ids(['196CWwMAtAcA21jYiMyRzg', 'DVBJRvnCpkqaYl6nHroaMg', 'WC8vQdCC-nSawCh2IV4epg', 'foh6hwQxjCs0SeLT5MO1SQ', 'VnpokM7AD0zYXfyDNEDe6g'])

Generating summary for business_id: 196CWwMAtAcA21jYiMyRzg...


#Customers are feeling Mixed about Cafe Pontalba. Here are the reasons why:
## Summary Based on 51 Reviews
###Food Quality
* **Delicious Food:**<br/>Multiple reviews consistently praise the deliciousness of the food served at Cafe Pontalba. Specific dishes like the Cajun combo (jambalaya, gumbo, crawfish pie), shrimp and grits, fish po-boy, and crab cakes receive high praise for their taste and quality ingredients. Even seemingly simple dishes, like the omelets and eggs benedict, are described as flavorful and well-prepared. The crawfish pie is mentioned as a particular highlight, noted for its tasty and fresh flavor, free from any fishy aftertaste. The overall consensus is that the food exceeds expectations, even for those who initially had reservations based on prior reviews.
* **High-Quality Ingredients:**<br/>Reviewers emphasize the freshness and quality of the ingredients used in Cafe Pontalba's dishes. The crab cakes are described as almost entirely crab meat, flavorful, and not salty – a testament to the restaurant's commitment to using high-quality ingredients. The freshness of the seafood is also highlighted in multiple reviews, contributing to the overall positive perception of the food quality. The detailed descriptions of the dishes suggest a focus on sourcing good ingredients and preparing them skillfully.
* **Counterpoint: Bland Food:**<br/>Multiple reviews cite the food as bland and lacking flavor. Dishes such as the Cajun egg benedict, red beans and rice, and seafood platter were specifically mentioned as being disappointingly flavorless. The overall consensus is that the food lacks the seasoning and depth of flavor expected, especially considering the restaurant's location in New Orleans, known for its rich culinary tradition.
* **Counterpoint: Poorly Cooked Dishes:**<br/>Several reviews highlighted issues with the cooking of specific dishes. Overcooked eggs (poached and otherwise), undercooked potatoes, and cold, old-tasting fries were frequently mentioned. The inconsistent cooking suggests a lack of attention to detail and potentially inadequate training in the kitchen.
* **Counterpoint: Inaccurate Dish Descriptions/Preparation:**<br/>A recurring complaint centers on the mismatch between menu descriptions and the actual dishes served. Catfish advertised as sautéed was often fried, and the pecan sauce was frequently described as bitter. The jambalaya was criticized for being essentially Zatarain's dirty rice with a few shrimp. This inconsistency erodes customer trust and suggests a lack of care in adhering to menu descriptions and standards in food preparation.
* **Counterpoint: Use Of Low-Quality Ingredients:**<br/>Reviewers noted the use of low-quality ingredients, such as frozen shrimp and instant-tasting mashed potatoes. The implication is that the restaurant prioritizes cost-cutting measures over using fresh, high-quality ingredients, which directly impacts the taste and overall dining experience.
###Service And Atmosphere
* **Excellent Service:**<br/>The service at Cafe Pontalba receives overwhelmingly positive feedback. Reviewers consistently mention the staff's friendliness, efficiency, and attentiveness. Even during busy lunch rushes, the servers are described as working nonstop and providing prompt, courteous service. The ability to accommodate large parties is also noted, highlighting the staff's adaptability and willingness to go the extra mile. The overall impression is one of a well-trained and dedicated staff committed to providing a positive dining experience.
* **Great Location And Ambiance:**<br/>Cafe Pontalba's prime location in Jackson Square is a major draw for many patrons. The open-air seating offers stunning views of the square and provides ample opportunities for people-watching. The restaurant's interior is also praised for its beautiful decor, featuring vintage French Quarter charm and dark hardwood interiors. The combination of a fantastic location, pleasant ambiance, and the lively atmosphere adds to the overall appeal of the restaurant. The open doors and windows are also mentioned as contributing to a breezy and pleasant dining experience, especially in warmer weather.
* **Cleanliness:**<br/>The cleanliness of the restaurant is mentioned positively in several reviews. The bathrooms are specifically highlighted for their fresh scent and overall cleanliness, which is often seen as an indicator of the restaurant's overall hygiene standards. This aspect, while seemingly minor, adds to the overall positive impression of the dining experience, suggesting a well-maintained and cared-for establishment.
* **Counterpoint: Poor/Inattentive Service:**<br/>Multiple reviews describe the service as poor, inattentive, or unfriendly. Waiters were reported as having attitudes, being unhelpful, and neglecting basic customer needs such as refilling water glasses. The slow service and lack of attentiveness contributed negatively to the overall dining experience.
* **Counterpoint: Restaurant Atmosphere:**<br/>Several reviews mentioned negative aspects of the restaurant's atmosphere. These included a high number of flies inside, cramped and unpleasant bathrooms, and lack of central air conditioning, relying only on fans and open doors. These issues detract from the overall dining experience and suggest a lack of attention to maintaining a pleasant and comfortable environment for patrons.
###Covid Safety
* **Covid-Safe Practices:**<br/>One review specifically praises Cafe Pontalba for its Covid-19 safety measures, highlighting it as the best experience in the French Quarter. The review mentions mask-wearing by staff, friendly and knowledgeable service, and the collection of contact information for contact tracing purposes. Although only one review explicitly addresses this, it suggests a commitment to patron safety during a challenging time.
##Overall
Cafe Pontalba receives mixed reviews. While many praise the delicious food, high-quality ingredients, excellent service, and great location, some criticize the blandness of certain dishes, inconsistencies in cooking, inaccurate menu descriptions, use of low-quality ingredients, poor and inattentive service, and negative aspects of the restaurant's atmosphere.  The overall experience appears inconsistent, with both highly positive and highly negative experiences reported.

Generating summary for business_id: DVBJRvnCpkqaYl6nHroaMg...


#Customers are feeling Positive about Tumerico. Here are the reasons why:
## Summary Based on 282 Reviews
###Food Quality And Taste
* **Flavorful And Unique Dishes:**<br/>Many reviewers repeatedly emphasized the exceptional and unique flavors of Tumerico's dishes.  The creative use of spices and ingredients resulted in a culinary experience unlike any other Mexican restaurant, vegan or otherwise.  Reviewers were particularly impressed by the innovative ways in which traditional Mexican dishes were reimagined using plant-based ingredients, highlighting the restaurant's ability to deliver bold, complex flavors without relying on meat. The skillful use of jackfruit as a meat substitute was frequently praised, with many describing its texture and taste as remarkably similar to pulled pork or chicken. This mastery of flavor extended to the accompanying sides, such as the rice, beans, and salads, which were often described as equally delicious and well-seasoned. The overall consensus is that Tumerico offers a truly unique and unforgettable dining experience.
* **High-Quality Ingredients:**<br/>The use of fresh, high-quality, and often locally-sourced ingredients was another recurring theme in the positive reviews. Reviewers frequently commented on the freshness and quality of the produce, noting that the taste of the food reflected this commitment to superior ingredients. This attention to detail extended beyond the main dishes to the sauces, salsas, and other accompaniments, all of which were praised for their freshness and flavor.  The overall impression is that Tumerico prioritizes quality ingredients, resulting in a superior culinary experience that is both delicious and satisfying.
* **Generous Portions:**<br/>A common thread running through the positive reviews is the generous portion sizes offered at Tumerico.  Many reviewers noted that the amount of food they received was more than enough for one person, often resulting in leftovers for another meal. This value for money was appreciated by many, especially given the high quality of the food. The abundance of food was described as both satisfying and filling, ensuring that customers left feeling full and content. The generous portions contributed to the overall positive dining experience, making Tumerico a place where customers felt they received excellent value for their money.
* **Counterpoint: Lack Of Flavor:**<br/>Multiple reviews mentioned a lack of flavor in various dishes. The Al Pastor tacos and Poblano tostados were described as lacking in flavor, even though the ingredients were present. The vegan Pozole was particularly criticized for having a strange and unappealing taste, failing to satisfy a customer's craving for the dish. Even the generally positive reviews noted that the food, while good, could be improved by enhancing the flavors of dishes like the beans and complimentary soup.
* **Counterpoint: Missing Ingredients/Inconsistent Portions:**<br/>Several reviews highlighted inconsistencies in food preparation. One review reported that their Al Pastor tacos were missing components, containing only jackfruit and pico. Another review noted that the rice and bean portion sizes were underwhelming, particularly considering the price point of the dishes.
###Service And Atmosphere
* **Friendly And Welcoming Staff:**<br/>The overwhelmingly positive reviews highlight the exceptional service provided by Tumerico's staff.  Reviewers consistently praised the friendliness, helpfulness, and attentiveness of the employees. Many described the staff as welcoming and accommodating, going above and beyond to ensure a positive dining experience.  The staff's knowledge of the menu and their willingness to offer recommendations were also frequently mentioned. This positive interaction with the staff contributed significantly to the overall positive impression of the restaurant, making customers feel valued and appreciated.
* **Pleasant And Casual Ambiance:**<br/>The restaurant's ambiance was also a source of positive feedback. Reviewers described the atmosphere as casual, relaxed, and welcoming, creating a comfortable and enjoyable dining experience. The decor was often praised for its unique and inviting character, enhancing the overall atmosphere. Many reviewers felt that Tumerico provided a perfect setting for a casual meal with friends or family, highlighting the restaurant's ability to create a pleasant and welcoming environment for all types of diners. The casual yet inviting atmosphere complemented the high-quality food and friendly service, contributing to the overall positive experience.
* **Counterpoint: Take-Out Issues:**<br/>Negative experiences with take-out orders were reported. One review described a situation where their order was lost, although it was remade quickly. This points to potential organizational issues within the restaurant's take-out system.
###Menu And Options
* **Creative Menu And Daily Specials:**<br/>The restaurant's menu was a frequent source of praise, with many reviewers highlighting the creativity and variety of dishes on offer. The daily changing menu, based on fresh, seasonal ingredients, was seen as a positive aspect, ensuring that customers could always try something new and exciting. The menu's focus on vegan and vegetarian options was also appreciated, with many reviewers praising the restaurant's ability to create delicious and innovative plant-based dishes. The combination of creativity, variety, and fresh ingredients made the menu a significant contributor to Tumerico's overall positive reputation.
* **Vegan/Vegetarian Options:**<br/>A significant number of reviewers explicitly praised Tumerico's commitment to vegan and vegetarian cuisine.  Many were surprised and delighted by the quality and flavor of the plant-based dishes, often stating that they would happily return even if they weren't vegetarian or vegan. This positive feedback underscores the restaurant's success in appealing to a broad audience, including those who are not typically drawn to vegan or vegetarian restaurants. The high quality of the vegan/vegetarian options broadened Tumerico's appeal and contributed to its popularity.
###Pricing And Value
* **Counterpoint: High Prices:**<br/>The high cost of the food was a recurring concern. One review expressed shock at the $80 price tag for four plates of vegan food. Another long-time patron stated their love for the restaurant but inability to afford the high prices. This suggests that the restaurant's pricing may not align with the perceived value of the food, especially given the portion-related issues mentioned in other reviews.
##Overall
Tumerico receives overwhelmingly positive feedback, praised for its unique and flavorful vegan Mexican dishes, high-quality ingredients, generous portions, friendly staff, and pleasant ambiance. While some concerns exist regarding pricing and occasional inconsistencies in food preparation and take-out service, the overall consensus points to a highly enjoyable dining experience.

Generating summary for business_id: WC8vQdCC-nSawCh2IV4epg...


#Customers are feeling Positive about Atlantis Steakhouse. Here are the reasons why:
## Summary Based on 194 Reviews
###Food Quality
* **Steak Quality And Preparation:**<br/>Multiple reviews consistently praise the quality and preparation of the steaks. Reviewers frequently use terms like "cooked to perfection," "melted in my mouth," and "perfectly seasoned" to describe their steak experiences.
* **Seafood Quality And Preparation:**<br/>Beyond the steaks, the restaurant's seafood offerings also receive significant praise. Reviewers highlight the freshness and quality of the seafood, noting that dishes like lobster tails, scallops, and Chilean sea bass were "cooked to perfection" and "delicious."
* **Appetizers And Sides:**<br/>The appetizers and side dishes are not merely an afterthought; they are integral parts of the overall dining experience, frequently mentioned in positive reviews.
* **Desserts:**<br/>The desserts offered at Atlantis Steakhouse consistently receive rave reviews, often described as a memorable part of the dining experience.
* **Counterpoint: Poor Quality Prime Rib:**<br/>Multiple reviews cite the prime rib as being subpar. Descriptions include terms like 'tough', 'chewy', 'tasteless', and 'dry'.
* **Counterpoint: Overcooked Or Undercooked Steaks:**<br/>Several reviewers reported inconsistencies in the cooking of their steaks. Some steaks arrived overcooked, described as dry and tough, despite being ordered to a specific doneness. Others were undercooked, arriving 'bloody' even when ordered medium-well.
* **Counterpoint: Poor Quality Sides:**<br/>Beyond the main courses, multiple reviews also criticized the quality of the side dishes. The twice-baked potato contained a hard object, causing dental damage to one diner. The mac and cheese was described as runny and tasteless, while the creamed spinach was bland.
* **Counterpoint: High Prices For Average Food:**<br/>Several reviewers felt that the food quality did not justify the high prices charged. The consensus is that the food was merely 'good,' not 'great,' or even 'average' in some cases, making the high cost a significant point of contention for numerous diners.
###Service Quality
* **Attentive And Friendly Service:**<br/>The attentiveness and friendliness of the staff are repeatedly mentioned as key contributors to the positive dining experience. Reviewers frequently praise the servers' ability to anticipate needs, refill drinks promptly, and provide helpful recommendations.
* **Table-Side Service:**<br/>The inclusion of table-side service for certain dishes, such as Steak Diane and shrimp scampi, is a frequently cited positive aspect of the dining experience. Reviewers describe these presentations as entertaining and enhancing the overall atmosphere.
* **Extra Touches:**<br/>Numerous reviews mention small, thoughtful gestures by the staff that contribute to the overall positive experience. These include providing warm towels before and after the meal, rose petals on the tables, personalized seating cards, and taking the initiative to refill drinks and clear plates without being asked.
* **Counterpoint: Inattentive Or Poor Service:**<br/>A significant number of reviews highlighted problems with the service. Issues ranged from servers being inattentive and slow to respond to requests, to a lack of understanding of basic dietary restrictions (gluten-free).
* **Counterpoint: Excessive Wait Times:**<br/>Multiple reviews mentioned unusually long wait times for food, with some tables waiting up to two hours for their entrees. This extended wait time, compounded by the arrival of food that was cold in several instances, created a severely negative dining experience.
* **Counterpoint: Poor Handling Of Complaints:**<br/>Several instances of poor complaint handling were reported. In one case, a manager refused to remove a damaged side dish from the bill, instead insisting on an incident report. In another, a promised credit was never applied to a bill, despite repeated attempts to contact management.
###Ambiance And Atmosphere
* **Elegant And Luxurious Atmosphere:**<br/>The restaurant's elegant and luxurious atmosphere is repeatedly mentioned as a significant positive aspect of the dining experience. Reviewers describe the decor as beautiful, sophisticated, and romantic, often mentioning specific details such as the lighting, seating, and overall ambiance.
* **Counterpoint: Unpleasant Seating Arrangements:**<br/>One review described being seated in a banquet room with poor lighting and no music, despite the main dining area being nearly empty. Another mentioned being seated too close to other tables, negating the intended social distancing measures.
##Overall
Atlantis Steakhouse receives overwhelmingly positive reviews, with consistent praise for the quality of its steaks and seafood, attentive service, and elegant atmosphere. While some negative feedback exists regarding inconsistencies in steak preparation, service issues, and occasional seating problems, the positive aspects significantly outweigh the negative, resulting in an overall positive customer experience.

Generating summary for business_id: foh6hwQxjCs0SeLT5MO1SQ...
Business ID 'foh6hwQxjCs0SeLT5MO1SQ' not found in the Restaurants Table. Business ID must be in the restaurants table to qualify.


#Error: Invalid Business ID
###foh6hwQxjCs0SeLT5MO1SQ is not a restaurant according to our criteria

Generating summary for business_id: VnpokM7AD0zYXfyDNEDe6g...


#Customers are feeling Mixed about Marlton Diner. Here are the reasons why:
## Summary Based on 49 Reviews
###Food Quality
* **Delicious Food:**<br/>Multiple reviewers highlighted the deliciousness of the food served at the Marlton Diner. Specific dishes mentioned include the huevos rancheros, which one reviewer described as 'delicious', and the omelets, which were praised for staying hot throughout the meal. The pancakes and challah French toast were also enjoyed by members of one family's party. Even the coffee received positive comments for its strength and flavor, exceeding expectations set by previous reviews.
* **Large Portions:**<br/>At least one reviewer noted the substantial size of the portions, specifically mentioning a 'huge omelet' that remained steaming even after significant consumption. This suggests that the restaurant provides generous servings of food, exceeding expectations of typical portion sizes.
* **Counterpoint: Food Quality And Taste:**<br/>Multiple reviews cite issues with the quality and taste of the food. Dishes were described as cold, old, stale, burnt, flavorless, and in one case, even containing someone else's leftovers. The ingredients were often missing or not as described. Even simple items like pancakes and home fries were deemed inedible. This consistent negative feedback across various menu items points to a significant problem with food preparation and ingredient freshness.
* **Counterpoint: Food Temperature:**<br/>A recurring complaint centers around the temperature of the food served. Many reviewers noted that their meals arrived cold, regardless of the item ordered. This suggests a potential problem with food holding or timing issues in the kitchen.
###Service
* **Fast Service:**<br/>One reviewer explicitly mentioned that their food was delivered 'very fast'. This indicates efficient and prompt service, contributing to a positive dining experience.
* **Friendly Staff:**<br/>The friendliness of the staff was a recurring positive point. One reviewer, despite initial reservations about the restaurant's appearance, noted that the staff were 'super friendly', significantly improving their overall experience. Another reviewer also mentions 'great service', specifically describing it as 'exceptional'.
* **Counterpoint: Slow Service And Inattentiveness:**<br/>Many reviewers complained about slow service, long wait times for food, and inattentive waitstaff. Some reported difficulty getting drink refills, while others noted that the staff did not check on their tables or address their concerns. The understaffing, while acknowledged by the restaurant, seems to be a significant contributor to these service problems.
* **Counterpoint: Staff Attitude And Friendliness:**<br/>Several reviews mention negative interactions with the staff, describing them as rude, unfriendly, and annoyed. This lack of hospitality further contributes to the negative dining experience reported by numerous customers.
###Ambiance
* **Outdoor Seating:**<br/>One reviewer found the outdoor seating arrangement, while initially seeming unusual, to be a positive aspect of the restaurant. This suggests that the outdoor seating area is well-maintained and contributes to a pleasant dining atmosphere.
###Cleanliness
* **Counterpoint: Restaurant Cleanliness:**<br/>Numerous reviews highlight the unsanitary conditions of the restaurant. Descriptions include sticky booths, dirty floors, grease smears on walls and windowsills, dead bugs and cobwebs, and generally unkempt surroundings. The bathrooms were specifically called out as filthy. The overall impression is one of significant neglect in maintaining a clean and hygienic environment for diners.
##Overall
The Marlton Diner receives mixed reviews. While some customers praise the delicious food, large portions, fast service, friendly staff, and outdoor seating, many others express concerns about food quality and temperature, slow and inattentive service, and the uncleanliness of the restaurant.  A more thorough investigation into kitchen practices and restaurant maintenance is needed to address the negative feedback.

# Limitations

While our approach to structuring and summarizing Yelp data is thorough, there are notable limitations that should be acknowledged within the notebook process:

**Potential Biases from Dataset Limitations:**
The dataset is skewed towards star ratings 3 and above, which is, although consistent with the idea that most users leave reviews for either highly positive or negative experiences, poses potential problems. This imbalance may limit the comprehensiveness of sentiment analysis, as moderate reviews might not be adequately represented or may be oversimplified.

**Databricks Memory Constraints:**
The memory limitations in Databricks posed challenges when handling the full volume of Yelp’s raw data. To stay within system constraints, processing had to be optimized, and some data tables were truncated. Additionally, visualizations were often disabled or not working due to Databricks' safeguards on overloading cluster's memory. These memory limitations could omit valuable information.

**Limited to Top-Level Categories:**
The analysis focuses on high-level restaurant categories, which may overlook smaller subcategories or less common types of establishments. This could lead to less specific insights for users who rather look for unique or specialized dining experiences.

**Geographical Scope:** 
The project analyzes data from only 11 metro areas. Smaller markets or regional trends are not represented, and the dataset's geographic focus of these specific metro areas might skew findings towards only urban dining experiences.

**Limitations of Gemini and LangChain in Tandem:** 
While themes are extracted using LangChain, the AI may overlook subtle but potentially meaningful patterns due to inherent limitations in the categorization process. Gemini also may account for potential system biases and omitting of certain aspects of reviews due to the nature of the model. Lastly, the token limit ensures concise summaries but may result in truncation of detail in cases with high complexity in the reviews.