# Project Goals
The goal of this project is to perform sentiment analysis on Google Places search results, such as "Starbucks Oslo" or "Elkjøp Storo." The main idea is that relying solely on star ratings is an inconsistent way of measuring customer satisfaction for a given place. I therefore chose to implement sentiment analysis to get an idea of whether most reviews are, for example, "frustrated," "positive," or "negative."

For each search, the Google Places API returns up to 20 results (for instance, 20 Starbucks locations in Oslo). I then perform a sentiment analysis on each place and an overall NLP-based (Natural Language Processing) evaluation across all the places to decide on the best overall option.

---

# Approach

## Logical Flow
This project utilizes a sequential logical flow, resembling a pipeline or Chain of Thought (CoT).

We start by gathering the place IDs (a unique identifier for each business location) and peripheral data (name and address) using the Text Search endpoint of the Google Places API. This gives us up to 20 relevant places based on our search term (such as "Starbucks Oslo").
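The search term is user input, so it should be URL-encoded before it goes into the request. A minimal sketch, assuming the same Text Search endpoint that `get_place_ids` uses; `build_text_search_url` is a hypothetical helper and the key is a placeholder.

```python
from urllib.parse import urlencode

# Legacy Places API Text Search endpoint, as used in get_place_ids
TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_text_search_url(query: str, api_key: str) -> str:
    """Return a Text Search URL with the query safely percent-encoded.

    urlencode turns spaces into '+' and encodes non-ASCII letters
    such as 'ø' as UTF-8 escapes (%C3%B8).
    """
    params = {"query": query, "key": api_key}
    return f"{TEXT_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_text_search_url("Elkjøp Storo", "YOUR_API_KEY"))
```

Passing the same dictionary as `params=` to `requests.get` achieves the same encoding without building the string by hand.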

Then, we obtain more data about each place from the Place Details endpoint of the Places API, using the place IDs. This gives us access to the reviews (five per place in the run shown below), as well as ratings and additional peripheral data.
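Not every place has reviews, so the Place Details response is safest to read with defaults. A minimal sketch, assuming the response shape visible in the printed output further down (`result` → `reviews`, each with `rating`, `text` and `time`); `extract_reviews` is a hypothetical helper.

```python
from typing import Any


def extract_reviews(details: dict[str, Any]) -> list[dict[str, Any]]:
    """Pull the review fields the pipeline needs out of a Place Details response.

    Uses .get() with defaults so a place without reviews (or a response
    without 'result') yields an empty list instead of raising KeyError.
    """
    result = details.get("result", {})
    reviews = result.get("reviews", [])
    return [
        {
            "index": i,
            "rating": review.get("rating"),  # the reviewer's own star rating
            "text": review.get("text", ""),
            "time": review.get("time"),  # Unix timestamp of the review
        }
        for i, review in enumerate(reviews)
    ]


# Trimmed sample in the shape of the printed Place Details output
sample = {
    "result": {
        "rating": 3.4,
        "reviews": [
            {"rating": 5, "text": "We arrived a minute before they closed and they served us.", "time": 1731096466},
        ],
    }
}
print(extract_reviews(sample))
```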

At this point, we are ready to start our NLP sequence.

- **Step One:** Loop through every place and every review for each place, writing a short sentiment summary of each review and labeling each review with single-word descriptors such as "happy," "frustrated," "dirty," etc. This is the first step in our CoT. We also assign a 0-10 rating to each review, where 0 is highly negative and 10 is highly positive.
- **Step Two:** Loop through every place and generate a short sentiment summary, as well as sentiment labels, for each place based on all of its reviews. This is the second step in our CoT and serves as a summarization of all the reviews for a single place. We then calculate the average score for the place in plain Python (i.e., outside the LLM).
- **Step Three:** Decide on the best place by evaluating each place's summary, labels, and average score. The LLM decides on the best place holistically, rather than simply selecting the one with the highest score. The result is printed as a short summary of why this is the best place, along with its average score and labels.
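The average in Step Two is plain arithmetic on the per-review scores from Step One. A minimal sketch with made-up scores; `average_score` mirrors the rounding in `make_overall_sentiment_rating` but is a hypothetical standalone helper.

```python
def average_score(scores: list[int]) -> float:
    """Average the 0-10 review scores for one place, rounded to one decimal.

    Raises ValueError for an empty list instead of dividing by zero.
    """
    if not scores:
        raise ValueError("cannot average an empty list of scores")
    return round(sum(scores) / len(scores), 1)


# Illustrative scores for five reviews of one place (not real data)
example_scores = [8, 9, 7, 10, 6]
print(average_score(example_scores))  # 8.0
```

Keeping this step outside the LLM means the number passed to Step Three is exact and reproducible; only the summaries and labels come from the model.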

---

## Code Structure
The code is structured in pairs of blocks. The first block contains the method definitions (e.g., `get_place_ids(input_text)`), and the following block executes these methods with the necessary loops (e.g., `get_place_ids('Starbucks Oslo')`) and prints the output.

**Set the variable `PLACE_OF_INTEREST` in the next code block to select your location. Apart from providing `PLACES_API_KEY`, `OPENAI_API_KEY` and `OPENAI_MODEL` in a `.env` file, you are all set.**
- First, we gather the place IDs using `get_place_ids`.
- Then, we get the reviews using `get_reviews`.
- Next, we label each review with sentiments using `sentiment_grade_review` (this method makes a lot of API calls and takes a couple of minutes).
- After that, we perform sentiment grading for each place by summarizing the sentiments of each review using `make_overall_sentiment_rating`.
- Finally, we decide on the best place by comparing the sentiments of each place using `select_best_place`.
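The first code cell reads three settings with `os.getenv`: `PLACES_API_KEY`, `OPENAI_API_KEY` and `OPENAI_MODEL`. A minimal fail-fast check that could run right after `load_dotenv()`, so a missing key shows up immediately rather than as a failed request partway through a run; `missing_settings` is a hypothetical helper, not part of the notebook.

```python
import os
from typing import Mapping, Optional

# Environment variable names read by the first code cell
REQUIRED_SETTINGS = ("PLACES_API_KEY", "OPENAI_API_KEY", "OPENAI_MODEL")


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required settings that are unset or empty.

    Defaults to os.environ, which load_dotenv() populates from .env.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```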

---

# Challenges

## API Costs and Speed
This pipeline makes a lot of API calls, and one run typically takes two to three minutes, which is both slow and expensive. Possible remedies are threading (running the independent review-grading calls in parallel) and reducing the amount of text being processed. For instance, one could try to select the best place by looking only at the labels and disregarding the summaries. Experimenting with lighter, cheaper models might also be a reasonable approach.
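A minimal sketch of the threading idea, assuming each review-grading call is I/O-bound (a network round trip) and independent of the others. `grade_review` is a hypothetical stand-in that sleeps instead of calling the OpenAI API; in the notebook it would wrap the body of the loop in `sentiment_grade_review`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def grade_review(text: str) -> dict:
    """Hypothetical stand-in for one LLM call: sleeps to simulate latency."""
    time.sleep(0.1)
    return {"text": text, "score": 5}


def grade_all(texts: list[str], max_workers: int = 8) -> list[dict]:
    """Grade reviews concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, unlike as_completed
        return list(pool.map(grade_review, texts))


if __name__ == "__main__":
    reviews = [f"review {i}" for i in range(16)]
    start = time.perf_counter()
    results = grade_all(reviews)
    elapsed = time.perf_counter() - start
    # Sequentially this would take about 16 x 0.1 s; with 8 workers, roughly 0.2 s
    print(f"graded {len(results)} reviews in {elapsed:.2f} s")
```

Threads suit this workload because the time is spent waiting on the network, not on Python code. `max_workers` should stay within the OpenAI account's rate limits.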

## Hallucinations
During testing, I observed that the average score of some places was negative—for example, -3.7 on a 0-10 scale. This is a misjudgment by the model, as it sometimes interprets -10 as highly negative, 0 as neutral, and 10 as positive, even though the instructions specify that 0 is negative and 10 is positive.
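One defensive measure, sketched under the assumption that out-of-range scores should simply be pulled back into range: clamp each score to 0-10 before averaging. `clamp_score` is a hypothetical helper. Clamping hides the misunderstanding rather than fixing it, so restating the scale explicitly in the prompt is a complementary step.

```python
def clamp_score(score: float, low: float = 0, high: float = 10) -> float:
    """Force a model-produced score into the range [low, high]."""
    return max(low, min(high, score))


scores = [-3, 7, 12, 10, 0]
print([clamp_score(s) for s in scores])  # [0, 7, 10, 10, 0]
```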

## Old Reviews
Reviews that are old (e.g., 3+ years) are not as relevant. Ideally, I should have filtered these out.
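Each review in the Place Details response carries a `time` field, a Unix timestamp (visible in the printed output further down), which makes the filter straightforward. A minimal sketch, assuming a three-year cutoff approximated as 3 × 365 days; `filter_recent` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=3 * 365)  # "3+ years" cutoff, leap days ignored


def filter_recent(reviews: list[dict], now: Optional[datetime] = None) -> list[dict]:
    """Keep reviews whose 'time' (Unix seconds) is within MAX_AGE of now.

    Reviews without a 'time' field are dropped.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - MAX_AGE
    return [
        r for r in reviews
        if "time" in r and datetime.fromtimestamp(r["time"], tz=timezone.utc) >= cutoff
    ]


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
reviews = [
    {"text": "recent", "time": 1731096466},  # November 2024
    {"text": "old", "time": 1577836800},     # 2020-01-01 00:00 UTC
]
print([r["text"] for r in filter_recent(reviews, now=now)])  # ['recent']
```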

---

# Key Takeaways

## Vast Amounts of Unused NLP-Based Data
There is a lot of unstructured (natural language) data waiting to be exploited. Such data is often user-generated—in the form of reviews, complaints, comments, etc.—and is extremely valuable for the business since it comes directly from the users.

Other exciting applications may include advanced FAQ bots, sophisticated RAG (retrieval-augmented generation) chains based on internal company data, and more. For an implementation of RAG, please try out my ChatGPT + Google Drive application, which creates an embedding store of your Google Drive on *drivesearcher.com*.

## Modern LLMs Are Incredibly Good at Context
The first widely used LLMs, such as GPT-3, were good at generating general text but struggled to follow user instructions (such as system prompts) and to understand context. Modern LLMs, on the other hand, are highly adaptable and understand context in a meaningful way, making applications like the one showcased below much more useful.

## Chain of Thought
This script should arguably have been built with LangChain, or LangGraph for that matter. As demonstrated by OpenAI and others, chain-of-thought reasoning is a highly effective way of improving a model's performance. Instead of using a single chat instance that tells the model, "There are 10 reviews for each place. Here are 5 places; select the best one," it is better to break the process up into a chain of steps, similar to how a human would work through the data. This is the approach I have taken:
- **First:** Mark the sentiment of each review.
- **Second:** Mark the sentiment of each place based on the reviews.
- **Third:** Decide on the best place based on sentiment.

For a production environment, I would definitely have used LangChain, as it simplifies typing and data management, but due to the extra setup overhead, I decided to skip it for now.

## Typing
My biggest challenge along the way was not knowing the JSON structure returned by the previous method. For instance, I spent a lot of time going back and forth trying to determine where the name of the place was stored. Had I used typing from the start, development would have been much easier and quicker.
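A minimal sketch of what typing the intermediate records could look like, using `TypedDict` with the core keys produced by `get_reviews` and `sentiment_grade_review`. The class names and the summary, labels and score values in the example are hypothetical and illustrative only.

```python
from typing import TypedDict


class Review(TypedDict):
    """One review record as produced by get_reviews."""
    id: str        # place ID the review belongs to
    index: int     # position of the review within the place
    rating: float  # star rating carried through from the Places API
    text: str
    name: str      # place name
    address: str   # place address


class GradedReview(Review):
    """A Review enriched by the LLM in sentiment_grade_review."""
    summary: str
    labels: list[str]
    score: int


def place_name(reviews: list[GradedReview]) -> str:
    # With the TypedDict, an editor or mypy knows 'name' lives on each review
    return reviews[0]["name"]


review: GradedReview = {
    "id": "ChIJ3zOCu69oQUYRfKslbOCzpPo", "index": 0, "rating": 5,
    "text": "We arrived a minute before they closed and they served us.",
    "name": "Burger King", "address": "Lofsrudveien 6, 1281 Oslo, Norway",
    "summary": "Grateful for late service.", "labels": ["grateful", "happy"], "score": 9,
}
print(place_name([review]))  # Burger King
```

`TypedDict` adds no runtime checks; the benefit is that editors and type checkers can autocomplete keys and flag typos such as `review["nmae"]`.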


---
# Using LLMs Versus Traditional Approaches
As the bottom-most code block shows, the best place based solely on star rating can differ from the model's recommendation. In a test run with the query "Starbucks Oslo", the highest star rating belonged to the Starbucks at Nedre Storgate 6, 3015 Drammen, Norway, whereas the model recommended the Starbucks at Jernbanetorget, citing its good customer service and convenient location.

---
# Findings and Potential Improvements
Next time, I will definitely use typing, as knowing where your data lives saves a lot of debugging time. A framework with a **state**, such as LangGraph or LangChain, would also be really useful, since each method in my implementation can be viewed as a step that enriches the data.
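A minimal sketch of the "state" idea in plain Python, without LangGraph: each step receives the shared state and returns it with new keys added. The step functions are hypothetical stubs that return fixed data instead of calling the Google or OpenAI APIs.

```python
from functools import reduce
from typing import Callable, TypedDict


class PipelineState(TypedDict, total=False):
    query: str
    places: list[dict]
    best_place: dict


Step = Callable[[PipelineState], PipelineState]


def fetch_places(state: PipelineState) -> PipelineState:
    # Stub for the data-gathering steps: fixed data instead of API calls
    places = [{"name": "Place A", "avg_score": 8.0}, {"name": "Place B", "avg_score": 7.6}]
    return {**state, "places": places}


def pick_best(state: PipelineState) -> PipelineState:
    # Stub for select_best_place: highest average instead of asking an LLM
    best = max(state["places"], key=lambda p: p["avg_score"])
    return {**state, "best_place": best}


def run(steps: list[Step], initial: PipelineState) -> PipelineState:
    """Apply each step in order, threading the state through."""
    return reduce(lambda state, step: step(state), steps, initial)


final = run([fetch_places, pick_best], {"query": "Burger King"})
print(final["best_place"]["name"])  # Place A
```

Because every step only adds keys, the final state keeps a complete record of the run, which is the property a framework like LangGraph formalizes.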

A large pitfall, which I fell into head-first, was model selection. I went in thinking "newer is better" and used GPT-4.5 Preview. After a modest amount of testing I hit my USD 5 API limit, which was astonishing given the low volume of tokens. On OpenAI's pricing page, I realised that GPT-4.5 Preview costs roughly 30 times as much as GPT-4o per input token ($75 vs. $2.50 per million tokens) and close to 70 times as much as o3-mini ($1.10 per million). Model selection is extremely important: capable models are costly, so price has to be balanced against capability. In my testing, GPT-4o was the sweet spot, delivering consistent results.
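Counting the calls in one run shows why the per-token price dominates. With the 20 places and 5 reviews per place in the output below, the notebook makes 100 review-grading calls, 20 place-summary calls and 1 selection call to OpenAI (plus 21 Google Places requests). A back-of-the-envelope sketch; the tokens per call are assumptions, and the prices are assumed list prices at the time of writing, to be checked against OpenAI's pricing page.

```python
# Assumed list prices in USD per 1M tokens: (input, output). Verify before use.
PRICES = {
    "gpt-4.5-preview": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
    "o3-mini": (1.10, 4.40),
}

PLACES = 20            # places returned by Text Search in this run
REVIEWS_PER_PLACE = 5  # reviews returned per place in this run

# One grading call per review, one summary per place, one final selection
LLM_CALLS = PLACES * REVIEWS_PER_PLACE + PLACES + 1  # 121

# Assumed average tokens per call (prompt plus review text, short JSON reply).
# Reasoning models such as o3-mini also bill hidden reasoning tokens as output,
# so their estimate here is a lower bound.
TOKENS_IN, TOKENS_OUT = 300, 100


def run_cost(model: str) -> float:
    """Estimated USD cost of one full pipeline run for the given model."""
    price_in, price_out = PRICES[model]
    per_call = (TOKENS_IN * price_in + TOKENS_OUT * price_out) / 1_000_000
    return LLM_CALLS * per_call


for model in PRICES:
    print(f"{model:16s} ~${run_cost(model):.3f} per run")
```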

In summary, I find my approach quite good, as it consistently selects the places with the most positive review sentiment, which are not always the places with the highest star rating. The model is also consistent: it gives me the same result for the same query every time I run it.



In [1]:
#load env vars
import os
from dotenv import load_dotenv

# Load variables from .env file
load_dotenv()

PLACES_API_KEY = os.getenv("PLACES_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL = os.getenv("OPENAI_MODEL")
PLACE_OF_INTEREST = "Burger king"

In [2]:
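# Get the necessary data from the Google Places API
import requests

# Use Text Search to get the place IDs; the reviews are fetched in a later cell
TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"

def get_place_ids(input_text='starbucks'):
    # Pass the query via params so requests URL-encodes spaces and letters like "ø"
    query_params = {"query": input_text, "key": PLACES_API_KEY}

    # Make the GET request and fail loudly on HTTP or API errors
    response = requests.get(TEXT_SEARCH_URL, params=query_params, timeout=30)
    response.raise_for_status()
    data = response.json()
    if data.get("status") not in ("OK", "ZERO_RESULTS"):
        raise RuntimeError(f"Text Search failed: {data.get('status')} {data.get('error_message', '')}")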

    places = []
    for place in data.get("results", []):
        places.append({
            "name": place.get("name"),
            "rating": place.get("rating"),
            "place_id": place.get("place_id"),
            "address": place.get("formatted_address")
        })

    # # Print results
    # for p in places:
    #     print(f"Name: {p['name']}, Rating: {p['rating']}, Place ID: {p['place_id']}, Address: {p['address']}")

    return places

In [3]:
# Returns a list of dicts with name, rating, place_id and address
places = get_place_ids(PLACE_OF_INTEREST)
print(places)

[{'name': 'Burger King', 'rating': 3.4, 'place_id': 'ChIJ3zOCu69oQUYRfKslbOCzpPo', 'address': 'Lofsrudveien 6, 1281 Oslo, Norway'}, {'name': 'Burger King', 'rating': 3.5, 'place_id': 'ChIJqYk108xoQUYR5Wi8yf6-FUg', 'address': 'Ekebergveien 235, 1162 Oslo, Norway'}, {'name': 'Burger King', 'rating': 3.8, 'place_id': 'ChIJiWX_42NuQUYREEhSOAtnsvg', 'address': 'Torggata 24, 0183 Oslo, Norway'}, {'name': 'Burger King', 'rating': 3.5, 'place_id': 'ChIJr-mEAX5uQUYRoHB5O5TtXYw', 'address': 'Klingenberggata 5, 0161 Oslo, Norway'}, {'name': 'Burger King Nygata', 'rating': 3.5, 'place_id': 'ChIJxap6ImJuQUYRU2yvtRTRZ1M', 'address': 'Storgata 14-18, 0184 Oslo, Norway'}, {'name': 'Burger King', 'rating': 3.5, 'place_id': 'ChIJy6tMtw9vQUYRUSqK68GcaaM', 'address': 'Plogveien 6, 0679 Oslo, Norway'}, {'name': 'Burger King Oslo Central Station', 'rating': 2.8, 'place_id': 'ChIJOT0lHIpuQUYR2V6zQbbZJ2A', 'address': 'Jernbanetorget 1, 0154 Oslo, Norway'}, {'name': 'Burger King', 'rating': 3.7, 'place_id': 'C

In [4]:
from urllib.parse import urlencode
#fetch the reviews based on place id

parameters = ['rating', 'reviews', 'types']
fields_str = ','.join(parameters) 

base_url = "https://maps.googleapis.com/maps/api/place/details/json"

def get_reviews(places :list):
    places_with_reviews = {}

    for place in places:
        place_id = place['place_id']
        name = place['name']
        address = place['address']
        rating = place['rating']

        query_params = {
            "fields": fields_str,
            "place_id": place_id,
            "key": PLACES_API_KEY
        }
        url = f"{base_url}?{urlencode(query_params)}"

        response = requests.get(url)
        data = response.json()
        print(data)

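        # Distill the necessary data; some places have no reviews at all
        result = data.get('result', {})
        reviews: list = result.get('reviews', [])

        # Extract the necessary data from each review: rating, text and time
        # (relative_time_description is also available), plus an index for easier management
        formatted_reviews = []
        for i, review in enumerate(reviews):
            formatted_reviews.append({
                "id": place_id,
                "index": i,
                "rating": review.get('rating'),  # the reviewer's own star rating
                "place_rating": rating,          # the place's overall star rating
                "time": review.get('time'),      # Unix timestamp of the review
                "text": review.get('text', ''),
                "name": name,
                "address": address
            })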

        places_with_reviews[place_id]=formatted_reviews
    
    return places_with_reviews

In [5]:
places_with_reviews = get_reviews(places=places)
print(places_with_reviews)

{'html_attributions': [], 'result': {'rating': 3.4, 'reviews': [{'author_name': 'Bl Guda', 'author_url': 'https://www.google.com/maps/contrib/118082117704329096529/reviews', 'language': 'en', 'original_language': 'en', 'profile_photo_url': 'https://lh3.googleusercontent.com/a-/ALV-UjVJCZQ-rxvcxj88Qm5qrXv07F-gOhOcOP05iGIDSHwMuSWfYHbUDw=s128-c0x00000000-cc-rp-mo-ba5', 'rating': 5, 'relative_time_description': '4 months ago', 'text': 'We arrived a minute before they closed and they served us. Thanks to the sweet Nasteha and the rest of the staff who prepared our order. ❤️', 'time': 1731096466, 'translated': False}, {'author_name': 'Sara Berggren', 'author_url': 'https://www.google.com/maps/contrib/101357963932785970184/reviews', 'language': 'en', 'original_language': 'en', 'profile_photo_url': 'https://lh3.googleusercontent.com/a/ACg8ocIpc5lvYq-ckgc_o2MzBeEOh9Kf4CBnXWrIyfBuZjlGtxzR=s128-c0x00000000-cc-rp-mo', 'rating': 1, 'relative_time_description': 'a year ago', 'text': 'Orderd gluten-f

In [6]:
# Invoke the LLM, prompt-tuned to act as a sentiment analyst
# Uses the OpenAI SDK's structured outputs (a Pydantic model as response_format)
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

system_prompt = """
    Assess the sentiment of the review below.
    Write a short summary of the sentiment in a couple of sentences.
    Add some single-word labels that describe the sentiment.
    Add a score from 0 to 10, where 10 is highly positive and 0 is highly negative.
    Never use negative scores.
"""

# summary: a short textual description of the sentiment
# labels: one-word descriptions of the sentiment such as "happy", "sad", "disappointed", "delusional"
class Sentiment(BaseModel):
    summary: str
    labels: list[str]
    score: int

# Returns a list of graded reviews
def sentiment_grade_review(reviews: list):

    # Loop through each review and store the data in llm_responses
    llm_responses = []

    for review in reviews:
        place_id = review['id']
        index = review['index']
        rating = review['rating']
        text = review['text']
        name = review['name']
        address = review['address']

        print(f"working on review {index+1}/{len(reviews)}")

        # Skip the reviews with no text ('pass' would fall through to the LLM call)
        if not text:
            continue

        completion = client.beta.chat.completions.parse(
            model=OPENAI_MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
            response_format=Sentiment,
        )
        event = completion.choices[0].message.parsed
        if event is None:  # parsed is None when the model refuses
            continue

        llm_responses.append({
            "id": place_id,
            "index": index,
            "name": name,
            "address": address,
            "rating": rating,
            "text": text,
            "summary": event.summary,
            "labels": event.labels,
            "score": max(0, min(10, event.score))  # keep scores on the 0-10 scale
        })
    
    return llm_responses




In [7]:
places_with_sentiment = {}
for index, (place_id, reviews) in enumerate(places_with_reviews.items(), start=1):
    print(f"\n Working on place {index}/{len(places_with_reviews)}")
    sentiments = sentiment_grade_review(reviews=reviews)  # list of graded reviews
    places_with_sentiment[place_id] = sentiments

print(places_with_sentiment)




 Working on place 1/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 2/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 3/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 4/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 5/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 6/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 7/20
working on review 1/5
working on review 2/5
working on review 3/5
working on review 4/5
working on review 5/5

 Working on place 8/20
working on review 1/5
working on revie

In [8]:
system_prompt = """
    Based on the labels and summaries below, make an overall sentiment analysis.
    Write a short summary describing the general sentiment.
    Add the most common sentiment labels.
"""

class OverallSentiment(BaseModel):
    summary: str
    labels: list[str]


def make_overall_sentiment_rating(sentiments: list):
    number_of_reviews = len(sentiments)
    if number_of_reviews == 0:
        raise ValueError("cannot summarise a place without graded reviews")
    total_score = 0
    summaries = ''
    total_labels = []

    for s in sentiments:
        # place_id = s['id']
        # index = s['index']
        # rating = s['rating']
        # text = s['text']
        summary = s['summary']
        labels = s['labels']
        score = s['score']

        total_score += score
        summaries += "\n\n\n" + summary
        total_labels.extend(labels)

    avg_score = round(total_score / number_of_reviews, 1)

    prompt = summaries + "\nLabels: " + ", ".join(total_labels)

    #pass to llm
    completion = client.beta.chat.completions.parse(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        response_format=OverallSentiment,
    )
    event = completion.choices[0].message.parsed

    summary = event.summary
    labels = event.labels

    summarised_review = {
        "name": sentiments[0]['name'], # extract the name and address from the first review
        "address": sentiments[0]['address'],
        "summary": summary,
        "labels": labels,
        "avg_score": avg_score,
    }

    return summarised_review
        

In [9]:
summarised_reviews = {}
for index, (place_id, sentiments) in enumerate(places_with_sentiment.items(), start=1):
    print(f"working on summarising reviews for place {index}/{len(places_with_sentiment)}")
    if not sentiments:  # skip places where no review could be graded
        continue
    summarised_reviews[place_id] = make_overall_sentiment_rating(sentiments=sentiments)

print(summarised_reviews)

working on summarising reviews for place 1/20
working on summarising reviews for place 2/20
working on summarising reviews for place 3/20
working on summarising reviews for place 4/20
working on summarising reviews for place 5/20
working on summarising reviews for place 6/20
working on summarising reviews for place 7/20
working on summarising reviews for place 8/20
working on summarising reviews for place 9/20
working on summarising reviews for place 10/20
working on summarising reviews for place 11/20
working on summarising reviews for place 12/20
working on summarising reviews for place 13/20
working on summarising reviews for place 14/20
working on summarising reviews for place 15/20
working on summarising reviews for place 16/20
working on summarising reviews for place 17/20
working on summarising reviews for place 18/20
working on summarising reviews for place 19/20
working on summarising reviews for place 20/20
{'ChIJ3zOCu69oQUYRfKslbOCzpPo': {'name': 'Burger King', 'address': 'L

In [10]:
system_prompt = """
    Based on the sentiment summaries, labels and average scores below, decide on the best place. Write a short summary explaining why it is the best place.
    In the summary, include the average score. Add labels describing the place.
"""


class Bestplace(BaseModel):
    summary: str
    labels: list[str]

# Expected shape of summarised_reviews:
# {
#   "<place_id>": {
#     "name": str,
#     "address": str,
#     "summary": "Short textual summary of the overall sentiment.",
#     "labels": ["label1", "label2", "label3"],
#     "avg_score": float
#   }
# }

def select_best_place(summarised_reviews :dict):
    
    prompt = ''
    for place_id in summarised_reviews.keys():
        place = summarised_reviews[place_id]
        summary = place['summary']
        labels = place['labels'] #list
        name = place['name']
        address = place['address']
        avg_score = place['avg_score']

        prompt += (
            f"\n\n\nName of the place: {name}"
            f"\nAddress: {address}"
            f"\nAverage score (0-10, where 10 is highly positive): {avg_score}"
            f"\nSummary: {summary}"
            f"\nLabels: {', '.join(labels)}"
        )

    completion = client.beta.chat.completions.parse(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        response_format=Bestplace,  # the schema defined for this step
    )

    event = completion.choices[0].message.parsed
    return event

In [11]:
best_place = select_best_place(summarised_reviews=summarised_reviews)
summary = best_place.summary
labels = best_place.labels

print(f"""
{summary}
\n\n
This place has the following sentiment labels:\n
{", ".join(labels)}
      """)



The best choice based on the sentiments and ratings is Burger King located at Sjøskogenveien 7, 1407 Ski, Norway. This location has an average rating of 8.4 out of 10, reflecting a predominantly positive sentiment among reviewers. Customers commonly express satisfaction with the service efficiency, quality of food, and the family-friendly atmosphere. Despite minimal mentions of issues with food consistency, most reviewers highly recommend the venue due to their positive experiences with the staff and service overall. This indicates that this particular location delivers a consistent and enjoyable dining experience, making it a top choice.



This place has the following sentiment labels:

positive, satisfied, praise, happy, recommendation
      


In [None]:
# Compare my results to the plain star rating
places = get_place_ids(PLACE_OF_INTEREST)  # same query as above
best_place = None
best_rating = 0

for place in places:
    rating = place['rating'] or 0  # some places have no rating yet
    if rating > best_rating:
        best_place = place
        best_rating = rating

print(f"""
The best place based solely on star-rating is {best_place['name']} located in {best_place['address']}.
This place has a star-rating of {best_rating}
\n\n
Compared to our sentiment model: \n{summary}
      """)


The best place based solely on star-rating is Burger King, Skedsmo mall located in Furuholtet 1, 2020 Skedsmokorset, Norway.
This place has a star-rating of 4.6



Compared to our sentiment model: 
The best choice based on the sentiments and ratings is Burger King located at Sjøskogenveien 7, 1407 Ski, Norway. This location has an average rating of 8.4 out of 10, reflecting a predominantly positive sentiment among reviewers. Customers commonly express satisfaction with the service efficiency, quality of food, and the family-friendly atmosphere. Despite minimal mentions of issues with food consistency, most reviewers highly recommend the venue due to their positive experiences with the staff and service overall. This indicates that this particular location delivers a consistent and enjoyable dining experience, making it a top choice.
      
