### Importing Required Libraries
In this cell, we import several libraries that are necessary for this notebook:
- `pandas` and `numpy` are standard libraries for data manipulation and numerical operations.
- `os` is used to interact with the operating system, particularly for accessing environment variables like API keys.
- `dotenv` allows us to securely load environment variables from a `.env` file, keeping sensitive information like API keys out of the code itself.
- Finally, the `OpenAI` class is imported from the `openai` package to interact with OpenAI's API, which will be used to generate text completions based on the queries we send.

These imports set the foundation for interacting with APIs securely and managing responses efficiently.

In [1]:
# Standard Libraries
import pandas as pd
import numpy as np
import os # Module used to interact with the operating system environment
from dotenv import load_dotenv # Load environment variables from a .env file (which I keep in my VS Code workspace)

# Third Party Libraries
from openai import OpenAI # OpenAI API client for Python

### Loading Environment Variables
This cell uses `load_dotenv()` to load environment variables from a `.env` file. Storing sensitive data such as API keys in a `.env` file is a best practice because it keeps them out of the source code. The `dotenv` library reads this file and allows us to securely access the variables within our Jupyter Notebook.

In [2]:
load_dotenv()

True

### Setting Up the OpenAI API Key and Client
In this step, we retrieve the OpenAI API key that was securely stored in the `.env` file and assign it to the `openai_api_key` variable. Using `os.environ`, we can access this key without hardcoding it.

The `OpenAI` client is initialized here, allowing us to communicate with OpenAI's API and its GPT models. I chose to explore the OpenAI API because of its popularity and its wide use in AI-driven applications. By experimenting with it, I aim to better understand how it processes requests and generates responses.

In [3]:
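openai_api_key = os.environ["OPENAI_API_KEY"] # Access the OpenAI API key from environment variables (raises KeyError if it is missing)
client = OpenAI(api_key=openai_api_key) # Initialize the client; OpenAI() with no arguments would also read OPENAI_API_KEY from the environment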
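If the key is missing, `os.environ[...]` raises a bare `KeyError`, which can be confusing in a notebook. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper name introduced here for illustration and is not part of `os` or `dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper (not part of os or python-dotenv).
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Example usage, after load_dotenv() has run:
# openai_api_key = require_env("OPENAI_API_KEY")
```

The same helper could be reused later for any other key the notebook needs.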

### Making an API Request to OpenAI
In this cell, we send a request to the OpenAI API using the GPT-4 model. The request consists of a "system" message that sets the tone of the conversation (in this case, a helpful assistant) and a "user" message that asks a specific question: "Should an MBA student learn Python?" The API will process this request and return a text completion with the relevant information. 

In [4]:
response = client.chat.completions.create(
  model="gpt-4", # The model we want to use; GPT-4 was the most capable model available to us at the time of writing
  messages=[
    {"role": "system", "content": "You are a helpful assistant."}, # A conventional default-style system prompt. We will change this later
    {"role": "user", "content": "Should an MBA student learn Python?"}, # The user message is akin to the prompt you would type into the traditional chat UI
  ]
)
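Because later cells reuse the same two-message structure with different prompts, the message list can be built by a small function. This is only a sketch: `build_messages` is a hypothetical helper, not something provided by the `openai` package.

```python
def build_messages(
    user_prompt: str,
    system_prompt: str = "You are a helpful assistant.",
) -> list[dict]:
    """Build the system + user message list in the chat format used above.

    Hypothetical helper for illustration only.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Should an MBA student learn Python?")
# The list can then be passed as the `messages` argument of
# client.chat.completions.create(...), exactly as in the cell above.
```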

### Viewing the Raw JSON Response
Here, we inspect the full JSON response returned by the OpenAI API. This response contains metadata about the request, as well as the generated answer from the API. Viewing the raw JSON allows us to understand the structure of the response, which includes various fields such as the message content, tokens used, and model information.

In [5]:
response.json()

'{"id":"chatcmpl-ADFgj3UxkH0t9dyRDI7MKZyIsMENo","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Absolutely, learning Python can be extremely beneficial for an MBA student. Python is one of the most popular and fastest-growing programming languages, and it\'s commonly used for data analysis, machine learning, and automation, among other things. Particularly if an MBA student is interested in a career in data analysis or tech management, Python skills can be very valuable. It can help with tasks like automating repetitive tasks, analyzing large data sets, and making data-driven decisions. However, whether or not it is necessary depends on the specific career goals of the student.","refusal":null,"role":"assistant","function_call":null,"tool_calls":null}}],"created":1727721933,"model":"gpt-4-0613","object":"chat.completion","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":109,"prompt_tokens":24,"total_tokens":133,"completion_tok
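To make the structure easier to see, the JSON string can be parsed into a dictionary and individual fields read out. The sketch below uses a hand-trimmed copy of the output above (the message text is shortened and several fields are omitted), so it runs without calling the API.

```python
import json

# Hand-trimmed version of the response shown above; not the full payload.
raw = (
    '{"model": "gpt-4-0613", '
    '"choices": [{"index": 0, "finish_reason": "stop", '
    '"message": {"role": "assistant", '
    '"content": "Absolutely, learning Python can be extremely beneficial for an MBA student."}}], '
    '"usage": {"completion_tokens": 109, "prompt_tokens": 24, "total_tokens": 133}}'
)

parsed = json.loads(raw)
usage = parsed["usage"]

print(parsed["model"])                           # which model answered
print(parsed["choices"][0]["finish_reason"])     # why generation stopped
print(parsed["choices"][0]["message"]["content"])
# Prompt and completion tokens add up to the total reported by the API.
print(usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"])
```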

### Extracting the API's Message Content
This cell extracts the actual message content from the API's response. By drilling down into the `response` object, we focus on the specific answer that the API generated for the user’s query. This step is important as it filters out the metadata and other auxiliary information, allowing us to see only the relevant content.

In [6]:
message = response.choices[0].message.content

### Displaying the Message Content
Finally, we display the message content that was extracted in the previous step. This output shows the response generated by the OpenAI API, providing the answer to the query about MBAs and Python. This marks the completion of the API request process, demonstrating how we can query the model and extract useful information.

In [7]:
message

"Absolutely, learning Python can be extremely beneficial for an MBA student. Python is one of the most popular and fastest-growing programming languages, and it's commonly used for data analysis, machine learning, and automation, among other things. Particularly if an MBA student is interested in a career in data analysis or tech management, Python skills can be very valuable. It can help with tasks like automating repetitive tasks, analyzing large data sets, and making data-driven decisions. However, whether or not it is necessary depends on the specific career goals of the student."

### Testing with a Custom System Prompt

In this cell, we experiment by changing the default system prompt that typically instructs the assistant to be helpful. Instead, we provide a custom system message: "You are an unhelpful assistant that does not answer the questions asked of it." This change aims to see how the API responds when its behavior is intentionally set up in a non-standard way.

The goal of this test is to observe how the model adapts to different instructions and how closely it follows the system prompt provided. By customizing the system prompt, we explore the flexibility of the model and how it reacts to instructions that deviate from the usual "helpful assistant" configuration. This allows us to understand how much control we have over the assistant’s behavior and test its response consistency under different scenarios.

In [8]:
response = client.chat.completions.create(
  model="gpt-4",
  messages=[
    {"role": "system", "content": "You are an unhelpful assistant that does not answer the questions asked of it."},
    {"role": "user", "content": "Should an MBA student learn Python?"},
  ]
)

response.choices[0].message.content

"Isn't it fascinating to learn that elephants are the largest land mammals on earth? They're also known to display behaviors akin to empathy and altruism."

### Comment on the Response

It's interesting to see that the model does follow our custom instructions and avoids answering the actual question. Instead of providing information about Python and MBAs, it amusingly talks about something unrelated: elephants in the run above, and its favorite cake or blue cheese in other runs. This demonstrates that the system prompt can steer the model's behavior, even to the point of ignoring the user's query, as intended by our setup.

### Testing Sentiment Analysis with Simple Output

In this cell, we test the model's ability to analyze sentiment using a very specific system prompt. The system is instructed to:
1. Analyze a restaurant review.
2. Respond only with "positive," "neutral," or "negative" without any additional explanation.

This approach tests how well the model can follow strict instructions and return minimal output, which is useful for scenarios where concise, binary, or categorical responses are needed.

By using a brief restaurant review as input, we aim to see whether the model correctly identifies the sentiment based on a mixture of positive and slightly critical remarks.

In [9]:
response = client.chat.completions.create(
  model="gpt-4",
  messages=[
    {"role": "system", "content": 
    """Analyze the following restaurant review. Tell me if the sentiment is positive, neutral, or negative.
    Respond with only one of these words. Provide no further explanation"""}, 
    {"role": "user", "content": 
     "La Dolce Vita has cozy vibes, great pasta, and the best tiramisu! Slightly under-seasoned, but still worth it"}, # Sample review
  ]
)

response.choices[0].message.content

'Positive'

### Comment on the Response

The response, "Positive," shows that the model effectively captures the overall sentiment of the review, despite the minor criticism about under-seasoning. This test confirms that the model can focus on the broader context and deliver the required concise response, following the instructions provided in the system prompt.
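The model returned `'Positive'` with a capital letter, and nothing guarantees it will always use exactly that form. A small normalization step, sketched below, maps the raw output onto the three expected labels; `normalize_sentiment` and the `"unknown"` fallback are assumptions made for illustration, not part of the OpenAI API.

```python
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def normalize_sentiment(raw_output: str) -> str:
    """Map raw model output to one of the allowed labels, else 'unknown'.

    Hypothetical helper: trims whitespace, surrounding quotes and
    trailing punctuation, then lowercases before checking the label.
    """
    cleaned = raw_output.strip().strip(".!\"'").lower()
    return cleaned if cleaned in ALLOWED_LABELS else "unknown"


print(normalize_sentiment("Positive"))
print(normalize_sentiment(" negative.\n"))
print(normalize_sentiment("The sentiment is positive"))
```

Anything that is not a bare label, such as a full sentence, falls through to `"unknown"`, which makes instruction-following failures easy to spot.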

## Setting the Scene for the Next Part of the Analysis

In this next section, we are going to test our sentiment analysis model on real-world data, specifically reviews from one of my favorite restaurants, Torrisi, in New York City. To get access to these reviews, we will use the Google Maps API. Based on the documentation, before we can access reviews for a specific place, we first need to retrieve the `place_id` for that restaurant. We will use the Google Maps API's **Text Search** endpoint to locate the place by its name and return the necessary `place_id`.

### Importing Required Libraries for Further Analysis

We begin by importing the necessary libraries for working with the Google Maps API and making HTTP requests.

In [10]:
import googlemaps # Google Maps API client for Python
import requests  # To send HTTP requests to the Google Maps API

### Setting Up the API Key and Client
Here, we retrieve the Google Maps API key securely from our environment variables and set up a client using the `googlemaps` library. Note that the cells below call the REST endpoints directly with `requests`; the `googlemaps` client is an alternative way to reach the same API.

In [11]:
google_api_key = os.environ["GOOGLE_MAPS_API_KEY"] # Access the Google Maps API key from environment variables
gmaps = googlemaps.Client(key=google_api_key) # Initialize the Google Maps API client, similar to how we did for the OpenAI Client

### Preparing the Query and API URL
Next, we define the search query for "Torrisi New York" and construct the URL for the Text Search endpoint of the Google Maps API. This endpoint allows us to search for a place based on a query and returns information such as the place's location and `place_id`.

In [12]:
query = 'Torrisi New York'

url = f'https://maps.googleapis.com/maps/api/place/textsearch/json?query={query}&key={google_api_key}'
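The query contains spaces, and other queries may contain characters such as `&` that would break a hand-built URL. A minimal sketch of building the same URL with explicit encoding is shown below, using the standard library's `urlencode`; `build_text_search_url` is a hypothetical helper name and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_text_search_url(query: str, api_key: str) -> str:
    """Return a Text Search URL with the query parameters URL-encoded.

    Hypothetical helper for illustration only.
    """
    return f"{BASE_URL}?{urlencode({'query': query, 'key': api_key})}"


print(build_text_search_url("Torrisi New York", "YOUR_API_KEY"))
# https://maps.googleapis.com/maps/api/place/textsearch/json?query=Torrisi+New+York&key=YOUR_API_KEY
```

An equivalent option is to pass the parameters as a dictionary through the `params` argument of `requests.get`, which performs the encoding itself.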

### Sending the API Request
We send an HTTP GET request to the Google Maps API using the constructed URL to retrieve information about the place, including its `place_id`.

In [13]:
response = requests.get(url)

### Viewing the JSON Response
Finally, we view the JSON response returned by the API. This response contains details about Torrisi, including the `place_id`, which will be necessary for retrieving reviews in the next step.

In [14]:
response.json()

{'html_attributions': [],
 'results': [{'business_status': 'OPERATIONAL',
   'formatted_address': '275 Mulberry St, New York, NY 10012, United States',
   'geometry': {'location': {'lat': 40.7242743, 'lng': -73.9954023},
    'viewport': {'northeast': {'lat': 40.72579502989272,
      'lng': -73.99390067010728},
     'southwest': {'lat': 40.72309537010728, 'lng': -73.99660032989273}}},
   'icon': 'https://maps.gstatic.com/mapfiles/place_api/icons/v1/png_71/restaurant-71.png',
   'icon_background_color': '#FF9E67',
   'icon_mask_base_uri': 'https://maps.gstatic.com/mapfiles/place_api/icons/v2/restaurant_pinlet',
   'name': 'Torrisi',
   'opening_hours': {'open_now': False},
   'photos': [{'height': 4234,
     'html_attributions': ['<a href="https://maps.google.com/maps/contrib/114060081828057395826">A Google User</a>'],
     'photo_reference': 'AXCi2Q6sBraMyl6x0vSaQXAHPz2tmv_q7ek-31dKnbKCKI36UT5jg0dEKW9kDwesoVoKXbsHzaWWzqprT-nvROo8JI13epwM7hKE7I4G84WHTWKi7kYMqPP6vSxBaiN1UNzErrZl00L1EHxp6q

### Fetching Place Details Including Reviews
Now that we have the `place_id` for Torrisi, we can use it to request detailed information, including customer reviews. In this step, we build a new URL that queries the Google Maps API's **Place Details** endpoint. We specify that we want to retrieve both the place's name and reviews.

This step will allow us to analyze real customer reviews and later use our sentiment analysis model to evaluate the feedback.

In [15]:
place_id = response.json()['results'][0]['place_id'] # we get this from inspecting the json format above to know the structure of the response
url = f'https://maps.googleapis.com/maps/api/place/details/json?place_id={place_id}&fields=name,reviews&key={google_api_key}'

The `fields=name,reviews` part of the URL specifies which data we want the API call to return.
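Indexing `['results'][0]` directly raises an error if the search finds nothing. The sketch below shows one defensive way to pull out the first `place_id`; `first_place_id` is a hypothetical helper, the sample payload is made up for the example, and the `status` field is assumed to be present as in the Place Details response checked later in this notebook.

```python
from typing import Optional


def first_place_id(payload: dict) -> Optional[str]:
    """Return the place_id of the first search result, or None if unavailable.

    Hypothetical helper; assumes the payload has 'status' and 'results' keys.
    """
    if payload.get("status") != "OK":
        return None
    results = payload.get("results") or []
    if not results:
        return None
    return results[0].get("place_id")


# Made-up payloads that mimic the shape of the Text Search response.
sample_ok = {
    "status": "OK",
    "results": [{"name": "Torrisi", "place_id": "EXAMPLE_PLACE_ID"}],
}
sample_empty = {"status": "ZERO_RESULTS", "results": []}

print(first_place_id(sample_ok))
print(first_place_id(sample_empty))
```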

### Viewing the JSON Response with Reviews

After making the request, we view the JSON response to inspect the details of the reviews returned by the API. This response contains multiple customer reviews, including fields like the author's name, profile photo URL, review text, and rating. 

This data will provide the foundation for our next step, where we will apply sentiment analysis to understand the overall sentiment of the feedback received by Torrisi.

In [16]:
response = requests.get(url)

In [17]:
response.json()

{'html_attributions': [],
 'result': {'name': 'Torrisi',
  'reviews': [{'author_name': 'Katie Horsfield',
    'author_url': 'https://www.google.com/maps/contrib/103043567727709152114/reviews',
    'language': 'en',
    'original_language': 'en',
    'profile_photo_url': 'https://lh3.googleusercontent.com/a-/ALV-UjVCdleYhI8SRnlbMdrF5h06NgZwULDiyykIwplZ4fakAgb7gjXs=s128-c0x00000000-cc-rp-mo-ba6',
    'rating': 5,
    'relative_time_description': '3 weeks ago',
    'text': "Incredible dining experience! I had read lots of reviews and seen lots of social media content about Torrisi before dining here, and I was not disappointed.\n\nI was dining alone and seated at the bar. My server was very friendly and talked me through the menu, served my drinks and also took my order.\n\nI had a garibaldi cocktail which was great and later a glass of white wine. The drinks selection is excellent. To eat, I started with the cucumber salad which was so fresh and delicious. I then had the famous tortellin

As a quick sense check, we scan the `reviews` in the response. They all read favorably, so we expect our sentiment analysis model to return 'positive' for every review.

### Extracting Reviews from the API Response
In this cell, we extract reviews from the API's JSON response. We first check if the status of the response is 'OK' to ensure the data is valid. Then, we extract the name of the place and the reviews. 

Only specific fields are extracted from each review: the author's name, the time when the review was written, and the review text. If the data retrieval fails, an error message is printed. Finally, the extracted reviews are stored in a DataFrame for further analysis.


In [18]:
# Extract reviews
data = response.json()

reviews = []
if data['status'] == 'OK':
    reviews_data = data['result'].get('reviews', [])
    place_name = data['result'].get('name', 'Unknown Place')

    # Extract only the required fields
    for review in reviews_data:
        reviews.append({
            'author': review.get('author_name', ''),
            'time': review.get('relative_time_description', ''),
            'text': review.get('text', '')
        })
else:
    print('Error fetching reviews')

df_reviews = pd.DataFrame(reviews)

### Displaying the DataFrame of Reviews
Here, we display the DataFrame `df_reviews`, which contains the extracted reviews. The table shows the author's name, the time when the review was written, and the text of the review itself. This allows us to visually inspect the reviews before we apply any analysis.


In [19]:
df_reviews

| | author | time | text |
|---|---|---|---|
| 0 | Katie Horsfield | 3 weeks ago | Incredible dining experience! I had read lots ... |
| 1 | David Makris | a month ago | Been to Torrisi twice now and it’s a home run ... |
| 2 | kelana 2 | a week ago | Jul 24\n\nLunch 4 of us\n\nNice ambient restau... |
| 3 | Koda Ko | in the last week | Everything was so good! Pomodoro was classic a... |
| 4 | S K | 4 months ago | I recently had dinner at Torrisi in Manhattan ... |


We also inspect the text, i.e., the review, of the first row in our dataframe.

In [20]:
df_reviews.iloc[0]['text']

"Incredible dining experience! I had read lots of reviews and seen lots of social media content about Torrisi before dining here, and I was not disappointed.\n\nI was dining alone and seated at the bar. My server was very friendly and talked me through the menu, served my drinks and also took my order.\n\nI had a garibaldi cocktail which was great and later a glass of white wine. The drinks selection is excellent. To eat, I started with the cucumber salad which was so fresh and delicious. I then had the famous tortellini filled with ricotta cheese and in a pomodoro sauce. This was honestly one of the best pasta dishes I have ever eaten. So good!\n\nThe restaurant has a great ambience and feels very 'New York'. Service was fantastic. And the food and drinks are on the pricey side but absolutely worth it! Highly recommend."

Great, this data is now in a format that we can analyse further.

### Defining a Function to Analyze Sentiment of Reviews
This function applies the exact same logic we used earlier for sentiment analysis but has now been encapsulated into a reusable function. By turning this into a function, we can call it multiple times on different reviews in our DataFrame, rather than running the logic on a single review at a time.

The function sends the review text to the GPT-4 model with the prompt to classify the sentiment as either 'positive,' 'neutral,' or 'negative,' and returns the result for each review. This allows us to apply sentiment analysis to several rows in the DataFrame efficiently.

In [21]:
def analyze_review_sentiment(review_text):

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": 
             """Analyze the following restaurant review. Tell me if the sentiment is positive, neutral, or negative.
             Respond with only one of these words. Provide no further explanation"""},
            {"role": "user", "content": review_text},
        ]
    )

    return response.choices[0].message.content.strip()
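Since this function will be called once per review, a single transient network or rate-limit error would abort the whole run. One possible mitigation, sketched below under the assumption that simply retrying is acceptable, is to wrap the classifier in a retry helper with exponential backoff. `with_retries` is a hypothetical helper and is not part of the `openai` package.

```python
import time
from typing import Callable


def with_retries(
    func: Callable[[str], str],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Callable[[str], str]:
    """Wrap a text -> label function so that failures are retried.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error if all attempts fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapper(text: str) -> str:
        for attempt in range(attempts):
            try:
                return func(text)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the original error
                time.sleep(base_delay * (2 ** attempt))
        raise RuntimeError("unreachable")  # loop always returns or raises

    return wrapper


# Example usage with the function defined above:
# safe_analyze = with_retries(analyze_review_sentiment, attempts=3)
```

Catching every `Exception` is a deliberate simplification here; in practice you may prefer to retry only on the specific error types you expect to be temporary.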

### Creating the Sentiment Analysis Function for the DataFrame
In this cell, we define a new function `sentiment_analysis` that processes an entire DataFrame of reviews. It applies the `analyze_review_sentiment` function (defined above) to the 'text' column of the DataFrame and stores the result for each review in a new 'sentiment' column. Note that the column is added to the DataFrame that is passed in, so the input is modified in place.

This allows us to analyze sentiment for multiple reviews with a single call.

In [22]:
def sentiment_analysis(df):
    # Apply the sentiment analysis function to each review; the assignment creates the 'sentiment' column
    df['sentiment'] = df['text'].apply(analyze_review_sentiment)
    return df

### Applying Sentiment Analysis to the DataFrame
We now call the `sentiment_analysis` function on the `df_reviews` DataFrame, which applies the sentiment analysis to each review and returns the DataFrame with an additional 'sentiment' column. Because the column is added in place, `df_sentiment` and `df_reviews` refer to the same object. The new column reflects the model's determination of whether each review is positive, neutral, or negative.

In [23]:
df_sentiment = sentiment_analysis(df_reviews)

### Displaying the Resulting DataFrame with Sentiment
Here, we display the updated DataFrame, which now includes a 'sentiment' column that shows the sentiment of each review. Perhaps unsurprisingly, all the reviews for Torrisi are classified as 'Positive', which is fantastic news!

It’s clear that the restaurant is highly regarded, and these reviews show that customers are enjoying their experiences. 

In [24]:
df_sentiment

| | author | time | text | sentiment |
|---|---|---|---|---|
| 0 | Katie Horsfield | 3 weeks ago | Incredible dining experience! I had read lots ... | Positive |
| 1 | David Makris | a month ago | Been to Torrisi twice now and it’s a home run ... | Positive |
| 2 | kelana 2 | a week ago | Jul 24\n\nLunch 4 of us\n\nNice ambient restau... | Positive |
| 3 | Koda Ko | in the last week | Everything was so good! Pomodoro was classic a... | Positive |
| 4 | S K | 4 months ago | I recently had dinner at Torrisi in Manhattan ... | Positive |
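With more reviews, reading the column by eye stops being practical, so a quick tally helps. The sketch below counts the labels from the table above using only the standard library; the list is typed in by hand from that output rather than read from `df_sentiment`.

```python
from collections import Counter

# Labels copied by hand from the 'sentiment' column shown above.
sentiments = ["Positive", "Positive", "Positive", "Positive", "Positive"]

# Lowercase before counting so 'Positive' and 'positive' are treated alike.
counts = Counter(label.lower() for label in sentiments)
share_positive = counts["positive"] / len(sentiments)

print(counts)
print(share_positive)
```

On the DataFrame itself, `df_sentiment['sentiment'].value_counts()` gives the same tally.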


### Final Note on Google Reviews API Limitation
It's worth noting that the Google Maps Place Details endpoint returns at most five reviews per request (the ones it ranks as most relevant), so we are not seeing the full spectrum of customer feedback for Torrisi. Moreover, we are completely reliant on OpenAI's language model to 'understand' sentiment: we have no visibility into where it draws the boundary between positive, neutral, and negative.

While this means we don't get a complete picture of all the reviews, the limited set is still valuable for testing and understanding how to apply sentiment analysis using APIs. Overall, it's a useful exercise in exploring how APIs can be combined with language models to analyze user feedback.