# Short GenAI Workshop

In this workshop we will assist the toy manufacturer "LykkeLand Leketøysfabrikk". Their internal data warehouse is a mess, and they want us to help.

## Setup

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os
from pprint import pprint
# from openai import AzureOpenAI
from langfuse.openai import AzureOpenAI
from dotenv import load_dotenv

sys.path.append('..')
load_dotenv()

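# AzureOpenAI reads these settings from the environment by default;
# uncomment the lines below to pass them explicitly.
client = AzureOpenAI(
    # api_key=os.environ["AZURE_OPENAI_API_KEY"],
    # azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    # api_version=os.environ["OPENAI_API_VERSION"],
)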

### Helper functions

In [None]:
from IPython.display import display, Markdown

def get_completion_from_prompt(prompt: str) -> str:
    """
    Send a single-turn prompt to the Azure OpenAI chat deployment and
    return the first completion. The temperature is set to 0 to make
    the output as deterministic as possible.

    :param prompt: The prompt to send as the user message

    :return: The text content of the first completion choice
    """
    messages = [{"role": "user", "content": prompt}]

    response = client.chat.completions.create(
        model=os.environ["MODEL_DEPLOYMENT_NAME"],
        messages=messages,
        temperature=0,
    )

    return response.choices[0].message.content


def read_file(path: str) -> str:
    with open(path) as f:
        text = f.read()
    # Repair Norwegian letters garbled by a UTF-8/Latin-1 encoding mix-up
    return text.replace("Ã¦", "æ").replace("Ã¥", "å").replace("Ã¸", "ø")


def print_pretty(text: str) -> None:
    display(Markdown(text))

In [None]:
# Verify correct setup
get_completion_from_prompt("Hello")
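
If the call above fails, one possible cause is a missing environment variable. The sketch below assumes key-based authentication and the variable names that appear in the setup cell; `missing_env_vars` is a hypothetical helper written for this notebook, not part of any library.

```python
import os

# Variable names taken from the setup cell; adjust them if your .env differs.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "MODEL_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
print("Missing variables:", missing or "none")
```

An empty list suggests the `.env` file was loaded; it does not confirm that the values themselves are valid.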

# Task: Data Cleaning of Product Reviews from Text Data

LykkeLand has gathered a substantial number of product reviews to assess customer satisfaction. Unfortunately, the data collection process left out critical numerical and categorical fields, which limits the analysis that can be done. The dataset comprises usernames, dates, locations, review titles, and the main body of the text, collected from customers in both Norway and abroad. However, it lacks detailed information about the specific products being reviewed. Despite these limitations, there is still potential for meaningful analysis by using language models to impute missing details and categorize the reviews based on the available text.

*Note: This notebook assumes a basic familiarity with the Pandas library.*


## Explore data
The product reviews collected from customers over the past few months are stored in an Excel sheet. Let's take a look.

In [None]:
import pandas as pd

df = pd.read_excel("../data/product_reviews.xlsx")
df

Let's take a look at a single review in the sheet:

In [None]:
single_review = df.iloc[-1]

pprint(single_review.to_dict())

## Translating and summarizing reviews
To get a clear picture of what our customers think, we need to translate all reviews into a language we can read. A dense summary of each review would also be useful. This way, we can quickly catch the general sentiment and pinpoint the exact issues or compliments customers are highlighting.

In [None]:
# TODO: Write a prompt which translates the review body into English and summarizes it

def condense_and_translate_review(review: pd.Series) -> str:
    prompt = f"""Please summarize the following review in one English sentence:
Title: <title> {review["title"]} </title>
Review: <review> {review["body"]} </review>    
"""
    return get_completion_from_prompt(prompt)

In [None]:
# Test: check if the function works as expected. The output should be a string
# containing a summary of the review body in English

pprint(condense_and_translate_review(single_review))

In [None]:
# Standardize and translate review body for each row in the dataframe
df["standardized_body"] = df.apply(lambda row: condense_and_translate_review(row), axis=1)

In [None]:
df["standardized_body"]
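
Applying the function row by row makes one model call per review, and re-running the cell repeats every call. A minimal sketch of one way to avoid repeated calls for identical prompts is to wrap the completion helper in `functools.lru_cache`. Here `fake_completion` is a hypothetical stand-in for `get_completion_from_prompt`, so the example runs without network access.

```python
from functools import lru_cache

calls = []  # records every prompt that actually reaches the (fake) model


def fake_completion(prompt: str) -> str:
    """Hypothetical stand-in for get_completion_from_prompt."""
    calls.append(prompt)
    return f"summary of: {prompt}"


@lru_cache(maxsize=None)
def cached_completion(prompt: str) -> str:
    # Identical prompts are answered from the cache instead of a new call
    return fake_completion(prompt)


prompts = ["review A", "review B", "review A"]
results = [cached_completion(p) for p in prompts]
print(len(calls))  # 2
```

The cache lives in memory only, so it is lost when the kernel restarts.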

#### Find trends in the reviews
We can also summarize all the rows to gain an overview of common complaints among customers.

In [None]:
all_review_bodies = "\n\n".join(df["standardized_body"].to_list())

In [None]:
prompt = """Summarize the following reviews:""" # your prompt here
for review in df["standardized_body"]:
    prompt += f"\n\n<review>{review}</review>"
prompt += """\n\nUse the following output format:
### Product related feedback
**<product>**: <feedback summary> \n

### Other feedback

### Summary

### Recommendations
<recommendations for what LykkeLand can do to improve the products or customer satisfaction>
"""
# pprint(prompt)
condensed_reviews = get_completion_from_prompt(prompt)


In [None]:
print_pretty(condensed_reviews)

## Inferring country from location

The location column is less than ideal. The form did not validate the location field when customers filled it in. As you can see below, the location is sometimes written as "city, country", sometimes just "city", sometimes just the country, and sometimes even "city, state".

In [None]:
print(set(df["location"].to_list()))

We can use a language model to standardize this information. Let's try to extract the country for each review.

**Note:** A potential problem is that there are multiple ways to write the same country name, e.g. Norway/Kingdom of Norway/Norge/Noreg. If you encounter this issue, how can it be fixed?

Try to minimize the amount of information you give the model. Does it need all of the review information to do this task?
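
For the spelling-variant problem noted above, one possible fix is to post-process the model output with a small alias table. This is only a sketch: `COUNTRY_ALIASES` and `normalize_country` are hypothetical names, and the table would need to be extended with whatever variants actually show up in the data. Another option is to list the allowed country names in the prompt itself.

```python
# Hypothetical alias table; keys are lower-cased variants, values the canonical name.
COUNTRY_ALIASES = {
    "norway": "Norway",
    "norge": "Norway",
    "noreg": "Norway",
    "kingdom of norway": "Norway",
}


def normalize_country(raw: str) -> str:
    """Map known spelling variants to one canonical English name."""
    cleaned = raw.strip().rstrip(".")
    # Unknown names are returned unchanged (apart from trimming)
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


print(normalize_country(" Norge "))             # Norway
print(normalize_country("Kingdom of Norway."))  # Norway
print(normalize_country("France"))              # France
```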

In [None]:
# TODO: write a prompt which classifies the country of the review based on the available unique countries

def get_country(review: pd.Series) -> str:
    prompt = f"""Return the country which this review is from. Just the English country name, nothing else.

Location: {review["location"]}
Review title: {review["title"]}
""" # your prompt here
    country = get_completion_from_prompt(prompt)
    # Strip stray whitespace so identical countries compare equal
    return country.strip()

In [None]:
# Test: check if the function works as expected. The output should be a string
# containing the country of the review
get_country(single_review) 

Now let's create a new column called country where we perform this operation for every row in the dataset:

In [None]:
# Classify the country for each review in the dataframe
df["country"] = df.apply(lambda row: get_country(row), axis=1)
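
The cell above calls the model once per row, even when many rows share the same location. If the prompt only needs the location, a sketch like the following classifies each distinct location once and reuses the result. `fake_get_country` and its lookup table are hypothetical stand-ins for a model call, with made-up toy locations.

```python
def fake_get_country(location: str) -> str:
    """Hypothetical stand-in for a model call that takes only the location."""
    lookup = {"Oslo": "Norway", "Bergen, Norway": "Norway", "Paris": "France"}
    return lookup.get(location, "Unknown")


locations = ["Oslo", "Paris", "Oslo", "Bergen, Norway", "Paris"]

# One lookup per distinct location instead of one per row
country_by_location = {loc: fake_get_country(loc) for loc in set(locations)}

# Map every row back to its country
countries = [country_by_location[loc] for loc in locations]
print(countries)
```

With a dataframe, the same dictionary could be applied with `df["location"].map(country_by_location)`.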

Let's plot the result of the classification.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Count the number of reviews per country
review_counts = df["country"].value_counts().sort_index()

# Create the bar plot using Seaborn
plt.figure(figsize=(12, 6))
ax = sns.barplot(
    x=review_counts.index,
    y=review_counts.values,
    hue=review_counts.index,
    palette="Blues_d",
    dodge=False,
    legend=False
)

# Add value labels on top of each bar
for container in ax.containers:
    ax.bar_label(container, fmt='%d', label_type='edge', padding=3)

# Adjust y-axis limits to add space above the tallest bar
ax.set_ylim(0, review_counts.max() * 1.1)  # Increase limit by 10%

# Customize the plot
ax.set_xlabel("Country", fontsize=14)
ax.set_ylabel("Number of Reviews", fontsize=14)
ax.set_title("Number of Reviews from Each Country", fontsize=16)
ax.tick_params(axis="x", rotation=45)

# Show the plot
plt.tight_layout()
plt.show()

### Verify if you succeeded

In [None]:
from hackathon.evaluation.country_inferring import verify as verify_country_inferring

verify_country_inferring(df, print_errors=True)

## Rating classification
We aim to quantify a customer's satisfaction level by assigning a numerical score to the review text, for example, ranging from 1 to 5. This process will allow us to systematically evaluate and compare customer feedback, providing a clear metric to gauge overall contentment or dissatisfaction.

In [None]:
# TODO: write a prompt which classifies the rating of each review on a scale from 1 to 5

def get_rating(review: pd.Series) -> int:
    prompt = f"""Rate the following review on a scale from 1 to 5, where 1 is the lowest and 5 is the highest. Output only the number.

<review>{review["body"]}</review>

Guidance:
1: Very negative review
2: Negative review
3: Neutral review. Customer either both liked and disliked the product or was indifferent.
4: Positive review
5: Very positive review
"""
    rating = get_completion_from_prompt(prompt)
    return int(rating)

In [None]:
# Test: check if the function works as expected. The output should be an integer
# containing the rating of the review
get_rating(single_review)
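
Even when asked to output only the number, a model may occasionally add whitespace, punctuation, or a few words, which would make a bare `int(...)` conversion fail. A minimal sketch of a more tolerant parser follows; `parse_rating` is a hypothetical helper that takes the first standalone digit from 1 to 5 and raises if none is found.

```python
import re


def parse_rating(raw: str) -> int:
    """Extract the first standalone digit 1-5 from a model response."""
    match = re.search(r"\b[1-5]\b", raw)
    if match is None:
        raise ValueError(f"No rating between 1 and 5 found in: {raw!r}")
    return int(match.group())


print(parse_rating("4"))             # 4
print(parse_rating(" Rating: 5\n"))  # 5
print(parse_rating("3."))            # 3
```

Raising on unparseable output, rather than guessing a default, keeps bad responses visible instead of silently skewing the rating distribution.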

In [None]:
# Impute the rating for each review in the dataframe
df["rating"] = df.apply(lambda row: get_rating(row), axis=1)

Let's plot the rating distribution:

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Count the occurrences of each rating
rating_counts = df["rating"].value_counts().sort_index()

# Create the bar plot using Seaborn
plt.figure(figsize=(10, 6))
ax = sns.barplot(
    x=rating_counts.index,
    y=rating_counts.values,
    hue=rating_counts.index,
    palette="Blues_d",
    dodge=False,
    legend=False
)

# Customize the plot
ax.set_xlabel("Rating", fontsize=14)
ax.set_ylabel("Frequency", fontsize=14)
ax.set_title("Frequency of Each Rating Value", fontsize=16)
ax.tick_params(axis="both", which="major", labelsize=12)

# Show the plot
plt.tight_layout()
plt.show()

And the average rating by country:

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Calculate the average rating by country
average_ratings = df.groupby("country")["rating"].mean().sort_index()

# Create the bar plot using Seaborn
plt.figure(figsize=(10, 6))
ax = sns.barplot(
    x=average_ratings.index,
    y=average_ratings.values,
    hue=average_ratings.index,
    palette="Blues_d",
    edgecolor="darkblue",
    legend=False
)

# Customize the plot
ax.set_xlabel("Country", fontsize=14)
ax.set_ylabel("Average Rating", fontsize=14)
ax.set_title("Average Rating by Country", fontsize=16)
ax.tick_params(axis="x", rotation=45)
ax.tick_params(axis="both", which="major", labelsize=12)

# Show the plot
plt.tight_layout()
plt.show()

#### Verify your analysis

Based on the graph you just plotted, what do you observe? We want you to extract **two** specific insights.

**Note:** Keep it simple. Ignore possible errors caused by small sample sizes.

In [None]:
from hackathon.evaluation.country_ratings_analysis import evaluate as evaluate_country_ratings_analysis

YOUR_ANSWER = "China has the most satisfied customers, USA has the least."

print(evaluate_country_ratings_analysis(YOUR_ANSWER))

## Visualize temporal trends
We can create visual representations to track how ratings from a specific country evolve over time. By plotting these ratings, we can identify patterns, trends, or anomalies, offering valuable insights into how customer satisfaction may vary with different factors or events. This analysis could help in making informed decisions or adjustments in strategy based on temporal shifts in customer feedback.

In [None]:
df["date"] = pd.to_datetime(df["date"])
france_ratings = df[df["country"] == "France"].sort_values("date")

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Create the line plot using Seaborn
plt.figure(figsize=(10, 6))
ax = sns.lineplot(
    x="date",
    y="rating",
    data=france_ratings,
    marker="o",
    color="skyblue",
    errorbar=None
)

# Customize the plot
ax.set_xlabel("Date", fontsize=14)
ax.set_ylabel("Rating", fontsize=14)
ax.set_title("Ratings Over Time for France", fontsize=16)
ax.tick_params(axis="x", rotation=45)
ax.tick_params(axis="both", which="major", labelsize=12)

# Show the plot
plt.tight_layout()
plt.show()

**Task:** Why do the ratings for France seem to have deteriorated after 2023-03-15? Find the answer by analyzing the relevant reviews using an LLM.
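
One way to approach this task is to keep only the reviews written after the cutoff date and put them into a single prompt. The sketch below uses made-up toy rows and a hypothetical helper `build_trend_prompt`; the prompt wording is only a starting point.

```python
from datetime import datetime


def build_trend_prompt(rows: list[dict], cutoff: datetime) -> str:
    """Build a prompt from the reviews dated after the cutoff."""
    recent = [row for row in rows if row["date"] > cutoff]
    prompt = (
        "The following reviews were written after ratings dropped. "
        "What common complaint could explain the drop?"
    )
    for row in recent:
        prompt += f"\n\n<review>{row['body']}</review>"
    return prompt


# Made-up toy rows for illustration only
rows = [
    {"date": datetime(2023, 3, 1), "body": "Great toy."},
    {"date": datetime(2023, 3, 20), "body": "Arrived broken."},
]
prompt = build_trend_prompt(rows, cutoff=datetime(2023, 3, 15))
print(prompt)
```

With the real data, the rows could come from `france_ratings.to_dict("records")`, and the resulting prompt could be passed to `get_completion_from_prompt`.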

## Next steps: Experiment and play with the data yourself