<a href="https://colab.research.google.com/github/anshupandey/Working_with_Large_Language_models/blob/main/WWL_MP_Retail_Product_Description_Generation_final_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Industry Background

## Company Name: TechRetail Solutions

**TechRetail Solutions** is a leading e-commerce platform that connects vendors with customers worldwide. The company offers a wide range of products across various categories, including electronics, fashion, home appliances, and more. TechRetail Solutions is known for its user-friendly interface, extensive product catalog, and commitment to providing a seamless shopping experience.

# Problem Statement

One of the critical challenges faced by TechRetail Solutions is ensuring that vendors provide high-quality, clear, and accurate product descriptions. Many vendors struggle to write descriptions that meet the platform's standards and format requirements. This inconsistency leads to a suboptimal user experience, as poorly written descriptions can confuse customers, reduce trust, and ultimately impact sales.

The lack of standardized, engaging, and informative product descriptions also hampers the platform's ability to showcase products effectively. To enhance user experience and improve sales, TechRetail Solutions needs a solution that can help vendors generate high-quality product descriptions effortlessly.

# Solution Approach

To address this challenge, TechRetail Solutions aims to leverage Large Language Models (LLMs) to automate the generation of product descriptions. By using advanced LLMs, the company can ensure that product descriptions are not only accurate and informative but also engaging and consistent with the platform's standards.

The solution involves evaluating different LLMs to identify the most suitable model for generating product descriptions. The evaluation will be based on key metrics such as **BLEU**, **ROUGE**, human evaluation, latency, and resource usage. By selecting the best-performing LLM, TechRetail Solutions can provide vendors with a tool that generates high-quality product descriptions, enhancing the overall user experience and boosting sales.

# Dataset Overview

The dataset for evaluating LLMs consists of a diverse set of products across various categories. Each product scenario includes the product name, a brief description, and two high-quality reference descriptions. This dataset will be used to assess the performance of different LLMs based on their ability to generate product descriptions that closely match the reference descriptions.

## Example Dataset:

**Product 1: Wireless Earbuds**

- **Brief Description:** High-fidelity wireless earbuds with noise-canceling technology and long battery life.

- **Reference Description 1:** "Experience the ultimate in wireless freedom with our high-fidelity earbuds. Featuring noise-canceling technology and up to 20 hours of battery life, these earbuds are perfect for music lovers on the go."

- **Reference Description 2:** "Our wireless earbuds offer superior sound quality and comfort. With easy touch controls and a sleek design, enjoy your favorite tunes anytime, anywhere."


Dataset Link: https://github.com/anshupandey/Working_with_Large_Language_models/blob/main/retail_product_description_dataset.json

## Solution: Evaluating LLMs for Retail Product Description Generation

### Objective
Select the best LLM for generating high-quality, engaging, and accurate product descriptions.

### Metrics
- **BLEU**: Aim for a BLEU score above 0.3.
- **ROUGE**: Aim for a ROUGE-L score above 0.5.
- **Perplexity**: Average score less than 20.
- **Latency**: Less than 2 seconds per description.
- **Model Size and Resource Usage**: Fit within available computational resources.

### Steps

### 1. Define Evaluation Criteria
- BLEU score
- ROUGE score
- Perplexity Score
- Latency
- Model Size and Resource Usage

### 2. Benchmarking
- Select candidate LLMs (Gemini 1.5 Flash, PaLM 2, GPT-3.5 Turbo).
- Load the benchmark dataset.


### 3. Evaluate LLMs
1. **Generate Descriptions**
   - Use each LLM to generate descriptions.
2. **Calculate Metrics**
   - BLEU Score
   - ROUGE Score
   - Perplexity Score
3. **Analyze Results**
   - Compare metrics.
   - Identify the best-performing model.



## Data Loading

In [8]:
import pandas as pd
url = "https://raw.githubusercontent.com/anshupandey/Working_with_Large_Language_models/main/retail_product_description_dataset.json"
df = pd.read_json(url)
df.shape

(10, 3)

In [9]:
df.head()

| | name | brief_description | reference_descriptions |
|---|---|---|---|
| 0 | Wireless Earbuds | High-fidelity wireless earbuds with noise-canc... | Experience the ultimate in wireless freedom wi... |
| 1 | Smartwatch | A stylish smartwatch with fitness tracking and... | Stay connected and track your fitness goals wi... |
| 2 | Electric Kettle | A 1.7-liter electric kettle with rapid boil te... | Boil water quickly and safely with our 1.7-lit... |
| 3 | Gaming Laptop | A high-performance gaming laptop with a powerf... | Unleash your gaming potential with our high-pe... |
| 4 | Yoga Mat | A non-slip, eco-friendly yoga mat with cushion... | Enhance your yoga practice with our eco-friend... |


In [10]:
df.loc[0]['reference_descriptions']

'Experience the ultimate in wireless freedom with our high-fidelity earbuds. Featuring noise-canceling technology and up to 20 hours of battery life, these earbuds are perfect for music lovers on the go.|Our wireless earbuds offer superior sound quality and comfort. With easy touch controls and a sleek design, enjoy your favorite tunes anytime, anywhere.'
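
As the output above shows, both reference descriptions for a product are stored in a single string separated by `|`. A minimal sketch of turning such a field into a list follows; the helper name `split_references` and the sample string are assumptions made for illustration and are not part of the notebook.

```python
def split_references(field: str, sep: str = "|") -> list[str]:
    """Split a separator-joined reference field into stripped, non-empty strings."""
    return [part.strip() for part in field.split(sep) if part.strip()]


# Hypothetical sample in the same "ref 1|ref 2" layout as the dataset column.
example = "First reference text. | Second reference text."
print(split_references(example))
```

Keeping both references as a list makes it possible to score a prediction against each of them rather than only the first.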

In [11]:
def get_prompt(product_name, brief_description):
  # Build the instruction prompt that is sent unchanged to every candidate model.
  prompt = f"""For the product below, generate a 2-line product description.
  1. Maximum size for description: 2 lines
  2. Be specific; do not add any unknown information to the product description.
  3. Use short sentences and small, easy-to-understand words.
  4. Consider using a pickup line at the beginning.
  product_name: {product_name}
  brief_description: {brief_description}
  """
  return prompt


## Environment Setup

In [7]:
!pip3 install google-cloud-aiplatform openai rouge-score --quiet



In [7]:
import sys
import os
from openai import OpenAI
os.environ['OPENAI_API_KEY'] = "xxxxxxxxxxxx"  # placeholder; replace with your own key
if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

In [30]:
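# Model IDs for the three candidates.
gemini_id = "gemini-1.0-pro"
paLM_id = "gemini-1.5-pro"  # despite the "paLM" name, this slot holds a Gemini 1.5 Pro model ID
gpt_id = "gpt-3.5-turbo"
# Vertex AI region and Google Cloud project.
location="us-central1"
project_id="jrproject-402905"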

In [31]:
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel
vertexai.init(project=project_id, location=location)

generation_config = GenerationConfig(temperature=0.9, top_k=32)

gemini_model = GenerativeModel(gemini_id)
paLM_model = GenerativeModel(paLM_id)

## Implementing functions for product description generation

In [32]:
def get_description_gemini(prompt,model=gemini_model):
  response = model.generate_content(prompt,generation_config=generation_config)
  return response.text

def get_description_paLM(prompt,model=paLM_model):
  response = model.generate_content(prompt,generation_config=generation_config)
  return response.text

In [28]:
client = OpenAI()

def get_description_gpt(prompt,model_name=gpt_id):
  messages = [{"role":"user",'content':prompt}]
  response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=100,
    temperature=0.9)
  return response.choices[0].message.content

In [34]:
import time

# Generated descriptions per model, plus the reference used for scoring.
predictions = {"gemini":[],
               "palm":[],
               "gpt":[]}
references = []

for i in range(df.shape[0]):
  product_name = df.loc[i, 'name']
  brief_description = df.loc[i, 'brief_description']
  reference_descriptions = df.loc[i, 'reference_descriptions']

  prompt = get_prompt(product_name,brief_description)
  predictions["gemini"].append(get_description_gemini(prompt))
  time.sleep(60)  # pause between the two Vertex AI calls, presumably to stay within rate limits
  predictions["palm"].append(get_description_paLM(prompt))
  predictions["gpt"].append(get_description_gpt(prompt))
  # Only the first of the pipe-separated references is kept for scoring.
  references.append(reference_descriptions.split("|")[0].strip())
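
The Metrics section sets a latency target of less than 2 seconds per description, but the loop above does not record how long each call takes. The sketch below shows one way to time a single generation call with `time.perf_counter`. The names `timed_call` and `fake_model` are hypothetical; `fake_model` only stands in for functions such as `get_description_gemini` or `get_description_gpt` so that the example runs without API access.

```python
import time
from typing import Callable, Tuple


def timed_call(generate: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Run one generation call and return (generated text, elapsed seconds)."""
    start = time.perf_counter()
    text = generate(prompt)
    elapsed = time.perf_counter() - start
    return text, elapsed


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; sleeps briefly to mimic latency.
    time.sleep(0.01)
    return f"Description for: {prompt}"


text, seconds = timed_call(fake_model, "Wireless Earbuds")
print(f"{seconds:.3f} s")
```

If timings are collected inside the generation loop, the measured interval should wrap only the model call, so that any deliberate `time.sleep` pause is not counted as latency.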

In [40]:
for k in range(len(references)):
  print("Reference")
  print(references[k])
  print("Gemini Predictions")
  print(predictions['gemini'][k])
  print("PaLM Predictions")
  print(predictions['palm'][k])
  print("GPT Predictions")
  print(predictions['gpt'][k])
  print()

Reference
Experience the ultimate in wireless freedom with our high-fidelity earbuds. Featuring noise-canceling technology and up to 20 hours of battery life, these earbuds are perfect for music lovers on the go.
Gemini Predictions
Looking for earbuds that are like a melody to your ears? 🎧 Look no further! These wireless earbuds deliver crystal-clear sound, block out unwanted noise, and keep the beat going all day long. 🎶
PaLM Predictions
Are you my perfect match? 😉  
These earbuds have awesome sound, block noise, and last for hours. 

GPT Predictions
Hey there, listen to your favorite tunes with these high-quality wireless earbuds. Enjoy clear sound

Reference
Stay connected and track your fitness goals with our latest smartwatch. Featuring heart rate monitoring, step counting, and notifications, it's the perfect companion for an active lifestyle.
Gemini Predictions
Are you looking for a smartwatch that's as stylish as it is functional? Look no further! This smartwatch tracks your fit

In [41]:
import math
import nltk
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu

In [42]:
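# Perplexity Calculation
def calculate_perplexity(predicted_sentence, reference_sentence):
    # Unigram heuristic, not a language-model perplexity: each reference word is
    # scored against word frequencies in the predicted sentence.
    pred_words = predicted_sentence.split()
    ref_words = reference_sentence.split()
    log_prob_sum = 0
    for word in ref_words:
        if word in pred_words:
            # Matched word: adds -log(p), where p = count / len(pred_words).
            log_prob_sum += math.log(1 / (pred_words.count(word) / len(pred_words)))
        else:
            # Unmatched word: adds log(1 / len(pred_words)), which is negative, so
            # the final score can drop below 1 (a true perplexity cannot).
            log_prob_sum += math.log(1 / len(pred_words))
    return math.exp(log_prob_sum / len(ref_words))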
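
In the cell above, matched words add `-log(p)` while unmatched words add `log(1 / N)`, so the two branches pull the score in opposite directions. The following is a sketch of a sign-consistent variant, assuming an add-one-smoothed unigram distribution estimated from the predicted sentence. The function name `unigram_perplexity` and the smoothing choice are assumptions for illustration. It remains a word-overlap heuristic rather than a perplexity computed by a language model, and it will not reproduce the scores reported later in this notebook.

```python
import math
from collections import Counter


def unigram_perplexity(predicted_sentence: str, reference_sentence: str) -> float:
    """Perplexity of the reference under an add-one-smoothed unigram
    distribution estimated from the predicted sentence."""
    pred_counts = Counter(predicted_sentence.split())
    total = sum(pred_counts.values())
    vocab = len(pred_counts) + 1  # +1 slot shared by all unseen words
    ref_words = reference_sentence.split()
    if not ref_words:
        raise ValueError("reference_sentence must contain at least one word")
    neg_log_prob = 0.0
    for word in ref_words:
        # Counter returns 0 for unseen words, so they get probability 1 / (total + vocab).
        prob = (pred_counts[word] + 1) / (total + vocab)
        neg_log_prob -= math.log(prob)
    return math.exp(neg_log_prob / len(ref_words))


print(unigram_perplexity("a b", "a b"))
print(unigram_perplexity("a b", "c"))
```

With this variant every word contributes a non-negative term, so the score is never below 1 and a reference that shares fewer words with the prediction always scores higher (worse).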

In [43]:
def calculate_bleu(predicted_sentence, reference_sentence):
    return sentence_bleu([reference_sentence.split()], predicted_sentence.split())
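
Unsmoothed BLEU collapses to (near) zero whenever a prediction shares no 2-, 3- or 4-gram with the reference, which is common for short descriptions. The sketch below is a simplified, pure-Python BLEU that accepts several references and replaces a zero n-gram match count with a small `epsilon`, similar in spirit to NLTK's `SmoothingFunction().method1`. It is an illustration under those assumptions, not NLTK's implementation, and the name `smoothed_bleu` is hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(prediction: str, references: list[str],
                  max_n: int = 4, epsilon: float = 0.1) -> float:
    """Sentence-level BLEU with clipped n-gram precision over all references."""
    hyp = prediction.split()
    refs = [r.split() for r in references]
    if not hyp or not refs:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        total = max(sum(hyp_counts.values()), 1)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        matches = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        # Smoothing: substitute epsilon when there is no match at this order.
        precision = (matches if matches > 0 else epsilon) / total
        log_precision_sum += math.log(precision)
    # Brevity penalty against the reference whose length is closest to the prediction.
    closest = min((len(r) for r in refs), key=lambda length: (abs(length - len(hyp)), length))
    brevity = 1.0 if len(hyp) > closest else math.exp(1 - closest / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)


print(smoothed_bleu("a b c d e", ["a b c d e"]))
print(smoothed_bleu("x y z w v", ["a b c d e"]))
```

Within the notebook itself, the same effect can be approached by passing `smoothing_function=SmoothingFunction().method1` and both reference token lists to `sentence_bleu`; scores computed that way are not comparable with unsmoothed ones.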


In [44]:
# ROUGE Score Calculation
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

def calculate_rouge(predicted_sentence, reference_sentence):
    scores = scorer.score(reference_sentence, predicted_sentence)
    return scores
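
To make the `rougeL` number easier to interpret, here is a minimal sketch of a ROUGE-L F-measure based on the longest common subsequence (LCS) of words. It assumes lowercase whitespace tokenization and no stemming, so its values will differ slightly from those of `rouge_scorer.RougeScorer(..., use_stemmer=True)`. The function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """F-measure of LCS-based precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)  # share of predicted words on the common subsequence
    recall = lcs / len(ref)      # share of reference words on the common subsequence
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("a b c d", "a x c y"))  # LCS is "a c": precision 0.5, recall 0.5
```

Because the LCS only requires words to appear in the same order, not next to each other, ROUGE-L rewards a prediction that follows the reference's overall structure even when the wording in between differs.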

In [45]:
results = {"Perplexity":[],"BLEU":[],"ROUGE1":[],"ROUGE2":[],"ROUGEL":[]}

for model_name in predictions.keys():
  print(f"Model: {model_name}")
  perplexities = [calculate_perplexity(pred, ref) for pred, ref in zip(predictions[model_name], references)]
  average_perplexity = sum(perplexities) / len(perplexities)
  results['Perplexity'].append(average_perplexity)

  bleu_scores = [calculate_bleu(pred, ref) for pred, ref in zip(predictions[model_name], references)]
  average_bleu = sum(bleu_scores) / len(bleu_scores)
  results['BLEU'].append(average_bleu)

  rouge_scores = [calculate_rouge(pred, ref) for pred, ref in zip(predictions[model_name], references)]

  average_rouge = {
      'rouge1': sum([score['rouge1'].fmeasure for score in rouge_scores]) / len(rouge_scores),
      'rouge2': sum([score['rouge2'].fmeasure for score in rouge_scores]) / len(rouge_scores),
      'rougeL': sum([score['rougeL'].fmeasure for score in rouge_scores]) / len(rouge_scores),
  }
  results['ROUGE1'].append(average_rouge['rouge1'])
  results['ROUGE2'].append(average_rouge['rouge2'])
  results['ROUGEL'].append(average_rouge['rougeL'])



Model: gemini
Model: palm
Model: gpt


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


In [46]:
pd.DataFrame(results,index=predictions.keys())

| | Perplexity | BLEU | ROUGE1 | ROUGE2 | ROUGEL |
|---|---|---|---|---|---|
| gemini | 0.372987 | 0.04892 | 0.40521 | 0.170851 | 0.293942 |
| palm | 0.205016 | 0.009271 | 0.350904 | 0.115723 | 0.283365 |
| gpt | 0.495222 | 0.059729 | 0.474642 | 0.213665 | 0.37617 |
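
The final step, analyzing the results, can be sketched by comparing the averages in the table above with the BLEU and ROUGE-L targets from the Metrics section. The names `reported`, `targets`, and `meets_targets` are illustrative assumptions. Perplexity is left out because the heuristic used in this notebook is not a language-model perplexity and so is not directly comparable with the stated target.

```python
# Averages copied from the results table above.
reported = {
    "gemini": {"BLEU": 0.04892, "ROUGEL": 0.293942},
    "palm": {"BLEU": 0.009271, "ROUGEL": 0.283365},
    "gpt": {"BLEU": 0.059729, "ROUGEL": 0.37617},
}

# Targets from the Metrics section: BLEU above 0.3, ROUGE-L above 0.5.
targets = {"BLEU": 0.3, "ROUGEL": 0.5}


def meets_targets(scores: dict, goals: dict) -> dict:
    """Return, per metric, whether the score exceeds its target."""
    return {metric: scores[metric] > goal for metric, goal in goals.items()}


for name, scores in reported.items():
    print(name, meets_targets(scores, targets))

# Rank by ROUGE-L; any other metric in `reported` could be used instead.
best = max(reported, key=lambda name: reported[name]["ROUGEL"])
print("Highest ROUGE-L:", best)
```

On these numbers the `gpt` entry scores highest on BLEU and on every ROUGE variant, but no model reaches either target, which is worth keeping in mind given the small dataset, the single reference used per product, and the unsmoothed BLEU.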
