# 4. Augmenting retail user interactions

This notebook builds functions that extend the basic functionality described in notebook 3. Each section is independent and covers a scenario commonly seen in retail user interactions.

4.0. [Set up](#4.0)

4.1. [Test and use the Bedrock Knowledge Base Retrieve API](#4.1)

4.2. [Augment query context using web search](#4.2)

4.3. [Use LLM to generate prompts for creating user-authored content](#4.3)

4.4. [Product review analysis](#4.4)

## 4.0 <a id="4.0">Set up</a>

In [None]:
# run this cell to upgrade to the latest version of boto3 if required, and restart the kernel
!pip install --upgrade --force-reinstall --quiet botocore boto3

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import boto3
import sagemaker

import pandas as pd
import ast

<div class="alert alert-block alert-warning">

IMPORTANT! Please copy and paste the required information for your <b>RDS Aurora PostgreSQL database</b> in the cell below.
    
</div>

In [None]:
sess = sagemaker.Session()
bucket = sess.default_bucket()
region = sess.boto_region_name
accountid = sess.account_id()
product_db_data_path = 'amazon-reviews-fashion-metadata'
bedrock_kb_data_path = 'bedrock-kb-data'
bedrock_kb_datasource_uri = f's3://{bucket}/{bedrock_kb_data_path}/'

database_identifier='<TODO>'
database_arn='<TODO>'
database_secret_arn='<TODO>'
database_name='<TODO>'

In [None]:
%mkdir -p util

## 4.1 <a id="4.1">Test and use the Bedrock Knowledge Base Retrieve API</a>

### Run a test query
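The `bedrock_kb_retrieve` helper used in this section was written in an earlier notebook and is not reproduced here. As a rough sketch of what a call to the Retrieve API involves, the hypothetical function below (`kb_retrieve_texts` is not part of this workshop's `util` package) sends a text query through a `bedrock-agent-runtime` client and returns each matched chunk with its relevance score. The client is passed in as an argument, which is an assumption made so the function can be exercised with a stub.

```python
def kb_retrieve_texts(client, kb_id, query, max_results=3):
    """Call the Knowledge Base Retrieve API and return (text, score) pairs.

    `client` is assumed to be a boto3 'bedrock-agent-runtime' client
    (or any object exposing a compatible `retrieve` method).
    """
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": max_results}
        },
    )
    # Each retrieval result carries the matched chunk text and a relevance score.
    return [
        (item["content"]["text"], item.get("score"))
        for item in response.get("retrievalResults", [])
    ]


# Usage (requires AWS credentials and a real knowledge base ID):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# kb_retrieve_texts(client, bedrock_kb_id, "shirts with buttons")
```

The helper in `util/bedrockkb.py` may return a different structure; treat this only as an illustration of the request and response shape.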

<div class="alert alert-block alert-warning">

IMPORTANT! Please copy and paste the <b>Bedrock Knowledge Base ID</b> for the knowledge base that you are using in the cell below.
    
</div>

In [None]:
bedrock_kb_id = '<TODO>'

In [None]:
from util.bedrockkb import bedrock_kb_retrieve

search_query = 'shirts with buttons'
no_kb_results = 3
search_list = bedrock_kb_retrieve(bedrock_kb_id, search_query, no_kb_results)
search_list

### Import functions created in other Jupyter notebooks

In [None]:
from util.getasinlist import get_asin_list
from util.gallery import gallery
from util.imagehelpers import *
from util.pickimg import pick_img
from util.refinequery import refine_query
from util.getinfo import get_info_from_db

## 4.2 <a id="4.2">Augment query context using web search</a>

This section creates a function that searches the web when a user query relates to a recent event or to popular culture that the LLM has not been trained on. The added context from the web search results is then used to generate the query for the vector database (the Amazon Bedrock knowledge base).
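The data flow described above can be sketched as three chained steps. This is only an illustration: `augmented_search` and the three stand-in callables are hypothetical names, used here so the flow can be run without network or AWS access.

```python
def augmented_search(user_query, search_web, extract_terms, retrieve):
    """Chain web search -> LLM key-term extraction -> vector retrieval."""
    # Step 1: fetch fresh context that the LLM may not have seen in training.
    web_results = search_web(user_query)
    # Step 2: let the LLM condense the query and web context into key terms.
    key_terms = extract_terms(user_query, web_results)
    # Step 3: use the key terms as the query for the knowledge base.
    return key_terms, retrieve(key_terms)


# Stand-in callables (assumptions for illustration only).
def fake_search(query):
    return "snippet: festival-goers favour neon crop tops and mesh shorts"

def fake_extract(query, results):
    return "neon crop top, mesh shorts"

def fake_retrieve(terms):
    return [f"product matching '{terms}'"]


terms, hits = augmented_search(
    "what should I wear to a festival", fake_search, fake_extract, fake_retrieve
)
print(terms)
print(hits)
```

Passing the steps in as callables keeps each one replaceable, for example swapping the search provider without touching the retrieval step.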

<div class="alert alert-block alert-info">

Note: run the next cell twice and/or restart the kernel to fix any <span style="color:red">ERRORS</span>

</div>

In [None]:
!pip install --quiet langchain langchain-community duckduckgo-search

In [None]:
%%writefile util/websearch.py

import boto3
import json

from langchain_community.tools import DuckDuckGoSearchResults
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper


model_id = "anthropic.claude-3-haiku-20240307-v1:0"

def web_search(web_search_query, log_level='ERROR'):
    
    wrapper = DuckDuckGoSearchAPIWrapper(max_results=1)
    ddg_search = DuckDuckGoSearchResults(api_wrapper=wrapper)
    results = ddg_search.run(f'{web_search_query}')
    
    accept = "application/json"
    content_type = "application/json"
    
    system_prompt = """Your task is to help the user find the most trendy fashion apparel that they would like and convert it into a pgvector text query.
Please use the USER SEARCH and WEB RESULTS to output only apparel-related key terms. Do not explain or output anything else. 
Enhance the key terms by checking that they align with information inferred from user input about the age group, gender, color, material etc.
Do not assume or hallucinate. """
    body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "temperature": 0,
    "system": system_prompt,
    "messages": [
      {
        "role": "user",
        "content": [
            {
            "type": "text",
            "text": f"USER SEARCH: {web_search_query}"
            },
            {
            "type": "text",
            "text": f"\nWEB RESULTS: {str(results)}"
            }
        ]
      }
    ]
    })
    
    bedrock = boto3.client(service_name='bedrock-runtime')
    response = bedrock.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())['content'][0]['text']

    return response_body

In [None]:
from util.websearch import web_search

web_search_query=web_search('what should I wear to Ultra 2024?')
print(web_search_query)
search_asin = get_asin_list(bedrock_kb_retrieve(bedrock_kb_id, web_search_query, no_kb_results))[0]
asin_result = get_info_from_db(search_asin, database_arn, database_secret_arn, database_name)
asin_result['image'] = pick_img(asin_result['title'], asin_result['image'])
print(asin_result)
gallery([asin_result['image']])

In [None]:
web_search_query=web_search('what should I wear to a Harry Styles concert?')
print(web_search_query)
search_asin = get_asin_list(bedrock_kb_retrieve(bedrock_kb_id, web_search_query, no_kb_results))[0]
asin_result = get_info_from_db(search_asin, database_arn, database_secret_arn, database_name)
asin_result['image'] = pick_img(asin_result['title'], asin_result['image'])
print(asin_result)
gallery([asin_result['image']])

## 4.3 <a id="4.3">Use LLM to generate prompts for creating user-authored content</a>

The following function uses the RDS Data API to retrieve a product's description, which is then passed to an LLM to create a custom message about the product. The message can serve as a starting draft when users write reviews or share products with their friends.
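The RDS Data API returns each row as a list of typed field dictionaries, such as `{"stringValue": "..."}` or `{"isNull": True}`. The sketch below shows one way to read the first column of the first row defensively, so that a missing product or a NULL description does not raise an error. `first_string_value` is a hypothetical helper and the sample responses are made up for illustration.

```python
def first_string_value(response, default=None):
    """Return the first column of the first row as a string, or `default`."""
    records = response.get("records", [])
    if not records or not records[0]:
        return default          # the query matched no rows
    field = records[0][0]       # e.g. {"stringValue": "..."} or {"isNull": True}
    if field.get("isNull"):
        return default          # the column value is NULL
    return field.get("stringValue", default)


# Made-up responses in the shape returned by `execute_statement`.
sample = {"records": [[{"stringValue": "Soft cotton shirt with buttons"}]]}
print(first_string_value(sample))
print(repr(first_string_value({"records": []}, default="")))
```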

In [None]:
%%writefile util/sendmsg.py

import boto3
import json
from util.imagehelpers import * 

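def get_desc_from_db(asin, database_arn, database_secret_arn, database_name):

    # Use a named parameter instead of string formatting to avoid SQL injection
    query = "SELECT description FROM products WHERE asin = :asin"

    rdsdata = boto3.client('rds-data')

    response = rdsdata.execute_statement(
        resourceArn=database_arn,
        secretArn=database_secret_arn,
        sql=query,
        parameters=[{'name': 'asin', 'value': {'stringValue': asin}}],
        database=database_name,
    )
    
    # The first column of the first row holds the description
    description = response['records'][0][0]['stringValue']
        
    return description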


model_id = "anthropic.claude-3-haiku-20240307-v1:0"
# model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

def send_message(product, database_arn, database_secret_arn, database_name, log_level='ERROR'):
    
    asin = product['asin'].strip('"')
    title = product['title']
    image = product['image']
    
    accept = "application/json"
    content_type = "application/json"
    
                             
    desc = str(get_desc_from_db(asin, database_arn, database_secret_arn, database_name))
    
    content = [
                    {
                        "type": "text",
                        "text": f"PRODUCT TITLE: {title}" 
                    },
                    {
                        "type": "text",
                        "text": f"PRODUCT DESCRIPTION: {desc}" 
                    }
                ]
                             
    img_list = filter_image_url(image)
    if img_list:
        for img in img_list:
            image_data = get_base64_from_bytes(url_image_processing(img))
            content.append({
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/jpeg",
                                "data": image_data,
                            }
            })
        
    messages = [{
                "role": "user",
                "content": content
    }]
    
    system_prompt = """Your task is to help the user generate a short message to tell others about the product using the text and image inputs. 
    Generate only a short summarized message from first-person perspective."""
    body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 100,
    "system": system_prompt,
    "messages": messages
    })
    
    bedrock = boto3.client(service_name='bedrock-runtime')
    response = bedrock.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())['content'][0]['text']

    return response_body

In [None]:
from util.sendmsg import send_message

message = send_message(asin_result, database_arn, database_secret_arn, database_name)
print(message)

## 4.4 <a id="4.4">Product review analysis</a>

To provide end users with the best retail experience, it is important to understand their feedback at scale. This section shows how to perform NLP (Natural Language Processing) tasks on user-written reviews with LLMs, without fine-tuning or training conventional NLP classification or topic modelling models.

LLMs offer two added benefits: having been trained on large datasets, they already interpret sentiment well, and they can adapt to a variety of preferred output formats and themes. Because LLM output is open-ended, we also need to steer the model toward our preferred output format so that the results are predictable and consistent enough for further parsing and processing (e.g. creating charts).
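Even with a steering prompt, the model may occasionally wrap its answer in extra text or return something that is not valid JSON. A minimal sketch of a tolerant parser is shown below; `parse_json_array` is a hypothetical helper, not part of this workshop's `util` package.

```python
import json

def parse_json_array(model_output):
    """Parse the text between the first '[' and the last ']' as a JSON array.

    Returns an empty list when no array is found or the text is not valid JSON.
    """
    start = model_output.find("[")
    end = model_output.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        parsed = json.loads(model_output[start:end + 1])
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []


print(parse_json_array('Here are the topics: ["fit", "color"]'))
print(parse_json_array('[material, quality]'))  # unquoted items are not valid JSON
```

Returning an empty list on failure keeps downstream aggregation simple, at the cost of hiding malformed outputs. Logging the raw text for those cases may be worthwhile.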
    
This section walks you through the analysis of reviews from the Amazon Reviews dataset available [here](https://amazon-reviews-2023.github.io/). The exact dataset used is the 2018 5-core fashion review subset, chosen because it is small enough for this example.

### Download fashion reviews dataset (5-core) and perform basic processing

In [None]:
!curl -O https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/categoryFilesSmall/AMAZON_FASHION_5.json.gz

In [None]:
dataframe = pd.read_json('AMAZON_FASHION_5.json.gz',lines=True)

In [None]:
reviews = dataframe[['asin','overall','reviewText']].dropna().drop_duplicates().reset_index(drop=True)
reviews

### 4.4.0 Extract topics from reviews (zero-shot)
Create a function that performs open-ended topic extraction on reviews. It uses a minimal test prompt with no examples to gauge how well the base model adapts to a new output format and how consistent its output is.

In [None]:
import boto3
import json

# model_id = "anthropic.claude-3-haiku-20240307-v1:0"
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

def topictest(review, log_level='ERROR'):
    
    accept = "application/json"
    content_type = "application/json"
    
    system_prompt = f"""Your role is to extract topics from the review as a JSON array. Do not output anything else."""
    body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "temperature": 0,
    "system": system_prompt,
    "messages": [
      {
        "role": "user",
        "content": [
            {
            "type": "text",
            "text": f"<review>\n {review}\n</review>"
            }
        ]
      }
    ]
    })
    
    bedrock = boto3.client(service_name='bedrock-runtime')
    response = bedrock.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())['content'][0]['text']

    return response_body

In [None]:
pd.set_option('display.max_colwidth', None)

# Take first 5 reviews only
reviews = reviews[:5]

review_analysis = []

for index, row in reviews.iterrows():
    print('Analyzing review', index)
    analysis = topictest(row['reviewText'])
    review_analysis.append(analysis)
    
analyzed_reviews = reviews.assign(review_tags=review_analysis)
analyzed_reviews

### 4.4.1 Extract topics from reviews (one-shot)
Create a function to perform open-ended topic extraction from reviews for further classification and clustering purposes. In the prompt, we provide a single example for the LLM to use.

In [None]:
%%writefile util/reviews_topicextract.py

import boto3
import json

# model_id = "anthropic.claude-3-haiku-20240307-v1:0"
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

def topicextract(review, log_level='ERROR'):
    
    accept = "application/json"
    content_type = "application/json"
    
    system_prompt = f"""Your only task is to extract topics from product reviews as JSON Arrays. Do not output anything else.
Example:
user: <review>\n Best exercise shorts ever! The material breathes perfectly on hot summer days. It's also highly durable and its brightness does not fade after washes.\n</review>
assistant: ["material", "quality", "color"]"""
    body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "temperature": 0,
    "system": system_prompt,
    "messages": [
      {
        "role": "user",
        "content": [
            {
            "type": "text",
            "text": f"<review>\n {review}\n</review>"
            }
        ]
      }
    ]
    })
    
    bedrock = boto3.client(service_name='bedrock-runtime')
    response = bedrock.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())['content'][0]['text']

    return response_body

In [None]:
from util.reviews_topicextract import topicextract

pd.set_option('display.max_colwidth', None)

# Take first 5 reviews only
reviews = reviews[:5]

review_analysis = []

for index, row in reviews.iterrows():
    print('Analyzing review', index)
    analysis = topicextract(row['reviewText'])
    review_analysis.append(analysis)
    
analyzed_reviews = reviews.assign(review_tags=review_analysis)
analyzed_reviews

### 4.4.2 Extract topic and the corresponding sentiment from reviews (few-shot)
Create a function to perform open-ended topic extraction and target-sentiment analysis on reviews for further classification and clustering. This also helps merchants efficiently analyze feedback for product improvement. In the prompt, we provide three examples for the LLM to use.
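Once each review has been turned into a JSON array of target and stance pairs, the results can be tallied for charts or clustering. The following is a minimal sketch under the assumption that each model output is a string such as `[{"quality": "positive"}]`. `count_target_stances` and the sample outputs are hypothetical.

```python
import json
from collections import Counter

def count_target_stances(raw_outputs):
    """Count (target, stance) pairs across JSON-array strings.

    Outputs that are not valid JSON arrays are skipped.
    """
    counts = Counter()
    for raw in raw_outputs:
        try:
            items = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue
        if not isinstance(items, list):
            continue
        for item in items:
            if isinstance(item, dict):
                for target, stance in item.items():
                    # Lower-case both parts so "Quality" and "quality" merge.
                    counts[(str(target).lower(), str(stance).lower())] += 1
    return counts


# Made-up model outputs for illustration.
sample_outputs = [
    '[{"quality": "positive"}, {"color": "negative"}]',
    '[{"Quality": "positive"}]',
    'not json',
]
print(count_target_stances(sample_outputs))
```

The resulting counter can be converted to a pandas DataFrame for plotting if needed.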

In [None]:
%%writefile util/reviews_topicsentiment.py

import boto3
import json

# model_id = "anthropic.claude-3-haiku-20240307-v1:0"
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

def topicsentiment(review, log_level='ERROR'):
    
    accept = "application/json"
    content_type = "application/json"
    
    system_prompt = f"""Identify the targets in the product REVIEW, and the writer's stance towards the target from options: positive / negative / neutral. 
Extract this in the form of a JSON Array with format [{{"target": "stance"}}, {{"target": "stance"}}]. Output only the JSON Array and do not return anything else.
user: <review>\n I thought that this was not bad.\n</review>
assistant: [{{"quality": "neutral"}}]

user: <review>\n Not the best.\n</review>
assistant: [{{"quality": "negative"}}]

user: <review>\n I hate how the packaging is so difficult to remove and the colors are so ugly. But aside from that, the quality is excellent.\n</review>
assistant: [{{"packaging": "negative"}}, {{"color": "negative"}}, {{"quality": "positive"}}]"""
    body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "temperature": 0,
    "system": system_prompt,
    "messages": [
      {
        "role": "user",
        "content": [
            {
            "type": "text",
            "text": f"<review>\n {review}\n</review>"
            }
        ]
      }
    ]
    })
    
    bedrock = boto3.client(service_name='bedrock-runtime')
    response = bedrock.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())['content'][0]['text']

    return response_body

In [None]:
from util.reviews_topicsentiment import topicsentiment

pd.set_option('display.max_colwidth', None)

# Take first 5 reviews only
reviews = reviews[:5]

review_analysis = []

for index, row in reviews.iterrows():
    print('Analyzing review', index)
    analysis = topicsentiment(row['reviewText'])
    review_analysis.append(analysis)
    
analyzed_reviews = reviews.assign(review_tags=review_analysis)
analyzed_reviews

In [None]:
pd.reset_option('display.max_colwidth')