Dataset link: https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset

In [1]:
import pandas as pd
from faker import Faker

# Load the CSV file
file_path = 'amazon.csv'
amazon_data = pd.read_csv(file_path)

# Extract data for Product table
product_data = amazon_data[['product_id', 'product_name', 'about_product', 'rating', 'discounted_price']].copy()
product_data.rename(columns={
    'product_id': 'ProductID',
    'product_name': 'ProductName',
    'about_product': 'AboutProduct',
    'rating': 'ProductRating',
    'discounted_price': 'Price'
}, inplace=True)

# Extract data for Review table
review_data = amazon_data[['review_id', 'user_id', 'product_id', 'review_content', 'rating']].copy()
review_data.rename(columns={
    'review_id': 'ReviewID',
    'user_id': 'UserID',
    'product_id': 'ProductID',
    'review_content': 'ReviewText',
    'rating': 'Rating'
}, inplace=True)

# Extract and generate data for User table
user_data = amazon_data[['user_id', 'user_name']].copy()
user_data = user_data.drop_duplicates().copy()
user_data.rename(columns={
    'user_id': 'UserID',
    'user_name': 'UserName'
}, inplace=True)

# Generate fake email addresses for users
fake = Faker()
user_data['UserEmail'] = [fake.email() for _ in range(len(user_data))]

# Find users with at least 3 reviews and products with at least 3 reviews.
# Note: in this dataset a row holds one product, and `user_id` / `review_id` contain
# comma-separated lists, so the counts below are per distinct ID string.
# A separate name is used so the renamed `review_data` above is not overwritten.
review_keys = amazon_data[['user_id', 'product_id', 'review_id']]

# Count rows per user_id value and keep those that occur at least 3 times
user_review_counts = review_keys['user_id'].value_counts()
users_with_3_or_more_reviews = user_review_counts[user_review_counts >= 3]

# Filter review data for users with 3 or more reviews
filtered_reviews = review_keys[review_keys['user_id'].isin(users_with_3_or_more_reviews.index)]

# Count rows per product within the filtered data
product_review_counts = filtered_reviews['product_id'].value_counts()
products_with_3_or_more_reviews = product_review_counts[product_review_counts >= 3]

# Filter review data for products with 3 or more reviews
final_filtered_reviews = filtered_reviews[filtered_reviews['product_id'].isin(products_with_3_or_more_reviews.index)]

# Recount reviews per user after the product filter
final_user_review_counts = final_filtered_reviews['user_id'].value_counts()
final_users_with_3_or_more_reviews = final_user_review_counts[final_user_review_counts >= 3]

# Select the top 10 users
top_10_users = final_users_with_3_or_more_reviews.head(10)

# Get the final set of data
final_data = final_filtered_reviews[final_filtered_reviews['user_id'].isin(top_10_users.index)]

final_data.head()


Unnamed: 0,user_id,product_id,review_id
0,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",B07JW9H4J1,"R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K..."
1,"AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",B098NS6PVG,"RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY..."
20,"AFNYIBWKJLJQKY4BGK77ZOTVMORA,AFCTNNMP2LZLY5466...",B09C6HXFC1,"R12D1BZF9MU8TN,R32MNCWO5LGFCG,RZU3UK8OZKD6X,R3..."
23,"AHIKJUDTVJ4T6DV6IUGFYZ5LXMPA,AE55KTFVNXYFD5FPY...",B09NHVCHS9,"R3F4T5TRYPTMIG,R3DQIEC603E7AY,R1O4Z15FD40PV5,R..."
28,"AHZWJCVEIEI76H2VGMUSN5D735IQ,AH2DFUHFTG4CKQFVG...",B09W5XR9RT,"R1Y30KU04V3QF4,RK3DSUGKIZT8Z,R3BIG7J6V2JZTU,R1..."
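In the output above, each row is a product, and `user_id` and `review_id` hold comma-separated lists. `value_counts()` on `user_id` therefore counts identical list strings rather than individual users. A minimal sketch of splitting them into one row per review follows. It assumes the two lists in a row align position by position (an assumption about the dataset, so rows with mismatched lengths are dropped). `explode_reviews` is a hypothetical helper name, the sample rows are made up, and multi-column `explode` needs pandas 1.3 or later.

```python
import pandas as pd

def explode_reviews(df):
    """Return one row per (product, user, review) from comma-joined ID columns."""
    out = df[['product_id', 'user_id', 'review_id']].copy()
    out['user_id'] = out['user_id'].str.split(',')
    out['review_id'] = out['review_id'].str.split(',')
    # Assumption: lists align by position; drop rows where the lengths differ.
    aligned = out['user_id'].str.len() == out['review_id'].str.len()
    return out[aligned].explode(['user_id', 'review_id'], ignore_index=True)

# Made-up sample shaped like the columns above
sample = pd.DataFrame({
    'product_id': ['P1', 'P2'],
    'user_id': ['U1,U2', 'U1'],
    'review_id': ['R1,R2', 'R3'],
})
per_review = explode_reviews(sample)
reviews_per_user = per_review['user_id'].value_counts()
```

With the data in this shape, the "at least 3 reviews" thresholds would apply to individual users rather than to groups of users who happen to share a product row.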


In [2]:
# Prepare filtered data
final_product_data = amazon_data[amazon_data['product_id'].isin(final_data['product_id'].unique())][['product_id', 'product_name', 'about_product', 'rating', 'discounted_price']].copy()
final_product_data.rename(columns={
    'product_id': 'ProductID',
    'product_name': 'ProductName',
    'about_product': 'AboutProduct',
    'rating': 'ProductRating',
    'discounted_price': 'Price'
}, inplace=True)

final_review_data = amazon_data[amazon_data['review_id'].isin(final_data['review_id'].unique())][['review_id', 'user_id', 'product_id', 'review_content', 'rating']].copy()
final_review_data.rename(columns={
    'review_id': 'ReviewID',
    'user_id': 'UserID',
    'product_id': 'ProductID',
    'review_content': 'ReviewText',
    'rating': 'Rating'
}, inplace=True)

final_user_data = amazon_data[amazon_data['user_id'].isin(final_data['user_id'].unique())][['user_id', 'user_name']].drop_duplicates().copy()
final_user_data.rename(columns={
    'user_id': 'UserID',
    'user_name': 'UserName'
}, inplace=True)
final_user_data['UserEmail'] = [fake.email() for _ in range(len(final_user_data))]


In [3]:
final_product_data.head()

Unnamed: 0,ProductID,ProductName,AboutProduct,ProductRating,Price
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,High Compatibility : Compatible With iPhone 12...,4.2,₹399
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Compatible with all Type C enabled devices, be...",4.0,₹199
20,B09C6HXFC1,Duracell USB Lightning Apple Certified (Mfi) B...,Supports Ios Devices With Max Output Up To 2.4...,4.5,₹970
23,B09NHVCHS9,Flix Micro Usb Cable For Smartphone (Black),"Micro usb cable is 1 meter in length, optimize...",4.0,₹59
28,B09W5XR9RT,Duracell USB C To Lightning Apple Certified (M...,1.2M Tangle Free durable tough braiding sync &...,4.4,₹970
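The `Price` values above are strings with a rupee sign, so they cannot be sorted or compared numerically as they stand. A minimal sketch of converting them to floats, should numeric prices be wanted later. `parse_rupee_price` is a hypothetical helper, and the handling of thousands separators such as `₹1,099` is an assumption about how larger prices are formatted.

```python
def parse_rupee_price(text):
    """Convert a price string such as '₹399' to a float."""
    # Assumption: the only non-numeric characters are '₹', ',' and whitespace.
    cleaned = text.replace('₹', '').replace(',', '').strip()
    return float(cleaned)

# Values taken from the Price column shown above
prices = [parse_rupee_price(p) for p in ['₹399', '₹199', '₹970', '₹59']]
```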


In [4]:
final_user_data["UserID"].iloc[0]

'AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBBSNLYT3ONILA,AHCTC6ULH4XB6YHDY6PCH2R772LQ,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,AENGU523SXMOS7JPDTW52PNNVWGQ,AEQJHCVTNINBS4FKTBGQRQTGTE5Q,AFC3FFC5PKFF5PMA52S3VCHOZ5FQ'

In [5]:
from sqlalchemy import MetaData, Table, Column, String, Float, ForeignKey
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Connection details are read from environment variables (for example, a .env file)
engine = create_engine(URL(
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
    database=os.getenv("SNOWFLAKE_DATABASE"),
    schema=os.getenv("SNOWFLAKE_SCHEMA"),
    warehouse=os.getenv("SNOWFLAKE_WAREHOUSE"),
    role=os.getenv("SNOWFLAKE_ROLE"),
))

connection = engine.connect()

metadata = MetaData()

# Define the tables
product_table = Table('Product', metadata,
    Column('ProductID', String, primary_key=True),
    Column('ProductName', String),
    Column('AboutProduct', String),
    Column('ProductRating', Float),
    Column('Price', String)
)

user_table = Table('User', metadata,
    Column('UserID', String, primary_key=True),
    Column('UserName', String),
    Column('UserEmail', String)
)

review_table = Table('Review', metadata,
    Column('ReviewID', String, primary_key=True),
    Column('UserID', String, ForeignKey('User.UserID')),
    Column('ProductID', String, ForeignKey('Product.ProductID')),
    Column('ReviewText', String),
    Column('Rating', Float)
)

# Create all tables
metadata.create_all(engine)

# Function to insert data into the table
def insert_data(df, table):
    df.to_sql(table.name, con=engine, index=False, if_exists='append')

# Use the filtered DataFrames prepared in the cells above
product_data = final_product_data
user_data = final_user_data
review_data = final_review_data

# Insert data into the tables
insert_data(product_data, product_table)
insert_data(user_data, user_table)
insert_data(review_data, review_table)




In [1]:
from database import run_query
from llm_callers import OpenAICaller

def make_review(product_id):
    # Run query to get all reviews for this product
    reviews_df = run_query(f'''SELECT * FROM "Review" WHERE "ProductID"='{product_id}';''')
    
    # Combine all reviews into a single string
    combined_reviews = ' '.join(reviews_df['ReviewText'].tolist())

    # Initialize the LLM caller with the appropriate prompt
    gpt_caller = OpenAICaller("You are an assistant who helps with product review generation.")
    
    # Call the LLM with the combined reviews and a specific prompt
    generated_review = gpt_caller.call_llm(f"Here are the product reviews from several users: {combined_reviews} Based on this information, please generate a high-level summary review of the product in 1 to 2 sentences.")

    return generated_review

# Example usage
make_review(
    product_id='B07JW9H4J1'
)



'This product is generally well-received for its durability and effective charging performance, with many users satisfied with its value for money. However, some have noted that the charging speed is slower compared to original iPhone cables.'
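The query in `make_review` interpolates `product_id` directly into the SQL string, which breaks on IDs containing quotes and is open to injection. The signature of `run_query` is not shown, so the sketch below uses the standard-library `sqlite3` module as a stand-in for Snowflake to illustrate bound parameters. `reviews_for_product` is a hypothetical helper, and the table contents are made-up sample rows.

```python
import sqlite3

# In-memory stand-in for the "Review" table (an assumption; not Snowflake)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE "Review" ("ReviewID" TEXT, "ProductID" TEXT, "ReviewText" TEXT)')
conn.executemany(
    'INSERT INTO "Review" VALUES (?, ?, ?)',
    [('R1', 'B07JW9H4J1', 'Sturdy cable.'),
     ('R2', 'B07JW9H4J1', 'Charges a bit slowly.'),
     ('R3', 'B098NS6PVG', 'Works fine.')],
)

def reviews_for_product(conn, product_id):
    """Fetch review texts for one product using a bound parameter."""
    rows = conn.execute(
        'SELECT "ReviewText" FROM "Review" WHERE "ProductID" = ? ORDER BY "ReviewID"',
        (product_id,),
    ).fetchall()
    return [text for (text,) in rows]

texts = reviews_for_product(conn, 'B07JW9H4J1')
# A value crafted to alter an f-string query is treated as plain data here
injected = reviews_for_product(conn, "x' OR '1'='1")
```

The placeholder syntax differs between drivers, so the `?` marker here applies to `sqlite3` and may not carry over unchanged.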

In [2]:
from database import run_query
from llm_callers import OpenAICaller

def pointwise_recommender(user_id, product_id):
    # Get user's review history
    user_reviews = run_query(f'''SELECT "ReviewText" FROM "Review" WHERE "UserID"='{user_id}';''')
    user_reviews_text = ' '.join(user_reviews['ReviewText'].tolist())
    
    # Get the product information
    product_info = run_query(f'''SELECT * FROM "Product" WHERE "ProductID"='{product_id}';''').iloc[0]
    
    # Initialize the LLM caller
    gpt_caller = OpenAICaller("You are an assistant who helps with product recommendations based on user review history.")
    
    # Call the LLM
    response = gpt_caller.call_llm(f"Here is the user's review history: {user_reviews_text} Based on this history, will the user like the product '{product_info['ProductName']}'? Please provide a rating between 1 and 5.")
    return response

# Example usage
pointwise_recommender(
    user_id='AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBBSNLYT3ONILA,AHCTC6ULH4XB6YHDY6PCH2R772LQ,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,AENGU523SXMOS7JPDTW52PNNVWGQ,AEQJHCVTNINBS4FKTBGQRQTGTE5Q,AFC3FFC5PKFF5PMA52S3VCHOZ5FQ', 
    product_id='B07JW9H4J1'
)

"Based on the user's review history, it seems they value product durability, fast charging, and overall quality. They also mentioned that some products had slower charging speeds compared to the original iPhone cable. \n\nThe Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable claims to provide fast charging and is compatible with various iPhone models. Given the user's past positive remarks about charging speed and durability, it is likely that they would appreciate this product.\n\nHowever, since they have noted that the charging speed of some non-original cables was slower, this may cause hesitation. Still, considering the interest in fast charging and the positive feedback on durability, I would rate this product a **4 out of 5** for this user."
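The response above embeds the score in free text ("**4 out of 5**"), so using it as a pointwise prediction requires pulling the number out. A minimal sketch with a regular expression. `extract_rating` is a hypothetical helper, and the pattern only covers phrasings like "4 out of 5" or "4/5"; other wordings would return `None`.

```python
import re

# Matches "4 out of 5", "3.5/5", and similar; other phrasings are not covered.
RATING_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(?:out of|/)\s*5\b')

def extract_rating(response):
    """Return the rating as a float, or None if no in-range rating is found."""
    match = RATING_PATTERN.search(response)
    if match is None:
        return None
    rating = float(match.group(1))
    return rating if 1 <= rating <= 5 else None

rating = extract_rating("I would rate this product a **4 out of 5** for this user.")
```

Asking the model to reply with only the number would make parsing simpler, at the cost of losing the explanation.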

In [5]:
def pairwise_recommender(user_id, product_id1, product_id2):
    # Get user's review history
    user_reviews = run_query(f'''SELECT "ReviewText" FROM "Review" WHERE "UserID"='{user_id}';''')
    user_reviews_text = ' '.join(user_reviews['ReviewText'].tolist())
    
    # Get information for both products
    product_info1 = run_query(f'''SELECT * FROM "Product" WHERE "ProductID"='{product_id1}';''').iloc[0]
    product_info2 = run_query(f'''SELECT * FROM "Product" WHERE "ProductID"='{product_id2}';''').iloc[0]
    
    # Initialize the LLM caller
    gpt_caller = OpenAICaller("You are an assistant who helps with product recommendations based on user review history.")
    
    # Call the LLM
    response = gpt_caller.call_llm(f"Here is the user's review history: {user_reviews_text} Based on this history, which product would the user prefer: '{product_info1['ProductName']}' or '{product_info2['ProductName']}'? Please provide your preference and reasoning.")
    
    return response

# Example usage
pairwise_recommender(
    user_id='AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBBSNLYT3ONILA,AHCTC6ULH4XB6YHDY6PCH2R772LQ,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,AENGU523SXMOS7JPDTW52PNNVWGQ,AEQJHCVTNINBS4FKTBGQRQTGTE5Q,AFC3FFC5PKFF5PMA52S3VCHOZ5FQ', 
    product_id1='B09C6HXFC1', 
    product_id2='B09NHVCHS9')


'Based on the user\'s review history, it appears they have a strong preference for products that are durable, have good quality, and offer fast charging capabilities. They also seem to appreciate value for money and have expressed some concern about the charging speed of cables, particularly mentioning that some products are slower than original iPhone cables.\n\nThe **Duracell USB Lightning Apple Certified (Mfi) Braided Sync & Charge Cable** aligns well with the user\'s preferences because:\n\n1. **Compatibility**: It is specifically designed for iPhones, iPads, and iPods, which suggests that it would meet their charging needs effectively.\n2. **Fast Charging**: The user has highlighted their appreciation for fast-charging capabilities in previous reviews. Since the Duracell cable is marketed as a fast charging cable, this would likely satisfy their requirement for quicker charging.\n3. **Durability**: Being a braided cable usually implies enhanced durability, which the user has indic

In [7]:
def listwise_recommender(user_id, product_ids):
    # Get user's review history
    user_reviews = run_query(f'''SELECT "ReviewText" FROM "Review" WHERE "UserID"='{user_id}';''')
    user_reviews_text = ' '.join(user_reviews['ReviewText'].tolist())
    
    # Get information for all products
    product_info_list = [run_query(f'''SELECT * FROM "Product" WHERE "ProductID"='{product_id}';''').iloc[0] for product_id in product_ids]
    product_names = ', '.join([product_info['ProductName'] for product_info in product_info_list])
    
    # Initialize the LLM caller
    gpt_caller = OpenAICaller("You are an assistant who helps with product recommendations based on user review history.")
    
    # Call the LLM
    response = gpt_caller.call_llm(f"Here is the user's review history: {user_reviews_text} Based on this history, how would the user rank the following products: {product_names}? Please provide the ranking from most to least preferred.")

    return response

# Example usage
listwise_recommender(
    user_id='AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBBSNLYT3ONILA,AHCTC6ULH4XB6YHDY6PCH2R772LQ,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,AENGU523SXMOS7JPDTW52PNNVWGQ,AEQJHCVTNINBS4FKTBGQRQTGTE5Q,AFC3FFC5PKFF5PMA52S3VCHOZ5FQ', 
    product_ids=['B09W5XR9RT', 'B09NHVCHS9', 'B09C6HXFC1']
)


"Based on the user's review history, they seem to value durability, charging speed, and overall quality in their charging cables. Here’s how the user might rank the products:\n\n1. **Duracell USB C To Lightning Apple Certified (Mfi) Braided Sync & Charge Cable For iPhone, iPad And iPod. Fast Charging Lightning Cable, 3.9 Feet (1.2M) - Black:** The user appreciates fast charging capabilities and good quality, which this product offers being a certified option.\n\n2. **Duracell USB Lightning Apple Certified (Mfi) Braided Sync & Charge Cable For iPhone, iPad And iPod. Fast Charging Lightning Cable, 3.9 Feet (1.2M) - Black:** This product is similar to the first and would likely be rated highly due to its brand reliability, durability, and fast charging while matching the user's expectations for quality.\n\n3. **Flix Micro USB Cable For Smartphone (Black):** Given that the user primarily reviews products for iPhone and Apple devices, a Micro USB cable may not be as preferred. However, if t
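`listwise_recommender` joins product names with `', '`, but the names themselves contain commas and two of the candidates above have near-identical names, so the model may not see clear item boundaries. A minimal sketch of a numbered candidate list that carries the product ID on each line. `format_candidates` is a hypothetical helper, and the names are copied as displayed in the earlier table output, including its truncation.

```python
def format_candidates(products):
    """Format (product_id, product_name) pairs as one numbered line each."""
    lines = [
        f"{position}. [{product_id}] {name}"
        for position, (product_id, name) in enumerate(products, start=1)
    ]
    return '\n'.join(lines)

candidates = format_candidates([
    ('B09W5XR9RT', 'Duracell USB C To Lightning Apple Certified (M...'),
    ('B09NHVCHS9', 'Flix Micro Usb Cable For Smartphone (Black)'),
    ('B09C6HXFC1', 'Duracell USB Lightning Apple Certified (Mfi) B...'),
])
```

The resulting string could replace `product_names` in the prompt, and the bracketed IDs give a stable handle for mapping the returned ranking back to rows in the "Product" table.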

In [8]:
def summarize_product_features(product_id):
    # Get the product information
    product_info = run_query(f'''SELECT "AboutProduct" FROM "Product" WHERE "ProductID"='{product_id}';''').iloc[0]['AboutProduct']
    
    # Initialize the LLM caller
    gpt_caller = OpenAICaller("You are an assistant who helps summarize product features.")
    
    # Call the LLM
    response = gpt_caller.call_llm(f"Here is the product information: {product_info} Please summarize the main features of this product into a short, easy-to-read summary of 1-2 paragraphs.")
    
    return response

# Example usage
print(summarize_product_features('B09C6HXFC1'))

This product is a durable 1.2-meter sync and charge cable designed for iOS devices, offering a maximum output of 2.4A for efficient charging. It is rigorously tested for longevity, boasting over 10,000 bends and 10,000 plugging/unplugging cycles to ensure a long lifespan. The cable supports fast and stable data transmission speeds of up to 480 Mbps.

With a compatible design that works seamlessly with MFi and Apple devices, including iPhone, iMac, and iPad, this cable meets a variety of user needs. Additionally, it comes with a 2-year warranty, providing peace of mind for users seeking quality and reliability in their charging solutions. Its tangle-free and tough braiding further enhances its durability for everyday use.
