<a href="https://colab.research.google.com/github/Zeaxanthin80/CAI2300C/blob/main/Assignments/Assignment%202.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" width="200"/></a>

# Assignment 2

## Building a Semantic Search Engine for an E-Commerce Site with OpenAI




---


## Step 1: Understanding Semantic Search

**Use Case:**

Imagine you launch an e-commerce store specializing in electronics. A customer searches for "4K TV." A traditional keyword-based search returns only TVs whose listings contain those exact words. Semantic search, by contrast, interprets the intent behind the query. It can surface TVs described as "Ultra HD," "UHD," or "3840 x 2160," even though none of those terms appear in the query. It can also favor products whose reviews mention "stunning visuals" or an "immersive experience," even if those phrases aren't in the product description. The result is more relevant search results, a better customer experience, and a higher chance of a purchase.
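Under the hood, "closeness in meaning" is measured geometrically: each text is turned into a vector (an embedding), and results are ranked by cosine similarity to the query's vector. The sketch below uses tiny, hand-made 3-dimensional vectors (hypothetical values, not real embeddings; `text-embedding-3-small` produces 1,536 dimensions by default) purely to show the ranking mechanics.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings"; the three axes loosely stand for (picture, audio, accessories).
query = [0.9, 0.1, 0.0]  # e.g. the query "4K TV"
products = {
    "Ultra HD 65-inch television": [0.8, 0.2, 0.1],
    "Soundbar with subwoofer":     [0.1, 0.9, 0.2],
    "Budget HDMI cable":           [0.1, 0.0, 0.9],
}

# Rank products from most to least similar to the query.
ranked = sorted(products, key=lambda name: cosine_similarity(query, products[name]), reverse=True)
for name in ranked:
    sim = cosine_similarity(query, products[name])
    # scipy.spatial.distance.cosine returns 1 - similarity, so a lower distance means a better match.
    print(f"{name}: similarity={sim:.3f}, distance={1 - sim:.3f}")
```

The TV ranks first even though its description never contains "4K", because its vector points in nearly the same direction as the query's.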





---


## Step 2: Setting Up the Environment












In [3]:
from openai import OpenAI  # Import the OpenAI library to interact with OpenAI's API
from scipy.spatial import distance  # Import distance from scipy.spatial for computing vector distances
import numpy as np  # Import NumPy for numerical operations

from google.colab import userdata  # Colab's secrets manager
openai_api_key = userdata.get('OPENAI_KEY')  # Read the key stored under the name OPENAI_KEY in Colab secrets

# Initialize the OpenAI client with the API key retrieved above
client = OpenAI(api_key=openai_api_key)

# Function to generate embeddings for a list of input texts
def create_embeddings(texts, model="text-embedding-3-small"):
    """
    This function takes a list of texts and generates embeddings using the specified OpenAI model.

    Parameters:
    texts (list of str): List of input texts to be embedded.
    model (str): The name of the embedding model to use (default is "text-embedding-3-small").

    Returns:
    list of lists: A list containing embedding vectors for each input text.
    """
    # The embeddings endpoint accepts a list of strings, so all texts are sent in a single request
    response = client.embeddings.create(input=texts, model=model)
    # Each item in response.data carries an `index` matching its input position; sort by it to keep the order
    return [item.embedding for item in sorted(response.data, key=lambda item: item.index)]




---


## Step 3: Data Preparation



In [4]:
# A list of customer reviews for 4K TVs.

tv_reviews_4K = [
 'I would definitely recommend this TV to others.',
 'The picture is crisp and clear.',
 'The local dimming is a standout feature.  It really enhances the viewing experience.',
 'The local dimming feature is a great addition. It makes the picture more immersive.',
 'The picture is crisp and clear.',
 'Very happy with this purchase.  Easy to set up and use. The smart features are great.',
 'Great price for 75” and picture quality is superb!',
 'I love the design of this TV. It looks great in my living room.',
 'The viewing angles are very good.',
 'The local dimming is a must-have for any serious home theater enthusiast.',
 'This TV is a great upgrade from my old one.',
 'This TV is a great addition to my home entertainment setup.',
 'The remote control is intuitive and easy to use.',
 'The picture is crisp and clear.',
 'I would definitely recommend this TV to others.',
 'The built-in apps work perfectly.',
 'I love the design of this TV.  It looks great in my living room.',
 'The local dimming is fantastic! Blacks are truly black, and the contrast is amazing.',
 "I'm impressed with the overall performance of this TV.",
 'This TV is perfect for watching movies and TV shows.',
 'The build in sound gets very loud “it’s a good thing”, remote with backlit is a plus when  se for visibility, it’s very responsive google tv.',
 'The connectivity options are excellent.',
 'This TV is perfect for watching movies and TV shows.',
 'I would definitely recommend this TV to others.',
 'The viewing angles are very good.',
 'The smart features are very responsive and easy to navigate.',
 "The blacks are so deep thanks to the excellent local dimming.  I'm very happy with this purchase.",
 'I would definitely recommend this TV to others.',
 'The local dimming works flawlessly, enhancing the contrast and depth of the image.',
 'Local dimming performance is top-notch.  Worth every penny.',
 'The setup process was quick and painless.',
 'I love the design of this TV.  It looks great in my living room.',
 'The built-in apps work perfectly.',
 'The sound quality is better than I expected.',
 'I would definitely recommend this TV to others.',
 'This TV is a great addition to my home entertainment setup.',
 "If you're looking for a large-screen TV with excellent picture quality and modern gaming features at a reasonable price, the TCL Q7 is a solid choice.",
 'The local dimming is subtle but effective, making a significant difference in picture quality.',
 'I would definitely recommend this TV to others.',
 'The colors are vibrant and lifelike.',
 'I love the design of this TV.  It looks great in my living room.',
 'I would definitely recommend this TV to others.',
 'The picture is crisp and clear.',
 'The built-in apps work perfectly.',
 'The setup process was quick and painless.',
 'The TV is lightweight and easy to move.',
 'Very happy with this purchase.  Easy to set up and use.  The smart features are great.',
 'The local dimming is a must-have for any serious home theater enthusiast.',
 'The connectivity options are excellent.',
 'Very happy with this purchase.  Easy to set up and use.  The smart features are great.',
 'This TV is a great addition to my home entertainment setup.',
 'I love the design of this TV.  It looks great in my living room.',
 'The local dimming works incredibly well.  No more washed-out blacks. Highly recommend!',
 'The TV is lightweight and easy to move.',
 "I love how the local dimming improves the overall picture quality. It's a noticeable upgrade.",
 'Great TV!  The picture quality is excellent, and the sound is surprisingly good.',
 'I would definitely recommend this TV to others.',
 'The local dimming is a must-have for any serious home theater enthusiast.',
 'Great TV!  The picture quality is excellent, and the sound is surprisingly good.',
 'This TV is a great upgrade from my old one.',
 'I would definitely recommend this TV to others.',
 'The smart features are very responsive and easy to navigate.',
 'The built-in apps work perfectly.',
 'The viewing angles are very good.',
 'The TV is lightweight and easy to move.',
 'This TV is perfect for watching movies and TV shows.',
 'The local dimming is very effective, providing excellent contrast and shadow detail.',
 'I would definitely recommend this TV to others.',
 "I'm impressed with the overall performance of this TV.",
 'Great TV!  The picture quality is excellent, and the sound is surprisingly good.',
 'This TV is perfect for watching movies and TV shows.',
 'The connectivity options are excellent.',
 'The local dimming feature is a great addition. It makes the picture more immersive.',
 'The local dimming is very effective, providing excellent contrast and shadow detail.',
 "I'm impressed with the overall performance of this TV.",
 "I'm very satisfied with this purchase.",
 'The built-in apps work perfectly.',
 'The connectivity options are excellent.',
 'Very happy with this purchase.  Easy to set up and use.  The smart features are great.',
 'The smart features are very responsive and easy to navigate.',
 "I'm very satisfied with this purchase.",
 'The local dimming is a standout feature, creating a truly immersive viewing experience.',
 'The setup process was quick and painless.',
 'This TV is a great upgrade from my old one.',
 "I'm very satisfied with this purchase.",
 'I would definitely recommend this TV to others.',
 "The TCL Q750 75-inch TV offers a compelling blend of performance and value that's hard to beat.",
 'The local dimming is a huge improvement over my previous TV. The picture is so much better.',
 "I'm very satisfied with this purchase.",
 'The local dimming works incredibly well.  No more washed-out blacks. Highly recommend!',
 'The connectivity options are excellent.',
 'The sound quality is better than I expected.',
 'The connectivity options are excellent.',
 'The local dimming works flawlessly, enhancing the contrast and depth of the image.',
 'The viewing angles are very good.',
 "I'm impressed with the overall performance of this TV.",
 "I love how the local dimming improves the overall picture quality. It's a noticeable upgrade.",
 "Whether I'm watching a movie, playing games, or streaming sports, the picture quality is consistently excellent.",
 "I'm impressed with the overall performance of this TV.",
 'The picture is crisp and clear.']
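The list above contains many verbatim duplicates (for example, "I would definitely recommend this TV to others." appears several times), and every duplicate is embedded again, spending tokens on vectors that were already computed. One option, sketched below under the assumption that identical strings should receive identical vectors, is to embed each unique review once and map the vectors back to the original positions. `embed_unique` and `fake_embed` are hypothetical helpers; `fake_embed` stands in for `create_embeddings` so the sketch runs without an API key.

```python
def embed_unique(texts, embed_fn):
    """Embed each distinct text once and return one vector per input text, in the original order."""
    unique_texts = list(dict.fromkeys(texts))  # de-duplicate while preserving first-seen order
    unique_vectors = embed_fn(unique_texts)     # embed only the distinct strings
    lookup = dict(zip(unique_texts, unique_vectors))
    return [lookup[text] for text in texts]

# Hypothetical stand-in for create_embeddings(): records how many texts it was asked to embed.
calls = []
def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t)), float(t.count(" "))] for t in batch]

sample = [
    "The picture is crisp and clear.",
    "The viewing angles are very good.",
    "The picture is crisp and clear.",
]
vectors = embed_unique(sample, fake_embed)
print(len(vectors), calls)  # 3 vectors returned, but only 2 texts were embedded
```

With the real data, the call would look like `embeddings = embed_unique(tv_reviews_4K, create_embeddings)`.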

In [5]:
# Generate an embedding for every customer review by calling create_embeddings()
embeddings = create_embeddings(tv_reviews_4K, model="text-embedding-3-small")

# Pair each review with its embedding. zip() walks both lists in parallel, so on each
# iteration `review` holds one customer review and `embedding` holds its vector.
# The resulting `reviews` list stores every review alongside its numerical representation.
reviews = []
for review, embedding in zip(tv_reviews_4K, embeddings):
    reviews.append({"review": review, "embedding": embedding})
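Storing each embedding in a dictionary works well for a loop over roughly a hundred reviews, but the same distance computation can also be done in one vectorized step by stacking the embeddings into a NumPy matrix. A minimal sketch, with tiny made-up 2-D vectors standing in for the real `review["embedding"]` values; `cosine_distances` is a hypothetical helper, not part of SciPy. OpenAI's documentation states that its embeddings are normalized to unit length, in which case cosine similarity reduces to a plain dot product; the sketch normalizes anyway so it works for any vectors.

```python
import numpy as np

def cosine_distances(query_vec, matrix):
    """Return 1 - cosine similarity between query_vec and every row of matrix."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(matrix, dtype=float)
    # Scale the query and each row to unit length; one matrix-vector product then yields all similarities.
    q_unit = q / np.linalg.norm(q)
    m_unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return 1.0 - m_unit @ q_unit

# Made-up 2-D vectors in place of real 1,536-dimensional embeddings.
toy_reviews = [
    {"review": "Local dimming is great.",      "embedding": [1.0, 0.0]},
    {"review": "Easy to set up.",              "embedding": [0.0, 1.0]},
    {"review": "Deep blacks, great contrast.", "embedding": [0.8, 0.6]},
]
embedding_matrix = np.array([r["embedding"] for r in toy_reviews])  # shape (n_reviews, dim)
dists = cosine_distances([1.0, 0.1], embedding_matrix)
print(toy_reviews[int(np.argmin(dists))]["review"])
```

With the real data, the matrix would be built once from `reviews` and reused for every query.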



---


## Step 4: Implementing Semantic Search

In [6]:
# The user's search query
search_text = "local dimming"

# Generate the embedding for the query
# This line calls the create_embeddings function to generate an embedding for the search_text.
# The result (the embedding) is stored in the search_embedding variable.
search_embedding = create_embeddings([search_text])[0]

# Calculate cosine distances between the query and reviews
distances = []
# This loop iterates through each customer review in the reviews list.
# For each review, it calculates the cosine distance between the search_embedding and the review["embedding"].
for review in reviews:
    dist = distance.cosine(search_embedding, review["embedding"])
    distances.append(dist)

# Find the closest review
# np.argmin(distances) finds the index of the minimum value in the distances list.
# This index corresponds to the review that is most similar to the search query.
min_dist_ind = np.argmin(distances)
closest_review = reviews[min_dist_ind]

print(f"Search Query: {search_text}")
print(f"Closest Review: {closest_review['review']}")

Search Query: local dimming
Closest Review: Local dimming performance is top-notch.  Worth every penny.
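Returning only the single closest review hides other strong matches; many reviews in this dataset mention local dimming. A sketch of top-k retrieval with `np.argsort`, where `top_k_indices` is a hypothetical helper and the short lists of texts and distances are made-up stand-ins for the real `reviews` and `distances` computed above:

```python
import numpy as np

def top_k_indices(distances, k=3):
    """Indices of the k smallest distances, closest first."""
    distances = np.asarray(distances, dtype=float)
    k = min(k, len(distances))
    return np.argsort(distances)[:k].tolist()

# Hypothetical distances and texts standing in for the real `distances` and `reviews` lists.
texts = [
    "The viewing angles are very good.",
    "Local dimming performance is top-notch.",
    "The local dimming works flawlessly.",
    "The built-in apps work perfectly.",
]
dists = [0.71, 0.32, 0.35, 0.80]
for rank, i in enumerate(top_k_indices(dists, k=2), start=1):
    print(f"{rank}. ({dists[i]:.2f}) {texts[i]}")
```

Against the notebook's data this would be `[reviews[i]["review"] for i in top_k_indices(distances, k=5)]`. Because the review list contains duplicates, the same text may appear more than once in the top k unless duplicates are removed first.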




---


## Step 5: Visualizing with Gradio

In [7]:
# Gradio is a library for building simple web user interfaces around Python functions.
!pip install --upgrade gradio -qqq
import gradio as gr

# Define the search function: the core of the semantic search app.
# find_similar_reviews takes the user's search text as input and processes it the same way
# as the steps above: embed the query, compute cosine distances, and return the closest review.
def find_similar_reviews(query):
    search_embedding = create_embeddings([query])[0]
    distances = [distance.cosine(search_embedding, r["embedding"]) for r in reviews]
    min_dist_ind = np.argmin(distances)
    closest_review = reviews[min_dist_ind]
    return f"Query: {query}\n\nMost Similar Review: {closest_review['review']}"

# Create the Gradio interface
interface = gr.Interface(
    # Call find_similar_reviews whenever the user submits a query
    fn=find_similar_reviews,
    inputs="text",
    outputs="text",
    title="Semantic Search for Customer Reviews",
    description="Enter a customer query to find similar reviews in the database."
)

# Launch the app
interface.launch()

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.2/62.2 MB[0m [31m10.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m321.9/321.9 kB[0m [31m20.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m94.8/94.8 kB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.5/12.5 MB[0m [31m87.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m71.5/71.5 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.3/62.3 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL
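Because `find_similar_reviews` always returns the nearest review, even an unrelated query (say, "refrigerator") gets an answer. One option is a maximum-distance cutoff. The sketch below assumes a hypothetical `find_similar_reviews_with_cutoff` function with an injected `embed_fn` parameter so it runs without the API; the `0.6` threshold is an arbitrary placeholder that would need tuning on real queries.

```python
import numpy as np

def find_similar_reviews_with_cutoff(query, reviews, embed_fn, max_distance=0.6):
    """Return the closest review, or a 'no match' message if it is farther than max_distance."""
    q = np.asarray(embed_fn([query])[0], dtype=float)
    m = np.array([r["embedding"] for r in reviews], dtype=float)
    # Cosine distance from the query to every review, computed in one step.
    dists = 1.0 - (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    best = int(np.argmin(dists))
    if dists[best] > max_distance:
        return f"Query: {query}\n\nNo sufficiently similar review found."
    return f"Query: {query}\n\nMost Similar Review: {reviews[best]['review']}"

# Hypothetical toy data and embedder so the sketch runs offline.
toy_reviews = [
    {"review": "The local dimming is fantastic!", "embedding": [1.0, 0.0]},
    {"review": "The remote is easy to use.",      "embedding": [0.0, 1.0]},
]
toy_vectors = {"local dimming": [0.9, 0.1], "refrigerator": [-1.0, -1.0]}

def fake_embed(texts):
    return [toy_vectors[t] for t in texts]

print(find_similar_reviews_with_cutoff("local dimming", toy_reviews, fake_embed))
print(find_similar_reviews_with_cutoff("refrigerator", toy_reviews, fake_embed))
```

In the Gradio app, this could be wired up with `fn=lambda q: find_similar_reviews_with_cutoff(q, reviews, create_embeddings)`.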

