# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

For this Custom Chatbot project, I chose the 2023 Fashion Trends dataset to power a fashion-oriented conversational interface. The dataset collects short descriptions of trends observed in 2023, such as popular styles, colors, and fabrics, drawn from fashion articles. Because these trends may be too recent or too specific for a general-purpose language model to describe reliably, supplying them as context lets the chatbot answer questions about 2023 fashion with concrete information from the dataset, which suits both fashion enthusiasts and industry stakeholders.

A chatbot backed by a curated fashion trends dataset, such as the 2023 Fashion Trends dataset, could be useful to fashion companies like H&M or Zalando for several reasons:

* **Current Insights**: The chatbot answers from the trend data it is given, so its insights are as current as the dataset. Keeping the dataset updated helps companies follow rapidly changing consumer preferences and adapt their product offerings and marketing strategies accordingly.

* **Personalized Customer Experience**: With the chatbot's ability to deliver customized information on fashion trends, companies can offer a personalized experience to their customers. By understanding individual preferences and style choices, companies like H&M and Zalando can recommend products that align with each customer's unique taste, leading to higher customer satisfaction and loyalty.

* **Market Research and Product Development**: The chatbot can also serve as a valuable tool for conducting market research and gathering insights into consumer preferences. By analyzing the interactions and queries received by the chatbot, fashion companies can identify emerging trends, popular styles, and customer preferences, informing their product development and inventory management strategies.

* **Enhanced Customer Engagement**: Integrating a chatbot with a custom fashion trends dataset enhances customer engagement by providing a convenient and interactive platform for users to explore the latest trends, seek fashion advice, and discover new styles. This fosters a deeper connection between the brand and its customers, driving brand loyalty and advocacy.

* **Competitive Advantage**: By leveraging a chatbot with exclusive access to current fashion trends data, companies can gain a competitive advantage in the market. They can position themselves as industry leaders who are at the forefront of innovation and trend forecasting, attracting fashion-forward consumers and setting trends rather than merely following them.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
import pandas as pd
import openai
import numpy as np
import tiktoken

In [2]:
# read data
df = pd.read_csv('data/2023_fashion_trends.csv')

In [3]:
# display first rows
df.head()

Unnamed: 0,URL,Trends,Source
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...


In [4]:
# show number of rows and columns
df.shape

(82, 3)

In [5]:
# drop irrelevant columns
df.drop(['URL', 'Source'], axis=1, inplace=True)

In [6]:
# rename the context column to 'text' as required by the project instructions
df = df.rename(columns={'Trends': 'text'})

In [7]:
# display first rows again
df.head()

Unnamed: 0,text
0,2023 Fashion Trend: Red. Glossy red hues took ...
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,2023 Fashion Trend: Denim Reimagined. From dou...
4,2023 Fashion Trend: Shine For The Daytime. The...
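
The requirement stated above (a `"text"` column with at least 20 rows) can also be checked explicitly. The following is a minimal sketch on stand-in data; `validate_text_column` is a hypothetical helper and the demo values are illustrative, not rows from the real CSV.

```python
import pandas as pd


def validate_text_column(frame: pd.DataFrame, min_rows: int = 20) -> None:
    """Raise ValueError if the 'text' column is missing, too short, or has empty rows."""
    if "text" not in frame.columns:
        raise ValueError("DataFrame has no 'text' column")
    if len(frame) < min_rows:
        raise ValueError(f"Expected at least {min_rows} rows, found {len(frame)}")
    if frame["text"].isna().any() or (frame["text"].str.strip() == "").any():
        raise ValueError("'text' column contains missing or empty rows")


# Stand-in data with the same column layout as the CSV (values are illustrative only)
demo = pd.DataFrame({
    "URL": ["https://example.com"] * 25,
    "Trends": [f"Trend {i}" for i in range(25)],
    "Source": ["Example source"] * 25,
})

# Same wrangling steps as in the notebook: drop unused columns, rename to 'text'
demo = demo.drop(columns=["URL", "Source"]).rename(columns={"Trends": "text"})
validate_text_column(demo)  # passes silently for valid data
```

Calling the same helper on the real `df` after the rename step would fail early if the column name or row count were wrong.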


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [8]:
# OpenAI API
client = openai.OpenAI()

In [9]:
# Set up a similarity calculator based on cosine similarity

class SimilarityCalculator:
    """
    A class to calculate similarity between text embeddings using cosine similarity.

    Attributes:
        client: An object providing access to text embedding services.
    """

    def __init__(self, client):
        """
        Initializes the SimilarityCalculator with a client object.

        Parameters:
            client: An object providing access to text embedding services via OpenAI.
        """
        self.client = client

    def get_embedding(self, text: str) -> list:
        """
        Retrieves the embedding of the given text using the client's embedding service.

        Parameters:
            text: A string representing the text for which the embedding is needed.

        Returns:
            A list representing the embedding of the text.
        """
        # Call the client's embedding service to get the embedding for the text
        response = self.client.embeddings.create(
            input=text,
            model="text-embedding-ada-002"
        )
        # Return the embedding from the response
        return response.data[0].embedding

    def calculate_cosine_similarity(self, embedding1, embedding2):
        """
        Calculates the cosine similarity between two embeddings.

        Parameters:
            embedding1: The embedding of the first text.
            embedding2: The embedding of the second text.

        Returns:
            The cosine similarity score between the two embeddings.
        """
        # Calculate the dot product of the two embeddings
        dot_product = np.dot(embedding1, embedding2)
        # Calculate the norms of the embeddings
        norm_embedding1 = np.linalg.norm(embedding1)
        norm_embedding2 = np.linalg.norm(embedding2)
        # Calculate the cosine similarity using the dot product and the norms
        cosine_similarity = dot_product / (norm_embedding1 * norm_embedding2)
        return cosine_similarity
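
To make the formula concrete, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below applies the same computation to small hand-made vectors instead of real embeddings; the standalone function `cosine_similarity` is introduced here only for illustration.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Vectors pointing the same way score close to 1, regardless of length
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors score 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Vectors pointing in opposite directions score -1
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Because the score depends only on direction, two texts of very different length can still be rated as highly similar.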

In [10]:
# Initialize the similarity calculator with the OpenAI client created above
similarity_calculator = SimilarityCalculator(client)

In [11]:
# Calculate embeddings for each text and store them in a new column 'embeddings'
df['embeddings'] = df['text'].apply(
    lambda x: similarity_calculator.get_embedding(x))

In [12]:
# Calculate cosine similarity for each pair of embeddings
cosine_similarities = []
for i in range(len(df)):
    for j in range(i+1, len(df)):
        similarity_score = similarity_calculator.calculate_cosine_similarity(
            df['embeddings'][i], df['embeddings'][j])
        cosine_similarities.append(similarity_score)
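
The nested loop visits every pair `(i, j)` with `i < j`. As an alternative sketch, the same scores can be computed in one step by normalizing the rows and multiplying the matrix by its transpose. The function name and the toy vectors below are assumptions for illustration.

```python
import numpy as np


def pairwise_cosine_similarity(embeddings) -> np.ndarray:
    """Return an n x n matrix whose entry (i, j) is the cosine similarity of rows i and j."""
    matrix = np.asarray(embeddings, dtype=float)
    # Scale every row to unit length so the dot product equals the cosine
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return unit @ unit.T


# Toy vectors standing in for real embeddings
demo_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
similarity_matrix = pairwise_cosine_similarity(demo_embeddings)

# The upper triangle (i < j) lists the pairs in the same order as the nested loop
upper_i, upper_j = np.triu_indices(len(demo_embeddings), k=1)
pair_scores = similarity_matrix[upper_i, upper_j]
```

For the 82 rows in this dataset either approach is fast; the matrix form mainly helps with larger datasets.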

In [13]:
# Set up function sorting df by relevance based on cosine similarity
def sort_dataframe_by_relevance(question, df, similarity_calculator):
    """
    Sorts a DataFrame by relevance scores calculated based on the similarity between
    the provided question and embeddings in the DataFrame.

    Parameters:
    - question (str): The input question for relevance comparison.
    - df (DataFrame): The pandas DataFrame containing embeddings to be compared with the question.
    - similarity_calculator (SimilarityCalculator): An object capable of calculating embeddings
      and cosine similarity scores.

    Returns:
    - sorted_df (DataFrame): The DataFrame sorted in descending order of relevance scores
      based on cosine similarity with the question.
    """
    # Calculate the embedding of the question
    question_embedding = similarity_calculator.get_embedding(question)

    # Calculate cosine similarity between the question embedding and each embedding in the DataFrame
    relevance_scores = []
    for i, row in df.iterrows():
        similarity_score = similarity_calculator.calculate_cosine_similarity(
            question_embedding, row['embeddings'])
        relevance_scores.append(similarity_score)

    # Add relevance scores to a copy so the caller's DataFrame is not modified
    scored_df = df.copy()
    scored_df['relevance_score'] = relevance_scores

    # Sort the copy based on relevance scores
    sorted_df = scored_df.sort_values(by='relevance_score', ascending=False)

    return sorted_df
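
The ranking step can be tried without any API call by supplying embeddings directly. The sketch below is a hypothetical variant, `rank_by_similarity`, that takes a precomputed question embedding and uses toy two-dimensional vectors. It is not the notebook's function, only an illustration of the same idea.

```python
import numpy as np
import pandas as pd


def rank_by_similarity(question_embedding, frame, column="embeddings"):
    """Return a copy of frame sorted by cosine similarity to question_embedding."""
    matrix = np.array(frame[column].tolist(), dtype=float)
    query = np.asarray(question_embedding, dtype=float)
    # Cosine similarity of every row with the query in one vectorized step
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    # assign() returns a new DataFrame, so the input is left unchanged
    ranked = frame.assign(relevance_score=scores)
    return ranked.sort_values(by="relevance_score", ascending=False)


# Toy rows with made-up embeddings (illustrative only)
demo = pd.DataFrame({
    "text": ["red", "cargo pants", "cobalt blue"],
    "embeddings": [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]],
})
ranked = rank_by_similarity([1.0, 0.0], demo)
```

Here the row whose vector points in the same direction as the query ranks first and the perpendicular one ranks last.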

In [14]:
# Set up variables for model and max tokens
model_name = "gpt-3.5-turbo-instruct"
max_tokens = 2400

In [15]:
# Set up function for generating an answer (without providing a context) for a given prompt
def generate_initial_answer(prompt, max_tokens):
    """
    Generate an initial answer based on the given prompt.

    Args:
        prompt (str): The prompt to generate the initial answer from.
        max_tokens (int): The maximum number of tokens for the initial answer, 
                          ensuring it does not exceed the limit of 4,096 tokens.

    Returns:
        tuple: A tuple containing the finish reason and the initial answer text.
               - finish_reason (str): The reason for finishing the generation process.
               - initial_answer (str): The generated initial answer.
    """
    # Ensure that max_tokens does not exceed the limit of 4,096 tokens
    max_tokens = min(max_tokens, 4096)

    # Make the API call to create completions
    response = client.completions.create(
        model=model_name,
        prompt=prompt,
        max_tokens=max_tokens
    )

    # Extract finish_reason and text directly from the response.choices[0]
    finish_reason = response.choices[0].finish_reason

    # Strip surrounding whitespace; the API already enforces max_tokens, and
    # slicing the string would cut by characters rather than tokens
    initial_answer = response.choices[0].text.strip()

    return finish_reason, initial_answer
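
Note that `max_tokens` is a token budget enforced by the API, whereas slicing a Python string counts characters. The sketch below shows the difference, using a whitespace split as a crude stand-in for a real tokenizer. This is an assumption for illustration only; real tokenizers such as `tiktoken` split text differently.

```python
# Illustrative sentence, not model output
answer = "Sustainable and ethical fashion is likely to become more prevalent"
limit = 5

# Slicing a string keeps the first `limit` characters and can cut mid-word
char_cut = answer[:limit]

# Limiting by (approximate) tokens keeps the first `limit` whole pieces
approx_tokens = answer.split()
token_cut = " ".join(approx_tokens[:limit])

print(repr(char_cut))
print(repr(token_cut))
```

A character slice with a limit meant for tokens can therefore end an answer mid-word even when the API reports `finish_reason` as `stop`.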

In [16]:
# first question
prompt_q1 = """
Question: "What are the fashion trends in the year 2023?"
Answer:
"""

In [17]:
# Generate and print the initial answer along with finish_reason for first question
initial_finishreason_q1, initial_answer_q1 = generate_initial_answer(
    prompt_q1, max_tokens)
print("Finish Reason:", initial_finishreason_q1)
print("Initial Answer:", initial_answer_q1)

Finish Reason: stop
Initial Answer: Fashion trends for 2023 are still developing, but based on current social and cultural shifts, here are some potential fashion trends that could emerge in the year 2023:

1. Sustainable and Ethical Fashion 
In recent years, there has been a growing movement towards ethical and sustainable fashion. In 2023, this trend is likely to become more prevalent as consumers become more aware of the environmental and social impact of fast fashion. This could lead to an increased focus on using eco-friendly materials, fair labor practices, and transparent supply chains in the fashion industry.

2. Maximalism 
While minimalism has been a dominant trend in fashion for the past few years, maximalism is on the rise. In 2023, bold and loud fashion choices, such as statement prints, bright colors, and oversized silhouettes, could become more popular. This trend reflects a desire for self-expression and individuality in fashion.

3. Nostalgia and Retro Revivals 
Fashio

In [18]:
# second question
prompt_q2 = """
Question: "What colors are trending in the year 2023?"
Answer:
"""

In [19]:
# Generate and print the initial answer along with finish_reason for second question
initial_finishreason_q2, initial_answer_q2 = generate_initial_answer(
    prompt_q2, max_tokens)
print("Finish Reason:", initial_finishreason_q2)
print("Initial Answer:", initial_answer_q2)

Finish Reason: stop
Initial Answer: It is difficult to predict the exact colors that will be trending in the year 2023 as fashion and design trends are constantly evolving. However, according to experts, some colors that may be popular in 2023 include ecru, terra cotta, emerald green, burgundy, and shades of blue such as navy and aqua. Additionally, with a focus on sustainability and eco-friendliness, natural and earthy tones are expected to be popular in 2023, such as warm beige, terracotta, and olive green. Ultimately, color trends in 2023 will be influenced by a variety of factors, including cultural and social influences, advancements in technology, and overarching design trends.


In [20]:
# setting up custom query with context fed into the prompt
def create_prompt(question, df, max_token_count):
    """
    Create a prompt for a question based on the provided context dataframe, 
    ensuring that the total token count does not exceed a specified limit.

    Parameters:
    - question (str): The question to be answered based on the provided context.
    - df (DataFrame): The pandas DataFrame containing the context.
    - max_token_count (int): The maximum token count allowed in the prompt.

    Returns:
    - str: The formatted prompt containing the context and the question.
    """

    # Create a tokenizer designed to align with the embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")

    # Count the number of tokens in the prompt template and the question
    prompt_template = """
    Answer the question based on the context provided below. If the question
    cannot be answered based on the context, respond with "I don't know".

    Context: 

    {}

    ---

    Question: {}
    
    Answer:"""

    current_token_count = len(tokenizer.encode(prompt_template)) + \
        len(tokenizer.encode(question))

    # Sort the dataframe by relevance using the provided similarity calculator
    sorted_dataframe = sort_dataframe_by_relevance(
        question, df, similarity_calculator)

    context = []
    # Iterate through the sorted dataframe
    for _, row in sorted_dataframe.iterrows():
        text = row["text"]

        # Increase the token counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count

        # Add the row of text to the context list if the token count limit has not been exceeded
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    # Return the formatted prompt
    return prompt_template.format("\n\n###\n\n".join(context), question)
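
The budget logic in the loop above can be isolated and tested without an API key. The sketch below is a simplified version under stated assumptions: `pack_context` and `count_words` are hypothetical names, and the word counter only approximates what `tiktoken` would report.

```python
from typing import Callable, List


def pack_context(texts: List[str], budget: int,
                 count_tokens: Callable[[str], int]) -> List[str]:
    """Take texts in order until adding the next one would exceed the budget."""
    selected = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > budget:
            break  # texts are sorted by relevance, so stop at the first overflow
        selected.append(text)
        used += cost
    return selected


def count_words(text: str) -> int:
    """Crude token counter used only for this illustration."""
    return len(text.split())


# Toy texts, assumed to be already sorted by relevance
ranked_texts = [
    "cobalt blue is strong",
    "red hues took over",
    "shine for the daytime looks",
]
context = pack_context(ranked_texts, budget=8, count_tokens=count_words)
```

With a budget of 8 and costs of 4, 4, and 5, the first two texts fit exactly and the third is left out.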

In [21]:
# Double-check custom prompt input
print(create_prompt("Is black a color that is suitable to wear in 2023?", df, 1000))


    Answer the question based on the context provided below. If the question
    cannot be answered based on the context, respond with "I don't know".

    Context: 

    2023 Fashion Trend: Cobalt Blue. The strongest color story to come out of Spring 2023 runways, cobalt blue has burst through the collections with the freshness of a sea mist on a morning day. Just bright enough to warrant a double take, yet subtle enough to be worked into daily wear, it's the type of deep blue that will excite even the most color-averse. Bonus points: It pairs well with Pantone's Viva Magenta.

###

2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).

###

2023 Fashion Trend: Shine For The Daytime. The amount of shine on the 2023 runways would ma
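
The printed prompt shows that the template lines carry the four-space indentation of the function body, while the context rows do not. One possible way to avoid this, sketched here with `textwrap.dedent` from the standard library, is to dedent the template before formatting it. The constant name `PROMPT_TEMPLATE` and the example strings are assumptions for illustration.

```python
import textwrap

# Dedent the template so source-code indentation does not leak into the prompt
PROMPT_TEMPLATE = textwrap.dedent("""\
    Answer the question based on the context provided below. If the question
    cannot be answered based on the context, respond with "I don't know".

    Context:

    {}

    ---

    Question: {}

    Answer:""")

# Placeholder values, not rows from the dataset
demo_prompt = PROMPT_TEMPLATE.format("Example context row.", "Example question?")
print(demo_prompt)
```

Removing the indentation changes the template's token count slightly, so the context budget would need to be computed from the dedented template.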

In [22]:
# Set up function for generating an answer (with context provided) for a given question
def generate_custom_answer(question, max_tokens, max_tokens_create_prompt=1000):
    """
    Generate a custom answer for a given question using an API call.

    Args:
    - question (str): The question for which a custom answer is generated.
    - max_tokens (int): The maximum number of tokens allowed for the generated answer. It should not exceed 4,096 tokens.
    - max_tokens_create_prompt (int, optional): The maximum number of tokens allowed for creating the prompt. Defaults to 1000. It should not exceed 4,096 tokens.

    Returns:
    - tuple: A tuple containing the finish_reason and the custom answer generated.
        - finish_reason (str): The reason for finishing the generation process.
        - custom_answer (str): The generated custom answer.

    Notes:
    - This function ensures that the maximum number of tokens does not exceed the limit of 4,096 tokens.
    - It creates a prompt using the 'create_prompt' function.
    - Makes an API call to OpenAI to create completions based on the provided model and prompt.
    - Extracts the finish_reason and text from the API response.
    - Strips surrounding whitespace from the text before returning.

    """
    # Ensure that max_tokens does not exceed the limit of 4,096 tokens
    max_tokens = min(max_tokens, 4096)
    # Ensure that max_tokens_create_prompt does not exceed the limit of 4,096 tokens
    max_tokens_create_prompt = min(max_tokens_create_prompt, 4096)

    # Create prompt using create_prompt function
    prompt = create_prompt(question=question, df=df,
                           max_token_count=max_tokens_create_prompt)

    # Make the OpenAI API call to create completions
    response = client.completions.create(
        model=model_name,
        prompt=prompt,
        max_tokens=max_tokens
    )

    # Extract finish_reason and text directly from the response.choices[0]
    finish_reason = response.choices[0].finish_reason

    # Strip surrounding whitespace; the API already enforces max_tokens, and
    # slicing the string would cut by characters rather than tokens
    custom_answer = response.choices[0].text.strip()

    return finish_reason, custom_answer

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [23]:
# print first question
print(prompt_q1)


Question: "What are the fashion trends in the year 2023?"
Answer:



In [24]:
# print initial answer for the first question
print("Finish Reason:", initial_finishreason_q1)
print("Initial Answer:", initial_answer_q1)

Finish Reason: stop
Initial Answer: Fashion trends for 2023 are still developing, but based on current social and cultural shifts, here are some potential fashion trends that could emerge in the year 2023:

1. Sustainable and Ethical Fashion 
In recent years, there has been a growing movement towards ethical and sustainable fashion. In 2023, this trend is likely to become more prevalent as consumers become more aware of the environmental and social impact of fast fashion. This could lead to an increased focus on using eco-friendly materials, fair labor practices, and transparent supply chains in the fashion industry.

2. Maximalism 
While minimalism has been a dominant trend in fashion for the past few years, maximalism is on the rise. In 2023, bold and loud fashion choices, such as statement prints, bright colors, and oversized silhouettes, could become more popular. This trend reflects a desire for self-expression and individuality in fashion.

3. Nostalgia and Retro Revivals 
Fashio

In [25]:
# print updated answer for contextualized prompt for the first question
custom_finishreason_q1, custom_answer_q1 = generate_custom_answer(
    prompt_q1, max_tokens)
print("Finish Reason:", custom_finishreason_q1)
print("Custom Answer:", custom_answer_q1)

Finish Reason: stop
Custom Answer: Shine for the Daytime, Red, Cobalt Blue, Sheer Clothing, 3D Floral Designs, Elevated Basics, Cargo Pants, and Maxi Skirts.


### Question 2

In [26]:
# print second question
print(prompt_q2)


Question: "What colors are trending in the year 2023?"
Answer:



In [27]:
# print initial answer for the second question
print("Finish Reason:", initial_finishreason_q2)
print("Initial Answer:", initial_answer_q2)

Finish Reason: stop
Initial Answer: It is difficult to predict the exact colors that will be trending in the year 2023 as fashion and design trends are constantly evolving. However, according to experts, some colors that may be popular in 2023 include ecru, terra cotta, emerald green, burgundy, and shades of blue such as navy and aqua. Additionally, with a focus on sustainability and eco-friendliness, natural and earthy tones are expected to be popular in 2023, such as warm beige, terracotta, and olive green. Ultimately, color trends in 2023 will be influenced by a variety of factors, including cultural and social influences, advancements in technology, and overarching design trends.


In [28]:
# print updated answer for contextualized prompt for the second question
custom_finishreason_q2, custom_answer_q2 = generate_custom_answer(
    prompt_q2, max_tokens)
print("Finish Reason:", custom_finishreason_q2)
print("Custom Answer:", custom_answer_q2)

Finish Reason: stop
Custom Answer: Cobalt blue, red, and green.
