# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

**Dataset:** 2023 Fashion Trends

To complete the custom chatbot project, I have chosen the 2023 Fashion Trends dataset. It describes the fashion trends of 2023, including styles, popular colors, fabric types, and other key fashion details. The dataset contains 82 rows and 3 columns, making it well suited to this application.

**Use Case Scenario:**

The custom chatbot will be designed to provide current and insightful information on 2023 fashion trends to users who are interested in staying up-to-date with the latest styles and designs. By integrating this dataset, the chatbot will be capable of answering specific questions about fashion trends, including details about popular colors, fabrics, styles, and accessories for the year 2023.

This customization would be particularly beneficial for fashion enthusiasts, stylists, shoppers, and individuals looking to update their wardrobe with the latest trends. It can also serve as a valuable tool for retailers, designers, and marketers in the fashion industry who need to keep up with current styles to meet their clients' requirements effectively.

By offering accurate information on 2023 fashion trends, the chatbot can assist users in making informed fashion choices, enhancing their personal style, and staying ahead in the dynamic world of fashion.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [79]:
import pandas as pd

# load the dataset
df = pd.read_csv('data/2023_fashion_trends.csv')

In [80]:
# check first five rows of the dataset
df.head()

Unnamed: 0,URL,Trends,Source
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...


In [81]:
# check shape of the dataset
df.shape

(82, 3)

In [82]:
# inspect a few rows of the Trends column
df.Trends[0]

'2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).'

In [83]:
df.Trends[3]

"2023 Fashion Trend: Denim Reimagined. From double-waisted jeans to carpenter jeans, it's been a while since we were this excited about denim trends. It seems like even the most luxe runway designers agree, sending out strapless dresses, shirting, and even undergarments and shoes (thigh-high-boot-jean hybrids anyone?) in the material. Whatever category you decide on, opt for timeless cuts and silhouettes that can stay in your closet rotation once the novelty wears off."

In [84]:
# check whether any row has an empty Trends string
df[df["Trends"].str.len() == 0]

Unnamed: 0,URL,Trends,Source
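
A length-zero comparison only matches exactly empty strings. In pandas, `str.len()` returns NaN for missing values, so missing rows would not match, and neither would rows that contain only whitespace. The sketch below shows a broader check on a plain Python list, standing in for something like `df["Trends"].tolist()`. The helper name `find_blank_rows` and the sample values are illustrative assumptions, not part of the dataset.

```python
import math

def find_blank_rows(values):
    """Return the positions of entries that are missing, NaN, or whitespace-only."""
    blank_positions = []
    for position, value in enumerate(values):
        if value is None:
            blank_positions.append(position)
        elif isinstance(value, float) and math.isnan(value):
            blank_positions.append(position)
        elif isinstance(value, str) and value.strip() == "":
            blank_positions.append(position)
    return blank_positions

# Made-up sample; the real column has 82 entries.
sample_trends = [
    "2023 Fashion Trend: Red.",
    "",
    "   ",
    float("nan"),
    "2023 Fashion Trend: Cargo Pants.",
]

blank_positions = find_blank_rows(sample_trends)
print(blank_positions)
```

Rows flagged this way could be dropped before computing embeddings, since a blank row contributes nothing useful to the retrieved context.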


In [85]:
# the URL and Source columns are not useful to our users unless they specifically ask for them,
# so this project does not use them.
df.drop(['URL', 'Source'], axis=1, inplace=True)

In [86]:
df.head()

Unnamed: 0,Trends
0,2023 Fashion Trend: Red. Glossy red hues took ...
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,2023 Fashion Trend: Denim Reimagined. From dou...
4,2023 Fashion Trend: Shine For The Daytime. The...


In [87]:
df = df.rename(columns={'Trends':'text'})

In [88]:
df.head()

Unnamed: 0,text
0,2023 Fashion Trend: Red. Glossy red hues took ...
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,2023 Fashion Trend: Denim Reimagined. From dou...
4,2023 Fashion Trend: Shine For The Daytime. The...


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [89]:
import openai

In [91]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 100  # number of rows sent to the embedding endpoint per request

embeddings = []
for i in range(0, len(df), batch_size):
    # Get embeddings from OpenAI model
    response = openai.Embedding.create(
        input=df.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )
    
    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df["embeddings"] = embeddings
df.head()

Unnamed: 0,text,embeddings
0,2023 Fashion Trend: Red. Glossy red hues took ...,"[-0.02112550474703312, -0.021673880517482758, ..."
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...,"[-0.0017983651487156749, -0.029038064181804657..."
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.010469219647347927, -0.019250944256782532,..."
3,2023 Fashion Trend: Denim Reimagined. From dou...,"[-0.01552637666463852, -0.005442269612103701, ..."
4,2023 Fashion Trend: Shine For The Daytime. The...,"[-0.005012418609112501, 0.0017600730061531067,..."
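
The batching loop can be checked without calling the API. The sketch below uses a hypothetical `fake_embed` function in place of the real embedding request and a smaller batch size of 25, to show that slicing in steps of `batch_size` covers every row exactly once, including the shorter final batch. Both `iter_batches` and `fake_embed` are illustrative names.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def fake_embed(texts):
    """Hypothetical stand-in for the embedding call: one short vector per text."""
    return [[float(len(text)), 0.0] for text in texts]

# Same row count as the dataset, with placeholder text.
texts = [f"trend {i}" for i in range(82)]

batch_sizes = []
embeddings = []
for batch in iter_batches(texts, batch_size=25):
    batch_sizes.append(len(batch))
    embeddings.extend(fake_embed(batch))

print(batch_sizes)
print(len(embeddings) == len(texts))
```

With 82 rows and a batch size of 100, as in the notebook, the loop runs once and sends all rows in a single request.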


In [92]:
# saving the embedding dataset for future use
df.to_csv("embeddings.csv")

In [93]:
import ast
import numpy as np
# read the saved embeddings and parse each stored string back into a NumPy array
df = pd.read_csv("embeddings.csv", index_col=0)
df["embeddings"] = df["embeddings"].apply(ast.literal_eval).apply(np.array)
df

Unnamed: 0,text,embeddings
0,2023 Fashion Trend: Red. Glossy red hues took ...,"[-0.02112550474703312, -0.021673880517482758, ..."
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...,"[-0.0017983651487156749, -0.029038064181804657..."
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.010469219647347927, -0.019250944256782532,..."
3,2023 Fashion Trend: Denim Reimagined. From dou...,"[-0.01552637666463852, -0.005442269612103701, ..."
4,2023 Fashion Trend: Shine For The Daytime. The...,"[-0.005012418609112501, 0.0017600730061531067,..."
...,...,...
77,"If lime green isn't your vibe, rest assured th...","[-0.002778862603008747, -0.018288010731339455,..."
78,"""As someone who can clearly (not fondly) remem...","[-0.014756127260625362, -0.006549983285367489,..."
79,"""Combine this design shift with the fact that ...","[-0.020819125697016716, -0.025045057758688927,..."
80,Thought party season ended at the stroke of mi...,"[-0.01982172764837742, -0.022165512666106224, ..."
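
Because a CSV file stores each embedding list as its string representation, the column has to be parsed back into numbers on reload. `ast.literal_eval` only accepts Python literals, which makes it a stricter parser than a general `eval` for this job. Below is a minimal sketch of the round trip using only the standard library and an in-memory file. The sample vectors are made up and far shorter than real embeddings.

```python
import ast
import csv
import io

rows = [
    {"text": "2023 Fashion Trend: Red.", "embeddings": [-0.0211, -0.0217, 0.5]},
    {"text": "2023 Fashion Trend: Cargo Pants.", "embeddings": [-0.0018, -0.029, 0.25]},
]

# Write to an in-memory CSV; each list is stored as its string representation.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embeddings"])
writer.writeheader()
for row in rows:
    writer.writerow({"text": row["text"], "embeddings": str(row["embeddings"])})

# Read the file back and parse each embeddings string into a list of floats.
buffer.seek(0)
restored = []
for row in csv.DictReader(buffer):
    restored.append(
        {"text": row["text"], "embeddings": ast.literal_eval(row["embeddings"])}
    )

print(restored == rows)
```

If the file grows large, a binary format that preserves arrays natively may be a better fit than CSV, but that is outside the scope of this notebook.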


In [94]:
from openai.embeddings_utils import get_embedding, distances_from_embeddings

def get_rows_sorted_by_relevance(question, df):
    """
    This function takes in a question string and a dataframe containing
    rows of text and associated embeddings
    
    Returns: 
        Dataframe - sorted from most to least relevant for that question
    """
    
    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    
    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine" 
    )
    
    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
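
To make the ranking concrete, here is a small pure-Python sketch of cosine distance, taken here as one minus cosine similarity (the usual definition), applied to toy two-dimensional vectors. The function `cosine_distance` and the sample rows are illustrative assumptions. The notebook itself relies on `distances_from_embeddings` for this step.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

question_vector = [1.0, 0.0]
rows = [
    ("points the opposite way", [-1.0, 0.0]),
    ("same direction", [2.0, 0.0]),
    ("perpendicular", [0.0, 3.0]),
]

# Ascending sort: the smallest distance (most relevant row) comes first.
ranked = sorted(rows, key=lambda row: cosine_distance(question_vector, row[1]))
ranked_labels = [label for label, _ in ranked]
print(ranked_labels)
```

Note that the vector with twice the length but the same direction still has distance 0: cosine distance depends only on direction, not on magnitude.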

In [95]:
q1_prompt = """
Question: "What is new in fashion in 2023."
Answer:
"""

initial_q1_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",  # using the latest instruct GPT model
    prompt=q1_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()

print(initial_q1_answer)


It is difficult to predict exactly what will be new in fashion in 2023 as fashion trends are constantly changing and evolving. However, some possible trends that may emerge in 2023 could include:

1. Sustainable and ethical fashion: As people become more conscious about environmental and social issues, there is a growing demand for sustainable and ethical fashion. This could include fashion brands using eco-friendly materials, promoting fair labor practices, and implementing sustainable production methods.

2. Bold colors and patterns: Bold and vibrant colors and patterns may be a popular trend in 2023, as seen in recent fashion weeks. This could include bright neon hues, graphic prints, and bold stripes.

3. Oversized and relaxed silhouettes: Comfortable and relaxed silhou


In [96]:
q2_prompt = """
Question: "what colour is the trends in fall 2023?"
Answer:
"""

initial_q2_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=q2_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()

print(initial_q2_answer)

Unfortunately, it is impossible to predict the color trends for future years as fashion and trends are constantly changing and evolving. The best way to stay updated on fall 2023 color trends is to follow fashion news and forecasts closer to that time.


In [97]:
import tiktoken

def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings
    
    Return:
        a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")
    
    # Count the number of tokens in the prompt template and question
    prompt_template = """
    Answer the question based on the context below, and if the question
    can't be answered based on the context, say "I don't know"

    Context: 

    {}

    ---

    Question: {}
    
    Answer:"""
    
    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))
    
    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        
        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count
        
        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)  # separator suggested in the live class
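
The context-selection loop is a greedy fill: rows are taken in relevance order until the next one would exceed the token budget. The sketch below isolates that logic with a hypothetical `count_tokens` helper that counts whitespace-separated words instead of calling `tiktoken`, so the numbers are easy to follow by hand. `select_context`, `template_overhead`, and the sample rows are illustrative assumptions.

```python
def count_tokens(text):
    """Hypothetical stand-in for len(tokenizer.encode(text)): counts words."""
    return len(text.split())

def select_context(ranked_texts, question, template_overhead, max_token_count):
    """Greedily keep rows, in ranked order, while the running total fits the budget."""
    used = template_overhead + count_tokens(question)
    selected = []
    for text in ranked_texts:
        used += count_tokens(text)
        if used > max_token_count:
            break  # stop at the first row that does not fit
        selected.append(text)
    return selected

ranked_texts = [
    "red is everywhere this fall",         # 5 words
    "cobalt blue led the spring runways",  # 6 words
    "cargo pants are back",                # 4 words
]

selected = select_context(
    ranked_texts,
    "which colour is trending",  # 4 words
    template_overhead=5,
    max_token_count=20,
)
print(selected)
```

Two properties of this approach are worth keeping in mind. The loop stops at the first row that does not fit, so a shorter row further down the ranking is never considered. In the notebook's version, the separator inserted between rows is also not counted toward the budget, so the final prompt can run slightly over `max_token_count`.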

In [98]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def custom_query(question, df, max_prompt_tokens=3000, max_answer_tokens=750):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, 
    
    Return:
        Answer to the question according to an OpenAI Completion model
        If the model produces an error, return an empty string
    """
    
    prompt = create_prompt(question, df, max_prompt_tokens)
    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""
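
The prompt budget and the answer budget share the model's context window, so their sum should not exceed it. The sketch below checks the default values against a window of 4,096 tokens. That figure is an assumption about `gpt-3.5-turbo-instruct` and should be confirmed against the model documentation. `fits_context_window` is a hypothetical helper.

```python
# Assumption: context window size of the completion model; verify before relying on it.
ASSUMED_CONTEXT_WINDOW = 4096

def fits_context_window(max_prompt_tokens, max_answer_tokens,
                        context_window=ASSUMED_CONTEXT_WINDOW):
    """Return True if the prompt and answer budgets together fit in the window."""
    return max_prompt_tokens + max_answer_tokens <= context_window

# Defaults used by custom_query: 3000 prompt tokens and 750 answer tokens.
defaults_fit = fits_context_window(3000, 750)
print(defaults_fit)
```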

In [99]:
print(create_prompt("In which season a orange colour will be suitable to wear in 2023?", df, 1000))



    Answer the question based on the context below, and if the question
    can't be answered based on the context, say "I don't know"

    Context: 

    2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).

###


###

2023 Fashion Trend: Cobalt Blue. The strongest color story to come out of Spring 2023 runways, cobalt blue has burst through the collections with the freshness of a sea mist on a morning day. Just bright enough to warrant a double take, yet subtle enough to be worked into daily wear, it's the type of deep blue that will excite even the most color-averse. Bonus points: It pairs well with Pantone's Viva Magenta.

###

2023 Fashion Trend: Shine For The Daytime. The amount of shine on the 2023 runways would make you thi

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [100]:
print(initial_q1_answer)

It is difficult to predict exactly what will be new in fashion in 2023 as fashion trends are constantly changing and evolving. However, some possible trends that may emerge in 2023 could include:

1. Sustainable and ethical fashion: As people become more conscious about environmental and social issues, there is a growing demand for sustainable and ethical fashion. This could include fashion brands using eco-friendly materials, promoting fair labor practices, and implementing sustainable production methods.

2. Bold colors and patterns: Bold and vibrant colors and patterns may be a popular trend in 2023, as seen in recent fashion weeks. This could include bright neon hues, graphic prints, and bold stripes.

3. Oversized and relaxed silhouettes: Comfortable and relaxed silhou


In [101]:
question1 = "What is new in fashion in 2023"
result_custom_q1 = custom_query(question1, df)

print(f"Custom query result for Question 1: {result_custom_q1}")

Custom query result for Question 1: Sheer clothing, shine for daytime, cobalt blue, red, cargo pants, indie sleaze, maxi skirts, denim reimagined, elevated basics, floral motifs, tailored trousers, mesh, pinstripe tailoring, tailoring, vinyl, ballet shoes, green, trains, blazers cinched, trompe l'oeil prints, bohemian smudgy prints, bubble skirts, and digitally manipulated prints are new in fashion for 2023.


### Question 2

In [102]:
print(initial_q2_answer)

Unfortunately, it is impossible to predict the color trends for future years as fashion and trends are constantly changing and evolving. The best way to stay updated on fall 2023 color trends is to follow fashion news and forecasts closer to that time.


In [None]:
question2 = "what colour is the trends in fall 2023?"

result_custom_q2 = custom_query(question2, df)

print(f"Custom query result for Question 2: {result_custom_q2}")