# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

I decided to use the provided fashion dataset and extended it with the Wikipedia page ‘2020s in fashion’ to mix a different writing style into the dataset and get a more objective overview. The Wikipedia article also describes where the trends draw their inspiration from and which earlier decades influenced them.<br>
The chatbot would be useful in a scenario involving trend analysis and fashion tips for the year 2023.

In [2]:
# all needed imports (the openai calls in this notebook use the legacy pre-1.0 SDK interface)

import pandas as pd
import requests
from openai.embeddings_utils import get_embedding, distances_from_embeddings
import tiktoken
import openai
from IPython.display import display, Markdown

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [3]:
# load the provided fashion trend data into dataframe
df = pd.read_csv("data/2023_fashion_trends.csv")
df.rename(columns={"Trends": "text"}, inplace=True)
df = df.drop(columns=['URL', 'Source'])
df

Unnamed: 0,text
0,2023 Fashion Trend: Red. Glossy red hues took ...
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,2023 Fashion Trend: Denim Reimagined. From dou...
4,2023 Fashion Trend: Shine For The Daytime. The...
...,...
77,"If lime green isn't your vibe, rest assured th..."
78,"""As someone who can clearly (not fondly) remem..."
79,"""Combine this design shift with the fact that ..."
80,Thought party season ended at the stroke of mi...


In [4]:
# Get the Wikipedia page for "2020s in fashion"

params = {
    "action": "query", 
    "prop": "extracts",
    "exlimit": 1,
    "titles": "2020s in fashion",
    "explaintext": 1,
    "formatversion": 2,
    "format": "json"
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
response_dict = resp.json()
response_dict["query"]["pages"][0]["extract"].split("\n")

['The fashions of the 2020s represent a departure from 2010s fashion and feature a nostalgia for older aesthetics. They have been largely inspired by styles of the late 1990s to mid-2000s, and 1980s. Early in the decade, several publications noted the shortened trend and nostalgia cycle in 2020s fashion. Fashion was also shaped by the COVID-19 pandemic, which had a major impact on the fashion industry, and led to shifting retail and consumer trends.',
 'In the 2020s, many companies, including current fast fashion giants such as Shein and Temu, have been using social media platforms such as TikTok and Instagram as a marketing tool. Marketing strategies involving third parties, particularly influencers and celebrities, have become prominent tactics. E-commerce platforms which promote small businesses, such as Depop and Etsy, grew by offering vintage, homemade, or resold clothing from individual sellers. Thrifting has also exploded in popularity due to it being centered around finding val

In [5]:
# load wikipedia dataset into dataframe

df_wiki = pd.DataFrame()
df_wiki["text"] = response_dict["query"]["pages"][0]["extract"].split("\n")
df_wiki = df_wiki[(df_wiki["text"].str.len() > 0) ]
df_wiki = df_wiki[(~df_wiki["text"].str.startswith("=="))]
df_wiki = df_wiki[(~df_wiki["text"].str.startswith("\t"))]

df_wiki = df_wiki.iloc[:-5].reset_index(drop=True)
df_wiki

Unnamed: 0,text
0,The fashions of the 2020s represent a departur...
1,"In the 2020s, many companies, including curren..."
2,"During the COVID-19 pandemic, wearing a face m..."
3,"Fashion that prioritized comfort, style, and s..."
4,"In the United States, athletic wear such as un..."
...,...
79,"""About Time: Fashion and Duration"" October 26 ..."
80,In America: A Lexicon of Fashion at the Costum...
81,"""Fashioning Masculinities: The Art of Menswear..."
82,"""Africa Fashion"" July 2 - April 16, 2023 at Vi..."
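The row filters applied to the Wikipedia extract can be illustrated without pandas. The following is a minimal sketch on a made-up sample list (`sample_lines` and `keep_paragraphs` are hypothetical names, not part of the notebook); it mirrors the three conditions used above: drop empty lines, drop heading lines that start with `==`, and drop tab-indented lines.

```python
# Hypothetical sample of lines as they might appear in a plain-text extract.
sample_lines = [
    "The fashions of the 2020s represent a departure from 2010s fashion.",
    "",
    "== Women's fashion ==",
    "\tIndented list item",
    "Thrifting has also grown in popularity.",
]


def keep_paragraphs(lines):
    """Keep only non-empty lines that are neither headings nor tab-indented."""
    return [
        line
        for line in lines
        if len(line) > 0
        and not line.startswith("==")
        and not line.startswith("\t")
    ]


paragraphs = keep_paragraphs(sample_lines)
print(paragraphs)
```

Only the two prose sentences survive the filter. Sub-headings such as `=== ... ===` are removed as well, since they also start with `==`.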


In [6]:
# combine both datasets

df_dataset = pd.concat([df, df_wiki], ignore_index=True)
df_dataset

Unnamed: 0,text
0,2023 Fashion Trend: Red. Glossy red hues took ...
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,2023 Fashion Trend: Denim Reimagined. From dou...
4,2023 Fashion Trend: Shine For The Daytime. The...
...,...
161,"""About Time: Fashion and Duration"" October 26 ..."
162,In America: A Lexicon of Fashion at the Costum...
163,"""Fashioning Masculinities: The Art of Menswear..."
164,"""Africa Fashion"" July 2 - April 16, 2023 at Vi..."


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [7]:
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = "YOUR API KEY"

In [8]:
# get embeddings for dataset

EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 100
embeddings = []
for i in range(0, len(df_dataset), batch_size):
    # Send text data to OpenAI model to get embeddings
    response = openai.Embedding.create(
        input=df_dataset.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )

    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df_dataset["embeddings"] = embeddings
df_dataset

Unnamed: 0,text,embeddings
0,2023 Fashion Trend: Red. Glossy red hues took ...,"[-0.020744433626532555, -0.022010771557688713,..."
1,2023 Fashion Trend: Cargo Pants. Utilitarian w...,"[-0.0020284035708755255, -0.02862541377544403,..."
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.010437785647809505, -0.019189883023500443,..."
3,2023 Fashion Trend: Denim Reimagined. From dou...,"[-0.015542690642178059, -0.005413859151303768,..."
4,2023 Fashion Trend: Shine For The Daytime. The...,"[-0.004980848170816898, 0.0017538908869028091,..."
...,...,...
161,"""About Time: Fashion and Duration"" October 26 ...","[-0.01773577369749546, -0.015697935596108437, ..."
162,In America: A Lexicon of Fashion at the Costum...,"[-0.016588587313890457, -0.006583962123841047,..."
163,"""Fashioning Masculinities: The Art of Menswear...","[-0.02403056062757969, -0.01970973052084446, -..."
164,"""Africa Fashion"" July 2 - April 16, 2023 at Vi...","[-0.015775732696056366, -0.020846504718065262,..."
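To see how the batching loop splits the dataset, the index arithmetic can be reproduced without any API call. This is a small sketch; `batch_slices` is a hypothetical helper introduced only for illustration, and the `min(...)` call stands in for the way slicing clips at the end of the dataframe.

```python
def batch_slices(n_items, batch_size):
    """Return (start, stop) index pairs covering n_items in chunks of batch_size."""
    return [
        (start, min(start + batch_size, n_items))
        for start in range(0, n_items, batch_size)
    ]


# The combined dataset has 165 rows (index 0 to 164) and the batch size is 100.
print(batch_slices(165, 100))
```

With these numbers the loop would send two requests: one with rows 0 to 99 and one with rows 100 to 164.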


In [9]:
# sort rows by relevance regarding the question

def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from most to least relevant for that question
    """

    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)

    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )

    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy

In [10]:
# test functionality

example_question = "What are the fashion trends of February 2023?"
get_rows_sorted_by_relevance(example_question, df_dataset)

Unnamed: 0,text,embeddings,distances
53,"For spring 2023, there was a more surrealist i...","[-0.034433700144290924, -0.00899236835539341, ...",0.135561
6,2023 Fashion Trend: Cobalt Blue. The strongest...,"[-0.006997744552791119, -0.021408746019005775,...",0.136013
63,"""Every season, there is a trend that speaks to...","[-0.0108375558629632, -0.018693776801228523, 0...",0.140335
2,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.010437785647809505, -0.019189883023500443,...",0.142032
4,2023 Fashion Trend: Shine For The Daytime. The...,"[-0.004980848170816898, 0.0017538908869028091,...",0.143355
...,...,...,...
138,One of the staples of early to mid 2020s fashi...,"[-0.0037799938581883907, -0.01533444318920374,...",0.257676
141,"Nymphet, also known as coquette, is an aesthet...","[-0.033557992428541183, -0.012761678546667099,...",0.259015
143,Bimbocore was criticized for glamorizing the s...,"[-0.04411861300468445, -0.047443728893995285, ...",0.261864
154,Common skincare techniques in America during t...,"[-0.006802489515393972, 0.010999912396073341, ...",0.265923
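The ranking above relies on cosine distance. As a rough illustration, here is a pure-Python sketch that assumes the usual definition (1 minus cosine similarity); the vectors are toy two-dimensional values made up for this example, and `cosine_distance` is a hypothetical helper, not the library function used in the notebook.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" (real embeddings have far more dimensions).
question_vec = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "diagonal": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
}

# Sort ascending: the smallest distance is the most relevant row.
ranked = sorted(rows, key=lambda name: cosine_distance(question_vec, rows[name]))
print(ranked)
```

Vector length does not matter here, only direction, which is why the row pointing the same way as the question ranks first even though it is twice as long.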


In [11]:
# create custom prompt

def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")

    # Count the number of tokens in the prompt template and question
    prompt_template = """
You are an expert in fashion trends. Answer the way a true fashion enthusiast would speak.
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know."

Context: 

{}

---

Question: {}
Answer:"""

    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))

    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:

        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count

        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)
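The context selection in `create_prompt` is a greedy loop over a token budget. The sketch below isolates that logic under a loud assumption: `count_tokens` is a stand-in that counts whitespace-separated words instead of real tokens, and `select_context` is a hypothetical name used only here.

```python
def count_tokens(text):
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def select_context(ranked_texts, base_token_count, max_token_count):
    """Take texts in ranked order until the next one would exceed the budget."""
    total = base_token_count
    selected = []
    for text in ranked_texts:
        total += count_tokens(text)
        if total > max_token_count:
            break  # stop at the first text that does not fit
        selected.append(text)
    return selected


ranked_texts = [
    "cobalt blue is strong",
    "sheer clothing everywhere this season",
    "cargo pants",
]
print(select_context(ranked_texts, base_token_count=2, max_token_count=8))
```

As in the notebook, the loop stops at the first row that does not fit, so a shorter row further down the ranking is not considered even if it would still fit into the remaining budget.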

In [12]:
# test functionality

max_token_count = 3500
example_prompt = create_prompt(example_question, df_dataset, max_token_count)
print(example_prompt)


You are an expert in fashion trends. Answer the way a true fashion enthusiast would speak.
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know."

Context: 

For spring 2023, there was a more surrealist interpretation with standout 3D designs and runway looks embellished with floral motifs. Standouts included provocative sculptural flowers on mini and maxi dresses paired with bold leaf shoes, says Page.

###

2023 Fashion Trend: Cobalt Blue. The strongest color story to come out of Spring 2023 runways, cobalt blue has burst through the collections with the freshness of a sea mist on a morning day. Just bright enough to warrant a double take, yet subtle enough to be worked into daily wear, it's the type of deep blue that will excite even the most color-averse. Bonus points: It pairs well with Pantone's Viva Magenta.

###


###

2023 Fashion Trend: Sheer Clothing. "Bare it all" has been the motto since the end of t

In [13]:
# get answer for prompt with context

COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def answer_question(
    question, df, max_prompt_tokens=1800, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model

    If the model produces an error, return an empty string
    """

    prompt = create_prompt(question, df, max_prompt_tokens)

    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [14]:
question_1 = "What are the fashion trends of Spring 2023?"

In [15]:
example_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=question_1,
    max_tokens=150
)["choices"][0]["text"].strip()
print(example_answer)

It is impossible to accurately predict fashion trends for Spring 2023 as it is several years in the future. Trends can often change quickly and are influenced by a variety of factors such as popular culture, world events, and social media. 
However, some general predictions for possible fashion trends in Spring 2023 could include:
1. Sustainability and minimalism: With the increasing focus on the environment and ethical fashion, there could be a shift towards more sustainable and minimalistic designs in Spring 2023.
2. Gender-neutral fashion: The boundaries between men's and women's fashion are becoming more blurred, and designers may continue to incorporate gender-neutral elements in their collections.
3. Bold colors and prints: Spring is traditionally associated with bright colors and


In [16]:
custom_fashion_answer = answer_question(question_1, df_dataset)
print(custom_fashion_answer)

For Spring 2023, some of the top fashion trends include surrealist 3D designs and floral embellishments, bold cobalt blue hues, glossy reds, sheer clothing, utilitarian cargo pants, shine for daytime looks, and the tailored minimalism trend. Other trends to look out for are the indie sleaze aesthetic, psychedelic 1960s fashion, and elevated basics.


### Question 2

In [17]:
question_2 = "What were the hair trends during lockdown in 2023?"

In [18]:
example_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=question_2,
    max_tokens=150
)["choices"][0]["text"].strip()
print(example_answer)

It is difficult to predict hair trends for a specific year in the future, as fashion and trends are constantly changing. However, some potential hair trends during a lockdown in 2023 could include longer and more natural hair styles, as people may not have access to salons for regular haircuts and maintenance. Bright and vibrant hair colors could also gain popularity, as people look for ways to add excitement and fun to their at-home hair care routines. Alternatively, simpler and more low-maintenance styles such as braids, ponytails, and messy buns could also become popular as people prioritize comfort and ease during a lockdown. Ultimately, the hair trends during a lockdown in 2023 will depend on various factors and may vary greatly among different individuals and


In [19]:
custom_fashion_answer = answer_question(question_2, df_dataset)
print(custom_fashion_answer)

During lockdown in 2023, popular hairstyles included curtain bangs, 1980s, 1990s, and early 2000s-inspired bangs, hair extensions, ponytails, twin pigtails, French braids, and the "wolf cut". Natural hair for Black American women was also a popular trend.


### Input Loop

In [21]:
# While loop for continuous questions
display(Markdown("**Jean Bot Gaultier:**"))
print("""Hello, I'm Jean Bot Gaultier, your expert in fashion, styles and trends!
      How can I help you?
      Type your question.
      Type "Thank you!" to end this conversation.""")

while True:
    # Ask the user for a question
    display(Markdown("**You:**"))
    question = input()
    
    # If the user types "Thank you!" (case-insensitive), end the loop
    if question.lower() == "thank you!":
        display(Markdown("**Jean Bot Gaultier:**"))
        print("See you on the runway!")
        break
    
    # Call the function to get the answer
    custom_fashion_answer = answer_question(question, df_dataset)
    
    # Print the answer
    display(Markdown('**Jean Bot Gaultier:**'))
    print(f"{custom_fashion_answer}\n")

**Jean Bot Gaultier:**

Hello, I'm Jean Bot Gaultier, your expert in fashion, styles and trends!
      How can I help you?
      Type your question.
      Type "Thank you!" to end this conversation.


**You:**

What are the fashion trends of Autumn 2023?


**Jean Bot Gaultier:**

Red, glossy red hues, shine for the daytime, and elevated basics are expected to continue into Autumn 2023. Additionally, green, cobalt blue, and indie sleaze (a mix of muddy hues, distressed denim, and chunky silver hardware for an edgier look) are expected to be popular in Autumn 2023. Finally, metallics and cargo pants are also expected to be trend for Autumn 2023.



**You:**

Which colors were popular in Summer 2023?


**Jean Bot Gaultier:**

Cobalt blue, glossy red, green, navy blue, neon green, electric blue, purple, white, coral, baby pink, light grey, silver, pastel pink, violet, pale blue, lavender, mint green, faded yellow, pastel teal, lemon yellow, orange, red, and brown were all popular colors in Summer 2023.



**You:**

Thank you!


**Jean Bot Gaultier:**

See you on the runway!
