# Custom Chatbot Project

I chose the 2023 Fashion Trends dataset because it supports a useful application: using a large language model to suggest colors and types of dresses. Many people are unsure how to put a style together, so an application of this kind would help them.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
import pandas as pd

In [2]:
data_df = pd.read_csv("./data/2023_fashion_trends.csv")

In [9]:
data_df.head()

Unnamed: 0,URL,Trends,Source
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...


In [12]:
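# Build the "text" column by combining each trend with its source.
# A vectorized assignment avoids the chained indexing of
# data_df.iloc[i]["text"] = ..., which can write to a temporary copy.
data_df["text"] = data_df["Trends"] + " Source: " + data_df["Source"]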

In [13]:
data_df.head()

Unnamed: 0,URL,Trends,Source,text
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Red. Glossy red hues took ...
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Denim Reimagined. From dou...
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Shine For The Daytime. The...


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [14]:
import openai

In [57]:
openai.api_key="YOUR_API_KEY"

In [16]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 20
embeddings = []
for i in range(0, len(data_df), batch_size):
    # Send text data to OpenAI model to get embeddings
    response = openai.Embedding.create(
        input=data_df.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )

    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
data_df["embeddings"] = embeddings
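
The loop above sends the text to the embeddings endpoint in slices of `batch_size` rows. The slicing pattern on its own can be sketched offline as follows; the `batched` helper and the sample list are hypothetical, and no API call is made.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical stand-in for data_df["text"].tolist()
texts = [f"trend {n}" for n in range(45)]

# The last batch is shorter when the row count is not a multiple of batch_size
batch_lengths = [len(batch) for batch in batched(texts, 20)]
print(batch_lengths)  # [20, 20, 5]
```

Because each slice is appended in order, the combined embeddings list lines up row for row with the dataframe.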

In [17]:
len(data_df.iloc[0]["embeddings"])

1536

In [18]:
data_df.head()

Unnamed: 0,URL,Trends,Source,text,embeddings
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Red. Glossy red hues took ...,"[-0.015253156423568726, -0.019887520000338554,..."
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,"[-0.0005249278619885445, -0.028829893097281456..."
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.008778235875070095, -0.020026393234729767,..."
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Denim Reimagined. From dou...,"[-0.00955200009047985, -0.0073344530537724495,..."
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Shine For The Daytime. The...,"[-0.0025164983235299587, 0.002365042455494404,..."


## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [19]:
from openai.embeddings_utils import get_embedding
from openai.embeddings_utils import distances_from_embeddings
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"

In [33]:
import tiktoken

def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")

    # Count the number of tokens in the prompt template and question
    prompt_template = """
                        Answer the question based on the context below, and if the question
                        can't be answered based on the context, say "I don't know"

                        Context: 

                        {}

                        ---

                        Question: {}
                        Answer:
                        
                       """

    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))

    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:

        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count

        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)
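
The context loop in `create_prompt` is a greedy token budget: rows are added in relevance order until the next row would exceed `max_token_count`. Below is a minimal sketch of that logic in isolation. It assumes a hypothetical word-based counter (`count_words`) in place of the `tiktoken` tokenizer, and `pack_context` is an illustrative name, not part of the notebook.

```python
def pack_context(texts, count_tokens, base_token_count, max_token_count):
    """Keep texts, in order, until the running token total would exceed the budget."""
    selected = []
    current_token_count = base_token_count
    for text in texts:
        current_token_count += count_tokens(text)
        if current_token_count > max_token_count:
            # Stop at the first row that does not fit, as create_prompt does
            break
        selected.append(text)
    return selected


def count_words(text):
    """Hypothetical stand-in for len(tokenizer.encode(text))."""
    return len(text.split())


rows = ["red is back", "cargo pants are everywhere this year", "sheer layers"]
packed = pack_context(rows, count_words, base_token_count=2, max_token_count=10)
print(packed)  # ['red is back']
```

The loop stops at the first row that does not fit, so a shorter, less relevant row further down the list is not used to fill the remaining budget.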

In [21]:
def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from most to least relevant for that question
    """

    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)

    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )

    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
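
To illustrate why an ascending sort puts the most relevant rows first, here is a minimal pure-Python sketch of cosine distance. It assumes that `distance_metric="cosine"` means one minus the cosine similarity; the `cosine_distance` helper and the two-dimensional vectors are hypothetical and stand in for the 1536-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"
question_vec = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Ascending sort: smallest distance (most similar direction) comes first
ranked = sorted(rows, key=lambda name: cosine_distance(question_vec, rows[name]))
print(ranked)  # ['same direction', 'orthogonal', 'opposite']
```

Vector length does not affect the result, only direction, so `[2.0, 0.0]` is at distance 0 from `[1.0, 0.0]`.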

In [34]:
question1 = "What are the fashion trends for woman for 2023 spring?"

In [40]:
context_q1_df = get_rows_sorted_by_relevance(question1,data_df)

In [42]:
max_token_count = 1000
prompt_q1 = create_prompt(question1, context_q1_df, max_token_count)

In [43]:
q1_answer = openai.Completion.create(model="gpt-3.5-turbo-instruct",prompt=prompt_q1,max_tokens=150)["choices"][0]["text"].strip()
print(q1_answer)

The fashion trends for women in spring 2023 include wearable surrealist designs with floral motifs, sheer clothing, white cotton tops, updated versions of classic button-up shirts and tailored looks with crisp white shirting.


### Question 2

In [55]:
question2 = "Which type of dresses were trend with red color?"
context_q2_df = get_rows_sorted_by_relevance(question2,data_df)
max_token_count = 1000
prompt_q2 = create_prompt(question2, context_q2_df, max_token_count)

In [56]:
q2_answer = openai.Completion.create(model="gpt-3.5-turbo-instruct",prompt=prompt_q2,max_tokens=150)["choices"][0]["text"].strip()
print(q2_answer)

Bold red dresses, tube dresses, draped dresses, and maxi dresses were all trending in red for Spring/Summer 2023.
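
The task for this section also asks for the answer from a basic `Completion` query for each question. One way to collect both answers side by side is sketched below. The names `basic_prompt` and `compare_answers` are hypothetical, and `complete` is assumed to be any callable that takes a prompt string and returns the completion text, so the sketch runs without an API key.

```python
def basic_prompt(question):
    """Prompt with no retrieved context, used for the baseline answer."""
    return f"Question: {question}\nAnswer:"


def compare_answers(question, custom_prompt, complete):
    """
    Return the baseline and context-augmented answers side by side.
    `complete` is any callable mapping a prompt string to a completion string.
    """
    return {
        "basic": complete(basic_prompt(question)).strip(),
        "custom": complete(custom_prompt).strip(),
    }


# In this notebook, `complete` could wrap the same call used in the cells above
# (an assumption about how you would wire it up):
# complete = lambda p: openai.Completion.create(
#     model="gpt-3.5-turbo-instruct", prompt=p, max_tokens=150
# )["choices"][0]["text"]
# answers_q1 = compare_answers(question1, prompt_q1, complete)
```

Printing `answers_q1["basic"]` next to `answers_q1["custom"]` would then show how much the retrieved context changes the response.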
