# Custom Chatbot Project

With the dataset loaded from the *2023_fashion_trends.csv* in the data folder, a cutom chatbot is created. This chatbot enhances the OpenAI ChatGPT 3.5 while giving the context of fashion trends after 2022.
ChatGPT 3.5 model is trained with data before 2022, and it will not give contents after that in its reply.

I have chosed to use *2023_fashion_trends.csv* dataset as it contains description about the fashion trend written in natural language, and easy to handle in the preparation step for OpenAI API calls.

In [73]:
# dependent packages in this projec
import pandas as pd
import numpy as np
import openai
from dateutil.parser import parse
import requests

import openai
from openai.embeddings_utils import get_embedding, distances_from_embeddings

OpenAI API calls require API key. You should obtain the key at *https://platform.openai.com/api-keys*.
This is a secret key, and you can keep it secret. In this notebook, the key itself is stored in a file, which is not uploaded into the github repository.

If you want to yuse your own, please set your key directly like

```
openai.api_key = <your key>
```

In [74]:
f = open("../openai_app.key", "rt")
openai.api_key = f.read()

To get response from open AI, text data should be transformed into vectors of floating point numbers. This is called **Embedding**. For embedding, somke models are provided by *Open AI*, and you can find the used model in *https://platform.openai.com/docs/guides/embeddings/embedding-models*.

In [75]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"


## Define utility functions

This is the utiliy function to add embedding column for the corresponding text column. The pandas DataFrame is provided as an argument. This DataFrame should have "text" column to get embedding for the text. 

As an ouput, Pandas DataFrame with "text" and "embedding" column will be returned.

In [76]:
def add_embeddings(df):
    if 'text' not in df_fashion.columns.to_list():
        raise KeyError("\'text\' should be defined in the data frame you provide")

    batch_size = 100
    embeddings = []

    df_copy = df.copy()
    for i in range(0, len(df_copy), batch_size):
        # Send text data to OpenAI model to get embeddings
        response = openai.Embedding.create(
            input=df.iloc[i:i+batch_size]["text"].tolist(),
            engine=EMBEDDING_MODEL_NAME
        )

        # Add embeddings to list
        embeddings.extend([data["embedding"] for data in response["data"]])

    # Add embeddings list to dataframe
    df_copy["embeddings"] = embeddings

    return df_copy

To find a relevant data, we get an ebeddings for the question text.
Based on the *cosine similarity* distance, the DataFrame is sorted.

In [84]:
def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from least to most relevant for that question
    """

    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)

    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    #df_copy["embeddings"] = df_copy["embeddings"].apply(eval).apply(np.array)
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].to_list(),
        distance_metric="cosine"
    )

    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy

## Data Wrangling

In the cells below, load *2023_fashion_trends.csv* into a `pandas` dataframe with a column named `"text"`. 
Here the `"text"` is created by concatinating `"Trends"` adn `"Source"`.

## 2023 fashion trends

### Load data from a csv and inspect the data

In [78]:
df_fashion = pd.read_csv("data/2023_fashion_trends.csv")
display("columns: {}".format(df_fashion.columns.to_list()))
display("number of rows: {}".format(df_fashion.size))
display(df_fashion.describe())
df_fashion.head()

"columns: ['URL', 'Trends', 'Source']"

'number of rows: 246'

Unnamed: 0,URL,Trends,Source
count,82,82,82
unique,5,82,5
top,https://www.whowhatwear.com/spring-summer-2023...,2023 Fashion Trend: Red. Glossy red hues took ...,Spring/Summer 2023 Fashion Trends: 21 Expert-A...
freq,42,1,42


Unnamed: 0,URL,Trends,Source
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...


Seeing the table columns, both "Trends" and "Source" is meaningful input to add as a context. So in this chat, both will be included in the "Text" field.

In [79]:
df_fashion["text"] = df_fashion['Trends'] + "in" + df_fashion['Source']
df_fashion.tail()

Unnamed: 0,URL,Trends,Source,text
77,https://www.whowhatwear.com/spring-summer-2023...,"If lime green isn't your vibe, rest assured th...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"If lime green isn't your vibe, rest assured th..."
78,https://www.whowhatwear.com/spring-summer-2023...,"""As someone who can clearly (not fondly) remem...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"""As someone who can clearly (not fondly) remem..."
79,https://www.whowhatwear.com/spring-summer-2023...,"""Combine this design shift with the fact that ...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"""Combine this design shift with the fact that ..."
80,https://www.whowhatwear.com/spring-summer-2023...,Thought party season ended at the stroke of mi...,Spring/Summer 2023 Fashion Trends: 21 Expert-A...,Thought party season ended at the stroke of mi...
81,https://www.whowhatwear.com/spring-summer-2023...,"""This season, we saw the revival of the bubble...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"""This season, we saw the revival of the bubble..."


### Get embeddings.

In [80]:
df_embedded = add_embeddings(df_fashion)

In [81]:
df_embedded.head()

Unnamed: 0,URL,Trends,Source,text,embeddings
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Red. Glossy red hues took ...,"[-0.014947973191738129, -0.01965477503836155, ..."
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,"[0.0006544628995470703, -0.029493585228919983,..."
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.006049180869013071, -0.020778054371476173,..."
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Denim Reimagined. From dou...,"[-0.0077167293056845665, -0.007716729305684566..."
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Shine For The Daytime. The...,"[0.0010094589088112116, 0.001649868325330317, ..."


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [82]:
MY_QUESTION = """List up the cool looking fashion in 2023?""";

In [85]:
df_embedded_sorted = get_rows_sorted_by_relevance(MY_QUESTION, df_embedded)
df_embedded_sorted

Unnamed: 0,URL,Trends,Source,text,embeddings,distances
63,https://www.whowhatwear.com/spring-summer-2023...,"""Every season, there is a trend that speaks to...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"""Every season, there is a trend that speaks to...","[-0.004253895487636328, -0.016355551779270172,...",0.130601
53,https://www.whowhatwear.com/spring-summer-2023...,"For spring 2023, there was a more surrealist i...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"For spring 2023, there was a more surrealist i...","[-0.026955636218190193, -0.005796519573777914,...",0.133132
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Red. Glossy red hues took ...,"[-0.014947973191738129, -0.01965477503836155, ...",0.133258
6,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cobalt Blue. The strongest...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Cobalt Blue. The strongest...,"[-0.0023217909038066864, -0.022100433707237244...",0.134244
56,https://www.whowhatwear.com/spring-summer-2023...,"Gen Z call it ""indie sleaze."" I call it my war...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"Gen Z call it ""indie sleaze."" I call it my war...","[-0.00038623865111730993, -0.00202168826945126...",0.134522
...,...,...,...,...,...,...
61,https://www.whowhatwear.com/spring-summer-2023...,"""As seen in Miu Miu's Milan Fashion Week show,...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"""As seen in Miu Miu's Milan Fashion Week show,...","[-0.02678501233458519, -0.005416037980467081, ...",0.204613
78,https://www.whowhatwear.com/spring-summer-2023...,"""As someone who can clearly (not fondly) remem...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...,"""As someone who can clearly (not fondly) remem...","[-0.007301459088921547, -0.0058418381959199905...",0.206437
7,https://www.instyle.com/spring-2023-fashion-tr...,Sculptural Bags That Make a Simple Statement. ...,"The Top 6 Trends to Wear for Spring 2023, Acco...",Sculptural Bags That Make a Simple Statement. ...,"[-0.02263840101659298, -0.00013554540055338293...",0.207002
18,https://www.glamour.com/story/spring-fashion-t...,"Oversized Bags. As cute as they can be, tiny b...",9 Spring 2023 Fashion Trends You’ll Want to Tr...,"Oversized Bags. As cute as they can be, tiny b...","[0.003914128988981247, -0.010709906928241253, ...",0.207272


### Compose a custom text prompt

In [86]:
import tiktoken

def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")

    # Count the number of tokens in the prompt template and question
    prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

{}

---

Question: {}
Answer:"""

    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))

    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:

        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count

        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)

Test the step.

In [87]:
prompt = create_prompt(MY_QUESTION, df_embedded_sorted, 100)
display(prompt)

'\nAnswer the question based on the context below, and if the question\ncan\'t be answered based on the context, say "I don\'t know"\n\nContext: \n\n\n\n---\n\nQuestion: List up the cool looking fashion in 2023?\nAnswer:'

Now define a function to get answers for the question.

In [88]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def answer_question(
    question, df, max_prompt_tokens=1800, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model

    If the model produces an error, return an empty string
    """

    prompt = create_prompt(question, df, max_prompt_tokens)

    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

In [89]:
MY_QUESTION = """What is the most popular fasion in 2023 spring?""";

answer_question(MY_QUESTION, df_embedded)

'According to the given context, the most popular fashion trends for spring 2023 include reinvented classics, minimalist tailoring, edgy and nostalgic styles, sheer and mesh clothing, pinstripe tailoring, draped dresses, statement tops, and colorful 3D floral embellishments. However, fashion trends are constantly evolving and can vary depending on individual preferences and geographical location.'

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [103]:
MY_QUESTION = """What is the most popular bottoms in 2023 spring?""";

#### My custome query

In [104]:
answer_question(MY_QUESTION, df_embedded)

'The most popular bottoms in 2023 spring are bold colored options like what was seen on the runways of Bottega Veneta, Prada, and Dries Van Noten.'

In [105]:
response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=MY_QUESTION,
            max_tokens=150
        )
response["choices"][0]["text"].strip()

'As a language model AI, I do not have updated information on fashion trends. However, certain patterns and styles tend to remain popular over time and can be expected to be popular in the spring of 2023. These could include high-waisted pants, wide-leg trousers, culottes, and midi skirts. Other popular bottoms could include flowy or pleated midi skirts, paperbag waist pants, and denim jeans in various styles such as straight, wide-leg, and flare. Trends in 2023 could also include pastel colors, bold prints, and sustainable or eco-friendly materials. However, fashion trends are constantly changing and evolving, so it is challenging to accurately predict the most popular bottoms in 2023 spring.'

### Question 2

In [93]:
MY_QUESTION = """What fasion is most popular in 2023 in blue?""";


#### My custom query

In [94]:
answer_question(MY_QUESTION, df_embedded)

'Cobalt blue is the most popular fashion trend in 2023.'

In [95]:
response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=MY_QUESTION,
            max_tokens=150
        )
response["choices"][0]["text"].strip()

'It is difficult to predict exactly which fashion trends will be popular in 2023, as fashion is constantly evolving and can vary based on personal style and location. However, it is likely that we will continue to see a mix of classic styles and more avant-garde and experimental designs in blue. Some potential fashion trends in blue for 2023 could include:\n\n1. Sustainable and eco-friendly fashion: As environmental issues become more pressing, it is likely that the fashion industry will continue to shift towards more sustainable and environmentally friendly practices. This could manifest in the use of natural dyes and materials, as well as more conscious and ethical production methods, in blue fashion items.\n\n2. Denim reimagined: Denim is a timeless fashion staple'