# Custom Chatbot Project

The datasets chosen to compose the knowledge base for this chatbot are the 2023 Fashion Trends CSV and the Wikipedia article on red carpet fashion. The idea is to create a personal stylist (or take a first step towards one)!

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
## Using Wikipedia Red Carpet Data

In [2]:
from dateutil.parser import parse
import pandas as pd
import requests

# Get the plain-text extract of the Wikipedia page "Red carpet fashion"
resp = requests.get("https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exlimit=1&titles=Red_carpet_fashion&explaintext=1&formatversion=2&format=json")

# Load page text into a dataframe
df1 = pd.DataFrame()
df1["text"] = resp.json()["query"]["pages"][0]["extract"].split("\n")

# Clean up text to remove empty lines and headings
df1 = df1[(df1["text"].str.len() > 0) & (~df1["text"].str.startswith("=="))]

# In some cases dates are used as headings instead of being part of the
# text sample; adjust so dated text samples start with dates
prefix = ""
for (i, row) in df1.iterrows():
    # If the row already has " – ", it already has the needed date prefix
    if " – " not in row["text"]:
        try:
            # If the row's text is a date, set it as the new prefix
            parse(row["text"])
            prefix = row["text"]
        except (ValueError, OverflowError):
            # If the row's text isn't a date, add the prefix
            row["text"] = prefix + " – " + row["text"]
df1 = df1[df1["text"].str.contains(" – ")]
df1

Unnamed: 0,text
0,– Red carpet fashion consists of outfits worn...
4,"– Prior to the 1990s, many celebrities chose ..."
8,"– Having a dress featured on the red carpet, ..."
10,"– ""The dollar return of having some celebrity..."
12,"– ""I can't possibly quantify how much publici..."
16,– Fashion commentary often forms a key part o...
20,– When the 65th Golden Globe Awards ceremony ...
24,– List of individual dresses
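
Every row above starts with " – " and no date, because this page contains no date-only lines, so `prefix` stays empty. The prefixing logic can be sketched on plain lists as follows. This is a minimal illustration, not the notebook's code: `is_date` is a hypothetical stand-in for `dateutil.parser.parse` that only accepts "Month day" strings, and `add_date_prefixes` is an assumed helper name.

```python
from datetime import datetime


def is_date(text):
    """Hypothetical stand-in for dateutil's parse(): accepts only 'Month day'."""
    try:
        datetime.strptime(text, "%B %d")
        return True
    except ValueError:
        return False


def add_date_prefixes(lines):
    """Prefix undated lines with the most recent date-only line."""
    prefix = ""
    result = []
    for line in lines:
        if " – " in line:
            # Already carries a date prefix, keep it unchanged
            result.append(line)
        elif is_date(line):
            # A date-only line becomes the prefix and is not kept as a row
            prefix = line
        else:
            result.append(prefix + " – " + line)
    return result


# No date lines at all: the prefix stays empty, as in the output above
undated = add_date_prefixes(["Red carpet fashion consists of outfits."])

# A date heading followed by text, plus a line that is already dated
dated = add_date_prefixes(
    ["January 1", "An event happened.", "January 2 – Another event."]
)
print(undated)
print(dated)
```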


In [3]:
# Checking on the other Provided CSV datasets

In [4]:
def load_local_dataset(filename):
    df = pd.read_csv(filename, index_col=0)
    return df

In [5]:
df2 = load_local_dataset("./data/2023_fashion_trends.csv")
df2

Unnamed: 0_level_0,Trends,Source
URL,Unnamed: 1_level_1,Unnamed: 2_level_1
https://www.refinery29.com/en-us/fashion-trends-2023,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...
https://www.refinery29.com/en-us/fashion-trends-2023,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...
https://www.refinery29.com/en-us/fashion-trends-2023,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...
https://www.refinery29.com/en-us/fashion-trends-2023,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...
https://www.refinery29.com/en-us/fashion-trends-2023,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...
...,...,...
https://www.whowhatwear.com/spring-summer-2023-fashion-trends/,"If lime green isn't your vibe, rest assured th...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...
https://www.whowhatwear.com/spring-summer-2023-fashion-trends/,"""As someone who can clearly (not fondly) remem...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...
https://www.whowhatwear.com/spring-summer-2023-fashion-trends/,"""Combine this design shift with the fact that ...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...
https://www.whowhatwear.com/spring-summer-2023-fashion-trends/,Thought party season ended at the stroke of mi...,Spring/Summer 2023 Fashion Trends: 21 Expert-A...


In [6]:
df3 = load_local_dataset("./data/character_descriptions.csv")
df3

Unnamed: 0_level_0,Description,Medium,Setting
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Emily,"A young woman in her early 20s, Emily is an as...",Play,England
Jack,"A middle-aged man in his 40s, Jack is a succes...",Play,England
Alice,"A woman in her late 30s, Alice is a warm and n...",Play,England
Tom,"A man in his 50s, Tom is a retired soldier and...",Play,England
Sarah,"A woman in her mid-20s, Sarah is a free-spirit...",Play,England
George,"A man in his early 30s, George is a charming a...",Play,England
Rachel,"A woman in her late 20s, Rachel is a shy and i...",Play,England
John,"A man in his 60s, John is a retired professor ...",Play,England
Maria,"A middle-aged Latina woman in her 40s, Maria i...",Movie,Texas
Caleb,"A young African American man in his early 20s,...",Movie,Texas


In [7]:
df4 = load_local_dataset("./data/nyc_food_scrap_drop_off_sites.csv")
df4

Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,borocd,councildist,...,location_point,:@computed_region_yeji_bk3q,:@computed_region_92fq_4b7q,:@computed_region_sbqj_enih,:@computed_region_efsh_h5xi,:@computed_region_f5dn_yrer,notes,ct2010,bbl,bin
0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,502,50,...,"{'type': 'Point', 'coordinates': [-74.062991, ...",1.0,14.0,76.0,10692.0,30.0,,,,
1,Manhattan,Inwood,SE Corner of Broadway & Academy Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,112,10,...,,,,,,,Download the app to access bins. Accepts all f...,,,
2,Brooklyn,Park Slope,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215",Old Stone House Brooklyn,Year Round,24/7 (Start Time: 24/7 - End Time: 24/7),,306,39,...,"{'type': 'Point', 'coordinates': [-73.984731, ...",2.0,27.0,50.0,17617.0,14.0,,,,
3,Manhattan,East Harlem (North),SE Corner of Pleasant Avenue & E 116 Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,8,...,,,,,,,Download the app to access bins. Accepts all f...,,,
4,Queens,Corona,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368",NYC Compost Project Hosted by Big Reuse,Year Round,Tuesdays (Start Time: 12:00 PM - End Time: 2:...,,404,21,...,"{'type': 'Point', 'coordinates': [-73.8630721,...",3.0,21.0,68.0,14510.0,66.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
571,Brooklyn,Kensington,Albemarle Road and McDonald Avenue,southwest corner of McDonald Avenue and Albema...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Tuesdays (Start Time: 10:00 AM - End Time: 2:...,https://www.lesecologycenter.org/programs/comp...,312,39,...,"{'type': 'Point', 'coordinates': [-73.97997, 4...",2.0,27.0,39.0,17620.0,2.0,"Not accepted: meat, bones, or dairy",,,
572,Queens,Old Astoria-Hallets Point,NW Corner of 21st Street & 30th Drive,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,401,22,...,,,,,,,Download the app to access bins. Accepts all f...,,,
573,Brooklyn,Crown Heights (North),Rochester Avenue & St. Johns Place,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,308,41,...,,,,,,,Download the app to access bins. Accepts all f...,,,
574,Brooklyn,Windsor Terrace-South Slope,*CLOSED FOR THE SEASON* East 4th Street Commun...,"173 E 4th St, Brooklyn, NY 11218",Members at East 4th Street Community Garden,April - October,Wednesdays and Saturdays (Start Time: Wednesda...,https://eastfourthstreetgarden.tumblr.com/,307,39,...,"{'type': 'Point', 'coordinates': [-73.9772287,...",2.0,27.0,45.0,17620.0,9.0,"Not accepted: meat, bones, or dairy",,,


In [8]:
# The final composed dataset combines the Wikipedia dataset with the Fashion Trends CSV
# The other two CSV files won't be useful for this task

In [9]:
df = pd.DataFrame(df1["text"].tolist() + df2["Trends"].tolist(), columns=['text'])
df

Unnamed: 0,text
0,– Red carpet fashion consists of outfits worn...
1,"– Prior to the 1990s, many celebrities chose ..."
2,"– Having a dress featured on the red carpet, ..."
3,"– ""The dollar return of having some celebrity..."
4,"– ""I can't possibly quantify how much publici..."
...,...
85,"If lime green isn't your vibe, rest assured th..."
86,"""As someone who can clearly (not fondly) remem..."
87,"""Combine this design shift with the fact that ..."
88,Thought party season ended at the stroke of mi...


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [28]:
import openai
openai.api_key = "YOUR API KEY"
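
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead is shown here. `load_api_key` is a hypothetical helper, and it assumes the variable `OPENAI_API_KEY` was set before the notebook was started.

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment mapping.

    Raises RuntimeError if the variable is missing or blank.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key


# Usage in the notebook (assumption: the variable is set in your shell):
# openai.api_key = load_api_key()
```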

In [11]:
# Generating Embeddings
# We'll use the Embedding tooling from OpenAI documentation here to create vectors representing each row of our custom dataset.
# In order to avoid a RateLimitError we'll send our data in batches to the Embedding.create function.

In [12]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 100
embeddings = []
for i in range(0, len(df), batch_size):
    # Send text data to OpenAI model to get embeddings
    response = openai.Embedding.create(
        input=df.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )
    
    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df["embeddings"] = embeddings
df

Unnamed: 0,text,embeddings
0,– Red carpet fashion consists of outfits worn...,"[0.005555762909352779, -0.014635973609983921, ..."
1,"– Prior to the 1990s, many celebrities chose ...","[0.0023636522237211466, -0.028377050533890724,..."
2,"– Having a dress featured on the red carpet, ...","[-0.0323331244289875, -0.015973027795553207, 0..."
3,"– ""The dollar return of having some celebrity...","[-0.018000848591327667, -0.003702789079397917,..."
4,"– ""I can't possibly quantify how much publici...","[-0.004414374940097332, -0.011313684284687042,..."
...,...,...
85,"If lime green isn't your vibe, rest assured th...","[-0.0027722418308258057, -0.018304530531167984..."
86,"""As someone who can clearly (not fondly) remem...","[-0.014718359336256981, -0.006478246301412582,..."
87,"""Combine this design shift with the fact that ...","[-0.020781872794032097, -0.025097381323575974,..."
88,Thought party season ended at the stroke of mi...,"[-0.019844094291329384, -0.022332968190312386,..."
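
With 89 rows and `batch_size = 100`, the loop above sends a single request. The batching only starts to matter for larger datasets or smaller batch sizes. The sketch below shows the slicing pattern on its own, using a hypothetical `fake_embed` function in place of the real API call so that it runs offline.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed(texts):
    """Hypothetical stand-in for the embeddings API: one small vector per text."""
    return [[float(len(text)), 1.0] for text in texts]


texts = [f"row {i}" for i in range(89)]  # same row count as the dataframe above

embeddings = []
batch_sizes = []
for batch in iter_batches(texts, 40):
    batch_sizes.append(len(batch))
    embeddings.extend(fake_embed(batch))

print(batch_sizes)
print(len(embeddings))
```

The order of `embeddings` matches the order of `texts`, which is what makes the later column assignment safe.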


In [13]:
# In order to avoid having to run that code again in the future, we'll save the generated embeddings as a CSV file.

df.to_csv("embeddings.csv")
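
One caveat when reloading this file: CSV has no list type, so each embedding is written as the string form of a Python list and comes back as a string. The sketch below illustrates the round trip with the standard `csv` module and made-up values. With pandas, the same conversion could presumably be applied to the column, for example `df["embeddings"].apply(ast.literal_eval)`.

```python
import ast
import csv
import io

# Made-up row for illustration, not real embedding values
rows = [{"text": "Cargo pants", "embeddings": [0.1, -0.2, 0.3]}]

# Write to an in-memory CSV; the list is stored as its string representation
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embeddings"])
writer.writeheader()
writer.writerows(rows)

# Read it back: the embeddings cell is now a plain string
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["embeddings"]

# Convert the string back into a list of floats
restored = ast.literal_eval(raw)
print(type(raw).__name__, restored)
```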

In [14]:
# What we are implementing here is similar to a search engine or recommendation algorithm. We want to sort all of the rows of our dataset from most relevant to least relevant.
# This will use the embeddings that we generated previously in order to compare the vectorized version of our question to the vectorized versions of the rows of the dataset.

In [15]:
from openai.embeddings_utils import get_embedding, distances_from_embeddings

def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from most to least relevant for that question
    """
    
    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    
    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )
    
    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
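
To make the ranking concrete, here is a small pure-Python sketch of cosine distance. It assumes the usual definition (one minus cosine similarity) and uses toy two-dimensional vectors. The helper `cosine_distance` is written for illustration and is not the library function used above.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


question_vec = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Ascending distance puts the most relevant row first
ranked = sorted(rows, key=lambda name: cosine_distance(question_vec, rows[name]))
print(ranked)
```

Vector length does not affect the result. Only the direction matters, which is why `[2.0, 0.0]` has distance 0 from `[1.0, 0.0]`.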

In [16]:
# Let's test that out for a couple of different questions:
get_rows_sorted_by_relevance("What is a good trending dress?", df)

Unnamed: 0,text,embeddings,distances
84,"""If there's one dress update you consider for ...","[-0.036194704473018646, -0.028788169845938683,...",0.154156
41,Draped Dressing. I can’t wait to wear spring’s...,"[-0.015158192254602909, -0.004933060612529516,...",0.157154
80,"""Sheer fashion dominated the trends conversati...","[-0.012716786004602909, 0.001000983058474958, ...",0.162433
11,2023 Fashion Trend: Denim Reimagined. From dou...,"[-0.015580940060317516, -0.005439954809844494,...",0.167760
10,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...","[-0.01045039389282465, -0.01925569400191307, 0...",0.169798
...,...,...,...
34,Bright White Footwear. I'm excited to lighten ...,"[0.0003186866524629295, -0.028604554012417793,...",0.234027
26,"Oversized Bags. As cute as they can be, tiny b...","[-0.0016623392002657056, -0.006312215700745583...",0.236258
45,The-Bigger-the-Better Bags. Gone are the days ...,"[-0.011331110261380672, -0.0035833714064210653...",0.240910
35,"Sculptural Statement Earrings. For me, this sp...","[-0.03322748839855194, -0.004589776508510113, ...",0.241846


In [17]:
get_rows_sorted_by_relevance("What should I choose to wear in a Hollywood Oscar Awards?", df)

Unnamed: 0,text,embeddings,distances
1,"– Prior to the 1990s, many celebrities chose ...","[0.0023636522237211466, -0.028377050533890724,...",0.149775
0,– Red carpet fashion consists of outfits worn...,"[0.005555762909352779, -0.014635973609983921, ...",0.180514
6,– When the 65th Golden Globe Awards ceremony ...,"[-0.006143881939351559, -0.02652035467326641, ...",0.189700
2,"– Having a dress featured on the red carpet, ...","[-0.0323331244289875, -0.015973027795553207, 0...",0.193474
4,"– ""I can't possibly quantify how much publici...","[-0.004414374940097332, -0.011313684284687042,...",0.194693
...,...,...,...
45,The-Bigger-the-Better Bags. Gone are the days ...,"[-0.011331110261380672, -0.0035833714064210653...",0.264268
35,"Sculptural Statement Earrings. For me, this sp...","[-0.03322748839855194, -0.004589776508510113, ...",0.264694
13,2023 Fashion Trend: Maxi Skirts. In response t...,"[-0.02548186294734478, -0.019320974126458168, ...",0.269355
69,"""As seen in Miu Miu's Milan Fashion Week show,...","[-0.03086216375231743, -0.007000280544161797, ...",0.271559


In [18]:
# Use an OpenAI Completion model to get the final responses:

In [19]:
import tiktoken

def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")
    
    # Count the number of tokens in the prompt template and question
    prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

{}

---

Question: {}
Answer:"""
    
    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))
    
    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        
        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count
        
        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)

In [20]:
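# Note: with a budget of only 200 tokens the most relevant row does not fit,
# so the Context section in the output below is empty
print(create_prompt("What is a good trending dress?", df, 200))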


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 



---

Question: What is a good trending dress?
Answer:
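
The empty Context section follows from the budget logic in `create_prompt`: rows are added in relevance order, and the loop stops at the first row that would exceed the limit. The sketch below reproduces that behaviour. `count_tokens` is a hypothetical stand-in that counts words instead of real tokens, and `pack_context` is an assumed helper name.

```python
def count_tokens(text):
    """Hypothetical stand-in for len(tokenizer.encode(text)): counts words."""
    return len(text.split())


def pack_context(rows, base_tokens, max_tokens):
    """Greedily keep rows, in order, until the token budget would be exceeded."""
    used = base_tokens
    context = []
    for text in rows:
        used += count_tokens(text)
        if used > max_tokens:
            break
        context.append(text)
    return context


rows = ["one two three four five six", "seven eight", "nine"]

# Budget too small for even the first row: the context is empty
small = pack_context(rows, base_tokens=10, max_tokens=12)

# Larger budget: the first two rows fit, the third does not
large = pack_context(rows, base_tokens=10, max_tokens=18)
print(small)
print(large)
```

A larger limit, such as the default `max_prompt_tokens=1800` used by `answer_question`, leaves room for many rows.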


In [21]:
# Our final step is to send that text prompt to a Completion model and parse the model output!

COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def answer_question(
    question, df, max_prompt_tokens=1800, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model
    
    If the model produces an error, return an empty string
    """
    
    prompt = create_prompt(question, df, max_prompt_tokens)
    
    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

In [22]:
question1_prompt = """
Question: "What is the trend fashion in 2023?"
Answer:
"""
initial_question1_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=question1_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()
print(initial_question1_answer)

It is difficult to predict specific fashion trends for 2023, as fashion is constantly evolving and changing. However, some experts predict that sustainable and eco-friendly fashion will continue to be a major trend as consumers become more conscious about their impact on the environment. Minimalism and gender-neutral styles may also gain popularity, as well as bold and vibrant colors and playful prints. Athleisure and streetwear are also expected to remain strong trends in 2023. Ultimately, fashion is a form of self-expression and personal style, so it is likely that there will be a mix of various trends in the fashion world in 2023.


In [23]:
question2_prompt = """
Question: "What should I choose as a male dressing to wear in a Hollywood Oscar Awards?"
Answer:
"""
initial_question2_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=question2_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()
print(initial_question2_answer)

As a male attending the Hollywood Oscar Awards, you should choose a formal and stylish outfit. Some options could include a sharp tuxedo with a classic black bowtie or a sleek, tailored suit in a bold color like navy blue or burgundy. Accessorize with a pocket square, polished dress shoes, and a statement watch. You could also consider a modern take on the traditional attire by opting for a fitted suit with a black shirt and no tie. Whichever outfit you choose, make sure it is well-fitted and reflects your personal style while being appropriate for a formal event like the Oscars.


### Question 1

In [24]:
custom_question1_answer = answer_question("What is the trend fashion in 2023?", df)
print(custom_question1_answer)

Some of the trending fashion styles in 2023 include sheer clothing, red hues, shine for daytime, maximalist trends, cargo pants, elevated basics, denim reimagined, surrealism and 3D designs, maxi skirts, slouchy-fit trousers, green hues, perfectly cut trousers, pinstripe tailoring, and "indie sleaze" style.


In [25]:
print("Basic Completion Answer: " + initial_question1_answer)

Basic Completion Answer: It is difficult to predict specific fashion trends for 2023, as fashion is constantly evolving and changing. However, some experts predict that sustainable and eco-friendly fashion will continue to be a major trend as consumers become more conscious about their impact on the environment. Minimalism and gender-neutral styles may also gain popularity, as well as bold and vibrant colors and playful prints. Athleisure and streetwear are also expected to remain strong trends in 2023. Ultimately, fashion is a form of self-expression and personal style, so it is likely that there will be a mix of various trends in the fashion world in 2023.


### Question 2

In [26]:
custom_question2_answer = answer_question("What should I choose as a male dressing to wear in a Hollywood Oscar Awards?", df)
print(custom_question2_answer)

The best option for a male dressing for the Hollywood Oscar Awards is to choose a tailored suit, preferably in a bold color such as red or lime green. This will help you stand out on the red carpet and make a fashionable statement.


In [27]:
print("Basic Completion Answer: " + initial_question2_answer)

Basic Completion Answer: As a male attending the Hollywood Oscar Awards, you should choose a formal and stylish outfit. Some options could include a sharp tuxedo with a classic black bowtie or a sleek, tailored suit in a bold color like navy blue or burgundy. Accessorize with a pocket square, polished dress shoes, and a statement watch. You could also consider a modern take on the traditional attire by opting for a fitted suit with a black shirt and no tie. Whichever outfit you choose, make sure it is well-fitted and reflects your personal style while being appropriate for a formal event like the Oscars.
