# Custom Chatbot Project

I chose `2023_fashion_trends.csv` because I will build a fashion assistant that helps prospective e-commerce clothing buyers find fashionable clothes appropriate to their context.

## Data Wrangling

In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
import pandas as pd
df = pd.read_csv("./data/2023_fashion_trends.csv")

In [2]:
df['text'] = df['URL'] + "-" + df['Trends'] + "-" + df['Source']
df['text'].loc[0]

'https://www.refinery29.com/en-us/fashion-trends-2023-2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).-7 Fashion Trends That Will Take Over 2023 — Shop Them Now'
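String concatenation in `pandas` yields `NaN` for any row where one of the columns is missing, and the task asks for at least 20 rows of text. A minimal sketch of a guard for both conditions follows. The helper `build_text_column` and the sample rows are hypothetical and only stand in for the real CSV; the column names are the ones used in the cell above.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV (same column names as the notebook).
sample = pd.DataFrame({
    "URL": ["https://example.com/a", "https://example.com/b"],
    "Trends": ["Trend one.", None],
    "Source": ["Source A", "Source B"],
})

def build_text_column(frame, min_rows=20):
    """Return a copy with a "text" column plus a flag for the row-count requirement."""
    # Rows without trend text carry no usable content, so drop them.
    out = frame.dropna(subset=["Trends"]).copy()
    # Replace missing URL/Source values with "" so the concatenation never yields NaN.
    out["text"] = out["URL"].fillna("") + "-" + out["Trends"] + "-" + out["Source"].fillna("")
    out = out.reset_index(drop=True)
    return out, len(out) >= min_rows

prepared, enough_rows = build_text_column(sample)
```

On the real dataframe the same helper could be called as `build_text_column(df)`, checking the returned flag before moving on to the embedding step.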

## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [3]:
import openai
import tiktoken
from openai.embeddings_utils import distances_from_embeddings, get_embedding
import numpy as np
import pandas as pd


In [4]:
openai.api_key='YOUR API KEY'

In [5]:
def rag_prep_01(df, question):
    """Embed every row and the question, then sort rows by cosine distance."""
    df_prep = df.copy()
    # Embed all rows of the "text" column in a single API request.
    response_embed = openai.Embedding.create(
        input=df_prep['text'].tolist(),
        model="text-embedding-ada-002"
    )
    embeddings = [data['embedding'] for data in response_embed['data']]
    df_prep['embedding'] = embeddings
    # Embed the question with the same model so the vectors are comparable.
    question_embedding = get_embedding(question, engine="text-embedding-ada-002")
    # Cosine distance between the question and each row (smaller means more similar).
    distances = distances_from_embeddings(question_embedding, df_prep['embedding'].tolist())
    df_prep['distances'] = distances
    # Most relevant rows first.
    return df_prep.sort_values(by='distances')
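To make the ranking step concrete, here is a minimal pure-Python sketch of cosine distance and of sorting documents by it. It assumes `distances_from_embeddings` is used with its default cosine metric, as in the cell above. The helper names `cosine_distance` and `rank_by_distance` and the toy two-dimensional vectors are illustrative only; real embeddings have many more dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_by_distance(query_vec, doc_vecs):
    """Return document indices ordered from closest to farthest from the query."""
    distances = [cosine_distance(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: distances[i])

# Toy vectors: document 1 points the same way as the query, document 0 is orthogonal.
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_by_distance(toy_query, toy_docs)
```

A distance of 0 means the vectors point in the same direction and 1 means they are orthogonal, which is why sorting in ascending order puts the most relevant rows first.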

In [6]:
def rag_prep_02(df, question):
    
    prompt_template = """
    You are a recommendation system of fashion clothes.
    Answer the question based on the context below, and if question 
    can't be answered based on the context, say "I don't know"
    
    Context:
    
    {}
    
    ---
    
    Question:
    {}
    
    Answer:
    """
    
    tokenizer = tiktoken.get_encoding("cl100k_base")
    max_token_count = 2000

    # Start from the tokens already consumed by the question and the template.
    current_token_count = len(tokenizer.encode(question)) + len(tokenizer.encode(prompt_template))
    context = []

    # df is sorted by distance, so add the most relevant rows first and
    # stop as soon as the next row would exceed the token budget.
    for text in df['text']:
        current_token_count += len(tokenizer.encode(text))
        if current_token_count > max_token_count:
            break
        context.append(text)

    print("Context Length: ", len(tokenizer.encode("\n\n###\n\n".join(context))))
    final_prompt = prompt_template.format("\n\n###\n\n".join(context), question)

    initial_answer = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=final_prompt,
        max_tokens=150
    )['choices'][0]['text']

    return final_prompt, initial_answer
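The context-selection loop can be studied in isolation from the API call. The sketch below shows one possible variant that also charges the `###` separator against the budget, which the function above does not do. To stay runnable without `tiktoken`, it uses a whitespace word count as a stand-in tokenizer; `count_tokens` and `select_context` are hypothetical names, and the numbers are not comparable to real token counts.

```python
def count_tokens(text):
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())

def select_context(texts, question, template, max_tokens, separator="\n\n###\n\n"):
    """Greedily keep texts, in order, until the next one would exceed max_tokens."""
    used = count_tokens(question) + count_tokens(template)
    chosen = []
    for text in texts:
        # Every text after the first also pays for the separator placed before it.
        cost = count_tokens(text) + (count_tokens(separator) if chosen else 0)
        if used + cost > max_tokens:
            break
        chosen.append(text)
        used += cost
    return chosen, used

chosen, used = select_context(
    texts=["a b c", "d e", "f g h i"],
    question="q1 q2",
    template="t",
    max_tokens=10,
)
```

Because the input is assumed to be sorted by distance, stopping at the first text that does not fit keeps the selected context contiguous in relevance order.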

In [9]:
def rag(df, question):
    df_prep_01 = rag_prep_01(df=df, question=question)
    final_prompt, initial_answer = rag_prep_02(df=df_prep_01, question=question)
    return final_prompt, initial_answer

def benchmark(question):
    # Baseline: send the question to the model without any retrieved context.
    initial_answer = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=question,
        max_tokens=150
    )['choices'][0]['text']

    return question, initial_answer

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.
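One way to organize the comparison is a small helper that runs every question through both paths and collects the answers side by side. This is a sketch only: `compare_answers` is a hypothetical helper, and the two stub functions stand in for the real model calls so the block runs without an API key.

```python
def compare_answers(questions, custom_fn, baseline_fn):
    """Collect the custom and baseline answers for each question."""
    rows = []
    for question in questions:
        rows.append({
            "question": question,
            "custom": custom_fn(question),
            "baseline": baseline_fn(question),
        })
    return rows

# Stubs (assumptions) that replace the real API-backed functions.
def stub_custom(question):
    return "custom answer to: " + question

def stub_baseline(question):
    return "baseline answer to: " + question

results = compare_answers(["Q1", "Q2"], stub_custom, stub_baseline)
for row in results:
    print(row["question"])
    print("  custom:  ", row["custom"])
    print("  baseline:", row["baseline"])
```

With the notebook's own functions, the callables could be `lambda q: rag(df=df, question=q)[1]` and `lambda q: benchmark(q)[1]`, since both return the answer as the second element of a tuple.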

### Question 1

In [11]:
fashion_question_01 = "I wanna go to Night Club. What clothes would you suggest I wear?"

In [12]:
prompt, answer = rag(df=df, question=fashion_question_01)
print(answer)

Context Length:  1886

    Based on the context, I would recommend wearing lingerie-inspired clothes such as sheer and lace garments or dresses with bold colors and metallic finishes. You could also try layering a mesh top or dress with a leather jacket for a grunge-inspired look.


In [None]:
print(prompt)

#### Benchmark

In [13]:
prompt_bench, answer_bench = benchmark(question=fashion_question_01)
print(answer_bench)



It depends on the type of night club and its dress code, but here are a few suggestions:

1. For a trendy and modern night club, try a fitted top or crop top paired with high-waisted pants or a skirt and a statement pair of heels.
2. If you're going to a hip-hop or R&B club, opt for a bold and flashy outfit such as a sequin dress, oversized t-shirt with biker shorts, or a jumpsuit with chunky jewelry.
3. A classic little black dress paired with statement earrings and strappy heels is always a stylish option for any type of night club.
4. If the night club has a more casual and relaxed vibe, you can also wear a stylish jumpsuit


In [14]:
print(prompt_bench)

I wanna go to Night Club. What clothes would you suggest I wear?


### Question 2

In [None]:
fashion_question_02 = "I need to meet my friends at the arcade. What clothes would you suggest I wear?"

In [None]:
prompt, answer = rag(df=df, question=fashion_question_02)
print(answer)

In [None]:
print(prompt)