# Custom Chatbot Project

## Dataset Choice and Explanation

**Dataset Chosen:** `nyc_food_scrap_drop_off_sites.csv`

**Why:**  
I chose the NYC food scrap drop-off sites dataset because it contains real-world information about food scrap drop-off locations across New York City's boroughs, including each site's address, host organization, operating days and hours, and open months. Customizing a chatbot with this dataset lets it give location-specific answers instead of general composting information, which makes it more useful for real-world queries.


## Data Wrangling

```python
import pandas as pd
# This notebook uses the legacy openai.Completion API, which requires openai<1.0
import openai

df = pd.read_csv('nyc_food_scrap_drop_off_sites.csv')
```

```python
print(df.columns)
```

```
Index(['Unnamed: 0', 'borough', 'ntaname', 'food_scrap_drop_off_site',
       'location', 'hosted_by', 'open_months', 'operation_day_hours',
       'website', 'borocd', 'councildist', 'latitude', 'longitude', 'precinct',
       'object_id', 'location_point', ':@computed_region_yeji_bk3q',
       ':@computed_region_92fq_4b7q', ':@computed_region_sbqj_enih',
       ':@computed_region_efsh_h5xi', ':@computed_region_f5dn_yrer', 'notes',
       'ct2010', 'bbl', 'bin'],
      dtype='object')
```


```python
# Combine the key columns into one searchable string per site.
# astype(str) turns missing values into the literal string 'nan'.
df['text'] = (
    df['food_scrap_drop_off_site'].astype(str) + " - " +
    df['location'].astype(str) + " - " +
    df['borough'].astype(str) + " - " +
    df['operation_day_hours'].astype(str) + " - " +
    df['open_months'].astype(str) + " - " +
    df['hosted_by'].astype(str) + " - " +
    df['notes'].astype(str)
)
```
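Concatenating with `astype(str)` writes missing values as the literal string `nan`, which shows up in row 1 of the `df.head()` output. A minimal sketch of an alternative, assuming the same column names: label each field and skip empty ones, so the model sees `Hours: ...` instead of an unlabeled `nan`. The `FIELDS` mapping, the `build_text` helper, and the two sample rows are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical labels; the keys are column names from the NYC dataset
FIELDS = {
    "food_scrap_drop_off_site": "Site",
    "location": "Address",
    "borough": "Borough",
    "operation_day_hours": "Hours",
    "open_months": "Months",
    "hosted_by": "Host",
    "notes": "Notes",
}

def build_text(row: pd.Series) -> str:
    """Join labeled, non-empty fields into one line per site."""
    parts = []
    for column, label in FIELDS.items():
        value = row.get(column)
        # Skip missing or blank values instead of writing "nan"
        if pd.isna(value) or str(value).strip() == "":
            continue
        parts.append(f"{label}: {str(value).strip()}")
    return " | ".join(parts)

# Two made-up rows standing in for the real CSV
sample = pd.DataFrame({
    "food_scrap_drop_off_site": ["Example Site A", "Example Site B"],
    "location": ["1 Example St", None],
    "borough": ["Brooklyn", "Manhattan"],
    "operation_day_hours": ["Sat 9 AM-1 PM", None],
    "open_months": ["Year Round", "Year Round"],
    "hosted_by": ["Example Host", None],
    "notes": [None, None],
})
sample["text"] = sample.apply(build_text, axis=1)
print(sample["text"].tolist())
```

On the real data the same call would be `df['text'] = df.apply(build_text, axis=1)` before keeping only the `text` column.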

```python
# Keep only the combined text column
df = df[['text']]
```

```python
print(df.head())
```

```
                                                text
0  South Beach - 21 Robin Road, Staten Island NY ...
1  SE Corner of Broadway & Academy Street - nan -...
2  Old Stone House Brooklyn - 336 3rd St, Brookly...
3  SE Corner of Pleasant Avenue & E 116 Street - ...
4  Malcolm X FSDO - 111-26 Northern Blvd, Flushin...
```


```python
print(f"Total rows: {len(df)}")
```

```
Total rows: 576
```


## Custom Query Completion

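```python
import os

# Custom API base URL (the course's Vocareum endpoint); legacy openai<1.0 attribute names
openai.api_base = "https://openai.vocareum.com/v1"
# Read the key from the environment instead of hard-coding it in the notebook
openai.api_key = os.getenv("OPENAI_API_KEY", "YOUR API KEY")
```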

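```python
def find_relevant_text(question, df, limit=10):
    """Return up to `limit` rows whose text contains any word of the question."""
    relevant_texts = []

    query = question.lower()

    for text in df['text']:
        # Plain substring test: short words such as "i", "in" or "at" occur inside
        # most rows, and trailing punctuation ("brooklyn?") can block a real match,
        # so in practice this tends to return the first `limit` rows of the table.
        if any(word in text.lower() for word in query.split()):
            relevant_texts.append(text)

        # Stop once enough rows have been collected
        if len(relevant_texts) >= limit:
            break

    if not relevant_texts:
        return "No specific information found related to your query in the dataset."

    return "\n".join(relevant_texts)
```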
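Because `find_relevant_text` keeps any row that contains any query word as a substring, very common words match most rows, while a word with trailing punctuation such as `brooklyn?` never matches. A minimal sketch of a stricter alternative, assuming a small hand-written stopword list (`STOPWORDS`, `tokenize`, and `find_relevant_text_scored` are hypothetical names): tokenize with a regular expression, drop stopwords, and rank rows by how many distinct keywords they contain. The three-row frame is made-up stand-in data.

```python
import re
import pandas as pd

# Hypothetical, deliberately small stopword list. "drop", "off", "food" and
# "scraps" are included because every row in this dataset is a drop-off site.
STOPWORDS = {
    "a", "an", "the", "i", "can", "where", "what", "when", "are", "is",
    "at", "in", "on", "of", "to", "for", "my", "drop", "off", "food", "scraps",
}

def tokenize(text: str) -> set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def find_relevant_text_scored(question: str, df: pd.DataFrame, limit: int = 10) -> str:
    keywords = tokenize(question) - STOPWORDS
    if not keywords:
        return "No specific information found related to your query in the dataset."

    # Score each row by the number of distinct keywords it contains
    scores = df["text"].map(lambda t: len(keywords & tokenize(t)))
    top = scores[scores > 0].sort_values(ascending=False, kind="stable").head(limit)
    if top.empty:
        return "No specific information found related to your query in the dataset."
    return "\n".join(df.loc[top.index, "text"])

# Made-up stand-in rows
toy = pd.DataFrame({"text": [
    "Example Site A - 1 Main St - Manhattan - Friday",
    "Union Square Example Site - E 17 St - Manhattan - Monday",
    "Example Site C - 5 Oak Ave - Brooklyn - Saturday",
]})
print(find_relevant_text_scored("What are the drop-off hours at Union Square?", toy))
print(find_relevant_text_scored("Where can I drop off food scraps in Brooklyn?", toy))
```

Counting distinct keyword hits is still a crude relevance score; embedding-based search would handle synonyms and misspellings, at the cost of an extra API call per row.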

```python
def ask_basic_gpt(question):
    """Baseline: send the question to the model with no dataset context."""
    prompt = f"Answer the following question:\n\n{question}\n\nAnswer:"
    response = openai.Completion.create(
        engine="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=300,
        temperature=0.5
    )
    return response['choices'][0]['text']
```

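```python
def ask_custom_chatbot(question, df):
    """Answer the question using dataset rows retrieved by keyword match as context."""
    # Retrieve up to 10 matching rows and paste them into the prompt
    relevant_text = find_relevant_text(question, df)

    prompt = f"""You are a helpful assistant. Use the following information to answer the question.

Relevant Information:
{relevant_text}

Question: {question}
Answer:"""

    # Same model and sampling settings as the baseline, so only the prompt differs
    response = openai.Completion.create(
        engine="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=300,
        temperature=0.5
    )

    return response['choices'][0]['text']
```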
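`ask_custom_chatbot` tells the model to "use" the retrieved rows but does not say what to do when they lack the answer. A minimal sketch of a stricter prompt, assuming a rough character budget as a stand-in for a real token count (`build_grounded_prompt` and `max_context_chars` are hypothetical): instruct the model to answer only from the listed sites, and keep whole context lines until the budget runs out.

```python
def build_grounded_prompt(question: str, relevant_text: str, max_context_chars: int = 6000) -> str:
    """Build a completion prompt that restricts the model to the supplied rows."""
    kept, used = [], 0
    for line in relevant_text.splitlines():
        # Keep whole lines only, stopping before the character budget is exceeded
        if used + len(line) + 1 > max_context_chars:
            break
        kept.append(line)
        used += len(line) + 1

    context = "\n".join(kept)
    return (
        "You are a helpful assistant for NYC food scrap drop-off sites.\n"
        "Answer using only the sites listed below. If the listed sites do not "
        "contain the answer, say that the dataset does not include it.\n\n"
        f"Relevant Information:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# A tiny budget so the truncation is visible: only the first line fits
prompt = build_grounded_prompt(
    "What are the drop-off hours at Union Square?",
    "Site A - 1 Main St\nSite B - 2 Oak Ave",
    max_context_chars=20,
)
print(prompt)
```

The returned string could be passed as `prompt=` to the same `openai.Completion.create` call used above; characters are only a proxy for tokens, so a tokenizer would give a tighter budget.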

## Custom Performance Demonstration

### Question 1

```python
question1 = "Where can I drop off food scraps in Brooklyn?"
```

```python
print("🔹 Basic GPT Answer to Question 1:")
print(ask_basic_gpt(question1))
```

```
🔹 Basic GPT Answer to Question 1:
 You can drop off food scraps at composting sites in Brooklyn, such as community gardens, farmers markets, and certain parks. You can also check with your local government or waste management agency for specific drop-off locations and guidelines.
```


```python
print("\n🔹 Custom Chatbot Answer to Question 1:")
print(ask_custom_chatbot(question1, df))
```

```
🔹 Custom Chatbot Answer to Question 1:
 You can drop off food scraps at the Old Stone House Brooklyn located at 336 3rd St, Brooklyn, NY 11215.
```


### Question 2

```python
question2 = "What are the drop-off hours at Union Square?"
```

```python
print("🔹 Basic GPT Answer to Question 2:")
print(ask_basic_gpt(question2))
```

```
🔹 Basic GPT Answer to Question 2:
 The drop-off hours at Union Square vary depending on the specific location and business. It is best to check with the specific business or location for their drop-off hours.
```


```python
print("\n🔹 Custom Chatbot Answer to Question 2:")
print(ask_custom_chatbot(question2, df))
```

```
🔹 Custom Chatbot Answer to Question 2:
 The drop-off hours at Union Square are not specified.
```
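When the custom chatbot reports that hours are "not specified", it helps to check whether the dataset contains a matching site at all. A minimal sketch using pandas `Series.str.contains` with `case=False` and `regex=False`; `rows_mentioning` is a hypothetical helper and the two-row frame is made-up stand-in data. In the notebook, the same filter would run on `df`.

```python
import pandas as pd

def rows_mentioning(df: pd.DataFrame, phrase: str) -> pd.DataFrame:
    """Return rows whose text contains `phrase` as a literal, case-insensitive substring."""
    mask = df["text"].str.contains(phrase, case=False, regex=False, na=False)
    return df[mask]

# Made-up stand-in rows; replace with the notebook's `df`
toy = pd.DataFrame({"text": [
    "Example Site - 1 Main St - Manhattan - Friday",
    "Union Square Example - E 17 St - Manhattan - Monday",
]})
matches = rows_mentioning(toy, "union square")
print(len(matches))
```

If this filter finds rows in `df` but `find_relevant_text(question2, df)` does not return them, the gap lies in retrieval rather than in the data.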


## Conclusion

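By customizing a chatbot with the NYC food scrap drop-off dataset, I built a system that answers with site-specific details instead of generic advice. For Question 1, the basic GPT model suggested general options (community gardens, farmers markets, parks), while the custom chatbot named a real site and its address from the dataset. For Question 2, the basic model deflected to "check with the business", while the custom chatbot reported that the hours were not specified rather than inventing them, which suggests the retrieved rows did not include a Union Square site. The improvement comes from supplying relevant dataset rows in the prompt (prompt augmentation, not model fine-tuning), so answer quality depends directly on how well the keyword search surfaces the right rows.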
