### Dataset and Scenario Explanation

##### Dataset Selection:
I chose the `nyc_food_scrap_drop_off_sites.csv` dataset (the code below loads it as `Food_Scrap_Drop-Off_Locations_in_NYC.csv`), which contains detailed information about NYC food scrap drop-off locations, including addresses, operating hours, and site-specific notes.

##### Why This Dataset?
The dataset provides specific, localized information that the model is unlikely to reproduce reliably from its training data. OpenAI models can give general information about recycling, but they lack detailed, up-to-date knowledge of NYC-specific drop-off locations, operating hours, and restrictions. Supplying this dataset as context lets the chatbot answer with concrete sites and hours, which makes it more useful for NYC residents.

In [56]:
import pandas as pd
from IPython.display import display
import openai  # The code below uses the legacy openai<1.0 interface (openai.ChatCompletion)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

#### 1. Preprocess the Dataset

In [16]:
# Load the dataset
file_path = 'Food_Scrap_Drop-Off_Locations_in_NYC.csv'  # Update with the correct file path
data = pd.read_csv(file_path)

# Combine relevant columns into a single 'text' column.
# Only 'Notes' is filled; a missing value in any other column makes 'text' NaN.
data['text'] = (
    data['Borough'] + ", " +
    data['NTAName'] + ": Drop-off site at " +
    data['SiteName'] + " (" +
    data['SiteAddr'] + "). Open " +
    data['Day_Hours'] + ". Note: " +
    data['Notes'].fillna("No additional notes.")
)

# Remove rows where 'text' is empty or null
data = data.dropna(subset=['text'])
data = data[data['text'].str.strip() != ""]  # Remove rows with only whitespace

# Keep only the 'text' column
data = data[['text']]

# Save the processed dataset (optional)
processed_file_path = 'Processed_food_scrap_sites.csv'
data.to_csv(processed_file_path, index=False)
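
One caveat with the concatenation above: in pandas, adding a string to a missing value yields a missing value, so any row lacking `Borough`, `NTAName`, `SiteName`, `SiteAddr`, or `Day_Hours` ends up with a null `text` and is removed by `dropna`. The sketch below shows the effect on a tiny made-up frame (the rows and the `"unknown"` placeholder are illustrative assumptions, not part of the original notebook) and one possible way to keep such rows.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the real dataset.
toy = pd.DataFrame({
    "Borough": ["Brooklyn", "Queens"],
    "SiteName": ["Example Garden", None],  # second row has no site name
    "Day_Hours": ["Sunday 10 AM - 1 PM", "Friday 9 AM - 11 AM"],
})

# Plain concatenation: one missing value makes the whole result missing.
plain = toy["Borough"] + ": " + toy["SiteName"] + ". Open " + toy["Day_Hours"]

# Possible alternative: fill gaps first so the row survives with a placeholder.
filled = toy.fillna("unknown")
robust = filled["Borough"] + ": " + filled["SiteName"] + ". Open " + filled["Day_Hours"]

print(plain.isna().tolist())
print(robust.tolist())
```

Whether dropping or filling is preferable depends on how much an incomplete entry is worth to the chatbot; the original notebook drops such rows.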

In [26]:
display(data.head(10))  # Show the first 10 rows as a clean table

| | text |
|---|---|
| 0 | Brooklyn, Bay Ridge: Drop-off site at 4th Aven... |
| 1 | Manhattan, East Midtown-Turtle Bay: Drop-off s... |
| 2 | Manhattan, Hell's Kitchen: Drop-off site at Hu... |
| 3 | Manhattan, East Midtown-Turtle Bay: Drop-off s... |
| 4 | Manhattan, Tribeca-Civic Center: Drop-off site... |
| 5 | Staten Island, St. George-New Brighton: Drop-o... |
| 6 | Manhattan, Manhattanville-West Harlem: Drop-of... |
| 7 | Brooklyn, Crown Heights (North): Drop-off site... |
| 8 | Brooklyn, Prospect Park: Drop-off site at Nurt... |
| 9 | Brooklyn, Bushwick (West): Drop-off site at BK... |


#### 2. Load the Processed Dataset

In [76]:
processed_file_path = 'Processed_food_scrap_sites.csv'
data = pd.read_csv(processed_file_path)
# Convert the 'text' column into a list for querying.
# Only the first 25 rows are kept, so sites in later rows are not visible to the chatbot.
knowledge_base = data['text'].tolist()[:25]
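
The notebook imports `TfidfVectorizer` and `cosine_similarity` but never calls them, and the knowledge base is simply the first 25 rows. As a hedged sketch of how those imports could be put to use, the hypothetical helper `retrieve_top_k` below ranks entries by TF-IDF cosine similarity to the question, so the rows sent to the model are the ones most related to it. The sample documents are made up for illustration; this is not the method the original notebook uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_top_k(query, documents, k=5):
    """Return the k documents most similar to the query under TF-IDF (hypothetical helper)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(documents)   # one row per document
    query_vec = vectorizer.transform([query])          # same vocabulary as the documents
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    top_idx = scores.argsort()[::-1][:k]                # highest scores first
    return [documents[i] for i in top_idx]


# Made-up entries in the same shape as the 'text' column; not real sites.
sample_docs = [
    "Brooklyn, Example Area: Drop-off site at Example Garden (1 Example St). Open Sunday 10 AM - 1 PM.",
    "Manhattan, Example Plaza: Drop-off site at Example Market (2 Example Ave). Open Wednesday 8 AM - 12 PM.",
    "Staten Island, Example Heights: Drop-off site at Example Park (3 Example Rd). Open Saturday 9 AM - 11 AM.",
]

best = retrieve_top_k("Which sites are open on Sunday in Brooklyn?", sample_docs, k=1)
print(best[0])
```

With a helper like this, `knowledge_base` could be built per question from the full `data['text'].tolist()` rather than from a fixed slice, at the cost of refitting the vectorizer unless it is cached.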

#### 3. Set Up the OpenAI API

In [84]:
import os
openai.api_key = os.environ["OPENAI_API_KEY"]  # Read the key from the environment; do not hard-code it

#### 4. Create the Custom Query Function

In [86]:
def custom_chatbot(query, knowledge_base):
    # Combine the dataset into a single context
    context = "\n".join(knowledge_base)

    # Define the chatbot's message
    messages = [
        {"role": "system", "content": "You are a helpful assistant knowledgeable about NYC food scrap drop-off sites."},
        {"role": "user", "content": f"Using the following information:\n\n{context}\n\nQuestion: {query}"}
    ]

    # Call the OpenAI API (legacy openai<1.0 interface) with a low max_tokens cap
    response = openai.ChatCompletion.create(
        model="gpt-4o-mini",  # Ensure this matches your available model
        messages=messages,
        max_tokens=50,  # Limits token usage; answers longer than 50 tokens are cut off
        temperature=0.7
    )

    # Return the assistant's response
    return response["choices"][0]["message"]["content"].strip()
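
The call above relies on `openai.ChatCompletion`, which exists only in versions of the `openai` package before 1.0. As a hedged sketch, the same request might look like this with the 1.x client interface. `build_messages` and `custom_chatbot_v1` are hypothetical names introduced here, and the `client` argument is assumed to be an `openai.OpenAI()` instance; the demo rows are made up.

```python
def build_messages(query, knowledge_base):
    """Assemble the system and user messages (hypothetical helper)."""
    context = "\n".join(knowledge_base)
    return [
        {"role": "system", "content": "You are a helpful assistant knowledgeable about NYC food scrap drop-off sites."},
        {"role": "user", "content": f"Using the following information:\n\n{context}\n\nQuestion: {query}"},
    ]


def custom_chatbot_v1(client, query, knowledge_base, max_tokens=50):
    """Same request as custom_chatbot, written for the openai>=1.0 client.

    Assumes `client` was created with `from openai import OpenAI; client = OpenAI()`,
    which reads OPENAI_API_KEY from the environment.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(query, knowledge_base),
        max_tokens=max_tokens,
        temperature=0.7,
    )
    # 1.x responses are objects, so fields are read as attributes, not dict keys
    content = response.choices[0].message.content
    return (content or "").strip()


# Illustrative rows only; not taken from the real dataset
demo_messages = build_messages(
    "Where can I drop off scraps?",
    ["Site A. Open Monday.", "Site B. Open Sunday."],
)
print(demo_messages[1]["content"])
```

Separating prompt assembly from the API call also makes it possible to inspect exactly what context is sent without spending tokens.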

In [88]:
# Basic chatbot function without using the dataset
def basic_chatbot(query):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},  # No dataset context
        {"role": "user", "content": query}
    ]
    
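    # Same legacy openai<1.0 call as custom_chatbot, but without any dataset context
    response = openai.ChatCompletion.create(
        model="gpt-4o-mini",  # Ensure this matches your available model
        messages=messages,
        max_tokens=50,  # Limits token usage; answers longer than 50 tokens are cut off
        temperature=0.7
    )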
    
    return response["choices"][0]["message"]["content"].strip()

#### 5. Write Questions to Demonstrate Custom Performance

In [90]:
question_1 = "What are the operating hours for food scrap drop-off sites in Manhattan?"
custom_response_1 = custom_chatbot(question_1, knowledge_base)
basic_response_1 = basic_chatbot(question_1)

In [92]:
print(f"Question 1: {question_1}")

Question 1: What are the operating hours for food scrap drop-off sites in Manhattan?


In [94]:
print(f"Custom Answer: {custom_response_1}")

Custom Answer: Here are the operating hours for food scrap drop-off sites in Manhattan:

1. **East Midtown-Turtle Bay** (Dag Hammarskjold Plaza Greenmarket)
   - **Open:** Wednesday
   - **Hours:** 8:00 AM


In [96]:
print(f"Basic Answer: {basic_response_1}\n")

Basic Answer: As of my last update, food scrap drop-off sites in Manhattan typically operate during specific hours, often on designated days of the week. However, the exact hours can vary by location and may change over time.

To get the most accurate and up-to



In [98]:
# Test Question 2
question_2 = "Are there any food scrap drop-off sites open on Sunday in Brooklyn?"
custom_response_2 = custom_chatbot(question_2, knowledge_base)
basic_response_2 = basic_chatbot(question_2)

In [100]:
print(f"Question 2: {question_2}")

Question 2: Are there any food scrap drop-off sites open on Sunday in Brooklyn?


In [102]:
print(f"Custom Answer: {custom_response_2}")

Custom Answer: Yes, there are food scrap drop-off sites open on Sunday in Brooklyn:

1. **Crown Heights (North)**: 1100 Bergen Street Community Garden (1107 Bergen Street, Brooklyn, NY 11216). Open from 10:


In [104]:
print(f"Basic Answer: {basic_response_2}\n")

Basic Answer: Yes, Brooklyn has several food scrap drop-off sites that are typically open on Sundays. However, the specific locations and hours can vary. As of my last update, you might want to check the New York City Department of Sanitation's website or the

