Custom Chatbot Project

Dataset Selection: NYC Food Scrap Drop-off Sites

For this custom chatbot project, I selected the NYC Food Scrap Drop-off Sites dataset. It lists food scrap drop-off sites across New York City, with each site's location, host organization, open months, operating days and hours, website, and notes. It meets the project requirement of at least 20 rows of text data, so it is well suited to the task.

Use Case:

The custom chatbot will be developed to provide users with accurate and current information regarding food scrap drop-off sites in New York City. This will be particularly useful for individuals interested in composting and supporting a more sustainable urban environment. Leveraging this dataset, the chatbot will be able to answer questions about site locations, hours of operation, and other relevant details.

This customization will benefit NYC residents and businesses seeking to responsibly dispose of their food scraps, as well as tourists who wish to maintain eco-friendly practices during their visit. By offering precise and helpful information on food scrap drop-off sites, the chatbot can assist users in adopting sustainable habits and contribute to reducing overall waste in New York City.

Additionally, the chatbot promotes environmental stewardship among its users. It encourages conscious, eco-friendly decisions and fosters a community spirit centered on sustainability, so it both aids practical waste disposal and inspires a deeper commitment to caring for the planet.

In [11]:
import pandas as pd
import numpy as np
import openai
import tiktoken

In [12]:
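import os

openai.api_base = "https://openai.vocareum.com/v1"
# Read the key from the environment instead of hardcoding a secret in the notebook.
# The variable name OPENAI_API_KEY is an assumption; use whatever name your setup defines.
openai.api_key = os.environ["OPENAI_API_KEY"]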

In [26]:
df = pd.read_csv('/home/goutham/GENAI/customchatbot/nyc_food_scrap_drop_off_sites.csv')

In [27]:
def get_embedding(text, model="text-embedding-ada-002"):
    try:
        response = openai.Embedding.create(
            input=[text],
            model=model
        )
        return response['data'][0]['embedding']
    except Exception as e:
        print(f"Embedding error: {e}")
        return None
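
`get_embedding` above prints the error and returns `None`, so a transient API failure silently drops a row. One possible alternative, sketched below, is to retry a failing call with exponential backoff before giving up. The helper `with_retries` and the stand-in `flaky_embed` are hypothetical names introduced for illustration; they are not part of the `openai` library or of this notebook.

```python
import time

def with_retries(func, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Wrap func so failed calls are retried with exponential backoff."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                if attempt == max_attempts:
                    raise  # out of attempts: surface the last error to the caller
                delay = base_delay * 2 ** (attempt - 1)
                print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
                sleep(delay)
    return wrapper

# Stand-in for an embedding call that fails twice and then succeeds
calls = {"count": 0}

def flaky_embed(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return [0.1, 0.2, 0.3]

# sleep is replaced with a no-op so the demo runs instantly
safe_embed = with_retries(flaky_embed, sleep=lambda seconds: None)
result = safe_embed("Madison Square Park Food Scrap Drop-off")
print(result, calls["count"])
```

In the notebook, the same wrapper could be applied to a version of `get_embedding` that raises instead of catching the exception.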


In [28]:

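def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 when either has zero length
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return np.dot(a, b) / denom if denom else 0.0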
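
To make the similarity score concrete, here is a minimal worked example on hand-written 2-D vectors. The vectors are made up for illustration (real `text-embedding-ada-002` embeddings have 1536 dimensions), and the snippet restates the formula so it runs on its own.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, chosen only for illustration
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])       # perpendicular vectors
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # opposite directions

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

The three scores come out at roughly 1, 0, and -1. Only direction matters, not length, which is why `[1, 2]` and `[2, 4]` count as identical.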

In [29]:
def retrieve_context(query, df, top_k=3):
    # Check that the required columns exist and report only the missing ones
    required_cols = ['borough', 'ntaname', 'food_scrap_drop_off_site', 'location', 'hosted_by', 'open_months', 'operation_day_hours', 'website', 'borocd', 'councildist', 'latitude', 'longitude', 'precinct', 'object_id', 'location_point', 'notes']
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        print(f"Required columns not found in the dataset: {', '.join(missing)}")
        return pd.DataFrame()
    
    # Embed the site names once and cache them on df, so later queries
    # do not re-embed the entire dataset
    if 'embedding' not in df.columns:
        df['embedding'] = df['food_scrap_drop_off_site'].apply(get_embedding)
    
    # Get query embedding; get_embedding returns None on failure
    query_embedding = get_embedding(query)
    if query_embedding is None:
        return pd.DataFrame()
    
    # Calculate similarities, skipping rows whose embedding call failed
    scored = df[df['embedding'].notna()].copy()
    scored['similarity'] = scored['embedding'].apply(lambda x: cosine_similarity(x, query_embedding))
    
    # Return top k most similar rows
    return scored.sort_values('similarity', ascending=False).head(top_k)[required_cols]
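
The ranking step can be tried without any API calls. The sketch below scores every row against the query in a single matrix-vector product instead of a per-row `apply`. The function `top_k_by_cosine`, the site names, and the 3-D vectors are hypothetical stand-ins for real embeddings, introduced only for illustration.

```python
import numpy as np

def top_k_by_cosine(matrix, query_vec, top_k=3):
    """Return (row_indices, scores) of the top_k rows most similar to query_vec."""
    matrix = np.asarray(matrix, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # Cosine similarity of every row against the query in one matrix-vector product
    row_norms = np.linalg.norm(matrix, axis=1)
    scores = matrix @ query_vec / (row_norms * np.linalg.norm(query_vec))
    # Sort descending by score and keep the first top_k indices
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Hypothetical 3-D "embeddings" standing in for real API output
site_names = ["Site A", "Site B", "Site C", "Site D"]
site_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.0, 0.0]

indices, scores = top_k_by_cosine(site_vectors, query_vector, top_k=2)
top_sites = [site_names[i] for i in indices]
print(top_sites)
```

Site A points in exactly the query's direction and Site C is close to it, so those two rank first. This sketch assumes no row is a zero vector, since that would divide by zero.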

In [30]:
def food_scrap_chatbot(query, df):
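    # Retrieve relevant context; stop early if retrieval failed or matched nothing
    context = retrieve_context(query, df)
    if context.empty:
        return "I'm sorry, but I couldn't retrieve any drop-off site information."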
    
    # Prepare context string
    context_str = context.apply(lambda row: f"Borough: {row['borough']}\n"
                                            f"Neighborhood: {row['ntaname']}\n"
                                            f"Location: {row['location']}\n"
                                            f"Hosted By: {row['hosted_by']}\n"
                                            f"Open Months: {row['open_months']}\n"
                                            f"Hours: {row['operation_day_hours']}\n"
                                            f"Website: {row['website']}\n"
                                            f"Community District: {row['borocd']}\n"
                                            f"Council District: {row['councildist']}\n"
                                            f"Latitude: {row['latitude']}\n"
                                            f"Longitude: {row['longitude']}\n"
                                            f"Precinct: {row['precinct']}\n"
                                            f"Object ID: {row['object_id']}\n"
                                            f"Location Point: {row['location_point']}\n"
                                            f"Notes: {row['notes']}\n"
                                            f"Description: {row['food_scrap_drop_off_site']}", axis=1).str.cat(sep='\n\n')
    
    # Construct prompt with retrieved context
    messages = [
        {
            "role": "system",
            "content": """You are a helpful NYC Food Scrap Drop-Off Sites assistant.
            Provide detailed and accurate information about food scrap recycling locations in New York City
            based on the given context. If specific details are not available,
            explain what information you can provide."""
        },
        {
            "role": "user",
            "content": f"Context of Food Scrap Drop-Off Sites:\n{context_str}\n\nQuery: {query}"
        }
    ]
    
    # Generate response using GPT-3.5 Turbo
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # Explicitly specify GPT-3.5 Turbo
            messages=messages,
            max_tokens=500  # Increase response length limit
        )
        return response['choices'][0]['message']['content']
    except Exception as e:
        print(f"Error generating response: {e}")
        return "I'm sorry, but I couldn't generate a response at this time."
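
The context string above prints every field, so empty CSV cells show up in the prompt as `nan`. A minimal sketch of one way to skip missing values follows. It works on plain dictionaries so it runs without the dataset; `FIELD_LABELS`, `is_missing`, `format_site`, and `build_context` are hypothetical helpers, and the sample rows are made up rather than taken from the data.

```python
import math

# Hypothetical mapping from dataset column name to the label shown in the prompt
FIELD_LABELS = {
    "borough": "Borough",
    "location": "Location",
    "operation_day_hours": "Hours",
    "notes": "Notes",
}

def is_missing(value):
    # pandas loads empty CSV cells as float NaN; also treat None and "" as missing
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)

def format_site(row):
    """Render one site (a dict of column -> value) as labelled lines."""
    lines = [f"{label}: {row[col]}" for col, label in FIELD_LABELS.items()
             if col in row and not is_missing(row[col])]
    return "\n".join(lines)

def build_context(rows):
    """Join the formatted sites with blank lines, dropping sites with no usable fields."""
    blocks = [format_site(row) for row in rows]
    return "\n\n".join(block for block in blocks if block)

# Made-up rows for illustration, not taken from the dataset
rows = [
    {"borough": "Manhattan", "location": "Example St & 1st Ave",
     "operation_day_hours": "Mondays 8:00 AM - 1:00 PM", "notes": float("nan")},
    {"borough": "Brooklyn", "location": "Sample Pl & 2nd Ave",
     "operation_day_hours": "", "notes": "Not accepted: meat, bones, or dairy"},
]
context_str = build_context(rows)
print(context_str)
```

With a DataFrame, `context.to_dict("records")` yields rows in this shape. Dropping empty fields also shortens the prompt, which leaves more of the model's token budget for the answer.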

In [31]:
def demonstrate_chatbot():
    # Sample queries to test the chatbot
    queries = [
        "Where can I drop off food scraps in NYC?",
        "What are the hours for food scrap recycling locations?",
        "Tell me about the food scrap drop-off sites in New York City"
    ]
    
    for query in queries:
        print("\n--- Query: " + query + " ---")
        response = food_scrap_chatbot(query, df)
        print(response)

# Run demonstration
demonstrate_chatbot()


--- Query: Where can I drop off food scraps in NYC? ---
You can drop off food scraps for recycling in New York City at the following locations:

1. **Madison Square Park Food Scrap Drop-off**
   - **Borough:** Manhattan
   - **Neighborhood:** Midtown South-Flatiron-Union Square
   - **Location:** 23rd St & Broadway
   - **Hosted By:** GrowNYC
   - **Open Months:** Year Round
   - **Hours:** Wednesdays (Start Time: 8:00 AM - End Time: 1:00 PM)
   - **Website:** [grownyc.org/compost](http://grownyc.org/compost)
   - **Community District:** 105
   - **Council District:** 3
   - **Notes:** Not accepted: meat, bones, or dairy

2. **Flatbush Junction Food Scrap Drop-off**
   - **Borough:** Brooklyn
   - **Neighborhood:** Flatbush
   - **Location:** Hillel Pl & Flatbush Ave
   - **Hosted By:** GrowNYC
   - **Open Months:** Year Round
   - **Hours:** Fridays (Start Time: 8:30 AM - End Time: 2:30 PM)
   - **Website:** [grownyc.org/compost](http://grownyc.org/compost)
   - **Community District: