## Objective
* Build an AI-powered chatbot that answers user queries about Changi Airport and Jewel Changi Airport by scraping their websites, embedding the content, and retrieving relevant passages from a vector database.

## Goal
* Extract and process relevant content from the official websites.
* Store content embeddings in a Pinecone vector database for efficient retrieval.
* Develop an interactive chatbot using LangChain and OpenAI GPT that answers user queries from the retrieved content.

In [None]:
# Install dependencies if necessary (uncomment the lines below to run them).
# Assumption: pinecone-client is pinned below 3.0 because this notebook calls
# pinecone.init(), which the 3.x client removed.
# !pip install -U langchain langchain-community langchain-openai
# !pip install openai tiktoken
# !pip install "pinecone-client<3"
# !pip install beautifulsoup4 requests

In [None]:
# Import libraries
import os
import requests
from bs4 import BeautifulSoup
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Pinecone
from langchain.prompts import ChatPromptTemplate
from langchain.chains import RetrievalQA
import pinecone

In [None]:
# 1. Set API Keys
# Replace the placeholders before running, and avoid committing real keys.
os.environ["OPENAI_API_KEY"] = "Enter your API key"  # Replace with your OpenAI API key
PINECONE_API_KEY = "Enter your API key"              # Replace with your Pinecone API key
PINECONE_ENVIRONMENT = "us-west1-gcp"                # Replace with your Pinecone environment

In [None]:
# 2. Scrape Website Content
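def scrape_website(url):
    """Return the text of all <p> tags at url, or "" if the request fails."""
    try:
        response = requests.get(url, timeout=10)  # timeout avoids hanging on a slow server
    except requests.RequestException as exc:
        print(f"Failed to retrieve {url}: {exc}")
        return ""
    if response.status_code != 200:
        print(f"Failed to retrieve {url} (status {response.status_code})")
        return ""
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract text from <p> tags only; other elements are ignored
    return " ".join(p.get_text(strip=True) for p in soup.find_all("p"))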

# URLs to scrape
changi_airport_url = "https://www.changiairport.com/en.html"
jewel_changi_url = "https://www.jewelchangiairport.com/"

# Scrape content
print("Scraping content...")
changi_content = scrape_website(changi_airport_url)
jewel_content = scrape_website(jewel_changi_url)
combined_content = changi_content + " " + jewel_content  # the space keeps adjacent words from fusing
print("Scraping completed!")

In [None]:
# 3. Embed Content and Store in Pinecone
print("Initializing Pinecone...")
pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)

index_name = "changi-airport-chatbot"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536)  # OpenAI embeddings dimension

# Load OpenAI Embeddings
embeddings = OpenAIEmbeddings()

# Connect to the existing Pinecone index through LangChain's vector store wrapper
vector_store = Pinecone.from_existing_index(index_name, embeddings)

# Split content into chunks (LangChain text splitter)
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_text(combined_content)

# Add text chunks to Pinecone
print("Adding embeddings to Pinecone...")
vector_store.add_texts(texts)
print("Embeddings added successfully!")

In [None]:
# 4. Create the Chatbot (LangChain RetrievalQA)
print("Initializing the chatbot...")
llm = ChatOpenAI(model="gpt-3.5-turbo")

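# Prompt Template
# The default "stuff" chain fills {context} with the retrieved chunks, so the prompt must include it.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful chatbot. Use the following Changi Airport and Jewel Changi Airport data to answer queries. If the answer is not in the data, say you do not know.\n\n{context}"),
    ("user", "Question: {question}")
])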

# Retrieval QA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),
    chain_type_kwargs={"prompt": prompt}
)

In [None]:
# 5. Query the Chatbot
while True:
    query = input("\nAsk me a question (or type 'exit' to quit): ")
    if query.lower() == "exit":
        print("Goodbye!")
        break
    response = qa_chain.run(query)
    print("\nResponse:", response)

## Result
* Built an AI chatbot that answers questions about Changi Airport and Jewel Changi Airport.
* Paragraph text from the two scraped pages is embedded, stored in Pinecone, and retrieved at query time.
* Users interact with the chatbot through a command-line prompt and receive answers based on the retrieved content.
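
How retrieved content reaches the model can be sketched in plain Python. This is a simplified illustration of the "stuff" strategy that `RetrievalQA` uses by default, assuming the retriever has already returned a list of text chunks. The helper `build_prompt` and the sample chunks are hypothetical and are not part of LangChain.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up placeholder chunks standing in for retriever output
sample_chunks = ["Chunk A about opening hours.", "Chunk B about transport."]
print(build_prompt("When does it open?", sample_chunks))
```

Because every retrieved chunk is pasted into a single prompt, the number of chunks and their size together determine how much of the model's context window is consumed.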

## Description
* The Changi Airport Chatbot project combines web scraping, vector embeddings, and retrieval-augmented generation to build an intelligent Q&A system:

1. Content Scraping: Pages are fetched with requests, and paragraph text is extracted from the Changi Airport and Jewel Changi websites using BeautifulSoup.
2. Text Processing: Content is split into 500-character chunks with a 50-character overlap, and each chunk is embedded using OpenAI's text-embedding-ada-002.
3. Pinecone Vector Database: The embeddings are stored in Pinecone for quick and relevant retrieval.
4. LangChain Integration: LangChain's RetrievalQA is used to query the vector database and generate responses via GPT-3.5-turbo.
5. Interactive Chat: A simple command-line interface allows users to ask questions, retrieve data, and get relevant answers. 
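
The chunking in step 2 can be illustrated with a minimal sketch. It assumes plain fixed-size character windows, which is a simplification: LangChain's `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and spaces. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some characters twice.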

## About the Project
The Changi Airport Chatbot is a proof-of-concept AI system that enhances user experience by providing detailed information about one of the world's busiest airports, Changi Airport, and its attractions like Jewel Changi Airport. The project uses modern AI tools such as LangChain, OpenAI GPT, and Pinecone to deliver quick, reliable, and accurate information.

This chatbot showcases:

* Integration of web scraping for content gathering.
* Use of embeddings and a vector database for efficient search and retrieval.
* AI-powered responses using GPT for natural language understanding and generation. 
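
To make the vector search idea concrete, here is a small in-memory sketch of similarity retrieval. It assumes cosine similarity as the metric and uses hand-made three-dimensional vectors in place of real 1536-dimensional embeddings. The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical, and this is not how Pinecone is implemented internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up texts and vectors, for illustration only
toy_store = [
    ("chunk about attractions", [0.9, 0.1, 0.0]),
    ("chunk about dining", [0.1, 0.9, 0.0]),
    ("chunk about transport", [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], toy_store, k=2))
```

A production vector database replaces the linear scan in `top_k` with an approximate nearest-neighbour index, which is what keeps lookups fast as the number of chunks grows.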