# Vector DB using FAISS + LangChain

## Install the Required Packages

In [6]:
%pip install langchain openai
%pip install -U langchain-community
%pip install sentence-transformers
%pip install faiss-cpu
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp313-cp313-macosx_14_0_arm64.whl.metadata (4.8 kB)
Downloading faiss_cpu-1.11.0-cp313-cp313-macosx_14_0_arm64.whl (3.3 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.11.0


## Set Your OpenAI API Key

In [8]:
# Load the OpenAI API key from a .env file
import os
from dotenv import load_dotenv

load_dotenv()  # copies OPENAI_API_KEY from .env into os.environ

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")

## Load and Prepare JSON Data


In [9]:
import json

with open("/Users/raneem/Desktop/RimalAI/RimalAI_dataset_expanded.json", "r", encoding="utf-8") as f:
    data = json.load(f)

docs = []
metadatas = []
ids = []

for entry in data:
    # Concatenate relevant fields for embedding
    doc_text = f"{entry['name']} ({entry['type']}): {entry.get('description', '')} Vision 2030: {entry.get('vision2030', '')}"
    docs.append(doc_text)
    metadatas.append({"id": entry["id"], "type": entry["type"], "name": entry["name"]})
    ids.append(str(entry["id"]))
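
The loop above reads `id`, `type`, and `name` with bracket access, so an entry missing any of them raises `KeyError` partway through. A minimal sketch of a pre-check, assuming the dataset is a list of dicts as the loop implies; `REQUIRED_KEYS`, `find_invalid_entries`, and the sample entries are hypothetical names and data for illustration, not part of the original notebook:

```python
# Keys the loading loop accesses with entry[...] (the others use .get)
REQUIRED_KEYS = ("id", "type", "name")


def find_invalid_entries(entries):
    """Return (index, missing_keys) for every entry lacking a required key."""
    problems = []
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical sample data shaped like the fields the loop reads
sample_entries = [
    {"id": 1, "type": "landmark", "name": "Al-Ula"},
    {"name": "Neom"},  # missing "id" and "type"
]

for index, missing in find_invalid_entries(sample_entries):
    print(f"entry {index} is missing: {', '.join(missing)}")
```

Running such a check on `data` before the loop would surface incomplete entries up front instead of failing in the middle of building `docs`.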


## Create Embeddings and FAISS Vector Store

In [10]:
# These integrations live in the langchain-community package installed above
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Use a sentence-transformers model for embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create the FAISS vector store
vectordb = FAISS.from_texts(
    texts=docs,
    embedding=embeddings,
    metadatas=metadatas
)

# Save the FAISS index for later use
vectordb.save_local("faiss_rimalai_db")
print("FAISS vector DB created and saved!")


FAISS vector DB created and saved!
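
Conceptually, the vector store keeps one embedding per document and answers a query by returning the nearest stored vectors. The toy sketch below illustrates that idea with brute-force Euclidean distance in plain Python. The two-dimensional vectors are made up for illustration and are not real `all-MiniLM-L6-v2` embeddings, and `l2_distance` and `top_k` are hypothetical helpers rather than FAISS or LangChain functions:

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, vectors, k=3):
    """Return the indices of the k stored vectors closest to query_vec."""
    scored = sorted((l2_distance(query_vec, vec), i) for i, vec in enumerate(vectors))
    return [i for _, i in scored[:k]]


# Made-up 2-D "embeddings"; real ones have hundreds of dimensions
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(top_k([1.0, 0.05], toy_vectors, k=2))
```

A library such as FAISS performs the same kind of lookup with optimized index structures, which matters once the collection grows beyond a handful of documents.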


## Query the Vector Database

In [14]:
# These integrations live in the langchain-community package installed above
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

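# allow_dangerous_deserialization=True is needed because the saved index includes a pickle file;
# only enable it for index files you created yourself, as is the case here
vectordb = FAISS.load_local(
    "faiss_rimalai_db",
    embeddings,
    allow_dangerous_deserialization=True
)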

query = "ancient Saudi cities"
results = vectordb.similarity_search(query, k=3)

for doc in results:
    print("Content:", doc.page_content)
    print("Metadata:", doc.metadata)
    print("---")


Content: Al-Ula (landmark): Al-Ula is an ancient city located in northwestern Saudi Arabia, famous for its sandstone mountains, historic tombs, and rich Nabatean heritage. It has been a crossroads for ancient civilizations and a center of trade and culture. The city is home to significant archaeological sites like Mada'in Saleh, and its unique rock formations make it a prime location for tourists and historians alike. Vision 2030: Al-Ula is a centerpiece of Saudi Arabia's Vision 2030, aiming to transform the city into a world-class tourism destination while preserving its archaeological and cultural heritage. The city is also committed to sustainable tourism practices, ensuring that its natural beauty and historical value are maintained for future generations.
Metadata: {'id': 1, 'type': 'landmark', 'name': 'Al-Ula'}
---
Content: Neom (city): Neom is a planned city in northwestern Saudi Arabia, designed to be a hub for technological innovation, sustainable living, and tourism. It combi
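
Each result exposes `page_content` and `metadata`, and the full content can run to several hundred characters. A small sketch of a one-line preview per result, using only the standard library. `preview` is a hypothetical helper, and `fake_doc` is a stand-in object built with `SimpleNamespace` so the example runs without loading the index:

```python
import textwrap
from types import SimpleNamespace


def preview(doc, width=80):
    """Format a result as '[type] name: shortened content'."""
    name = doc.metadata.get("name", "?")
    kind = doc.metadata.get("type", "?")
    # shorten() collapses whitespace and cuts at a word boundary
    snippet = textwrap.shorten(doc.page_content, width=width, placeholder="...")
    return f"[{kind}] {name}: {snippet}"


# Stand-in for a search result; it mimics only the two attributes used above
fake_doc = SimpleNamespace(
    page_content=(
        "Al-Ula (landmark): Al-Ula is an ancient city located in northwestern "
        "Saudi Arabia, famous for its sandstone mountains, historic tombs, "
        "and rich Nabatean heritage."
    ),
    metadata={"id": 1, "type": "landmark", "name": "Al-Ula"},
)

print(preview(fake_doc))
```

In the notebook, the same helper could be applied to each item in `results` in place of the three `print` calls.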

## LangChain RetrievalQA (gpt-4o)

In [16]:
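from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI

# Chat model that writes the final answer
llm = ChatOpenAI(model_name="gpt-4o")

# "stuff" places all retrieved documents into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# RetrievalQA takes the question under the "query" key
response = qa_chain.invoke({"query": "Tell me about Vision 2030 projects in Saudi Arabia."})
print(response["result"])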



Saudi Arabia's Vision 2030 is a strategic framework aimed at diversifying the country's economy and reducing its dependency on oil. Several key projects are part of this vision, focusing on cultural, technological, and sustainable development:

1. **Diriyah Gate Project**: This initiative aims to restore and develop Diriyah, the historic birthplace of the Saudi state, into a premier cultural and tourist destination. Known for its mud-brick architecture and UNESCO World Heritage status, Diriyah is a significant cultural landmark.

2. **Al-Ula**: Al-Ula is being transformed into a world-class tourism destination while preserving its archaeological and cultural heritage. The project highlights sustainable tourism practices, ensuring the protection of its natural beauty and historical sites like Mada'in Saleh.

3. **Neom**: Neom is a planned city that serves as a hub for technological innovation, sustainable living, and tourism. The city, located along the Red Sea coast, integrates advance
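
The `chain_type="stuff"` setting used in the cell above means the retrieved documents are all placed into one prompt together with the question. The sketch below shows that idea in plain Python. The prompt wording and the `stuff_prompt` helper are assumptions for illustration and do not reproduce LangChain's actual prompt template:

```python
def stuff_prompt(question, documents):
    """Join every retrieved document into one context block ahead of the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Shortened, illustrative stand-ins for retrieved document texts
retrieved = [
    "Al-Ula (landmark): an ancient city in northwestern Saudi Arabia.",
    "Neom (city): a planned city in northwestern Saudi Arabia.",
]

print(stuff_prompt("Tell me about Vision 2030 projects in Saudi Arabia.", retrieved))
```

Because every retrieved document is sent in a single request, this approach is simple but the total prompt size grows with the number and length of the documents returned by the retriever.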

## LangChain RetrievalQA (gpt-4)

In [17]:
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI

# Same chain as before, with gpt-4 as the answering model
llm = ChatOpenAI(model_name="gpt-4")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# RetrievalQA takes the question under the "query" key
response = qa_chain.invoke({"query": "Tell me about Vision 2030 projects in Saudi Arabia."})
print(response["result"])

Vision 2030 is a strategic framework launched by Saudi Arabia with the aim to diversify its economy, reduce its dependence on oil, and develop public service sectors such as health, education, infrastructure, recreation and tourism. There are several key projects under this initiative:

1. Diriyah Gate: This project aims to restore and develop Diriyah, the historic birthplace of the Saudi state, into a premier cultural and tourist destination. Known for its mud-brick architecture and UNESCO World Heritage status, the Diriyah Gate project is a key cultural initiative under Vision 2030.

2. Al-Ula Development: Al-Ula, an ancient city with rich Nabatean heritage, is another centerpiece of Vision 2030. The plan is to transform Al-Ula into a world-class tourism destination while preserving its archaeological and cultural heritage. The city is committed to sustainable tourism practices to ensure its natural beauty and historical value are maintained for future generations.

3. Neom City: Neo

## LangChain RetrievalQA (gpt-3.5-turbo)

In [18]:
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI

# Same chain as before, with gpt-3.5-turbo as the answering model
llm = ChatOpenAI(model_name="gpt-3.5-turbo")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# RetrievalQA takes the question under the "query" key
response = qa_chain.invoke({"query": "Tell me about Vision 2030 projects in Saudi Arabia."})
print(response["result"])

In Saudi Arabia, Vision 2030 is a strategic framework aimed at diversifying the economy, reducing dependency on oil, and transforming various sectors. Some of the key Vision 2030 projects include the Diriyah Gate project in Diriyah, the Al-Ula development project in Al-Ula, and the Neom project in northwestern Saudi Arabia. These projects focus on cultural preservation, tourism development, technological innovation, and sustainable living practices to help achieve the goals set out in Vision 2030. Additionally, the promotion of Saudi coffee (Gahwa) as part of intangible cultural heritage is another aspect of Vision 2030 aimed at enhancing cultural tourism.
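
The three RetrievalQA cells differ only in the model name, so the comparison can be written as one loop. A minimal sketch, assuming each chain accepts `invoke({"query": ...})` and returns a dict with a `"result"` key, as `RetrievalQA` does. `compare_models` and `EchoChain` are hypothetical names, and `EchoChain` is an offline stand-in so the example runs without an API key or the saved index:

```python
def compare_models(question, model_names, build_chain):
    """Ask the same question through one chain per model and collect the answers."""
    answers = {}
    for name in model_names:
        chain = build_chain(name)
        answers[name] = chain.invoke({"query": question})["result"]
    return answers


# In the notebook, build_chain would wrap the cells above, for example:
# def build_chain(name):
#     return RetrievalQA.from_chain_type(
#         llm=ChatOpenAI(model_name=name),
#         chain_type="stuff",
#         retriever=vectordb.as_retriever(),
#     )


class EchoChain:
    """Offline stand-in that mimics the invoke() shape of a QA chain."""

    def __init__(self, name):
        self.name = name

    def invoke(self, inputs):
        return {"result": f"{self.name} answered: {inputs['query']}"}


answers = compare_models(
    "Tell me about Vision 2030 projects in Saudi Arabia.",
    ["gpt-4o", "gpt-4", "gpt-3.5-turbo"],
    EchoChain,
)
for model, text in answers.items():
    print(model, "->", text)
```

Passing the chain factory as an argument keeps the comparison logic separate from the LangChain setup, so the loop can be exercised offline before spending API calls.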
