# Improve our Agent "Second edition"

## - Install the Required Packages -

In [1]:
%pip install sentence-transformers -q

Note: you may need to restart the kernel to use updated packages.


In [2]:
!pip install langchain -q
!pip install transformers -q
!pip install pinecone-client -q
!pip install faiss-cpu -q
!pip install SpeechRecognition -q
!pip install pyttsx3 -q
!pip install -q openai
!pip install -q -U langchain-community

In [3]:
%pip install -q openai==0.28
%pip install -q gTTS
%pip install -q langdetect

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install -q ipywebrtc
%pip install -q pvrecorder
%pip install -q sounddevice
%pip install -q gtts
%pip install -q openai


Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


##  - set your OpenAI API key- 

In [5]:
# set OpenAI and chroma API key .env
import os
from dotenv import load_dotenv
load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

## - Load and Prepare JSON Data -


In [7]:
import json

with open("RimalAI_dataset_expanded.json", "r", encoding="utf-8") as f:
    data = json.load(f)

docs = []
metadatas = []
ids = []

for entry in data:
    # Concatenate relevant fields for embedding
    doc_text = f"{entry['name']} ({entry['type']}): {entry.get('description', '')} Vision 2030: {entry.get('vision2030', '')}"
    docs.append(doc_text)
    metadatas.append({"id": entry["id"], "type": entry["type"], "name": entry["name"]})
    ids.append(str(entry["id"]))


## - Create Embeddings and FAISS Vector Store -

In [8]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Use a sentence-transformers model for embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create the FAISS vector store
vectordb = FAISS.from_texts(
    texts=docs,
    embedding=embeddings,
    metadatas=metadatas
)

# Save the FAISS index for later use
vectordb.save_local("faiss_rimalai_db")
print("FAISS vector DB created and saved!")


  embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
  from .autonotebook import tqdm as notebook_tqdm


FAISS vector DB created and saved!


## - Query the Vector Database - 

In [9]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Custom model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


# Load the database
vectordb = FAISS.load_local("faiss_rimalai_db", embeddings, allow_dangerous_deserialization=True)

# query
query= "ancient Saudi cities"
results = vectordb.similarity_search(query, k=3)



# Print results
print("Results:")
for doc in results:
    print("Content:", doc.page_content)
    print("Metadata:", doc.metadata)
    print("---")

Results:
Content: Al-Ula (landmark): Al-Ula is an ancient city located in northwestern Saudi Arabia, famous for its sandstone mountains, historic tombs, and rich Nabatean heritage. It has been a crossroads for ancient civilizations and a center of trade and culture. The city is home to significant archaeological sites like Mada'in Saleh, and its unique rock formations make it a prime location for tourists and historians alike. Vision 2030: Al-Ula is a centerpiece of Saudi Arabia's Vision 2030, aiming to transform the city into a world-class tourism destination while preserving its archaeological and cultural heritage. The city is also committed to sustainable tourism practices, ensuring that its natural beauty and historical value are maintained for future generations.
Metadata: {'id': 1, 'type': 'landmark', 'name': 'Al-Ula'}
---
Content: Neom (city): Neom is a planned city in northwestern Saudi Arabia, designed to be a hub for technological innovation, sustainable living, and tourism.

## LangChain RetrievalQA ("gpt-4o") + Arabic Response

In [10]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Initialize the GPT-4 model
llm = ChatOpenAI(model_name="gpt-4o")

# Create the QA chain with retrieval
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# English query
response = qa_chain("Tell me about Vision 2030 projects in Saudi Arabia.")
print("English Response:", response["result"])

# Arabic query
response = qa_chain("أخبرني عن مشاريع رؤية 2030 في السعودية.")
print("Arabic Response:", response["result"])

  llm = ChatOpenAI(model_name="gpt-4o")
  response = qa_chain("Tell me about Vision 2030 projects in Saudi Arabia.")


English Response: Vision 2030 is a strategic framework aimed at transforming Saudi Arabia's economy and society, focusing on diversification, sustainability, and modernization. Several key projects are part of this initiative:

1. **Diriyah Gate**: This project aims to restore and develop Diriyah, the historic birthplace of the Saudi state, into a premier cultural and tourist destination. It highlights the area's mud-brick architecture and its UNESCO World Heritage status.

2. **Al-Ula**: This ancient city is being transformed into a world-class tourism destination while preserving its rich archaeological and cultural heritage. The project emphasizes sustainable tourism practices to maintain the city's natural beauty and historical value.

3. **Neom**: A planned city designed to be a hub for technological innovation, sustainable living, and tourism. Neom combines advanced infrastructure with renewable energy, artificial intelligence, and environmental sustainability, positioning itself

## LangChain RetrievalQA (gpt-4) + Arabic Response

In [11]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Initialize the GPT-4 model
llm = ChatOpenAI(model_name="gpt-4")

# Create the QA chain with retrieval
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# English query
response_english = qa_chain("Tell me about Vision 2030 projects in Saudi Arabia.")
print("English Response:", response_english["result"])

# Arabic query
response_arabic = qa_chain("أخبرني عن مشاريع رؤية 2030 في السعودية.")
print("Arabic Response:", response_arabic["result"])

English Response: Vision 2030 is a strategic framework for the future of Saudi Arabia, with a focus on diversifying the economy, promoting cultural tourism, and creating sustainable urban spaces. Here are some of the key projects under Vision 2030:

1. The Diriyah Gate project: This is a cultural initiative aiming to restore and develop Diriyah, the historic birthplace of the Saudi state. The project seeks to leverage the area's UNESCO World Heritage status and its unique mud-brick architecture to establish Diriyah as a leading cultural and tourist destination.

2. Al-Ula: Located in northwestern Saudi Arabia, Al-Ula is known for its rich Nabatean heritage and significant archaeological sites, including Mada'in Saleh. Under Vision 2030, there are plans to transform Al-Ula into a world-class tourism destination while preserving its archaeological and cultural heritage. The city is also committed to sustainable tourism practices.

3. Neom: A planned city in northwestern Saudi Arabia, Neo

## LangChain RetrievalQA (gpt-3.5-turbo) + Arabic Response

In [12]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Initialize the GPT-3.5-turbo model
llm = ChatOpenAI(model_name="gpt-3.5-turbo")

# Create the QA chain with retrieval
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# English query
response_english = qa_chain("Tell me about Vision 2030 projects in Saudi Arabia.")
print("English Response:", response_english["result"])

# Arabic query
response_arabic = qa_chain("أخبرني عن مشاريع رؤية 2030 في السعودية.")
print("Arabic Response:", response_arabic["result"])

English Response: Saudi Arabia's Vision 2030 includes several key projects aimed at transforming the country in various aspects. Some of these projects include:

1. Diriyah Gate Project: This initiative focuses on restoring and developing Diriyah as a premier cultural and tourist destination, highlighting its historic significance and UNESCO World Heritage status.

2. Al-Ula Transformation: Al-Ula aims to become a world-class tourism destination while preserving its archaeological and cultural heritage. The city is committed to sustainable tourism practices to maintain its natural beauty and historical value.

3. Neom: Neom is a planned city designed to be a hub for technological innovation, sustainable living, and tourism. This project aims to diversify the economy, create sustainable urban spaces, and position Saudi Arabia as a leader in innovation and technology.

Additionally, Vision 2030 promotes Saudi coffee (Gahwa) as part of the country's intangible cultural heritage to enhance

In [13]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Initialize the GPT-4 model
llm = ChatOpenAI(model_name="gpt-4o")

# Create the QA chain with retrieval
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)

# English query
response_english = qa_chain("Tell me about Al-Ula and give me a video link.")
print("English Response:", response_english["result"])

# Arabic query
response_arabic = qa_chain("أخبرني عن العلا وأعطني رابط فيديو.")
print("Arabic Response:", response_arabic["result"])

English Response: Al-Ula is an ancient city located in northwestern Saudi Arabia, known for its stunning sandstone mountains, historic tombs, and rich Nabatean heritage. It has served as a crossroads for ancient civilizations and a center for trade and culture. The city is home to significant archaeological sites, including Mada'in Saleh, and is renowned for its unique rock formations. Al-Ula is part of Saudi Arabia's Vision 2030, which aims to transform the city into a world-class tourism destination while preserving its archaeological and cultural heritage. The initiative also emphasizes sustainable tourism practices to maintain its natural beauty and historical significance for future generations.

I'm unable to provide video links, but you can find videos about Al-Ula by searching on platforms like YouTube for visual documentaries and travel guides related to this remarkable city.
Arabic Response: العلا هي مدينة تاريخية تقع في شمال غرب المملكة العربية السعودية، وتشتهر بجبالها الرمل

### Why did the model say "Unfortunately, I can't provide a video link"?

LLM Behavior:
The language model generates answers based on the text context it receives from your vector database. It does not automatically extract or highlight multimedia URLs unless the prompt or chain instructs it to.

Data in Vector Store:
Your dataset does contain video URLs for Al-Ula (e.g., "https://www.youtube.com/watch?v=u7cXaKYLyvs&t=315s"), but the vector store stores text embeddings of concatenated fields. The LLM sees the video URL as part of the text only if it was included in the chunk fed to it.

# fixed code with links to media provided in the prompt

In [15]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
import re

# Set up LLM and RetrievalQA
llm = ChatOpenAI(model_name="gpt-4o")
prompt = PromptTemplate(
    template="""
You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question:
{question}
""",
    input_variables=["context", "question"]
)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
    chain_type_kwargs={"prompt": prompt}
)

# Helper function to detect language
def detect_language(text):
    # Simple heuristic: checks if the text contains Arabic characters
    if re.search(r'[\u0600-\u06FF]', text):
        return 'arabic'
    else:
        return 'english'

# Main query function: Show only the top entity's media links
def query_with_auto_entity(query):
    # Detect the language of the query
    language = detect_language(query)

    docs = vectordb.similarity_search(query, k=5)
    entity_doc = docs[0] if docs else None
    if not entity_doc:
        return "No relevant entry found."
    
    # Get the response from the model (supports Arabic as well)
    response = qa_chain.invoke(query)["result"]
    
    # Retrieve media links from the entity document
    media = entity_doc.metadata.get("media", {})
    video_link = media.get("videos")[0] if media.get("videos") else None
    image_link = media.get("images")[0] if media.get("images") else None
    audio_link = media.get("audio")[0] if media.get("audio") else None
    
    # Prepare the response
    if language == 'arabic':
        # Arabic response
        result = response + "\n\nهذه الروابط ذات الصلة:\n"
        if video_link:
            result += f"- فيديو: [شاهد على يوتيوب]({video_link})\n"
        if image_link:
            result += f"- صورة: ![صورة]({image_link})\n"
        if audio_link:
            result += f"- صوت: [استمع هنا]({audio_link})\n"
        return f"Arabic Query Result:\n{result}"
    
    else:
        # English response
        result = response + "\n\nHere are the relevant media links:\n"
        if video_link:
            result += f"- Video: [Watch on YouTube]({video_link})\n"
        if image_link:
            result += f"- Image: ![Image]({image_link})\n"
        if audio_link:
            result += f"- Audio: [Listen here]({audio_link})\n"
        return f"English Query Result:\n{result}"

# Example usage with two queries: one in English, one in Arabic

# English query
query_english = "Tell me about Al-Ula"
answer_english = query_with_auto_entity(query_english)
print(answer_english)

# Arabic query
query_arabic = "أخبرني عن العلا"
answer_arabic = query_with_auto_entity(query_arabic)
print("\n")
print(answer_arabic)

English Query Result:
Al-Ula is an ancient city located in northwestern Saudi Arabia, renowned for its stunning sandstone mountains, historic tombs, and rich Nabatean heritage. It has historically served as a crossroads for ancient civilizations and has been a center of trade and culture. The city is home to significant archaeological sites, including Mada'in Saleh, and boasts unique rock formations that attract tourists and historians alike.

Under Saudi Arabia's Vision 2030, Al-Ula is being transformed into a world-class tourism destination. The initiative aims to preserve the city's archaeological and cultural heritage while promoting sustainable tourism practices. This ensures that Al-Ula's natural beauty and historical significance are maintained for future generations, making it a key component in the country's broader strategy to diversify its economy and promote cultural tourism.

Here are the relevant media links:



Arabic Query Result:
العلا هي مدينة قديمة تقع في شمال غرب ال

In [16]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import langdetect

# 1. Set up the LLM and RetrievalQA chain
llm = ChatOpenAI(model_name="gpt-4o")

prompt = PromptTemplate(
    template="""
You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question:
{question}
""",
    input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
    chain_type_kwargs={"prompt": prompt}
)

# 2. Helper function to get the first media link by type
def get_first_media_link(media_dict, media_type):
    return media_dict.get(media_type, [None])[0]

# 3. Detect if the query is in Arabic
def detect_language(text):
    try:
        lang = langdetect.detect(text)
        return 'arabic' if lang == 'ar' else 'english'
    except:
        return 'english'

# 4. Main query function: Retrieves answer and media
def query_with_auto_entity(query):
    docs = vectordb.similarity_search(query, k=5)
    entity_doc = docs[0] if docs else None
    if not entity_doc:
        return "No relevant entry found."
    
    response = qa_chain.invoke(query)["result"].strip()

    media = entity_doc.metadata.get("media", {})
    video_link = get_first_media_link(media, "videos")
    image_link = get_first_media_link(media, "images")
    audio_link = get_first_media_link(media, "audio")

    language = detect_language(query)

    if language == 'arabic':
        result = f"{response}\n\nإليك الروابط ذات الصلة:\n"
        if video_link:
            result += f"- فيديو: [شاهد على يوتيوب]({video_link})\n"
        if image_link:
            result += f"- صورة: ![صورة]({image_link})\n"
        if audio_link:
            result += f"- صوت: [استمع هنا]({audio_link})\n"
    else:
        result = f"{response}\n\nHere are the relevant links:\n"
        if video_link:
            result += f"- Video: [Watch on YouTube]({video_link})\n"
        if image_link:
            result += f"- Image: ![Image]({image_link})\n"
        if audio_link:
            result += f"- Audio: [Listen here]({audio_link})\n"

    return result

# 5. Example usage
if __name__ == "__main__":
    queries = [
        "What is Saudi Coffee (Gahwa)?",
        "ما هي القهوة السعودية؟"
    ]
    for query in queries:
        print(f"\nQuery: {query}\n")
        print(query_with_auto_entity(query))
        print("="*80)



Query: What is Saudi Coffee (Gahwa)?

Saudi Coffee (Gahwa) is a traditional beverage in Saudi Arabia that symbolizes hospitality. It is often served with dates during social gatherings. As part of the Vision 2030 initiative, Saudi coffee is promoted as a component of the country's intangible cultural heritage to enhance cultural tourism.

Here are the relevant links:


Query: ما هي القهوة السعودية؟

القهوة السعودية، أو "القهوا"، هي مشروب تقليدي يرمز إلى الضيافة في الثقافة السعودية. عادةً ما تُقدَّم القهوة السعودية مع التمر خلال التجمعات الاجتماعية. تعتبر جزءًا من التراث الثقافي غير المادي الذي يتم الترويج له لتعزيز السياحة الثقافية ضمن رؤية السعودية 2030.

إليك الروابط ذات الصلة:



## 1. Speech-to-Text Using OpenAI Whisper API

In [17]:
import openai
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Function to transcribe audio using OpenAI's Whisper model
def transcribe_audio(file_path: str) -> str:
    # Open the audio file in binary mode
    with open(file_path, "rb") as audio_file:
        # Call the Whisper model to transcribe the audio
        transcript = openai.Audio.transcribe(
            model="whisper-1",
            file=audio_file
        )
    # Return the transcribed text
    return transcript['text']


In [19]:
# Usage English Audio
text = transcribe_audio("BedouinCulture.mp3")
print("Transcribed text:", text)

Transcribed text: Verbal poetry is the most popular art form in Bedouin culture. It was used not only as an art form, but also as a way to pass down stories through the generations, to convey information, and also to maintain social order in Bedouin society. Poets in Bedouin society were highly respected, and Bedouin poems included accounts of historical events, advice to children of the family, and accounts of battles. Traditionally, poems were recited around the campfire at night, and shared along with other stories. In ancient times, Bedouin couldn't read or write, so they relied on this type of verbal expression to pass on their traditions from one generation to the next.


In [None]:
# Usage Arabic Audio
text = transcribe_audio("ArabicAudio.m4a")
print("Transcribed text:", text)

Transcribed text: حكم سيوفك في رقاب العُذَّلي وإذا نزلت بدار ظلٍ فرحلي وإذا بليت بظالمٍ كن ظالمًا وإذا لقيت ذو الجهالة فجهلي وإذا الجبان نهاك يومك كريهةٍ خوفاً عليك من ازدحام الجحفلي فاعصي مقالته ولا تحفل بها واقدم إذا حق اللقى في الأول واختر لنفسك منزلاً تعلو به أو متكريماً تحت ظل القصطلي فالموت لا ينجيك من آفاته حصنٌ ولو شيدته بالجندلي موت الفتى في عزةٍ خير له من أن يبي تأسير طرفٍ أكحلي إن كنت في عدد العبيد فهمتي فوق الثريا والسماك الأعزلي أو أنكرت فرسان عبس النسبتي فسنان رمحي والحسام يقر لي وبذابلي ومهندي نلت العلا لا بالقرابة والعديد الأجزلي ورميت مهري في العجاج فخاضه والنار تقدح من شفار الأنصلي يا نازلين على الحما ودياره هل لا رأيتم في الديار تقلقلي طال عزكم وذلي في الهوى ومن العجاء بعزكم وتذللي لا تسقني ماء الحياة بذلة بل فاسقني بالعز كأس الحنظلي ماء الحياة بذلة كجهنم وجهنم بالعز أطيب منزلي لدعمنا قم بتسجيل الاعجاب بالفيديو والاشتراك في القناة مع مشاركة الفيديو


## 2. Text-to-Speech (TTS) Using ElevenLabs Voices

In [None]:
import requests
from langdetect import detect
import os

# ElevenLabs API key
ELEVEN_API_KEY = 

# Arabic voice ID from ElevenLabs
VOICE_ID_AR =
# English voice ID from ElevenLabs
VOICE_ID_EN = 

def text_to_speech_elevenlabs(text, output_file, lang="en"):
    """
    Converts text to speech using ElevenLabs API and saves it as an audio file.
    
    Parameters:
    - text: The text to convert to speech.
    - output_file: The file name to save the output audio.
    - lang: The language of the text ('ar' for Arabic, 'en' for English).
    """
    try:
        # Select URL based on language (Arabic or English)
        if lang == "ar":
            url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID_AR}"
        else:
            url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID_EN}"

        data = {
            "text": text,
            "model_id": "eleven_monolingual_v1",  # Use "eleven_multilingual_v1" for multilingual support
            "voice_settings": {
                "stability": 0.75,
                "similarity_boost": 0.75
            }
        }

        headers = {
            "xi-api-key": ELEVEN_API_KEY,
            "Content-Type": "application/json"
        }

        # Send the request to the API
        response = requests.post(url, headers=headers, json=data)

        # Save the resulting audio file
        if response.status_code == 200:
            with open(output_file, "wb") as f:
                f.write(response.content)
            print(f"Audio saved successfully as {output_file}")
        else:
            print(f"Request failed: {response.status_code} - {response.text}")
    except Exception as e:
        print(f"An error occurred during text-to-speech conversion: {e}")

def text_to_speech_elevenlabs_pipeline(text, output_file):
    """
    Detects the language of the text and converts it to speech accordingly.
    
    Parameters:
    - text: The text to be converted to speech.
    - output_file: The file name to save the output audio.
    """
    try:
        # Detect language ('en', 'ar', etc.)
        lang = detect(text)
    except Exception:
        lang = 'en'  # fallback to English if language detection fails

    print(f"Detected language: {lang}")
    
    # Convert text to speech using ElevenLabs
    text_to_speech_elevenlabs(text, output_file, lang)

def voice_query_pipeline(audio_file_path: str):
    """
    Handles a voice query: transcribes user speech, retrieves an answer, and converts it to speech.
    
    Parameters:
    - audio_file_path: Path to the input audio file containing the user's voice query.
    
    Returns:
    - output_audio: The path to the output audio file with the answer.
    """
    try:
        # 1. Transcribe user speech to text
        print("Transcribing user speech to text...")
        query_text = transcribe_audio(audio_file_path)
        print("User asked:", query_text)

        # 2. Get answer from LangChain RetrievalQA
        print("Fetching answer from LangChain RetrievalQA...")
        answer_text = query_with_auto_entity(query_text)
        print("Answer:", answer_text)

        # 3. Convert answer text to speech (using ElevenLabs for Arabic/English)
        output_audio = "answer.mp3"
        text_to_speech_elevenlabs_pipeline(answer_text, output_audio)

        # 4. Return or play output_audio as needed
        return output_audio

    except Exception as e:
        print(f"An error occurred during the voice query pipeline: {e}")
        return None

# Example usage
text_en = "Welcome to Al-Ula, a city of ancient wonders and modern vision."
text_ar = "مرحبًا بكم في العلا، مدينة العجائب القديمة والرؤية الحديثة."

# Convert text to speech
text_to_speech_elevenlabs_pipeline(text_en, "alula_en_elevenlabs.mp3")
text_to_speech_elevenlabs_pipeline(text_ar, "alula_ar_elevenlabs.mp3")

# Usage: Assuming the audio file path contains a valid user's question
audio_path = "user_question.wav"
output = voice_query_pipeline(audio_path)

if output:
    print(f"Audio output saved as {output}")
else:
    print("There was an error processing the query.")


Detected language: en
Request failed: 401 - {"detail":{"status":"quota_exceeded","message":"This request exceeds your quota of 10000. You have 0 credits remaining, while 63 credits are required for this request."}}
Detected language: ar
Request failed: 401 - {"detail":{"status":"quota_exceeded","message":"This request exceeds your quota of 10000. You have 0 credits remaining, while 59 credits are required for this request."}}
Transcribing user speech to text...
An error occurred during the voice query pipeline: [Errno 2] No such file or directory: 'user_question.wav'
There was an error processing the query.


In [21]:
def voice_query_pipeline(audio_file_path: str):
    # 1. Transcribe user speech to text
    query_text = transcribe_audio(audio_file_path)
    print("User asked:", query_text)

    # 2. Get answer from LangChain RetrievalQA
    answer_text = query_with_auto_entity(query_text)
    print("Answer:", answer_text)

    # 3. Convert answer text to speech using ElevenLabs (for Arabic/English)
    output_audio = "answer.mp3"
    text_to_speech_elevenlabs_pipeline(answer_text, output_audio)

    # 4. Return or play output_audio as needed
    return output_audio

In [None]:
# Usage
audio_path = "agent/user_question.wav"
voice_query_pipeline(audio_path)

User asked: هل يمكنك أن تخبرني بشيء أكثر عن كخوة؟
Answer: بالطبع! القهوة السعودية، أو كما تُعرف بـ "القهوة العربية" أو "القهَوة"، هي جزء مهم من الثقافة السعودية وتعتبر رمزًا للضيافة. تُقدم القهوة السعودية تقليديًا مع التمر في المناسبات الاجتماعية والزيارات العائلية والاحتفالات. تتميز القهوة السعودية بنكهتها الفريدة التي تُكتسب من خلال إضافة الهيل وأحيانًا الزعفران أو القرنفل أو الزنجبيل، مما يمنحها طعمًا مميزًا ومختلفًا عن أنواع القهوة الأخرى.

في إطار رؤية المملكة العربية السعودية 2030، يتم الترويج للقهوة السعودية كجزء من التراث الثقافي غير المادي بهدف تعزيز السياحة الثقافية. يُعتبر تقديم القهوة السعودية للضيوف تقليدًا عريقًا يعكس الكرم وحسن الضيافة في المجتمع السعودي.

إليك الروابط ذات الصلة:

Detected language: ar
Request failed: 401 - {"detail":{"status":"quota_exceeded","message":"This request exceeds your quota of 10000. You have 0 credits remaining, while 646 credits are required for this request."}}


'answer.mp3'

In [None]:
'''# Usage
audio_path = "D:\Project\Q2.WAV"
voice_query_pipeline(audio_path)'''

User asked: اخبرني عن العلم
Answer: يبدو أنك تسأل عن العلم في سياق رؤية 2030 أو تاريخ توحيد السعودية. إذا كنت تستفسر عن العلم في سياق رؤية 2030، فإنه يعكس جهود المملكة العربية السعودية في تعزيز التعليم والبحث العلمي كجزء من خطتها للتحول الوطني. تهدف رؤية 2030 إلى تحسين جودة التعليم وزيادة الاستثمار في البحوث العلمية والتطوير، وذلك لدعم الاقتصاد القائم على المعرفة.

أما إذا كنت تشير إلى العلم في سياق تاريخ توحيد المملكة، فإن التعليم قبل التوحيد كان محدودًا بسبب الظروف القاسية وقلة الموارد. ومع ذلك، بعد توحيد المملكة تحت قيادة الملك عبدالعزيز، بدأت الجهود لتحسين التعليم ونشر المعرفة كجزء من عملية بناء الدولة وتطويرها.

إليك الروابط ذات الصلة:

Detected language: ar
Audio saved successfully as answer.mp3


'answer.mp3'

## add recoding botton 

In [24]:
from pvrecorder import PvRecorder
import wave
import struct

def record_audio(output_file="user_question.wav", duration=5):
    recorder = PvRecorder(frame_length=512)
    recorder.start()
    frames = []
    
    # Record for 'duration' seconds
    for _ in range(int(duration * 16000 / 512)):  # 16000Hz / 512 samples-per-frame
        pcm_frames = recorder.read()
        # Convert 16-bit integer samples to bytes
        bytes_frames = [struct.pack('h', frame) for frame in pcm_frames]
        frames.extend(bytes_frames)
    
    recorder.stop()
    with wave.open(output_file, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit = 2 bytes
        f.setframerate(16000)
        f.writeframes(b''.join(frames))
    recorder.delete()

# Test recording
record_audio(duration=25)


## EXPERIMENT 1

In [None]:
from pvrecorder import PvRecorder
import wave
import struct
import openai
from langdetect import detect
import requests
from IPython.display import Audio 

# ElevenLabs API key and voice IDs
ELEVEN_API_KEY = 
VOICE_ID_AR = 
VOICE_ID_EN = 

# Function to convert text to speech using ElevenLabs API
def text_to_speech_elevenlabs(text, output_file, lang="en"):
    try:
        if lang == "ar":
            url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID_AR}"
        else:
            url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID_EN}"

        data = {
            "text": text,
            "model_id": "eleven_monolingual_v1",
            "voice_settings": {
                "stability": 0.75,
                "similarity_boost": 0.75
            }
        }

        headers = {
            "xi-api-key": ELEVEN_API_KEY,
            "Content-Type": "application/json"
        }

        response = requests.post(url, headers=headers, json=data)

        if response.status_code == 200:
            with open(output_file, "wb") as f:
                f.write(response.content)
            print(f"Audio saved successfully as {output_file}")
        else:
            print(f"Request failed: {response.status_code} - {response.text}")
    except Exception as e:
        print(f"An error occurred during text-to-speech conversion: {e}")

def record_audio(output_file="user_question.wav", duration=5):
    recorder = PvRecorder(frame_length=512)
    recorder.start()
    frames = []
    
    for _ in range(int(duration * 16000 / 512)):  
        pcm_frames = recorder.read()
        bytes_frames = [struct.pack('h', frame) for frame in pcm_frames]
        frames.extend(bytes_frames)
    
    recorder.stop()
    with wave.open(output_file, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(16000)
        f.writeframes(b''.join(frames))
    recorder.delete()

def voice_query_pipeline():
    # 1. Record audio
    record_audio() 
    
    # 2. Transcribe using Whisper
    audio_file = open("user_question.wav", "rb")
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print("You asked:", transcript['text'])
    
    # 3. Get LangChain answer
    answer = query_with_auto_entity(transcript['text'])
    print("Answer:", answer)
    
    # 4. Convert answer to speech using ElevenLabs
    output_audio = "answer.mp3"
    lang = detect(answer)  # Detect the language of the answer
    text_to_speech_elevenlabs(answer, output_audio, lang)
    
    # 5. Return audio player
    return Audio(output_audio)




In [27]:
# Run the full pipeline
voice_query_pipeline()

You asked: هل يمكنك أن تخبرني بمزيد عن الكهوة؟
Answer: بالطبع! الكهوة السعودية، أو القهوة العربية، هي مشروب تقليدي يرمز إلى الضيافة في الثقافة السعودية. تُقدم الكهوة غالبًا مع التمر خلال التجمعات الاجتماعية، وتُعتبر جزءًا من التراث الثقافي غير المادي في السعودية. تتميز الكهوة بطريقة تحضيرها الفريدة التي تشمل استخدام حبوب البن الخضراء والهيل، مما يمنحها نكهة مميزة. تحت رؤية 2030، يتم الترويج للكهوة السعودية كجزء من الجهود لتعزيز السياحة الثقافية، حيث تُعتبر رمزًا للضيافة والتقاليد السعودية الأصيلة.

إليك الروابط ذات الصلة:

Request failed: 401 - {"detail":{"status":"quota_exceeded","message":"This request exceeds your quota of 10000. You have 0 credits remaining, while 473 credits are required for this request."}}


ValueError: rate must be specified when data is a numpy array or list of audio samples.

## EXPERIMENT 2

In [28]:
from gtts import gTTS
from IPython.display import Audio
import openai


def voice_query_pipeline():
    # 1. Record audio
    record_audio() 
    
    # 2. Transcribe using Whisper
    audio_file = open("user_question.wav", "rb")
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print("You asked:", transcript['text'])
    
    # 3. Get LangChain answer
    answer = query_with_auto_entity(transcript['text'])
    print("Answer:", answer)
    
    # 4. Convert to speech
    tts = gTTS(answer)
    tts.save("answer.mp3")
    return Audio("answer.mp3")

# Run the full pipeline
voice_query_pipeline()



You asked: أخبرني عن الثقافة السعودية
Answer: تعتبر الثقافة السعودية غنية ومتنوعة، تتأثر بتاريخها العريق وموقعها الجغرافي. من الجوانب البارزة في الثقافة السعودية:

1. **الضيافة**: تُعرف السعودية بكرم الضيافة، ويعتبر تقديم القهوة السعودية (القهوة أو "القهوجة") مع التمر رمزاً مهماً للضيافة في المجتمع السعودي.

2. **التراث**: تمتلك المملكة تراثاً ثقافياً غنياً يشمل الموسيقى التقليدية، والرقصات الشعبية مثل العرضة السعودية، والحرف اليدوية كالخزف والنسيج.

3. **الأماكن التاريخية**: تضم السعودية مواقع تاريخية مهمة مثل العلا، التي تُعتبر ممرًا للحضارات القديمة وموطناً للآثار النبطية، ومدينة الدرعية، مهد الدولة السعودية الأولى.

4. **المبادرات الثقافية**: ضمن رؤية 2030، تهدف السعودية لتعزيز تراثها الثقافي وتطويره ليكون جزءاً من السياحة الثقافية، ومن ذلك تعزيز القهوة السعودية كموروث ثقافي غير مادي.

5. **التطور والحداثة**: رغم التمسك بالتراث، تسعى السعودية للتطور والحداثة من خلال مشاريع مثل مدينة نيوم، التي تمثل مزيجاً بين التكنولوجيا والابتكار والاستدامة.

تعكس الثقافة السعودية مزيجاً فريداً من

## EXPERIMENT 3

voice/text input, detects language, and responds in text/voice using the dataset and requirements

In [33]:
%pip install speechrecognition pyaudio langid gtts openai

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting langid
  Downloading langid-1.1.6.tar.gz (1.9 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m6.1 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Building wheels for collected packages: langid
  Building wheel for langid (pyproject.toml) ... [?25ldone
[?25h  Created wheel for langid: filename=langid-1.1.6-py3-none-any.whl size=1941216 sha256=1fbadec2b7a8ddf2c30b77ccd8846094065f1f6e6420b8695128073eb738f094
  Stored in directory: /Users/raneem/Library/Caches/pip/wheels/50/d7/8b/f20e951da531c61b96b87311b48dd1cc6fedaf1e37c581aaee
Successfully built langid
Installing collected packages: langid
Successfully installed langid-1.1.6
Note: you may need to restart the kernel to use updated packages.


In [35]:
from gtts import gTTS
from IPython.display import Audio, display
import openai
import speech_recognition as sr
import langid

def record_audio(duration=5, filename="user_question.wav"):
    """Record audio with automatic language detection"""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = r.record(source, duration=duration)
        with open(filename, "wb") as f:
            f.write(audio.get_wav_data())
        return filename

def detect_language(text):
    """Detect language using langid"""
    lang, confidence = langid.classify(text)
    return lang

def voice_query_pipeline():
    # Input selection
    input_method = input("Enter 'v' for voice or 't' for text: ").lower()
    
    # 1. Input capture
    if input_method == 'v':
        audio_file = record_audio()
        with open(audio_file, "rb") as f:
            transcript = openai.Audio.transcribe("whisper-1", f)
        user_text = transcript['text']
    else:
        user_text = input("Enter your question: ")
    
    print("\nYou asked:", user_text)
    
    # 2. Language detection
    lang = detect_language(user_text)
    print(f"Detected language: {lang}")
    
    # 3. Get LangChain answer
    answer = query_with_auto_entity(user_text)
    print("\nAnswer:", answer)
    
    # 4. Convert to speech
    tts = gTTS(answer, lang=lang if lang in ['en', 'fr', 'es','ar'] else 'en')  # Add more supported languages as needed
    tts.save("answer.mp3")
    
    # 5. Output methods
    output_method = input("\nEnter 'v' for voice or 't' for text output: ").lower()
    
    if output_method == 'v':
        return Audio("answer.mp3")
    else:
        return answer

# Run the full pipeline
result = voice_query_pipeline()
if isinstance(result, Audio):
    display(result)
else:
    print("\nText Answer:", result)


Listening...

You asked: Tell me about Gahwa.
Detected language: en

Answer: Saudi coffee, known as Gahwa, is a traditional beverage in Saudi Arabia that holds significant cultural importance. It is often served as a symbol of hospitality, particularly during social gatherings and special occasions. Gahwa is typically accompanied by dates, enhancing the overall experience of enjoying this traditional drink. In the context of Vision 2030, Gahwa is promoted as part of Saudi Arabia's intangible cultural heritage to enrich cultural tourism experiences for visitors.

Here are the relevant links:



In [37]:
# Run the full pipeline
result = voice_query_pipeline()
if isinstance(result, Audio):
    display(result)
else:
    print("\nText Answer:", result)

Listening...

You asked: ما هي القهوة؟
Detected language: ar

Answer: القهوة السعودية (القهوة العربية) هي مشروب تقليدي يرمز إلى الضيافة، وغالبًا ما يتم تقديمه مع التمور خلال التجمعات الاجتماعية.

Here are the relevant links:



# tools

### 1. Mapbox

In [59]:
load_dotenv()

MAPBOX_TOKEN = os.getenv("MAPBOX_TOKEN_KEY")
if not MAPBOX_TOKEN:
    raise ValueError("not found in environment variables")

In [61]:

response = requests.get(f"https://api.mapbox.com/styles/v1/mapbox/streets-v11?access_token={MAPBOX_TOKEN}")
print(response.status_code)  # Should print 200 if the token works

200


In [64]:
import os
import requests
from urllib.parse import quote_plus
from dotenv import load_dotenv
from IPython.display import Image, display

# Load environment variables
load_dotenv()

# Get Mapbox token
MAPBOX_TOKEN = os.getenv("MAPBOX_TOKEN_KEY")
if not MAPBOX_TOKEN:
    raise ValueError("MAPBOX_TOKEN_KEY not found in environment variables")

# Test if token works (should print 200)
test_response = requests.get(f"https://api.mapbox.com/styles/v1/mapbox/streets-v11?access_token={MAPBOX_TOKEN}")
print("Token Test Status Code:", test_response.status_code)  # Should print 200

# === Functions ===
def locate_place(search_query):
    """Forward geocode place name to coordinates using Mapbox API v5 (restricted to Saudi Arabia)"""
    base_url = "https://api.mapbox.com/geocoding/v5/mapbox.places/"
    query_encoded = quote_plus(search_query)
    url = f"{base_url}{query_encoded}.json"
    
    # Bounding box for Saudi Arabia: [minLon, minLat, maxLon, maxLat]
    saudi_bbox = [34.0, 16.0, 56.0, 32.0]  # Rough box around Saudi borders
    
    params = {
        "access_token": MAPBOX_TOKEN,
        "limit": 1,
        "types": "poi,place,address",
        "bbox": ",".join(map(str, saudi_bbox))  # restrict search to Saudi Arabia
    }
    response = requests.get(url, params=params)
    if response.status_code == 200:
        data = response.json()
        features = data.get("features", [])
        if features:
            feature = features[0]
            coords = feature["geometry"]["coordinates"]  # [lng, lat]
            place_name = feature.get("place_name", "Unknown place")
            return coords, place_name
    return None, None

def get_static_map(lnglat, zoom=14, size=(600, 400)):
    """Generate Mapbox Static Map URL with marker"""
    return (f"https://api.mapbox.com/styles/v1/mapbox/streets-v12/static/"
            f"pin-l-circle+ff0000({lnglat[0]},{lnglat[1]})/"
            f"{lnglat[0]},{lnglat[1]},{zoom}/{size[0]}x{size[1]}"
            f"?access_token={MAPBOX_TOKEN}")

def location_chatbot():
    while True:
        place = input("\nEnter place name (or 'quit' to exit): ")
        if place.lower() == "quit":
            break
        coords, name = locate_place(place)
        if coords:
            print(f"📍 Found: {name}")
            map_url = get_static_map(coords)
            display(Image(url=map_url))
        else:
            print("⚠️ Location not found. Please try again.")

# Run chatbot
location_chatbot()


Token Test Status Code: 200
📍 Found: Addiriyah, Riyadh, Saudi Arabia


📍 Found: Desert Spring Village Residence Street, Al Tanyah First, Dubai, Dubai, United Arab Emirates


📍 Found: طريق الخيل, Al Jadaf, Dubai, Dubai, United Arab Emirates


📍 Found: Al Rub' Al Khali Road, Alkhubar, Eastern, Saudi Arabia


In [44]:
pip install streamlit pydeck pandas

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting protobuf<6,>=3.20 (from streamlit)
  Using cached protobuf-5.29.4-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Using cached protobuf-5.29.4-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 6.31.0rc1
    Uninstalling protobuf-6.31.0rc1:
      Successfully uninstalled protobuf-6.31.0rc1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.72.0rc1 requires protobuf<7.0dev,>=6.30.0, but you have protobuf 5.29.4 which is incompatible.[0m[31m
[0mSuccessfully installed protobuf-5.29.4
Note: you may need to restart the kernel to use updated packages.


In [76]:
import pydeck as pdk
import pandas as pd

# ✅ Set token globally (this avoids the TypeError)
pdk.settings.mapbox_api_key = MAPBOX_TOKEN

map_style = f"https://api.mapbox.com/styles/v1/mapbox/satellite-streets-v12?access_token={MAPBOX_TOKEN}"

# Example Saudi landmark
icon_data = pd.DataFrame({
    'lat': [26.6912],
    'lon': [37.9236],
    'name': ['Al-Ula (Hegra)'],
    'icon': ['marker']
})

icon_layer = pdk.Layer(
    'IconLayer',
    icon_data,
    get_icon='icon',
    get_size=4,
    size_scale=15,
    get_position='[lon, lat]',
    pickable=True,
    icon_atlas="https://raw.githubusercontent.com/visgl/deck.gl-data/master/website/icon-atlas.png",
    icon_mapping={
        "marker": {"x": 0, "y": 0, "width": 128, "height": 128, "anchorY": 128}
    }
)

r = pdk.Deck(
    layers=[icon_layer],
    initial_view_state=view_state,
    map_style=map_style,  # ✅ correct HTTP style URL,
    tooltip={"text": "{name}"}
)

r.show()