# Overview

## Tools
## PDF Search (rag_tool)
This tool **searches a provided PDF** for relevant information.  
For example, if you ask about **trading suspension procedures**, it will find the part of the document that explains it.
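
How `rag_tool` ranks passages internally is not shown in this notebook. As a rough illustration only, matching a query against document chunks can be sketched with a toy bag-of-words similarity search. The helper names (`tokenize`, `cosine`, `search_chunks`) and the sample chunks are hypothetical, and the tool configured later uses learned embeddings rather than word counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_chunks(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:top_k]


# Hypothetical sample chunks, standing in for passages extracted from a PDF
chunks = [
    "An issuer may request a temporary trading suspension for its listed securities.",
    "Annual fees are due at the start of each calendar year.",
]
best = search_chunks("trading suspension procedures", chunks)
```

Here `best` holds the chunk about trading suspension, because it shares two words with the query while the other chunk shares none.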

---

## Web Search (web_search_tool)
This tool **searches the web** based on your query.  
For example, if you ask about **"Navid.sa"**, it will find official sites and LinkedIn profiles.

---

## Router Tool (router_tool)
The **Router Tool** determines whether to use **PDF search** or **web search** based on your question, ensuring the correct tool is used for the task.

---

## Grading Agents
These agents filter and grade the responses to ensure quality and relevance:

- **Retriever Agent**: Retrieves relevant information from the PDF vector store or the web, depending on the router's decision.
- **Grader Agent**: Checks if the information retrieved is relevant.
- **Hallucination Grader**: Ensures the answer is based on actual facts.
- **Answer Grader**: Verifies the answer is accurate and aligned with the question.
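
The order in which these agents hand work to each other can be sketched as plain control flow. This is a simplified illustration under stated assumptions, not how CrewAI schedules tasks: the function `answer_question` and its callable parameters are hypothetical stand-ins for the agents, and only the fallback message is taken from the notebook's answer task.

```python
from typing import Callable

FALLBACK = "Sorry! unable to find a valid response"


def answer_question(
    question: str,
    retrieve: Callable[[str], str],
    is_relevant: Callable[[str, str], bool],
    is_grounded: Callable[[str, str], bool],
    web_search: Callable[[str], str],
) -> str:
    """Retrieve, grade, and fall back to web search when grading fails."""
    context = retrieve(question)
    # Accept the retrieved content only if both graders approve it
    if is_relevant(question, context) and is_grounded(question, context):
        return context
    # Otherwise try the web, and give up with a fixed message if that is empty
    found = web_search(question)
    return found if found else FALLBACK


# Stub callables, used only to exercise the control flow
result = answer_question(
    "What is Mulhem?",
    retrieve=lambda q: "unrelated passage",
    is_relevant=lambda q, c: False,
    is_grounded=lambda q, c: True,
    web_search=lambda q: "Mulhem is an AI model announced by Navid.",
)
```

With these stubs the relevance grader rejects the retrieved passage, so `result` comes from the web search stub.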

---


# Installs

In [63]:
!pip install crewai==0.76.9 crewai_tools==0.13.4 langchain_community==0.3.5



In [64]:
#!pip install CrewAI crewai_tools langchain_community

In [65]:
#!pip install pandas openpyxl

In [66]:
!pip install langchain-groq



In [67]:
# Warning control
import warnings
warnings.filterwarnings('ignore')

In [68]:
!pip install langchain_huggingface



In [69]:
import json
import os

from crewai import Agent, Task, Crew, Process
from crewai_tools import PDFSearchTool, tool
from langchain_community.tools.tavily_search import TavilySearchResults

In [70]:
import os
from google.colab import userdata

# os.environ["GROQ_API_KEY"] = userdata.get('GROK_LLAMA3')

# Read the API keys from Colab's secret store
openai_api_key = userdata.get('OPENAI_API_KEY')
serper_api_key = userdata.get('SERPER_API_KEY')

# Expose the keys and the model name to the libraries through environment variables
os.environ["OPENAI_API_KEY"] = openai_api_key
os.environ["OPENAI_MODEL_NAME"] = 'gpt-4o-mini'
os.environ["SERPER_API_KEY"] = serper_api_key
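
Because later cells fail in less obvious ways when a key is missing, it can help to check the environment up front. The sketch below is one possible way to do that; the helper `missing_keys` is hypothetical, and the list of names simply mirrors the variables this notebook sets.

```python
import os

# Variable names used by this notebook
REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPER_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
example = missing_keys({"OPENAI_API_KEY": "placeholder_key"})
```

Calling `missing_keys()` with no arguments inspects `os.environ`; an empty list means every key is present.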


# Tools

In [71]:
# @title PDF Read Tool

rag_tool = PDFSearchTool(
    pdf='/content/Arabic_ RAG sampels Dataset.pdf',
    config=dict(
        llm=dict(
            provider="openai",  # Using OpenAI instead of Groq
            config=dict(
                model=os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini"),  # Model name from the environment variable set above, with a fallback
                temperature=0.5,  # Optional: Adjust for response variability
                top_p=1,  # Optional: Adjust for nucleus sampling
                stream=True,  # Optional: Enables streaming responses
            ),
        ),
        embedder=dict(
            provider="huggingface",  # Keeping Hugging Face for embeddings
            config=dict(
                model="BAAI/bge-small-en-v1.5",  # Specify the embedding model
                # task_type="retrieval_document",  # Optional, uncomment if required
                # title="Embeddings",  # Optional, uncomment if required
            ),
        ),
    )
)
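
The `model` entry is meant to come from an environment variable. `os.environ.get(name, default)` takes the variable's name as its first argument and returns the default when that variable is unset. The sketch below shows that lookup in isolation; `build_rag_config` is a hypothetical helper, it assumes the variable is called `OPENAI_MODEL_NAME` (the name set earlier in this notebook), and it reproduces only part of the config above.

```python
import os


def build_rag_config(env=None, default_model="gpt-4o-mini"):
    """Build a reduced config dict, reading the LLM name from the environment."""
    env = os.environ if env is None else env
    return dict(
        llm=dict(
            provider="openai",
            config=dict(
                # The first argument is the variable NAME, not the model name
                model=env.get("OPENAI_MODEL_NAME", default_model),
                temperature=0.5,
            ),
        ),
        embedder=dict(
            provider="huggingface",
            config=dict(model="BAAI/bge-small-en-v1.5"),
        ),
    )


configured = build_rag_config({"OPENAI_MODEL_NAME": "gpt-4o"})
fallback = build_rag_config({})
```

Passing a plain dict as `env` keeps the example independent of the real environment.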

In [72]:
# @title Tavily for internet search
os.environ['TAVILY_API_KEY'] = userdata.get('TAVILY_API_KEY')
web_search_tool = TavilySearchResults(k=3)

In [73]:
rag_tool.run("hi what this document about")

Using Tool: Search a PDF's content


'Relevant Content:\nفترة التداول، وذلك حسب الإجراء الآتي: 1. يتقدم المصدر من خلال ممثليه المعينين أمام السوق بطلب تعليق التداول المؤقت لأوراقه المالية المدرجة قبل نصف ساعة على الأقل من وقت بداية التعليق الذي سيحدده المصدر، ويجب أن يحتوي الطلب على الآتي: ○ وقت بداية التعليق ○ مدة التعليق ○ مبررات طلب التعليق\n\nالمؤقت بناء على طلب المصدر 2. عند عدم التزام المصدر بالمواعيد المحددة للإفصاح عن معلوماته المالية الدورية وفق اللوائح التنفيذية ذات العلاقة 3. عند تضمن تقرير مراجع الحسابات على القوائم المالية للمصدر رأي معارض أو امتناع عن إبداء الرأي 4. عند صدور قرار عن الجمعية العامة غير العادية للمصدر بتخفيض رأس ماله وذلك ليومي التداول التاليين لصدور القرار ب. لا تخلّ هذه الإجراءات بالأحكام الواردة في نظام السوق المالية ولوائحه التنفيذية وقواعد السوق والأنظمة الأخرى ذات العلاقة. ج. لا يحول اتخاذ أي إجراء وارد في هذه الإجراءات دون إيقاع الجزاءات المقررة على المصدر في حال خالف أي من أحكام نظام السوق المالية ولوائحه التنفيذية وقواعد السوق. إجراءات تعليق تداول الأوراق المالية المدرجة أولاً: تعليق 

In [74]:
web_search_tool.run("what is Navid.sa")

[{'url': 'https://navid.sa/',
  'content': 'Navid specializes in providing end-to-end data center solutions to cater to the burgeoning demands of AI and computational workloads. We understand the critical role that robust infrastructure plays in powering AI-driven applications, and thus, we are committed to delivering cutting-edge solutions. Our solutions are designed to optimize'},
 {'url': 'https://navid.sa/satellite-technology-solutions/',
  'content': "At Navid, we're at the forefront of harnessing the power of Low Earth Orbit (LEO) satellites to revolutionize communication, monitoring, and data analysis. Our satellite technology opens up new horizons for global connectivity, detailed Earth observation, and advanced data analysis, all designed to meet the demands of tomorrow."},
 {'url': 'https://navid.sa/about/',
  'content': 'Our Vision. Our vision with Navid is to cultivate broad AI expertise that extends beyond foundational areas and ventures into cutting-edge fields. In domain

In [75]:
web_search_tool.run("who is Hamza Shahid whose worked in Navid.SA")


[{'url': 'https://www.facebook.com/public/Hamza-Shahid/',
  'content': 'Hamza Shahid. See Photos. Hamza Shahid (حمزہ شاھد) See Photos. View the profiles of people named Hamza Shahid. Join Facebook to connect with Hamza Shahid and others you may know.'},
 {'url': 'https://www.linkedin.com/in/hamzashahid',
  'content': 'Making Things Work San Francisco, CA. Connect Sean Dillon New York City Metropolitan Area ... Others named Hamza Shahid in United States. Hamza Shahid Edison, NJ. Hamza Shahid'},
 {'url': 'https://navid.sa/',
  'content': "At Navid, we're at the forefront of harnessing the power of Low Earth Orbit (LEO) satellites to revolutionize communication, monitoring, and data analysis. Our satellite technology opens up new horizons for global connectivity, detailed Earth observation, and advanced data analysis, all designed to meet the demands of tomorrow. How it work."},
 {'url': 'https://www.facebook.com/public/Navid-Hamza/',
  'content': 'View the profiles of people named Navid 

In [76]:
web_search_tool.run("what is Mullhem that  in Navid.SA")

[{'url': 'http://navid.sa/mulhem/',
  'content': 'Mulhem. Navid is proud to announce the launch of Mulhem, a groundbreaking artificial intelligence model that marks a significant milestone in the field of AI within the Kingdom of Saudi Arabia.'},
 {'url': 'https://www.linkedin.com/posts/navidco_ملهم-نفيد-mulhem-activity-7247609183062085632-jltI',
  'content': 'For more information: 🌐 navid.sa/contact #LLM #AI #ArtificialIntelligence #AIForBusiness #GenerativeAI #NLP ... Mulheim Rag offers unique features like bilingual capabilities, cultural'},
 {'url': 'https://navid.sa/',
  'content': 'Navid, AI & Infrastructure Solution Provider. Examine the Potential of Infrastructure Connectivity Intelligence Performance Innovation Resilience Security Strategy Insight Ethics With Navid . At Navid, we believe that a key differentiator in an intelligent organization is combining human power with advanced data management and automation technology.'},
 {'url': 'https://navid.sa/about/',
  'content': '
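
Each search call above returns a list of dicts with `url` and `content` keys. If those results are to be handed to an LLM as context, one option is to flatten them into a numbered text block. The helper `format_results` below is a hypothetical sketch, and the sample entries are abridged from the outputs shown above.

```python
def format_results(results, max_chars=200):
    """Turn a list of {'url', 'content'} dicts into a numbered context string."""
    blocks = []
    for i, item in enumerate(results, start=1):
        # Truncate long snippets so the context stays compact
        snippet = item.get("content", "")[:max_chars]
        blocks.append(f"[{i}] {item.get('url', '')}\n{snippet}")
    return "\n\n".join(blocks)


# Abridged sample in the same shape as the search output
sample = [
    {"url": "https://navid.sa/", "content": "Navid, AI & Infrastructure Solution Provider."},
    {"url": "http://navid.sa/mulhem/", "content": "Mulhem. Navid is proud to announce the launch of Mulhem"},
]
context = format_results(sample)
```

The numbering makes it possible to refer back to a source URL from the generated answer.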

In [77]:
@tool
def router_tool(question: str) -> str:
    """Route a question: return 'vectorstore' for document-related questions, otherwise 'websearch'."""
    text = question.lower()
    # Keywords that suggest the answer lives in the indexed PDF / vectorstore
    document_keywords = ('vectorstore', 'embedding', 'pdf', 'document', 'rag')
    if any(keyword in text for keyword in document_keywords):
        return 'vectorstore'
    # Everything else goes to web search; the labels match those expected by the router task
    return 'websearch'
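
Substring checks such as `'rag' in question` also fire on unrelated words like "paragraph". As a hedged alternative, the sketch below matches whole words only and uses the two labels the router task asks for (`'vectorstore'` and `'websearch'`). The plain function `route` and the constant `DOCUMENT_KEYWORDS` are hypothetical and are not wired into the crew.

```python
import re

# Keywords taken from the router cell above
DOCUMENT_KEYWORDS = {"vectorstore", "embedding", "pdf", "document", "rag"}


def route(question: str) -> str:
    """Return 'vectorstore' if any whole word is a document keyword, else 'websearch'."""
    words = set(re.findall(r"\w+", question.lower()))
    return "vectorstore" if words & DOCUMENT_KEYWORDS else "websearch"


pdf_route = route("What does the PDF say about suspension?")
web_route = route("Summarise this paragraph from the news")
```

Keeping the routing rule in a plain function also makes it easy to test without starting an agent.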


# Agents

In [78]:
# @title Router_Agent

Router_Agent = Agent(
  role='Router',
  goal='Route user question to a vectorstore or web search',
  backstory=(
    "You are an expert at routing a user question to a vectorstore or web search. "
    "Use the vectorstore for questions on concepts related to Retrieval-Augmented Generation. "
    "You do not need to be stringent with the keywords in the question related to these topics. Otherwise, use web-search."
  ),
  verbose=True,
  allow_delegation=False
)


In [79]:
# @title Retriever Agent
Retriever_Agent = Agent(
  role="Retriever",
  goal="Use the information retrieved from the vectorstore to answer the question",
  backstory=(
    "You are an assistant for question-answering tasks. "
    "Use the information present in the retrieved context to answer the question. "
    "You have to provide a clear, concise answer."
  ),
  verbose=True,
  allow_delegation=False
)

In [80]:
# @title Grader agent
Grader_agent = Agent(
  role='Retrieval Grader',  # distinct from the 'Answer Grader' role defined below
  goal='Filter out erroneous retrievals',
  backstory=(
    "You are a grader assessing relevance of a retrieved document to a user question. "
    "If the document contains keywords related to the user question, grade it as relevant. "
    "It does not need to be a stringent test. You have to make sure that the answer is relevant to the question."
  ),
  verbose=True,
  allow_delegation=False
)

In [81]:
# @title Hallucination grader
hallucination_grader = Agent(
    role="Hallucination Grader",
    goal="Filter out hallucination",
    backstory=(
        "You are a hallucination grader assessing whether an answer is grounded in / supported by a set of facts. "
        "Make sure you meticulously review the answer and check if the response provided is in alignment with the question asked."
    ),
    verbose=True,
    allow_delegation=False
)


In [82]:
# @title Answer Grader
answer_grader = Agent(
    role="Answer Grader",
    goal="Filter out hallucination from the answer.",
    backstory=(
        "You are a grader assessing whether an answer is useful to resolve a question. "
        "Make sure you meticulously review the answer and check if it makes sense for the question asked. "
        "If the answer is relevant, generate a clear and concise response. "
        "If the answer generated is not relevant, then perform a web search using 'web_search_tool'."
    ),
    verbose=True,
    allow_delegation=False
)

# Tasks

In [83]:
# @title Router task
router_task = Task(
    description=("Analyse the keywords in the question {question}. "
    "Based on the keywords, decide whether it is eligible for a vectorstore search or a web search. "
    "Return a single word 'vectorstore' if it is eligible for vectorstore search. "
    "Return a single word 'websearch' if it is eligible for web search. "
    "Do not provide any other preamble or explanation."
    ),
    expected_output=("Give a binary choice 'websearch' or 'vectorstore' based on the question. "
    "Do not provide any other preamble or explanation."),
    agent=Router_Agent,
    tools=[router_tool],
)

In [84]:
# @title Retriever Task

retriever_task = Task(
    description=("Based on the response from the router task, extract information for the question {question} with the help of the respective tool. "
    "Use the web_search_tool to retrieve information from the web if the router task output is 'websearch'. "
    "Use the rag_tool to retrieve information from the vectorstore if the router task output is 'vectorstore'."
    ),
    expected_output=("You should analyse the output of the 'router_task'. "
    "If the response is 'websearch' then use the web_search_tool to retrieve information from the web. "
    "If the response is 'vectorstore' then use the rag_tool to retrieve information from the vectorstore. "
    "Return a clear and concise text as response."),
    agent=Retriever_Agent,
    context=[router_task],
    tools=[rag_tool, web_search_tool],  # give the retriever access to both tools named in the description
)

In [85]:
# @title Grader Task
grader_task = Task(
    description=("Based on the response from the retriever task for the question {question}, evaluate whether the retrieved content is relevant to the question."
    ),
    expected_output=("Binary score 'yes' or 'no' to indicate whether the document is relevant to the question. "
    "You must answer 'yes' if the response from the 'retriever_task' is in alignment with the question asked. "
    "You must answer 'no' if the response from the 'retriever_task' is not in alignment with the question asked. "
    "Do not provide any preamble or explanations except for 'yes' or 'no'."),
    agent=Grader_agent,
    context=[retriever_task],
)
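
The grading tasks ask for a bare 'yes' or 'no', but a model may still add quotes, capitals, or punctuation. If the score is consumed by code, a tolerant parser can help. The function `parse_binary_score` below is a hypothetical sketch and is not part of CrewAI.

```python
import re


def parse_binary_score(raw: str) -> bool:
    """Map a grader reply to True ('yes') or False ('no'); raise on anything else."""
    # Skip leading quotes or punctuation, then require a whole word 'yes' or 'no'
    match = re.match(r"\W*(yes|no)\b", raw.strip().lower())
    if match is None:
        raise ValueError(f"Expected 'yes' or 'no', got: {raw!r}")
    return match.group(1) == "yes"


relevant = parse_binary_score("Yes.")
irrelevant = parse_binary_score(" 'no' ")
```

Raising on unexpected replies, rather than guessing, makes a misbehaving grader visible early.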

In [86]:
# @title Hallucination_task

hallucination_task = Task(
    description=("Based on the response from the grader task for the question {question}, evaluate whether the answer is grounded in / supported by a set of facts."),
    expected_output=("Binary score 'yes' or 'no' to indicate whether the answer is in sync with the question asked. "
    "Respond 'yes' if the answer is useful and contains facts about the question asked. "
    "Respond 'no' if the answer is not useful and does not contain facts about the question asked. "
    "Do not provide any preamble or explanations except for 'yes' or 'no'."),
    agent=hallucination_grader,
    context=[grader_task],
)


In [87]:
# @title Answer Task

answer_task = Task(
    description=("Based on the response from the hallucination task for the question {question}, evaluate whether the answer is useful to resolve the question. "
    "If the answer is 'yes', return a clear and concise answer. "
    "If the answer is 'no', then perform a 'websearch' and return the response."),
    expected_output=("Return a clear and concise response if the response from 'hallucination_task' is 'yes'. "
    "Perform a web search using 'web_search_tool' and return a clear and concise response only if the response from 'hallucination_task' is 'no'. "
    "Otherwise respond as 'Sorry! unable to find a valid response'."),
    context=[hallucination_task],
    agent=answer_grader,
    tools=[web_search_tool],  # needed for the web search fallback described above
)

# Crew

In [88]:
rag_crew = Crew(
    agents=[Router_Agent, Retriever_Agent, Grader_agent, hallucination_grader, answer_grader],
    tasks=[router_task, retriever_task, grader_task, hallucination_task, answer_task],
    verbose=True,

)



In [89]:
# @title Input
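# English: "If the issuer is unable to publish its periodic financial information within the period specified in the implementing regulations, what procedures does the Saudi Exchange (Tadawul) follow? State in detail the timing of each step."
inputs = {"question": "في حالة عدم تمكن المصدر من نشر معلوماته المالية الدورية خلال المهلة المحددة في اللوائح التنفيذية، ما هي الإجراءات التي تتبعها السوق المالية السعودية (تداول)؟ واذكر بالتفصيل التوقيتات المحددة لكل خطوة."}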


In [90]:
# @title Result
result = rag_crew.kickoff(inputs=inputs)

[1m[95m# Agent:[00m [1m[92mRouter[00m
[95m## Task:[00m [92mAnalyse the keywords in the question في حالة عدم تمكن المصدر من نشر معلوماته المالية الدورية خلال المهلة المحددة في اللوائح التنفيذية، ما هي الإجراءات التي تتبعها السوق المالية السعودية (تداول)؟ واذكر بالتفصيل التوقيتات المحددة لكل خطوة.Based on the keywords decide whether it is eligible for a vectorstore search or a web search.Return a single word 'vectorstore' if it is eligible for vectorstore search.Return a single word 'websearch' if it is eligible for web search.Do not provide any other premable or explaination.[00m


[1m[95m# Agent:[00m [1m[92mRouter[00m
[95m## Final Answer:[00m [92m
websearch[00m


[1m[95m# Agent:[00m [1m[92mRetriever[00m
[95m## Task:[00m [92mBased on the response from the router task extract information for the question في حالة عدم تمكن المصدر من نشر معلوماته المالية الدورية خلال المهلة المحددة في اللوائح التنفيذية، ما هي الإجراءات التي تتبعها السوق المالية السعودية (تداول)؟

In [91]:
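# Note: the dict below is a bare expression and is never assigned, so `inputs`
# still holds the previous question and the run that follows repeats it.
# To ask this follow-up question, assign it first: inputs = {"question": "..."}.
# English: "What is the maximum duration of the suspension of trading in securities
# if the issuer does not publish its financial information after the specified period has ended?"
{
  "inputs": {
    "question": "ما هو الحد الأقصى لمدة تعليق تداول الأوراق المالية إذا لم ينشر المصدر معلوماته المالية بعد انتهاء المهلة المحددة؟"
  }
}


result = rag_crew.kickoff(inputs=inputs)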

[1m[95m# Agent:[00m [1m[92mRouter[00m
[95m## Task:[00m [92mAnalyse the keywords in the question في حالة عدم تمكن المصدر من نشر معلوماته المالية الدورية خلال المهلة المحددة في اللوائح التنفيذية، ما هي الإجراءات التي تتبعها السوق المالية السعودية (تداول)؟ واذكر بالتفصيل التوقيتات المحددة لكل خطوة.Based on the keywords decide whether it is eligible for a vectorstore search or a web search.Return a single word 'vectorstore' if it is eligible for vectorstore search.Return a single word 'websearch' if it is eligible for web search.Do not provide any other premable or explaination.[00m


[1m[95m# Agent:[00m [1m[92mRouter[00m
[95m## Final Answer:[00m [92m
websearch[00m


[1m[95m# Agent:[00m [1m[92mRetriever[00m
[95m## Task:[00m [92mBased on the response from the router task extract information for the question في حالة عدم تمكن المصدر من نشر معلوماته المالية الدورية خلال المهلة المحددة في اللوائح التنفيذية، ما هي الإجراءات التي تتبعها السوق المالية السعودية (تداول)؟

In [92]:
print(result)


في حالة عدم تمكن المصدر من نشر معلوماته المالية الدورية خلال المهلة المحددة في اللوائح التنفيذية، تقوم السوق المالية السعودية (تداول) باتباع الإجراءات التالية: 

1. **إشعار المصدر**: يتم إرسال إشعار إلى المصدر يطلب فيه تبرير عدم نشر المعلومات في الوقت المحدد.
2. **تحديد المهلة الجديدة**: تُحدد السوق المالية فترة زمنية إضافية للمصدر لنشر المعلومات المطلوبة، عادة ما تكون فترة قصيرة لا تتجاوز أسبوعين.
3. **تنبيه للمستثمرين**: تُقوم السوق بإبلاغ المستثمرين عن تأخير نشر المعلومات من المصدر لتوفير الشفافية.
4. **فرض العقوبات**: في حالة عدم الالتزام بالمواعيد الجديدة، يمكن أن تفرض السوق المالية عقوبات على المصدر، مثل الغرامات أو تعليق التداول في أسهمه.
5. **متابعة الحالات**: تُتابع السوق الحالة من خلال مراجعة مدى تقدم المصدر في نشر المعلومات المطلوبة.

يرجى مراجعة اللوائح التنفيذية الخاصة بالسوق المالية للحصول على تفاصيل دقيقة حول التوقيتات المحددة لكل خطوة.
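
To ask several questions in turn, each call to `kickoff` needs its own `inputs` dict. The sketch below shows one way to loop over questions. `ask_all` is a hypothetical helper, and `EchoCrew` is a stand-in used only so the example runs without API keys; in the notebook the first argument would be `rag_crew`.

```python
def ask_all(crew, questions):
    """Call crew.kickoff once per question and collect (question, answer) pairs."""
    answers = []
    for question in questions:
        # Build a fresh inputs dict for every question
        result = crew.kickoff(inputs={"question": question})
        answers.append((question, str(result)))
    return answers


class EchoCrew:
    """Hypothetical stand-in for `rag_crew`, used only to show the call pattern."""

    def kickoff(self, inputs):
        return f"answer to: {inputs['question']}"


pairs = ask_all(EchoCrew(), ["first question", "second question"])
```

Building the dict inside the loop avoids accidentally reusing an earlier question.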
