# Saudi Tourism Agent

The **Saudi Tourism Data Agent** is an AI-powered assistant designed to retrieve, process, and summarize Saudi tourism statistics and visa regulations. It combines multiple tools and data sources to provide answers to user queries.


## Features

- **Tourism Statistics Search**  
  Fetch official reports and data from the Saudi General Authority for Statistics based on user queries (e.g., "Tourism", "Domestic Travel", "Visa Guidelines").

- **PDF Processing**  
  Extract relevant information from PDF files using Docling for text extraction and indexing.

- **Visa Regulations Search**  
  Search and retrieve specific rules from the pre-indexed Tourist Visa Regulations PDF to answer questions such as "What are the visa requirements for Umrah?".

- **Semantic Search & Q&A**  
  Retrieve relevant content from reports and regulations using Retrieval-Augmented Generation (RAG) techniques.

- **External Tool Integration (ToDo)**  
  Connect with tools like Google Sheets for sharing outputs.


## Data Sources

- **Tourism Reports**:  
  https://www.stats.gov.sa/en/statistics-tabs

- **Tourist Visa Regulations**:  
  https://cdn.mt.gov.sa/mtportal/mt-fe-production/content/policies-regulations/documents/tourism-regulations/Tourist-Visa-Regulations.pdf


## Usage Example

- **User Query**:  
  "Get Tourism Statistics related number of female employees out of total employees in tourism activities"



## Setup and Configuration


1. Create a `.env` file (or a `credentials.json` file) that holds your `TEAM_API_KEY` for authentication, and exclude it from Git via `.gitignore`.
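
As a minimal sketch of this step, the snippet below checks that a required key is actually present before any SDK call is made. It assumes the key is exposed as an environment variable (for example after `load_dotenv()` has run); `require_env` is a hypothetical helper and is not part of aiXplain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Example (run after load_dotenv() has populated os.environ):
# api_key = require_env("TEAM_API_KEY")
```

Failing early this way gives a readable error instead of a `TypeError` from assigning `None` to `os.environ`.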


## 1. Install and Imports

In [3]:
%%capture
!pip install aixplain

In [None]:
# Imports
from dotenv import load_dotenv
import os

# Load keys from .env into the environment
load_dotenv()

os.environ["TEAM_API_KEY"] = os.getenv('TEAM_API_KEY')


from aixplain.enums import Function, Supplier, DataType
from aixplain.modules.model.utility_model import UtilityModelInput
from aixplain.factories import ModelFactory, AgentFactory, IndexFactory
from aixplain.modules.model.record import Record


## 2. Create Statistical Report Fetching Utility

In [20]:
def download_stats_gov_pdf(topic: str = "tourism") -> dict:
    """
    Author: Rahaf Alawwad
    Date: 2025-05-31

    Description:
    Scrapes the Saudi GASTAT statistics page for the latest publication whose
    title starts with '<Topic> Statistics (Establishments)' and returns the
    link to its PDF file. The file itself is not downloaded.

    Parameters:
    - topic (str): Topic of the publication, e.g. "tourism". Its first letter
      is capitalized to build the title prefix to search for.

    Output:
    - If a matching publication with a PDF link is found:
        Returns a dictionary containing:
        {
            "title": The title of the publication,
            "link": The full URL of the PDF
        }
    - If no matching publication is found:
        Returns the string "Target File was not found".
    """
    # Imports are kept inside the function body
    import requests
    from bs4 import BeautifulSoup

    # Fall back to the default topic so that indexing topic[0] cannot fail
    if not topic:
        topic = "tourism"
    topic = topic[0].upper() + topic[1:]
    target_title = f"{topic} Statistics (Establishments)"

    base_url = "https://www.stats.gov.sa"
    url = base_url + '/en/statistics-tabs?tab=436312&category=124304'
    headers = {'User-Agent': 'Mozilla/5.0'}

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each publication is an accordion item: title in the header, links in the body
    for item in soup.find_all('div', class_='accordion-item'):
        header = item.find('a', class_='accordion-button')
        if not header:
            continue
        title_span = header.find('span', class_='td')
        if not (title_span and title_span.text.strip().startswith(target_title)):
            continue
        body = item.find('div', class_='accordion-body')
        if not body:
            continue
        # Return the first link in the body that points to a PDF
        pdf_link_tag = body.find('a', href=lambda href: href and '.pdf' in href)
        if pdf_link_tag:
            return {"title": title_span.text.strip(), "link": base_url + pdf_link_tag['href']}

    return "Target File was not found"
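
Report links on stats.gov.sa, such as the one returned in the example run in section 6, place an identifier segment after the `.pdf` name, so taking the last path segment of the URL yields that identifier rather than the file name. The sketch below shows one way to recover a readable file name using only the standard library. `pdf_filename_from_url` is a hypothetical helper, and treating `+` as a space is an assumption about how the site encodes names.

```python
from typing import Optional
from urllib.parse import unquote_plus, urlsplit


def pdf_filename_from_url(url: str) -> Optional[str]:
    """Return the first path segment ending in '.pdf', or None if there is none."""
    path = urlsplit(url).path  # the '?t=...' query string is not part of the path
    for segment in path.split("/"):
        if segment.lower().endswith(".pdf"):
            # Assumption: '+' stands for a space in these file names
            return unquote_plus(segment)
    return None


link = (
    "https://www.stats.gov.sa/documents/20117/2435281/"
    "Tourism+Establishments+Statistics+Q4+of+2024EN.pdf/"
    "dc72eff1-38ed-895c-e9fc-e1548ea6460e?t=1745469570209"
)
print(pdf_filename_from_url(link))  # Tourism Establishments Statistics Q4 of 2024EN.pdf
```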





In [None]:
# Define parameters for the utility model
input_topic = UtilityModelInput(
    name="topic",
    description="Topic of the statistics, e.g. Tourism",
    type=DataType.TEXT
)

# Create and deploy the utility model
utility = ModelFactory.create_utility_model(
    name="Get Latest SA Gov Stat",
    description="Fetches the title and PDF link of the latest 'Statistics (Establishments)' report for a given topic from the Saudi General Authority for Statistics website.",
    code=download_stats_gov_pdf,
    inputs=[input_topic]
)

utility.deploy()


## 3. Create Dynamic Link Indexing Utility

In [65]:

def index_docling_output(text: str, source: str = "N/A") -> dict:

    from aixplain.factories import IndexFactory
    from aixplain.modules.model.record import Record


    index = IndexFactory.create(
        name="Reports Index",
        description="Index for Statistics reports"
    )

    record = Record(
        id="stat-report",
        value=text,
        attributes={"source": source}
    )
    index.upsert([record])

    return {"status": "Success", "source": source, "record_id": "stat-report"}

# Inputs
input_text = UtilityModelInput(
    name="text",
    description="Text content extracted by Docling",
    type=DataType.TEXT
)

input_source = UtilityModelInput(
    name="source",
    description="Source PDF URL or identifier",
    type=DataType.TEXT
)

# Create the utility model
index_util = ModelFactory.create_utility_model(
    name="Index Docling Output",
    description="Indexes text extracted from PDF by Docling",
    code=index_docling_output,
    inputs=[input_text, input_source]
)

index_util.deploy()
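
`index_docling_output` stores every report under the fixed id `stat-report`. If upserting a record with an existing id replaces the earlier one (an assumption about the index's behavior that is not confirmed here), indexing a second report would overwrite the first. One possible way around this, sketched below, is to derive a stable id from the source URL. `make_record_id` is a hypothetical helper, not part of the aiXplain SDK.

```python
import hashlib


def make_record_id(source: str, prefix: str = "stat-report") -> str:
    """Build a deterministic record id from the source URL or identifier."""
    # The same source always maps to the same id, so re-indexing a report
    # updates its record, while different reports get different ids.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{digest}"


# Possible use inside the utility (left as a comment, since it needs the SDK):
# record = Record(id=make_record_id(source), value=text, attributes={"source": source})
```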


## 4. Static Indexing of the Visa Regulations Document
This step assumes that an updated version of the document will keep the same URL.

In [90]:
# Read PDF using Docling
docling = ModelFactory.get("677bee6c6eb56331f9192a91")  # Docling model ID

# Set the static link
pdf_link = "https://cdn.mt.gov.sa/mtportal/mt-fe-production/content/policies-regulations/documents/tourism-regulations/Tourist-Visa-Regulations-En-V012.pdf"
result = docling.run(pdf_link)
extracted_text = result.data

# Delete older indices with the same name (for experimental purposes)
indices = IndexFactory.list(query="Regulations Index Tourism")
for index in indices["results"]:
    index.delete()


# Create the index
regulations_index = IndexFactory.create(
    name="Regulations Index Tourism",
    description="Index for Saudi HR and tourism regulations"
)

# Upload the extracted text from Docling
record = Record(
    id="tourist-visa-regulations",
    value=extracted_text,
    attributes={"source": pdf_link}
)

regulations_index.upsert([record])

print("Index created and record added.")
print("Index ID:", regulations_index.id)


# Wrap the index as an agent tool
index_tool_regulations = AgentFactory.create_model_tool(
    model=regulations_index.id,
    description="Search index for Saudi HR and tourism regulations"
)


Index created and record added.
Index ID: 683b24823828522c9caae2c8
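
The cell above stores the whole regulations document as a single record. Whether the index splits long records internally is not stated here. If it does not, retrieval may benefit from splitting the text into overlapping chunks before upserting. The sketch below is one simple character-based approach. `chunk_text` and its default sizes are illustrative assumptions, not values taken from this notebook or the SDK.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_chars, sharing `overlap` characters."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


# Possible use with the Record class imported earlier (needs the SDK, so commented out):
# records = [
#     Record(id=f"tourist-visa-regulations-{i}", value=chunk, attributes={"source": pdf_link})
#     for i, chunk in enumerate(chunk_text(extracted_text))
# ]
```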


## 5. Create the Agent

**Agent Role and Instructions**

This agent is designed to answer questions about Saudi tourism by connecting to two distinct data sources:

1. **Tourism Reports**  
   - Source: Saudi General Authority for Statistics  
   - Purpose: Answer questions about **tourism statistics**, such as visitor numbers, trip purposes, and travel patterns.
   - Example Query: "How many domestic trips were taken in 2023?"

2. **Tourist Visa Regulations PDF**  
   - Source: Ministry of Tourism  
   - Purpose: Answer questions about **visa rules and requirements** for different visitor categories.
   - Example Query: "What are the visa requirements for Umrah visitors?"

**Routing Logic**  
The agent automatically determines the correct source based on the query:
- If the query relates to **numbers, trends, or data** → route to **Tourism Reports**.
- If the query relates to **rules, policies, or requirements** → route to **Tourist Visa Regulations**.
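
In this notebook the routing is carried out by the agent's LLM based on its instructions, not by explicit code. Purely to illustrate the rule above, the sketch below mimics it with keyword matching. The keyword lists and the `guess_route` function are hypothetical and are not how the agent is implemented.

```python
# Illustrative keyword lists (assumptions, chosen to match the example queries)
REPORT_HINTS = ("how many", "number", "statistics", "trend", "percentage", "data")
REGULATION_HINTS = ("visa", "rule", "regulation", "policy", "requirement")


def guess_route(query: str) -> str:
    """Return 'reports' or 'regulations' depending on which hints dominate."""
    q = query.lower()
    report_score = sum(hint in q for hint in REPORT_HINTS)
    regulation_score = sum(hint in q for hint in REGULATION_HINTS)
    # Ties (including no hints at all) default to the reports route
    return "regulations" if regulation_score > report_score else "reports"


print(guess_route("How many domestic trips were taken in 2023?"))         # reports
print(guess_route("What are the visa requirements for Umrah visitors?"))  # regulations
```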



In [135]:
agent = AgentFactory.create(
    name="Saudi Visitors Report Agent",
    description="Fetches Saudi tourism reports and answers questions about visa regulations.",
    instructions=(
        "Role: a data retrieval and answering agent for Saudi tourism reports and regulations. "
        "This agent retrieves Saudi tourism data and answers questions about regulations. "
        "For reports and statistics, extract the main topic from the user's query (for example, 'Tourism') and use the first utility to fetch the corresponding PDF link from the Saudi General Authority for Statistics. "
        "Then pass the PDF link to the Docling tool to extract text from the document. "
        "Retrieve the most relevant information based on the user's query and provide a summary along with the report link. "
        "For questions about regulations or visa requirements, use the Regulations Index tool to search the pre-indexed regulations PDF and provide the most relevant answer. "
        "Always answer in the language requested by the user."
    ),
    tools=[
        AgentFactory.create_model_tool(model=utility.id),                  # Fetch PDF links for reports
        AgentFactory.create_model_tool("677bee6c6eb56331f9192a91"),       # Docling for text extraction
        AgentFactory.create_model_tool(model=index_util.id),              # Index for dynamic reports (Not Used)
        index_tool_regulations                                           # Search the Regulations Index
    ],
    llm_id="6646261c6eb563165658bbb1"
)




## 6. Example Runs

In [136]:
# Query statistics, which should invoke the first route
query = "Get Tourism Statistics related number of female employees out of total employees in tourism activities , answer in arabic"
response = agent.run(query)

In [137]:
# The first response was requested in Arabic
response.data.output

'خلال الربع الرابع من عام 2024، بلغ عدد الموظفات في الأنشطة السياحية 128,559، مما يمثل نسبة 13.3% من إجمالي الموظفين في الأنشطة السياحية.'

In [138]:
# Intermediate steps are shown to determine if correct route was taken
response.data.intermediate_steps

[{'agent': 'Saudi Visitors Report Agent',
  'input': "{'input': 'Get Tourism Statistics related number of female employees out of total employees in tourism activities , answer in arabic', 'chat_history': [], 'outputFormat': 'text'}",
  'output': 'خلال الربع الرابع من عام 2024، بلغ عدد الموظفات في الأنشطة السياحية 128,559، مما يمثل نسبة 13.3% من إجمالي الموظفين في الأنشطة السياحية.',
  'tool_steps': [{'tool': 'utilities-aixplain-get_latest_sa_gov_stat',
    'input': "{'topic': 'Tourism'}",
    'output': "{'title': 'Tourism Statistics (Establishments) Q4/2024',\n 'link': 'https://www.stats.gov.sa/documents/20117/2435281/Tourism+Establishments+Statistics+Q4+of+2024EN.pdf/dc72eff1-38ed-895c-e9fc-e1548ea6460e?t=1745469570209'}"},
   {'tool': 'utilities-aixplain-docling',
    'input': "{'data': 'https://www.stats.gov.sa/documents/20117/2435281/Tourism+Establishments+Statistics+Q4+of+2024EN.pdf/dc72eff1-38ed-895c-e9fc-e1548ea6460e?t=1745469570209'}",
    'output': 'Figure5. Average length of

In [104]:
# Query regulations which will invoke the second route
query = "Get Visa Regulations regarding the application"
response = agent.run(query)

In [105]:
# The second response is in the default language, English
response.data.output

'The Tourist Visa Regulations for Saudi Arabia include the following key points: \n\n1. A Tourist Visa is an entry permit granted under specific conditions and can be issued by the Ministry of Foreign Affairs, Saudi diplomatic missions abroad, or upon arrival.\n2. The Visa can be obtained through three main methods: upon arrival, electronically via the platform www.visitsaudi.com, or through Saudi diplomatic missions abroad.\n3. The Visa validity varies: a one-time entry is valid for three months with a stay not exceeding one month, while multiple entries are valid for one year with a stay not exceeding three months.\n4. Applicants must fill out a Visa Application Form, register biometric characteristics, hold a valid passport, obtain medical insurance, register their address in the Kingdom, and pay the visa fee.\n5. The age requirement for tourists is at least 18 years unless accompanied by a guardian.\n6. The fee for a Tourist Visa is SAR 300.\n7. Tourists must adhere to Saudi laws, 

In [106]:
# Intermediate steps are shown to determine if correct route was taken
response.data.intermediate_steps

[{'agent': 'Saudi Visitors Report Agent',
  'input': "{'input': 'Get Visa Regulations regarding the application', 'chat_history': [], 'outputFormat': 'text'}",
  'output': 'The Tourist Visa Regulations for Saudi Arabia include the following key points: \n\n1. A Tourist Visa is an entry permit granted under specific conditions and can be issued by the Ministry of Foreign Affairs, Saudi diplomatic missions abroad, or upon arrival.\n2. The Visa can be obtained through three main methods: upon arrival, electronically via the platform www.visitsaudi.com, or through Saudi diplomatic missions abroad.\n3. The Visa validity varies: a one-time entry is valid for three months with a stay not exceeding one month, while multiple entries are valid for one year with a stay not exceeding three months.\n4. Applicants must fill out a Visa Application Form, register biometric characteristics, hold a valid passport, obtain medical insurance, register their address in the Kingdom, and pay the visa fee.\n5.
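
Reading the full `intermediate_steps` dump to check the route can be tedious. The sketch below pulls out only the tool names, assuming the structure shown in the outputs above (a list of dictionaries, each with a `tool_steps` list whose entries carry a `tool` key). `tools_used` is a hypothetical helper, and `sample_steps` is a shortened stand-in for a real response.

```python
def tools_used(intermediate_steps: list[dict]) -> list[str]:
    """Return the tool names invoked across all agent steps, in call order."""
    names = []
    for step in intermediate_steps:
        # 'tool_steps' may be missing or None when no tool was called
        for tool_step in step.get("tool_steps") or []:
            names.append(tool_step.get("tool", "unknown"))
    return names


# Shortened stand-in mirroring the statistics run shown earlier
sample_steps = [
    {
        "agent": "Saudi Visitors Report Agent",
        "tool_steps": [
            {"tool": "utilities-aixplain-get_latest_sa_gov_stat"},
            {"tool": "utilities-aixplain-docling"},
        ],
    }
]
print(tools_used(sample_steps))
# ['utilities-aixplain-get_latest_sa_gov_stat', 'utilities-aixplain-docling']
```

With a real run, the call would be `tools_used(response.data.intermediate_steps)`.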