# 📌 Step 1: Install Required Libraries  
Before running the notebook, ensure you have the necessary libraries installed.

```python
# Install the required libraries (the code below uses the mistralai v1 client, `Mistral`)
!pip install faiss-cpu "mistralai>=1.0" beautifulsoup4 requests numpy
```

# 📌 Step 2: Set Up API Key  
We'll use Mistral AI for embedding generation and chat-based responses.  
Replace `"YOUR_API_KEY"` with your actual Mistral API key.

```python
import os

# Read the Mistral API key from the environment. Replace "YOUR_API_KEY" only for
# quick local tests, and never share a notebook that contains a real key.
api_key = os.getenv("MISTRAL_API_KEY", "YOUR_API_KEY")

# Sanity check that a real key is configured, without printing the secret itself
print("MISTRAL_API_KEY configured:", api_key != "YOUR_API_KEY")
```
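Instead of only printing a boolean, a notebook can fail fast when the key is missing. The sketch below uses a hypothetical helper, `require_api_key`, that raises a clear error if `MISTRAL_API_KEY` is unset, empty, or still holds the `"YOUR_API_KEY"` placeholder; the helper name and error wording are assumptions, not part of the Mistral SDK.

```python
import os


def require_api_key(var_name="MISTRAL_API_KEY", placeholder="YOUR_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: fails fast when the variable is unset, empty,
    or still holds the placeholder value.
    """
    key = os.environ.get(var_name, "").strip()
    if not key or key == placeholder:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell before starting "
            f"Jupyter (e.g. export {var_name}=...) and restart the kernel."
        )
    return key
```

In the notebook this would replace the `os.getenv` line with `api_key = require_api_key()`, so later cells never run with a placeholder key.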


# 📌 Step 3: Web Scraping - Extract Sports Facility Data  
We'll scrape the **UDST Sports Facilities** webpage to extract facility names and descriptions.

```python
import requests
from bs4 import BeautifulSoup

# UDST Sports Facilities webpage URL
url = "https://www.udst.edu.qa/sport-and-wellness/our-facilities"

# Send a GET request (with a timeout so the cell cannot hang indefinitely)
response = requests.get(url, timeout=30)
response.raise_for_status()  # Raise an error for 4xx/5xx responses

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Find all facility sections (adjust these class names if the page markup changes)
facility_sections = soup.find_all('div', class_='field-content')

facilities = []

# Extract each facility's name and description
for facility_section in facility_sections:
    facility_name_tag = facility_section.find('h3')  # Facility name
    facility_description_tag = facility_section.find('div', class_='field-name-body')  # Description container

    if facility_name_tag and facility_description_tag:
        facility_name = facility_name_tag.get_text(strip=True)

        # Join the text of all <li> items in the description section
        description_list = facility_description_tag.find_all('li')
        facility_description = " ".join(item.get_text(strip=True) for item in description_list)

        # Remove the "Book Now" call-to-action if present
        facility_description = facility_description.replace("Book Now", "").strip()

        facilities.append({'name': facility_name, 'description': facility_description})

# Print the extracted facility details
for facility in facilities:
    print(f"Facility Name: {facility['name']}")
    print(f"Description: {facility['description']}")
    print('-' * 80)
```

```
Facility Name: Natural Grass Cricket Ground
Description: Caters for both hardball and MRI. Appropriate lighting requirements for hosting evening/night events Amenities including covered seating, proper stump configuration, bowlers back drops
--------------------------------------------------------------------------------
Facility Name: Turf Football Pitch
Description: A FIFA International standard 11v11 sectioned into three blue “cross pitches” Accommodates 7v7,8v8 or 9v9
--------------------------------------------------------------------------------
Facility Name: Outdoor Padel Courts
Description: International standards, highly durable Outdoor Padel courts designed to withstand extreme weather conditions. Designated courts (Court 1) are adjusted to ensure privacy for female users.
--------------------------------------------------------------------------------
Facility Name: Running Track
Description: International Athletics Federation approved 8-lane running track 8+ Participants p
```
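Because the selectors (`field-content`, `field-name-body`, `h3`, `li`) depend on the page's current markup, it can help to test them offline. The sketch below wraps the same extraction logic in a hypothetical `extract_facilities` function and runs it on a small hand-written HTML snippet; the snippet is invented for illustration and is not copied from the UDST site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure the scraper expects;
# the real page may differ, so inspect it with your browser's developer tools.
SAMPLE_HTML = """
<div class="field-content">
  <h3>Example Court</h3>
  <div class="field-name-body">
    <ul><li>Indoor hall</li><li>60-minute sessions</li><li>Book Now</li></ul>
  </div>
</div>
<div class="field-content"><p>Section without a heading</p></div>
"""


def extract_facilities(html):
    """Apply the scraping cell's selectors to an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    facilities = []
    for section in soup.find_all("div", class_="field-content"):
        name_tag = section.find("h3")
        body_tag = section.find("div", class_="field-name-body")
        if not (name_tag and body_tag):
            continue  # Skip sections that lack a name or a description
        description = " ".join(li.get_text(strip=True) for li in body_tag.find_all("li"))
        description = description.replace("Book Now", "").strip()
        facilities.append({"name": name_tag.get_text(strip=True), "description": description})
    return facilities


print(extract_facilities(SAMPLE_HTML))
```

If the real page returns an empty list, the class names are the first thing to check.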

# 📌 Step 4: Preprocess Text for FAISS  
Each non-empty facility description becomes one chunk. Because the descriptions are short, we only normalize whitespace rather than splitting them further before embedding.

```python
# Keep only facilities with a non-empty description (one chunk per facility)
chunks = [facility['description'] for facility in facilities if facility['description']]

# Collapse runs of whitespace and newlines into single spaces
chunks = [" ".join(chunk.split()) for chunk in chunks]

print("Total Chunks for Indexing:", len(chunks))  # Debugging check
```

```
Total Chunks for Indexing: 19
```
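If longer descriptions are added later, splitting each one into overlapping word windows keeps every chunk a manageable size while the overlap preserves some context across boundaries. The sketch below uses a hypothetical `split_into_chunks` helper; the default window (50 words) and overlap (10 words) are illustrative values, not tuned settings.

```python
def split_into_chunks(text, max_words=50, overlap=10):
    """Split text into overlapping windows of at most `max_words` words.

    Hypothetical helper; consecutive windows share `overlap` words.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # This window already reaches the end of the text
    return windows


print(split_into_chunks(" ".join(f"w{i}" for i in range(10)), max_words=4, overlap=1))
```

Applied to this notebook, it would be `chunks = [piece for c in chunks for piece in split_into_chunks(c)]`; descriptions shorter than `max_words` pass through as a single chunk.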


# 📌 Step 5: Generate Text Embeddings with Mistral  
We'll convert each chunk into a vector embedding using the **Mistral embedding model** (`mistral-embed`).

```python
import numpy as np
from mistralai import Mistral

# Embed a list of text chunks with the mistral-embed model.
# Returns one result object per input, each exposing an .embedding list of floats.
def get_text_embedding(list_txt_chunks):
    client = Mistral(api_key=api_key)
    embeddings_batch_response = client.embeddings.create(model="mistral-embed", inputs=list_txt_chunks)
    return embeddings_batch_response.data

# Generate embeddings for all text chunks in a single request
text_embeddings = get_text_embedding(chunks)

print("Number of Embeddings:", len(text_embeddings))  # Should equal the number of chunks
```

```
Number of Embeddings: 19
```
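The cell above sends every chunk in one request, which works here for 19 short chunks. For a larger corpus, a per-request size limit may apply (the exact limit is not stated in this notebook), so a batching wrapper is a reasonable precaution. The sketch below uses a hypothetical `embed_in_batches` helper that accepts any embedding function; `batch_size=16` is an illustrative value, not a documented limit.

```python
def embed_in_batches(texts, embed_fn, batch_size=16):
    """Embed `texts` in fixed-size batches and return the vectors in input order.

    Hypothetical helper: `embed_fn` takes a list of strings and returns
    one vector per string.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        batch_vectors = embed_fn(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_fn must return one vector per input text")
        vectors.extend(batch_vectors)
    return vectors


# Offline demo with a stand-in embedding function (vector = [text length])
print(embed_in_batches(["a", "bb", "ccc"], lambda batch: [[float(len(t))] for t in batch], batch_size=2))
```

With Mistral, `embed_fn` could be `lambda batch: [d.embedding for d in get_text_embedding(batch)]`.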


# 📌 Step 6: Store Embeddings in FAISS Index  
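We'll store the embeddings in a **FAISS** index for fast similarity search. `IndexFlatL2` performs exact (brute-force) search over squared Euclidean distances, which is plenty fast for a few dozen vectors.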

```python
import faiss

# Embedding dimension (length of one vector)
d = len(text_embeddings[0].embedding)

# Exact (brute-force) index using squared Euclidean (L2) distance
index = faiss.IndexFlatL2(d)

# FAISS expects a 2-D float32 array of shape (n_vectors, d)
embeddings_array = np.array([embedding.embedding for embedding in text_embeddings], dtype="float32")
index.add(embeddings_array)

print("Total Embeddings Indexed:", index.ntotal)  # Should equal the number of chunks
```

```
Total Embeddings Indexed: 19
```
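`IndexFlatL2` compares the query against every stored vector and returns the k smallest squared Euclidean distances. The NumPy sketch below reproduces that ranking for small arrays, which can be handy for sanity-checking the index; `flat_l2_search` is a hypothetical helper, not a FAISS API.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbor search by squared Euclidean distance.

    Returns (distances, indices), each shaped (n_queries, k), nearest first.
    """
    database = np.asarray(database, dtype="float32")
    queries = np.asarray(queries, dtype="float32")
    # Squared distance between every query and every database vector
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Indices of the k smallest distances per query, in ascending order
    idx = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    dists = np.take_along_axis(sq_dists, idx, axis=1)
    return dists, idx


D_np, I_np = flat_l2_search([[0, 0], [1, 0], [0, 2]], [[0.9, 0]], k=2)
print(I_np, D_np)
```

Called as `flat_l2_search(embeddings_array, question_embedding, k=2)`, it should return the same indices as `index.search`, up to float rounding and ties.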


# 📌 Step 7: Retrieve Relevant Chunks from FAISS  
We'll query FAISS to find the most relevant chunks for a given question.

```python
question = "What are the ways I can use sports facilities?"
# Embed the question like the chunks; FAISS expects a 2-D float32 array
question_embedding = np.array([get_text_embedding([question])[0].embedding], dtype="float32")

# Retrieve the 2 nearest chunks
D, I = index.search(question_embedding, k=2)

# Print retrieved chunk indices and their squared L2 distances (lower means closer)
print("Indices:", I)
print("Scores:", D)

# Look up the text of the retrieved chunks
retrieved_chunks = [chunks[i] for i in I.tolist()[0]]

# Join the retrieved chunks into a single context string
retrieved_text = " ".join(retrieved_chunks).replace("\n", " ").strip()

print("Retrieved Chunk:", retrieved_text)
```

```
Indices: [[14 13]]
Scores: [[0.44209975 0.5072693 ]]
Retrieved Chunk: Multi-sport Courts suitable for: Basketball, Futsal, Handball, Volleyball and Tennis 60-minute sessions High-quality indoor multi-sport hall provides the ideal surface Amenities for Basketball, Volleyball, Badminton, and Futsal. Court Dimensions: 35.5mx20.5m.
```
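The values in `D` are squared L2 distances, so lower means more similar. If the embeddings have unit length, they convert to cosine similarity through the identity $\|a-b\|^2 = 2 - 2\cos(a,b)$. Whether `mistral-embed` vectors are unit-normalized is an assumption to verify, for example with `np.linalg.norm(embeddings_array, axis=1)`. The helper name `squared_l2_to_cosine` is hypothetical.

```python
import numpy as np


def squared_l2_to_cosine(sq_dists):
    """Convert squared L2 distances between unit-length vectors to cosine similarity.

    Uses ||a - b||^2 = 2 - 2 * cos(a, b), which holds only when both vectors
    have norm 1.
    """
    return 1.0 - np.asarray(sq_dists, dtype="float64") / 2.0


# Check the identity on two random unit vectors
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 8))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
sq_dist = np.sum((a - b) ** 2)
print(squared_l2_to_cosine(sq_dist), float(a @ b))  # The two values should agree
```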


# 📌 Step 8: Generate Answer Using Mistral  
We'll use Mistral to generate a structured answer based on the retrieved text.
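Step 7 joins the retrieved chunks into one string, so the model cannot tell where one facility's description ends and the next begins. One alternative, sketched here as a hypothetical `build_prompt` helper, keeps the same instructions but numbers each chunk as `[1]`, `[2]`, and so on. This is an illustrative variation, not the prompt used to produce the answer shown below.

```python
PROMPT_TEMPLATE = """You are an AI assistant that provides structured answers based on retrieved knowledge.

Context:
---------------------
{context}
---------------------

Answer the following question in a structured format with numbered points.

Query: {question}

Answer:
"""


def build_prompt(question, retrieved_chunks):
    """Format retrieved chunks as a numbered context block and fill the template."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


print(build_prompt("Which courts are available?", ["Multi-sport courts.", "Indoor hall."]))
```

In the notebook this would be called as `build_prompt(question, retrieved_chunks)`.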


```python
from mistralai import UserMessage

# Define the prompt: retrieved context first, then the question
prompt = f"""
You are an AI assistant that provides structured answers based on retrieved knowledge.

Context:
---------------------
{retrieved_text}
---------------------

Answer the following question in a structured format with numbered points.

Query: {question}

Answer:
"""

# Send a single user message to a Mistral chat model and return the reply text
def mistral(user_message, model="mistral-large-latest"):
    client = Mistral(api_key=api_key)
    messages = [UserMessage(content=user_message)]

    chat_response = client.chat.complete(
        model=model,  # Honor the caller's model choice
        messages=messages,
    )

    return chat_response.choices[0].message.content

# Get the AI-generated response
response = mistral(prompt)
print(response)
```

Based on the provided context, here are the ways you can use the sports facilities:

1. **Sport Activities**:
   - **Basketball**: You can play basketball games or have practice sessions.
   - **Futsal**: The facility is suitable for futsal games and practices.
   - **Handball**: You can use the court for handball games.
   - **Volleyball**: The court can accommodate volleyball games and practices.
   - **Tennis**: The facility is also suitable for tennis games.
   - **Badminton**: You can play badminton games or have practice sessions.

2. **Session Duration**:
   - You can book the courts in 60-minute sessions for any of the above sports.

3. **Amenities**:
   - The high-quality indoor multi-sport hall provides amenities specifically designed for Basketball, Volleyball, Badminton, and Futsal. This could include appropriate court markings, nets, and hoops.

4. **Court Dimensions**:
   - The court dimensions are 35.5 meters by 20.5 meters, which accommodate the mentioned sports. You ca