# **Introduction**

---

## Groundedness Definition
**Groundedness**: Whether a response is rooted, connected, or anchored in reality or specific context; based on factual information rather than invented content.

## Ungrounded Prompts and Responses

### Characteristics
- Model uses only training data (large volume of uncontextualized text)
- Responses are grammatically coherent and logical
- **Problem**: Uncontextualized, potentially inaccurate, may include invented information

### Example Issue
Question: "Which product should I use to do X?"
**Risk**: Response may include details of fictional products

## Grounded Prompts and Responses

### Approach
Use a data source to ground the prompt with relevant, factual context before submitting to language model

### Process
1. Retrieve relevant data from data source
2. Include grounding data with prompt
3. Language model generates contextualized, relevant, and accurate response
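
The three steps above can be sketched in plain Python. This is only an illustration: `CATALOG`, `retrieve`, and `build_grounded_prompt` are hypothetical names, the product entries are made up, and naive word overlap stands in for a real search index.

```python
import re

# Toy product catalog standing in for a real data source (made-up entries).
CATALOG = [
    "TrailMaster tent: sleeps four, waterproof, weighs 3 kg.",
    "SummitClimber boots: insulated hiking boots for cold weather.",
    "RiverRun kayak: single-seat kayak for calm water.",
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_n=2):
    """Step 1: rank documents by the number of words shared with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_n] if score > 0]


def build_grounded_prompt(question, documents):
    """Step 2: place the retrieved grounding data in the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Step 3 would send this prompt to the language model.
prompt = build_grounded_prompt(
    "Which boots should I use for cold weather hiking?", CATALOG
)
print(prompt)
```

Because the model sees only catalog entries that matched the question, it has less room to describe products that do not exist.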

### Data Source Examples
- Product catalog databases
- Corporate documentation
- Knowledge bases
- Real-time data repositories

### Benefits
- Responses based on actual facts
- Contextually relevant information
- Accurate details from verified sources
- Reduced hallucination and invented content

## Application
Build chat-based language model applications (agents) that are grounded using your own data to ensure accurate, factual responses.

## Key Takeaway
Grounding uses external data sources to provide factual context for prompts, resulting in accurate, contextualized responses instead of potentially invented information from training data alone.


# **Make your data searchable**

---

## Azure AI Search Integration
**Purpose**: Retrieve relevant context for chat flows in Azure AI Foundry
**Function**: Acts as retriever in language model applications built with prompt flow

## Vector Embeddings

### Definition
**Embedding**: Special data format represented as vector of floating-point numbers; enables semantic similarity calculation

### Example
Two semantically related documents:
- "The children played joyfully in the park."
- "Kids happily ran around the playground."

**Result**: Vector embeddings capture semantic relationship despite different words

### Cosine Similarity
**Method**: Measure the cosine of the angle between two vectors; the smaller the angle (cosine closer to 1), the more similar the content
**Purpose**: Compute semantic similarity between documents and queries
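
A minimal sketch of the calculation follows. The three-dimensional vectors are made up for illustration; real embedding models produce vectors with far more dimensions, and `tax_report` is a hypothetical unrelated document added for contrast.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: the first two point in a similar direction.
children_park = [0.9, 0.1, 0.3]     # "The children played joyfully in the park."
kids_playground = [0.8, 0.2, 0.4]   # "Kids happily ran around the playground."
tax_report = [0.1, 0.9, 0.0]        # hypothetical unrelated document

related = cosine_similarity(children_park, kids_playground)
unrelated = cosine_similarity(children_park, tax_report)
print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

The related pair scores close to 1, while the unrelated pair scores much lower.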

### Benefits
- Extract relevant context across different formats (text, images)
- Work across multiple languages
- Find semantically similar content, not just keyword matches

## Creating Search Indexes

### Search Index Purpose
Describes how content is organized to make it searchable (like library catalog for books)

### Index Creation in Azure AI Foundry
1. Add data to Azure AI Foundry
2. Use Azure AI Search to create index using embedding model
3. Index stored in Azure AI Search, queried by Azure AI Foundry in chat flows

### Embedding Models
Use Azure OpenAI embedding models available in Azure AI Foundry to create embeddings for vector search

## Search Techniques

### Keyword Search
**Method**: Identifies documents based on specific keywords or exact terms
**Use case**: Precise matches when exact terminology is known

### Semantic Search
**Method**: Understands query meaning, matches semantically related content
**Advantage**: Goes beyond exact keyword matches using semantic models

### Vector Search
**Method**: Uses mathematical representations (vectors) to find similar content based on semantic meaning
**Advantage**: Captures contextual relationships that keyword matching misses

### Hybrid Search ⭐ (Recommended for Generative AI)
**Method**: Combines keyword, vector, and optionally semantic ranking
**Execution**: Queries run in parallel, results unified in single set
**Advantage**: Most accurate results for generative AI applications

## Hybrid Search Details

### How It Works
- **Single query request** with both full text and vector parameters
- **Parallel execution** of keyword and vector searches
- **Reciprocal Rank Fusion (RRF)** merges results into unified ranked output

### Key Components
1. **Full text search**: Uses BM25 ranking for keyword matches
2. **Vector search**: Uses HNSW (approximate nearest neighbor) or exhaustive KNN (eKNN) algorithms for similarity
3. **RRF algorithm**: Merges and ranks unified results
4. **Optional semantic ranking**: Applies machine reading comprehension for relevance
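
The fusion step can be sketched as follows. This shows the general RRF formula, where each document scores `1 / (rank_constant + rank)` in every list it appears in. The value 60 is a commonly used constant and an assumption here, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Merge several ranked lists of document IDs into one ranked list."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes the most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_results = ["doc_a", "doc_b", "doc_c"]  # e.g. BM25 order
vector_results = ["doc_c", "doc_a", "doc_d"]   # e.g. vector similarity order

fused = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (`doc_a`, `doc_c`) rise above documents that appear in only one.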

### Query Structure Elements
- **search**: Full text query parameter
- **vectorQueries**: Multiple vector queries targeting vector fields
- **select**: Fields to return in results
- **filter**: Include/exclude criteria (e.g., geospatial, property filters)
- **facets**: Compute facet buckets over results
- **queryType=semantic**: Invoke semantic ranker (optional)
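
As a rough sketch, a hybrid query body using these elements could be assembled like this. The key names mirror the elements listed above; the field names (`content_vector`, `title`, `content`, `category`), the filter expression, and the semantic configuration name are hypothetical. Check the exact request shape against the Azure AI Search API version in use.

```python
import json


def build_hybrid_query(search_text, query_vector, use_semantic_ranker=False):
    """Build a request body combining full text and vector parameters."""
    body = {
        "search": search_text,              # full text query
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,     # embedding of the query text
                "fields": "content_vector",  # hypothetical vector field
                "k": 50,                    # nearest neighbors to return
            }
        ],
        "select": "title, content",         # hypothetical fields to return
        "filter": "category eq 'hotels'",   # hypothetical filter
        "facets": ["category"],
        "top": 10,
    }
    if use_semantic_ranker:
        body["queryType"] = "semantic"
        body["semanticConfiguration"] = "my-semantic-config"  # hypothetical name
    return body


query = build_hybrid_query("hotels near the park", [0.1, 0.2, 0.3])
print(json.dumps(query, indent=2))
```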

### Advantages
**Precision + Recall**:
- Keyword search provides precision with exact matches
- Vector search provides recall with conceptual similarity
- Combines strengths of both approaches

**Best Performance Scenarios**:
- Product codes and IDs
- Specialized jargon
- Dates and names
- Conceptual similarity searches

**Optimal for Generative AI**: Benchmark testing shows hybrid + semantic ranking offers significant benefits for search relevance and grounding data quality

### Filter and Facet Support
- Filters apply to geospatial search, include/exclude criteria
- Can be applied pre-filter or post-filter
- Facets work over hybrid query results
- Independent of inverted and vector indexes

### Important Notes
- Avoid an explicit `orderby` in hybrid queries: an explicit sort order overrides the relevance ranking
- When using the semantic ranker, set `k` to 50 in the vector query to maximize the ranker's inputs
- Multi-lingual content searchable without language analyzers

## Key Takeaway
Azure AI Search with vector embeddings enables semantic search across data. Hybrid search (combining keyword, vector, and semantic ranking) provides the most accurate results for generative AI applications by balancing precision and conceptual similarity.

# **Create RAG-based client application**

---

## Overview
Use an Azure AI Search index with Azure OpenAI models by extending chat completion requests with the index connection details through the Azure OpenAI SDK.

## Keyword-Based RAG Implementation

### Basic Structure
```python
from openai import AzureOpenAI

# Initialize OpenAI client
chat_client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint=open_ai_endpoint,
    api_key=open_ai_key
)

# Create prompt with system and user messages
prompt = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": input_text}
]

# RAG parameters with Azure AI Search connection
rag_params = {
    "data_sources": [{
        "type": "azure_search",
        "parameters": {
            "endpoint": search_url,
            "index_name": "<index_name>",  # replace with the name of your index
            "authentication": {
                "type": "api_key",
                "key": search_key,
            }
        }
    }],
}

# Submit prompt with index information
response = chat_client.chat.completions.create(
    model="<model_deployment_name>",
    messages=prompt,
    extra_body=rag_params
)
```

### Keyword-Based Query Characteristics
- Query consists of text from user prompt
- Matched to text in indexed documents
- Searches for literal text matches

## Vector-Based RAG Implementation

### Modified RAG Parameters
```python
rag_params = {
    "data_sources": [{
        "type": "azure_search",
        "parameters": {
            "endpoint": search_url,
            "index_name": "<index_name>",  # replace with the name of your index
            "authentication": {
                "type": "api_key",
                "key": search_key,
            },
            # Vector-based query configuration
            "query_type": "vector",
            "embedding_dependency": {
                "type": "deployment_name",
                "deployment_name": "<embedding_model_deployment_name>",
            },
        }
    }],
}
```

### Vector-Based Query Characteristics
- Uses numeric vectors to represent text tokens
- Query text is vectorized using embedding model
- Enables matching based on semantic similarity
- Captures both semantic and literal matches
- Requires index that supports vector search

## Key Components

### Azure OpenAI Client
- API version: "2024-12-01-preview" (or current version)
- Requires endpoint and API key
- Handles chat completions with RAG

### Prompt Structure
- System message: Defines AI assistant behavior
- User message: Contains user input/question
- Array format for conversation history

### RAG Parameters (`extra_body`)
- Data source type: "azure_search"
- Connection parameters: endpoint, index name, authentication
- Query type: keyword (default) or vector
- Embedding dependency: for vector-based queries
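
One way to keep the keyword and vector configurations in sync is a small helper that builds either variant. This is a sketch: `build_rag_params` is a hypothetical function, the endpoint and names in the usage lines are placeholders, and the dictionary shape simply follows the two examples above.

```python
def build_rag_params(search_url, index_name, search_key, embedding_deployment=None):
    """Return the extra_body payload for keyword or vector-based RAG."""
    parameters = {
        "endpoint": search_url,
        "index_name": index_name,
        "authentication": {"type": "api_key", "key": search_key},
    }
    # Supplying an embedding deployment switches the query to vector-based.
    if embedding_deployment is not None:
        parameters["query_type"] = "vector"
        parameters["embedding_dependency"] = {
            "type": "deployment_name",
            "deployment_name": embedding_deployment,
        }
    return {"data_sources": [{"type": "azure_search", "parameters": parameters}]}


# Placeholder values for illustration only.
keyword_params = build_rag_params(
    "https://example.search.windows.net", "my-index", "<search_key>"
)
vector_params = build_rag_params(
    "https://example.search.windows.net", "my-index", "<search_key>",
    embedding_deployment="<embedding_model_deployment_name>",
)
```

Either result can be passed as `extra_body` in the `chat.completions.create` call.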

### Response Handling
- Response object contains choices
- Extract content: `response.choices[0].message.content`
- Content is contextualized based on indexed data

## Comparison: Keyword vs Vector

| Aspect | Keyword-Based | Vector-Based |
|--------|--------------|--------------|
| **Query Method** | Text matching | Semantic similarity |
| **Match Type** | Literal text | Semantic + literal |
| **Configuration** | Default | Requires `query_type: vector` |
| **Embedding Model** | Not required | Required in parameters |
| **Index Support** | Standard index | Vector-enabled index |

## Implementation Requirements

### Prerequisites
- Azure OpenAI deployment
- Azure AI Search index
- API keys for both services
- Embedding model deployment (for vector search)

### Authentication
- API key-based authentication shown
- Both OpenAI and Search services require keys
- Keys stored securely, passed in parameters

## Key Takeaway
RAG implementation extends Azure OpenAI requests with Azure AI Search index details via `extra_body` parameter. Keyword-based queries match literal text, while vector-based queries (requiring embedding model) enable semantic similarity matching for more accurate contextualized responses.

# **Implement RAG in a prompt flow**

---

## Prompt Flow Overview
**Definition**: Development framework for defining flows that orchestrate interactions with LLMs

**Structure**: Inputs → Connected Tools → Outputs

## Flow Components

### Inputs
- User question or prompt
- Chat history (for iterative conversations)

### Tools (Operations)
- Run custom Python code
- Look up data values in an index
- Create prompt variants
- Submit prompts to LLM

### Outputs
- Generated results from LLM

## RAG Pattern in Prompt Flow

### Key Component: Index Lookup Tool
**Purpose**: Retrieve data from index to augment prompts sent to LLM

## Multi-Round Q&A Sample Flow

### Flow Steps (in order)

**1. Modify Query with History**
- **Tool**: LLM node
- **Purpose**: Takes chat history + user's last question → generates new question with all necessary info
- **Result**: More succinct input for rest of flow

**2. Look Up Relevant Information**
- **Tool**: Index Lookup tool
- **Purpose**: Query Azure AI Search index to find relevant information
- **Result**: Retrieved context from data source

**3. Generate Prompt Context**
- **Tool**: Python node
- **Purpose**: Parse Index Lookup output into suitable format
- **Process**: Iterate over top n retrieved documents, combine contents and sources into one document string
- **Result**: Formatted string for prompt

**4. Define Prompt Variants**
- **Tool**: Prompt variants
- **Purpose**: Represent different prompt contents with varying system messages
- **Goal**: Instruct chatbot to use context and be factual (improve groundedness)
- **Result**: Multiple prompt versions to compare

**5. Chat with Context**
- **Tool**: LLM node
- **Purpose**: Send prompt to language model with retrieved context
- **Result**: Natural language response (also flow output)

### Example Flow
```
INPUTS (question, chat_history)
    ↓
MODIFY_QUERY_WITH_HISTORY (LLM node)
    ↓
LOOKUP (Index Lookup tool)
    ↓
GENERATE_PROMPT_CONTEXT (Python node)
    ↓
PROMPT_VARIANTS (Prompt tool)
    ↓
CHAT_WITH_CONTEXT (LLM node)
    ↓
OUTPUTS (response)
```

### Deployment
After configuration → deploy flow → integrate with application for agentic experience

## Index Lookup Tool Details

### Supported Index Types
- Azure AI Search
- FAISS
- Pinecone

### Input Parameters

| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| **mlindex_content** | string | Type of Index to be used | Yes |
| **queries** | string, Union[string, List[String]] | Text to be queried | Yes |
| **query_type** | string | Type of query (Keyword, Semantic, Hybrid, Vector) | Yes |
| **top_k** | integer | Count of top-scored entities to return (default: 3) | No |

### Query Types
- **Keyword**: Literal text matches
- **Semantic**: Meaning-based matches
- **Hybrid**: Combines multiple approaches
- **Vector**: Semantic similarity using embeddings

### Output Format
Returns JSON with top-k scored entities containing:

| Field | Type | Description |
|-------|------|-------------|
| **metadata** | dict | Customized key-value pairs (page_number, source, stats) |
| **page_content** | string | Content of vector chunk used in lookup |
| **score** | float | Relevance score (L2 distance for FAISS, cosine similarity for Azure AI Search) |

### Output Structure Example
```json
[{
  "metadata": {
    "page_number": 44,
    "source": {"filename": "file.pdf", "url": "..."},
    "stats": {"chars": 4385, "lines": 41, "tiktokens": 891}
  },
  "page_content": "vector chunk",
  "score": 0.0213
}]
```
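
Given output shaped like the example above, the "Generate Prompt Context" step could look roughly like this. It is a sketch, not the sample flow's actual code: the function name and the `Content:`/`Source:` layout are assumptions, and in a prompt flow Python node the function would normally also carry the tool decorator.

```python
def generate_prompt_context(search_results, top_n=3):
    """Combine the top retrieved chunks and their sources into one string."""
    sections = []
    for doc in search_results[:top_n]:
        # Fall back to "unknown" if a chunk has no source metadata.
        source = doc.get("metadata", {}).get("source", {})
        filename = source.get("filename", "unknown")
        sections.append(f"Content: {doc['page_content']}\nSource: {filename}")
    return "\n\n".join(sections)


results = [{
    "metadata": {
        "page_number": 44,
        "source": {"filename": "file.pdf", "url": "..."},
        "stats": {"chars": 4385, "lines": 41, "tiktokens": 891},
    },
    "page_content": "vector chunk",
    "score": 0.0213,
}]

print(generate_prompt_context(results))
```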

### Configuration Requirements
- **mlindex_content**: Index connection details (embeddings config, index settings)
- **Vector input**: Generated by LLM tool for embedding-based queries
- **Field mapping**: Content, embedding, and metadata fields

### Legacy Tool Migration
Index Lookup tool replaces three deprecated tools:
- Vector Index Lookup tool
- Vector DB Lookup tool
- FAISS Index Lookup tool

## Key Takeaway
RAG in Prompt Flow uses Index Lookup tool to retrieve relevant context from indexed data, which is then formatted and used to augment prompts sent to LLMs. The Multi-Round Q&A sample provides a complete flow: modify query → lookup → generate context → create variants → chat with context.

# **Quiz**

---

## Question 1: Groundedness Definition
**Question**: What does groundedness refer to in the context of generative AI?

**Correct Answer**: Using data to contextualize prompts and ensure relevant responses

**Explanation**: Groundedness means anchoring responses in factual, relevant data rather than relying solely on training data.

**Wrong Answers**:
- ❌ The use of a locally deployed language model: Deployment location doesn't affect groundedness
- ❌ Using the lowest possible number of tokens in a prompt: Token count relates to efficiency, not factual accuracy

## Question 2: Grounding Pattern
**Question**: What pattern can you use to ground prompts?

**Correct Answer**: Retrieval Augmented Generation (RAG)

**Explanation**: RAG retrieves relevant data from external sources to provide factual context for prompts.

**Wrong Answers**:
- ❌ Metadata Optimized Prompt (MOP): Not a real pattern
- ❌ Data Understanding Support Text (DUST): Not a real pattern

## Question 3: RAG Implementation in Client Apps
**Question**: How can you use the RAG pattern in a client app that uses the Azure OpenAI SDK?

**Correct Answer**: Add index connection details to the OpenAI ChatClient configuration

**Explanation**: Use `extra_body` parameter with `data_sources` containing Azure AI Search index connection details.

**Wrong Answers**:
- ❌ Add text files containing the grounding data to the app folder: Data should be in searchable index, not local files
- ❌ Azure AI Foundry automatically grounds all prompts using Bing Search: Grounding requires explicit configuration

## Key Patterns

**Groundedness** = Using external data to contextualize prompts
**RAG Pattern** = Retrieval Augmented Generation for grounding
**Implementation** = Add index connection details via `extra_body` parameter

## Quick Reference

| Concept | Implementation |
|---------|----------------|
| Ground prompts | Use RAG pattern |
| RAG in Azure OpenAI SDK | Add index details to `extra_body` |
| Data source | Azure AI Search index |

# **Code Exercise**

---

## Required Model Deployments

### Two Models Needed
1. **Embedding Model**: text-embedding-ada-002
   - Purpose: Vectorize text data for indexing
   - Deployment type: Global Standard
   - TPM: 50K

2. **Generation Model**: gpt-4o
   - Purpose: Generate natural language responses
   - Deployment type: Global Standard
   - TPM: 50K

## Data Setup

### Upload Data
1. Add data source to project (PDF brochures)
2. Upload folder with documents
3. Name data source (e.g., "brochures")

## Index Creation

### Index Configuration
**Source**: Data in Azure AI Foundry (select data source)

**Azure AI Search Settings**:
- Create new Azure AI Search resource
- Same location as AI hub
- Pricing tier: Basic

**Index Name**: brochures-index

**Vector Settings**:
- Add vector search to resource
- Select Azure OpenAI connection
- Embedding model: text-embedding-ada-002
- Select embedding model deployment

### Index Creation Jobs
1. Crack, chunk, and embed text tokens
2. Create Azure AI Search index
3. Register index asset
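
To make the "chunk" part of the first job concrete, here is a minimal sketch of splitting text into overlapping chunks. The indexing job's actual chunking strategy is not described in these notes, so word-based chunks, the default sizes, and the `chunk_text` name are all assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of chunk_size words that share overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(n) for n in range(10))
print(chunk_text(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Each chunk would then be passed to the embedding model and stored in the index with its vector. The overlap keeps sentences that straddle a boundary retrievable from either side.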

## Testing in Playground

### Without Index
**Prompt**: "Where can I stay in New York?"
**Result**: Generic answer without custom data

### With Index
1. Add project index to Setup pane
2. Select hybrid (vector + keyword) search type
3. Submit same prompt
**Result**: Response based on indexed data

## RAG Client Application Code

### Essential Components

```python
from openai import AzureOpenAI

# Create Azure OpenAI client
chat_client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint=openai_endpoint,
    api_key=openai_api_key
)

# System message for chat solution
system_message = "You are a helpful travel assistant."

# Create prompt with messages
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_input}
]

# RAG configuration with search index
rag_params = {
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": search_endpoint,
                "index_name": index_name,
                "authentication": {
                    "type": "api_key",
                    "key": search_api_key,
                },
                # Vector-based query configuration
                "query_type": "vector",
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": embedding_model_deployment,
                },
            }
        }
    ],
}

# Submit prompt with RAG parameters
response = chat_client.chat.completions.create(
    model=chat_model_deployment,
    messages=messages,
    extra_body=rag_params
)

# Get response
answer = response.choices[0].message.content
print(answer)

# Add to chat history for multi-turn conversation
messages.append({"role": "assistant", "content": answer})
```

## Key Implementation Details

### Vectorization Process
- Query submitted as text to search index
- Embedding model vectorizes query text
- Vector-based search finds semantically similar content
- Can surface relevant content that keyword-only search would miss

### Response Grounding
- Search index queried based on user prompt
- Relevant text found in indexed documents
- Context added to prompt before LLM generation
- Response includes source references

### Multi-Turn Conversations
- Append assistant responses to message history
- Maintain conversation context
- Enable follow-up questions
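
The history handling can be sketched without calling the service. Here `chat_turn` is a hypothetical helper and `fake_generate` is a stand-in for the `chat_client.chat.completions.create(...)` call shown earlier.

```python
def chat_turn(messages, user_input, generate):
    """Add the user message, get a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_input})
    answer = generate(messages)
    messages.append({"role": "assistant", "content": answer})
    return answer


def fake_generate(messages):
    """Stand-in for the model call: echoes the latest user message."""
    return f"(reply to: {messages[-1]['content']})"


messages = [{"role": "system", "content": "You are a helpful travel assistant."}]
chat_turn(messages, "Where can I stay in New York?", fake_generate)
chat_turn(messages, "Which of those is cheapest?", fake_generate)

for message in messages:
    print(message["role"], ":", message["content"])
```

Because every turn is sent with the full `messages` list, a follow-up such as "Which of those is cheapest?" can be resolved against the earlier answer.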

## Search Type Options

### Keyword Search
- Text-based matching
- Exact or fuzzy text matches

### Vector Search
- Semantic similarity matching
- Uses embeddings for comparison

### Hybrid Search (Recommended)
- Combines vector + keyword search
- Best accuracy for RAG applications

## Configuration Requirements

### Essential Parameters
- **openai_endpoint**: Azure OpenAI endpoint
- **openai_api_key**: API key for authentication
- **chat_model_deployment**: GPT model deployment name
- **embedding_model_deployment**: Embedding model deployment name
- **search_endpoint**: Azure AI Search URL
- **search_api_key**: Search API key
- **index_name**: Name of created index

## Testing Workflow

1. **Test without index**: Verify generic responses
2. **Add index**: Configure in playground
3. **Test with index**: Verify grounded responses
4. **Implement in code**: Use SDK for application
5. **Test multi-turn**: Verify conversation context

## Key Takeaway
RAG implementation requires deploying embedding and generation models, creating a vector index with Azure AI Search, and using the Azure OpenAI SDK with `extra_body` parameter containing search index connection details and embedding model configuration for vectorized queries.