In [6]:
import numpy as np

# Simple 3D vectors (toy embeddings)
q  = np.array([1, 0, 0])     # query: "cancer risk"
d1 = np.array([0.9, 0.1, 0]) # doc chunk 1: "risk factors"
d2 = np.array([0.1, 0.9, 0]) # doc chunk 2: "treatment"

docs = np.vstack([d1, d2])

# 1. Cosine similarity of each document to the query
sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))

# 2. Attention weights (softmax)
weights = np.exp(sims) / np.sum(np.exp(sims))

# 3. Weighted context vector (RAG context): sum of weight_i * doc_i
context = weights @ docs

weights, context


(array([0.70753709, 0.29246291]), array([0.66602967, 0.33397033, 0.        ]))

In [7]:

# A tiny vocabulary with toy embeddings
word_vectors = {
    "risk":      np.array([1.0, 0.1, 0.0]),
    "factors":   np.array([0.8, 0.2, 0.0]),
    "treatment": np.array([0.1, 1.0, 0.0]),
    "cancer":    np.array([0.6, 0.4, 0.0]),
}

# Find the word whose embedding is closest to our context vector
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

generated_word = max(word_vectors, key=lambda w: cosine_sim(context, word_vectors[w]))

print("Query vector:", q)
print("RAG context vector:", context)
print("Generated word:", generated_word)

Query vector: [1 0 0]
RAG context vector: [0.66602967 0.33397033 0.        ]
Generated word: cancer


# Attention Weights and RAG Context Generation

## Overview
This notebook demonstrates the core ideas behind attention weighting and Retrieval-Augmented Generation (RAG) using three-dimensional toy vectors and a few NumPy operations.

## Key Concepts

### 1. Vector Representations
- **Query Vector (`q`)**: Represents the user's search intent (e.g., "cancer risk")
- **Document Vectors (`d1`, `d2`)**: Represent different document chunks in the knowledge base
  - `d1`: Related to "risk factors" (high relevance to the query, cosine similarity ≈ 0.99)
  - `d2`: Related to "treatment" (lower relevance to the query, cosine similarity ≈ 0.11)

### 2. Attention Mechanism Process

#### Step 1: Similarity Calculation
Computes the cosine similarity between the query and each document:
```
similarity = (doc · query) / (||doc|| × ||query||)
```
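As a minimal sketch of this step, the snippet below wraps the calculation from the first cell in a helper. The function name `cosine_similarities` is an illustrative choice, not something defined in the notebook. The vectors are the toy values from the first cell.

```python
import numpy as np

# Toy vectors copied from the first cell
q = np.array([1.0, 0.0, 0.0])
docs = np.array([[0.9, 0.1, 0.0],   # d1: "risk factors"
                 [0.1, 0.9, 0.0]])  # d2: "treatment"

def cosine_similarities(docs, query):
    """Cosine similarity between each row of `docs` and `query` (hypothetical helper)."""
    doc_norms = np.linalg.norm(docs, axis=1)  # one norm per document row
    return docs @ query / (doc_norms * np.linalg.norm(query))

sims = cosine_similarities(docs, q)
print(sims)
```

With these inputs the similarities come out at roughly 0.994 for `d1` and 0.110 for `d2`, because `d1` points almost in the same direction as `q`.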

#### Step 2: Attention Weights via Softmax
Converts the similarities into a probability distribution:
```
weight_i = exp(sim_i) / Σ(exp(sim_j))
```
This ensures the weights are positive and sum to 1.0, and it assigns larger weights to higher similarities.
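The sketch below shows one common way to write the softmax so that it stays numerically stable for large scores: subtract the maximum before exponentiating, which leaves the result unchanged. The `temperature` parameter is an assumption added here for illustration and is not part of the notebook. Values below 1 sharpen the distribution, which can matter because cosine similarities are confined to [-1, 1] and so produce fairly soft weights on their own.

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Numerically stable softmax; `temperature` is a hypothetical extra knob."""
    scaled = np.asarray(scores, dtype=float) / temperature
    shifted = scaled - scaled.max()  # shifting by a constant does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum()

# Cosine similarities of d1 and d2 to the query, as computed in the first cell
sims = np.array([0.99388373, 0.11043153])

weights = softmax(sims)                 # same weights as the notebook's formula
sharp = softmax(sims, temperature=0.1)  # puts almost all of the weight on d1
print(weights, sharp)
```

At the default temperature this reproduces the notebook's weights of about 0.71 and 0.29.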

#### Step 3: Weighted Context Vector
Creates the RAG context by summing the document vectors, each scaled by its attention weight:
```
context = Σ(weight_i × doc_i)
```
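For any number of documents, the weighted sum can be written as a single vector-matrix product. This is a sketch using the weights and document vectors from the first cell, not a change to the notebook's method.

```python
import numpy as np

docs = np.array([[0.9, 0.1, 0.0],   # d1
                 [0.1, 0.9, 0.0]])  # d2
weights = np.array([0.70753709, 0.29246291])  # attention weights from Step 2

# (n_docs,) @ (n_docs, dim) -> (dim,): sum over documents of weight_i * doc_i
context = weights @ docs

# Equivalent explicit sum, shown for comparison
explicit = sum(w * d for w, d in zip(weights, docs))
print(context, np.allclose(context, explicit))
```

Because the weights are non-negative and sum to 1, the context vector always lies between the document vectors, here closer to `d1`.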

### 3. Word Generation
The second cell uses the context vector to pick a word from a small vocabulary. This is a nearest-neighbor lookup that stands in for a real generation step:
- Compares the context vector with each word embedding in the vocabulary
- Selects the word with the highest cosine similarity to the context
- Shows how the attention-weighted context influences word selection
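To see how close the other candidates are, the sketch below ranks the whole vocabulary instead of returning only the best match. The names `ranked` and `scores` are illustrative. The vectors are copied from the second cell and the context vector from the first cell's output.

```python
import numpy as np

context = np.array([0.66602967, 0.33397033, 0.0])  # output of the first cell

word_vectors = {
    "risk":      np.array([1.0, 0.1, 0.0]),
    "factors":   np.array([0.8, 0.2, 0.0]),
    "treatment": np.array([0.1, 1.0, 0.0]),
    "cancer":    np.array([0.6, 0.4, 0.0]),
}

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Score every word, then sort from most to least similar
scores = {w: cosine_sim(context, v) for w, v in word_vectors.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for word in ranked:
    print(f"{word:<10} {scores[word]:.4f}")
```

The ranking is "cancer", "factors", "risk", "treatment". "cancer" wins because its embedding `[0.6, 0.4, 0]` has nearly the same direction as the context vector, even though no document was labeled "cancer".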

## Results Interpretation
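- Higher attention weights indicate more relevant documents: `d1` receives a weight of about 0.71 and `d2` about 0.29
- The context vector `[0.67, 0.33, 0]` is biased toward the more relevant document `d1`
- The selected word, "cancer", is the vocabulary entry whose embedding points in the direction closest to this weighted combination of document vectors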

## Applications
This simplified example illustrates core concepts used in:
- **RAG Systems**: Combining retrieved documents for context-aware generation
- **Transformer Models**: Attention mechanisms in neural networks
- **Semantic Search**: Finding relevant documents based on query similarity
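As a rough illustration of the semantic search case, the sketch below returns the indices of the `k` documents most similar to a query. The function `top_k` and its signature are assumptions made for this example. A production system would typically use an approximate nearest-neighbor index rather than scoring every document.

```python
import numpy as np

def top_k(docs, query, k=1):
    """Indices and scores of the k rows of `docs` most similar to `query`, best first."""
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:k]  # argsort is ascending, so reverse it
    return order, sims[order]

docs = np.array([[0.9, 0.1, 0.0],   # d1: "risk factors"
                 [0.1, 0.9, 0.0]])  # d2: "treatment"
q = np.array([1.0, 0.0, 0.0])       # query: "cancer risk"

indices, scores = top_k(docs, q, k=1)
print(indices, scores)
```

With `k=1` this returns index 0, the "risk factors" chunk, which is the same document that receives the larger attention weight in the notebook.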