In [31]:
# get the minsearch.py file from the specified GitHub repository
#!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/refs/heads/main/minsearch/minsearch.py

# Retrieval-Augmented Generation (RAG) Systems

This notebook demonstrates the implementation of Retrieval-Augmented Generation (RAG) systems using two different search backends:

1. **MinSearch**: A lightweight, in-memory search implementation
2. **Elasticsearch**: A powerful, scalable search engine

We'll build a complete RAG pipeline that:
- Indexes course FAQ documents
- Retrieves relevant documents for user questions
- Augments prompts with retrieved context
- Generates answers grounded in that context using OpenAI models

## Notebook Structure

1. **Setup & Data Loading**: Import libraries and load FAQ documents
2. **MinSearch Implementation**: Create and use a simple vector search engine
3. **OpenAI Integration**: Connect to OpenAI for LLM-based responses
4. **Basic RAG Pipeline**: Combine search and language model components
5. **Elasticsearch Integration**: Add a more powerful search backend
6. **Comparison & Evaluation**: Compare different approaches

In [32]:
#!uv pip install minsearch

# Using the minsearch index to search for a specific question

- MinSearch is a custom class that uses scikit-learn's `TfidfVectorizer` and `cosine_similarity`

# Part 1: MinSearch Implementation

MinSearch is a lightweight search engine that uses:
- **TF-IDF Vectorization**: Converts text into numerical vectors based on term frequency and inverse document frequency
- **Cosine Similarity**: Measures similarity between query and documents
- **Field Boosting**: Allows prioritizing certain fields (e.g., question vs text)

This implementation is ideal for small to medium document collections that fit in memory.
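To make the TF-IDF + cosine-similarity idea concrete, here is a stripped-down, pure-Python sketch. It is not MinSearch's code: the whitespace tokenizer is a simplification, and the smoothed IDF formula `log((1 + n) / (1 + df)) + 1` is borrowed from scikit-learn's default rather than taken from MinSearch.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (assumption): lowercase and split on whitespace
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rare terms get larger weights than common ones
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Return document indices sorted by similarity to the query, plus the raw scores."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the corpus get no weight, as with a fitted vectorizer
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores

docs = ["can i still join the course", "how do i run kafka", "when will the course start"]
order, scores = rank("join the course", docs)
```

Documents sharing rare query terms ("join") score higher than those sharing only common ones ("the", "course"), and a document with no overlap scores zero.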

In [33]:
# Import libraries for MinSearch implementation
import minsearch  # Simple vector search engine
import json       # For loading document data

# Data Loading and Preprocessing

We'll load FAQ documents from a JSON file and prepare them for indexing. Each document contains:
- **question**: The user question
- **text**: The answer text
- **section**: The section/category the question belongs to
- **course**: The course the question is related to

In [34]:
with open('documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

In [35]:
documents = []

for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)
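The loop above adds the `course` key to each dict in place, so `docs_raw` is modified as a side effect. A minimal non-mutating variant, assuming the `[{'course': ..., 'documents': [...]}, ...]` layout the loop implies (the sample data below is made up):

```python
def flatten_documents(docs_raw):
    """Flatten [{'course': ..., 'documents': [...]}, ...] into one list of dicts.

    Each document is copied, so docs_raw is left unchanged.
    """
    flat = []
    for course_dict in docs_raw:
        for doc in course_dict['documents']:
            # {**doc, ...} builds a new dict instead of editing the original
            flat.append({**doc, 'course': course_dict['course']})
    return flat

# Made-up sample in the same nested layout as documents.json
sample = [
    {'course': 'data-engineering-zoomcamp',
     'documents': [{'text': 'Yes.', 'section': 'General', 'question': 'Can I join late?'}]},
    {'course': 'machine-learning-zoomcamp',
     'documents': [{'text': 'Python.', 'section': 'General', 'question': 'Prerequisites?'}]},
]
flat = flatten_documents(sample)
```

In the notebook this would be `documents = flatten_documents(docs_raw)`.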

In [36]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}

In [37]:
index = minsearch.Index(
    text_fields=["question", "text", "section"], # fields to search in
    keyword_fields=["course"]
)

In [38]:
q = 'the course has already started, can I still enroll?'

In [39]:
index.fit(documents) # build the index

<minsearch.Index at 0x7f60a4482350>

In [40]:
# Import libraries for OpenAI integration
import os
from openai import OpenAI
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from .env file (contains API keys)
env_path = Path('../..') / '.env'

print(f"Loading environment variables from: {env_path}")

if env_path.exists():
    load_dotenv(dotenv_path=env_path)
else:
    print("⚠️ Warning: .env file not found, make sure to set OPENAI_API_KEY manually")

# Access the API key
api_key = os.getenv('OPENAI_API_KEY')

Loading environment variables from: ../../.env


# Part 2: OpenAI LLM Integration

We'll integrate OpenAI's models to generate responses based on retrieved documents. This involves:
1. Setting up the OpenAI client with API keys
2. Defining a function to create prompts with retrieved context
3. Building a function to generate answers using the LLM

In [41]:
client = OpenAI(api_key=api_key)

In [42]:
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{"role": "user", "content": q}]
)

response.choices[0].message.content

"Whether you can still enroll in a course that has already started depends on the institution or program's policies. Many schools and universities have specific deadlines for enrollment, while some may allow late registration under certain circumstances. I recommend checking the course's official website or contacting the admissions office or course instructor directly for the most accurate information regarding late enrollment options."

In [43]:
# Enhanced search function for MinSearch implementation
def search(query):
    """
    Retrieve relevant documents using MinSearch vector search.
    
    This function:
    1. Performs a TF-IDF (keyword-based) search for the query
    2. Applies field boosting to prioritize question matches
    3. Filters results to the target course
    
    Args:
        query: User's natural language question
        
    Returns:
        list: Ranked list of relevant documents
    """
    # Configure boosting weights for different fields
    boost = {
        'question': 3.0,  # Questions are highly relevant (3x weight)
        'section': 0.5    # Sections are less relevant (0.5x weight)
    }

    # Perform the search with filtering and boosting
    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},  # Filter by course
        boost_dict=boost,                                     # Apply field boosting
        num_results=5                                         # Return top 5 matches
    )

    return results
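The `boost_dict` and `filter_dict` arguments can be pictured as follows: each text field produces its own similarity score, the boosts weight those scores before they are summed, and keyword filters act as a hard exact-match mask. This sketch reflects our understanding of how such an index combines fields, not MinSearch's actual source; the function name and the per-field scores are hypothetical.

```python
def combine_field_scores(field_scores, boosts, keyword_values, filters, num_results=5):
    """Rank documents by a boost-weighted sum of per-field scores.

    field_scores:   {field: [score for each doc]}  (e.g. cosine similarities)
    boosts:         {field: weight}; fields not listed default to 1.0
    keyword_values: {field: [value for each doc]}  (exact-match keyword fields)
    filters:        {field: required value}
    """
    n_docs = len(next(iter(field_scores.values())))
    totals = [0.0] * n_docs
    for field, scores in field_scores.items():
        weight = boosts.get(field, 1.0)
        for i, s in enumerate(scores):
            totals[i] += weight * s
    # Keyword filters are a hard mask: documents that don't match are dropped
    keep = [
        i for i in range(n_docs)
        if all(keyword_values[f][i] == v for f, v in filters.items())
    ]
    ranked = sorted(keep, key=lambda i: totals[i], reverse=True)
    # Documents with no positive score are not returned at all
    return [i for i in ranked if totals[i] > 0][:num_results]

# Hypothetical per-field similarities for three documents
field_scores = {
    'question': [0.6, 0.1, 0.0],
    'text':     [0.1, 0.9, 0.3],
    'section':  [0.5, 0.5, 0.5],
}
boosts = {'question': 3.0, 'section': 0.5}
keyword_values = {'course': ['data-engineering-zoomcamp', 'data-engineering-zoomcamp', 'mlops-zoomcamp']}
filters = {'course': 'data-engineering-zoomcamp'}
```

With the boosts, the strong question match (document 0) wins; without them, document 1's strong answer-text match comes first. Document 2 never appears because it belongs to another course.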

In [44]:
# build the prompt for the LLM
# based on the search results
# and the original query
# the prompt will be used to answer the question

# Enhanced prompt builder function with detailed comments
def build_prompt(query, search_results):
    """
    Build an LLM prompt that includes retrieved context documents.
    
    This function:
    1. Uses a template to structure the prompt
    2. Formats each retrieved document into the context
    3. Inserts the original query and formatted context into template
    
    Args:
        query: User's original question
        search_results: List of retrieved documents
        
    Returns:
        string: Formatted prompt ready for the LLM
    """
    # Template that structures the prompt with placeholders
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    # Format each retrieved document into context section
    context = ""
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    # Fill in template with query and formatted context
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
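The context loop builds the string by repeated concatenation and leaves a trailing blank line that the final `.strip()` removes. An equivalent formulation with `str.join` (a sketch; `build_context` is a hypothetical helper name) yields the same prompt once that `.strip()` runs:

```python
def build_context(search_results):
    """Format retrieved docs as section/question/answer blocks separated by blank lines."""
    entries = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    # join inserts the separator only between entries, so there is no trailing blank line
    return "\n\n".join(entries)

docs = [
    {'section': 's1', 'question': 'q1', 'text': 'a1'},
    {'section': 's2', 'question': 'q2', 'text': 'a2'},
]
context = build_context(docs)
```

Inside `build_prompt`, the loop would become `context = build_context(search_results)`.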

In [45]:
# Enhanced LLM function for generating responses
def llm(prompt):
    """
    Generate a response using OpenAI's LLM based on the provided prompt.
    
    Args:
        prompt: The full prompt including query and context
        
    Returns:
        string: Generated answer from the LLM
    """
    response = client.chat.completions.create(
        model='gpt-4o',  # Larger model than the gpt-4o-mini used in the first test call
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2  # Lower temperature for more factual responses
    )
    
    return response.choices[0].message.content

In [46]:
# Define a test query
query = 'how do I run kafka?'

# Complete RAG pipeline implementation
def rag(query):
    """
    Implements the full Retrieval-Augmented Generation pipeline.
    
    The pipeline consists of three main steps:
    1. RETRIEVAL: Find relevant documents using vector search
    2. AUGMENTATION: Build a prompt that includes retrieved context
    3. GENERATION: Generate an answer using the LLM with context
    
    Args:
        query: User's natural language question
        
    Returns:
        string: Generated answer based on retrieved context
    """
    # Step 1: RETRIEVAL - Get relevant documents using MinSearch
    search_results = search(query)
    
    # Step 2: AUGMENTATION - Build prompt with retrieved context
    prompt = build_prompt(query, search_results)
    
    # Step 3: GENERATION - Generate answer using LLM
    answer = llm(prompt)
    
    return answer

# Part 3: Building the Basic RAG Pipeline

Now we'll build the complete RAG pipeline that combines:

1. **Retrieval**: Using MinSearch to find relevant documents
2. **Augmentation**: Creating a prompt with retrieved context
3. **Generation**: Using OpenAI to generate answers

This pattern grounds the LLM's responses in our FAQ documents rather than in its general knowledge: the prompt explicitly instructs the model to use only the facts from the retrieved CONTEXT.
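The three steps are independent of one another, so the pipeline can be sketched as a function that takes the search, prompt-building and LLM steps as parameters. `make_rag` and the `fake_*` stand-ins below are hypothetical names; the fakes only exist so the wiring can be checked without an index or an API key.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]

def make_rag(
    search_fn: Callable[[str], List[Doc]],
    prompt_fn: Callable[[str, List[Doc]], str],
    llm_fn: Callable[[str], str],
) -> Callable[[str], str]:
    """Wire retrieval, augmentation and generation into a single callable."""
    def rag(query: str) -> str:
        docs = search_fn(query)          # 1. RETRIEVAL
        prompt = prompt_fn(query, docs)  # 2. AUGMENTATION
        return llm_fn(prompt)            # 3. GENERATION
    return rag

# Offline stand-ins (hypothetical)
def fake_search(query):
    return [{"question": "Can I still join the course after the start date?", "text": "Yes."}]

def fake_prompt(query, docs):
    context = "\n".join(d["text"] for d in docs)
    return f"QUESTION: {query}\nCONTEXT:\n{context}"

def fake_llm(prompt):
    # Echo the last prompt line, which shows that the retrieved context reached the "model"
    return prompt.splitlines()[-1]

rag_offline = make_rag(fake_search, fake_prompt, fake_llm)
```

With the notebook's own functions this would be `make_rag(search, build_prompt, llm)`, and swapping in another backend only changes the first argument instead of requiring `rag` to be redefined.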

In [47]:
# Test the RAG pipeline with our first query about Kafka
print("Testing with query: 'how do I run kafka?'")
print("-" * 80)
response = rag(query)
print(response)

Testing with query: 'how do I run kafka?'
--------------------------------------------------------------------------------
To run Kafka in the terminal, you can execute the following command in the project directory:

```bash
java -cp build/libs/<jar_name>-1.0-SNAPSHOT.jar:out src/main/java/org/example/JsonProducer.java
```

Replace `<jar_name>` with the actual name of your JAR file.


# Testing the Basic RAG Implementation

Let's test our RAG pipeline with different queries to see how well it retrieves relevant information and generates answers. We'll try:

1. A technical question about Kafka
2. A course enrollment question

In [48]:
# Test with another query
print("Testing with query: 'the course has already started, can I still enroll?'")
print("-" * 80)
response = rag('the course has already started, can I still enroll?')
print(response)

Testing with query: 'the course has already started, can I still enroll?'
--------------------------------------------------------------------------------
Yes, you can still enroll in the course even after it has started. You are eligible to submit the homework assignments. However, be mindful of the deadlines for submitting the final projects and try not to leave everything until the last minute.


In [49]:
# Diagnostic function to examine search results
def inspect_search_results(query):
    """
    Show the actual search results for a query to diagnose relevance
    
    Args:
        query: The user query to search for
        
    Returns:
        None (prints results)
    """
    results = search(query)
    print(f"Top search results for: '{query}'")
    print("-" * 80)
    
    for i, doc in enumerate(results):
        print(f"Result {i+1}:")
        print(f"- Section: {doc['section']}")
        print(f"- Question: {doc['question']}")
        # Truncate long answers, but keep the "- Answer:" label in both cases
        text = doc['text']
        snippet = text[:150] + "..." if len(text) > 150 else text
        print(f"- Answer: {snippet}")
        print()

# Inspect search results for our test query
inspect_search_results('the course has already started, can I still enroll?')

Top search results for: 'the course has already started, can I still enroll?'
--------------------------------------------------------------------------------
Result 1:
- Section: General course-related questions
- Question: Course - Can I still join the course after the start date?
- Answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the fin...

Result 2:
- Section: General course-related questions
- Question: Course - Can I follow the course after it finishes?
- Answer: Yes, we will keep all the materials after the course finishes, so you can follow the course at your own pace after it finishes.
You can also continue ...

Result 3:
- Section: General course-related questions
- Question: Course - When will the course start?
- Answer: The purpose of this document is to capture frequently asked technical questions
The exact day and hour of the course will be 15th Jan 2024 at 17h00. T...



In [50]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}

# Using Elasticsearch for RAG

Elasticsearch is a powerful search engine that provides more advanced search capabilities than our simple MinSearch implementation. It offers:

- Full-text search with configurable text analysis
- Relevance ranking (BM25 by default)
- Filtering capabilities
- Scalability for large document collections

In this section, we'll set up Elasticsearch in Docker and connect to it from Python.
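Elasticsearch's default relevance score is BM25. The sketch below is the textbook Okapi BM25 formula with Elasticsearch's default parameters (`k1=1.2`, `b=0.75`) and Lucene's IDF variant; the whitespace tokenizer is a simplification, and Lucene's actual implementation may differ in constant factors that do not change the ranking.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n  # average document length
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # IDF: rarer terms contribute more
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = ["kafka kafka producer", "course start date", "run kafka locally with docker"]
scores = bm25_scores("run kafka", docs)
```

Unlike raw TF-IDF, repeating "kafka" twice in the first document does not double its contribution, and matching both query terms ("run" and "kafka") outweighs matching one term repeatedly.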

# Setup: Elasticsearch Server and Python Client

## Installation and Configuration

For Elasticsearch to work properly, we need to ensure version compatibility between the server and client.

```bash
# Install Elasticsearch Python client version 7.17.0
# We use this specific version because:
# 1. The 7.17 client can talk to Elasticsearch 8.x servers
# 2. It doesn't send the "compatible-with=9" headers that 9.x clients use
uv pip uninstall -y elasticsearch
uv pip install elasticsearch==7.17.0

# Run Elasticsearch 8.x in Docker
docker run -it \
    --rm \
    --name elasticsearch \
    -m 4GB \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.12.2
```

## Version Compatibility Issues Explained

When working with Elasticsearch, the client and server versions need to be compatible:

1. **API Versioning**: The 9.x Python client sends `compatible-with=9` in its request headers (the 8.x client sends `compatible-with=8`)
2. **Server Limitations**: An ES server only accepts API requests for its own major version or the one directly below it (an 8.x server accepts 8 or 7)
3. **Header Conflicts**: When the versions don't line up, the server rejects the request with "Accept version must be either version 8 or 7, but found 9"

**Our Solution**: Use the ES 7.17.0 client with the ES 8.x server, or override the default headers to use non-versioned API calls. Installing a client with the same major version as the server (e.g. `elasticsearch>=8,<9` for server 8.12.2) is the other common fix.
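The acceptance rule can be written as a small check. This is a model of the rule described above, assuming the client advertises its own major version in its headers (as the 8.x and 9.x clients do); the function names are hypothetical.

```python
def parse_major(version: str) -> int:
    """Extract the major version, e.g. '8.12.2' -> 8."""
    return int(version.split(".")[0])

def server_accepts(client_version: str, server_version: str) -> bool:
    """True if a server accepts 'compatible-with=<client major>'.

    Rule modeled: the server's own major version or the one directly below it.
    """
    client_major = parse_major(client_version)
    server_major = parse_major(server_version)
    return server_major - 1 <= client_major <= server_major
```

The server version is what the connection test cell prints from `es_client.info()['version']['number']`; the installed client's version should be available as `elasticsearch.__versionstr__`.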

In [51]:
from elasticsearch import Elasticsearch
from elasticsearch.transport import Transport

# === VERSION COMPATIBILITY FIX ===
# Override default headers to prevent version compatibility issues
# - By default, 9.x ES clients send headers with "compatible-with=9"
# - ES 8.x servers reject these headers, causing BadRequestError (400)
# - We explicitly set headers to use standard JSON without version info
# - Note: _DEFAULT_REQUEST_HEADERS is a private attribute, so whether this
#   override takes effect depends on the installed client version
Transport._DEFAULT_REQUEST_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json"
}

# Create the Elasticsearch client with connection settings
es_client = Elasticsearch(
    'http://localhost:9200',  # Default Elasticsearch endpoint
    request_timeout=30,       # Increased timeout for slower operations
    verify_certs=False,       # Disable SSL verification for local development
    api_key=None,             # No authentication for our local instance
    basic_auth=None,          # No basic auth credentials
    ca_certs=None             # No CA certificates for SSL verification
)

# Note: For production environments, you would enable security features
# and proper certificate validation

In [52]:
# Test the Elasticsearch connection
# - This will verify our client configuration is correct
# - It returns information about the Elasticsearch server
try:
    info = es_client.info()
    print("✅ Successfully connected to Elasticsearch")
    print(f"Elasticsearch version: {info['version']['number']}")
    print(f"Cluster name: {info['cluster_name']}")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("Check that Elasticsearch is running and client compatibility settings are correct")

✅ Successfully connected to Elasticsearch
Elasticsearch version: 8.12.2
Cluster name: docker-cluster


In [54]:
# Create Elasticsearch index with proper error handling
try:
    # Define the index schema (mappings and settings)
    index_settings = {
        "settings": {
            "number_of_shards": 1,       # Use 1 shard for simplicity (dev environment)
            "number_of_replicas": 0      # No replicas needed for local development
        },
        "mappings": {
            "properties": {
                # Field mappings determine how Elasticsearch indexes each field:
                "text": {"type": "text"},        # Full-text search for answer content
                "section": {"type": "text"},     # Full-text search for section names
                "question": {"type": "text"},    # Full-text search for questions
                "course": {"type": "keyword"}    # Exact match filtering for course names
            }
        }
    }

    # Index name for our course questions
    index_name = "course-questions"
    
    # Check if the index already exists to avoid duplicate creation
    if not es_client.indices.exists(index=index_name):
        es_client.indices.create(index=index_name, body=index_settings)
        print(f"✅ Created new Elasticsearch index: {index_name}")
    else:
        print(f"ℹ️ Index {index_name} already exists")
except Exception as e:
    print(f"❌ Error creating Elasticsearch index: {e}")
    print("Will continue with MinSearch instead")

ℹ️ Index course-questions already exists
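The mapping makes `course` a `keyword` field because the `term` filter used later does not analyze its input: it must equal an indexed term exactly. The sketch below approximates the difference; the regex tokenizer is a simplified stand-in for Elasticsearch's standard analyzer, and the function names are hypothetical.

```python
import re

def analyze_text(value: str) -> list:
    """Rough stand-in for a `text` field: lowercase and split into word tokens."""
    return re.findall(r"\w+", value.lower())

def analyze_keyword(value: str) -> list:
    """A `keyword` field is indexed as one exact, unmodified term."""
    return [value]

def term_matches(term: str, indexed_terms: list) -> bool:
    """A `term` query compares its input, unanalyzed, against the indexed terms."""
    return term in indexed_terms

course = "data-engineering-zoomcamp"
text_terms = analyze_text(course)
keyword_terms = analyze_keyword(course)
```

Mapped as `text`, the course name is split at the hyphens, so a `term` filter on the full name would match nothing; as `keyword`, it matches exactly.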


In [55]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}

In [None]:
from tqdm.auto import tqdm

# Index documents in Elasticsearch with progress bar
try:
    print("🔍 Indexing documents into Elasticsearch...")
    
    # Use tqdm for a nice progress bar to track indexing
    for doc in tqdm(documents):
        # The index() method:
        # - Adds each document to the specified index
        # - Automatically generates an ID if not provided
        # - Sets document field values according to our mapping
        es_client.index(index=index_name, document=doc)
    
    print(f"✅ Successfully indexed {len(documents)} documents")
    
except Exception as e:
    print(f"❌ Error indexing documents: {e}")
    print("Falling back to MinSearch functionality")

🔍 Indexing documents into Elasticsearch...


✅ Successfully indexed 948 documents
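Calling `index()` without an `id` lets Elasticsearch generate a new ID every time, so re-running this cell against an index that already exists (the creation cell reported that `course-questions` did) stores a second copy of every FAQ entry. That could explain the two identical "What are the prerequisites" hits in the final test. A sketch that avoids this by deriving a deterministic ID from each document and sending everything in one bulk request; the hashing scheme and helper names are our own assumptions:

```python
import hashlib
import json

def doc_id(doc: dict) -> str:
    """Deterministic ID (hypothetical scheme): hash of course, question and answer text."""
    key = json.dumps([doc.get("course"), doc.get("question"), doc.get("text")], ensure_ascii=False)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def build_bulk_actions(docs, index_name):
    """Yield one action per document in the shape accepted by elasticsearch.helpers.bulk."""
    for doc in docs:
        yield {"_index": index_name, "_id": doc_id(doc), "_source": doc}

def index_all(es_client, docs, index_name):
    # Imported lazily so the helpers above work without the client installed
    from elasticsearch import helpers
    # Returns (number of successful actions, list of errors)
    return helpers.bulk(es_client, build_bulk_actions(docs, index_name))
```

With fixed IDs, indexing the same document again overwrites the existing copy instead of adding a duplicate; in the notebook this would be `index_all(es_client, documents, index_name)`.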


In [58]:
query = 'I just disovered the course. Can I still join it?'

In [59]:
# Enhanced search function using Elasticsearch
def elastic_search(query):
    try:
        # Construct an Elasticsearch query with:
        # - Multi-match for searching across multiple fields with different weights
        # - Boolean filtering to restrict results to specific courses
        search_query = {
            "size": 5,  # Limit to 5 results
            "query": {
                "bool": {
                    "must": {
                        # Multi-match searches across multiple fields
                        "multi_match": {
                            "query": query,
                            # Search in three fields with boosted relevance for questions
                            "fields": ["question^3", "text", "section"],
                            "type": "best_fields"  # Prioritize fields with highest match
                        }
                    },
                    # Filter to only include specific course documents
                    "filter": {
                        "term": {
                            "course": "data-engineering-zoomcamp"
                        }
                    }
                }
            }
        }

        # Execute search and get results
        response = es_client.search(index=index_name, body=search_query)
        
        # Extract just the source documents from search hits
        result_docs = []
        for hit in response['hits']['hits']:
            result_docs.append(hit['_source'])
        
        return result_docs
    except Exception as e:
        print(f"❌ Error searching with Elasticsearch: {e}")
        print("Falling back to MinSearch...")
        # Graceful fallback to our simple search if ES fails
        return search(query)
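With `"type": "best_fields"`, a document's score comes from its single best-matching (boosted) field, plus `tie_breaker` times the other fields' scores (the tie breaker defaults to 0). That differs from adding up all field scores, which is how we understand MinSearch to combine fields. The per-field scores below are made up to show a case where the two strategies disagree.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """multi_match best_fields: best boosted field score + tie_breaker * the other boosted scores."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)

def summed_score(field_scores, boosts):
    """Boost-weighted sum of all field scores."""
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())

boosts = {"question": 3.0}  # mirrors "question^3"
doc_a = {"question": 1.0, "text": 0.0, "section": 0.0}  # strong match in the question only
doc_b = {"question": 0.5, "text": 1.2, "section": 1.0}  # moderate match everywhere
```

Under `best_fields`, document A ranks first because of its strong question match; under a plain sum, document B's many moderate matches win. Setting `tie_breaker=1.0` makes the two strategies coincide.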

In [60]:
# Updated RAG function with Elasticsearch integration
def rag(query):
    """
    Enhanced Retrieval-Augmented Generation (RAG) pipeline using Elasticsearch.
    
    This version replaces the MinSearch retrieval component with Elasticsearch for:
    - Better relevance ranking
    - Support for advanced query features
    - Scalability to larger document collections
    - Filtering capabilities
    
    The pipeline flow remains the same:
    1. RETRIEVE: Find relevant documents using Elasticsearch
    2. AUGMENT: Create prompt with retrieved context
    3. GENERATE: Get LLM answer based on context
    
    Args:
        query: User's natural language question
        
    Returns:
        string: Generated answer based on retrieved context
    """
    # Step 1: RETRIEVE - Get relevant documents using Elasticsearch
    # The elastic_search function has a fallback to MinSearch if ES fails
    search_results = elastic_search(query)
    
    # Step 2: AUGMENT - Build prompt with retrieved context
    # Same prompt building logic works with either search backend
    prompt = build_prompt(query, search_results)
    
    # Step 3: GENERATE - Get LLM answer based on context
    answer = llm(prompt)
    
    return answer

# MinSearch vs. Elasticsearch: Key Differences

Now that we've implemented both MinSearch and Elasticsearch for our RAG system, let's understand their key differences:

| Feature | MinSearch | Elasticsearch |
|---------|-----------|---------------|
| **Complexity** | Simple, in-memory | Full-featured search engine |
| **Scalability** | Limited to RAM | Horizontally scalable |
| **Search Features** | Basic TF-IDF + cosine similarity | Advanced queries, filters, analyzers |
| **Setup** | Simple Python import | Requires server infrastructure |
| **Speed** | Fast for small datasets | Optimized for large-scale search |
| **Deployment** | Single process | Distributed architecture |

For our small FAQ dataset, both approaches work well. As your document collection grows larger or more complex queries are needed, Elasticsearch becomes increasingly valuable.
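One simple way to check how similar the two backends are on a given query is the overlap of their top-k results. A minimal sketch using Jaccard similarity over question titles; the function name and the choice of `question` as the comparison key are assumptions (documents with identical titles count once).

```python
def top_k_overlap(results_a, results_b, k=5, key="question"):
    """Jaccard overlap of the top-k results from two backends, compared by `key`."""
    a = {doc[key] for doc in results_a[:k]}
    b = {doc[key] for doc in results_b[:k]}
    if not a and not b:
        return 1.0  # two empty result lists agree trivially
    return len(a & b) / len(a | b)

results_a = [{"question": "q1"}, {"question": "q2"}, {"question": "q3"}]
results_b = [{"question": "q1"}, {"question": "q4"}]
```

In the notebook this would be `top_k_overlap(search(query), elastic_search(query))`: 1.0 means both backends returned the same questions, 0.0 means no overlap.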

In [61]:
rag(query)


"Yes, you can still join the course even after the start date. You are eligible to submit the homeworks even if you haven't registered. However, keep in mind that there will be deadlines for turning in the final projects, so it's important not to leave everything until the last minute."

In [62]:
# Inspect Elasticsearch search results
def inspect_elastic_search_results(query):
    """
    Show the actual Elasticsearch search results for a query to diagnose relevance
    
    Args:
        query: The user query to search for
        
    Returns:
        None (prints results)
    """
    results = elastic_search(query)
    print(f"Top Elasticsearch results for: '{query}'")
    print("-" * 80)
    
    for i, doc in enumerate(results):
        print(f"Result {i+1}:")
        print(f"- Section: {doc['section']}")
        print(f"- Question: {doc['question']}")
        # Truncate long answers, but keep the "- Answer:" label in both cases
        text = doc['text']
        snippet = text[:150] + "..." if len(text) > 150 else text
        print(f"- Answer: {snippet}")
        print()

# Compare search results between MinSearch and Elasticsearch
print("COMPARING SEARCH BACKENDS FOR: 'I just discovered the course. Can I still join it?'")
print("\n=== MINSEARCH RESULTS ===")
inspect_search_results(query)
print("\n=== ELASTICSEARCH RESULTS ===")
inspect_elastic_search_results(query)

COMPARING SEARCH BACKENDS FOR: 'I just discovered the course. Can I still join it?'

=== MINSEARCH RESULTS ===
Top search results for: 'I just disovered the course. Can I still join it?'
--------------------------------------------------------------------------------
Result 1:
- Section: General course-related questions
- Question: Course - Can I still join the course after the start date?
- Answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the fin...

Result 2:
- Section: General course-related questions
- Question: Course - Can I follow the course after it finishes?
- Answer: Yes, we will keep all the materials after the course finishes, so you can follow the course at your own pace after it finishes.
You can also continue ...

Result 3:
- Section: General course-related questions
- Question: Course - When will the course start?
- Answer: The purpose of this document is to capture fre


# Conclusion: RAG System with Elasticsearch

We've successfully built a complete Retrieval-Augmented Generation (RAG) system that:

1. **Retrieves** relevant documents from an Elasticsearch index based on user queries
2. **Augments** LLM prompts with the retrieved context information
3. **Generates** answers grounded in the provided context

This architecture offers several advantages:
- 📚 **Knowledge Grounding**: Answers are based on specific documents rather than general knowledge
- 🔍 **Source Attribution**: We know exactly which documents were supplied as context for each answer
- 🧠 **Domain Specificity**: The system knows about our specific course content
- ⚡ **Efficient Resource Use**: Only the top five retrieved documents go into each prompt, instead of the entire FAQ collection

The Elasticsearch integration enables more sophisticated retrieval than our basic MinSearch implementation: BM25 relevance ranking, server-side filtering, and the potential to scale to much larger document collections.

In [63]:
# Final test of the complete Elasticsearch-powered RAG system
def test_rag_system(query):
    """Run a comprehensive test of the RAG system with detailed output"""
    print(f"TESTING RAG SYSTEM WITH QUERY: '{query}'")
    print("=" * 80)
    
    # Step 1: Show search results
    print("STEP 1: SEARCH RESULTS")
    results = elastic_search(query)
    for i, doc in enumerate(results[:3]): # Show top 3 results
        print(f"Result {i+1}: {doc['question']}")
    print()
    
    # Step 2: Show prompt construction
    print("STEP 2: PROMPT CONSTRUCTION")
    prompt = build_prompt(query, results)
    print(f"Prompt length: {len(prompt)} characters")
    print("Prompt preview:")
    print(prompt[:300] + "...")
    print()
    
    # Step 3: Show generated answer
    print("STEP 3: GENERATED ANSWER")
    answer = llm(prompt)
    print(answer)

# Test with a new question
test_rag_system("What prerequisites do I need for the course?")

TESTING RAG SYSTEM WITH QUERY: 'What prerequisites do I need for the course?'
STEP 1: SEARCH RESULTS
Result 1: Course - What are the prerequisites for this course?
Result 2: Course - What are the prerequisites for this course?
Result 3: Course - What can I do before the course starts?

STEP 2: PROMPT CONSTRUCTION
Prompt length: 2391 characters
Prompt preview:
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: What prerequisites do I need for the course?

CONTEXT: 
section: General course-related questions
question: Course - What ar...

STEP 3: GENERATED ANSWER
The prerequisites for the course can be found on the GitHub page of DataTalksClub under the data-engineering-zoomcamp section, specifically at the #prerequisites link.
