## STEP 0 - install dependencies

In [1]:
# source is https://www.youtube.com/watch?v=GH3lrOsU3AU
# and notebook is https://github.com/DataTalksClub/llm-zoomcamp/blob/main/0a-agents/notebook.ipynb
# Follow along this tutorial: https://github.com/alexeygrigorev/rag-agents-workshop

# STEP 0 - install the packages we need, both here and in the VS Code terminal.

# Run in the VS Code terminal:
# pip install --upgrade pip  # silences pip's upgrade reminders
# pip install tqdm notebook==7.1.2 openai elasticsearch==8.13.0 pandas scikit-learn ipywidgets
# jupyter notebook  # starts the Jupyter server locally

In [2]:
%pip install minsearch -q

Note: you may need to restart the kernel to use updated packages.


## STEP 1 - download evaluation data from GitHub and build an in-memory search index with minsearch

In [3]:
# download our FAQ dataset
import requests 

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

documents[-1] # inspect the format of one flattened FAQ record

{'text': 'Problem description\nInfrastructure created in AWS with CD-Deploy Action needs to be destroyed\nSolution description\nFrom local:\nterraform init -backend-config="key=mlops-zoomcamp-prod.tfstate" --reconfigure\nterraform destroy --var-file vars/prod.tfvars\nAdded by Erick Calderin',
 'section': 'Module 6: Best practices',
 'question': 'How to destroy infrastructure created via GitHub Actions',
 'course': 'mlops-zoomcamp'}

In [4]:
# build index - takes a few seconds with minsearch

from minsearch import AppendableIndex

index = AppendableIndex(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.append.AppendableIndex at 0x7deac329d370>

In [5]:
# create a search function with weights: boost = {'question': 3.0, 'section': 0.5}

def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5,
        output_ids=True
    )

    return results

# note: num_results=5 and the course filter are hard-coded here - filter_dict={'course': 'data-engineering-zoomcamp'}
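
Since the course and the number of results are fixed inside `search`, one possible way to make them configurable is a small factory that closes over the index. This is only a sketch: `make_search` and `RecordingIndex` are hypothetical names, and `RecordingIndex` is a stand-in that just records the keyword arguments, so the example runs without minsearch. The keyword arguments passed to `index.search` are the same ones used in the cell above.

```python
def make_search(index, course="data-engineering-zoomcamp", num_results=5):
    """Return a search(query) function bound to one course (hypothetical helper)."""
    boost = {"question": 3.0, "section": 0.5}

    def search(query):
        # Same keyword arguments as the notebook's search(), but parameterized.
        return index.search(
            query=query,
            filter_dict={"course": course},
            boost_dict=boost,
            num_results=num_results,
            output_ids=True,
        )

    return search


class RecordingIndex:
    """Stand-in for the real index (assumption): records calls, returns no hits."""

    def __init__(self):
        self.calls = []

    def search(self, **kwargs):
        self.calls.append(kwargs)
        return []


fake_index = RecordingIndex()
search_mlops = make_search(fake_index, course="mlops-zoomcamp", num_results=3)
search_mlops("How to destroy infrastructure created via GitHub Actions")
print(fake_index.calls[0]["filter_dict"])
```

In the notebook, passing the fitted `index` instead of `RecordingIndex()` would give one search function per course.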

In [6]:
question = 'Can I still join the course?'

In [7]:
# test if our search function works 

search(question)

[{'text': "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
  'section': 'General course-related questions',
  'question': 'Course - Can I still join the course after the start date?',
  'course': 'data-engineering-zoomcamp',
  '_id': 2},
 {'text': "No, you can only get a certificate if you finish the course with a “live” cohort. We don't award certificates for the self-paced mode. The reason is you need to peer-review capstone(s) after submitting a project. You can only peer-review projects at the time the course is running.",
  'section': 'General course-related questions',
  'question': 'Certificate - Can I follow the course in a self-paced mode and get a certificate?',
  'course': 'data-engineering-zoomcamp',
  '_id': 11},
 {'text': 'Yes, we will keep all the materials after the course finishes, so you can follow the cou

## STEP 2 - build prompt

In [8]:
# build a prompt

prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

<QUESTION>
{question}
</QUESTION>

<CONTEXT>
{context}
</CONTEXT>
""".strip()

def build_prompt(query, search_results):
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt



In [9]:
# test build_prompt on the results of search(question)

prompt = build_prompt(question, search(question))
print(prompt) # better formatted for readability

You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

<QUESTION>
Can I still join the course?
</QUESTION>

<CONTEXT>
section: General course-related questions
question: Course - Can I still join the course after the start date?
answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.

section: General course-related questions
question: Certificate - Can I follow the course in a self-paced mode and get a certificate?
answer: No, you can only get a certificate if you finish the course with a “live” cohort. We don't award certificates for the self-paced mode. The reason is you need to peer-review capstone(s) after submitting a project. You can only peer-review projects at the time the course is running.

section: General

## STEP 3 - Connect to OpenAI

In [10]:
search_results = search(question)
# just repeating step 1 - keyword search 

In [11]:
prompt = build_prompt(question, search_results)
# repeating step 2 - building prompt on top of search results 

In [12]:
# connecting to LLM, entering our API KEY
import os
from getpass import getpass
from openai import OpenAI

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key


client = OpenAI()



🔑 Enter your OpenAI API key:  ········


In [13]:
# function to send our prompt to llm 

def llm(prompt):
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content



In [14]:
# test if our llm can answer something meaningful based on prompt provided

answer = llm(prompt)
print(answer)

Yes, you can still join the course even after the start date. You are eligible to submit homeworks, but make sure to keep track of deadlines for turning in the final projects.


In [15]:
print(prompt) # this is what we have sent to llm

You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

<QUESTION>
Can I still join the course?
</QUESTION>

<CONTEXT>
section: General course-related questions
question: Course - Can I still join the course after the start date?
answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.

section: General course-related questions
question: Certificate - Can I follow the course in a self-paced mode and get a certificate?
answer: No, you can only get a certificate if you finish the course with a “live” cohort. We don't award certificates for the self-paced mode. The reason is you need to peer-review capstone(s) after submitting a project. You can only peer-review projects at the time the course is running.

section: General

## STEP 4 - assemble our RAG pipeline

In [16]:
# this is our RAG pipeline

def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer



In [17]:
# test with an irrelevant question - something that is NOT in the FAQ

rag("How do I patch KDE under FreeBSD?")

"I'm sorry, but there is no information provided in the context regarding how to patch KDE under FreeBSD. Please refer to the official documentation or community forums related to FreeBSD and KDE for assistance."

In [18]:
rag("What Shakespeare said about peeling a carrot?") 

'The provided context does not contain any information regarding what Shakespeare said about peeling a carrot.'

In [19]:
# relevant question - it actually was in FAQ
answer = rag("How to run Kafka in Docker")
print(answer)

To run Kafka in Docker, follow these steps:

1. Navigate to the folder containing your Docker Compose YAML file.
2. Use the command `docker compose up -d` to start all the instances, including the Kafka broker.
3. You can check if your Kafka broker Docker container is running by using the command `docker ps`.

Make sure all Docker images are up and running before you attempt to use Kafka.


In [20]:
# The LLM knows the answer by itself, but our RAG prompt restricts it to the CONTEXT:
# above, rag("How do I patch KDE under FreeBSD?") declined to answer, whereas a direct call gives:
print(llm("How do I patch KDE under FreeBSD?"))

Patching KDE under FreeBSD involves several steps, primarily focusing on applying updates or fixes to the KDE software components you have installed. Below are the general steps to patch KDE on a FreeBSD system:

### 1. **Updating FreeBSD Ports Tree**

FreeBSD uses a ports collection to manage software installations, which includes KDE. First, make sure your ports collection is up to date:

```bash
portsnap fetch update
```

If you haven't used `portsnap` before, you may need to initialize it first:

```bash
portsnap fetch extract
```

### 2. **Identifying Installed KDE Components**

Determine which KDE components are installed. You can list installed ports with:

```bash
pkg info | grep kde
```

Or, if you prefer to see the specific packages installed via the ports tree:

```bash
ls /usr/ports/x11/kde5/
```

### 3. **Applying Patches**

If you are aware of specific patches that need to be applied to the KDE components, you can do this with the following steps:

1. **Navigate to the po

In [21]:
# This is the main idea behind STEP 5 below: if the LLM can answer the question by itself, fine;
# if not, it should be able to search our FAQ database and build the context for the answer.

## STEP 5 - "Agentic" RAG

In [22]:
# essentially we only modify our prompt, so the LLM can decide either to answer immediately
# or to use a SEARCH tool to get more CONTEXT from our FAQ

prompt_template = """
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
{question}
</QUESTION>

<CONTEXT> 
{context}
</CONTEXT>

If CONTEXT is EMPTY, you can use our FAQ database.
In this case, use the following output template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>"
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}
""".strip()
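
The doubled braces in the template above are there because `str.format` treats single braces as placeholders; `{{` and `}}` render as literal `{` and `}`. A minimal illustration with a toy template (not the one above):

```python
# Toy template: {question} is a placeholder, {{ and }} are escaped literal braces.
template = """
Answer the QUESTION.

<QUESTION>
{question}
</QUESTION>

Reply with:

{{
"action": "SEARCH"
}}
""".strip()

rendered = template.format(question="Can I still join the course?")
print(rendered)
```

The rendered text contains a plain JSON object with single braces, which is what the model sees.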

In [23]:
question = 'Can I still join the course?'
context = 'EMPTY'

In [24]:
prompt = prompt_template.format(question=question, context=context)
print(prompt)

You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
Can I still join the course?
</QUESTION>

<CONTEXT> 
EMPTY
</CONTEXT>

If CONTEXT is EMPTY, you can use our FAQ database.
In this case, use the following output template:

{
"action": "SEARCH",
"reasoning": "<add your reasoning here>"
}

If you can answer the QUESTION using CONTEXT, use this template:

{
"action": "ANSWER",
"answer": "<your answer>",
"source": "CONTEXT"
}

If the context doesn't contain the answer, use your own knowledge to answer the question

{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}


In [25]:
answer_json = llm(prompt)
answer_json 
# the model chose the "SEARCH" action because CONTEXT is empty
# and explained why in the "reasoning" field

'{\n"action": "SEARCH",\n"reasoning": "The question about joining the course does not provide specific information regarding enrollment deadlines or late registration policies, so I need to refer to the FAQ database for accurate information."\n}'

In [26]:
import json
# we can parse the llm answer and use tool if appropriate:

answer = json.loads(answer_json)
answer['action']

'SEARCH'
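
`json.loads` works here because the model returned bare JSON. Models sometimes wrap the object in a code fence or add a sentence around it, in which case `json.loads` raises. A more tolerant parser could look like the sketch below; `parse_llm_json` is a hypothetical helper, not part of the original notebook.

```python
import json


def parse_llm_json(raw):
    """Parse a JSON object from an LLM reply, tolerating code fences or extra text."""
    text = raw.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, which also drops surrounding fences.
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError(f"no JSON object found in LLM reply: {raw!r}")
    return json.loads(text[start:end + 1])


plain = '{\n"action": "SEARCH",\n"reasoning": "context is empty"\n}'
fence = "`" * 3  # built at runtime so this example stays fence-safe
fenced = fence + "json\n" + plain + "\n" + fence

print(parse_llm_json(plain)["action"])
print(parse_llm_json(fenced)["action"])
```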

In [27]:
# let's ask the LLM something it already knows well:
question = 'Can I run Docker on Windows 10?'
context = 'EMPTY'
prompt = prompt_template.format(question=question, context=context)

In [28]:
answer_json = llm(prompt)
answer_json 
# it answers immediately, without searching - "action": "ANSWER"
# and reports that it used its own knowledge - "source": "OWN_KNOWLEDGE"

'{\n"action": "ANSWER",\n"answer": "Yes, you can run Docker on Windows 10. Docker provides a version called Docker Desktop, which is compatible with Windows 10 Pro, Enterprise, and Education editions. For Windows 10 Home, Docker Desktop has introduced WSL 2 (Windows Subsystem for Linux) support, allowing you to run Docker containers as well. You\'ll need to enable WSL 2 and ensure that your system meets the necessary requirements for installation.",\n"source": "OWN_KNOWLEDGE"\n}'

In [29]:
answer = json.loads(answer_json)
answer['action']

'ANSWER'

In [30]:
# if model says SEARCH then we need to use a search tool and build context + update prompt

def build_context(search_results):
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    return context.strip()
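
The same context string can also be built with `str.join`, which avoids repeated string concatenation. A sketch under the assumption that every document has `section`, `question` and `text` keys, as in the FAQ records above; the name `build_context_joined` is hypothetical.

```python
def build_context_joined(search_results):
    """Equivalent of build_context() using a list and str.join."""
    entries = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    return "\n\n".join(entries).strip()


toy_results = [
    {"section": "General", "question": "Can I join late?", "text": "Yes."},
    {"section": "Module 1", "question": "Which tools?", "text": "Docker and Terraform."},
]

print(build_context_joined(toy_results))
print(repr(build_context_joined([])))  # an empty result list gives an empty string
```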



In [31]:
# asking something it should search for:
question = 'Can I still join the course?'

search_results = search(question)
context = build_context(search_results)
prompt = prompt_template.format(question=question, context=context)
print(prompt) # for better formatting

You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
Can I still join the course?
</QUESTION>

<CONTEXT> 
section: General course-related questions
question: Course - Can I still join the course after the start date?
answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.

section: General course-related questions
question: Certificate - Can I follow the course in a self-paced mode and get a certificate?
answer: No, you can only get a certificate if you finish the course with a “live” cohort. We don't award certificates for the self-paced mode. The reason is you need to peer-review capstone(s) after submitting a project. You can only peer-review projects at the time the cour

In [32]:
# let's ask the LLM again, now that we have run the search it requested and filled in the context

answer_json = llm(prompt)
print(answer_json)

{
"action": "ANSWER",
"answer": "Yes, you can still join the course after the start date even if you haven't registered. You are eligible to submit assignments, but remember to pay attention to the deadlines for the final projects.",
"source": "CONTEXT"
}


In [33]:
# we need to put everything together and parse our response json automatically
# algo - if "action": "SEARCH" then llm uses our FAQ search tool
# if "action": "ANSWER" - then it answers immediately ...

def agentic_rag_v1(question):
    context = "EMPTY"
    prompt = prompt_template.format(question=question, context=context)
    answer_json = llm(prompt)
    answer = json.loads(answer_json)
    print(answer)

    if answer['action'] == 'SEARCH':
        print('need to perform search...')
        search_results = search(question)
        context = build_context(search_results)
        
        prompt = prompt_template.format(question=question, context=context)
        answer_json = llm(prompt)
        answer = json.loads(answer_json)
        print(answer)

    return answer
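
The two-step flow above can be exercised without an API key if the LLM and the search tool are passed in as arguments. The sketch below assumes that design: `agentic_rag`, `fake_llm`, `fake_search`, `fake_build_context` and the toy `PROMPT` are all hypothetical stand-ins for the notebook's `llm`, `search`, `build_context` and `prompt_template`.

```python
import json

# Toy template standing in for prompt_template (assumption).
PROMPT = "QUESTION: {question}\nCONTEXT: {context}"


def agentic_rag(question, llm, search, build_context, prompt_template=PROMPT):
    """First ask with an EMPTY context; if the model asks for SEARCH, search and ask again."""
    prompt = prompt_template.format(question=question, context="EMPTY")
    answer = json.loads(llm(prompt))

    if answer["action"] == "SEARCH":
        context = build_context(search(question))
        prompt = prompt_template.format(question=question, context=context)
        answer = json.loads(llm(prompt))

    return answer


def fake_llm(prompt):
    # Stub model: requests a search while the context is EMPTY, answers otherwise.
    if "CONTEXT: EMPTY" in prompt:
        return json.dumps({"action": "SEARCH", "reasoning": "context is empty"})
    return json.dumps({"action": "ANSWER", "answer": "Yes, you can.", "source": "CONTEXT"})


def fake_search(query):
    return [{"section": "General", "question": "Can I join late?", "text": "Yes.", "_id": 1}]


def fake_build_context(results):
    return "\n\n".join(doc["text"] for doc in results)


result = agentic_rag("Can I still join the course?", fake_llm, fake_search, fake_build_context)
print(result)
```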



In [36]:
# test our agentic RAG function

agentic_rag_v1('how do I join the course?')
# first call:  'action': 'SEARCH', 'reasoning': 'The context is empty...
# second call: 'action': 'ANSWER', 'source': 'CONTEXT'

{'action': 'SEARCH', 'reasoning': 'The context is empty, and I need to retrieve information about how to join the course from the FAQ database.'}
need to perform search...
{'action': 'ANSWER', 'answer': "To join the course, you need to register using the provided link before the course starts. Even if the course has already started, you can still participate by submitting homework, but be sure to adhere to deadlines for the final project. It's also recommended to join the course public Google Calendar and the Telegram channel for announcements.", 'source': 'CONTEXT'}


{'action': 'ANSWER',
 'answer': "To join the course, you need to register using the provided link before the course starts. Even if the course has already started, you can still participate by submitting homework, but be sure to adhere to deadlines for the final project. It's also recommended to join the course public Google Calendar and the Telegram channel for announcements.",
 'source': 'CONTEXT'}

In [37]:
agentic_rag_v1('how patch KDE under FreeBSD?')
# 'reasoning': 'The student is asking for a specific method to patch KDE under FreeBSD, and the context is currently empty...

# 'action': 'ANSWER'
# 'source': 'OWN_KNOWLEDGE'

{'action': 'SEARCH', 'reasoning': 'The student is asking for a specific method to patch KDE under FreeBSD, and the context is currently empty, indicating that I should look for appropriate information in the FAQ database.'}
need to perform search...
{'action': 'ANSWER', 'answer': "To patch KDE under FreeBSD, you can follow these steps:\n\n1. **Ensure Ports Collection is Updated**: Make sure your ports collection is up to date. You can do this by running:\n   ```\n   portsnap fetch update\n   ```\n\n2. **Navigate to the KDE Port Directory**: Go to the directory of the KDE component you wish to patch. For example, if you want to patch `kde5`, navigate to:\n   ```\n   cd /usr/ports/x11/kde5\n   ```\n\n3. **Apply the Patch**: Create or modify a patch file in the `files` directory of that port. You can add your patch file (for example, `my_patch.diff`) in the `files` directory.\n   ```\n   cp /path/to/my_patch.diff files/\n   ```  \n   After adding the patch, run:\n   ```\n   make patch\n  

{'action': 'ANSWER',
 'answer': "To patch KDE under FreeBSD, you can follow these steps:\n\n1. **Ensure Ports Collection is Updated**: Make sure your ports collection is up to date. You can do this by running:\n   ```\n   portsnap fetch update\n   ```\n\n2. **Navigate to the KDE Port Directory**: Go to the directory of the KDE component you wish to patch. For example, if you want to patch `kde5`, navigate to:\n   ```\n   cd /usr/ports/x11/kde5\n   ```\n\n3. **Apply the Patch**: Create or modify a patch file in the `files` directory of that port. You can add your patch file (for example, `my_patch.diff`) in the `files` directory.\n   ```\n   cp /path/to/my_patch.diff files/\n   ```  \n   After adding the patch, run:\n   ```\n   make patch\n   ```\n   This command applies the patch.\n\n4. **Compile and Install**: After applying the patch, compile and install the port by running:\n   ```\n   make install clean\n   ```\n\n5. **Verify**: Ensure that your patch has been correctly applied and

## STEP 6 - Agentic Search

In [30]:
# deduplication function

def dedup(seq):
    seen = set()
    result = []
    for el in seq:
        _id = el['_id']
        if _id in seen:
            continue
        seen.add(_id)
        result.append(el)
    return result
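
A quick check of `dedup` on toy records: when several search queries return the same document, only its first occurrence is kept and the original order is preserved. The records below are made up for illustration; only the `_id` field matters.

```python
def dedup(seq):
    # Same function as in the cell above, repeated so this example runs on its own.
    seen = set()
    result = []
    for el in seq:
        _id = el['_id']
        if _id in seen:
            continue
        seen.add(_id)
        result.append(el)
    return result


hits = [
    {"_id": 2, "question": "Can I still join the course?"},
    {"_id": 11, "question": "Can I get a certificate?"},
    {"_id": 2, "question": "Can I still join the course?"},  # duplicate from a second query
    {"_id": 5, "question": "What are the prerequisites?"},
]

unique = dedup(hits)
print([doc["_id"] for doc in unique])  # [2, 11, 5]
```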



In [31]:
prompt_template = """
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.

The CONTEXT is build with the documents from our FAQ database.
SEARCH_QUERIES contains the queries that were used to retrieve the documents
from FAQ to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

You can perform the following actions:

- Search in the FAQ database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than {max_iterations} iterations for a given student question.
The current iteration number: {iteration_number}. If we exceed the allowed number 
of iterations, give the best possible answer with the provided information.

Output templates:

If you want to perform search, use this template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>",
"keywords": ["search query 1", "search query 2", ...]
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER_CONTEXT",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}

<QUESTION>
{question}
</QUESTION>

<SEARCH_QUERIES>
{search_queries}
</SEARCH_QUERIES>

<CONTEXT> 
{context}
</CONTEXT>

<PREVIOUS_ACTIONS>
{previous_actions}
</PREVIOUS_ACTIONS>
""".strip()

In [32]:
# repetitive task - search, update context and search again if needed + delete duplicate results

question = 'how do I do well on module 1'
max_iterations = 3
iteration_number = 0
search_queries = []
search_results  = []
previous_actions = []

In [33]:
context = build_context(search_results)

prompt = prompt_template.format(
    question=question,
    context=context,
    search_queries="\n".join(search_queries),
    previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
    max_iterations=max_iterations,
    iteration_number=iteration_number
)

In [34]:
answer_json = llm(prompt)
answer_json

'{\n"action": "SEARCH",\n"reasoning": "To provide the student with specific strategies and best practices for succeeding in Module 1, I will look for frequently asked questions or tips related to this particular module.",\n"keywords": ["how to succeed in Module 1", "tips for Module 1", "Module 1 study strategies"]\n}'

In [35]:
answer = json.loads(answer_json)
answer

{'action': 'SEARCH',
 'reasoning': 'To provide the student with specific strategies and best practices for succeeding in Module 1, I will look for frequently asked questions or tips related to this particular module.',
 'keywords': ['how to succeed in Module 1',
  'tips for Module 1',
  'Module 1 study strategies']}

In [36]:
previous_actions.append(answer)
keywords = answer['keywords']
keywords

['how to succeed in Module 1',
 'tips for Module 1',
 'Module 1 study strategies']

In [37]:
# now we have to search each keyword and add results to a context

for kw in keywords:
    search_queries.append(kw)
    sr = search(kw)
    search_results.extend(sr)

In [38]:
# let's see what we found after removing duplicates

search_results = dedup(search_results)
search_results

[{'text': 'You need to look for the Py4J file and note the version of the filename. Once you know the version, you can update the export command accordingly, this is how you check yours:\n` ls ${SPARK_HOME}/python/lib/ ` and then you add it in the export command, mine was:\nexport PYTHONPATH=”${SPARK_HOME}/python/lib/Py4J-0.10.9.5-src.zip:${PYTHONPATH}”\nMake sure that the version under `${SPARK_HOME}/python/lib/` matches the filename of py4j or you will encounter `ModuleNotFoundError: No module named \'py4j\'` while executing `import pyspark`.\nFor instance, if the file under `${SPARK_HOME}/python/lib/` was `py4j-0.10.9.3-src.zip`.\nThen the export PYTHONPATH statement above should be changed to `export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH"` appropriately.\nAdditionally, you can check for the version of ‘py4j’ of the spark you’re using from here and update as mentioned above.\n~ Abhijit Chakraborty: Sometimes, even with adding the correct version of p

In [39]:
iteration_number = 2

context = build_context(search_results)

prompt = prompt_template.format(
    question=question,
    context=context,
    search_queries="\n".join(search_queries),
    previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
    max_iterations=max_iterations,
    iteration_number=iteration_number
)

In [43]:
answer_json = llm(prompt)
answer = json.loads(answer_json)
print(answer)

{'action': 'SEARCH', 'reasoning': 'The previous search did not yield any specific strategies or guidelines for succeeding in Module 1, so I will refine my search to focus more on study tips, recommended resources, and common pitfalls related specifically to Docker and Terraform, which are the key subjects of Module 1.', 'keywords': ['Module 1 Docker study tips', 'Module 1 Terraform tips', 'best practices Module 1']}


In [44]:
# now let's run the search in a loop until the LLM stops asking for SEARCH

question = "what do I need to do to be successful at module 1?"

search_queries = []
search_results = []
previous_actions = []

iteration = 0

while True:
    print(f'ITERATION #{iteration}...')

    context = build_context(search_results)
    prompt = prompt_template.format(
        question=question,
        context=context,
        search_queries="\n".join(search_queries),
        previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
        max_iterations=3,
        iteration_number=iteration
    )

    print(prompt)

    answer_json = llm(prompt)
    answer = json.loads(answer_json)
    print(json.dumps(answer, indent=2))

    previous_actions.append(answer)

    action = answer['action']
    if action != 'SEARCH':
        break

    keywords = answer['keywords']
    search_queries = list(set(search_queries) | set(keywords))
    
    for k in keywords:
        res = search(k)
        search_results.extend(res)

    search_results = dedup(search_results)
    
    iteration = iteration + 1
    # hard safety stop in case the model ignores the max_iterations=3 limit stated in the prompt
    if iteration >= 4:
        break

    print()


ITERATION #0...
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.

The CONTEXT is build with the documents from our FAQ database.
SEARCH_QUERIES contains the queries that were used to retrieve the documents
from FAQ to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

You can perform the following actions:

- Search in the FAQ database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than 3 iterations for a given student question.
The current 

In [48]:
print(answer['answer']) # for better look

To be successful in Module 1, which focuses on Docker and Terraform, here are some general tips:

1. **Familiarize Yourself with the Basics**: Ensure you have a good understanding of the basic concepts of Docker and Terraform, including how to create, manage, and deploy containers using Docker, as well as infrastructure as code with Terraform.

2. **Hands-On Practice**: Engage with the hands-on labs and exercises provided in the module. Practical application of concepts is crucial in understanding how these tools work in real-world scenarios.

3. **Utilize Community Resources**: Leverage forums, YouTube tutorials, and documentation. The Docker and Terraform communities are vast, and you can find plenty of resources that will enhance your understanding.

4. **Ask for Help**: Don't hesitate to reach out to peers or instructors if you have questions or encounter difficulties. Collaboration can enhance your learning experience.

5. **Stay Organized**: Keep your notes, configuration files, 

In [46]:
iteration

2
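
The loop from this step can be wrapped in a function so that it is reusable and testable. This is a sketch, not the notebook's own code: `agentic_search` and `render_prompt` are hypothetical names, and the LLM, search tool and prompt rendering are injected so the example runs with stubs. It keeps the same stop conditions as the loop above (any action other than `SEARCH`, or the hard iteration cap).

```python
import json


def agentic_search(question, llm, search, render_prompt, max_iterations=3):
    """Repeat search/ask until the model stops requesting SEARCH or the cap is hit."""
    search_queries = []
    search_results = []
    previous_actions = []
    seen_ids = set()
    answer = None

    # Iterations 0..max_iterations, matching the hard stop in the loop above.
    for iteration in range(max_iterations + 1):
        prompt = render_prompt(
            question=question,
            search_queries=search_queries,
            search_results=search_results,
            previous_actions=previous_actions,
            max_iterations=max_iterations,
            iteration_number=iteration,
        )
        answer = json.loads(llm(prompt))
        previous_actions.append(answer)

        if answer["action"] != "SEARCH":
            break

        for keyword in answer["keywords"]:
            if keyword not in search_queries:
                search_queries.append(keyword)  # keeps insertion order, unlike set()
            for doc in search(keyword):
                if doc["_id"] not in seen_ids:  # same effect as dedup()
                    seen_ids.add(doc["_id"])
                    search_results.append(doc)

    return answer


def fake_render(**fields):
    # Stub prompt: just exposes how many documents are in the context.
    return json.dumps({"n_docs": len(fields["search_results"])})


def fake_llm(prompt):
    if json.loads(prompt)["n_docs"] == 0:
        return json.dumps({"action": "SEARCH", "reasoning": "context is empty",
                           "keywords": ["module 1"]})
    return json.dumps({"action": "ANSWER_CONTEXT", "answer": "Practice Docker and Terraform.",
                       "source": "CONTEXT"})


def fake_search(query):
    return [{"_id": 1, "section": "Module 1", "question": query, "text": "Practice Docker."}]


final = agentic_search("how do I do well on module 1", fake_llm, fake_search, fake_render)
print(final["action"])
```

In the notebook, `render_prompt` would wrap `build_context` and `prompt_template.format(...)`, and `llm` and `search` would be the functions defined earlier.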

## STEP 7 - Function calling ("tool use")

In [49]:
# search tool

def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5,
        output_ids=True
    )

    return results



In [51]:
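# NOTE: this call fails. rag() -> build_prompt() reads the global prompt_template,
# which STEP 6 redefined with extra placeholders (max_iterations, iteration_number, ...),
# so .format(question=..., context=...) raises KeyError. Re-run the STEP 2 prompt cell to restore it.
answer = rag("How to run Kafka in Docker")
print(answer)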

KeyError: 'max_iterations'
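
The `KeyError` above comes from `str.format` being called on a template that has more placeholders than the arguments supplied. One way to avoid this kind of clash, sketched here with toy templates, is to pass the template explicitly instead of reading a global name that later cells may redefine. The names `step2_template` and `step6_template` are hypothetical.

```python
step2_template = "<QUESTION>{question}</QUESTION>\n<CONTEXT>{context}</CONTEXT>"
step6_template = step2_template + "\nmax iterations: {max_iterations}"


def build_prompt(query, context, template):
    # The template is an explicit argument, so redefining a global cannot break this call.
    return template.format(question=query, context=context)


missing = None
try:
    # Reproduces the failure: the template expects a field that is not supplied.
    build_prompt("How to run Kafka in Docker", "", step6_template)
except KeyError as err:
    missing = err.args[0]
    print(f"missing placeholder: {missing}")

prompt = build_prompt("How to run Kafka in Docker", "", step2_template)
print(prompt)
```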