Follow along with this tutorial: https://github.com/alexeygrigorev/rag-agents-workshop

Live coding session with Alexey

In [6]:
!pip install minsearch


In [7]:
import requests 

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

In [8]:
from minsearch import AppendableIndex

index = AppendableIndex(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.append.AppendableIndex at 0x7050c9c917d0>

In [9]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5,
        output_ids=True
    )

    return results

In [10]:
question = 'Can I still join the course?'

In [11]:
prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

<QUESTION>
{question}
</QUESTION>

<CONTEXT>
{context}
</CONTEXT>
""".strip()

def build_prompt(query, search_results):
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

In [12]:
search_results = search(question)

In [13]:
prompt = build_prompt(question, search_results)

In [16]:
from openai import OpenAI
client = OpenAI()

def llm(prompt):
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

In [12]:
answer = llm(prompt)

In [13]:
print(answer)

Yes, you can still join the course after the start date. Even if you don't register, you are still eligible to submit the homeworks. However, be mindful of the deadlines for turning in the final projects, as you should not leave everything until the last minute.


In [14]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [15]:
rag("How do I patch KDE under FreeBSD?")

"I'm sorry, but the FAQ database does not provide any context or information regarding how to patch KDE under FreeBSD. Please refer to official FreeBSD or KDE documentation for further assistance."

## "Agentic" RAG


In [17]:
prompt_template = """
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
{question}
</QUESTION>

<CONTEXT> 
{context}
</CONTEXT>

If CONTEXT is EMPTY, you can use our FAQ database.
In this case, use the following output template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>"
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}
""".strip()

In [18]:
question = 'Can I still join the course?'
context = 'EMPTY'

In [19]:
prompt = prompt_template.format(question=question, context=context)

In [20]:
answer_json = llm(prompt)

In [21]:
import json

In [22]:
answer = json.loads(answer_json)

In [23]:
answer['action']

'SEARCH'

In [24]:
def build_context(search_results):
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    return context.strip()

In [25]:
search_results = search(question)
context = build_context(search_results)
prompt = prompt_template.format(question=question, context=context)

In [26]:
answer_json = llm(prompt)

In [27]:
print(answer_json)

{
"action": "ANSWER",
"answer": "Yes, you can still join the course even after the start date. While you may not be officially registered, you can submit homework assignments. Just keep in mind the deadlines for the final projects, as they will be important to meet.", 
"source": "CONTEXT"
}
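
The two manual steps above (ask with an EMPTY context, run a search if the model requests one, then ask again) can be wrapped into a single function. This is only a minimal sketch: `agentic_rag`, `fake_llm`, `fake_search`, and the shortened `TEMPLATE` are hypothetical names introduced for illustration, and the stand-ins replace `llm` and `search` so the snippet runs without an API key.

```python
import json

# Shortened stand-in for the prompt template defined earlier (assumption).
TEMPLATE = "QUESTION: {question}\nCONTEXT: {context}"


def build_context(search_results):
    # Mirrors the notebook's build_context: one block per document.
    parts = [
        f"section: {d['section']}\nquestion: {d['question']}\nanswer: {d['text']}"
        for d in search_results
    ]
    return "\n\n".join(parts)


def agentic_rag(question, llm, search, template=TEMPLATE):
    """Ask once with an EMPTY context; search and ask again only if requested."""
    prompt = template.format(question=question, context="EMPTY")
    answer = json.loads(llm(prompt))

    if answer["action"] == "SEARCH":
        context = build_context(search(question))
        prompt = template.format(question=question, context=context)
        answer = json.loads(llm(prompt))

    return answer


# Hypothetical stand-ins so the sketch runs offline.
def fake_llm(prompt):
    if "CONTEXT: EMPTY" in prompt:
        return json.dumps({"action": "SEARCH", "reasoning": "context is empty"})
    return json.dumps({"action": "ANSWER", "answer": "Yes, you can.", "source": "CONTEXT"})


def fake_search(query):
    return [{"section": "General", "question": "Can I join late?", "text": "Yes."}]


result = agentic_rag("Can I still join the course?", fake_llm, fake_search)
print(result["action"], result["source"])
```

With the real `llm` and `search` passed in, the function performs at most one search. If the model asks to search a second time, that SEARCH action is returned as-is.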


## Agentic Search

`dedup` removes duplicates from the retrieved search results

In [65]:
def dedup(seq):
    seen = set()
    result = []
    for el in seq:
        _id = el['_id']
        if _id in seen:
            continue
        seen.add(_id)
        result.append(el)
    return result

Agentic RAG vs Agentic Search

* Agentic RAG decides how to do the retrieval (CONTEXT or own knowledge), while Agentic Search decides how to do the search
* Both do search and retrieval




In [75]:
prompt_template = """
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.

The CONTEXT is build with the documents from our FAQ database.
SEARCH_QUERIES contains the queries that were used to retrieve the documents
from FAQ to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

You can perform the following actions:

- Search in the FAQ database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than {max_iterations} iterations for a given student question.
The current iteration number: {iteration_number}. If we exceed the allowed number 
of iterations, give the best possible answer with the provided information.

Output templates:

If you want to perform search, use this template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>",
"keywords": ["search query 1", "search query 2", ...]
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER_CONTEXT",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}

<QUESTION>
{question}
</QUESTION>

<SEARCH_QUERIES>
{search_queries}
</SEARCH_QUERIES>

<CONTEXT> 
{context}
</CONTEXT>

<PREVIOUS_ACTIONS>
{previous_actions}
</PREVIOUS_ACTIONS>
""".strip()

In [76]:
question = 'how do I do well on module 1'
max_iterations = 3
iteration_number = 0
search_queries = []
search_results_raw = []
search_results = []  # start with an empty context
previous_actions = []

In [77]:
context = build_context(search_results)

prompt = prompt_template.format(
    question=question,
    context=context,
    search_queries="\n".join(search_queries),
    previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
    max_iterations=max_iterations,
    iteration_number=iteration_number
)

In [78]:
answer_json = llm(prompt)

In [79]:
answer = json.loads(answer_json)

In [102]:
previous_actions.append(answer)

In [103]:
previous_actions

[{'action': 'SEARCH',
  'reasoning': 'The student is asking about how to excel in Module 1, which focuses on Docker and Terraform. I need to find specific tips or strategies related to this module to provide a comprehensive answer.',
  'keywords': ['Module 1 Docker Terraform tips',
   'how to succeed in Module 1 Docker Terraform']},
 {'action': 'SEARCH',
  'reasoning': 'The student is asking about how to excel in Module 1, which focuses on Docker and Terraform. I need to find specific tips or strategies related to this module to provide a comprehensive answer.',
  'keywords': ['Module 1 Docker Terraform tips',
   'how to succeed in Module 1 Docker Terraform']},
 {'action': 'SEARCH',
  'reasoning': 'The student is asking about how to excel in Module 1, which focuses on Docker and Terraform. I need to find specific tips or strategies related to this module to provide a comprehensive answer.',
  'keywords': ['Module 1 Docker Terraform tips',
   'how to succeed in Module 1 Docker Terraform

In [100]:
keywords = answer['keywords']

In [101]:
for kw in keywords:
    search_queries.append(kw)
    sr = search(kw)
    search_results_raw.extend(sr)

## Observation

Because we generate multiple keyword searches for the same question, the searches can return repeated results. To avoid sending redundant context in the API call, we must remove the duplicates. The cells below show the number of results before deduplication (`search_results_raw`) and after it (`search_results`).

In [94]:
len(search_results_raw)

20

In [104]:
search_results = dedup(search_results_raw)

In [105]:
len(search_results)

5
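
To see why the raw list shrinks, here is a small self-contained sketch with made-up documents (the `_id` values and texts are invented for illustration). Two queries return overlapping hits, and deduplicating by `_id` keeps the first occurrence of each document in its original order.

```python
def dedup(seq):
    # Keep the first occurrence of each document, identified by its '_id'.
    seen = set()
    result = []
    for el in seq:
        if el["_id"] in seen:
            continue
        seen.add(el["_id"])
        result.append(el)
    return result


# Invented results for two different queries; documents 2 and 3 overlap.
results_query_a = [{"_id": 1, "text": "a"}, {"_id": 2, "text": "b"}, {"_id": 3, "text": "c"}]
results_query_b = [{"_id": 3, "text": "c"}, {"_id": 2, "text": "b"}, {"_id": 4, "text": "d"}]

raw = results_query_a + results_query_b
unique = dedup(raw)

print(len(raw), len(unique))        # 6 4
print([d["_id"] for d in unique])   # [1, 2, 3, 4]
```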

#### iteration number 1

In [106]:
iteration_number = 3

context = build_context(search_results)

prompt = prompt_template.format(
    question=question,
    context=context,
    search_queries="\n".join(search_queries),
    previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
    max_iterations=max_iterations,
    iteration_number=iteration_number
)

In [107]:
answer_json = llm(prompt)

In [111]:
print(json.loads(answer_json)['answer'])

To do well in Module 1, which focuses on Docker and Terraform, consider the following tips: 
1. **Understand the Basics**: Ensure you grasp the core concepts of Docker and Terraform. Familiarize yourself with containerization and infrastructure as code (IaC).
2. **Hands-On Practice**: Work on practical exercises and projects. Create your own Docker containers and practice Terraform commands.
3. **Follow Documentation**: Use the official Docker and Terraform documentation for guidance. They provide valuable examples and explanations.
4. **Error Troubleshooting**: Pay attention to common issues and errors (like configuration issues or permission errors) that can arise, as seen in the FAQ responses about Terraform errors.
5. **Use Version Control**: Apply version control practices with Git to manage your Terraform configurations.
6. **Engage with Community**: Participate in discussions on forums or study groups, as this can enhance your understanding and provide different perspectives.
7.

In [112]:
question = "what do I need to do to be successful at module 1?"

search_queries = []
search_results = []
previous_actions = []

iteration = 0

while True:
    print(f'ITERATION #{iteration}...')

    context = build_context(search_results)
    prompt = prompt_template.format(
        question=question,
        context=context,
        search_queries="\n".join(search_queries),
        previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
        max_iterations=3,
        iteration_number=iteration
    )

    print(prompt)

    answer_json = llm(prompt)
    answer = json.loads(answer_json)
    print(json.dumps(answer, indent=2))

    previous_actions.append(answer)

    action = answer['action']
    if action != 'SEARCH':
        break

    keywords = answer['keywords']
    search_queries = list(set(search_queries) | set(keywords))
    
    for k in keywords:
        res = search(k)
        search_results.extend(res)

    search_results = dedup(search_results)
    
    iteration = iteration + 1
    if iteration >= 4:
        break

    print()

ITERATION #0...
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.

The CONTEXT is build with the documents from our FAQ database.
SEARCH_QUERIES contains the queries that were used to retrieve the documents
from FAQ to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

You can perform the following actions:

- Search in the FAQ database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than 3 iterations for a given student question.
The current 

In [114]:
iteration

2
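
The loop above calls `json.loads` directly on the model reply, which raises an error if the model wraps the JSON in a Markdown code fence or adds text around it. A hedged sketch of a more forgiving parser follows. `parse_llm_json` is a hypothetical helper, not part of the workshop code, and it only handles the common case of a single JSON object in the reply.

```python
import json


def parse_llm_json(raw):
    """Parse a JSON object from an LLM reply, tolerating surrounding text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Fall back to the span between the first '{' and the last '}'.
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(raw[start:end + 1])


# Simulate a reply wrapped in a Markdown code fence.
fence = "`" * 3
wrapped = fence + 'json\n{"action": "SEARCH", "keywords": ["module 1"]}\n' + fence

print(parse_llm_json(wrapped)["action"])
```

In the loop, `answer = parse_llm_json(answer_json)` could replace the direct `json.loads` call. A reply with no JSON at all still raises, so the caller decides whether to retry or stop.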

## NEXT MOTIVATION
So, the prompt in **agentic search** became quite cumbersome. If we wanted to implement a different agent, we would have to do the same work again: every time you create an agent, you need to describe the actions and their output formats, and then parse the output. That is too much maintenance overhead.


The community's answer was frameworks and, eventually, function calling ("tool use").

The idea behind function calling is to describe (declare) the functions to the LLM and let it decide whether it wants to use them.
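
As an illustration of what "describing a function" means, the sketch below derives a tool description from a Python function's signature. It produces the same dictionary shape as the `search_tool` definition used in this notebook. `describe_function` is a hypothetical helper, and it assumes every parameter is a string, which holds for the FAQ search but not in general.

```python
import inspect


def describe_function(func, param_descriptions):
    """Build a tool description dict from a function's signature.

    Assumption: every parameter is a string, as in the FAQ search tool.
    """
    sig = inspect.signature(func)
    properties = {
        name: {"type": "string", "description": param_descriptions.get(name, "")}
        for name in sig.parameters
    }
    # Parameters without a default value are treated as required.
    required = [
        name for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
            "additionalProperties": False,
        },
    }


def search(query):
    """Search the FAQ database"""
    return []  # stub body, only the signature matters here


tool = describe_function(search, {"query": "Search query text to look up in the course FAQ."})
print(tool["name"], tool["parameters"]["required"])
```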

## Function calling ("tool use")

In [17]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5,
        output_ids=True
    )

    return results

In [18]:
search_tool = {
    "type": "function",
    "name": "search",
    "description": "Search the FAQ database",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query text to look up in the course FAQ."
            }
        },
        "required": ["query"],
        "additionalProperties": False
    }
}

In [20]:
question = "How do I do well in module 1?"

developer_prompt = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.
""".strip()

tools = [search_tool]

chat_messages = [
    {"role": "developer", "content": developer_prompt},
    {"role": "user", "content": question}
]

response = client.responses.create(
    model='gpt-4o-mini',
    input=chat_messages,
    tools=tools
)
response.output

[ResponseFunctionToolCall(arguments='{"query":"module 1 tips"}', call_id='call_KwUZP1ogw7JBzjlbxQZuQnYD', name='search', type='function_call', id='fc_6872cdfdaed08193b21120b32ffae94c08bb4a76a8915e49', status='completed')]

In [21]:
calls = response.output

In [22]:
calls

[ResponseFunctionToolCall(arguments='{"query":"module 1 tips"}', call_id='call_KwUZP1ogw7JBzjlbxQZuQnYD', name='search', type='function_call', id='fc_6872cdfdaed08193b21120b32ffae94c08bb4a76a8915e49', status='completed')]

In [23]:
f_name = calls[0].name

In [268]:
arguments = json.loads(calls[0].arguments)

In [269]:
arguments

{'query': 'module 1 tips'}

In [270]:
f = globals()[f_name]

In [272]:
results = f(**arguments)

## Note

We send OpenAI a request containing the tools we have and the user's question. The LLM makes a decision (here, to call `search`). Our code then acts on that decision by running the function, and invokes the LLM again with the result.

In [275]:
chat_messages.append(calls[0])

chat_messages.append({
    "type": "function_call_output",
    "call_id": calls[0].call_id,
    "output": json.dumps(results)  # output of the function call above
})

### Note

With the tool call and its output appended, the next request sends the whole conversation to the model as its prompt.

In [252]:
chat_messages

[{'role': 'developer',
  'content': "You're a course teaching assistant. \nYou're given a question from a course student and your task is to answer it."},
 {'role': 'user', 'content': 'How do I do well in module 1?'},
 ResponseFunctionToolCall(arguments='{"query":"module 1 how to do well"}', call_id='call_mWpbMkkZS9GoAtb6y86DQaui', name='search', type='function_call', id='fc_6872c142ac80819583fef628944932ef03248a23459e127f', status='completed'),
 {'type': 'function_call_output',
  'call_id': 'call_mWpbMkkZS9GoAtb6y86DQaui',
  'output': '[{"text": "Even after installing pyspark correctly on linux machine (VM ) as per course instructions, faced a module not found error in jupyter notebook .\\nThe solution which worked for me(use following in jupyter notebook) :\\n!pip install findspark\\nimport findspark\\nfindspark.init()\\nThereafter , import pyspark and create spark contex<<t as usual\\nNone of the solutions above worked for me till I ran !pip3 install pyspark instead !pip install pys

In [253]:
response = client.responses.create(
    model='gpt-4o-mini',
    input=chat_messages,
    tools=tools
)
response.output

[ResponseOutputMessage(id='msg_6872c15ab93c8195a6da6c9a381787d803248a23459e127f', content=[ResponseOutputText(annotations=[], text="To do well in Module 1, consider the following tips:\n\n1. **Understand Key Concepts**: Make sure you grasp the fundamental concepts covered in Docker and Terraform. These are crucial for building and deploying applications.\n\n2. **Hands-on Practice**: Apply what you learn by working on practical exercises. Set up Docker containers and try deploying small applications.\n\n3. **Troubleshooting Skills**: Be prepared to troubleshoot common issues, such as module import errors. Familiarize yourself with commands like `pip install` and how to set the correct environment variables.\n\n4. **Documentation**: Regularly consult the documentation for Docker, Terraform, and any libraries you are using (like SQLAlchemy and psycopg2).\n\n5. **Ask Questions**: If you encounter problems, don’t hesitate to ask for help, whether in forums or during office hours.\n\n6. **Co

In [254]:
print(response.output[0].content[0].text)

To do well in Module 1, consider the following tips:

1. **Understand Key Concepts**: Make sure you grasp the fundamental concepts covered in Docker and Terraform. These are crucial for building and deploying applications.

2. **Hands-on Practice**: Apply what you learn by working on practical exercises. Set up Docker containers and try deploying small applications.

3. **Troubleshooting Skills**: Be prepared to troubleshoot common issues, such as module import errors. Familiarize yourself with commands like `pip install` and how to set the correct environment variables.

4. **Documentation**: Regularly consult the documentation for Docker, Terraform, and any libraries you are using (like SQLAlchemy and psycopg2).

5. **Ask Questions**: If you encounter problems, don’t hesitate to ask for help, whether in forums or during office hours.

6. **Collaborate with Peers**: Study with classmates or form study groups to discuss challenging topics.

Focusing on these areas should help enhance y

### MULTIPLE QUERIES

In [277]:
question = "How do I do well in module 1?"

developer_prompt = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.
If you look up something in FAQ, convert the student question into multiple queries.
""".strip()

tools = [search_tool]

chat_messages = [
    {"role": "developer", "content": developer_prompt},
    {"role": "user", "content": question}
]

response = client.responses.create(
    model='gpt-4o-mini',
    input=chat_messages,
    tools=tools
)
calls = response.output

In [278]:
calls

[ResponseFunctionToolCall(arguments='{"query":"how to do well in module 1"}', call_id='call_3qKOoJl5kX8toUsjwyQ86sae', name='search', type='function_call', id='fc_6872c2a08244819585e382935f84ac1806141933ab59d154', status='completed'),
 ResponseFunctionToolCall(arguments='{"query":"module 1 tips for success"}', call_id='call_0ngPhd6G3y2uguZbFwbVg1ey', name='search', type='function_call', id='fc_6872c2a0c5d88195b19b72f069c3f9dc06141933ab59d154', status='completed'),
 ResponseFunctionToolCall(arguments='{"query":"strategies for succeeding in module 1"}', call_id='call_4aY8w21K1qumy7Hypk91JXPw', name='search', type='function_call', id='fc_6872c2a100d08195b580ce435c41161406141933ab59d154', status='completed')]

In [279]:
for call in calls:
    f_name = call.name
    print(f_name)
    arguments = json.loads(call.arguments)
    f = globals()[f_name]
    results = f(**arguments)

    print(call.call_id)
    chat_messages.append(call)
    chat_messages.append({
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(results)
    })

search
call_3qKOoJl5kX8toUsjwyQ86sae
search
call_0ngPhd6G3y2uguZbFwbVg1ey
search
call_4aY8w21K1qumy7Hypk91JXPw


In [281]:
response = client.responses.create(
    model='gpt-4o-mini',
    input=chat_messages,
    tools=tools
)
response.output

[ResponseOutputMessage(id='msg_6872c2aa80648195be915b9a3c4d117206141933ab59d154', content=[ResponseOutputText(annotations=[], text="To do well in Module 1, here are some strategies and tips you can follow:\n\n1. **Understand the Core Concepts**: Focus on the foundational topics in Docker and Terraform, as they are crucial for the practical exercises and projects.\n\n2. **Hands-on Practice**: Use Docker and Terraform in real-world scenarios. Set up small projects to reinforce what you learn.\n\n3. **Follow Instructions Carefully**: Ensure you follow all installation and configuration steps precisely, as errors can easily occur with misconfiguration, especially in environments like Docker.\n\n4. **Debugging Skills**: Familiarize yourself with common issues, such as `ModuleNotFoundError`, and how to troubleshoot them. For example:\n   - For Python module errors, ensure you have the required packages installed (e.g., `pip install psycopg2-binary` for PostgreSQL).\n  \n5. **Utilize Resource

In [282]:
print(response.output[0].content[0].text)

To do well in Module 1, here are some strategies and tips you can follow:

1. **Understand the Core Concepts**: Focus on the foundational topics in Docker and Terraform, as they are crucial for the practical exercises and projects.

2. **Hands-on Practice**: Use Docker and Terraform in real-world scenarios. Set up small projects to reinforce what you learn.

3. **Follow Instructions Carefully**: Ensure you follow all installation and configuration steps precisely, as errors can easily occur with misconfiguration, especially in environments like Docker.

4. **Debugging Skills**: Familiarize yourself with common issues, such as `ModuleNotFoundError`, and how to troubleshoot them. For example:
   - For Python module errors, ensure you have the required packages installed (e.g., `pip install psycopg2-binary` for PostgreSQL).
  
5. **Utilize Resources**: Don't hesitate to reach out for help or refer to resources provided in the course materials if you get stuck.

6. **Active Participation**

## REFACTORING

In [24]:
def do_call(tool_call_response):
    function_name = tool_call_response.name
    arguments = json.loads(tool_call_response.arguments)

    f = globals()[function_name]
    result = f(**arguments)

    return {
        "type": "function_call_output",
        "call_id": tool_call_response.call_id,
        "output": json.dumps(result, indent=2),
    }
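
`globals()[function_name]` lets the model name any global callable in the notebook. One alternative, sketched here under the assumption that an explicit allow-list is acceptable, is a small registry dictionary. `TOOL_REGISTRY`, `do_call_safe`, and the stub `search` are illustrative names. `SimpleNamespace` stands in for the SDK's `ResponseFunctionToolCall` object so the snippet runs offline.

```python
import json
from types import SimpleNamespace


def search(query):
    # Stub standing in for the notebook's search function.
    return [{"question": "stub", "text": f"results for {query}"}]


# Only functions listed here can be invoked by the model.
TOOL_REGISTRY = {"search": search}


def do_call_safe(tool_call):
    """Run a requested tool call and wrap the result as a function_call_output."""
    func = TOOL_REGISTRY.get(tool_call.name)
    if func is None:
        output = {"error": f"unknown tool: {tool_call.name}"}
    else:
        try:
            output = func(**json.loads(tool_call.arguments))
        except (TypeError, json.JSONDecodeError) as exc:
            # Report bad arguments back to the model instead of crashing.
            output = {"error": str(exc)}

    return {
        "type": "function_call_output",
        "call_id": tool_call.call_id,
        "output": json.dumps(output),
    }


call = SimpleNamespace(name="search", arguments='{"query": "module 1 tips"}', call_id="call_demo")
print(do_call_safe(call)["output"])

bad = SimpleNamespace(name="delete_everything", arguments="{}", call_id="call_bad")
print(do_call_safe(bad)["output"])
```

Returning the error as the tool output gives the model a chance to correct its call on the next request, instead of stopping the notebook with an exception.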

In [26]:
import json

In [27]:
for call in calls:
    result = do_call(call)
    chat_messages.append(call)
    chat_messages.append(result)

In [28]:
for entry in response.output:
    chat_messages.append(entry)
    print(entry.type)

    if entry.type == 'function_call':      
        result = do_call(entry)
        chat_messages.append(result)
    elif entry.type == 'message':
        print(entry.content[0].text)  # message text lives in content[0].text

function_call


In [29]:
developer_prompt = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.

Use FAQ if your own knowledge is not sufficient to answer the question.
When using FAQ, perform deep topic exploration: make one request to FAQ,
and then based on the results, make more requests.

At the end of each response, ask the user a follow up question based on your answer.
""".strip()

chat_messages = [
    {"role": "developer", "content": developer_prompt},
]

In [287]:
while True: # main Q&A loop
    question = input() # How do I do my best for module 1?
    if question == 'stop':
        break

    message = {"role": "user", "content": question}
    chat_messages.append(message)

    while True: # request-response loop - query API till get a message
        response = client.responses.create(
            model='gpt-4o-mini',
            input=chat_messages,
            tools=tools
        )

        has_messages = False
        
        for entry in response.output:
            chat_messages.append(entry)
        
            if entry.type == 'function_call':      
                print('function_call:', entry)
                print()
                result = do_call(entry)
                chat_messages.append(result)
            elif entry.type == 'message':
                print(entry.content[0].text)
                print()
                has_messages = True

        if has_messages:
            break

 how do I well in moodule 1?


function_call: ResponseFunctionToolCall(arguments='{"query":"Module 1 success tips"}', call_id='call_HAVf9RhgvpJ2CGI7Z2Ktvh02', name='search', type='function_call', id='fc_6872c8a46eec8194832052cb55613912087999fb2d7710e7', status='completed')

function_call: ResponseFunctionToolCall(arguments='{"query":"tips for excelling in Module 1"}', call_id='call_pOzrvCITlQ6WS4ialszAljXO', name='search', type='function_call', id='fc_6872c8a514b081948553c97c1045335a087999fb2d7710e7', status='completed')

To excel in Module 1 of the Data Engineering Zoomcamp course, here are a few tips:

1. **Install Required Packages**: Make sure you have all the necessary packages installed. For working with PostgreSQL, you might need `psycopg2-binary`. You can install it using:
   ```bash
   pip install psycopg2-binary
   ```

2. **Check for Errors**: If you encounter a `ModuleNotFoundError`, ensure you're installing the correct version of the libraries. Check to see if there are any conflicts with your installat

 Docker and terraform


function_call: ResponseFunctionToolCall(arguments='{"query":"Docker and Terraform Module 1 tips"}', call_id='call_6A1Wn0tVC5XrbJ6RKNSAK9SS', name='search', type='function_call', id='fc_6872c8c00a848194ad642d10f3e76c49087999fb2d7710e7', status='completed')

Here are some tips to excel in the Docker and Terraform section of Module 1:

1. **Navigate to the Working Directory**: Ensure you are running Terraform commands from the correct working directory that contains your Terraform configuration files. This can prevent errors related to initialization:
   ```bash
   cd /path/to/your/terraform/configs
   terraform init
   ```

2. **Check Permissions**: If you receive permission errors (like `storage.buckets.create` error), make sure you're using the correct Project ID from the GCP console. Permissions must be granted explicitly for the service account you are using.

3. **Resource and State Management**: Familiarize yourself with Terraform state files if you’re creating cloud resources. Thi

 stop
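
The inner `while True` loop keeps calling the API until a message arrives, so a model that keeps requesting tools would loop indefinitely. Below is a sketch of the same loop with a step limit. `run_turn`, `FakeClient`, `fake_do_call`, and `max_steps` are hypothetical. The fake client imitates only the `responses.create(...).output` shape used above, so the example runs without network access.

```python
from types import SimpleNamespace


def run_turn(client, chat_messages, tools, do_call, max_steps=5):
    """Call the model until it returns a message or max_steps is reached."""
    texts = []
    for _ in range(max_steps):
        response = client.responses.create(
            model="gpt-4o-mini", input=chat_messages, tools=tools
        )
        for entry in response.output:
            chat_messages.append(entry)
            if entry.type == "function_call":
                chat_messages.append(do_call(entry))
            elif entry.type == "message":
                texts.append(entry.content[0].text)
        if texts:
            return texts
    raise RuntimeError(f"no message after {max_steps} steps")


class FakeClient:
    """Hypothetical stand-in: first reply is a tool call, second is a message."""

    def __init__(self):
        self.calls = 0
        self.responses = self  # so client.responses.create(...) works

    def create(self, model, input, tools):
        self.calls += 1
        if self.calls == 1:
            call = SimpleNamespace(
                type="function_call", name="search",
                arguments='{"query": "module 1"}', call_id="call_1",
            )
            return SimpleNamespace(output=[call])
        text = SimpleNamespace(text="Practice Docker and Terraform.")
        return SimpleNamespace(output=[SimpleNamespace(type="message", content=[text])])


def fake_do_call(entry):
    # Stand-in for do_call: returns an empty result list for any tool call.
    return {"type": "function_call_output", "call_id": entry.call_id, "output": "[]"}


messages = [{"role": "user", "content": "How do I do well in module 1?"}]
client = FakeClient()
print(run_turn(client, messages, tools=[], do_call=fake_do_call))
print(len(messages))  # user message + tool call + tool output + assistant message
```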


In [1]:
!wget https://raw.githubusercontent.com/alexeygrigorev/rag-agents-workshop/refs/heads/main/chat_assistant.py

--2025-07-12 21:00:16--  https://raw.githubusercontent.com/alexeygrigorev/rag-agents-workshop/refs/heads/main/chat_assistant.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3485 (3.4K) [text/plain]
Saving to: ‘chat_assistant.py’


2025-07-12 21:00:17 (20.0 MB/s) - ‘chat_assistant.py’ saved [3485/3485]



In [3]:
!pip install markdown

Collecting markdown
  Downloading markdown-3.8.2-py3-none-any.whl.metadata (5.1 kB)
Downloading markdown-3.8.2-py3-none-any.whl (106 kB)
Installing collected packages: markdown
Successfully installed markdown-3.8.2


In [32]:
import chat_assistant

tools = chat_assistant.Tools()
tools.add_tool(search, search_tool)

tools.get_tools()

developer_prompt = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.

Use FAQ if your own knowledge is not sufficient to answer the question.

At the end of each response, ask the user a follow up question based on your answer.
""".strip()

chat_interface = chat_assistant.ChatInterface()

chat = chat_assistant.ChatAssistant(
    tools=tools,
    developer_prompt=developer_prompt,
    chat_interface=chat_interface,
    client=client
)

In [33]:
chat.run()

You: How do I do well in module 1?


You: Docker and Terraform


You: stop


Chat ended.


In [37]:
def add_entry(question, answer):
    doc = {
        'question': question,
        'text': answer,
        'section': 'user added',
        'course': 'data-engineering-zoomcamp'
    }
    index.append(doc)

In [38]:
add_entry_description = {
    "type": "function",
    "name": "add_entry",
    "description": "Add an entry to the FAQ database",
    "parameters": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The question to be added to the FAQ database",
            },
            "answer": {
                "type": "string",
                "description": "The answer to the question",
            }
        },
        "required": ["question", "answer"],
        "additionalProperties": False
    }
}

In [39]:
import chat_assistant

tools = chat_assistant.Tools()
tools.add_tool(search, search_tool)

In [41]:
tools.add_tool(add_entry, add_entry_description)

In [42]:
tools.get_tools()

[{'type': 'function',
  'name': 'search',
  'description': 'Search the FAQ database',
  'parameters': {'type': 'object',
   'properties': {'query': {'type': 'string',
     'description': 'Search query text to look up in the course FAQ.'}},
   'required': ['query'],
   'additionalProperties': False}},
 {'type': 'function',
  'name': 'add_entry',
  'description': 'Add an entry to the FAQ database',
  'parameters': {'type': 'object',
   'properties': {'question': {'type': 'string',
     'description': 'The question to be added to the FAQ database'},
    'answer': {'type': 'string', 'description': 'The answer to the question'}},
   'required': ['question', 'answer'],
   'additionalProperties': False}}]

In [43]:
developer_prompt = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.

Use FAQ if your own knowledge is not sufficient to answer the question.

At the end of each response, ask the user a follow up question based on your answer.
""".strip()

chat_interface = chat_assistant.ChatInterface()

chat = chat_assistant.ChatAssistant(
    tools=tools,
    developer_prompt=developer_prompt,
    chat_interface=chat_interface,
    client=client
)

In [44]:
chat.run()

You: How do I do well in module 1


You: add this to the FAQ database


You: stop


Chat ended.


In [45]:
index.docs[-1]

{'question': 'How do I do well in Module 1?',
 'text': "1. Understand the Basics: Make sure you have a solid understanding of Docker and Terraform concepts. Familiarize yourself with containerization, orchestration, and infrastructure as code.\n\n2. Hands-On Practice: Set up your own Docker environment. Run through the Docker commands and try to create containers and networks on your own.\n\n3. Follow the Readme: Adhere to the instructions provided in the course materials carefully, including any setup necessary for Docker and Terraform.\n\n4. Troubleshooting: Be prepared to troubleshoot common errors. For instance, if you're encountering issues with SQLAlchemy, ensure that you have installed necessary Python modules like psycopg2.\n\n5. Engage with the Community: If you're stuck, don't hesitate to ask questions either in forums or in your peer group. Engaging with others can provide new insights.\n\n6. Complete Assignments: Make sure to complete all assignments and review feedback to 