In [59]:
#!pip install minsearch

In [3]:
import json

In [6]:
import minsearch

In [7]:
with open('documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

In [8]:
documents = []

for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)

In [9]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}
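
The flattening loop above assumes documents.json is a list of course entries, each with a 'course' name and a 'documents' list. A minimal sketch of the same transformation on made-up data (the sample entries and the names sample_raw and flat are hypothetical), written as a comprehension that leaves the input untouched:

```python
# Hypothetical sample mirroring the assumed shape of documents.json.
sample_raw = [
    {
        "course": "course-a",
        "documents": [
            {"text": "t1", "section": "s1", "question": "q1"},
            {"text": "t2", "section": "s1", "question": "q2"},
        ],
    },
    {
        "course": "course-b",
        "documents": [{"text": "t3", "section": "s2", "question": "q3"}],
    },
]

# Copy each document and tag the copy with its course name.
flat = [
    {**doc, "course": entry["course"]}
    for entry in sample_raw
    for doc in entry["documents"]
]

print(len(flat))
print(flat[0])
```

Unlike the loop in the notebook, this version does not add a 'course' key to the dictionaries inside sample_raw, which can matter if the raw data is reused later.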

In [10]:
index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

In [11]:
q = 'the course has already started, can I still enroll?'

In [12]:
index.fit(documents)

<minsearch.minsearch.Index at 0x189d982f650>

In [10]:
#from openai import OpenAI

In [11]:
#client = OpenAI()

In [1]:
from groq import Groq

In [2]:
client = Groq()  # reads the API key from the GROQ_API_KEY environment variable

In [17]:
response = client.chat.completions.create(
    model='qwen/qwen3-32b',
    messages=[{"role": "user", "content": q}]
)

response.choices[0].message.content

'<think>\nOkay, the user is asking, "the course has already started, can I still enroll?" Let me break this down.\n\nFirst, I need to consider the different types of courses and the policies they might have. For example, online platforms like Coursera, Udemy, or LinkedIn Learning often have flexible enrollment periods. They might allow students to join at any time, with access to all materials from the start date. But then there are in-person classes, which might have strict deadlines and require attending physical sessions. If the course is in-person and has already started, enrollment might be restricted unless there are remaining seats.\n\nThen there\'s the institutional aspect. Universities and traditional schools usually have enrollment periods. If it\'s a semester-based course, the deadline could have passed, and enrollment might be closed. However, some schools might let you add the course during the add-drop period. The user would need to check with the registrar or academic ad

In [33]:
# `prompt` is the RAG prompt for `q`, built from the search results (see build_prompt below).
response = client.chat.completions.create(
    model='qwen/qwen3-32b',
    messages=[{"role": "user", "content": prompt}]
)

response.choices[0].message.content

"<think>\nOkay, let me tackle this. The user is asking if they can enroll after the course has started. Looking at the CONTEXT, there's a section that answers exactly that. The first part says yes, you can still join after the start date, and you can submit homeworks even if you didn't register. But there's a note about deadlines for the final projects. The user might be worried about missing something by enrolling late.\n\nI need to make sure I use only the info from the CONTEXT. The answer should confirm that enrollment is possible post start date, mention submitting homeworks, and caution about the final project deadlines. Also, maybe mention the calendar and Slack/Telegram channels for staying updated. Let me check if those are in the CONTEXT. Yes, the answer mentions the Google Calendar, Telegram, and Slack. But the user's question is specifically about enrolling after the start. So the key points are eligibility to submit homeworks and the deadlines. The other sections talk about

In [37]:
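def search(query):
    # Weight matches in the question field most and in the section field least.
    boost = {'question': 3.0, 'section': 0.5}

    # Restrict results to one course and return the top 3 matches.
    results = index.search(
        query=query,
        filter_dict={"course": "data-engineering-zoomcamp"},
        num_results=3,
        boost_dict=boost
    )

    return results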
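
The boost_dict passed to index.search scales how much each text field contributes to a document's score. The toy scorer below illustrates that idea only: it uses plain word overlap rather than minsearch's actual scoring, and the names field_score, boosted_score, doc_a and doc_b are made up for this sketch.

```python
def field_score(query, text):
    """Fraction of query words found in text (a crude stand-in for a real similarity score)."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def boosted_score(query, doc, boost):
    """Sum the per-field scores, each multiplied by its boost (assumed default 1.0)."""
    return sum(
        boost.get(field, 1.0) * field_score(query, doc[field])
        for field in ("question", "text", "section")
    )


toy_boost = {"question": 3.0, "section": 0.5}
toy_query = "how to run kafka"

# Same words matched, but in different fields.
doc_a = {"question": "how to run kafka", "text": "see module 6", "section": "streaming"}
doc_b = {"question": "course start date", "text": "how to run kafka locally", "section": "general"}

print(boosted_score(toy_query, doc_a, toy_boost))
print(boosted_score(toy_query, doc_b, toy_boost))
```

With these weights, a match in the question field counts three times as much as the same match in the text field, so doc_a ranks above doc_b.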

In [54]:
def build_prompt(query, search_results):
    # The template is kept flush-left so the prompt contains no stray indentation.
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

    # Render each retrieved FAQ entry as a section/question/answer block.
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()

    return prompt
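
To see what the CONTEXT part of the prompt looks like, here is a small sketch that applies the same per-document formatting as build_prompt to two made-up FAQ entries. The helper name format_context and the sample entries are hypothetical.

```python
def format_context(search_results):
    """Render FAQ entries as section/question/answer blocks (hypothetical helper)."""
    blocks = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    # A blank line separates consecutive entries.
    return "\n\n".join(blocks)


# Made-up entries with the same keys as the FAQ documents.
sample_results = [
    {"section": "General", "question": "Can I join late?", "text": "Yes."},
    {"section": "Module 6", "question": "Which Kafka client?", "text": "Use kafka-python-ng."},
]

print(format_context(sample_results))
```

Joining the blocks with "\n\n" gives the same layout as the loop in build_prompt after its final .strip(), without a trailing separator to clean up.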

In [49]:
def llm(prompt):
    response = client.chat.completions.create(
        model='qwen/qwen3-32b',
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content
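
The responses shown in this notebook begin with a `<think>` block containing the model's reasoning. A minimal sketch for dropping that block before showing the answer, assuming the reasoning is always closed by a matching `</think>` tag (the helper name strip_think and the sample string are made up; the API may offer its own way to control this, which the sketch does not rely on):

```python
import re

# Matches a complete <think>...</think> block, including the newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text):
    """Return text with every complete <think>...</think> block removed."""
    return THINK_BLOCK.sub("", text).strip()


# Made-up response in the same shape as the outputs in this notebook.
raw = "<think>\nThe CONTEXT mentions Module 6.\n</think>\n\nInstall the requirements first."
print(strip_think(raw))
```

A response whose `<think>` block is never closed (for example, a truncated one) is returned unchanged apart from surrounding whitespace, so the caller can still see that something went wrong.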

In [57]:
query = 'how to run kafka?'

def rag(query):
    # Retrieve relevant FAQ entries, build a prompt from them, then ask the LLM.
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [58]:
rag(query)

'<think>\nOkay, the user is asking "how to run kafka". Let me check the provided CONTEXT for relevant information. \n\nFirst, in Module 6, there\'s a section about creating a virtual environment. The steps include creating a virtual env with python -m venv env, activating it, and installing requirements from requirements.txt. That seems important for setting up the Python environment needed to run Kafka-related scripts.\n\nAnother part in Module 6 mentions running Java Kafka components like producer or consumer. The command given is java -cp with specific jar files. However, the user didn\'t specify Java or Python, but since the question is general, maybe both are applicable.\n\nThere\'s also a note about using kafka-python-ng instead of the standard kafka-python library due to issues. So, installing via pip install kafka-python-ng is suggested. \n\nThe user might be referring to running Kafka producers or consumers. The answer should cover both Python and Java methods if possible. But