# RAG using OpenAI GPT-4o as the LLM

# Import

In [21]:
import json
import minsearch
from openai import OpenAI

# Implementation

## No context

In [2]:
client = OpenAI()

In [4]:
q = 'the course has already started, can I still enroll?'

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": q}
    ]
)

In [5]:
response.choices[0].message.content
# Without retrieved context, the model gives a generic answer that does not draw on the course FAQ

"The ability to enroll in a course after it has already started often depends on the specific policies of the institution offering the course. Here are a few steps you can take to find out:\n\n1. **Check the Enrollment Policy:** Visit the course or institution's website to review their enrollment policies. Some schools or online platforms allow late enrollments within a certain time frame.\n\n2. **Contact the Instructor or Administration:** Reach out directly to the course instructor or the administration office to inquire about the possibility of late enrollment. They may make exceptions based on individual circumstances.\n\n3. **Look for Add/Drop Deadlines:** Many educational institutions have specific deadlines for adding or dropping courses. Verify if you're still within this period.\n\n4. **Consider Catching Up:** If you're allowed to enroll late, be prepared to catch up on any missed coursework. Ask the instructor for any available resources that can help you get up to speed.\n\n

## With context

In [23]:
with open('faq_documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

# Merge the FAQs from all 3 courses into one flat list, tagging each document with its course
documents = []

for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)
# keyword_fields allow exact-match filtering at search time, similar to SQL:
# SELECT * WHERE course = 'data-engineering-zoomcamp';
index.fit(documents)

<minsearch.Index at 0x78f370af61a0>
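
The flattening loop above can be tried without the real file. The following is a minimal sketch that assumes the JSON is a list of `{course, documents}` objects, as the loop implies. The data in `docs_raw_example` is hypothetical, and unlike the loop above this variant builds new dicts instead of mutating the loaded ones.

```python
# Hypothetical stand-in for faq_documents.json; the real file's contents are not shown here.
docs_raw_example = [
    {
        "course": "course-a",
        "documents": [
            {"section": "General", "question": "Q1?", "text": "A1."},
            {"section": "Setup", "question": "Q2?", "text": "A2."},
        ],
    },
    {
        "course": "course-b",
        "documents": [
            {"section": "General", "question": "Q3?", "text": "A3."},
        ],
    },
]

# Copy each document and attach the name of the course it belongs to,
# so the flat list can later be filtered by course.
flat_documents = [
    {**doc, "course": course_dict["course"]}
    for course_dict in docs_raw_example
    for doc in course_dict["documents"]
]

print(len(flat_documents))
print(flat_documents[0])
```

Each flattened document carries the `section`, `question`, `text`, and `course` keys that the index configuration refers to.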

In [24]:
def search(query):
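    # Weight matches in 'question' more heavily and matches in 'section' less;
    # fields not listed here keep their default weight
    boost = {'question': 3.0, 'section': 0.5}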

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results
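
To build intuition for what `filter_dict` and `boost_dict` do, here is a toy scorer in plain Python. It is not how minsearch computes scores internally. It only counts overlapping words per field, multiplies by the field's boost, and drops documents that fail the keyword filter. The function `toy_search` and the sample documents are hypothetical.

```python
import re


def tokenize(text):
    # Lowercase and split on non-word characters
    return set(re.findall(r"\w+", text.lower()))


def toy_search(docs, query, filter_dict, boost_dict, num_results=5):
    query_terms = tokenize(query)
    scored = []
    for doc in docs:
        # Keyword filter: every filter field must match exactly
        if any(doc.get(key) != value for key, value in filter_dict.items()):
            continue
        score = 0.0
        for field in ("question", "text", "section"):
            overlap = len(query_terms & tokenize(doc.get(field, "")))
            # Fields without an explicit boost get weight 1.0 in this toy
            score += boost_dict.get(field, 1.0) * overlap
        if score > 0:
            scored.append((score, doc))
    # Highest score first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


toy_docs = [
    {"course": "c1", "section": "General", "question": "Where are the videos?",
     "text": "You can enroll late and watch recordings."},
    {"course": "c1", "section": "General", "question": "Can I enroll late?",
     "text": "Yes, you can."},
    {"course": "c2", "section": "General", "question": "Can I enroll late?",
     "text": "Ask the instructor."},
]

toy_results = toy_search(
    toy_docs,
    query="enroll late",
    filter_dict={"course": "c1"},
    boost_dict={"question": 3.0, "section": 0.5},
)
for doc in toy_results:
    print(doc["question"])
```

With the question field boosted, the document whose question matches the query ranks above the one that only matches in its answer text, and the `c2` document never appears because of the filter.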

In [25]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
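
The prompt can be inspected without calling the index or the API. The sketch below uses the same template but builds the context with `str.join` instead of repeated concatenation. The helper `format_context` and the documents in `example_results` are hypothetical, used here only for illustration.

```python
PROMPT_TEMPLATE = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def format_context(search_results):
    # One block per retrieved document, separated by a blank line
    return "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )


# Hypothetical search results with the same keys the index returns
example_results = [
    {"section": "General", "question": "Can I join late?", "text": "Hypothetical answer one."},
    {"section": "Setup", "question": "What do I install?", "text": "Hypothetical answer two."},
]

example_prompt = PROMPT_TEMPLATE.format(
    question="Can I still enroll?",
    context=format_context(example_results),
)
print(example_prompt)
```

Printing the prompt before sending it is a cheap way to check that the retrieved documents are relevant to the question.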

In [26]:
def llm(prompt):
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

In [27]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer
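
The three stages (retrieve, build the prompt, generate) can also be wired together with the stage functions passed in as arguments, which makes it possible to exercise the flow without an index or an API key. This is only a sketch of one possible design: `rag_pipeline` and the `fake_*` stubs are hypothetical and not part of the notebook.

```python
def rag_pipeline(query, search_fn, prompt_fn, llm_fn):
    # Same three steps as rag(), with each stage supplied by the caller
    search_results = search_fn(query)
    prompt = prompt_fn(query, search_results)
    return llm_fn(prompt)


def fake_search(query):
    # Stub retrieval: always returns one hypothetical document
    return [{"section": "General", "question": "Hypothetical question?", "text": "Hypothetical answer."}]


def fake_prompt(query, search_results):
    # Stub prompt builder: question plus the retrieved answer texts
    context = "\n".join(doc["text"] for doc in search_results)
    return f"QUESTION: {query}\nCONTEXT: {context}"


def fake_llm(prompt):
    # Stub model: echoes the prompt so the wiring can be inspected
    return "STUB: " + prompt


stub_answer = rag_pipeline("test query", fake_search, fake_prompt, fake_llm)
print(stub_answer)
```

With the notebook's own functions, the equivalent call would be `rag_pipeline(query, search, build_prompt, llm)`.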

In [30]:
print(rag("How to run Mage?"))

The provided CONTEXT does not give a direct response to the specific question of "How to run Mage." However, based on the FAQ information given, here are some essential steps and considerations that could be relevant for setting up and running Mage:

1. **Running Mage with Docker:**
   - When configuring Mage with Docker and facing issues like containers exiting instantly with code 132, it could be due to "older architecture." This might be a hardware issue. If encountering this, you may consider using a Virtual Machine (VM) as an alternative solution.

2. **Authentication Issues:**
   - Ensure proper Google Cloud authentication if you encounter issues such as `OSError`. You need a valid OAuth2 access token to authenticate the request. You can learn more about Google Cloud authentication at [this link](https://cloud.google.com/docs/authentication).

3. **Data Storage and PySpark:**
   - If you are planning to trigger Dataproc from Mage, follow these steps:
     1. Ensure your PySpark s

In [31]:
print(rag("the course has already started, can I still enroll?"))

Yes, even if the course has already started, you can still enroll. You will be eligible to submit the homework, but be aware that there will be deadlines for turning in the final projects, so it's advisable not to leave everything until the last minute.
