In [1]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-08-24 19:29:32--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: ‘minsearch.py’


2024-08-24 19:29:32 (27.3 MB/s) - ‘minsearch.py’ saved [3832/3832]



In [2]:
import requests
import minsearch

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

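# documents_raw is a list of courses, each holding its own list of FAQ entries.
# Flatten it into one list and tag every entry with its course name so the
# index can filter on the 'course' keyword field.
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)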

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x7ac0ba8aa060>

In [3]:
def search(query):
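    # Weight matches in 'question' more heavily and matches in 'section' less;
    # 'text' is not listed, so it keeps its default weight.
    boost = {'question': 3.0, 'section': 0.5}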

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results
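
The boost dictionary in the cell above can be pictured with a toy calculation. This is a minimal sketch, not minsearch's own code: it assumes that a boost acts as a multiplier on each field's similarity score and that fields missing from the dictionary default to 1.0. The helper name `combine_field_scores` and the score values are hypothetical.

```python
# Toy illustration only: assumes boosts multiply per-field similarity scores
# and that fields missing from the boost dict default to 1.0.
def combine_field_scores(field_scores, boost, default_boost=1.0):
    """Return the boost-weighted sum of per-field scores for one document."""
    return sum(
        score * boost.get(field, default_boost)
        for field, score in field_scores.items()
    )


# Hypothetical per-field scores for a single document.
field_scores = {'question': 0.25, 'text': 0.5, 'section': 0.5}
boost = {'question': 3.0, 'section': 0.5}

total = combine_field_scores(field_scores, boost)
print(total)
```

Under these assumptions a modest match in `question` can outweigh a stronger match in `section`, which is the effect the boost values are meant to produce.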

In [4]:
def build_prompt(query, search_results):
    prompt_template = """
    You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
    Use only the facts from the CONTEXT when answering the QUESTION.
    
    QUESTION: {question}
    
    CONTEXT: 
    {context}
    """.strip()

    # Render each retrieved FAQ entry as a section/question/answer block,
    # with a blank line between entries.
    context = "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

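def llm(prompt):
    # `client` is the OpenAI-compatible client created in cell In [6];
    # run that cell before calling this function.
    response = client.chat.completions.create(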
        model='phi3',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content
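
To see what the CONTEXT part of the prompt looks like without running a search, the formatting step can be tried on hand-written entries. This is a small sketch: `format_context` is a hypothetical helper that mirrors the formatting in `build_prompt`, and the two sample entries are made up for illustration, not taken from the FAQ data.

```python
def format_context(search_results):
    """Render each entry as a section/question/answer block."""
    return "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )


# Hypothetical entries with the same keys as the indexed documents.
sample_results = [
    {'section': 'General', 'question': 'Can I join late?', 'text': 'Yes.'},
    {'section': 'Setup', 'question': 'Which OS works?', 'text': 'Any.'},
]

context = format_context(sample_results)
print(context)
```

Each entry becomes three labelled lines, and entries are separated by a blank line so the model can tell them apart.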

In [5]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer
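
The three-step flow in `rag` (search, build the prompt, call the model) can be exercised without an index or a running model by passing the steps in as arguments. This is a sketch under that assumption: `rag_with` and the stand-in functions below are hypothetical and exist only to show the order of the calls.

```python
def rag_with(query, search_fn, prompt_fn, llm_fn):
    """Run search, prompt building, and the model call in sequence."""
    search_results = search_fn(query)
    prompt = prompt_fn(query, search_results)
    return llm_fn(prompt)


# Hypothetical stand-ins for search, build_prompt, and llm.
def fake_search(query):
    return [{'section': 'General', 'question': 'Can I join late?', 'text': 'Yes.'}]


def fake_prompt(query, search_results):
    return f"QUESTION: {query}\nCONTEXT: {search_results[0]['text']}"


def fake_llm(prompt):
    # Echo the prompt so the result shows what the model would have received.
    return f"[model saw] {prompt}"


answer = rag_with('Can I still join?', fake_search, fake_prompt, fake_llm)
print(answer)
```

Swapping the stand-ins for the real `search`, `build_prompt`, and `llm` gives the same behaviour as `rag`.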
    

In [6]:
from openai import OpenAI

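# Ollama serves an OpenAI-compatible API locally; the client requires an
# api_key value, but Ollama does not use it.
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)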

In [7]:
llm('hi')

'Hello! How can I help you today?'

In [8]:
rag('I just discovered the course. Can I still join?')

"If you just discovered the course now, as per our FAQ responses and policies mentioned herein that were updated until early Jan 2023: You are still eligible to join even after the start date has passed because there will be provisions for late registrations where submitting homework is expected. However, you need to ensure not to wait till last minute as final project submission deadlines do apply post course commencement on 15th Jan 2024 at 17h00 (GMT). In addition, please remember that some registration resources or forums might only work from desktop applications like Google Calendar and Slack. Therefore, ensure to follow necessary steps beforehand including registering via the provided link ahead of course initiation date if not yet done, becoming a Telegram channel member as well joining our DataTalks Club's official Slack Channel which is crucial for staying updated throughout the learning journey that we organize."

In [9]:
rag('What course are you talking about ?')

"The course being discussed is related to data engineering and focuses on cloud computing using Google Cloud platforms (like GCP). Students are expected to have prerequisites such as familiarity with installing packages like Python Anaconda, Terraform, Git etc., having an open Google Cloud account along with a working installation of the Cloud SDK. You can follow this course by subscribing its contents after it finishes and even work on your personal capstone project or continue preparing for future courses using provided materials from DataTalksClub's Slack channel, their online calendar which is active through desktops only as well as Telegram Channel announcements."

In [11]:
llm('write that this is a test')

"This document serves as an automated message to verify the system status. Please disregard it upon receipt, especially when in transit outside of monitoring periods or technical support windows. Remember to reach out with any concerns via direct communication channels provided at your earliest convenience for resolution assistance. For real-time updates and personalized troubleshooting advice, stay tuned through our client portal notifications while actively accessing the system features you need most currently—this message will be automatically filtered from such communications thanks to improved user interaction processes that have been implemented recently in the upgrade cycle.\n\nEnsure your credentials are securely stored and manage login procedures with high confidentiality, recognizing our commitment towards maintaining cyber safety for everyone using this platform. In observance of compliance regulations, consider signing up or confirming any existing privacy policies online b

In [9]:
llm('Tell me about advantages and disadvantages of Toyota Yaris Cross')

"Advantages:\n1. Increased versatility, thanks to the SUV body style; it can drive on roads as well as handle light off-roading situations when equipped with all-terrain tires or a lift kit and lowered suspension (but these accessories are not factory options). It also includes an elevated driving position for enhanced visibility.\n\n2. Safety: The Toyota Yaris Cross was designed to be more secure, featuring advanced driver-assist technology like adaptive cruise control with stop-and-go functionality and lane tracing following, as well as pedestrian detection aids (but not Lane Departure Alert). Being larger than the compact Yaris also provides additional protection for occupants in case of an accident.\n\n3. Innovative design: The vehicle uses Toyota’s Dynamic Force platform with active steering control which adds handling agility and responsiveness, typically associated more with smaller vehicles. This unique approach to combining crossover drivability with SUV-like capabilities sets