# Working with Ollama and Elasticsearch

We will reuse the notebook from module one, with `Ollama` serving the LLM locally and `elasticsearch` handling search.

In [1]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

In [2]:
from elasticsearch import Elasticsearch

In [3]:
es_client = Elasticsearch('http://localhost:9200') 

In [4]:
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"} 
        }
    }
}

index_name = "course-questions"

es_client.indices.create(index=index_name, body=index_settings)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'course-questions'})
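
Re-running the cell above against an index that already exists fails with a `resource_already_exists_exception`. A minimal sketch of a guard, assuming the client exposes `indices.exists` and `indices.create` as used above; `ensure_index` and the fake client classes are hypothetical names introduced only so the example runs without a live server:

```python
def ensure_index(es_client, index_name, index_settings):
    """Create the index only if it does not exist yet; return True if created."""
    if es_client.indices.exists(index=index_name):
        return False
    es_client.indices.create(index=index_name, body=index_settings)
    return True


# Stand-in client (hypothetical, for illustration only) so the sketch runs
# without a live Elasticsearch server.
class _FakeIndices:
    def __init__(self):
        self.created = {}

    def exists(self, index):
        return index in self.created

    def create(self, index, body):
        self.created[index] = body


class _FakeClient:
    def __init__(self):
        self.indices = _FakeIndices()


fake_client = _FakeClient()
first = ensure_index(fake_client, "course-questions", {"settings": {}})
second = ensure_index(fake_client, "course-questions", {"settings": {}})
print(first, second)
```

With the real client, the call would be `ensure_index(es_client, index_name, index_settings)`.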

In [5]:
import requests 

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)
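
The loop above flattens the nested per-course structure into one list and tags every entry with its course name. The same step can be tried offline on a tiny inline sample; the sample data and the `flatten_documents` name are made up for illustration, and only the field names follow the code above:

```python
# Hypothetical sample with the same shape the loop above expects.
sample_raw = [
    {"course": "course-a", "documents": [
        {"text": "Yes.", "section": "General", "question": "Can I join late?"},
        {"text": "Use pip.", "section": "Setup", "question": "How do I install?"},
    ]},
    {"course": "course-b", "documents": [
        {"text": "See the README.", "section": "General", "question": "Where to start?"},
    ]},
]


def flatten_documents(raw):
    """Return a flat list of FAQ entries, each tagged with its course name."""
    flat = []
    for course in raw:
        for doc in course["documents"]:
            # Build a new dict so the input structure is not mutated.
            flat.append({**doc, "course": course["course"]})
    return flat


flat_docs = flatten_documents(sample_raw)
print(len(flat_docs), flat_docs[0]["course"])
```

Unlike the cell above, this variant copies each entry instead of modifying it in place, which keeps the raw data reusable.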

In [6]:
from tqdm.auto import tqdm

In [7]:
for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

100%|██████████| 948/948 [00:06<00:00, 154.97it/s]
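
Indexing one document per request, as above, is fine for 948 entries. For larger collections, the `elasticsearch` package also ships a bulk helper. A minimal sketch, assuming `helpers.bulk` from that package; `generate_actions`, `bulk_index`, and the sample document are hypothetical and not part of the original notebook:

```python
def generate_actions(docs, index_name):
    """Yield one bulk action per document."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


def bulk_index(es_client, docs, index_name):
    """Send all documents through the bulk helper; returns (success_count, errors)."""
    # Imported lazily so generate_actions can be tried without the package installed.
    from elasticsearch import helpers
    return helpers.bulk(es_client, generate_actions(docs, index_name))


# Hypothetical sample document, only to show the shape of an action.
sample_docs = [
    {"text": "Yes.", "section": "General", "question": "Can I join late?", "course": "course-a"},
]
actions = list(generate_actions(sample_docs, "course-questions"))
print(actions[0]["_index"])
```

Against a running server, `bulk_index(es_client, documents, index_name)` would take the place of the loop.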


In [8]:
def elastic_search(query):
    search_query = {
        "size": 5,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^3", "text", "section"],
                        "type": "best_fields"
                    }
                },
                "filter": {
                    "term": {
                        "course": "data-engineering-zoomcamp"
                    }
                }
            }
        }
    }

    response = es_client.search(index=index_name, body=search_query)
    
    result_docs = []
    
    for hit in response['hits']['hits']:
        result_docs.append(hit['_source'])
    
    return result_docs
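
`elastic_search` hard-codes both the course filter and the number of hits. One possible variation, sketched under the assumption that the query shape stays the same, moves them into parameters; `build_search_query` is a hypothetical helper, not part of the original notebook:

```python
def build_search_query(query, course="data-engineering-zoomcamp", size=5):
    """Build the search body used above, with the course and hit count as parameters."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match; the question field is boosted three times.
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^3", "text", "section"],
                        "type": "best_fields",
                    }
                },
                # Exact match on the keyword field restricts hits to one course.
                "filter": {"term": {"course": course}},
            }
        },
    }


example_query = build_search_query("How can I run docker?", course="course-a", size=3)
print(example_query["query"]["bool"]["filter"])
```

The resulting dictionary can be passed to `es_client.search(index=index_name, body=...)` exactly as in the cell above.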

In [9]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

def llm(prompt):
    response = client.chat.completions.create(
        model='phi3',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content
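
To see what the model actually receives, it helps to render a prompt without calling Elasticsearch or Ollama. The sketch below is a compact variant of `build_prompt` that joins entries instead of concatenating and stripping; `render_prompt` and the sample results are hypothetical:

```python
PROMPT_TEMPLATE = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def render_prompt(query, search_results):
    """Format each FAQ entry and place the entries into the template."""
    entries = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    # Entries are separated by a blank line, as in build_prompt above.
    context = "\n\n".join(entries)
    return PROMPT_TEMPLATE.format(question=query, context=context)


# Hypothetical search results, only to illustrate the rendered prompt.
sample_results = [
    {"section": "General", "question": "Can I join late?", "text": "Yes."},
    {"section": "Setup", "question": "How do I get Docker?", "text": "Install Docker Desktop."},
]
sample_prompt = render_prompt("Can I still join?", sample_results)
print(sample_prompt)
```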

In [10]:
def rag(query):
    search_results = elastic_search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer
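
`rag` chains three steps: search, prompt construction, and the LLM call. Passing those steps in as arguments makes the flow easy to try without any running services. This is a sketch under that assumption; `rag_pipeline` and the `fake_*` functions are hypothetical stand-ins:

```python
def rag_pipeline(query, search_fn, prompt_fn, llm_fn):
    """Run the same three steps as rag(), with each step supplied by the caller."""
    search_results = search_fn(query)
    prompt = prompt_fn(query, search_results)
    return llm_fn(prompt)


# Stand-ins that replace Elasticsearch and the model for a dry run.
def fake_search(query):
    return [{"section": "General", "question": "Can I join late?", "text": "Yes."}]


def fake_prompt(query, results):
    return f"{query} | {len(results)} docs"


def fake_llm(prompt):
    return f"ANSWER({prompt})"


answer = rag_pipeline("Can I still join?", fake_search, fake_prompt, fake_llm)
print(answer)
```

With the notebook's own functions, the equivalent call would be `rag_pipeline(query, elastic_search, build_prompt, llm)`.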

In [11]:
query = 'I just discovered the course. Can I still join it?'
rag(query)

" Based on the CONTEXT provided, you are still eligible to join the course even if you discover it after the start date has passed. You can submit assignments and projects as per their respective deadlines without prior registration.\n\nThe FAQ section also mentions that materials from completed courses will be kept for those interested in learning at a slower pace, following up on homework or preparing for future cohorts; this indicates flexibility beyond the course's start date. However, it is important to keep deadlines in mind and not rely solely on self-paced study without any formal registration.\n\nAdditionally, you can join Google Classroom if needed, as stated elsewhere in the FAQ documentation (though details about setting up a classroom account are missing here). The process of joining or registering for classes via such platforms typically involves creating an account and accessing course materials provided by instructors. "

In [12]:
print(_)

 Based on the CONTEXT provided, you are still eligible to join the course even if you discover it after the start date has passed. You can submit assignments and projects as per their respective deadlines without prior registration.

The FAQ section also mentions that materials from completed courses will be kept for those interested in learning at a slower pace, following up on homework or preparing for future cohorts; this indicates flexibility beyond the course's start date. However, it is important to keep deadlines in mind and not rely solely on self-paced study without any formal registration.

Additionally, you can join Google Classroom if needed, as stated elsewhere in the FAQ documentation (though details about setting up a classroom account are missing here). The process of joining or registering for classes via such platforms typically involves creating an account and accessing course materials provided by instructors. 


In [16]:
query = 'How can I run docker?'
rag(query)

" To run docker, first ensure that Docker is installed on your machine by following their official installation guide at https://docs.docker.com/install/. Once Docker is installed: open up terminal and start it using command `docker -h` for help about this program's use. \n\nRemember to check the FAQ or ask if you encounter any issues regarding installing dependencies, running commands inside docker container etc as that information can be obtained in workshop/faq section of a specific course page on edX website (https://www.edx.org/) under Docker technology modules for example `COURSE_ID-module4` where 'COURSE_ID' is the identifier or title of your actual chosen Course from EdX, like Data Analytics with Dbt: https://www.edx.org/course/data-analytics-with-dbt/. \n\nIt’s important to note that although you can learn Docker and related technologies such as Parquet files in a course setting, the answer here is limited by your FAQ database context provided above which does not directly pro

In [17]:
print(_)

 To run docker, first ensure that Docker is installed on your machine by following their official installation guide at https://docs.docker.com/install/. Once Docker is installed: open up terminal and start it using command `docker -h` for help about this program's use. 

Remember to check the FAQ or ask if you encounter any issues regarding installing dependencies, running commands inside docker container etc as that information can be obtained in workshop/faq section of a specific course page on edX website (https://www.edx.org/) under Docker technology modules for example `COURSE_ID-module4` where 'COURSE_ID' is the identifier or title of your actual chosen Course from EdX, like Data Analytics with Dbt: https://www.edx.org/course/data-analytics-with-dbt/. 

It’s important to note that although you can learn Docker and related technologies such as Parquet files in a course setting, the answer here is limited by your FAQ database context provided above which does not directly provide 