## Q1. Running Elastic
Run Elasticsearch 8.17.6 and get the cluster information. If you run it on localhost, this is how you do it:

```bash 
curl localhost:9200
```

What's the `version.build_hash` value?

## A1: 

We run Elasticsearch locally via Docker:


```bash
docker-compose up -d elasticsearch
```

Note that our `docker-compose.yml` file pins the Elasticsearch image to version 8.17.6.

Then we check the Elasticsearch version info via:


```bash
curl http://localhost:9200
```

The response was:

```json
{
  "name" : "4792c1138dd7",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "izSvELyoR5aH57ZVHoNlWQ",
  "version" : {
    "number" : "8.17.6",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "dbcbbbd0bc4924cfeb28929dc05d82d662c527b7",
    "build_date" : "2025-04-30T14:07:12.231372970Z",
    "build_snapshot" : false,
    "lucene_version" : "9.12.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

```


**The `version.build_hash` value is `dbcbbbd0bc4924cfeb28929dc05d82d662c527b7`.**
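
The same lookup can be done programmatically. This is a minimal sketch, assuming the response body has already been parsed into a dictionary; `get_build_hash` is a hypothetical helper name, and the sample is a trimmed copy of the response shown above.

```python
import json

# Trimmed copy of the cluster info returned by `curl http://localhost:9200`.
SAMPLE_RESPONSE = """
{
  "name": "4792c1138dd7",
  "cluster_name": "docker-cluster",
  "version": {
    "number": "8.17.6",
    "build_hash": "dbcbbbd0bc4924cfeb28929dc05d82d662c527b7"
  },
  "tagline": "You Know, for Search"
}
"""


def get_build_hash(cluster_info: dict) -> str:
    """Return version.build_hash from a parsed cluster info response (hypothetical helper)."""
    return cluster_info["version"]["build_hash"]


cluster_info = json.loads(SAMPLE_RESPONSE)
build_hash = get_build_hash(cluster_info)
print(build_hash)
```

Against a live cluster, the parsed body of an HTTP GET to `http://localhost:9200` could replace the sample string.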


## Getting the data
We get the FAQ data for this course as follows. This requires the `requests` library, which can be installed with `pip install requests`.

In [65]:
import requests 

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)
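
To make the flattening step concrete, here is a small sketch that applies the same idea to an in-memory sample shaped like `documents.json`. The sample records and the `flatten_documents` name are made up for illustration, and unlike the loop above this version copies each record instead of modifying it in place.

```python
from collections import Counter


def flatten_documents(documents_raw: list) -> list:
    """Return one flat list of FAQ records, each tagged with its course name."""
    documents = []
    for course in documents_raw:
        for doc in course["documents"]:
            # Copy the record so the nested input structure is left untouched.
            documents.append({**doc, "course": course["course"]})
    return documents


# Hypothetical sample with the same nesting as documents.json.
sample_raw = [
    {
        "course": "course-a",
        "documents": [
            {"text": "t1", "section": "s1", "question": "q1"},
            {"text": "t2", "section": "s1", "question": "q2"},
        ],
    },
    {
        "course": "course-b",
        "documents": [{"text": "t3", "section": "s2", "question": "q3"}],
    },
]

flat = flatten_documents(sample_raw)
per_course = Counter(doc["course"] for doc in flat)
print(len(flat), dict(per_course))
```

Counting records per course like this is a quick sanity check before indexing.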

## Q2. Indexing the data
Index the data in the same way as was shown in the course videos. Make the course field a keyword and the rest should be text.

Don't forget to install the Elasticsearch client for Python:

`pip install elasticsearch` 

Which function do you use for adding your data to Elasticsearch?

- insert
- index
- put
- add

In [66]:
from elasticsearch import Elasticsearch
from tqdm.auto import tqdm

In [67]:
es_client = Elasticsearch(
    "http://localhost:9200",
    headers={"Accept": "application/vnd.elasticsearch+json; compatible-with=8",
             "Content-Type": "application/vnd.elasticsearch+json; compatible-with=8"}
)


In [68]:
# set up index
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"} 
        }
    }
}

index_name = "course-questions-2"

if not es_client.indices.exists(index=index_name):
    es_client.indices.create(index=index_name, body=index_settings)
else:
    print(f"Index '{index_name}' already exists.")

In [69]:
# actually index
for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

100%|██████████| 948/948 [00:01<00:00, 781.79it/s]


## A2: 

We use `index` to add data to Elasticsearch.
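
For larger datasets, one `index` call per document can be slow. Below is a hedged sketch of an alternative that uses the `bulk` helper from `elasticsearch.helpers`. The names `to_bulk_actions` and `bulk_index` are hypothetical, and `bulk_index` is defined but not called here because it needs a running cluster.

```python
from typing import Iterable, Iterator


def to_bulk_actions(documents: Iterable[dict], index_name: str) -> Iterator[dict]:
    """Wrap each document in the action format accepted by the bulk helper."""
    for doc in documents:
        yield {"_index": index_name, "_source": doc}


def bulk_index(es_client, documents: Iterable[dict], index_name: str) -> int:
    """Send all documents through the bulk helper; return the number of successful actions."""
    # Imported here so the action builder above works without the client installed.
    from elasticsearch.helpers import bulk

    success_count, _errors = bulk(es_client, to_bulk_actions(documents, index_name))
    return success_count


# Hypothetical sample record, shaped like the flattened FAQ documents.
sample_docs = [{"text": "t", "section": "s", "question": "q", "course": "c"}]
actions = list(to_bulk_actions(sample_docs, "course-questions-2"))
print(actions[0]["_index"])
```

With a live client, `bulk_index(es_client, documents, index_name)` would stand in for the `tqdm` loop; for 948 documents the difference is small, so the per-document loop is a reasonable choice here.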

## Q3. Searching
Now let's search in our index.

We will execute a query "How do execute a command on a Kubernetes pod?".

Use only question and text fields and give question a boost of 4, and use "type": "best_fields".

What's the score for the top ranking result?

- 84.50
- 64.50
- 44.50
- 24.50

Look at the `_score` field.

In [70]:
query = 'How do execute a command on a Kubernetes pod?'

In [71]:
def elastic_search(query):
    search_query = {
        "size": 5,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^4", "text"],
                        "type": "best_fields"
                    }
                }
            }
        }
    }

    response = es_client.search(index=index_name, body=search_query)
    
    result_docs = []
    
    for hit in response['hits']['hits']:
        result_docs.append(hit)
    
    return result_docs

In [72]:
search_results = elastic_search(query)

In [73]:
search_results[0]['_score']

42.94148

## A3: 

The top-ranking result has a `_score` of `42.94`, which does not match any option exactly; the closest option is `44.50`.
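
The observed `_score` of 42.94148 is not among the listed options, so the answer is the nearest one. A short sketch of that comparison:

```python
observed_score = 42.94148
options = [84.50, 64.50, 44.50, 24.50]

# Pick the option with the smallest absolute distance to the observed score.
closest = min(options, key=lambda option: abs(option - observed_score))
print(closest)
```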

## Q4. Filtering
Now ask a different question: "How do copy a file to a Docker container?".

This time we are only interested in questions from `machine-learning-zoomcamp`.

Return 3 results. What's the 3rd question returned by the search engine?

- How do I debug a docker container?
- How do I copy files from a different folder into docker container’s working directory?
- How do Lambda container images work?
- How can I annotate a graph?

In [74]:
def elastic_search(query):
    search_query = {
        "size": 3,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^4", "text"],
                        "type": "best_fields"
                    }
                },
                "filter": {
                    "term": {
                        "course": "machine-learning-zoomcamp"
                    }
                }
            }
        }
    }

    response = es_client.search(index=index_name, body=search_query)
    
    result_docs = []
    
    for hit in response['hits']['hits']:
        result_docs.append(hit['_source'])
    
    return result_docs

In [75]:
query = "How do copy a file to a Docker container?"
search_results = elastic_search(query)
search_results[2]

{'text': 'You can copy files from your local machine into a Docker container using the docker cp command. Here\'s how to do it:\nIn the Dockerfile, you can provide the folder containing the files that you want to copy over. The basic syntax is as follows:\nCOPY ["src/predict.py", "models/xgb_model.bin", "./"]\t\t\t\t\t\t\t\t\t\t\tGopakumar Gopinathan',
 'section': '5. Deploying Machine Learning Models',
 'question': 'How do I copy files from a different folder into docker container’s working directory?',
 'course': 'machine-learning-zoomcamp'}

## A4:

The 3rd result is `'question': 'How do I copy files from a different folder into docker container’s working directory?'`
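
The Q3 and Q4 searches differ only in the result size and the course filter, so the query body can be built by a single function. This is a sketch only; `build_search_query` and its parameters are hypothetical names, not part of the homework or the Elasticsearch client.

```python
from typing import Optional


def build_search_query(query: str, course: Optional[str] = None, size: int = 5) -> dict:
    """Build the multi_match query body, adding a course filter only when requested."""
    bool_query = {
        "must": {
            "multi_match": {
                "query": query,
                "fields": ["question^4", "text"],
                "type": "best_fields",
            }
        }
    }
    if course is not None:
        # Exact match on the keyword field; a filter does not affect scoring.
        bool_query["filter"] = {"term": {"course": course}}
    return {"size": size, "query": {"bool": bool_query}}


q3_query = build_search_query("How do execute a command on a Kubernetes pod?")
q4_query = build_search_query(
    "How do copy a file to a Docker container?",
    course="machine-learning-zoomcamp",
    size=3,
)
print(q4_query["query"]["bool"]["filter"])
```

The resulting dictionaries have the same shape as the `search_query` bodies defined in the two `elastic_search` functions above.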

## Q5. Building a prompt
Now we're ready to build a prompt to send to an LLM.

Take the records returned from Elasticsearch in Q4 and use this template to build the context. Separate context entries by two linebreaks (`\n\n`).


In [76]:
context_template = """
Q: {question}
A: {text}
""".strip()

Now use the context you just created along with the "How do copy a file to a Docker container?" question to construct a prompt using the template below:

In [77]:
prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

In [78]:
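def build_prompt(query, search_results):
    # Render each record with context_template, followed by two linebreaks.
    context = ""

    for doc in search_results:
        context = context + context_template.format(question=doc['question'], text=doc['text']) + "\n\n"

    # strip() drops the trailing linebreaks left after the last entry.
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt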

In [79]:
prompt = build_prompt(query,search_results)

In [80]:
len(prompt)

1446

In [81]:
print(prompt)

You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: How do copy a file to a Docker container?

CONTEXT:
Q: How do I debug a docker container?
A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/file_or_directory container_id:/path/in/contain

## A5:

The prompt is `1446` characters long.
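
An alternative way to separate entries by two linebreaks is `str.join`, which avoids the trailing separator altogether. The sketch below reuses the two templates from above; `build_prompt_joined` and the sample records are hypothetical. It should produce the same string as `build_prompt` as long as the last answer has no trailing whitespace.

```python
context_template = """
Q: {question}
A: {text}
""".strip()

prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def build_prompt_joined(query: str, search_results: list) -> str:
    """Render each record, then join the entries with two linebreaks."""
    entries = [
        context_template.format(question=doc["question"], text=doc["text"])
        for doc in search_results
    ]
    return prompt_template.format(question=query, context="\n\n".join(entries))


# Hypothetical records standing in for the Elasticsearch results.
sample_results = [
    {"question": "First question?", "text": "First answer."},
    {"question": "Second question?", "text": "Second answer."},
]
sample_prompt = build_prompt_joined(
    "How do copy a file to a Docker container?", sample_results
)
print(len(sample_prompt))
```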

## Q6. Tokens
When we use the OpenAI Platform, we're charged by the number of tokens we send in our prompt and receive in the response.

The OpenAI python package uses tiktoken for tokenization:

```bash
pip install tiktoken
```

Let's calculate the number of tokens in our query:

```python
encoding = tiktoken.encoding_for_model("gpt-4o")
```

Use the encode function. How many tokens does our prompt have?

- 120
- 220
- 320
- 420


In [82]:
import tiktoken

# Get encoding for GPT-4o
encoding = tiktoken.encoding_for_model("gpt-4o")

# Count tokens
num_tokens = len(encoding.encode(prompt))

print(f"Number of tokens: {num_tokens}")


Number of tokens: 320


## A6:

The prompt has `320` tokens.
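
Since billing is per token, the two numbers from Q5 and Q6 can be combined into a rough size and cost estimate. This is only a sketch of the arithmetic: `estimate_prompt_cost` is a hypothetical helper, and the price used below is a placeholder, not an actual OpenAI rate.

```python
prompt_chars = 1446  # len(prompt) from Q5
prompt_tokens = 320  # token count from Q6

# Average number of characters covered by one token in this prompt.
chars_per_token = prompt_chars / prompt_tokens
print(f"{chars_per_token:.2f} characters per token")


def estimate_prompt_cost(num_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost in USD for a token count and a per-million-token price."""
    return num_tokens / 1_000_000 * usd_per_million_tokens


# HYPOTHETICAL price, chosen only to show the arithmetic; check current pricing.
hypothetical_price = 1.00
cost = estimate_prompt_cost(prompt_tokens, hypothetical_price)
print(f"${cost:.6f}")
```

The response tokens are billed as well, so a full estimate would add a second term for the output.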