# Homework 1 - LLM ZOOMCAMP - Rui Pinto

## Q1. Running Elastic
Run Elasticsearch 8.12.2 and get the cluster information. If you run it on localhost, this is how you do it:

```bash
curl localhost:9200
```

What's the version.build_hash value?

In [3]:
!curl localhost:9200

{
  "name" : "5f4d660d29c7",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "aIMufzLsTtuIhWNxqgFr3A",
  "version" : {
    "number" : "8.12.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "48a287ab9497e852de30327444b0809e55d46466",
    "build_date" : "2024-02-19T10:04:32.774273190Z",
    "build_snapshot" : false,
    "lucene_version" : "9.9.2",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
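
The `build_hash` can also be read programmatically rather than by eye. A minimal sketch, assuming the cluster response has been captured as a JSON string; `cluster_info_json` is a name introduced here for illustration and holds an abbreviated copy of the output above:

```python
import json

# Abbreviated copy of the cluster response shown above (assumed captured as text)
cluster_info_json = """
{
  "name": "5f4d660d29c7",
  "cluster_name": "docker-cluster",
  "version": {
    "number": "8.12.2",
    "build_hash": "48a287ab9497e852de30327444b0809e55d46466"
  },
  "tagline": "You Know, for Search"
}
"""

# Parse the JSON and walk down to version.build_hash
cluster_info = json.loads(cluster_info_json)
build_hash = cluster_info["version"]["build_hash"]
print(build_hash)
```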


### Getting the data

In [4]:
import requests 

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)
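
The flattening step above can be checked on a tiny input before running it on the real file. This is a sketch under stated assumptions: `flatten_documents` and `sample_raw` are illustrative names, and the sample only mimics the nesting of `documents.json` (a list of courses, each with a `documents` list).

```python
def flatten_documents(documents_raw):
    """Copy each course's name onto its documents and return one flat list."""
    flat = []
    for course in documents_raw:
        for doc in course["documents"]:
            # Build a new dict so the raw input is left unmodified
            flat.append({**doc, "course": course["course"]})
    return flat


# Hypothetical two-course sample in the same shape as documents.json
sample_raw = [
    {
        "course": "course-a",
        "documents": [{"text": "t1", "section": "s1", "question": "q1"}],
    },
    {
        "course": "course-b",
        "documents": [
            {"text": "t2", "section": "s2", "question": "q2"},
            {"text": "t3", "section": "s3", "question": "q3"},
        ],
    },
]

flat_docs = flatten_documents(sample_raw)
print(len(flat_docs))
print(flat_docs[0]["course"], flat_docs[2]["course"])
```

Unlike the loop above, this variant does not mutate the input records, which makes it safe to rerun in a notebook.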

## Q2. Indexing the data

Index the data the same way as shown in the course videos.
Make the `course` field a keyword; the rest should be text.

Which function do you use for adding your data to Elasticsearch?

- insert
- index ✅
- put
- add

In [5]:
from elasticsearch import Elasticsearch

# Create the Elasticsearch client with connection settings
es_client_hw01 = Elasticsearch(
    'http://localhost:9200',  # Default Elasticsearch endpoint
    request_timeout=30,       # Increased timeout for slower operations
    verify_certs=False,       # Disable SSL verification for local development
    api_key=None,             # No authentication for our local instance
    basic_auth=None,          # No basic auth credentials
    ca_certs=None             # No CA certificates for SSL verification
)

# Note: For production environments, you would enable security features
# and proper certificate validation

In [7]:
# Test the Elasticsearch connection
# - This will verify our client configuration is correct
# - It returns information about the Elasticsearch server
try:
    info = es_client_hw01.info()
    print("✅ Successfully connected to Elasticsearch")
    print(f"Elasticsearch version: {info['version']['number']}")
    print(f"Cluster name: {info['cluster_name']}")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("Check that Elasticsearch is running and client compatibility settings are correct")

✅ Successfully connected to Elasticsearch
Elasticsearch version: 8.12.2
Cluster name: docker-cluster


In [9]:
# Create Elasticsearch index with proper error handling
try:
    # Define the index schema (mappings and settings)
    index_settings = {
        "settings": {
            "number_of_shards": 1,       # Use 1 shard for simplicity (dev environment)
            "number_of_replicas": 0      # No replicas needed for local development
        },
        "mappings": {
            "properties": {
                # Field mappings determine how Elasticsearch indexes each field:
                "text": {"type": "text"},        # Full-text search for answer content
                "section": {"type": "text"},     # Full-text search for section names
                "question": {"type": "text"},    # Full-text search for questions
                "course": {"type": "keyword"}    # Exact match filtering for course names
            }
        }
    }

    # Index name for our course questions
    index_name = "course-questions-hw01"
    
    # Check if the index already exists to avoid duplicate creation
    if not es_client_hw01.indices.exists(index=index_name):
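        # Pass settings and mappings as named parameters instead of the older body= form
        es_client_hw01.indices.create(
            index=index_name,
            settings=index_settings["settings"],
            mappings=index_settings["mappings"],
        )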
        print(f"✅ Created new Elasticsearch index: {index_name}")
    else:
        print(f"ℹ️ Index {index_name} already exists")
except Exception as e:
    print(f"❌ Error creating Elasticsearch index: {e}")
    print("Check that Elasticsearch is running before continuing")

✅ Created new Elasticsearch index: course-questions-hw01


In [None]:
from tqdm.auto import tqdm

# Index documents in Elasticsearch with progress bar
try:
    print("🔍 Indexing documents into Elasticsearch...")
    
    # Use tqdm for a nice progress bar to track indexing
    for doc in tqdm(documents):
        # The index() method:
        # - Adds each document to the specified index
        # - Automatically generates an ID if not provided
        # - Sets document field values according to our mapping
        # - This is the standard way to add documents to Elasticsearch
        # - The function specifically answers Q2 in the homework
        es_client_hw01.index(index=index_name, document=doc)
    
    print(f"✅ Successfully indexed {len(documents)} documents")
    
except Exception as e:
    print(f"❌ Error indexing documents: {e}")
    print("Check the Elasticsearch connection and retry")

🔍 Indexing documents into Elasticsearch...


✅ Successfully indexed 948 documents
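
Indexing one document per request works for 948 documents, but each call is a separate round trip. A hedged sketch of an alternative: build bulk actions in the shape accepted by the `bulk` helper from `elasticsearch.helpers`. The name `make_bulk_actions` and the sample documents are illustrative, and the final call is left commented out because it needs a running cluster.

```python
def make_bulk_actions(docs, index_name):
    """Yield one bulk action per document, targeting the given index."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


# Hypothetical sample documents in the same shape as the course FAQ records
sample_docs = [
    {"text": "t1", "section": "s1", "question": "q1", "course": "course-a"},
    {"text": "t2", "section": "s2", "question": "q2", "course": "course-a"},
]

actions = list(make_bulk_actions(sample_docs, "course-questions-hw01"))
print(len(actions))
print(actions[0]["_index"])

# With a live client, the actions could then be sent in batches:
# from elasticsearch.helpers import bulk
# bulk(es_client_hw01, make_bulk_actions(documents, index_name))
```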


## Q3. Searching
Now let's search in our index.

We will execute a query "How do execute a command on a Kubernetes pod?".

Use only question and text fields and give question a boost of 4, and use "type": "best_fields".

What's the score for the top ranking result?

- 84.50
- 64.50
- 44.50 ✅
- 24.50

Look at the _score field.

In [11]:
query = 'How do execute a command on a Kubernetes pod?'

In [None]:
def elastic_search(query):
    try:
        # Construct an Elasticsearch query with:
        # - Multi-match for searching across multiple fields with different weights
        # - Using only question and text fields with question having a boost of 4
        # - "best_fields" type optimizes for documents where the terms appear in the same field
        search_query = {
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["question^4", "text"],  # ^4 syntax applies 4x weight to matches in question field
                    "type": "best_fields"  # This type is important for Q3's scoring result
                }
            }
        }

        # Execute search and get results
        response = es_client_hw01.search(index=index_name, query=search_query["query"])
        
        # Print the score for the top result
        if response['hits']['hits']:
            print(f"Top result score: {response['hits']['hits'][0]['_score']}")
        
        # Return the full response to examine all details
        return response
    except Exception as e:
        print(f"❌ Error searching with Elasticsearch: {e}")
        return None

In [14]:
elastic_response = elastic_search(query)

# What's the score for the top ranking result?

if elastic_response and 'hits' in elastic_response:
    top_hit = elastic_response['hits']['hits'][0] if elastic_response['hits']['hits'] else None
    if top_hit:
        print(f"Top hit score: {top_hit['_score']}")
        #print(f"Top hit source: {top_hit['_source']}")
    else:
        print("No hits found")
else:
    print("No valid response from Elasticsearch search")

Top result score: 44.50556
Top hit score: 44.50556
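
To see why the boost matters, here is a toy model of how `best_fields` combines per-field scores: each field's score is multiplied by its boost, the highest boosted score wins, and the remaining fields contribute only through `tie_breaker` (0.0 by default). The function name and the per-field scores below are made up for illustration; the real per-field values come from BM25 inside Elasticsearch.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way a best_fields (dis_max style) query does."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    # Other matching fields only count when tie_breaker is non-zero
    return best + tie_breaker * (sum(boosted) - best)


boosts = {"question": 4.0, "text": 1.0}

# Made-up raw scores: the text field matches better before boosting
made_up_scores = {"question": 5.0, "text": 12.0}

default_score = best_fields_score(made_up_scores, boosts)
tie_score = best_fields_score(made_up_scores, boosts, tie_breaker=0.5)
print(default_score)  # 20.0: the boosted question field (5.0 * 4) beats text (12.0)
print(tie_score)      # 26.0: 20.0 + 0.5 * 12.0
```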



## Q4. Filtering
Now ask a different question: "How do copy a file to a Docker container?".

This time we are only interested in questions from machine-learning-zoomcamp.

Return 3 results. What's the 3rd question returned by the search engine?

- How do I debug a docker container?
- How do I copy files from a different folder into docker container’s working directory? ✅
- How do Lambda container images work?
- How can I annotate a graph?

In [15]:
query = 'How do copy a file to a Docker container?'

In [None]:
def elastic_search_plus(query, n_results, course_name):
    try:
        # Construct an Elasticsearch query with:
        # - Multi-match for searching across multiple fields with different weights
        # - Using only question and text fields with question having a boost of 4
        search_query = {
            "size": n_results,
            "query": {
                "bool": {
                    # Multi-match searches across multiple fields
                    "must": {
                        "multi_match": {
                            "query": query,
                            "fields": ["question^4", "text"],
                            "type": "best_fields"
                        }
                    },
                    # Filter to only include specific course documents
                    # Filters don't affect relevance scoring, they just limit the result set
                    "filter": {
                        "term": {
                            "course": course_name
                        }
                    }
                }
            }
        }

        # Execute search and get results
        response = es_client_hw01.search(
            index=index_name,
            query=search_query["query"],
            size=search_query["size"],
        )
        
        # Print the score for the top result
        if response['hits']['hits']:
            print(f"Top result score: {response['hits']['hits'][0]['_score']}")
        
        # Return the full response to examine all details
        return response
    except Exception as e:
        print(f"❌ Error searching with Elasticsearch: {e}")
        return None

In [27]:
elastic_response_2 = elastic_search_plus(query, n_results=3, course_name="machine-learning-zoomcamp")

# The third result is the one we are looking for
if elastic_response_2 and 'hits' in elastic_response_2:
    top_hits = elastic_response_2['hits']['hits']
    if len(top_hits) >= 3:
        third_hit = top_hits[2]  # Get the third result
        print(f"Third hit score: {third_hit['_score']}")
        print(f"Third hit source: {third_hit['_source']['question']}")
    else:
        print("Less than 3 hits found")
else:
    print("No valid response from Elasticsearch search")

Top result score: 73.38676
Third hit score: 59.812744
Third hit source: How do I copy files from a different folder into docker container’s working directory?



In [60]:
answer_q4 = elastic_response_2['hits']['hits'][2]

print(f"Answer to Q4: {answer_q4}")

Answer to Q4: {'_index': 'course-questions-hw01', '_id': 'E4JNmJcBC-rpw4cz6gjh', '_score': 59.812744, '_source': {'text': 'You can copy files from your local machine into a Docker container using the docker cp command. Here\'s how to do it:\nIn the Dockerfile, you can provide the folder containing the files that you want to copy over. The basic syntax is as follows:\nCOPY ["src/predict.py", "models/xgb_model.bin", "./"]\t\t\t\t\t\t\t\t\t\t\tGopakumar Gopinathan', 'section': '5. Deploying Machine Learning Models', 'question': 'How do I copy files from a different folder into docker container’s working directory?', 'course': 'machine-learning-zoomcamp'}}
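
Printing a whole hit is hard to scan. A small sketch of a helper that lists every hit's rank, score, and question instead; `ranked_questions` and `sample_response` are hypothetical names, and the sample only mirrors the `hits.hits[]._source` nesting seen in the output above.

```python
def ranked_questions(response):
    """Return (rank, score, question) tuples for every hit, best first."""
    hits = response["hits"]["hits"]
    return [
        (rank, hit["_score"], hit["_source"]["question"])
        for rank, hit in enumerate(hits, start=1)
    ]


# Hypothetical response with the same nesting as an Elasticsearch search result
sample_response = {
    "hits": {
        "hits": [
            {"_score": 3.0, "_source": {"question": "first question"}},
            {"_score": 2.0, "_source": {"question": "second question"}},
            {"_score": 1.0, "_source": {"question": "third question"}},
        ]
    }
}

ranking = ranked_questions(sample_response)
for rank, score, question in ranking:
    print(rank, score, question)
```

Applied to `elastic_response_2`, the third tuple would carry the question asked for in Q4.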


## Q5. Building a prompt
Now we're ready to build a prompt to send to an LLM.

Take the records returned from Elasticsearch in Q4 and use this template to build the context. Separate context entries by two linebreaks (\n\n)

```python
context_template = """
Q: {question}
A: {text}
""".strip()
```

Now use the context you just created along with the "How do copy a file to a Docker container?" question to construct a prompt using the prompt template (reproduced as `prompt_template` in the `build_prompt` function below).

What's the length of the resulting prompt? (use the len function)

- 946
- 1446 ✅ (closest one)
- 1946
- 2446

In [61]:
import pprint
pprint.pprint(answer_q4)

{'_id': 'E4JNmJcBC-rpw4cz6gjh',
 '_index': 'course-questions-hw01',
 '_score': 59.812744,
 '_source': {'course': 'machine-learning-zoomcamp',
             'question': 'How do I copy files from a different folder into '
                         'docker container’s working directory?',
             'section': '5. Deploying Machine Learning Models',
             'text': 'You can copy files from your local machine into a Docker '
                     "container using the docker cp command. Here's how to do "
                     'it:\n'
                     'In the Dockerfile, you can provide the folder containing '
                     'the files that you want to copy over. The basic syntax '
                     'is as follows:\n'
                     'COPY ["src/predict.py", "models/xgb_model.bin", '
                     '"./"]\t\t\t\t\t\t\t\t\t\t\tGopakumar Gopinathan'}}


In [108]:
def build_prompt(query, search_results):
    """
    Build a prompt for the LLM based on the query and search results.
    
    Args:
        query (str): The user's question.
        search_results (list): List of search result dictionaries from Elasticsearch.
        
    Returns:
        str: The formatted prompt for the LLM.
    """
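    # Template for context entries
    # Note: lines after the first keep this function's 4-space indentation,
    # because .strip() only trims the ends of the string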
    context_template = """
    Q: {question}
    A: {text}
    """.strip()

    # Build context from search results (this leaves a trailing "\n\n" after the last entry)
    context_text = ""
    for hit in search_results['hits']['hits']:
        source = hit["_source"]
        context_entry = context_template.format(
            question=source["question"],
            text=source["text"]
        )
        context_text += context_entry + "\n\n"

    # Template for overall prompt
    prompt_template = """
    You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
    Use only the facts from the CONTEXT when answering the QUESTION.

    QUESTION: {question}

    CONTEXT:
    {context}
    """.strip()

    # Create final prompt
    final_prompt = prompt_template.format(
        question=query,
        context=context_text
    )
    
    return final_prompt

In [109]:
final_prompt = build_prompt(query, elastic_response_2)

# Print the length of the resulting prompt
print(f"Length of the prompt: {len(final_prompt)}")

# Display the prompt
print("\nFinal Prompt:")
print(final_prompt)

Length of the prompt: 1478

Final Prompt:
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
    Use only the facts from the CONTEXT when answering the QUESTION.

    QUESTION: How do I copy a file to a Docker container?

    CONTEXT:
    Q: How do I debug a docker container?
    A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
    A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
dock
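
The reported length (1478) is a little above the closest option (1446). One likely contributor is that both templates are defined inside an indented function body: `.strip()` only trims the ends of the string, so every interior line keeps its four leading spaces, as the output above shows. A small sketch of the effect, using hypothetical records rather than the real search results:

```python
def indented_template():
    # Written inside a function body, so the second line carries 4 leading spaces
    return """
    Q: {question}
    A: {text}
    """.strip()


# The same template defined without indentation
FLAT_TEMPLATE = "Q: {question}\nA: {text}"

# Hypothetical records, not the real search results
records = [
    {"question": "q1", "text": "a1"},
    {"question": "q2", "text": "a2"},
]

indented_context = "\n\n".join(indented_template().format(**r) for r in records)
flat_context = "\n\n".join(FLAT_TEMPLATE.format(**r) for r in records)

# Each record contributes 4 extra characters in the indented version
extra_chars = len(indented_context) - len(flat_context)
print(extra_chars)
```

Defining the templates at module level (or passing them through `textwrap.dedent`) and joining entries with `"\n\n".join(...)` would avoid both the extra indentation and the trailing line breaks.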

## Q6. Tokens
When we use the OpenAI Platform, we're charged by the number of tokens we send in our prompt and receive in the response.

The OpenAI python package uses tiktoken for tokenization:

```bash
    uv pip install tiktoken
```

Let's calculate the number of tokens in our query:

`encoding = tiktoken.encoding_for_model("gpt-4o")`

Use the `encode` function. How many tokens does our prompt have?

- 120
- 220
- 320 ✅ (closest one)
- 420

Note: to decode a single token back into its bytes, you can use the `decode_single_token_bytes` function.

In [113]:
import tiktoken

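# "gpt-4o-mini" resolves to the same encoding (o200k_base) as the "gpt-4o" model named in the question
encoding = tiktoken.encoding_for_model("gpt-4o-mini")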

def count_tokens(prompt):
    """
    Count the number of tokens in a given prompt using the specified encoding.

    Args:
        prompt (str): The text prompt to count tokens for.

    Returns:
        int: The number of tokens in the prompt.
    """
    return len(encoding.encode(prompt))

token_count = count_tokens(final_prompt)
print(f"Number of tokens in the prompt: {token_count}")

Number of tokens in the prompt: 329


In [114]:
# decode the prompt to see how it looks
decoded_prompt = encoding.decode(encoding.encode(final_prompt))
print("\nDecoded Prompt:")
print(decoded_prompt)


Decoded Prompt:
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
    Use only the facts from the CONTEXT when answering the QUESTION.

    QUESTION: How do I copy a file to a Docker container?

    CONTEXT:
    Q: How do I debug a docker container?
    A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
    A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/file

## Bonus: generating the answer (ungraded)

In [112]:
# Import libraries for OpenAI integration
import os
from openai import OpenAI
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from .env file (contains API keys)
env_path = Path('../') / '.env'

if env_path.exists():
    load_dotenv(dotenv_path=env_path)
else:
    print("⚠️ Warning: .env file not found, make sure to set OPENAI_API_KEY manually")

# Access the API key
api_key = os.getenv('OPENAI_API_KEY')

client = OpenAI(api_key=api_key)

In [122]:
# Enhanced LLM function for generating responses
def llm(prompt):
    """
    Generate a response using OpenAI's LLM based on the provided prompt.
    
    Args:
        prompt: The full prompt including query and context
        
    Returns:
        string: Generated answer from the LLM
    """
    response = client.chat.completions.create(
        model='gpt-4o-mini',  # Small, low-cost OpenAI model
        messages=[{"role": "user", "content": prompt}],
        temperature=0  # Temperature 0 for more deterministic responses
    )
    
    return response.choices[0].message.content

In [None]:
# Complete RAG pipeline implementation
def rag(query):
    """
    Implements the full Retrieval-Augmented Generation pipeline.
    
    The pipeline consists of three main steps:
    1. RETRIEVAL: Find relevant documents with an Elasticsearch full-text query
    2. AUGMENTATION: Build a prompt that includes retrieved context
    3. GENERATION: Generate an answer using the LLM with context

    In the RETRIEVAL step, elastic_search runs a multi_match keyword query
    against the Elasticsearch index and returns the top-ranked hits
    (Elasticsearch's default of 10, with no course filter and no vector search).

    In the AUGMENTATION step, build_prompt combines the user's query with the
    retrieved documents into a single context-rich prompt.

    In the GENERATION step, the LLM answers the query from that prompt.

    Args:
        query: User's natural language question

    Returns:
        str: Generated answer based on retrieved context
    """
    # Step 1: RETRIEVAL - Get relevant documents from Elasticsearch
    search_results = elastic_search(query)
    
    # Step 2: AUGMENTATION - Build prompt with retrieved context
    prompt = build_prompt(query, search_results)
    
    # Step 3: GENERATION - Generate answer using LLM
    answer = llm(prompt)
    
    return answer

In [124]:
rag_query = "How do I copy a file to a Docker container?"

rag_answer = rag(rag_query)
print(f"\nRAG Answer: {rag_answer}")

Top result score: 84.050095

RAG Answer: To copy a file to a Docker container, you can use the `docker cp` command. The basic syntax is as follows:

```
docker cp /path/to/local/file_or_directory container_id:/path/in/container
```

This command allows you to copy a file or directory from your local machine into a running Docker container.
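
The three RAG steps can also be wired together with the search, prompt-building, and LLM functions passed in as arguments, which makes the flow testable without Elasticsearch or an API key. This is a sketch, not the notebook's implementation: `rag_pipeline` and the `fake_*` stubs are hypothetical stand-ins.

```python
def rag_pipeline(query, search, build_prompt, llm):
    """Run retrieval, augmentation, and generation with injected callables."""
    results = search(query)
    if not results:
        # Guard against an empty or failed search instead of building an empty prompt
        return "No relevant context found."
    prompt = build_prompt(query, results)
    return llm(prompt)


# Stub implementations standing in for Elasticsearch and the OpenAI client
def fake_search(query):
    return [{"question": "How do I copy a file?", "text": "Use docker cp."}]


def fake_build_prompt(query, results):
    context = "\n\n".join(f"Q: {r['question']}\nA: {r['text']}" for r in results)
    return f"QUESTION: {query}\n\nCONTEXT:\n{context}"


def fake_llm(prompt):
    return f"(stub answer based on {len(prompt)} prompt characters)"


stub_answer = rag_pipeline("How do I copy a file?", fake_search, fake_build_prompt, fake_llm)
empty_answer = rag_pipeline("anything", lambda q: [], fake_build_prompt, fake_llm)
print(stub_answer)
print(empty_answer)
```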


## Bonus: calculating the costs (ungraded)
Suppose that on average per request we send 150 tokens and receive back 250 tokens.

How much will it cost to run 1000 requests?

You can see the prices here

- [openai_pricing](https://platform.openai.com/docs/pricing)

- Input: $0.15 / 1M tokens
- Output: $0.60 / 1M tokens

You can redo the calculations with the values you got in Q6 and Q7.

In [126]:
# How much does it cost to run this?
# gpt-4o-mini input tokens cost $0.15 per 1M tokens ($0.00015 per 1K tokens)
# Estimate the cost from the prompt and response token counts
def estimate_cost(prompt, response, model='gpt-4o-mini'):
    """
    Estimate the cost of running the LLM based on prompt and response length.

    Args:
        prompt (str): The input prompt sent to the LLM.
        response (str): The generated response from the LLM.
        model (str): The model used for generation.

    Returns:
        float: Estimated cost in USD.
    """
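    # Input-token price for gpt-4o-mini. Note: this function applies it to the
    # response tokens as well, so it underestimates the output cost ($0.60 / 1M)
    token_cost_per_1k = 0.00015  # $0.00015 per 1,000 tokens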

    # Count tokens in prompt and response
    total_tokens = count_tokens(prompt) + count_tokens(response)

    # Calculate cost
    cost = (total_tokens / 1000) * token_cost_per_1k
    return cost

# Estimate the cost for the RAG query
rag_cost = estimate_cost(final_prompt, rag_answer)
print(f"Estimated cost for RAG query: ${rag_cost:.6f}")

Estimated cost for RAG query: $0.000203
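
The bonus question itself (150 input tokens and 250 output tokens per request, 1000 requests) can be worked out with separate input and output prices. A minimal sketch using the two prices listed above; the constant and function names are illustrative:

```python
# Prices listed above, in USD per 1M tokens
INPUT_PRICE_PER_1M = 0.15
OUTPUT_PRICE_PER_1M = 0.60


def request_cost(input_tokens, output_tokens):
    """Cost in USD of one request, pricing input and output tokens separately."""
    return (
        input_tokens * INPUT_PRICE_PER_1M + output_tokens * OUTPUT_PRICE_PER_1M
    ) / 1_000_000


# 1000 requests, each sending 150 tokens and receiving 250 tokens
total_cost = 1000 * request_cost(150, 250)
print(f"Estimated cost for 1000 requests: ${total_cost:.4f}")
```

At these prices the total comes to roughly $0.17, with most of it coming from output tokens because they are priced four times higher than input tokens.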


# Summary and Review of Homework 1

In this homework assignment, we implemented a complete Retrieval-Augmented Generation (RAG) pipeline using Elasticsearch and OpenAI LLMs. Here's what we accomplished:

## Environment Setup
1. We set up and verified an Elasticsearch 8.12.2 instance running locally
2. We created a Python environment with necessary libraries (elasticsearch-py, tiktoken, openai)

## Data Preparation and Indexing
1. We fetched JSON data containing questions and answers from DataTalksClub's GitHub repository
2. We processed the data to include course information in each document
3. We configured an Elasticsearch index with proper mappings:
   - Text fields (question, text, section) for full-text searching
   - Keyword field (course) for exact filtering

## Search Functionality
1. We implemented basic search using Elasticsearch's multi_match query
2. We applied field boosting to prioritize matches in question fields (boost=4)
3. We implemented filtered search to target specific courses
4. We inspected the `_score` relevance values that Elasticsearch returns for each hit

## RAG Implementation Components
1. **Retrieval**: We used Elasticsearch to find relevant documents based on user queries
2. **Augmentation**: We built a prompt template that incorporated retrieved context
3. **Generation**: We connected to OpenAI's API to generate responses based on context

## Analysis and Optimization
1. We calculated token counts in our prompts using tiktoken
2. We estimated costs based on token usage
3. We saw that the amount of retrieved context determines the prompt's token count, and therefore its cost

This assignment demonstrated how to build a practical RAG system that grounds LLM responses in domain-specific knowledge from a custom knowledge base.