# Exam (morning): Retrieval Augmented Generation

### Personal Details (please complete)
Double Click on Cell to edit.

<table>
  <tr>
    <td>First Name:</td>
    <td>Lucas</td>
  </tr>
  <tr>
    <td>Last Name:</td>
    <td>Hersche</td>
  </tr>
  <tr>
    <td>Student ID:</td>
    <td>1408236</td>
  </tr>
  <tr>
    <td>Modul:</td>
    <td>Machine Learning 2</td>
  </tr>
  <tr>
    <td>Exam Date / Raum / Zeit:</td>
    <td>20.05.2025 / Raum: SM O2.01  / 10:15 – 11:30</td>
  </tr>
  <tr>
    <td>Erlaubte Hilfsmittel:</td>
    <td>w.3ML2-WIN (Machine Leaning 2)<br>Open Book, Personal Computer, Internet Access</td>
  </tr>
  <tr>
  <td>Not allowed:</td>
  <td>The use of any form of generative AI (e.g., Copilot, ChatGPT) to assist in solving the exercise is not permitted. <br> However, using such tools as part of the exercise itself (e.g., making API calls to them if required by the task) is allowed. <br> Any form of communication or collaboration with other people is not permitted.</td>
</tr>
</table>

## Evaluation Criteria

### <b style="color: gray;">(maximum achievable points: 48)</b>

<table>
  <thead>
    <tr>
      <th>Category</th>
      <th>Description</th>
      <th>Points Distribution</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Code not executable or results not meaningful</td>
      <td>The code contains errors that prevent it from running (e.g., syntax errors) or produces results that do not fit the question.</td>
      <td>0 points</td>
    </tr>
    <tr>
      <td>Code executable, but with serious deficiencies</td>
      <td>The code runs, but the results are incomplete due to major errors (e.g., fundamental errors when reading the data). Only minimal progress is evident.</td>
      <td>25% of the maximum achievable points</td>
    </tr>
    <tr>
      <td>Code executable, but with moderate deficiencies</td>
      <td>The code runs and delivers partially correct results, but there are significant errors (e.g., the data types of the imported data do not meet the requirements of the question). The results are comprehensible but incomplete or inaccurate.</td>
      <td>50% of the maximum achievable points</td>
    </tr>
    <tr>
      <td>Code executable, but with minor deficiencies</td>
      <td>The code runs and delivers a largely correct result, but minor errors (e.g., column name misspelled, timestamp not correctly formatted) affect the completeness of the result.</td>
      <td>75% of the maximum achievable points</td>
    </tr>
    <tr>
      <td>Code executable and correct</td>
      <td>The code runs flawlessly and delivers the correct result without deficiencies.</td>
      <td>100% of the maximum achievable points</td>
    </tr>
  </tbody>
</table>



## Python Libraries und Settings

## <b>Set Up (This part will <u>not</u> be evaluated!)</b>

#### <b>1.) Start a GitHub Codespaces instance based on your fork of this GitHub repository or open the notebook in Colab</b>
#### <b>2.) Add API keys to either .env files for Codespaces or to the secrets for Colab</b>
#### <b>3.) Please execute the two code cells below as soon as the Codespace/Colab has started and install the libraries</b>

In [1]:
!python3 -m pip install --upgrade pip
!pip install PyPDF2
!pip install langchain-community
!pip install faiss-cpu
!pip install groq
!pip install openai
!pip install tqdm
!pip install sentence-transformers
!pip install huggingface_hub[hf_xet]
!pip install faiss-cpu
!pip install google-generativeai

Collecting pip
  Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-25.1.1-py3-none-any.whl (1.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m46.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 25.0.1
    Uninstalling pip-25.0.1:
      Successfully uninstalled pip-25.0.1
Successfully installed pip-25.1.1
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1
Collecting langchain-community
  Downloading langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting langchain-core<1.0.0,>=0.3.59 (from langchain-community)
  Downloading langchain_core-0.3.60-py3-none-any.whl.metadata (5.8 kB)
Collecting langchain<1.0.0,>=0.3.25 (from langchain-community)
  Downloading langchain-0.3

In [2]:
from dotenv import load_dotenv
import os
from openai import OpenAI
import openai
import tqdm
import glob
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer
import faiss
import pickle
import google.generativeai as genai
from groq import Groq




  from .autonotebook import tqdm as notebook_tqdm


In [3]:
load_dotenv()
groq_key = os.getenv("GROQ_API_KEY")
openai.api_key = os.getenv("OPENAI_API_KEY")
google_key = os.getenv("GOOGLE_API_KEY")


## <b>Tasks (This part will be evaluated!)</b>
### Notes on the following tasks:

In this part of the exam, you will build a Retrieval-Augmented Generation (RAG) pipeline that efficiently retrieves medical information from the package inserts of common medications. Imagine you are developing a system for pharmacists or medical professionals to quickly and accurately answer questions about medications. The following five package inserts are provided as your data source:

- [data/Amoxicillin.pdf](data/Amoxicillin.pdf)
- [data/bisoprolol.pdf](data/bisoprolol.pdf)
- [data/citalopram.pdf](data/citalopram.pdf)
- [data/metformin.pdf](data/metformin.pdf)
- [data/paracetamol.pdf](data/paracetamol.pdf)

Your task is to implement a RAG pipeline that retrieves relevant information from these package inserts and integrates it into the answer generation process. Use the provided instructions and your knowledge from the exercises.

### Expected Results:

1. Read in the provided package inserts and extract all text.
2. Split the extracted text into manageable chunks using a text splitter (e.g., `RecursiveCharacterTextSplitter`).
3. Create embeddings for the text chunks using a suitable model.
4. Index the embeddings in a vector store (e.g., FAISS).
5. Develop an appropriate prompt template.
6. Build the RAG chain.
7. Automatically generate a list of 10 test questions using a language model.
8. Let your RAG pipeline answer the 10 generated questions.

### Submission documents:

Your submission should include:
- The completed notebook (this file).
- the vector store

<b style="color:blue;">Notes on the following tasks:</b>
<ul style="color:blue;">
  <li>Pay attention to the specific details provided for each task.</li>
  <li>Solve each task using Python code. Integrate your code into the code cells for each task.</li>
  <li>Present your solution(s) as requested in each task.</li>
</ul>

#### <b>Task (1): Read all 5 PDFs from the 'data' folder and store their content for further use</b>
<b>Task details:</b>
- The files are located in the 'data' folder..
- Display the length of the resulting string (number of characters).
- Show the first 100 characters in the notebook output.
<b style="color: gray;">(max. points: 2)</b>

In [4]:
### load the pdf from the path
glob_path = "data/*.pdf"
text = ""
for pdf_path in tqdm.tqdm(glob.glob(glob_path)):
    with open(pdf_path, "rb") as file:
        reader = PdfReader(file)
         # Extract text from all pages in the PDF
        text += " ".join(page.extract_text() for page in reader.pages if page.extract_text())

100%|██████████| 5/5 [00:02<00:00,  2.30it/s]


In [None]:
# Show the number of characters in the text
print(f"Number of characters in the entire text: ")
print(f"Total Numbers: {len(text)}")

# Show the first 100 characters of the text
print(f"The first 100 characters of the text:")
text[:100]

Number of characters in the entire text: 
Total Numbers: 176575
The first 100 characters of the text:


'Inhaltsverzeichnis\nZusammensetzung\nDarreichungsform und Wirkstoffmenge pro Einheit\nIndikationen/Anwe'

#### <b>Task (2): Split the text into chunks appropriate for the task. Specify an overlap as well. Give a reason for your choice</b>
<b>Task details:</b>
- Use the data from the previous task.
- Show the total number of chunks in the notebook.
- Show the length of the first chunk in the notebook.
- Explain you reasoning
<b style="color: gray;">(max. points: 4)</b>

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create a splitter: 2000 characters per chunk with an overlap of 200 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
# Split the extracted text into manageable chunks
chunks = splitter.split_text(text)

In [8]:
# Show the total number of chunks
print(f"Number of chunks: ")
print(f"Total chunks: {len(chunks)}")

# Show the length of the first chunk
print(f"Length of the first chunk: ")
print(len(chunks[0][:200]))

Number of chunks: 
Total chunks: 98
Length of the first chunk: 
200


##### Explanation (double click and add text):
This cell creates a RecursiveCharacterTextSplitter with chunk_size=2000 and chunk_overlap=200, then applies it to split the large text into smaller, manageable chunks. The overlap helps preserve context across chunk boundaries, which is important for maintaining semantic meaning when the text is later embedded. The cell also prints the total number of chunks created and a preview of the first chunk (first 200 characters). This chunking step is critical for processing large documents that exceed token limits of embedding models.

#### <b>Task (3): Initialize an embedding model</b>
<b>Task details:</b>
- Choose a suitable embedding model from Huggingface.
- [Huggingface models](https://huggingface.co/spaces/mteb/leaderboard).
- Consider the size of the model. It should be runnable in your Codespace.
- Choose a model appropriate for the data.

<b style="color: gray;">(max. points: 2)</b>

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings  # For generating embeddings for text chunks
from langchain.text_splitter import SentenceTransformersTokenTextSplitter

token_splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0, tokens_per_chunk=128, model_name="paraphrase-multilingual-MiniLM-L12-v2")

token_split_texts = []
for text in chunks:
    token_split_texts += token_splitter.split_text(text)

model_name = "paraphrase-multilingual-MiniLM-L12-v2"
model = SentenceTransformer(model_name)
tokenized_chunks = []
for i, text in enumerate(token_split_texts[:10]):
    # Tokenize each chunk
    encoded_input = model.tokenizer(text, padding=True, truncation=True, max_length=128, return_tensors='pt')
    # Convert token IDs back to tokens
    tokens = model.tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0].tolist())
    tokenized_chunks.append(tokens)
    print(f"Chunk {i}: {tokens}")

embeddings = HuggingFaceEmbeddings(model_name="paraphrase-multilingual-MiniLM-L12-v2")
chunk_embeddings = model.encode(token_split_texts, convert_to_numpy=True)


Chunk 0: ['<s>', '▁Inhalt', 's', 'ver', 'ze', 'ich', 'nis', '▁Zusammen', 'setzung', '▁Dar', 'reich', 'ungs', 'form', '▁und', '▁Wir', 'k', 'stoff', 'men', 'ge', '▁pro', '▁Einheit', '▁Indi', 'ka', 'tionen', '/', 'An', 'wendung', 's', 'möglichkeiten', '▁Dos', 'ierung', '/', 'An', 'wendung', '▁Kontra', 'indik', 'ation', 'en', '▁War', 'n', 'hin', 'weise', '▁und', '▁Vor', 'sicht', 's', 'mas', 's', 'nahme', 'n', '▁Inter', 'aktion', 'en', '▁Schwangerschaft', '/', 'S', 'till', 'zeit', '▁Wirkung', '▁auf', '▁die', '▁Fahrt', 'ü', 'cht', 'igkeit', '▁und', '▁auf', '▁das', '▁Be', 'dien', 'en', '▁von', '▁Maschinen', '▁Un', 'er', 'w', 'ün', 's', 'chte', '▁Wirkung', 'en', '▁Über', 'dos', 'ierung', '▁Eigenschaften', '/', 'Wir', 'k', 'ungen', '▁Pharma', 'ko', 'kin', 'etik', '▁Prä', 'klin', 'ische', '▁Daten', '▁Son', 'stige', '▁Hinweise', '▁Zu', 'lassung', 's', 'nummer', '▁Zu', 'lassung', 'sin', 'haber', 'in', '▁Stand', '▁der', '▁Information', '▁Produkte', '▁Swiss', 'medic', '▁-', 'ge', 'nehm', 'ig', 'te',

  embeddings = HuggingFaceEmbeddings(model_name="paraphrase-multilingual-MiniLM-L12-v2")


#### <b>Task (4): Create a vector store</b>
<b>Task details:</b>
- Create a vector store
- store the vector store (this is also helpful in case the codespace or colab needs a restart)
<b style="color: gray;">(max. achievable points: 6)</b>

In [10]:
d = chunk_embeddings.shape[1]
print(d)

384


In [11]:
index = faiss.IndexFlatL2(d)
index.add(chunk_embeddings)
print("Number of embeddings in FAISS index:", index.ntotal)

Number of embeddings in FAISS index: 483


In [13]:
faiss.write_index(index, "faiss/faiss_index.index")
with open("faiss/chunks_mapping.pkl", "wb") as f:
    pickle.dump(token_split_texts, f)

In [14]:
index = faiss.read_index("faiss/faiss_index.index")
with open("faiss/chunks_mapping.pkl", "rb") as f:
    token_split_texts = pickle.load(f)
print(len(token_split_texts))

483


#### <b>Task (5): Create a retriever function.</b>
<b>Task details:</b>
- Create a retriever function
- Define the number of documents the retriever should return.
- Test the retriever with the following query: `"Welche Dosierung von Amoxicillin Axapharm wird für die Behandlung einer Endokarditis-Prophylaxe bei Erwachsenen empfohlen?"`
- If the retrieved chunks are not relevant, increase the number of chunks to be retrieved and repeat the query. 
- It does not have to be perfect; if nothing improves, continue with the current result.
<b style="color: gray;">(max. achievable points: 6)</b>

In [15]:
def retrieve_texts(query, k, index, chunks, model):
    """
    Retrieve the top k similar text chunks and their embeddings for a given query.
    """
    query_embedding = model.encode([query], convert_to_numpy=True)
    distances, indices = index.search(query_embedding, k)
    retrieved_texts = [token_split_texts[i] for i in indices[0]]
    return retrieved_texts, distances

In [17]:
query = "Welche Dosierung +von Amoxicillin Axapharm wird für die Behandlung einer Endokarditis-Prophylaxe bei Erwachsenen empfohlen?"

In [19]:

# Testen des retrievers
retrieved_texts = retrieve_texts(query, 3, index, chunks, model)

print(retrieved_texts)
print(len(retrieved_texts))

(['Erscheinungen). Amoxicillin Axapharm 200 mg/4 ml ist ferner indiziert zur Proph ylaxe der bakteriellen Endokarditis bei zahnmedizinischen Eingriffen (z.B . Zahnextr aktion, Z ahnsteinentfernung, Z ahnfüllung), Endosk opien und anderen Oper ationen, die häufig v on einer Bakteriämie begleitet sind und die das Risik o einer Endokarditis bei gewissen P ersonen mit Herzschäden erhöhen. Eine Einz eldosis v on 3 g Amo xici', '-Clear ance v on <10 ml/min. Zusätzlich erhalten Erw achsene 1 g parenter al oder 750 mg or al und Kinder 15 mg/kg parenter al nach jeder Dialyse. Art der Anwendung Amoxicillin Axapharm 200 mg/4 ml kann ohne Wirkungsv erlust zu den Mahlz eiten eingenommen werden. Die Suspension ist v or jedem Gebr auch zu schütteln. Bei der Behandlung v on Kleinkindern ist zu beachten, dass die zubereitete Suspension maximal 10 T age haltbar ist. Kontraindikationen Überempfindlichk', 'Harnabfluss im Katheter regelmässig k ontrolliert werden. Patienten mit Nierenfunktionsstörungen Bei

#### <b>Task (6): Implement a reusable RAG function and prompt template</b>
<b>Task details:</b>
- Write a function `get_answer_and_documents` that answers a question using your RAG pipeline.
- The function should:
  - Take as parameters: the question (`question`), the number of documents to retrieve (`k`), the FAISS index (`index`), and the list of text chunks (`chunks`).
  - The prompt template should be tailored to the medical context, address medical professionals, and instruct the model to answer concisely and in German, using only the provided context. This is part of the task.
  - Return both the answer and the retrieved documents.
- Test the function with the question: `Ab welcher Kreatinin-Clearance ist die Einnahme von Metformin kontraindiziert?`

<b style="color: gray;">(max. achievable points: 8)</b>

In [26]:
# set language model and output parser
def answer_query(query, k, index,texts):

    """
    Retrieve the top k similar text chunks for the given query using the retriever,
    inject them into a prompt, and send it to the Groq LLM to obtain an answer.
    
    Parameters:
    - query (str): The user's query.
    - k (int): Number of retrieved documents to use.
    - groq_api_key (str): Your Groq API key.
    
    Returns:
    - answer (str): The answer generated by the LLM.
    """
    # Retrieve the top k documents using your retriever function.
    # This retriever uses the following definition:
    # def retrieve(query, k):
    #     query_embedding = model.encode([query], convert_to_numpy=True)
    #     distances, indices = index.search(query_embedding, k)
    #     retrieved_texts = [token_split_texts[i] for i in indices[0]]
    #     retrieved_embeddings = np.array([chunk_embeddings[i] for i in indices[0]])
    #     return retrieved_texts, retrieved_embeddings, distances
    model_name = "paraphrase-multilingual-MiniLM-L12-v2"
    model = SentenceTransformer(model_name)
    retrieved_texts, _ = retrieve_texts(query, k, index, texts, model)
    
    # Combine the retrieved documents into a single context block.
    context = "\n\n".join(retrieved_texts)
    
    # Build a prompt that instructs the LLM to answer the query based on the context.
    prompt = (
        "Answer the following question using the provided context. "
        "Explain it as if you are explaining it to a 5 year old.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + query + "\n"
        "Answer:"
    )
    
    # Initialize the Groq client and send the prompt.
    client = Groq(api_key=groq_key)
    messages = [
        {
            "role": "system",
            "content": prompt
        }
    ]
    
    llm = client.chat.completions.create(
        messages=messages,
        model="llama-3.3-70b-versatile"
    )
    
    # Extract and return the answer.
    answer = llm.choices[0].message.content
    return answer


In [27]:
# Test query
query = "Ab welcher Kreatinin-Clearance ist die Einnahme von Metformin kontraindiziert?"

In [28]:
# print result of test query with your chain (hint: input is a dictionary)
print(answer_query(query, 4, index, chunks))

Hallo kleiner Freund! 

Stell dir vor, dein Körper hat eine spezielle Filteranlage, die called Nieren. Sie helfen, den Körper sauber zu halten. Die Kreatinin-Clearance ist ein Wert, der zeigt, wie gut diese Filteranlage funktioniert.

Metformin ist ein Medikament, das manchmal zusammen mit anderen Medikamenten eingenommen wird. Es ist wichtig, dass der Arzt die richtige Dosis verschreibt, besonders wenn die Nieren nicht so gut funktionieren.

Wenn die Kreatinin-Clearance unter 30 ml/min fällt, bedeutet das, dass die Nieren nicht so gut funktionieren. In diesem Fall ist es nicht gut, Metformin zu nehmen. Es ist wie ein Warnschild, das sagt: "Vorsicht, hier ist es nicht sicher!"

Also, ab einer Kreatinin-Clearance von weniger als 30 ml/min ist die Einnahme von Metformin kontraindiziert. Das bedeutet, dass der Arzt sagen wird: "Nein, du darfst kein Metformin nehmen, weil deine Nieren nicht gut genug funktionieren." 

Ich hoffe, das hilft dir, es zu verstehen!


#### <b>Task (7): Implement a HyDE Query Transformation for RAG</b>
<b>Task details:</b>
- Implement a function that applies the HyDE strategy in your RAG pipeline.
- add your HyDe transformation to your pipeline
- Display the intermediate transformation (print statement within function is enough) and the final answer in the notebook.
<b style="color: gray;">(max. achievable points: 6)</b>

In [33]:
#def rewrite_query_hyde(query):
    
def rewrite_query(query, groq_api_key):
    """
    Rewrite the user's query into to potentially improve retrieval.
    Parameters:
    - query (str): The original user query.
    - groq_api_key (str): Your Groq API key.
    
    Returns:
    - rewritten_query (str): The rewritten query.
    """
    client = Groq(api_key=groq_key)
    # Build a prompt for rewriting the query
    rewriting_prompt = (
        "Rewrite the following query into a format, such that it can be answered by looking at medical guidelines. "
        "Keep the keywords but ensure that it is close to a format, such as in medical guidelines. Just answer with the rewritten query\n\n"
        "Query: " + query
    )
    
    messages = [
        {"role": "system", "content": rewriting_prompt}
    ]
    
    # Use the same model (for example, llama) to perform query rewriting
    llm = client.chat.completions.create(
        messages=messages,
        model="llama-3.3-70b-versatile",
    )
    rewritten_query = llm.choices[0].message.content.strip()
    return rewritten_query

In [34]:
#def answer_query_with_rewriting(query, k, index, texts):
    
def answer_query_with_rewriting(query, k, index, texts, groq_api_key):
    """
    Retrieve the top k similar text chunks for the given query using a retrieval method
    with query rewriting, inject them into a prompt, and send it to the Groq LLM (using llama)
    to obtain an answer.
    
    Parameters:
    - query (str): The user's query.
    - k (int): Number of retrieved documents to use.
    - index: The FAISS index.
    - texts (list): The tokenized text chunks mapping.
    - groq_api_key (str): Your Groq API key.
    
    Returns:
    - answer (str): The answer generated by the LLM.
    """
    model_name = "paraphrase-multilingual-MiniLM-L12-v2"
    model = SentenceTransformer(model_name)
    # Use the new retrieval function with query rewriting.
    rewritten_query = rewrite_query(query, groq_key)    
    print("Rewritten Query:", rewritten_query) ## FYI

    retrieved_texts, _ = retrieve_texts(rewritten_query, k, index, texts, model)
    
    # Combine the retrieved documents into a single context block.
    context = "\n\n".join(retrieved_texts)
    
    # Build a prompt that instructs the LLM to answer the query based on the context.
    prompt = (
        "Answer the following question using the provided context. "
        "Explain it as if you are explaining it to a 5 year old.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + query + "\n"
        "Answer:"
    )
    
    # Initialize the Groq client and send the prompt.
    client = Groq(api_key=groq_api_key)
    messages = [
        {"role": "system", "content": prompt}
    ]
    
    llm = client.chat.completions.create(
        messages=messages,
        model="llama-3.3-70b-versatile"
    )
    
    # Extract and return the answer.
    answer = llm.choices[0].message.content
    return answer

In [36]:
query = "Was ist der wichtigste Faktor bei der Diagnostizierung von Asthma?"
answer = answer_query_with_rewriting(query, 5, index, chunks, groq_key)
print("LLM Answer:", answer)

Rewritten Query: Was sind die diagnostischen Kriterien für Asthma und welcher Faktor wird in den aktuellen medizinischen Leitlinien als wichtigster Faktor bei der Diagnosestellung von Asthma angesehen?
LLM Answer: Das ist eine gute Frage!

Also, wenn wir über Asthma sprechen, ist es wichtig zu wissen, dass es viele verschiedene Dinge sind, die helfen können, Asthma zu diagnostizieren. Aber der wichtigste Faktor ist... (dramatische Pause) ...die Symptome des Patienten!

Wenn jemand Atemnot, Husten, Keuchen oder andere Probleme mit dem Atmen hat, kann das ein Zeichen für Asthma sein. Die Ärzte werden dann verschiedene Tests machen, wie zum Beispiel eine Lungenfunktionstest, um zu sehen, wie gut die Lungen funktionieren. Aber die Symptome sind der wichtigste Faktor, um zu entscheiden, ob jemand Asthma hat oder nicht.

Es ist wie wenn du ein Puzzle löst. Die Symptome sind die Teile, die du zusammenfügen musst, um das Bild von Asthma zu sehen. Und wenn du das Bild siehst, kannst du dann die

#### <b>Task (7): Generate a list of test questions</b>
<b>Task details:</b>
- Create a Python list with 10 questions about the provided medications.
- The questions should be automatically generated using a language model.
- You may use chunks from the package inserts as inspiration, but this is not required.
- At the end, print out your list of questions.

<b style="color: gray;">(max. achievable points: 6)</b>

In [41]:
import time
import httpx  # Ensure you're catching the correct timeout exception
import random
from openai import OpenAI
def generate_questions_for_random_chunks(chunks, num_chunks=10, max_retries=3):
    """
    Randomly selects a specified number of text chunks from the provided list,
    then generates a question for each selected chunk using the Groq LLM.

    Parameters:
    - chunks (list): List of text chunks.
    - groq_api_key (str): Your Groq API key.
    - num_chunks (int): Number of chunks to select randomly (default is 10).

    Returns:
    - questions (list of tuples): Each tuple contains (chunk, generated_question).
    """
    # Randomly select the desired number of chunks.
    selected_chunks = random.sample(chunks, num_chunks)
    
    # Initialize the Groq client once
    client = OpenAI(api_key=openai.api_key)
    
    questions = []
    for chunk in tqdm.tqdm(selected_chunks):
        # Build a prompt that asks the LLM to generate a question based on the chunk.
        prompt = (
            "Based on the following text, generate an insightful question that covers its key content:\n\n"
            "Text:\n" + chunk + "\n\n"
            "Question:"
        )
        
        messages = [
            {"role": "system", "content": prompt}
        ]
        
        generated_question = None
        attempt = 0
        
        # Try calling the API with simple retry logic.
        while attempt < max_retries:
            try:
                llm_response = client.chat.completions.create(
                     model="gpt-4o-mini",
                    messages=messages
                )
                generated_question = llm_response.choices[0].message.content.strip()
                break  # Exit the loop if successful.
            except httpx.ReadTimeout:
                attempt += 1
                print(f"Timeout occurred for chunk. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)  # Wait a bit before retrying.
        
        # If all attempts fail, use an error message as the generated question.
        if generated_question is None:
            generated_question = "Error: Failed to generate question after several retries."
        
        questions.append((chunk, generated_question))
    
    return questions

In [42]:
questions = generate_questions_for_random_chunks(chunks, num_chunks=10, max_retries=2)
for idx, (chunk, question) in enumerate(questions, start=1):
    print(f"Chunk {idx}:\n{chunk[:100]}...\nGenerated Question: {question}\n")

  0%|          | 0/10 [00:00<?, ?it/s]

100%|██████████| 10/10 [00:09<00:00,  1.06it/s]

Chunk 1:
Besch werden muss die Behandlung mit Metformin sofort v orübergehend unterbrochen werden.
Iodhaltige...
Generated Question: What precautions should be taken regarding Metformin treatment in patients undergoing X-ray examinations with iodinated contrast media, and what are the implications for renal and cardiac function?

Chunk 2:
26-30 kg 8-10 Jahre 1500-2000 mg 4× 400 mg
31-40 kg 10-12 Jahre 2000 mg 4× 400 mg
Schwere Infektione...
Generated Question: What are the specific dosage guidelines for administering Amoxicillin based on patient weight, age, and renal function as described in the provided text?

Chunk 3:
Hereditäre k onstitutionelle Hyperbilirubinämie (Morbus Meulengr acht).
Warnhinweise u nd Vorsichtsm...
Generated Question: What are the key precautions and potential risks associated with the use of Paracetamol in patients with hereditary constitutional hyperbilirubinemia and other underlying health conditions?

Chunk 4:
Zusammensetzung
Darreichungsform und Wirkstoffm




In [None]:
#for i, question in enumerate(questions):
    #i +=1
    #print("Frage " + str(1) + ": " + question)

#### <b>Task (8): Let your retriever answer the 10 generated questions.</b>
<b>Task details:</b>
- Use the 10 generated questions and have them answered by your RAG chain.
- For each question, output both the retrieved documents and the answer.
- Provide your own assessment of whether your chain works well or not.
- Give an example of what worked well and what did not.

<b style="color: gray;">(max. achievable points: 6)</b>

In [45]:
# Beantwortung der 10 generierten Fragen

for question in questions:  # Questions list from Aufgabe (7)
    
    # Use the RAG chain to get an answer for the question
    answer =  answer_query_with_rewriting(query, 4, index, chunks, groq_key)
    print(answer)

Rewritten Query: Diagnostische Kriterien für Asthma: Welche Faktoren haben bei der Diagnostizierung von Asthma die größte Priorität?
Hallo kleiner Freund! Asthma ist eine Erkrankung, bei der es schwer ist, Luft in die Lungen zu bekommen. Wenn du Asthma hast, kann es passieren, dass du plötzlich schwer atmen musst oder dass du husten musst.

Der wichtigste Faktor bei der Diagnostizierung von Asthma ist, wie oft und wie schwer du Atemnot hast. Wenn du oft Atemnot hast, besonders nachdem du viel gelaufen bist oder wenn du krank bist, muss dein Arzt das überprüfen.

Außerdem gibt es einige Medikamente, die nicht gut für Menschen mit Asthma sind. Wenn du ein Medikament nimmst, das dich atmen lässt, aber du immer noch Atemnot hast, muss dein Arzt das überprüfen.

Es gibt auch einige Dinge, die Asthma schlimmer machen können, wie zum Beispiel Rauch, Staub oder bestimmte Gerüche. Wenn du weißt, was dich atmen lässt, kannst du das deinem Arzt sagen, damit er dir helfen kann.

Der Arzt wird auch

#### <b> TASK (9) Your assessment of the quality (double-click to edit the cell below):</b>

- Briefly describe what seems to work well in your RAG pipeline based on the answers to the 10 generated questions above.
- Give at least one example of a question/answer pair that worked particularly well.
- Point out at least one aspect or example where the pipeline could be improved or did not work as expected.

<b style="color: gray;">(max. achievable points: 2)</b>

I think everything in my project is working very well. The RAG pipeline is performing nicely and there are no errors when executing. Sometimes I maybe had to do an other solution than wished, for example at "printing the questions" but still it worked perfectly. Maybe I could have used a german model, I used the multilingual one. Most of the parts worked extreamly well, for example task 6. I think I could improve task 7 if needed.

### Jupyter notebook --footer info-- (please always provide this at the end of each notebook)

In [46]:
import os
import platform
import socket
from platform import python_version
from datetime import datetime

print('-----------------------------------')
print(os.name.upper())
print(platform.system(), '|', platform.release())
print('Datetime:', datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
print('Python Version:', python_version())
print('IP Address:', socket.gethostbyname(socket.gethostname()))
print('-----------------------------------')

-----------------------------------
POSIX
Linux | 6.8.0-1027-azure
Datetime: 2025-05-20 09:07:48
Python Version: 3.12.1
IP Address: 127.0.0.1
-----------------------------------
