# 📝 *Solution* Task: Semantic Retrieval-Augmented Question Answering Using Groq LLM

## Objective
Implement a question-answering system that:
1. Retrieves the most semantically relevant text passages to a user query.
2. Constructs a natural language prompt based on the retrieved content.
3. Uses a large language model (LLM) hosted by Groq to generate an answer.

---

## Task Breakdown

### 1. Embedding-Based Semantic Retrieval
- Use the `SentenceTransformer` model `"Sahajtomar/German-semantic"` to encode a user query into a dense vector embedding.
- Perform a nearest-neighbor search in a prebuilt FAISS index to retrieve the top-**k** most similar text chunks. You can **reuse the prebuilt FAISS index from above**.
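
To make the retrieval step concrete, the sketch below shows what a nearest-neighbor search computes, using plain Python in place of FAISS. It assumes squared Euclidean distance, as an exact L2 FAISS index would report; the type of the prebuilt index is not stated in this notebook. The function names and the toy vectors are hypothetical and serve only as illustration.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query_vec, vectors, k):
    """Return (distances, indices) of the k vectors closest to query_vec."""
    # Score every stored vector against the query, then keep the k smallest.
    scored = [(squared_l2(query_vec, v), i) for i, v in enumerate(vectors)]
    nearest = heapq.nsmallest(k, scored)
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices


# Toy 2-dimensional "embeddings" (hypothetical values, not real model output).
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
toy_distances, toy_indices = brute_force_search([1.0, 1.0], toy_vectors, k=2)
print(toy_indices)
```

FAISS returns the same kind of result (distances and row indices) but is built to do so efficiently on large collections; the returned indices are then used to look up the corresponding text chunks.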


### 2. LLM Prompt Construction and Query Answering
- Build the prompt and query the LLM:
  - Concatenate the retrieved text chunks into a context block.
  - Build a **prompt** asking the LLM to answer the question using that context.
  - Send the prompt to the **Groq LLM API** (`llama-3.3-70b-versatile`) and return the response.
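
The prompt construction step can be sketched in isolation, without any API call. The helper name `build_prompt` and the example passages are assumptions made for this illustration; the layout (context block first, then the question) follows the description above.

```python
def build_prompt(chunks, question):
    """Join retrieved chunks into a context block and append the question."""
    # Blank lines between chunks keep the passages visually separate.
    context = "\n\n".join(chunks)
    return (
        "Answer the following question using the provided context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder passages (hypothetical, not taken from the real index).
example_chunks = ["First retrieved passage.", "Second retrieved passage."]
example_prompt = build_prompt(example_chunks, "What do the passages say?")
print(example_prompt)
```

Keeping this step in a pure function makes it easy to inspect exactly what text is sent to the model before any request is made.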

### 3. User Query Execution
- An example query (`"What is the most important factor in diagnosing asthma?"`) is used to demonstrate the pipeline.
- The final answer from the LLM is printed.


## Tools & Models Used
- **SentenceTransformers** (`Sahajtomar/German-semantic`) for embedding generation.
- **FAISS** for efficient vector similarity search.
- **Groq LLM API** (`llama-3.3-70b-versatile`) for generating the final response.


In [1]:
import pickle
import faiss
from sentence_transformers import SentenceTransformer
from dotenv import load_dotenv
from groq import Groq


In [2]:
import os
load_dotenv()  # reads GROQ_API_KEY from a local .env file, if present
groq_api_key = os.getenv("GROQ_API_KEY")  # keep the secret out of the notebook

In [3]:
model = SentenceTransformer("Sahajtomar/German-semantic")

index = faiss.read_index("faiss/faiss_index.index")
with open("faiss/chunks_mapping.pkl", "rb") as f:
    token_split_texts = pickle.load(f)

In [4]:
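def retrieve_texts(query, k, index, token_split_texts, model):
    """Return the k chunks most similar to the query, plus the FAISS distances."""
    query_embedding = model.encode([query], convert_to_numpy=True)
    distances, indices = index.search(query_embedding, k)

    # Drop invalid indices: FAISS pads with -1 when fewer than k hits exist,
    # and a negative index would silently select a chunk from the end of the list.
    valid_indices = [i for i in indices[0] if 0 <= i < len(token_split_texts)]
    retrieved_texts = [token_split_texts[i] for i in valid_indices]

    return retrieved_texts, distances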


In [5]:
def answer_query(query, k, index, texts, groq_api_key):
    # Note: `model` is the global SentenceTransformer loaded in cell In [3].
    retrieved_texts, _ = retrieve_texts(query, k, index, texts, model)
    context = "\n\n".join(retrieved_texts)

    prompt = (
        "Answer the following question using the provided context. "
        "Explain it as if you are explaining it to a 5 year old.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

    client = Groq(api_key=groq_api_key)
    messages = [{"role": "system", "content": prompt}]
    llm = client.chat.completions.create(
        messages=messages,
        model="llama-3.3-70b-versatile"
    )

    answer = llm.choices[0].message.content
    return answer
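
The cell above sends the whole prompt as a single `system` message. A common alternative, shown here only as a sketch and not as the notebook's method, keeps the instructions in the `system` role and puts the context and question in a `user` message. The helper name `build_messages` and the placeholder strings are hypothetical.

```python
def build_messages(context, question):
    """Split instructions (system) from context and question (user)."""
    system_msg = (
        "Answer the question using the provided context. "
        "Explain it as if you are explaining it to a 5 year old."
    )
    user_msg = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]


# Placeholder inputs (hypothetical, not retrieved from the real index).
demo_messages = build_messages("Placeholder context.", "What is asthma?")
for message in demo_messages:
    print(message["role"], "->", message["content"][:40])
```

The resulting list has the same shape as the `messages` argument passed to `client.chat.completions.create` above, so it could be swapped in without other changes.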


## Testing LLM

In [6]:
print("Number of text chunks:", len(token_split_texts))

Number of text chunks: 62


In [7]:
query = "What is the most important factor in diagnosing asthma?"
answer = answer_query(query, k=5, index=index, texts=token_split_texts, groq_api_key=groq_api_key)
print("LLM Answer:\n", answer)

LLM Answer:
 Oh boy, let's talk about asthma. So, you know how sometimes you might get a cough or it's hard to breathe? That can be because of something called asthma. 

To figure out if someone has asthma, doctors need to know some things. The most important thing is... (are you ready?) ...how the person is feeling and what happens to their body when they breathe. 

The doctor will ask questions like: "Do you cough a lot?" or "Do you get tired when you run around?" They might also want to know if you have any yucky feelings in your chest or if it's hard to breathe sometimes.

But, the best way to know for sure is to do some special tests. These tests can help the doctor see how well your lungs are working. It's kind of like a special check-up for your breathing.

So, the most important thing in diagnosing asthma is to listen to how the person is feeling and do those special tests to see how their lungs are working. That way, the doctor can say for sure if someone has asthma or not. An

In [8]:
# English: "What are the focus topics in the Banking & Finance department?"
query = "Welche Schwerpunktthemen hat es in der Abteilung Banking & Finance?"
answer = answer_query(query, k=5, index=index, texts=token_split_texts, groq_api_key=groq_api_key)
print("LLM Answer:\n", answer)

LLM Answer:
 Das ist eine großartige Frage!

Stell dir vor, es gibt eine Abteilung, die sich um Geld und Banken kümmert. Das ist die Abteilung "Banking & Finance". 

In dieser Abteilung gibt es einige wichtige Themen, auf die man sich konzentriert. Das sind wie Spielthemen, auf die man sich fokussiert.

Einige dieser Themen sind:

* Wie man Geld anlegt, damit es mehr wird
* Wie man sicherstellt, dass das Geld nicht verloren geht
* Wie man mit anderen Ländern und Unternehmen Geschäfte macht
* Wie man neue Ideen entwickelt, um das Geld besser zu nutzen

Das sind ein bisschen wie Rätsel, die man lösen muss, um das Geld und die Banken zu verstehen. Die Leute in dieser Abteilung arbeiten daran, all diese Themen zu verstehen und zu verbessern.

Ich hoffe, das hilft!


In [9]:
# English: "Describe the rainy season of Bali to me"
query = "Beschreibe mir die Regenseason von Bali"
answer = answer_query(query, k=5, index=index, texts=token_split_texts, groq_api_key=groq_api_key)
print("LLM Answer:\n", answer)

LLM Answer:
 Das ist ein bisschen komisch! Wir haben gerade über etwas anderes gesprochen, nämlich über Computer-Programme, die Texte schreiben können. Aber jetzt willst du wissen, wann in Bali die Regenzeit ist?

Also, Bali ist eine Insel in Indonesien und sie hat eine Regenzeit, die normalerweise von November bis März dauert. Es regnet viel in dieser Zeit, aber es ist auch schön grün und die Insel ist sehr schön. Wenn du nach Bali reisen möchtest, solltest du vielleicht besser in der Trockenzeit von April bis Oktober kommen, wenn die Sonne scheint und es nicht so viel regnet.

Aber ich denke, wir sollten uns wieder auf die Computer-Programme konzentrieren, wenn du möchtest! Oder willst du wirklich mehr über Bali wissen?


In [12]:
query = "How can AI help with learning currently and in the future?"
answer = answer_query(query, k=5, index=index, texts=token_split_texts, groq_api_key=groq_api_key)
print("LLM Answer:\n", answer)

LLM Answer:
 So, you know how we can use computers to play games and watch videos? Well, there's something called Artificial Intelligence, or AI for short, that can help us learn new things.

Imagine you have a special teacher that can help you with your homework, and it's like a super smart robot. This robot can show you pictures and videos to help you understand things better. It can even talk to you and answer your questions.

Right now, AI can help us learn in many ways. It can:

* Help us practice our reading and math skills with fun games and quizzes.
* Show us videos and pictures to help us understand things like science and history.
* Even help our real teachers make learning more fun and exciting.

In the future, AI might be able to do even more cool things to help us learn. It might be able to:

* Create special lessons just for us, based on what we like and what we're good at.
* Help us learn new languages, like Spanish or Chinese.
* Even help us explore virtual worlds and g