# Retrieval-Augmented Generation
In this notebook, we learn the basics of RAG by asking questions against a chosen knowledge base.
For demonstration, we use the [UBELIX](https://github.com/hpc-unibe-ch/hpc-unibe-ch.github.io/tree/main/docs) docs from GitHub.
1. First, we get the data.
```bash
    mkdir -p Session7/data
    cd Session7/data
    curl -L -o repo.zip https://github.com/hpc-unibe-ch/hpc-unibe-ch.github.io/archive/refs/heads/main.zip
    unzip repo.zip
```
2. We preprocess the text to create our vector database.
3. We create our RAG bot.
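
The three steps above can be sketched end to end with a toy example before the real libraries come in. This is only an illustration under loud assumptions: the "embedding" is a bag-of-words count vector rather than a neural model, the three snippets in `docs` are made up and not taken from the UBELIX docs, and `embed`, `cosine`, and `toy_retrieve` are hypothetical helper names.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: the "knowledge base" (made-up snippets, for illustration only)
docs = [
    "Submit batch jobs to the cluster with the sbatch command.",
    "Request an account through the service portal.",
    "Load software with the module command.",
]

# Step 2: the "vector database" is just a list of (vector, text) pairs here
index = [(embed(d), d) for d in docs]

# Step 3: retrieve the best match and paste it into the prompt
def toy_retrieve(query: str, k: int = 1):
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

question = "How do I submit a job?"
context = toy_retrieve(question)[0]
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```

The rest of the notebook follows the same shape, with a sentence-transformer model in place of `embed`, a Chroma collection in place of `index`, and an LLM that completes the prompt.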

In [1]:
import os, glob
import numpy as np
from sentence_transformers import SentenceTransformer
import chromadb
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import hashlib
from typing import List, Dict, Any
import re


In [2]:
md_paths = glob.glob(pathname="data/**/*.[mM][dD]", recursive=True)

In [3]:
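def read_markdown(path: str) -> str:
    """Read a Markdown file and drop its fenced code blocks."""
    with open(path, "r", encoding="utf-8") as f:
        md = f.read()
    # Remove fenced code blocks so that only prose is embedded
    return re.sub(r"```.*?```", "", md, flags=re.DOTALL)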

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150) -> List[str]:
    # simple, robust char chunking
    chunks = []
    i = 0
    n = len(text)
    while i < n:
        chunk = text[i:i+chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        i += max(1, chunk_size - overlap)
    return chunks

def stable_id(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()
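
To see what the `chunk_size` and `overlap` defaults mean in practice, the sketch below computes only the character offsets that the stride rule in `chunk_text` produces. `chunk_spans` is a hypothetical helper written for this illustration; unlike `chunk_text`, it does not skip whitespace-only chunks.

```python
def chunk_spans(n_chars: int, chunk_size: int = 1000, overlap: int = 150):
    """Return (start, end) character offsets using the same stride as chunk_text."""
    step = max(1, chunk_size - overlap)  # each chunk starts 850 chars after the previous one
    return [(i, min(i + chunk_size, n_chars)) for i in range(0, n_chars, step)]

spans = chunk_spans(2500)
print(spans)  # [(0, 1000), (850, 1850), (1700, 2500)]
```

Consecutive chunks share 150 characters, so a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, as long as it is shorter than the overlap.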

In [4]:
def build_or_update_chroma(
    md_paths: List[str],
    persist_dir: str = "./chroma_md",
    collection_name: str = "md_rag",
    embed_model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
    chunk_size: int = 1000,
    overlap: int = 150,
    update: bool = False,  # only add new docs if True
) -> Dict[str, Any]:
    """
    If persist_dir exists:
      - update=False => load existing collection and return (no writes)
      - update=True  => load existing collection and add missing chunks

    If persist_dir does not exist:
      - create and populate from md_paths (regardless of update flag)
    """
    persist_exists = os.path.isdir(persist_dir) and any(os.scandir(persist_dir))

    client = chromadb.PersistentClient(path=persist_dir)

    # If the DB already exists and we don't want to update, just load and return.
    if persist_exists and not update:
        collection = client.get_collection(name=collection_name)
        embedder = SentenceTransformer(embed_model_name)
        return {"client": client, "collection": collection, "embedder": embedder}

    # Otherwise: create / get the collection and (optionally) add new items
    collection = client.get_or_create_collection(name=collection_name)
    embedder = SentenceTransformer(embed_model_name)

    # Nothing to index: return the loaded (or newly created, empty) collection as is
    if not md_paths:
        return {"client": client, "collection": collection, "embedder": embedder}

    docs_to_add, metas_to_add, ids_to_add = [], [], []

    for p in md_paths:
        md = read_markdown(p)
        chunks = chunk_text(md, chunk_size=chunk_size, overlap=overlap)

        for j, c in enumerate(chunks):
            cid = stable_id(f"{p}::{j}::{c[:80]}")
            ids_to_add.append(cid)
            docs_to_add.append(c)
            metas_to_add.append({"source": p, "chunk": j})

    # Avoid duplicate IDs (Chroma errors on duplicates)
    existing = set()
    B = 500
    for i in range(0, len(ids_to_add), B):
        batch_ids = ids_to_add[i : i + B]
        got = collection.get(ids=batch_ids, include=[])
        existing.update(got.get("ids", []) or [])

    new_docs, new_metas, new_ids = [], [], []
    for d, m, i_ in zip(docs_to_add, metas_to_add, ids_to_add):
        if i_ not in existing:
            new_docs.append(d)
            new_metas.append(m)
            new_ids.append(i_)

    if new_ids:
        new_embs = embedder.encode(new_docs, normalize_embeddings=True, show_progress_bar=True)
        new_embs = np.asarray(new_embs, dtype="float32").tolist()

        collection.add(ids=new_ids, documents=new_docs, metadatas=new_metas, embeddings=new_embs)

    return {"client": client, "collection": collection, "embedder": embedder}
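
The de-duplication in `build_or_update_chroma` relies on the chunk ID being a deterministic hash of the path, the chunk index, and the first 80 characters of the chunk. The sketch below replays that logic with a plain dict standing in for the Chroma collection; `add_missing` and the sample paths and chunks are hypothetical and exist only for this illustration.

```python
import hashlib

def stable_id(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

def add_missing(store: dict, path: str, chunks: list) -> int:
    """Insert only chunks whose ID is not yet in `store`; return how many were added."""
    added = 0
    for j, c in enumerate(chunks):
        cid = stable_id(f"{path}::{j}::{c[:80]}")  # same ID recipe as in the notebook
        if cid not in store:
            store[cid] = c
            added += 1
    return added

store = {}  # plain dict standing in for the Chroma collection
first = add_missing(store, "docs/a.md", ["alpha", "beta"])
second = add_missing(store, "docs/a.md", ["alpha", "beta"])         # nothing new
third = add_missing(store, "docs/a.md", ["alpha", "beta CHANGED"])  # chunk 1 changed
print(first, second, third)  # 2 0 1
```

Two consequences of this ID recipe are worth keeping in mind: the old version of a changed chunk stays in the store, and an edit that only touches text after the first 80 characters of a chunk produces the same ID and is therefore skipped.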

In [5]:
def retrieve(collection, embedder, query: str, k: int = 5):
    qemb = embedder.encode([query], normalize_embeddings=True)
    qemb = np.asarray(qemb, dtype="float32").tolist()

    res = collection.query(
        query_embeddings=qemb,
        n_results=k,
        include=["documents", "metadatas", "distances"],
    )

    out = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        out.append({"text": doc, "meta": meta, "distance": float(dist)})
    return out
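
The `distance` values returned by `retrieve` are easier to interpret once they are related to cosine similarity. Both the stored and the query embeddings are normalized to unit length, so if the collection uses a squared-L2 distance (assumed here to be what Chroma applies when no distance space is configured, as in this notebook), then distance = 2 - 2 * cosine similarity. The pure-Python sketch below checks that identity on two arbitrary vectors; the helper names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    # A plain dot product is the cosine here because both inputs are unit length
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])
d = squared_l2(a, b)
cos = cosine_similarity(a, b)
print(round(d, 6), round(2 - 2 * cos, 6))  # the two numbers agree
```

Under that assumption, a distance near 0 means the chunk points in almost the same direction as the query, and a distance near 2 means it points almost the opposite way.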


In [6]:

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

In [31]:

def rag_answer(
    query: str,
    collection,
    embedder,
    pipe,
    k: int = 5,
    max_new_tokens: int = 1000,
    do_sample: bool = True,
    temperature: float = 0.2,
    top_p: float = 0.9,
) -> str:
    hits = retrieve(collection, embedder, query, k=k)

    context = "\n\n".join(
        f"[{i+1}] ({h['meta']['source']}#chunk={h['meta']['chunk']})\n{h['text']}"
        for i, h in enumerate(hits)
    )

    prompt = f"""You are a helpful assistant.
        Use ONLY the context to answer. If the answer is not in the context, say "I don't know".
        Cite sources like [1], [2].
        
        Context:
        {context}
        
        Question: {query}
        Answer:"""

    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=do_sample, temperature=temperature, top_p=top_p)
    return out[0]["generated_text"][len(prompt):].strip()
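
The context string is what lets the model cite `[1]`, `[2]`, and so on. The sketch below isolates the join used in `rag_answer` and runs it on two made-up hits shaped like the dicts that `retrieve` returns; `format_context` is a hypothetical name, and the texts, sources, and distances are invented for illustration.

```python
def format_context(hits):
    """Number each hit and tag it with its source, mirroring the join in rag_answer."""
    return "\n\n".join(
        f"[{i+1}] ({h['meta']['source']}#chunk={h['meta']['chunk']})\n{h['text']}"
        for i, h in enumerate(hits)
    )

# Made-up hits with the same keys as the output of retrieve()
hits = [
    {"text": "Use sbatch to submit jobs.",
     "meta": {"source": "docs/slurm.md", "chunk": 0}, "distance": 0.41},
    {"text": "GPUs are requested with --gres.",
     "meta": {"source": "docs/gpu.md", "chunk": 3}, "distance": 0.57},
]
context = format_context(hits)
print(context)
```

Each block starts with its citation number and source path, followed by the chunk text, so a citation in the answer can be traced back to a file and chunk index.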


In [32]:

def without_rag_answer(
    query: str,
    pipe,
    max_new_tokens: int = 1000,
    do_sample: bool = True,
    temperature: float = 0.2,
    top_p: float = 0.9,
) -> str:

    prompt = f"""You are a helpful assistant.
        If you do not know the answer, say "I don't know".
        Cite sources like [1], [2].        
        Question: {query}
        Answer:"""

    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=do_sample, temperature=temperature, top_p=top_p)
    return out[0]["generated_text"][len(prompt):].strip()


In [9]:
idx = build_or_update_chroma(
    md_paths=md_paths,
    persist_dir="./chroma_md",
    collection_name="md_rag",
    update=False
)


In [15]:
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

def make_pipeline(model_name: str):
    tok = AutoTokenizer.from_pretrained(model_name)
    mdl = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        quantization_config=bnb_config,
    )
    return pipeline("text-generation", model=mdl, tokenizer=tok)


pipe = make_pipeline(model_id)



Device set to use cuda:0
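
The reason for loading the model in 4-bit can be seen with a back-of-the-envelope calculation. The sketch below estimates the memory taken by the weights alone, assuming a round figure of 7 billion parameters for a "7B" model; it ignores quantization constants, activations, and the KV cache, so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7e9  # assumed round figure, not the exact parameter count
fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4)
print(fp16_gb, nf4_gb)  # 14.0 3.5
```

Under these assumptions, 4-bit weights need about a quarter of the memory of 16-bit weights, which is what lets a model of this size fit on a single mid-range GPU.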


In [22]:
query = "How to submit a python script on Ubelix? I want to use Anaconda and I need a GPU."

In [33]:
print(without_rag_answer(query, pipe))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


To submit a Python script on Ubelix using Anaconda and a GPU, follow these steps:

1. First, make sure you have Anaconda installed on your local machine. If not, download and install it from the official Anaconda website: https://www.anaconda.com/products/individual

2. Create a new environment for your project and install the necessary packages, including the GPU-compatible versions of TensorFlow and PyTorch. You can do this using the following commands:

```bash
conda create --name my_project
conda activate my_project
conda install -c anaconda tensorflow-gpu
conda install -c anaconda pytorch
```

3. Write your Python script and make sure it works correctly on your local machine.

4. SSH into the Ubelix cluster using the following command:

```bash
ssh your_username@ubelix.example.com
```

5. Once connected to the Ubelix cluster, create a new directory for your project:

```bash
mkdir my_project
cd my_project
```

6. Transfer your Python script and any necessary data files from your l

In [34]:
print(rag_answer(query, idx["collection"], idx["embedder"], pipe, k=5))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


To submit a Python script on UBELIX that uses Anaconda and requires a GPU, follow these steps:

        1. Request an interactive job on a GPU node by submitting a SLURM job script. Here's an example script:

        ```
        #!/bin/bash
        #SBATCH --job-name=my_job
        #SBATCH --time=0-04:00:00
        #SBATCH --partition=gpu
        #SBATCH --gres=gpu:rtx3090:1
        #SBATCH --ntasks=1
        #SBATCH --cpus-per-task=1
        #SBATCH --mem=16G

        module load anaconda3
        source activate myenv
        python my_script.py
        ```

        2. Replace `my_job` with a name for your job, adjust the time, partition, GPU type, and other resources as needed.

        3. Save the script as a file, for example `submit_job.sh`.

        4. Submit the job with the `sbatch` command:

        ```
        sbatch submit_job.sh
        ```

        5. The job will start and run your Python script in an interactive shell on a GPU node with Anaconda activated in the environ

In [39]:
query = "How can I get an access to Ubelix?"
print(without_rag_answer(query, pipe))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


To access Ubelix, you typically need to have an account with the service. Here's a general process:

        1. Go to the Ubelix website.
        2. Click on the "Sign Up" or "Register" button.
        3. Fill in the required information, such as your name, email address, and password.
        4. Agree to the terms and conditions.
        5. Check your email for a confirmation message.
        6. Click on the confirmation link to activate your account.
        7. Log in with your email and password.

Please note that the exact process may vary depending on the specific Ubelix service you are trying to access. For more detailed instructions, I recommend checking the Ubelix help center or contacting their customer support.

Sources:
[1] Ubelix - Sign Up: https://www.ubelix.com/signup
[2] Ubelix - Help Center: https://www.ubelix.com/help-center/


In [40]:
print(rag_answer(query, idx["collection"], idx["embedder"], pipe, k=5))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


To access UBELIX, you need to be a researcher or student at the University of Bern with a staff, student or faculty Campus Account. Note that when using UBELIX, you accept and adhere to the Unibe IT Directives (Terms of Use), the HPC Operational concept as well as our Code of Conduct. To request the activation of your Campus Account, please send a request via [https://serviceportal.unibe.ch/hpc](https://serviceportal.unibe.ch/hpc) including:
        - the title **HPC Account Activation**
        - a brief description of what you want to use the cluster for

        If you already have an account, you can log in to UBELIX using SSH client or web interface. If you don't have an account, you need to request it first.

        References:
        - [Access to UBELIX](data/hpc-unibe-ch.github.io-main/docs/firststeps/accessUBELIX.md)
        - [SSH Keys](data/hpc-unibe-ch.github.io-main/docs/firststeps/SSH-keys.md)
        - [Logging in with SSH client](data/hpc-unibe-ch.github.io-main/docs/

# ToDo
1. How can you assess the performance? Consider, for example, a test set and a metric. (Discuss)
2. Choose another model and compare it with the current one. (Code) Which model might improve the performance? (Discuss)
3. How can you read in other document types, for example PDFs?
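
As one possible starting point for the first item, retrieval quality can be measured separately from generation. The sketch below computes a hit rate at k: the fraction of test questions for which the expected source file appears among the top k retrieved sources. It is only one of several reasonable metrics; `hit_rate_at_k` is a hypothetical helper and the file names are made up.

```python
def hit_rate_at_k(retrieved_sources, expected_sources, k=5):
    """Fraction of questions whose expected source is among the top-k retrieved sources."""
    hits = 0
    for got, expected in zip(retrieved_sources, expected_sources):
        if expected in got[:k]:
            hits += 1
    return hits / len(expected_sources)

# Made-up test set: one ranked source list per question, plus the expected source
retrieved = [["a.md", "b.md"], ["c.md", "d.md"], ["e.md"]]
expected = ["b.md", "x.md", "e.md"]

rate_at_2 = hit_rate_at_k(retrieved, expected, k=2)
rate_at_1 = hit_rate_at_k(retrieved, expected, k=1)
print(rate_at_2, rate_at_1)
```

In this notebook, the ranked source lists could be built from the `meta["source"]` field of the hits returned by `retrieve`, with the expected sources labelled by hand for a small set of questions.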