# **MUN Chatbot**
---
### RUNNING THIS NOTEBOOK

This Jupyter Notebook is designed to run on [Google Colab](https://colab.research.google.com/). To run it, open [Google Colab](https://colab.research.google.com/), log in, upload the notebook, connect to a runtime with GPU support (e.g., T4), upload `typhus_docs.json` to the root directory (drag and drop the file onto the side panel of the notebook), and run all the cells, either manually or with the "Run all" option in the "Runtime" menu. The interface for testing the chatbot is provided at the end of the notebook. (See https://colab.research.google.com/ for more information on how to use Google Colab.)

*Running the chatbot this way is currently the most practical option: hosting it on a web server was impractical at this stage of development, and running it locally requires substantial resources. Running it on Google Colab also avoids installing the required libraries and dependencies on your local machine and provides a GPU for faster processing.*

This notebook provides a simple interface for a Model United Nations (MUN) chatbot that can answer questions about historical events related to the United Nations. Since this is a prototype, it can currently only answer questions about the typhus epidemic in the 1940s.

In [2]:
! pip install -q transformers sentence-transformers faiss-cpu gradio accelerate
! pip install -qU bitsandbytes

from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, BitsAndBytesConfig
import faiss, json, numpy as np
from sentence_transformers import SentenceTransformer
import gradio as gr
import torch


## Text Loading

In [3]:
with open("typhus_docs.json", encoding="utf-8") as f:
    docs = json.load(f)

chunks = [d["chunk"] for d in docs]
titles = [d["title"] for d in docs]
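
The loading cell assumes that `typhus_docs.json` is a JSON array of objects, each carrying a `title` and a `chunk` string. That layout is inferred from the two list comprehensions, not from a documented schema. A minimal sketch of a check for it, using a hypothetical helper `validate_docs` and made-up sample records, could look like this:

```python
import json

def validate_docs(docs):
    """Return (titles, chunks) after checking the assumed record layout."""
    if not isinstance(docs, list):
        raise ValueError("expected a JSON array of records")
    titles, chunks = [], []
    for i, d in enumerate(docs):
        for key in ("title", "chunk"):
            if not isinstance(d, dict) or not isinstance(d.get(key), str):
                raise ValueError(f"record {i} is missing a string '{key}' field")
        titles.append(d["title"])
        chunks.append(d["chunk"])
    return titles, chunks

# Made-up sample in the assumed layout (not taken from the real file).
sample = json.loads(
    '[{"title": "Doc A", "chunk": "First passage."},'
    ' {"title": "Doc A", "chunk": "Second passage."}]'
)
sample_titles, sample_chunks = validate_docs(sample)
```

Running such a check right after `json.load` would surface a malformed upload before the slower embedding and model steps start.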

## Model Loading

In [None]:
model_name = "HuggingFaceH4/zephyr-7b-alpha"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # let HF place it on GPU if possible
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
embedder    = SentenceTransformer("all-MiniLM-L6-v2")
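
The 4-bit configuration matters mainly because of memory. As a rough, weights-only estimate (ignoring activations, the KV cache, and quantization overhead), the arithmetic can be sketched as follows. The round figure of 7 billion parameters is an assumption based on the "7b" in the model name, and `weight_memory_gib` is a hypothetical helper:

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # half-precision weights
nf4_gib = weight_memory_gib(N_PARAMS, 4)    # 4-bit quantized weights

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

Under these assumptions the quantized weights take about a quarter of the half-precision footprint, which leaves more headroom on a single Colab GPU.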




## Storing in FAISS

In [None]:
INDEX_PATH = "typhus_index.faiss"
try:
    index = faiss.read_index(INDEX_PATH)
except (RuntimeError, OSError):
    emb = embedder.encode(chunks, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(emb.shape[1])
    index.add(emb)
    faiss.write_index(index, INDEX_PATH)
    print(f"> New index written to {INDEX_PATH}")
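
`faiss.IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. Purely as an illustration, the following pure-Python sketch mimics that behaviour on toy 2-D vectors. The function `flat_l2_search` and the vectors are hypothetical stand-ins for the real index and embeddings:

```python
def flat_l2_search(vectors, query, top_k):
    """Brute-force nearest neighbours by squared L2 distance.

    Returns (distances, indices) ordered from closest to farthest,
    mirroring the (D, I) pair that index.search returns for one query.
    """
    scored = []
    for i, v in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(v, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    best = scored[:top_k]
    return [d for d, _ in best], [i for _, i in best]

# Toy data: three stored vectors and one query.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
distances, indices = flat_l2_search(toy_vectors, [1.0, 1.0], top_k=2)
```

A flat index scans every stored vector for each query, which is simple and exact and is usually adequate for a small document collection like this prototype's.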


## Query + LLM Response

In [None]:
def clip(text, max_chars=450):
    return text if len(text) <= max_chars else text[:max_chars] + " ..."

def answer_query(question: str, top_k: int = 5) -> str:
    # 1) FAISS search
    q_emb = embedder.encode([question]).astype("float32")
    _, I  = index.search(q_emb, top_k)

    # 2) build the prompt with titles + excerpts
    sources_txt = []
    picked_titles = []                     # ⬅️  store the selected titles
    for idx in I[0]:
        if idx < 0:                        # FAISS pads with -1 when fewer than top_k results exist
            continue
        title   = titles[idx]
        picked_titles.append(title)        # ⬅️
        passage = clip(chunks[idx])
        sources_txt.append(f"[{title}]\n{passage}")

    prompt = (
        "You are a helpful assistant for Model UN members. Your task is to answer questions "
        "based only on the provided sources.\n\n"
        "Read the sources carefully and answer the user’s question using your own words. "
        "Limit your response to 2–4 complete sentences and conclude your answer clearly. "
        "Do not invent information or include anything not found in the sources.\n\n"
        "Sources:\n" +
        "\n\n".join(sources_txt) +
        f"\n\nQuestion: {question}\nAnswer:"
    )


    # 3) generation with the Zephyr model
    out = generator(
        prompt,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,  # Higher = less repetition
        do_sample=True
    )[0]["generated_text"]

    answer_text = out.split("Answer:")[-1].strip()

    # 4) append the list of titles
    unique_titles = list(dict.fromkeys(picked_titles))      # preserves order, removes duplicates
    sources_line  = "Sources: " + "; ".join(unique_titles)

    return f"{answer_text}\n\n{sources_line}"
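
The last step of `answer_query` can be tried without loading the model. The sketch below isolates the post-processing in a hypothetical helper `format_answer` and feeds it a made-up generation result. By default the text-generation pipeline returns the prompt followed by the continuation, which is why the text is split on the final `Answer:` marker:

```python
def format_answer(generated_text, picked_titles):
    """Keep the text after the last 'Answer:' marker and list unique titles."""
    answer_text = generated_text.split("Answer:")[-1].strip()
    # dict.fromkeys keeps the first occurrence of each title, in order.
    unique_titles = list(dict.fromkeys(picked_titles))
    return f"{answer_text}\n\nSources: " + "; ".join(unique_titles)

# Made-up pipeline output: prompt followed by the model's continuation.
fake_output = (
    "Sources:\n[Doc A]\n...\n\n"
    "Question: What happened?\nAnswer: A short reply."
)
result = format_answer(fake_output, ["Doc A", "Doc B", "Doc A"])
```

One caveat of this approach is that a continuation containing its own `Answer:` marker would be cut at that point. Passing `return_full_text=False` to the pipeline call is an alternative that returns only the continuation.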


## Interface

In [None]:
def chatbot(query):
    return answer_query(query)

gr.Interface(fn=chatbot, inputs="text", outputs="text").launch()