# 🔍 Simple RAG System – FAISS + SentenceTransformer

## 1. Load the text and split it into chunks

In [7]:
from pathlib import Path
import textwrap

# Load the text
with open('rag_doc_clean.txt', 'r', encoding='utf-8') as f:
    full_text = f.read()

# Chunking (~300 characters)
chunks = textwrap.wrap(full_text, width=300)
print(f"{len(chunks)} chunks created.")
print(chunks[0])

753 chunks created.
Basel Committee  on Banking Supervision   Basel III: A global   regulatory framework for   more resilient banks and   banking systems  December 2010  (rev June 2011)                                                                                               Copies of publications  are available
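
`textwrap.wrap` splits purely on width, so a chunk can stop mid-sentence, as the first chunk above does. One common mitigation is to let consecutive chunks share a few trailing words. The following is a minimal sketch of that idea, not part of the original notebook; the helper name `chunk_with_overlap` and the `size` and `overlap` defaults are assumptions chosen for illustration.

```python
def chunk_with_overlap(text, size=300, overlap=50):
    """Split text into word-aligned chunks of at most `size` characters.

    Each chunk after the first repeats the trailing words of the previous
    chunk, up to roughly `overlap` characters. A single word longer than
    `size` is kept whole in its own chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()  # also collapses runs of whitespace
    chunks = []
    start = 0
    while start < len(words):
        # Greedily take words while the chunk stays within `size` characters.
        end = start
        length = 0
        while end < len(words):
            extra = len(words[end]) + (1 if end > start else 0)
            if length + extra > size and end > start:
                break
            length += extra
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        # Step back over trailing words until about `overlap` characters are shared,
        # always advancing by at least one word.
        next_start = end
        shared = 0
        while next_start > start + 1 and shared + len(words[next_start - 1]) + 1 <= overlap:
            shared += len(words[next_start - 1]) + 1
            next_start -= 1
        start = next_start
    return chunks


demo_chunks = chunk_with_overlap(
    "one two three four five six seven eight nine ten", size=20, overlap=8
)
```

With real text, values such as `size=300` would match the chunk width used above; the right amount of overlap depends on the document and would need to be tuned.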


## 2. Generating embeddings with SentenceTransformer

In [8]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunks)
print(f"{len(embeddings)} vectors generated.")
print(embeddings[0][:5])

753 vectors generated.
[ 0.03424797 -0.10494245 -0.06773275  0.05248769  0.05284438]
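
Embedding vectors are usually compared by cosine similarity or by L2 distance. The sketch below uses made-up 3-dimensional vectors (not real model output) to show how the two relate once vectors are scaled to unit length; the helper name `l2_normalize` is an assumption for illustration.

```python
import numpy as np


def l2_normalize(vectors):
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    vectors = np.asarray(vectors, dtype="float32")
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)


# Made-up vectors standing in for real embeddings.
toy = l2_normalize([[1.0, 2.0, 2.0], [2.0, 1.0, 2.0], [-1.0, 0.0, 0.0]])

cosine = toy @ toy.T                       # pairwise cosine similarities
diff = toy[:, None, :] - toy[None, :, :]
squared_l2 = (diff ** 2).sum(axis=-1)      # pairwise squared L2 distances

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_holds = np.allclose(squared_l2, 2 - 2 * cosine, atol=1e-6)
```

Because of this identity, ranking unit-length vectors by L2 distance gives the same order as ranking them by cosine similarity. In sentence-transformers, `model.encode` accepts `normalize_embeddings=True` to return unit-length vectors.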


## 3. Indexing the embeddings with FAISS

In [9]:
import faiss
import numpy as np
import os

embedding_dim = len(embeddings[0])
embedding_matrix = np.array(embeddings).astype('float32')

index = faiss.IndexFlatL2(embedding_dim)
index.add(embedding_matrix)

os.makedirs('faiss_index', exist_ok=True)
faiss.write_index(index, 'faiss_index/rag_index.idx')
print(f"FAISS index built with {index.ntotal} vectors.")

FAISS index built with 753 vectors.
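
`IndexFlatL2` performs an exhaustive search and reports squared L2 distances. To make that concrete, here is a NumPy-only sketch of the same computation on toy data; the function name `brute_force_l2_search` and the toy vectors are assumptions for illustration, and the sketch makes no attempt to match FAISS performance.

```python
import numpy as np


def brute_force_l2_search(matrix, queries, k):
    """Return (distances, indices) of the k nearest stored vectors per query."""
    matrix = np.asarray(matrix, dtype="float32")
    queries = np.asarray(queries, dtype="float32")
    # Squared L2 distance between every query and every stored vector.
    diff = queries[:, None, :] - matrix[None, :, :]
    distances = (diff ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query, nearest first.
    order = np.argsort(distances, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(distances, order, axis=1), order


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
D_toy, I_toy = brute_force_l2_search(stored, [[0.9, 0.0]], k=2)
```

The return shapes mirror those of `index.search`: one row per query, `k` columns, nearest neighbour first.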


## 4. User query and retrieval of relevant passages

In [10]:
# User question
user_question = "What are the conditions required to approve a high-risk loan?"

# Embedding + search
query_vector = model.encode([user_question]).astype('float32')
D, I = index.search(query_vector, k=3)

# Display the results
print("🔎 Top results:")
for idx in I[0]:
    print("\n--- Relevant passage ---\n")
    print(chunks[idx])

🔎 Top results:

--- Relevant passage ---

an investment in a specific assessed issue, the following  general principles apply.   In circumstances where the borrower has aspecific as sessment for an  issued debt Œ but the bank™s claim is not an investment in this particular  debt Œ a high quality credit assessment (one which maps into a risk

--- Relevant passage ---

risk models, including  backtesti ng, must be reviewed periodically by a level of management wit h   sufficient authority to decide thecourse of ac tion thatwill be taken to   address weaknesses inthe models.  43. Banks must document the process for initial and on-going validation of their   IMM

--- Relevant passage ---

risk. Based on this  assessment they willput in pl ace a countercyclical buffer requirement when    58  Basel III: Aglobal regulatory framewor kfor  more resilient banks and banking systems   circumstances warrant. This requirement  will be releasedwhen system-wide risk   crystallises or dissipates
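
The search cell returns distances alongside the indices, but only the indices are used. A flat index returns the `k` nearest neighbours however far away they are, so it can help to inspect the distances and drop weak matches. The sketch below shows one way to do that on toy data; the helper name `select_hits` and the `max_distance` threshold are assumptions, and a sensible threshold would have to be found empirically for a given model and corpus.

```python
def select_hits(distances, indices, chunks, max_distance):
    """Pair each retrieved chunk with its distance and drop distant hits.

    `distances` and `indices` are single result rows, e.g. D[0] and I[0].
    """
    hits = []
    for dist, idx in zip(distances, indices):
        idx = int(idx)
        if idx < 0:  # FAISS pads with -1 when fewer than k results exist
            continue
        if dist <= max_distance:
            hits.append((float(dist), chunks[idx]))
    return hits


# Toy stand-ins for D[0], I[0] and the chunk list.
toy_chunks = ["alpha", "beta", "gamma"]
hits = select_hits([0.4, 1.1, 1.9], [2, 0, 1], toy_chunks, max_distance=1.5)
```

Here the third neighbour is discarded because its distance exceeds the threshold, leaving two `(distance, passage)` pairs in nearest-first order.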


## 5. Building the prompt for GPT

In [11]:
retrieved_passages = [chunks[i] for i in I[0]]
context = "\n\n".join(retrieved_passages)

prompt = f"""You are a regulatory assistant.

Context:
{context}

Question:
{user_question}

Answer in a clear, concise way using only the context above.
"""

print("📄 Prompt generated for the LLM:")
print(prompt)

📄 Prompt generated for the LLM:
You are a regulatory assistant.

Context:
an investment in a specific assessed issue, the following  general principles apply.   In circumstances where the borrower has aspecific as sessment for an  issued debt Œ but the bank™s claim is not an investment in this particular  debt Œ a high quality credit assessment (one which maps into a risk

risk models, including  backtesti ng, must be reviewed periodically by a level of management wit h   sufficient authority to decide thecourse of ac tion thatwill be taken to   address weaknesses inthe models.  43. Banks must document the process for initial and on-going validation of their   IMM

risk. Based on this  assessment they willput in pl ace a countercyclical buffer requirement when    58  Basel III: Aglobal regulatory framewor kfor  more resilient banks and banking systems   circumstances warrant. This requirement  will be releasedwhen system-wide risk   crystallises or dissipates

Question:
What are the conditi
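
The prompt assembly can also be wrapped in a small function so it is reusable across questions. The sketch below keeps the same template wording as the cell above, and additionally collapses the runs of whitespace visible in the retrieved passages and numbers each passage. The function name `build_prompt` and the `[n]` numbering are assumptions, not part of the original notebook.

```python
def build_prompt(question, passages):
    """Assemble the LLM prompt from a question and a list of passages."""
    # Collapse runs of whitespace left over from the source text extraction.
    cleaned = [" ".join(p.split()) for p in passages]
    # Number the passages so an answer can point back to them.
    context = "\n\n".join(f"[{n}] {p}" for n, p in enumerate(cleaned, start=1))
    return (
        "You are a regulatory assistant.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        "Answer in a clear, concise way using only the context above.\n"
    )


demo_prompt = build_prompt(
    "What is a buffer?",
    ["a   countercyclical  buffer", "risk   models"],
)
```

In the notebook this would be called as `build_prompt(user_question, retrieved_passages)`.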

In [12]:
# Save the prompt to a text file
with open("generated_prompt.txt", "w", encoding="utf-8") as f:
    f.write(prompt)
