**Load Embeddings & Build Vector Indexes**

In [2]:
import json
import numpy as np
import faiss

# --- Load your stored embeddings ---
with open(r"C:\Users\Shahe\my_server\RAG\rag_sys\political\paragraphs_with_embeddings.json", encoding="utf-8") as f:
    paragraphs = json.load(f)  # [{'paragraph_id':..., 'sentences':[...], 'text':..., 'paragraph_embedding':[...]}]

with open(r"C:\Users\Shahe\my_server\RAG\rag_sys\political\sentences_with_embeddings.json", encoding="utf-8") as f:
    sentences = json.load(f)  # [{'embedding':[...], ...}]

# Convert to numpy arrays
para_embeddings = np.array([p["paragraph_embedding"] for p in paragraphs]).astype("float32")
sent_embeddings = np.array([s["embedding"] for s in sentences]).astype("float32")

# --- Build FAISS indexes ---
dim = para_embeddings.shape[1]  # embedding dimension
para_index = faiss.IndexFlatIP(dim)  # cosine similarity (inner product)
sent_index = faiss.IndexFlatIP(dim)

# Normalize for cosine similarity
faiss.normalize_L2(para_embeddings)
faiss.normalize_L2(sent_embeddings)

para_index.add(para_embeddings)
sent_index.add(sent_embeddings)
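
The cell above relies on the fact that the inner product of two unit-length vectors equals their cosine similarity, which is why the embeddings are L2-normalized before being added to an inner-product index. A minimal NumPy-only sketch of that equivalence follows; the `l2_normalize` helper and the toy vectors are illustrative assumptions, not part of the notebook (and unlike `faiss.normalize_L2`, which works in place, this helper returns a copy).

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Return a copy of x with each row scaled to unit L2 norm (hypothetical helper)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero

# Toy vectors standing in for real embeddings (illustrative values only)
docs = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
query = np.array([[6.0, 8.0]], dtype="float32")

# Cosine similarity computed directly from its definition
cosine = (docs @ query.T).ravel() / (
    np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
)

# Inner product of the normalized vectors
inner = (l2_normalize(docs) @ l2_normalize(query).T).ravel()

print(np.allclose(cosine, inner))
```

Because the two quantities agree, the `threshold` values used in the retrieval functions can be read as cosine-similarity cutoffs.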


**Stage 1: Retrieve Top-K Paragraphs**

In [3]:
from sentence_transformers import SentenceTransformer
import numpy as np

# Load the same model that produced the stored embeddings, so queries share their vector space
query_model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

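def retrieve_paragraphs(query, top_k=5, threshold=0.7):
    """Return up to top_k paragraphs whose cosine similarity to the query is >= threshold."""
    # Embed and L2-normalize the query so inner product equals cosine similarity
    q_emb = query_model.encode([query], convert_to_numpy=True).astype("float32")
    faiss.normalize_L2(q_emb)

    scores, idxs = para_index.search(q_emb, top_k)
    results = []
    for score, idx in zip(scores[0], idxs[0]):
        # FAISS pads with idx == -1 when the index holds fewer than top_k vectors
        if idx != -1 and score >= threshold:
            results.append({
                "id": paragraphs[idx]["paragraph_id"],
                "sentences": paragraphs[idx]["sentences"],
                "text": paragraphs[idx]["text"],
                "score": float(score)
            })
    return results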

# Example:
#top_paragraphs = retrieve_paragraphs("Explain the system architecture")
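
`IndexFlatIP` performs exact (exhaustive) inner-product search, so the retrieval step above can be pictured as "score every stored vector, keep the `top_k` best, then drop anything below `threshold`". The sketch below reproduces that logic in plain NumPy on toy unit vectors; `brute_force_search` and its data are hypothetical and only meant to show why the function can return fewer than `top_k` results, or none at all.

```python
import numpy as np

def brute_force_search(emb, q, top_k=5, threshold=0.7):
    """Exact inner-product search over the rows of emb (hypothetical helper).

    Returns (index, score) pairs sorted by descending score, keeping only
    hits whose score is at least `threshold`.
    """
    scores = emb @ q                     # one inner product per stored vector
    order = np.argsort(-scores)[:top_k]  # indices of the top_k highest scores
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]

# Toy unit vectors (illustrative values, not real embeddings)
emb = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]], dtype="float32")
q = np.array([1.0, 0.0], dtype="float32")

hits = brute_force_search(emb, q, top_k=2, threshold=0.7)
print(hits)
```

Here two candidates make the top-k cut, but only one clears the threshold, so callers should be prepared for short or empty result lists.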


In [4]:
top_paragraphs = retrieve_paragraphs("How does transparency impact decision-making?")
top_paragraphs

[{'id': 24,
  'sentences': ['On the other hand, the use of transparency can sometimes be strategic by decision makers, as it can be used as a means to engage in negotiations and accelerate them rather than slow them down.',
   'Thus, there seems to be a discrepancy in how transparency affects the decision-making process, and it cannot be asserted that it always affects negatively.'],
  'text': 'On the other hand, the use of transparency can sometimes be strategic by decision makers, as it can be used as a means to engage in negotiations and accelerate them rather than slow them down. Thus, there seems to be a discrepancy in how transparency affects the decision-making process, and it cannot be asserted that it always affects negatively.',
  'score': 0.8230322599411011},
 {'id': 23,
  'sentences': ['There seems to be a strong belief among decision makers that transparency negatively affects decision-making efficiency.',
   'In this sense, they resist external pressure that seeks to incr

**Retrieve Top-K Chunks**

In [5]:
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss
import json

# 1️⃣ Load same embedding model for queries
query_model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

# 2️⃣ Load your chunks + build index (once)
with open(r"C:\Users\Shahe\my_server\RAG\rag_sys\political\new_chunks_with_embeddings.json", "r", encoding="utf-8") as f:
    chunks = json.load(f)  # list of dicts: {"topic": int, "text": str, "sentences": [{"sentence": str, ...}], "embedding": [...]}

emb_matrix = np.array([c["embedding"] for c in chunks]).astype("float32")
faiss.normalize_L2(emb_matrix)

chunk_index = faiss.IndexFlatIP(emb_matrix.shape[1])  # inner product for cosine sim
chunk_index.add(emb_matrix)

# 3️⃣ Retrieval function
def retrieve_chunks(query, top_k=5, threshold=0.7):
    """Return up to top_k chunks whose cosine similarity to the query is >= threshold."""
    # embed & normalize query
    q_emb = query_model.encode([query], convert_to_numpy=True).astype("float32")
    faiss.normalize_L2(q_emb)

    scores, idxs = chunk_index.search(q_emb, top_k)

    results = []
    for score, idx in zip(scores[0], idxs[0]):
        # FAISS pads with idx == -1 when the index holds fewer than top_k vectors
        if idx != -1 and score >= threshold:
            results.append({
                "topic": chunks[idx]["topic"],
                "sentences": [s["sentence"] for s in chunks[idx]["sentences"]],
                "text": chunks[idx]["text"],
                "score": float(score)
            })
    return results

In [6]:
top_chunks = retrieve_chunks("How does transparency impact decision-making?")
top_chunks

[{'topic': 0,
  'sentences': ['Introduction: There is a delicate balance between transparency and decision-making effectiveness, as transparency can be a double-edged weapon.',
   'Although it strengthens democracy and allows citizens and stakeholders to participate in the decision-making process, it may sometimes be hampered by increased complexity and delays.',
   'On the one hand, opponents of transparency consider that they may lose effectiveness to decision-making processes, as they can cause delays in decision-making due to multiple public debates and intervention by external actors.',
   "On the other hand, proponents of transparency see it as enhancing legitimacy and accountability and ensuring that decisions are made in the public's interest.",
   'Thus, there seems to be a need for an ideal balance between transparency and decision-making effectiveness, as transparency must be made available to citizens and stakeholders without unduly complicating or delaying decision-making 

In [7]:
for c in top_chunks:
    print(c["score"], c["text"])

0.7530173063278198 Introduction: There is a delicate balance between transparency and decision-making effectiveness, as transparency can be a double-edged weapon. Although it strengthens democracy and allows citizens and stakeholders to participate in the decision-making process, it may sometimes be hampered by increased complexity and delays. On the one hand, opponents of transparency consider that they may lose effectiveness to decision-making processes, as they can cause delays in decision-making due to multiple public debates and intervention by external actors. On the other hand, proponents of transparency see it as enhancing legitimacy and accountability and ensuring that decisions are made in the public's interest. Thus, there seems to be a need for an ideal balance between transparency and decision-making effectiveness, as transparency must be made available to citizens and stakeholders without unduly complicating or delaying decision-making processes. The solution may be to 

**Collect all sentences (no duplicates)**

In [8]:
all_sentences = []
# collect from chunks
for ch in top_chunks:
    if "sentences" in ch:
        all_sentences.extend(ch["sentences"])

# collect from paragraphs
for para in top_paragraphs:
    if "sentences" in para:
        all_sentences.extend(para["sentences"])

# deduplicate (preserve order)
seen = set()
unique_sentences = []
for s in all_sentences:
    s_clean = s.strip()
    if s_clean not in seen:
        seen.add(s_clean)
        unique_sentences.append(s_clean)

print("Collected", len(unique_sentences), "unique sentences")


Collected 52 unique sentences
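
The loop above keeps the first occurrence of each stripped sentence. As a compact alternative, the same order-preserving deduplication can be sketched with `dict.fromkeys`, which works because Python dictionaries keep insertion order (3.7+). The function name and sample strings below are illustrative assumptions.

```python
def dedupe_preserving_order(items):
    """Strip whitespace and drop repeats, keeping the first occurrence of each item."""
    # dict keys are unique and remember insertion order, so this keeps first hits only
    return list(dict.fromkeys(s.strip() for s in items))

# Illustrative input with duplicates that differ only in surrounding whitespace
sample = ["  Alpha. ", "Beta.", "Alpha.", "Gamma.", "Beta. "]
unique = dedupe_preserving_order(sample)
print(unique)
```

Both versions only catch exact matches after stripping; sentences that differ in casing or punctuation would still be treated as distinct.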


In [9]:
unique_sentences[:3]

['Introduction: There is a delicate balance between transparency and decision-making effectiveness, as transparency can be a double-edged weapon.',
 'Although it strengthens democracy and allows citizens and stakeholders to participate in the decision-making process, it may sometimes be hampered by increased complexity and delays.',
 'On the one hand, opponents of transparency consider that they may lose effectiveness to decision-making processes, as they can cause delays in decision-making due to multiple public debates and intervention by external actors.']

**Re-rank the unique sentences with a cross-encoder**

In [10]:
from sentence_transformers import CrossEncoder

# Load a cross-encoder model (choose any you like)
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Your user query (worded slightly differently from the query used for retrieval above):
query = "How does transparency impact the effectiveness of decision-making?"

# Prepare (query, sentence) pairs for scoring
pairs = [(query, sent) for sent in unique_sentences]

# Get scores
scores = cross_encoder.predict(pairs)

# Combine & sort by score
ranked = sorted(
    zip(unique_sentences, scores),
    key=lambda x: x[1],
    reverse=True
)

# Top N (e.g. 5 most relevant sentences)
top_n = ranked[:5]

for sent, score in top_n:
    print(f"{score:.3f} - {sent}")


8.255 - In fact, some research suggests that transparency may reduce the efficiency of decision-making, as it can complicate and slow processes as a result of increased public debate and debate.
8.114 - On the one hand, opponents of transparency consider that they may lose effectiveness to decision-making processes, as they can cause delays in decision-making due to multiple public debates and intervention by external actors.
8.032 - While some studies have concluded that transparency may reduce the efficiency of decision-making, others see it as contributing to its improvement.
7.786 - There seems to be a strong belief among decision makers that transparency negatively affects decision-making efficiency.
7.740 - Thus, there seems to be a need for an ideal balance between transparency and decision-making effectiveness, as transparency must be made available to citizens and stakeholders without unduly complicating or delaying decision-making processes.
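
The scores printed above fall well outside the 0 to 1 range, which suggests they are raw, unbounded model outputs rather than probabilities. If a bounded score is more convenient (for example, to apply a cutoff), one option is to pass them through a sigmoid. This is a sketch under the assumption that each score is a single relevance logit; the `sigmoid` helper is not part of the notebook, and the input values are the rounded scores from the output above.

```python
import numpy as np

def sigmoid(x):
    """Map raw scores to the open interval (0, 1); monotonic, so ranking is unchanged."""
    x = np.asarray(x, dtype="float64")
    return 1.0 / (1.0 + np.exp(-x))

# Rounded cross-encoder scores copied from the output above
raw_scores = [8.255, 8.114, 8.032, 7.786, 7.740]
probs = sigmoid(raw_scores)

for raw, p in zip(raw_scores, probs):
    print(f"{raw:.3f} -> {p:.5f}")
```

Since all five raw scores are large, the squashed values end up very close to 1, so the relative order remains more informative than the absolute numbers.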


In [11]:
top_n

[('In fact, some research suggests that transparency may reduce the efficiency of decision-making, as it can complicate and slow processes as a result of increased public debate and debate.',
  np.float32(8.254657)),
 ('On the one hand, opponents of transparency consider that they may lose effectiveness to decision-making processes, as they can cause delays in decision-making due to multiple public debates and intervention by external actors.',
  np.float32(8.113607)),
 ('While some studies have concluded that transparency may reduce the efficiency of decision-making, others see it as contributing to its improvement.',
  np.float32(8.032385)),
 ('There seems to be a strong belief among decision makers that transparency negatively affects decision-making efficiency.',
  np.float32(7.78636)),
 ('Thus, there seems to be a need for an ideal balance between transparency and decision-making effectiveness, as transparency must be made available to citizens and stakeholders without unduly comp

**Send Top Sentences to Gemini**

In [12]:
# pip install google-genai
import os
from google import genai

# 1. Configure Gemini with your API key (export GEMINI_API_KEY first; never hardcode it)
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"].strip())

# 2. Keep only the sentence text from the re-ranked (sentence, score) pairs
top_sentences = [s for s, _ in top_n]

# 3. Build the prompt, including the user query the answer should address
prompt = (
    "Act as a professional AI assistant. "
    "You are given the most relevant sentences from a document. "
    "Write a clear, cohesive, and human-readable answer to the user query "
    "based on these sentences.\n\n"
    f"User query: {query}\n\n"
    "Relevant sentences:\n"
    + "\n".join(f"- {s}" for s in top_sentences) +
    "\n\nAnswer:"
)

# 4. Call Gemini (swap in another model name if this one is not available to you)
response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)

# 5. Output the generated text
print(response.text)


The impact of transparency on decision-making efficiency is a subject of debate. Some research and a strong belief among decision-makers suggest that transparency can negatively affect efficiency by complicating and slowing processes, leading to delays due to increased public debate and intervention from external actors. However, other studies propose that transparency can contribute to the improvement of decision-making. Ultimately, there is a recognized need to find an ideal balance, making transparency available to citizens and stakeholders without unduly complicating or delaying decision-making processes.
