<a href="https://colab.research.google.com/github/Aadhimozhi/Intelligence-System-/blob/main/Intelligence_System_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [16]:
!pip install -q google-generativeai numpy

In [24]:
from google.colab import files
files.upload()   # upload notes.txt

Saving notes1.txt to notes1.txt


{'notes1.txt': b'INTELLIGENCE SYSTEM FOR ACADEMIC NOTES\r\n\r\nINTRODUCTION\r\nAn Intelligence System for Academic Notes is designed to help students retrieve accurate answers from academic material.\r\nInstead of searching manually through notes, students can ask natural language questions.\r\nThe system ensures that answers are generated strictly from the given notes without using external knowledge.\r\n\r\nOBJECTIVE\r\nThe main objective of the intelligence system is to provide correct, concise, and context-aware answers.\r\nThe system avoids hallucination and prioritizes correctness over confidence.\r\nIf information is not present in the notes, the system explicitly states that the answer is unavailable.\r\n\r\nTEXT CHUNKING\r\nText chunking is the process of dividing large academic notes into smaller overlapping segments.\r\nChunking improves retrieval efficiency and ensures that contextual meaning is preserved.\r\nOverlapping chunks reduce information loss at boundaries.\r\n\r\n

In [28]:
import numpy as np
from sentence_transformers import SentenceTransformer
import re

# ==============================
# CONFIGURATION
# ==============================

MODEL_NAME = "all-MiniLM-L6-v2"
TOP_K_CHUNKS = 3   # Top K chunks to retrieve
TOP_K_SENTENCES = 2 # Top K sentences to return

model = SentenceTransformer(MODEL_NAME)

# ==============================
# LOAD NOTES
# ==============================

def load_notes(path="notes1.txt"):
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# ==============================
# CHUNKING TEXT
# ==============================

def chunk_text(text, size=500, overlap=100):
    chunks = []
    start = 0
    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

# ==============================
# EMBEDDINGS
# ==============================

def embed_texts(texts):
    return model.encode(texts, convert_to_numpy=True)

# ==============================
# COSINE SIMILARITY
# ==============================

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ==============================
# SEMANTIC SEARCH (TOP K CHUNKS)
# ==============================

def semantic_search(query, chunks, chunk_embeddings, top_k=TOP_K_CHUNKS):
    query_emb = model.encode(query, convert_to_numpy=True)
    scores = [cosine_similarity(query_emb, emb) for emb in chunk_embeddings]
    top_idx = np.argsort(scores)[-top_k:][::-1]
    return [chunks[i] for i in top_idx]

# ==============================
# SENTENCE-LEVEL ANSWERING (MORE ACCURATE)
# ==============================

def generate_answer(query, retrieved_chunks):
    sentences = []
    for chunk in retrieved_chunks:
        # Split chunks into sentences
        sentences.extend(re.split(r'(?<=[.!?])\s+', chunk))

    if len(sentences) == 0:
        return "Information not available in the notes."

    # Embed sentences
    sentence_embeddings = embed_texts(sentences)
    query_emb = model.encode(query, convert_to_numpy=True)

    # Compute cosine similarity for each sentence
    scores = [cosine_similarity(query_emb, emb) for emb in sentence_embeddings]

    # Get top sentences
    top_indices = np.argsort(scores)[-TOP_K_SENTENCES:][::-1]
    top_sentences = [sentences[i] for i in top_indices if scores[i] > 0]

    if len(top_sentences) == 0:
        return "Information not available in the notes."

    return " ".join(top_sentences).strip()

# ==============================
# FULL PIPELINE
# ==============================

def answer_question(query, chunks, chunk_embeddings):
    retrieved_chunks = semantic_search(query, chunks, chunk_embeddings)
    answer = generate_answer(query, retrieved_chunks)
    return answer + "\n\nSource: " + ", ".join([f"Chunk {i+1}" for i in range(len(retrieved_chunks))])

# ==============================
# MAIN EXECUTION
# ==============================

notes = load_notes()
chunks = chunk_text(notes)
print("üìê Creating embeddings...")
chunk_embeddings = embed_texts(chunks)
print("‚úÖ System Ready\n")

while True:
    q = input("‚ùì Question (type 'exit' to stop): ")
    if q.lower() == "exit":
        break
    print("\nüß† Answer:")
    print(answer_question(q, chunks, chunk_embeddings))
    print("-"*60)


üìê Creating embeddings...
‚úÖ System Ready

‚ùì Question (type 'exit' to stop): What is hallucination control?

üß† Answer:
HALLUCINATION CONTROL
Hallucination control prevents the system from generating incorrect or fabricated information. 
Hallucination control prevents the system from generating incorrect or fabricated information.

Source: Chunk 1, Chunk 2, Chunk 3
------------------------------------------------------------
‚ùì Question (type 'exit' to stop): what is Source attribution

üß† Answer:
SOURCE ATTRIBUTION
Each answer includes the source chunk from which the information was retrieved. Source attribution increases trust in the system.

Source: Chunk 1, Chunk 2, Chunk 3
------------------------------------------------------------
‚ùì Question (type 'exit' to stop): exit
