In [2]:
import numpy as np
from sentence_transformers import SentenceTransformer
from google import genai
import json
from sklearn.cluster import DBSCAN
from dotenv import load_dotenv
from collections import Counter
import random

In [3]:
# load environment variables

load_dotenv()

True

In [4]:
# minimum top cosine similarity; below this, the topic is treated as not covered by the indexed notes

NOT_IN_RAG_THRESHOLD = 0.60

In [5]:
# embedding model

emb_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

In [6]:
# generative model

GEN_MODEL = "gemini-2.5-flash"
client = genai.Client()

In [7]:
# cosine similarity between two vectors


def cosine_similarity(a, b):
    dot_prod = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)

    # cosine similarity is undefined for a zero vector; treat it as no similarity
    if norm_a == 0 or norm_b == 0:
        return 0.0

    return dot_prod / (norm_a * norm_b)

In [8]:
# retrieve top k similar embeddings


def retrieve(query, embeddings, k=3):
    query_emb = emb_model.encode(query)

    scores = []
    for emb in embeddings:
        score = cosine_similarity(query_emb, np.array(emb["embedding"]))
        scores.append((score, emb))

    scores.sort(reverse=True, key=lambda x: x[0])
    return scores[:k]
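The loop above scores one stored embedding at a time. As a rough sketch of the same top-k idea done with a single matrix product, the snippet below uses a hypothetical helper `top_k_cosine` and toy 2-D vectors in place of real embeddings; neither is part of the original notebook.

```python
import numpy as np


def top_k_cosine(query_vec, matrix, k=3):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query_vec`."""
    query_vec = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(matrix, dtype=float)

    # Norms of every row and of the query. Zero norms are replaced by 1 so the
    # division is safe; the dot product is 0 for a zero vector, so its score stays 0.
    row_norms = np.linalg.norm(matrix, axis=1)
    query_norm = np.linalg.norm(query_vec)
    row_norms = np.where(row_norms == 0, 1.0, row_norms)
    query_norm = query_norm if query_norm != 0 else 1.0

    scores = (matrix @ query_vec) / (row_norms * query_norm)

    # Indices sorted by descending score, truncated to k.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy 2-D vectors standing in for real embeddings (illustrative only).
toy_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
indices, scores = top_k_cosine([1.0, 0.0], toy_matrix, k=2)
print(indices, scores)
```

For a few hundred chunks the loop version is fine; the matrix form mainly helps once the embeddings file grows.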

In [9]:
# test: retrieve context for a sample query

with open("json/embeddings.json", "r", encoding="utf-8") as f:
    embeddings = json.load(f)

user_query = "What is a binary tree?"

retrieved = retrieve(user_query, embeddings)
# best score comes first; fall back to 0.0 if the embeddings file is empty
max_score = retrieved[0][0] if retrieved else 0.0

if max_score < NOT_IN_RAG_THRESHOLD:
    context = None
else:
    context = "\n\n".join(c["content"] for _, c in retrieved)
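The threshold check in this cell could be wrapped in a small function so it can be reused for other queries. This is only a sketch: `build_context` is a hypothetical name, and the toy `(score, chunk)` pairs below just mimic the shape returned by `retrieve`.

```python
def build_context(retrieved, threshold=0.60):
    """Join retrieved chunks into one context string, or return None.

    `retrieved` is a list of (score, chunk) pairs sorted by descending score,
    where each chunk is a dict with a "content" key.
    """
    # Nothing retrieved at all: treat the topic as not covered.
    if not retrieved:
        return None

    best_score = retrieved[0][0]
    if best_score < threshold:
        return None

    return "\n\n".join(chunk["content"] for _, chunk in retrieved)


# Toy data with the same shape as the output of retrieve() (illustrative only).
good = [(0.82, {"content": "chunk A"}), (0.71, {"content": "chunk B"})]
weak = [(0.35, {"content": "chunk C"})]

print(build_context(good))
print(build_context(weak))
```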

In [10]:
print(user_query)
print(context)

What is a binary tree?
Lecture-13 
Binary Tree 
A binary tree  consists of a finite set of nodes that is either empty, or consists of one 
specially designated node called the  root of the binary tree, and the elements of two 
disjoint binary trees called the left subtree and right subtree of the root. 
Note that the definition above is recursive: we have defined a binary tree in terms of 
binary trees. This is appropriate since recursion is an innate characteristic of tree 
structures. 
Diagram 1: A binary tree 
 
Binary Tree Terminology

Lecture-14 
Special Forms of Binary Trees 
There are a few special forms of binary tree worth mentioning. 
If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is 
termed a strictly binary tree. Or, to put it another way, all of t he nodes in a strictly binary 
tree are of degree zero or two, never degree one. A strictly binary tree with  N leaves 
always contains 2N – 1 nodes. 
Some texts call this a "full" binary t

In [11]:
prompt = f"""
You are an expert study helper.
Give answer to the user query based on the provided context.
Use the context only if it exists.
If no context is available, say clearly that user has not studied that topic yet.

Think step-by-step carefully and reason internally before answering.
Answer in simple english to the user.

User Query:
{user_query}

{
    f'''Context:
{context}'''
    if context
    else ""
}
"""

## Self-Consistency Decoding


In [12]:
# number of responses to sample for the same prompt
N = 3

# generate N responses for same prompt
responses = []
for _ in range(N):
    response = client.models.generate_content(model=GEN_MODEL, contents=prompt)
    responses.append(response.text)
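The sampling loop assumes every call returns usable text. As a hedged sketch, the helper below drops empty or `None` results before they reach the embedding step. `sample_responses` and `fake_generate` are hypothetical names; the fake generator stands in for a wrapper around `client.models.generate_content(...)` that returns `response.text`.

```python
def sample_responses(generate, prompt, n=3):
    """Call `generate(prompt)` n times and keep only non-empty text results."""
    texts = []
    for _ in range(n):
        text = generate(prompt)
        # Skip None or whitespace-only results so later steps receive real strings.
        if text and text.strip():
            texts.append(text.strip())
    return texts


# Stand-in generator for illustration; it yields one canned output per call.
_fake_outputs = iter(["Answer A", None, "  Answer B  "])


def fake_generate(prompt):
    return next(_fake_outputs)


samples = sample_responses(fake_generate, "What is a binary tree?", n=3)
print(samples)
```

If some samples are dropped, the confidence computed later is arguably better divided by the number of calls made rather than the number of responses kept, so that failed calls still lower it.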

In [13]:
print(responses)

["A binary tree is a collection of nodes.\n\nHere's how it works:\n*   It can be empty, meaning it has no nodes.\n*   If it's not empty, it has a special node called the **root**.\n*   This root node then has two separate parts attached to it: a **left subtree** and a **right subtree**.\n*   These left and right subtrees are themselves binary trees.", 'A binary tree is a type of tree structure made up of a limited number of nodes.\n\nIt can be:\n1.  Empty.\n2.  Or, it can have a special node called the "root," and then it has two separate binary trees attached to it: a "left subtree" and a "right subtree."\n\nThis definition is recursive, meaning it uses the term "binary tree" to define itself, which is common for tree structures.', 'A binary tree is a data structure made up of a limited number of nodes.\n\nIt can be:\n*   Empty.\n*   Or, it has one special node called the **root**. This root node then has two separate parts: a **left subtree** and a **right subtree**, which are also b

In [14]:
# embeddings from responses

res_embeddings = emb_model.encode(responses)

In [15]:
# cluster the responses by meaning (cosine distance between their embeddings)

dbscan_model = DBSCAN(eps=0.3, min_samples=2, metric="cosine")
labels = dbscan_model.fit_predict(res_embeddings)
print(labels)

[0 0 0]
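Here all three responses landed in cluster 0, so the noise label never shows up. The toy example below, which uses made-up 2-D vectors rather than real sentence embeddings, sketches what the same DBSCAN settings do when one response disagrees with the others.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two nearly parallel vectors and one orthogonal vector (illustrative only).
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.99, 0.1],
    [0.0, 1.0],
])

# Same settings as the notebook: cosine distance, eps=0.3, min_samples=2.
toy_labels = DBSCAN(eps=0.3, min_samples=2, metric="cosine").fit_predict(toy_embeddings)

# The first two vectors share a cluster; the third is labelled -1 (noise).
print(toy_labels)
```

With `min_samples=2`, a response needs at least one other response within cosine distance 0.3 to form a cluster; anything isolated is labelled -1.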


In [16]:
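# majority cluster (label -1 marks DBSCAN noise, i.e. a response with no close neighbour)
label_counts = Counter(labels)
label_counts.pop(-1, None)

if label_counts:
    majority_label = label_counts.most_common(1)[0][0]
    majority_sentences = [
        responses[i] for i, label in enumerate(labels) if label == majority_label
    ]
else:
    # no two responses agreed, so there is no majority cluster
    majority_sentences = []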

for sentence in majority_sentences:
    print(sentence)

A binary tree is a collection of nodes.

Here's how it works:
*   It can be empty, meaning it has no nodes.
*   If it's not empty, it has a special node called the **root**.
*   This root node then has two separate parts attached to it: a **left subtree** and a **right subtree**.
*   These left and right subtrees are themselves binary trees.
A binary tree is a type of tree structure made up of a limited number of nodes.

It can be:
1.  Empty.
2.  Or, it can have a special node called the "root," and then it has two separate binary trees attached to it: a "left subtree" and a "right subtree."

This definition is recursive, meaning it uses the term "binary tree" to define itself, which is common for tree structures.
A binary tree is a data structure made up of a limited number of nodes.

It can be:
*   Empty.
*   Or, it has one special node called the **root**. This root node then has two separate parts: a **left subtree** and a **right subtree**, which are also binary trees themselves.


In [17]:
# confidence: fraction of the N sampled responses that fall in the majority cluster
confidence = len(majority_sentences) / N
print(f"{confidence:.4f}")

1.0000
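The majority-cluster and confidence steps can be combined into one function that also covers the case where every response is labelled as noise. This is a minimal sketch under that assumption; `majority_vote` is a hypothetical name and the labels below are hand-written rather than produced by DBSCAN.

```python
from collections import Counter


def majority_vote(responses, labels, noise_label=-1):
    """Return (majority_responses, confidence) for clustered responses."""
    counts = Counter(int(label) for label in labels if int(label) != noise_label)

    # Every response was labelled noise: there is no agreement to report.
    if not counts:
        return [], 0.0

    top_label, top_count = counts.most_common(1)[0]
    majority = [r for r, label in zip(responses, labels) if int(label) == top_label]
    return majority, top_count / len(responses)


# Hand-written labels for illustration: three agree, one is noise.
agreeing, conf_a = majority_vote(["a", "b", "c", "d"], [0, 0, 0, -1])
print(agreeing, conf_a)

# All noise: empty majority and zero confidence.
nothing, conf_b = majority_vote(["a", "b"], [-1, -1])
print(nothing, conf_b)
```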


In [18]:
# pick any response from the majority cluster, since its members are semantically equivalent

final_response = random.choice(majority_sentences) if majority_sentences else None
print(final_response)

A binary tree is a collection of nodes.

Here's how it works:
*   It can be empty, meaning it has no nodes.
*   If it's not empty, it has a special node called the **root**.
*   This root node then has two separate parts attached to it: a **left subtree** and a **right subtree**.
*   These left and right subtrees are themselves binary trees.
